The Fundamentals of Pointers

Written by Mike James

Friday, 20 May 2022

Although pointers have long been regarded as "dangerous", they are still deeply embedded in the way we do things. Much of the difficulty in using them stems from not understanding where they come from. Pointers are a sophisticated abstraction that wraps some fundamentals of assembly language.

The whole concept of a pointer is bound up in the idea of a memory location and its address.

Dangerous or not, they are still essential programming material.

In modern programming the pointer has been transmuted into the "safe pointer" and more recently into the managed reference - but it's still a pointer.

Let's look at the idea that is at the bottom of it all.

Pointers are natural

In assembly language you refer to a memory location by its address, for example 2000 refers to the memory location at address 2000. The confusion inherent in the idea of a pointer starts at this early stage of development.

Are you talking about the thing itself, i.e. the number 2000, or the thing stored at memory location 2000?

For example, does:

 LDA 2000

mean "load the A register with the value 2000" or "load the A register with the contents of address 2000"?

The ambiguity is usually solved by using an extra symbol if you mean the numeric value:

 LDA #2000

means "load the A register with the value 2000" and

 LDA 2000

means "load the A register with the contents of address 2000".
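The two readings of the operand can be sketched in C. This is only a toy model: the `memory` array, its size and the function names `lda_immediate` and `lda_direct` are hypothetical stand-ins for a real machine, not any particular processor's instruction set.

```c
#include <stdint.h>

/* Hypothetical toy machine: 4096 words of memory. */
enum { MEM_SIZE = 4096 };
uint16_t memory[MEM_SIZE];

/* LDA #n : immediate - the operand itself is the value loaded. */
uint16_t lda_immediate(uint16_t operand) {
    return operand;
}

/* LDA n : direct - the operand is the address of the value loaded.
   The modulo only keeps the index inside the toy memory. */
uint16_t lda_direct(uint16_t operand) {
    return memory[operand % MEM_SIZE];
}
```

If location 2000 holds 42, then `lda_immediate(2000)` yields 2000 while `lda_direct(2000)` yields 42.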

Indirect addressing

Once you know the rules it's easy enough but mistakes are still common - especially when you start using indirect addressing.

Indirect addressing is where the value stored in a memory location is treated as the address of another memory location, and it is a common feature of most hardware.

So for example, the command:

 LDA @2000

would mean "load the A register with the value stored in the memory location whose address is stored in the memory location at address 2000".

The idea is that the value stored in a memory location can be data or it can be an address of another memory location. Direct addressing puts the address that the data is stored at in the instruction.

instruction address---------->data

Indirect addressing puts in the instruction the address of a location that holds the address of the location that holds the data.

instruction address--------->address---------->data

Confused?

Well so were thousands of novice assembly language programmers.

Indirection is where things get just complicated enough for mistakes to be the rule rather than the exception, and indirection is something pointers allow you to do without thinking twice.

And once you have indirection you can easily invent re-indirection and so on - each level more difficult and dangerous than the last.
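A rough way to see how each extra level just adds one more lookup is the following C sketch. The `memory` array and the `load` function with its `levels` parameter are assumptions made for illustration; real hardware usually offers a fixed set of addressing modes rather than an arbitrary level count.

```c
#include <stdint.h>

/* Hypothetical toy machine: 4096 words of memory. */
enum { MEM_SIZE = 4096 };
uint16_t memory[MEM_SIZE];

/* Load with a given number of indirection levels:
   0 = direct, 1 = indirect, 2 = indirect twice, and so on. */
uint16_t load(uint16_t addr, unsigned levels) {
    while (levels-- > 0) {
        /* Treat the contents of the location as the next address. */
        addr = memory[addr % MEM_SIZE];
    }
    /* The final lookup yields the data. */
    return memory[addr % MEM_SIZE];
}
```

With 3000 stored at 2000, 3500 stored at 3000 and 7 stored at 3500, `load(2000, 0)` gives 3000, `load(2000, 1)` gives 3500 and `load(2000, 2)` gives 7. Getting the level count wrong by one silently returns an address where data was expected, which is exactly the kind of mistake described above.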

Pointer Abstractions

When high level languages got going the idea of addresses and the whole idea of memory locations were hidden behind the facade of the variable.

When you use a variable you are using an address of a memory location as part of an instruction.

There is no doubt that when you write something like

 TOTAL=SUM+10

you are referring to the contents of SUM and TOTAL and there is no hint of addresses or redirection. The addresses are still there but they have been abstracted away into the idea of a variable.

Many high level languages stop right there.

But some don't - they re-invent the whole concept of addressing and indirection by way of pointer variables. A pointer variable implements indirection by being a storage location that holds the address of another storage location or, in this case, another variable.

That is, a pointer variable contains the address of another variable.
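As a minimal sketch of this idea in C, the function below reworks the TOTAL=SUM+10 example so that the result is stored through a pointer variable instead of by name. The function name `total_via_pointer` and the pointer name `p` are hypothetical.

```c
/* Computes sum + 10, storing the result through a pointer. */
int total_via_pointer(int sum) {
    int total = 0;     /* ordinary variable: the name stands for a location */
    int *p = &total;   /* pointer variable: holds the address of total */
    *p = sum + 10;     /* follow the address and store the result there */
    return total;      /* total has changed without being named in the store */
}
```

Calling `total_via_pointer(5)` returns 15. The variable `total` is still an address under the surface; `p` simply makes that address available as a value.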

Pascal was one of the first truly high level languages to include pointers, which it did from very early in its development, and the pattern it adopted was used by C# and many other modern languages. It is worth seeing how it implemented pointers.

Of course Pascal being a strongly typed language means that pointers are typed as well.

That is, a pointer can only point to a variable of one type.

This is a strange idea at first because all pointer variables are pointers and they all store addresses, so you might think that in a simple world all pointer variables would be of the same type - pointer, say. But it turns out to be better to make the type of a pointer include what it points at. So instead of a simple pointer type you have pointer to integer, pointer to float, pointer to string and so on.

For example,

 var a:^integer

declares a to be a pointer to an integer and nothing but an integer.

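A pointer can hold the special value nil to mean that it isn't pointing to anything. Strictly speaking, standard Pascal leaves a newly declared pointer undefined until you assign nil to it or call NEW, although some compilers may initialize it for you. This is where we meet the first big problem with pointers - they don't always point at anything!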

To give it something to point at you have to use the procedure NEW in Pascal. The statement NEW(pointer) allocates enough storage for the type of data that the pointer is supposed to point at.

For example, NEW(a) would allocate enough storage for an integer and set a to point at it. Notice that the type and amount of storage allocated by NEW is determined by the type of the pointer or rather what it points at.

You can deallocate the storage that a pointer is pointing at using the complementary procedure DISPOSE. That is, DISPOSE(a) frees the storage that a is pointing at for reuse. You can assign pointers and compare them for equality, but little else is legal - in particular there is no pointer arithmetic.
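For comparison, a rough C analogue of NEW and DISPOSE for an integer pointer might look like the sketch below. The helper names `new_int` and `dispose_int` are hypothetical; C itself only provides `malloc` and `free`, and unlike NEW, `malloc` has to be told the size explicitly.

```c
#include <stdlib.h>

/* Rough analogue of NEW(a): allocate storage sized by the pointee type.
   Returns NULL if the allocation fails. */
int *new_int(void) {
    int *a = malloc(sizeof *a);   /* the size comes from what a points at */
    return a;
}

/* Rough analogue of DISPOSE(a): hand the storage back for reuse. */
void dispose_int(int *a) {
    free(a);                      /* free(NULL) is defined to do nothing */
}
```

Writing `sizeof *a` rather than a literal size is the closest C gets to Pascal's rule that the pointer's type decides how much storage is allocated.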

For example, if a and b are integer pointers then

 NEW(a);
 b:=a

results in a and b pointing to the same area of storage.

To refer to the value actually stored in the area of memory that the pointer points at you have to use the ^ symbol in Pascal.

This is often called the dereferencing operator.

So a is a pointer to an area of memory that holds an integer and a^ is the actual value stored there.

If you have followed the ideas so far you should be able to tell me the difference between:

 a:=b

and

 a^:=b^

The first makes a and b point at the same area of memory and the second one makes the area of memory that a points at hold the same value as the area of memory that b points at.
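The same distinction can be sketched in C, where `a = b` corresponds to a:=b and `*a = *b` corresponds to a^:=b^. To keep the sketch free of allocation, it points at two named variables, `x` and `y`, using C's address-of operator instead of NEW; the function names are hypothetical.

```c
#include <stdbool.h>

/* a := b  - copy the pointer, so both refer to the same integer. */
bool pointer_copy_shares_storage(void) {
    int x = 1, y = 2;
    int *a = &x, *b = &y;
    a = b;          /* a now points at y; x is untouched */
    *a = 99;        /* writes y through a */
    return x == 1 && y == 99 && a == b;
}

/* a^ := b^  - copy the value, the pointers stay distinct. */
bool value_copy_keeps_storage_separate(void) {
    int x = 1, y = 2;
    int *a = &x, *b = &y;
    *a = *b;        /* x becomes 2, a still points at x */
    *a = 99;        /* writes x only */
    return x == 99 && y == 2 && a != b;
}
```

Both functions return true. In the first, a later write through `a` is visible through `b`; in the second it is not.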

The most common error that beginners make is a:=b^, i.e. they try to assign the value pointed at to the pointer. Being strongly typed Pascal picks this error up at compile time.

Notice that even though Pascal is a high level language it is easy to fall into the habit of referring to areas of memory. The idea of an address and indirection is lurking just below the surface. But it is possible to describe all of this without such primitive concepts.

All you need to avoid bringing "memory" into the description of a pointer is the idea of an anonymous variable, i.e. a variable without a name. In this way of explaining things NEW(a) creates an anonymous integer variable that a is set to point at. Notice that even though this description is slightly higher level, it is still possible to make very strange errors using pointers to an anonymous variable.

For example, it is quite possible to lose an anonymous variable by overwriting all of the pointers to it!

Another favourite error is to DISPOSE of the memory that a pointer is pointing at but then still carry on using it - DISPOSE doesn't change the value of a pointer!

In short, Pascal programmers discovered all of the errors that plague the use of pointers - dereferencing null pointers and dereferencing pointers that no longer point to valid data.
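One common defensive habit, sketched here in C rather than Pascal, is to clear a pointer at the moment its storage is freed, so that later misuse shows up as a null pointer rather than as a stale address. The helper name `dispose_and_clear` is hypothetical and this is only one convention among several.

```c
#include <stdlib.h>

/* Free the storage and set the caller's pointer to NULL.
   Takes the address of the pointer so that it can be changed. */
void dispose_and_clear(int **p) {
    if (p != NULL) {
        free(*p);     /* safe even if *p is already NULL */
        *p = NULL;    /* the pointer no longer holds a stale address */
    }
}
```

Called as `dispose_and_clear(&a)`, this leaves `a` equal to NULL, and calling it a second time is harmless. It does nothing for any other pointer that was assigned from `a` beforehand; those copies still hold the old address, which is why the problem cannot be fully solved by convention alone.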

Using NEW and DISPOSE a Pascal programmer can create dynamic data structures such as strings, linked lists, stacks and so on. This ability to create such dynamic data structures is the main reason for the existence of pointers in programming languages. The only real alternative to using pointers is to provide advanced dynamic data structures as standard types.

For example, you can use pointers to program a variable length string of characters, but modern languages make advanced dynamic structures available without the use of pointers.
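To make the idea of a pointer-built dynamic structure concrete, here is a minimal sketch of a stack held as a linked list, written in C. The names `node`, `push` and `pop` are illustrative choices; each node plays the role of an anonymous variable that can only be reached through a pointer.

```c
#include <stdbool.h>
#include <stdlib.h>

/* One element of the stack, linked to the element below it. */
struct node {
    int value;
    struct node *next;
};

/* Push a value on top; returns false if allocation fails. */
bool push(struct node **top, int value) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL) {
        return false;
    }
    n->value = value;
    n->next = *top;   /* new node points at the old top */
    *top = n;         /* top pointer now refers to the new node */
    return true;
}

/* Pop the top value into *out; returns false if the stack is empty. */
bool pop(struct node **top, int *out) {
    struct node *n = *top;
    if (n == NULL) {
        return false;
    }
    *out = n->value;
    *top = n->next;   /* unlink before freeing */
    free(n);
    return true;
}
```

Starting from an empty stack (`struct node *top = NULL;`), pushing 1 and then 2 and popping twice returns 2 and then 1. The stack grows and shrinks one node at a time, which is the kind of flexibility that fixed-size variables cannot provide.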

Last Updated ( Saturday, 28 May 2022 )