Exploring Memory layout and Pointers in Swift
Memory can be thought of a sequence of one or more bytes, with each byte addressable in some way. Computers today typically have three levels of memory, with each layer providing more storage space - but also slower access times - than the layer above
- CPU Registers - addressable directly by register names (R1, R2 etc).
- RAM - addressable by a memory address assigned to each byte stored in it.
- Disk - addressing depends on the filesystem being used.
In this post, we are mainly concerned with RAM memory since that is where all the structs and classes we define end up being stored at runtime.
If you were to pause execution of any running program and sample a chunk of memory, it might look something like this :
A long stream of 0s and 1s that is undecipherable to us, but makes total sense to the computer. We normally group these in groups of eight called bytes.
1 byte can be represented using 2 hex digits. Assigning random addresses to this chunk gives us this visualization -
Top row = Byte value, Bottom row = Address.
Looking at raw bytes like this tells you nothing about what the data actually means. The addresses above could be 5 bytes of an Int64, or an array of [Int8], or something else altogether. The program managing this chunk of memory has context on how to interpret this sequence of bytes.
Swift has an enum called
MemoryLayout that provides information about a type’s size (in bytes) and layout in memory. Let’s define a struct with two properties and check it’s MemoryLayout -
The size of struct is the sum of size of it’s individual properties.
TestStruct has a Int32 property which takes 4 bytes and a Bool property which takes 1 byte. So the size of struct itself is 5 bytes.
When multiples structs are placed sequentially in memory (like in an array), structs really like to placed at a standard distance from each other, padding the remaining bytes as necessary. This distance is known as alignment.
The actual memory footprint of a struct is a multiple of alignment, rather than size. This memory footprint is called stride.
Visually, multiple instances of TestStruct stored sequentially might look something like this in memory -
If you change integer property to be of type Int64 rather than Int32, the alignment of the struct would become 8 bytes. Consequently, it’s stride will also increase to be a multiple of alignment, i.e 16 bytes.
A pointer is nothing more than a memory address, which can be represented using a simple UInt. However, this simple representation is hard to work with since UInt doesn’t tell us anything about the thing being pointed at (a.k.a the pointee). Swift’s pointer types can either by typed (like
char * in C) or raw (
void *). Further, they can be mutable (i.e allow changing the pointee) or immutable. Typed pointers are generic over a type
Swift also provides buffer variants of the above four types. Buffer pointers denote a range of memory addresses rather than a single address and have collection methods like
reduce to make dealing with a chunk of contiguous memory easier.
Enough theory! Let’s see how to use these types in practice.
There are very few instances when you actually need to create a pointer from scratch. More commonly, you will have an existing type whose pointer you need. Swift provides nice closure based APIs to access any type’s pointer. Let’s see how that looks -
The global function
withUnsafePointer() takes in a generic type T and a closure
(UnsafePointer<T>) -> Result. You can do your pointer magic within the body of the closure and return any value you fancy. The closure provides a clean separation between safe and unsafe parts of your code. In the above code, you cannot change the value of
test using it’s pointer. If we want to do that, we need to use the
withUnsafeMutablePointer function instead -
In C lingo, the first pointer instance you get inside the closure would be of type
const TestStruct *const, while the second would be of type
Swift also provides closure based APIs to access (and modify) raw bytes of any type. Let’s see how that works.
The pointer type in the closure is
UnsafeMutableRawBufferPointer rather than a simple
UnsafeMutableRawPointer. This is because a non buffer pointer doesn’t do any bounds checking, meaning you can access bytes outside the scope of
test struct using the pointer, which would be confusing. With buffer pointer, the count of the pointer is same as the size it occupies in memory, and bounds checking ensures you get an error if you do something like
mutableBufferPointer on a pointer whose count is 5.
withUnsafeBytes as well if you only need to read the bytes. As you’d expect, the pointer we get in the closure has type
You might be wondering why 42 - a 4 byte Int32 - is printed as [42, 0, 0, 0] rather than [0, 0, 0, 42]. This has to with endianness. Intel CPUs are little endian, meaning the least significant byte of a multibyte type is stored first.
In C, array elements are always stored sequentially in memory. This is not true for Swift, where array elements are optimized for storage space. You can verify this by a simple test
How come two distinct array elements have the same memory address? Arrays in Swift are not pure value types, but reference types that masquerade as value types. The reason for this warrants a new blog post in itself (look up Copy on Write if you are curious). If you wish to use pointers to arrays, it’s best to use instance methods
withUnsafeBufferPointer() rather than the global methods.