Dynamic Arrays - Memory Allocation | Rohith Gandhi Ganesan

What are Dynamic Arrays ?

Dynamic arrays are data structures that do not have a fixed size (but are limited by system memory). For example if you use Java or Python the ArrayList and List data structures respectively behaves like dynamic arrays where the length of the array doesn't have to be defined during compile time. Look at the following snippet of code which shows static and dynamic arrays in Rust.

// Static array
let mut arr1: [i32; 3] = [0; 3]; // size is set to 3
arr1[0] = 1;
arr1[1] = 2;
arr1[2] = 3;

// Dynamic array
let mut arr2 = Vec::new(); // No size defined
arr2.push(1);
arr2.push(2);
arr2.push(3);

Memory allocation in static arrays

Static arrays, i.e. arrays with fixed size are usually small size. In Rust, these small size arrays with fixed length are stored on the stack. However, in languages like Java or Python the arrays are always stored on the heap.

Memory allocation in dynamic arrays

When trying to understand how memory is allocated for dynamic arrays, the two main concepts to know is array capacity and array length/size. The capacity is how much elements the array can hold and length is the actual number of elements in the array. Initially when a Vec is initialized in Rust its capacity is set to 0. On the first push operation of an element, the capacity is set to a non-zero value (4 by default) and doubled evertyime the capacity limit is reached. The following piece of code shows how capacity grows with the increase in size of an array.

fn main() {
    let mut v = Vec::new();
    println!("Initial Capacity: {}", v.capacity());

    for i in 0..5 {
        v.push(i);
        println!("Length: {}, Capacity: {}", v.len(), v.capacity());
    }
}

The output of the above program is the following

Initial Capacity: 0
Length: 1, Capacity: 4
Length: 2, Capacity: 4
Length: 3, Capacity: 4
Length: 4, Capacity: 4
Length: 5, Capacity: 8

The total capacity of the array is increased exponentially to reduce the number of re-allocation that would be required. Re-allocations are O(n) operation where n is the length of the array. This is because, new memory has to be allocated, the elements have to copied into the new address and the old memory has to be freed. What this also means is that there could be additional memory allocated for the array without any elements in it. If you want to trim the excess memory so that array capacity = array length, then you can call the shrink_to_fit method which shrinks the allocation but does provide guarantees (in Java, you would call the trimToSize method). Check out this nice post on the amortized cost for re-allocation.

If you haven't noticed, one of the key reasons we allocate/re-allocate the memory of an array is to ensure that the addresses of the elements are contiguous in memory. This mainly gives us the advantage of performing lookups in O(1) time with some pointer arithmetic. However, if memory addresses are scattered, then you have to traverse the entire array to look up an element which leads to O(n) of worst-case complexity (this is what linked list does).

Customer memory allocator

Now that we have an understanding of how memory is allocated/freed for a dynamic array, let's write a simple allocator in Rust that grows the memory 1.5 times when capacity fills up instead of doubling capacity which is the default behaviour. To start of with lets define a struct for our custom dynamic array. This array is made up of a pointer which always points to the address of the first element in the array unless the array is empty in which case it is null, length which specifies the total number of elements in the array and capacity which specifies total available space in memory for the array.

use std::alloc::{Layout, alloc, dealloc, realloc};
use std::ptr;

struct CustomVec<T> {
    ptr: *mut T,
    len: usize,
    cap: usize,
}

Lets add some implementation methods for initializing the vector, appending elements and growing the capacity of the array.

impl<T> CustomVec<T> {
    fn new() -> Self {
        Self {
            ptr: ptr::null_mut(),
            len: 0,
            cap: 0,
        }
    }

    fn push(&mut self, item: T) {
        if self.len == self.cap {
            self.grow();
        }

        unsafe {
            self.ptr.add(self.len).write(item);
        }

        self.len += 1;
    }

    fn grow(&mut self) {
        let new_capacity;
        if self.cap == 0 {
            new_capacity = 1;
        } else {
            new_capacity = (self.cap as f64 * 1.5).ceil() as usize;
        }

        let new_layout = Layout::array::<T>(new_capacity).unwrap();
        let new_ptr = unsafe {
            if self.cap == 0 {
                alloc(new_layout) as *mut T
            } else {
                let old_layout = Layout::array::<T>(self.cap).unwrap();
                realloc(self.ptr as *mut u8, old_layout, new_layout.size()) as *mut T
            }
        };

        if new_ptr.is_null() {
            panic!("Allocation failed!");
        }

        self.ptr = new_ptr;
        self.cap = new_capacity;
    }
}

The main method of interest in the above code is grow. In this method, we calculate the new capacity of the array by multiplying the old capacity by 1.5 times. Then we create a memory layout for the new capacity and directly allocate it if capacity is 0, if not, we call realloc to resize the memory block to the new capacity. If there isn't enough contiguous memory available in the heap, then realloc will also take care of allocating a new block in memory, copying over the elements, updating the pointer to the new address and freeing the old memory block. Let's also add a Drop trait to ensure we deallocate memory when an instance of our CustomVec<T> goes out of scope.

impl<T> Drop for CustomVec<T> {
    fn drop(&mut self) {
        if !self.ptr.is_null() {
            unsafe {
                let layout = Layout::array::<T>(self.cap).unwrap();
                dealloc(self.ptr as *mut u8, layout);
            }
        }
    }
}

Note: In the above implementation, we do not call the destructor for each element T. Therefore, if the CustomVec owns elements of type Vec or String they will not be deallocated. However, this should for work for primitive types like i32, f64 etc.

The Drop implementation simply fetches the memory layout of the instance and deallocates the memory. Let's look at the following usage of CustomVec and its output.

fn main() {
    let mut vec = CustomVec::new();
    for i in 1..=10 {
        vec.push(i);
        println!(
            "Added item: {}, vec lenght: {}, vec capacity: {}",
            i, vec.len, vec.cap
        );
    }
}

The above program outputs the following result:

Added item: 1, vec lenght: 1, vec capacity: 1
Added item: 2, vec lenght: 2, vec capacity: 2
Added item: 3, vec lenght: 3, vec capacity: 3
Added item: 4, vec lenght: 4, vec capacity: 5
Added item: 5, vec lenght: 5, vec capacity: 5
Added item: 6, vec lenght: 6, vec capacity: 8
Added item: 7, vec lenght: 7, vec capacity: 8
Added item: 8, vec lenght: 8, vec capacity: 8
Added item: 9, vec lenght: 9, vec capacity: 12
Added item: 10, vec lenght: 10, vec capacity: 12