Lecture 1.1

Contents

Course Information

In this lecture we discussed the setup of the course and its deadlines. You can also find those on the homepage of this website.

Threads

Most modern CPUs have more than one core. This means that, in theory, a program can do more than a single thing at the same time. In practice, this is done by creating multiple processes, or multiple threads within one process. How this works will be explained in lecture 1.2.

For now, you can think of threads as a mechanism with which your program can execute multiple parts of itself at the same time. Even though the program is being executed in parallel, all threads still share the same memory *except* for their stacks1. That means that, for example, a global variable can be accessed by multiple threads.
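As a tiny preview (a minimal sketch using Rust's std::thread, which we'll use throughout this lecture), two threads can both read the same global variable:

use std::thread;

// an immutable global variable, stored once and visible to all threads
static GREETING: &str = "hello";

fn main() {
    // both the spawned thread and the main thread can read GREETING
    let t = thread::spawn(|| println!("{} from a spawned thread", GREETING));
    println!("{} from the main thread", GREETING);
    t.join().unwrap();
}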

Data races

This parallelism can cause a phenomenon called a data race. Let's look at the following C program:

static int a = 3;

int main() {
    a += 1;
}

And let's now also look at the associated assembly:

main:
 push   rbp
 mov    rbp,rsp
 mov    eax,DWORD PTR [rip+0x2f18]        # 404028 <a>
 add    eax,0x1
 mov    DWORD PTR [rip+0x2f0f],eax        # 404028 <a>
 mov    eax,0x0
 pop    rbp
 ret    

Notice that, after some fiddling with the frame pointer, the += operation consists of three instructions:

  • loading the value of a into a register (eax)
  • adding 1 to that register
  • storing the result back into memory

However, what happens if more than one thread executes this code at the same time? The following could happen:

| thread 1                     | thread 2                     |
|------------------------------|------------------------------|
| load the value of a into eax | load the value of a into eax |
| add 1 to thread 1's eax      | add 1 to thread 2's eax      |
| store the result in a        | store the result in a        |

Both threads ran the code to increment a. But even though a started at 3 and was incremented by both threads, the final value of a is 4, not 5. That's because both threads read the value 3, added 1 to it, and both stored back the value 4.

To practically demonstrate the result of such data races, we can look at the following C program:

#include <stdio.h>
#include <threads.h>

int v = 0;

// thrd_create expects an int (*)(void *), so count takes a void pointer
int count(void *arg) {
    int *delta = arg;
    // add delta to v 100 000 times
    for (int i = 0; i < 100000; i++) {
        v += *delta;
    }
    return 0;
}

int main() {
    thrd_t t1, t2;
    int d1 = 1, d2 = -1;

    // run the count function with delta=1
    thrd_create(&t1, count, &d1);
    // run the count function with delta=-1 at the same time
    thrd_create(&t2, count, &d2);

    thrd_join(t1, NULL);
    thrd_join(t2, NULL);

    printf("%d\n", v);
}

Since we increment and decrement v 100 000 times, you'd expect the result to be 0. However, that's not the case. You can run this program yourself to see (just compile it with a compiler that supports C11), but I've run it a couple of times and these were the results of my 8 runs:

| run 1  | run 2 | run 3 | run 4  | run 5  | run 6  | run 7  | run 8  |
|--------|-------|-------|--------|--------|--------|--------|--------|
| -89346 | 28786 | 23767 | -83430 | -63039 | -15282 | -82377 | -65402 |

As you can see, the result is different every time, and never 0. That's because some of the additions and subtractions are lost due to data races. We will now look at how such data races can be prevented.

Global variables and data race prevention in Rust

It turns out that sharing memory between threads is often an unsafe thing to do. Generally, there are two ways to share memory between threads. One option, which we've just seen, is for multiple threads to access a global variable. The other possibility is that multiple threads get a pointer or reference to a memory location that is, for example, on the heap, or on the stack of another thread.

Notice that, in the previous C program, the threads could also have modified their delta through the int * passed in. The deltas for the two threads are stored on the stack of the main thread. But because the program never updates the deltas and only reads their values, sharing them is something that's totally safe to do.

Thus, sharing memory between threads can be safe. As long as at most one thread can mutate the memory, all is fine!

In Rust, there's a rule that these mechanics may remind you of. If not, take a look at the lecture notes from Software Fundamentals. A piece of memory can, at any point in time, be borrowed by either a single mutable reference or any number of immutable references - references that cannot change what's stored at the location they point to. These rules make it fundamentally impossible to create a data race involving references to either the heap or another thread's stack.
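As a small refresher, here's a minimal sketch of these borrowing rules in action:

fn main() {
    let mut x = 5;

    let r1 = &x; // first immutable borrow: fine
    let r2 = &x; // any number of immutable borrows may coexist

    // let m = &mut x; // this would not compile: x is still
    //                 // borrowed immutably by r1 and r2

    println!("{} {}", r1, r2);

    // after the last use of r1 and r2, the immutable borrows end,
    // so mutating x is allowed again
    x += 1;
    println!("{}", x);
}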

But what about global variables? Can those still cause data races? Let's try to recreate the C program from above in Rust:

use std::thread;

static mut v: i32 = 0;

fn count(delta: i32) {
    for _ in 0..100_000 {
        v += delta;
    }
}

fn main() {
    let t1 = thread::spawn(|| count(1));
    let t2 = thread::spawn(|| count(-1));

    t1.join().unwrap();
    t2.join().unwrap();

    println!("{}", v);
}

In Rust, you create a global variable by declaring it static, and here we also make it mutable so that we can modify it in count(). thread::spawn creates a thread, and .join() waits for that thread to finish. But if you try compiling this, you will encounter an error:

error[E0133]: use of mutable static is unsafe and requires unsafe function or block
--> src/main.rs:31:9
   |
31 |         v += delta;
   |         ^^^^^^^^^^ use of mutable static
   |
= note: mutable statics can be mutated by multiple threads: 
        aliasing violations or data races will cause undefined behavior

The error even helpfully mentions that this can cause data races.

Why, then, can you even make a static variable mutable if any access causes the program not to compile? Well, this compilation error is actually one we can circumvent: we can put the modification of the global variable in an unsafe block. We will talk about the exact behaviour of unsafe blocks in lecture 3.

That means, we can make this program compile by modifying it like so:

use std::thread;

static mut v: i32 = 0;

fn count(delta: i32) {
    for _ in 0..100_000 {
        // add an unsafe block
        unsafe {
            v += delta;
        }
    }
}

fn main() {
    let t1 = thread::spawn(|| count(1));
    let t2 = thread::spawn(|| count(-1));

    t1.join().unwrap();
    t2.join().unwrap();

    // add an unsafe block
    unsafe {
        println!("{}", v);
    }
}

Even though the program now compiles, we did introduce the same data race problem as we previously saw in C. This program will rarely give 0 as its result. Is there a way to solve this?

Mutexes

Let's first look at how we can solve the original data race problem in C. What you can do to make sure data races do not occur is add a critical section: a part of the program that only one thread may execute at the same time. One way to create a critical section is to use a mutex. A mutex's state can be seen as a boolean: it's either locked or unlocked. A mutex can safely be shared between threads, and if one thread tries to lock the shared mutex, one of two things can occur:

  • The mutex is unlocked. If this is the case, the mutex is immediately locked.
  • The mutex is currently locked by another thread. The thread that tries to lock it has to wait.

If a thread has to wait to lock the mutex, the thread is generally suspended so that it doesn't consume any CPU resources while it waits. The OS may schedule another thread that is not waiting on a lock.

However, what's important is that this locking operation is atomic. In other words, a mutex really can only be locked by one thread at a time without fear of data races. Let's go back to C and see how we would use a mutex there:


#include <stdio.h>
#include <threads.h>

int v = 0;
mtx_t m;

int count(void *arg) {
    int *delta = arg;
    for (int i = 0; i < 100000; i++) {
        // start the critical section by locking m
        mtx_lock(&m);
        v += *delta;
        // end the critical section by unlocking m. This is very important
        mtx_unlock(&m);
    }
    return 0;
}

int main() {
    thrd_t t1, t2;
    int d1 = 1, d2 = -1;
    // initialize the mutex.
    mtx_init(&m, mtx_plain);

    thrd_create(&t1, count, &d1);
    thrd_create(&t2, count, &d2);

    thrd_join(t1, NULL);
    thrd_join(t2, NULL);

    printf("%d\n", v);
}

The outcome of this program, in contrast to the original program, is always zero. That's because any time v is updated, the mutex is locked first. If another thread starts executing the same code, it has to wait until the other thread unlocks it, therefore the two threads can't update v at the same time.

But still, a lot of things can go wrong in this C program. For example, we need to make sure that any time we use v, we lock the mutex; if we forget that once, our program is unsafe. And if we forget to unlock the mutex once? Well, then the program may get stuck forever.

Mutexes in Rust

In Rust, the process of using a mutex is slightly different. To start, a mutex and the variable it protects are not separate. Instead of making a v and an m like in the C program above, we combine the two:

#![allow(unused)]
use std::sync::Mutex;

fn main() {
    let a: Mutex<i32> = Mutex::new(0);
}

Here, a is a mutex, and inside the mutex an integer is stored. However, the storage location of the value within the mutex is private, which means you cannot access it from the outside. The only way to read or write the value inside a mutex is to lock it. This conveniently makes a safety concern we'd have in C impossible: you can never update the value without locking the mutex.

In Rust, the lock function returns a so-called mutex guard. Let's look at that:

use std::sync::Mutex;

fn main() {
    let a: Mutex<i32> = Mutex::new(0);

    // locking returns a guard. we'll talk later about what the unwrap is for.
    let mut guard = a.lock().unwrap();

    // the guard acts as a mutable pointer to the integer inside our mutex.
    // that means we can use it to increment the value, which started at 0
    *guard += 1;

    // the scope of the main function ends here. This drops the `guard` 
    // variable, and automatically unlocks the mutex
}

The guard both acts as a way to access the value inside the mutex, and as an indicator of how long to lock the mutex. As soon as the guard is dropped (i.e. goes out of scope), it automatically unlocks the mutex that it was associated with.
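A minimal sketch of how the guard's scope controls how long the mutex stays locked:

use std::sync::Mutex;

fn main() {
    let a = Mutex::new(0);

    {
        let mut guard = a.lock().unwrap();
        *guard += 1;
        // the guard is dropped at the end of this block,
        // which unlocks the mutex again
    }

    // alternatively, drop the guard explicitly to unlock early
    let guard = a.lock().unwrap();
    println!("{}", *guard);
    drop(guard);
}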

What may surprise you is that we never declared a to be mutable. The reason mutability is important in Rust is to prevent data races and pointer aliasing, both of which involve accessing the same piece of memory mutably from more than one place at the same time. But a mutex already prevents that: it can only be locked by one thread at a time. Since using a mutex already rules out both mutable aliasing and data races, we don't really need the mutability rules for mutexes. This is all possible because the .lock() function does not need a mutable reference to the mutex. And that allows us to safely share (immutable) references to the mutex between threads, which Rust's rules would prevent us from doing with mutable references.

With that knowledge, let's look at an improved Rust version of the counter program:

use std::thread;
use std::sync::Mutex;

static v: Mutex<i32> = Mutex::new(0);

fn count(delta: i32) {
    for _ in 0..100_000 {
        // lock the mutex, then update the value inside
        *v.lock().unwrap() += delta;
    }
}

fn main() {
    let t1 = thread::spawn(|| count(1));
    let t2 = thread::spawn(|| count(-1));

    t1.join().unwrap();
    t2.join().unwrap();

    // locking is safe, so no unsafe block is needed anymore
    println!("{}", v.lock().unwrap());
}

Lock Poisoning

You may notice that we need to use unwrap every time we lock. That's because lock returns a Result, which means it can fail. To learn why, we need to consider the case in which a thread crashes while it holds a lock.

When a thread crashes (or panics, in Rust terminology), it doesn't crash other threads, nor does it exit the program. That means that if another thread was waiting to acquire the lock, it may be stuck forever if the thread that previously held the lock crashed before unlocking it.

To prevent threads from getting stuck like that, the lock is "poisoned" when a thread crashes while holding it. From that moment onward, all other threads that try to lock the same lock will fail to lock it, and instead see the Err() variant of the Result. And what does unwrap do? It crashes the thread when it sees an Err() variant. In effect, this makes sure that if one thread crashes while holding the lock, all other threads crash when they acquire the same lock. And even though crashing may not be desired behaviour, it's a lot better than being stuck forever.

If you really want to make sure your threads don't crash when the lock is poisoned, you can handle the error by matching on it. But it's actually not considered bad practice to unwrap lock results.
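For instance, a minimal sketch of such error handling; PoisonError's into_inner method hands you the guard even if the lock was poisoned:

use std::sync::Mutex;

fn main() {
    let m = Mutex::new(0);

    // instead of unwrap, match on the Result that lock returns
    let guard = match m.lock() {
        Ok(guard) => guard,
        // the value may be in an inconsistent state after a crash,
        // but we can still reach it through the poison error
        Err(poisoned) => poisoned.into_inner(),
    };

    println!("{}", *guard);
}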

Sharing heap-allocated values between threads

Sometimes it's not desirable to use global variables to share information between threads. Instead, you may want to pass a reference to a local variable to a thread. For example, in the counter example we may want multiple pairs of threads counting up and down at the same time. That's not possible when we use a global variable, since it's shared between all threads.

Let's look at how we may try to implement this:

use std::thread;
use std::sync::Mutex;
fn count(delta: i32, v: &Mutex<i32>) {
    for _ in 0..100_000 {
        *v.lock().unwrap() += delta
    }
}

fn main() {
    // v is a local variable
    let v = Mutex::new(0);

    // we pass an immutable reference to it to count
    // note: this is possible since locking doesn't require 
    // a mutable reference
    let t1 = thread::spawn(|| count(1, &v));
    let t2 = thread::spawn(|| count(-1, &v));

    t1.join().unwrap();
    t2.join().unwrap();

    println!("{}", v.lock().unwrap());
}

But you'll find that this doesn't compile. That's because, in general, spawned threads can run for longer than the thread that spawned them. It could be that main() has already returned and deallocated v before the counting has finished.

As it turns out, that isn't actually possible here, because we join the threads before main returns. But this is something Rust cannot track. Scoped threads could solve this, but let's first look at a solution that doesn't use them.

What we could do is allocate v on the heap. The heap will be around for the entire duration of the program, even when main exits. In Software Fundamentals we have learned that we can do this by putting our value inside a Box.

use std::sync::Mutex;

fn main() {
    let v = Box::new(Mutex::new(0));

    // rest
}

But a Box doesn't allow us to share its contents between threads (or even between two functions), because that would make it impossible to figure out when to deallocate the value. If you think about it: if we share the value between the two threads, which thread should free it? The main thread potentially can't do it, if it exits before the other threads do.

Rc and Arc

As an alternative to Box, there is the Rc type. It also allocates its contents on the heap, but it allows us to reference those contents multiple times. So how does it know when to deallocate?

Rc stands for "reference counted". Reference counting means that every time we create a new reference to the Rc, internally a counter goes up. On the flip side, every time we drop a reference, the counter goes down. When the counter reaches zero, nothing references the Rc any more and the contents can safely be deallocated.

Let's look at an example:

use std::rc::Rc;
fn create_vectors() -> (Rc<Vec<i32>>, Rc<Vec<i32>>) {
    // here we create a reference counted value with a vector in it
    // The reference count starts at 1, since a references the Vec
    let a = Rc::new(vec![1, 2, 3]);

    // here we clone a. Cloning an Rc doesn't clone the contents. 
    // Instead, a new reference to the same Vec is created. Because
    // we create 2 more references here, at the end the reference 
    // count is `3`
    let ref_1 = a.clone(); // doesn't clone the vec! only the reference!
    let ref_2 = a.clone(); // doesn't clone the vec! only the reference!

    // so here, the reference count is 3
  
    // but only ref_1 and ref_2 are returned. Not a.
    // Instead, a is dropped at the end of the function. 
    // But dropping a won't deallocate the Vector since the 
    // reference count is still 2.
    (ref_1, ref_2)
}

fn main() {
    // here we put ref_1 and ref_2 in a and b.
    // However, both a and b refer to the same vector,
    // with a reference count of 2.
    let (a, b) = create_vectors(); // Both are the same vector
    println!("{:?}", a);
    println!("{:?}", b);
  
   // here, finally, both a and b are dropped. This makes the
   // reference count first go down to 1 (when a is dropped),
   // and then to 0. When b is dropped, it notices the reference
   // count reaching zero and frees the vector, since it is now
   // sure nothing else references the vector anymore
}
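You can also observe this reference count directly: the standard library exposes it through Rc::strong_count. A small sketch:

use std::rc::Rc;

fn main() {
    let a = Rc::new(vec![1, 2, 3]);
    println!("{}", Rc::strong_count(&a)); // prints 1

    let b = Rc::clone(&a); // the same as a.clone()
    println!("{}", Rc::strong_count(&a)); // prints 2

    drop(b);
    println!("{}", Rc::strong_count(&a)); // prints 1 again
}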

Because multiple references to its contents can exist at the same time, an Rc does not allow you to mutate those contents. If you want to mutate them anyway, you can use a Mutex again, since a Mutex allows modifying a value behind an immutable reference by locking.

Note that an Rc is slightly less efficient than using normal references or Boxes. That's because every time you clone an Rc or drop an Rc, the reference count needs to be updated.

Send and Sync

Let's try to write the original counter example with a reference counted local variable:

use std::rc::Rc;
use std::sync::Mutex;
use std::thread;

fn count(delta: i32, v: Rc<Mutex<i32>>) {
  for _ in 0..100_000 {
    *v.lock().unwrap() += delta
  }
}

fn main() {
  // make an Rc
  let v = Rc::new(Mutex::new(0));
  // clone it twice for our two threads
  let (v1, v2) = (v.clone(), v.clone());
  
  // start the counting as we have done before
  let t1 = thread::spawn(|| count(1, v1));
  let t2 = thread::spawn(|| count(-1, v2));

  t1.join().unwrap();
  t2.join().unwrap();

  println!("{}", v.lock().unwrap());
}

You will find that this still does not compile! As it turns out, it is not safe to send an Rc from one thread to another, or to use one concurrently from multiple threads. That's because every time an Rc is cloned or dropped, the reference count has to be updated. If we make mistakes with that, we might free the contained value too early, or not at all. And if we update the reference count from multiple threads, we could create a data race again. We can solve this by replacing our Rc with an Arc, which stands for "atomically reference counted".

Just like an Rc is slightly less efficient than a Box, an Arc is slightly less efficient than an Rc. That's because when you clone or drop an Arc, the reference count is updated atomically. In other words, it uses a tiny critical section to prevent data races (though it doesn't actually use a mutex: it relies on atomic CPU instructions, which are a bit smarter than that).
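To get a feel for what such an atomic update looks like, here is a minimal sketch using std's atomic integer type (this illustrates the idea, not Arc's actual implementation):

use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

static COUNT: AtomicUsize = AtomicUsize::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                // fetch_add performs the load-add-store cycle as one
                // indivisible (atomic) operation, so no update is lost
                COUNT.fetch_add(1, Ordering::Relaxed);
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    println!("{}", COUNT.load(Ordering::Relaxed)); // always 4
}

With that, let's substitute an Arc for the Rc in the counter program: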

use std::sync::{Arc, Mutex};
use std::thread;

fn count(delta: i32, v: Arc<Mutex<i32>>) {
  for _ in 0..100_000 {
    *v.lock().unwrap() += delta
  }
}

fn main() {
  // make an Arc
  let v = Arc::new(Mutex::new(0));
  // clone it twice for our two threads
  let (v1, v2) = (v.clone(), v.clone());

  // start the counting as we have done before
  let t1 = thread::spawn(|| count(1, v1));
  let t2 = thread::spawn(|| count(-1, v2));

  t1.join().unwrap();
  t2.join().unwrap();

  println!("{}", v.lock().unwrap());
}

And finally, our program works!

But there is another lesson to be learned here: apparently, some datatypes can only work safely when they are contained within a single thread. An Rc is an example of this, but so are Cell and RefCell, and, for example, you also can't really move GPU rendering contexts between threads.

In Rust, there are two traits (properties of types), Send and Sync, that govern this:

  • A type has the Send property if it's safe to send it to another thread
  • A type has the Sync property if it's safe for it to live in one thread and be referenced from another thread

An Rc is neither Send nor Sync, while an Arc is both, as long as the value inside the Arc is also Send and Sync.

Send and Sync are automatically implemented for almost every type. A type only doesn't automatically get these properties if it explicitly opts out of being Send or Sync, or if one of its members isn't Send or Sync.

For example, an integer is both Send and Sync, and so is a struct only containing integers. But an Rc explicitly opts out of being Send and Sync:

// from the Rust standard library

impl<T: ?Sized> !Send for Rc<T> {}
impl<T: ?Sized> !Sync for Rc<T> {}

And a struct containing an Rc:

use std::rc::Rc;

struct Example {
  some_field: Rc<i64>,
}

is also not Send and not Sync.

It is possible for a type to be only Send or only Sync if that's required. For example, a Mutex's lock guard is not Send, because a mutex must be unlocked by the same thread that locked it.
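If you ever want to check whether a type is Send or Sync, one trick is to write functions with trait bounds; note that assert_send and assert_sync below are hypothetical helpers for illustration, not part of the standard library:

fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    // i32 is both Send and Sync, so this compiles
    assert_send::<i32>();
    assert_sync::<i32>();

    // uncommenting either line below makes compilation fail,
    // because Rc is neither Send nor Sync
    // assert_send::<std::rc::Rc<i32>>();
    // assert_sync::<std::rc::Rc<i32>>();
}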

Scoped threads

In Rust 1.63, a new feature named scoped threads was introduced. Scoped threads allow you to tell the compiler that threads must terminate before the end of a function. That, in turn, makes it safe to share local variables with those threads.

Let us, for the last time, look at the counter example, but implement it without having to allocate a reference counted type on the heap:

use std::sync::Mutex;
use std::thread;

fn count(delta: i32, v: &Mutex<i32>) {
  for _ in 0..100_000 {
    *v.lock().unwrap() += delta
  }
}

fn main() {
  let v = Mutex::new(0);

  // create a scope
  thread::scope(|s| {
    // within the scope, spawn two threads
    s.spawn(|| count(1, &v));
    s.spawn(|| count(-1, &v));
    
    // at the end of the scope, the scope automatically
    // waits for the two threads to exit
  });
  
  // here the threads have *definitely* exited. 
  // and `v` is still alive here.
  println!("{}", v.lock().unwrap());
}

Because the scope guarantees that the threads exit before main continues, it's safe to borrow variables from the local scope of the main function. This means that allocating on the heap becomes unnecessary.

Read-Write locks

In some cases, a Mutex is not the most efficient locking primitive to use, usually when a lock is read far more often than it is written to. Any time code wants to read the value in a Mutex, it has to lock it, preventing all other threads from either reading or writing. But when a thread just wants to read from a Mutex, it's perfectly safe for other threads to read from it at the same time. This is the problem that std::sync::RwLock solves. It's like a Mutex, but it has two different lock functions: read, which returns an immutable reference to its contents, and write, which works like lock on a Mutex. An RwLock allows multiple threads to call read at the same time. You can find more documentation on RwLock in the standard library documentation.
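To wrap up, here's a minimal sketch of an RwLock in action, using the scoped threads introduced above:

use std::sync::RwLock;
use std::thread;

fn main() {
    let data = RwLock::new(vec![1, 2, 3]);

    thread::scope(|s| {
        // multiple readers can hold the read lock at the same time
        for _ in 0..4 {
            s.spawn(|| {
                let values = data.read().unwrap();
                println!("sum: {}", values.iter().sum::<i32>());
            });
        }
    });

    // writing requires exclusive access, just like locking a Mutex
    data.write().unwrap().push(4);
    println!("{:?}", data.read().unwrap());
}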

1

Technically there are also thread-local variables which are global, but also not shared between threads. Instead, every thread will get its own copy of the variable.