Lecture 1.1
Contents
- Course Information
- Threads
- Data races
- Global variables and Data Race prevention in Rust
- Mutexes
- Mutexes in Rust
- Lock Poisoning
- Sharing heap-allocated values between threads
- Read-Write locks
Course Information
In this lecture we discussed the setup of the course, including the deadlines. You can also find those on the homepage of this website.
Threads
Most modern CPUs have more than one core, which means that, in theory, a program can do more than a single thing at the same time. In practice, this can be done by creating multiple processes, or multiple threads within one process. How this works will be explained in lecture 1.2.
For now, you can think of threads as a mechanism with which your program can execute multiple parts of itself at the same time. Even though the program is being executed in parallel, all threads still share the same memory, *except* for their stacks[^1]. That means that, for example, a global variable can be accessed by multiple threads.
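To see what this looks like in practice, here is a minimal sketch (my own example, not from the lecture) of spawning and joining a thread in Rust:

```rust
use std::thread;

fn main() {
    // spawn a second thread that runs concurrently with main
    let handle = thread::spawn(|| {
        println!("hello from the spawned thread");
    });

    println!("hello from the main thread");

    // wait for the spawned thread to finish
    handle.join().unwrap();
}
```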
Data races
This parallelism can cause a phenomenon called a data race. Let's look at the following C program:
```c
static int a = 3;

int main() {
    a += 1;
}
```
And let's now also look at the associated assembly:
```asm
main:
    push   rbp
    mov    rbp,rsp
    mov    eax,DWORD PTR [rip+0x2f18]        # 404028 <a>
    add    eax,0x1
    mov    DWORD PTR [rip+0x2f0f],eax        # 404028 <a>
    mov    eax,0x0
    pop    rbp
    ret
```
Notice that, after some fiddling with the stack pointer, the `+=` operation consists of three instructions:

- loading the value of `a` into a register (`eax`)
- adding `1` to that register
- storing the result back into memory
However, what happens if more than one thread executes this program at the same time? The following could happen:
thread 1 | thread 2 |
---|---|
load the value of `a` into `eax` | load the value of `a` into `eax` |
add 1 to thread 1's `eax` | add 1 to thread 2's `eax` |
store the result in `a` | store the result in `a` |
Both threads ran the code to increment `a`. But even though `a` started at `3` and was incremented by both threads, the final value of `a` is `4`, not `5`. That's because both threads read the value `3`, added `1` to that, and both stored back the value `4`.
To practically demonstrate the result of such data races, we can look at the following C program:
```c
#include <stdio.h>
#include <threads.h>

int v = 0;

// thrd_create expects an `int (*)(void *)`, so the thread
// function takes a void pointer and returns an int
int count(void *arg) {
    int *delta = arg;

    // add *delta to v 100 000 times
    for (int i = 0; i < 100000; i++) {
        v += *delta;
    }

    return 0;
}

int main() {
    thrd_t t1, t2;
    int d1 = 1, d2 = -1;

    // run the count function with delta=1
    thrd_create(&t1, count, &d1);
    // run the count function with delta=-1 at the same time
    thrd_create(&t2, count, &d2);

    thrd_join(t1, NULL);
    thrd_join(t2, NULL);

    printf("%d\n", v);
}
```
Since we increment and decrement `v` 100 000 times, you'd expect the result to be `0`. However, that's not the case. You can run this program yourself to see (just compile it with a compiler that supports C11), but I've run it a couple of times and these were the results of my 8 runs:
run 1 | run 2 | run 3 | run 4 | run 5 | run 6 | run 7 | run 8 |
---|---|---|---|---|---|---|---|
-89346 | 28786 | 23767 | -83430 | -63039 | -15282 | -82377 | -65402 |
You see, the result is always different, and never `0`. That's because some of the additions and subtractions are lost due to data races. We will now look at how to prevent this.
Global variables and Data Race prevention in Rust
It turns out that sharing memory between threads is often an unsafe thing to do. Generally, there are two ways to share memory between threads. One option, which we've just seen, is for multiple threads to access a global variable. The other possibility is that two threads get a pointer or reference to a memory location that is, for example, on the heap, or on the stack of another thread.

Notice that, in the previous C program, the threads could also have modified the delta through the `int *` passed in. The deltas for the two threads are stored on the stack of the main thread. But because the C program never updates the delta and only reads its value, sharing the delta is totally safe.
Thus, sharing memory between threads can be safe. As long as at most one thread can mutate the memory, all is fine!
In Rust, there's a rule that these mechanics may remind you of. If not, take a look at the lecture notes from Software Fundamentals. A piece of memory can, at any point in time, be borrowed by either a single mutable reference or any number of immutable references (references that cannot change what's stored at the location they reference). These rules make it fundamentally impossible to create a data race involving references to the heap or to another thread's stack.
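As a quick reminder, here is a sketch (my own example) that the compiler rejects, because it violates those borrowing rules:

```rust
fn main() {
    let mut x = 0;

    let r1 = &x; // immutable borrow
    let r2 = &mut x; // error[E0502]: cannot borrow `x` as mutable
                     // because it is also borrowed as immutable
    *r2 += 1;

    println!("{}", r1);
}
```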
But what about global variables? Can those still cause data races? Let's try to recreate the C program from above in Rust:
```rust
use std::thread;

static mut v: i32 = 0;

fn count(delta: i32) {
    for _ in 0..100_000 {
        v += delta;
    }
}

fn main() {
    let t1 = thread::spawn(|| count(1));
    let t2 = thread::spawn(|| count(-1));

    t1.join().unwrap();
    t2.join().unwrap();

    println!("{}", v);
}
```
In Rust, you make a global variable by declaring it `static`, and here we also make it mutable so that we can modify it in `count()`. `thread::spawn` creates threads, and `.join()` waits for a thread to finish. But if you try compiling this, you will encounter an error:
```text
error[E0133]: use of mutable static is unsafe and requires unsafe function or block
  --> src/main.rs:31:9
   |
31 |         v += delta;
   |         ^^^^^^^^^^ use of mutable static
   |
   = note: mutable statics can be mutated by multiple threads:
           aliasing violations or data races will cause undefined behavior
```
The error even helpfully mentions that this can cause data races.
Why, then, can you even make a static variable mutable if any access causes the program not to compile? Well, this compilation error is actually one we can circumvent: we could put the modification of the global variable in an `unsafe` block. We will talk about the exact behaviour of unsafe blocks in lecture 3.

That means we can make this program compile by modifying it like so:
```rust
use std::thread;

static mut v: i32 = 0;

fn count(delta: i32) {
    for _ in 0..100_000 {
        // add an unsafe block
        unsafe {
            v += delta;
        }
    }
}

fn main() {
    let t1 = thread::spawn(|| count(1));
    let t2 = thread::spawn(|| count(-1));

    t1.join().unwrap();
    t2.join().unwrap();

    // add an unsafe block
    unsafe {
        println!("{}", v);
    }
}
```
Even though the program now compiles, we did introduce the same data race problem as we previously saw in C. This program will rarely give 0 as its result. Is there a way to solve this?
Mutexes
Let's first look at how we can solve the original data race problem in C. What you can do to make sure data races do not occur is add a critical section: a part of the program that you make sure no other thread executes at the same time. One way to create a critical section is to use a mutex. A mutex's state can be seen as a boolean: it's either locked or unlocked. A mutex can safely be shared between threads, and if one thread tries to lock the shared mutex, one of two things can occur:
- The mutex is unlocked. If this is the case, the mutex is immediately locked.
- The mutex is currently locked by another thread. The thread that tries to lock it has to wait.
If a thread has to wait to lock the mutex, it is generally suspended so that it doesn't consume any CPU resources while it waits. The OS may then schedule another thread that is not waiting on a lock.
However, what's important is that this locking operation is atomic. In other words, a mutex really can only be locked by one thread at a time without fear of data races. Let's go back to C and see how we would use a mutex there:
```c
#include <stdio.h>
#include <threads.h>

int v = 0;
mtx_t m;

int count(void *arg) {
    int *delta = arg;

    for (int i = 0; i < 100000; i++) {
        // start the critical section by locking m
        mtx_lock(&m);
        v += *delta;
        // end the section by unlocking m. This is very important
        mtx_unlock(&m);
    }

    return 0;
}

int main() {
    thrd_t t1, t2;
    int d1 = 1, d2 = -1;

    // initialize the mutex
    mtx_init(&m, mtx_plain);

    thrd_create(&t1, count, &d1);
    thrd_create(&t2, count, &d2);

    thrd_join(t1, NULL);
    thrd_join(t2, NULL);

    printf("%d\n", v);
}
```
The outcome of this program, in contrast to the original program, is always zero. That's because any time `v` is updated, the mutex is locked first. If another thread starts executing the same code, it has to wait until the first thread unlocks the mutex, and therefore the two threads can't update `v` at the same time.
But still, a lot of things can go wrong in this C program. For example, we need to make sure that any time we use `v`, we lock the mutex. If we forget it even once, our program is unsafe. And if we forget to unlock the mutex once? Well, then the program may get stuck forever.
Mutexes in Rust
In Rust, the process of using a mutex is slightly different. To start, a mutex and the variable it protects are not separate. Instead of making a `v` and an `m` like in the C program above, we combine the two:
```rust
let a: Mutex<i32> = Mutex::new(0);
```
Here, `a` is a mutex, and inside the mutex, an integer is stored. However, the storage location of our value within the mutex is private; you cannot access it from the outside. The only way to read or write the value inside a mutex is to lock it. This conveniently makes a safety concern we'd have in C impossible: you can never update the value without locking the mutex.
In Rust, the lock function returns a so-called mutex guard. Let's look at that:
```rust
use std::sync::Mutex;

fn main() {
    let a: Mutex<i32> = Mutex::new(0);

    // locking returns a guard. we'll talk later about what the unwrap is for.
    let mut guard = a.lock().unwrap();

    // the guard acts as a mutable pointer to the integer inside our mutex.
    // that means we can use it to increment the value, which was initially 0
    *guard += 1;

    // the scope of the main function ends here. This drops the `guard`
    // variable, and automatically unlocks the mutex
}
```
The guard both acts as a way to access the value inside the mutex, and as an indicator of how long to lock the mutex. As soon as the guard is dropped (i.e. goes out of scope), it automatically unlocks the mutex that it was associated with.
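Because unlocking is tied to the guard being dropped, you can also unlock early by dropping the guard explicitly. A small sketch (my own example):

```rust
use std::sync::Mutex;

fn main() {
    let a: Mutex<i32> = Mutex::new(0);

    let mut guard = a.lock().unwrap();
    *guard += 1;

    // explicitly drop the guard to unlock the mutex
    // before the end of the scope
    drop(guard);

    // the mutex is unlocked here, so locking it again won't deadlock
    println!("{}", a.lock().unwrap());
}
```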
What may surprise you is that we never declared `a` to be mutable. The reason mutability is important in Rust is to prevent data races and pointer aliasing, both of which involve accessing the same piece of memory mutably at the same time. But a mutex already prevents that: we can only lock a mutex once at a time. Since using a mutex already prevents both mutable aliasing and data races, we don't really need the mutability rules for mutexes. This is all possible because the `.lock()` function does not need a mutable reference to the mutex. And that allows us to share references to this mutex safely between threads, since Rust's rules would prevent us from sharing a mutable reference between threads.
With that knowledge, let's look at an improved Rust version of the counter program:
```rust
use std::thread;
use std::sync::Mutex;

static v: Mutex<i32> = Mutex::new(0);

fn count(delta: i32) {
    for _ in 0..100_000 {
        *v.lock().unwrap() += delta;
    }
}

fn main() {
    let t1 = thread::spawn(|| count(1));
    let t2 = thread::spawn(|| count(-1));

    t1.join().unwrap();
    t2.join().unwrap();

    println!("{}", v.lock().unwrap());
}
```
Lock Poisoning
You may notice that we need to use `unwrap` every time we lock. That's because `lock` returns a `Result`, which means it can fail. To learn why, we need to consider the case in which a program crashes while it holds a lock.
When a thread crashes (or panics, in Rust terminology), it doesn't crash other threads, nor does it exit the program. That means that if another thread was waiting to acquire the lock, it may be stuck forever if the thread that previously held the lock crashed before unlocking it.
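For example, in this sketch (my own example) the spawned thread panics, but the main thread simply keeps running:

```rust
use std::thread;

fn main() {
    let t = thread::spawn(|| panic!("this thread crashes"));

    // joining a crashed thread returns an Err,
    // it does not crash the main thread
    assert!(t.join().is_err());

    println!("main is still running");
}
```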
To prevent threads getting stuck like that, the lock is "poisoned" when a thread crashes while holding it. From that moment onward, all other threads that try to lock the same lock will fail to lock it, and instead see the `Err()` variant of the `Result`. And what does `unwrap` do? It crashes the thread when it sees an `Err()` variant. In effect, this makes sure that if one thread crashes while holding the lock, all threads crash when acquiring the same lock. And even though crashing may not be desired behaviour, it's a lot better than being stuck forever.
If you really want to make sure your threads don't crash when the lock is poisoned, you can handle the error by matching on it instead of unwrapping. That said, it's actually not considered bad practice to unwrap lock results.
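For completeness, here is a sketch (my own example, not from the lecture) that recovers from a poisoned lock by matching instead of unwrapping:

```rust
use std::sync::Mutex;
use std::thread;

static M: Mutex<i32> = Mutex::new(0);

fn main() {
    // poison the mutex by panicking in another thread
    // while it holds the lock
    let _ = thread::spawn(|| {
        let _guard = M.lock().unwrap();
        panic!("poison the lock");
    })
    .join();

    // instead of unwrapping, recover the guard from the error
    let guard = match M.lock() {
        Ok(guard) => guard,
        // the value may be in an inconsistent state,
        // but we choose to continue anyway
        Err(poisoned) => poisoned.into_inner(),
    };

    println!("{}", guard);
}
```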
Sharing heap-allocated values between threads
Sometimes, it's not desirable to use global variables to share information between threads. Instead, you may want to pass a reference to a local variable to a thread. For example, with the counter example we may want multiple pairs of threads to be counting up and down at the same time. But that's not possible when we use a global, since that's shared between all threads.
Let's look at how we may try to implement this:
```rust
use std::thread;
use std::sync::Mutex;

fn count(delta: i32, v: &Mutex<i32>) {
    for _ in 0..100_000 {
        *v.lock().unwrap() += delta;
    }
}

fn main() {
    // v is a local variable
    let v = Mutex::new(0);

    // we pass an immutable reference to it to count
    // note: this is possible since locking doesn't require
    // a mutable reference
    let t1 = thread::spawn(|| count(1, &v));
    let t2 = thread::spawn(|| count(-1, &v));

    t1.join().unwrap();
    t2.join().unwrap();

    println!("{}", v.lock().unwrap());
}
```
But you'll find that this doesn't work. That's because it's possible for spawned threads to run for longer than the main thread. It could be that `main()` has already returned and deallocated `v` before the counting has finished.
As it turns out, this isn't actually possible because we join the threads before that happens. But this is something Rust cannot track. Scoped threads could solve this, but let's first look into how to solve this without using scoped threads.
What we could do is allocate `v` on the heap. The heap will be around for the entire duration of the program, even when `main` exits. In Software Fundamentals we have learned that we can do this by putting our value inside a `Box`:
```rust
use std::sync::Mutex;

fn main() {
    let v = Box::new(Mutex::new(0));

    // rest
}
```
But `Box` doesn't allow us to share its contents between threads (or even between two functions), because that would make it impossible to figure out when to deallocate the value. If you think about it: if we share the value between the two threads, which thread should free it? The `main` thread potentially can't do it, if it exited before the spawned threads did.
Rc and Arc
As an alternative to `Box`, there is the `Rc` type. It also allocates its contents on the heap, but it allows us to reference those contents multiple times. So how does it know when to deallocate?

`Rc` stands for "reference counted". Reference counting means that every time we create a new reference to the `Rc`, a counter internally goes up. On the flip side, every time we drop a reference, the counter goes down. When the counter reaches zero, nothing references the `Rc` any more and the contents can safely be deallocated.
Let's look at an example:
```rust
use std::rc::Rc;

fn create_vectors() -> (Rc<Vec<i32>>, Rc<Vec<i32>>) {
    // here we create a reference counted value with a vector in it.
    // The reference count starts at 1, since a references the Vec
    let a = Rc::new(vec![1, 2, 3]);

    // here we clone a. Cloning an Rc doesn't clone the contents.
    // Instead, a new reference to the same Vec is created. Because
    // we create 2 more references here, at the end the reference
    // count is 3
    let ref_1 = a.clone(); // doesn't clone the vec! only the reference!
    let ref_2 = a.clone(); // doesn't clone the vec! only the reference!

    // so here, the reference count is 3.
    // but only ref_1 and ref_2 are returned. Not a.
    // Instead, a is dropped at the end of the function.
    // But dropping a won't deallocate the Vector, since the
    // reference count is still 2.
    (ref_1, ref_2)
}

fn main() {
    // here we put ref_1 and ref_2 in a and b.
    // However, both a and b refer to the same vector,
    // with a reference count of 2.
    let (a, b) = create_vectors();

    // Both are the same vector
    println!("{:?}", a);
    println!("{:?}", b);

    // here, finally, both a and b are dropped. This makes the
    // reference count first go down to 1 (when a is dropped),
    // and then to 0. When b is dropped, it notices the reference
    // count reaching zero and frees the vector, since it is now
    // sure nothing else references the vector anymore
}
```
To prevent mutable aliasing, an `Rc` does not allow you to mutate its contents. If we want to mutate them anyway, we can use a `Mutex` again, since a `Mutex` allows us to modify an immutable value by locking it.
Note that an `Rc` is slightly less efficient than using normal references or `Box`es. That's because every time you clone or drop an `Rc`, the reference count needs to be updated.
Send and Sync
Let's try to write the original counter example with a reference counted local variable:
```rust
use std::rc::Rc;
use std::sync::Mutex;
use std::thread;

fn count(delta: i32, v: Rc<Mutex<i32>>) {
    for _ in 0..100_000 {
        *v.lock().unwrap() += delta;
    }
}

fn main() {
    // make an Rc
    let v = Rc::new(Mutex::new(0));

    // clone it twice for our two threads
    let (v1, v2) = (v.clone(), v.clone());

    // start the counting as we have done before
    let t1 = thread::spawn(|| count(1, v1));
    let t2 = thread::spawn(|| count(-1, v2));

    t1.join().unwrap();
    t2.join().unwrap();

    println!("{}", v.lock().unwrap());
}
```
You will find that this still does not compile! As it turns out, it is not safe for us to send an `Rc` from one thread to another, or to use one concurrently from multiple threads. That's because every time an `Rc` is cloned or dropped, the reference counter has to be updated. If we make mistakes with that, we might free the contained value too early, or not at all. But if we update the reference count from multiple threads, we could create a data race again. We can solve this by substituting our `Rc` for an `Arc`.
Just like an `Rc` is slightly less efficient than a `Box`, an `Arc` is slightly less efficient than an `Rc`. That's because when you clone an `Arc`, the reference count is atomically updated. In other words, the update behaves like a tiny critical section that prevents data races (though it doesn't actually use a mutex; atomic instructions are a bit smarter than that).
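To give a rough idea of what "atomically updated" means, here is a sketch using the standard library's atomic integers (this illustrates the idea; it is not `Arc`'s actual implementation):

```rust
use std::sync::atomic::{AtomicI32, Ordering};
use std::thread;

static COUNT: AtomicI32 = AtomicI32::new(0);

fn main() {
    let t1 = thread::spawn(|| {
        for _ in 0..100_000 {
            // fetch_add performs the load, the addition and the store
            // as one indivisible operation, so no updates are lost
            COUNT.fetch_add(1, Ordering::Relaxed);
        }
    });
    let t2 = thread::spawn(|| {
        for _ in 0..100_000 {
            COUNT.fetch_add(-1, Ordering::Relaxed);
        }
    });

    t1.join().unwrap();
    t2.join().unwrap();

    // always prints 0, unlike the data-racy C version
    println!("{}", COUNT.load(Ordering::Relaxed));
}
```

With that in mind, here is the counter program one final time, now using an `Arc`: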
```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn count(delta: i32, v: Arc<Mutex<i32>>) {
    for _ in 0..100_000 {
        *v.lock().unwrap() += delta;
    }
}

fn main() {
    // make an Arc
    let v = Arc::new(Mutex::new(0));

    // clone it twice for our two threads
    let (v1, v2) = (v.clone(), v.clone());

    // start the counting as we have done before
    let t1 = thread::spawn(|| count(1, v1));
    let t2 = thread::spawn(|| count(-1, v2));

    t1.join().unwrap();
    t2.join().unwrap();

    println!("{}", v.lock().unwrap());
}
```
And finally, our program works!
But there is another lesson to be learned here: apparently, some datatypes can only work safely when they are contained within a single thread. An `Rc` is an example of this, but there are also `Cell`s and `RefCell`s, and, for example, you also can't really move GPU rendering contexts between threads.
In Rust, there are two traits (properties of types) that govern this: `Send` and `Sync`.

- A type has the `Send` property if it's safe for it to be sent to another thread.
- A type has the `Sync` property if it's safe for it to live in one thread, and be referenced and read from in another thread.
An `Rc` is neither `Send` nor `Sync`, while an `Arc` is both, as long as the value in the `Arc` is also `Send` and `Sync`.
`Send` and `Sync` are automatically implemented for almost every type. A type only doesn't automatically get these properties if it explicitly opts out of being `Send` or `Sync`, or if one of its members isn't `Send` or `Sync`.
For example, an integer is both `Send` and `Sync`, and so is a struct containing only integers. But an `Rc` explicitly opts out of being `Send` and `Sync`:
```rust
// from the rust standard library
impl<T: ?Sized> !Send for Rc<T> {}
impl<T: ?Sized> !Sync for Rc<T> {}
```
And a struct containing an `Rc`:
```rust
use std::rc::Rc;

struct Example {
    some_field: Rc<i64>,
}
```
is also neither `Send` nor `Sync`.
It is possible for a type to be only `Send` or only `Sync`, if that's required.
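For example, a `Cell<i32>` may be sent to another thread, but not shared between threads. A small sketch (the helper functions are my own illustration) to check this at compile time:

```rust
use std::cell::Cell;

// these helper functions only compile for types with the property
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn main() {
    // compiles: Cell<i32> is Send
    assert_send::<Cell<i32>>();

    // uncommenting the next line gives a compile error,
    // because Cell<i32> is not Sync
    // assert_sync::<Cell<i32>>();
}
```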
Scoped threads
In Rust 1.63, a new feature was introduced named scoped threads. This allows you to tell the compiler that threads must terminate before the end of a function. And that means, it becomes safe to share local variables with threads.
Let us, for the last time, look at the counter example, but implement it without having to allocate a reference counted type on the heap:
```rust
use std::sync::Mutex;
use std::thread;

fn count(delta: i32, v: &Mutex<i32>) {
    for _ in 0..100_000 {
        *v.lock().unwrap() += delta;
    }
}

fn main() {
    let v = Mutex::new(0);

    // create a scope
    thread::scope(|s| {
        // within the scope, spawn two threads
        s.spawn(|| count(1, &v));
        s.spawn(|| count(-1, &v));

        // at the end of the scope, the scope automatically
        // waits for the two threads to exit
    });

    // here the threads have *definitely* exited,
    // and `v` is still alive.
    println!("{}", v.lock().unwrap());
}
```
Because the scope guarantees that the threads exit before the main thread does, it's safe to borrow variables from the local scope of the main function. This means that allocating on the heap becomes unnecessary.
Read-Write locks
In some cases, a `Mutex` is not the most efficient locking primitive to use; usually that's when a lock is read from far more often than it's written to. Any time code wants to read from a `Mutex`, it has to lock it, preventing any other thread from either reading or writing. But when a thread just wants to read from a `Mutex`, it's perfectly safe for other threads to also read from it at the same time. This is the problem that `std::sync::RwLock` solves. It's like a `Mutex`, but it has two different lock functions: `read` (which returns an immutable reference to its contents) and `write` (which works like `lock` on mutexes). An `RwLock` allows multiple threads to call `read` at the same time. You can find more documentation in the standard library documentation for `RwLock`.
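A short sketch (my own example) of how that looks in practice:

```rust
use std::sync::RwLock;

fn main() {
    let data = RwLock::new(vec![1, 2, 3]);

    {
        // multiple read locks can be held at the same time
        let r1 = data.read().unwrap();
        let r2 = data.read().unwrap();
        println!("{} {}", r1.len(), r2.len());
    } // both read guards are dropped here

    // a write lock is exclusive, just like locking a mutex
    data.write().unwrap().push(4);

    println!("{:?}", data.read().unwrap());
}
```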
[^1]: Technically, there are also thread-local variables, which are global but not shared between threads. Instead, every thread gets its own copy of the variable.