Lecture 2: References

Ownership and References
Traits
Sized data
- Slices
- Strings

Ownership and References

We recommend that you read the Rust book's section about ownership and its section about references to learn about this. In the lecture we discuss it at length, but the Rust book already contains an excellent written explanation of the ownership rules (and why ownership rules!).

However, here is a summary of ownership rules and references:

Ownership:
- Each value in Rust has an owner, in the form of a variable binding.
- A value can have only one owner at a time.
- When the owner goes out of scope, the value will be dropped.
- Ownership can be moved to another owner using assignments or function calls.

References:
- You can reference a value using & or &mut. This borrows the value, but does not transfer ownership.
- A reference cannot keep existing after the owner is dropped or moved.
- While a mutable reference to a value exists (that is, until the mutable reference is used for the last time), no other references to the value can exist.
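
The rules above can be seen in action in a small sketch. The helper functions consume, inspect and append_zero are made up for this illustration and are not part of any library.

```rust
// Hypothetical helper: takes ownership of the vector and returns its length.
fn consume(v: Vec<i32>) -> usize {
    v.len()
    // `v` goes out of scope here, so the vector is dropped.
}

// Hypothetical helper: only borrows the vector, ownership stays with the caller.
fn inspect(v: &Vec<i32>) -> usize {
    v.len()
}

// Hypothetical helper: borrows mutably to change the vector in place.
fn append_zero(v: &mut Vec<i32>) {
    v.push(0);
}

fn main() {
    let mut a = vec![1, 2, 3];

    // Shared borrow: `a` is still owned by `main` afterwards.
    println!("length: {}", inspect(&a));

    // Mutable borrow: allowed because no other reference to `a` is in use.
    append_zero(&mut a);
    println!("after append: {:?}", a);

    // Move: ownership is transferred into `consume`.
    let n = consume(a);
    println!("consumed a vector of length {}", n);

    // println!("{:?}", a); // would not compile: `a` was moved
}
```

Uncommenting the last line makes the compiler reject the program, because a no longer owns the vector at that point.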

Traits

In lecture 5, we will talk about traits in much more detail. That means that although the information stated here will be true, it may not be the whole truth.

A trait is something that marks a type. For example, a trait may indicate that values of a certain type can be copied, or compared to one another for equality. If a type is marked by a trait, we say that the type implements the trait. For example, it is possible to determine whether two integers are equal. Therefore, integers implement Eq. However, floating point numbers do not implement Eq. The main reason is that Eq promises that every value is equal to itself, and floats break that promise: Not a Number (NaN) compares unequal to everything, including itself. Floats can still be compared with ==, but through the weaker PartialEq trait.
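
As a small sketch of what "implementing Eq" means in practice: the function needs_eq below is a made-up helper that only accepts types that implement Eq. Floats can still be compared with ==, which goes through the related PartialEq trait.

```rust
// Hypothetical helper: `T: Eq` means only types that implement Eq are accepted.
fn needs_eq<T: Eq>(a: T, b: T) -> bool {
    a == b
}

fn main() {
    // Integers implement Eq, so this is accepted.
    println!("{}", needs_eq(3, 3));

    // Floats can be compared with ==, but only through PartialEq.
    println!("{}", 1.5_f64 == 1.5_f64);

    // NaN is not equal to itself.
    let nan = f64::NAN;
    println!("{}", nan == nan);

    // needs_eq(1.5, 1.5); // would not compile: f64 does not implement Eq
}
```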

For this lecture, only a few traits are important: Clone, Copy and Sized. If a type implements Clone, it is possible to duplicate a value of that type. For example:

fn main() {
  let a = 3;
  let b = a.clone();
  // both a and b are usable now (they are both 3)
}

Both a and b contain the value 3, and they both own that value. Of course, they each own a different copy of the value, since a value can have only one owner. If a type implements Clone, it has a method called .clone() which can be used to clone the value.

Some types that implement Clone also implement Copy. If a type implements Copy, it signifies to the Rust compiler that cloning the type is trivial. For example, cloning an integer is trivial. You do that all the time by moving it around. Types that implement Copy can be moved around freely. In the example above, the a.clone() is not necessary. a is an integer, so

fn main() {
  let a = 3;
  let b = a;
  // a and b are both still usable now, since integers implement Copy
}

leaves a 3 in both a and b.

Not all types that are Clone, are also Copy. Most structs are not Copy. Let's take a Vec for example:

fn main() {
  // the element type is written out, since nothing else here lets the compiler infer it
  let a: Vec<i32> = Vec::new();
  let b = a;
  // only b is usable now, a is moved into b.
}

The same code as before now leaves a unusable after a is assigned to b (the compiler will complain if you use a). Vec is not Copy, so simply assigning it does not copy it. Instead, it moves the ownership from a to b. If you want a and b to each own a vec with the same contents, you need to use .clone().

fn main() {
  // the element type is written out, since nothing else here lets the compiler infer it
  let a: Vec<i32> = Vec::new();
  let b = a.clone();
  // both a and b are usable now
}

This makes the fact that you are cloning explicit. Cloning a Vec may take a considerable amount of time if the Vec is large. If the compiler were to do it in the background, you may get weird performance issues. Instead, you need to explicitly say when you want to clone a Vec, so you know at which points you're paying the performance cost.

Do note that cloning itself is not bad. Sometimes you need to, and usually it's not actually that slow.

To see what traits a type implements, you can go to the type's documentation page. For example for the Vec: https://doc.rust-lang.org/stable/std/vec/struct.Vec.html#trait-implementations. You will see a line like impl<T> Clone for Vec<T> where T: Clone .... That means that a Vec containing a type T implements Clone only if it is possible to Clone that type T, which makes sense.
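
Here is a minimal sketch of that rule. Point and NotClone are made-up types for this illustration, and #[derive(Clone)] is used here as one way to make a struct implement Clone.

```rust
// Hypothetical type for illustration: deriving Clone gives it a .clone() method.
#[derive(Clone, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical type that deliberately does not implement Clone.
struct NotClone;

fn main() {
    let points = vec![Point { x: 1, y: 2 }, Point { x: 3, y: 4 }];

    // Point implements Clone, so Vec<Point> implements Clone too.
    let copy = points.clone();
    println!("{:?} {:?}", points, copy);
    println!("{} {}", copy[0].x, copy[0].y);

    let others = vec![NotClone, NotClone];
    // let copy = others.clone(); // would not compile: NotClone is not Clone
    println!("{}", others.len());
}
```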

Lastly, there are types that are Sized, which we will talk about in the next section.

Sized data

Some types in Rust are Sized. Actually, many types are Sized. A type implements Sized if that type has a size known at compile time. A struct automatically implements Sized if all of its members also implement Sized. That makes sense: if all members have a known size at compile time, the struct's size is simply the sum of the members¹.
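
If you are curious, you can ask the compiler for these sizes with std::mem::size_of. The sketch below uses a made-up struct called Pair. The exact numbers depend on the platform, and the struct is usually a bit larger than the plain sum of its members because of padding.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct for illustration.
#[allow(dead_code)]
struct Pair {
    small: u8,
    large: u32,
}

fn main() {
    println!("u8:   {} byte(s)", size_of::<u8>());
    println!("u32:  {} byte(s)", size_of::<u32>());

    // At least 1 + 4 bytes, and typically more because of padding.
    println!("Pair: {} byte(s)", size_of::<Pair>());

    // The size of a type is always a multiple of its alignment.
    println!("Pair alignment: {}", align_of::<Pair>());
}
```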

Almost all types implement Sized. For example, integers, floats, booleans, Vec, most structs, and references.

So what types don't implement Sized? One example is the slice type. You can read a lot more about it in the Rust book. You write the slice type as [T]. That looks a lot like the type of an array: [T; n] where n is the length. A slice is an array whose length is not known at compile time. Therefore, we can't know its size at compile time, and thus [T] can't implement Sized.

Values of types that don't implement Sized can't be stored in variables on the stack. So how do we use a slice? A reference to any type always implements Sized, regardless of whether the referenced type implements Sized. Thus, we can't say let a: [T], but we can say let a: &[T]. A reference simply denotes a location in memory. We may not know the length of the array at that location at compile time, but we can store the location of the data in a variable and pass it around.
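
A quick way to convince yourself is to print a few sizes. This is only a sketch: the sizes of references depend on the platform, but they are always known at compile time.

```rust
use std::mem::size_of;

fn main() {
    // The size of an array depends on its length, which is part of its type.
    println!("[u32; 3] is {} bytes", size_of::<[u32; 3]>());
    println!("[u32; 4] is {} bytes", size_of::<[u32; 4]>());

    // size_of::<[u32]>() would not compile: [u32] is not Sized.

    // References have a size known at compile time, whatever they point at.
    println!("&[u32; 3] is {} bytes", size_of::<&[u32; 3]>());
    println!("&[u32] is {} bytes", size_of::<&[u32]>());
}
```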

Note that when I say that the compiler doesn't know a size at compile time, I don't mean that the size can change constantly, like with a Vec. Consider the following function:

// This does not compile: [u32] is not Sized.
fn test(a: [u32]) {
  unimplemented!()
}

fn main() {
  test([1, 2, 3]);
  test([1, 2, 3, 4]);
}

test is called twice, each time with an array of a different length. The size of each array is perfectly known at compile time (3 and 4 elements). But what should the size of a be in the test function? 3 or 4 elements?

fn test(a: &[u32]) {
  unimplemented!()
}

fn main() {
  test(&[1, 2, 3]);
  test(&[1, 2, 3, 4]);
}

However, if as above we give test a reference, we only give test the location of the array we pass it. So regardless of the length of the array, what we pass to test always has the same size.
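
To make this a bit more concrete, here is a sketch of a function that actually does something with its argument. The name sum is just chosen for this example. Because it takes a &[u32], the same function works for arrays of any length, and also for a Vec.

```rust
// Hypothetical helper: sums the elements of a slice of any length.
fn sum(a: &[u32]) -> u32 {
    let mut total = 0;
    for x in a {
        total += x;
    }
    total
}

fn main() {
    let three: [u32; 3] = [1, 2, 3];
    let four: [u32; 4] = [1, 2, 3, 4];
    let v: Vec<u32> = vec![10, 20];

    println!("{}", sum(&three)); // reference to a [u32; 3]
    println!("{}", sum(&four)); // reference to a [u32; 4]
    println!("{}", sum(&v)); // a reference to a Vec<u32> works as well
}
```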

Slices

Generally, we call a reference to an array (like above) a slice. A slice comprises two parts. The location the data lives at (like discussed above), but also the length of that data. This makes it possible to refer to segments of arrays and pass those around. Let's look at another example.

fn remove_first_last(a: &[i32]) -> &[i32] {
  if a.len() >= 2 {
    &a[1..a.len() - 1]
  } else {
    a
  }
}

fn main() {
  let array /*:[i32; 4]*/ = [1, 2, 3, 4];
  let result = remove_first_last(&array);
  println!("{:?}", result);
}

This program should be pretty easy to understand. In main we give remove_first_last a slice (with length 4, and pointing at array). However, remove_first_last doesn't actually remove any elements. It just returns a new slice with a different starting position and length. result acts like it's a new array, but it actually is just a reference to the elements [2, 3] of the original array variable. You can still use both the original array and result. However, at this point you can use neither to modify the array. Remember the rules of borrowing! There can only be a single mutable reference to a value, and if there is one, there can be no non-mutable references. Because result references array, array cannot be mutated (and the compiler will reject your code if you even try).

And now you may start to understand why this rule exists. Since both result and array refer to the same data, if one of the two modifies the array, the other will immediately notice. This makes your program extremely hard to reason about!
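
The sketch below shows where the compiler draws the line. skip_first is a made-up helper in the same spirit as remove_first_last, and array is declared with mut here so that modifying it is possible at all.

```rust
// Hypothetical helper: returns a sub-slice without the first element.
fn skip_first(a: &[i32]) -> &[i32] {
    if a.is_empty() {
        a
    } else {
        &a[1..]
    }
}

fn main() {
    let mut array = [1, 2, 3, 4];

    let result = skip_first(&array);
    // array[1] = 20; // would not compile here: `array` is still borrowed by `result`
    println!("{:?}", result);

    // `result` is not used after this point, so the borrow has ended
    // and the array can be modified again.
    array[1] = 20;
    println!("{:?}", array);
}
```

Once result is no longer used, nobody can be surprised by a change to the array, so the compiler allows the modification again.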

Strings

Before and after you use rust, how many string types you know about

Rust has a lot of different types that all seem to just mean "a string of text". If you did C before, you may know that it represents all strings as char *s. What are all these extra types for in Rust?

Let's start out with the simplest. &str is pretty much the same thing as a char * in C, and it will be the string type you will use most. A string literal has this type, so you can write:

fn main() {
  let a: &str = "test";
}

There is a difference, however. In C, a char * just points at bytes, which in Rust would be a &[u8]. A &str is not the same as a &[u8]: it always contains UTF-8 encoded Unicode text. Sometimes you do want to work with just bytes, in which case there's the &[u8] type. So that covers two string types already.
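
A small sketch of the difference between text and bytes, using only the standard library. The word used here is just an example. It contains an é (written as \u{e9} in the code), which takes two bytes in UTF-8.

```rust
fn main() {
    let word: &str = "h\u{e9}llo";

    // Five characters, but six bytes.
    println!("{}", word.chars().count());
    println!("{}", word.len());

    // The same data viewed as plain bytes.
    let bytes: &[u8] = word.as_bytes();
    println!("{:?}", bytes);

    // Going from bytes back to &str checks that the bytes are valid UTF-8.
    println!("{:?}", std::str::from_utf8(bytes));
    println!("{}", std::str::from_utf8(&[0xff, 0xfe]).is_err());
}
```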

You may know that in C, you can't always just add more letters to a string. To do that, you may need to use the malloc function and first find a space large enough for the letters to fit in. Note that we had a similar problem previously with arrays. We called a resizable array, a Vec.

Well, we call a resizable &str a String! Internally it's pretty much a Vec of bytes (a Vec<u8>) that always holds valid UTF-8. It's allocated on the heap, and automatically resizes if you add more data.
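
Here is a short sketch of how the two types work together. The function shout is a made-up helper. It takes a &str, so it accepts both string literals and borrowed Strings.

```rust
// Hypothetical helper: builds a new String from borrowed text.
fn shout(s: &str) -> String {
    let mut out = String::from(s); // allocate an owned, growable copy
    out.push('!');
    out
}

fn main() {
    // A String can grow, just like a Vec.
    let mut s = String::new();
    s.push_str("hello");
    s.push_str(", world");
    println!("{} ({} bytes)", s, s.len());

    // A String can be borrowed as a &str.
    let view: &str = &s;
    println!("{}", shout(view));
    println!("{}", shout("literal"));
}
```

Taking a &str as the parameter is a common choice when a function only needs to read the text, since the caller then doesn't have to give up ownership of a String.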

And those are all the string types you really need to know about for now. There are more, specifically to interoperate with C code (CStr, CString), or to represent strings received from the operating system (OsStr, OsString), but you will probably not need those much in the near future.

¹ Actually, the size of a struct isn't strictly the sum of its members. Usually, some padding bytes are inserted to ensure alignment and optimize access times.
