Lecture 3: Enumeration types and Error Handling

Enums
- Enums with data
- Tagged Unions
Matching
Option
- Niche optimisation
Result

Enums

A simple enum in Rust might look familiar to you if you have already seen them in other languages. An enum is a datatype with a finite number of "options": a finite set of values that it can represent. Let's look at an example:

enum Color {
  Red,
  Blue,
  Yellow,
  Orange,
  Green
}

pub fn main() {
  let a: Color = Color::Red;
  let b = Color::Blue;
  
  // bring all "variants" into scope
  use Color::*;
  
  // which allows us to omit the 'Color::'
  let c = Blue;
}

The type of these variables (made explicit for variable a) is Color. The value can be either red, blue, yellow, orange or green. We call these options "variants".

Just like you might be used to in C, you can choose numbers to represent variants.

#[repr(u8)] // this is optional, it tells rust to store this enum as a `u8`
enum Color {
  Red = 5,
  Blue = 7,
  // if you don't specify one, it just continues counting.
  // so yellow will be 8
  Yellow,
  // after 8 we jump to 10
  Orange = 10,
  Green = 15,
}

fn main() {
    let a = Color::Red as u8;
    
    // Note that this does not work!
    // let c = 5 as Color;
    // *even* though 5 is a valid variant number. 
}

By default (if you don't explicitly assign numbers), enum variants start counting at 0, in steps of 1.
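Since a cast like 5 as Color is rejected, going from a number back to a variant has to be written by hand. The following is a minimal sketch of one way to do that, assuming we implement the standard TryFrom<u8> trait ourselves. It uses match and Result, both of which are covered later in this lecture. The Direction enum and the choice of () as error type are assumptions made only for this illustration.

```rust
use std::convert::TryFrom;

// Hypothetical enum without explicit numbers: counting starts at 0.
enum Direction {
    North, // 0
    East,  // 1
    South, // 2
    West,  // 3
}

#[derive(Debug, PartialEq, Clone, Copy)]
#[repr(u8)]
enum Color {
    Red = 5,
    Blue = 7,
    Yellow, // continues counting: 8
    Orange = 10,
    Green = 15,
}

impl TryFrom<u8> for Color {
    // placeholder error type, chosen for brevity in this sketch
    type Error = ();

    fn try_from(n: u8) -> Result<Self, Self::Error> {
        match n {
            5 => Ok(Color::Red),
            7 => Ok(Color::Blue),
            8 => Ok(Color::Yellow),
            10 => Ok(Color::Orange),
            15 => Ok(Color::Green),
            // any other number is not a valid variant
            _ => Err(()),
        }
    }
}

fn main() {
    // enum -> number works with a plain cast
    println!("{}", Direction::West as u8);
    println!("{}", Color::Yellow as u8);

    // number -> enum goes through our own checked conversion
    println!("{:?}", Color::try_from(10u8));
    println!("{:?}", Color::try_from(6u8));
}
```

The conversion returns a Result because most numbers do not correspond to any variant, which is also why the compiler refuses the plain cast.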

Enums with data

So far, enums are pretty similar to C or Java's enums. However, in Rust enums are a lot more powerful. Enums can have data associated with each variant. Let's look at an example with IP addresses. There are both IPv6 and IPv4 addresses, and their representation is slightly different.

enum IpAddress {
  V4(u8, u8, u8, u8),
  V6(u128)
}

This definition says that an IP address is either a V4 address or a V6 address. If it is a V4 address, the value contains four 8-bit integers. If it is a V6 address, the value contains a single 128-bit integer.

We use it like this:

enum IpAddress {
 V4(u8, u8, u8, u8),
 V6(u128)
}

fn main() {
    let a_v4_address = IpAddress::V4(192, 168, 0, 1);
  
    // a u128 in hexadecimal notation. 
    // Notice that we are allowed to use underscores to separate parts of numbers!
    let a_v6_address = IpAddress::V6(0xDEAD_BEEF_CAFE_BABE_BAAA_AAAD_1234_5678);
}

Tagged Unions

This may look a lot like the definition of a union if you're used to C. Just like with unions, an enum with data is a bit like multiple overlapping structs. The same piece of memory can have multiple interpretations based on what variant is stored there. Therefore, enum types with associated data are sometimes called "tagged unions".

The "tagged" part refers to the fact that, unlike unions in C, Rust's enums contain a tag that represents which of the variants is currently active. First, an example of an untagged union in C:

#include <stdio.h>

union FloatOrInt {
  float f;
  int i;
};

int main(void) {
    // one variable
    union FloatOrInt u;
    
    // we fill it as if it's a float, with a float value
    u.f = 0.5;
    
    printf("%f", u.f);
    
    // now we use the same space for an integer
    u.i = 5;
    printf("%d", u.i);
    
    // oops! even though an integer is stored in u,
    // we interpreted it as a float by accident.
    // C will compile this code without complaint, even though
    // the value we read back is meaningless.
    printf("%f", u.f);
    return 0;
}

On the final line, C allows us to interpret a memory location that stores an integer as a float. Since the bit representation of a float is entirely different from that of an int, this will print nonsense. Because in C, unions have no tags, at runtime there is no way for code to check which variant is active: float or int.

In Rust, enums are tagged. An example similar to the C example above:

enum FloatOrInt {
  Float{f: f64},
  Int{i: i64},
}

fn main() {
  // initialize as a float
  let u = FloatOrInt::Float{f: 0.5};
  
  // we cannot now do:
  // println!("{}", u.f);
  // since we haven't checked that u is a float 
  // (here it's obvious but that's not always true)
  
  // check that u "matches" (see the next section) 
  // the "Float" tag
  if let FloatOrInt::Float {f} = u {
    println!("{}", f);
  }

  // this code will not execute since the tag is `Float`,
  // not `Int`. We cannot accidentally interpret the memory
  // as an integer.
  if let FloatOrInt::Int {i} = u {
    println!("{}", i);
  }
}

Here, if let Tag { ...fields... } = value { block of code } means: if value has the tag Tag, put its contents in the variables mentioned in fields and run block of code with those variables available. If the tag does not match, block of code is skipped. We call this process matching, and we will dive deeper into it in the next section.
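As a small additional sketch of if let, the example below shows that an if let can also carry an else branch, which runs when the tag does not match. The Shape enum and the radius_or_zero function are hypothetical names used only for this illustration.

```rust
// `Shape` is a hypothetical enum used only for this illustration.
enum Shape {
    Circle { radius: f64 },
    Rectangle { width: f64, height: f64 },
}

/// Returns the radius if `shape` is a circle, and 0.0 otherwise.
fn radius_or_zero(shape: &Shape) -> f64 {
    if let Shape::Circle { radius } = shape {
        // `radius` is only available inside this block.
        // Because we matched on a reference, it is a `&f64` here.
        *radius
    } else {
        // the tag was not `Circle`, so there is no radius to read
        0.0
    }
}

fn main() {
    let c = Shape::Circle { radius: 2.5 };
    let r = Shape::Rectangle { width: 3.0, height: 4.0 };

    println!("{}", radius_or_zero(&c));
    println!("{}", radius_or_zero(&r));
}
```

There is no way to reach the radius field without first going through a pattern that checks the tag, which is exactly the safety that the C union lacks.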

To read more about using enums, take a look at the Rust book!

Matching

If you have an enum with data, you sometimes want to get this data out. We briefly looked at matching with if let in the previous section on tags. Let's go back to the scenario of IP addresses. What if we want to write a function like the following, which returns the last byte of an IP address regardless of whether it's a v4 or v6 address?

fn last_byte(addr: IpAddress) -> u8 {
  ...
}

We could use if let again:

fn last_byte(addr: IpAddress) -> u8 {
  // is it a v4 address? Return the last byte
  if let IpAddress::V4(_, _, _, last) = addr {
    return last
  }

  // is it a v6 address? Return the last byte
  if let IpAddress::V6(value) = addr {
    return (value & 0xff) as u8
  }
  
  // well it can't be anything except v4 or v6,
  // so we can just crash if we get here (we won't ever)
  unreachable!()
}

A more general and more concise way to do this is a match expression, which handles all variants in one place:

fn last_byte(addr: IpAddress) -> u8 {
  match addr {
    IpAddress::V4(_, _, _, a) => a,
    // keep only the lowest 8 bits, then convert the u128 to a u8
    IpAddress::V6(v) => (v & 0xff) as u8,
  }
}

However, matching doesn't just work on enums with data in them. It works on any type:

fn print_color(c: Color) {
  match c {
    Color::Red => println!("red"),
    Color::Blue => println!("blue"),
    Color::Yellow => println!("yellow"),
    Color::Orange => println!("orange"),
    Color::Green => println!("green"),
  }
}

Or on integers, where it behaves more like a switch statement:

fn test(a: usize) {
  match a {
    0 => println!("the number is zero"),
    1 ..= 10 => println!("the number is small"),
    11 | 12 => println!("the number is 11 or 12"),
    x => println!("the number is something else, specifically: {}", x),
  }
}

Note that regardless of what you use match for, a match must always be exhaustive. That means that whatever value we match on, there must be an arm that handles it. So in the case of an enum, every single variant needs to be handled, or a "catch-all" needs to be provided. In the example above, the x arm is executed if none of the other arms matched.

Sometimes you don't care about the value in the catch-all arm, and you can replace the x with a _ to throw it away.

Note that arms are checked in order, from top to bottom. That means you shouldn't put the catch-all arm first: it would always be triggered, and none of the arms below it would ever be checked. The compiler warns about such unreachable arms.
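To make the catch-all concrete, here is a minimal sketch that uses _ on the Color enum from earlier in this lecture. The function name is_primary is a hypothetical helper introduced only for this example.

```rust
enum Color {
    Red,
    Blue,
    Yellow,
    Orange,
    Green,
}

/// Hypothetical helper: true for red, blue and yellow.
fn is_primary(c: Color) -> bool {
    match c {
        // several variants can share one arm with `|`
        Color::Red | Color::Blue | Color::Yellow => true,
        // `_` matches every remaining variant (Orange and Green)
        // without binding the value to a name
        _ => false,
    }
}

fn main() {
    println!("{}", is_primary(Color::Red));
    println!("{}", is_primary(Color::Orange));
    println!("{}", is_primary(Color::Green));
}
```

If you delete the _ arm, the match no longer covers Orange and Green, and the compiler rejects the program with a "non-exhaustive patterns" error.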

Options

One good application of enums is a function that may or may not return a value. You could represent that as follows:

enum MayReturnNumber {
  Value(i32),
  Nothing
}

fn test(a: i32) -> MayReturnNumber {
  // only even numbers can be halved without a remainder
  if a % 2 == 0 {
    MayReturnNumber::Value(a / 2)
  } else {
    MayReturnNumber::Nothing
  }
}

This pattern is so common that the standard library provides a type like this. It's called Option<T>. The definition of it is as follows:

/// From the standard library! 
/// Available without importing in any program
pub enum Option<T> {
  /// No value.
  None,
  /// Some value of type `T`.
  Some(T),
}

And you would use it like this:

fn test(a: i32) -> Option<i32> {
  // only even numbers can be halved without a remainder
  if a % 2 == 0 {
    Some(a / 2)
  } else {
    None
  }
}

Notice that Option is so-called "generic over a type T": it works for any type. In the example above we say that the type inside the option is i32, but it could be anything.
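The sketch below illustrates both points: the same Option type wraps an i32 in one function and a char in another, and the caller gets the value out again with match or with a fallback. The function names half_if_even and first_char are hypothetical and exist only for this example.

```rust
/// Half of `a` if `a` is even, like the function shown above.
fn half_if_even(a: i32) -> Option<i32> {
    if a % 2 == 0 {
        Some(a / 2)
    } else {
        None
    }
}

/// Hypothetical helper: the first character of a string, if it has one.
fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}

fn main() {
    // the caller has to handle both variants
    match half_if_even(10) {
        Some(h) => println!("half is {}", h),
        None => println!("the number was odd"),
    }

    // `unwrap_or` supplies a fallback value for the None case
    let c = first_char("").unwrap_or('?');
    println!("{}", c);
}
```

Here Option<i32> and Option<char> are two different types generated from the same generic definition.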

Niche Optimisation

Note: The following few paragraphs are about some internals of enums. You may find it interesting, but won't need much of it in your exercises. You can also skip it if you're in a hurry.

So how do enums work? Internally, they are so-called "tagged unions". Option<T> would desugar roughly to:

struct Option<T> {
  is_some: bool,
  data: T
}

Larger enums would get a number instead of a boolean determining which of the variants they are. And if more than one of the variants contains data, all the data is stored in the same location in the struct (this is possible since an enum is never two variants at once), very similar to a union in C.

But this means that using an enum has a bit of overhead. This "tag" (the boolean or number determining what variant it is) needs some space.

Okay, let's keep that in the back of our mind for a bit. A common thing to do with, for example, Options is to store a reference in them. The reason for that is that Rust never allows a reference to be null. So if you want to have a type that works like a null pointer, you'd quickly arrive at Option<&T> (where T is of course any type you like). But, you may say, that's inefficient! Now we store both a pointer and a tag. Why can't pointers just be null?

Well, actually, that's incorrect. An Option<&T> does not take more space than a single pointer. That's because the Rust compiler realises that &T has a niche: a bit pattern that is never a valid value of the type. As mentioned, a reference can never be null. Thus, Rust stores an Option<&T> just like a &T, and uses the null value to represent None. As a programmer you can use this Option as normal, but a bit of space is saved under the hood.
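You can observe this yourself with std::mem::size_of. The following is a minimal sketch; the exact byte counts it prints depend on your platform, so only the relations between the sizes are checked.

```rust
use std::mem::size_of;

fn main() {
    // A reference is never null, so the null bit pattern is free
    // to represent None: the Option costs no extra space.
    assert_eq!(size_of::<Option<&u32>>(), size_of::<&u32>());

    // Every bit pattern of a u32 is a valid number, so there is
    // no niche here and the tag needs space of its own.
    assert!(size_of::<Option<u32>>() > size_of::<u32>());

    println!("&u32:         {} bytes", size_of::<&u32>());
    println!("Option<&u32>: {} bytes", size_of::<Option<&u32>>());
    println!("u32:          {} bytes", size_of::<u32>());
    println!("Option<u32>:  {} bytes", size_of::<Option<u32>>());
}
```

The first assertion is the niche optimisation at work, the second shows the tag overhead described earlier in this section.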

To read more about this, read this and this from the rust documentation

Results

The Result type is like Option. It's part of the standard library, and looks roughly like this:

pub enum Result<T, E> {
  Ok(T),
  Err(E),
}

With a result, you can represent an operation that could fail. For example:

// the Read trait provides the read_to_string method used below
use std::io::Read;

/// Reading a file can fail with one of two reasons here.
/// Either it couldn't be opened, or it couldn't be read.
pub enum FileError {
  Open,
  Read
}

/// When we read a file, we return either the contents as a String,
/// or a FileError.
pub fn read_file(path: String) -> Result<String, FileError> {
  // we first open and see if this succeeded
  // notice that open() also returns a Result,
  // so here too we match on Err or Ok
  match std::fs::File::open(path) {
    // if not, return an error, inside the Err variant
    Err(_) => Err(FileError::Open),
    // if opening worked, we get an f
    Ok(mut f) => {
      // we allocate a string on the heap
      let mut res = String::new();
      // and try to read the contents into it
      match f.read_to_string(&mut res) {
        // this may succeed, then we're done 
        // and can return the result in an "Ok"
        Ok(_) => Ok(res),
        // if it didn't succeed, we return Err again, but with a different error code (Read)
        Err(_) => Err(FileError::Read)
      }
      // no semicolons after the two matches: their value
      // is the return value of the function
    }
  }
}

This may immediately look a bit cumbersome. So let me also immediately simplify it a lot!

// the Read trait provides the read_to_string method used below
use std::io::Read;

/// This example is the same as above, except that the standard library
/// already defined a set of error codes for when you interact with the filesystem
/// (since those kinds of operations usually return the same kinds of errors. 
/// For example, reading, writing or creating files)
pub fn read_file(path: String) -> Result<String, std::io::Error> {
  // see the question mark at the end? That just
  // means that if open returned an error, immediately
  // return the function with an error too. That means
  // that only if the file was opened successfully, we 
  // continue with an open file in `f`.
  // (`f` is mutable because reading changes the file's state)
  let mut f = std::fs::File::open(path)?;
  
  // now we again create a string to put the result in
  let mut res = String::new();
  
  // and read. This may again return an error. If so,
  // we again use the question mark to say that we want
  // to return the function if the operation fails.
  f.read_to_string(&mut res)?;
  
  // if we get here, all previous operations worked, since 
  // we did not return yet. We wrap the contents in Ok.
  Ok(res)
}

And once more without all the comments, so you can see better how concise this is:

use std::io::Read;

pub fn read_file(path: String) -> Result<String, std::io::Error> {
    let mut f = std::fs::File::open(path)?;
  
    let mut res = String::new();
    f.read_to_string(&mut res)?;

    Ok(res)
}

Actually, you can also use the ? operator to work with options:

/// a and b are both maybe an integer. This function returns
/// a + b when both a and b are Some.
fn add_options(a: Option<i32>, b: Option<i32>) -> Option<i32> {
  // definitely_a definitely contains an integer, because
  // if a was None, the question mark makes the function immediately
  // return None.
  let definitely_a = a?;
  
  // definitely_b is now also definitely an integer
  let definitely_b = b?;
  
  // integers we can add, and then we wrap the whole thing in a Some to
  // say that the function returns something.
  Some(definitely_a + definitely_b)
}

To read more about results and the question mark operator, you can read the chapter from the rust book about it. However, here is a short summary:

- Functions that can fail return a Result.
- A Result has two generic type parameters. The first is the type the function returns when it succeeds. The second is the type the function returns when it fails.
- If a function returns a Result or an Option, then you can use the ? operator inside it.
- The ? operator returns from the current function immediately if it finds an Err or a None value.
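To make the last point of the summary concrete, the sketch below writes the same function twice: once with ?, and once with the match that ? roughly stands for. The function names double_short and double_long are hypothetical. The comparison is a simplification: the real ? operator can also convert the error into the function's error type, which is not needed here because both types are already the same.

```rust
use std::num::ParseIntError;

/// Parses `s` as an integer and doubles it, using `?`.
fn double_short(s: &str) -> Result<i32, ParseIntError> {
    let n = s.trim().parse::<i32>()?;
    Ok(n * 2)
}

/// Roughly what the `?` above does, written out with `match`.
fn double_long(s: &str) -> Result<i32, ParseIntError> {
    let n = match s.trim().parse::<i32>() {
        // on success, continue with the value inside the Ok
        Ok(v) => v,
        // on failure, return from the whole function right away
        Err(e) => return Err(e),
    };
    Ok(n * 2)
}

fn main() {
    println!("{:?}", double_short("21"));
    println!("{:?}", double_long("21"));

    // both versions report the same error for invalid input
    println!("{}", double_short("abc").is_err());
    println!("{}", double_long("abc").is_err());
}
```

Both functions behave the same for every input, the version with ? just says it in one line.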

Software Fundamentals