Lecture 7: Testing

If you are interested in reading a book about this topic, we highly recommend the one by one of our own professors at TU Delft, Mauricio Finavaro Aniche: https://www.effective-software-testing.com/. Parts of this lecture were inspired by it.

Simple Tests

A very simple technique to see if software works as it is supposed to is to provide it with inputs and check whether the outputs match your expectations. You might find yourself doing this manually: running the program on some inputs you know and inspecting the printed output. Although this works for small programs, for larger programs there may be many different inputs you want to test. Furthermore, every time you change a complex program, you ideally should retest all inputs to make sure you haven't broken anything.

A more sustainable way to test software is through automated testing. Writing automated tests in Rust is quite easy, although writing one may take a little longer than trying an input by hand. Usually, it simply boils down to writing a function with the #[test] annotation on it, and calling the code under test with your input(s).

#![allow(unused)]
fn main() {
fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        n => fibonacci(n - 1) * fibonacci(n - 2)
    }
}

#[test]
fn test_fibonacci() {
    assert_eq!(fibonacci(0), 0);
}
}

Whenever cargo test is run in the root of a project, all functions marked with #[test] in that project are run. Now you can keep these test cases around, so that every time you change your code, you can re-run cargo test to verify that everything still works as expected.

As you can see in the example above, we use assert_eq! to test the input and output of fibonacci. This is an assertion. An assertion is a statement about your code which you expect to hold. By making the right assertions, bad code will cause one or more of your assertions to fail, which you can then observe. Rust has three kinds of assertions built in:

  • assert!(e) checks whether e evaluates to true.
  • assert_eq!(a, b) checks whether a is equal to b; it is similar to assert!(a == b), but shorter and with better output on failure.
  • assert_ne!(a, b) checks whether a is not equal to b.

Other kinds of assertions can be constructed out of these simple ones; for example, you can give assert!() more complex expressions.
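For example (a small sketch; the vector and the values are just for illustration):

#[test]
fn assertion_examples() {
    let v = vec![1, 2, 3];

    // assert! accepts any boolean expression, so more complex
    // conditions can be checked with it directly.
    assert!(v.len() > 2 && v.contains(&3));

    // assert_eq! and assert_ne! compare two values.
    assert_eq!(v.len(), 3);
    assert_ne!(v[0], v[1]);
}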

Rust's assertions optionally support messages, for example:

#![allow(unused)]
fn main() {
assert!(a.is_ok(), "a was not ok, and instead was {}", a.unwrap_err());
}

would print a nice error message containing the exact error that a held whenever a was not Ok.

Cool! However, the test we just wrote did not catch the bug in our fibonacci program. We should have added the two recursive cases together, not multiplied them! We only tested one of our base cases, so we did not catch this.

That brings us to an important question: how many tests should we make to be sure that our program works as expected?

For the fibonacci code you might say that the answer is 3: one for each base case, and some test that shows that the recursive case is correct.
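Such a test suite could look roughly like this (the value 55 for fibonacci(10) follows directly from the definition):

#[test]
fn test_base_case_zero() {
    assert_eq!(fibonacci(0), 0);
}

#[test]
fn test_base_case_one() {
    assert_eq!(fibonacci(1), 1);
}

#[test]
fn test_recursive_case() {
    assert_eq!(fibonacci(10), 55);
}

Now consider the following version of fibonacci, with a bug hidden in it: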

#![allow(unused)]
fn main() {
fn fibonacci(n: u64) -> u64 {
    match n {
        0 => 0,
        1 => 1,
        42 => 42,
        n => fibonacci(n - 1) + fibonacci(n - 2)
    }
}
}

But with those three tests, you would not find the bug in the snippet of code above. You may show that the base cases and the recursive case are correct, but you most likely didn't test whether the right output is produced for n=42.

Thus the real answer to that question, how many tests we should make, is simple (but unsatisfying): you can't know. A very important rule in software testing, as uttered by Dijkstra, is that "testing can be used to show the presence of bugs, but never to show their absence". Still, testing can be very useful to improve your confidence in your code's correctness, and that confidence increases as we test more of it.

Besides normal assertions, Rust also supports debug assertions. These look like the normal ones, but start with debug_, as in debug_assert_eq!(a, b). These assertions are checked when you compile in debug mode (which is what cargo run does by default), but when you compile in release mode with all optimisations turned on, as with cargo run --release, they are not checked. This is useful when you put assertions in the middle of your normal code and don't want your code to crash when running in production.
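A minimal sketch of how that can look (average is a made-up helper):

fn average(values: &[f64]) -> f64 {
    let sum: f64 = values.iter().sum();
    let avg = sum / values.len() as f64;
    // Sanity check on an internal invariant; only verified in debug builds,
    // compiled out entirely in release builds.
    debug_assert!(avg.is_finite(), "average of {:?} was not finite", values);
    avg
}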

Coverage

So how do we make sure we test all of our code? One possibility is to use tools such as tarpaulin or llvm-cov, which both perform a very similar task: generating coverage reports. With code coverage tools we can measure which parts of a program get executed by our tests, which can help show us what tests we still have left to write.

How that works is that the compiler, together with these special tools, inserts a special function call for every line of code in your program, telling the coverage tool that that line has been executed. The code below illustrates the idea, though many details are omitted:

#![allow(unused)]
fn main() {
fn fibonacci(n: u64) -> u64 {
    cov(2); match n {
        0 => {cov(3); 0},
        1 => {cov(4); 1},
        n => {cov(5); fibonacci(n - 1) + fibonacci(n - 2)}
    }
}

// called when a certain line is covered
fn cov(line: usize) {todo!()}
}

Note that this may make your code slightly slower, since these extra function calls do have overhead.

Let's use tarpaulin as an example. With the above code in a project, we can simply run cargo tarpaulin -o html, and after running for a while the tool will generate an HTML file called tarpaulin-report.html that shows, for every line of the program, whether it was covered by the tests.

Types of Coverage

Cargo tarpaulin shows us which lines in our program are and aren't executed by our tests. We call this line coverage. Although useful, don't make the mistake of thinking that when all lines are covered by your test cases, there are no bugs left in your program. When a line of code is reachable in multiple ways, for example through one of several conditions being true, and you only test one such path, any of the other paths may still be incorrect.

There are different metrics we can use for coverage. For example, cargo llvm-cov supports region coverage. A single line of code may contain multiple regions; for example, the boolean expression a > 3 || b < 5 consists of two regions.

One step stronger still is branch coverage. The previous example contains 2 regions, but 4 possible branch outcomes: a > 3, a <= 3, b < 5 and b >= 5. Under branch coverage, a test suite needs to exercise each of these. Unfortunately, branch coverage is not yet well supported in Rust.
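To make the difference concrete, here is a small sketch (check is a made-up function): a single test can give 100% line coverage while leaving regions and branches unexercised:

// `check` is only used to illustrate the coverage metrics.
fn check(a: i32, b: i32) -> bool {
    a > 3 || b < 5
}

#[test]
fn covers_every_line_but_not_every_region() {
    // This single test executes every line of `check`, so line coverage is 100%,
    // but because `||` short-circuits, the region `b < 5` is never evaluated.
    assert!(check(10, 10));
}

#[test]
fn covers_the_remaining_branches() {
    assert!(check(0, 0));    // a <= 3, b < 5
    assert!(!check(0, 10));  // a <= 3, b >= 5
}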

And still, even with these stronger types of coverage, you have to remember that with 100% coverage there can still be bugs!

Types of Tests

When we test code, there are different kinds of tests of different granularity, which have commonly used names.

  • Unit Tests: A test of a single isolated part of a program (unit) like a single function.
  • Integration Test: A test of various systems together to see whether they work together correctly.
  • End-to-End/System test: A test of the entire system, from start to finish. Usually the last thing you do.
  • Smoke Test: Often the very first test you write, just to check that one very common usage of your program doesn't completely crash. Useful when you start writing a program, before you do proper testing, to quickly see whether you haven't done something very stupid.

These kinds of tests have different granularity and are useful for different purposes. When an integration test or end-to-end test fails, it might not be obvious what the reason for the failure is; it could be anywhere in the system. With a unit test you can make sure that each basic component works correctly. Once you are sure that that is the case, you can work your way up to integration tests and end-to-end tests to see whether more and more of the system is correct.
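In Rust, this difference in granularity is also reflected in where tests conventionally live: unit tests usually sit in a #[cfg(test)] module next to the code they test, while integration tests go in the separate tests/ directory and can only use the crate's public API. A rough sketch (the crate name my_crate and the add function are made up):

// src/lib.rs
pub fn add(a: u64, b: u64) -> u64 {
    a + b
}

#[cfg(test)]
mod tests {
    use super::*;

    // Unit test: exercises a single function in isolation.
    #[test]
    fn adds_two_numbers() {
        assert_eq!(add(2, 3), 5);
    }
}

// tests/integration.rs
// Integration test: lives in its own file and only sees the public API.
// use my_crate::add;
//
// #[test]
// fn add_works_from_the_outside() {
//     assert_eq!(add(2, 3), 5);
// }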

Test Driven Development

Up to now, we have assumed that we write tests after we write the software, to check that what we wrote is actually correct. However, that does not always have to be the order. When a bug is found in a large or complicated project, it can be hard to find what exactly causes it. In that case it can be helpful to first create a test that reliably triggers the bug and fails. Then it becomes very clear what has to be done to fix the bug: make the test pass. We call this test driven development: we write the test first, for the software that is yet to come.
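As a small sketch of this workflow, for the sabotaged fibonacci from earlier we could first write a regression test that fails because of the 42 => 42 arm, and only afterwards remove that arm to make it pass (note that the naive recursion is quite slow for an input this large):

#[test]
fn fibonacci_of_42_is_correct() {
    // Fails while the `42 => 42` arm is still present,
    // and passes once it has been removed.
    assert_eq!(fibonacci(42), 267_914_296);
}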

In Rust, we sometimes use a derived term: compiler driven development. That's when we use the compiler to help us write software: we start with incorrect code and keep following the compiler's error messages until the code compiles, and more often than not it is then also correct!