Concurrency Assignment
Grep
For this assignment, you will be implementing a concurrent version of grep using some of the concurrency primitives you learned about in the lecture. Afterwards, you are going to dive into the inner workings of some of these primitives.
Grep is a command-line utility for searching a file or directory for lines that match a regular expression.
For example, this searches the current directory for files that contain rust
or Rust
.
You can use Grep to search through a directory for specific words in files with the following command:
grep -r "[Rr]ust" .
You may be more familiar with using grep together with pipes. For example:
cat <somefile> | grep <somestring>
This is however not what you will implement in this assignment. Instead, the tool you will make will only be able
to search through directories (recursively). The program will function like grep -r
. This is very similar to
what a well-known tool does called ripgrep. We won't ask you to match
ripgrep's performance, since ripgrep has been optimized a lot. However, what may be interesting to know is that
the author of ripgrep is also the author of the regex library that you will most likely use for this assignment.
Example usage
A simple example can be found in the template in ./examples/example1/
. It contains a single file.
If you run cargo run -- banana ./examples/example1/
, the expected output is:
>>> (#0) ./examples/example1/file.txt
apple banana pear
^^^^^^
For a slightly more complex example, take a look at ./examples/example2/
.
If you run cargo run -- banana ./examples/example2/
, the expected output is:
>>> (#0) ./examples/example2/file1.txt
banana
^^^^^^
>>> (#1) ./examples/example2/dir/file3.txt
banana
^^^^^^
Finally, it is possible to specify multiple directories.
If we specify both examples using cargo run -- banana ./examples/example1/ ./examples/example2/
, the expected output is:
>>> (#0) ./examples/example2/file1.txt
banana
^^^^^^
>>> (#1) ./examples/example1/file.txt
apple banana pear
^^^^^^
>>> (#2) ./examples/example2/dir/file3.txt
banana
^^^^^^
For a big example, if you run the following on the linux source code (use git clone https://github.com/torvalds/linux --depth=1
to clone the repo), you should find 6 results.
The [mM]icrodevice
is the regex that grep should try to match, the .
is the directory it should search in, .
meaning the current directory.
cargo run -- [mM]icrodevice .
Learning Objectives
In this assignment, the goal is to learn about:
- Implementing concurrency using Rust primitives
- Understanding and utilizing atomic operations
- Exploring Rust’s threading and parallelism libraries
- Optimizing concurrent code for performance
Requirements
This assignment consists of two main parts.
- Source code you hand in through a GitLab repository you get (7 points)
- A report which should be a PDF you hand in the root of the same git repository (3 points)
With each part you can gain points, which start counting at 0 and are capped at a 10 which will be your grade. It is required to hand in both source code and a report. Not doing one of the two at all means you fail the assignment.
Source Code: 7 points
In order to receive up to 4 points for this assignment, the code you hand in must:
- Be able to search a file or directory for all occurrences of a regex pattern.
- This means if a pattern matches multiple times in a single file, you must report all matches
- Print output while it is searching. (It shouldn't buffer the results and print them at the end)
- Print the results only using the provided
Display
implementation. Results should be put intoGrepResult
structs and printed. This will make sure their formatting is consistent. - Make sure the
search_ctr
numbers during printing are in increasing order. - Make sure the output is well-formed.
- Be concurrent, that is, should use multiple threads.
- Be thread safe. Any pattern that may work for a limited input, or which depends on being lucky with how the scheduler schedules your threads is considered incorrect.
- Not use any unsafe code.
- Compile on stable rust (1.81).
- Dynamically determine how many threads it should use based on the number of cores the system has.
- You must not spawn many times more threads than your system has cores. On large folders, you should not spawn millions of threads.
Banned Libraries
You are not allowed to use any of the following libraries:
num_cpus
(this is implemented in std nowadays)walkdir
- Any library that implements the behaviour of
grep
already (if unsure: ask a TA)
To gain additional points, you can do any of the following tasks:
- Be faster than GNU
grep
and show this in your report based on a benchmark: 2 points - Your code does not panic (hint:
Mutex
is special): 0.5 points - Experiment with rayon: 1 point
- Implement the same code using
rayon
library, and choose this topic for one of your report subjects
- Implement the same code using
- Experiment with async: 1 point
- Implement the same code using async rust, and choose this topic for one of your report subjects
Rayon and Async Rust
You can also choose
rayon
orasync rust
as your report subject and not implement grep using it. That is still a valid report, but will not get you the 1 extra point for the source code section
Report: 3 points
Write a report consisting of the following
- Write up on two topics (one per team member) of your choice from the list below: 2 points
- performance report with benchmark if your implementation is faster than GNU
grep
1 point
The report must not exceed 3 pages:
- 1 page maximum for each topic
- Additional page for the performance report
You should choose two subjects from the following list, one per group member. Note that you do not actually need to implement your code using any of these techniques.
A report that aims to get a 10 should, both in style and in citation, try to emulate Stacked borrows: an aliasing model Rust
What to write?
The goal of the report is to showcase your grasp of a concept. Rather than demonstrating that you’ve understood the documentation, aim to go beyond it in your investigation. The expectation is for the topics to be explored, analyzed, and deeply understood. Teach us something new—something we might not know! Explore various resources such as papers, blog posts, and videos (We expect you to cite at least one reputable source). The idea is to craft each topic write-up like a small-scale exploratory paper. You could also think of creative ways to apply the concept differently than usual. It’s up to you—make it fun, make it interesting!
Arc
vsRc
in std: explain the differences, when you would use each, and how each works internally. Talk about the difference between strong and weak reference counts, and whatSend
andSync
mean. Generally show to us that you understand the implementation.Mutex
vsRwLock
in std: explain the differences, in which situations you would use each, and how each works internally. Talk about how aRwLock
is implemented, and what the term "interior mutability" means and what this has to do with aMutex
. Generally show to us that you understand the two.- (
std::sync::mpsc
) Channels: explain how a channel works internally. Talk about the different channel variants, and based on what factors Rust chooses the variant at runtime. This may mean you have to look at the source code implementation of channels. Talk about the advantages of a channel compared to a Mutex. Generally show to us that you understand how channels work. - Atomics and Memory Ordering: Explain how atomics work in hardware on x86_64, and how they relate to the cache of a cpu. Explain what a memory ordering is, and why the different memory orderings are important and how this relates to memory models. Why does
compare_exchange
need two memory orderings? Also demonstrate the use of an atomic variable in yourgrep
assignment. - Rayon vs Advanced Computing Systems: you may notice that the
rayon
library can do many things that the OpenMP library can do which you used during the Advanced Computing Systems course. Report on how rayon schedules tasks, how it is different to OpenMP and what your experience of using it is compared to OpenMP. Also talk about what the advantage is of using rayon over the custom thread-based approach you used for the base implementation of yourgrep
assignment. Note that you can also implementgrep
using rayon for bonus points, see the requirements forgrep
. - Async/Await: Rust has a feature that allows code to asynchronously wait for jobs to complete, moving scheduling from the operating system to user space. Report on how async/await works, how it is different to async/await in other languages like for example Python, C# or JavaScript, and in what situations you might want to use async/await over normal threading. Also discuss what an executor is. Note that you can also implement
grep
using async/await for bonus points, see the requirements forgrep
.