Build Lightning-Fast Data Processing in Rust: From Single Thread to Parallel Performance
Introduction
Following our deep dive into Rust's capabilities, I'll take you through a small hands-on project. In this project, we'll harness Rust's power to generate a large dataset and compare the performance of single-threaded and parallel processing. This example uses two powerful libraries, rand and rayon, to get the job done.
This is your practical guide to seeing Rust's performance in action. If you have been following my previous Rust article, "What is Rust, and What is it For?", this tutorial will show you exactly how these pieces fit together.
Setting Up the Environment
Prerequisites
- Basic understanding of typed programming languages
- A code editor
Step 1: Setting Up the Project
For the most popular operating systems, go to the Rust Language Page. Installation is quite easy there (including on Windows Subsystem for Linux), and the page also walks you through writing your first Rust program.
Once you have installed Rust and checked that it works on your OS, we can proceed to the next part.
Initialize a new Rust project:
I'm using VS Code, but any code editor works. Open your terminal and run the following command to create the project:
cargo new rust_performance_demo
Next, move into that folder:
cd rust_performance_demo
File structure
├── Cargo.lock
├── Cargo.toml
└── src
    └── main.rs
This creates a Cargo.toml file and a src/main.rs file. (Cargo.lock is generated the first time you build the project.)
Add dependencies to Cargo.toml: open the Cargo.toml file and add the following dependencies:
[package]
name = "rust_performance_demo"
version = "0.1.0"
edition = "2021"
[dependencies]
rand = "0.8"
rayon = "1.7"
After adding the dependencies, build the program. Where Python has pip and Node has npm, Rust has Cargo, which downloads the dependencies and compiles the project:
cargo build
Now we need to modify the main.rs file inside the src folder. Here is the code, explained step by step:
Explanation
Generating a Dataset
Inside the main function, fn main(), add the following code step by step. The first thing the program does is generate a large dataset: a vector (a dynamic array) of 50 million random integers, each between 0 and 99. To achieve this, I use the rand library to generate random numbers and fill up the vector. Here's how I do it:
let size = 50_000_000;
let mut rng = rand::thread_rng();
let data: Vec<u32> = (0..size).map(|_| rng.gen_range(0..100)).collect();
println!("Generated a vector of {} elements.", size);
What's happening here? I use the thread_rng() function to get a random number generator, then generate 50 million random numbers with rng.gen_range(0..100). The map adapter transforms each element of the range into a random number, and collect() gathers them all into a vector. Note that gen_range comes from the rand::Rng trait, so the file needs use rand::Rng; at the top (as in the full code below).
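The generation step follows a common Rust pattern: take a range, map each element to a value, and collect() the results into a Vec. The sketch below reproduces that pattern with only the standard library, so it compiles without the rand crate. The XorShift64 type and the generate_data function are hypothetical helpers written for illustration; they are not part of rand, and rand::thread_rng() uses a different generator. A fixed seed has one practical benefit for benchmarking: every run sums exactly the same data. (With rand itself, the SeedableRng trait's seed_from_u64 serves the same purpose.)

```rust
/// Hypothetical xorshift64 generator, used only so this sketch needs no crates.
/// It is NOT the algorithm behind rand::thread_rng().
struct XorShift64 {
    state: u64,
}

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must never hold a zero state, so substitute a non-zero constant.
        XorShift64 {
            state: if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed },
        }
    }

    fn next_u64(&mut self) -> u64 {
        // Marsaglia's xorshift64 step with shifts 13, 7, 17.
        let mut x = self.state;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.state = x;
        x
    }

    /// Returns a value in 0..upper (slightly biased by the modulo; fine for a demo).
    fn below(&mut self, upper: u32) -> u32 {
        (self.next_u64() % upper as u64) as u32
    }
}

/// Same range -> map -> collect pattern as the article, but with a fixed seed.
fn generate_data(size: usize, seed: u64) -> Vec<u32> {
    let mut rng = XorShift64::new(seed);
    (0..size).map(|_| rng.below(100)).collect()
}

fn main() {
    let data = generate_data(1_000, 42);
    println!("Generated {} elements, first five: {:?}", data.len(), &data[..5]);
}
```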
Measuring Single-Threaded Performance
Next, I calculate the sum of all the numbers in the vector using a single-threaded approach. I use Rust's built-in iter() method to loop through each element, cast it to a u64 (since the sum can get quite large), and sum everything up:
let start = Instant::now();
let sum_single: u64 = data.iter().map(|&x| x as u64).sum();
let duration_single = start.elapsed();
println!("Single-threaded sum: {}, took: {:?}", sum_single, duration_single);
I also measure how long this operation takes using std::time::Instant. The elapsed() method returns the duration, and I print both the sum and the time taken.
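Why cast to u64? Each value is at most 99, so the largest possible total for 50 million elements is 50,000,000 × 99 = 4,950,000,000, which exceeds u32::MAX (4,294,967,295). The average case (about 2.5 billion) would fit, but the accumulator type has to cover the worst case; a plain u32 sum that overflows panics in a debug build and wraps around silently in a release build with default settings. The sketch below makes the overflow visible with checked arithmetic; checked_sum_u32 and sum_u64 are illustrative names, not library functions.

```rust
/// Sums in u32, returning None as soon as an intermediate total overflows.
fn checked_sum_u32(data: &[u32]) -> Option<u32> {
    data.iter().try_fold(0u32, |acc, &x| acc.checked_add(x))
}

/// The article's approach: widen each element to u64 before summing.
fn sum_u64(data: &[u32]) -> u64 {
    data.iter().map(|&x| x as u64).sum()
}

fn main() {
    // Worst case for the article's dataset: 50 million values, all equal to 99.
    let worst_case: u64 = 50_000_000 * 99;
    println!(
        "worst-case sum = {}, u32::MAX = {}, fits in u32: {}",
        worst_case,
        u32::MAX,
        worst_case <= u32::MAX as u64
    );

    // A tiny slice that already overflows u32 shows the failure mode directly.
    let small = [u32::MAX, 1];
    println!("checked u32 sum: {:?}", checked_sum_u32(&small));
    println!("u64 sum: {}", sum_u64(&small));
}
```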
Measuring Parallel Performance
Now comes the exciting part: parallel processing. Rust's rayon library makes parallelism incredibly simple. Instead of using iter() to loop through the data, I use par_iter() (from rayon's prelude, imported with use rayon::prelude::*;), which splits the work across multiple threads automatically:
let start = Instant::now();
let sum_parallel: u64 = data.par_iter().map(|&x| x as u64).sum();
let duration_parallel = start.elapsed();
println!("Parallel sum: {}, took: {:?}", sum_parallel, duration_parallel);
This approach can process the vector much faster by spreading the work across the available CPU cores (by default, rayon's global thread pool uses one thread per logical CPU). Again, I measure and print the time taken.
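rayon hides the thread management, but it helps to see roughly what a parallel sum involves. The sketch below is a hand-rolled alternative that uses only the standard library's std::thread::scope (Rust 1.63 or later): it splits the slice into one chunk per thread, sums each chunk on its own thread, and adds up the partial results. This is not how rayon is implemented; rayon uses a work-stealing thread pool that splits work adaptively, which balances load better than fixed chunks. parallel_sum is a hypothetical name.

```rust
use std::thread;

/// Hand-rolled parallel sum: split the slice into one chunk per thread,
/// sum each chunk on its own scoped thread, then add the partial sums.
fn parallel_sum(data: &[u32], threads: usize) -> u64 {
    if data.is_empty() {
        return 0;
    }
    let threads = threads.max(1);
    // Round up so that every element lands in some chunk (and chunk_size >= 1).
    let chunk_size = (data.len() + threads - 1) / threads;

    // Scoped threads may borrow `data` because they are joined before scope() returns.
    thread::scope(|s| {
        // Spawn all workers first, then join them, so the chunks run concurrently.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|&x| x as u64).sum::<u64>()))
            .collect();
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    let data: Vec<u32> = (0..1_000_000u32).map(|i| i % 100).collect();
    // Fall back to 4 threads if the core count cannot be determined.
    let threads = thread::available_parallelism().map(|n| n.get()).unwrap_or(4);

    let single: u64 = data.iter().map(|&x| x as u64).sum();
    let parallel = parallel_sum(&data, threads);
    assert_eq!(single, parallel);
    println!("{} threads, sum = {}", threads, parallel);
}
```

The comparison with rayon is the point: the one-word change from iter() to par_iter() replaces all of this chunking, spawning, and joining.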
Ensuring Correctness
It's not enough for the parallel version to be faster; it must also produce the same result as the single-threaded version. To confirm this, I use Rust's assert_eq! macro:
assert_eq!(sum_single, sum_parallel);
If the two sums don't match, the program will panic. This ensures that parallelism doesn't compromise accuracy.
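An exact assert_eq! is appropriate here because the data are integers: integer addition is associative, so rayon can split and regroup the sum however it likes and still produce exactly the same total. Floating-point addition is not associative, so a parallel sum over f32 or f64 values can legitimately differ from the sequential one in the last bits. The sketch below illustrates both cases; int_sums_agree is a hypothetical helper, and the 1e-12 tolerance is an arbitrary choice for this example.

```rust
/// Integer addition is associative, so a different grouping gives the same total.
fn int_sums_agree(data: &[u64]) -> bool {
    let left_to_right: u64 = data.iter().sum();
    // Sum pairs first, then sum the pair totals: a different grouping,
    // similar in spirit to how a parallel reduction combines partial results.
    let pairwise: u64 = data.chunks(2).map(|pair| pair.iter().sum::<u64>()).sum();
    left_to_right == pairwise
}

fn main() {
    // Floating-point addition is not associative: grouping changes the result.
    let a = (0.1_f64 + 0.2) + 0.3;
    let b = 0.1_f64 + (0.2 + 0.3);
    println!("(0.1 + 0.2) + 0.3 = {}", a);
    println!("0.1 + (0.2 + 0.3) = {}", b);
    println!("exactly equal: {}", a == b);

    // For floats, compare with a tolerance instead of an exact assert_eq!.
    assert!((a - b).abs() < 1e-12);

    assert!(int_sums_agree(&[1, 2, 3, 4, 5]));
}
```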
Printing Results
Finally, I print a comparison of the single-threaded and parallel times:
println!("\nPerformance Comparison:");
println!(" - Single-threaded: {:?}\n - Parallel: {:?}", duration_single, duration_parallel);
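The Instant::now() / elapsed() pattern appears twice in the program. One way to avoid repeating it is a small generic helper that runs a closure and returns both the closure's result and the elapsed time. time_it is a hypothetical name, not part of the standard library; the sketch relies only on std::time.

```rust
use std::time::{Duration, Instant};

/// Runs `f` once and returns its result together with the elapsed wall-clock time.
fn time_it<T, F: FnOnce() -> T>(f: F) -> (T, Duration) {
    let start = Instant::now();
    let result = f();
    (result, start.elapsed())
}

fn main() {
    let data: Vec<u32> = (0..1_000_000u32).map(|i| i % 100).collect();

    // The sum is returned and printed, so its work is part of the measured time.
    let (sum_single, duration_single) =
        time_it(|| data.iter().map(|&x| x as u64).sum::<u64>());
    println!("Single-threaded sum: {}, took: {:?}", sum_single, duration_single);
}
```

In the full program, the parallel measurement would become time_it(|| data.par_iter().map(|&x| x as u64).sum::<u64>()), with rayon's prelude in scope.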
Full code
use rand::Rng;
use rayon::prelude::*;
use std::time::Instant;

fn main() {
    // Generate a large dataset
    let size = 50_000_000;
    let mut rng = rand::thread_rng();
    let data: Vec<u32> = (0..size).map(|_| rng.gen_range(0..100)).collect();
    println!("Generated a vector of {} elements.", size);

    // Measure single-threaded sum
    let start = Instant::now();
    let sum_single: u64 = data.iter().map(|&x| x as u64).sum();
    let duration_single = start.elapsed();
    println!("Single-threaded sum: {}, took: {:?}", sum_single, duration_single);

    // Measure parallel sum
    let start = Instant::now();
    let sum_parallel: u64 = data.par_iter().map(|&x| x as u64).sum();
    let duration_parallel = start.elapsed();
    println!("Parallel sum: {}, took: {:?}", sum_parallel, duration_parallel);

    // Check correctness
    assert_eq!(sum_single, sum_parallel);
    println!("\nPerformance Comparison:");
    println!(" - Single-threaded: {:?}\n - Parallel: {:?}", duration_single, duration_parallel);
}
This gives us a clear view of the performance improvement provided by parallelism.
Now it's time to run it. Type the following:
cargo run
And you will see the following result:
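The comparison between the single-threaded run and the parallel run across all CPU cores is crystal clear: on my machine, the single-threaded operation took ~5 seconds, while the parallel version took just ~1.7 seconds. Exact timings depend on your hardware and build profile. cargo run uses an unoptimized debug build by default, so cargo run --release will typically be much faster in both cases.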
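A single number makes the comparison easier to quote: the speedup, that is, the single-threaded time divided by the parallel time. With the approximate figures above, 5 s / 1.7 s gives a speedup of roughly 2.9×. The sketch below computes the same ratio from two Duration values; speedup is a hypothetical helper, and the inputs are just the rounded numbers reported here, so your own ratio will depend on your core count and build profile.

```rust
use std::time::Duration;

/// Speedup = single-threaded time / parallel time.
/// Values above 1.0 mean the parallel version was faster.
fn speedup(single: Duration, parallel: Duration) -> f64 {
    single.as_secs_f64() / parallel.as_secs_f64()
}

fn main() {
    // Approximate timings reported in this article: ~5 s single-threaded, ~1.7 s parallel.
    let s = speedup(Duration::from_secs(5), Duration::from_millis(1_700));
    println!("Speedup: {:.2}x", s);
}
```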
As a follow-up, you could write a similar small program in another language, such as C or C++, and compare the results. That is outside the scope of this tutorial, so it's in your hands to give it a try.
Conclusion
This hands-on project demonstrates Rust's strength in intensive data-processing tasks. By comparing single-threaded and parallel approaches on a dataset of 50 million numbers, we've seen that Rust's safety guarantees don't come at the cost of performance. The rayon library makes parallel programming surprisingly accessible: with a simple change from iter() to par_iter(), we can harness the full potential of modern multi-core processors while getting exactly the same result.

What makes this example particularly valuable is that it showcases Rust's practical benefits: the ability to write safe, concurrent code without the usual headaches of manual thread management and data races. Whether you're building high-performance systems, working with big data, or developing complex applications, Rust's combination of safety, control, and efficiency makes it an excellent choice for modern software development.

Have you tried implementing parallel processing in Rust, or in any other language? I'd love to hear about your experiences! Drop a comment below sharing your results or thoughts. If you found this tutorial helpful or gained some insight into the language, consider subscribing to stay updated on more practical Rust tutorials.
If you have any questions or run into errors, please let me know. I will add the GitHub repository soon.