Build Lightning-Fast Data Processing in Rust: From Single Thread to Parallel Performance


Introduction

Following our deep dive into Rust's capabilities, I'll take you through a small hands-on project. We'll harness Rust's power to generate a large dataset and compare the performance of single-threaded and parallel processing. This example uses two powerful libraries, rand and rayon, to get the job done.

This is your practical guide to seeing Rust's performance in action. If you have been following my previous Rust article, "What is Rust, and What is it For?", this tutorial will show you exactly how those pieces fit together.

Setting Up the Environment

Prerequisites

  • Basic understanding of a typed programming language
  • A code editor

Step 1: Setting Up the Project

For the most popular operating systems, head to the official Rust language page, which walks you through a straightforward installation (including Windows Subsystem for Linux) and a short tutorial for writing your first Rust program.

Once Rust is installed and you have verified that it is available on your system, we can proceed with the next part.
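If you want to double-check that the toolchain is actually on your PATH, these two commands print the installed versions:

rustc --version
cargo --version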

Initialize a new Rust project:

I'm using VS Code because it works for me, but any code editor is fine. Open your terminal and run the following command to start the project:

cargo new rust_performance_demo

Next, move into that folder:

cd rust_performance_demo

File structure

├── Cargo.lock
├── Cargo.toml
└── src
    └── main.rs

This creates the files shown above: a Cargo.toml file and a src/main.rs file. Next, open the Cargo.toml file and add the following dependencies:

[package]
name = "rust_performance_demo"
version = "0.1.0"
edition = "2021"

[dependencies]
rand = "0.8"
rayon = "1.7"

After adding the dependencies, build the project. This step fetches and compiles them, much like pip does for Python or npm does for Node:

cargo build

Now we need to modify the main.rs file inside the src folder. Here is the code, with a step-by-step explanation:

Explanation

Generating a Dataset

Inside the main function, fn main(), add the following code. The first thing I do in this program is generate a large dataset: a vector (a dynamic array) of 50 million random integers, each between 0 and 99. To achieve this, I use the rand library to generate random numbers and fill the vector. Here's how:

let size = 50_000_000;
let mut rng = rand::thread_rng();
let data: Vec<u32> = (0..size).map(|_| rng.gen_range(0..100)).collect();

println!("Generated a vector of {} elements.", size);

What's happening here? I use the thread_rng() function to get a random number generator, and I generate 50 million random numbers using rng.gen_range(0..100). The map function is perfect for transforming a range into random numbers, and I collect them all into a vector.
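One note before moving on: for rng.gen_range() and the timing calls below to compile, main.rs needs a few imports at the top. They also appear in the full listing at the end of the article:

use rand::Rng;          // brings gen_range into scope
use rayon::prelude::*;  // brings par_iter into scope (used later)
use std::time::Instant; // used to time both versions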

Measuring Single-Threaded Performance

Next, I calculate the sum of all the numbers in the vector using a single-threaded approach. I use Rust's built-in iter() method to loop through each element, cast it to a u64 (since the sum can get quite large), and sum everything up:

let start = Instant::now();
let sum_single: u64 = data.iter().map(|&x| x as u64).sum();
let duration_single = start.elapsed();
println!("Single-threaded sum: {}, took: {:?}", sum_single, duration_single);

I also measure how long this operation takes using std::time::Instant. The elapsed() method gives me the duration, and I print out both the sum and the time taken.

Measuring Parallel Performance

Now comes the exciting part: parallel processing. Rust's rayon library makes parallelism incredibly simple. Instead of using iter() to loop through the data, I use par_iter() (from rayon), which splits the work across multiple threads automatically:

let start = Instant::now();
let sum_parallel: u64 = data.par_iter().map(|&x| x as u64).sum();
let duration_parallel = start.elapsed();
println!("Parallel sum: {}, took: {:?}", sum_parallel, duration_parallel);

This approach processes the vector much faster by utilizing all the available CPU cores. Again, I measure and print the time taken.
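By default, rayon sizes its global thread pool to the number of logical cores on your machine. If you want to experiment with how the speedup scales with thread count, you can optionally configure the pool yourself before the first parallel call. This is a minimal sketch using rayon's ThreadPoolBuilder; the thread count of 4 is just an example:

use rayon::ThreadPoolBuilder;

// Optional: cap rayon's global pool at 4 worker threads.
// Place this at the start of main(), before the first parallel operation,
// because build_global() returns an error if the pool already exists.
ThreadPoolBuilder::new()
    .num_threads(4)
    .build_global()
    .expect("failed to configure the rayon thread pool");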

Ensuring Correctness

It's not enough for the parallel version to be faster; it must also produce the same result as the single-threaded version. To confirm this, I use Rust's assert_eq! macro:

assert_eq!(sum_single, sum_parallel);

If the two sums don't match, the program will panic. This ensures that parallelism doesn't compromise accuracy.

Printing Results

Finally, I print a comparison of the single-threaded and parallel times:

println!("\nPerformance Comparison:");
println!(" - Single-threaded: {:?}\n - Parallel:       {:?}", duration_single, duration_parallel);

Full code

use rand::Rng;
use rayon::prelude::*;
use std::time::Instant;

fn main() {
    // Generate a large dataset
    let size = 50_000_000;
    let mut rng = rand::thread_rng();
    let data: Vec<u32> = (0..size).map(|_| rng.gen_range(0..100)).collect();

    println!("Generated a vector of {} elements.", size);

    // Measure single-threaded sum
    let start = Instant::now();
    let sum_single: u64 = data.iter().map(|&x| x as u64).sum();
    let duration_single = start.elapsed();
    println!("Single-threaded sum: {}, took: {:?}", sum_single, duration_single);

    // Measure parallel sum
    let start = Instant::now();
    let sum_parallel: u64 = data.par_iter().map(|&x| x as u64).sum();
    let duration_parallel = start.elapsed();
    println!("Parallel sum: {}, took: {:?}", sum_parallel, duration_parallel);

    // Check correctness
    assert_eq!(sum_single, sum_parallel);

    println!("\nPerformance Comparison:");
    println!(" - Single-threaded: {:?}\n - Parallel:       {:?}", duration_single, duration_parallel);
}

This gives us a clear view of the performance improvement provided by parallelism.

Now it's time to run it. Type the following:

cargo run

And you will see the following result:

[Screenshot: cargo run result]
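Your exact numbers will depend on your hardware, but the printed output has roughly this shape (the sums are omitted here, and the timings are approximate values from my run):

Generated a vector of 50000000 elements.
Single-threaded sum: <sum>, took: ~5s
Parallel sum: <sum>, took: ~1.7s

Performance Comparison:
 - Single-threaded: ~5s
 - Parallel:        ~1.7s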

The comparison is crystal clear: on my machine the single-threaded sum took about 5 seconds, while the parallel version, using all available CPU cores, finished in roughly 1.7 seconds. Amazing.
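One caveat about these timings: cargo run uses the unoptimized debug profile by default. If you want numbers closer to what optimized Rust can do, try building and running with the release profile and compare again:

cargo run --release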

We could also write the same small program in other languages and compare the results. That's out of the scope of this tutorial, but it's in your hands to give it a try, maybe with C or C++.

Conclusion

This hands-on project demonstrates the remarkable power of Rust in handling intensive data processing tasks. By comparing single-threaded and parallel approaches on a substantial dataset of 50 million numbers, we've seen that Rust's safety guarantees don't come at the cost of performance. The rayon library makes parallel programming surprisingly accessible: with a simple change from iter() to par_iter(), we can harness the full potential of modern multi-core processors while maintaining computational accuracy.

What makes this example particularly valuable is that it showcases Rust's practical benefits: the ability to write safe, concurrent code without the typical headaches of thread management and race conditions. Whether you're building high-performance systems, working with big data, or developing complex applications, Rust's combination of safety, control, and efficiency makes it an excellent choice for modern software development.

Have you tried implementing parallel processing in Rust, or in any other language? I'd love to hear about your experiences! Drop a comment below sharing your results or thoughts. If you found this tutorial helpful or are just gaining insights into the language, consider subscribing to stay updated on more practical Rust tutorials in the future.

If you have any questions or spot any errors, please let me know. I will add the GitHub repository soon.
