map, filter and fold

map for transforming data streams

map is a method applied to an iterator that takes a function as an argument and applies it to each element of the collection, returning a new iterator with the transformed elements.

Imagine you have a list of cats, and each cat needs a little transformation, say, you want to rename them or add some attributes. map allows you to do this.

The main job of map is data transformation: each item in the data stream is taken, a function is applied to it, and the output is a new stream of modified items.

Some points:

  • map does not modify the original collection; it wraps the source iterator in a new iterator.

  • Iterators in Rust are lazy, i.e. they do no work until it is needed. Calling map by itself performs no transformations; they run only when a method that consumes the iterator is used.
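
The laziness described above can be made visible with a small sketch. The helper name count_map_calls and the Cell-based counter are assumptions introduced only for illustration; the counter is a deliberate side effect whose sole purpose is to observe when the closure actually runs.

```rust
use std::cell::Cell;

/// Returns how many times the closure passed to `map` has run:
/// (before the iterator is consumed, after it is consumed).
/// `count_map_calls` is a hypothetical helper, not a standard library function.
fn count_map_calls(input: &[i32]) -> (usize, usize) {
    let calls = Cell::new(0usize);

    // Building the adapter does not run the closure yet.
    let doubled = input.iter().map(|x| {
        calls.set(calls.get() + 1);
        x * 2
    });
    let before = calls.get();

    // `collect` consumes the iterator and drives the closure once per element.
    let _result: Vec<i32> = doubled.collect();
    let after = calls.get();

    (before, after)
}

fn main() {
    let (before, after) = count_map_calls(&[1, 2, 3]);
    println!("calls before collect: {}, after collect: {}", before, after);
}
```

For a three-element input the counter stays at zero until collect is called and reaches three afterwards.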

Let's start with the simplest and most classic example – transforming a list of numbers:

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];

    let squared_numbers: Vec<i32> = numbers.iter().map(|&x| x * x).collect();

    println!("{:?}", squared_numbers);
}

Here we took the vector numbers and, with the help of map, transformed each element into its square. As a result, squared_numbers will be [1, 4, 9, 16, 25].

Please note:

  • We used iter() to get an iterator over references to the elements, then applied map, and at the end collected the result into a new vector using collect().

  • Iterator laziness: map performs no conversion until collect is called.

Now let's make the task more complicated. Let's say we have a list of structures, and we need to extract a certain field from them:

struct Cat {
    name: String,
    age: u8,
}

fn main() {
    let cats = vec![
        Cat { name: String::from("Mittens"), age: 2 },
        Cat { name: String::from("Whiskers"), age: 5 },
        Cat { name: String::from("Shadow"), age: 3 },
    ];

    let cat_names: Vec<String> = cats.iter().map(|cat| cat.name.clone()).collect();

    println!("{:?}", cat_names);
}

Here we created a Cat structure and a list of cats. With the help of map we extract the names of all the cats into a new vector. Note that we had to use clone() because iter() only lends us each cat, and a String cannot be moved out of a borrowed Cat (the borrow checker in Rust never sleeps).
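
If cloning is not wanted, there are at least two alternatives, sketched here under stated assumptions. The function names borrowed_names and owned_names are hypothetical; the first borrows the names as string slices, the second consumes the vector and moves the names out.

```rust
struct Cat {
    name: String,
    age: u8,
}

/// Borrows each name instead of cloning it.
/// The returned slices stay valid only as long as `cats` is alive.
fn borrowed_names(cats: &[Cat]) -> Vec<&str> {
    cats.iter().map(|cat| cat.name.as_str()).collect()
}

/// Consumes the vector and moves each name out, so no clone is needed.
/// The cats themselves are gone afterwards.
fn owned_names(cats: Vec<Cat>) -> Vec<String> {
    cats.into_iter().map(|cat| cat.name).collect()
}

fn main() {
    let cats = vec![
        Cat { name: String::from("Mittens"), age: 2 },
        Cat { name: String::from("Whiskers"), age: 5 },
    ];

    println!("{:?}", borrowed_names(&cats));
    println!("oldest: {:?}", cats.iter().map(|cat| cat.age).max());

    // After this call `cats` can no longer be used.
    println!("{:?}", owned_names(cats));
}
```

Which variant fits depends on whether the original list is still needed after the names are extracted.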

But what if you want to build a new value out of several fields? Let's take the example of turning each cat into a description that contains its name and age (the Cat structure is the same as above):

fn main() {
    let cats = vec![
        Cat { name: String::from("Mittens"), age: 2 },
        Cat { name: String::from("Whiskers"), age: 5 },
        Cat { name: String::from("Shadow"), age: 3 },
    ];

    let cat_descriptions: Vec<String> = cats.iter().map(|cat| {
        format!("{} is {} years old", cat.name, cat.age)
    }).collect();

    println!("{:?}", cat_descriptions);
}

Now each element has become a string describing the cat's name and age.

Some tips:

  1. Avoid unnecessary cloning: if you still need the original collection afterwards, use iter(), which borrows the elements, instead of into_iter(), which consumes the collection.

  2. Make sure the function has no side effects: map is intended for pure transformations. If the function passed to map has side effects (such as changes to external state), the results can be surprising, because the function runs only when, and only as far as, the iterator is consumed.

  3. Combining with other iterators: map combines well with other iterator methods such as filter, enumerate and fold. For example, you can first filter the elements and then apply map to them.
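
The third tip can be sketched as a short chain. The function name describe_even_squares and the output format are assumptions made for this example only.

```rust
/// Keeps the even numbers, squares them, numbers the results from zero
/// and renders each one as a string.
/// `describe_even_squares` is a hypothetical helper used for illustration.
fn describe_even_squares(numbers: &[i32]) -> Vec<String> {
    numbers
        .iter()
        .filter(|&&x| x % 2 == 0) // keep only even numbers
        .map(|&x| x * x) // square each remaining number
        .enumerate() // attach a zero-based index to each square
        .map(|(i, square)| format!("{}: {}", i, square))
        .collect()
}

fn main() {
    for line in describe_even_squares(&[1, 2, 3, 4, 5, 6]) {
        println!("{}", line);
    }
}
```

Note that enumerate is applied after filter, so the indices count the surviving elements rather than positions in the original slice.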

Filtering data with filter

Basically, filter is a sieve with a condition: the data is sifted through it, and only the gold nuggets come out.

filter takes a predicate function that returns true or false. If the predicate returns true for an element, the element stays in the data stream; if it returns false, the element is dropped.

Let's say there is a list of numbers and you want to keep only the even ones:

fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

    let even_numbers: Vec<i32> = numbers.into_iter().filter(|&x| x % 2 == 0).collect();

    println!("{:?}", even_numbers);
}

The result will be [2, 4, 6, 8, 10].

filter iterates over each element of the iterator and applies the predicate function to it. If the predicate returns true, the element remains in the output stream; otherwise it is skipped.

Like map, filter is lazy. This means that no data is actually filtered until you call a method that consumes the iterator, such as collect().
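
One detail that often confuses newcomers: the predicate receives a reference to each item, not the item itself. The sketch below shows three equivalent ways of writing the same predicate; the function names are hypothetical and exist only to keep the variants apart.

```rust
/// `slice.iter()` yields `&i32`, and `filter` hands the predicate a reference
/// to that item, so the closure argument is `&&i32` and needs two derefs.
fn evens_explicit_deref(numbers: &[i32]) -> Vec<i32> {
    numbers.iter().filter(|x| **x % 2 == 0).copied().collect()
}

/// Same predicate, but the `&&x` pattern strips both references.
fn evens_pattern(numbers: &[i32]) -> Vec<i32> {
    numbers.iter().filter(|&&x| x % 2 == 0).copied().collect()
}

/// `into_iter()` on a Vec yields `i32`, so the predicate sees `&i32`
/// and a single `&x` pattern is enough.
fn evens_owned(numbers: Vec<i32>) -> Vec<i32> {
    numbers.into_iter().filter(|&x| x % 2 == 0).collect()
}

fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6];
    println!("{:?}", evens_explicit_deref(&numbers));
    println!("{:?}", evens_pattern(&numbers));
    println!("{:?}", evens_owned(numbers));
}
```

All three functions return the same result; they differ only in how the references are peeled off.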

Examples of use

Let's say there is an array of numbers, and you need to select only those that are divisible by 3:

fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

    let divisible_by_three: Vec<i32> = numbers.into_iter().filter(|&x| x % 3 == 0).collect();

    println!("{:?}", divisible_by_three);
}

Output: [3, 6, 9].

Let's say you have a list of cats and you need to select only those that are over 3 years old:

struct Cat {
    name: String,
    age: u8,
}

fn main() {
    let cats = vec![
        Cat { name: String::from("Mittens"), age: 2 },
        Cat { name: String::from("Whiskers"), age: 5 },
        Cat { name: String::from("Shadow"), age: 3 },
        Cat { name: String::from("Luna"), age: 7 },
    ];

    let adult_cats: Vec<&Cat> = cats.iter().filter(|&cat| cat.age > 3).collect();

    for cat in adult_cats {
        println!("{} is {} years old", cat.name, cat.age);
    }
}

Here we leave only those cats that are over 3 years old. Result:

Whiskers is 5 years old
Luna is 7 years old

Sometimes you need to filter data by more complex criteria. For example, there is a list of users, and you need to select only those who have an active subscription and are over 18 years old:

struct User {
    name: String,
    age: u8,
    has_active_subscription: bool,
}

fn main() {
    let users = vec![
        User { name: String::from("Alice"), age: 22, has_active_subscription: true },
        User { name: String::from("Bob"), age: 17, has_active_subscription: true },
        User { name: String::from("Charlie"), age: 19, has_active_subscription: false },
        User { name: String::from("Dave"), age: 30, has_active_subscription: true },
    ];

    let active_adults: Vec<&User> = users.iter()
        .filter(|&user| user.age > 18 && user.has_active_subscription)
        .collect();

    for user in active_adults {
        println!("{} is {} years old and has an active subscription", user.name, user.age);
    }
}

Result:

Alice is 22 years old and has an active subscription
Dave is 30 years old and has an active subscription
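
When a condition grows, it may be easier to read and test if it lives in a named function. This is only a sketch of that idea: is_active_adult and active_adult_names are hypothetical names, and the rule itself is copied from the example above.

```rust
struct User {
    name: String,
    age: u8,
    has_active_subscription: bool,
}

/// Hypothetical named predicate: over 18 and with an active subscription.
fn is_active_adult(user: &User) -> bool {
    user.age > 18 && user.has_active_subscription
}

/// Returns the names of all users that satisfy the predicate.
fn active_adult_names(users: &[User]) -> Vec<&str> {
    users
        .iter()
        .filter(|user| is_active_adult(user))
        .map(|user| user.name.as_str())
        .collect()
}

fn main() {
    let users = vec![
        User { name: String::from("Alice"), age: 22, has_active_subscription: true },
        User { name: String::from("Bob"), age: 17, has_active_subscription: true },
        User { name: String::from("Charlie"), age: 19, has_active_subscription: false },
        User { name: String::from("Dave"), age: 30, has_active_subscription: true },
    ];

    println!("{:?}", active_adult_names(&users));
}
```

The predicate can now be checked on a single user without building an iterator chain at all.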

If you are working with large amounts of data, take a look at the Rayon library, which provides parallel iterators. Rayon is an external crate, so it has to be added to the project's dependencies first.

Example:

use rayon::prelude::*;

fn main() {
    let numbers: Vec<i32> = (0..1_000_000).collect();

    let even_numbers: Vec<i32> = numbers.par_iter().filter(|&&x| x % 2 == 0).cloned().collect();

    println!("Found {} even numbers", even_numbers.len());
}

Here the parallel iterator par_iter is used to speed up filtering.

Aggregating data using fold

In Rust, the fold method is used to accumulate values from a data stream step by step. In other words, it is a reduction tool that lets you collapse a data stream into a single value.

The basic idea is that fold takes an initial value and iteratively applies a function that takes an accumulator and the current element of the stream, updates the accumulator, and passes it on to the next step.

The signature of the fold method looks like this:

fn fold<B, F>(self, init: B, f: F) -> B
where
    F: FnMut(B, Self::Item) -> B

  • init: initial value of the accumulator.

  • f: a function that is applied to the accumulator and each element of the iterator.

  • The return value is the accumulator after all elements have been processed.
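
One way to read the signature is to compare fold with the loop it replaces. The two functions below are a sketch with hypothetical names; they are meant to return the same result for any input slice.

```rust
/// Sums a slice with `fold`: `0` is `init`, the closure is `f`.
fn sum_with_fold(numbers: &[i32]) -> i32 {
    numbers.iter().fold(0, |acc, &x| acc + x)
}

/// The same computation written as an explicit loop.
fn sum_with_loop(numbers: &[i32]) -> i32 {
    let mut acc = 0; // plays the role of `init`
    for &x in numbers {
        acc += x; // plays the role of `f(acc, x)`
    }
    acc // the final accumulator is the return value
}

fn main() {
    let numbers = [1, 2, 3, 4, 5];
    println!("fold: {}", sum_with_fold(&numbers));
    println!("loop: {}", sum_with_loop(&numbers));
}
```

The difference is mostly one of style: fold threads the accumulator through the closure instead of mutating a variable declared outside the loop.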

A classic example is summing numbers in a list:

fn main() {
    let numbers = vec![1, 2, 3, 4, 5];

    let sum: i32 = numbers.iter().fold(0, |acc, &x| acc + x);

    println!("Sum: {}", sum);
}

Here we start with an accumulator equal to 0, and at each step we add the current value from the data stream to it. As a result, sum will be equal to 15.

Let's say you have an array of strings and you want to combine them into one string:

fn main() {
    let words = vec!["Hello", "Rust", "World"];

    let sentence: String = words.iter().fold(String::new(), |mut acc, &word| {
        acc.push_str(word);
        acc.push(' ');
        acc
    });

    println!("Sentence: {}", sentence.trim());
}

We start with an empty string, and at each step we add the next word to the accumulator. Note the call to trim() when printing the result; it removes the extra space at the end.

You can also use fold to calculate the factorial of a number:
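
The trailing space can also be avoided altogether. The sketch below shows two options under hypothetical function names: the standard join method on slices, and a fold that inserts the separator before every word except the first.

```rust
/// Joins words with single spaces using the slice `join` method.
fn join_words(words: &[&str]) -> String {
    words.join(" ")
}

/// The same result with `fold`: the separator goes before each word
/// except the first, so nothing has to be trimmed afterwards.
fn join_with_fold(words: &[&str]) -> String {
    words
        .iter()
        .enumerate()
        .fold(String::new(), |mut acc, (i, word)| {
            if i > 0 {
                acc.push(' ');
            }
            acc.push_str(word);
            acc
        })
}

fn main() {
    let words = ["Hello", "Rust", "World"];
    println!("{}", join_words(&words));
    println!("{}", join_with_fold(&words));
}
```

For plain joining, join is probably the simpler choice; the fold version is mainly useful as an illustration of how the accumulator is built up.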

fn main() {
    let n = 5;
    
    let factorial: i32 = (1..=n).fold(1, |acc, x| acc * x);

    println!("Factorial of {} is {}", n, factorial);
}

Here we start with an accumulator equal to 1, and at each step we multiply it by the current value in the range from 1 to n. As a result, for n = 5 the factorial will be equal to 120.
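
Factorials grow quickly: with i32 the product no longer fits once n exceeds 12. A possible way to handle this, sketched here as an assumption rather than a recommendation from the original example, is the try_fold method combined with checked_mul, which stops at the first overflow. The name checked_factorial is hypothetical.

```rust
/// Computes n! in u64, returning None if the product overflows.
/// `try_fold` stops as soon as the closure returns None.
fn checked_factorial(n: u64) -> Option<u64> {
    (1..=n).try_fold(1u64, |acc, x| acc.checked_mul(x))
}

fn main() {
    for n in [5, 20, 21] {
        match checked_factorial(n) {
            Some(value) => println!("{}! = {}", n, value),
            None => println!("{}! does not fit into u64", n),
        }
    }
}
```

With u64 the largest factorial that fits is 20!, so the call for 21 reports an overflow instead of panicking or wrapping around.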

Examples of use

Let's say there is a list of numbers and you need to get the sum of even numbers and the sum of odd numbers in one operation:

fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6];

    let (sum_even, sum_odd) = numbers.iter().fold((0, 0), |(even_acc, odd_acc), &x| {
        if x % 2 == 0 {
            (even_acc + x, odd_acc)
        } else {
            (even_acc, odd_acc + x)
        }
    });

    println!("Sum of even numbers: {}", sum_even);
    println!("Sum of odd numbers: {}", sum_odd);
}

Here we start with a tuple (0, 0) to store two accumulators: one for even and one for odd numbers. Then at each step we check whether the number is even and update the corresponding accumulator.

A complex data structure is not a problem for fold either. Let's say you need to group elements by some key, for example, by the first letter of the string:

use std::collections::HashMap;

fn main() {
    let words = vec!["apple", "banana", "apricot", "blueberry", "avocado"];

    let grouped: HashMap<char, Vec<&str>> = words.iter().fold(HashMap::new(), |mut acc, &word| {
        acc.entry(word.chars().next().unwrap())
            .or_insert(Vec::new())
            .push(word);
        acc
    });

    for (key, value) in &grouped {
        println!("{}: {:?}", key, value);
    }
}

Here we start with an empty HashMap and at each step add the word to the group for its first letter. Because HashMap does not guarantee iteration order, the groups may be printed in any order.

One of the cool features of fold is the ability to work with different types of data. For example, you can aggregate information about users by collecting data from several fields:
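
The unwrap() in the example panics if the list contains an empty string, because an empty string has no first character. A slightly more defensive variant might look like the sketch below; the function name group_by_first_letter and the decision to skip empty strings are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Groups words by their first character.
/// Empty strings have no first character and are skipped (an assumption
/// of this sketch; another policy may suit a real project better).
fn group_by_first_letter<'a>(words: &[&'a str]) -> HashMap<char, Vec<&'a str>> {
    let init: HashMap<char, Vec<&'a str>> = HashMap::new();

    words.iter().fold(init, |mut acc, &word| {
        if let Some(first) = word.chars().next() {
            // `or_default` inserts an empty Vec the first time a key is seen.
            acc.entry(first).or_default().push(word);
        }
        acc
    })
}

fn main() {
    let words = ["apple", "banana", "", "apricot", "blueberry", "avocado"];
    let grouped = group_by_first_letter(&words);

    for (key, value) in &grouped {
        println!("{}: {:?}", key, value);
    }
}
```

Within each group the words keep the order in which they appeared in the input.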

struct User {
    name: String,
    age: u8,
}

fn main() {
    let users = vec![
        User { name: String::from("Alice"), age: 30 },
        User { name: String::from("Bob"), age: 25 },
        User { name: String::from("Charlie"), age: 35 },
    ];

    let total_age: u8 = users.iter().fold(0, |acc, user| acc + user.age);
    let names: String = users.iter().fold(String::new(), |mut acc, user| {
        acc.push_str(&user.name);
        acc.push(' ');
        acc
    });

    println!("Total age: {}", total_age);
    println!("Names: {}", names.trim());
}

Here fold is used to sum the users' ages and to combine their names into one string.
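
The accumulator does not have to be of the same type as the elements. As a sketch, the hypothetical function average_age below folds u8 ages into a (sum, count) tuple of u32 values, which also makes overflow far less likely than with a u8 total.

```rust
struct User {
    name: String,
    age: u8,
}

/// Returns the average age, or None for an empty slice.
/// The accumulator is a (sum, count) tuple of u32, wider than the u8 items.
fn average_age(users: &[User]) -> Option<f64> {
    let (sum, count) = users.iter().fold((0u32, 0u32), |(sum, count), user| {
        (sum + u32::from(user.age), count + 1)
    });

    if count == 0 {
        None
    } else {
        Some(f64::from(sum) / f64::from(count))
    }
}

fn main() {
    let users = vec![
        User { name: String::from("Alice"), age: 30 },
        User { name: String::from("Bob"), age: 25 },
        User { name: String::from("Charlie"), age: 35 },
    ];

    let names: Vec<&str> = users.iter().map(|user| user.name.as_str()).collect();
    println!("Users: {}", names.join(" "));
    println!("Average age: {:?}", average_age(&users));
}
```

Returning an Option avoids a division by zero when the list of users is empty.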


Conclusion

map, filter and fold can be easily combined. For example, you can first filter out unwanted elements using filter, then transform the remaining data using map, and finally reduce it to one value using fold.

Example:

fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8, 9, 10];

    let sum_of_squares_of_even_numbers: i32 = numbers.iter()
        .filter(|&&x| x % 2 == 0)  // Keep only the even numbers
        .map(|&x| x * x)           // Square them
        .fold(0, |acc, x| acc + x); // Sum them up

    println!("Sum of squares of even numbers: {}", sum_of_squares_of_even_numbers);
}

So feel free to combine map, filter, fold and other iterator methods in your projects!

Now it's time to implement them in your projects and experience the full power of functional programming in Rust.
