Comparing collections without regard to order

While working on a Django REST Framework (DRF) project, I needed to write tests for an API that returned unsorted data. Sorting the data in the API was not required, and doing it just for the sake of the tests seemed illogical. Sets could not solve the problem, because the elements of a set must be hashable and dictionaries are not. I looked for a built-in way to compare unsorted data in pytest but did not find one. I did, however, come across a discussion in the pytest community where users asked for such a feature, and the pytest developers suggested that someone implement it as a plugin. This is how the idea of creating pytest-unordered was born.


Sets

At first glance, the use of sets (set) seems like a natural solution for such problems. However, this approach has several significant limitations:

  1. Inability to work with non-hashable elements:

    • Sets require that elements be hashable, which makes them impossible to use with elements such as lists or dictionaries.

    • Trying to convert collections with complex data structures into sets results in errors and the need for additional conversions.

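  2. Loss of structure information:

    • Sets discard duplicates, so two collections that differ only in how many times an element occurs compare as equal.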

  3. Inability to compare nested structures:

    • Using sets does not work for nested structures such as lists of dictionaries or complex JSON objects. Comparing such structures requires a more flexible approach.
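The first limitation can be illustrated with a minimal sketch that uses only the standard library. The sample records and the `freeze` helper are hypothetical and only show the kind of conversion that sets force on you:

```python
# Hypothetical API response: a list of dictionaries in arbitrary order.
actual = [{"name": "Bob", "age": 25}, {"name": "Alice", "age": 30}]
expected = [{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}]

# Dictionaries are not hashable, so building a set from them fails.
try:
    set(actual)
    set_error = None
except TypeError as exc:
    set_error = str(exc)


def freeze(record: dict) -> frozenset:
    """Convert a flat dictionary into a hashable frozenset of its items."""
    return frozenset(record.items())


# The workaround only helps for flat dictionaries with hashable values;
# a nested list or dictionary inside a record would raise TypeError again.
same_records = {freeze(r) for r in actual} == {freeze(r) for r in expected}
```

Here `set_error` holds the TypeError message, and `same_records` is True only because every record was converted by hand first.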

assertCountEqual

For comparing unordered collections, the Python standard library also provides the unittest.TestCase.assertCountEqual method. It checks that two sequences contain the same number of occurrences of each element, regardless of their order:

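import unittest

def test_list_equality():
    actual = [3, 1, 2, 2]
    expected = [1, 2, 3, 2]
    # assertCountEqual returns None and raises AssertionError on a mismatch,
    # so it must not be wrapped in an assert statement.
    unittest.TestCase().assertCountEqual(actual, expected)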

However, unittest.TestCase.assertCountEqual has its drawbacks:

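  1. Not pretty:

    • In pytest-style tests, which rely on the plain assert statement, you have to create a unittest.TestCase instance just to call the method.

  2. Inability to compare complex data structures:

    • Only the top level is compared without regard to order. Nested collections, such as lists inside dictionaries, are still compared with ==, so the order of their elements matters.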
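The second drawback can be checked with a short sketch that relies only on the standard library. The nested sample data is hypothetical:

```python
import unittest

case = unittest.TestCase()

# Top-level order is ignored, so this check passes (and returns None).
case.assertCountEqual([3, 1, 2, 2], [1, 2, 3, 2])

# Hypothetical nested data: the inner "orders" lists differ only in order.
actual = [{"customer": "Alice", "orders": [456, 123]}]
expected = [{"customer": "Alice", "orders": [123, 456]}]

# The dictionaries are compared with ==, so the inner list order still matters.
try:
    case.assertCountEqual(actual, expected)
    nested_passed = True
except AssertionError:
    nested_passed = False
```

The second call raises AssertionError, so `nested_passed` ends up False even though the data differs only in the order of a nested list.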

The pytest-unordered approach

To solve these problems, pytest-unordered takes an approach similar to that of pytest.approx: the unordered function creates an object that overrides the comparison method __eq__. This makes the following possible:

  • Support for complex data structures:
    pytest-unordered allows you to compare lists, tuples, and even nested data structures such as lists of dictionaries without having to convert them to sets. Simply wrap the desired elements of the structure in unordered().

  • Preserving duplicates:
    Unlike sets, pytest-unordered correctly handles collections that contain duplicate elements, preserving their number.

  • Simplifying test code:
    Using pytest-unordered makes tests more readable and understandable. There is no need to pre-sort or transform data before comparing it.
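The idea of overriding __eq__ can be sketched in a few lines. The `Unordered` class below is a hypothetical illustration of the technique, not the plugin's actual implementation, and it omits type checking and error reporting:

```python
class Unordered:
    """Hypothetical wrapper that compares equal to any reordering of its items."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Try to match every element of `other` against the remaining items.
        remaining = list(self.items)
        try:
            for elem in other:
                remaining.remove(elem)  # list.remove uses ==, no hashing needed
        except (TypeError, ValueError):
            return False  # `other` is not iterable or has an unmatched element
        return not remaining  # leftovers mean `other` was missing something

    def __repr__(self):
        return f"Unordered({self.items!r})"


# list.__eq__ does not know Unordered, so Python falls back to Unordered.__eq__.
flat_ok = [3, 1, 2] == Unordered([1, 2, 3])
duplicates_ok = [1, 2, 2, 3] == Unordered([3, 2, 1])
nested_ok = [{"orders": [456, 123]}] == Unordered([{"orders": Unordered([123, 456])}])
```

Because the comparison relies on == rather than on hashing, the wrapper can be nested inside dictionaries and lists, which is what makes marking individual inner collections as unordered possible.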

Features and usage examples of pytest-unordered

Let's look at some usage examples of pytest-unordered.

First you need to install the package using pip:

pip install pytest-unordered

Comparing lists without regard to order

Let's look at a simple example of comparing lists:

from pytest_unordered import unordered

def test_list_equality():
    actual = [3, 1, 2]
    expected = [1, 2, 3]
    assert actual == unordered(expected)

Here unordered allows you to check that two lists contain the same elements, regardless of their order.

Comparing lists of dictionaries

pytest-unordered also supports comparing lists of dictionaries:

def test_lists_of_dicts():
    actual = [
        {"name": "Alice", "age": 30},
        {"name": "Bob", "age": 25}
    ]
    expected = unordered([
        {"name": "Bob", "age": 25},
        {"name": "Alice", "age": 30}
    ])
    assert actual == expected

This test checks that both lists contain the same dictionaries, regardless of their order.

Complex data structures

pytest-unordered allows individual collections within complex structures to be marked as unordered.

def test_nested():
    expected = unordered([
        {"customer": "Alice", "orders": unordered([123, 456])},
        {"customer": "Bob", "orders": [789, 1000]},
    ])
    actual = [
        {"customer": "Bob", "orders": [789, 1000]},
        {"customer": "Alice", "orders": [456, 123]},
    ]
    assert actual == expected

Here, the outer list of customers and Alice's orders are compared without regard to element order, while Bob's orders are compared with order taken into account.

Working with duplicates

pytest-unordered correctly handles cases where collections contain duplicates:

def test_with_duplicates():
    actual = [1, 2, 2, 3]
    expected = [3, 2, 1]
    assert actual == unordered(expected)

This test fails with the following message:

    def test_with_duplicates():
        actual = [1, 2, 2, 3]
        expected = [3, 2, 1]
>       assert actual == unordered(expected)
E       assert [1, 2, 2, 3] == [3, 2, 1]
E         Extra items in the left sequence:
E         2

It is easy to see that using sets to compare these lists would produce a different result.
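A quick standard-library check of that claim, as a sketch. It uses collections.Counter as the multiset, which works here only because the elements are hashable integers:

```python
from collections import Counter

actual = [1, 2, 2, 3]
expected = [3, 2, 1]

# Sets drop the duplicate 2, so the two lists look identical.
equal_as_sets = set(actual) == set(expected)

# Counting occurrences keeps the duplicate and exposes the difference.
equal_as_multisets = Counter(actual) == Counter(expected)
extra_in_actual = list((Counter(actual) - Counter(expected)).elements())
```

`equal_as_sets` is True, while `equal_as_multisets` is False and `extra_in_actual` contains the extra 2 that the plugin reports.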

Checking collection types

If a collection is passed to unordered as its only argument, the check also verifies that the collection types match:

assert [1, 20, 300] == unordered([20, 300, 1])
assert (1, 20, 300) == unordered((20, 300, 1))

If the container types are different, the check will fail:

    assert [1, 20, 300] == unordered((20, 300, 1))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

An exception is made for generators:

assert [4, 0, 1] == unordered((i*i for i in range(3)))

To disable type checking, you can pass the elements as separate arguments:

assert [1, 20, 300] == unordered(20, 300, 1)
assert (1, 20, 300) == unordered(20, 300, 1)

You can also pass the check_type parameter explicitly:

assert [1, 20, 300] == unordered((20, 300, 1), check_type=False) 

What does pytest have to do with it?

Comparing collections without regard to order is not specific to pytest. But pytest-unordered integrates with pytest by implementing the pytest_assertrepr_compare hook. This lets you use the plugin directly in pytest tests and get convenient error messages when checks fail. There was already an example with duplicates above. Here is an example of the message shown when one element is replaced:

def test_unordered():
    assert [{"a": 1, "b": 2}, 2, 3] == unordered(2, 3, {"b": 2, "a": 3})

    def test_unordered():
>       assert [{"a": 1, "b": 2}, 2, 3] == unordered(2, 3, {"b": 2, "a": 3})
E       AssertionError: assert [{'a': 1, 'b': 2}, 2, 3] == [2, 3, {'b': 2, 'a': 3}]
E         One item replaced:
E         Common items:
E         {'b': 2}
E         Differing items:
E         {'a': 1} != {'a': 3}
E         
E         Full diff:
E           {
E         -     'a': 3,
E         ?          ^
E         +     'a': 1,
E         ?          ^
E               'b': 2,
E           }
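To show how such a hook plugs into pytest, here is a minimal sketch of a pytest_assertrepr_compare implementation as it could appear in a conftest.py. The hook name and its (config, op, left, right) signature are part of pytest; the `UnorderedStub` class and the message wording are hypothetical and much simpler than what the plugin produces:

```python
class UnorderedStub:
    """Hypothetical stand-in for the object returned by unordered()."""

    def __init__(self, *items):
        self.items = list(items)


def pytest_assertrepr_compare(config, op, left, right):
    """Return custom explanation lines for `left == UnorderedStub(...)`."""
    if op != "==" or not isinstance(right, UnorderedStub):
        return None  # let pytest build its default message

    missing = list(right.items)
    extra = []
    for elem in left:
        try:
            missing.remove(elem)
        except ValueError:
            extra.append(elem)

    # The first line is the summary; the following lines are the details.
    lines = [f"{left!r} == unordered {right.items!r}"]
    if extra:
        lines.append(f"Extra items on the left: {extra!r}")
    if missing:
        lines.append(f"Missing items on the left: {missing!r}")
    return lines
```

pytest calls the hook when an assert with a comparison fails and, if a list of strings is returned, shows it instead of the default explanation.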

Implementation of the comparison algorithm

The key part of pytest-unordered is the UnorderedList class, which implements order-insensitive comparison of collections in its compare_to method.

The compare_to method code

def compare_to(self, other: List) -> Tuple[List, List]:
    extra_left = list(self)
    extra_right = []
    reordered = []
    placeholder = object()
    for elem in other:
        try:
            extra_left.remove(elem)
            reordered.append(elem)
        except ValueError:
            extra_right.append(elem)
            reordered.append(placeholder)
    placeholder_fillers = extra_left.copy()
    for i, elem in reversed(list(enumerate(reordered))):
        if not placeholder_fillers:
            break
        if elem == placeholder:
            reordered[i] = placeholder_fillers.pop()
    self[:] = [e for e in reordered if e is not placeholder]
    return extra_left, extra_right

The compare_to method performs the actual comparison of collection elements. First, all elements of self are copied to extra_left. Then each element of the other list is looked up in extra_left: an element that is found is removed from extra_left and appended to reordered, while an element that is not found is added to extra_right and a placeholder is appended to reordered in its place. After the loop, the placeholders are filled, starting from the end of reordered, with the elements left in extra_left, and self is rewritten in this new order, with any unfilled placeholders dropped. Finally, the method returns extra_left and extra_right, the elements that have no match on the other side.
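To trace the algorithm on concrete data, here is a standalone adaptation of the same logic. The free function `compare_lists` and its third return value are hypothetical conveniences for illustration; the plugin's method instead rewrites self in place:

```python
from typing import Any, List, Tuple


def compare_lists(
    expected: List[Any], actual: List[Any]
) -> Tuple[List[Any], List[Any], List[Any]]:
    """Hypothetical free-function version of the compare_to logic."""
    extra_left = list(expected)      # expected elements not matched yet
    extra_right: List[Any] = []      # actual elements with no match
    reordered: List[Any] = []        # expected elements in the order of `actual`
    placeholder = object()
    for elem in actual:
        try:
            extra_left.remove(elem)
            reordered.append(elem)
        except ValueError:
            extra_right.append(elem)
            reordered.append(placeholder)
    # Fill placeholders from the end with the unmatched expected elements.
    fillers = extra_left.copy()
    for i in reversed(range(len(reordered))):
        if not fillers:
            break
        if reordered[i] is placeholder:
            reordered[i] = fillers.pop()
    reordered = [e for e in reordered if e is not placeholder]
    return extra_left, extra_right, reordered


# One replaced element: "charlie" is expected, "charles" was received.
replaced = compare_lists(["charlie", "alice", "bob"], ["alice", "bob", "charles"])

# One duplicate too many in the actual data.
duplicated = compare_lists([3, 2, 1], [1, 2, 2, 3])
```

In the first call the unmatched "charlie" is moved to the position where "charles" appeared, so the two lists line up element by element. In the second call nothing is left in extra_left, and the extra 2 ends up in extra_right.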

Reordering the elements in compare_to makes the two collections easier to compare visually in the development environment. Because self is rearranged to follow the order of the other list, matching elements line up, and the placeholders mark the exact positions of missing and mismatched elements. This greatly improves the readability of error messages in the IDE and makes tests easier to debug. Here is an example:

def test_reordering():
    expected = unordered([
        {"customer": "Charlie", "orders": [123, 456]},
        {"customer": "Alice", "orders": unordered([123, 456])},
        {"customer": "Bob", "orders": [789, 1000]},
    ])
    actual = [
        {"customer": "Alice", "orders": [456, 123]},
        {"customer": "Bob", "orders": [789, 1000]},
        {"customer": "Charles", "orders": [123, 456]},
    ]
    assert actual == expected

[Figure: IDE diff without reordering]

[Figure: IDE diff after reordering]

Advantages of pytest-unordered

pytest-unordered provides many advantages over using sets or pre-sorting data:

  1. The plugin eliminates the need for additional data transformations and makes tests simpler and clearer.

  2. pytest-unordered allows you to easily compare complex and nested data structures without any additional effort.

  3. Unlike sets, pytest-unordered handles duplicates correctly.

  4. Test code using unordered becomes more readable and easier to maintain because it reflects the true intent of the test – to compare a set of elements, ignoring their order.

  5. Reordering elements makes visual comparison of different collections more intuitive in the development environment.

Conclusion

If you are working with data whose order does not matter, try pytest-unordered in your projects. This will help you write simpler, more efficient, and more understandable tests.

At the time of writing, the pytest-unordered repository has collected 40 stars on GitHub. Besides me, three other people have taken part in its development, for which I am very grateful. I invite interested members of the community to discuss the project and take part in its development.
