10 Surprising Things You Can Do with Python’s collections Module


 

Introduction

 
Python’s standard library is extensive, offering a wide range of modules to perform common tasks efficiently.

Among these, the collections module stands out. It provides specialized container data types that serve as alternatives to Python’s general-purpose built-in containers: dict, list, set, and tuple. While many developers are familiar with some of its components, the module offers a range of features that are surprisingly useful and can simplify code, improve readability, and boost performance.

This tutorial explores ten practical — and perhaps surprising — applications of the Python collections module.

 

1. Counting Hashable Objects Effortlessly with Counter

 
A common task in almost any data analysis project is counting the occurrences of items in a sequence. The collections.Counter class is designed specifically for this. It’s a dictionary subclass where elements are stored as keys and their counts are stored as values.

from collections import Counter

# Count the frequency of words in a list
words = ['galaxy', 'nebula', 'asteroid', 'comet', 'gravitas', 'galaxy', 'stardust', 'quasar', 'galaxy', 'comet']
word_counts = Counter(words)

# Find the two most common words
most_common = word_counts.most_common(2)

# Output results
print(f"Word counts: {word_counts}")
print(f"Most common words: {most_common}")

 

Output:

Word counts: Counter({'galaxy': 3, 'comet': 2, 'nebula': 1, 'asteroid': 1, 'gravitas': 1, 'stardust': 1, 'quasar': 1})
Most common words: [('galaxy', 3), ('comet', 2)]
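
Two other Counter behaviors are worth knowing. The sketch below uses a hypothetical inventory list (not from the example above) to show that looking up a missing key returns 0 rather than raising KeyError, and that update() adds to existing counts instead of replacing them.

```python
from collections import Counter

# Hypothetical sample data, used only for illustration
inventory = Counter(["apple", "pear", "apple"])

# A missing key reports a count of 0 and is not added to the Counter
missing = inventory["banana"]

# update() adds the new counts to the existing ones
inventory.update(["pear", "pear", "kiwi"])

print(f"Count for a missing key: {missing}")
print(f"Pear count after update: {inventory['pear']}")
print(f"All elements: {sorted(inventory.elements())}")
```

The elements() method expands the counts back into individual items, which can be handy when you need the original multiset again.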

 

2. Creating Lightweight Classes with namedtuple

 
When you need a simple class just for grouping data, without methods, a namedtuple is a useful, memory-efficient option. It allows you to create tuple-like objects that have fields accessible by attribute lookup as well as being indexable and iterable. This makes your code more readable than using a standard tuple.

from collections import namedtuple

# Define a Book namedtuple
# Fields: title, author, year_published, isbn
Book = namedtuple('Book', ['title', 'author', 'year_published', 'isbn'])

# Create an instance of the Book
my_book = Book(
    title="The Hitchhiker's Guide to the Galaxy",
    author="Douglas Adams",
    year_published=1979,
    isbn='978-0345391803'
)

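print("Accessing book data by field name")
print(f"Title (by field name): {my_book.title}")
print(f"Author (by field name): {my_book.author}")
print(f"Year Published (by field name): {my_book.year_published}")
print(f"ISBN (by field name): {my_book.isbn}")

print("\nAccessing book data by index")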
print(f"Title (by index): {my_book[0]}")
print(f"Author (by index): {my_book[1]}")
print(f"Year Published (by index): {my_book[2]}")
print(f"ISBN (by index): {my_book[3]}")

 

Output:

Accessing book data by field name
Title (by field name): The Hitchhiker's Guide to the Galaxy
Author (by field name): Douglas Adams
Year Published (by field name): 1979
ISBN (by field name): 978-0345391803

Accessing book data by index
Title (by index): The Hitchhiker's Guide to the Galaxy
Author (by index): Douglas Adams
Year Published (by index): 1979
ISBN (by index): 978-0345391803

 

You can think of a namedtuple as similar to an immutable C struct, or as a data class without methods. They definitely have their uses.
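
Because a namedtuple is a tuple subclass, its fields cannot be reassigned after creation. The following minimal sketch, using a hypothetical Point type, shows what happens when you try, and how _replace() and _asdict() give you a modified copy or a dictionary view of the data instead.

```python
from collections import namedtuple

# Hypothetical type, used only for illustration
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1, y=2)

# Assigning to a field raises AttributeError
try:
    p.x = 10
    mutated = True
except AttributeError:
    mutated = False

# _replace() returns a new instance with the given fields changed
moved = p._replace(x=10)

# _asdict() returns the fields as a dictionary
as_dict = p._asdict()

print(f"Was the original mutated? {mutated}")
print(f"Original: {p}")
print(f"Modified copy: {moved}")
print(f"As a dict: {dict(as_dict)}")
```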

 

3. Handling Missing Dictionary Keys Gracefully with defaultdict

 
A common frustration when working with dictionaries is the KeyError that occurs when you try to access a key that doesn’t exist. The collections.defaultdict is the perfect solution. It’s a subclass of dict that calls a factory function to supply a default value for missing keys. This is especially useful for grouping items.

from collections import defaultdict

# Group a list of tuples by the first element
scores_by_round = [('contestantA', 8), ('contestantB', 7), ('contestantC', 5),
                   ('contestantA', 7), ('contestantB', 7), ('contestantC', 6),
                   ('contestantA', 9), ('contestantB', 5), ('contestantC', 4)]
grouped_scores = defaultdict(list)

for key, value in scores_by_round:
    grouped_scores[key].append(value)

print(f"Grouped scores: {grouped_scores}")

 

Output:

Grouped scores: defaultdict(<class 'list'>, {'contestantA': [8, 7, 9], 'contestantB': [7, 7, 5], 'contestantC': [5, 6, 4]})
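
The factory does not have to be list. As a rough illustration, the sketch below uses int (which returns 0 when called with no arguments) to tally letters in a sample string, and contrasts that with the KeyError a plain dict raises. It also shows one caveat: simply reading a missing key from a defaultdict inserts it.

```python
from collections import defaultdict

# int() returns 0, so every missing key starts at zero
letter_counts = defaultdict(int)
for ch in "mississippi":
    letter_counts[ch] += 1

# The same operation on a plain dict raises KeyError
plain = {}
try:
    plain["z"] += 1
    raised = False
except KeyError:
    raised = True

# Caveat: reading a missing key creates it with the default value
_ = letter_counts["z"]

print(f"Letter counts: {dict(letter_counts)}")
print(f"Plain dict raised KeyError: {raised}")
print(f"'z' was added by the lookup: {'z' in letter_counts}")
```

If you want to check for a key without creating it, use the in operator or get(), neither of which triggers the factory.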

 

4. Implementing Fast Queues and Stacks with deque

 
Python lists can be used as stacks and queues, but they are only optimized for operations at one end. Appending and popping at the end of a list is fast (amortized O(1)), but inserting or removing at the beginning is O(n) because every other element has to be shifted. The collections.deque (double-ended queue) is designed for fast, O(1) appends and pops at both ends.

First, here’s an example of a queue using deque.

from collections import deque

# Create a queue
d = deque([1, 2, 3])
print(f"Original queue: {d}")

# Add to the right
d.append(4)
print("Adding item to queue: 4")
print(f"New queue: {d}")

# Remove from the left
print(f"Popping queue item (from left): {d.popleft()}")  

# Output final queue
print(f"Final queue: {d}")


Output:

Original queue: deque([1, 2, 3])
Adding item to queue: 4
New queue: deque([1, 2, 3, 4])
Popping queue item (from left): 1
Final queue: deque([2, 3, 4])

 

And now let’s use deque to create a stack:

from collections import deque

# Create a stack
d = deque([1, 2, 3])
print(f"Original stack: {d}")

# Add to the right
d.append(5)
print("Adding item to stack: 5")
print(f"New stack: {d}")

# Remove from the right
print(f"Popping stack item (from right): {d.pop()}")

# Output final stack
print(f"Final stack: {d}")

 

Output:

Original stack: deque([1, 2, 3])
Adding item to stack: 5
New stack: deque([1, 2, 3, 5])
Popping stack item (from right): 5
Final stack: deque([1, 2, 3])
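
The examples above only add items on the right. As a small sketch with hypothetical task names, here is how the left-hand counterparts work alongside them, so that both ends of the deque are used in the same structure.

```python
from collections import deque

# Hypothetical task names, used only for illustration
tasks = deque(["b", "c"])

# Add to the left and to the right
tasks.appendleft("a")
tasks.extend(["d", "e"])

# Remove one item from each end
first = tasks.popleft()
last = tasks.pop()

print(f"Removed from left: {first}")
print(f"Removed from right: {last}")
print(f"Remaining: {tasks}")
```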

 

5. Remembering Insertion Order with OrderedDict

 
Before Python 3.7, standard dictionaries were not guaranteed to preserve the order in which items were inserted. To solve this, the collections.OrderedDict was used. While standard dicts now maintain insertion order, OrderedDict still has unique features, like the move_to_end() method, which is useful for tasks like creating a simple cache.

from collections import OrderedDict

# An OrderedDict remembers the order of insertion
od = OrderedDict()
od['a'] = 1
od['b'] = 2
od['c'] = 3

print(f"Start order: {list(od.keys())}")

# Move 'a' to the end
od.move_to_end('a')
print(f"Final order: {list(od.keys())}")

 

Output:

Start order: ['a', 'b', 'c']
Final order: ['b', 'c', 'a']
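
To make the cache idea concrete, here is a minimal sketch of a least-recently-used cache built on OrderedDict. The LRUCache class and its method names are hypothetical, written only for this illustration. It relies on move_to_end() to mark an entry as recently used and on popitem(last=False) to evict the oldest one.

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical minimal least-recently-used cache (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        # Mark the key as most recently used
        self._data.move_to_end(key)
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        # Evict the least recently used entry when over capacity
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")      # 'a' becomes the most recently used key
cache.put("c", 3)   # over capacity, so 'b' is evicted

remaining = list(cache._data)
print(f"Keys left in cache: {remaining}")
```

For caching function results in real code, functools.lru_cache from the standard library is usually the simpler choice; the sketch is only meant to show what move_to_end() makes possible.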

 

6. Combining Multiple Dictionaries with ChainMap

 
The collections.ChainMap class provides a way to link multiple dictionaries together so they can be treated as a single unit. It’s often much faster than creating a new dictionary and running multiple update() calls. Lookups search the underlying mappings one by one until a key is found.

Let’s create a ChainMap named chain and query it for keys.

from collections import ChainMap

# Create dictionaries
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 3, 'c': 4}

# Create a ChainMap
chain = ChainMap(dict1, dict2)

# Print dictionaries
print(f"dict1: {dict1}")
print(f"dict2: {dict2}")

# Query ChainMap for keys and return values
print("\nQuerying keys for values")
print(f"a: {chain['a']}")
print(f"c: {chain['c']}")
print(f"b: {chain['b']}")

 

Output:

dict1: {'a': 1, 'b': 2}
dict2: {'b': 3, 'c': 4}

Querying keys for values
a: 1
c: 4
b: 2

 

Note that, in the above scenario, ‘b’ is found first in dict1, the first dictionary in chain, so the value from dict1 (2) is returned rather than the value from dict2 (3).
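
A ChainMap is a view over its underlying dictionaries, not a copy. The sketch below, which uses hypothetical settings dictionaries, illustrates three consequences: writes go only to the first mapping, later changes to an underlying dictionary are visible through the chain, and new_child() adds a temporary layer in front.

```python
from collections import ChainMap

# Hypothetical settings, used only for illustration
defaults = {"color": "blue", "debug": False}
overrides = {"debug": True}
settings = ChainMap(overrides, defaults)

# Writes affect only the first mapping (overrides)
settings["color"] = "red"

# Changes to an underlying dict show up in the chain
defaults["timeout"] = 30
timeout = settings["timeout"]

# new_child() puts a new mapping in front without touching the others
scoped = settings.new_child({"debug": False})

print(f"defaults: {defaults}")
print(f"overrides: {overrides}")
print(f"timeout seen through the chain: {timeout}")
print(f"debug in scoped: {scoped['debug']}, debug in settings: {settings['debug']}")
```

This layering is why ChainMap is a natural fit for configuration that combines command-line options, environment values, and defaults.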

 

7. Keeping a Limited History with deque’s maxlen

 
A deque can be created with a fixed maximum length using the maxlen argument. If more items are added than the maximum length, the items from the opposite end are automatically discarded. This is perfect for keeping a history of the last N items.

from collections import deque

# Keep a history of the last 3 items
history = deque(maxlen=3)
history.append("cd ~")
history.append("ls -l")
history.append("pwd")
print(f"Start history: {history}")

# Add a new item, push out the left-most item
history.append("mkdir data")
print(f"Final history: {history}")

 

Output:

Start history: deque(['cd ~', 'ls -l', 'pwd'], maxlen=3)
Final history: deque(['ls -l', 'pwd', 'mkdir data'], maxlen=3)
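
The same mechanism works when a deque is built from an existing iterable, and it makes a sliding window easy to write. The sketch below uses a hypothetical list of readings to keep the last three values and to compute a moving average over a window of three.

```python
from collections import deque

# Hypothetical readings, used only for illustration
readings = [10, 20, 30, 40, 50]

# Building a bounded deque from an iterable keeps only the last 3 items
last_three = deque(readings, maxlen=3)

# Moving average over a sliding window of up to 3 values
window = deque(maxlen=3)
averages = []
for value in readings:
    window.append(value)
    averages.append(sum(window) / len(window))

print(f"Last three readings: {list(last_three)}")
print(f"Moving averages: {averages}")
```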

 

8. Creating Nested Dictionaries Easily with defaultdict

 
Building on defaultdict, you can create nested or tree-like dictionaries with ease. By providing a factory function that returns another defaultdict, you can create dictionaries of dictionaries on the fly, to any depth.

from collections import defaultdict
import json

# A function that returns a defaultdict
def tree():
    return defaultdict(tree)

# Create a nested dictionary
nested_dict = tree()
nested_dict['users']['user1']['name'] = 'Felix'
nested_dict['users']['user1']['email'] = 'user1@example.com'
nested_dict['users']['user1']['phone'] = '515-KL5-5555'

# Output formatted JSON to console
print(json.dumps(nested_dict, indent=2))

 

Output:

{
  "users": {
    "user1": {
      "name": "Felix",
      "email": "user1@example.com",
      "phone": "515-KL5-5555"
    }
  }
}
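
One caveat with this pattern is that merely reading a missing path creates empty branches along the way. The sketch below shows that side effect and converts the tree back into plain dictionaries once it is built. The to_plain_dict helper is hypothetical, written only for this illustration.

```python
from collections import defaultdict

def tree():
    return defaultdict(tree)

def to_plain_dict(node):
    """Hypothetical helper: recursively convert nested defaultdicts to dicts."""
    if isinstance(node, defaultdict):
        return {key: to_plain_dict(value) for key, value in node.items()}
    return node

config = tree()
config["db"]["host"] = "localhost"

# Reading a path that does not exist silently creates it
_ = config["cache"]["ttl"]

plain = to_plain_dict(config)
print(plain)
```

Converting to plain dictionaries at the end means later lookups of missing keys raise KeyError again instead of growing the structure.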

 

9. Performing Arithmetic Operations on Counters

 
News flash: you can combine Counter objects with addition, subtraction, intersection, and union. Each of these operators drops any result whose count is zero or negative. This is a powerful tool for comparing and combining frequency counts from different sources.

from collections import Counter

c1 = Counter(a=4, b=2, c=0, d=-2)
c2 = Counter(a=1, b=2, c=3, d=4)

# Add counters -> adds counts for common keys
print(f"c1 + c2 = {c1 + c2}")

# Subtract counters -> keeps only positive counts
print(f"c1 - c2 = {c1 - c2}")

# Intersection -> takes minimum of counts
print(f"c1 & c2 = {c1 & c2}")

# Union -> takes maximum of counts
print(f"c1 | c2 = {c1 | c2}")

 

Output:

c1 + c2 = Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2})
c1 - c2 = Counter({'a': 3})
c1 & c2 = Counter({'b': 2, 'a': 1})
c1 | c2 = Counter({'a': 4, 'd': 4, 'c': 3, 'b': 2})
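
If you need to keep zero and negative counts, the subtract() method updates a Counter in place without discarding them. The sketch below uses hypothetical stock and order counts to compare it with the - operator, and shows that unary + strips non-positive counts afterwards.

```python
from collections import Counter

# Hypothetical counts, used only for illustration
stock = Counter(apples=3, pears=1)
orders = Counter(apples=1, pears=4)

# The - operator drops non-positive results
difference = stock - orders

# subtract() works in place and keeps negative counts
balance = Counter(stock)
balance.subtract(orders)

# Unary + removes zero and negative counts
positive_only = +balance

print(f"stock - orders: {difference}")
print(f"After subtract(): {balance}")
print(f"After unary +: {positive_only}")
```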

 

10. Efficiently Rotating Elements with deque

 
The deque object has a rotate() method that allows you to rotate the elements efficiently. A positive argument rotates elements to the right; a negative, to the left. This is much faster than slicing and re-joining lists or tuples.

from collections import deque

d = deque([1, 2, 3, 4, 5])
print(f"Original deque: {d}")

# Rotate 2 steps to the right
d.rotate(2)
print(f"After rotating 2 to the right: {d}")

# Rotate 3 steps to the left
d.rotate(-3)
print(f"After rotating 3 to the left: {d}")

 

Output:

Original deque: deque([1, 2, 3, 4, 5])
After rotating 2 to the right: deque([4, 5, 1, 2, 3])
After rotating 3 to the left: deque([2, 3, 4, 5, 1])
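
One place rotation comes in handy is round-robin assignment. The sketch below uses hypothetical worker and job names: it hands each job to whoever is at the front of the deque, then rotates left by one so the next worker moves to the front.

```python
from collections import deque

# Hypothetical names, used only for illustration
workers = deque(["ann", "bob", "cy"])
jobs = ["job1", "job2", "job3", "job4"]

assignments = []
for job in jobs:
    # The worker at the front takes the job
    assignments.append((job, workers[0]))
    # Rotate left so the next worker is at the front
    workers.rotate(-1)

for job, worker in assignments:
    print(f"{job} -> {worker}")
```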

 

Wrapping Up

 
The collections module in Python is a killer collection of specialized, high-performance container datatypes. From counting items with Counter to building efficient queues with deque, these tools can make your code cleaner, more efficient, and more Pythonic. By familiarizing yourself with these surprising and powerful features, you can solve common programming problems in a more elegant and effective way.
 
 

Matthew Mayo (@mattmayo13) holds a master’s degree in computer science and a graduate diploma in data mining. As managing editor of KDnuggets & Statology, and contributing editor at Machine Learning Mastery, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, language models, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.
