
# Introduction
Python is one of the most popular languages used in the data science sphere, valued for its simplicity, versatility, and powerful ecosystem of libraries, including NumPy, pandas, scikit-learn, and TensorFlow. While these tools provide much of the heavy lifting, Python itself includes a range of features that can help you write cleaner, faster, and more efficient code. Many of these capabilities go unnoticed, yet they can improve how you structure and manage your projects.
In this article, we explore five lesser-known but beneficial Python features that every data scientist should have in their toolkit.
# 1. The `else` Clause on Loops
Did you know that `for` and `while` loops in Python can have an `else` clause?
While this may sound counterintuitive at first, the `else` block executes only when the loop completes without hitting a `break` statement. This is useful when you search through a dataset and want to run some logic only if a specific condition was never met.
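
    # Hypothetical sample data: a list of row dictionaries
    dataset = [{'target': 'other_value'}, {'target': 'desired_value'}]

    for row in dataset:
        if row['target'] == 'desired_value':
            print("Found!")
            break
    else:
        # Runs only if the loop never hit break
        print("Not found.")
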
In this snippet, "Not found." is printed only if no row matched, because a match triggers `break` and skips the `else` block. This lets you avoid creating extra flags or conditions outside the loop.
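The same pattern also works inside a function, where the `else` branch can return a sentinel value. The sketch below is illustrative only: the function name `find_first_negative` and the sample lists are assumptions, not part of the original example.

```python
def find_first_negative(values):
    """Return the index of the first negative value, or None if there is none."""
    for index, value in enumerate(values):
        if value < 0:
            # Stop at the first match; the else block below is skipped.
            break
    else:
        # Reached only when the loop finished without break,
        # which includes the case where values is empty.
        return None
    return index


print(find_first_negative([3, 1, -2, 5]))
print(find_first_negative([3, 1, 2]))
```

Note that the `else` block also runs when the iterable is empty, since no `break` can occur in that case.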
# 2. The `dataclasses` Module
The `dataclasses` module, introduced in Python 3.7, provides a decorator and helper functions that automatically generate special methods like `__init__()`, `__repr__()`, and `__eq__()` for your classes. This is useful in data science when you need lightweight classes to store parameters, results, or configuration settings without writing repetitive boilerplate code.

    from dataclasses import dataclass

    @dataclass
    class ExperimentConfig:
        learning_rate: float
        batch_size: int
        epochs: int

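With `@dataclass`, you get a clean constructor, a readable string representation, and equality comparison by field values. Ordering comparisons such as `<` are generated only if you pass `order=True` to the decorator.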
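To make the generated methods concrete, here is a minimal sketch that builds on the `ExperimentConfig` idea. The default values, the `frozen=True` option, and the specific numbers are assumptions added for illustration; `replace()` and `asdict()` are helper functions from the standard `dataclasses` module.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)  # frozen=True makes instances read-only (an illustrative choice)
class ExperimentConfig:
    learning_rate: float
    batch_size: int = 32  # hypothetical default
    epochs: int = 10      # hypothetical default


base = ExperimentConfig(learning_rate=0.001)
same = ExperimentConfig(0.001, 32, 10)

print(base)          # generated __repr__ lists every field
print(base == same)  # generated __eq__ compares field values

# replace() returns a modified copy and leaves the original untouched
longer = replace(base, epochs=50)

# asdict() converts the instance to a plain dict, e.g. for logging or JSON
config_dict = asdict(longer)
print(config_dict)
```

A frozen configuration object cannot be modified by accident halfway through an experiment, and `replace()` still lets you derive variants from it.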
# 3. The Walrus Operator (`:=`)
The walrus operator (`:=`), introduced in Python 3.8, lets you assign values to variables as part of an expression. This is useful when you want to both calculate and test a value without repeating the calculation in multiple places.

    data = [1, 2, 3, 4, 5]

    # Compute the average once, bind it to avg, and test it in the same expression
    if (avg := sum(data) / len(data)) >= 3:
        print(f"Average is {avg}")

Here, `avg` is assigned and checked in a single expression. This removes the need for a separate assignment line and keeps the calculation next to the test that uses it.
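The walrus operator is also handy in comprehensions and `while` loops, where the value would otherwise have to be computed twice or assigned before the loop. The following sketch is illustrative: the helper `parse_reading`, the sample `readings`, and the in-memory `stream` are assumptions made up for this example.

```python
import io


def parse_reading(text):
    """Convert text to float, returning None when it is not a valid number."""
    try:
        return float(text)
    except ValueError:
        return None


# Hypothetical raw sensor readings with some invalid entries
readings = ["3.5", "n/a", "4.1", "", "2.8"]

# Parse each entry once, then keep only the values that parsed successfully
clean = [value for text in readings if (value := parse_reading(text)) is not None]
print(clean)

# Read a stream in fixed-size chunks until read() returns an empty string
stream = io.StringIO("abcdefghij")
chunks = []
while chunk := stream.read(4):
    chunks.append(chunk)
print(chunks)
```

Without `:=`, the comprehension would need to call `parse_reading` twice per item or be rewritten as an explicit loop.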
# 4. `enumerate()` for Indexed Loops
When you need both the index and the value while iterating, `enumerate()` is the most Pythonic way to do it. It takes any iterable (like a list, tuple, or string) and yields `(index, value)` pairs as you loop.

    # Hypothetical sample rows; any iterable works
    data = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]]

    for i, row in enumerate(data):
        print(f"Row {i}: {row}")

Compared with looping over `range(len(data))` and indexing manually, this is easier to read, less prone to off-by-one errors, and makes your intent clearer. It’s useful in data science when iterating over rows of data or results with positions that matter.
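`enumerate()` also accepts an optional `start` argument, which is convenient when positions should be reported as 1-based line numbers. A minimal sketch, assuming a hypothetical list of raw text records:

```python
# Hypothetical raw records, e.g. values read line by line from a text file
records = ["12.5", "13.1", "", "11.8", "n/a"]

bad_lines = []
for line_number, text in enumerate(records, start=1):
    try:
        float(text)
    except ValueError:
        # Remember the 1-based position of every entry that is not a number
        bad_lines.append(line_number)

print(bad_lines)
```

Here `bad_lines` collects the human-friendly positions of the invalid entries, with no manual counter variable to maintain.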
# 5. The `collections` Module
Python’s `collections` module provides specialized container datatypes that can be more efficient and expressive than using only lists or dictionaries. Among the most popular is `Counter`, which can count elements in an iterable with minimal code.
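
    from collections import Counter

    # Hypothetical sample input; any iterable of hashable items works
    words = "the quick brown fox jumps over the lazy dog the end".split()

    word_counts = Counter(words)
    most_common = word_counts.most_common(5)  # (word, count) pairs, highest count first
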
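Need an ordered dictionary? Use `OrderedDict`. Regular dictionaries have preserved insertion order since Python 3.7, but `OrderedDict` adds order-specific features such as `move_to_end()` and order-sensitive equality. Need a dictionary with default values? Try `defaultdict`. These tools eliminate the need for verbose manual logic and can even improve performance in large-scale data processing.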
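As a rough illustration of how `defaultdict` and `Counter` replace manual bookkeeping, the sketch below groups scores by label and counts labels. The `samples` list and its values are invented for this example.

```python
from collections import Counter, defaultdict

# Hypothetical (label, score) pairs, e.g. model predictions
samples = [("cat", 0.9), ("dog", 0.8), ("cat", 0.7), ("bird", 0.6), ("cat", 0.5)]

# defaultdict(list) creates an empty list the first time a key is accessed,
# so no "if key not in dict" check is needed
scores_by_label = defaultdict(list)
for label, score in samples:
    scores_by_label[label].append(score)

# Counter tallies how often each label occurs
label_counts = Counter(label for label, _ in samples)

print(dict(scores_by_label))
print(label_counts.most_common(1))
print(label_counts["fish"])  # missing keys count as 0 instead of raising KeyError
```

With a plain dictionary, both the grouping and the counting would need an explicit check for missing keys on every iteration.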
# Conclusion
Tools like the `else` clause on loops, `dataclasses`, and the walrus operator can eliminate unnecessary boilerplate and make logic more concise. Functions like `enumerate()` and modules like `collections` help you iterate, count, and organize data with elegance and efficiency. By incorporating these lesser-known gems into your workflow, you can reduce complexity, avoid common pitfalls, and focus more on solving the actual data problem rather than wrangling your code.
Jayita Gulati is a machine learning enthusiast and technical writer driven by her passion for building machine learning models. She holds a Master’s degree in Computer Science from the University of Liverpool.