Applying Functions to DataFrames: Understanding Pandas apply()

Pandas is a cornerstone library in Python’s data science stack, offering versatile tools for data manipulation and analysis. Its apply() method stands out because it applies a function along one axis of a DataFrame, either to each column or to each row. This tutorial walks through how to use apply() and where it fits in everyday data manipulation.

Introduction to apply()

The apply() method applies a function along one axis of a DataFrame. Its core signature is as follows (recent pandas releases add a few more keyword parameters):

DataFrame.apply(func, axis=0, raw=False, result_type=None, args=(), **kwds) 
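  • func : The function to apply to each column or row.
  • axis : Axis along which the function is applied: 0 (the default) passes each column to func, 1 passes each row.
  • raw : If False (the default), each column or row is passed to func as a Series; if True, it is passed as a NumPy ndarray.
  • result_type : One of ‘expand’, ‘reduce’, ‘broadcast’, or None (the default). Controls the shape of the result and only has an effect when axis=1.
  • args : Tuple of extra positional arguments passed to func after the column or row.
  • **kwds : Extra keyword arguments passed to func.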
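As a rough illustration of how these parameters interact, the sketch below uses a small made-up DataFrame and a hypothetical helper named scaled_range (neither comes from the pandas documentation). It forwards an extra argument through args, passes raw NumPy arrays with raw=True, and expands list results into columns with result_type='expand'.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

def scaled_range(values, factor):
    """Hypothetical helper: (max - min) of one column or row, times factor."""
    return (values.max() - values.min()) * factor

# args forwards extra positional arguments to func after the column itself
col_ranges = df.apply(scaled_range, args=(10,))

# raw=True hands func a NumPy ndarray instead of a Series
raw_ranges = df.apply(lambda arr: arr.max() - arr.min(), raw=True)

# result_type="expand" turns list-like row results into columns (axis=1 only)
expanded = df.apply(
    lambda row: [row["A"], row["A"] + row["B"]],
    axis=1,
    result_type="expand",
)

print(col_ranges)
print(raw_ranges)
print(expanded)
```

With raw=True the function cannot rely on Series features such as labels, so it suits purely numerical reductions.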

Using apply()

Applying Function to Each Column

By default, apply() operates on columns:

import pandas as pd 
    
df = pd.DataFrame({ 'A': [1, 2, 3], 'B': [4, 5, 6] }) 
result = df.apply(lambda x: x * 10) 
print(result) 
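To make the column-wise behavior concrete, here is a small sketch on the same sample data. It shows that the lambda receives one whole column (a Series) per call, and that for simple arithmetic like this, plain vectorized multiplication gives the same result.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Each call to the lambda receives one entire column as a Series
by_apply = df.apply(lambda col: col * 10)

# For simple arithmetic, the vectorized form produces the same DataFrame
vectorized = df * 10

print(by_apply)
print(by_apply.equals(vectorized))  # True
```

When a vectorized expression exists it is usually the simpler and faster choice; apply() is most useful when the per-column logic has no direct vectorized equivalent.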

Applying Function to Each Row

To apply a function to each row, set axis to 1:

result = df.apply(lambda x: x.sum(), axis=1) 
print(result) 
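The following sketch repeats the row-wise sum on the same sample data and compares it with the built-in DataFrame.sum(axis=1). With axis=1, each call receives one row as a Series indexed by column name, and a scalar return value yields a Series indexed by the original row labels.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Each call receives one row as a Series whose index is the column names
row_sums = df.apply(lambda row: row.sum(), axis=1)

# The built-in reduction gives the same values without a Python-level loop
builtin_sums = df.sum(axis=1)

print(row_sums.tolist())      # [5, 7, 9]
print(builtin_sums.tolist())  # [5, 7, 9]
```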

Using Predefined Functions

You can also pass an existing function, such as a NumPy function, to apply():

import numpy as np 
    
result = df.apply(np.sum, axis=0) 
print(result) 
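As a further sketch with made-up data, the example below passes three NumPy functions to apply(). An element-wise function such as np.sqrt keeps the DataFrame's shape, while reducing functions such as np.sum and np.mean collapse each column or row to a single value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 4, 9], "B": [16, 25, 36]})

# Element-wise ufunc: the result has the same shape as df
roots = df.apply(np.sqrt)

# Reducing functions: one value per column (axis=0) or per row (axis=1)
col_totals = df.apply(np.sum, axis=0)
row_means = df.apply(np.mean, axis=1)

print(roots)
print(col_totals.tolist())  # [14, 77]
print(row_means.tolist())   # [8.5, 14.5, 22.5]
```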

Handling Different Data Types

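Columns with mixed data types can be transformed as well. For element-wise work, the companion method applymap() calls the function on every individual value; pandas 2.1 renamed it to DataFrame.map() and deprecated the old name:

df = pd.DataFrame({ 'A': [1, 'two', 3], 'B': [4, 5, 'six'] }) 
# On pandas 2.1 or later, prefer df.map(...) over the deprecated df.applymap(...)
result = df.applymap(lambda x: f"Value: {x}") 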
print(result) 
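For comparison, here is a sketch of handling the same mixed-type data with apply() itself. It combines apply() with Series.map() to format every value, and forwards the errors keyword to pd.to_numeric to coerce non-numeric entries to NaN. The variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, "two", 3], "B": [4, 5, "six"]})

# Column-wise apply(): each mixed column arrives as an object-dtype Series,
# and Series.map() then formats the values one by one
labeled = df.apply(lambda col: col.map(lambda x: f"Value: {x}"))

# Keyword arguments given to apply() are forwarded to the function,
# so non-numeric entries become NaN here
numeric = df.apply(pd.to_numeric, errors="coerce")

print(labeled)
print(numeric)
```

This form avoids the applymap()/map() naming difference between pandas versions, at the cost of a slightly longer expression.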

Use Cases

1. Data Cleaning

Series.apply() is handy for cleaning a single column value by value (clean_function and the column names below are placeholders):

df['Cleaned_Column'] = df['Dirty_Column'].apply(clean_function) 
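A runnable version of this pattern might look like the sketch below. The body of clean_function and the sample values are assumptions made for illustration; the original only names the function.

```python
import pandas as pd

def clean_function(value):
    """Hypothetical cleaner: trim whitespace and lowercase strings.

    Non-string values (such as None) are returned unchanged.
    """
    if isinstance(value, str):
        return value.strip().lower()
    return value

df = pd.DataFrame({"Dirty_Column": ["  Alice ", "BOB", None, " Carol"]})

# Series.apply() calls clean_function once per value in the column
df["Cleaned_Column"] = df["Dirty_Column"].apply(clean_function)

print(df)
```

Guarding on the value's type keeps the function from raising on missing entries, which is a common failure mode in per-value cleaning code.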

2. Advanced Aggregations

Compute custom per-row aggregates that built-in aggregation methods such as sum() or mean() do not cover:

result = df.apply(custom_aggregate_function, axis=1) 
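One possible shape for such a function is sketched below. The body of custom_aggregate_function and the sample data are invented for illustration. Returning a Series from the function makes apply() build a DataFrame with one column per returned label.

```python
import pandas as pd

def custom_aggregate_function(row):
    """Hypothetical aggregate: spread and mean of one row's values."""
    return pd.Series({
        "spread": row.max() - row.min(),
        "mean": row.mean(),
    })

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [10, 0, 5]})

# One call per row; the returned Series labels become the result's columns
result = df.apply(custom_aggregate_function, axis=1)

print(result)
```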

3. Row-wise Operations

Conduct operations that need to consider the entire row:

df['New_Column'] = df.apply(lambda row: row['A'] * row['B'], axis=1) 
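The sketch below runs this row-wise product on sample data and compares it with the equivalent vectorized expression. It then shows a case with branching logic, where a row-wise function is often the more readable option. The label function and its threshold are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Row-wise product via apply(): one Python call per row
df["New_Column"] = df.apply(lambda row: row["A"] * row["B"], axis=1)

# Equivalent vectorized form, usually faster on large frames
df["Vectorized"] = df["A"] * df["B"]

def label(row):
    """Hypothetical rule: tag rows whose product exceeds 8 as 'high'."""
    return "high" if row["A"] * row["B"] > 8 else "low"

df["Label"] = df.apply(label, axis=1)

print(df)
```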

Conclusion

The Pandas apply() function is a flexible tool for applying functions to DataFrames, column by column or row by row. It covers data cleaning, transformation, and custom aggregations that built-in methods do not express directly, which supports more insightful data analysis and more robust machine learning models. Keep practicing with your own data, and apply() will soon feel like a natural part of your Pandas toolkit.