Efficiently apply multiple aggregation functions in Pandas
Pandas: Performance Optimization Exercise-19 with Solution
Write a Python program that uses the agg method to apply multiple aggregation functions to a DataFrame and compares the performance with applying each function individually.
Sample Solution:
Python Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a sample DataFrame
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000)
})

# Define aggregation functions
aggregations = {
    'A': ['sum', 'mean', 'std'],
    'B': ['sum', 'mean', 'std'],
    'C': ['sum', 'mean', 'std']
}

# Timing the agg method
start_time_agg = time.time()
df_agg = df.agg(aggregations)
time_agg = time.time() - start_time_agg

# Timing the individual application of functions
start_time_individual = time.time()
results_individual = {
    'A_sum': df['A'].sum(),
    'A_mean': df['A'].mean(),
    'A_std': df['A'].std(),
    'B_sum': df['B'].sum(),
    'B_mean': df['B'].mean(),
    'B_std': df['B'].std(),
    'C_sum': df['C'].sum(),
    'C_mean': df['C'].mean(),
    'C_std': df['C'].std()
}
time_individual = time.time() - start_time_individual

# Print results
print(f"Time using agg method: {time_agg:.6f} seconds")
print(f"Time applying functions individually: {time_individual:.6f} seconds")
print("Aggregated results using agg method:")
print(df_agg)
print("Results applying functions individually:")
print(results_individual)
Output:
Time using agg method: 0.001994 seconds
Time applying functions individually: 0.000000 seconds
Aggregated results using agg method:
                 A           B             C
sum   49723.000000  509.199400  48276.000000
mean     49.723000    0.509199     48.276000
std      28.857183    0.296208     28.470799
Results applying functions individually:
{'A_sum': 49723, 'A_mean': 49.723, 'A_std': 28.857182953434812, 'B_sum': 509.19940043113445, 'B_mean': 0.5091994004311344, 'B_std': 0.2962083809189193, 'C_sum': 48276, 'C_mean': 48.276, 'C_std': 28.47079925837016}
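The two sets of numbers in the output look identical, and that can be checked programmatically rather than by eye. The following is a minimal sketch, not part of the original solution: it rebuilds the same DataFrame, arranges the individually computed values into a table with the same shape as the agg result, and compares them with a floating-point tolerance. The names funcs, aligned, df_individual, and all_close are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

# Recreate the sample DataFrame from the solution (same seed, same columns)
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000),
})

funcs = ['sum', 'mean', 'std']  # illustrative helper list

# One agg call for all columns and functions
df_agg = df.agg({col: funcs for col in df.columns})

# Reorder rows and columns explicitly so the comparison does not
# depend on the order in which agg happens to return them
aligned = df_agg.loc[funcs, list(df.columns)]

# The same table, built from one Series method call per value
df_individual = pd.DataFrame(
    {col: [getattr(df[col], f)() for f in funcs] for col in df.columns},
    index=funcs,
)

# Compare with a tolerance, since these are floating-point results
all_close = np.allclose(
    aligned.to_numpy(dtype=float),
    df_individual.to_numpy(dtype=float),
)
print("Results match:", all_close)
```

Using np.allclose instead of == avoids false mismatches from tiny rounding differences between the two code paths.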
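Explanation:
- Import necessary libraries: pandas, numpy, and time.
- Create a sample DataFrame: random data is generated for columns 'A', 'B', and 'C' (1,000 rows each, with a fixed seed so the values are reproducible).
- Define aggregation functions: a dictionary maps each column to the list of functions ('sum', 'mean', 'std') to apply to it.
- Timing the agg method: measure the time taken to apply all the aggregations in a single agg call.
- Timing the individual application of functions: measure the time taken to call each function separately on each column.
- Finally, compare the times and show the results from both methods. Both approaches produce the same values. In the sample output the individual calls register as 0.000000 seconds, which means the elapsed time was below the resolution of time.time() on the machine used, not that the work took no time. A single measurement like this is too coarse to rank the two approaches.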
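For a more stable comparison than a single time.time() measurement, the standard-library timeit module can run each approach many times and report the best average. This is a sketch under stated assumptions rather than the original solution: the helper functions with_agg and individually, and the repetition counts n_calls and n_repeats, are hypothetical choices made for illustration, and the resulting numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Recreate the sample DataFrame from the solution
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000),
})

funcs = ('sum', 'mean', 'std')
aggregations = {col: list(funcs) for col in df.columns}


def with_agg():
    """Apply every aggregation in one agg call."""
    return df.agg(aggregations)


def individually():
    """Call each aggregation separately on each column."""
    return {
        f"{col}_{func}": getattr(df[col], func)()
        for col in df.columns
        for func in funcs
    }


n_calls = 100   # calls per timing run (illustrative value)
n_repeats = 5   # independent timing runs (illustrative value)

# timeit.repeat returns the total time of n_calls calls for each run
agg_runs = timeit.repeat(with_agg, number=n_calls, repeat=n_repeats)
individual_runs = timeit.repeat(individually, number=n_calls, repeat=n_repeats)

# The minimum run is the one least disturbed by other system activity
best_agg = min(agg_runs) / n_calls
best_individual = min(individual_runs) / n_calls

print(f"agg method:       {best_agg * 1e6:.1f} microseconds per call")
print(f"individual calls: {best_individual * 1e6:.1f} microseconds per call")
```

timeit uses a high-resolution clock by default, so neither figure should collapse to zero the way the time.time() measurement did in the sample output.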