w3resource

Efficiently Apply Multiple Aggregation Functions in Pandas


Pandas: Performance Optimization Exercise-19 with Solution


Write a Python program that uses the agg method to apply multiple aggregation functions to a DataFrame and compares the performance with applying each function individually.

Sample Solution:

Python Code:

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a sample DataFrame
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000)
})

# Define aggregation functions
aggregations = {
    'A': ['sum', 'mean', 'std'],
    'B': ['sum', 'mean', 'std'],
    'C': ['sum', 'mean', 'std']
}

# Time the agg method (perf_counter is a high-resolution clock suited to short intervals)
start_time_agg = time.perf_counter()
df_agg = df.agg(aggregations)
time_agg = time.perf_counter() - start_time_agg

# Time the same statistics computed one Series method call at a time
start_time_individual = time.perf_counter()
results_individual = {
    'A_sum': df['A'].sum(),
    'A_mean': df['A'].mean(),
    'A_std': df['A'].std(),
    'B_sum': df['B'].sum(),
    'B_mean': df['B'].mean(),
    'B_std': df['B'].std(),
    'C_sum': df['C'].sum(),
    'C_mean': df['C'].mean(),
    'C_std': df['C'].std()
}
time_individual = time.perf_counter() - start_time_individual

# Print results
print(f"Time using agg method: {time_agg:.6f} seconds")
print(f"Time applying functions individually: {time_individual:.6f} seconds")
print("Aggregated results using agg method:")
print(df_agg)
print("Results applying functions individually:")
print(results_individual)

Output:

Time using agg method: 0.001994 seconds
Time applying functions individually: 0.000000 seconds
Aggregated results using agg method:
                 A           B             C
sum   49723.000000  509.199400  48276.000000
mean     49.723000    0.509199     48.276000
std      28.857183    0.296208     28.470799
Results applying functions individually:
{'A_sum': 49723, 'A_mean': 49.723, 'A_std': 28.857182953434812, 'B_sum': 509.19940043113445, 'B_mean': 0.5091994004311344, 'B_std': 0.2962083809189193, 'C_sum': 48276, 'C_mean': 48.276, 'C_std': 28.47079925837016}
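
The two outputs above are printed in different shapes (a table of floats versus a dictionary that mixes integers and floats), so it is not obvious at a glance that they agree. The sketch below is one way to check this programmatically. It rebuilds the same sample data; the names funcs, df_individual, and results_match are illustrative and not part of the original solution.

```python
import numpy as np
import pandas as pd

# Same sample data as the solution above
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000)
})

funcs = ['sum', 'mean', 'std']

# One agg call; fix the row and column order explicitly before comparing
df_agg = df.agg({col: funcs for col in df.columns}).loc[funcs, list(df.columns)]

# The same table built from individual Series method calls
df_individual = pd.DataFrame(
    {col: [getattr(df[col], f)() for f in funcs] for col in df.columns},
    index=funcs
)

# agg returns floats, so compare with a tolerance rather than with ==
results_match = np.allclose(
    df_agg.to_numpy(dtype=float),
    df_individual.to_numpy(dtype=float)
)
print("Results match:", results_match)
```

Comparing with a tolerance avoids false mismatches caused by the integer sums being converted to floating point in the agg result.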

Explanation:

  • Import necessary libraries:
    • Import pandas, numpy, and time.
  • Create a sample DataFrame:
    • Random data is generated for columns 'A', 'B', and 'C'.
  • Define aggregation functions:
    • A dictionary maps each column name to the list of functions to apply to that column.
  • Timing the agg method:
    • Measure the time taken to apply multiple aggregations using the agg method.
  • Timing the individual application of functions:
    • Measure the time taken to apply each function individually.
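  • Finally, compare the times and print the results from both methods. A single run this short is noisy: a reading of 0.000000 seconds means the elapsed time fell below the timer's resolution, not that the work was free.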
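
Because each approach finishes in well under a millisecond on 1,000 rows, one stopwatch reading per approach says little. A more stable comparison repeats each approach many times and reports the best per-call time. The sketch below uses the standard-library timeit module for this; the helper names with_agg and individually and the values of REPEATS and NUMBER are assumptions chosen for illustration, not part of the original exercise.

```python
import timeit

import numpy as np
import pandas as pd

# Same sample data as the solution above
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, 1000),
    'B': np.random.rand(1000),
    'C': np.random.randint(1, 100, 1000)
})

funcs = ['sum', 'mean', 'std']
aggregations = {col: funcs for col in df.columns}


def with_agg():
    """Compute all statistics in a single agg call."""
    return df.agg(aggregations)


def individually():
    """Compute each statistic with a separate Series method call."""
    return {
        f"{col}_{f}": getattr(df[col], f)()
        for col in df.columns
        for f in funcs
    }


# Arbitrary illustrative values: 5 rounds of 100 calls each
REPEATS, NUMBER = 5, 100

# The minimum over the rounds is the reading least affected by background noise
best_agg = min(timeit.repeat(with_agg, repeat=REPEATS, number=NUMBER)) / NUMBER
best_individual = min(timeit.repeat(individually, repeat=REPEATS, number=NUMBER)) / NUMBER

print(f"agg:          {best_agg * 1e6:.1f} microseconds per call")
print(f"individually: {best_individual * 1e6:.1f} microseconds per call")
```

The absolute numbers depend on the machine and the pandas version, so they are best read as a relative comparison. Increasing the row count in the sample DataFrame shows how the gap between the two approaches changes with data size.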
