w3resource

Rolling Window Calculation in Pandas: rolling vs. Manual


Pandas: Performance Optimization Exercise-18 with Solution


Write a Pandas program to perform a rolling window calculation on a time series DataFrame using the rolling method. Compare the performance with manual calculation.

Sample Solution :

Python Code :

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a time series DataFrame
num_rows = 1000000
date_range = pd.date_range(start='1/1/2020', periods=num_rows, freq='T')
df = pd.DataFrame({'value': np.random.randn(num_rows)}, index=date_range)

# Define the window size
window_size = 60

# Measure time for rolling method
start_time = time.time()
rolling_mean = df['value'].rolling(window=window_size).mean()
end_time = time.time()
rolling_time = end_time - start_time

# Measure time for manual rolling calculation
start_time = time.time()
manual_rolling_mean = df['value'].copy()
for i in range(window_size, num_rows):
    manual_rolling_mean.iloc[i] = df['value'].iloc[i-window_size:i].mean()
end_time = time.time()
manual_rolling_time = end_time - start_time

# Print the time taken for each method
print(f"Time taken using rolling method: {rolling_time:.6f} seconds")
print(f"Time taken using manual calculation: {manual_rolling_time:.6f} seconds")

Output:

Time taken using rolling method: 0.034991 seconds
Time taken using manual calculation: 95.308910 seconds

Explanation:

  • Import Libraries:
    • Import pandas, numpy, and time.
  • Create Time Series DataFrame:
    • Generate a time series DataFrame with 1,000,000 rows, each representing a minute.
  • Define Window Size:
    • Set the window size for the rolling calculation (e.g., 60).
  • Time Measurement for rolling Method:
    • Measure the time taken to calculate the rolling mean using the rolling method.
  • Time Measurement for Manual Calculation:
    • Measure the time taken to manually calculate the rolling mean using a for loop.
  • Print Results:
    • Print the time taken for each method.

Python-Pandas Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Performance comparison of DataFrame sorting in Pandas.
Next: Efficiently apply multiple Aggregation functions in Pandas.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.