Rolling Window Calculation in Pandas: rolling vs. Manual
Pandas: Performance Optimization Exercise-18 with Solution
Write a Pandas program to perform a rolling window calculation on a time series DataFrame using the rolling method. Compare the performance with manual calculation.
Sample Solution:
Python Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Create a time series DataFrame
num_rows = 1000000
# 'min' means one-minute frequency (the older 'T' alias is deprecated in recent pandas)
date_range = pd.date_range(start='1/1/2020', periods=num_rows, freq='min')
df = pd.DataFrame({'value': np.random.randn(num_rows)}, index=date_range)
# Define the window size
window_size = 60
# Measure time for rolling method
start_time = time.time()
rolling_mean = df['value'].rolling(window=window_size).mean()
end_time = time.time()
rolling_time = end_time - start_time
# Measure time for manual rolling calculation
start_time = time.time()
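# Start from NaN so the first window_size - 1 positions match rolling()
manual_rolling_mean = pd.Series(np.nan, index=df.index)
for i in range(window_size - 1, num_rows):
    # Window of window_size values ending at, and including, row i
    manual_rolling_mean.iloc[i] = df['value'].iloc[i - window_size + 1:i + 1].mean()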
end_time = time.time()
manual_rolling_time = end_time - start_time
# Print the time taken for each method
print(f"Time taken using rolling method: {rolling_time:.6f} seconds")
print(f"Time taken using manual calculation: {manual_rolling_time:.6f} seconds")
Output:
Time taken using rolling method: 0.034991 seconds
Time taken using manual calculation: 95.308910 seconds
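A timing comparison is only meaningful if both approaches compute the same values. The following is a minimal sketch of such a check on a small series; the series length, the seed, and the variable names are illustrative assumptions and are not part of the original solution. It assumes the window ends at, and includes, the current row, which is how rolling defines it by default.

```python
import numpy as np
import pandas as pd

# Small illustrative series (length and seed are arbitrary assumptions)
rng = np.random.default_rng(0)
values = pd.Series(rng.standard_normal(500))
window_size = 60

# Built-in rolling mean: each window ends at, and includes, the current row
rolling_mean = values.rolling(window=window_size).mean()

# Manual equivalent: NaN until a full window is available
manual_mean = pd.Series(np.nan, index=values.index)
for i in range(window_size - 1, len(values)):
    manual_mean.iloc[i] = values.iloc[i - window_size + 1:i + 1].mean()

# equal_nan=True treats the leading NaN positions as matching
results_match = np.allclose(rolling_mean, manual_mean, equal_nan=True)
print(f"Results match: {results_match}")
```

np.allclose is used instead of exact equality because the two methods add the numbers in a different order, so tiny floating-point differences are expected.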
Explanation:
- Import Libraries:
  - Import pandas, numpy, and time.
- Create Time Series DataFrame:
  - Generate a time series DataFrame with 1,000,000 rows of random values, one row per minute.
- Define Window Size:
  - Set the window size for the rolling calculation (here 60 rows, i.e. one hour of data).
- Time Measurement for rolling Method:
  - Measure the time taken to calculate the rolling mean using the rolling method, which does the work in compiled code.
- Time Measurement for Manual Calculation:
  - Measure the time taken to calculate the rolling mean with an explicit Python for loop that slices each window and averages it, one row at a time.
- Print Results:
  - Print the time taken for each method. In the sample run above, rolling took about 0.035 seconds and the loop about 95 seconds.
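Most of the cost of the manual approach comes from the Python-level loop, not from the arithmetic. As a hedged illustration, a hand-written rolling mean can avoid the loop by using cumulative sums in NumPy. This is a sketch under stated assumptions and not part of the original solution: the smaller row count, the seed, and the names csum, window_sums, and cumsum_mean are all hypothetical, and time.perf_counter is swapped in for time.time because it is better suited to measuring short intervals.

```python
import time

import numpy as np
import pandas as pd

# Illustrative data (row count and seed are arbitrary assumptions)
rng = np.random.default_rng(0)
num_rows = 100_000
window_size = 60
values = pd.Series(rng.standard_normal(num_rows))

# Built-in rolling mean
start = time.perf_counter()
rolling_mean = values.rolling(window=window_size).mean()
rolling_time = time.perf_counter() - start

# Loop-free manual rolling mean based on prefix sums
start = time.perf_counter()
# Leading 0 so that csum[k] is the sum of the first k values
csum = np.concatenate(([0.0], np.cumsum(values.to_numpy())))
# Sum of each full window is the difference of two prefix sums
window_sums = csum[window_size:] - csum[:-window_size]
# NaN until a full window is available, as with rolling()
cumsum_mean = np.full(num_rows, np.nan)
cumsum_mean[window_size - 1:] = window_sums / window_size
cumsum_time = time.perf_counter() - start

print(f"Time taken using rolling method: {rolling_time:.6f} seconds")
print(f"Time taken using cumulative sums: {cumsum_time:.6f} seconds")
print("Results match:", np.allclose(rolling_mean.to_numpy(), cumsum_mean, equal_nan=True))
```

One trade-off of this design: subtracting large prefix sums can accumulate floating-point error on very long series or on values of widely varying magnitude, so the built-in rolling method remains the safer default.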