Performance comparison of Resampling time Series data in Pandas
Pandas: Performance Optimization Exercise-13 with Solution
Write a Pandas program to create a time series DataFrame and use the resample method to downsample the data. Measure the performance improvement over manual resampling.
Sample Solution :
Python Code :
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Create a time series DataFrame
num_rows = 1000000
date_range = pd.date_range(start='1/1/2020', periods=num_rows, freq='T')
df = pd.DataFrame({'value': np.random.randn(num_rows)}, index=date_range)
# Resampling frequency
resample_freq = 'H'
# Measure time for resample method
start_time = time.time()
resampled_df = df.resample(resample_freq).mean()
end_time = time.time()
resample_time = end_time - start_time
# Measure time for manual resampling
start_time = time.time()
manual_resampled_df = df.groupby(pd.Grouper(freq=resample_freq)).mean()
end_time = time.time()
manual_resample_time = end_time - start_time
# Print the time taken for each method
print(f"Time taken using resample method: {resample_time:.6f} seconds")
print(f"Time taken using manual resampling: {manual_resample_time:.6f} seconds")
Output:
Time taken using resample method: 0.029992 seconds Time taken using manual resampling: 0.021973 seconds
Explanation:
- Import Libraries:
- Import pandas, numpy, and time.
- Create Time Series DataFrame:
- Generate a time series DataFrame with 1,000,000 rows, each representing a minute.
- Set Resampling Frequency:
- Define the frequency for downsampling (e.g., 'H' for hourly).
- Time Measurement for resample Method:
- Measure the time taken to downsample the data using the resample method.
- Time Measurement for Manual Resampling:
- Measure the time taken to downsample the data manually using groupby and "pd.Grouper".
- Print Results:
- Print the time taken for each method.
Python-Pandas Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Performance comparison of DataFrame filtering in Pandas.
Next: Performance comparison of cumulative Sum calculation in Pandas.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics