w3resource

Performance comparison of resampling time series data in Pandas


Pandas: Performance Optimization Exercise-13 with Solution


Write a Pandas program to create a time series DataFrame and use the resample method to downsample the data. Compare its running time with manual resampling done through groupby and pd.Grouper.

Sample Solution:

Python Code:

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a time series DataFrame with one row per minute
# ('min' is the minute alias; the older 'T' is deprecated since pandas 2.2)
num_rows = 1000000
date_range = pd.date_range(start='1/1/2020', periods=num_rows, freq='min')
df = pd.DataFrame({'value': np.random.randn(num_rows)}, index=date_range)

# Resampling frequency: hourly
# ('h' replaces the older 'H', which is deprecated since pandas 2.2)
resample_freq = 'h'

# Measure time for resample method
# (perf_counter is a monotonic, high-resolution clock intended for timing)
start_time = time.perf_counter()
resampled_df = df.resample(resample_freq).mean()
end_time = time.perf_counter()
resample_time = end_time - start_time

# Measure time for manual resampling with groupby and pd.Grouper
start_time = time.perf_counter()
manual_resampled_df = df.groupby(pd.Grouper(freq=resample_freq)).mean()
end_time = time.perf_counter()
manual_resample_time = end_time - start_time

# Print the time taken for each method
print(f"Time taken using resample method: {resample_time:.6f} seconds")
print(f"Time taken using manual resampling: {manual_resample_time:.6f} seconds")

Output:

Time taken using resample method: 0.029992 seconds
Time taken using manual resampling: 0.021973 seconds
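The two timings above differ by only a few milliseconds, and a single run can easily be swayed by background activity. The following is a minimal sketch, not part of the original solution, that repeats each measurement with the standard-library timeit module and checks that both approaches return the same hourly means. The smaller row count and the helper names with_resample and with_grouper are assumptions made for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than the exercise so repeated runs stay quick (assumed size).
num_rows = 100_000
index = pd.date_range(start="2020-01-01", periods=num_rows, freq="min")
df = pd.DataFrame({"value": np.random.randn(num_rows)}, index=index)


def with_resample():
    # Hypothetical helper: downsample with the resample method.
    return df.resample("h").mean()


def with_grouper():
    # Hypothetical helper: downsample with groupby and pd.Grouper.
    return df.groupby(pd.Grouper(freq="h")).mean()


# Take the best of several repeats; it is less sensitive to noise than one run.
calls = 5
best_resample = min(timeit.repeat(with_resample, number=calls, repeat=3)) / calls
best_grouper = min(timeit.repeat(with_grouper, number=calls, repeat=3)) / calls

# Both calls should produce the same hourly index and the same means.
a, b = with_resample(), with_grouper()
same_result = a.index.equals(b.index) and np.allclose(a["value"], b["value"])

print(f"resample: {best_resample:.6f} seconds per call")
print(f"Grouper:  {best_grouper:.6f} seconds per call")
print(f"Same result: {same_result}")
```

Absolute numbers depend on the machine and the pandas version, so the printed timings are best read as a relative comparison.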

Explanation:

  • Import Libraries:
    • Import pandas, numpy, and time.
  • Create Time Series DataFrame:
    • Generate a time series DataFrame with 1,000,000 rows, each representing a minute.
  • Set Resampling Frequency:
    • Define the frequency for downsampling (e.g., 'h' for hourly; spelled 'H' in pandas versions before 2.2).
  • Time Measurement for resample Method:
    • Measure the time taken to downsample the data using the resample method.
  • Time Measurement for Manual Resampling:
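    • Measure the time taken to downsample the data manually using groupby and "pd.Grouper". A Grouper with a freq argument uses the same time-based binning that resample relies on, so the two timings are expected to be close, as the sample output shows.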
  • Print Results:
    • Print the time taken for each method.
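For comparison, a more hand-rolled form of manual resampling can be sketched as follows. This is an illustrative variant, not the method used in the solution above: it truncates every timestamp to the start of its hour with DatetimeIndex.floor and groups on the truncated values. The variable names and the reduced row count are assumptions.

```python
import numpy as np
import pandas as pd

# One row per minute; a smaller frame than the exercise (assumed size).
num_rows = 10_000
index = pd.date_range(start="2020-01-01", periods=num_rows, freq="min")
df = pd.DataFrame({"value": np.random.randn(num_rows)}, index=index)

# Hand-rolled downsampling: truncate each timestamp to the start of its hour,
# then average the rows that share the same truncated timestamp.
hour_start = df.index.floor("h")
manual_df = df.groupby(hour_start).mean()

# Reference result from the built-in resample method.
resampled_df = df.resample("h").mean()

# With gap-free data, both approaches give the same hourly bins and means.
matches = manual_df.index.equals(resampled_df.index) and np.allclose(
    manual_df["value"], resampled_df["value"]
)
print(f"Manual result matches resample: {matches}")
```

One difference to keep in mind: resample emits a row (with NaN for mean) for every hour in the range, even an empty one, while grouping on floored timestamps only returns hours that actually contain data. The results agree here because the minute-level index has no gaps.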
