Performance comparison of cumulative Sum calculation in Pandas
Pandas: Performance Optimization Exercise-14 with Solution
Write a Pandas program to compare the performance of calculating the cumulative sum of a column using the “cumsum” method vs. using a "for" loop.
Sample Solution :
Python Code :
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({'value': np.random.randn(num_rows)})
# Measure time for cumsum method
start_time = time.time()
cumsum_result = df['value'].cumsum()
end_time = time.time()
cumsum_time = end_time - start_time
# Measure time for for loop method
start_time = time.time()
cumsum_for_loop = np.zeros(num_rows)
cumsum_for_loop[0] = df['value'].iloc[0]
for i in range(1, num_rows):
cumsum_for_loop[i] = cumsum_for_loop[i-1] + df['value'].iloc[i]
end_time = time.time()
for_loop_time = end_time - start_time
# Print the time taken for each method
print(f"Time taken using cumsum method: {cumsum_time:.6f} seconds")
print(f"Time taken using for loop: {for_loop_time:.6f} seconds")
Output:
Time taken using cumsum method: 0.006079 seconds Time taken using for loop: 8.145854 seconds
Explanation:
- Import Libraries:
- Import pandas, numpy, and time.
- Create DataFrame:
- Generate a sample DataFrame with 1,000,000 rows.
- Time Measurement for cumsum Method:
- Measure the time taken to calculate the cumulative sum using the cumsum method.
- Time Measurement for for Loop:
- Measure the time taken to calculate the cumulative sum using a for loop.
- Print Results:
- Print the time taken for each method.
Python-Pandas Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Performance comparison of Resampling time Series data in Pandas.
Next: Optimize string operations in Pandas: str accessor vs. apply.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics