Reshaping DataFrame in Pandas: pivot_table vs. manual Loop
Pandas: Performance Optimization Exercise-16 with Solution
Write a Pandas program that uses the pivot_table method to reshape a DataFrame and compares the performance with manual reshaping using for loops.
Sample Solution :
Python Code :
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({
'A': np.random.choice(['foo', 'bar', 'baz'], size=num_rows),
'B': np.random.choice(['one', 'two', 'three'], size=num_rows),
'values': np.random.randn(num_rows)
})
# Measure time for pivot_table method
start_time = time.time()
pivot_table_result = df.pivot_table(index='A', columns='B', values='values', aggfunc='mean')
end_time = time.time()
pivot_table_time = end_time - start_time
# Measure time for manual reshaping using for loops
start_time = time.time()
result = {}
for a in df['A'].unique():
result[a] = {}
for b in df['B'].unique():
result[a][b] = df[(df['A'] == a) & (df['B'] == b)]['values'].mean()
manual_reshape_result = pd.DataFrame(result).T
end_time = time.time()
manual_reshape_time = end_time - start_time
# Print the time taken for each method
print(f"Time taken using pivot_table method: {pivot_table_time:.6f} seconds")
print(f"Time taken using manual reshaping: {manual_reshape_time:.6f} seconds")
Output:
Time taken using pivot_table method: 0.130437 seconds Time taken using manual reshaping: 1.244391 seconds
Explanation:
- Import Libraries:
- Import pandas, numpy, and time.
- Create DataFrame:
- Generate a sample DataFrame with 1,000,000 rows, with categorical columns 'A' and 'B', and a numerical column 'values'.
- Time Measurement for pivot_table Method:
- Measure the time taken to reshape the DataFrame using the pivot_table method.
- Time Measurement for Manual Reshaping:
- Measure the time taken to manually reshape the DataFrame using nested for loops and mean calculations.
- Print Results:
- Print the time taken for each method.
Python-Pandas Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Optimize string operations in Pandas: str accessor vs. apply.
Next: Performance comparison of DataFrame sorting in Pandas.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics