w3resource

Reshaping DataFrame in Pandas: pivot_table vs. manual Loop


Pandas: Performance Optimization Exercise-16 with Solution


Write a Pandas program that uses the pivot_table method to reshape a DataFrame and compares the performance with manual reshaping using for loops.

Sample Solution :

Python Code :

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({
    'A': np.random.choice(['foo', 'bar', 'baz'], size=num_rows),
    'B': np.random.choice(['one', 'two', 'three'], size=num_rows),
    'values': np.random.randn(num_rows)
})

# Measure time for pivot_table method
start_time = time.time()
pivot_table_result = df.pivot_table(index='A', columns='B', values='values', aggfunc='mean')
end_time = time.time()
pivot_table_time = end_time - start_time

# Measure time for manual reshaping using for loops
start_time = time.time()
result = {}
for a in df['A'].unique():
    result[a] = {}
    for b in df['B'].unique():
        result[a][b] = df[(df['A'] == a) & (df['B'] == b)]['values'].mean()

manual_reshape_result = pd.DataFrame(result).T
end_time = time.time()
manual_reshape_time = end_time - start_time

# Print the time taken for each method
print(f"Time taken using pivot_table method: {pivot_table_time:.6f} seconds")
print(f"Time taken using manual reshaping: {manual_reshape_time:.6f} seconds")

Output:

Time taken using pivot_table method: 0.130437 seconds
Time taken using manual reshaping: 1.244391 seconds

Explanation:

  • Import Libraries:
    • Import pandas, numpy, and time.
  • Create DataFrame:
    • Generate a sample DataFrame with 1,000,000 rows, with categorical columns 'A' and 'B', and a numerical column 'values'.
  • Time Measurement for pivot_table Method:
    • Measure the time taken to reshape the DataFrame using the pivot_table method.
  • Time Measurement for Manual Reshaping:
    • Measure the time taken to manually reshape the DataFrame using nested for loops and mean calculations.
  • Print Results:
    • Print the time taken for each method.

Python-Pandas Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Optimize string operations in Pandas: str accessor vs. apply.
Next: Performance comparison of DataFrame sorting in Pandas.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.