Measure concatenation time of DataFrames in Pandas
Pandas: Performance Optimization Exercise-11 with Solution
Write a Pandas program to measure the time taken to concatenate multiple DataFrames using the "concat" method vs. using a "for" loop.
Sample Solution :
Python Code :
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Function to create a list of DataFrames
def create_dataframes(num_dfs, num_rows, num_cols):
return [pd.DataFrame(np.random.randn(num_rows, num_cols)) for _ in range(num_dfs)]
# Number of DataFrames, rows, and columns
num_dfs = 100
num_rows = 1000
num_cols = 10
# Create DataFrames
dfs = create_dataframes(num_dfs, num_rows, num_cols)
# Measure time for pd.concat method
start_time = time.time()
result_concat = pd.concat(dfs, axis=0)
end_time = time.time()
concat_time = end_time - start_time
# Measure time for for-loop method
start_time = time.time()
result_for_loop = dfs[0]
for df in dfs[1:]:
result_for_loop = pd.concat([result_for_loop, df], axis=0)
end_time = time.time()
for_loop_time = end_time - start_time
# Print the time taken for each method
print(f"Time taken using pd.concat: {concat_time:.6f} seconds")
print(f"Time taken using for loop: {for_loop_time:.6f} seconds")
Output:
Time taken using pd.concat: 0.004262 seconds Time taken using for loop: 0.111962 seconds
Explanation:
- Import Libraries:
- Import pandas, numpy, and time.
- Create DataFrames:
- Define a function to create a list of random DataFrames.
- Parameters:
- Set the number of DataFrames, rows, and columns.
- Generate DataFrames:
- Use the function to generate the DataFrames.
- Time Measurement for pd.concat:
- Measure the time taken to concatenate the DataFrames using pd.concat.
- Time Measurement for for Loop:
- Measure the time taken to concatenate the DataFrames using a for loop and pd.concat.
- Finally print the time taken for each method.
Python-Pandas Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Compare performance of eval method vs. standard operations in Pandas.
Next: Performance comparison of DataFrame filtering in Pandas.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics