w3resource

Measure concatenation time of DataFrames in Pandas


Pandas: Performance Optimization Exercise-11 with Solution


Write a Pandas program to measure the time taken to concatenate multiple DataFrames using the "concat" method vs. using a "for" loop.

Sample Solution :

Python Code :

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Function to create a list of DataFrames
def create_dataframes(num_dfs, num_rows, num_cols):
    return [pd.DataFrame(np.random.randn(num_rows, num_cols)) for _ in range(num_dfs)]

# Number of DataFrames, rows, and columns
num_dfs = 100
num_rows = 1000
num_cols = 10

# Create DataFrames
dfs = create_dataframes(num_dfs, num_rows, num_cols)

# Measure time for pd.concat method
start_time = time.time()
result_concat = pd.concat(dfs, axis=0)
end_time = time.time()
concat_time = end_time - start_time

# Measure time for for-loop method
start_time = time.time()
result_for_loop = dfs[0]
for df in dfs[1:]:
    result_for_loop = pd.concat([result_for_loop, df], axis=0)
end_time = time.time()
for_loop_time = end_time - start_time

# Print the time taken for each method
print(f"Time taken using pd.concat: {concat_time:.6f} seconds")
print(f"Time taken using for loop: {for_loop_time:.6f} seconds")

Output:

Time taken using pd.concat: 0.004262 seconds
Time taken using for loop: 0.111962 seconds

Explanation:

  • Import Libraries:
    • Import pandas, numpy, and time.
  • Create DataFrames:
    • Define a function to create a list of random DataFrames.
  • Parameters:
    • Set the number of DataFrames, rows, and columns.
  • Generate DataFrames:
    • Use the function to generate the DataFrames.
  • Time Measurement for pd.concat:
    • Measure the time taken to concatenate the DataFrames using pd.concat.
  • Time Measurement for for Loop:
    • Measure the time taken to concatenate the DataFrames using a for loop and pd.concat.
  • Finally print the time taken for each method.

Python-Pandas Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Compare performance of eval method vs. standard operations in Pandas.
Next: Performance comparison of DataFrame filtering in Pandas.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.