w3resource

Performance comparison of DataFrame sorting in Pandas


Pandas: Performance Optimization Exercise-17 with Solution


Write a Pandas program to measure the time taken to sort a large DataFrame using the sort_values method vs. using a custom sorting function with apply.

Sample Solution:

Python Code:

# Import necessary libraries
import pandas as pd
import numpy as np
import time

# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({
    'A': np.random.randint(0, 100, size=num_rows),
    'B': np.random.randn(num_rows)
})

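# Measure time for sort_values method
# perf_counter is a monotonic, high-resolution clock intended for timing
start_time = time.perf_counter()
sorted_df = df.sort_values(by='A')
end_time = time.perf_counter()
sort_values_time = end_time - start_time

# Define a custom sorting function (a thin wrapper around sort_values)
def sort_custom(df):
    return df.sort_values(by='A')

# Measure time for apply method
# Note: apply(lambda x: x) is an identity pass over each column, so this
# measures sort_values plus the overhead of apply, not a different sort
start_time = time.perf_counter()
apply_sorted_df = sort_custom(df.apply(lambda x: x))
end_time = time.perf_counter()
apply_time = end_time - start_time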

# Print the time taken for each method
print(f"Time taken using sort_values method: {sort_values_time:.6f} seconds")
print(f"Time taken using apply method: {apply_time:.6f} seconds")

Output:

Time taken using sort_values method: 0.075608 seconds
Time taken using apply method: 0.075997 seconds
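A single timed run like the one above can be noisy, since caching and background load affect each measurement. The following is a minimal sketch of a repeated measurement using the standard-library timeit module. The helper name time_best, the smaller row count, and the repeat settings are assumptions chosen for illustration, not part of the original solution.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than the exercise so the repeated runs finish quickly (assumption)
num_rows = 100_000
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=num_rows),
    'B': rng.standard_normal(num_rows),
})

def time_best(func, repeat=5, number=3):
    """Return the best per-call time in seconds over several repeats (hypothetical helper)."""
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number

# Direct sort versus the identity apply followed by the same sort
direct_time = time_best(lambda: df.sort_values(by='A'))
apply_time = time_best(lambda: df.apply(lambda x: x).sort_values(by='A'))

print(f"sort_values (best of 5): {direct_time:.6f} seconds")
print(f"apply + sort_values (best of 5): {apply_time:.6f} seconds")
```

Taking the minimum over several repeats is a common way to reduce the influence of unrelated system activity. The absolute numbers will still differ from machine to machine.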

Explanation:

  • Import Libraries:
    • Import pandas, numpy, and time.
  • Create DataFrame:
    • Generate a sample DataFrame with 1,000,000 rows with columns 'A' and 'B'.
  • Time Measurement for sort_values Method:
    • Measure the time taken to sort the DataFrame using the sort_values method.
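  • Define Custom Sorting Function:
    • Define sort_custom, a thin wrapper around sort_values(by='A').
  • Time Measurement for apply Method:
    • Measure the time taken to pass the DataFrame through apply with an identity function and then sort it by column 'A'. The sort itself is still performed by sort_values, which is why the two timings are nearly identical.
  • Print Results:
    • Print the time taken for each method.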
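Before comparing timings, it can help to confirm that both code paths produce the same ordering. The sketch below is one way to check this. It assumes a stable sort (kind='stable') so that rows with equal values in 'A' keep their original relative order, and it uses a smaller, seeded DataFrame than the exercise for a quick, repeatable check.

```python
import numpy as np
import pandas as pd

# Seeded data so the check is reproducible (sizes are an assumption)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=10_000),
    'B': rng.standard_normal(10_000),
})

# kind='stable' keeps tied rows in their original order, so the two
# results can be compared row by row
direct = df.sort_values(by='A', kind='stable')
via_apply = df.apply(lambda x: x).sort_values(by='A', kind='stable')

# Compare the resulting row order and the sorted key column
same_order = direct.index.equals(via_apply.index)
same_values = np.array_equal(direct['A'].to_numpy(), via_apply['A'].to_numpy())
print(same_order, same_values)
```

If both checks pass, any timing difference between the two approaches comes from the extra apply call rather than from a different sorting result.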
