Performance comparison of DataFrame sorting in Pandas
Pandas: Performance Optimization Exercise-17 with Solution
Write a Pandas program to measure the time taken to sort a large DataFrame using the sort_values method vs. using a custom sorting function with apply.
Sample Solution:
Python Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({
    'A': np.random.randint(0, 100, size=num_rows),
    'B': np.random.randn(num_rows)
})
# Measure time for sort_values method
# (perf_counter is a monotonic, high-resolution clock suited to timing)
start_time = time.perf_counter()
sorted_df = df.sort_values(by='A')
end_time = time.perf_counter()
sort_values_time = end_time - start_time
# Define a custom sorting function (a thin wrapper around sort_values)
def sort_custom(df):
    return df.sort_values(by='A')
# Measure time for apply method: an identity apply over the columns,
# followed by the custom sorting function
start_time = time.perf_counter()
apply_sorted_df = sort_custom(df.apply(lambda x: x))
end_time = time.perf_counter()
apply_time = end_time - start_time
# Print the time taken for each method
print(f"Time taken using sort_values method: {sort_values_time:.6f} seconds")
print(f"Time taken using apply method: {apply_time:.6f} seconds")
Output:
Time taken using sort_values method: 0.075608 seconds
Time taken using apply method: 0.075997 seconds
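The two timings above differ by less than a millisecond, which is well within the noise of a single run. A minimal sketch of a more robust measurement, assuming the standard-library timeit module is acceptable here, repeats each call and keeps the fastest run. The helper name best_time, the seed, and the smaller row count are illustrative assumptions, not part of the original solution.

```python
import timeit

import numpy as np
import pandas as pd


def best_time(func, repeats=5):
    """Hypothetical helper: fastest of `repeats` single runs of func, in seconds."""
    return min(timeit.repeat(func, number=1, repeat=repeats))


# Smaller frame than the exercise (assumption) so the repeated runs stay quick
rng = np.random.default_rng(0)
num_rows = 100_000
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=num_rows),
    'B': rng.standard_normal(num_rows),
})

# Same two code paths as the sample solution, each timed several times
direct_time = best_time(lambda: df.sort_values(by='A'))
apply_time = best_time(lambda: df.apply(lambda x: x).sort_values(by='A'))

print(f"Best sort_values time: {direct_time:.6f} seconds")
print(f"Best apply + sort_values time: {apply_time:.6f} seconds")
```

Taking the minimum rather than the mean reduces the influence of unrelated system activity on the comparison.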
Explanation:
- Import Libraries:
  - Import pandas, numpy, and time.
- Create DataFrame:
  - Generate a sample DataFrame with 1,000,000 rows and two columns: 'A' (random integers from 0 to 99) and 'B' (random normal values).
- Time Measurement for sort_values Method:
  - Measure the time taken to sort the DataFrame by column 'A' using the sort_values method.
- Define Custom Sorting Function:
  - Define sort_custom, which simply wraps sort_values(by='A').
- Time Measurement for apply Method:
  - Measure the time taken to pass the DataFrame through apply with an identity lambda and then sort it by column 'A'. The sorting itself is still done by sort_values, so the two timings are nearly identical.
- Finally, print the time taken for each method.
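Because the apply step in the solution is an identity pass, it does not show what a custom function applied element by element would cost. The following is a rough sketch of such a comparison, not the original author's method: it builds a sort key with Series.apply, orders the rows with NumPy's argsort, and times that against sort_values. The function custom_key, the seed, and the reduced row count are hypothetical choices made for illustration.

```python
import time

import numpy as np
import pandas as pd

# Smaller frame than the exercise (assumption) because element-wise apply is slow
rng = np.random.default_rng(0)
num_rows = 100_000
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=num_rows),
    'B': rng.standard_normal(num_rows),
})


def custom_key(value):
    """Hypothetical per-element sort key; here it returns the value unchanged."""
    return value


# Built-in path: stable sort on column 'A'
start = time.perf_counter()
builtin_sorted = df.sort_values(by='A', kind='mergesort')
builtin_time = time.perf_counter() - start

# Custom path: compute a key per element with apply, then reorder rows by it
start = time.perf_counter()
keys = df['A'].apply(custom_key)
order = np.argsort(keys.to_numpy(), kind='mergesort')
custom_sorted = df.iloc[order]
custom_time = time.perf_counter() - start

print(f"sort_values: {builtin_time:.6f} seconds")
print(f"apply-built key + argsort: {custom_time:.6f} seconds")
print("Same row order:", custom_sorted.index.equals(builtin_sorted.index))
```

The custom path calls a Python function once per row, so it would typically be expected to run slower than the built-in sort; actual numbers depend on the machine and library versions.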