Performance comparison of DataFrame filtering in Pandas
Pandas: Performance Optimization Exercise-12 with Solution
Write a Pandas program that uses the query method to filter rows of a DataFrame based on a condition. Compare the performance with boolean indexing.
Sample Solution:
Python Code:
# Import necessary libraries
import pandas as pd
import numpy as np
import time
# Create a sample DataFrame
num_rows = 1000000
df = pd.DataFrame({
'A': np.random.randint(0, 100, size=num_rows),
'B': np.random.randn(num_rows),
'C': np.random.rand(num_rows)
})
# Define the condition
condition = 'A > 50 and B < 0'
# Measure time for query method
# (time.perf_counter() is a monotonic, high-resolution clock suited to timing short intervals)
start_time = time.perf_counter()
result_query = df.query(condition)
end_time = time.perf_counter()
query_time = end_time - start_time
# Measure time for boolean indexing
start_time = time.perf_counter()
result_boolean_indexing = df[(df['A'] > 50) & (df['B'] < 0)]
end_time = time.perf_counter()
boolean_indexing_time = end_time - start_time
# Print the time taken for each method
print(f"Time taken using query method: {query_time:.6f} seconds")
print(f"Time taken using boolean indexing: {boolean_indexing_time:.6f} seconds")
Output:
Time taken using query method: 0.021941 seconds
Time taken using boolean indexing: 0.008976 seconds
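A single timed run like the one above can be noisy, so the exact figures will differ between machines and between runs. As a rough sketch, and not part of the original solution, the same comparison can be repeated with the standard library's timeit module. The helper function names, the smaller row count, and the repeat counts below are assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller DataFrame than the exercise uses, so the repeated runs finish quickly
# (the row count and the fixed seed are illustrative assumptions)
rng = np.random.default_rng(0)
num_rows = 100_000
df = pd.DataFrame({
    'A': rng.integers(0, 100, size=num_rows),
    'B': rng.standard_normal(num_rows),
    'C': rng.random(num_rows),
})


def filter_with_query():
    # Same condition as the exercise, expressed as a query string
    return df.query('A > 50 and B < 0')


def filter_with_boolean_indexing():
    # Same condition as the exercise, expressed as a boolean mask
    return df[(df['A'] > 50) & (df['B'] < 0)]


# Run each filter 10 times per measurement and take 5 measurements
runs_per_measurement = 10
query_times = timeit.repeat(filter_with_query, repeat=5, number=runs_per_measurement)
boolean_times = timeit.repeat(filter_with_boolean_indexing, repeat=5, number=runs_per_measurement)

# The minimum is usually the least disturbed measurement; divide to get time per call
best_query = min(query_times) / runs_per_measurement
best_boolean = min(boolean_times) / runs_per_measurement

print(f"Best time per call using query method: {best_query:.6f} seconds")
print(f"Best time per call using boolean indexing: {best_boolean:.6f} seconds")
```

Taking the minimum over several repeats reduces the influence of background activity on the machine, which makes the two numbers easier to compare than a single start/stop measurement.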
Explanation:
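- Import Libraries: Import pandas, numpy, and time.
- Create DataFrame: Generate a sample DataFrame with 1,000,000 rows and three columns (A, B, and C) filled with random values.
- Define Condition: Store the filter condition as the string 'A > 50 and B < 0' for use with the query method.
- Time Measurement for query Method: Measure the time taken to filter rows with df.query(condition).
- Time Measurement for Boolean Indexing: Measure the time taken to filter rows with the equivalent boolean mask, (df['A'] > 50) & (df['B'] < 0).
- Print Results: Print the time taken for each method. In the sample run shown, boolean indexing was faster (about 0.009 seconds versus about 0.022 seconds), though the figures will vary from run to run.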
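The comparison is only meaningful if both methods select the same rows. The following is a minimal sketch of that check, assuming a small DataFrame and the variable names shown here (none of which appear in the original solution); it uses DataFrame.equals to compare the two results.

```python
import numpy as np
import pandas as pd

# Small illustrative DataFrame (size and seed are assumptions for this sketch)
rng = np.random.default_rng(42)
small_df = pd.DataFrame({
    'A': rng.integers(0, 100, size=1000),
    'B': rng.standard_normal(1000),
})

# Filter with the query method using the exercise's condition string
via_query = small_df.query('A > 50 and B < 0')

# Filter with the equivalent boolean mask
mask = (small_df['A'] > 50) & (small_df['B'] < 0)
via_mask = small_df[mask]

# equals() compares values, index, and dtypes of the two results
same_rows = via_query.equals(via_mask)
print(f"Both methods return the same rows: {same_rows}")
```

Note that the query string uses the word `and`, while the boolean mask needs `&` with each comparison wrapped in parentheses.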