Compare DataFrame element-wise multiplication using for loop vs. * Operator
Pandas: Performance Optimization Exercise-9 with Solution
Write a Pandas program that performs element-wise multiplication on a DataFrame using a for loop vs. using the * operator. Compare the performance.
Sample Solution:
Python Code:
import pandas as pd # Import the Pandas library
import numpy as np # Import the NumPy library
import time # Import the time module to measure execution time
# Create a sample DataFrame
np.random.seed(0) # Set seed for reproducibility
data = {
    'A': np.random.randint(1, 100, size=1000000),
    'B': np.random.randint(1, 100, size=1000000)
}
df = pd.DataFrame(data)
# Perform element-wise multiplication using a for loop
start_time = time.time() # Record the start time
result_for_loop = []
for index, row in df.iterrows():
    result_for_loop.append(row['A'] * row['B'])
result_for_loop = pd.Series(result_for_loop)
time_for_loop = time.time() - start_time # Calculate the time taken
# Perform element-wise multiplication using the * operator
start_time = time.time() # Record the start time
result_vectorized = df['A'] * df['B']
time_vectorized = time.time() - start_time # Calculate the time taken
# Print the time taken for both methods
print("Time taken using for loop:", time_for_loop, "seconds")
print("Time taken using * operator:", time_vectorized, "seconds")
Output:
Time taken using for loop: 37.052802324295044 seconds
Time taken using * operator: 0.0019948482513427734 seconds
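A single time.time() reading for an operation that finishes in a few milliseconds can be noisy. As a hedged sketch (not part of the original solution), the vectorized step could be timed several times with the standard library's timeit module. The smaller DataFrame size of 100,000 rows and the names timings and best_per_call are assumptions chosen for illustration.

```python
import timeit

import numpy as np
import pandas as pd

# Build a smaller sample DataFrame (size is an assumption for a quick run)
np.random.seed(0)
df = pd.DataFrame({
    'A': np.random.randint(1, 100, size=100_000),
    'B': np.random.randint(1, 100, size=100_000),
})

# Take 3 measurements; each measurement runs the multiplication 5 times
timings = timeit.repeat(lambda: df['A'] * df['B'], number=5, repeat=3)

# The minimum is usually the least disturbed by other system activity
best_per_call = min(timings) / 5
print("Best time per vectorized call:", best_per_call, "seconds")
```

The exact figures will differ from machine to machine, so no expected output is shown here.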
Explanation:
- Import Libraries:
- Import the Pandas library for data manipulation.
- Import the NumPy library for generating random data.
- Import the time module to measure execution time.
- Create a Sample DataFrame:
- Set a seed for reproducibility using np.random.seed(0).
- Create a dictionary data with columns 'A' and 'B' containing random integers.
- Generate a DataFrame df using the dictionary.
- Perform Element-wise Multiplication Using a for loop:
- Record the start time using time.time().
- Initialize an empty list result_for_loop to store the multiplication results.
- Iterate through each row in the DataFrame using a for loop with "df.iterrows()". Multiply the values in columns 'A' and 'B' and append the result to result_for_loop.
- Convert 'result_for_loop' to a Pandas Series.
- Calculate the time taken by subtracting the start time from the current time.
- Perform Element-wise Multiplication Using the * Operator:
- Record the start time using time.time().
- Use the * operator to perform element-wise multiplication of columns 'A' and 'B'.
- Store the result in 'result_vectorized'.
- Calculate the time taken by subtracting the start time from the current time.
- Finally, display the time taken by both the for loop method and the * operator method.
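The steps above can also be wrapped in a small function that checks both methods produce the same values. This is a minimal sketch under stated assumptions: the function name compare_multiplication, the default of 10,000 rows (so the loop finishes quickly), and the use of time.perf_counter() instead of time.time() are illustrative choices, not part of the original solution.

```python
import time

import numpy as np
import pandas as pd


def compare_multiplication(n_rows=10_000, seed=0):
    """Time both approaches and check that their results match.

    Hypothetical helper for illustration; n_rows is kept small on purpose.
    """
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        'A': rng.integers(1, 100, size=n_rows),
        'B': rng.integers(1, 100, size=n_rows),
    })

    # Row-by-row multiplication with iterrows()
    start = time.perf_counter()
    loop_values = [row['A'] * row['B'] for _, row in df.iterrows()]
    result_for_loop = pd.Series(loop_values, index=df.index)
    time_for_loop = time.perf_counter() - start

    # Vectorized multiplication with the * operator
    start = time.perf_counter()
    result_vectorized = df['A'] * df['B']
    time_vectorized = time.perf_counter() - start

    # Compare the underlying values of both results
    same = bool(
        (result_for_loop.to_numpy() == result_vectorized.to_numpy()).all()
    )
    return time_for_loop, time_vectorized, same


time_for_loop, time_vectorized, same = compare_multiplication()
print("Time taken using for loop:", time_for_loop, "seconds")
print("Time taken using * operator:", time_vectorized, "seconds")
print("Results identical:", same)
```

Confirming that the results match before comparing timings helps ensure the two measurements describe the same computation.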