Reduce memory usage in Pandas DataFrame using astype method
Pandas: Performance Optimization Exercise-4 with Solution
Write a Pandas program that uses the "astype" method to convert the data types of a DataFrame and measures the reduction in memory usage.
Sample Solution:
Python Code:
import pandas as pd # Import the Pandas library
import numpy as np # Import the NumPy library
# Create a sample DataFrame with mixed data types
np.random.seed(0) # Set seed for reproducibility
data = {
    'int_col': np.random.randint(0, 100, size=100000),
    'float_col': np.random.random(size=100000) * 100,
    'category_col': np.random.choice(['A', 'B', 'C'], size=100000),
    'object_col': np.random.choice(['foo', 'bar', 'baz'], size=100000)
}
df = pd.DataFrame(data)
# Print memory usage before optimization
# df.info() prints its report directly and returns None, so it is not wrapped in print()
print("Memory usage before optimization:")
df.info(memory_usage='deep')
# Convert data types using astype method
df['int_col'] = df['int_col'].astype('int16')  # values 0-99 fit comfortably in int16
df['float_col'] = df['float_col'].astype('float32')  # halves storage, reduces precision
df['category_col'] = df['category_col'].astype('category')
df['object_col'] = df['object_col'].astype('category')
# Print memory usage after optimization
print("\nMemory usage after optimization:")
df.info(memory_usage='deep')
Output:
Memory usage before optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype
---  ------        --------------   -----
 0   int_col       100000 non-null  int32
 1   float_col     100000 non-null  float64
 2   category_col  100000 non-null  object
 3   object_col    100000 non-null  object
dtypes: float64(1), int32(1), object(2)
memory usage: 12.4 MB

Memory usage after optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype
---  ------        --------------   -----
 0   int_col       100000 non-null  int16
 1   float_col     100000 non-null  float32
 2   category_col  100000 non-null  category
 3   object_col    100000 non-null  category
dtypes: category(2), float32(1), int16(1)
memory usage: 781.9 KB
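The two summaries report only totals. To measure the reduction column by column, one option is a sketch built on DataFrame.memory_usage(deep=True), which returns the size in bytes per column. The variable names and the report layout below are illustrative choices, not part of the original solution. Exact byte counts depend on the platform and library versions: the int32 shown in the output is one platform's default integer type, and other systems may start from int64.

```python
import numpy as np
import pandas as pd

# Rebuild the same sample data as in the solution
np.random.seed(0)
n = 100_000
df = pd.DataFrame({
    'int_col': np.random.randint(0, 100, size=n),
    'float_col': np.random.random(size=n) * 100,
    'category_col': np.random.choice(['A', 'B', 'C'], size=n),
    'object_col': np.random.choice(['foo', 'bar', 'baz'], size=n),
})

# Bytes per column; deep=True also counts the Python string objects
before = df.memory_usage(deep=True)

# astype accepts a dict, so all conversions happen in one call
optimized = df.astype({
    'int_col': 'int16',
    'float_col': 'float32',
    'category_col': 'category',
    'object_col': 'category',
})
after = optimized.memory_usage(deep=True)

# Side-by-side comparison (illustrative report layout)
report = pd.DataFrame({'before_bytes': before, 'after_bytes': after})
report['saved_pct'] = (1 - report['after_bytes'] / report['before_bytes']) * 100
print(report.round(1))

total_before = before.sum()
total_after = after.sum()
print(f"Total: {total_before / 1024**2:.1f} MB -> {total_after / 1024:.1f} KB "
      f"({(1 - total_after / total_before) * 100:.1f}% smaller)")
```

Working on a converted copy (optimized) rather than overwriting df keeps the original available for comparison, which is convenient while experimenting with different target types.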
Explanation:
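- Import libraries:
  - Import the Pandas library for data manipulation.
  - Import the NumPy library for generating random data.
- Create a sample DataFrame:
  - Set a seed for reproducibility using np.random.seed(0).
  - Create a dictionary data with four columns: integers, floats, and two string columns ('category_col' and 'object_col'), both of which Pandas initially stores with the object dtype.
  - Generate a DataFrame df using the dictionary.
- Print memory usage before optimization:
  - Use df.info(memory_usage='deep') to display the memory usage of the DataFrame before optimization. The 'deep' setting also counts the memory held by the Python strings in object columns.
- Convert data types using the astype method:
  - Convert 'int_col' to 'int16', which is sufficient because the values lie between 0 and 99.
  - Convert 'float_col' to 'float32', which halves the storage at the cost of reduced precision (roughly 7 significant digits).
  - Convert 'category_col' and 'object_col' to 'category', which stores each distinct string once plus a small integer code per row.
- Print memory usage after optimization:
  - Use df.info(memory_usage='deep') again to display the memory usage after optimization; the reported total drops from 12.4 MB to 781.9 KB.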
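The conversions in this exercise are safe because the sample values are known in advance. With real data it may be worth checking first. The sketch below assumes a hypothetical helper, fits_in, that compares a series' range against the limits reported by np.iinfo, and it also shows pd.to_numeric with downcast='integer', which selects the smallest signed integer type that can hold the data. The last part estimates the rounding error introduced by a float32 conversion.

```python
import numpy as np
import pandas as pd

def fits_in(series, dtype):
    """Return True if every value of an integer series fits in `dtype`.

    Hypothetical helper for illustration; not part of Pandas.
    """
    info = np.iinfo(np.dtype(dtype))
    return bool(series.min() >= info.min and series.max() <= info.max)

# Integers in the same 0-99 range as 'int_col'
s = pd.Series(np.random.default_rng(0).integers(0, 100, size=1_000))

print(fits_in(s, 'int16'))           # values 0-99 are within the int16 range
print(fits_in(s + 40_000, 'int16'))  # values above 32767 are not

# Let Pandas choose the smallest signed integer type for the data
smallest = pd.to_numeric(s, downcast='integer')
print(smallest.dtype)

# float32 keeps roughly 7 significant digits; measure the rounding error
x = pd.Series([12.3456789012])
err = (x.astype('float32').astype('float64') - x).abs().max()
print(f"float32 rounding error: {err:.2e}")
```

If the check fails, astype does not necessarily raise an error for out-of-range integers and may silently wrap values, so testing the range before converting is the more cautious order of operations.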