Reduce memory usage in a Pandas DataFrame using the astype method

Last update on May 05 2025 13:03:55 (UTC/GMT +8 hours)

4. Data Type Conversion with astype

Write a Pandas program that uses the "astype" method to convert the data types of a DataFrame and measures the reduction in memory usage.

Sample Solution:

Python Code:

import pandas as pd  # Import the Pandas library
import numpy as np  # Import the NumPy library

# Create a sample DataFrame with mixed data types
np.random.seed(0)  # Set seed for reproducibility
data = {
    'int_col': np.random.randint(0, 100, size=100000),
    'float_col': np.random.random(size=100000) * 100,
    'category_col': np.random.choice(['A', 'B', 'C'], size=100000),
    'object_col': np.random.choice(['foo', 'bar', 'baz'], size=100000)
}
df = pd.DataFrame(data)

# Print memory usage before optimization
print("Memory usage before optimization:")
print(df.info(memory_usage='deep'))

# Convert data types using astype method
df['int_col'] = df['int_col'].astype('int16')
df['float_col'] = df['float_col'].astype('float32')
df['category_col'] = df['category_col'].astype('category')
df['object_col'] = df['object_col'].astype('category')

# Print memory usage after optimization
print("\nMemory usage after optimization:")
print(df.info(memory_usage='deep'))

Output:

Memory usage before optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype  
---  ------        --------------   -----  
 0   int_col       100000 non-null  int32  
 1   float_col     100000 non-null  float64
 2   category_col  100000 non-null  object 
 3   object_col    100000 non-null  object 
dtypes: float64(1), int32(1), object(2)
memory usage: 12.4 MB
None

Memory usage after optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 100000 entries, 0 to 99999
Data columns (total 4 columns):
 #   Column        Non-Null Count   Dtype   
---  ------        --------------   -----   
 0   int_col       100000 non-null  int16   
 1   float_col     100000 non-null  float32 
 2   category_col  100000 non-null  category
 3   object_col    100000 non-null  category
dtypes: category(2), float32(1), int16(1)
memory usage: 781.9 KB
None

Explanation:

Import Libraries:

Import the Pandas library for data manipulation.
Import the NumPy library for generating random data.

Create a sample DataFrame:

Set a seed for reproducibility using np.random.seed(0).
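Create a dictionary data with four columns: random integers from 0 to 99, random floats from 0 to 100, and two string columns ('category_col' and 'object_col') that each hold only three distinct values. Both string columns start out with the object dtype.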
Generate a DataFrame df using the dictionary.

Print memory usage before optimization:

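Use df.info(memory_usage='deep') to display the memory usage of the DataFrame before optimization. The 'deep' setting counts the Python string objects held by the object columns, not just the pointers to them.
df.info() prints its report itself and returns None, so wrapping it in print() also produces the trailing None seen in the output.
The sample output shows 'int_col' as int32; on platforms where NumPy's default integer is 64-bit, the column starts as int64 and the "before" figure is slightly larger.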
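
To see which columns account for the total that df.info() reports, a per-column breakdown can help. The following is a minimal sketch on a small stand-in DataFrame (1,000 rows rather than the 100,000 used in the solution; the column names are reused for illustration). It compares shallow and deep byte counts from DataFrame.memory_usage. The explicit astype(object) is an assumption added so the string column is stored as Python objects regardless of the installed Pandas version.

```python
import numpy as np
import pandas as pd

# Small stand-in frame; column names mirror the sample solution.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    'int_col': rng.integers(0, 100, size=1000),
    'object_col': rng.choice(['foo', 'bar', 'baz'], size=1000),
})
# Force object dtype so the column holds plain Python strings.
df['object_col'] = df['object_col'].astype(object)

# memory_usage returns bytes per column (plus one entry for the index).
shallow = df.memory_usage(deep=False)  # counts only the pointers for object columns
deep = df.memory_usage(deep=True)      # also counts the string objects themselves

print(pd.DataFrame({'shallow_bytes': shallow, 'deep_bytes': deep}))
```

For the numeric column the two figures are identical; for the object column the deep figure is much larger, which is why the string columns dominate the "before" total.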

Convert data types using astype method:

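Convert the 'int_col' to 'int16'. The values range from 0 to 99, so they fit comfortably in 2 bytes each.
Convert the 'float_col' to 'float32'. This halves the storage per value but keeps only about 7 significant decimal digits.
Convert the 'category_col' and 'object_col' to 'category'. Each distinct string is stored once and the column holds small integer codes, which pays off when a column has few distinct values.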
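
Casting integers with astype to a type that is too narrow can corrupt values rather than raise an error, so it is worth checking the value range first. The sketch below uses a hypothetical helper, fits_in (not part of Pandas), built on np.iinfo, and then shows pd.to_numeric with downcast='integer' as one way to let Pandas choose the width. The sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

def fits_in(series, dtype):
    """Hypothetical helper: True if every value fits in the given integer dtype."""
    info = np.iinfo(dtype)
    return bool(series.min() >= info.min and series.max() <= info.max)

# Made-up values; 40000 exceeds the int16 maximum of 32767.
s = pd.Series([0, 42, 99, 40000], dtype='int64')

print("fits in int16:", fits_in(s, 'int16'))
print("fits in int32:", fits_in(s, 'int32'))

# Let Pandas pick the smallest signed integer dtype that holds the data.
small = pd.to_numeric(s, downcast='integer')
print("downcast dtype:", small.dtype)
```

In the sample solution the integers lie between 0 and 99, so a check like this would pass for int16.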

Print memory usage after optimization:

Use df.info(memory_usage='deep') again to display the memory usage of the DataFrame after optimization. In the sample output the total drops from 12.4 MB to 781.9 KB, roughly 94% less.
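
df.info() only prints a report, so measuring the reduction as a number needs a separate step. One possible approach, sketched here on a smaller stand-in DataFrame (10,000 rows, three columns), is to sum DataFrame.memory_usage(deep=True) before and after the conversion. The variable names before, after and reduction_pct are illustrative, and the exact byte counts depend on the platform and library versions.

```python
import numpy as np
import pandas as pd

# Smaller stand-in for the DataFrame in the sample solution.
rng = np.random.default_rng(0)
n = 10_000
df = pd.DataFrame({
    'int_col': rng.integers(0, 100, size=n),
    'float_col': rng.random(size=n) * 100,
    'object_col': rng.choice(['foo', 'bar', 'baz'], size=n),
})
# Force object dtype so the "before" state matches the solution's setup.
df['object_col'] = df['object_col'].astype(object)

# Total bytes before conversion, including the string objects.
before = df.memory_usage(deep=True).sum()

# astype accepts a dict mapping column names to target dtypes.
optimized = df.astype({
    'int_col': 'int16',
    'float_col': 'float32',
    'object_col': 'category',
})

# Total bytes after conversion and the relative saving.
after = optimized.memory_usage(deep=True).sum()
reduction_pct = 100 * (before - after) / before

print(f"before: {before:,} bytes")
print(f"after:  {after:,} bytes")
print(f"reduction: {reduction_pct:.1f}%")
```

Passing a dict to astype converts all columns in one call and returns a new DataFrame, which leaves the original available for comparison.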

For more practice, solve these related problems:

Write a Pandas program to convert DataFrame columns to lower memory data types using astype() and report the memory reduction.
Write a Pandas program to change column data types with astype() and measure performance before and after conversion.
Write a Pandas program to optimize a DataFrame by converting numeric columns to more efficient types and compare memory_usage().
Write a Pandas program to benchmark the time taken for type conversion of a large DataFrame and display the memory savings.
