Optimize memory usage when loading a large CSV into a Pandas DataFrame

Last update on May 05 2025 13:03:56 (UTC/GMT +8 hours)

3. Optimize Memory Usage When Loading CSV

Write a Pandas program that loads a large CSV file into a DataFrame and optimizes memory usage by specifying appropriate data types.

Sample Solution:

Python Code:

import pandas as pd  # Import the Pandas library

# Define the CSV file path
csv_file_path = 'large_csv_file.csv'

# Load a small sample of the CSV file to inspect column names and inferred data types
chunk = pd.read_csv(csv_file_path, nrows=100)

# Specify the data types for the columns based on the initial chunk
# (the keys must match the column names in the CSV header)
dtype_dict = {
    'column1': 'int32',
    'column2': 'float32',
    'column3': 'category',
    'column4': 'category',
    # Add more columns with appropriate data types
}

# Load the full CSV file with specified data types to optimize memory usage
df = pd.read_csv(csv_file_path, dtype=dtype_dict)

# Print memory usage after optimization
print("Memory usage after optimization:")
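df.info(memory_usage='deep')  # info() prints the report itself and returns None

Output:

Memory usage after optimization:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2715 entries, 0 to 2714
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   1         2715 non-null   int64  
 1   1.02      2715 non-null   float64
 2   Folder    2715 non-null   object 
 3   Folder.1  2715 non-null   object 
dtypes: float64(1), int64(1), object(2)
memory usage: 629.5 KB

Note: The dtypes in this output (int64, float64, object) are the Pandas defaults, so this particular run was not actually optimized. The file's columns are named 1, 1.02, Folder and Folder.1, which match none of the placeholder keys in dtype_dict. Replace the placeholder keys with the real column names to see a reduction.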
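A quick way to catch a mismatch between the dtype keys and the file's header is to read only the header row and compare the names. The snippet below is a minimal sketch: the CSV text and the column names id, price and folder are made up for illustration, and it assumes (as is the case with the default parser, to the best of my knowledge) that Pandas skips unmatched dtype keys without raising an error.

```python
import io
import pandas as pd

# Hypothetical file contents; the header names differ from one dtype key
csv_text = "id,price,folder\n1,1.02,Folder\n2,3.5,Folder\n"

# 'column1' is a placeholder that does not exist in the header
dtype_dict = {'column1': 'int32', 'price': 'float32'}

# nrows=0 parses only the header row, so this is cheap even for a large file
header = pd.read_csv(io.StringIO(csv_text), nrows=0)

# Keys that would have no effect when passed to read_csv
unmatched = sorted(set(dtype_dict) - set(header.columns))
print("dtype keys with no matching column:", unmatched)
```

If the list is not empty, fix the keys before loading the full file.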

Explanation:

Import Pandas Library:

Import the Pandas library for data manipulation.

Define CSV File Path:

Specify the path to the large CSV file with csv_file_path.

Load Initial Chunk:

Load a small sample of the CSV file (e.g., the first 100 rows) using pd.read_csv(csv_file_path, nrows=100), then inspect the column names and the data types Pandas infers by default.
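As a rough illustration of this step, the sketch below reads a sample and prints what is needed to choose types. The in-memory CSV text and its column names (id, price, city) are hypothetical stand-ins for large_csv_file.csv.

```python
import io
import pandas as pd

# Hypothetical stand-in for large_csv_file.csv so the sketch runs on its own
csv_text = (
    "id,price,city\n"
    "1,9.5,Paris\n"
    "2,12.25,Rome\n"
    "3,7.0,Paris\n"
)

# Read only the first two data rows
sample = pd.read_csv(io.StringIO(csv_text), nrows=2)

# Column names to use as keys in the dtype dictionary
print(sample.columns.tolist())

# Data types Pandas inferred by default
print(sample.dtypes)

# Value range of an integer column, to judge whether int32 is wide enough
print(sample['id'].min(), sample['id'].max())
```

Keep in mind that a sample only reflects the rows it contains; later rows may hold larger values or missing data.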

Specify Data Types:

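Based on the initial chunk, create a dictionary 'dtype_dict' that maps column names to smaller data types (e.g., 'int32' and 'float32' in place of the default 64-bit types, and 'category' for text columns with few distinct values). The keys must match the column names in the file, and the chosen types must fit every row, not only the sampled ones.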
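The dictionary can also be drafted from the sample instead of being typed by hand. The following is only a sketch under stated assumptions: suggest_dtypes and its max_unique_ratio parameter are hypothetical names, the thresholds are arbitrary, and the result should be reviewed because the sample may not represent the whole file (for example, 'int32' fails on a column that contains missing values, and 'float32' reduces precision).

```python
import io
import numpy as np
import pandas as pd
from pandas.api.types import is_float_dtype, is_integer_dtype, is_numeric_dtype


def suggest_dtypes(sample, max_unique_ratio=0.5):
    """Hypothetical helper: propose a dtype dict from a sample DataFrame."""
    int32 = np.iinfo(np.int32)
    suggested = {}
    for name in sample.columns:
        col = sample[name]
        if is_integer_dtype(col):
            # Only narrow integers whose sampled values fit in 32 bits
            if col.min() >= int32.min and col.max() <= int32.max:
                suggested[name] = 'int32'
        elif is_float_dtype(col):
            suggested[name] = 'float32'
        elif not is_numeric_dtype(col):
            # Text column: use category when values repeat often
            if col.nunique() <= max_unique_ratio * len(col):
                suggested[name] = 'category'
    return suggested


# Hypothetical data standing in for the first rows of the large file
csv_text = "id,price,city\n1,9.5,Paris\n2,12.25,Rome\n3,7.0,Paris\n4,3.5,Rome\n"

sample = pd.read_csv(io.StringIO(csv_text), nrows=100)
dtype_dict = suggest_dtypes(sample)
print(dtype_dict)

# Apply the suggested types when reading the data again
df = pd.read_csv(io.StringIO(csv_text), dtype=dtype_dict)
print(df.dtypes)
```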

Load Full CSV with Specified Data Types:

Use pd.read_csv(csv_file_path, dtype=dtype_dict) to load the full CSV file while specifying the data types to optimize memory usage.
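To see what the dtype argument saves, the same data can be loaded twice and the totals compared. This is a minimal sketch on generated data; the column names id, score and status are assumptions made for the example, and the actual saving depends on the file.

```python
import io
import pandas as pd

# Hypothetical data: 10,000 rows with a repetitive text column
rows = ["id,score,status"]
rows += [f"{i},{i * 0.5},{'open' if i % 2 else 'closed'}" for i in range(10_000)]
csv_text = "\n".join(rows)

dtype_dict = {'id': 'int32', 'score': 'float32', 'status': 'category'}

# Load once with default types and once with the specified types
default_df = pd.read_csv(io.StringIO(csv_text))
optimized_df = pd.read_csv(io.StringIO(csv_text), dtype=dtype_dict)

# Total bytes used, counting the contents of text columns
default_bytes = default_df.memory_usage(deep=True).sum()
optimized_bytes = optimized_df.memory_usage(deep=True).sum()

print(f"default:   {default_bytes:,} bytes")
print(f"optimized: {optimized_bytes:,} bytes")
```

Most of the saving in this example comes from the 'category' column, because each repeated string is stored once and rows hold small integer codes.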

Print Memory Usage:

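Use df.info(memory_usage='deep') to print the memory usage of the DataFrame after optimization. The 'deep' setting makes Pandas measure the actual contents of text columns instead of estimating from the column count and data types alone.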
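Because df.info() writes its report directly and returns None, wrapping it in print() adds a stray "None" line. The sketch below, built on a small made-up DataFrame, shows that behavior, how to capture the report as text with the buf parameter, and how to get per-column byte counts with memory_usage().

```python
import io
import pandas as pd

# Small hypothetical DataFrame with already-optimized types
df = pd.DataFrame({
    'column1': pd.Series([1, 2, 3], dtype='int32'),
    'column3': pd.Series(['a', 'b', 'a'], dtype='category'),
})

# info() prints the report to standard output and returns None
result = df.info(memory_usage='deep')

# To keep the report as a string, pass a text buffer
buffer = io.StringIO()
df.info(memory_usage='deep', buf=buffer)
report = buffer.getvalue()

# Per-column memory in bytes (includes an 'Index' entry)
per_column = df.memory_usage(deep=True)
print(per_column)
```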

For more practice, solve these related problems:

Write a Pandas program to load a large CSV file by explicitly specifying data types for each column and measure memory usage.
Write a Pandas program to compare memory consumption when reading a CSV with default settings versus with optimized data types.
Write a Pandas program to load a CSV file and use the memory_usage() method to quantify the benefits of specifying data types.
Write a Pandas program to implement dtype specifications during CSV import and evaluate the impact on processing speed and memory.
