w3resource

Optimize reading large Excel files with Pandas


Pandas: Performance Optimization Exercise-20 with Solution


Write a Pandas program to optimize the performance of reading a large Excel file into a DataFrame by specifying data types and using the 'usecols' parameter.

Sample Solution :

Python Code :

# Import necessary libraries
import pandas as pd

# Specify the path to the large Excel file
file_path = 'large_excel_file.xlsx'

# Define the data types for the columns
dtypes = {
    'Column1': 'int64',
    'Column2': 'float64',
    'Column3': 'object',
    # Add more column types as needed
}

# Define the columns to read
usecols = ['Column1', 'Column2', 'Column3']

# Use read_excel with specified data types and columns to read
df = pd.read_excel(file_path, dtype=dtypes, usecols=usecols)

# Display the DataFrame
print(df.head())

Output:

   Column1  Column2 Column3
0        1      1.1    abcd
1        2      1.2    abcd
2        3      1.3    abcd
3        4      1.4    abcd
4        5      1.5    abcd

Explanation:

  • Import necessary libraries:
    • Import pandas.
  • Specify the path to the large Excel file:
    • The file path is set to a variable.
  • Define the data types for the columns:
    • A dictionary specifying the data types for each column.
  • Define the columns to read:
    • A list of column names to be read from the Excel file.
  • Read the Excel file:
    • Use pd.read_excel with the dtype and usecols parameters to optimize performance.
  • Display the DataFrame:
    • Print the first few rows of the DataFrame to confirm successful loading.

Python-Pandas Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Efficiently apply multiple Aggregation functions in Pandas.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.