Replacing missing values with column mean in Pandas DataFrame

Last update on December 21 2024 07:43:16 (UTC/GMT +8 hours)

Replace missing values in a Pandas DataFrame with the mean of the column.

Sample Solution:

Python Code:

import pandas as pd
import numpy as np

# Create a sample DataFrame with missing values
data = {'A': [1, 2, np.nan, 4, 5],
        'B': [10, np.nan, 30, 40, 50],
        'C': [100, 200, 300, np.nan, 500],
        'D': [1000, 2000, 3000, 4000, np.nan]}

df = pd.DataFrame(data)

# Replace missing values with the mean of each column
df_filled = df.fillna(df.mean())

# Display the DataFrame with missing values replaced
print(df_filled)

Output:

     A     B      C       D
0  1.0  10.0  100.0  1000.0
1  2.0  32.5  200.0  2000.0
2  3.0  30.0  300.0  3000.0
3  4.0  40.0  275.0  4000.0
4  5.0  50.0  500.0  2500.0
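To see where the filled values come from, you can print the Series that df.mean() returns before passing it to fillna. The sketch below reuses the sample data and adds a couple of checks. The names col_means and missing are illustrative only. Each mean is taken over the four values that are present in that column.

```python
import numpy as np
import pandas as pd

# Same sample data as in the solution above
data = {'A': [1, 2, np.nan, 4, 5],
        'B': [10, np.nan, 30, 40, 50],
        'C': [100, 200, 300, np.nan, 500],
        'D': [1000, 2000, 3000, 4000, np.nan]}
df = pd.DataFrame(data)

# df.mean() skips NaN by default (skipna=True), e.g. A: (1 + 2 + 4 + 5) / 4 = 3.0
col_means = df.mean()
print(col_means)

df_filled = df.fillna(col_means)

# Boolean mask of the cells that were missing in the original DataFrame
missing = df.isna()

# Every previously missing cell should now hold its own column's mean
for col in df.columns:
    assert (df_filled.loc[missing[col], col] == col_means[col]).all()

# And no missing values should remain anywhere
assert not df_filled.isna().any().any()
```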

Explanation:

In the exercise above:

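A sample DataFrame (df) is created with some missing values (represented by np.nan).
df.mean() calculates the mean of each column. NaN values are skipped by default (skipna=True), so each mean is computed from the values that are present.
df.fillna(df.mean()) replaces the missing values in each column with that column's mean. Because df.mean() returns a Series indexed by column name, fillna matches each column to its own mean.
The result is a new DataFrame (df_filled) in which every missing value has been replaced. The original df is left unchanged, because fillna returns a new object unless inplace=True is passed.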
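The sample data is entirely numeric. If a DataFrame also contains a text column, df.mean() in recent pandas versions (2.x) raises a TypeError instead of ignoring that column. The following is a minimal sketch, assuming a hypothetical DataFrame with columns name, score and age. It passes numeric_only=True so that means are computed only for numeric columns. Columns that are not in the resulting Series, such as name here, are not touched by fillna.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one text column plus numeric columns with gaps
df = pd.DataFrame({'name': ['a', 'b', 'c', 'd'],
                   'score': [80.0, np.nan, 90.0, 70.0],
                   'age': [20, 30, np.nan, 40]})

# numeric_only=True restricts the calculation to numeric columns,
# so the string column 'name' does not cause a TypeError
means = df.mean(numeric_only=True)

# fillna only fills columns that appear in the index of `means`
df_filled = df.fillna(means)
print(df_filled)
```

Here score is filled with (80 + 90 + 70) / 3 = 80.0 and age with (20 + 30 + 40) / 3 = 30.0.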
