Removing Outliers from a Dataset Using the Z-Score Method in Pandas

Last update on May 06 2025 13:19:32 (UTC/GMT +8 hours)

10. Removing Outliers from a Dataset

Write a Pandas program that removes outliers from a dataset.

This exercise demonstrates how to remove outliers from a dataset using the Z-score method.
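As a reminder of what the method computes, the Z-score of a value is its distance from the column mean, measured in standard deviations. The sketch below works this out by hand with pandas. The ages are made up for illustration and are not taken from data.csv; the threshold of 3 is the one used in this exercise.

```python
import pandas as pd

# Made-up ages for illustration only; 120 is the planted extreme value.
ages = pd.Series([22, 25, 27, 24, 26, 23, 25, 24, 26, 25,
                  23, 27, 24, 26, 25, 24, 26, 25, 23, 120])

# Z-score = (value - mean) / standard deviation.
# ddof=0 (population standard deviation) matches the default of scipy.stats.zscore.
z = (ages - ages.mean()) / ages.std(ddof=0)

# Values whose absolute Z-score is 3 or more are treated as outliers.
outliers = ages[z.abs() >= 3].tolist()
print(outliers)  # [120]
```

Note that pandas uses ddof=1 by default in std(), so ddof=0 is passed explicitly here to agree with stats.zscore().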

Sample Solution:

Code:

import pandas as pd
import numpy as np
from scipy import stats

# Load the dataset
df = pd.read_csv('data.csv')

# Compute the absolute Z-score of every value in the 'Age' column
z_scores = np.abs(stats.zscore(df['Age']))
df_cleaned = df[z_scores < 3]  # Keep rows whose absolute Z-score is below 3

# Output the cleaned dataset
print(df_cleaned)

Output:

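Empty DataFrame
Columns: [ID, Name, Age, Gender, Salary, Target]
Index: []

Note: An empty result means that no row satisfied z_scores < 3. This typically happens when the 'Age' column contains missing values: stats.zscore() propagates NaN by default, so every Z-score becomes NaN and each comparison evaluates to False.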

Explanation:

Loaded the dataset using Pandas.
Calculated the Z-scores of the 'Age' column using stats.zscore().
Kept only the rows whose absolute Z-score was below 3; rows with a score of 3 or more were removed as outliers.
Displayed the cleaned dataset.
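If the 'Age' column contains missing values, the steps above can filter out every row, because a single NaN turns the mean, and therefore every Z-score, into NaN. The following is a minimal sketch of one way to handle this, not a definitive fix. It assumes a small made-up DataFrame in place of data.csv and computes the Z-scores with pandas, which skips NaN when calculating the mean and standard deviation.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for data.csv:
# one Age is extreme (120) and one is missing (NaN).
df = pd.DataFrame({
    "ID": range(1, 22),
    "Age": [22, 25, 27, 24, 26, 23, 25, 24, 26, 25,
            23, 27, 24, 26, 25, 24, 26, 25, 23, 120, np.nan],
})
age = df["Age"]

# Naive NumPy version: np.mean() and np.std() return NaN when any value is NaN,
# so every Z-score is NaN and no row passes the filter.
raw = age.to_numpy()
naive_z = (raw - np.mean(raw)) / np.std(raw)
df_naive = df[np.abs(naive_z) < 3]
print(len(df_naive))  # 0

# NaN-aware version: pandas skips NaN in mean() and std().
# ddof=0 matches the default of scipy.stats.zscore.
z_scores = (age - age.mean()) / age.std(ddof=0)

# The missing Age still has a NaN Z-score, which compares as False,
# so that row is dropped along with the outlier.
within_range = z_scores.abs() < 3
df_cleaned = df[within_range]
print(len(df_cleaned))  # 19

# To keep rows with a missing Age instead of dropping them:
df_keep_missing = df[within_range | age.isna()]
print(len(df_keep_missing))  # 20
```

Whether rows with a missing 'Age' should be dropped or kept depends on the later steps of the pipeline; if they are kept, the missing values can be imputed afterwards.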

For more practice, solve these related problems:

Write a Pandas program to remove outliers using the IQR method and output the cleaned dataset.
Write a Pandas program to detect and remove outliers based on Z-score thresholds from a numerical column.
Write a Pandas program to remove outliers from a dataset and then compare the statistical summaries before and after removal.
Write a Pandas program to identify outliers using multiple methods and remove only those that meet all criteria.
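As a possible starting point for the first problem, here is a minimal sketch of the IQR method. The data is made up for illustration, and the multiplier of 1.5 is the conventional choice for this rule rather than something specified by the exercise.

```python
import pandas as pd

# Made-up ages for illustration only; 120 is the planted extreme value.
df = pd.DataFrame({"Age": [22, 25, 27, 24, 26, 23, 25, 24, 26, 120]})

# Interquartile range: the spread of the middle 50% of the values.
q1 = df["Age"].quantile(0.25)
q3 = df["Age"].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows that fall inside the fences (bounds included).
df_iqr = df[df["Age"].between(lower, upper)]
print(df_iqr["Age"].tolist())
```

Unlike the Z-score, the fences are built from quartiles, so a single extreme value has little effect on where they fall.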
