w3resource

Removing Outliers from a Dataset Using the Z-Score Method in Pandas


10. Removing Outliers from a Dataset

Write a Pandas program that removes outliers from a dataset.

This exercise demonstrates how to remove outliers from a dataset using the Z-score method.
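
A Z-score expresses how many standard deviations a value lies from the mean of its column. As a minimal sketch of that calculation, the snippet below computes Z-scores by hand with NumPy. The sample values are hypothetical and only illustrate the formula; ddof=0 is used because that is the default in stats.zscore().

```python
import numpy as np

# Hypothetical sample values, used only for illustration
ages = np.array([20.0, 30.0, 40.0, 50.0, 60.0])

# Z-score: distance from the mean, measured in standard deviations.
# ddof=0 (population standard deviation) matches the stats.zscore() default.
mean = ages.mean()
std = ages.std(ddof=0)
z_scores = (ages - mean) / std

print(z_scores)
```

Values close to the mean get a Z-score near 0, and the sign shows whether a value is below or above the mean.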

Sample Solution:

Code:

import pandas as pd
import numpy as np
from scipy import stats

# Load the dataset
df = pd.read_csv('data.csv')

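# Remove outliers from the 'Age' column using Z-scores
# Note: stats.zscore() propagates NaN by default, so 'Age' must not contain missing values
z_scores = np.abs(stats.zscore(df['Age']))
df_cleaned = df[z_scores < 3]  # Keep rows whose absolute Z-score is less than 3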

# Output the cleaned dataset
print(df_cleaned)

Output:

Empty DataFrame
Columns: [ID, Name, Age, Gender, Salary, Target]
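Index: []

An empty DataFrame means that no row satisfied z_scores < 3. The usual cause is missing data: stats.zscore() propagates NaN by default, so if 'Age' contains even one missing value, every Z-score is NaN, and NaN < 3 evaluates to False for every row.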

Explanation:

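  • Loaded the dataset from 'data.csv' using pd.read_csv().
  • Calculated the absolute Z-scores of the 'Age' column using stats.zscore() and np.abs().
  • Kept only the rows whose absolute Z-score was less than 3; rows at or beyond 3 standard deviations from the mean were treated as outliers and removed.
  • Displayed the cleaned dataset.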
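
The same steps can be sketched in a way that tolerates missing values. This is one possible variant, not the exercise's own solution: it computes the Z-scores with pandas, whose mean() and std() skip NaN by default. The small DataFrame is hypothetical and stands in for data.csv, whose contents are not shown here.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for data.csv: 20 typical ages,
# one extreme value, and one missing value
ages = [25, 30, 35, 40] * 5 + [400, np.nan]
df = pd.DataFrame({'ID': range(1, len(ages) + 1), 'Age': ages})

# pandas skips NaN when computing mean() and std();
# ddof=0 matches the default used by stats.zscore()
age = df['Age']
z_scores = (age - age.mean()) / age.std(ddof=0)

# A row with a missing 'Age' keeps a NaN Z-score, and NaN < 3 is False,
# so that row is dropped together with the outliers
df_cleaned = df[z_scores.abs() < 3]

print(len(df), len(df_cleaned))
print(df_cleaned)
```

If rows with a missing 'Age' should be kept instead, the mask can be widened to `(z_scores.abs() < 3) | age.isna()`.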

For more Practice: Solve these Related Problems:

  • Write a Pandas program to remove outliers using the IQR method and output the cleaned dataset.
  • Write a Pandas program to detect and remove outliers based on Z-score thresholds from a numerical column.
  • Write a Pandas program to remove outliers from a dataset and then compare the statistical summaries before and after removal.
  • Write a Pandas program to identify outliers using multiple methods and remove only those that meet all criteria.
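
As a starting point for the first and third problems, here is a minimal sketch of the IQR method with a before-and-after summary. The data is hypothetical, and the 1.5 multiplier is the conventional choice rather than something fixed by the exercise.

```python
import pandas as pd

# Hypothetical data used only for illustration
df = pd.DataFrame({'Age': [22, 25, 27, 30, 31, 33, 35, 38, 40, 120]})

# Interquartile range of the 'Age' column
q1 = df['Age'].quantile(0.25)
q3 = df['Age'].quantile(0.75)
iqr = q3 - q1

# Conventional fences: 1.5 * IQR below Q1 and above Q3
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep rows inside the fences (between() is inclusive by default)
df_iqr = df[df['Age'].between(lower, upper)]

# Compare the statistical summaries before and after removal
print(df['Age'].describe())
print(df_iqr['Age'].describe())
```

Unlike the Z-score, the IQR fences are based on quartiles, so a single extreme value has little influence on where the cut-offs fall.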
