w3resource

Removing Outliers from a Dataset Using the Z-Score Method in Pandas


Pandas: Machine Learning Integration Exercise-10 with Solution


Write a Pandas program that removes outliers from a dataset.

This exercise demonstrates how to remove outliers from a dataset using the Z-score method.

Sample Solution:

Code:

import pandas as pd
import numpy as np
from scipy import stats

# Load the dataset
df = pd.read_csv('data.csv')

# Compute the absolute Z-score of each value in the 'Age' column
z_scores = np.abs(stats.zscore(df['Age']))
# Keep only rows whose absolute Z-score is below 3; rows at or above 3 are treated as outliers
df_cleaned = df[z_scores < 3]

# Output the cleaned dataset
print(df_cleaned)

Output:

Empty DataFrame
Columns: [ID, Name, Age, Gender, Salary, Target]
Index: []
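
An empty result like this means that no row passed the filter. Since data.csv is not shown, the cause can only be guessed at, but one plausible explanation is a missing value in 'Age': stats.zscore() propagates NaN by default, so every Z-score becomes NaN and every comparison with 3 evaluates to False. The sketch below reproduces that behaviour on a small, made-up DataFrame (the values are assumptions, not the exercise's data) and shows one possible workaround using nan_policy='omit'.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 30 ordinary ages, one extreme value, one missing value.
ages = [20.0 + (i % 10) for i in range(30)] + [500.0, np.nan]
df = pd.DataFrame({"Age": ages})
age_values = df["Age"].to_numpy()

# Default nan_policy='propagate': a single NaN makes every Z-score NaN,
# so the comparison below is False for every row.
z_default = np.abs(stats.zscore(age_values))
empty_result = df[z_default < 3]
print(len(empty_result))

# nan_policy='omit': NaN is ignored when computing the mean and standard
# deviation. The missing row still gets a NaN Z-score and is filtered out.
z_omit = np.abs(stats.zscore(age_values, nan_policy="omit"))
df_cleaned = df[z_omit < 3]
print(len(df_cleaned))
```

With this approach, rows with a missing 'Age' are dropped along with the outliers. If those rows should be kept, the missing values would need to be handled separately before filtering.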

Explanation:

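  • Loaded the dataset from 'data.csv' using pd.read_csv().
  • Calculated the absolute Z-score of each value in the 'Age' column using stats.zscore() and np.abs().
  • Kept only the rows whose absolute Z-score is less than 3; rows with an absolute Z-score of 3 or more are treated as outliers and removed.
  • Displayed the cleaned dataset.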
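
The steps above can be sketched on a small, made-up dataset. The column values here are assumptions, since data.csv is not shown. The sketch computes the Z-score by hand as z = (x - mean) / std, using the population standard deviation (ddof=0, which matches the default of stats.zscore()), and then applies the same threshold of 3.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: 30 typical ages plus one extreme value (120).
ages = [25, 27, 29, 31, 33] * 6 + [120]
df = pd.DataFrame({"ID": range(1, len(ages) + 1), "Age": ages})

# Z-score by hand, using the population standard deviation (ddof=0).
age = df["Age"]
z_manual = (age - age.mean()) / age.std(ddof=0)

# The same quantity from SciPy, for comparison.
z_scipy = stats.zscore(age.to_numpy())
print(np.allclose(z_manual.to_numpy(), z_scipy))

# Keep rows whose absolute Z-score is below the threshold of 3.
mask = z_manual.abs() < 3
df_cleaned = df[mask]

print(df_cleaned.shape)               # shape of the cleaned data
print(df.loc[~mask, "Age"].tolist())  # values that were removed
```

One caveat when choosing the threshold: with ddof=0, the largest absolute Z-score a single value can reach in a sample of n points is (n - 1) / sqrt(n). For 10 or fewer rows that bound is below 3, so a threshold of 3 can never flag anything in such a small sample.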
