w3resource

Pandas - Detecting and removing outliers in a DataFrame using Z-score


Pandas: Data Cleaning and Preprocessing Exercise-5 with Solution


Write a Pandas program to handle outliers in a DataFrame using the Z-score method.

This exercise demonstrates how to identify and remove outliers from a DataFrame using the Z-score method.

Sample Solution:

Code:

import pandas as pd

# Create a sample DataFrame with outliers
df = pd.DataFrame({
    'Name': ['David', 'Annabel', 'Charlie', 'David'],
    'Age': [25, 30, 22, 99]  # 99 is the intended outlier
})

# Calculate Z-scores to identify outliers
mean_age = df['Age'].mean()
std_age = df['Age'].std()
df['Z_Score'] = (df['Age'] - mean_age) / std_age

# Keep rows whose absolute Z-score is at most 2; rows beyond that are treated as outliers
df_no_outliers = df[df['Z_Score'].abs() <= 2]

# Drop the Z_Score column
df_no_outliers = df_no_outliers.drop(columns='Z_Score')

# Output the result
print(df_no_outliers)

Output:

      Name  Age
0    David   25
1  Annabel   30
2  Charlie   22
3    David   99
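
The row with Age 99 is still present in the output. The following minimal sketch reuses the sample ages from the solution to check why: it prints each Z-score and the largest absolute Z-score that n values can reach, (n - 1) / sqrt(n), when the sample standard deviation is used. The names ages, z_scores, and max_possible_z are introduced here for illustration only.

```python
import math

import pandas as pd

# Same ages as in the sample solution
ages = pd.Series([25, 30, 22, 99], name='Age')

# Series.std() uses the sample standard deviation (ddof=1) by default
z_scores = (ages - ages.mean()) / ages.std()

# Upper bound on |Z| for n values when ddof=1 is used
n = len(ages)
max_possible_z = (n - 1) / math.sqrt(n)

print(z_scores.round(2).tolist())
print(max_possible_z)
```

For four values the bound is 1.5, and the Z-score of 99 is about 1.49, so a threshold of 2 cannot flag any row in a sample this small.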

Explanation:

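  • Created a DataFrame whose 'Age' column contains one unusually large value (99).
  • Calculated each Z-score as (Age - mean) / standard deviation. Series.std() returns the sample standard deviation (ddof=1) by default.
  • Kept only the rows whose absolute Z-score is at most 2. In this four-row sample the mean is 44 and the standard deviation is about 36.8, so the Z-score of 99 is about 1.49. No row exceeds the threshold, which is why the output still lists all four rows.
  • Dropped the helper Z_Score column before printing the result.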
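
The steps above can be wrapped in a reusable function. This is a minimal sketch and not part of the original solution: the function name remove_outliers_zscore, its threshold parameter, and the eight-row sample are assumptions made for illustration. A larger sample is used because a threshold of 2 can only be exceeded when the column has enough values.

```python
import pandas as pd


def remove_outliers_zscore(df, column, threshold=2.0):
    """Return the rows of df whose Z-score in `column` is within the threshold.

    Hypothetical helper for illustration, not a pandas API.
    """
    # Sample standard deviation (ddof=1), the default of Series.std()
    z_scores = (df[column] - df[column].mean()) / df[column].std()
    # Filtering with a boolean mask leaves the input DataFrame unmodified
    return df[z_scores.abs() <= threshold]


# Illustrative sample (made-up values) with one extreme age
sample = pd.DataFrame({
    'Age': [25, 30, 22, 28, 27, 24, 26, 99]
})

cleaned = remove_outliers_zscore(sample, 'Age')
print(cleaned['Age'].tolist())
```

In this sample the Z-score of 99 is about 2.46, so that row is dropped and the other seven ages are kept. Because the Z-scores are held in a separate Series, no helper column has to be added to the DataFrame and dropped afterwards.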
