Removing Outliers from a Dataset Using the Z-Score Method in Pandas
10. Removing Outliers from a Dataset
Write a Pandas program that removes outliers from a dataset.
This exercise demonstrates how to remove outliers from a dataset using the Z-score method.
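As background, a Z-score measures how far a value lies from the mean, in units of standard deviation. The short sketch below computes it by hand and checks it against stats.zscore(). The sample values are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical sample values, used only for illustration
ages = np.array([22.0, 25.0, 27.0, 30.0, 31.0])

# Z-score: distance from the mean, measured in standard deviations.
# ndarray.std() uses the population standard deviation (ddof=0) by default.
manual_z = (ages - ages.mean()) / ages.std()

# scipy.stats.zscore uses the same ddof=0 convention by default
scipy_z = stats.zscore(ages)

print(np.round(manual_z, 3))
print(np.allclose(manual_z, scipy_z))
```

A common rule of thumb, and the one used in this exercise, treats values with an absolute Z-score of 3 or more as outliers.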
Sample Solution:
Code:
import pandas as pd
import numpy as np
from scipy import stats
# Load the dataset
df = pd.read_csv('data.csv')
# Compute the absolute Z-score of each value in the 'Age' column
z_scores = np.abs(stats.zscore(df['Age']))
# Keep only rows whose absolute Z-score is below 3
df_cleaned = df[z_scores < 3]
# Output the cleaned dataset
print(df_cleaned)
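Output:
Empty DataFrame
Columns: [ID, Name, Age, Gender, Salary, Target]
Index: []
Note: An empty result like this typically means the 'Age' column contains missing values. By default, stats.zscore() then returns NaN for every row, and no NaN passes the < 3 comparison, so every row is dropped.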
Explanation:
- Loaded the dataset using Pandas.
- Calculated the absolute Z-scores of the 'Age' column using stats.zscore() and np.abs().
- Kept only the rows whose absolute Z-score was below 3; rows at or above that threshold were treated as outliers and removed.
- Displayed the cleaned dataset.
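The steps above can be made robust to missing values with a small helper. This is a minimal sketch, not the original solution: the function name remove_outliers_zscore, its threshold parameter, and the sample data are all hypothetical. It relies on the fact that pandas' mean() and std() skip NaN by default. Passing nan_policy='omit' to stats.zscore() is another option.

```python
import numpy as np
import pandas as pd

def remove_outliers_zscore(df, column, threshold=3.0):
    """Return rows whose absolute Z-score in `column` is below `threshold`.

    Rows with a missing value in `column` are dropped as well, because
    their Z-score is NaN and NaN never satisfies the comparison.
    """
    values = df[column]
    # mean() and std() skip NaN by default; ddof=0 matches scipy.stats.zscore
    z = (values - values.mean()) / values.std(ddof=0)
    return df[z.abs() < threshold]

# Hypothetical data: 19 plausible ages, one missing value, one extreme value
ages = [25, 27, 29, 31, 33, 35, 37, 39, 41, 43,
        26, 28, 30, 32, 34, 36, 38, 40, 42, np.nan, 400]
sample = pd.DataFrame({'ID': range(1, len(ages) + 1), 'Age': ages})

cleaned = remove_outliers_zscore(sample, 'Age')
print(len(sample), len(cleaned))
```

Here the row with Age 400 and the row with the missing Age are both removed. To keep rows with missing values instead, the mask could be widened to (z.abs() < threshold) | values.isna(). Note that the Z-score method needs a reasonably large sample: with very few rows, no value can reach an absolute Z-score of 3.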
For more Practice: Solve these Related Problems:
- Write a Pandas program to remove outliers using the IQR method and output the cleaned dataset.
- Write a Pandas program to detect and remove outliers based on Z-score thresholds from a numerical column.
- Write a Pandas program to remove outliers from a dataset and then compare the statistical summaries before and after removal.
- Write a Pandas program to identify outliers using multiple methods and remove only those that meet all criteria.
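As a starting point for the first and third problems, here is one possible sketch of the IQR method with a before/after summary. The helper name remove_outliers_iqr, the multiplier k=1.5 (a common convention, assumed here), and the sample data are hypothetical.

```python
import pandas as pd

def remove_outliers_iqr(df, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = df[column].quantile(0.25)
    q3 = df[column].quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # between() is inclusive on both ends by default
    return df[df[column].between(lower, upper)]

# Hypothetical data with one extreme value
sample = pd.DataFrame({'Age': [22, 25, 27, 30, 31, 33, 35, 38, 40, 120]})
cleaned = remove_outliers_iqr(sample, 'Age')

# Compare the statistical summaries before and after removal
print(sample['Age'].describe())
print(cleaned['Age'].describe())
```

Unlike the Z-score method, the IQR fences are based on quartiles, so a single extreme value has little influence on where the cutoffs fall.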