Removing Outliers from a Dataset Using the Z-Score Method in Pandas
Pandas: Machine Learning Integration Exercise-10 with Solution
Write a Pandas program that removes outliers from a dataset.
This exercise demonstrates how to remove outliers from a dataset using the Z-score method.
Sample Solution:
Code:
import pandas as pd
import numpy as np
from scipy import stats
# Load the dataset
df = pd.read_csv('data.csv')
# Remove outliers from the 'Age' column using Z-scores
z_scores = np.abs(stats.zscore(df['Age']))
df_cleaned = df[z_scores < 3]  # Keep rows whose absolute Z-score is below 3
# Output the cleaned dataset
print(df_cleaned)
Output:
Empty DataFrame
Columns: [ID, Name, Age, Gender, Salary, Target]
Index: []
Explanation:
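- Loaded the dataset from data.csv using Pandas.
- Calculated the absolute Z-scores of the 'Age' column using stats.zscore() and np.abs().
- Kept only the rows whose absolute Z-score is less than 3; rows at or above that threshold are treated as outliers and removed.
- Displayed the cleaned dataset.
- Note that the output shown is an empty DataFrame, which means no row passed the filter. One likely cause (an assumption, since the contents of data.csv are not shown) is a missing value in 'Age': by default stats.zscore() propagates NaN, so every Z-score becomes NaN and the comparison with 3 is False for every row.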
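The steps above can be sketched in a NaN-tolerant form. This is a minimal illustration, not the exercise's official solution: the sample data stands in for data.csv, the helper name remove_outliers_zscore is hypothetical, and keeping rows with a missing 'Age' is an assumed design choice. The first part shows how a single NaN empties the result with the default stats.zscore() behaviour; the second computes the Z-scores with Pandas, which skips NaN in mean() and std().

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical sample data standing in for data.csv; not the exercise's dataset.
df = pd.DataFrame({
    "ID": range(1, 23),
    "Age": [float(a) for a in range(22, 42)] + [np.nan, 150.0],
})

# With the default nan_policy='propagate', a single NaN turns every Z-score
# into NaN, and NaN < 3 is False, so the mask keeps nothing.
naive_mask = np.abs(stats.zscore(df["Age"].to_numpy())) < 3
print("rows kept by naive filter:", int(naive_mask.sum()))


def remove_outliers_zscore(frame, column, threshold=3.0):
    """Hypothetical helper: drop rows whose |Z-score| in `column` >= threshold.

    Rows with a missing value in `column` are kept (an assumption; call
    dropna() first if discarding them suits the analysis better).
    """
    values = frame[column]
    # pandas skips NaN in mean() and std(); ddof=0 matches stats.zscore's default.
    z = (values - values.mean()) / values.std(ddof=0)
    return frame[z.abs().lt(threshold) | values.isna()]


df_cleaned = remove_outliers_zscore(df, "Age")
print(df_cleaned)
```

In this sample, only the row with an age of 150 is removed. If rows with a missing age should also be discarded, dropping them with df.dropna(subset=['Age']) before filtering is a simple alternative.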