Handling class imbalance using random oversampling in Pandas
Pandas: Machine Learning Integration Exercise-13 with Solution
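Write a Pandas program to handle class imbalance using random oversampling.
This exercise shows how to handle class imbalance with random oversampling, using the RandomOverSampler class from the imbalanced-learn library (imblearn). Random oversampling balances the classes by duplicating randomly chosen rows of the smaller class(es) until they are as large as the majority class.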
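To make the idea concrete, here is a minimal pandas-only sketch of random oversampling. It assumes a single target column and a small hypothetical dataset. The helper random_oversample() is illustrative and is not part of pandas or imbalanced-learn. For each class it keeps every original row and adds randomly drawn duplicates (sampling with replacement) until that class is as large as the majority class.

```python
import pandas as pd

def random_oversample(df: pd.DataFrame, target: str, random_state: int = 42) -> pd.DataFrame:
    """Hypothetical helper: duplicate random rows of each smaller class
    (sampling with replacement) until every class matches the largest one."""
    counts = df[target].value_counts()
    majority_size = counts.max()
    parts = []
    for label, size in counts.items():
        rows = df[df[target] == label]
        # Keep all original rows, then add (majority_size - size) random duplicates.
        extra = rows.sample(n=majority_size - size, replace=True, random_state=random_state)
        parts.append(pd.concat([rows, extra]))
    return pd.concat(parts, ignore_index=True)

# Hypothetical imbalanced toy data: four rows of class 0, one row of class 1.
toy = pd.DataFrame({"Age": [25, 30, 22, 35, 29], "Target": [0, 0, 0, 0, 1]})
balanced = random_oversample(toy, "Target")
# Class 0 and class 1 now both have 4 rows (8 rows in total).
print(balanced["Target"].value_counts().sort_index())
```

This sketch groups rows by class and uses its own random draws. As a result, the row order and the specific duplicated rows will differ from what RandomOverSampler produces. Only the resulting class counts should agree.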
Sample Solution:
Code:
import pandas as pd
from imblearn.over_sampling import RandomOverSampler
# Load the dataset
df = pd.read_csv('data.csv')
# Split into features and target
X = df.drop('Target', axis=1)
y = df['Target']
# Initialize the RandomOverSampler
ros = RandomOverSampler(random_state=42)
# Apply random oversampling to balance the target classes
X_resampled, y_resampled = ros.fit_resample(X, y)
# Output the resampled dataset
print(pd.concat([X_resampled, y_resampled], axis=1))
Output:
   ID      Name   Age  Gender   Salary  Target
0   1      Sara  25.0  Female  50000.0       0
1   2    Ophrah  30.0    Male  60000.0       1
2   3    Torben  22.0    Male  70000.0       0
3   4  Masaharu  35.0    Male  80000.0       1
4   5      Kaya   NaN  Female  55000.0       0
5   6   Abaddon  29.0    Male      NaN       1
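The printed output has six distinct IDs, split evenly between Target 0 and Target 1. This suggests the loaded data was already balanced, so nothing needed to be oversampled. The check below rebuilds those six rows in memory, so no data.csv is needed, and counts the classes:

```python
import numpy as np
import pandas as pd

# The six rows shown in the output, rebuilt in memory instead of read from data.csv.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Name": ["Sara", "Ophrah", "Torben", "Masaharu", "Kaya", "Abaddon"],
    "Age": [25.0, 30.0, 22.0, 35.0, np.nan, 29.0],
    "Gender": ["Female", "Male", "Male", "Male", "Female", "Male"],
    "Salary": [50000.0, 60000.0, 70000.0, 80000.0, 55000.0, np.nan],
    "Target": [0, 1, 0, 1, 0, 1],
})

class_counts = df["Target"].value_counts()
# Both classes have three rows, so there is no minority class to oversample.
print(class_counts)
```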
Explanation:
- Loaded the dataset using Pandas.
- Split the data into features (X) and target (y).
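- Initialized RandomOverSampler from imbalanced-learn, with random_state=42 for reproducible results, to balance the dataset by randomly duplicating rows of the minority class.
- Applied oversampling with fit_resample() and printed the resampled features and target side by side. In this sample data the two classes are already balanced (three rows each), so no rows are duplicated and the output contains only the original six rows.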
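With its default sampling_strategy='auto', RandomOverSampler resamples every class except the majority class up to the size of the majority class. The number of duplicates it adds per class is therefore the gap to the largest class, which can be computed with value_counts() before resampling. The short sketch below uses a hypothetical 90/10 target column:

```python
import pandas as pd

# Hypothetical imbalanced target: 90 rows of class 0, 10 rows of class 1.
y = pd.Series([0] * 90 + [1] * 10, name="Target")

counts = y.value_counts()
# Rows to duplicate per class so that each class matches the largest one.
to_add = counts.max() - counts
# After oversampling, every class has counts.max() rows.
expected_total = counts.max() * len(counts)

print(to_add.sort_index())  # class 0 needs 0 extra rows, class 1 needs 80
print(expected_total)       # 180 rows after oversampling
```

For the sample data in this exercise, the same calculation gives zero extra rows for both classes. That is consistent with the unchanged six-row output.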