w3resource

Handling class imbalance using random oversampling in Pandas


Pandas: Machine Learning Integration Exercise-13 with Solution


Write a Pandas program to handle class imbalance using random oversampling.

This exercise shows how to handle class imbalance by random oversampling with the RandomOverSampler class from Imbalanced-learn.

Sample Solution:

Code:

import pandas as pd
from imblearn.over_sampling import RandomOverSampler

# Load the dataset
df = pd.read_csv('data.csv')

# Split into features and target
X = df.drop('Target', axis=1)
y = df['Target']

# Initialize the RandomOverSampler
ros = RandomOverSampler(random_state=42)

# Apply random oversampling to balance the target classes
X_resampled, y_resampled = ros.fit_resample(X, y)

# Output the resampled dataset
print(pd.concat([X_resampled, y_resampled], axis=1))

Output:

   ID      Name   Age  Gender   Salary  Target
0   1      Sara  25.0  Female  50000.0       0
1   2    Ophrah  30.0    Male  60000.0       1
2   3    Torben  22.0    Male  70000.0       0
3   4  Masaharu  35.0    Male  80000.0       1
4   5      Kaya   NaN  Female  55000.0       0
5   6   Abaddon  29.0    Male      NaN       1

Explanation:

  • Loaded the dataset using Pandas.
  • Split the data into features (X) and target (y).
  • Initialized RandomOverSampler from Imbalanced-learn, which balances the classes by randomly duplicating minority-class rows (sampling with replacement); random_state=42 makes the result reproducible.
  • Applied oversampling and displayed the resampled dataset.
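
To make the idea behind the steps above more concrete, here is a rough sketch of random oversampling written with Pandas alone. It is not Imbalanced-learn's implementation; the helper name random_oversample and the toy data are hypothetical and exist only for illustration. The helper keeps every original row and appends minority-class rows drawn with replacement until each class matches the largest one.

```python
import pandas as pd

def random_oversample(df, target_col, random_state=42):
    """Hypothetical helper: balance classes by duplicating rows of smaller classes."""
    counts = df[target_col].value_counts()
    max_count = counts.max()
    parts = [df]  # keep all original rows
    for label, count in counts.items():
        deficit = max_count - count
        if deficit > 0:
            # Draw the missing rows from this class, with replacement
            extra = df[df[target_col] == label].sample(
                n=deficit, replace=True, random_state=random_state
            )
            parts.append(extra)
    return pd.concat(parts, ignore_index=True)

# Made-up data: four rows of class 0, two rows of class 1
sample_df = pd.DataFrame({
    "Age": [25, 30, 22, 35, 28, 29],
    "Salary": [50000, 60000, 70000, 80000, 55000, 65000],
    "Target": [0, 0, 0, 0, 1, 1],
})

balanced = random_oversample(sample_df, "Target")
print(balanced["Target"].value_counts())
```

Because the added rows are exact copies of existing ones, oversampling is usually applied to the training split only, so that duplicated rows do not end up in both the training and the test data.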
