w3resource

Applying Log Transformation to Skewed Data using Pandas


Pandas: Machine Learning Integration Exercise-17 with Solution


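Write a Pandas program that applies a log transformation to skewed data.

This exercise shows how to apply a log transformation to right-skewed numerical data. The transformation compresses large values more than small ones, which reduces skewness and brings the distribution closer to normal.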

Sample Solution:

Code:

import pandas as pd
import numpy as np

# Load the dataset
df = pd.read_csv('data.csv')

# Apply log transformation to the 'Salary' column.
# np.log1p(x) computes log(1 + x), so a value of 0 maps to 0 instead of -inf.
df['Log_Salary'] = np.log1p(df['Salary'])

# Output the transformed dataset
print(df[['Salary', 'Log_Salary']])

Output:

    Salary  Log_Salary
0  50000.0   10.819798
1  60000.0   11.002117
2  70000.0   11.156265
3  80000.0   11.289794
4  55000.0   10.915107
5      NaN         NaN
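
The transformed values can be mapped back to the original scale, which is useful when a model is trained on the log scale but results need to be reported in salary units. The following is a minimal sketch, not part of the original solution: it rebuilds the sample rows from the output above in memory instead of reading 'data.csv', and the column name 'Recovered_Salary' is an illustrative assumption. It uses np.log1p(x), which is equivalent to np.log(x + 1), and its inverse np.expm1.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the output above (built in memory for illustration).
df = pd.DataFrame({"Salary": [50000.0, 60000.0, 70000.0, 80000.0, 55000.0, np.nan]})

# Forward transform: log(1 + x). NaN inputs stay NaN.
df["Log_Salary"] = np.log1p(df["Salary"])

# Inverse transform: exp(y) - 1 recovers the original scale.
# 'Recovered_Salary' is a hypothetical column name used only in this sketch.
df["Recovered_Salary"] = np.expm1(df["Log_Salary"])

# Compare the round trip, treating NaN in the same position as equal.
round_trip_ok = np.allclose(df["Salary"], df["Recovered_Salary"], equal_nan=True)

print(df)
print("Round trip matches:", round_trip_ok)
```

Note that the missing salary in the last row is not filled in by either step; it has to be imputed or dropped separately if the downstream code cannot handle NaN.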

Explanation:

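  • Loaded the dataset from 'data.csv' into a DataFrame with pd.read_csv().
  • Applied a log transformation to the 'Salary' column to reduce right skew, adding 1 first so that a value of 0 does not produce log(0).
  • Displayed the original and log-transformed columns side by side; the missing value in row 5 remains NaN after the transformation.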
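
One way to check whether the transformation actually reduced skewness is to compare Series.skew() before and after. The sketch below is an illustration under stated assumptions: it generates synthetic right-skewed data with NumPy's log-normal sampler rather than reading 'data.csv', and the distribution parameters are arbitrary choices, not values from the exercise.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed "salary" data (an assumption for illustration only;
# the exercise itself reads 'data.csv').
rng = np.random.default_rng(seed=0)
salary = pd.Series(rng.lognormal(mean=11.0, sigma=0.6, size=1000), name="Salary")

# Same transformation as in the solution: log(1 + x).
log_salary = np.log1p(salary)

# Sample skewness: values near 0 indicate a roughly symmetric distribution,
# positive values indicate a long right tail.
skew_before = salary.skew()
skew_after = log_salary.skew()

print(f"Skewness before: {skew_before:.3f}")
print(f"Skewness after:  {skew_after:.3f}")
```

If the skewness after the transformation is not clearly closer to 0, the column may not be right-skewed in the first place, and a log transformation is probably not the right tool for it.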
