Applying Log Transformation to Skewed Data Using Pandas
Pandas: Machine Learning Integration Exercise-17 with Solution
Write a Pandas program that applies Log Transformation to Skewed Data.
This exercise shows how to apply a log transformation to right-skewed numerical data. The transformation compresses large values, which typically brings the distribution closer to symmetric.
Sample Solution:
Code:
import pandas as pd
import numpy as np
# Load the dataset
df = pd.read_csv('data.csv')
# Apply log transformation to the 'Salary' column
df['Log_Salary'] = np.log(df['Salary'] + 1) # Adding 1 to avoid log(0)
# Output the transformed dataset
print(df[['Salary', 'Log_Salary']])
Output:
    Salary  Log_Salary
0  50000.0   10.819798
1  60000.0   11.002117
2  70000.0   11.156265
3  80000.0   11.289794
4  55000.0   10.915107
5      NaN         NaN
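The same transformation can also be written with NumPy's np.log1p, which computes log(1 + x) directly, and undone with np.expm1. The following is a minimal sketch, not part of the original solution: since data.csv is not available here, it assumes an in-memory DataFrame built from the salaries shown in the output above, and the 'Salary_Restored' column name is purely illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv, using the salaries from the sample output
df = pd.DataFrame(
    {"Salary": [50000.0, 60000.0, 70000.0, 80000.0, 55000.0, np.nan]}
)

# np.log1p(x) computes log(1 + x); missing values simply stay NaN
df["Log_Salary"] = np.log1p(df["Salary"])

# np.expm1(x) computes exp(x) - 1, which maps the values back to the original scale
df["Salary_Restored"] = np.expm1(df["Log_Salary"])

print(df)
```

Being able to invert the transformation matters when a model is trained on the log scale and its predictions need to be reported in the original units.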
Explanation:
- Loaded the dataset using Pandas.
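- Applied a log transformation, log(x + 1), to the 'Salary' column to reduce skewness; adding 1 keeps the result defined when a value is 0.
- Displayed the original and log-transformed columns; the missing value in row 5 remains NaN after the transformation.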
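To check whether the transformation actually reduced skewness, the skewness of the column can be compared before and after using pandas' Series.skew(). The sketch below assumes a small, made-up right-skewed salary sample (not the contents of data.csv); on real data the size of the improvement will vary, and a log transformation does not guarantee a normal distribution.

```python
import numpy as np
import pandas as pd

# Hypothetical right-skewed sample: most salaries are modest, one is very large
df = pd.DataFrame(
    {"Salary": [30000, 32000, 35000, 38000, 40000,
                45000, 50000, 60000, 90000, 250000]}
)

# Same transformation as in the solution: log(Salary + 1)
df["Log_Salary"] = np.log(df["Salary"] + 1)

# Series.skew() returns the sample skewness; values near 0 indicate symmetry
skew_before = df["Salary"].skew()
skew_after = df["Log_Salary"].skew()

print(f"Skewness before: {skew_before:.3f}")
print(f"Skewness after:  {skew_after:.3f}")
```

A positive skewness that moves closer to 0 after the transformation suggests the long right tail has been compressed.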