Applying polynomial features for feature expansion in Pandas
Pandas: Machine Learning Integration Exercise-14 with Solution
Write a Pandas program that applies polynomial features for feature expansion.
The following exercise shows how to expand the feature set by generating polynomial features with Scikit-learn's PolynomialFeatures.
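As a small illustration, here is a degree-2 expansion of one row with values a and b. The two-column array is made up for this sketch and is not taken from data.csv. The row becomes the columns 1, a, b, a², ab and b²:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical toy input: one row with a = 2 and b = 3
X = np.array([[2.0, 3.0]])

# Degree-2 expansion; the default include_bias=True adds a leading column of ones
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

print(X_poly)  # [[1. 2. 3. 4. 6. 9.]]
```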
Sample Solution:
Code:
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.impute import SimpleImputer
# Load the dataset
df = pd.read_csv('data.csv')
# Select only numeric columns for polynomial feature expansion
numeric_cols = df.select_dtypes(include=[float, int])
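# Impute missing values using the mean of each numeric column
imputer = SimpleImputer(strategy='mean')
# fit_transform() returns a NumPy array, so wrap it back into a DataFrame
# that keeps the original column names and row index
numeric_cols_imputed = pd.DataFrame(imputer.fit_transform(numeric_cols),
                                    columns=numeric_cols.columns,
                                    index=numeric_cols.index)
# Initialize PolynomialFeatures with degree 2
# (include_bias=True by default, which adds a leading column of ones)
poly = PolynomialFeatures(degree=2)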
# Apply polynomial feature expansion to imputed numeric data
X_poly = poly.fit_transform(numeric_cols_imputed)
# Output the expanded feature set
print(X_poly)
Output:
[[1.0000e+00 1.0000e+00 2.5000e+01 5.0000e+04 0.0000e+00 1.0000e+00 2.5000e+01 5.0000e+04 0.0000e+00 6.2500e+02 1.2500e+06 0.0000e+00 2.5000e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 2.0000e+00 3.0000e+01 6.0000e+04 1.0000e+00 4.0000e+00 6.0000e+01 1.2000e+05 2.0000e+00 9.0000e+02 1.8000e+06 3.0000e+01 3.6000e+09 6.0000e+04 1.0000e+00]
 [1.0000e+00 3.0000e+00 2.2000e+01 7.0000e+04 0.0000e+00 9.0000e+00 6.6000e+01 2.1000e+05 0.0000e+00 4.8400e+02 1.5400e+06 0.0000e+00 4.9000e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 4.0000e+00 3.5000e+01 8.0000e+04 1.0000e+00 1.6000e+01 1.4000e+02 3.2000e+05 4.0000e+00 1.2250e+03 2.8000e+06 3.5000e+01 6.4000e+09 8.0000e+04 1.0000e+00]
 [1.0000e+00 5.0000e+00 2.8200e+01 5.5000e+04 0.0000e+00 2.5000e+01 1.4100e+02 2.7500e+05 0.0000e+00 7.9524e+02 1.5510e+06 0.0000e+00 3.0250e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 6.0000e+00 2.9000e+01 6.3000e+04 1.0000e+00 3.6000e+01 1.7400e+02 3.7800e+05 6.0000e+00 8.4100e+02 1.8270e+06 2.9000e+01 3.9690e+09 6.3000e+04 1.0000e+00]]
Explanation:
- Import Libraries:
- pandas is imported for data manipulation.
- PolynomialFeatures from Scikit-learn is imported for polynomial feature expansion.
- SimpleImputer from Scikit-learn is imported to handle missing values (imputation).
- Load the Dataset:
- The dataset data.csv is loaded using pd.read_csv() into a DataFrame df.
- Select Numeric Columns:
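- select_dtypes(include=[float, int]) is used to keep only the numeric columns (such as Age and Salary) and exclude text columns such as Name. The output has 15 columns, which implies four numeric input columns: a degree-2 expansion of n = 4 features yields 1 + 4 + 10 = 15 features.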
- Impute Missing Values:
- SimpleImputer with strategy='mean' is used to replace any missing values (NaN) in the numeric columns with the mean of the respective columns.
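- fit_transform() returns a NumPy array, so the result is wrapped in a new DataFrame, numeric_cols_imputed, that keeps the original column names.
- In the output, the value 28.2 in the fifth row (third column) equals the mean of the other five values in that column (25, 30, 22, 35, 29). This suggests it is an imputed value.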
- Initialize PolynomialFeatures:
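- PolynomialFeatures(degree=2) is initialized to generate polynomial and interaction features up to the second degree. By default (include_bias=True), it also adds a constant column of ones, which is the first column in the output.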
- Apply Polynomial Feature Expansion:
- fit_transform() is applied to the imputed numeric data (numeric_cols_imputed). This creates squares of each column and pairwise products (interactions) between columns. For two inputs a and b, each row becomes [1, a, b, a², ab, b²].
- Output the Expanded Feature Set:
- The transformed feature set X_poly is printed as a NumPy array. It contains the bias column, the original features and the new polynomial features.
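The impute-then-expand steps described above can also be chained with scikit-learn's make_pipeline. The imputer then learns the column means once and reuses them on new data. This is a swapped-in variant, not the original solution, and the small DataFrame below is hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data with one missing Age value
df = pd.DataFrame({"Age": [20.0, np.nan, 30.0], "Salary": [1.0, 2.0, 3.0]})

# Step 1: replace NaN with the column mean; step 2: degree-2 expansion
pipe = make_pipeline(SimpleImputer(strategy="mean"), PolynomialFeatures(degree=2))
X_poly = pipe.fit_transform(df)

# The missing Age becomes mean(20, 30) = 25,
# so row 1 holds the values 1, 25, 2, 625, 50, 4
print(X_poly[1])
```

Calling pipe.transform() on new rows later fills gaps with the same training means (here, 25 for Age) instead of recomputing them from the new data.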