Applying Polynomial features for feature expansion in Pandas

Last update on May 06 2025 13:19:28 (UTC/GMT +8 hours)

14. Applying Polynomial Features for Feature Expansion

Write a Pandas program that applies Polynomial Features for feature expansion.

The following exercise shows how to expand a feature set by generating polynomial features with Scikit-learn's PolynomialFeatures.

Sample Solution:

Code:

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.impute import SimpleImputer

# Load the dataset
df = pd.read_csv('data.csv')

# Select only numeric columns for polynomial feature expansion
numeric_cols = df.select_dtypes(include=[float, int])

# Impute missing values using the mean for numeric columns
imputer = SimpleImputer(strategy='mean')
numeric_cols_imputed = pd.DataFrame(imputer.fit_transform(numeric_cols), columns=numeric_cols.columns)

# Initialize PolynomialFeatures with degree 2
poly = PolynomialFeatures(degree=2)

# Apply polynomial feature expansion to imputed numeric data
X_poly = poly.fit_transform(numeric_cols_imputed)

# Output the expanded feature set
print(X_poly)

Output:

[[1.0000e+00 1.0000e+00 2.5000e+01 5.0000e+04 0.0000e+00 1.0000e+00
  2.5000e+01 5.0000e+04 0.0000e+00 6.2500e+02 1.2500e+06 0.0000e+00
  2.5000e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 2.0000e+00 3.0000e+01 6.0000e+04 1.0000e+00 4.0000e+00
  6.0000e+01 1.2000e+05 2.0000e+00 9.0000e+02 1.8000e+06 3.0000e+01
  3.6000e+09 6.0000e+04 1.0000e+00]
 [1.0000e+00 3.0000e+00 2.2000e+01 7.0000e+04 0.0000e+00 9.0000e+00
  6.6000e+01 2.1000e+05 0.0000e+00 4.8400e+02 1.5400e+06 0.0000e+00
  4.9000e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 4.0000e+00 3.5000e+01 8.0000e+04 1.0000e+00 1.6000e+01
  1.4000e+02 3.2000e+05 4.0000e+00 1.2250e+03 2.8000e+06 3.5000e+01
  6.4000e+09 8.0000e+04 1.0000e+00]
 [1.0000e+00 5.0000e+00 2.8200e+01 5.5000e+04 0.0000e+00 2.5000e+01
  1.4100e+02 2.7500e+05 0.0000e+00 7.9524e+02 1.5510e+06 0.0000e+00
  3.0250e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 6.0000e+00 2.9000e+01 6.3000e+04 1.0000e+00 3.6000e+01
  1.7400e+02 3.7800e+05 6.0000e+00 8.4100e+02 1.8270e+06 2.9000e+01
  3.9690e+09 6.3000e+04 1.0000e+00]]

Explanation:

Import Libraries:

pandas is imported for data manipulation.
PolynomialFeatures from Scikit-learn is imported for polynomial feature expansion.
SimpleImputer from Scikit-learn is imported to handle missing values (imputation).

Load the Dataset:

The dataset data.csv is loaded using pd.read_csv() into a DataFrame df.

Select Numeric Columns:

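select_dtypes(include=[float, int]) keeps only the numeric columns (for example Age and Salary) and drops non-numeric ones such as Name and Gender. Each row of the output has 15 values, which corresponds to four numeric input columns: 1 constant term, 4 original features, and 10 second-degree terms.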
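
As a minimal sketch of this selection step, the snippet below uses a small hand-made DataFrame. The column names and values are assumptions for illustration only and are not the contents of data.csv. It uses include="number", which the pandas documentation gives as the way to select all numeric types; the exercise's include=[float, int] also works for typical CSV data.

```python
import pandas as pd

# Hypothetical toy data; names and values are assumptions, not data.csv
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cid"],
    "Age": [25.0, None, 22.0],
    "Salary": [50000, 60000, 70000],
})

# Keep only numeric columns; the string column "Name" is dropped
numeric_cols = df.select_dtypes(include="number")

selected = list(numeric_cols.columns)
print(selected)  # ['Age', 'Salary']
```

Missing values do not affect the selection: Age is still a float column even though it contains a NaN.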

Impute Missing Values:

SimpleImputer with strategy='mean' is used to replace any missing values (NaN) in the numeric columns with the mean of the respective columns.
The imputed numeric columns are stored in numeric_cols_imputed.
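
The effect of mean imputation can be checked on a single column. The sketch below assumes a hypothetical Age column holding the values 25, 30, 22, 35, and 29 plus one missing entry; these numbers were chosen because their mean, 28.2, matches the value that appears in the fifth row of the output above, which is consistent with that entry having been imputed.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical column with one missing value (index 4)
ages = pd.DataFrame({"Age": [25.0, 30.0, 22.0, 35.0, np.nan, 29.0]})

# Replace NaN with the mean of the observed values in the same column
imputer = SimpleImputer(strategy="mean")
filled = pd.DataFrame(
    imputer.fit_transform(ages),
    columns=ages.columns,
    index=ages.index,  # keep the original row labels
)

imputed_age = filled.loc[4, "Age"]
print(imputed_age)  # approximately 28.2, i.e. (25 + 30 + 22 + 35 + 29) / 5
```

Passing index=ages.index is optional here, but it keeps the imputed frame aligned with the original if the DataFrame does not use the default integer index.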

Initialize PolynomialFeatures:

PolynomialFeatures(degree=2) is initialized to generate polynomial and interaction features up to the second degree.

Apply Polynomial Feature Expansion:

fit_transform() is applied to the imputed numeric data (numeric_cols_imputed), creating new features such as squares and interactions between the original numeric columns.

Output the Expanded Feature Set:

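The transformed feature set X_poly is printed as a NumPy array. Each row contains a constant column of ones (the bias term, included by default), the original features, and the new second-degree features (squares and pairwise products).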

For more practice, solve these related problems:

Write a Pandas program to generate polynomial features for numeric columns up to a specified degree and append them to the DataFrame.
Write a Pandas program to create interaction terms between features using polynomial expansion and compare model performance.
Write a Pandas program to apply polynomial feature expansion selectively on columns based on their correlation with the target variable.
Write a Pandas program to generate polynomial features and then perform feature selection to remove redundant variables.

