w3resource

Applying Polynomial features for feature expansion in Pandas


Pandas: Machine Learning Integration Exercise-14 with Solution


Write a Pandas program that applies Polynomial Features for feature expansion.

The following exercise shows how to expand the feature set by generating polynomial features (squares and pairwise products of the original numeric columns) with Scikit-learn's PolynomialFeatures.

Sample Solution:

Code:

import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.impute import SimpleImputer

# Load the dataset
df = pd.read_csv('data.csv')

# Select only numeric columns for polynomial feature expansion
# ('number' matches every numeric dtype, whereas bare float/int each resolve to one specific dtype)
numeric_cols = df.select_dtypes(include='number')

# Impute missing values using the mean of each numeric column
imputer = SimpleImputer(strategy='mean')
numeric_cols_imputed = pd.DataFrame(
    imputer.fit_transform(numeric_cols),
    columns=numeric_cols.columns,
    index=numeric_cols.index,  # keep the original row labels
)

# Initialize PolynomialFeatures with degree 2
poly = PolynomialFeatures(degree=2)

# Apply polynomial feature expansion to imputed numeric data
X_poly = poly.fit_transform(numeric_cols_imputed)

# Output the expanded feature set
print(X_poly)

Output:

[[1.0000e+00 1.0000e+00 2.5000e+01 5.0000e+04 0.0000e+00 1.0000e+00
  2.5000e+01 5.0000e+04 0.0000e+00 6.2500e+02 1.2500e+06 0.0000e+00
  2.5000e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 2.0000e+00 3.0000e+01 6.0000e+04 1.0000e+00 4.0000e+00
  6.0000e+01 1.2000e+05 2.0000e+00 9.0000e+02 1.8000e+06 3.0000e+01
  3.6000e+09 6.0000e+04 1.0000e+00]
 [1.0000e+00 3.0000e+00 2.2000e+01 7.0000e+04 0.0000e+00 9.0000e+00
  6.6000e+01 2.1000e+05 0.0000e+00 4.8400e+02 1.5400e+06 0.0000e+00
  4.9000e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 4.0000e+00 3.5000e+01 8.0000e+04 1.0000e+00 1.6000e+01
  1.4000e+02 3.2000e+05 4.0000e+00 1.2250e+03 2.8000e+06 3.5000e+01
  6.4000e+09 8.0000e+04 1.0000e+00]
 [1.0000e+00 5.0000e+00 2.8200e+01 5.5000e+04 0.0000e+00 2.5000e+01
  1.4100e+02 2.7500e+05 0.0000e+00 7.9524e+02 1.5510e+06 0.0000e+00
  3.0250e+09 0.0000e+00 0.0000e+00]
 [1.0000e+00 6.0000e+00 2.9000e+01 6.3000e+04 1.0000e+00 3.6000e+01
  1.7400e+02 3.7800e+05 6.0000e+00 8.4100e+02 1.8270e+06 2.9000e+01
  3.9690e+09 6.3000e+04 1.0000e+00]]
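data.csv itself is not shown on this page, so the sketch below rebuilds a stand-in DataFrame from the printed output, where the second through fifth columns are the imputed numeric inputs. The column names, the text columns, and the assumption that row 5's Age was missing are guesses, not the original file; the guess about Age rests on 28.2 being exactly the mean of the other five ages, (25 + 30 + 22 + 35 + 29) / 5.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for data.csv: numeric values read back from the output above;
# column names, text values, and the missing Age in row 5 are assumptions
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Age": [25, 30, 22, 35, np.nan, 29],
    "Salary": [50000, 60000, 70000, 80000, 55000, 63000],
    "Gender": ["F", "M", "M", "F", "F", "M"],
    "Flag": [0, 1, 0, 1, 0, 1],
})

# Keep numeric columns only (ID, Age, Salary, Flag), in their original order
numeric_cols = df.select_dtypes(include="number")

# Replace the missing Age with the column mean (28.2)
imputer = SimpleImputer(strategy="mean")
numeric_imputed = pd.DataFrame(
    imputer.fit_transform(numeric_cols),
    columns=numeric_cols.columns,
    index=numeric_cols.index,
)

# Degree-2 expansion: 1 bias + 4 linear + 10 quadratic terms = 15 columns
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(numeric_imputed)
print(X_poly.shape)
print(X_poly)
```

Under these assumptions the printed array has 6 rows and 15 columns and its rows match the sample output above.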

Explanation:

  • Import Libraries:
    • pandas is imported for data manipulation.
    • PolynomialFeatures from Scikit-learn is imported for polynomial feature expansion.
    • SimpleImputer from Scikit-learn is imported to handle missing values (imputation).
  • Load the Dataset:
    • The dataset data.csv is loaded using pd.read_csv() into a DataFrame df.
  • Select Numeric Columns:
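    • select_dtypes() is used to keep only the numeric columns (such as Age and Salary) and drop non-numeric ones like Name and Gender. The output shows four numeric inputs: with degree 2 they expand to 1 + 4 + 10 = 15 columns, the width of each printed row.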
  • Impute Missing Values:
    • SimpleImputer with strategy='mean' is used to replace any missing values (NaN) in the numeric columns with the mean of the respective columns.
    • The imputed numeric columns are stored in numeric_cols_imputed.
  • Initialize PolynomialFeatures:
    • PolynomialFeatures(degree=2) is initialized to generate all polynomial and interaction terms up to degree two, plus a constant bias column of ones (include_bias=True is the default).
  • Apply Polynomial Feature Expansion:
    • fit_transform() is applied to the imputed numeric data (numeric_cols_imputed), creating new features such as squares and interactions between the original numeric columns.
  • Output the Expanded Feature Set:
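    • The transformed feature set X_poly, a NumPy array, is printed. In each row the first column is the bias term 1, the next four are the imputed numeric values, and the last ten are their squares and pairwise products. In the fifth row, Age is 28.2, the mean of the other five ages, which is consistent with a missing value having been imputed.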
