Feature selection using variance threshold in Pandas
Pandas: Machine Learning Integration Exercise-12 with Solution
Write a Pandas program to perform feature selection using a variance threshold.
This exercise demonstrates how to select features based on their variance using Scikit-learn's VarianceThreshold.
Sample Solution:
Code:
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
# Load the dataset
df = pd.read_csv('data.csv')
# Select only the numeric columns for feature selection
numeric_cols = df.select_dtypes(include=[float, int])
# Initialize the VarianceThreshold with a threshold of 0.1
selector = VarianceThreshold(threshold=0.1)
# Apply feature selection based on variance
X_selected = selector.fit_transform(numeric_cols)
# Output the selected features
print(X_selected)
Output:
[[1.0e+00 2.5e+01 5.0e+04 0.0e+00]
 [2.0e+00 3.0e+01 6.0e+04 1.0e+00]
 [3.0e+00 2.2e+01 7.0e+04 0.0e+00]
 [4.0e+00 3.5e+01 8.0e+04 1.0e+00]
 [5.0e+00 nan 5.5e+04 0.0e+00]
 [6.0e+00 2.9e+01 nan 1.0e+00]]
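The printed result is a plain NumPy array, so the column names are no longer visible. The following is a minimal sketch of one way to recover them with the selector's get_support() mask. The small DataFrame is a hypothetical stand-in for the numeric columns of data.csv, and the column names (ID, Age, Salary, Constant) are assumptions made for illustration.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical stand-in for the numeric columns of data.csv
numeric_cols = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Age": [25, 30, 22, 35, 28, 29],
    "Salary": [50000, 60000, 70000, 80000, 55000, 65000],
    "Constant": [1, 1, 1, 1, 1, 1],  # zero variance, so it should be dropped
})

selector = VarianceThreshold(threshold=0.1)
X_selected = selector.fit_transform(numeric_cols)

# get_support() returns a boolean mask over the input columns
kept_columns = numeric_cols.columns[selector.get_support()]

# Rebuild a DataFrame so the surviving features keep their names
selected_df = pd.DataFrame(X_selected, columns=kept_columns, index=numeric_cols.index)
print(selected_df)
```

Rebuilding the DataFrame this way keeps the original row index as well, which makes it easier to join the selected features back to the non-numeric columns later.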
Explanation:
- Import Libraries:
- pandas is imported for handling data in DataFrame format.
- VarianceThreshold from Scikit-learn is imported for performing feature selection based on variance.
- Load the Dataset:
- The dataset data.csv is loaded using pd.read_csv() and stored in the DataFrame df.
- Select Numeric Columns:
- select_dtypes(include=[float, int]) is used to select only the numeric columns from the dataset (e.g., Age, Salary) and exclude non-numeric columns like Name and Gender.
- Initialize VarianceThreshold:
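- VarianceThreshold is initialized with a threshold of 0.1. Features whose variance does not exceed this threshold will be removed. The variance is computed on the raw values, so the result depends on the scale of each column.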
- Apply VarianceThreshold:
- fit_transform() is applied to the numeric columns to perform feature selection, keeping only the features that have a variance greater than 0.1.
- Output the Selected Features:
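- The selected features are printed after the variance-based filtering. The result is a NumPy array rather than a DataFrame, so the column names are not shown. VarianceThreshold allows NaN in its input, which is why missing values appear as nan in the output.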
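To see why a given feature is kept or dropped, the fitted selector's variances_ attribute can be compared with the threshold. The sketch below uses hypothetical data (the columns Age, Ratio and Const are assumptions, not taken from data.csv). It also illustrates that the selector uses the population variance (ddof=0), which differs from the default of DataFrame.var() (ddof=1).

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical data: "Ratio" barely varies and "Const" does not vary at all
X = pd.DataFrame({
    "Age": [25.0, 30.0, 22.0, 35.0, 28.0, 29.0],
    "Ratio": [0.5, 0.6, 0.5, 0.6, 0.5, 0.6],
    "Const": [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
})

selector = VarianceThreshold(threshold=0.1)
selector.fit(X)

# One row per feature: its variance and whether it passed the threshold
report = pd.DataFrame(
    {"variance": selector.variances_, "kept": selector.get_support()},
    index=X.columns,
)
print(report)

# The selector's variances match pandas only when ddof=0 is requested
print(np.allclose(selector.variances_, X.var(ddof=0).to_numpy()))
```

A report like this can help when choosing a threshold, since columns on very different scales (for example, a salary versus a ratio) will have very different variances.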