Feature selection using a variance threshold in Pandas
12. Selecting Features Using Variance Threshold
Write a Pandas program to perform feature selection using a variance threshold.
This exercise demonstrates how to select features based on their variance using Scikit-learn's VarianceThreshold.
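The following is a quick illustration of the idea. It is a minimal sketch with a made-up array and is not part of the original exercise. With its default threshold of 0.0, VarianceThreshold removes only constant (zero-variance) columns:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Made-up data: columns 0 and 3 are constant, columns 1 and 2 vary
X = np.array([[0, 2, 0, 3],
              [0, 1, 4, 3],
              [0, 1, 1, 3]])

selector = VarianceThreshold()        # default threshold=0.0
X_reduced = selector.fit_transform(X) # keeps only columns with variance > 0

print(selector.get_support())         # boolean mask of kept columns
print(X_reduced)
```

Raising the threshold, as the exercise does with 0.1, also drops columns that vary only slightly.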
Sample Solution:
Code:
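The original listing is not reproduced on this page. What follows is a hedged reconstruction based on the step-by-step explanation below. It is a sketch, not the author's exact code. So that it runs without data.csv, it reads an in-memory CSV through io.StringIO. The numeric values mirror the output shown below. The column names ID and Label and the Name/Gender values are assumptions. The sketch uses select_dtypes(include="number") in place of include=[float, int], because "number" matches integer and float columns of any bit width.

```python
import io

import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical stand-in for data.csv: numeric values mirror the printed output;
# the ID/Label column names and the Name/Gender values are placeholders.
csv_text = """ID,Name,Age,Gender,Salary,Label
1,P1,25,F,50000,0
2,P2,30,M,60000,1
3,P3,22,F,70000,0
4,P4,35,M,80000,1
5,P5,,F,55000,0
6,P6,29,M,,1
"""

# Load the dataset (the exercise uses pd.read_csv("data.csv"))
df = pd.read_csv(io.StringIO(csv_text))

# Keep only numeric columns; non-numeric Name and Gender are excluded
numeric_df = df.select_dtypes(include="number")

# Columns whose variance is not greater than 0.1 will be removed
selector = VarianceThreshold(threshold=0.1)

# Fit and transform in one step; NaN values are tolerated, not imputed
selected = selector.fit_transform(numeric_df)

print(selected)
```

Printing selected should give a 6-row, 4-column array with nan in the same positions as in the output below.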
Output:
[[1.0e+00 2.5e+01 5.0e+04 0.0e+00]
 [2.0e+00 3.0e+01 6.0e+04 1.0e+00]
 [3.0e+00 2.2e+01 7.0e+04 0.0e+00]
 [4.0e+00 3.5e+01 8.0e+04 1.0e+00]
 [5.0e+00 nan 5.5e+04 0.0e+00]
 [6.0e+00 2.9e+01 nan 1.0e+00]]
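The output keeps four numeric columns and still contains nan. VarianceThreshold does not impute missing values; it only ignores them when computing each column's variance. One way to check which columns pass is to inspect the fitted variances_ attribute and get_support(). The minimal sketch below assumes the four numeric columns are named ID, Age, Salary and Label. Those names are hypothetical, and the values are copied from the output above.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Numeric columns matching the printed output (column names are assumptions)
numeric_df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "Age": [25, 30, 22, 35, np.nan, 29],
    "Salary": [50000, 60000, 70000, 80000, 55000, np.nan],
    "Label": [0, 1, 0, 1, 0, 1],
})

selector = VarianceThreshold(threshold=0.1).fit(numeric_df)

# Per-column population variance (ddof=0), computed while skipping NaN
for name, var in zip(numeric_df.columns, selector.variances_):
    print(f"{name}: {var:.4f}")

# True where the column's variance is greater than the threshold
print(selector.get_support())
```

With these values every variance is well above 0.1. The smallest is Label's at 0.25, so get_support() marks all four columns as kept.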
Explanation:
- Import Libraries:
- pandas is imported for handling data in DataFrame format.
- VarianceThreshold from Scikit-learn is imported for performing feature selection based on variance.
- Load the Dataset:
- The dataset data.csv is loaded using pd.read_csv() and stored in the DataFrame df.
- Select Numeric Columns:
- select_dtypes(include=[float, int]) is used to select only the numeric columns from the dataset (e.g., Age, Salary) and exclude non-numeric columns like Name and Gender.
- Initialize VarianceThreshold:
- VarianceThreshold is initialized with a threshold of 0.1. Features whose variance is at or below this threshold will be removed.
- Apply VarianceThreshold:
- fit_transform() is applied to the numeric columns to perform feature selection, keeping only the features whose variance is greater than 0.1. Missing values (NaN) are ignored when each column's variance is computed.
- Output the Selected Features:
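- The selected features are printed after the variance-based filtering. fit_transform() returns a NumPy array rather than a DataFrame, so column names are not shown, and missing values remain as nan because VarianceThreshold removes columns but does not fill gaps.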
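VarianceThreshold uses the population variance (ddof=0, NaN skipped) and keeps a column only when that variance is strictly greater than the threshold. Be careful if you reproduce this rule with pandas. DataFrame.var() defaults to the sample variance (ddof=1), which is larger and can keep columns that VarianceThreshold would drop. The following pandas-only sketch uses made-up columns A, B and C, chosen so that the choice of ddof changes the result for B:

```python
import pandas as pd

threshold = 0.1

# Made-up data: "B" sits between its ddof=0 and ddof=1 variances
df = pd.DataFrame({
    "A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "B": [2.0, 2.0, 2.0, 2.0, 2.75],
    "C": [7.0, 7.0, 7.0, 7.0, 7.0],
})

sample_var = df.var()            # pandas default: ddof=1 (sample variance)
population_var = df.var(ddof=0)  # same convention as VarianceThreshold

# Keep a column only if its variance is strictly greater than the threshold
kept_sample = sample_var[sample_var > threshold].index.tolist()
kept_population = population_var[population_var > threshold].index.tolist()

print(pd.DataFrame({"ddof=1": sample_var, "ddof=0": population_var}))
print("Kept with ddof=1:", kept_sample)
print("Kept with ddof=0:", kept_population)
```

Column B has variance 0.1125 with ddof=1 but 0.09 with ddof=0, so only the ddof=0 version agrees with VarianceThreshold(threshold=0.1).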
For more practice, solve these related problems:
- Write a Pandas program to perform feature selection by removing columns with variance below a specified threshold.
- Write a Pandas program to calculate variance for each feature and drop those with low variability.
- Write a Pandas program to apply a variance threshold and then compare the shape of the DataFrame before and after feature selection.
- Write a Pandas program to automate variance thresholding and output the names of features that were dropped.
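The related problems above can be approached with pandas alone. The minimal sketch below assumes a hypothetical helper, drop_low_variance(), and made-up example data. It computes each numeric column's variance, drops the columns at or below the threshold, and reports the dropped names along with the shape before and after. It uses ddof=0 so that the rule matches VarianceThreshold's.

```python
import pandas as pd


def drop_low_variance(df: pd.DataFrame, threshold: float = 0.1):
    """Hypothetical helper: drop numeric columns with variance <= threshold.

    Non-numeric columns are left untouched. Population variance (ddof=0)
    is used so the rule matches scikit-learn's VarianceThreshold.
    """
    variances = df.select_dtypes(include="number").var(ddof=0)
    dropped = variances[variances <= threshold].index.tolist()
    return df.drop(columns=dropped), variances, dropped


# Made-up example data
df = pd.DataFrame({
    "Name": ["P1", "P2", "P3", "P4"],
    "Age": [25, 30, 22, 35],
    "Constant": [1, 1, 1, 1],
    "NearConstant": [0.0, 0.0, 0.0, 0.5],
})

reduced, variances, dropped = drop_low_variance(df, threshold=0.1)

print(variances)
print("Shape before:", df.shape, "after:", reduced.shape)
print("Dropped:", dropped)
```

Non-numeric columns such as Name are kept rather than dropped, since variance is undefined for them. This is a design choice of the sketch; you could instead return only the numeric columns, as the exercise does.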