Splitting a dataset into training and testing sets using Pandas
Pandas: Machine Learning Integration Exercise-9 with Solution
Write a Pandas program that splits a dataset into training and testing sets.
This exercise shows how to split a dataset into training and testing sets using scikit-learn's train_test_split().
Sample Solution:
Code:
import pandas as pd
from sklearn.model_selection import train_test_split
# Load the dataset
df = pd.read_csv('data.csv')
# Split the dataset into features and target
X = df.drop('Target', axis=1)
y = df['Target']
# Split the dataset into training and testing sets (80-20 split);
# random_state=42 fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Output the size of the training and testing sets
print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")
Output:
Training set size: 4
Testing set size: 2
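The printed sizes imply that data.csv has 6 rows, and 20% of 6 is 1.2 rows, so the split cannot be exactly 80-20. The sketch below assumes the rounding rule used inside scikit-learn's train_test_split() when test_size is a float and train_size is not given: the test count is rounded up with ceil and the training set receives the remaining rows. split_sizes() is a hypothetical helper written for illustration, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a float test_size strictly between 0 and 1.

    Hypothetical helper: assumes scikit-learn's internal rule of rounding the
    test count up and giving the training set the remaining rows.
    """
    if not 0 < test_size < 1:
        raise ValueError("test_size must be a float between 0 and 1")
    n_test = math.ceil(test_size * n_samples)  # 0.2 * 6 = 1.2 -> 2
    n_train = n_samples - n_test               # remaining rows go to training
    return n_train, n_test


print(split_sizes(6, 0.2))   # (4, 2)
print(split_sizes(10, 0.2))  # (8, 2)
```

With 10 rows the same setting gives an exact 8 and 2; the imbalance only shows up when test_size * n_samples is not a whole number.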
Explanation:
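- Loaded the dataset from data.csv and split it into features (X), every column except 'Target', and the target (y), the 'Target' column.
- Used train_test_split() with test_size=0.2 to split the dataset into training and testing sets with an 80-20 ratio; random_state=42 makes the shuffle, and therefore the split, reproducible.
- Printed the number of rows in the training and testing sets.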
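A split can also be sketched with Pandas alone. The example below is an alternative, not the exercise's solution: it builds a small hypothetical DataFrame in place of data.csv (the Feature1 and Feature2 columns are made up; only 'Target' comes from the solution), draws about 80% of the rows with DataFrame.sample(), and treats every remaining row as the test set. It assumes the index has no duplicate labels, because df.drop(train_df.index) removes rows by label.

```python
import pandas as pd

# Hypothetical dataset standing in for data.csv (Feature1/Feature2 are assumptions)
df = pd.DataFrame({
    "Feature1": [1, 2, 3, 4, 5, 6],
    "Feature2": [10, 20, 30, 40, 50, 60],
    "Target": [0, 1, 0, 1, 0, 1],
})

# Randomly draw about 80% of the rows for training; random_state makes it repeatable
train_df = df.sample(frac=0.8, random_state=42)
# Every row not drawn for training becomes the test set (requires a unique index)
test_df = df.drop(train_df.index)

# Separate features and target in each part, as in the solution above
X_train, y_train = train_df.drop("Target", axis=1), train_df["Target"]
X_test, y_test = test_df.drop("Target", axis=1), test_df["Target"]

print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")
```

Pandas rounds frac * len(df) to the nearest whole number rather than rounding the test share up, so the row counts can differ from those produced by train_test_split() on the same data.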