w3resource

Splitting a Dataset into training and testing sets using Pandas


Pandas: Machine Learning Integration Exercise-9 with Solution


Write a Pandas program that splits a dataset into training and testing sets.

This exercise shows how to split a dataset into training and testing sets using scikit-learn's train_test_split() function.
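The following is a minimal sketch of how train_test_split() is called, using a small hypothetical DataFrame (the column names and values are invented for illustration). Each array passed in comes back as a (train, test) pair in the same order as the inputs. When the inputs are pandas objects, the outputs stay pandas objects and keep their original index labels, so the feature rows and target values remain aligned.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 rows, two feature columns and a 'Target' column
df = pd.DataFrame({
    "feature_1": range(10),
    "feature_2": [x * 0.5 for x in range(10)],
    "Target": [0, 1] * 5,
})

X = df.drop(columns="Target")
y = df["Target"]

# Each input yields a (train, test) pair, returned in the order the inputs were given
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# pandas inputs come back as pandas objects with their original index labels,
# so features and target stay aligned row by row
print(type(X_train).__name__, type(y_train).__name__)  # DataFrame Series
print(X_test.index.equals(y_test.index))                # True
```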

Sample Solution:

Code:

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the dataset (data.csv must contain a 'Target' column)
df = pd.read_csv('data.csv')

# Split the dataset into features and target
X = df.drop('Target', axis=1)
y = df['Target']

# Split the dataset into training and testing sets (80-20 split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Output the size of the training and testing sets
print(f"Training set size: {len(X_train)}")
print(f"Testing set size: {len(X_test)}")

Output:

Training set size: 4
Testing set size: 2
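The contents of data.csv are not shown here. The output above is consistent with a file of six rows, and the sketch below reproduces it with a hypothetical six-row stand-in (the feature columns and values are invented). It also shows why the sizes are 4 and 2 rather than 4.8 and 1.2: scikit-learn rounds a fractional test size up to a whole number of rows.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for data.csv: 6 rows with a 'Target' column
df = pd.DataFrame({
    "Feature1": [1, 2, 3, 4, 5, 6],
    "Feature2": [10, 20, 30, 40, 50, 60],
    "Target": [0, 1, 0, 1, 0, 1],
})

X = df.drop("Target", axis=1)
y = df["Target"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Test rows = ceil(0.2 * 6) = 2; the remaining 6 - 2 = 4 rows go to training
print(f"Training set size: {len(X_train)}")  # Training set size: 4
print(f"Testing set size: {len(X_test)}")    # Testing set size: 2
```

On very small datasets the realized ratio (here 4:2) can therefore differ noticeably from the requested 80-20.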

Explanation:

  • Loaded the dataset and split it into features (X) and target (y).
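  • Used train_test_split() with test_size=0.2 to request an 80-20 split, and random_state=42 to make the shuffle reproducible. Because scikit-learn rounds the test count up, the six-row dataset behind the output yields 4 training rows and 2 testing rows.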
  • Displayed the size of the training and testing sets.
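The sketch below checks two properties of the split that the steps above rely on. It uses a hypothetical 20-row DataFrame in place of data.csv. Calling train_test_split() twice with the same random_state produces the same test rows, and every row lands in exactly one of the two sets.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows with a 'Target' column
df = pd.DataFrame({"Feature": range(20), "Target": [0, 1] * 10})
X = df.drop("Target", axis=1)
y = df["Target"]

# Two calls with the same random_state shuffle the rows identically
first = train_test_split(X, y, test_size=0.2, random_state=42)
second = train_test_split(X, y, test_size=0.2, random_state=42)
same_split = first[1].index.equals(second[1].index)  # compare the X_test indexes

X_train, X_test, y_train, y_test = first

# No row appears in both sets, and together they cover the whole dataset
disjoint = X_train.index.intersection(X_test.index).empty
complete = len(X_train) + len(X_test) == len(df)

print(same_split, disjoint, complete)  # True True True
```

Omitting random_state (or passing None) gives a different shuffle on each run, which makes results harder to reproduce.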
