Pandas DataFrame: Shuffle a given DataFrame rows
Write a Pandas program to shuffle a given DataFrame rows.
Sample data:
Original DataFrame:
attempts name qualify score
0 1 Anastasia yes 12.5
1 3 Dima no 9.0
2 2 Katherine yes 16.5
3 3 James no NaN
4 2 Emily no 9.0
5 3 Michael yes 20.0
6 1 Matthew yes 14.5
7 1 Laura no NaN
8 2 Kevin no 8.0
9 1 Jonas yes 19.0
New DataFrame:
attempts name qualify score
5 3 Michael yes 20.0
0 1 Anastasia yes 12.5
9 1 Jonas yes 19.0
6 1 Matthew yes 14.5
7 1 Laura no NaN
1 3 Dima no 9.0
3 3 James no NaN
4 2 Emily no 9.0
8 2 Kevin no 8.0
2 2 Katherine yes 16.5
Sample Solution :
Python Code :
import pandas as pd
import numpy as np
exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],
'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
df = pd.DataFrame(exam_data)
print("Original DataFrame:")
print(df)
df = df.sample(frac=1)
print("\nNew DataFrame:")
print(df)
Sample Output:
Original DataFrame: attempts name qualify score 0 1 Anastasia yes 12.5 1 3 Dima no 9.0 2 2 Katherine yes 16.5 3 3 James no NaN 4 2 Emily no 9.0 5 3 Michael yes 20.0 6 1 Matthew yes 14.5 7 1 Laura no NaN 8 2 Kevin no 8.0 9 1 Jonas yes 19.0 New DataFrame: attempts name qualify score 5 3 Michael yes 20.0 0 1 Anastasia yes 12.5 9 1 Jonas yes 19.0 6 1 Matthew yes 14.5 7 1 Laura no NaN 1 3 Dima no 9.0 3 3 James no NaN 4 2 Emily no 9.0 8 2 Kevin no 8.0 2 2 Katherine yes 16.5
Explanation:
The said code first creates a Pandas DataFrame df from the dictionary ‘exam_data’. The DataFrame contains information about students' names, scores, number of attempts and whether they qualify or not.
df = df.sample(frac=1): This code shuffles the rows of the Pandas DataFrame df randomly using the sample method with frac=1, which means to sample all rows. It essentially reorders the rows of the DataFrame randomly.
The original DataFrame is ‘exam_data’. The DataFrame has 4 columns, namely name, score, attempts, and qualify. Each column has 10 elements. The sample method is used to shuffle the rows of this DataFrame in a random order.
Python-Pandas Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Write a Pandas program to combining two series into a DataFrame.
Next: Write a Pandas program to convert DataFrame column type from string to datetime.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics