w3resource

Pandas: Split a given DataFrame into two random subsets


67. Split DataFrame into Two Random Subsets

Write a Pandas program to split a given DataFrame into two random subsets.

Sample Solution :

Python Code :

import pandas as pd
df = pd.DataFrame({
    'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Syed Wharton'],
    'date_of_birth': ['17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
    'age': ['18', '21', '22', '22', '23']
})

df_1 = df.sample(frac = 0.6)
df_2 = df.drop(df_1.index)
print("Original Dataframe and shape:")
print(df)
print(df.shape)
print("\nSubset-1 and shape:")
print(df_1)
print(df_1.shape)
print("\nSubset-2 and shape:")
print(df_2)
print(df_2.shape)

Sample Output:

Original Dataframe and shape:
             name date_of_birth age
0  Alberto Franco    17/05/2002  18
1    Gino Mcneill    16/02/1999  21
2     Ryan Parkes    25/09/1998  22
3    Eesha Hinton    11/05/2002  22
4    Syed Wharton    15/09/1997  23
(5, 3)

Subset-1 and shape:
           name date_of_birth age
1  Gino Mcneill    16/02/1999  21
4  Syed Wharton    15/09/1997  23
2   Ryan Parkes    25/09/1998  22
(3, 3)

Subset-2 and shape:
             name date_of_birth age
0  Alberto Franco    17/05/2002  18
3    Eesha Hinton    11/05/2002  22
(2, 3)

For more Practice: Solve these Related Problems:

  • Write a Pandas program to randomly split a DataFrame into two subsets using a specified ratio and then verify the split sizes.
  • Write a Pandas program to partition a DataFrame into training and testing sets randomly and then reset their indices.
  • Write a Pandas program to randomly divide a DataFrame into two parts and then export each subset to separate CSV files.
  • Write a Pandas program to split a DataFrame into two random groups and then compute the mean of a numeric column in each group.

Python-Pandas Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Write a Pandas program to select columns by data type of a given DataFrame.
Next: Write a Pandas program to rename all columns with the same pattern of a given DataFrame.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.