w3resource

Pandas: Divide a DataFrame in a given ratio


Write a Pandas program to divide a DataFrame in a given ratio.

Sample data:
Original DataFrame:
          0         1
0  0.316147 -0.767359
1 -0.813410 -2.522672
2  0.869615  1.194704
3 -0.892915 -0.055133
4 -0.341126  0.518266
5  1.857342  1.361229
6 -0.044353 -1.205002
7 -0.726346 -0.535147
8 -1.350726  0.563117
9  1.051666 -0.441533
70% of the said DataFrame:
          0         1
8 -1.350726  0.563117
2  0.869615  1.194704
5  1.857342  1.361229
6 -0.044353 -1.205002
3 -0.892915 -0.055133
1 -0.813410 -2.522672
0  0.316147 -0.767359
30% of the said DataFrame:
          0         1
4 -0.341126  0.518266
7 -0.726346 -0.535147
9  1.051666 -0.441533

Sample Solution:

Python Code:

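import pandas as pd
import numpy as np

# 10 rows x 2 columns of standard-normal random numbers (unseeded, so values change per run)
df = pd.DataFrame(np.random.randn(10, 2))
print("Original DataFrame:")
print(df)

# Randomly pick 70% of the rows; random_state fixes which rows are picked
part_70 = df.sample(frac=0.7, random_state=10)
# The remaining 30%: every row whose index label does not appear in part_70
part_30 = df.drop(part_70.index)

print("\n70% of the said DataFrame:")
print(part_70)
print("\n30% of the said DataFrame:")
print(part_30)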

Sample Output:

Original DataFrame:
          0         1
0  0.316147 -0.767359
1 -0.813410 -2.522672
2  0.869615  1.194704
3 -0.892915 -0.055133
4 -0.341126  0.518266
5  1.857342  1.361229
6 -0.044353 -1.205002
7 -0.726346 -0.535147
8 -1.350726  0.563117
9  1.051666 -0.441533

70% of the said DataFrame:
          0         1
8 -1.350726  0.563117
2  0.869615  1.194704
5  1.857342  1.361229
6 -0.044353 -1.205002
3 -0.892915 -0.055133
1 -0.813410 -2.522672
0  0.316147 -0.767359

30% of the said DataFrame:
          0         1
4 -0.341126  0.518266
7 -0.726346 -0.535147
9  1.051666 -0.441533

Explanation:

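The code first builds a DataFrame 'df' with 10 rows and 2 columns of random numbers drawn from the standard normal distribution by np.random.randn. No NumPy seed is set, so these values change on every run; only the choice of rows in the split below is reproducible.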
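If the values themselves should also be reproducible, one option (an assumption, not part of the original solution) is to draw them from NumPy's Generator API with a fixed seed. The seed 42 below is an arbitrary, illustrative choice.

```python
import numpy as np
import pandas as pd

# A seeded Generator: the seed 42 is an arbitrary, illustrative choice.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.standard_normal((10, 2)))  # 10 rows x 2 columns, standard normal

# A new Generator with the same seed reproduces exactly the same values.
df_again = pd.DataFrame(np.random.default_rng(42).standard_normal((10, 2)))
print(df.equals(df_again))  # True
print(df.shape)  # (10, 2)
```

Calling np.random.seed(...) before np.random.randn would also work; the Generator API is simply the style NumPy recommends for new code.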

part_70 = df.sample(frac=0.7, random_state=10): This creates a new DataFrame 'part_70' by randomly sampling 70% of the rows of 'df' (0.7 × 10 = 7 rows) without replacement. The 'frac' parameter gives the fraction of rows to return, and 'random_state' seeds the sampler so that the same rows, in the same shuffled order, are selected every time the code runs with that value (for a given pandas/NumPy version). This is why the rows of 'part_70' appear in the order 8, 2, 5, 6, 3, 1, 0 rather than in index order.
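The short check below illustrates the behaviour of DataFrame.sample: the size of the sample, its reproducibility under a fixed random_state, and the order of the returned labels. It reuses the unseeded data of the solution, so only the row selection, not the values, is repeatable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(10, 2))

# frac=0.7 of 10 rows -> 7 rows, drawn without replacement (replace=False is the default).
part_70 = df.sample(frac=0.7, random_state=10)
print(len(part_70))  # 7

# Sampling again with the same random_state selects the same labels in the same order.
again = df.sample(frac=0.7, random_state=10)
print(part_70.index.equals(again.index))  # True

# Labels come back in the order they were drawn, which is generally not sorted.
print(part_70.index.tolist())
```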

part_30 = df.drop(part_70.index): This creates another DataFrame 'part_30' by dropping from 'df' every row whose index label appears in 'part_70.index'. The three rows left over (labels 4, 7 and 9) form the remaining 30% of 'df' and keep their original order. Because drop works on labels rather than positions, this approach assumes the index labels of 'df' are unique, which holds for the default RangeIndex used here.
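For a DataFrame whose index may contain duplicate labels, a label-based drop removes every row carrying a sampled label, so the "30%" part can come out too small. A minimal sketch of a position-based split follows; split_by_ratio is a hypothetical helper written for illustration, not a pandas function.

```python
import numpy as np
import pandas as pd


def split_by_ratio(df, frac, seed=None):
    """Hypothetical helper: split df into two parts by row position.

    The first part holds round(frac * len(df)) randomly chosen rows and the
    second part holds all the others. Selecting by position with iloc, not by
    label, keeps the split exact even when the index has duplicate labels.
    """
    rng = np.random.default_rng(seed)
    positions = rng.permutation(len(df))  # shuffled row positions 0..len(df)-1
    cut = round(frac * len(df))
    return df.iloc[positions[:cut]], df.iloc[positions[cut:]]


# An index with duplicate labels (each label used twice).
df = pd.DataFrame(np.random.randn(10, 2), index=[0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

part_70, part_30 = split_by_ratio(df, 0.7, seed=10)
print(len(part_70), len(part_30))  # 7 3

# The label-based approach drops both rows of any label that was sampled once,
# so with this index the "30%" part keeps at most 2 rows instead of 3.
naive_70 = df.sample(frac=0.7, random_state=10)
naive_30 = df.drop(naive_70.index)
print(len(naive_30))
```

With a unique index, as in the exercise, both approaches give a 7/3 split; the positional version only matters when labels repeat.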
