w3resource

Pandas Practice Set-1: Sample 75% of the diamonds DataFrame's rows without replacement and store the remaining 25% of the rows in another DataFrame


63. Sample 75% of Rows Without Replacement and Store 25% Separately

Write a Pandas program to sample 75% of the diamonds DataFrame's rows without replacement and store the remaining 25% of the rows in another DataFrame.
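"Without replacement" means each row can be drawn at most once, so the sample never contains duplicate rows. The sketch below contrasts this with sampling with replacement. It uses a small hypothetical stand-in DataFrame instead of the real diamonds data so it runs offline. DataFrame.sample() defaults to replace=False, and a frac above 1 is only allowed when replace=True.

```python
import pandas as pd

# Hypothetical stand-in DataFrame (not the real diamonds data), 8 rows.
df = pd.DataFrame({"carat": [0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26]})

# Without replacement (replace=False is the default): each row appears at most once.
no_repl = df.sample(frac=0.75, random_state=99)
print(len(no_repl), no_repl.index.is_unique)  # 6 True

# With replacement: rows can repeat, and frac may exceed 1.
# Drawing 12 rows from only 8 forces at least one duplicate index label.
with_repl = df.sample(frac=1.5, replace=True, random_state=99)
print(len(with_repl), with_repl.index.is_unique)  # 12 False
```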

Sample Solution:

Python Code:

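import pandas as pd

# Load the diamonds dataset (53,940 rows, 10 columns) from the seaborn-data repository
diamonds = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv')
print("Original Dataframe:")
print(diamonds.shape)
print("\nSample 75% of diamonds DataFrame's rows without replacement:")
# sample() draws without replacement by default (replace=False);
# random_state makes the draw reproducible
result = diamonds.sample(frac=0.75, random_state=99)
print(result)
print("\nRemaining 25% of the rows:")
# Store every row whose index label was not drawn into the sample
remaining = diamonds.loc[~diamonds.index.isin(result.index), :]
print(remaining)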

Sample Output:

Original Dataframe:
(53940, 10)

Sample 75% of diamonds DataFrame's rows without replacement:
       carat        cut color clarity  ...   price     x     y     z
42653   0.40    Premium     G      IF  ...    1333  4.73  4.77  2.89
4069    0.31  Very Good     D     SI1  ...     571  4.32  4.34  2.73
27580   1.60      Ideal     F     VS2  ...   18421  7.49  7.54  4.66
33605   0.31       Good     D     SI2  ...     462  4.33  4.38  2.75
34415   0.30      Ideal     G      IF  ...     863  4.35  4.38  2.67
46932   0.52    Premium     G     VS1  ...    1815  5.12  5.11  3.19
52243   0.80  Very Good     J     VS1  ...    2487  5.91  5.95  3.72
38855   0.40      Ideal     G    VVS2  ...    1050  4.74  4.72  2.95
38362   0.33      Ideal     D    VVS2  ...    1021  4.46  4.44  2.72
20258   1.72      Ideal     J     SI1  ...    8688  7.66  7.62  4.74
37444   0.51  Very Good     J     VS2  ...     984  5.09  5.12  3.16
32912   0.33      Ideal     F    VVS2  ...     810  4.42  4.46  2.75
11992   1.14      Ideal     H     SI1  ...    5146  6.68  6.73  4.13
45683   0.50      Ideal     F     VS1  ...    1695  5.13  5.17  3.15
17521   1.04  Very Good     G     VS1  ...    7049  6.50  6.55  4.03
33203   0.35      Ideal     G    VVS2  ...     820  4.53  4.56  2.81
14551   1.00    Premium     D     SI1  ...    5880  6.52  6.39  3.96
28766   0.31      Ideal     E     VS2  ...     680  4.34  4.38  2.70
47568   0.51      Ideal     G    VVS2  ...    1875  5.15  5.19  3.20
2946    1.00    Premium     J     SI2  ...    3293  6.32  6.28  3.95
24409   1.32       Fair     F    VVS1  ...   12648  7.31  7.28  4.23
27707   0.36       Good     E     SI2  ...     648  4.55  4.52  2.89
39335   0.57       Good     I     SI1  ...    1072  5.27  5.29  3.34
15654   1.22  Very Good     G     SI1  ...    6278  6.74  6.77  4.28
166     0.80  Very Good     F     SI2  ...    2772  6.01  6.03  3.67
3899    0.92      Ideal     J     VS1  ...    3489  6.24  6.27  3.88
15730   0.95      Ideal     F     SI1  ...    6291  6.31  6.34  3.90
40014   0.37  Very Good     D    VVS1  ...    1108  4.57  4.65  2.87
48927   0.55      Ideal     G    VVS1  ...    2042  5.25  5.27  3.26
27972   0.30    Premium     E     VS2  ...     658  4.24  4.28  2.65
     ...        ...   ...     ...  ...     ...   ...   ...   ...
41234   0.33      Ideal     E    VVS1  ...    1207  4.44  4.46  2.74
17183   1.21       Good     E     SI1  ...    6861  6.65  6.77  4.27
46066   0.70  Very Good     F      I1  ...    1736  5.57  5.48  3.49
3808    1.25       Good     I     SI2  ...    3465  6.91  6.82  4.15
[40455 rows x 10 columns]

Remaining 25% of the rows:
       carat        cut color clarity  ...   price     x     y     z
13      0.31      Ideal     J     SI2  ...     344  4.35  4.37  2.71
14      0.20    Premium     E     SI2  ...     345  3.79  3.75  2.27
18      0.30       Good     J     SI1  ...     351  4.23  4.26  2.71
26      0.24    Premium     I     VS1  ...     355  3.97  3.94  2.47
33      0.23  Very Good     E     VS1  ...     402  4.01  4.06  2.40
36      0.23       Good     E     VS1  ...     402  3.83  3.85  2.46
43      0.26       Good     D     VS1  ...     403  4.19  4.24  2.46
44      0.32       Good     H     SI2  ...     403  4.34  4.37  2.75
46      0.32  Very Good     H     SI2  ...     403  4.35  4.42  2.71
50      0.24  Very Good     F     SI1  ...     404  4.02  4.03  2.45
51      0.23      Ideal     G     VS1  ...     404  3.93  3.95  2.44
53      0.22    Premium     E     VS2  ...     404  3.93  3.89  2.41
54      0.22    Premium     D     VS2  ...     404  3.91  3.88  2.31
60      0.35      Ideal     I     VS1  ...     552  4.54  4.59  2.78
65      0.28      Ideal     G    VVS2  ...     553  4.19  4.22  2.58
67      0.31  Very Good     G     SI1  ...     553  4.33  4.30  2.73
68      0.31    Premium     G     SI1  ...     553  4.35  4.32  2.68
70      0.24  Very Good     D    VVS1  ...     553  3.97  4.00  2.45
73      0.30    Premium     H     SI1  ...     554  4.29  4.25  2.67
76      0.26  Very Good     E    VVS2  ...     554  4.15  4.23  2.51
80      0.26  Very Good     E    VVS1  ...     554  4.00  4.04  2.55
83      0.38      Ideal     I     SI2  ...     554  4.65  4.67  2.87
84      0.26       Good     E    VVS1  ...     554  4.22  4.25  2.45
89      0.32    Premium     I     SI1  ...     554  4.35  4.33  2.73
91      0.86       Fair     E     SI2  ...    2757  6.45  6.33  3.52
92      0.70      Ideal     G     VS2  ...    2757  5.70  5.67  3.50
108     0.81      Ideal     F     SI2  ...    2761  6.14  6.11  3.60
109     0.59      Ideal     E    VVS2  ...    2761  5.38  5.43  3.35
118     0.70      Ideal     E     VS2  ...    2762  5.73  5.76  3.49
119     0.80      Ideal     F     SI2  ...    2762  6.01  6.07  3.62
     ...        ...   ...     ...  ...     ...   ...   ...   ...
53828   0.70  Very Good     E     VS2  ...    2737  5.74  5.70  3.46
53830   0.72      Ideal     F     SI1  ...    2737  5.78  5.82  3.55
53835   0.70    Premium     G    VVS2  ...    2737  5.86  5.78  3.47


[13485 rows x 10 columns]
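An equivalent way to build the remaining 25% is DataFrame.drop(), which removes the sampled index labels directly. The sketch below assumes a small hypothetical stand-in for the diamonds DataFrame so it runs offline, and it also checks that the two parts are disjoint and together cover every row. Like the isin() approach, it assumes the index labels are unique, which holds for the default RangeIndex produced by read_csv().

```python
import pandas as pd

# Hypothetical stand-in for the diamonds DataFrame (8 rows), so the sketch runs offline.
diamonds = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26],
    "price": [326, 326, 327, 334, 335, 336, 336, 337],
})

sample_75 = diamonds.sample(frac=0.75, random_state=99)
# drop() removes the sampled index labels, leaving the complementary 25%.
remaining_25 = diamonds.drop(sample_75.index)

# Sanity checks: the parts share no rows and together cover the whole DataFrame.
assert sample_75.index.intersection(remaining_25.index).empty
assert len(sample_75) + len(remaining_25) == len(diamonds)
print(len(sample_75), len(remaining_25))  # 6 2
```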

For more practice, solve these related problems:

  • Write a Pandas program to randomly split the diamonds DataFrame into two parts: 75% for training and 25% for testing.
  • Write a Pandas program to partition the diamonds dataset into two DataFrames using sample() with a given fraction and its complement.
  • Write a Pandas program to randomly select 75% of the rows from the diamonds DataFrame and assign the remaining 25% to a separate DataFrame.
  • Write a Pandas program to perform a train-test split on the diamonds DataFrame without using scikit-learn and display the sizes of both splits.
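For the train-test split without scikit-learn, one possible approach (a different technique from the sample-and-complement solution above) is to shuffle every row once with sample(frac=1) and then cut the shuffled frame at a positional index. The helper name shuffle_split and its parameters are hypothetical, and a 20-row DataFrame stands in for diamonds.

```python
import pandas as pd

def shuffle_split(df: pd.DataFrame, train_frac: float = 0.75, seed: int = 99):
    """Hypothetical helper: shuffle all rows once, then cut at train_frac."""
    shuffled = df.sample(frac=1, random_state=seed)  # random permutation of the rows
    cut = int(len(shuffled) * train_frac)            # number of training rows
    return shuffled.iloc[:cut], shuffled.iloc[cut:]  # positional slices

# Hypothetical 20-row DataFrame standing in for diamonds.
data = pd.DataFrame({"x": range(20)})
train, test = shuffle_split(data)
print("train:", train.shape, "test:", test.shape)  # train: (15, 1) test: (5, 1)
```

Because the cut uses iloc (positions) rather than index labels, this variant also works when the index contains duplicate labels.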
