Pandas Practice Set-1: Sample 75% of the diamonds DataFrame's rows without replacement and store the remaining 25% of the rows in another DataFrame
Pandas Practice Set-1: Exercise-63 with Solution
Write a Pandas program to sample 75% of the diamonds DataFrame's rows without replacement and store the remaining 25% of the rows in another DataFrame.
Sample Solution:
Python Code:
import pandas as pd
diamonds = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv')
print("Original Dataframe:")
print(diamonds.shape)
print("\nSample 75% of diamonds DataFrame's rows without replacement:")
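# replace=False is the default, so each row is drawn at most once
result = diamonds.sample(frac=0.75, random_state=99)
print(result)
print("\nRemaining 25% of the rows:")
# Store the rows whose index labels were not sampled in a second DataFrame
remaining = diamonds.loc[~diamonds.index.isin(result.index), :]
print(remaining)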
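The solution above picks out the remaining rows with ~diamonds.index.isin(result.index). As a rough sketch of an equivalent approach, the sampled index labels can instead be passed to DataFrame.drop, and the result kept in its own variable. The small DataFrame below and the names df, sampled and remaining are assumptions made so that the example runs without downloading the CSV; its values are illustrative only. Both approaches assume the index labels are unique, which holds for the default RangeIndex produced by read_csv.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data (illustrative values only),
# so the sketch runs without network access.
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23, 0.29, 0.31, 0.24, 0.24, 0.26],
    "price": [326, 326, 327, 334, 335, 336, 336, 337],
})

# Draw 75% of the rows; replace=False is the default for sample().
sampled = df.sample(frac=0.75, random_state=99)

# Keep every row whose index label was NOT drawn in a second DataFrame.
remaining = df.drop(sampled.index)

# Sanity checks: the two parts are disjoint and together cover all rows.
assert sampled.index.intersection(remaining.index).empty
assert len(sampled) + len(remaining) == len(df)

print(sampled.shape, remaining.shape)
```

With a unique index, the two DataFrames form a clean train/hold-out style split: no row appears in both, and none is lost.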
Sample Output:
Original Dataframe:
(53940, 10)

Sample 75% of diamonds DataFrame's rows without replacement:
       carat        cut  color  clarity  ...  price     x     y     z
42653   0.40    Premium      G       IF  ...   1333  4.73  4.77  2.89
 4069   0.31  Very Good      D      SI1  ...    571  4.32  4.34  2.73
27580   1.60      Ideal      F      VS2  ...  18421  7.49  7.54  4.66
33605   0.31       Good      D      SI2  ...    462  4.33  4.38  2.75
34415   0.30      Ideal      G       IF  ...    863  4.35  4.38  2.67
46932   0.52    Premium      G      VS1  ...   1815  5.12  5.11  3.19
52243   0.80  Very Good      J      VS1  ...   2487  5.91  5.95  3.72
38855   0.40      Ideal      G     VVS2  ...   1050  4.74  4.72  2.95
38362   0.33      Ideal      D     VVS2  ...   1021  4.46  4.44  2.72
20258   1.72      Ideal      J      SI1  ...   8688  7.66  7.62  4.74
37444   0.51  Very Good      J      VS2  ...    984  5.09  5.12  3.16
32912   0.33      Ideal      F     VVS2  ...    810  4.42  4.46  2.75
11992   1.14      Ideal      H      SI1  ...   5146  6.68  6.73  4.13
45683   0.50      Ideal      F      VS1  ...   1695  5.13  5.17  3.15
17521   1.04  Very Good      G      VS1  ...   7049  6.50  6.55  4.03
33203   0.35      Ideal      G     VVS2  ...    820  4.53  4.56  2.81
14551   1.00    Premium      D      SI1  ...   5880  6.52  6.39  3.96
28766   0.31      Ideal      E      VS2  ...    680  4.34  4.38  2.70
47568   0.51      Ideal      G     VVS2  ...   1875  5.15  5.19  3.20
 2946   1.00    Premium      J      SI2  ...   3293  6.32  6.28  3.95
24409   1.32       Fair      F     VVS1  ...  12648  7.31  7.28  4.23
27707   0.36       Good      E      SI2  ...    648  4.55  4.52  2.89
39335   0.57       Good      I      SI1  ...   1072  5.27  5.29  3.34
15654   1.22  Very Good      G      SI1  ...   6278  6.74  6.77  4.28
  166   0.80  Very Good      F      SI2  ...   2772  6.01  6.03  3.67
 3899   0.92      Ideal      J      VS1  ...   3489  6.24  6.27  3.88
15730   0.95      Ideal      F      SI1  ...   6291  6.31  6.34  3.90
40014   0.37  Very Good      D     VVS1  ...   1108  4.57  4.65  2.87
48927   0.55      Ideal      G     VVS1  ...   2042  5.25  5.27  3.26
27972   0.30    Premium      E      VS2  ...    658  4.24  4.28  2.65
  ...    ...        ...    ...      ...  ...    ...   ...   ...   ...
41234   0.33      Ideal      E     VVS1  ...   1207  4.44  4.46  2.74
17183   1.21       Good      E      SI1  ...   6861  6.65  6.77  4.27
46066   0.70  Very Good      F       I1  ...   1736  5.57  5.48  3.49
 3808   1.25       Good      I      SI2  ...   3465  6.91  6.82  4.15

[40455 rows x 10 columns]

Remaining 25% of the rows:
       carat        cut  color  clarity  ...  price     x     y     z
   13   0.31      Ideal      J      SI2  ...    344  4.35  4.37  2.71
   14   0.20    Premium      E      SI2  ...    345  3.79  3.75  2.27
   18   0.30       Good      J      SI1  ...    351  4.23  4.26  2.71
   26   0.24    Premium      I      VS1  ...    355  3.97  3.94  2.47
   33   0.23  Very Good      E      VS1  ...    402  4.01  4.06  2.40
   36   0.23       Good      E      VS1  ...    402  3.83  3.85  2.46
   43   0.26       Good      D      VS1  ...    403  4.19  4.24  2.46
   44   0.32       Good      H      SI2  ...    403  4.34  4.37  2.75
   46   0.32  Very Good      H      SI2  ...    403  4.35  4.42  2.71
   50   0.24  Very Good      F      SI1  ...    404  4.02  4.03  2.45
   51   0.23      Ideal      G      VS1  ...    404  3.93  3.95  2.44
   53   0.22    Premium      E      VS2  ...    404  3.93  3.89  2.41
   54   0.22    Premium      D      VS2  ...    404  3.91  3.88  2.31
   60   0.35      Ideal      I      VS1  ...    552  4.54  4.59  2.78
   65   0.28      Ideal      G     VVS2  ...    553  4.19  4.22  2.58
   67   0.31  Very Good      G      SI1  ...    553  4.33  4.30  2.73
   68   0.31    Premium      G      SI1  ...    553  4.35  4.32  2.68
   70   0.24  Very Good      D     VVS1  ...    553  3.97  4.00  2.45
   73   0.30    Premium      H      SI1  ...    554  4.29  4.25  2.67
   76   0.26  Very Good      E     VVS2  ...    554  4.15  4.23  2.51
   80   0.26  Very Good      E     VVS1  ...    554  4.00  4.04  2.55
   83   0.38      Ideal      I      SI2  ...    554  4.65  4.67  2.87
   84   0.26       Good      E     VVS1  ...    554  4.22  4.25  2.45
   89   0.32    Premium      I      SI1  ...    554  4.35  4.33  2.73
   91   0.86       Fair      E      SI2  ...   2757  6.45  6.33  3.52
   92   0.70      Ideal      G      VS2  ...   2757  5.70  5.67  3.50
  108   0.81      Ideal      F      SI2  ...   2761  6.14  6.11  3.60
  109   0.59      Ideal      E     VVS2  ...   2761  5.38  5.43  3.35
  118   0.70      Ideal      E      VS2  ...   2762  5.73  5.76  3.49
  119   0.80      Ideal      F      SI2  ...   2762  6.01  6.07  3.62
  ...    ...        ...    ...      ...  ...    ...   ...   ...   ...
53828   0.70  Very Good      E      VS2  ...   2737  5.74  5.70  3.46
53830   0.72      Ideal      F      SI1  ...   2737  5.78  5.82  3.55
53835   0.70    Premium      G     VVS2  ...   2737  5.86  5.78  3.47

[13485 rows x 10 columns]
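The row counts reported in the output can be checked by hand: 75% of 53940 rows is 40455, which leaves 13485. The sketch below reproduces that arithmetic on a placeholder DataFrame with the same number of rows; the frame and its row_id column are hypothetical and stand in for the real diamonds data, so only the sizes of the split carry over, not the selected rows.

```python
import pandas as pd

# Row count taken from the shape (53940, 10) printed in the sample output.
n_rows = 53940

# Hypothetical placeholder frame with the same length as diamonds.
frame = pd.DataFrame({"row_id": range(n_rows)})

# frac=0.75 draws three quarters of the rows, without replacement by default.
sampled = frame.sample(frac=0.75, random_state=99)
remaining = frame.loc[~frame.index.isin(sampled.index), :]

print(len(sampled), len(remaining))
```

A check like this can help catch an accidental replace=True or a non-unique index, either of which would make the two counts fail to add up to the original row count.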