Pandas Practice Set-1: Sample 75% of the diamonds DataFrame's rows without replacement and store the remaining 25% of the rows in another DataFrame
Write a Pandas program to sample 75% of the diamonds DataFrame's rows without replacement and store the remaining 25% of the rows in another DataFrame.
Sample Solution:
Python Code:
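import pandas as pd
# Load the diamonds dataset from the seaborn-data repository.
diamonds = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv')
print("Original Dataframe:")
print(diamonds.shape)
print("\nSample 75% of diamonds DataFrame's rows without replacement:")
# sample() draws without replacement by default; random_state makes the draw reproducible.
result = diamonds.sample(frac=0.75, random_state=99)
print(result)
print("\nRemaining 25% of the rows:")
# Store the rows whose index labels were not sampled in a second DataFrame.
remaining = diamonds.loc[~diamonds.index.isin(result.index), :]
print(remaining)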
Sample Output:
Original Dataframe:
(53940, 10)

Sample 75% of diamonds DataFrame's rows without replacement:
       carat        cut  color  clarity  ...  price     x     y     z
42653   0.40    Premium      G       IF  ...   1333  4.73  4.77  2.89
4069    0.31  Very Good      D      SI1  ...    571  4.32  4.34  2.73
27580   1.60      Ideal      F      VS2  ...  18421  7.49  7.54  4.66
33605   0.31       Good      D      SI2  ...    462  4.33  4.38  2.75
34415   0.30      Ideal      G       IF  ...    863  4.35  4.38  2.67
46932   0.52    Premium      G      VS1  ...   1815  5.12  5.11  3.19
52243   0.80  Very Good      J      VS1  ...   2487  5.91  5.95  3.72
38855   0.40      Ideal      G     VVS2  ...   1050  4.74  4.72  2.95
38362   0.33      Ideal      D     VVS2  ...   1021  4.46  4.44  2.72
20258   1.72      Ideal      J      SI1  ...   8688  7.66  7.62  4.74
37444   0.51  Very Good      J      VS2  ...    984  5.09  5.12  3.16
32912   0.33      Ideal      F     VVS2  ...    810  4.42  4.46  2.75
11992   1.14      Ideal      H      SI1  ...   5146  6.68  6.73  4.13
45683   0.50      Ideal      F      VS1  ...   1695  5.13  5.17  3.15
17521   1.04  Very Good      G      VS1  ...   7049  6.50  6.55  4.03
33203   0.35      Ideal      G     VVS2  ...    820  4.53  4.56  2.81
14551   1.00    Premium      D      SI1  ...   5880  6.52  6.39  3.96
28766   0.31      Ideal      E      VS2  ...    680  4.34  4.38  2.70
47568   0.51      Ideal      G     VVS2  ...   1875  5.15  5.19  3.20
2946    1.00    Premium      J      SI2  ...   3293  6.32  6.28  3.95
24409   1.32       Fair      F     VVS1  ...  12648  7.31  7.28  4.23
27707   0.36       Good      E      SI2  ...    648  4.55  4.52  2.89
39335   0.57       Good      I      SI1  ...   1072  5.27  5.29  3.34
15654   1.22  Very Good      G      SI1  ...   6278  6.74  6.77  4.28
166     0.80  Very Good      F      SI2  ...   2772  6.01  6.03  3.67
3899    0.92      Ideal      J      VS1  ...   3489  6.24  6.27  3.88
15730   0.95      Ideal      F      SI1  ...   6291  6.31  6.34  3.90
40014   0.37  Very Good      D     VVS1  ...   1108  4.57  4.65  2.87
48927   0.55      Ideal      G     VVS1  ...   2042  5.25  5.27  3.26
27972   0.30    Premium      E      VS2  ...    658  4.24  4.28  2.65
...      ...        ...    ...      ...  ...    ...   ...   ...   ...
41234   0.33      Ideal      E     VVS1  ...   1207  4.44  4.46  2.74
17183   1.21       Good      E      SI1  ...   6861  6.65  6.77  4.27
46066   0.70  Very Good      F       I1  ...   1736  5.57  5.48  3.49
3808    1.25       Good      I      SI2  ...   3465  6.91  6.82  4.15
[40455 rows x 10 columns]

Remaining 25% of the rows:
       carat        cut  color  clarity  ...  price     x     y     z
13      0.31      Ideal      J      SI2  ...    344  4.35  4.37  2.71
14      0.20    Premium      E      SI2  ...    345  3.79  3.75  2.27
18      0.30       Good      J      SI1  ...    351  4.23  4.26  2.71
26      0.24    Premium      I      VS1  ...    355  3.97  3.94  2.47
33      0.23  Very Good      E      VS1  ...    402  4.01  4.06  2.40
36      0.23       Good      E      VS1  ...    402  3.83  3.85  2.46
43      0.26       Good      D      VS1  ...    403  4.19  4.24  2.46
44      0.32       Good      H      SI2  ...    403  4.34  4.37  2.75
46      0.32  Very Good      H      SI2  ...    403  4.35  4.42  2.71
50      0.24  Very Good      F      SI1  ...    404  4.02  4.03  2.45
51      0.23      Ideal      G      VS1  ...    404  3.93  3.95  2.44
53      0.22    Premium      E      VS2  ...    404  3.93  3.89  2.41
54      0.22    Premium      D      VS2  ...    404  3.91  3.88  2.31
60      0.35      Ideal      I      VS1  ...    552  4.54  4.59  2.78
65      0.28      Ideal      G     VVS2  ...    553  4.19  4.22  2.58
67      0.31  Very Good      G      SI1  ...    553  4.33  4.30  2.73
68      0.31    Premium      G      SI1  ...    553  4.35  4.32  2.68
70      0.24  Very Good      D     VVS1  ...    553  3.97  4.00  2.45
73      0.30    Premium      H      SI1  ...    554  4.29  4.25  2.67
76      0.26  Very Good      E     VVS2  ...    554  4.15  4.23  2.51
80      0.26  Very Good      E     VVS1  ...    554  4.00  4.04  2.55
83      0.38      Ideal      I      SI2  ...    554  4.65  4.67  2.87
84      0.26       Good      E     VVS1  ...    554  4.22  4.25  2.45
89      0.32    Premium      I      SI1  ...    554  4.35  4.33  2.73
91      0.86       Fair      E      SI2  ...   2757  6.45  6.33  3.52
92      0.70      Ideal      G      VS2  ...   2757  5.70  5.67  3.50
108     0.81      Ideal      F      SI2  ...   2761  6.14  6.11  3.60
109     0.59      Ideal      E     VVS2  ...   2761  5.38  5.43  3.35
118     0.70      Ideal      E      VS2  ...   2762  5.73  5.76  3.49
119     0.80      Ideal      F      SI2  ...   2762  6.01  6.07  3.62
...      ...        ...    ...      ...  ...    ...   ...   ...   ...
53828   0.70  Very Good      E      VS2  ...   2737  5.74  5.70  3.46
53830   0.72      Ideal      F      SI1  ...   2737  5.78  5.82  3.55
53835   0.70    Premium      G     VVS2  ...   2737  5.86  5.78  3.47
[13485 rows x 10 columns]
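As an alternative to the boolean mask built with isin in the solution, the remaining rows can also be obtained by dropping the sampled index labels. The following is a minimal sketch, not part of the original solution: the small DataFrame df and the names sample_75 and rest_25 are hypothetical stand-ins so the example runs without downloading the diamonds CSV, and it assumes the index labels are unique.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data (20 rows), so the sketch runs offline.
df = pd.DataFrame({
    "carat": [0.1 * i for i in range(1, 21)],
    "price": [100 * i for i in range(1, 21)],
})

# Sample 75% of the rows; replace=False is the default, written out here for clarity.
sample_75 = df.sample(frac=0.75, replace=False, random_state=99)

# Drop the sampled index labels to keep the remaining 25%.
# This assumes the index labels are unique.
rest_25 = df.drop(sample_75.index)

# The two parts should not overlap and should together cover every row.
print(len(sample_75), len(rest_25))  # prints: 15 5
print(sample_75.index.intersection(rest_25.index).empty)  # prints: True
```

With the full diamonds data the same split gives 40455 and 13485 rows, which matches the row counts in the sample output (53940 * 0.75 = 40455).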