Pandas: Split a given dataset, group by one column and remove those groups if all the values of a specific columns are not available
Write a Pandas program to split a given dataset, group by one column and remove those groups if all the values of a specific columns are not available.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4
Sample Solution:
Python Code :
import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'school_code': ['s001','s002','s003','s001','s002','s004'],
'class': ['V', 'V', 'VI', 'VI', 'V', 'VI'],
'name': ['Alberto Franco','Gino Mcneill','Ryan Parkes', 'Eesha Hinton', 'Gino Mcneill', 'David Parkes'],
'date_Of_Birth ': ['15/05/2002','17/05/2002','16/02/1999','25/09/1998','11/05/2002','15/09/1997'],
'age': [12, 12, 13, 13, 14, 12],
'weight': [173, 192, 186, 167, 151, 159],
'height': [35, None, 33, 30, None, 32]},
index=['S1', 'S2', 'S3', 'S4', 'S5', 'S6'])
print("Original DataFrame:")
print(df)
print("\nGroup by one column and remove those groups if all the values of a specific columns are not available:")
result = df[(~df['height'].isna()).groupby(df['school_code']).transform('any')]
print(result)
Sample Output:
Original DataFrame: school_code class name date_Of_Birth age weight height S1 s001 V Alberto Franco 15/05/2002 12 173 35.0 S2 s002 V Gino Mcneill 17/05/2002 12 192 NaN S3 s003 VI Ryan Parkes 16/02/1999 13 186 33.0 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30.0 S5 s002 V Gino Mcneill 11/05/2002 14 151 NaN S6 s004 VI David Parkes 15/09/1997 12 159 32.0 Group by one column and remove those groups if all the values of a specific columns are not available: school_code class name date_Of_Birth age weight height S1 s001 V Alberto Franco 15/05/2002 12 173 35.0 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33.0 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30.0 S6 s004 VI David Parkes 15/09/1997 12 159 32.0
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Write a Pandas program to split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe.
Next: Write a Pandas program to split a given dataset using group by on specified column into two labels and ranges.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics