Pandas Grouping and Aggregating: Split-Apply-Combine Exercises, Practice, Solution
[An editor is available at the bottom of the page to write and execute the scripts. Go to the editor]
Pandas Grouping and Aggregating [ 32 exercises with solution]
1. Write a Pandas program to split the following dataframe into groups based on school code. Also check the type of GroupBy object.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
2. Write a Pandas program to split the following dataframe by school code and get mean, min, and max value of age for each school.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
3. Write a Pandas program to split the following given dataframe into groups based on school code and class.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
4. Write a Pandas program to split the following given dataframe into groups based on school code and cast grouping as a list.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
5. Write a Pandas program to split the following given dataframe into groups based on single column and multiple columns. Find the size of the grouped data.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
6. Write a Pandas program to split the following given dataframe into groups based on school code and call a specific group with the name of the group.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
7. Write a Pandas program to split a dataset, group by one column and get mean, min, and max values by group. Using the following dataset find the mean, min, and max values of purchase amount (purch_amt) group by customer id (customer_id).
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3005 5002 1 70009 270.65 2012-09-10 3001 5005 2 70002 65.26 2012-10-05 3002 5001 3 70004 110.50 2012-08-17 3009 5003 4 70007 948.50 2012-09-10 3005 5002 5 70005 2400.60 2012-07-27 3007 5001 6 70008 5760.00 2012-09-10 3002 5001 7 70010 1983.43 2012-10-10 3004 5006 8 70003 2480.40 2012-10-10 3009 5003 9 70012 250.45 2012-06-27 3008 5002 10 70011 75.29 2012-08-17 3003 5007 11 70013 3045.60 2012-04-25 3002 5001Click me to see the sample solution
8. Write a Pandas program to split a dataset to group by two columns and count by each row.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3005 5002 1 70009 270.65 2012-09-10 3001 5005 2 70002 65.26 2012-10-05 3002 5001 3 70004 110.50 2012-08-17 3009 5003 4 70007 948.50 2012-09-10 3005 5002 5 70005 2400.60 2012-07-27 3007 5001 6 70008 5760.00 2012-09-10 3002 5001 7 70010 1983.43 2012-10-10 3004 5006 8 70003 2480.40 2012-10-10 3009 5003 9 70012 250.45 2012-06-27 3008 5002 10 70011 75.29 2012-08-17 3003 5007 11 70013 3045.60 2012-04-25 3002 5001Click me to see the sample solution
9. Write a Pandas program to split a dataset to group by two columns and then sort the aggregated results within the groups.
In the following dataset group on 'customer_id', 'salesman_id' and then sort sum of purch_amt within the groups.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3005 5002 1 70009 270.65 2012-09-10 3001 5005 2 70002 65.26 2012-10-05 3002 5001 3 70004 110.50 2012-08-17 3009 5003 4 70007 948.50 2012-09-10 3005 5002 5 70005 2400.60 2012-07-27 3007 5001 6 70008 5760.00 2012-09-10 3002 5001 7 70010 1983.43 2012-10-10 3004 5006 8 70003 2480.40 2012-10-10 3009 5003 9 70012 250.45 2012-06-27 3008 5002 10 70011 75.29 2012-08-17 3003 5007 11 70013 3045.60 2012-04-25 3002 5001Click me to see the sample solution
10. Write a Pandas program to split the following dataframe into groups based on customer id and create a list of order date for each group.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3005 5002 1 70009 270.65 2012-09-10 3001 5005 2 70002 65.26 2012-10-05 3002 5001 3 70004 110.50 2012-08-17 3009 5003 4 70007 948.50 2012-09-10 3005 5002 5 70005 2400.60 2012-07-27 3007 5001 6 70008 5760.00 2012-09-10 3002 5001 7 70010 1983.43 2012-10-10 3004 5006 8 70003 2480.40 2012-10-10 3009 5003 9 70012 250.45 2012-06-27 3008 5002 10 70011 75.29 2012-08-17 3003 5007 11 70013 3045.60 2012-04-25 3002 5001Click me to see the sample solution
11. Write a Pandas program to split the following dataframe into groups and calculate monthly purchase amount.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 05-10-2012 3001 5002 1 70009 270.65 09-10-2012 3001 5005 2 70002 65.26 05-10-2012 3005 5001 3 70004 110.50 08-17-2012 3001 5003 4 70007 948.50 10-09-2012 3005 5002 5 70005 2400.60 07-27-2012 3001 5001 6 70008 5760.00 10-09-2012 3005 5001 7 70010 1983.43 10-10-2012 3001 5006 8 70003 2480.40 10-10-2012 3005 5003 9 70012 250.45 06-17-2012 3001 5002 10 70011 75.29 07-08-2012 3005 5007 11 70013 3045.60 04-25-2012 3005 5001Click me to see the sample solution
12. Write a Pandas program to split the following dataframe into groups, group by month and year based on order date and find the total purchase amount year wise, month wise.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 05-10-2012 3001 5002 1 70009 270.65 09-10-2012 3001 5005 2 70002 65.26 05-10-2012 3005 5001 3 70004 110.50 08-17-2012 3001 5003 4 70007 948.50 10-09-2012 3005 5002 5 70005 2400.60 07-27-2012 3001 5001 6 70008 5760.00 10-09-2012 3005 5001 7 70010 1983.43 10-10-2012 3001 5006 8 70003 2480.40 10-10-2012 3005 5003 9 70012 250.45 06-17-2012 3001 5002 10 70011 75.29 07-08-2012 3005 5007 11 70013 3045.60 04-25-2012 3005 5001Click me to see the sample solution
13. Write a Pandas program to split the following dataframe into groups based on first column and set other column values into a list of values.
Test Data:
X Y Z 0 10 10 22 1 10 15 20 2 10 11 18 3 20 20 20 4 30 21 13 5 30 12 10 6 10 14 0Click me to see the sample solution
14. Write a Pandas program to split the following dataframe into groups based on all columns and calculate Groupby value counts on the dataframe.
Test Data:
id type book 0 1 10 Math 1 2 15 English 2 1 11 Physics 3 1 20 Math 4 2 21 English 5 1 12 Physics 6 2 14 EnglishOutput:
book English Math Physics id type 1 10 0 1 0 11 0 0 1 12 0 0 1 20 0 1 0 2 14 1 0 0 15 1 0 0 21 1 0 0Click me to see the sample solution
15. Write a Pandas program to split the following dataframe into groups and count unique values of 'value' column.
Test Data:
id value 0 1 a 1 1 a 2 2 b 3 3 None 4 3 a 5 4 a 6 4 None 7 4 bOutput:
value
a 3
b 2
Click me to see the sample solution
16. Write a Pandas program to split a given dataframe into groups and list all the keys from the GroupBy object.
Test Data:
school_code class name date_Of_Birth age height weight S1 s001 V Alberto Franco 15/05/2002 12 173 35 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 S6 s004 VI David Parkes 15/09/1997 12 159 32Click me to see the sample solution
17. Write a Pandas program to split a given dataframe into groups and create a new column with count from GroupBy.
Test Data:
book_name book_type book_id 0 Book1 Math 1 1 Book2 Physics 2 2 Book3 Computer 3 3 Book4 Science 4 4 Book1 Math 1 5 Book2 Physics 2 6 Book3 Computer 3 7 Book5 English 5Click me to see the sample solution
18. Write a Pandas program to split a given dataframe into groups with bin counts.
Test Data:
ord_no purch_amt customer_id sales_id 0 70001 150.50 3005 5002 1 70009 270.65 3001 5003 2 70002 65.26 3002 5004 3 70004 110.50 3009 5003 4 70007 948.50 3005 5002 5 70005 2400.60 3007 5001 6 70008 5760.00 3002 5005 7 70010 1983.43 3004 5007 8 70003 2480.40 3009 5008 9 70012 250.45 3008 5004 10 70011 75.29 3003 5005 11 70013 3045.60 3002 5001Click me to see the sample solution
19. Write a Pandas program to split a given dataframe into groups with multiple aggregations.
Split the following given dataframe by school code, class and get mean, min, and max value of height and age for each value of the school.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
20. Write a Pandas program to split a given dataframe into groups and display target column as a list of unique values.
Test Data:
id type book 0 A 1 Math 1 A 1 Math 2 A 1 English 3 A 1 Physics 4 A 2 Math 5 A 2 English 6 B 1 Physics 7 B 1 English 8 B 1 Physics 9 B 2 English 10 B 2 EnglishOutput:
List all unique values in a group: id type book 0 A 1 Math,English,Physics 1 A 2 Math,English 2 B 1 Physics,English 3 B 2 EnglishClick me to see the sample solution
21. Write a Pandas program to split the following dataframe into groups and calculate quarterly purchase amount.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 05-10-2012 3001 5002 1 70009 270.65 09-10-2012 3001 5005 2 70002 65.26 05-10-2012 3005 5001 3 70004 110.50 08-17-2012 3001 5003 4 70007 948.50 10-09-2012 3005 5002 5 70005 2400.60 07-27-2012 3001 5001 6 70008 5760.00 10-09-2012 3005 5001 7 70010 1983.43 10-10-2012 3001 5006 8 70003 2480.40 10-10-2012 3005 5003 9 70012 250.45 06-17-2012 3001 5002 10 70011 75.29 07-08-2012 3005 5007 11 70013 3045.60 04-25-2012 3005 5001Click me to see the sample solution
22. Write a Pandas program to split the following dataframe into groups by school code and get mean, min, and max value of age with customized column name for each school.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
23. Write a Pandas program to split the following datasets into groups on customer id and calculate the number of customers starting with 'C', the list of all products and the difference of maximum purchase amount and minimum purchase amount.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 05-10-2012 C3001 5002 1 70009 270.65 09-10-2012 C3001 5005 2 70002 65.26 05-10-2012 D3005 5001 3 70004 110.50 08-17-2012 D3001 5003 4 70007 948.50 10-09-2012 C3005 5002 5 70005 2400.60 07-27-2012 D3001 5001 6 70008 5760.00 10-09-2012 C3005 5001 7 70010 1983.43 10-10-2012 D3001 5006 8 70003 2480.40 10-10-2012 D3005 5003 9 70012 250.45 06-17-2012 C3001 5002 10 70011 75.29 07-08-2012 D3005 5007 11 70013 3045.60 04-25-2012 D3005 5001Click me to see the sample solution
24. Write a Pandas program to split the following datasets into groups on customer_id to summarize purch_amt and calculate percentage of purch_amt in each group.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 05-10-2012 3001 5002 1 70009 270.65 09-10-2012 3001 5005 2 70002 65.26 05-10-2012 3005 5001 3 70004 110.50 08-17-2012 3001 5003 4 70007 948.50 10-09-2012 3005 5002 5 70005 2400.60 07-27-2012 3001 5001 6 70008 5760.00 10-09-2012 3005 5001 7 70010 1983.43 10-10-2012 3001 5006 8 70003 2480.40 10-10-2012 3005 5003 9 70012 250.45 06-17-2012 3001 5002 10 70011 75.29 07-08-2012 3005 5007 11 70013 3045.60 04-25-2012 3005 5001Click me to see the sample solution
25. Write a Pandas program to split a dataset, group by one column and get mean, min, and max values by group, also change the column name of the aggregated metric. Using the following dataset find the mean, min, and max values of purchase amount (purch_amt) group by customer id (customer_id).
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
26. Write a Pandas program to split a given dataset, group by two columns and convert other columns of the dataframe into a dictionary with column header as key.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
27. Write a Pandas program to split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe.
Test Data:
salesman_id sale_jan sale_feb sale_mar sale_apr sale_may sale_jun \ 0 5002 150.50 250.50 150.50 150.50 130.50 150.50 1 5005 270.65 170.65 270.65 270.65 270.65 270.65 2 5001 65.26 15.26 65.26 95.26 65.26 45.26 3 5003 110.50 110.50 110.50 210.50 310.50 110.50 4 5002 948.50 598.50 948.50 948.50 948.50 948.50 5 5001 2400.60 1400.60 2400.60 2400.60 2400.60 3400.60 6 5001 1760.00 2760.00 5760.00 760.00 760.00 5760.00 7 5006 2983.43 1983.43 1983.43 1983.43 1983.43 983.43 8 5003 480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 5002 1250.45 250.45 250.45 250.45 250.45 250.45 10 5007 75.29 75.29 75.29 75.29 75.29 75.29 11 5001 1045.60 3045.60 3045.60 3045.60 3045.60 3045.60
sale_jul sale_aug sale_sep sale_oct sale_nov sale_dec 0 950.50 150.50 150.50 150.50 150.50 150.50 1 270.65 70.65 270.65 270.65 270.65 70.65 2 65.26 65.26 65.26 65.26 95.26 65.26 3 210.50 110.50 110.50 110.50 110.50 110.50 4 948.50 948.50 948.50 948.50 948.50 948.50 5 2400.60 400.60 200.60 2400.60 2400.60 2400.60 6 5760.00 5760.00 5760.00 5760.00 5760.00 5760.00 7 983.43 1983.43 1983.43 1983.43 1983.43 1983.43 8 2480.40 2480.40 2480.40 2480.40 2480.40 2480.40 9 250.45 250.45 250.45 250.45 250.45 250.45 10 75.29 75.29 75.29 75.29 75.29 75.29 11 3045.60 3045.60 3045.60 3045.60 3045.60 3045.60Click me to see the sample solution
28. Write a Pandas program to split a given dataset, group by one column and remove those groups if all the values of a specific columns are not available.
Test Data:
school class name date_Of_Birth age height weight address S1 s001 V Alberto Franco 15/05/2002 12 173 35 street1 S2 s002 V Gino Mcneill 17/05/2002 12 192 32 street2 S3 s003 VI Ryan Parkes 16/02/1999 13 186 33 street3 S4 s001 VI Eesha Hinton 25/09/1998 13 167 30 street1 S5 s002 V Gino Mcneill 11/05/2002 14 151 31 street2 S6 s004 VI David Parkes 15/09/1997 12 159 32 street4Click me to see the sample solution
29. Write a Pandas program to split a given dataset using group by on specified column into two labels and ranges.
Split the group on 'salesman_id',
Ranges:
1) (5001...5006)
2) (5007..5012)
Test Data:
salesman_id sale_jan 0 5001 150.50 1 5002 270.65 2 5003 65.26 3 5004 110.50 4 5005 948.50 5 5006 2400.60 6 5007 1760.00 7 5008 2983.43 8 5009 480.40 9 5010 1250.45 10 5011 75.29 11 5012 1045.60Click me to see the sample solution
30. Write a Pandas program to split the following dataset using group by on first column and aggregate over multiple lists on second column.
Test Data:
student_id marks 0 S001 [88, 89, 90] 1 S001 [78, 81, 60] 2 S002 [84, 83, 91] 3 S002 [84, 88, 91] 4 S003 [90, 89, 92] 5 S003 [88, 59, 90]Output:
student_id
S001 [83.0, 85.0, 75.0]
S002 [84.0, 85.5, 91.0]
S003 [89.0, 74.0, 91.0]
Click me to see the sample solution
31. Write a Pandas program to split the following dataset using group by on 'salesman_id' and find the first order date for each group.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3005 5002 1 70009 270.65 2012-09-10 3001 5005 2 70002 65.26 2012-10-05 3002 5001 3 70004 110.50 2012-08-17 3009 5003 4 70007 948.50 2012-09-10 3005 5002 5 70005 2400.60 2012-07-27 3007 5001 6 70008 5760.00 2012-09-10 3002 5001 7 70010 1983.43 2012-10-10 3004 5004 8 70003 2480.40 2012-10-10 3009 5003 9 70012 250.45 2012-06-27 3008 5002 10 70011 75.29 2012-08-17 3003 5004 11 70013 3045.60 2012-04-25 3002 5001Click me to see the sample solution
32. Write a Pandas program to split a given dataset using group by on multiple columns and drop last n rows of from each group.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id 0 70001 150.50 2012-10-05 3002 5002 1 70009 270.65 2012-09-10 3001 5003 2 70002 65.26 2012-10-05 3001 5001 3 70004 110.50 2012-08-17 3003 5003 4 70007 948.50 2012-09-10 3002 5002 5 70005 2400.60 2012-07-27 3002 5001 6 70008 5760.00 2012-09-10 3001 5001 7 70010 1983.43 2012-10-10 3004 5003 8 70003 2480.40 2012-10-10 3003 5003 9 70012 250.45 2012-06-27 3002 5002 10 70011 75.29 2012-08-17 3003 5003 11 70013 3045.60 2012-04-25 3001 5001Click me to see the sample solution
Python Code Editor:
More to Come !
Do not submit any solution of the above exercises at here, if you want to contribute go to the appropriate exercise page.
Test your Python skills with w3resource's quiz
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics