w3resource

Pandas: Split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe


Write a Pandas program to split a given dataset, group by one column and apply an aggregate function to few columns and another aggregate function to the rest of the columns of the dataframe.

Test Data:

    salesman_id  sale_jan  sale_feb  sale_mar  sale_apr  sale_may  sale_jun  \
0          5002    150.50    250.50    150.50    150.50    130.50    150.50   
1          5005    270.65    170.65    270.65    270.65    270.65    270.65   
2          5001     65.26     15.26     65.26     95.26     65.26     45.26   
3          5003    110.50    110.50    110.50    210.50    310.50    110.50   
4          5002    948.50    598.50    948.50    948.50    948.50    948.50   
5          5001   2400.60   1400.60   2400.60   2400.60   2400.60   3400.60   
6          5001   1760.00   2760.00   5760.00    760.00    760.00   5760.00   
7          5006   2983.43   1983.43   1983.43   1983.43   1983.43    983.43   
8          5003    480.40   2480.40   2480.40   2480.40   2480.40   2480.40   
9          5002   1250.45    250.45    250.45    250.45    250.45    250.45   
10         5007     75.29     75.29     75.29     75.29     75.29     75.29   
11         5001   1045.60   3045.60   3045.60   3045.60   3045.60   3045.60   
    sale_jul  sale_aug  sale_sep  sale_oct  sale_nov  sale_dec  
0     950.50    150.50    150.50    150.50    150.50    150.50  
1     270.65     70.65    270.65    270.65    270.65     70.65  
2      65.26     65.26     65.26     65.26     95.26     65.26  
3     210.50    110.50    110.50    110.50    110.50    110.50  
4     948.50    948.50    948.50    948.50    948.50    948.50  
5    2400.60    400.60    200.60   2400.60   2400.60   2400.60  
6    5760.00   5760.00   5760.00   5760.00   5760.00   5760.00  
7     983.43   1983.43   1983.43   1983.43   1983.43   1983.43  
8    2480.40   2480.40   2480.40   2480.40   2480.40   2480.40  
9     250.45    250.45    250.45    250.45    250.45    250.45  
10     75.29     75.29     75.29     75.29     75.29     75.29  
11   3045.60   3045.60   3045.60   3045.60   3045.60   3045.60  

Sample Solution:

Python Code :

import pandas as pd
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'salesman_id': [5002,5005,5001,5003,5002,5001,5001,5006,5003,5002,5007,5001],
'sale_jan':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 1760, 2983.43, 480.4,  1250.45, 75.29,1045.6],
'sale_feb':[250.5, 170.65, 15.26, 110.5, 598.5, 1400.6, 2760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_mar':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_apr':[150.5, 270.65, 95.26, 210.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_may':[130.5, 270.65, 65.26, 310.5, 948.5, 2400.6, 760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jun':[150.5, 270.65, 45.26, 110.5, 948.5, 3400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_jul':[950.5, 270.65, 65.26, 210.5, 948.5, 2400.6, 5760, 983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_aug':[150.5, 70.65,  65.26, 110.5, 948.5, 400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_sep':[150.5, 270.65, 65.26, 110.5, 948.5, 200.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_oct':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6],
'sale_nov':[150.5, 270.65, 95.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6], 
'sale_dec':[150.5, 70.65, 65.26, 110.5, 948.5, 2400.6, 5760, 1983.43, 2480.4, 250.45, 75.29, 3045.6]
})
print("Original Orders DataFrame:")
print(df)
print("\Result after group on salesman_id and apply different aggregate functions:")
df = df.groupby('salesman_id').agg(lambda x : x.sum() if x.name in ['sale_jan','sale_feb','sale_mar'] else x.mean())
print(df)

Sample Output:

Original Orders DataFrame:
    salesman_id  sale_jan  sale_feb  sale_mar  sale_apr  sale_may  sale_jun  \
0          5002    150.50    250.50    150.50    150.50    130.50    150.50   
1          5005    270.65    170.65    270.65    270.65    270.65    270.65   
2          5001     65.26     15.26     65.26     95.26     65.26     45.26   
3          5003    110.50    110.50    110.50    210.50    310.50    110.50   
4          5002    948.50    598.50    948.50    948.50    948.50    948.50   
5          5001   2400.60   1400.60   2400.60   2400.60   2400.60   3400.60   
6          5001   1760.00   2760.00   5760.00    760.00    760.00   5760.00   
7          5006   2983.43   1983.43   1983.43   1983.43   1983.43    983.43   
8          5003    480.40   2480.40   2480.40   2480.40   2480.40   2480.40   
9          5002   1250.45    250.45    250.45    250.45    250.45    250.45   
10         5007     75.29     75.29     75.29     75.29     75.29     75.29   
11         5001   1045.60   3045.60   3045.60   3045.60   3045.60   3045.60   

    sale_jul  sale_aug  sale_sep  sale_oct  sale_nov  sale_dec  
0     950.50    150.50    150.50    150.50    150.50    150.50  
1     270.65     70.65    270.65    270.65    270.65     70.65  
2      65.26     65.26     65.26     65.26     95.26     65.26  
3     210.50    110.50    110.50    110.50    110.50    110.50  
4     948.50    948.50    948.50    948.50    948.50    948.50  
5    2400.60    400.60    200.60   2400.60   2400.60   2400.60  
6    5760.00   5760.00   5760.00   5760.00   5760.00   5760.00  
7     983.43   1983.43   1983.43   1983.43   1983.43   1983.43  
8    2480.40   2480.40   2480.40   2480.40   2480.40   2480.40  
9     250.45    250.45    250.45    250.45    250.45    250.45  
10     75.29     75.29     75.29     75.29     75.29     75.29  
11   3045.60   3045.60   3045.60   3045.60   3045.60   3045.60  
\Result after group on salesman_id and apply different aggregate functions:
             sale_jan  sale_feb  sale_mar     sale_apr  sale_may     sale_jun  \
salesman_id                                                                     
5001          5271.46   7221.46  11271.46  1575.365000  1567.865  3062.865000   
5002          2349.45   1099.45   1349.45   449.816667   443.150   449.816667   
5003           590.90   2590.90   2590.90  1345.450000  1395.450  1295.450000   
5005           270.65    170.65    270.65   270.650000   270.650   270.650000   
5006          2983.43   1983.43   1983.43  1983.430000  1983.430   983.430000   
5007            75.29     75.29     75.29    75.290000    75.290    75.290000   

                sale_jul     sale_aug     sale_sep     sale_oct     sale_nov  \
salesman_id                                                                    
5001         2817.865000  2317.865000  2267.865000  2817.865000  2825.365000   
5002          716.483333   449.816667   449.816667   449.816667   449.816667   
5003         1345.450000  1295.450000  1295.450000  1295.450000  1295.450000   
5005          270.650000    70.650000   270.650000   270.650000   270.650000   
5006          983.430000  1983.430000  1983.430000  1983.430000  1983.430000   
5007           75.290000    75.290000    75.290000    75.290000    75.290000   

                sale_dec  
salesman_id               
5001         2817.865000  
5002          449.816667  
5003         1295.450000  
5005           70.650000  
5006         1983.430000  
5007           75.290000  

Python Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

Previous: Write a Pandas program to split a given dataset, group by two columns and convert other columns of the dataframe into a dictionary with column header as key.
Next: Write a Pandas program to split a given dataset, group by one column and remove those groups if all the values of a specific columns are not available.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.