Pandas: GroupBy with condition of two labels and ranges
29. Grouping by Salesman ID into Ranges
Write a Pandas program to split a given dataset using group by on specified column into two labels and ranges.
Split the group on 'salesman_id',
Ranges:
1) (5001...5006)
2) (5007..5012)
Test Data:
salesman_id sale_jan 0 5001 150.50 1 5002 270.65 2 5003 65.26 3 5004 110.50 4 5005 948.50 5 5006 2400.60 6 5007 1760.00 7 5008 2983.43 8 5009 480.40 9 5010 1250.45 10 5011 75.29 11 5012 1045.60
Sample Solution:
Python Code :
import pandas as pd
import numpy as np
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'salesman_id': [5001,5002,5003,5004,5005,5006,5007,5008,5009,5010,5011,5012],
'sale_jan':[150.5, 270.65, 65.26, 110.5, 948.5, 2400.6, 1760, 2983.43, 480.4, 1250.45, 75.29,1045.6]})
print("Original Orders DataFrame:")
print(df)
result = df.groupby(pd.cut(df['salesman_id'],
bins=[0,5006,np.inf],
labels=['S1', 'S2']))['sale_jan'].sum().reset_index()
print("\nGroupBy with condition of two labels and ranges:")
print(result)
Sample Output:
Original Orders DataFrame: salesman_id sale_jan 0 5001 150.50 1 5002 270.65 2 5003 65.26 3 5004 110.50 4 5005 948.50 5 5006 2400.60 6 5007 1760.00 7 5008 2983.43 8 5009 480.40 9 5010 1250.45 10 5011 75.29 11 5012 1045.60 GroupBy with condition of two labels and ranges: salesman_id sale_jan 0 S1 3946.01 1 S2 7595.17
For more Practice: Solve these Related Problems:
- Write a Pandas program to group a dataframe by salesman_id and then assign each salesman to a range label based on their id.
- Write a Pandas program to split the dataframe into two groups based on whether salesman_id falls within (5001…5006) or (5007…5012).
- Write a Pandas program to group the dataset by salesman_id and then create a new column indicating the range category.
- Write a Pandas program to segment salesman_id values into two bins and then group by these bins to calculate summary statistics.
Go to:
Previous: Write a Pandas program to split a given dataset, group by one column and remove those groups if all the values of a specific columns are not available.
Next: Write a Pandas program to split the following dataset using group by on first column and aggregate over multiple lists on second column.
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.