Pandas: Drop last n rows from each group after using groupby on a dataframe
32. Grouping by Multiple Columns and Dropping Last n Rows
Write a Pandas program to split a given dataset using group by on multiple columns and drop last n rows of from each group.
Test Data:
ord_no purch_amt ord_date customer_id salesman_id
0 70001 150.50 2012-10-05 3002 5002
1 70009 270.65 2012-09-10 3001 5003
2 70002 65.26 2012-10-05 3001 5001
3 70004 110.50 2012-08-17 3003 5003
4 70007 948.50 2012-09-10 3002 5002
5 70005 2400.60 2012-07-27 3002 5001
6 70008 5760.00 2012-09-10 3001 5001
7 70010 1983.43 2012-10-10 3004 5003
8 70003 2480.40 2012-10-10 3003 5003
9 70012 250.45 2012-06-27 3002 5002
10 70011 75.29 2012-08-17 3003 5003
11 70013 3045.60 2012-04-25 3001 5001
Sample Solution:
Python Code :
import pandas as pd
pd.set_option('display.max_rows', None)
#pd.set_option('display.max_columns', None)
df = pd.DataFrame({
'ord_no':[70001,70009,70002,70004,70007,70005,70008,70010,70003,70012,70011,70013],
'purch_amt':[150.5,270.65,65.26,110.5,948.5,2400.6,5760,1983.43,2480.4,250.45, 75.29,3045.6],
'ord_date': ['2012-10-05','2012-09-10','2012-10-05','2012-08-17','2012-09-10','2012-07-27','2012-09-10','2012-10-10','2012-10-10','2012-06-27','2012-08-17','2012-04-25'],
'customer_id':[3002,3001,3001,3003,3002,3002,3001,3004,3003,3002,3003,3001],
'salesman_id':[5002,5003,5001,5003,5002,5001,5001,5003,5003,5002,5003,5001]})
print("Original Orders DataFrame:")
print(df)
print("\nSplit the said data on 'salesman_id', 'customer_id' wise:")
result = df.groupby(['salesman_id', 'customer_id'])
for name,group in result:
print("\nGroup:")
print(name)
print(group)
n = 2
#result1 = df.groupby(['salesman_id', 'customer_id']).tail(n).index, axis=0)
print("\nDroping last two records:")
result1 = df.drop(df.groupby(['salesman_id', 'customer_id']).tail(n).index, axis=0)
print(result1)
Sample Output:
Original Orders DataFrame:
ord_no purch_amt ord_date customer_id salesman_id
0 70001 150.50 2012-10-05 3002 5002
1 70009 270.65 2012-09-10 3001 5003
2 70002 65.26 2012-10-05 3001 5001
3 70004 110.50 2012-08-17 3003 5003
4 70007 948.50 2012-09-10 3002 5002
5 70005 2400.60 2012-07-27 3002 5001
6 70008 5760.00 2012-09-10 3001 5001
7 70010 1983.43 2012-10-10 3004 5003
8 70003 2480.40 2012-10-10 3003 5003
9 70012 250.45 2012-06-27 3002 5002
10 70011 75.29 2012-08-17 3003 5003
11 70013 3045.60 2012-04-25 3001 5001
Split the said data on 'salesman_id', 'customer_id' wise:
Group:
(5001, 3001)
ord_no purch_amt ord_date customer_id salesman_id
2 70002 65.26 2012-10-05 3001 5001
6 70008 5760.00 2012-09-10 3001 5001
11 70013 3045.60 2012-04-25 3001 5001
Group:
(5001, 3002)
ord_no purch_amt ord_date customer_id salesman_id
5 70005 2400.6 2012-07-27 3002 5001
Group:
(5002, 3002)
ord_no purch_amt ord_date customer_id salesman_id
0 70001 150.50 2012-10-05 3002 5002
4 70007 948.50 2012-09-10 3002 5002
9 70012 250.45 2012-06-27 3002 5002
Group:
(5003, 3001)
ord_no purch_amt ord_date customer_id salesman_id
1 70009 270.65 2012-09-10 3001 5003
Group:
(5003, 3003)
ord_no purch_amt ord_date customer_id salesman_id
3 70004 110.50 2012-08-17 3003 5003
8 70003 2480.40 2012-10-10 3003 5003
10 70011 75.29 2012-08-17 3003 5003
Group:
(5003, 3004)
ord_no purch_amt ord_date customer_id salesman_id
7 70010 1983.43 2012-10-10 3004 5003
Droping last two records:
ord_no purch_amt ord_date customer_id salesman_id
0 70001 150.50 2012-10-05 3002 5002
2 70002 65.26 2012-10-05 3001 5001
3 70004 110.50 2012-08-17 3003 5003
For more Practice: Solve these Related Problems:
- Write a Pandas program to group the dataframe by multiple columns and then drop the last row from each group.
- Write a Pandas program to split the dataframe by specified keys and then remove the last n rows from each group using tail().
- Write a Pandas program to group the dataframe and then apply a lambda function to drop the final two rows from every group.
- Write a Pandas program to group by multiple columns and then trim each group by excluding the last n entries before further aggregation.
Go to:
PREV : Grouping by Salesman ID to Find First Order Date.
NEXT : Pandas Advanced Grouping and Aggregation Exercises Home.
Python Code Editor:
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
