w3resource

Pandas: Joining columns on columns (potentially a many-to-many join)


12. Combine DataFrames with Duplicate Key Combinations

Write a Pandas program to combine two DataFrames in which a key column value appears more than once in each DataFrame, so that the join is potentially many-to-many.

Test Data:

data1:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
data2:
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3

Sample Solution:

Python Code:

import pandas as pd
data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'P': ['P0', 'P1', 'P2', 'P3'],
                     'Q': ['Q0', 'Q1', 'Q2', 'Q3']}) 
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})
print("Original DataFrames:")
print(data1)
print("--------------------")
print(data2)
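print("\nMerged Data (many-to-many join case):")
# key1 has repeated values in both frames (K0 twice in data1, K1 twice in data2),
# so each key1 value produces one output row per matching left/right pair.
# key2 is not a join key here, so its two copies get the default suffixes _x and _y.
result = pd.merge(data1, data2, on='key1')
print(result)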

Sample Output:

Original DataFrames:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
--------------------
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3

Merged Data (many-to-many join case):
  key1 key2_x   P   Q key2_y   R   S
0   K0     K0  P0  Q0     K0  R0  S0
1   K0     K1  P1  Q1     K0  R0  S0
2   K1     K0  P2  Q2     K0  R1  S1
3   K1     K0  P2  Q2     K0  R2  S2
4   K2     K1  P3  Q3     K0  R3  S3
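
The five rows in the merged output can be checked by hand: for each key1 value, an inner merge produces (rows in data1 with that key) x (rows in data2 with that key) output rows. The following is a minimal sketch of that check, not part of the original solution; the names left_counts, right_counts and expected_rows are introduced here for illustration only.

```python
import pandas as pd

data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'P': ['P0', 'P1', 'P2', 'P3'],
                      'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})

# How often each key1 value occurs on each side of the join.
left_counts = data1['key1'].value_counts()
right_counts = data2['key1'].value_counts()

# Multiply the counts key by key; a key missing on one side contributes 0 rows.
expected_rows = int(left_counts.mul(right_counts, fill_value=0).sum())

result = pd.merge(data1, data2, on='key1')
print("Expected rows:", expected_rows)
print("Actual rows:  ", len(result))
```

Here K0 contributes 2 x 1 rows, K1 contributes 1 x 2 rows and K2 contributes 1 x 1 row, which adds up to the 5 rows shown above.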

For more Practice: Solve these Related Problems:

  • Write a Pandas program to merge two DataFrames where a combination of key columns appears multiple times in both, then output the resulting DataFrame.
  • Write a Pandas program to join two DataFrames on composite keys that are not unique and then count the number of merged rows.
  • Write a Pandas program to combine DataFrames with duplicate key pairs and then group the result by the key columns to aggregate duplicate records.
  • Write a Pandas program to perform a many-to-many merge on two DataFrames and then verify the number of resulting rows.
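
As a starting point for the related problems above, here is one possible sketch, assuming the same test data and a join on the composite key ['key1', 'key2'] rather than on key1 alone. It counts the merged rows, groups them by key pair, and uses the validate parameter of pd.merge to test whether the keys are unique. The variable names merged, pair_counts and one_to_one_ok are illustrative, not from the original exercise.

```python
import pandas as pd
from pandas.errors import MergeError

data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'P': ['P0', 'P1', 'P2', 'P3'],
                      'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})

# Join on the composite key: rows pair up only when key1 AND key2 both match.
merged = pd.merge(data1, data2, on=['key1', 'key2'])
print(merged)
print("Merged rows:", len(merged))

# Count how many merged rows each key pair produced.
pair_counts = merged.groupby(['key1', 'key2']).size()
print(pair_counts)

# validate= asks pandas to check the key relationship before merging.
# The pair (K1, K0) repeats in data2, so a one-to-one check raises MergeError.
try:
    pd.merge(data1, data2, on=['key1', 'key2'], validate='one_to_one')
    one_to_one_ok = True
except MergeError:
    one_to_one_ok = False
print("One-to-one merge allowed:", one_to_one_ok)
```

With this data only the pairs (K0, K0) and (K1, K0) occur in both DataFrames, so the composite-key merge returns fewer rows than the merge on key1 alone.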
