Pandas: Joining columns on columns (potentially a many-to-many join)
Write a Pandas program to merge two DataFrames on a key column whose values appear more than once in the DataFrames, so that each matching pair of rows produces its own row in the result (a potentially many-to-many join).
Test Data:
data1:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
data2:
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3
Sample Solution:
Python Code:
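import pandas as pd

# key1 contains repeated values in both DataFrames
data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'P': ['P0', 'P1', 'P2', 'P3'],
                      'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})
print("Original DataFrames:")
print(data1)
print("--------------------")
print(data2)
print("\nMerged Data (many-to-many join case):")
# Inner join on key1 only: every data1 row is paired with every data2 row
# that has the same key1. key2 exists in both frames but is not a join key,
# so it appears twice, as key2_x (from data1) and key2_y (from data2).
result = pd.merge(data1, data2, on='key1')
print(result)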
Sample Output:
Original DataFrames:
  key1 key2   P   Q
0   K0   K0  P0  Q0
1   K0   K1  P1  Q1
2   K1   K0  P2  Q2
3   K2   K1  P3  Q3
--------------------
  key1 key2   R   S
0   K0   K0  R0  S0
1   K1   K0  R1  S1
2   K1   K0  R2  S2
3   K2   K0  R3  S3

Merged Data (many-to-many join case):
  key1 key2_x   P   Q key2_y   R   S
0   K0     K0  P0  Q0     K0  R0  S0
1   K0     K1  P1  Q1     K0  R0  S0
2   K1     K0  P2  Q2     K0  R1  S1
3   K1     K0  P2  Q2     K0  R2  S2
4   K2     K1  P3  Q3     K0  R3  S3
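
The merged result has five rows even though each input has four, because repeated key1 values multiply: each key contributes (rows in data1) x (rows in data2) rows. The sketch below is not part of the original solution. It reuses the same test data to check that row count, shows how the result changes when joining on both key columns, and uses the `validate` argument of `pd.merge` to detect duplicated keys. The variable names `expected_rows`, `both_keys` and `is_one_to_one` are chosen here for illustration only.

```python
import pandas as pd

data1 = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                      'key2': ['K0', 'K1', 'K0', 'K1'],
                      'P': ['P0', 'P1', 'P2', 'P3'],
                      'Q': ['Q0', 'Q1', 'Q2', 'Q3']})
data2 = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'R': ['R0', 'R1', 'R2', 'R3'],
                      'S': ['S0', 'S1', 'S2', 'S3']})

# Rows produced per key in an inner join = count in data1 * count in data2.
counts1 = data1['key1'].value_counts()
counts2 = data2['key1'].value_counts()
expected_rows = int((counts1 * counts2).sum())

result = pd.merge(data1, data2, on='key1')
print("Expected rows:", expected_rows, "| actual rows:", len(result))

# Joining on both key columns keeps only (key1, key2) pairs found in both
# frames, so there are no key2_x / key2_y columns in the result.
both_keys = pd.merge(data1, data2, on=['key1', 'key2'])
print(both_keys)

# validate makes merge raise MergeError when the keys are not unique in the
# way the caller expects; here key1 is duplicated in both frames.
try:
    pd.merge(data1, data2, on='key1', validate='one_to_one')
    is_one_to_one = True
except pd.errors.MergeError:
    is_one_to_one = False
print("key1 is one-to-one:", is_one_to_one)
```

Whether to join on key1 alone or on both columns depends on what the keys mean in the data at hand; the `validate` check is a cheap guard when an unexpected row multiplication would be a bug.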