w3resource

Checking for Duplicate Rows in a Pandas DataFrame


4. Checking Duplicate Rows in a DataFrame

Write a Pandas program to check duplicate rows in a DataFrame.

This exercise shows how to check duplicate rows in a DataFrame using duplicated().

Sample Solution :

Code :

import pandas as pd

# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# Check for duplicate rows
duplicates = df.duplicated()

# Output the result
print(duplicates)

Output:

0    False
1    False
2    False
3     True
dtype: bool

Explanation:

  • Created a DataFrame with some duplicate rows.
  • Used duplicated() to check for duplicate rows.
  • Outputted a Boolean Series indicating which rows are duplicates.

For more Practice: Solve these Related Problems:

  • Write a Pandas program to check for duplicate rows in a DataFrame and list the indices of the duplicates.
  • Write a Pandas program to identify duplicate rows based on a subset of columns and output a summary of duplicates.
  • Write a Pandas program to count duplicate rows and visualize the frequency of duplicates per unique row.
  • Write a Pandas program to detect duplicate rows and generate a DataFrame that includes an additional column marking duplicate status.

Python-Pandas Code Editor:

Have another way to solve this solution? Contribute your code (and comments) through Disqus.

What is the difficulty level of this exercise?

Test your Programming skills with w3resource's quiz.



Follow us on Facebook and Twitter for latest update.