w3resource

Removing Duplicate Rows from a DataFrame Using Pandas


Pandas: Data Validation Exercise-5 with Solution


Write a Pandas program to remove duplicate rows from a DataFrame.

This exercise demonstrates how to remove duplicate rows from a DataFrame using drop_duplicates().

Sample Solution:

Code:

import pandas as pd

# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()

# Output the result
print(df_no_duplicates)

Output:

      Name  Age  Salary
0  Orville   25   50000
1   Arturo   30   60000
2     Ruth   22   70000
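
To confirm which rows pandas treats as duplicates before dropping them, the related duplicated() method can be used. The following is a minimal sketch that reuses the sample data above; the variable name dup_mask is an illustrative choice and not part of the original solution.

```python
import pandas as pd

# Same sample data as in the solution above
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# duplicated() returns a boolean Series: True marks a row that repeats
# an earlier row (by default, all columns are compared)
dup_mask = df.duplicated()

# Show the flagged rows and count them
print(df[dup_mask])
print("Number of duplicate rows:", dup_mask.sum())
```

Only the last row is flagged, because the first occurrence of a repeated row is not counted as a duplicate by default.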

Explanation:

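  • Created a DataFrame in which the last row repeats the first row exactly.
  • Called drop_duplicates(), which by default compares all columns and keeps the first occurrence of each repeated row.
  • Stored the result in df_no_duplicates; drop_duplicates() returns a new DataFrame and leaves df unchanged.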
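
The defaults described above can be changed with the subset, keep, and ignore_index parameters of drop_duplicates(). The sketch below is an illustration only: the data is altered from the exercise (the second 'Orville' row has a different age and salary) so that the effect of subset is visible, and the name df_by_name is hypothetical. The ignore_index parameter assumes pandas 1.0 or later.

```python
import pandas as pd

# Modified sample data (an assumption for illustration): the two 'Orville'
# rows differ in Age and Salary, so they are NOT full-row duplicates
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 26],
    'Salary': [50000, 60000, 70000, 55000]
})

# subset=['Name']   -> compare only the 'Name' column
# keep='last'       -> keep the last occurrence instead of the first
# ignore_index=True -> renumber the resulting index from 0
df_by_name = df.drop_duplicates(subset=['Name'], keep='last', ignore_index=True)

print(df_by_name)
```

With these settings the first 'Orville' row is dropped and the later one is kept, which can be useful when newer records should win over older ones.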
