Removing Duplicate Rows from a DataFrame Using Pandas
Pandas: Data Validation Exercise-5 with Solution
Write a Pandas program to remove duplicate rows from a DataFrame.
This exercise demonstrates how to remove duplicate rows from a DataFrame using drop_duplicates().
Sample Solution:
Code:
import pandas as pd
# Create a sample DataFrame with duplicate rows
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})
# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()
# Output the result
print(df_no_duplicates)
Output:
      Name  Age  Salary
0  Orville   25   50000
1   Arturo   30   60000
2     Ruth   22   70000
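Before dropping anything, it can help to see which rows pandas treats as duplicates. The following is a minimal sketch that rebuilds the same sample DataFrame and uses duplicated(); the variable name duplicate_mask is only illustrative.

```python
import pandas as pd

# Same sample data as in the solution above
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# duplicated() returns a boolean Series: True for each row that
# repeats an earlier row in every column
duplicate_mask = df.duplicated()

print(duplicate_mask.tolist())     # [False, False, False, True]
print(int(duplicate_mask.sum()))   # 1 duplicate row
```

Only the fourth row is flagged, because by default the first occurrence of a repeated row is not counted as a duplicate.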
Explanation:
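- Created a DataFrame in which the last row repeats the first (same Name, Age, and Salary).
- Called drop_duplicates(), which by default compares all columns and keeps the first occurrence of each repeated row.
- Printed the returned DataFrame. The original df is unchanged, and the remaining rows keep their original index labels (0, 1, 2).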
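The default behavior can be adjusted through the subset, keep, and ignore_index parameters of drop_duplicates(). The sketch below reuses the sample data; the names last_kept and unique_names are illustrative, and ignore_index assumes pandas 1.0 or later.

```python
import pandas as pd

# Same sample data as in the solution above
df = pd.DataFrame({
    'Name': ['Orville', 'Arturo', 'Ruth', 'Orville'],
    'Age': [25, 30, 22, 25],
    'Salary': [50000, 60000, 70000, 50000]
})

# Keep the last occurrence of each repeated row instead of the first,
# and renumber the result from 0 (ignore_index needs pandas >= 1.0)
last_kept = df.drop_duplicates(keep='last', ignore_index=True)
print(last_kept)

# Treat rows as duplicates when the 'Name' column alone matches,
# regardless of the values in the other columns
unique_names = df.drop_duplicates(subset=['Name'])
print(unique_names)
```

With keep='last', the first 'Orville' row is dropped and the final one is kept, so the row order becomes Arturo, Ruth, Orville. Restricting the comparison with subset is useful when some columns (such as a timestamp) differ between otherwise repeated records.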