Removing duplicate rows in Pandas DataFrame
Python Pandas Numpy: Exercise-18 with Solution
Remove duplicate rows from a Pandas DataFrame.
Sample Solution:
Python Code:
import pandas as pd
# Create a sample DataFrame with duplicate rows
data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
'Age': [25, 30, 25, 22, 30],
'Salary': [50000, 60000, 50000, 45000, 60000]}
df = pd.DataFrame(data)
# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()
# Display the DataFrame without duplicates
print(df_no_duplicates)
Output:
       Name  Age  Salary
0      Ross   25   50000
1       Bob   30   60000
3  Geoffrey   22   45000
Explanation:
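In the exercise above:
- We create a sample DataFrame (df) with columns 'Name', 'Age', and 'Salary'. Rows 2 and 4 repeat rows 0 and 1 exactly.
- The df.drop_duplicates() method returns a new DataFrame with every row that repeats an earlier row across all columns removed. By default the first occurrence is kept (keep='first'), and df itself is left unchanged.
- The resulting DataFrame (df_no_duplicates) contains only unique rows. The original index labels are preserved, which is why the output shows 0, 1 and 3.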
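A row counts as a duplicate when it matches an earlier row in every column. The minimal sketch below reuses the exercise's sample data to show the boolean mask from DataFrame.duplicated() that drives drop_duplicates(), how the keep argument changes which copies survive, and how to renumber the index afterwards if a 0..n-1 index is wanted.

```python
import pandas as pd

# Same sample data as in the exercise
data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
        'Age': [25, 30, 25, 22, 30],
        'Salary': [50000, 60000, 50000, 45000, 60000]}
df = pd.DataFrame(data)

# True for every row that repeats an earlier row in all columns
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False, True]

# drop_duplicates() keeps exactly the rows where the mask is False
same = df[~mask]
print(same.equals(df.drop_duplicates()))  # True

# keep='last' keeps the final copy of each duplicated row instead
print(df.drop_duplicates(keep='last').index.tolist())  # [2, 3, 4]

# keep=False drops every row that has any copy
print(df.drop_duplicates(keep=False).index.tolist())  # [3]

# Original index labels are kept; reset them for a 0..n-1 index
print(df.drop_duplicates().reset_index(drop=True).index.tolist())  # [0, 1, 2]
```

Building the mask first is handy when you want to inspect or count the duplicates (for example, mask.sum() gives 2 here) before removing them.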
You can also specify a subset of columns to consider when identifying duplicates using the subset parameter. For example, to remove duplicates based on the 'Name' column:
df_no_duplicates = df.drop_duplicates(subset='Name')
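With the exercise's data, subset='Name' happens to remove the same rows as the full-row check, because each name always appears with the same age and salary. The sketch below adds one hypothetical extra row (a second 'Ross' with a different Age and Salary) to show where the two differ. It also shows that subset accepts a list of columns, combines with keep, and that ignore_index=True (available in pandas 1.0 and later) relabels the result.

```python
import pandas as pd

# Exercise data plus one hypothetical extra row:
# a second 'Ross' whose Age and Salary differ from the first one
data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob', 'Ross'],
        'Age': [25, 30, 25, 22, 30, 26],
        'Salary': [50000, 60000, 50000, 45000, 60000, 52000]}
df = pd.DataFrame(data)

# Full-row comparison: the extra Ross row is unique, so it survives
full = df.drop_duplicates()
print(full.index.tolist())  # [0, 1, 3, 5]

# Compare on 'Name' only: one row per name, first occurrence kept
by_name = df.drop_duplicates(subset='Name')
print(by_name.index.tolist())  # [0, 1, 3]

# subset also accepts a list; keep='last' keeps the last row per name
last_per_name = df.drop_duplicates(subset=['Name'], keep='last')
print(last_per_name.index.tolist())  # [3, 4, 5]

# ignore_index=True relabels the result as 0..n-1
relabeled = df.drop_duplicates(subset='Name', ignore_index=True)
print(relabeled.index.tolist())  # [0, 1, 2]
```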
Adjust the column names and data to match the structure of your own DataFrame.