w3resource

Removing duplicate rows in Pandas DataFrame

Python Pandas Numpy: Exercise-18 with Solution

Remove duplicate rows from a Pandas DataFrame.

Sample Solution:

Python Code:

import pandas as pd

# Create a sample DataFrame with duplicate rows
data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
        'Age': [25, 30, 25, 22, 30],
        'Salary': [50000, 60000, 50000, 45000, 60000]}

df = pd.DataFrame(data)

# Remove duplicate rows
df_no_duplicates = df.drop_duplicates()

# Display the DataFrame without duplicates
print(df_no_duplicates)

Output:

       Name  Age  Salary
0      Ross   25   50000
1       Bob   30   60000
3  Geoffrey   22   45000
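To see which rows drop_duplicates() treats as duplicates, you can use the related DataFrame.duplicated() method, which returns a boolean mask. The sketch below reuses the exercise's sample data. Keeping only the rows that are not flagged produces the same result as drop_duplicates().

```python
import pandas as pd

# Same sample data as the exercise
data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
        'Age': [25, 30, 25, 22, 30],
        'Salary': [50000, 60000, 50000, 45000, 60000]}
df = pd.DataFrame(data)

# duplicated() returns a boolean Series: True for each row that repeats
# an earlier row across all columns
mask = df.duplicated()
print(mask.tolist())  # [False, False, True, False, True]

# Selecting the rows that are NOT flagged matches drop_duplicates()
manual = df[~mask]
print(manual.equals(df.drop_duplicates()))  # True
```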

Explanation:

In the exercise above:

  • We create a sample DataFrame (df) with columns 'Name', 'Age', and 'Salary'.
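  • The df.drop_duplicates() method removes every row that repeats an earlier row across all columns. By default (keep='first') the first occurrence is kept. The method returns a new DataFrame and leaves df unchanged.
  • The resulting DataFrame (df_no_duplicates) contains only unique rows. It keeps the original index labels, which is why the output shows 0, 1 and 3.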
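The keep parameter controls which occurrence of each duplicate group survives, and ignore_index=True renumbers the result. The sketch below compares these options on the exercise's data. Note that ignore_index requires pandas 1.0 or later.

```python
import pandas as pd

data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey', 'Bob'],
        'Age': [25, 30, 25, 22, 30],
        'Salary': [50000, 60000, 50000, 45000, 60000]}
df = pd.DataFrame(data)

# keep='first' (the default) retains the first occurrence of each duplicate group
first = df.drop_duplicates(keep='first')
# keep='last' retains the last occurrence instead
last = df.drop_duplicates(keep='last')
# keep=False drops every row that has a duplicate anywhere in the DataFrame
no_dups = df.drop_duplicates(keep=False)

print(first.index.tolist())    # [0, 1, 3]
print(last.index.tolist())     # [2, 3, 4]
print(no_dups.index.tolist())  # [3]

# ignore_index=True relabels the result 0..n-1 instead of keeping old labels
renumbered = df.drop_duplicates(ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]

# The original DataFrame is not modified
print(len(df))  # 5
```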

You can also use the subset parameter to limit which columns are compared when identifying duplicates. For example, to remove duplicates based only on the 'Name' column:

df_no_duplicates = df.drop_duplicates(subset='Name')
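In the exercise's data, every repeated name also repeats its age and salary, so subset='Name' gives the same result as comparing all columns. The sketch below uses hypothetical data in which 'Ross' appears twice with different values, to show where subset changes the outcome. It also shows that subset accepts a list of columns.

```python
import pandas as pd

# Hypothetical data: 'Ross' appears twice with different ages and salaries
data = {'Name': ['Ross', 'Bob', 'Ross', 'Geoffrey'],
        'Age': [25, 30, 26, 22],
        'Salary': [50000, 60000, 52000, 45000]}
df = pd.DataFrame(data)

# With no subset, all columns are compared, so no row is a duplicate
print(len(df.drop_duplicates()))  # 4

# With subset='Name', only Name is compared; the second 'Ross' (index 2) is dropped
by_name = df.drop_duplicates(subset='Name')
print(by_name.index.tolist())  # [0, 1, 3]

# subset also accepts a list; (Name, Age) pairs are all distinct here
by_name_age = df.drop_duplicates(subset=['Name', 'Age'])
print(len(by_name_age))  # 4
```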

Adjust the column names and data to match the structure of your own DataFrame.


https://198.211.115.131/python-exercises/pandas_numpy/pandas_numpy-exercise-18.php