
Pandas Practice Set-1: Read the diamonds DataFrame and detect duplicate color


64. Read Diamonds DataFrame and Detect Duplicate 'color'

Write a Pandas program to read the diamonds DataFrame and detect duplicate values in the 'color' column.

Note: The duplicated() function returns a boolean Series that marks duplicate rows, optionally considering only certain columns.
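To make the note concrete, here is a minimal sketch of how duplicated() behaves on a Series and on a DataFrame restricted to one column. The toy values below are made up for illustration and are not taken from the diamonds file.

```python
import pandas as pd

# Hypothetical toy data standing in for the 'color' column
colors = pd.Series(["E", "E", "I", "J", "I"])

# Default keep='first': every repeat after the first occurrence is True
first = colors.duplicated()
# keep='last': every occurrence except the last one is True
last = colors.duplicated(keep="last")
# keep=False: all members of a repeated group are True
every = colors.duplicated(keep=False)

print(first.tolist())  # [False, True, False, False, True]
print(last.tolist())   # [True, False, True, False, False]
print(every.tolist())  # [True, True, True, False, True]

# On a DataFrame, 'subset' limits the comparison to the named columns
frame = pd.DataFrame({"color": colors, "price": [326, 327, 334, 335, 336]})
by_color = frame.duplicated(subset=["color"])
print(by_color.tolist())  # [False, True, False, False, True]
```

With the default keep='first', the first row of each group is never flagged, so only the later repeats count as duplicates.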

Sample Solution:

Python Code:

import pandas as pd

# Load the diamonds dataset from the seaborn-data repository
diamonds = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/diamonds.csv')
print("Original Dataframe:")
print(diamonds.shape)
print("\nCount the duplicate items:")
# duplicated() marks every repeat of a 'color' value after its first occurrence
print(diamonds['color'].duplicated().sum())

Sample Output:

Original Dataframe:
(53940, 10)

Count the duplicate items:
53933
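
The count is close to the total number of rows because duplicated() flags every occurrence after the first one, and the column holds only a handful of distinct values. The sketch below checks that relationship on a small made-up frame, assuming the column has no missing values (nunique() ignores NaN by default).

```python
import pandas as pd

# Hypothetical toy frame; the values are illustrative only
toy = pd.DataFrame({"color": ["E", "E", "I", "J", "I", "E"]})

dup_count = toy["color"].duplicated().sum()
unique_count = toy["color"].nunique()

# Each distinct value contributes exactly one non-duplicate row,
# so duplicates = total rows - distinct values
print(dup_count)                    # 3
print(len(toy) - unique_count)      # 3
```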

For more Practice: Solve these Related Problems:

  • Write a Pandas program to identify duplicate entries in the 'color' column of the diamonds DataFrame and display a boolean mask.
  • Write a Pandas program to flag rows in the diamonds DataFrame where the 'color' value has already appeared.
  • Write a Pandas program to detect and print the indices of duplicate 'color' values in the diamonds DataFrame.
  • Write a Pandas program to create a new column indicating whether the 'color' value is a duplicate in the diamonds dataset.
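
As a starting point for the problems above, the following sketch builds a boolean mask, collects the indices of the flagged rows, and stores the mask in a new column. It runs on a small made-up frame rather than the full dataset, and the column name color_is_dup is a hypothetical choice, not part of the original exercise.

```python
import pandas as pd

# Hypothetical toy frame standing in for the diamonds DataFrame
toy = pd.DataFrame({
    "color": ["E", "E", "I", "J", "I"],
    "price": [326, 326, 334, 335, 336],
})

# Boolean mask: True where the 'color' value has already appeared
mask = toy["color"].duplicated()

# Index labels of the rows flagged as duplicates
dup_indices = toy.index[mask].tolist()

# New column recording the flag (column name is an assumption)
toy["color_is_dup"] = mask

print(mask.tolist())   # [False, True, False, False, True]
print(dup_indices)     # [1, 4]
print(toy)
```

The same calls should work unchanged on the diamonds DataFrame once it has been loaded with read_csv.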
