w3resource

Pandas: Extract the unique sentences from a given column of a given DataFrame


39. Extract Unique Sentences

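Write a Pandas program to extract the unique sentences from a given column of a given DataFrame. In the sample data, each sentence is a separate line within an address string, so a duplicate is a line that repeats within the same cell.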

Sample Solution:

Python Code:

import pandas as pd
import re
df = pd.DataFrame({
    'company_code': ['Abcd','EFGF', 'zefsalf', 'sdfslew', 'zekfsdf'],
    'date_of_sale': ['12/05/2002','16/02/1999','05/09/1998','12/02/2022','15/09/1997'],
    'address': ['9910 Surrey Avenue\n9910 Surrey Avenue','92 N. Bishop Avenue','9910 Golden Star Avenue', '102 Dunbar St.\n102 Dunbar St.', '17 West Livingston Court']
})

print("Original DataFrame:")
print(df)

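def find_unique_sentence(str1):
    # Match each full line that is not repeated later in the string,
    # so every distinct line is returned exactly once.
    return re.findall(r'(?sm)(^[^\r\n]+$)(?!.*^\1$)', str1)

# Apply the function to every address; each cell becomes a list of unique lines.
df['unique_sentence'] = df['address'].apply(find_unique_sentence)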
print("\nExtract unique sentences :")
print(df)

Sample Output:

Original DataFrame:
  company_code                   ...                                                   address
0         Abcd                   ...                    9910 Surrey Avenue\n9910 Surrey Avenue
1         EFGF                   ...                                       92 N. Bishop Avenue
2      zefsalf                   ...                                   9910 Golden Star Avenue
3      sdfslew                   ...                            102 Dunbar St.\n102 Dunbar St.
4      zekfsdf                   ...                                  17 West Livingston Court

[5 rows x 3 columns]

Extract unique sentences :
  company_code             ...                         unique_sentence
0         Abcd             ...                    [9910 Surrey Avenue]
1         EFGF             ...                   [92 N. Bishop Avenue]
2      zefsalf             ...               [9910 Golden Star Avenue]
3      sdfslew             ...                        [102 Dunbar St.]
4      zekfsdf             ...              [17 West Livingston Court]

[5 rows x 4 columns]
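
The same result can also be sketched without a regular expression. The helper name `unique_lines` below is hypothetical and not part of the sample solution; it assumes, as in the sample data, that each sentence sits on its own line, and it keeps the first occurrence of each line.

```python
import pandas as pd

def unique_lines(text):
    # Hypothetical helper (not from the original solution):
    # split on line breaks, skip empty lines, and drop repeats while
    # keeping first-seen order (dicts preserve insertion order).
    return list(dict.fromkeys(line for line in text.splitlines() if line))

addresses = pd.Series([
    '9910 Surrey Avenue\n9910 Surrey Avenue',
    '92 N. Bishop Avenue',
    '102 Dunbar St.\n102 Dunbar St.',
])

result = addresses.apply(unique_lines)
print(result.tolist())
```

One difference to be aware of: for an input such as 'a\nb\na', the regular expression in the sample solution keeps the last occurrence of each line and returns ['b', 'a'], while this sketch keeps the first occurrence and returns ['a', 'b'].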

For more practice, solve these related problems:

  • Write a Pandas program to extract unique sentences from a DataFrame column by splitting text on punctuation and then removing duplicates.
  • Write a Pandas program to create a series of sentences from a text column and then output only the unique sentences.
  • Write a Pandas program to split a column into sentences, deduplicate them, and then display the unique sentences in sorted order.
  • Write a Pandas program to filter out repeated sentences from a column and then create a new DataFrame with only unique sentence entries.
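
As a starting point for the punctuation-based variants above, here is a minimal sketch. The function name `unique_sorted_sentences` and the sample text are made up for illustration, and the sketch assumes a sentence ends with '.', '!' or '?' followed by whitespace.

```python
import re
import pandas as pd

def unique_sorted_sentences(text):
    # Assumption: a sentence ends with '.', '!' or '?' followed by whitespace.
    parts = re.split(r'(?<=[.!?])\s+', text.strip())
    # A set removes duplicates; sorted() returns them in alphabetical order.
    return sorted({p for p in parts if p})

# Illustrative data only; not taken from the exercise.
notes = pd.Series([
    'Call back later. Call back later. Order shipped!',
    'Is it ready? Yes. Yes.',
])

result = notes.apply(unique_sorted_sentences)
print(result.tolist())
```

This simple rule would not suit the address column from the exercise, because abbreviations such as "N." and "St." contain periods and would be treated as sentence endings.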

