NLTK Tokenize: Read a given text through each line and look for sentences
Write a Python NLTK program that will read a given text through each line and look for sentences. Print each sentence and divide two sentences with “==============”.
Sample Solution:
Python Code-1:
import nltk.data
text = '''
Mr. Smith waited for the train. The train was late.
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station.
'''
print("\nOriginal Tweet:")
print(text)
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))
Sample Output:
Original Tweet: Mr. Smith waited for the train. The train was late. Mary and Samantha took the bus. I looked for Mary and Samantha at the bus station. Mr. Smith waited for the train. ============== The train was late. ============== Mary and Samantha took the bus. ============== I looked for Mary and Samantha at the bus station.
Punctuation following sentences is also included by default.
Example:
Python Code-2:
import nltk.data
text = '''
Mr. Smith waited for the train. (The train was late.)
Mary and Samantha took the bus. I looked for Mary and
Samantha at the bus station [Sector-1].
'''
print("\nOriginal Tweet:")
print(text)
sent_detector = nltk.data.load('tokenizers/punkt/english.pickle')
print('\n==============\n'.join(sent_detector.tokenize(text.strip())))
Output:
Original Tweet: Mr. Smith waited for the train. (The train was late.) Mary and Samantha took the bus. I looked for Mary and Samantha at the bus station [Sector-1]. Mr. Smith waited for the train. ============== (The train was late.) ============== Mary and Samantha took the bus. ============== I looked for Mary and Samantha at the bus station [Sector-1].
Have another way to solve this solution? Contribute your code (and comments) through Disqus.
Previous: Write a Python NLTK program to remove Twitter username handles from a given twitter text.
Next: Write a Python NLTK program to find parenthesized expressions in a given string and divides the string into a sequence of substrings.
What is the difficulty level of this exercise?
Test your Programming skills with w3resource's quiz.
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics