NLTK Tokenize: Split a text paragraph into a list of sentences
Write a Python NLTK program to split a paragraph of text into a list of sentences.
Sample Solution:
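Python Code:
# sent_tokenize needs NLTK's Punkt sentence model. If it is missing, run
# nltk.download('punkt') once (newer NLTK releases may ask for 'punkt_tab').
from nltk.tokenize import sent_tokenize

text = '''
Joe waited for the train. The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.
'''
print("\nOriginal string:")
print(text)

# Split the paragraph into a list of sentences.
token_text = sent_tokenize(text)
print("\nSentence-tokenized copy in a list:")
print(token_text)

# Print each sentence on its own line.
print("\nRead the list:")
for s in token_text:
    print(s)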
Sample Output:
Original string:

Joe waited for the train. The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.

Sentence-tokenized copy in a list:
['Joe waited for the train.', 'The train was late.', 'Mary and Samantha took the bus.', 'I looked for Mary and Samantha at the bus station.']

Read the list:
Joe waited for the train.
The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.
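
To break each sentence further into individual words, NLTK offers word_tokenize in nltk.tokenize. The sketch below is not that function. It is a rough standard-library approximation that runs without downloading any tokenizer models. The helper name simple_words and its regular expression are assumptions made for illustration, and the pattern treats any run of word characters or any run of punctuation as one token, which is cruder than NLTK's handling of contractions and abbreviations.

```python
import re

# Sentences as they appear in the sample output above.
sentences = [
    'Joe waited for the train.',
    'The train was late.',
    'Mary and Samantha took the bus.',
    'I looked for Mary and Samantha at the bus station.',
]


def simple_words(sentence):
    """Hypothetical helper: return runs of word characters or runs of punctuation."""
    return re.findall(r"\w+|[^\w\s]+", sentence)


# One list of tokens per sentence.
words_per_sentence = [simple_words(s) for s in sentences]

for words in words_per_sentence:
    print(words)
```

Each sentence becomes its own list of tokens, with the closing full stop kept as a separate item. For real text, word_tokenize is likely the better choice, since a plain regular expression splits forms such as "don't" into three pieces.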