NLTK Tokenize: Split a paragraph of text into a list of sentences

Last update on December 21 2024 07:35:48 (UTC/GMT +8 hours)

Write a Python NLTK program to split a paragraph of text into a list of sentences.

Sample Solution:

Python Code:

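# sent_tokenize needs the Punkt sentence model. If it is missing, run
# nltk.download('punkt') once (newer NLTK releases look for 'punkt_tab' instead).
from nltk.tokenize import sent_tokenize

text = '''
Joe waited for the train. The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.
'''
# Collapse the line breaks and surrounding whitespace of the literal into single spaces.
text = " ".join(text.split())

print("\nOriginal string:")
print(text)

# Split the paragraph into a list of sentences.
token_text = sent_tokenize(text)
print("\nSentence-tokenized copy in a list:")
print(token_text)

# Print each sentence on its own line.
print("\nRead the list:")
for s in token_text:
    print(s)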

Sample Output:

Original string:
Joe waited for the train. The train was late. Mary and Samantha took the bus. I looked for Mary and Samantha at the bus station.

Sentence-tokenized copy in a list:
['Joe waited for the train.', 'The train was late.', 'Mary and Samantha took the bus.', 'I looked for Mary and Samantha at the bus station.']

Read the list:
Joe waited for the train.
The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.
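
If a list of words is needed rather than a list of sentences, each sentence can be tokenized one step further. The following is a minimal sketch, not part of the original solution: it assumes the four sentences shown in the sample output and uses NLTK's wordpunct_tokenize, chosen here because it is purely regex-based and needs no downloaded model. The names sentences and words_per_sentence are illustrative only.

```python
from nltk.tokenize import wordpunct_tokenize

# Sentences as they appear in the sentence-tokenized list above.
sentences = [
    'Joe waited for the train.',
    'The train was late.',
    'Mary and Samantha took the bus.',
    'I looked for Mary and Samantha at the bus station.',
]

# wordpunct_tokenize separates runs of word characters from runs of
# punctuation, so the final period becomes its own token.
words_per_sentence = [wordpunct_tokenize(s) for s in sentences]

for words in words_per_sentence:
    print(words)
```

NLTK also offers word_tokenize, which handles contractions and abbreviations more carefully but depends on the same Punkt data as sent_tokenize.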

