NLTK Tokenize: Tokenize the text of a tweet

Last update on December 21 2024 07:35:51 (UTC/GMT +8 hours)

Write a Python NLTK program to tokenize the text of a tweet.

Sample Solution:

Python Code:

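# TweetTokenizer is NLTK's tweet-aware tokenizer; it keeps URLs and hashtags as single tokens.
from nltk.tokenize import TweetTokenizer

# strip_handles=True removes @username handles from the text;
# reduce_len=True shortens runs of three or more repeated characters to exactly three.
tknzr = TweetTokenizer(strip_handles=True, reduce_len=True)
tweet_text = "NoSQL introduction - w3resource http://bit.ly/1ngHC5F  #nosql #database #webdev"
print("\nOriginal Tweet:")
print(tweet_text)
# tokenize() returns the tokens as a list of strings.
result = tknzr.tokenize(tweet_text)
print("\nTokenize a twitter text:")
print(result)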

Sample Output:

Original Tweet:
NoSQL introduction - w3resource http://bit.ly/1ngHC5F  #nosql #database #webdev

Tokenize a twitter text:
['NoSQL', 'introduction', '-', 'w3resource', 'http://bit.ly/1ngHC5F', '#nosql', '#database', '#webdev']
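
The sample tweet above contains no @handles and no stretched-out words, so the strip_handles and reduce_len options used in the solution have no visible effect on it. The following is a small sketch, not part of the original solution, that runs the same tokenizer on a different tweet to make both options visible. The example tweet is the one used in the NLTK documentation for TweetTokenizer; the variable names are chosen here for illustration.

```python
from nltk.tokenize import TweetTokenizer

# Example tweet taken from the NLTK documentation; it has a handle
# and exaggerated character repetition.
sample_tweet = "@remy: This is waaaaayyyy too much for you!!!!!!"

# Default settings: handles are kept and repeated characters are left alone.
default_tokens = TweetTokenizer().tokenize(sample_tweet)

# Same settings as the solution: drop handles, shorten long repetitions to three.
stripped_tokens = TweetTokenizer(strip_handles=True, reduce_len=True).tokenize(sample_tweet)

print("Default: ", default_tokens)
print("Stripped:", stripped_tokens)
# Stripped: [':', 'This', 'is', 'waaayyy', 'too', 'much', 'for', 'you', '!', '!', '!']
```

With the default settings the handle '@remy' and the full word 'waaaaayyyy' should remain in the token list, which is why the two options are usually enabled together when the usernames and emphasis spelling are not needed downstream.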
