w3resource

NLTK Tokenize: Tokenize a Twitter text

NLTK Tokenize: Exercise-6 with Solution

Write a Python NLTK program to tokenize a Twitter text (a tweet).

Sample Solution:

Python Code:

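from nltk.tokenize import TweetTokenizer
# strip_handles=True removes @username handles;
# reduce_len=True shortens runs of 3 or more repeated characters to 3.
tknzr = TweetTokenizer(strip_handles=True, reduce_len=True)
tweet_text = "NoSQL introduction - w3resource http://bit.ly/1ngHC5F  #nosql #database #webdev"
print("\nOriginal Tweet:")
print(tweet_text)
# The URL and each hashtag are kept as single tokens.
result = tknzr.tokenize(tweet_text)
print("\nTokenize a twitter text:")
print(result)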

Sample Output:

Original Tweet:
NoSQL introduction - w3resource http://bit.ly/1ngHC5F  #nosql #database #webdev

Tokenize a twitter text:
['NoSQL', 'introduction', '-', 'w3resource', 'http://bit.ly/1ngHC5F', '#nosql', '#database', '#webdev']
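
The sample tweet contains no @handles or elongated words, so the strip_handles and reduce_len options have no visible effect on it. As a small sketch of what they do, the following compares a default TweetTokenizer with the configured one. The tweet text and the variable names here are made up for illustration and are not part of the exercise.

```python
from nltk.tokenize import TweetTokenizer

# Illustrative tweet (an assumption, not from the exercise): it contains
# a handle and words/punctuation with long character runs.
sample = "@remy: This is waaaaayyyy too much for you!!!!!!"

# Default settings: handles and repeated characters are left alone.
default_tokens = TweetTokenizer().tokenize(sample)

# Same options as in the solution above.
tknzr = TweetTokenizer(strip_handles=True, reduce_len=True)
tokens = tknzr.tokenize(sample)

print(default_tokens)
print(tokens)
# The second line should be:
# [':', 'This', 'is', 'waaayyy', 'too', 'much', 'for', 'you', '!', '!', '!']
```

With the options enabled, the handle is dropped and each run of repeated characters is cut to three, which can help when counting word frequencies in informal text.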
