NLTK Tokenize: Split all punctuation into separate tokens
Write a Python NLTK program to split all punctuation into separate tokens.
Sample Solution:
Python Code:
from nltk.tokenize import WordPunctTokenizer

text = "Reset your password if you just can't remember your old one."
print("\nOriginal string:")
print(text)
# WordPunctTokenizer splits text into runs of word characters and runs of punctuation
result = WordPunctTokenizer().tokenize(text)
print("\nSplit all punctuation into separate tokens:")
print(result)
Sample Output:
Original string:
Reset your password if you just can't remember your old one.

Split all punctuation into separate tokens:
['Reset', 'your', 'password', 'if', 'you', 'just', 'can', "'", 't', 'remember', 'your', 'old', 'one', '.']
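The same result can be reproduced with the standard re module, which makes the splitting rule explicit. This is a minimal sketch, assuming WordPunctTokenizer's default pattern is \w+|[^\w\s]+ (as in NLTK's source); the function names word_punct_tokenize and word_single_punct_tokenize are hypothetical. Because that pattern matches runs of punctuation, adjacent marks such as "..." or "?!" stay together as one token. The second, hypothetical pattern emits every punctuation character on its own.

```python
import re

# Assumption: mirrors the pattern WordPunctTokenizer is built on:
# a run of word characters, or a run of characters that are neither word nor space.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")

# Hypothetical variant: match one punctuation character at a time.
WORD_SINGLE_PUNCT = re.compile(r"\w+|[^\w\s]")


def word_punct_tokenize(text):
    """Split text into alphanumeric runs and punctuation runs."""
    return WORD_PUNCT.findall(text)


def word_single_punct_tokenize(text):
    """Like word_punct_tokenize, but never groups adjacent punctuation."""
    return WORD_SINGLE_PUNCT.findall(text)


text = "Reset your password if you just can't remember your old one."
print(word_punct_tokenize(text))
print(word_punct_tokenize("Wait... what?!"))
print(word_single_punct_tokenize("Wait... what?!"))
```

On "Wait... what?!" the first function keeps '...' and '?!' as single tokens, while the second returns each '.', '?' and '!' separately.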