PythonTutorials.net
Text Preprocessing in NLTK
Test how well you can clean, normalize, and prepare text data for NLP tasks.
1. What is the process of splitting a text into individual words or tokens called in NLTK?
Tokenization
Stemming
Lemmatization
POS Tagging
2. Which of the following are stemmers available in NLTK? (Select all that apply)
PorterStemmer
WordNetLemmatizer
LancasterStemmer
SnowballStemmer
3. NLTK's WordNetLemmatizer requires specifying part-of-speech (POS) tags to accurately lemmatize words that are not nouns.
True
False
4. What is the name of the NLTK function used to download resources like stopwords? (exact function name)
5. Which NLTK corpus contains a list of common stopwords for various languages?
wordnet
stopwords
brown
gutenberg
6. Which of the following are common steps in text preprocessing using NLTK? (Select all that apply)
Lowercasing text
Tokenization
Stemming or lemmatization
Compiling code
7. Stemming in NLTK always produces valid English words as output.
True
False
8. What is the output of nltk.word_tokenize("Hello, world!")? (provide tokens as comma-separated strings without spaces)
9. What key difference distinguishes lemmatization from stemming in NLTK?
Lemmatization is faster
Lemmatization considers the context and meaning of words
Lemmatization removes stopwords
Lemmatization is rule-based
10. Which NLTK tools are used for sentence tokenization? (Select all that apply)
sent_tokenize
word_tokenize
PunktSentenceTokenizer
PorterStemmer
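To experiment with the preprocessing steps the questions refer to, here is a minimal sketch that lowercases text, splits it into tokens, and stems each token. It assumes NLTK is installed. The `preprocess` helper is a hypothetical name used only for illustration, and the regular-expression tokenizer is a simplified stand-in for NLTK's own tokenizers, chosen so that the sketch needs no downloaded resources.

```python
import re

from nltk.stem import PorterStemmer


def preprocess(text):
    """Lowercase, tokenize, and stem a string (illustrative helper, not an NLTK API)."""
    # Step 1: normalize case so "Cats" and "cats" are treated alike.
    lowered = text.lower()
    # Step 2: split into tokens. This simple regex keeps runs of letters and
    # digits only; it is an assumption standing in for NLTK's tokenizers.
    tokens = re.findall(r"[a-z0-9]+", lowered)
    # Step 3: reduce each token to its stem with the Porter algorithm.
    stemmer = PorterStemmer()
    return [stemmer.stem(token) for token in tokens]


if __name__ == "__main__":
    print(preprocess("The cats were running quickly."))
```

Swapping `PorterStemmer` for another stemmer, or the regex for an NLTK tokenizer, is a convenient way to compare their behavior on the same input.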