Automated Spell Check for Python Docstrings and Comments: Detect Typos Missed by Pylint & Flake8
Python developers rely heavily on tools like Pylint and Flake8 to ensure code quality—they catch syntax errors, enforce style guidelines, and flag anti-patterns. However, these tools have a blind spot: typos in docstrings and comments. A misplaced "teh" instead of "the" or "funciton" instead of "function" can undermine readability, confuse collaborators, and erode trust in your documentation.
In this post, we’ll explore why typos in non-executable text matter, why Pylint and Flake8 miss them by default, and how to automate spell checking for docstrings and comments using specialized tools. By the end, you’ll have actionable steps to integrate spell checking into your workflow and catch typos before they reach production.
Table of Contents
- Why Typos in Docstrings and Comments Matter
- Limitations of Pylint & Flake8 for Spell Checking
- Top Tools for Automated Spell Checking in Python
- Step-by-Step Implementation Guide
- Advanced Configuration & Customization
- Integration with CI/CD Pipelines
- Tool Comparison: Which One Should You Choose?
- Conclusion
- References
Why Typos in Docstrings and Comments Matter
Docstrings and comments are the "human interface" of your code. They explain why a function exists, how to use it, and what edge cases to watch for. Typos here can:
- Reduce readability: A comment like "Teh function returns teh user’s name" forces readers to pause and decipher the mistake.
- Hurt maintainability: Over time, ambiguous or typo-ridden comments become useless, making it harder to update code.
- Damage credibility: If your public API’s docstrings contain typos, users may question the quality of your entire project.
- Introduce confusion: Technical terms (e.g., "asynchronous" misspelled as "asynchronus") can mislead new contributors.
Consider this example:
def calculate_interest(principal, rate, time):
    """
    Calculates teh interest using teh formula: (principal * rate * time) / 100.
    Returns: float -- teh computed interest.
    """
    return (principal * rate * time) / 100
The repeated "teh" (instead of "the") is distracting and unprofessional, yet neither Pylint nor Flake8 flags it in its default configuration.
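Limitations of Pylint & Flake8 for Spell Checking
Pylint and Flake8 are designed to analyze code structure, not natural language. (Pylint does ship an optional spelling checker, enabled with --spelling-dict and backed by pyenchant, but it is off by default, and Flake8 has no built-in equivalent.) Here’s why a default run misses typos: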
What Pylint/Flake8 Do Well:
- Enforce PEP 8 style (indentation, line length, variable naming).
- Detect syntax errors (e.g., missing colons) and undefined names (e.g., a variable used but never defined).
- Flag code smells (e.g., unused variables, overly complex functions).
What They Ignore:
- Spelling in text: They don’t check English (or any language) for correctness by default. "teh" and "the" are both valid strings to them.
- Contextual meaning: A comment like "This function procceses data" (misspelled "processes") is treated as valid text.
Example: Run Pylint on the code snippet above. It will check for style (e.g., line length of the docstring) but ignore "teh":
pylint interest_calculator.py
# Output: No errors related to "teh"
To catch these issues, we need tools specifically built for spell checking in code.
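As a rough illustration of what a spell checker for code has to look at, the sketch below uses only the standard library to pull comments and docstrings out of a source string. The function names and the SAMPLE snippet are made up for this example and are not part of any tool discussed here.

```python
import ast
import io
import tokenize


def extract_comments(source: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for every # comment in the source."""
    comments = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            comments.append((tok.start[0], tok.string.lstrip("#").strip()))
    return comments


def extract_docstrings(source: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for module, class, and function docstrings."""
    docstrings = []
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, owners):
            text = ast.get_docstring(node)
            if text:
                # A docstring is always the first statement in the body.
                docstrings.append((node.body[0].lineno, text))
    return docstrings


# Illustrative input only.
SAMPLE = '''\
def calculate_interest(principal, rate, time):
    """Calculates teh interest."""
    # Teh formula is simple interest
    return (principal * rate * time) / 100
'''

print(extract_comments(SAMPLE))
print(extract_docstrings(SAMPLE))
```

Everything these two functions return is natural-language text, which is exactly the part of the file that a linter's default checks leave alone.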
Top Tools for Automated Spell Checking in Python
Several tools specialize in finding typos in code, docstrings, and comments. Below are the most popular options:
1. codespell
Overview: A lightweight, open-source tool that scans files for common typos (e.g., "teh" → "the", "procces" → "process"). It matches words against a prebuilt list of frequent English misspellings rather than a full dictionary, so it works on source files in any programming language as well as on plain text.
Features:
- Checks code, comments, docstrings, and even documentation files (e.g., README.md).
- Fast and easy to run (no complex config required).
- Allows ignoring words or adding custom dictionaries.
Pros: Simple setup, low false positives, ships a pre-commit hook.
Cons: Limited to its built-in list of misspellings (misses rare typos), less configurable than advanced tools.
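To make the "list of known misspellings" idea concrete, here is a minimal sketch of that approach. It is not codespell's implementation; the three-entry MISSPELLINGS table and the function name are stand-ins chosen for illustration.

```python
import re

# Hypothetical, tiny stand-in for a real misspelling list.
MISSPELLINGS = {
    "teh": "the",
    "funciton": "function",
    "acheive": "achieve",
}

WORD_RE = re.compile(r"[A-Za-z]+")


def find_known_typos(text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, typo, suggestion) for each known misspelling."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in WORD_RE.finditer(line):
            word = match.group(0)
            suggestion = MISSPELLINGS.get(word.lower())
            if suggestion is not None:
                hits.append((lineno, word, suggestion))
    return hits


text = "Calculates teh interest.\nThis funciton is asynchronus."
for lineno, typo, fix in find_known_typos(text):
    print(f"{lineno}: {typo} ==> {fix}")
```

Note that "asynchronus" slips through because it is not in the table. That is the trade-off of this design: very few false positives, but any typo missing from the list goes unreported.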
2. doc8
Overview: A style checker for reStructuredText (reST) documentation files. It does not check spelling, and it reads .rst and .txt files rather than docstrings inside .py files, so treat it as a companion to a spell checker, not a replacement for one.
Features:
- Validates reST syntax (e.g., malformed markup that fails to parse).
- Enforces line length and whitespace rules (trailing whitespace, tabs, missing final newline).
Pros: Specialized for reST (common in Python docs), fast, little setup.
Cons: No spell checking; ignores docstrings and regular comments in Python source.
3. pyspelling
Overview: A highly configurable tool that supports spell checking in code, Markdown, HTML, and more. It uses Hunspell or Aspell under the hood and lets you define custom rules.
Features:
- Scan specific file types (e.g., only .py files) or sections (e.g., comments/docstrings).
- Use custom dictionaries (for project jargon like "PyTorch" or "Kubernetes").
- Ignore false positives (e.g., variable names like num_epochs).
Pros: Extremely flexible, supports multiple formats, handles complex projects.
Cons: Steeper learning curve, requires a config file and an installed Aspell or Hunspell.
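4. spellchecker
Overview: A Python library (not a CLI tool). The package usually meant by this name is pyspellchecker, imported as spellchecker, which is a pure-Python checker built on word-frequency lists rather than a wrapper around pyenchant. Useful for building custom spell-checking scripts.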
Features:
- Programmatic access to spell checking (e.g., in CI scripts or pre-commit hooks).
- Supports adding custom words or languages.
Pros: Full control via code, ideal for unique workflows.
Cons: Requires writing custom logic (not a turnkey solution).
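The kind of custom logic this implies can be sketched with the standard library alone. The example below checks words against an allow-list and uses difflib to suggest corrections; KNOWN_WORDS and CUSTOM_WORDS are deliberately tiny, hypothetical vocabularies, and a real script would load a full dictionary or delegate to a spell-checking library instead.

```python
import difflib
import re

# Hypothetical vocabulary; a real script would load a full dictionary.
KNOWN_WORDS = {"the", "function", "returns", "user", "name", "processes", "data"}
# Project jargon to accept, stored in lowercase.
CUSTOM_WORDS = {"pytorch", "kubernetes"}


def check_text(text: str) -> dict[str, list[str]]:
    """Map each unknown word to a list of close matches from the vocabulary."""
    vocabulary = KNOWN_WORDS | CUSTOM_WORDS
    unknown: dict[str, list[str]] = {}
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        if lowered not in vocabulary and lowered not in unknown:
            unknown[lowered] = difflib.get_close_matches(
                lowered, sorted(vocabulary), n=3
            )
    return unknown


report = check_text("Teh function procceses PyTorch data")
for word, suggestions in report.items():
    print(word, "->", suggestions)
```

Unlike a list of known misspellings, this approach flags every word it does not recognize, which is why custom word lists matter so much for dictionary-based tools.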
Step-by-Step Implementation Guide
Let’s walk through setting up two popular tools: codespell (for simplicity) and pyspelling (for advanced use cases).
Option 1: Quick Start with codespell
Step 1: Install
pip install codespell # or via conda: conda install -c conda-forge codespell
Step 2: Basic Usage
Run codespell on your project directory:
codespell my_python_project/
Step 3: Interpret Output
Example output:
my_python_project/utils.py:15: teh ==> the
my_python_project/models.py:42: procces ==> process, pro cces
my_python_project/docs/readme.md:8: acheive ==> achieve
- teh ==> the: A clear typo to fix.
- procces ==> process, pro cces: Multiple suggestions (pick "process").
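Output in this "path:line: typo ==> suggestions" shape is easy to post-process, for example to list the findings that need a human decision because there is more than one suggestion. The following is a small sketch under that assumption; the Finding type and function names are invented for the example, and the regular expression only targets the line format shown above.

```python
import re
from typing import NamedTuple


class Finding(NamedTuple):
    path: str
    line: int
    typo: str
    suggestions: list[str]


# Matches lines shaped like "path:line: typo ==> fix1, fix2".
LINE_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+): (?P<typo>\S+) ==> (?P<fixes>.+)$"
)


def parse_codespell_output(output: str) -> list[Finding]:
    """Turn report lines into Finding records, skipping anything else."""
    findings = []
    for raw in output.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            fixes = [fix.strip() for fix in match["fixes"].split(",")]
            findings.append(
                Finding(match["path"], int(match["line"]), match["typo"], fixes)
            )
    return findings


def needs_review(findings: list[Finding]) -> list[Finding]:
    """Findings with several suggestions cannot be fixed mechanically."""
    return [f for f in findings if len(f.suggestions) > 1]


sample = """\
my_python_project/utils.py:15: teh ==> the
my_python_project/models.py:42: procces ==> process, pro cces
"""

for finding in needs_review(parse_codespell_output(sample)):
    print(finding.path, finding.line, finding.suggestions)
```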
Step 4: Fix Typos
Edit the files manually, or use codespell --write-changes to auto-correct (review the resulting diff before committing):
codespell my_python_project/ --write-changes
Option 2: Advanced Spell Checking with pyspelling
pyspelling is ideal if you need to:
- Ignore project-specific terms (e.g., "myapp" or "datapipe").
- Check only docstrings (not code).
- Use custom dictionaries.
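Step 1: Install
pip install pyspelling
pyspelling drives an external spell checker, so Aspell or Hunspell must also be installed on your system.
Step 2: Create a Config File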
Create .spellcheck.yml in your project root. Here’s a Python-focused example:
matrix:
- name: python
  sources:
  - '**/*.py|!**/venv/**|!**/__pycache__/**'  # All Python files, minus virtual environments and caches
  aspell:
    lang: en
  dictionary:
    wordlists:
    - ./.custom_words.txt  # Custom words to allow
    output: build/dict.dic
  pipeline:
  - pyspelling.filters.python:  # Extract comments and docstrings
      comments: true
      docstrings: true
      strings: false  # Ignore string literals in code
  - pyspelling.filters.context:  # Hide spans that should not be spell checked
      context_visible_first: true
      delimiters:
      - open: '"'  # Ignore double-quoted text
        close: '"'
      - open: '\b\w+_'  # Ignore snake_case words
        close: '\b'
      # Single quotes are not used as a delimiter: apostrophes in contractions would be misread.
Step 3: Add Custom Words
Create .custom_words.txt to whitelist jargon:
myapp
datapipe
PyTorch
Kubernetes
Step 4: Run the Check
pyspelling -c .spellcheck.yml
Step 5: Fix Issues
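pyspelling reports the words it does not recognize, grouped by file and by the comment or docstring they came from, and skips anything listed in .custom_words.txt. It exits with a non-zero status when it finds any unknown word. The report lists words rather than line numbers, so for a hit such as "teh" in my_python_project/utils.py, search the file for the word and fix it in place.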
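The filtering idea behind the context step in the configuration (hide quoted text and snake_case identifiers, then spell check what is left) can be sketched in plain Python. This is only an illustration of the intent, not pyspelling's code; the pattern names and visible_text are hypothetical.

```python
import re

# Spans to hide from the spell checker.
QUOTED = re.compile(r'"[^"]*"')
SNAKE_CASE = re.compile(r"\b\w+_\w+\b")


def visible_text(comment: str) -> str:
    """Remove quoted strings and snake_case identifiers from a comment."""
    without_quotes = QUOTED.sub(" ", comment)
    without_identifiers = SNAKE_CASE.sub(" ", without_quotes)
    # Collapse the whitespace left behind by the removals.
    return " ".join(without_identifiers.split())


print(visible_text('Set num_epochs to "fastmode" before teh training loop'))
# → Set to before teh training loop
```

Only the remaining words reach the dictionary lookup, so num_epochs and "fastmode" no longer produce false positives while "teh" is still caught.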
Advanced Configuration & Customization
Ignoring Words
- codespell: Use --ignore-words-list or a config file (.codespellrc):
[codespell]
ignore-words-list = myapp,datapipe
skip = venv,__pycache__
- pyspelling: Add words to .custom_words.txt (as shown earlier).
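If the ignore list is maintained elsewhere (say, in a shared glossary), the .codespellrc text can be generated rather than edited by hand. Here is a minimal sketch using the standard configparser module; build_codespellrc is a hypothetical helper, and only the two keys shown above are written.

```python
import configparser
import io


def build_codespellrc(ignore_words: list[str], skip: list[str]) -> str:
    """Render a [codespell] section as INI text."""
    config = configparser.ConfigParser()
    config["codespell"] = {
        # De-duplicate and sort so the file stays stable across runs.
        "ignore-words-list": ",".join(sorted(set(ignore_words))),
        "skip": ",".join(skip),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


text = build_codespellrc(["myapp", "datapipe", "myapp"], ["venv", "__pycache__"])
print(text)
```

Writing the returned text to .codespellrc in the project root gives the same layout as the hand-written file.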
Excluding Files/Directories
- codespell: Use --skip to ignore paths:
codespell my_project/ --skip=venv,docs/build
- pyspelling: Append !-prefixed exclusion patterns to a sources entry in .spellcheck.yml, separated by | (e.g., '**/*.py|!**/venv/**').
IDE Integration
- VS Code: Install the Code Spell Checker extension to catch typos in real time.
- PyCharm: Enable the spelling ("Typo") inspection under Preferences > Editor > Inspections and add custom dictionaries in the editor's Spelling settings.
Integration with CI/CD Pipelines
To catch typos before code is merged, add a spell check step to your CI/CD pipeline. Here’s how to do it with GitHub Actions:
Example: GitHub Actions Workflow
Create .github/workflows/spell-check.yml:
name: Spell Check
on: [pull_request, push]
jobs:
  spellcheck:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'  # Quoted so YAML does not read it as a float
      - name: Install codespell
        run: pip install codespell
      - name: Run codespell
        run: codespell . --skip=venv,__pycache__
Now, every pull request or push fails its check if codespell finds a typo.
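The workflow fails because the spell checker exits with a non-zero status, and the same contract can be reused in any CI system through a small wrapper script. The sketch below assumes nothing about codespell beyond that exit-status convention; run_check is a hypothetical helper, and the two commands are stand-ins so the example runs without codespell installed.

```python
import subprocess
import sys


def run_check(command: list[str]) -> bool:
    """Run a spell-check command; return True when it exits with status 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the tool's report so the CI log shows what failed.
        print(result.stdout, end="")
        print(result.stderr, end="", file=sys.stderr)
    return result.returncode == 0


# Stand-in commands that imitate a failing and a passing spell check.
failing = [
    sys.executable,
    "-c",
    "import sys; print('utils.py:15: teh ==> the'); sys.exit(1)",
]
passing = [sys.executable, "-c", "pass"]

print(run_check(failing))
print(run_check(passing))
```

In a real pipeline the command would be something like ["codespell", ".", "--skip=venv,__pycache__"], with the script calling sys.exit(1) whenever run_check returns False.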
Tool Comparison: Which One Should You Choose?
| Tool | Ease of Use | Configurability | Speed | Best For |
|---|---|---|---|---|
| codespell | ⚡⚡⚡ | Low | Fast | Quick checks, small projects, beginners |
| doc8 | ⚡⚡ | Medium | Fast | reST style checks alongside a spell checker |
| pyspelling | ⚡ | High | Slow | Large projects, custom rules, reducing false positives |
| spellchecker | ⚡ | Very High | N/A | Custom scripts, programmatic checks |
Recommendation: Start with codespell for simplicity. If you need to ignore jargon or check specific file types, switch to pyspelling.
Conclusion
Typos in docstrings and comments are small but impactful. While Pylint and Flake8 keep code consistent and catch common bugs, tools like codespell and pyspelling help ensure your code communicates clearly. By integrating automated spell checking into your workflow (and CI/CD pipeline), you’ll write more professional, maintainable, and trustworthy code.
Don’t let a "teh" undermine your hard work. Spell check today!