PyCon2006/Tutorials/TextProcessing

Legacy Wiki Page

This page was migrated from the old MoinMoin-based wiki. Information may be outdated or no longer applicable. For current documentation, see python.org.

Intended Audience

Beginning to intermediate programmers. A basic working knowledge of Python is assumed.

Summary

This tutorial introduces beginning to intermediate programmers to Python's tools & techniques for text and data processing. Topics include regular expressions, filtering data with generators, and parsing.

Outline

  • Common data sources needing processing:

    • log files

    • CSV

    • tabular data

    • email

    • XML

  • Tools & techniques:

    • lists & dictionaries

    • s.join(list) instead of accumulating strings by repeated concatenation

    • for line in file

    • filters, large data sources: generators

    • decorate-sort-undecorate

    • StringIO

  • Regular expressions:

    • pattern matching

    • filtering

    • substitution

    • splitting

  • Parsing:

    • text.split()

    • text.find()

    • regular expressions

    • “real” parsers (including XML)

    • state machines
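
As a rough illustration of how several of the outlined techniques can fit together, here is a minimal sketch in present-day Python 3 (the tutorial itself predates it). It is not taken from the tutorial materials: the sample log lines, the log format, and the names LINE_RE, parse_lines, only_level, and summarize are all assumptions made up for this example. It reads an in-memory "file" with StringIO, iterates with "for line in file", filters with generators, matches with a regular expression, ranks results with decorate-sort-undecorate, and builds the output with join.

```python
import io
import re
from collections import Counter

# Hypothetical sample data standing in for a log file (invented for this sketch).
SAMPLE_LOG = """\
2006-02-23 10:01:07 INFO start
2006-02-23 10:01:09 ERROR disk full
2006-02-23 10:02:15 WARNING retrying
2006-02-23 10:02:16 ERROR disk full
2006-02-23 10:03:00 ERROR timeout
"""

# Assumed line format: date, time, level, free-text message.
LINE_RE = re.compile(r"(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}:\d{2}) (\w+) (.*)")


def parse_lines(lines):
    """Generator: yield (level, message) for each line matching LINE_RE."""
    for line in lines:  # works for any iterable of lines, including files
        match = LINE_RE.match(line)
        if match:
            yield match.group(3), match.group(4)


def only_level(records, level):
    """Generator filter: yield the messages of records with the given level."""
    for rec_level, message in records:
        if rec_level == level:
            yield message


def summarize(lines, level="ERROR"):
    """Count messages at one level and return a report, most frequent first."""
    counts = Counter(only_level(parse_lines(lines), level))
    # Decorate-sort-undecorate: put the sort key first, sort, then strip it.
    decorated = [(-count, message) for message, count in counts.items()]
    decorated.sort()
    ranked = [(message, -neg_count) for neg_count, message in decorated]
    # Build the report with join rather than repeated string concatenation.
    return "\n".join(f"{count} {message}" for message, count in ranked)


# StringIO lets a string be read like a file, line by line.
log_file = io.StringIO(SAMPLE_LOG)
report = summarize(log_file)
print(report)

# Regular-expression substitution and splitting on small strings.
collapsed = re.sub(r"\s+", " ", "disk   full\t now")
fields = re.split(r"\s*,\s*", "a, b ,c")
```

Because parse_lines and only_level are generators, only one line is held in memory at a time, so the same functions should work unchanged on a large file opened with open(). In modern Python the decorate-sort-undecorate step is usually written as sorted(..., key=...); it is spelled out here only to show the idiom named in the outline.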

Please send feedback & ideas for further specific topics to the trainer, David Goodger.