PyCon2006/Tutorials/TextProcessing
Legacy Wiki Page
This page was migrated from the old MoinMoin-based wiki. Information may be outdated or no longer applicable. For current documentation, see python.org.
Intended Audience
Beginning to intermediate programmers. A basic working knowledge of Python is assumed.
Summary
This tutorial will introduce beginning to intermediate programmers to the many useful Python tools & techniques for text and data processing. Topics will include regular expressions, filtering data with generators, and parsing.
Outline
Common data sources needing processing:
log files
CSV
tabular data
email
XML
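As a small illustration of one of these sources, the following sketch reads CSV data with the standard `csv` module. It is written for modern Python 3 rather than the Python 2 of 2006. The sample data, its column names, and the `python_line_total` helper are hypothetical. `io.StringIO` stands in for a file on disk, and an open file object would work the same way.

```python
import csv
import io

# Hypothetical sample data standing in for a real CSV file on disk.
SAMPLE_CSV = """name,language,lines
parser.py,Python,120
lexer.c,C,340
report.py,Python,75
"""

def python_line_total(text):
    """Sum the 'lines' column for rows whose 'language' column is Python."""
    # DictReader uses the first row as field names, so each row is a dict.
    reader = csv.DictReader(io.StringIO(text))
    return sum(int(row["lines"]) for row in reader if row["language"] == "Python")

print(python_line_total(SAMPLE_CSV))  # 195
```

Using `csv` rather than `line.split(",")` means quoted fields that contain commas are handled correctly.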
Tools & techniques:
lists & dictionaries
s.join(list) instead of accumulating
for line in file
filters, large data sources: generators
decorate-sort-undecorate
StringIO
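The techniques listed above can be combined in one short Python 3 sketch. It is not taken from the tutorial itself, and the log lines and filter names are hypothetical. Generator filters are chained over a file-like object, `str.join` builds the result in one step, and a decorate-sort-undecorate pass orders the messages by length.

```python
import io

# io.StringIO stands in for an open log file: both support "for line in file".
LOG = io.StringIO(
    "INFO start\n"
    "ERROR disk full\n"
    "INFO retry\n"
    "ERROR timeout\n"
)

def strip_lines(lines):
    """Generator filter: yield each line without its trailing newline."""
    for line in lines:
        yield line.rstrip("\n")

def matching(lines, prefix):
    """Generator filter: yield only the lines that start with prefix."""
    for line in lines:
        if line.startswith(prefix):
            yield line

# Chain the filters lazily; lines are read only as the pipeline is consumed,
# so the same approach works for files too large to hold in memory.
errors = list(matching(strip_lines(LOG), "ERROR"))

# Build one string with join instead of accumulating it with repeated +=.
report = "\n".join(errors)

# Decorate-sort-undecorate: pair each item with its sort key, sort the pairs,
# then strip the keys back off. Ties fall back to comparing the messages.
decorated = [(len(msg), msg) for msg in errors]
decorated.sort()
by_length = [msg for _, msg in decorated]

print(by_length)  # ['ERROR timeout', 'ERROR disk full']
```

In current Python, `sorted(errors, key=len)` performs the decorate-sort-undecorate step internally. One difference is that it keeps ties in their original order instead of comparing the messages.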
Regular expressions:
pattern matching
filtering
substitution
splitting
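The four regular-expression uses above map onto functions in Python's `re` module. The following is a minimal Python 3 sketch. It assumes a hypothetical log format of "date level message", and the sample lines are made up for illustration.

```python
import re

# Hypothetical log format: "YYYY-MM-DD LEVEL message".
LINE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (INFO|ERROR) (.*)$")

lines = [
    "2006-01-15 INFO server started",
    "2006-01-15 ERROR disk   full",
    "not a log line",
]

# Pattern matching: extract the captured groups from one line.
m = LINE_RE.match(lines[1])
date, level, message = m.groups()

# Filtering: keep only the lines the pattern accepts.
valid = [line for line in lines if LINE_RE.match(line)]

# Substitution: collapse each run of whitespace to a single space.
cleaned = re.sub(r"\s+", " ", message)

# Splitting: break on commas or semicolons, ignoring surrounding spaces.
fields = re.split(r"\s*[,;]\s*", "alpha, beta;gamma ,delta")

print(level, cleaned, fields)  # ERROR disk full ['alpha', 'beta', 'gamma', 'delta']
```

Compiling the pattern once with `re.compile` avoids repeating it and makes the filtering step cheap to apply line by line.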
Parsing:
text.split()
text.find()
regular expressions
“real” parsers (including XML)
state machines
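The parsing approaches above range from simple string methods to full parsers. The following Python 3 sketch shows one small case of each. The record string, the XML document, and the `paragraphs` helper are hypothetical examples. `xml.etree.ElementTree` is the standard-library XML parser, which arrived in Python 2.5, after this tutorial was planned. The state machine groups lines into blank-line-separated paragraphs.

```python
import xml.etree.ElementTree as ET

# 1. Simple parsing with str methods.
record = "user=alice;host=example.org"
pairs = dict(item.split("=", 1) for item in record.split(";"))
host_at = record.find("host=")  # index of the substring, or -1 if absent

# 2. A "real" parser: let ElementTree build the tree, then query it.
DOC = ("<talks><talk level='beginner'>Text Processing</talk>"
       "<talk level='advanced'>Metaclasses</talk></talks>")
root = ET.fromstring(DOC)
beginner = [t.text for t in root.iter("talk") if t.get("level") == "beginner"]

# 3. A two-state machine: BETWEEN paragraphs, or IN a paragraph.
def paragraphs(lines):
    """Yield each run of non-blank lines joined into one string."""
    state = "BETWEEN"
    current = []
    for line in lines:
        blank = not line.strip()
        if state == "BETWEEN":
            if not blank:          # transition: a paragraph starts
                current = [line.strip()]
                state = "IN"
        else:                      # state == "IN"
            if blank:              # transition: the paragraph ends
                yield " ".join(current)
                state = "BETWEEN"
            else:
                current.append(line.strip())
    if state == "IN":              # input ended inside a paragraph
        yield " ".join(current)

SAMPLE_TEXT = ["First line", "continues here", "", "", "Second para", ""]
print(list(paragraphs(SAMPLE_TEXT)))  # ['First line continues here', 'Second para']
```

Making the state explicit keeps each transition in one place, which scales better than nested flags as the input format grows more complex.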
Please send feedback & ideas for further specific topics to the trainer, David Goodger (email, home page).