WebClientProgramming
Legacy Wiki Page
This page was migrated from the old MoinMoin-based wiki. Information may be outdated or no longer applicable. For current documentation, see python.org.
Client-Side Web Programming
Libraries
µTidylib and mxTidy – Python interfaces to the HTML Tidy library, which cleans up HTML documents.
html5lib – an HTML5-compliant library for parsing arbitrarily broken HTML into a range of tree formats, including minidom, ElementTree (including lxml) and BeautifulSoup.
BeautifulSoup – a permissive HTML parser.
Don’t use HTMLParser (Python 2.x) or html.parser (Python 3.x) on HTML that might be invalid! That way lies pain. Either clean it up (using tidy), or use a different parser.
ClientCookie, ClientForm, and Mechanize are higher-level libraries for writing a web client.
mechanoid – a mechanize fork.
libxml2dom can parse HTML by employing libxml2’s liberal HTML parser.
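To illustrate the warning above about HTMLParser / html.parser, here is a minimal sketch for Python 3. The standard-library parser only reports tags as events and makes no attempt to repair the document structure, so unclosed elements never produce a matching end-tag event. The TagRecorder class, the record_tags function and the BROKEN fragment are hypothetical names made up for this example; they are not part of any library listed here.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: records start and end tags as they are reported."""

    def __init__(self):
        super().__init__()
        self.opened = []  # tag names seen by handle_starttag
        self.closed = []  # tag names seen by handle_endtag

    def handle_starttag(self, tag, attrs):
        self.opened.append(tag)

    def handle_endtag(self, tag):
        self.closed.append(tag)


def record_tags(markup):
    """Feed markup to the event parser and return (opened, closed) tag lists."""
    parser = TagRecorder()
    parser.feed(markup)
    parser.close()  # flush any buffered input
    return parser.opened, parser.closed


# Illustrative fragment (an assumption, not from the original page):
# none of the three elements is ever closed.
BROKEN = "<p>one<p>two<b>bold"

opened, closed = record_tags(BROKEN)
print("start tags:", opened)
print("end tags:  ", closed)
```

Because no end tags are reported, any code that builds a tree from these events has to decide for itself where each element stops. The tree-building libraries listed above (tidy, html5lib, BeautifulSoup, libxml2dom) are the ones this page recommends for that job.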