com.caucho.xml
Class LooseHtml

java.lang.Object
  |
  +--com.caucho.xml.AbstractParser
        |
        +--com.caucho.xml.XmlParser
              |
              +--com.caucho.xml.LooseHtml
All Implemented Interfaces:
Locator, Parser, XMLReader

public class LooseHtml
extends XmlParser

A forgiving HTML parser.

The forgiving parser is useful for extracting information from the web, since many sites serve not-quite-standard HTML.

To parse a file into a DOM Document, use:


 Document doc = new LooseHtml().parseDocument("foo.html");
 

To parse a string into a DOM Document, use:


 String html = "<h1>small test</h1>";
 Document doc = new LooseHtml().parseDocumentString(html);
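
Once a Document is available, information can be pulled out with the standard DOM API. The following is a minimal sketch, not part of this class: DomHeadings, headings, and parseWellFormed are hypothetical names, and the sketch assumes the returned Document is an org.w3c.dom.Document that supports getTextContent (DOM Level 3) and reports the tag name as "h1" in lower case. So that the sketch runs without this library, the Document is built with the JDK's own parser, which accepts only well-formed markup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: extracts the text of every h1 element in a Document. */
class DomHeadings {
  /** Returns the trimmed text content of each h1 element, in document order. */
  static List<String> headings(Document doc) {
    List<String> result = new ArrayList<>();
    NodeList nodes = doc.getElementsByTagName("h1");
    for (int i = 0; i < nodes.getLength(); i++) {
      result.add(nodes.item(i).getTextContent().trim());
    }
    return result;
  }

  /** Stand-in for the forgiving parser: the JDK parser, well-formed input only. */
  static Document parseWellFormed(String markup) throws Exception {
    return DocumentBuilderFactory.newInstance()
        .newDocumentBuilder()
        .parse(new InputSource(new StringReader(markup)));
  }

  public static void main(String[] args) throws Exception {
    Document doc = parseWellFormed("<h1>small test</h1>");
    System.out.println(headings(doc)); // prints [small test]
  }
}
```

With the forgiving parser, the Document passed to headings would presumably come from parseDocumentString instead of parseWellFormed.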
 

To parse a file using the SAX API, use:


 LooseHtml html = new LooseHtml();
 html.setContentHandler(myContentHandler);
 html.parse("foo.html");
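
The example above leaves myContentHandler undefined. As an illustration only, a handler might look like the sketch below. HeadingCollector and its methods are hypothetical names, not part of this package. The sketch extends the standard org.xml.sax.helpers.DefaultHandler and collects the text of each h1 element. To stay runnable without this library, it is driven here by the JDK's built-in SAX parser, which requires well-formed input. It also assumes element names arrive through qName, matched case-insensitively.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: collects the text of every h1 element. */
class HeadingCollector extends DefaultHandler {
  private final List<String> headings = new ArrayList<>();
  private StringBuilder current; // non-null only while inside an h1

  @Override
  public void startElement(String uri, String localName, String qName, Attributes attrs) {
    if ("h1".equalsIgnoreCase(qName)) {
      current = new StringBuilder();
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // SAX may deliver one text run in several chunks, so append rather than assign.
    if (current != null) {
      current.append(ch, start, length);
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("h1".equalsIgnoreCase(qName) && current != null) {
      headings.add(current.toString().trim());
      current = null;
    }
  }

  List<String> getHeadings() {
    return headings;
  }

  /** Runs the handler with the JDK's own SAX parser (well-formed input only). */
  static List<String> collect(String markup) throws Exception {
    HeadingCollector handler = new HeadingCollector();
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    reader.setContentHandler(handler);
    reader.parse(new InputSource(new StringReader(markup)));
    return handler.getHeadings();
  }

  public static void main(String[] args) throws Exception {
    // prints [small test]
    System.out.println(collect("<html><body><h1>small test</h1></body></html>"));
  }
}
```

Because this class implements XMLReader, an instance of such a handler could presumably be passed to setContentHandler in the same way as it is passed to the JDK reader in collect.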
 


Constructor Summary
LooseHtml()
          Creates a new forgiving HTML parser.
 
Methods inherited from class com.caucho.xml.XmlParser
getColumnNumber, getLineNumber, getPublicId, getSystemId
 
Methods inherited from class com.caucho.xml.AbstractParser
getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getForgiving, getProperty, getResinInclude, getSearchPath, getSkipComments, parse, parse, parse, parse, parseDocument, parseDocument, parseDocument, parseDocument, parseDocumentString, parseString, setAutodetectXml, setContentHandler, setDocumentHandler, setDTDHandler, setEntitiesAsText, setEntityResolver, setErrorHandler, setExpandEntities, setFeature, setForgiving, setLocale, setProperty, setResinInclude, setSearchPath, setSkipComments
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

LooseHtml

public LooseHtml()
Creates a new forgiving HTML parser.