com.caucho.xml
Class LooseHtml
java.lang.Object
  |
  +--com.caucho.xml.AbstractParser
        |
        +--com.caucho.xml.XmlParser
              |
              +--com.caucho.xml.LooseHtml
All Implemented Interfaces:
    Locator, Parser, XMLReader

public class LooseHtml
extends XmlParser
A forgiving HTML parser.
The forgiving parser is useful for extracting information from
web pages, since many sites serve HTML that is not quite standard.
To parse a file into a DOM Document, use:
Document doc = new LooseHtml().parseDocument("foo.html");
To parse a string into a DOM Document, use:
String html = "<h1>small test</h1>";
Document doc = new LooseHtml().parseDocumentString(html);
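Once a DOM Document is available, it can be walked with the standard org.w3c.dom API. The following is a minimal sketch rather than part of the Resin API: the class HeadingText and its methods textOf and parseStrict are hypothetical names, and parseStrict uses the JDK's strict XML parser as a stand-in so the example runs without Resin on the classpath. It assumes the document returned by parseDocumentString is an org.w3c.dom.Document and that tag names are reported in the case used in the lookup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: collects the text of every element with a given tag name.
class HeadingText {
    /** Returns the trimmed text of each element named tagName, in document order. */
    static List<String> textOf(Document doc, String tagName) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            StringBuilder text = new StringBuilder();
            appendText(nodes.item(i), text);
            result.add(text.toString().trim());
        }
        return result;
    }

    // Recursively appends the value of every text node below (and including) node.
    private static void appendText(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(node.getNodeValue());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            appendText(c, out);
        }
    }

    /** Stand-in for the demo only: the JDK's strict XML parser, not LooseHtml. */
    static Document parseStrict(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parseStrict("<body><h1>small test</h1><p>text</p></body>");
        System.out.println(textOf(doc, "h1")); // prints [small test]
    }
}
```

With Resin available, the Document passed to textOf would presumably come from new LooseHtml().parseDocumentString(html) instead of parseStrict.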
To parse a file using the SAX API, use:
LooseHtml html = new LooseHtml();
html.setContentHandler(myContentHandler);
html.parse("foo.html");
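The myContentHandler object above is an ordinary org.xml.sax.ContentHandler. As a hedged illustration of what such a handler might look like, the sketch below collects the href attribute of every anchor element. LinkCollector and its collect method are hypothetical names, not part of Resin. The demo drives the handler with the JDK's own SAX parser on well-formed input so it runs standalone, and the case-insensitive tag comparison is an assumption about how a forgiving parser may report element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records the href of each <a> element it sees.
class LinkCollector extends DefaultHandler {
    private final List<String> links = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Compare case-insensitively, since real-world HTML mixes <a> and <A>.
        if ("a".equalsIgnoreCase(qName)) {
            String href = attrs.getValue("href");
            if (href != null) {
                links.add(href);
            }
        }
    }

    List<String> getLinks() {
        return links;
    }

    /** Demo driver using the JDK's strict SAX parser as a stand-in for LooseHtml. */
    static List<String> collect(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            LinkCollector handler = new LinkCollector();
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.getLinks();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(collect("<body><a href=\"one.html\">1</a></body>")); // prints [one.html]
    }
}
```

Because LooseHtml implements XMLReader, an instance of such a handler could be passed to its setContentHandler method in place of the JDK reader used here.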
Constructor Summary
LooseHtml()
    Creates a new forgiving HTML parser.
Methods inherited from class com.caucho.xml.AbstractParser
getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getForgiving, getProperty, getResinInclude, getSearchPath, getSkipComments, parse, parse, parse, parse, parseDocument, parseDocument, parseDocument, parseDocument, parseDocumentString, parseString, setAutodetectXml, setContentHandler, setDocumentHandler, setDTDHandler, setEntitiesAsText, setEntityResolver, setErrorHandler, setExpandEntities, setFeature, setForgiving, setLocale, setProperty, setResinInclude, setSearchPath, setSkipComments
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
LooseHtml
public LooseHtml()
    Creates a new forgiving HTML parser.