Package com.hp.hpl.jena.rdf.arp

Reading RDF/XML.


Interface Summary
ALiteral A string literal property value from an RDF/XML file.
AResource  
ARPErrorNumbers Error numbers used by ARP.
ARPHandler Convenience generalization of all ARP handler interfaces.
ExtendedHandler Extended callbacks from a reader to an RDF application.
NamespaceHandler This has methods copied from SAX for notifying the application of namespaces.
RDFParserConstants  
StatementHandler The callback from a reader to an RDF application.
 

Class Summary
ARP Another RDF Parser.
ARPSaxErrorHandler This class is not part of the API.
CharacterModel Some support for the Character Model Recommendation from the W3C (currently in second last call working draft).
JenaReader Interface between Jena and ARP.
NTriple A command line interface into ARP.
StanfordImpl An implementation of Sergey Melnik's Stanford API, used by SiRPAC.
URI A class to represent a Uniform Resource Identifier (URI).
 

Exception Summary
MalformedURIException MalformedURIExceptions are thrown in the process of building a URI or setting fields on a URI when an operation would result in an invalid URI specification.
ParseException An exception during the RDF processing of ARP.
 

Package com.hp.hpl.jena.rdf.arp Description

Reading RDF/XML.

The interaction between ARP and Xerces includes a significant memory leak, probably caused by Xerces interning some strings on behalf of ARP. The interning cannot be turned off. This is not a new bug, so existing ARP users will not experience a degradation in performance. However, users who have limited memory, who process large files or files from many diverse sources, or who run long-lived applications such as web servers should be aware of this leak. I believe that the strings being interned correspond to the XML element tags and attribute names in the XML document. Unlike in most XML applications, these form an open set in RDF/XML.

No proper workaround is provided at this time; it is hoped that later enhancements will provide a non-Xerces solution.

The best current advice is as follows:

limited memory environments
    - Buy more memory.
    - Convert RDF/XML to N3 or N-Triples before sending it to the limited memory application.
long-lived applications, web servers
    - Kill and restart your application regularly, e.g. daily. Monitor memory usage to check whether the application should be restarted more frequently.
very large documents
    - Ensure these are processed in environments with adequate memory, and restart the application after each job.
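
The advice to monitor memory usage can be sketched with the standard java.lang.management API. This is only an illustration, not part of ARP or Jena: the class name HeapWatch, its method names and the 0.8 threshold are all assumptions chosen for the example. On JVMs that keep interned strings outside the main heap, the non-heap figure may be the more relevant one, so the sketch prints both.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper for deciding when to restart; not part of ARP. */
final class HeapWatch {

    /** Fraction of the maximum in use; 0.0 when the maximum is undefined. */
    static double usedFraction(long usedBytes, long maxBytes) {
        if (maxBytes <= 0) {
            return 0.0; // MemoryUsage.getMax() returns -1 when undefined
        }
        return (double) usedBytes / (double) maxBytes;
    }

    /** True when usage has reached the given fraction of the maximum. */
    static boolean shouldRestart(long usedBytes, long maxBytes, double threshold) {
        return usedFraction(usedBytes, maxBytes) >= threshold;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        System.gc(); // a hint only; the JVM may ignore it
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();

        System.out.println("heap used:     " + heap.getUsed() + " bytes");
        System.out.println("non-heap used: " + nonHeap.getUsed() + " bytes");

        // Assumed policy: flag a restart once 80% of the heap is in use.
        if (shouldRestart(heap.getUsed(), heap.getMax(), 0.8)) {
            System.out.println("memory is running low; schedule a restart");
        }
    }
}
```

Calling such a check after each parsing job, and logging the figures, should show whether usage keeps growing between jobs and therefore whether a daily restart is frequent enough.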

The size of the leak is presumably highly dependent on the actual RDF/XML documents read. As an example, the OWL Guide wine ontology (using the version from the OWL Test Cases) contains about 2000 triples and leaks about 180K.
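
For rough planning, the wine ontology figures can be turned into a back-of-the-envelope estimate. The sketch below is only arithmetic under loud assumptions: that "180K" means 180 * 1024 bytes, that every document leaks as much as the wine ontology (a pessimistic case, since documents sharing element and attribute names would presumably leak less), and that 64 MB is the memory the application can afford to lose. The class and method names are hypothetical.

```java
/** Hypothetical estimate of how quickly the leak consumes a memory budget. */
final class LeakEstimate {

    /** Average leaked bytes per triple for one measured document. */
    static double bytesPerTriple(long leakedBytes, long triples) {
        if (triples <= 0) {
            throw new IllegalArgumentException("triples must be positive");
        }
        return (double) leakedBytes / (double) triples;
    }

    /** Number of whole documents that fit in the budget at a fixed leak per document. */
    static long documentsUntil(long budgetBytes, long leakPerDocumentBytes) {
        if (leakPerDocumentBytes <= 0) {
            throw new IllegalArgumentException("leak must be positive");
        }
        return budgetBytes / leakPerDocumentBytes;
    }

    public static void main(String[] args) {
        long wineLeak = 180L * 1024;       // assumption: "180K" read as KiB
        long wineTriples = 2000L;          // "about 2000 triples"
        long budget = 64L * 1024 * 1024;   // assumption: 64 MB to spare

        System.out.println("bytes per triple: " + bytesPerTriple(wineLeak, wineTriples));
        System.out.println("documents before budget is spent: "
                + documentsUntil(budget, wineLeak));
    }
}
```

Measured figures from the application's own documents should replace these numbers wherever they are available.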



Copyright © 2000-2003 Hewlett-Packard. All Rights Reserved.