Reading RDF/XML.

The interaction between ARP and Xerces includes a significant memory leak, probably caused by Xerces interning some strings on behalf of ARP. This behaviour cannot be turned off. This is not a new bug, so existing ARP users will not see a new degradation in performance. However, users who have limited memory, who process large files or files from many diverse sources, or who run long-lived applications such as web servers should be aware of this leak. I believe that the interned strings correspond to the XML element tags and attribute names in the XML document. In RDF/XML, unlike most XML applications, these names form an open set, so the number of distinct strings can keep growing as new documents are read.
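One rough way to gauge how exposed a particular document is would be to count its distinct element and attribute names. The following is only a sketch and not part of ARP: it uses the JDK's standard SAX parser, the class name DistinctXmlNames and the sample document are invented for illustration, and it records namespace-expanded names, which only approximates whatever strings Xerces actually interns.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not an ARP or Jena class.
class DistinctXmlNames {

    /** Collects the distinct element and attribute names as "{namespace}local". */
    static Set<String> collect(InputStream in) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // report namespace URI and local name
        SAXParser parser = factory.newSAXParser();
        final Set<String> names = new TreeSet<>();
        parser.parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add("{" + uri + "}" + localName);
                for (int i = 0; i < atts.getLength(); i++) {
                    names.add("{" + atts.getURI(i) + "}" + atts.getLocalName(i));
                }
            }
        });
        return names;
    }

    /** A small made-up RDF/XML document used only to exercise the sketch. */
    static String sampleDocument() {
        return "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
             + " xmlns:ex=\"http://example.org/\">"
             + "<rdf:Description rdf:about=\"http://example.org/a\">"
             + "<ex:p>1</ex:p><ex:q>2</ex:q>"
             + "</rdf:Description>"
             + "<rdf:Description rdf:about=\"http://example.org/b\">"
             + "<ex:p>3</ex:p>"
             + "</rdf:Description>"
             + "</rdf:RDF>";
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = sampleDocument().getBytes(StandardCharsets.UTF_8);
        Set<String> names = collect(new ByteArrayInputStream(bytes));
        System.out.println(names.size() + " distinct names: " + names);
    }
}
```

A document set whose distinct-name count keeps rising as more sources are read is the kind of workload most likely to be affected.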

No proper workaround is provided at this time; it is hoped that later enhancements will provide a non-Xerces solution.

The best current advice is as follows:

limited memory environments
    Buy more memory.
    Convert RDF/XML to N3 or N-Triples before sending it to the limited memory application.
long-lived applications
web servers
    Kill and restart your application regularly, e.g. daily. Monitor memory usage to check whether the application should be restarted more frequently.
very large documents
    Ensure these are processed in environments with adequate memory, and restart the application after each job.
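The advice to monitor memory usage could be followed with the JDK's standard management API. This is a minimal sketch under stated assumptions, not a recommended implementation: the class name HeapWatch, the overBudget helper and the 512 MB budget are all hypothetical, and a single reading includes garbage that has not yet been collected, so the trend over time is more meaningful than any one value.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper, not an ARP or Jena class.
class HeapWatch {

    /** Bytes currently in use, heap plus non-heap, as reported by the JVM. */
    static long usedBytes() {
        MemoryMXBean bean = ManagementFactory.getMemoryMXBean();
        return bean.getHeapMemoryUsage().getUsed()
             + bean.getNonHeapMemoryUsage().getUsed();
    }

    /** True when the reading exceeds the budget chosen by the operator. */
    static boolean overBudget(long usedBytes, long budgetBytes) {
        return usedBytes > budgetBytes;
    }

    public static void main(String[] args) {
        long budget = 512L * 1024 * 1024; // assumed budget; tune per deployment
        long used = usedBytes();
        System.out.println("used bytes: " + used);
        if (overBudget(used, budget)) {
            System.out.println("memory budget exceeded; schedule a restart");
        }
    }
}
```

Logging such a reading periodically, for example once per processed job, would show whether a daily restart is frequent enough.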

The size of the leak presumably depends heavily on the RDF/XML documents actually read. As an example, the OWL Guide wine ontology (using the version from the OWL Test Cases) contains about 2000 triples and leaks about 180K.