To aid in converting environments from the old PDP software (`.pat'
files) to the format used in PDP++, and more generally in importing
training and testing data stored in plain text files, the Environment
provides functions that read and write text files. These functions are
called ReadText and WriteText.
The format that these functions read and write is very simple: a sequence of numbers, with an (optional) event name at the beginning or end of the line. Note that you must use the fmt parameter to specify whether a name is associated with the events. Important: the name must be a contiguous string without any whitespace, although it can be a number or contain any other ASCII characters. When reading a file, ReadText simply reads numbers sequentially for each pattern in each event, so the layout of the numbers is not critical. If the optional name is used, it must appear at the beginning of the line that starts a new event.
For example, in the old PDP software, the "xor.pat" file for the XOR example looks like this:
p00 0 0 0
p01 0 1 1
p10 1 0 1
p11 1 1 0
It is critical that the EventSpec and its constituent PatternSpecs (see section 11.2 Events, Patterns and their Specs) are configured in advance for the correct number of values in the pattern file. The event spec for the above example would contain two PatternSpecs. The PatternSpecs would look like:
PatternSpec[0] {
  type = INPUT;
  to_layer = FIRST;
  n_vals = 2;
};
PatternSpec[1] {
  type = TARGET;
  to_layer = LAST;
  n_vals = 1;
};
With these specs, the first two values (n_vals = 2) are read into the first (input) pattern, and the third value (n_vals = 1) is read into the last (output) pattern.
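The way values are assigned to patterns can be illustrated with a small Python sketch. This is not PDP++ code and makes no claim about how ReadText is implemented: the function name read_events and its parameters are hypothetical, and the sketch assumes that every event starts with a name and that the n_vals list mirrors the PatternSpecs above.

```python
def read_events(text, n_vals, name_first=True):
    """Split a whitespace-separated token stream into named events.

    n_vals lists how many numbers each pattern takes, in order,
    e.g. [2, 1] for a 2-value input and a 1-value target.
    """
    tokens = text.split()
    per_event = sum(n_vals) + (1 if name_first else 0)
    if len(tokens) % per_event != 0:
        raise ValueError("token count does not match the pattern sizes")
    events = []
    pos = 0
    while pos < len(tokens):
        name = None
        if name_first:
            name = tokens[pos]
            pos += 1
        patterns = []
        for n in n_vals:
            # Take the next n numbers for this pattern.
            patterns.append([float(t) for t in tokens[pos:pos + n]])
            pos += n
        events.append((name, patterns))
    return events


xor_text = """p00 0 0 0
p01 0 1 1
p10 1 0 1
p11 1 1 0"""
xor_events = read_events(xor_text, n_vals=[2, 1])
```

Each entry of xor_events pairs an event name with its input and target patterns. If n_vals did not match the file, the values would be assigned to the wrong patterns or the count check would fail, which is why the specs must be configured before reading.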
The ReadText function also allows comments in the .pat files, as it
skips over lines beginning with # or //. Further, ReadText allows the
input to be split across different lines, since it will read numbers
until it gets the right number for each pattern.
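The comment and line-splitting behavior can be pictured with the following Python sketch. It is only an illustration under stated assumptions, not the ReadText implementation: the helper name data_tokens is hypothetical, and ignoring leading whitespace before a comment marker is a choice made for this sketch.

```python
def data_tokens(text):
    """Return the data tokens of a pattern file, skipping comment lines."""
    tokens = []
    for line in text.splitlines():
        stripped = line.strip()
        # Lines beginning with # or // are treated as comments.
        if stripped.startswith("#") or stripped.startswith("//"):
            continue
        # Line breaks are irrelevant: tokens are simply accumulated.
        tokens.extend(stripped.split())
    return tokens


sample = """# XOR patterns
p00 0 0
0
// the target for p01 is on its own line
p01 0 1
1"""
sample_tokens = data_tokens(sample)
```

Because the numbers are consumed in order, an event whose target value sits on a separate line yields the same token sequence as one written on a single line.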
There is a special comment you can use to control the creation and
organization of subgroups of events. To start a new subgroup, put the
comment # startgroup before the pattern lines for the events in your
subgroup (note that the # endgroup comments from earlier versions are no
longer necessary, as they are redundant with the startgroup comments;
they will be ignored). For example, if you wanted 2 groups of 3 events,
you might have a file that looked like this:
# startgroup
p01 0 0 0
p02 0 1 1
p03 0 1 0
# startgroup
p11 1 0 1
p12 1 1 0
p13 1 1 1
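As a rough illustration of how the startgroup comments partition a file, here is a Python sketch. The function read_groups is hypothetical and is not part of PDP++. For brevity it assumes one event per line with the name first, and it places any events that appear before the first startgroup comment into an implicit first group, which is a choice of this sketch rather than documented behavior.

```python
def read_groups(text):
    """Parse pattern lines into subgroups delimited by '# startgroup'."""
    groups = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("#") or stripped.startswith("//"):
            if stripped.lstrip("#").strip() == "startgroup":
                groups.append([])  # begin a new subgroup
            # Any other comment, including '# endgroup', is ignored.
            continue
        if not groups:
            groups.append([])  # sketch-only choice for ungrouped events
        words = stripped.split()
        groups[-1].append((words[0], [float(w) for w in words[1:]]))
    return groups


grouped_text = """# startgroup
p01 0 0 0
p02 0 1 1
p03 0 1 0
# startgroup
p11 1 0 1
p12 1 1 0
p13 1 1 1"""
groups = read_groups(grouped_text)
```

For the example file, groups holds two lists of three events each.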
WriteText simply produces a file in the above format for all of the
events in the environment on which it is called. This can be useful for
exporting to other programs, or for converting patterns into a different
type of environment, one which cannot be used with the CopyTo or
CopyFrom commands. For example, if events were originally created in a
TimeEnv environment but you now want to use them in a FreqEnv frequency
environment, you can use WriteText to save the events to a file, and
then use ReadText to read them into a FreqEnv, which will enable a
frequency to be attached to them.
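The kind of output described here can be sketched in Python as follows. This only mimics the text format with the name at the start of each line; format_events is a hypothetical helper, not the WriteText function, and the exact number formatting that PDP++ uses is not specified in this section.

```python
def format_events(events):
    """Render (name, patterns) pairs as one text line per event."""
    lines = []
    for name, patterns in events:
        # Flatten the patterns in order: input values, then target values.
        values = [v for pattern in patterns for v in pattern]
        lines.append(" ".join([name] + [format(v, "g") for v in values]))
    return "\n".join(lines) + "\n"


exported = format_events([
    ("p01", [[0.0, 1.0], [1.0]]),
    ("p10", [[1.0, 0.0], [1.0]]),
])
```

Text written this way carries no information about the environment type it came from, which is what makes it usable as an intermediate step between environment types.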
For Environments that are more complicated than a simple list of events,
it is possible to use CSS to import text files of these events. Example
code for reading events structured into subgroups is included in the
distribution as `css/include/read_event_gps.css', and can be used as a
starting point for reading various kinds of different formats. The key
function which makes writing these kinds of functions in CSS easy is
ReadLine, which reads one line of data from a file and puts it into an
array of strings, which can then be manipulated, converted into numbers,
etc. This is much like the `awk' utility.
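The line-at-a-time, split-into-fields style of processing can be pictured with this Python analogue. It is not CSS and does not reproduce the signature of ReadLine, which is not given in this section; read_line is a hypothetical stand-in, and returning None at end of file is a convention of this sketch.

```python
import io


def read_line(stream):
    """Read one line and return its whitespace-separated fields.

    Returns None at end of file.
    """
    line = stream.readline()
    if line == "":
        return None
    return line.split()


stream = io.StringIO("p01 0 1 1\np10 1 0 1\n")
fields = read_line(stream)
# The fields are strings; convert them as needed.
name, values = fields[0], [float(f) for f in fields[1:]]
```

A custom importer would typically loop over such field arrays, deciding from the first field or the field count how to interpret each line.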
The read_event_gps.css example assumes that it will be read into a
Script object in a project, with three s_args values that control the
parameters of the expected format. Note that these parameters could
instead be put at the top of the data file, and read in from there at
the start.