Meta Data can be used to enhance the information about particular fields, records, or documents.
Meta Data System | Meta Data Editor | I/O System
Each section is structured to address purpose, description, design considerations, scope of effect, and relevant code packages.
The Meta Data System takes in information by either explicit or implicit means. Explicit information comes from the specific form that users fill out.
Implicitly, information may come from ontology terms provided by the Ontology Services. Also, implicit meta data may come from an existing meta data file about the form.
Information that comes from the Meta Data Editor does not come through the Meta Data System but rather is put directly into the I/O System.
Purpose | To capture and isolate meta data about a document. Project35 was designed to accommodate large data files in scientific computing fields. A layer of meta data was added to the native format *.PDZ files so that data dissemination systems could interpret a small meta data file before having to find search criteria in the much larger data layer. |
---|---|
Description | Project35 has an internal system for maintaining meta data about documents. The kinds of meta data that are managed include:
Some of this information is provided by the end-users. The image below shows the dialog that appears when they select the "Describe this document" feature from the Options menu. The title, author, e-mail, institution and description values are saved as part of the meta data for the document and stored in the *.META layer.
Meta Data Dialog
The remaining meta data are captured automatically. Project35 monitors how many instances of each kind of record appear in the document. When end-users use ontology services to mark up form fields, the tool asks the services to provide provenance data about all the selected terms. These data are also stored as meta data.
Meta data are stored as an information layer within the
*.PDZ native file format. The layer is expressed as an
XML data file that is defined by a schema located at:
Walkthrough for Capturing Ontology Term Meta Data
The process of committing meta data about ontology terms to file begins when an end-user right-clicks on the label of a form field that supports ontology services. This walkthrough traces the activity in the desktop deployment of the tool.
If an instance of
When an ontology service is selected, the
Project35 makes use of a
The Project35 Meta Data Editor
The Project35 Meta Data Editor is an instance of Project35 that has
been customised to edit the meta data layer of *.PDZ files. The forms
for the tool are generated by the schema described in
The Meta Data Editor allows data curators to alter the meta data layer independently of the data layer. The editor is designed to let them annotate the document with terms that come from the same ontology services that are available to a regular end-user. |
Design Considerations | In early versions of the software, the meta data system was limited to the part of the I/O system that recorded ontology identifiers in the *.META layer of *.PDZ files. Each time a user selected an ontology term, the software would remember data about the inserted term. These data were serialised to the *.META file when a *.PDZ file was saved.
Eventually the design began to recognise a Data Curator as a distinct
role in the system. Data curators are concerned with ensuring that a
document will be found in a search. The *.META layer began to
include more data. Eventually, there was a need to standardise the
contents of the *.META layer and an XML schema was
devised to describe meta data attributes. The Project35 Meta Data
Editor was created. This is an instance of Project35 whose forms are
generated from the meta data schema located at |
Scope of Effect | Project35’s meta data classes are used extensively by the ontology services and are used by the native file format I/O classes to create the *.META layer in each *.PDZ file. |
Relevant Code Packages |
Most of the meta data classes appear in the |
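
The capture described for the Meta Data System combines values that end-users type into the "Describe this document" dialog with values that the tool gathers automatically, such as the number of instances of each kind of record. The following is a minimal Python sketch of that idea, not Project35's own code. The element names (`meta_data`, `record_counts`, `record`) and the sample data layer are assumptions made for illustration; the real layout is defined by the meta data schema.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_records(data_xml, record_tags):
    """Count how many times each record tag occurs in the data layer."""
    root = ET.fromstring(data_xml)
    return Counter(el.tag for el in root.iter() if el.tag in record_tags)


def build_meta(described, counts):
    """Combine user-supplied fields and automatic counts in one XML tree."""
    meta = ET.Element("meta_data")  # hypothetical root element name
    for name, value in described.items():
        ET.SubElement(meta, name).text = value
    stats = ET.SubElement(meta, "record_counts")  # hypothetical element name
    for tag, n in sorted(counts.items()):
        ET.SubElement(stats, "record", {"name": tag, "count": str(n)})
    return meta


# Hypothetical data layer with two "sample" records.
data_layer = (
    "<experiment>"
    "<sample><name>a</name></sample>"
    "<sample><name>b</name></sample>"
    "</experiment>"
)
counts = count_records(data_layer, {"sample"})
meta = build_meta({"title": "Demo", "author": "A. Curator"}, counts)
print(ET.tostring(meta, encoding="unicode"))
```

Keeping the explicit fields and the automatic counts in one tree mirrors the single *.META layer described above, so a dissemination system only has to read one small file.
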
Purpose | To allow archivists to evaluate or modify the meta data used to describe a document. |
---|---|
Description | Project35 supports a system dedicated to managing meta data about documents produced by the generated data entry tools. Meta data managed by the software are serialised in a *.META XML file that is stored inside each native format *.PDZ file, managed by the I/O System. The software comes with the Project35 Meta Data Editor, a utility tool that allows archivists to edit the contents of the *.META layer of a *.PDZ file. In many cases, archivists can expect to modify meta data that Project35 automatically manages for each document. For example, Project35 automatically records in the *.META file of a *.PDZ document which ontology terms an end-user used to mark up form fields. Archivists can post-annotate the *.META file using the same ontology services that are available to document authors. This is useful where old XML documents are described by new classification ontologies, or where document authors have made little use of existing controlled vocabularies to standardise aspects of their documents. Like the Project35 Configuration Tool, the Project35 Meta Data Editor is generated using Project35's form generation engine.
Architecture for the Project35 Meta Data Editor
The meta data that are managed by Project35 for each document are
defined in the XML schema whose path is:
Several of the default File menu features have been overridden with similar features that manipulate the *.META layer of a *.PDZ file. These include the general-purpose plugins:
The Meta Data Editor uses an ontology service that relies on the
ontology source
The editor also features
|
Design Considerations |
In early versions of the software, meta data were stored in an ad-hoc manner within each *.META file. As more fields were added to the meta data file, it became clear that the gathering of meta data warranted its own special editor tool. The tool would be used by archivists, whose needs differed from those of both document authors and people searching for documents. The bespoke code for managing the *.META data layer was replaced by a tool driven by a formal specification of meta data maintained by Project35. |
Scope of Effect | Changes made to the Project35 Meta Data Editor would impact the
following areas:
|
Relevant Code Packages | Most aspects of the Project35 Meta Data Editor are generated by the
main Project35 code base used to generate other data entry tools.
Code that specifically supports features in the editor is managed in
package project35.metaData.
|
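
The editor described above changes the *.META layer of a *.PDZ file while leaving the data layer alone. A minimal Python sketch of that separation follows; it is an illustration under stated assumptions, not the editor's implementation. The entry names `doc.PDR` and `doc.META` and the helper names are hypothetical. Because a ZIP archive cannot be edited in place with Python's `zipfile`, the sketch copies every entry to a new archive and swaps only the entry ending in `.META`.

```python
import io
import zipfile


def make_pdz(layers):
    """Build an in-memory ZIP archive from a {entry name: bytes} mapping."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as pdz:
        for name, payload in layers.items():
            pdz.writestr(name, payload)
    return buf.getvalue()


def read_layer(pdz_bytes, suffix):
    """Return the bytes of the first entry whose name ends with suffix."""
    with zipfile.ZipFile(io.BytesIO(pdz_bytes)) as pdz:
        for name in pdz.namelist():
            if name.upper().endswith(suffix.upper()):
                return pdz.read(name)
    raise KeyError(suffix)


def replace_meta_layer(pdz_bytes, new_meta_xml):
    """Copy the archive, replacing only the entry that ends in .META."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(pdz_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.upper().endswith(".META"):
                dst.writestr(info.filename, new_meta_xml)
            else:
                # Every other layer is copied through byte for byte.
                dst.writestr(info.filename, src.read(info.filename))
    return out.getvalue()


# Hypothetical entry names; the source only says the layers end in
# *.PDR and *.META.
pdz = make_pdz({"doc.PDR": b"<data/>", "doc.META": b"<meta>old</meta>"})
updated = replace_meta_layer(pdz, b"<meta>new</meta>")
```

After the call, `read_layer(updated, ".PDR")` returns the same bytes as before, which is the property that lets an archivist post-annotate a document without touching its data.
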
Purpose | To store and retrieve form data managed by Project35. |
---|---|
Description | Project35 normally stores a data set as a zipped file ending in a *.PDZ file extension. The zipped file contains a number of XML files, each of which represents a layer of information. Currently there are two layers: the data layer and the meta-data layer. The data layer is represented by the *.PDR file and contains the text that would appear in form fields. The tags found in the data layer will be defined in the target schema used to drive the data entry application.
The meta-data layer is represented by the *.META file and contains meta data about the data set, including basic information about the author and about all the ontology terms which were used to mark up form fields. The tags found in the meta-data layer are defined in the schema:
The I/O system for creating *.PDZ files can be extended to include other information layers (see Architecture Extensions). Project35 can export a data set as an XML file that contains only information from the data layer. This export feature appears as the "Export to Final Submission Format" menu option, which will probably be relabelled in the future. |
Design Considerations | Use of Layers
Originally, Project35 stored a data set as a single XML file. The need to store a data set as a collection of layers arose from the development of ontology services. Initially, ontology services provided text phrases that would be pasted into forms. However, an ontology term is not adequately represented by a word phrase. Ontology terms were eventually redesigned to use a human-readable label and a machine-readable identifier.
Although the labels for selected ontology terms were stored in form
fields, Project35 needed some mechanism for storing information about
the unique identifiers. Initially, ontology terms were written as
hyperlinks in the XML data file. They were stored in the form
The problem with this approach was that data sets marked up with ontology terms would fail to validate against the XML schema. This was because the schema did not describe the '<a>' tag which appeared within the tags for a form field. Rather than treating '<a>' as a tag with special significance, I decided to store ontology term identifiers in a separate meta data file that would accompany the data file. Project35 was modified so that its data sets were stored in ZIP files that contained multiple information layers.
The Structure of Project35’s Native Format *.PDZ File
It contains a *.PDR file which
holds the form data and a *.META file which holds the
meta data about the data set. The data held in the *.PDR
layer validates against the XML Schema and the data held in the
*.META layer validates against the meta data schema
defined in
Changing Parsers
The software used to rely entirely on the DOM parser and still uses it for parsing the meta data file. The DOM parser works by parsing an XML file and producing an in-memory tree of DOM model objects. The API for DOM objects made it easy to extract information from the XML file. The parser performed well with small data sets but exhibited performance problems when it was used to process large data sets. This is because the parser loads an entire XML file into memory before the DOM objects can be used. Reading files became much faster when some of the I/O classes began using the SAX parser.
Support for Streams
The I/O classes were modified so they could accept data streams instead of just files. This was done to make it easier to deploy Project35 as a component rather than as a standalone application. In a component mode of activity, Project35 may receive its data input as a stream coming directly from another component.
Creating the "Export to Final Submission" Feature
The software used to allow end-users to export native format *.PDZ files to *.XML files that contained only the data layer of information. The *.XML files tended to be candidate files for submission to data repositories. I thought it was a good idea to rename this feature to "Export to Final Submission Format" and make the menu feature validate the document. If there were any errors, Project35 would not create the *.XML file. This ensured that end-users fixed all the errors before they sent their files off to repository managers.
Providing Support for the Meta Data Layer
For most of its development cycle, Project35 saved an arbitrary collection of meta data in the *.META layer. Typically this focused on recording which ontology terms were used to tag a particular kind of schema concept such as a form field or record. Whenever a new attribute was added, the special I/O routines which read and wrote meta data records had to change. In 2007, the *.META layer was given its own distinct schema for meta data. Each Project35 tool now loads the Project35_meta_data model and uses a special context variable (see Accessing Application Variables) to help read and write meta data records. These records are maintained independently of the form data that end-users edit through the normal use of the tool. A new utility has been designed which allows data curators to edit just the meta data layer of a given *.PDZ file. The Project35 Meta Data Editor uses the same "project35_meta_data" model but allows curators to post-annotate a *.PDZ file. Curators can now remove ontology terms that were used to tag records and fields. Alternatively, they can add more terms using the same ontology services that are available to end-users. With the new support for the *.META layer, data curators can change the meta data about a data set without editing the data themselves. The layers can be maintained completely independently of one another. |
Scope of Effect |
Most of the I/O classes are defined in |
Relevant Code Packages |
The I/O classes appear in |
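
The design notes above on changing parsers and on stream support can be illustrated together. The following minimal sketch uses Python's standard `xml.sax` module rather than Project35's own I/O classes, so the handler and function names are hypothetical. A SAX handler receives events as the input is read and never builds an in-memory tree, and `xml.sax.parse` accepts any file-like object, so the same function works for a file on disk or for a stream handed over by another component.

```python
import io
import xml.sax
from collections import Counter


class TagCounter(xml.sax.ContentHandler):
    """Tally element names as SAX events arrive; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.counts[name] += 1


def count_tags(stream):
    """Parse XML from any file-like object and return element counts."""
    handler = TagCounter()
    xml.sax.parse(stream, handler)
    return handler.counts


# An in-memory byte stream stands in for data arriving from another
# component.
stream = io.BytesIO(b"<experiment><sample/><sample/></experiment>")
tag_counts = count_tags(stream)
```

A DOM-style parser would hold the whole document in memory before any counting could start; the event-driven version only ever holds the running tally, which is why it scales better for large data layers.
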