

Architecture for Meta Data

Meta Data can be used to enhance the information about particular fields, records, or documents.


Meta Data Architecture Diagram for Project35


Meta Data Scenario

  1. End users can explicitly trigger this process by adding their own meta data describing the current data set. They do this through a form that is made available in the generated application. The form lets users specify general meta data such as the title, abstract and author of the current document or data set.
  2. The Meta Data System also relies on implicit triggers, such as when users insert selected terms provided by ontology services. When users select their terms, the Meta Data System attempts to determine as much information about those terms as possible. It asks the ontology service to provide meta data about the selected terms.
  3. The meta data for data sets are stored in a meta data file. This is an XML file which is stored as a layer inside the native file format. The Meta Data System uses this file to remember meta data that is introduced through each editing session of a file.
  4. The meta data file and the native file format are both managed by the I/O System.
  5. The Meta Data Editor is designed to allow archivists to alter existing meta data managed in the meta data file. The Meta Data Editor allows them to post-annotate data sets in a way that does not affect the original form data.


Key Reference Sections

Meta Data System  |   Meta Data Editor  |   I/O System

Each section is structured to address purpose, description, design considerations, scope of effect, and relevant code packages.


Meta Data System Inputs

Information taken in by the Meta Data System may come in by either explicit or implicit means. Explicitly, information comes from the specific form that users may fill out.

Implicitly, information may come from ontology terms provided by the Ontology Services. Also, implicit meta data may come from an existing meta data file about the form.

Information that comes from the Meta Data Editor does not come through the Meta Data System but rather is put directly into the I/O System.


Reference Sections for Meta Data Architecture

Meta Data System

Purpose

To capture and isolate meta data about a document. Project35 was designed to accommodate large data files in scientific computing fields. A layer of meta data was added to the native format *.PDZ files so that data dissemination systems could interpret a small meta data file before having to find search criteria in the much larger data layer.



Description

Project35 has an internal system for maintaining meta data about documents. The kinds of meta data that are managed include:

  • Summary information including the title, author, e-mail, institution and description of the data set;
  • The number of each kind of record that appears in the data set;
  • Provenance data about all the ontology terms that are used to mark-up form fields.

Some of this information is provided by the end-users. The image below shows the dialog that appears when they select the "Describe this document" feature from the Options menu. The title, author, e-mail, institution and description values are saved as part of the meta data for the document and stored in the *.META layer.

Meta Data Dialog

The remaining meta data are captured automatically. Project35 monitors how many instances of each kind of record appear in the document. When end-users use ontology services to mark-up form fields, the tool asks the services to provide provenance data about all the selected terms. These data are also stored as meta data.
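The three kinds of meta data described above can be pictured as one holder object. The following is a minimal sketch for illustration only: MetaDataSketch and all of its members are hypothetical names, and it does not reproduce the real classes in project35.metaData.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the real Project35 meta data classes.
class MetaDataSketch {
    // Summary values supplied by the end-user through the dialog.
    String title = "";
    String author = "";
    String email = "";
    String institution = "";
    String description = "";

    // Captured automatically: record type name -> number of instances.
    private final Map<String, Integer> recordCounts = new LinkedHashMap<>();

    // Captured automatically: one provenance note per selected ontology term.
    private final List<String> termProvenance = new ArrayList<>();

    void recordAdded(String recordType) {
        recordCounts.merge(recordType, 1, Integer::sum);
    }

    void recordRemoved(String recordType) {
        // Returning null removes the entry once the count would reach zero.
        recordCounts.computeIfPresent(recordType, (k, n) -> n > 1 ? n - 1 : null);
    }

    void termSelected(String provenance) {
        termProvenance.add(provenance);
    }

    int countOf(String recordType) {
        return recordCounts.getOrDefault(recordType, 0);
    }

    List<String> provenance() {
        return Collections.unmodifiableList(termProvenance);
    }
}
```

In this sketch the summary fields are only changed by the user, while the counts and provenance notes change as a side effect of normal editing.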

Meta data are stored as an information layer within the *.PDZ native file format. The layer is expressed as an XML data file that is defined by a schema located at: .\models\project35_meta_data\model\project35_meta_data.xsd. Using the Project35 Meta Data Editor, data curators can edit the meta data file independently of the form data. They can edit summary information or ontology terms to suit changes in the way documents are classified in a data repository. The following sections describe aspects of the meta data system in more detail.

Walkthrough for Capturing Ontology Term Meta Data

The process of committing meta data about ontology terms to file begins when an end-user right-clicks on the label of a form field that supports ontology services. This walkthrough traces the activity in the desktop deployment of the tool.

If an instance of project35.desktopDeployment.TextFieldView has been linked with ontology services, the object associates its starred form label with project35.soa.ontology.views.OntologyServiceManager. This class is responsible for presenting the available ontology services to end-users and ensuring the selected terms appear in the text field.

OntologyServiceManager delegates the task of listening to right-click mouse actions to a ServiceMenuListener class. When the ServiceMenuListener detects a right-click over the form label, it causes the OntologyServiceManager to generate a popup menu of available services. If a service has fewer than 40 terms, the manager attempts to render them as menu items. Otherwise, it displays a "Select terms..." button, which causes an OntologyViewer to display the terms.
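The choice between inline menu items and the "Select terms..." button can be sketched as below. This is an assumption-laden illustration: ServiceMenuSketch, INLINE_TERM_LIMIT and menuLabels are hypothetical names, and the real OntologyServiceManager builds actual menu components rather than a list of labels.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the real OntologyServiceManager is not shown in this document.
class ServiceMenuSketch {
    // Threshold taken from the text: services with fewer than 40 terms are listed inline.
    static final int INLINE_TERM_LIMIT = 40;

    // Returns the labels the popup menu would show for one ontology service.
    static List<String> menuLabels(List<String> terms) {
        if (terms.size() < INLINE_TERM_LIMIT) {
            // Small service: every term becomes its own menu item.
            return new ArrayList<>(terms);
        }
        // Large service: a single entry that would open the term viewer.
        List<String> labels = new ArrayList<>();
        labels.add("Select terms...");
        return labels;
    }
}
```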

When an ontology service is selected, the OntologyViewer is associated with an OntologyTermSelectionListener. When end-users have selected terms for mark-up, the viewer notifies the OntologyTermSelectionListener. The listener is then supposed to ask the viewer to return meta data about each term that has been selected. OntologyViewer obliges by returning a collection of OntologyTermProvenance objects.

Project35 makes use of a DefaultOntologyTermListener which performs the mark-up action. It adds the OntologyTermProvenance objects to OntologyContext, which maintains information about the content of fields that are currently displayed. OntologyContext in turn adds the provenance objects to OntologyTermProvenanceManager. This object maintains information about all the ontology terms that have been used to mark-up the whole document.

The OntologyTermProvenanceManager is owned by a DocumentMetaData object, which holds information about meta data for the whole document. This is the object that is used by NativeFileFormatWriter and NativeFileFormatReader to serialise the meta data information to a *.META XML file.
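The hand-off described in this walkthrough (viewer, listener, context, provenance manager) can be sketched with simplified stand-ins. Every type and method name below is hypothetical and chosen only to mirror the roles in the text; the fields of the real OntologyTermProvenance class are not described here, so the identifier and label fields are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the selection-to-provenance flow; not the real Project35 classes.
class ProvenanceFlowSketch {

    // Assumed shape of a provenance record: a machine identifier plus a readable label.
    static final class TermProvenance {
        final String identifier;
        final String label;
        TermProvenance(String identifier, String label) {
            this.identifier = identifier;
            this.label = label;
        }
    }

    // Plays the role of OntologyTermSelectionListener.
    interface TermSelectionListener {
        void termsSelected(Viewer viewer);
    }

    // Plays the role of OntologyTermProvenanceManager: document-wide store.
    static final class ProvenanceManager {
        private final List<TermProvenance> all = new ArrayList<>();
        void add(TermProvenance p) { all.add(p); }
        int size() { return all.size(); }
    }

    // Plays the role of OntologyContext: forwards provenance to the manager.
    static final class Context {
        private final ProvenanceManager manager;
        Context(ProvenanceManager manager) { this.manager = manager; }
        void add(List<TermProvenance> provenance) {
            for (TermProvenance p : provenance) manager.add(p);
        }
    }

    // Plays the role of OntologyViewer.
    static final class Viewer {
        private final List<TermProvenance> selected = new ArrayList<>();
        private TermSelectionListener listener;

        void setListener(TermSelectionListener listener) { this.listener = listener; }

        void select(String identifier, String label) {
            selected.add(new TermProvenance(identifier, label));
        }

        // Called when the user confirms the selection; notifies the listener.
        void confirmSelection() {
            if (listener != null) listener.termsSelected(this);
        }

        // The listener calls back to fetch provenance for the selected terms.
        List<TermProvenance> getSelectedTermProvenance() {
            return new ArrayList<>(selected);
        }
    }
}
```

A listener in this sketch would be wired as viewer.setListener(v -> context.add(v.getSelectedTermProvenance())), which corresponds to the mark-up step performed by the default listener in the text.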

The Project35 Meta Data Editor

The Project35 Meta Data Editor is an instance of Project35 that has been customised to edit the meta data layer of *.PDZ files. The forms for the tool are generated from the schema described in .\models\project35_meta_data\model\project35_meta_data.xsd. The code used to make the plugins is explained further in the Plugin System section.

The Meta Data Editor allows data curators to alter the meta data layer independently of the data layer. The editor is designed to let them annotate the document with terms that come from the same ontology services that are available to a regular end-user.



Design Considerations

In early versions of the software, the meta data system was limited to the part of the IO system that recorded ontology identifiers in the *.META layer of *.PDZ files. Each time a user selected an ontology term, the software would remember data about the inserted term. These data were serialised to the *.META file when a *.PDZ file was saved.

Eventually the design began to recognise a Data Curator as a distinct role in the system. Data curators are concerned with ensuring that a document will be found in a search. The *.META layer began to include more data. Eventually, there was a need to standardise the contents of the *.META layer and an XML schema was devised to describe meta data attributes. The Project35 Meta Data Editor was created. This is an instance of Project35 whose forms are generated from the meta data schema located at .\models\project35_meta_data.



Scope of Effect

Project35’s meta data classes are used extensively by the ontology services and are used by the native file format I/O classes to create the *.META layer in each *.PDZ file.



Relevant Code Packages

Most of the meta data classes appear in the project35.metaData package. The OntologyTermProvenanceManager used to manage meta data about ontology terms appears within the project35.soa.ontology.provenance package.



Meta Data Editor

Purpose

To allow archivists to evaluate or modify the meta data used to describe a document.


Description

Project35 supports a system dedicated to managing meta data about documents produced by the generated data entry tools. Meta data managed by the software are serialised in a *.META XML file that is stored inside each native format *.PDZ file, which is managed by the I/O System.

The software comes with the Project35 Meta Data Editor, a utility tool that allows archivists to edit the contents of the *.META layer of a *.PDZ file. In many cases, archivists can expect to modify meta data that Project35 automatically manages for each document. For example, Project35 automatically records which ontology terms an end-user used to mark-up form fields in the *.META file of a *.PDZ document. Archivists can post-annotate the *.META file using the same ontology services that are available to document authors. This would be useful in cases where old XML documents are described by new classification ontologies, or where document authors have made little use of existing controlled vocabularies to standardise aspects of their documents.

Like the Project35 Configuration Tool, the Project35 Meta Data Editor is generated using Project35's form generation engine.

Architecture for the Project35 Meta Data Editor

The meta data that are managed by Project35 for each document are defined in the XML schema whose path is: .\models\project35_meta_data\model\project35_meta_data.xsd. The configuration options for each of the form concepts supported by the editor are defined at: .\models\project35_meta_data\config\ConfigurationFile.xml.

Several of the default File menu features have been overridden with similar features that manipulate the *.META layer of a *.PDZ file. These include the general purpose plugins:

  • OpenMetaDataFile
  • SaveMetaDataFile
  • CloseMetaDataFile
  • ExitMetaDataEditor

The Meta Data Editor uses an ontology service that relies on the ontology source MetaDataOntologySource. Like the Project35ConfigurationOntologySource used in the Project35 Configuration Tool, this service is used to supply the names of records and fields that are defined in the main data entry XML schema. The terms for schema concepts are used to fill in the "name" field of record_meta_data and field_meta_data records that are maintained in the meta data file.

The editor also features OntologyTermValidationService, a Validation Service which generates warnings for two cases:

  • An ontology term is associated more than once with the same kind of record field;
  • A form field that supports ontology services has populated instances but no entries for ontology term meta data records. This indicates that users decided not to populate a field with ontology terms even though that field supports mark-up using ontology services.
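The second warning case can be sketched as a simple check over per-field summaries. This is only an illustration under stated assumptions: TermValidationSketch, FieldSummary and the warning wording are hypothetical, and the real OntologyTermValidationService works on the meta data records themselves.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of one warning rule; not the real OntologyTermValidationService.
class TermValidationSketch {

    // Assumed summary of one kind of form field.
    static final class FieldSummary {
        final String name;
        final boolean supportsOntologyServices;
        final int populatedInstances;   // how many instances of the field hold a value
        final int termMetaDataRecords;  // how many ontology term meta data records exist for it

        FieldSummary(String name, boolean supportsOntologyServices,
                     int populatedInstances, int termMetaDataRecords) {
            this.name = name;
            this.supportsOntologyServices = supportsOntologyServices;
            this.populatedInstances = populatedInstances;
            this.termMetaDataRecords = termMetaDataRecords;
        }
    }

    // Warns about fields that support mark-up, hold values, but carry no term meta data.
    static List<String> warnings(List<FieldSummary> fields) {
        List<String> result = new ArrayList<>();
        for (FieldSummary field : fields) {
            if (field.supportsOntologyServices
                    && field.populatedInstances > 0
                    && field.termMetaDataRecords == 0) {
                result.add("Field '" + field.name
                        + "' supports ontology services but has no ontology term meta data.");
            }
        }
        return result;
    }
}
```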



Design Considerations

In early versions of the software, meta data were stored in an ad-hoc manner within each *.META file. As more fields were added to the meta data file, it became clear that the gathering of meta data warranted its own special editor tool. The tool would be used by archivists, who had needs which differed from both document authors and people searching for documents.

The bespoke code for managing the *.META data layer was replaced by a tool which was driven off a formal specification of meta data maintained by Project35.



Scope of Effect

Changes made to the Project35 Meta Data Editor would impact the following areas:
  • The XML schema and ConfigurationFile.xml files found in the model "project35_meta_data";
  • Code found in the package project35.metaData;
  • The *.META XML file that is stored inside each *.PDZ file.


Relevant Code Packages

Most aspects of the Project35 Meta Data Editor are generated by the main Project35 code base used to generate other data entry tools. Code that specifically supports features in the editor is managed in package project35.metaData.



I/O System

Purpose

To store and retrieve form data managed by Project35.


Description

Project35 normally stores a data set as a zipped file ending in a *.PDZ file extension. The zipped file contains a number of XML files, each of which represents a layer of information. Currently there are two layers: the data layer and the meta-data layer. The data layer is represented by the *.PDR file and contains the text that would appear in form fields. The tags found in the data layer will be defined in the target schema used to drive the data entry application.

The meta-data layer is represented by the *.META file and contains meta data about the data set, including basic information about the author and about all the ontology terms which were used to mark-up form fields. The tags found in the meta-data layer are defined in the schema: ./models/Project35_meta_data/model/Project35MetaData.xsd.
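The layered archive described above can be sketched with the standard java.util.zip classes. This is a minimal illustration and not the Project35 I/O code: LayeredArchiveSketch and its methods are hypothetical, and the entry names used in any example (for instance "dataset.pdr" and "dataset.meta") are assumptions, since the actual entry names inside a *.PDZ file are not stated here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch: one ZIP archive holding one XML document per information layer.
class LayeredArchiveSketch {

    // Writes each layer (entry name -> XML text) as a separate ZIP entry.
    static byte[] write(Map<String, String> layers) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> layer : layers.entrySet()) {
                zip.putNextEntry(new ZipEntry(layer.getKey()));
                zip.write(layer.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads every entry back into a map of entry name -> XML text.
    static Map<String, String> read(byte[] archive) {
        Map<String, String> layers = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            byte[] buffer = new byte[4096];
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream content = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry, not the whole archive.
                while ((n = zip.read(buffer)) != -1) {
                    content.write(buffer, 0, n);
                }
                layers.put(entry.getName(), content.toString("UTF-8"));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return layers;
    }
}
```

Because each layer is its own entry, a reader that only needs the meta data can stop after the small meta data entry without parsing the larger data layer.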

The I/O system for creating *.PDZ files can be extended to include other information layers (See Architecture Extensions).

Project35 can export a data set as an XML file that will only contain information from the data layer. This export feature appears in the "Export to Final Submission Format" menu option but will probably be relabelled something more appropriate in the future.



Design Considerations

Use of Layers

Originally, Project35 stored a data set as a single XML file. The need to store a data set as a collection of layers arose from the development of ontology services. Initially, ontology services provided text phrases that would be pasted into forms. However, an ontology term is not adequately represented by a word phrase. Ontology terms were eventually redesigned to use a human-readable label and a machine-readable identifier.

Although the labels for selected ontology terms were stored into form fields, Project35 needed some mechanism for storing information about the unique identifiers. Initially, ontology terms were written as hyperlinks in the XML data file. They were stored in the form label.

The problem with this approach was that data sets marked up with ontology terms would fail to validate against the XML schema. This was because the schema would not describe the '<a>' tag which appeared within the tags for a form field. Rather than treating '<a>' as a tag with special significance, I decided to store ontology term identifiers in a separate meta data file that would accompany the data file. Project35 was modified so that its data sets were stored in ZIP files that contained multiple information layers.

The Structure of Project35’s Native Format *.PDZ File

A *.PDZ file contains a *.PDR file, which holds the form data, and a *.META file, which holds the meta data about the data set. The data held in the *.PDR layer validate against the target XML schema used to drive the data entry application, and the data held in the *.META layer validate against the meta data schema defined in .\models\project35_meta_data\model\Project35MetaData.xsd.


Changing Parsers

The software used to rely entirely on the DOM parser and still uses it for parsing the meta data file. The DOM parser works by parsing an XML file and producing an in-memory tree of DOM model objects. The API for DOM objects made it easy to extract information from the XML file. The parser performed well with small data sets but exhibited performance problems with large ones, because it loaded an entire XML file into memory before the DOM objects could be used. Reading files became much faster when some of the I/O classes began using the SAX parser.
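The difference can be illustrated with a small SAX handler that counts elements as they stream past, without building a tree in memory. This is a sketch using the standard JAXP classes, not the Project35 reader: SaxCountSketch and countElements are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: event-driven parsing keeps only the running counts in memory.
class SaxCountSketch {

    // Counts how often each element name occurs in the XML read from the stream.
    static Map<String, Integer> countElements(InputStream xml) {
        final Map<String, Integer> counts = new LinkedHashMap<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                // Called once per opening tag; no document tree is retained.
                counts.merge(qName, 1, Integer::sum);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(xml, handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return counts;
    }

    // Convenience overload for in-memory text.
    static Map<String, Integer> countElements(String xml) {
        return countElements(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```

The trade-off is that a SAX handler sees each element only once and in document order, so code that needs random access to the document is easier to write against DOM.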

Support for Streams

I/O files were modified so they could accept data streams instead of just files. This was done to make it easier to deploy Project35 as a component rather than as a standalone application. In a component mode of activity, Project35 may receive its data input as a stream coming directly from another component.

Creating the "Export to Final Submission" Feature

The software used to allow end-users to export native format *.PDZ files to *.XML files that only contained the data layer of information. The *.XML files tended to be candidate files for submission to data repositories. I thought it was a good idea to rename this format to "Export to Final Submission Format" and cause the menu feature to validate the document. If there were any errors, Project35 would not create the *.XML file. This action ensured that end-users fixed all the errors before they sent their files off to repository managers.
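The "validate first, then export" behaviour can be sketched with the standard javax.xml.validation API. This is an illustration under assumptions: ExportSketch and exportIfValid are hypothetical, the schema and document are passed as strings for brevity, and the real feature reports errors to the user rather than returning an empty result.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch: the export is produced only when the data layer validates.
class ExportSketch {

    // Returns the XML to export, or an empty Optional when validation fails.
    static Optional<String> exportIfValid(String dataLayerXml, String schemaXsd) {
        Schema schema;
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            schema = factory.newSchema(new StreamSource(new StringReader(schemaXsd)));
        } catch (SAXException e) {
            // A broken schema is a configuration problem, not a document error.
            throw new IllegalArgumentException("Schema could not be compiled", e);
        }
        try {
            // With no custom error handler, validate() throws on the first error.
            schema.newValidator().validate(new StreamSource(new StringReader(dataLayerXml)));
            return Optional.of(dataLayerXml);
        } catch (SAXException | IOException e) {
            return Optional.empty();
        }
    }
}
```

A document whose field content contains an undeclared tag, such as an '<a>' element inside a string-typed field, would be rejected by this check.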

Providing Support for the Meta Data Layer

For most of its development cycle, Project35 has saved an arbitrary collection of meta data in the *.META layer. Typically this focused on recording which ontology terms were used to tag a particular kind of schema concept such as a form field or record.

Whenever a new attribute was added, the special I/O routines that read and wrote meta data records had to change. In 2007, the *.META layer was given its own distinct schema for meta data. Each Project35 tool now loads the Project35_meta_data model and uses a special context variable (See Accessing Application Variables) to help read and write meta data records. These records are maintained independently of the form data that end-users edit through the normal use of the tool.

A new utility has been designed which will allow data curators to edit just the meta data layer of a given *.PDZ file. The Project35 Meta Data Editor uses the same "project35_meta_data" model but allows curators to post-annotate a *.PDZ file. Curators can now remove ontology terms that were used to tag records and fields. Alternatively, they can add more terms using the same ontology services that are available to end-users.

With the new support for the *.META layer, data curators can change the meta data about a data set without editing the data themselves. The layers can be maintained completely independently of one another.



Scope of Effect

Most of the I/O classes are defined in the project35.io package. Whereas the meta data used to be managed by the project35.io.MetaDataReader and project35.io.MetaDataWriter classes, meta data records are now read and written using the normal Project35DataFileReader and Project35DataFileWriter classes respectively. Most of the I/O classes are called from the project35.desktopDeployment.FileMenu or project35.tabletDeployment.FileMenu classes.



Relevant Code Packages

The I/O classes appear in project35.io. Project35DataFileReader/Writer are used to manage the *.PDR files that represent the data layer of each data set. NativeDataFileReader/Writer use these classes when they manage the zipped *.PDZ files. XMLSubmissionFileReader/Writer wrap Project35DataFileReader/Writer and produce *.XML files.
