Tidying up SWO

I spent some time over the past few days performing a little bit of standardisation and tidying prior to the new software class additions that Duncan and I will be working on in the coming weeks. This post lists the changes made to each file and the reasons for those changes.

A note on new identifiers: I have begun each new ID with the single-number prefix that particular file already has (e.g. a “9” for swo_licenses.owl or a “0” for swo_core.owl) and then taken the 4000 range, as it looks like no other identifiers are in that range. For example, new classes in swo_core.owl would start with SWO_0004000 and increase in integer increments from there. The 4000 range restarts from the initial 4000 value in each separate OWL file, but there will be no overlaps due to the number prefix unique to each file. However, the only place new identifiers were needed was in the swo_core.owl file, and therefore if we wish to change the way identifiers are built, this can still be done easily. Please note, however, that due to the moving of axioms from the originalsoftwareontology.owl file to other appropriate OWL files, the single-number prefix convention is not consistent throughout the files.
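As a rough illustration of the identifier scheme described above, here is a minimal Python sketch. The function name and the seven-digit layout (one prefix digit followed by a six-digit, zero-padded number) are my assumptions, inferred from SWO_0004000; nothing in the OWL files enforces them.

```python
def new_identifiers(file_prefix: str, count: int, start: int = 4000) -> list[str]:
    """Return `count` identifiers in the 4000 range for one OWL file.

    Assumption: an identifier is "SWO_" + the file's one-digit prefix +
    a six-digit, zero-padded number, which matches SWO_0004000 for swo_core.owl.
    """
    if len(file_prefix) != 1 or not file_prefix.isdigit():
        raise ValueError("file_prefix must be a single digit")
    return [f"SWO_{file_prefix}{number:06d}" for number in range(start, start + count)]


core_ids = new_identifiers("0", 3)     # swo_core.owl uses the prefix "0"
license_ids = new_identifiers("9", 1)  # swo_licenses.owl uses the prefix "9"
print(core_ids)
print(license_ids)
```

Because the prefix differs per file, the same 4000 range can be reused in each file without producing the same identifier twice.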

A note on naming: For naming conventions, we are using the OBO principles found here: http://obofoundry.org/wiki/index.php/Naming . The only alteration to these rules is that our classes should begin with initial capital letters, unless dictated otherwise by the formal or proper noun being used.

SWO Original Ontology file: originalsoftwareontology.owl

This file is now obsolete. All classes have been moved to appropriate locations within the other SWO OWL files, and the direct import of this file within swo_core has been deleted.

This section describes the movement of classes and properties from this file. For the rest of this section, “original file” refers to originalsoftwareontology.owl, and “core file”, “algorithm file” etc. refer to swo_core.owl, swo_algorithms.owl etc. All changes were made using the Protege 4.1 “move axioms” refactoring tool.

Algorithm hierarchy

The algorithm hierarchy was checked for duplicates within the original file and the algorithm file. Clustering algorithm was the name of a class in both files. In the algorithm file, this label belongs to an OntoDM class, while in the original file it belongs to SWO_0000500. As the SWO version is used elsewhere in the file, I have simply renamed it to clustering algorithm orig until all other tidying has been completed. At that point, the two classes can be aligned (see section below).
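A duplicate check of this kind can be sketched in a few lines of Python. This is only an illustration, not how the check was actually performed: it assumes the labels of each file have already been exported into a dictionary from identifier to label, and every identifier other than SWO_0000500 is a placeholder.

```python
def shared_labels(first: dict[str, str], second: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (label, id_in_first, id_in_second) for labels present in both files.

    Labels are compared case-insensitively, ignoring surrounding whitespace.
    """
    by_label = {label.strip().lower(): class_id for class_id, label in first.items()}
    matches = []
    for class_id, label in second.items():
        key = label.strip().lower()
        if key in by_label:
            matches.append((key, by_label[key], class_id))
    return sorted(matches)


# SWO_0000500 comes from the post; the other identifiers are placeholders.
original_file = {"SWO_0000500": "clustering algorithm", "SWO_placeholder": "example algorithm"}
algorithm_file = {"OntoDM_placeholder": "Clustering algorithm"}
print(shared_labels(original_file, algorithm_file))
```

A check like this only catches labels that match exactly once case is ignored. Near-duplicates with different labels still need to be found by eye.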

All other children of algorithm were then moved en masse to the algorithm file. As these algorithms have not yet been further classified within the OntoDM hierarchy present within the algorithm file, no attempt was made to perform further classification under OntoDM. That is a larger refactoring exercise than is warranted in this tidying up session.

All axioms using the object property implements were then moved to the core file. Although the property has no stated domain, it makes sense that its domain is software, and so the property belongs with that file.

A spelling mistake was fixed for modied version of the GLAD algorithm (Gain and Loss Analysis of DNA) (changed to “modified”).

Data Hierarchy

The data hierarchy of classes cannot have duplicates as there is no such hierarchy within the swo_data.owl file. The data class and all of its children were moved to swo_data.owl. However, there is an object property duplication for is specified output of, where one is from OBI and one is from SWO. Both are the inverse properties for has specified output. This needs to be fixed after tidying. Finally, all axioms for has specified input, has specified output, is specified input of and is encoded by have been moved to swo_core.owl.

Removed underscores from has specified input and has specified output.

Data Format Hierarchy

The data format hierarchy contains some duplicates. There is a .java class within the original file as well as within the data file, and other classes in this hierarchy are duplicated too. All of these duplications are listed in the appropriate section below.

Objective Specification Hierarchy

Firstly, all achieves objective axioms were moved from the original file to the core file. Then, all objective specifications were moved from the original file to the objectives file.

Programming Language Hierarchy

As discussed with James, the 2 classes and associated axioms within the programming language hierarchy have been moved to the core file from the original file.

Software Hierarchy

All software axioms were moved to the core file from the original file. One software class, gcRMA quantification, was not properly asserted under software. Once the class had been moved to the core file, the missing assertion was added.

Material Entity Hierarchy

All material entity axioms were moved from the original file to the organization file. As organization is the only child of material entity in the original file, this means all organization axioms were moved as well. The OBI object property is_manufactured_by has been moved from the original file to the organization file. All instances of organization were also moved.

Process Hierarchy

The only class in this hierarchy in the original file, information processing, has been moved to the core file. However, it is a duplicate of the class with the same label (but a different ID) in the core file. This duplication has been listed in the appropriate section below.

Role Hierarchy

Two roles in the original file, publisher role and software developer role, were moved to the organization file as these roles are primarily used for organizations. The OBI object property has_role has also been moved to the organization file from the core file.

Object Properties not already covered

The following properties were moved from the original file to the core file, where restrictions had in any case already been asserted on those properties: has_format_specification, has_legal_status, has_part, is about, is specified input of, is_developed_by (this is a duplicate – see appropriate section below), is_encoded_in, is_published_by, is_specified_output_of.

SWO Algorithm OWL file: swo_algorithms.owl

Updated the label as provided by IAO for information artifact. Other class labels uppercased.

The algorithm OWL file consists mainly of imports: the process class is the BFO class ProcessualEntity, and the sole child of process is algorithm execution and its children. Algorithm execution and its three children are imports from OntoDM, and therefore need no tidying at this stage. There has been a discussion of refactoring the ontology and removing algorithm execution and its children as an unnecessary (for our modelling purposes) duplication of the tree within algorithm, but this has not yet happened (see the tracker item https://sourceforge.net/tracker/?func=detail&aid=3527364&group_id=406800&atid=1689463 for more information).

Information artifact is imported from the IAO, and its children are also imports, mainly from OntoDM. information artifact was updated to information content entity to reflect the changes made by IAO itself. No further changes are required for this set of classes.

Three classes which were partially modelled within this file (SWO_0000254 and SWO_0000609 for the core file, and SWO_0000661 for the data file) were moved from algorithm to their appropriate files. A number of other class labels were converted to have their first letter uppercased.

SWO Data OWL file: swo_data.owl

Class names changed to start with uppercase letters, as per naming conventions.

Other than Taverna workflow format, pdbml and html, all classes below the top-level data format specification are SWO classes. Data format specification itself is an IAO class. All SWO classes contain identifiers, and no typos were found. Please note, however, that a tracker item https://sourceforge.net/tracker/?func=detail&aid=3527368&group_id=406800&atid=1689463 has been opened for text file format and plain text format, as neither of these classes has a definition and it is unclear when to use one class over the other.

A number of classes had the first letter of their labels changed to uppercase.

SWO Interface OWL file: swo_interface.owl

Class names changed to start with uppercase letters, as per naming conventions.

All classes are SWO classes. All SWO classes contain identifiers, and no typos were found.

Classes where the first letter of the label was changed to uppercase: software interface.

SWO Licenses OWL file: swo_licenses.owl

Spelling changes were made. Class names changed to start with uppercase letters, as per naming conventions.

All classes are SWO classes. All SWO classes contain identifiers, and no typos were found. Please note a number of classes seem to deviate from the “9” identifier prefix that most classes in this file use. Examples include Academic Licence version 3 and distribution clause and its children, which start with “10”, and software license, which starts with “0”. I will not be changing these identifiers, but it is worth noting if someone is adding new classes to this file.
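For anyone wanting to list such deviations before adding new classes, a minimal Python sketch follows. The function name is mine and the sample identifiers are hypothetical, chosen only to show one identifier for each prefix mentioned above; they are not the real identifiers of the classes named.

```python
def off_prefix_ids(class_ids: list[str], expected_prefix: str) -> list[str]:
    """Return identifiers whose numeric part does not start with expected_prefix."""
    deviating = []
    for class_id in class_ids:
        numeric_part = class_id.removeprefix("SWO_")  # requires Python 3.9+
        if not numeric_part.startswith(expected_prefix):
            deviating.append(class_id)
    return deviating


# Hypothetical identifiers: one with the usual "9" prefix, one starting
# with "10" and one starting with "0".
sample_ids = ["SWO_9000001", "SWO_1000002", "SWO_0000003"]
print(off_prefix_ids(sample_ids, "9"))
```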

While there are no typos, there are two competing spellings for a single word: “license” and “licence”. This happens throughout the file. As the file name and the ontology URI both use the spelling “license”, I have modified the following classes to reflect this spelling. Please note I have checked the naming scheme of every license and confirmed that its official name includes the spelling “license”. Changed from “licence” to “license”: Academic Licence version 3, Eclipse Public Licence, FreeBSD Licence (please note that the comment for this class, “2 clause BSD Licence”, was also changed to “License”), Latex Project Public Licence, Lesser GNU Public Licence, MIT Licence, Modified BSD Licence (please note that the comment for this class, “3 clause BSD Licence”, was also changed to “License”), Mozilla Public Licence, Open Public Licence, The Artistic Licence 1.0, licence without restrictions on derivatives.
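The spelling change itself is mechanical and could be sketched as below. This is an illustration rather than the procedure actually used, and the function name is hypothetical. It rewrites the spelling only and does not check that a license's official name really uses “license”, which still has to be confirmed by hand.

```python
import re

_LICENCE = re.compile(r"licence", re.IGNORECASE)


def use_license_spelling(label: str) -> str:
    """Replace every "licence" in a label with "license", keeping the letter case."""

    def swap(match: re.Match) -> str:
        word = match.group(0)
        # Only the sixth letter changes ("c" becomes "s"); keep its case.
        replacement = "S" if word[5].isupper() else "s"
        return word[:5] + replacement + word[6]

    return _LICENCE.sub(swap, label)


for label in ["Eclipse Public Licence", "licence without restrictions on derivatives"]:
    print(use_license_spelling(label))
```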

A number of classes had the first letter of their labels changed to uppercase. creative commons was changed to Creative Commons to match the naming used by Creative Commons themselves.

SWO Maturity OWL file: swo_maturity.owl

Updated the label as provided by IAO. Class names changed to start with uppercase letters, as per naming conventions.

All classes except the top level IAO_0000030 are SWO classes. All SWO classes contain identifiers, and no typos were found. However, the imported class IAO_0000030 does not have a label. As there is some question about the structure of the imported IAO classes in this ontology, I added the following comment to the tracker https://sourceforge.net/tracker/index.php?func=detail&aid=3527366&group_id=406800&atid=1689463 and submitted it to the mailing list for discussion. An answer was provided, and information artifact was updated to information content entity to reflect the changes made by IAO itself.

Classes where the first letter of the label was changed to uppercase: development status, first release, latest release, live, maintained, obsolete, release candidate, superseded.

alpha and beta remain in lower case as these terms are commonly used in the programming community in their lower case form.
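The capitalisation rule, with its exceptions, can be written down as a small Python sketch. The function and constant names are hypothetical, and the exception set contains only the two terms mentioned above; in practice any formal or proper noun that is written in lower case would need adding to it.

```python
# Labels deliberately left in lower case, as described in the post.
LOWERCASE_EXCEPTIONS = frozenset({"alpha", "beta"})


def capitalise_label(label: str, exceptions: frozenset = LOWERCASE_EXCEPTIONS) -> str:
    """Uppercase the first letter of a label unless the label is a listed exception."""
    if label in exceptions:
        return label
    # Only the first character changes; the rest of the label is left as it is.
    return label[:1].upper() + label[1:]


for label in ["development status", "release candidate", "alpha"]:
    print(capitalise_label(label))
```

Only the first character is touched, so a label that already contains mixed case further along is otherwise left unchanged.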

SWO Objective OWL file: swo_objectives.owl

Class names changed to start with uppercase letters, as per naming conventions.

Added a label (‘objective specification’) for IAO class IAO_0000005 at the top of the hierarchy. All SWO classes contain identifiers, and no typos were found. One SWO class does not align with the naming conventions. A tracker item has been submitted, but no change to this class has been made in the OWL file yet: https://sourceforge.net/tracker/?func=detail&aid=3528798&group_id=406800&atid=1689463 .

A number of classes had the first letter of their labels changed to uppercase.

SWO Organization OWL file: swo_organizations.owl

Class names changed to start with uppercase letters, as per naming conventions.

All SWO classes contain identifiers, and no typos were found. A number of classes had the first letter of their labels changed to uppercase.

SWO Versions OWL file: swo_versions.owl

No changes.

The only classes and properties in this file are imported, and nothing needed to be changed with respect to those imported classes and properties.

SWO Core OWL file: swo_core.owl

Class names changed to start with uppercase letters, as per naming conventions. IDs added.

I have assumed that the casing of all software names is correct, and have not capitalised anything currently in lowercase. However, it may be that some should be in uppercase, and these can be modified as the classes are individually revisited in the future. Underscores were also removed from all properties asserted within this file.

New Identifiers

The new identifiers in the core file are as follows:

  • SWO_0004000: gave identifier SWO_0004000 to the object property ‘has version’
  • SWO_0004001: the has_interface object property has been renamed without the underscore, and has had its ID changed as it did NOT have an SWO-structured ID.
  • SWO_0004002: the has format specification object property did not have an identifier.
  • SWO_0004003: the is developed by object property did not have an identifier. Please note this property may soon become obsolete as it is a duplicate (see appropriate section below).
  • SWO_0004004: the is published by object property did not have an identifier.
  • SWO_0004005: the is executed in object property did not have an identifier.
  • SWO_0004006: the has website homepage object property did not have an identifier.

Duplicated / Equivalent classes

These classes should be fixed after tidying.

  1. Clustering algorithm: This label is present both in the original file (as SWO_0000500) and in the algorithm file as an import from OntoDM. These two classes can be deemed equivalent.
  2. is specified output of: There is an object property duplication for is specified output of, where one is from OBI and one is from SWO. Both are the inverse properties for has specified output.
  3. .java file: This class is present both in the data file and in the original file. As both are used by other classes, we cannot simply delete one of them.
  4. TIFF (data file) and TIFF image (original file).
  5. XML (data file and original file).
  6. html (data file and original file).
  7. JPEG (data file) and jpeg image (original file).
  8. pdf (data file and original file).
  9. PNG (data file) and png image (original file).
  10. PostScript (data file) and ps image (original file). An equivalence statement already seems to be marked for these two classes.
  11. Information processing: Present both in the original file and in the core file, with a different identifier in each. Currently there is no usage of the original file class, so it should be safe to simply delete it.
  12. is developed by and is_developed_by. The former is from the core file and has no identifier, and the latter is from the original file and is associated with a SWO identifier.
  13. Duplicated instances within organization: MathWorks, Stanford University.
