Assorted modifications December 2014

This is a catch-all blog post outlining the final modifications I made to SWO prior to the 1.5 release. This will be the last bit of work I am doing (for the moment) on SWO as I have reached the end of this contract – it’s been fun!

  • Resolved tracker items
  • has specified data input / has specified data output: We decided to put these two properties as children of the RO properties “has input” and “has output”. Our properties have a very specific meaning and should remain distinct, but it also makes sense that they are children of the more general relations.
  • The original intent of is encoded in was to link both a programming language to software *and* a data format specification to data. This is exemplified by the original definition in SWO of is encoded in: “Relationship between data or software and the format used to specify the syntax”. Since that time, we have introduced the relation has format specification to use for the latter situation between data and data format. To make its association with is encoded in clear, we have refactored it as a child of is encoded in. The definitions have been modified accordingly.
  • Moved has declared status, has clause and has license to be children of is about, which is an IAO term whose domain and range are information content entities. This changes nothing within SWO other than the hierarchy of these relationships, which is now clearer.
  • Added the software Segway 1.2 and associated classes
  • Added a number of formats as described in
  • Back in 2011, James wrote this post about data I/O. Basically it said we might model software and its data I/O in one of two ways: firstly, via “is executed in” and the associated information processing; secondly, more directly from data to software without this “is executed in”. The justification for the first was that data I/O is only relevant during execution, and the justification for the second was that “the data is declared at the level of the software and whether this is used in the process is effectively irrelevant for the primary description; the software can use these sorts of data and that is enough to describe it.”
    In practice the decision was to connect data and software more directly, without the “is executed in”. We still use “is executed in” a lot in SWO, but mainly to link software to types of information processing without tying that processing to data I/O. This decision was codified in the SWO paper and associated diagrams. There was no change to SWO for this, but it is worth including the discussion here so we know it was a deliberate change from earlier drafts of software modelling.

Thoughts for the Future as of December 2014

I’m about to finish the current round of work on SWO, and there are a couple of issues outstanding that will need looking at by the next ontologist to work on SWO. I’ve also added these to the issue tracker.

  1. Software A references a particular version of Software B: In cases where we have a named or numbered version for Software A, this is straightforward. If an updated version of Software A comes out, we simply create a new class with that new version. However, in some cases, as has happened with EBI Muscle Web Tool, there was no obvious version of the web tool, and yet we know it referenced a particular version of Drive5 MUSCLE called MUSCLE 3.8.31. What happens when the web tool switches to a more up-to-date version? How do we name or otherwise model the change in the web tool? What’s the best way for situations like this?
  2. EDAM uses its own relationships output of / input of and their inverses. However, SWO already has such relationships via RO. It would make sense to align these two ontologies and have EDAM also use the RO relationships, if it is ontologically suited to EDAM. EDAM uses its own has input / has output, when RO already has these as well (RO_0002233 and RO_0002234). It would make sense to correctly model the RO terms (and their inverses which are already present in SWO anyway) and refactor EDAM accordingly.
  3. For EDAM: replace the class name “SBML file” with “SBML” or “SBML format”, as having “file” in a format class label seems a little misplaced.
  4. As the high level EDAM and SWO classes are now all aligned, we should tidy up the lower level classes within data, data format specification, etc. and sort them into the nice lower hierarchies provided by EDAM.
  5. Ensure that all SWO URIs resolve properly to something sensible, and double check all external IDs.

Definition and ID changes

I’ve recently been adding more natural language definitions. Mainly, the definitions were for higher level terms; while I am interested in adding definitions for terms such as “R”, the higher level ones are being completed first so as to provide clarity of intent. I’ll add to this document as I put in further definitions and other related changes in the next week or so.

  • Added definitions: followed by; directly followed by; has clause; has declared status; is alternative format of; is version of; input of; output of; directly preceded by; has website homepage; has download location; release candidate; attribution clause; attribution required; derivatives clause; derivative code linked same license; restrictions on derivative software; no restrictions on derivative software; derivative code same license.
  • Modified definitions for: has interface; is about; has part; has version; development status; generalization specification; ensemble specification; single generalization specification; clustering specification; pattern specification; predictive model specification; probability distribution specification.
  • Refactored external IDs:
    • precedes / preceded_by:  We linked these RO classes using the URIs and ro#preceded_by (and used our own labels “followed by” and “preceded by”). These URIs used within SWO were outdated – they are actually BFO IDs (preceded_by) and BFO_0000063 (precedes). This is because RO makes use of BFO relations that are either uncontroversial (non-temporalized) parts of BFO2, or that will be incorporated in the future. The URIs have been changed accordingly in SWO via Protege refactoring.
    • participates_in / has_participant: Previous versions of RO made use of unofficial non-temporalized BFO relations. To avoid potential clashes with temporalized relations with different meanings, the RO developers have since created separate IDs with RO prefixes. These are available for participates_in (RO_0000056) and has_participant (RO_0000057). SWO was updated to reflect this. This required obsoleting SWO_9000059 (our own version of has_participant, “has participant”) and replacing it with RO_0000057. participates_in was already used within SWO with the correct RO ID, under our own label of “participates in”.
    • has role: This object property used a URI from OBI, but that term is now obsolete in OBI. We have therefore refactored the URI, replacing it with the equivalent RO term. There seems to be no definition available from RO for this term, so we have not put a definition into SWO at this time.
  • External Classes without external definitions: The following classes do not have definitions within SWO as they have no definitions in their parent ontology: data processing algorithm execution, data visualization algorithm execution, has role.
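The ID refactoring described above can be pictured as a simple substitution over axioms. The following is only a minimal sketch in Python, not the Protege refactoring that was actually used; the function name `refactor` and the example triple are hypothetical, and only the SWO_9000059 to RO_0000057 mapping comes from the text.

```python
# Obsolete ID -> replacement ID. Only this one mapping is taken from the post.
REPLACEMENTS = {"SWO_9000059": "RO_0000057"}  # has participant


def refactor(triples, replacements=REPLACEMENTS):
    """Return the triples with every obsolete ID swapped for its replacement."""
    return [
        tuple(replacements.get(term, term) for term in triple)
        for triple in triples
    ]


# Hypothetical axiom, used purely for illustration.
triples = [("ExampleProcess", "SWO_9000059", "ExampleSoftware")]
print(refactor(triples))
```

Terms that are not in the mapping pass through unchanged, so the same function could be rerun safely as further replacements are agreed.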

Hierarchy Changes

  • derivative software and no derivative software: These two classes had been placed, seemingly mistakenly, as children of Restrictions on derivative software. The new structure has these two classes as the direct and only children of Derivatives clause. The children of derivative software are then both no restrictions on derivatives and restrictions on derivatives. Also, to make the meaning clear when the classes are seen out of context (e.g. being discussed in a paper or somewhere else that isn’t within the ontology hierarchy itself), the labels for these classes have been changed so that it is clear these entities do not refer to software items themselves, but to clauses of software licenses. The new names for these classes are therefore derivative software allowed and derivative software not allowed.

Minimal annotation requested for all SWO classes

All SWO classes should have a minimum level of annotation. Some older classes do not yet have all required annotation, as the updating of these classes to the minimum requirements is ongoing.


  1. ID and label: These are basic requirements. If the class is a new SWO class, the ID should be autogenerated by the web service set up for that purpose. The label should be a string and have a clear, human-understandable meaning. If the class is an imported class, it should have the ID and label provided by that external ontology.
  2. Definition: External classes and SWO classes both should have natural language definitions, but are handled slightly differently. The definition is put under the definition annotation label (IAO_0000115).
    • External classes: SWO editors often wish to provide their own definition for external terms, giving more context for how the term is used within this ontology. In such cases an annotation property, SWO definition (SWO_0000144), is added to a second definition annotation. The SWO-specific definition is stored in the SWO file where the concept is modelled most fully. For example, the official definition for IAO algorithm (IAO_0000064) is stored within swo_core.owl, while the SWO-specific annotation is within swo_algorithm.owl.
    • SWO classes: Each SWO term should have a suitable natural language definition. The SWO definition annotation on the main definition annotation is not necessary in these cases, as the term originates from this ontology.
  3. Definition source: Any time a definition is added, the source of that definition should also be added (IAO_0000119). There can be 1 or more definition source annotations. For instance (as shown in the annotation for the software class), if Robert Stevens modified a wikipedia definition to get an appropriate SWO definition, both Robert and the wikipedia link are included as separate definition sources.


  1. synonyms: Synonyms can be used to provide alternative names for classes, where appropriate.
  2. Examples of usage: Example classes, whether internal or external, should not go in the definition itself, but should be placed in an example of usage annotation (IAO_0000112).
  3. creator: The creator annotation defines who created the class for SWO, and is only used for SWO classes and not external ones. However, in the past it has not been consistently used and is therefore not present for all classes. Definition source is used for the name of the ontologist who created the definition or modified a base definition from elsewhere. Creator is the name of the ontologist who created the class/property/individual itself, and should be used for new objects.
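A checklist like this lends itself to an automated check. Below is a minimal sketch in Python, assuming that the first list above (ID and label, definition, definition source) is the required set and that a class's annotations are available as a plain dictionary. The function name `missing_annotations` and the dictionary layout are hypothetical, not part of any SWO tooling.

```python
# Required annotation properties, assumed from the first list above.
REQUIRED = {
    "rdfs:label": "label",
    "IAO_0000115": "definition",
    "IAO_0000119": "definition source",
}


def missing_annotations(annotations):
    """Return the names of required annotations that are absent or empty."""
    return [name for prop, name in REQUIRED.items() if not annotations.get(prop)]


# Hypothetical class with a label and a definition but no definition source.
example = {
    "rdfs:label": ["software"],
    "IAO_0000115": ["placeholder definition text"],
}
print(missing_annotations(example))
```

Synonyms, examples of usage and creator are left out of the check on the assumption that they are optional.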

Requirements for Deprecated Classes, Properties and Individuals

Deprecated objects in the ontology need additional annotation to mark when and why they were obsoleted. Deprecated classes should have all axioms removed and refactored as a subclass of obsolete SWO class. Deprecated individuals/instances are refactored to be instances of obsolete SWO class. Deprecated object properties are refactored to become sub-properties of obsolete object property (there are no deprecated data properties as of the writing of this post). All deprecated objects in the ontology should have the following annotations added to whatever annotation already existed:

  1. owl#deprecated: The boolean annotation flag deprecated needs to be set to true. Many ontology editors such as Protege provide a graphical representation of this flag to show at a glance which classes are obsoleted (Protege crosses out the name of the class in the class hierarchy tab).
  2. contributor: Use to mark which ontologist deprecated the class/property/individual. The phrase used should be of the style “Marked as obsolete by [insert your name here].”
  3. obsoleted in version: Use to mark the object with whichever the next release (point release or full release as appropriate) of the ontology will be, e.g. “0.4”.
  4. reason for obsolescence: Use to provide a natural language explanation as to precisely why this class was deprecated.
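As a rough illustration, the four annotations above could be assembled in one place so that none is forgotten. This is a sketch in Python under the assumption that annotations are handled as a simple dictionary; `deprecation_annotations` is a hypothetical helper, and the editor name and reason are invented examples.

```python
def deprecation_annotations(editor, next_release, reason):
    """Build the four extra annotations listed above for a deprecated entity."""
    return {
        "owl:deprecated": True,
        "contributor": f"Marked as obsolete by {editor}.",
        "obsoleted in version": next_release,
        "reason for obsolescence": reason,
    }


# Hypothetical usage: the editor name and reason are made up for illustration.
annotations = deprecation_annotations(
    editor="Jane Doe",
    next_release="0.4",
    reason="Replaced by an equivalent external term.",
)
print(annotations["contributor"])
```

These are added on top of whatever annotation the entity already carries; moving the entity under obsolete SWO class and removing its axioms is a separate step.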

Natural Language Definitions

External classes and SWO classes both have natural language definitions, but are handled slightly differently. Examples of usage of any class, whether internal or external, should not go in the definition itself, but should be placed in an example of usage annotation (IAO_0000112).

Defining External Classes

Generally, external classes already have definitions. In these cases, we should retain the original definitions to show SWO users the imported ontology’s intended meaning. These definitions are marked, as would be expected, with the IAO definition annotation and the source of these definitions is labelled using the IAO definition source annotation. If the imported term is high level and used by more than one section of SWO, the imported definition is stored within swo_core.owl. An example of an imported definition can be seen for IAO_0000030, information content entity within swo_core.owl:

<obo:IAO_0000115>An information content entity is an entity that is 
generically dependent on some artifact and stands in relation of aboutness 
to some entity.</obo:IAO_0000115>

SWO editors often wish to provide their own definition for external terms which provides more context for how the term is used within this ontology. In such cases an annotation property, SWO definition (SWO_0000144), is added to a second definition annotation. The SWO-specific definition is stored in the SWO file where the concept is modelled most fully. For example, the official definition for IAO algorithm (IAO_0000064) is stored within swo_core.owl, while the SWO-specific annotation is within swo_algorithm.owl:

    <owl:Class rdf:about="&obo;IAO_0000064">
        <rdfs:subClassOf rdf:resource="&obo;IAO_0000030"/>
        <obo:IAO_0000115>An algorithm is a set of instructions for performing a particular calculation.</obo:IAO_0000115>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedTarget>An algorithm is a set of instructions for performing a particular calculation.</owl:annotatedTarget>
        <owl:annotatedSource rdf:resource="&obo;IAO_0000064"/>
        <owl:annotatedProperty rdf:resource="&obo;IAO_0000115"/>
    </owl:Axiom>

Below is a list of external terms that have been modified recently according to the scheme above. Existing definitions (and sometimes examples of usage via the EFO example_of_usage annotation) were copied from the external ontologies. Where no definition source was present, the appropriate value was added (e.g. definition source “IAO”):

  • IAO: Information content entity IAO_0000030 (definition, definition source, example of usage), data format specification IAO_0000098 (definition, example of usage, definition source), programming language IAO_0000025 (definition source)
  • BFO: Material Entity (definition source, example of usage), process (example of usage, definition source), Role (definition source, example of usage)

Some external ontology terms had their definitions within a comment rather than within a definition annotation. This was fixed for the affected terms.

Defining Internal Classes

Each SWO term should have a suitable natural language definition. The SWO definition annotation on the main definition annotation is not necessary in these cases, as the term originates from this ontology. Below is an example of a natural language definition in SWO:

  • Knowledge representation role: A knowledge representation role is a role borne by a data format which utilizes formalisms to make complex systems easier to design and build. Knowledge representation is the field of artificial intelligence devoted to representing information about the world in a form that a computer system can utilize to solve complex tasks. (Definition source: modified from, accessed 10 February 2014)

Also this week…

  • example_of_usage was refactored to use the IAO annotation property of the same name (“example of usage”, IAO_0000112) to increase the coverage of IAO terms used for such housekeeping annotation.
  • references to the namespace “swo2” were removed, as they were unnecessary and cluttering up the ontology. Reasoner ran fine after changes.

Refactoring objective specification hierarchy

The refactoring of the objective specification hierarchy in SWO is part of a larger effort to merge the final distinct hierarchies in EDAM/SWO that can be usefully integrated. The EDAM operation hierarchy has been placed under information processing, and the SWO objective specification hierarchy will also be placed there. However, to do this we must drop the use of objective specification, as IAO says it sits within the information content entity hierarchy and not within the process hierarchy.

Changes made within SWO:

  1. Added domain of software and range of information processing to the object property is executed in, as well as a definition and other annotation. (See note 1 below).
  2. Moved all children of objective specification to be children of information processing via refactoring in Protege 4. This required first moving all classes physically from swo_objectives into swo_core.
  3. Refactored the URI for achieves planned objective to be the URI for is executed in. This allowed, in just one step, for the ontology to remain satisfiable and for all of the axioms between software and processes (which used to be objectives) to be correct.
  4. Stated that EDAM operation is equivalent to SWO/ information processing.
  5. One class was marked as equivalent to Nothing after running the reasoner: OmniOutliner had an axiom ‘is executed in’ some ‘Outline document format’ which does not fit against the acceptable range of the is executed in property. This was changed to be linked to the information processing class Document outlining, which resolved the problem.
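The OmniOutliner problem in step 5 can be approximated by a simple check that every filler of is executed in falls under information processing. The sketch below is in Python and is a toy lint, not what an OWL reasoner does internally; the placement of Outline document format under data format specification is an assumption, and `is_subclass_of` and `range_violations` are hypothetical helpers.

```python
# Toy subclass hierarchy (child -> parent). The parent given for
# 'Outline document format' is an assumption made for this sketch.
PARENT = {
    "Document outlining": "information processing",
    "Outline document format": "data format specification",
}


def is_subclass_of(cls, ancestor):
    """Walk up the toy hierarchy to see whether cls sits under ancestor."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENT.get(cls)
    return False


def range_violations(axioms, range_class="information processing"):
    """Return (software, filler) pairs whose filler falls outside the range."""
    return [(s, f) for s, f in axioms if not is_subclass_of(f, range_class)]


before = [("OmniOutliner", "Outline document format")]
after = [("OmniOutliner", "Document outlining")]
print(range_violations(before))  # the original axiom is flagged
print(range_violations(after))   # the corrected axiom passes
```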

Still to do:

  1. information processing objective becomes inappropriate, and is cleaned up.
  2. swo_objectives is then no longer required and the file is removed from the imports in swo_core.

Note 1: on the use of is executed in:  Originally, we had thought to use these two properties in SWO rather than is executed in:

If we had used is specified input of to link software to the new processes, we would have had to account for the domain of its inverse (has specified input), which is the OBI term planned process. The appropriate hierarchy under BFO ProcessualEntity would therefore have needed to be added: process, and then planned process. The simplest way of doing this would have been to change the URI of the process class within SWO, which currently points to the BFO ProcessualEntity, so that it matched the BFO process class instead. Then we would just add an intervening child of the OBI planned process class.

This wasn’t required in the end, as it was determined that using the already-present SWO is executed in would be better at this stage, and simpler. Further, we determined that is executed in axioms should only be placed on software classes, rather than also having an inverse that could go on information processing classes. If we place the input/output on the process, then we define the process of, say, data normalization in terms of software, which is presumably not correct, as there are data normalizations that don’t involve software, or at least specific software. If we hang it the other way, we define the software in terms of being able to be executed in a process, making this essential to the software (which is still not perfect) but not essential and defining for the information processing class (since is executed in has no inverse). Neither is perfect, but no one could quite think of a better way of capturing it. The problem, as always, is the ‘potential to do something’. Word can be used to do word processing, but it can also be used to do document conversion. To do it properly you’d need processes called Microsoft Word word processing (process), which necessarily has input Microsoft Word, but then you’d have dozens of specific classes for this: Star Office word processing (process), Open Office word processing (process), etc.


maturity, purchase cost, license

Jon Ison recently requested four types of updates and new classes within SWO. Some parts of his requests are already in SWO, but all are worth revisiting to make sure we have what he needs. The changes for three of the four updates are discussed below.


Maturity

Specifically, Jon is looking for the terms alpha, beta and production. We already have the development status hierarchy, which contains a number of terms including alpha and beta, and some which may be suitable for production. After asking Jon how the current hierarchy fits with what he wants, he has said that by Production he means “software that has been designated as suitable for production environments by the developer / publisher”. He also thinks that Production is a synonym of Live (as currently defined) and recommends that it becomes a parent of First release and Latest release.

Changes to the ontology

  • Alpha
    • Expand the definition to: Alpha is a development status which is applied to software by the developer/publisher during initial development and testing. Software designated alpha is commonly unstable and prone to crashing. It may or may not be released publicly.
  • Beta
    • Expand the definition to: Beta is a development status which is generally applied to software by the developer/publisher once the majority of features have been implemented, but when the software may still contain bugs or cause crashes or data loss. Software designated beta is often released publicly, either on a general release or to a specific subset of users called beta testers. (Modified from, accessed 11 June 2013.)
    • Add creator and definition source
  • Live
    • Expand the definition to: Live is a development status which is applied to software that has been designated as suitable for production environments by the developer/publisher. If it is a non-free product, software at this stage is available for purchase.
    • Add creator and definition source (Allyson Lister, Jon Ison, & Modified from, accessed 11 June 2013).
    • Add synonym Production
    • Put First release and Latest release as children of this class
  • Comments
    • Something can be both Maintained and Live, but may also be Live but not Maintained (specifically, these should not be – and are not – disjoint classes). In this context, Production remains best suited as a synonym for Live, as Production status does not require any level of maintenance, just that software described as such is formally released.

Purchase Cost

For modelling purchase cost, there has been a long discussion on the SWORD mailing list (see the “Cost of software” emails from May), and the final decision is to associate the cost with the license. Robert suggests using either a GCI or a defined class for each of FeeBasedSoftware and FreeSoftware, and to have these classes be defined as those which have licenses containing clauses which mark them as either free, free with caveats (e.g. for academics, or for a base version such as with IntelliJ), or non free.

Changes to the ontology

  • New classes within license clause
    • Purchase cost clause
      • Free
      • Not free
  • Comments
    • The “free” used within the purchase cost hierarchy refers only to the cost of the software, not to the definition of “free software” as provided by GNU and which is commonly used to describe “software that respects users’ freedom and community. Roughly, the users have the freedom to run, copy, distribute, study, change and improve the software.”
    • In some ways, purchase cost does seem similar to the already-extant usage clause hierarchy, which includes restricted and unrestricted usage. However, a usage limitation is not necessarily due to whether or not something costs money: even if the usage is academic only, it could still be either free or non-free. A license could have multiple usage clauses, e.g. academic only when free, and unrestricted if a fee is paid. Instead, a new clause type which is specifically associated with cost (Purchase cost clause) has been created which can, together with a usage clause, define both limitations and cost. Then statements could be made such as the following:
      • GNU GPL v3 has clause Usage unrestricted
        GNU GPL v3 has clause Free
    • A license could have multiple clauses dealing with different aspects of cost and usage. For instance, how do I model a license that is free for academics while, for all users generally, there is a fee? We don’t need a “free with caveats” option (except as a defined class) within the purchase cost, as this can instead be modelled by combining usage and cost. However, I’m not sure the following axioms would accurately describe the example I’ve just given:
      • LicenseX has clause Usage academic only and has clause Free
        LicenseX has clause Usage unrestricted and has clause Not Free
    • Modelling in this way means we don’t have to create a load of defined classes to represent all of the different classifications of software. We can instead use GCIs to categorize our classes without cluttering the hierarchy or creating complex naming schemes. For now, I haven’t added any GCIs, but it is certainly something which could be used in future.
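One way to see the concern raised about the LicenseX axioms: if each has clause statement is read as an independent assertion, the link between a given usage clause and its cost clause is lost. The Python sketch below illustrates that reading only. It is not how SWO models licenses, and the paired structure and the `cost_for` helper are hypothetical.

```python
# Reading the two LicenseX axioms as independent 'has clause' assertions
# leaves one flat set of clauses with no record of which cost goes with
# which usage.
flat_clauses = {"Usage academic only", "Free", "Usage unrestricted", "Not free"}

# Hypothetical alternative: keep each usage clause together with its cost.
paired_terms = [
    ("Usage academic only", "Free"),
    ("Usage unrestricted", "Not free"),
]


def cost_for(terms, usage):
    """Return the cost clauses attached to a given usage clause."""
    return [cost for term_usage, cost in terms if term_usage == usage]


print(cost_for(paired_terms, "Usage academic only"))
# The flat reading only tells us that both cost clauses are present somewhere.
print("Free" in flat_clauses and "Not free" in flat_clauses)
```

In an ontology, keeping the pairing would need some intermediate entity that groups a usage clause with a cost clause. That is a possible direction rather than anything decided here.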


License

License already has a number of child classes. Jon suggests tidying up this section by using the short form of the license name (e.g. GNU GPL) as the primary name, and all other versions as synonyms. Also, we need to add more flavours of each of the licenses, e.g. all the versions of GNU GPL and of Creative Commons. was a valuable resource for the changes itemized below.

Changes to the ontology

  • New object property: is compatible license of: this will allow links between licenses which do not contain conflicts, e.g. that the Apache License version 2 is compatible with GNU GPL version 3.
  • As many people are familiar with a “free” license as defined by the GNU Project, a defined class GNU Project Free Software License has been created which can identify such cases.
  • GNU Copyleft Software License is a child of GNU Project Free Software License, and software which is copyleft must 1) be free software according to the GNU project, and 2) require that all modified and extended versions of the program be free as well.
  • A new license clause has been added: Copyleft (a child of derivative code same license as it not only requires the same license, but limits the type of license used to one which will provide a certain number of freedoms to the users of the code). This allows us to build the defined class GNU Copyleft Software License.
  • All currently existing licenses have been updated in the context of all of these changes, resulting in many new axioms. A few new licenses have been added, but these are not exhaustive. New classes are: CC0, CC BY 2.0, CC BY-SA 2.0, GNU GPL v2, GNU GPL v3, MPL v1.1, MPL v2.0

A number of small changes to SWO

This week I did a number of small changes to SWO, and here is a summary of the changes:

  • SBML Model was made obsolete. This class was asserted underneath algorithm and had no annotation or axiomisation. It was used in one location, within the software class SBMLR. As we already had a class to describe the SBML format (called SBML file), there was no use for this class. For referencing the SBML format, please use the class ‘SBML file’.
  • Moved tab delimited file format to be a child (rather than a sibling) of plain text file format.
  • Software interface has been reorganized and expanded. See SWO for the full definitions etc.:
    • Software interface
      • Application Programming Interface
      • Command line interface
      • Graphical User Interface
        • Web User interface
        • Desktop Graphical User Interface
      • Web Service
        • SOAP service
        • REST service


Term Scope for EDAM and SWO

Recently on the SWORD mailing list, there has been a discussion as to which new terms should be added to the EDAM portion of the ontology, and which belong in the scope of SWO. For the record (and posterity), here is a summary of the result of the discussion.

The scope of SWO was originally bioinformatics software, and this scope was moved more generally to software in digital preservation later on. EDAM deliberately does not include software, but rather data types and format, operations and objective (with some useful identifier classes as well). EDAM concepts are domain-specific and concern attributes specific to bioinformatics software. This is in contrast to SWO, which (in addition to software classes) contains domain-neutral terminology; SWO is concerned with attributes of software in general, and not just bioinformatics specifically. This of course creates some degree of overlap, especially with data types, formats, and the operation/objective hierarchies (which are currently being resolved). Bioinformatics-specific data types and formats can live in EDAM while software, bioinformatics-based or otherwise, should live in SWO along with the cross-domain parts (e.g. GIF, TIFF).

The domain neutrality of SWO gives us a very strong selling point to other communities / domains, who will want to manage their own ontologies. SWO could be the hub and domain-specific modules would be the spokes. There could / should be a common vocabulary for software in general, across science and the humanities: we can offer SWO (or a simplification of it) to perform this function.

A slim upper level for software could easily be extracted from SWO to give a domain neutral portion. In the near future, it is worth pulling out that upper level SWO as a separate OWL file.


FACS (fluorescence-activated cell sorter) obsoleted

FACS (fluorescence-activated cell sorter) was a class which was a child of algorithm. This doesn’t make much sense, as FACS itself is more likely to be either a machine or an experimental method, both of which are out of scope of SWO. It was used in only one location: within the prada software axiom list. To fix this situation, a number of classes were touched.

  • FACS data was given a definition and associated annotations.
  • There was one leftover reference within swo_data.owl to a different property. This was removed and replaced with what the rest of the ontology uses, which is rdfs:comment.
  • FACS (fluorescence-activated cell sorter) is only used for the prada software class, and there it is used incorrectly. FACS is an experimental method, not an algorithm. This class has therefore been obsoleted in favor of using the data and data format specification changes described here.
  • A new pair of classes, Data File Standard for Flow Cytometry and its child FCS3.0 were created with appropriate annotations. These were created as children of ASCII format. As part of this creation:
  • The new axiom for prada (to replace the incorrect reference to algorithm and to expand upon already existing links to data) is:
     'has specified data input' some
     ('FACS data'
     and ('has format specification' some FCS3.0)
     and ('has format specification' only FCS3.0)),
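The axiom above combines an existential restriction (some) with a universal one (only), which together close off the format: the input must have FCS3.0 as a format specification and may have no other. The Python sketch below is a simplified reading of that pattern over a plain list of format names. It ignores subclasses of FCS3.0; the helper name and the "CSV" value are hypothetical.

```python
def satisfies_some_and_only(formats, required="FCS3.0"):
    """True when at least one format is `required` and no other format appears."""
    has_some = required in formats                  # 'some FCS3.0'
    has_only = all(f == required for f in formats)  # 'only FCS3.0'
    return has_some and has_only


print(satisfies_some_and_only(["FCS3.0"]))
print(satisfies_some_and_only(["FCS3.0", "CSV"]))  # "CSV" is a made-up second format
print(satisfies_some_and_only([]))
```

The empty case shows why both parts are needed: the only restriction alone is trivially satisfied by data with no format specification at all.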