Refactoring objective specification hierarchy

The refactoring of the objective specification hierarchy in SWO is part of a larger effort to merge the final distinct hierarchies in EDAM/SWO that can be usefully integrated. The EDAM operation hierarchy has been placed under information processing, and the SWO objective specification hierarchy will also be placed there. However, to do this we must drop the use of objective specification, as IAO says it sits within the information content entity hierarchy (http://www.ontobee.org/browser/rdf.php?o=IAO&iri=http://purl.obolibrary.org/obo/IAO_0000005) and not within the process hierarchy.

Changes made within SWO:

  1. Added a domain of software and a range of information processing to the object property is executed in, as well as a definition and other annotations. (See note 1 below.)
  2. Moved all children of objective specification to be children of information processing via refactoring in Protege 4. This first required physically moving all classes out of swo_objectives and into swo_core.
  3. Refactored the URI for achieves planned objective to be http://www.ebi.ac.uk/swo/SWO_0040005, which is the URI for is executed in. This allowed, in just one step, for the ontology to remain satisfiable and for all of the axioms between software and processes (which used to be objectives) to be correct.
  4. Stated that EDAM operation is equivalent to SWO information processing.
  5. One class was inferred to be equivalent to Nothing after running the reasoner: OmniOutliner had the axiom ‘is executed in’ some ‘Outline document format’, whose filler falls outside the range of the is executed in property. This was changed to link to the information processing class Document outlining, which resolved the problem.
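Why the OmniOutliner axiom produced an unsatisfiable class can be illustrated with a toy sketch. It assumes, for illustration only, that data format specification and information processing are disjoint, and models classes as plain strings rather than using a real reasoner. The range axiom on is executed in forces any filler of an existential restriction into information processing, so a filler that is a data format specification cannot exist, and the class carrying the restriction collapses to Nothing.

```python
# Toy model (hypothetical): asserted superclasses for a handful of SWO classes.
SUPERCLASSES = {
    "Outline document format": {"data format specification"},
    "Document outlining": {"information processing"},
}

# Assumption for illustration: these two top-level classes are disjoint.
DISJOINT = [frozenset({"data format specification", "information processing"})]

# Range of the object property, as added in step 1.
RANGES = {"is executed in": "information processing"}


def ancestors(cls):
    """Return cls plus every class reachable through asserted superclasses."""
    seen = {cls}
    stack = [cls]
    while stack:
        for sup in SUPERCLASSES.get(stack.pop(), set()):
            if sup not in seen:
                seen.add(sup)
                stack.append(sup)
    return seen


def restriction_is_satisfiable(prop, filler):
    """Check 'prop some filler' against the declared range of prop.

    The range axiom forces every filler instance into the range class too,
    so the restriction is unsatisfiable if the combined types of the filler
    and the range contain a pair of disjoint classes.
    """
    types = ancestors(filler) | ancestors(RANGES[prop])
    return not any(pair <= types for pair in DISJOINT)


print(restriction_is_satisfiable("is executed in", "Outline document format"))  # False
print(restriction_is_satisfiable("is executed in", "Document outlining"))  # True
```

A real OWL reasoner such as HermiT performs this kind of inference over the full ontology; the sketch only traces the single step that matters here.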

Still to do:

  1. information processing objective is no longer appropriate and needs to be cleaned up.
  2. swo_objectives will then no longer be required, and the file can be removed from the imports in swo_core.

Note 1: on the use of is executed in. Originally, we had thought to use these two properties in SWO rather than is executed in:
has_specified_input: http://purl.obolibrary.org/obo/OBI_0000293
has_specified_output: http://purl.obolibrary.org/obo/OBI_0000299

If we had used is specified input of to link software to the new processes, we would have had to account for the domain of its inverse (has specified input), which is the OBI term planned process. The appropriate hierarchy under BFO ProcessualEntity would therefore have needed to be added: process, and then planned process beneath it. The simplest way of doing this would have been to change the URI of the SWO process class, which currently points to BFO ProcessualEntity, so that it matches the BFO process class instead, and then add the OBI planned process class as an intervening child.

This wasn’t required in the end, as we determined that using the already-present SWO property is executed in would be better at this stage, and simpler. Further, we determined that is executed in axioms should only be placed on software classes, rather than also having an inverse that could be used on information processing classes. If we place the input/output on the process, then we define the process of, say, data normalization in terms of software, which is presumably not correct, as there are data normalizations that don’t involve software, or at least not specific software. If we hang it the other way, we define the software in terms of being able to be executed in a process, making this essential to the software (which is still not perfect) but neither essential to nor defining for the information processing class (since is executed in has no inverse). Neither option is perfect, but no one could think of a better way of capturing it. The problem, as always, is the ‘potential to do something’. Microsoft Word can be used to do word processing, but it can also be used to do document conversion. To do it properly, you’d need processes such as Microsoft Word word processing (process), which necessarily has input Microsoft Word, but then you’d have dozens of specific classes for this: StarOffice word processing (process), OpenOffice word processing (process), etc.

Posted in information processing

maturity, purchase cost, license

Jon Ison recently requested four types of updates and new classes within SWO. Some parts of his requests are already in SWO, but all could do with another look to make sure we’ve got what he needs. The changes for three of the four updates are discussed below.

Maturity

Specifically, Jon is looking for the terms alpha, beta and production. We already have the development status hierarchy, which contains a number of terms including alpha and beta, and some which may be suitable for production. When asked how the current hierarchy fits with what he wants, Jon said that by Production he means “software that has been designated as suitable for production environments by the developer / publisher”. He also thinks that Production is a synonym of Live (as currently defined), and recommends that Live become a parent of First release and Latest release.

Changes to the ontology

  • Alpha
    • Expand the definition to: Alpha is a development status which is applied to software by the developer/publisher during initial development and testing. Software designated alpha is commonly unstable and prone to crashing. It may or may not be released publicly.
  • Beta
    • Expand the definition to: Beta is a development status which is generally applied to software by the developer/publisher once the majority of features have been implemented, but when the software may still contain bugs or cause crashes or data loss. Software designated beta is often released publicly, either on a general release or to a specific subset of users called beta testers. (Modified from http://en.wikipedia.org/wiki/Software_release_life_cycle, accessed 11 June 2013.)
    • Add creator and definition source
  • Live
    • Expand the definition to: Live is a development status which is applied to software that has been designated as suitable for production environments by the developer/publisher. If the software is a non-free product, at this stage it is available for purchase.
    • Add creators (Allyson Lister, Jon Ison) and definition source (Modified from http://en.wikipedia.org/wiki/Software_release_life_cycle, accessed 11 June 2013.)
    • Add synonym Production
    • Put First release and Latest release as children of this class
  • Comments
    • Something can be both Maintained and Live, but may also be Live but not Maintained (specifically, these should not be – and are not – disjoint classes). In this context, Production remains best suited as a synonym for Live, as Production status does not require any level of maintenance, just that software described as such is formally released.

Purchase Cost

For modelling purchase cost, there has been a long discussion on the SWORD mailing list (see the “Cost of software” emails from May), and the final decision is to associate the cost with the license. Robert suggests using either a GCI or a defined class for each of FeeBasedSoftware and FreeSoftware, with these classes defined as software whose licenses contain clauses marking it as free, free with caveats (e.g. for academics, or for a base version such as with IntelliJ), or non-free.

Changes to the ontology

  • New classes within license clause
    • Purchase cost clause
      • Free
      • Not free
  • Comments
    • The “free” used within the purchase cost hierarchy refers only to the cost of the software, not to the definition of “free software” as provided by GNU and which is commonly used to describe “software that respects users’ freedom and community. Roughly, the users have the freedom to run, copy, distribute, study, change and improve the software.” (http://www.gnu.org/philosophy/free-sw.html)
    • In some ways, purchase cost does seem similar to the already-extant usage clause hierarchy, which includes restricted and unrestricted usage. However, a usage limitation is not necessarily due to whether or not something costs money: even if the usage is academic only, it could still be either free or non-free. A license could have multiple usage clauses, e.g. academic only when free, and unrestricted if a fee is paid. Instead, a new clause type which is specifically associated with cost (Purchase cost clause) has been created which can, together with a usage clause, define both limitations and cost. Then statements could be made such as the following:
      • GNU GPL v3 has clause Usage unrestricted
        GNU GPL v3 has clause Free
    • A license could have multiple clauses dealing with different aspects of cost and usage. For instance, how do I model a license that is free for academics, while for all other users there is a fee? We don’t need a “free with caveats” option (except as a defined class) within the purchase cost hierarchy, as this can instead be modelled by combining usage and cost. However, I’m not sure the following axioms would accurately describe the example I’ve just given:
      • LicenseX has clause Usage academic only and has clause Free
        LicenseX has clause Usage unrestricted and has clause Not Free
    • Modelling in this way means we don’t have to create a load of defined classes to represent all of the different classifications of software. We can instead use GCIs to categorize our classes without cluttering the hierarchy or creating complex naming schemes (see http://ontogenesis.knowledgeblog.org/1288). For now, I haven’t added any GCIs but it is certainly something which could be used in future.
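The doubt about LicenseX can be made concrete with a toy sketch. It assumes, for illustration only, that each has clause axiom contributes independently to one flat set of clauses per license, and it uses plain Python functions as stand-ins for the FreeSoftware / FeeBasedSoftware style defined classes (the function names are hypothetical, not SWO terms). Under that assumption LicenseX is classified as both free and fee-based, and its two axioms cannot be distinguished from the opposite pairing of usage and cost.

```python
# Toy model (hypothetical): a license is a flat set of clause names, which is
# what a series of independent "has clause" assertions amounts to.


def is_free_license(clauses):
    """Stand-in for a defined class: licenses with a Free purchase cost clause."""
    return "Free" in clauses


def is_fee_based_license(clauses):
    """Stand-in for a defined class: licenses with a Not free purchase cost clause."""
    return "Not free" in clauses


def flatten(pairings):
    """Collapse (usage clause, cost clause) pairings into one set of clauses."""
    return {clause for pairing in pairings for clause in pairing}


gpl_v3 = {"Usage unrestricted", "Free"}

# Intended reading of LicenseX: free for academics, a fee for everyone else.
intended = [("Usage academic only", "Free"), ("Usage unrestricted", "Not free")]
# The opposite reading: a fee for academics, free for everyone else.
swapped = [("Usage academic only", "Not free"), ("Usage unrestricted", "Free")]

license_x = flatten(intended)
print(is_free_license(gpl_v3), is_fee_based_license(gpl_v3))  # True False
print(is_free_license(license_x), is_fee_based_license(license_x))  # True True
print(flatten(intended) == flatten(swapped))  # True: the pairing is lost
```

Keeping each usage/cost pairing together would need some way of grouping the two clauses, which plain has clause axioms on the license do not provide.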

License

License already has a number of child classes. Jon suggests tidying up this section by using the short form of the license name (e.g. GNU GPL) as the primary name, and all other versions as synonyms. Also, we need to add more flavours of each of the licenses, e.g. all the versions of GNU GPL and of Creative Commons. http://www.gnu.org/licenses/license-list.html was a valuable resource for the changes itemized below.

Changes to the ontology

  • New object property: is compatible license of: this will allow links between licenses which do not contain conflicts, e.g. that the Apache License version 2 is compatible with GNU GPL version 3.
  • As many people are familiar with a “free” license as defined by the GNU Project, a defined class GNU Project Free Software License has been created which can identify such cases (see http://www.gnu.org/philosophy/free-sw.html)
  • GNU Copyleft Software License is a child of GNU Project Free Software License. Software which is copyleft must 1) be free software according to the GNU Project, and 2) require that all modified and extended versions of the program be free as well.
  • A new license clause has been added: Copyleft (a child of derivative code same license as it not only requires the same license, but limits the type of license used to one which will provide a certain number of freedoms to the users of the code). This allows us to build the defined class GNU Copyleft Software License
  • All currently existing licenses have been updated in the context of all of these changes, resulting in many new axioms. A few new licenses have been added, but these are not exhaustive. New classes are: CC0, CC BY 2.0, CC BY-SA 2.0, GNU GPL v2, GNU GPL v3, MPL v1.1, MPL v2.0
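Because Copyleft sits beneath derivative code same license, a license with a Copyleft clause also counts as having a derivative code same license clause, but not the other way round. The toy sketch below shows that subsumption and a stand-in for the GNU Copyleft Software License defined class. It assumes, for illustration only, that a boolean `gnu_free` flag can stand in for membership of GNU Project Free Software License (whose actual definition is not reproduced here); the license names are hypothetical.

```python
# Toy model (hypothetical): license clause hierarchy as child -> parent.
CLAUSE_PARENTS = {
    "Copyleft": "derivative code same license",
}


def clause_types(clause):
    """Return the clause together with all of its ancestor clause types."""
    types = [clause]
    while types[-1] in CLAUSE_PARENTS:
        types.append(CLAUSE_PARENTS[types[-1]])
    return set(types)


def has_clause_of_type(clauses, clause_type):
    """True if any clause is, or is a descendant of, clause_type."""
    return any(clause_type in clause_types(c) for c in clauses)


def is_gnu_copyleft(license_):
    # Assumption: 'gnu_free' stands in for membership of the defined class
    # GNU Project Free Software License.
    return license_["gnu_free"] and has_clause_of_type(license_["clauses"], "Copyleft")


license_a = {"gnu_free": True, "clauses": {"Copyleft"}}
license_b = {"gnu_free": True, "clauses": {"derivative code same license"}}

print(has_clause_of_type(license_a["clauses"], "derivative code same license"))  # True
print(is_gnu_copyleft(license_a), is_gnu_copyleft(license_b))  # True False
```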
Posted in modeling

A number of small changes to SWO

This week I did a number of small changes to SWO, and here is a summary of the changes:

  • SBML Model (http://www.ebi.ac.uk/efo/swo/SWO_0000380) was made obsolete. This class was asserted underneath algorithm and had no annotation or axiomisation. It was used in one location, within the software class SBMLR. As we already had a class to describe the SBML format (called SBML file), there was no use for this class. For referencing the SBML format, please use the class ‘SBML file’ (http://edamontology.org/format_2585).
  • Moved tab delimited file format to be a child (rather than a sibling) of plain text file format.
  • Software interface has been reorganized and expanded. See SWO for the full definitions etc.:
    • Software interface
      • Application Programming Interface
      • Command line interface
      • Graphical User Interface
        • Web User interface
        • Desktop Graphical User Interface
      • Web Service
        • SOAP service: definition
        • REST service: definition

Posted in ontology

Term Scope for EDAM and SWO

Recently on the SWORD mailing list, there has been a discussion as to which new terms should be added to the EDAM portion of the ontology, and which belong in the scope of SWO. For the record (and posterity), here is a summary of the result of the discussion.

The scope of SWO was originally bioinformatics software; this scope was later broadened to software more generally, including software in digital preservation. EDAM deliberately does not include software, but rather data types and formats, operations and objectives (with some useful identifier classes as well). EDAM concepts are domain-specific and concern attributes specific to bioinformatics software. This is in contrast to SWO, which (in addition to software classes) contains domain-neutral terminology; SWO is concerned with attributes of software in general, and not just bioinformatics specifically. This of course creates some degree of overlap, especially with data types, formats, and the operation/objective hierarchies (an overlap which is currently being resolved). Bioinformatics-specific data types and formats can live in EDAM, while software, bioinformatics-based or otherwise, should live in SWO along with the cross-domain parts (e.g. GIF, TIFF).

The domain neutrality of SWO gives us a very strong selling point to other communities and domains that will want to manage their own ontologies. SWO could be the hub, and domain-specific modules the spokes. There could, and should, be a common vocabulary for software in general, across science and the humanities: we can offer SWO (or a simplification of it) to perform this function.

A slim upper level for software could easily be extracted from SWO to give a domain-neutral portion. In the near future, it is worth pulling out that upper level of SWO as a separate OWL file.

Posted in modeling

FACS ( fluorescence-activated cell sorter) obsoleted

FACS ( fluorescence-activated cell sorter) was a class which was a child of algorithm. This doesn’t make much sense, as FACS itself is more likely to be either a machine or an experimental method, both of which are out of scope for SWO. To fix this, I looked at where the class was used, which turned out to be only one location: the axiom list of the prada software class. Fixing this involved changes to a number of classes:

  • FACS data was given a definition and associated annotations.
  • There was one leftover reference within swo_data.owl to edamontology.org#comment. This was removed and replaced with the property the rest of the ontology uses, which is rdfs:comment.
  • FACS ( fluorescence-activated cell sorter) is only used for the prada software class, and there it is used incorrectly. FACS is an experimental method, not an algorithm. This class (http://www.ebi.ac.uk/efo/swo/SWO_0000139) has therefore been obsoleted in favor of using the data and data format specification changes described here.
  • A new pair of classes, Data File Standard for Flow Cytometry and its child FCS3.0, were created with appropriate annotations, as children of ASCII format.
  • The new axiom for prada (replacing the incorrect reference to algorithm and expanding upon already existing links to data) is:
       'has specified data input' some
         ('FACS data'
          and ('has format specification' some FCS3.0)
          and ('has format specification' only FCS3.0))
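The some/only pairing on has format specification is the usual closure pattern: the FACS data input must have at least one format, and every format it has must be FCS3.0. The outer 'has specified data input' some restriction, however, does not stop prada from having other inputs. The sketch below is a closed-world approximation over an explicit list of inputs, written only to show what the axiom constrains; an OWL reasoner works under the open-world assumption and would not reject missing information in the same way. The formats other than FCS3.0 and the data class 'image data' are hypothetical examples.

```python
def satisfies_prada_input_axiom(inputs):
    """Closed-world check of the prada axiom against a list of inputs.

    inputs: iterable of (data class, set of format specifications) pairs.
    Outer 'some': at least one input must match the filler.
    Inside the filler, 'some FCS3.0' plus 'only FCS3.0' require a
    non-empty set of formats that are all FCS3.0.
    """
    return any(
        data_class == "FACS data"
        and bool(formats)
        and all(f == "FCS3.0" for f in formats)
        for data_class, formats in inputs
    )


print(satisfies_prada_input_axiom([("FACS data", {"FCS3.0"})]))  # True
print(satisfies_prada_input_axiom([("FACS data", {"FCS3.0", "CSV"})]))  # False
print(satisfies_prada_input_axiom([("FACS data", set())]))  # False
print(satisfies_prada_input_axiom([("FACS data", {"FCS3.0"}), ("image data", {"TIFF"})]))  # True
```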
Posted in data format specification, modeling, software

Making SWO Satisfiable

Over the past few days I have been working to fix the approximately 170 unsatisfiable classes that were present in SWO. The vast majority of these were due to improper linking of software classes directly to a data format specification, rather than to a data class. One change is detailed below, and the rest of the classes are just named, although the process to fix them was similar.

Unsatisfiable classes: Data is disjoint with “data format specification”, but an axiom on the class uses “has specified data input” (which has a range of data) with a class from “data format specification”.

Affxparser: 'has specified data input' some (BPMAP or 'CDF binary format' or CEL)
In order to fix this, I created a defined class called Affymetrix-Compliant Data which included any data classes whose format specification is published by Affymetrix, and set all of (CDF binary format, CDF ASCII format, CEL binary format, CEL ASCII format, CHP binary format, BPMAP, BAR) as published by Affymetrix. Resulting modifications to swo_data are:

Class: 'Affymetrix-compliant data'
Annotations:
 label "Affymetrix-compliant data"@en,
 IAO_0000115 "Affymetrix-compliant data is data produced in a format compatible with Affymetrix software. This is a defined class where other data classes will be inferred to be members if they have a data format specification which has been published by Affymetrix.",
 creator "Allyson Lister"
EquivalentTo:
 SWO_0004002 some
 ('data format specification'
 and (SWO_0004004 value SWO_0000023))
SubClassOf:
 data

And I added the axiom “SWO_0004004 value SWO_0000023” (is published by value Affymetrix) to the following classes: CDF binary format, CDF ASCII format, CEL binary format, CEL ASCII format, CHP binary format, BPMAP, BAR.
Then, back in swo_core, the final linkup to affxparser made modifications to that class as follows:

'has specified data input' only 'Affymetrix-compliant data'
'has specified data input' some 'Affymetrix-compliant data'
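The defined class works by inference rather than assertion: any data class with some format specification published by Affymetrix is classified under Affymetrix-compliant data. The toy sketch below mimics that classification in plain Python. It assumes that SWO_0004002 in the equivalence axiom is the has format specification property (the post does not spell this out), and the data class 'CEL data' is a hypothetical example.

```python
# Publisher of each format specification, following the
# "is published by value Affymetrix" axioms added in this post.
FORMAT_PUBLISHER = {
    "CDF binary format": "Affymetrix",
    "CDF ASCII format": "Affymetrix",
    "CEL binary format": "Affymetrix",
    "CEL ASCII format": "Affymetrix",
    "CHP binary format": "Affymetrix",
    "BPMAP": "Affymetrix",
    "BAR": "Affymetrix",
}


def is_affymetrix_compliant(formats):
    """Mimic the defined class: some format whose publisher is Affymetrix."""
    return any(FORMAT_PUBLISHER.get(f) == "Affymetrix" for f in formats)


# Hypothetical data classes with their format specifications.
DATA_CLASSES = {
    "CEL data": {"CEL binary format", "CEL ASCII format"},
    "FACS data": {"FCS3.0"},
}

inferred = sorted(
    name for name, formats in DATA_CLASSES.items() if is_affymetrix_compliant(formats)
)
print(inferred)  # ['CEL data']
```

Adding a new Affymetrix format then only needs the is published by axiom on the format; every data class using it is picked up by the defined class without further edits.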

Domains and ranges were added to is published by as follows:

Domain:
 'data format specification'
 or software
Range:
 organization

If it turns out that anything else needs to be added to the domain of is published by, then we can either 1) remove the domain constraint, 2) add that class to the domain constraint, or 3) specialise to sub-properties with single-class domains and leave the top-level is published by property without a domain.

Similar changes as those above were made to the following classes: affy, affyContam, affyio, affylmGUI, affyPara, affypdnn, affyPLM, affyQCReport, affyTiling, altcdfenvs, annaffy, annotationTools, aroma.light, arrayMvout, arrayQualityMetrics, BAC, betr, bgx, bgafun, bioDist, biomaRT, Category, cghMCR, convert, copa, cosmo, cosmoGUI, crimm, ctc, daMA, dyebias, ecolitk, edd, exonmap, factDesign, fbat, fdrame, flagme, flowQ, flowClust, flowCore, flowFlowJo, flowStats, flowUtils, flowViz, gaga, gcrma, gene2pathway, genefilter, geneRecommender, GeneticsBase, GeneticsPed, genomeIntervals, globaltest, goTools, gpls, graph, hexbin, hypergraph, idiogram, iterativeBMAsurv, keggorth, lapmix, limma, limmaGUI, logicFS, logitT, lumi, maCorrPlot, maDB, makecdfenv, makePlatformDesign, marray, matchprobes, metaArray, metahdep, microRNA, miRNApath, multtest, nnNorm, nudge, occugene, oligo, oneChannelGUI, ontoTools, pamr, panp, parody, pcaMethods, pcot2, pdInfoBuilder, pdmclass, pgUtils, pickgene, pkgDepTools, plgem, plw, ppiStats, prada, preprocessCore, puma, qpgraph, rama, RankProd, rbsurv, Rdbi, RdbiPgSQL, Rdisop, rflowcyt, Rgraphviz, Ringo, Rintact, RLMM, RMAExpress, RMAExpress 2.0, RMAExpress quantification, RMAGEML, rMAT, ROC, RpsiXML, Rredland, rsbml, rtracklayer, Rtreemix, Ruuid, RwebServices, safe, sagenhaft, SAGx, SBMLR, ScISI, seqLogo, ShortRead, simpleaffy, sizepower, SLGI, SLqPCR, SMAP, snapCGH, SNPchip, snpMatrix, SPIA, splicegear, splots, spotSegmentation, sscore, ssize, SSPA, TargetSearch, tilingArray, timecourse, topGO, tspair, twilight, TypeInfo, VanillaICE, vbmp, weaver, webbioc, xcms, xmapbridge, xps, XDE.

Inferred to be a member of a disjoint class
Affymetrix Software

 ('is specified data output of' some
 ('Software publishing process'
 and ('has participant' value Affymetrix)))
 or ('is specified data output of' some
 ('Software development process'
 and ('has participant' value Affymetrix)))

This class is asserted to be a subclass of software, but was inferred to be a subclass of Data due to its use of is specified data output of. This was fixed by changing the entire axiom to:

software and 'is published by' value Affymetrix

Bioconductor Software

 ('is specified data output of' some
 ('Software publishing process'
 and ('has participant' value Bioconductor)))
 or ('is specified data output of' some
 ('Software development process'
 and ('has participant' value Bioconductor)))

This class is asserted to be a subclass of software, but was inferred to be a subclass of Data due to its use of is specified data output of. This was fixed by changing the entire axiom to:

software and 'is published by' value Bioconductor
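The two failures above come from domain inference: an axiom of the form C SubClassOf prop some X entails that C falls under the domain of prop. The toy sketch below shows why is published by is safe for software classes while is specified data output of is not. It assumes, for illustration only, that is specified data output of has (or entails) the domain data and that software and data are disjoint; the domain of is published by is the union given earlier in this post.

```python
# Assumption: 'is specified data output of' has (or entails) the domain data.
# 'is published by' uses the union domain given earlier in this post.
DOMAINS = {
    "is specified data output of": {"data"},
    "is published by": {"data format specification", "software"},
}

# Assumption for illustration: software and data are disjoint.
DISJOINT = [frozenset({"software", "data"})]


def domain_is_compatible(asserted_class, prop):
    """Using prop on a class places the class under the property's domain.

    With a union domain, the class stays satisfiable in this toy model if at
    least one alternative is not disjoint with the asserted class.
    """
    return any(
        frozenset({asserted_class, alt}) not in DISJOINT for alt in DOMAINS[prop]
    )


print(domain_is_compatible("software", "is specified data output of"))  # False
print(domain_is_compatible("software", "is published by"))  # True
```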

Reconciling Annotation Labels

I have also replaced all references to definition_citation (about 214 cases) and definition_editor (270 cases) in the various OWL files with IAO_0000119 (definition source) and dc:creator, respectively. This matches the rest of the ontology and the design decision that was made a year ago. Many of the single quotes around existing single-word class labels were also removed, as they were unnecessary.

Obsoleting is_published_by

All references to is_published_by (SWO_0000395) were replaced with is published by (SWO_0004004), and SWO_0000395 was obsoleted.

Posted in data format specification, modeling, ontology, software

Bug with ‘swo2’ and ‘swo’ prefix fixed

While examining the code today, I noticed that swo_core had the incorrect ontology prefixes ‘swo’ and ‘swo2’ set to ‘http://www.ebi.ac.uk/swo/http://www.ebi.ac.uk/efo/swo/’. This error was only present in an obsolete term, SWO_0000399 (I must have introduced a bug when I made that term obsolete). All instances of this incorrect prefix were removed. Additionally, SWO_0000399 was incorrectly written as SWO__0000399 (with two underscores), and this was also fixed. These fixes were committed at revision 258 of the SourceForge SVN repository.

Posted in Uncategorized

SWO imports edam.owl rather than edam sub-modules

When the original merge between SWO and EDAM occurred, EDAM in OWL was stored in separate modules: edam_core, edam_data, edam_obsoletes, edam_operations, and edam_topics. These have now been superseded by a single file to import, edam.owl. As such, all module files have been deleted (via “svn delete”) from the subversion repository, although of course they can be recovered at any time by checking out a revision prior to 257.

Posted in Uncategorized

Bug with ‘data’ prefix fixed

While examining the code today, I noticed that there were both ‘data’ and ‘data2’ prefixes in swo_core.owl and swo_data.owl. For some reason, both files had the incorrect ontology prefixes ‘data’ and ‘data2’ set to “http://www.ebi.ac.uk/swo/data/http://www.ebi.ac.uk/swo/data/”, a strange duplication of the correct URL. All instances of this incorrect prefix were removed from both files. swo_data remains consistent according to the HermiT reasoner within Protege 4. This fix was committed at revision 256 of the SourceForge SVN repository.
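A duplicated IRI of this kind is easy to spot mechanically, since a namespace IRI should contain only one scheme. The sketch below is a hypothetical helper, not part of the SWO tooling, and it assumes the OWL files are serialized as RDF/XML with `xmlns:` namespace declarations; other serializations would need a different pattern.

```python
import re

# Matches namespace declarations such as xmlns:data="..." in RDF/XML
# (assumption: the OWL file uses the RDF/XML serialization).
XMLNS = re.compile(r'xmlns:([\w.-]+)="([^"]*)"')


def doubled_prefixes(owl_text):
    """Return prefix names whose IRI contains more than one '://' scheme."""
    return sorted(
        name for name, iri in XMLNS.findall(owl_text) if iri.count("://") > 1
    )


sample = '''
<rdf:RDF xmlns:swo="http://www.ebi.ac.uk/swo/"
         xmlns:data="http://www.ebi.ac.uk/swo/data/http://www.ebi.ac.uk/swo/data/"
         xmlns:data2="http://www.ebi.ac.uk/swo/data/http://www.ebi.ac.uk/swo/data/">
'''
print(doubled_prefixes(sample))  # ['data', 'data2']
```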

Posted in Uncategorized

SWO-EDAM Merge: Issues Arising from Data Merge

  1. In the longer term, Parameter and Report may be roles rather than classes within Data. For now, however, they remain in their original location within SWO.
  2. Additionally, in the near-to-medium future, the Core data class may become obsolete, and all children of Core data would be moved to be children of Data directly. This fits with what Robert thinks should be done.
  3. There is a serious question about SWO data classes such as AP-MS data. This data class describes a very specific subset of data, and Helen has concerns about whether classes like this should be present. Similarly, classes such as CSV data set may not be required, as this could equally well be described with an anonymous data class which has a format of CSV. Indeed, many pieces of data could be described in this way, requiring a very clear definition of when it is appropriate for data classes to be created. I had imagined that broad classes of data (which could have many formats) are what belongs in a data hierarchy. Examples of this would include microarray data and image data. The result of better defining what should be a data class should be added to the ontology in a comment label of the data class, and perhaps a blog post about it too (the comment could just reference the blog post).
  4. Although HTML report does seem to be a report rather than a member of data, I have left it where it is in the hierarchy until the EDAM hierarchy (and how Reports are to be modelled) has been decided.
  5. Meta data is modelled within data for SWO, and as a child of Report within EDAM. Therefore, for the same reasons that no reconciliation of HTML report is being performed, until the new modelling of the Report hierarchy has been decided, I see no reason to consider moving the SWO class.
Posted in ontology