SWO-EDAM Merge: Merging format hierarchies

Import of EDAM:Format into SWO:Data format specification

The following steps have been performed and committed within the development subdirectory of the SWO subversion repository. The merge is not yet complete, because the EDAM OWL file is not yet the master version of the file, and these steps raised a number of issues. Both topics are itemized in a separate blog post.

  • Make EDAM:Format a child of “Information content entity” (via subclass statement)
  • Make EDAM:Format equivalent to IAO:0000098 (data format specification) (via equivalent class statement)
  • SWO data format specification classes moved to other locations within SWO:
    • Renamed Tab delimited file to Tab delimited format
    • Split CDF into its two possible formats: CDF ASCII format and CDF binary format
    • Classes moved to be children of XML: MAGE-ML files (also renamed to MAGE-ML); KGML files (also renamed to KGML); ARR; gxl format.
    • Classes moved to be children of Text file format: SBMLR format (renamed from SBMLR file), which has no definition but appears to be the R files containing SBML data as specified by the SBMLR package for R; SDF format; Rnw (a type of Sweave document); R data frame (a bit like a table in R); GEO Matrix Series File (also replaced “file” with “format”); FCS; FASTA; CDF ASCII format; BED file (also renamed to BED format); cls; dcf; gff format; mas5 format; rda.
    • Classes moved to be children of Tab delimited format: MAGE tab format; cdt; gct; gpr format; gtr.
    • Classes moved to be children of Programming language format: Matlab m file (also renamed to Matlab .m file).
    • Classes moved to be children of EDAM:format_2333 (Binary format): CEL; CHP file; CDF binary format; BPMAP; lma.
  • Refactoring to remove EDAM:SWO duplicates – by obsoletion of EDAM or SWO concept as appropriate:
    • RDF: Present in both EDAM (format_2376) and SWO (SWO_3000006). The SWO term is retained, as EDAM is intended as a more bioinformatics-specific ontology, and the RDF hierarchy within SWO is more complex. All annotations have been moved across to the SWO RDF class and the EDAM RDF class has been marked as obsolete.
    • Document exchange format and its children have no equivalent classes within EDAM, and therefore were retained within SWO as-is.
    • Image format and its children have no equivalent classes within EDAM, and therefore were retained within SWO as-is.
    • Outline document format and its children have no equivalent classes within EDAM, and therefore were retained within SWO as-is. However, it is recommended that outlining become a role, as classes like OPML should instead be stored within the XML hierarchy.
    • Programming language format and its children have no equivalent classes within EDAM, and therefore were retained within SWO as-is.
    • Spreadsheet and its children have no equivalent classes within EDAM, and therefore were retained within SWO as-is.
    • Text file format (SWO_3000041) and Textual format (format_2330) are equivalent, and are marked with equivalence statements. Each has a hierarchy of children which must be taken into account. Obsolete cases are itemized below, while all others remain in their original hierarchy for now. Please note that a term is obsoleted by refactoring the URI of the concept to be obsoleted to the URI of the class it is being merged with. Until the two classes Text file format and Textual format are properly merged/obsoleted, the (slightly messy) dual hierarchy will remain. However, the inferred hierarchy will show the complete set of children (from both ontologies) under both classes.
      • BED format (efo/swo/SWO_0000051) and BED (format_3003) are equivalent, and the SWO class has been obsoleted and all its axioms assigned to the EDAM class. This is because the EDAM class has more annotations.
      • FASTA (efo/swo/SWO_0000142) and FASTA format (format_1929) are equivalent, and the SWO class has been obsoleted and all its axioms assigned to the EDAM class. This is because the EDAM class has more annotations and is present within a more complete FASTA hierarchy.
      • OBO Flat File Format (swo/data/SWO_3000040) and OBO (format_2549) are equivalent. The EDAM class will remain and the SWO class will be obsoleted; the SWO label, which is highly descriptive, will be retained.
      • gff format (efo/swo/SWO_0000559) and GFF (format_2305) are equivalent, and the SWO class has been obsoleted and all its axioms assigned to the EDAM class. This is because the EDAM class has more annotations and is present within a more complete GFF hierarchy.
      • newick (efo/swo/SWO_0000634) and newick (format_1910) are equivalent, and the SWO class has been obsoleted and all its axioms assigned to the EDAM class. This is because the EDAM class has more annotations.
      • MAGE tab format (swo/data/SWO_3000045) and MAGE-TAB (format_3162) are equivalent, and the SWO class has been obsoleted and all its axioms assigned to the EDAM class. This creates an extra parent class for MAGE-TAB (Tab delimited format, which is the parent class for the SWO MAGE tab format). Once the EDAM OWL files become the master version of the files, we can fix this by removing the Textual format subclass statement.
    • Web page specification in SWO has as its only child the EDAM HTML hierarchy. As such, no changes need to be made within SWO.
    • Word processing document format and its children have no equivalent classes within EDAM, and therefore were retained within SWO as-is.
    • XML (swo/data/SWO_3000005) and XML (format_2332) are equivalent, and are marked with equivalence statements. Each has a hierarchy of children which must be taken into account. Obsolete cases are itemized below, while all others remain in their original hierarchy for now. Please note that a term is obsoleted by refactoring the class URI to the URI of the class it is being merged with. Until the two XML classes are properly merged/obsoleted, the (slightly messy) dual hierarchy will remain. However, the inferred hierarchy will show the complete set of children (from both ontologies) under both classes.
      • MAGE-ML (efo/swo/SWO_0000268) and MAGE-ML (format_3161) are equivalent, and the SWO class has been obsoleted and all its axioms assigned to the EDAM class.
      • PSI-MI format (swo/data/SWO_3000048) and PSI MI XML (MIF) (format_3158) are equivalent, and the SWO class has been obsoleted and all its axioms assigned to the EDAM class. The SWO label has been retained as an alternative term.
      • SBML (swo/data/SWO_3000037) and SBML (format_2585) are equivalent, and the SWO class has been obsoleted and all its axioms assigned to the EDAM class.
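
The obsoletion-by-URI-refactoring approach described in the list above, and its effect on the inferred hierarchy, can be illustrated with a minimal Python sketch. This is not the tooling or procedure actually used for the merge. The triples are toy data, the `swo:` and `edam:` prefixes and the `swo:EXAMPLE_sdf_format` identifier are placeholders, the parent assigned to the EDAM BED class is an assumption, and the helper names `refactor_uri` and `inferred_children` are hypothetical. Only the numeric class IDs are taken from the list.

```python
# Toy model of "obsoletion by URI refactoring" on a set of triples.
# Assumptions: prefixes, the SDF placeholder ID, and the axioms below are
# illustrative only; they are not copied from the real SWO or EDAM files.
SUBCLASS_OF = "rdfs:subClassOf"
EQUIVALENT = "owl:equivalentClass"
LABEL = "rdfs:label"

triples = {
    ("swo:SWO_3000041", LABEL, "Text file format"),
    ("edam:format_2330", LABEL, "Textual format"),
    ("swo:SWO_3000041", EQUIVALENT, "edam:format_2330"),
    ("swo:SWO_0000051", LABEL, "BED format"),
    ("swo:SWO_0000051", SUBCLASS_OF, "swo:SWO_3000041"),
    ("edam:format_3003", LABEL, "BED"),
    ("edam:format_3003", SUBCLASS_OF, "edam:format_2330"),  # assumed parent
    ("swo:EXAMPLE_sdf_format", SUBCLASS_OF, "swo:SWO_3000041"),  # placeholder ID
}


def refactor_uri(triples, old, new):
    """Return a copy of `triples` with every use of `old` rewritten to `new`."""
    def swap(term):
        return new if term == old else term
    return {(swap(s), p, swap(o)) for s, p, o in triples}


def inferred_children(triples, parent):
    """Direct subclasses of `parent` or of a class declared equivalent to it.

    Only one hop of equivalence is followed, which is enough for this sketch;
    a real reasoner would do considerably more.
    """
    same = {parent}
    for s, p, o in triples:
        if p == EQUIVALENT and parent in (s, o):
            same |= {s, o}
    return {s for s, p, o in triples if p == SUBCLASS_OF and o in same}


# Obsolete the SWO BED class by moving all of its axioms to the EDAM class.
merged = refactor_uri(triples, "swo:SWO_0000051", "edam:format_3003")

print(sorted(inferred_children(merged, "swo:SWO_3000041")))
print(sorted(inferred_children(merged, "edam:format_2330")))
```

In this toy data the EDAM BED class ends up with both the SWO and the EDAM parent after refactoring, which is the same extra-parent effect noted above for MAGE-TAB, and both equivalent parent classes report the same set of children.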
