Until the EDAM OWL files become the master versions of the ontology, it is not useful to obsolete EDAM classes and replace them with the appropriate SWO classes. Instead, a number of equivalence statements were used. Once the EDAM OWL files (rather than the OBO files) are the master versions, the following changes will be made to remove these equivalence statements:
- Format (EDAM) will be obsoleted and replaced with Data format specification (SWO). Both labels will be retained, with “Format” as an alternative term.
- Textual format (EDAM) will be obsoleted and replaced with Text file format (SWO) – here, the EDAM annotation label may be the preferred label.
- XML (EDAM) will be obsoleted and replaced with XML (SWO).
- MAGE-TAB (EDAM) will drop the subclass statement for Textual format, and will instead only be a subclass of Tab delimited file format (and, for now, Format (typed)).
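The move from equivalence statements to obsoletion can be sketched in code. This is a minimal illustration in Python, not the actual EDAM or SWO tooling: `OntologyClass` and `obsolete_and_replace` are hypothetical names invented for this example, and only the class labels are taken from the list above.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """Hypothetical stand-in for an ontology class; not a real EDAM/SWO API."""
    label: str
    source: str  # "EDAM" or "SWO"
    alt_labels: list = field(default_factory=list)
    obsolete: bool = False
    replaced_by: "OntologyClass | None" = None


def obsolete_and_replace(old, new, keep_old_label_as_alt=True):
    """Mark `old` as obsolete and point it at `new`.

    If the labels differ, the old label is kept on the replacement as an
    alternative term, as described for Format / Data format specification.
    """
    old.obsolete = True
    old.replaced_by = new
    if (keep_old_label_as_alt
            and old.label != new.label
            and old.label not in new.alt_labels):
        new.alt_labels.append(old.label)
    return new


# Format (EDAM) -> Data format specification (SWO), keeping "Format" as an alternative term.
edam_format = OntologyClass("Format", "EDAM")
swo_format = OntologyClass("Data format specification", "SWO")
obsolete_and_replace(edam_format, swo_format)

# XML (EDAM) -> XML (SWO): the labels match, so no alternative term is added.
edam_xml = OntologyClass("XML", "EDAM")
swo_xml = OntologyClass("XML", "SWO")
obsolete_and_replace(edam_xml, swo_xml)
```

The Textual format case, where the EDAM label may become the preferred label, would need an extra step to swap the labels; the sketch leaves that decision open, as the text does.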
The following classes within SWO should either be further classified deeper in the data format specification hierarchy or, if we remain unsure what formats they actually are, should be obsoleted:
- .data format (I am unsure how to further classify it without knowing exactly which format is being described);
- .rma format (as it seems to be an algorithm for affy data rather than a format);
- Xba.CQV and Xba.regions (some kind of format internal to the RLMM Bioconductor package, I think?);
- chamber slide format (used for splots Bioconductor package);
- covdesc (described in relation to Bioconductor on one or two websites, but no clear description of what it is: http://permalink.gmane.org/gmane.science.biology.informatics.conductor/34506);
- design file and pair file (both part of the HELP system – you try searching for that!);
- gmt format (there are no usages in SWO of this class, and you get the time zone as a result when searching);
- log file and pedigree file (i.e. patient information) seem just wrong as formats and perhaps should be done away with, except perhaps as roles;
- logicFS dataset belongs in data, if it belongs anywhere (currently in data format specification) – even the axiom referencing it doesn’t seem right;
- sproc (which could mean a stored procedure, or perhaps a custom file within Bioconductor, but it’s hard to tell, and even harder to tell the format if it is a Bioconductor package input);
- sqlite is a program / piece of software, not a format, but it could be that there is an sqlite data dump that is a specific format?
Additionally, within SWO I think it might be better to make the outlines (specifically, the class Outline document format) a role rather than a format, as its child classes (e.g. OPML) should instead be stored within their format type hierarchy. In the case of OPML, this would be as a child of XML. This may or may not also be appropriate for Document exchange format and its hierarchy.
Within EDAM, BioPAX is subclassed as a child of OWL within the Format hierarchy. This is not particularly useful, as OWL can have multiple serializations in different formats. I believe that the EDAM BioPAX should be obsoleted in favor of one of the various BioPAX formats described in SWO (either the Manchester OWL or the RDF/XML versions, which have IDs of http://www.ebi.ac.uk/swo/data/SWO_3000056 and http://www.ebi.ac.uk/swo/data/SWO_3000055).
Also, a broader decision needs to be made on the placement of the HTML class within EDAM; within SWO it is a child of Web page specification, which itself is a child of data format specification. Within EDAM, it is a direct child of Format. EDAM may need to add the Web page specification class so that, ultimately, the final merging (rather than the current temporary method of just marking the two classes as equivalent) of the format hierarchies won’t end up with HTML in two different places within the final data format hierarchy.
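The HTML placement problem can be checked for mechanically before the final merge. The sketch below is a simplified illustration in Python, assuming each class has a single direct parent (real OWL hierarchies allow several). The function name `find_placement_conflicts` and the dictionary layout are hypothetical; only the class labels come from the text.

```python
def find_placement_conflicts(edam_parents, swo_parents, equivalents):
    """Return classes whose direct parent differs between the two hierarchies.

    edam_parents / swo_parents map a class label to its direct parent label.
    equivalents maps an EDAM label to the SWO label it is marked equivalent to.
    """
    def canonical(name):
        # Translate an EDAM label to its SWO equivalent, if one is declared.
        return equivalents.get(name, name)

    conflicts = {}
    for cls, edam_parent in edam_parents.items():
        merged_cls = canonical(cls)
        if merged_cls not in swo_parents:
            continue  # class exists only in EDAM, so there is nothing to compare
        edam_side = canonical(edam_parent)
        swo_side = swo_parents[merged_cls]
        if edam_side != swo_side:
            conflicts[merged_cls] = (edam_side, swo_side)
    return conflicts


# Hierarchy fragments as described above.
edam_parents = {"HTML": "Format"}
swo_parents = {
    "HTML": "Web page specification",
    "Web page specification": "Data format specification",
}
equivalents = {"Format": "Data format specification"}

conflicts = find_placement_conflicts(edam_parents, swo_parents, equivalents)
print(conflicts)
# {'HTML': ('Data format specification', 'Web page specification')}
```

If EDAM added a Web page specification class as the parent of HTML, the same check would return an empty result for this fragment.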