This post will be updated as work on the SWO-EDAM merge progresses. It describes issues that need to be addressed but that don’t belong to any one particular hierarchy. Some of the points raised here are just my opinion, and may not turn out to be the right call in the end.
- Version information is referenced as a Report in EDAM, and differently in SWO. These two representations should be reconciled.
- There are many duplicated taxonomy hierarchies within the Topic hierarchy. The same is true of lots of other mini-hierarchies, like the ones for UniProt IDs. This creates difficulties in maintenance (both in ensuring all copies are updated equally, and in readability) and in front-facing use (lots of classes with identical labels but different meanings and purposes). For instance, the UniProt ID mini-hierarchy is present in at least 5 different places within the data hierarchy. Another example is the “completely unambiguous pure” mini-hierarchy, which is present at least 3 times in the format hierarchy.
- A number of formats are referenced within SWO via axioms such as X ‘has specified input’ some ‘formatY’. This is odd, as it seems (though it isn’t axiomatised anywhere) that software should only be related to a format via has format specification and perhaps a data type. However, as there are no restrictions on how has specified input (or output) is used, we end up with multiple different axiom patterns for linking software and data formats together. This needs to be more rigorously defined.
- has format specification (from SWO) and has format (from EDAM) should be aligned and merged, once the operation/software issue has been resolved.
- In my opinion, there should only be a single asserted hierarchy within EDAM. However, in many places (e.g. Format) every child class is deliberately placed under both a broader biological context (e.g. Format (typed)) and a more basic format (e.g. Textual format). I believe that it should be relatively straightforward to make the classes within Format (typed) defined classes, and use the reasoner to place all formats into the appropriate biological format type.
- Within EDAM, there are many instances where two different classes have the same label/name. One example is Ontology the data class (data_0582), and Ontology the topic (topic_0089). This is a Bad Thing, and should be fixed in all cases.
- When the inferred hierarchy is calculated in Protege, the EDAM Data and EDAM Operation classes are inferred to be equivalent. This is definitely not true, and has the knock-on effect that the SWO data class is also inferred to be equivalent to Operation (because EDAM Data and SWO data are asserted to be equivalent). This needs to be fixed as early as possible.
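
The duplicate-label problem described above is the kind of thing that could be checked mechanically. What follows is only a minimal sketch, assuming the class IDs and labels have already been exported from the ontology as plain (ID, label) pairs; it is not how EDAM or SWO are actually validated. The function name `find_duplicate_labels` and the `example_0001` entry are made up for illustration, and only the two Ontology IDs come from this post.

```python
from collections import defaultdict


def find_duplicate_labels(classes):
    """Return labels shared by more than one class.

    `classes` is an iterable of (class_id, label) pairs. Labels are
    compared case-insensitively after stripping surrounding whitespace.
    The result maps each duplicated (normalised) label to a sorted list
    of the class IDs that carry it.
    """
    by_label = defaultdict(list)
    for class_id, label in classes:
        by_label[label.strip().lower()].append(class_id)
    return {
        label: sorted(ids)
        for label, ids in by_label.items()
        if len(ids) > 1
    }


# Hypothetical export of (ID, label) pairs. Only the two "Ontology"
# entries are taken from the post; "example_0001" is invented.
EXAMPLE_CLASSES = [
    ("data_0582", "Ontology"),
    ("topic_0089", "Ontology"),
    ("example_0001", "Sequence"),
]

duplicates = find_duplicate_labels(EXAMPLE_CLASSES)
for label, ids in duplicates.items():
    print(f"{label!r} is used by: {', '.join(ids)}")
```

Normalising case and whitespace is a design choice made here so that near-identical labels are also caught; an exact-match comparison would be stricter, and may be what is wanted if label case is meaningful.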