One of our ‘priority features’, identified during the prioritisation exercises at our first workshop, was the data used by software. There are (at least) two schools of thought on how data inputs and outputs should be tied to software.
Firstly, there is the view that data is connected to software only as input and output during execution, that is, while the software runs on some hardware. Here, the execution is a process in which both the software and the data participate.
Secondly, there is the view that software can have data specified as valid input and output independently of any execution process. Here, the data is declared at the level of the software, and whether it is actually used in a process is irrelevant to the primary description: the software can use these kinds of data, and that is enough to describe it.
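To make the difference between the two views concrete, here is a minimal Python sketch. The class names (`Data`, `Execution`, `SoftwareViaExecution`, `SoftwareDirect`) are hypothetical illustrations rather than SWO terms. In the first view the data is only reachable through an execution; in the second it is declared directly on the software.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Data:
    fmt: str  # format label, e.g. "XLS spreadsheet"


# First view: data is tied to an execution (a process) the software takes part in.
@dataclass
class Execution:
    inputs: list[Data] = field(default_factory=list)
    outputs: list[Data] = field(default_factory=list)


@dataclass
class SoftwareViaExecution:
    name: str
    executions: list[Execution] = field(default_factory=list)

    def input_formats(self) -> set[str]:
        # Data can only be reached by going through recorded executions.
        return {d.fmt for e in self.executions for d in e.inputs}


# Second view: valid input and output data are declared on the software itself.
@dataclass
class SoftwareDirect:
    name: str
    specified_inputs: list[Data] = field(default_factory=list)
    specified_outputs: list[Data] = field(default_factory=list)

    def input_formats(self) -> set[str]:
        return {d.fmt for d in self.specified_inputs}
```

One consequence is visible straight away: under the first view, software with no recorded execution has no data description at all, whereas the second view describes what the software can accept regardless.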
Which of these is more correct is debatable, but a more objective way to evaluate the two models is whether they can answer the competency questions for SWO. Some of the data-related questions are as follows:
What software works best with my dataset?
Which software tool created this data?
What software can I use my data with to support my task?
What are the input and output formats for this software?
What software can read a .cel file?
What are the export options for this software?
The questions share a common theme: what matters is not the type of relationship between a piece of software and its data, but which data can be used with which software.
So what can we conclude from this? Perhaps this modeling decision is unimportant, and we should instead consider other factors. One is the complexity of the resulting OWL expressions. Here are two sample snippets in Manchester OWL syntax showing how we might model part of Microsoft Excel:
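As a rough illustration that the competency questions can be answered under either model, the sketch below stores a few facts as plain (subject, predicate, object) triples and answers two of the questions above. The property labels follow the Manchester OWL examples in this post, except `participates in`, which is an assumed stand-in for whatever links software to a process. The tool `cel_tool`, the `CEL file` format label and the data identifiers are hypothetical.

```python
# Toy triple store: (subject, predicate, object). Identifiers such as
# "cel_tool", "xls_in" and the "CEL file" format label are hypothetical.
Triple = tuple[str, str, str]

IO_PROPERTIES = {"has specified input", "has specified output"}
FORMAT = "has format specification"
PARTICIPATES = "participates in"  # assumed property, not taken from SWO

# Second view: data declared directly on the software.
DIRECT: set[Triple] = {
    ("Microsoft Excel", "has specified input", "xls_in"),
    ("xls_in", FORMAT, "XLS spreadsheet"),
    ("Microsoft Excel", "has specified output", "xls_out"),
    ("xls_out", FORMAT, "XLS spreadsheet"),
    ("Microsoft Excel", "has specified output", "xml_out"),
    ("xml_out", FORMAT, "XML spreadsheet"),
    ("cel_tool", "has specified input", "cel_in"),
    ("cel_in", FORMAT, "CEL file"),
}


def via_process(direct: set[Triple]) -> set[Triple]:
    """First view: route every input/output through one execution per software."""
    routed: set[Triple] = set()
    for s, p, o in direct:
        if p in IO_PROPERTIES:
            run = f"{s} execution"
            routed.add((s, PARTICIPATES, run))
            routed.add((run, p, o))
        else:
            routed.add((s, p, o))
    return routed


def subjects(triples, predicate, obj):
    return {s for s, p, o in triples if p == predicate and o == obj}


def objects(triples, subject, predicate):
    return {o for s, p, o in triples if s == subject and p == predicate}


def software_reading(triples, fmt, process_hop=False):
    """'What software can read a <fmt> file?'"""
    data = subjects(triples, FORMAT, fmt)
    found = {s for d in data for s in subjects(triples, "has specified input", d)}
    if process_hop:
        # Extra hop: from the execution back to the software taking part in it.
        found = {sw for run in found for sw in subjects(triples, PARTICIPATES, run)}
    return found


def formats(triples, software, io_property, process_hop=False):
    """'What are the input (or output) formats for this software?'"""
    holders = objects(triples, software, PARTICIPATES) if process_hop else {software}
    data = {d for h in holders for d in objects(triples, h, io_property)}
    return {f for d in data for f in objects(triples, d, FORMAT)}
```

The answers come out the same either way; the process-based model simply needs one extra hop between the process and the software.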
Example 1:
and ((‘has specified input’ some
    and (‘has format specification’ some ‘XLS spreadsheet’)))
and (‘has specified output’ some
    and (‘has format specification’ some (‘XLS spreadsheet’ or ‘XML spreadsheet’)))))
Example 2:
‘has specified input’ some
    (data and (‘has format specification’ some ‘XLS spreadsheet’))
‘has specified output’ some
    (data and (‘has format specification’ some (‘XLS spreadsheet’ or ‘XML spreadsheet’)))
In example 1 we use the context of information processing (albeit through an anonymous class) to describe the data input and output. In example 2 we don’t; we simply say that data with some format is input to, and output from, this software. Some testing may be needed to determine whether the extra nesting in the first example makes reasoning slower. Anecdotally, I would expect it to: I recently changed the pattern in the EFO to represent cell lines a little closer to the biology, which introduced more nesting, and reasoning slowed marginally, even though the number of classes involved was under 1,000.
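One crude way to see the extra nesting is to encode both patterns as nested tuples and count how deeply the class constructors nest. This is a purely syntactic proxy and says nothing definitive about reasoner performance, which would need benchmarking with an actual reasoner. Because the opening of example 1 is not shown in full, the `participates in` property linking the software to the anonymous `information processing` class is an assumption, as is the tuple encoding itself.

```python
# Class expressions as nested tuples (hypothetical encoding):
#   ("some", property, filler), ("and", *operands), ("or", *operands);
#   plain strings are named classes.


def depth(expr) -> int:
    """Maximum nesting depth of class constructors in an expression."""
    if isinstance(expr, str):
        return 0
    op, *args = expr
    operands = args[1:] if op == "some" else args  # skip the property name
    return 1 + max(depth(a) for a in operands)


XLS, XML = "XLS spreadsheet", "XML spreadsheet"
FMT = "has format specification"


def data_with(fmt):
    return ("and", "data", ("some", FMT, fmt))


# Example 2 pattern: two separate restrictions directly on the software.
example2_input = ("some", "has specified input", data_with(XLS))
example2_output = ("some", "has specified output", data_with(("or", XLS, XML)))

# Example 1 pattern: the same restrictions wrapped in an anonymous
# 'information processing' class, reached via an ASSUMED 'participates in'.
example1 = ("some", "participates in",
            ("and", "information processing",
             ("some", "has specified input", data_with(XLS)),
             ("some", "has specified output", data_with(("or", XLS, XML)))))

for name, expr in [("example 1", example1),
                   ("example 2 input", example2_input),
                   ("example 2 output", example2_output)]:
    print(name, depth(expr))
```

With this encoding the process-based pattern nests six constructors deep, against at most four for either axiom of the direct pattern.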
Fundamentally, the proof will be in answering the competency questions above, and if we can do that, then in a sense the nuances of the modeling decision are less important.