Ins and Outs of software

One of our ‘priority features’, decided during the prioritisation exercises at our first workshop, was the data used by software. There are (at least) two schools of thought on how data inputs and outputs should be tied to software.

The first holds that data is connected to software only as input and output during execution, that is, while the software runs on some hardware. Here, execution is a process in which both the software and the data participate.

The second holds that software can have data specified as valid input and output whether or not any execution takes place. Here, the data is declared at the level of the software, and whether it is actually used in a process is effectively irrelevant to the primary description; the software can use these sorts of data, and that is enough to describe it.

Whether either of these is more or less correct is debatable, but a more objective way of evaluating the models is to ask whether they can satisfy the competency questions for SWO. Some of the data questions are as follows:

What software works best with my dataset?

Which software tool created this data?

What software can I use my data with to support my task?

What are the input and output formats for this software?

What software can read a .cel file?

What are the export options for this software?

The questions clearly point to a common theme: the information required is not the type of relationship between some piece of software and its data; the interest is in what data goes with what software.
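To make that theme concrete, here is a minimal Python sketch of two of the questions above treated as plain lookups over software and data formats. It is only an illustration, not how SWO itself is queried: the `Software` record, the `CATALOGUE` list and the tool name `ExampleCelReader` are hypothetical, and only the Excel formats are taken from the examples in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Software:
    """Hypothetical record: a tool plus the data formats it reads and writes."""
    name: str
    input_formats: frozenset
    output_formats: frozenset


# Illustrative catalogue. "ExampleCelReader" is an invented placeholder tool.
CATALOGUE = [
    Software(
        "Microsoft Excel",
        frozenset({"XLS spreadsheet"}),
        frozenset({"XLS spreadsheet", "XML spreadsheet"}),
    ),
    Software(
        "ExampleCelReader",
        frozenset({"CEL"}),
        frozenset({"tab-delimited text"}),
    ),
]


def software_reading(fmt, catalogue=CATALOGUE):
    """'What software can read a .cel file?' as a lookup by input format."""
    return sorted(s.name for s in catalogue if fmt in s.input_formats)


def export_options(name, catalogue=CATALOGUE):
    """'What are the export options for this software?' as a lookup by name."""
    for s in catalogue:
        if s.name == name:
            return sorted(s.output_formats)
    return []


print(software_reading("CEL"))
print(export_options("Microsoft Excel"))
```

Neither lookup needs to know whether the formats were attached to the software directly or through a process of execution; it only needs the pairing of software and data.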

So what is the conclusion from this? Perhaps this modeling decision is unimportant and we should instead consider other factors. One may be the complexity of the OWL expressions. Here are two sample snippets of Manchester OWL syntax for how we might model a bit of Microsoft Excel:

example 1:

is_executed_in some
(‘information processing’
and (‘has specified input’ some
(data and (‘has format specification’ some ‘XLS spreadsheet’)))
and (‘has specified output’ some
(data and (‘has format specification’ some (‘XLS spreadsheet’ or ‘XML spreadsheet’)))))

example 2:

‘has specified input’ some
(data and (‘has format specification’ some ‘XLS spreadsheet’))

‘has specified output’ some
(data and (‘has format specification’ some (‘XLS spreadsheet’ or ‘XML spreadsheet’)))

In example 1 we use the context of information processing (albeit through an anonymous class) to describe the data input and output. In example 2 we don’t, and simply say that data, with some format, is input to and output from this software. Some tests of the two may be necessary to determine whether the extra nesting in the first example makes reasoning slower. Anecdotally, I would expect this to be the case: I recently changed the pattern in the EFO to represent cell lines a little closer to the biology, which introduced more nesting, and reasoning slowed marginally even though the number of classes was <1,000.
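As a rough analogy for the difference in nesting, the two patterns can be written as nested Python dictionaries. This is only a sketch under stated assumptions: the dictionary keys mirror the property names in the snippets, while the helpers `formats` and `depth` are hypothetical and say nothing about how an OWL reasoner actually performs.

```python
# Pattern 1: formats are reached through an execution process (one extra hop).
excel_via_process = {
    "is_executed_in": {
        "type": "information processing",
        "has_specified_input": {
            "type": "data",
            "has_format_specification": ["XLS spreadsheet"],
        },
        "has_specified_output": {
            "type": "data",
            "has_format_specification": ["XLS spreadsheet", "XML spreadsheet"],
        },
    }
}

# Pattern 2: formats are declared directly on the software.
excel_direct = {
    "has_specified_input": {
        "type": "data",
        "has_format_specification": ["XLS spreadsheet"],
    },
    "has_specified_output": {
        "type": "data",
        "has_format_specification": ["XLS spreadsheet", "XML spreadsheet"],
    },
}


def formats(description, role):
    """Return the formats for role 'input' or 'output' under either pattern."""
    # Pattern 1 keeps the data under the process; pattern 2 on the software.
    holder = description.get("is_executed_in", description)
    data = holder.get(f"has_specified_{role}", {})
    return sorted(data.get("has_format_specification", []))


def depth(node):
    """Count levels of nested dictionaries, a crude proxy for nesting."""
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    return 0


print(formats(excel_via_process, "output") == formats(excel_direct, "output"))
print(depth(excel_via_process) - depth(excel_direct))
```

Both descriptions yield the same formats; the first simply sits one level deeper, which is the extra nesting whose cost for reasoning would need to be measured rather than assumed.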

Fundamentally, the proof will be in answering the competency questions above; if we can do that then, in a sense, the nuances of the modeling decision are less important.


2 Responses to Ins and Outs of software

  1. Bill Duncan says:

    The distinction you raise between the execution of software (as it relates to data) and the specification of software (as it relates to data) is interesting. In the OWL examples you gave, it seems that you are focusing on the latter aspect (i.e., specification). But, do you think this is adequate for your needs? That is, once you describe all the relevant input/output specifications for some piece of software, is there anything that still needs to be described (for your purposes) on the process side? I am not sure what the answer to this question is.

    I took a quick look at BFO (Basic Formal Ontology) and it lists the following kinds of processes:

    fiat process part
    planned process
    process aggregate
    process boundary
    processual context

    When I think about software in execution, I tend to think of it in terms of a planned process. But does this capture software that has been designed to run multiple threads?

    There is also the issue of processual context, since the hardware that executes the software affects various aspects of the process. For example, the time it takes to run a process is affected by the hardware, and some kinds of software can only be run on certain kinds of hardware. Thus, in order to address the issue of processes, it may be necessary to identify some aspects of hardware that need describing.

    Lastly, I am wondering about the case when a piece of software acts (i.e., executes) differently based upon user preferences stored in some data source. For example, when I start Microsoft Word there is, presumably, some kind of configuration file that makes Word run according to my preferences. Here, it seems, the data (in the configuration file) is taking on a more active role, and not just being passive (which is how I normally think of data).

  2. Pingback: Assorted modifications December 2014 | SWOP
