James Malone1, Robert Stevens2, Andy Brown2 and Helen Parkinson1
1European Bioinformatics Institute;
2School of Computer Science, University of Manchester.
Agile Ontology Development Document Version 1.0
The SWO Agile Ontology Development (AOD) process in overview is:
1. An online survey is used to source requirements and specific software descriptions.
2. Workshop(s) are used to refine and prioritise requirements, as well as to collect more software descriptions.
3. A modular approach to ontology organization is used to flesh out different components.
4. Third-party ontologies are reused where suitable, and requested parts from external ontologies such as EDAM, OntoDM, IAO and RO are used.
5. All development releases are made public such that people can see work early and often.
6. “Official” releases are made every month.
7. Requirements and evaluation are iterated at a second (and subsequent) workshop, in Agile fashion, with a re-prioritisation of features if required.
8. The requirements gathering, develop-and-release, and review process is repeated in short cycles.
The organization of the community engagement is vital to this style of ontology development. We had a few principles in mind when organising our workshops:
- No “death by PowerPoint”; the workshops were meant to do work and deliver information and data. Attendees were not present just to listen, but to be listened to and be active participants in the development of SWO.
- We avoided too much “ontologising”, that is, debating the representation of the knowledge to be held in the ontology. The aim of the workshops was to gather what people wanted to be able to say, and thus ask, about a piece of software. As organizers, we could sneak in some clarifying questions about distinctions that might be made in an ontology, but these were asked without ontological jargon.
- We had lots of short sessions with people working in twos or threes on a task. Working individually or “going around the room” means attendees spend a lot of time doing nothing or not talking about the task; the former is to be avoided and on-task discussion is to be encouraged. Regular feedback and summarisation of what has been found helps show that attendees are making a difference.
- Dynamic scheduling is important. Our agendas were not set in stone; if something works and expands, then we capitalize on that and “go with it”. If something doesn’t work, then it is curtailed or dropped. The meeting chairs consult regularly on what to do next. As a meeting chair, this can be a bit scary, but it is important to be responsive and flexible.
- The final principle concerns the number of attendees at a meeting: more than about 20 becomes too cumbersome – better to hold another meeting. Fewer than five becomes a chat, which may be useful, but probably lacks critical mass. The emphasis on gathering requirements also avoids the need to gain agreement on an ontological representation from a room of 50 people; that kind of agreement is best achieved another way.
An on-line survey was set up to enable any interested member of the digital preservation community to participate in the project (http://www.surveymonkey.com/s/sword1). The survey asked questions about potential use cases for the ontology and offered a structured form for submitting software descriptions. The survey enabled some early prototyping, using the submitted requirements and software descriptions to ‘seed’ the SWO, with the understanding that the ontology would likely change as the project continued.
Within the six-month period we organized two community workshops, each with about 15 to 20 people attending. Ideally, the first workshop would have been held right at the start of the six-month period, but project timings didn’t allow this to happen. Such timing is important as it allows the main aim of requirements gathering to have as much effect on the project as possible. Following the workshop principles outlined above, we had the following activities:
- Description from the participants of why they may wish to use a SWO. This is the beginning of use case collection and gathering of competency questions. A collection of use cases for how a SWO could be used is an important part of evaluation for the SWO.
- Pairs of attendees wrote down features that they felt to be important to record or know about software on “Post-it” notes. We had attendees work in pairs so that discussion could draw out as many features as possible. We then divided the attendees into two groups, and each group arranged the Post-it notes into clusters. Each cluster represents an important feature of software. These clusters were recorded for use in other activities and as possible input into the ontology. Having two groups cluster independently allows for comparison and discussion. At this point, a software representation of the features that could be clustered and re-clustered interactively would have been a good thing. In the second workshop we used an on-line tool (Optimal Sort – see http://bit.ly/l1HV4H for the SWOP example) for this task; its features, however, are somewhat limited and it costs money. An open version that fed directly into ontology tools such as Protégé would also be useful.
- We had a similar “card sorting” activity with Post-it notes. The organization was the same, but this time competency questions were gathered. Such questions are those that the ontology should be able to answer. This activity also allows a comparison with the basic feature collection, as questions can often expose features not yet provided.
- We also had the attendees develop persona for the SWO. Persona are a feature of user experience design (UXD) in software engineering and are designed to capture all aspects of the user’s interaction with a software artifact (in our case an ontology). They are richer than use cases, where the human user is just regarded as an actor and no account is taken of their attitudes, interaction preferences, wider user needs and so on. For the SWO, persona can be imagined for a software engineer, data preservationist, working scientist and so on. For each persona, we had attendees describe a name, age, occupation, role in an organization, and what they would use a SWO for in their job. Importantly, a persona also includes personality traits that can help motivate decisions about design. These persona can be used in many other activities in the workshops, with attendees being asked to give their answers from the perspective of a persona. We did not use these persona as much as we could or should have done, but they will be more firmly embedded in our method as it develops.
- We also borrowed the use of “poker games” for prioritizing requirements. The previous activities together yield far more things for the SWO than can reasonably be modeled in a six-month project. We used the Agile technique of “Priority Poker” to help us rank the features to be modeled first. In the first workshop, we took the major features so far determined in the workshop and those present assigned a value to each of the requirement categories based on its importance, complexity and risk. Each participant then had an amount of “money” with which they could bid for a requirement. The cost of a requirement has to be met for it to receive any attention during a particular phase of development. There is not enough “money” to buy all features, so the available funds migrate towards the most valued features. This worked well for ontology requirements. We should have used the persona in this activity, with attendees adopting one or more persona when buying a feature; this would have allowed a more diverse set of views to be incorporated into the prioritization than actually occurred. The prioritization was informally re-run in the second workshop and we saw some features move up the priority list. As this agile ontology development method evolves, this poker activity will become a repeated and key activity.
- Once an initial SWO was in place, we had attendees start to describe software using it, recording these descriptions in spreadsheets. Again, attendees worked in pairs so that they could discuss the software being described and the capability of the SWO to make the description. We did this activity several times in each workshop. It has several advantages: it gathers more things to put in the ontology (the SWO will not contain everything, even if the superclass is present); it exposes major missing features of the SWO; it provides descriptions of software to put into the SWO; and it forms a general evaluation of the SWO. In our second workshop, features that were lower down the priority list moved back up as attendees actually described their software. All spreadsheets were collected for input into the SWO.
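The comparison between the two groups’ independent card-sort clusterings can be sketched programmatically; a tool along these lines would also support the interactive re-clustering discussed above. The cluster labels and feature names here are invented for illustration:

```python
# Sketch: compare two independent card-sort clusterings by finding, for
# each cluster from group 1, the most similar cluster from group 2.
# Cluster labels and features are hypothetical examples.

def jaccard(a, b):
    """Similarity of two feature sets: size of overlap over size of union."""
    return len(a & b) / len(a | b)

def best_matches(group1, group2):
    """For each cluster in group1, find the most similar cluster in group2."""
    return {
        label: max(group2, key=lambda other: jaccard(features, group2[other]))
        for label, features in group1.items()
    }

group1 = {"provenance": {"version", "developer", "licence"},
          "function": {"input format", "output format", "task"}}
group2 = {"legal": {"licence", "developer"},
          "io": {"input format", "output format", "version"}}
print(best_matches(group1, group2))  # {'provenance': 'legal', 'function': 'io'}
```

Mismatched clusters (a low best score) would flag exactly the features worth discussing between the groups.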
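The Priority Poker mechanics described above can be sketched as a simple tally: bids are summed per requirement, and a requirement is funded only if its cost is met. Requirement names, costs and budgets below are hypothetical:

```python
# Sketch of the "Priority Poker" prioritisation used in the workshops.
# Requirements, costs and bid amounts are invented for illustration.

def priority_poker(requirements, bids):
    """requirements: {name: cost}; bids: {participant: {name: amount}}.
    A requirement receives attention in the next development phase only
    if the total amount bid on it meets or exceeds its cost."""
    totals = {name: 0 for name in requirements}
    for allocation in bids.values():
        for name, amount in allocation.items():
            totals[name] += amount
    funded = [n for n, cost in requirements.items() if totals[n] >= cost]
    # Rank funded requirements by how far the bids exceeded the cost.
    return sorted(funded, key=lambda n: totals[n] - requirements[n], reverse=True)

requirements = {"version info": 30, "licence": 20, "file formats": 40}
bids = {
    "alice": {"version info": 20, "licence": 22},
    "bob": {"version info": 15, "file formats": 15},
}
print(priority_poker(requirements, bids))  # ['version info', 'licence']
```

Because each participant’s budget is finite, spreading money thinly leaves most requirements unfunded, which is what pushes funds towards the most valued features.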
With these priorities and other valuable inputs gathered in the workshops, ontology modelling can begin. We set out by organising the ontology infrastructure. Separating the different components of the SWO into separate OWL files allows concurrent development and also reuse of specific modules. For example, the SWO is organized into modules for the software itself; data; format; licence; organization; version; and so on. Each of the SWO’s modules is imported into a “master” ontology file. This straightforward piece of management allows simple concurrent development of different areas of the ontology. Of course, these files are kept under some kind of version control. We have used SourceForge, but any such system will suffice. It is well known that simple diffs do not work well on OWL files, but versioning does allow roll-backs, together with the usual forking, version numbering and so on. At this stage, we also set down coding standards: semantics-free identifiers for URIs, labels for each entity, and how those labels should be written.
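The master-file-imports-modules layout might be generated along the following lines. The base URI is a placeholder, not the SWO’s actual namespace; the module names follow the text:

```python
# Sketch: emit a minimal Turtle "master" ontology that does nothing but
# owl:imports one OWL file per module. The base URI is a placeholder.

MODULES = ["software", "data", "format", "licence", "organization", "version"]
BASE = "http://example.org/swo"  # hypothetical namespace

def master_ontology(base, modules):
    """Return a Turtle document declaring one owl:imports per module."""
    lines = ["@prefix owl: <http://www.w3.org/2002/07/owl#> .",
             "",
             f"<{base}/master.owl> a owl:Ontology ;"]
    imports = [f"    owl:imports <{base}/{m}.owl>" for m in modules]
    lines.append(" ;\n".join(imports) + " .")
    return "\n".join(lines)

print(master_ontology(BASE, MODULES))
```

Loading the master file in an editor such as Protégé then pulls in every module, while each module file can still be edited and versioned on its own.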
We used both depth-first and breadth-first development of the SWO. As we had gathered many descriptions of actual pieces of software, we put these into the ontology and described all their features. This exposed certain re-used features that became high priority, such as versions of software and the organization owning and/or developing the software. In these cases, SWO modellers diverted to populating a particular aspect of the SWO. This approach can be generalized to mixing breadth-first development (touching each aspect of the ontology as a new addition touches it) with depth-first development (working on a particular aspect intensively to save time in the near future) as appropriate. As the SWO is about describing software, we drove its development through the description of software; we thought of this as test-case-driven ontology development. At all stages of development we chose a mixture of hard cases and easy cases to test our model. We also added a lot of widely used software as attractions for potential users. This also meant we followed an 80:20 approach, where we wanted to cover all the commonly used cases without spending disproportionate amounts of time on rarely used software. Alterations for these can come later in the SWO’s development.
In general, we made the representations only as good as was necessary to accommodate our competency questions; we avoided “truth and beauty” modelling. The important thing was to model consistently across any aspect of the ontology, as this allows easier, programmatic re-modelling of the ontology into different patterns as it becomes necessary.
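The kind of programmatic check that consistent modelling and the coding standards make possible might look like the sketch below. The `SWO_`-prefixed identifier pattern and the entity data are illustrative assumptions, not the project’s actual scheme:

```python
# Sketch: check that every entity follows two coding standards from the
# text: a semantics-free (numeric) identifier and a label. The SWO_
# identifier pattern and the sample entities are hypothetical.
import re

ID_PATTERN = re.compile(r"SWO_\d{7}$")  # assumed semantics-free identifier form

def check_entities(entities):
    """entities: {uri: label or None}. Return a list of rule violations."""
    problems = []
    for uri, label in entities.items():
        if not ID_PATTERN.search(uri):
            problems.append(f"{uri}: identifier is not semantics-free")
        if not label:
            problems.append(f"{uri}: missing label")
    return problems

entities = {"http://example.org/SWO_0000001": "software",
            "http://example.org/FastaFormat": None}
print(check_entities(entities))  # two violations, both for FastaFormat
```

Running such checks over each module keeps the uniformity that later programmatic re-modelling depends on.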