Updates and additions to the BLAST software class

For all of the software I will be adding over the next few weeks, the initial goal is to model function first, and then to go back and model inputs and outputs. As time allows, further modelling (e.g. organizations involved, associated publications and download sites) can take place. The best way to capture “function” seems to be to provide ‘achieves objective’ axioms. If required, I can change which axioms are populated in this first-pass modelling step.

If extra information turns up without further effort (e.g. if I find associated publications while modelling the objective), I will add axioms describing it straight away. If the information is there, it may as well be added rather than waiting and possibly forgetting to do so.

BLAST

Questions

“BLAST” could mean the algorithm or any of its multiple incarnations as executable software (and BLASTN appears further down the list). The BLAST software from the NIH is already modelled via efo:SWO_0000054, but its axioms only describe the organization; the only description of the function of the software is within the textual definitions, of which there are two (from two different authors). The textual definitions seem to include both the protein and the nucleotide versions of BLAST in the same class. The questions below for BLAST might equally apply to many other pieces of software: others might have multiple executables for performing slightly different tasks, or have both a web and a command-line version. I would just like to get an idea of the style of modelling for SWO.

  1. The Web version of BLAST and the executable version of BLAST, both provided by the NIH, seem to me to be two different versions of the software. I don’t see any reason to create classes for *every* version of BLAST out there, but these will be the two most heavily used versions, and so would seem to require two separate software classes. Does this make sense? We are already able to model both command line interfaces and web interfaces in SWO, so it makes sense to differentiate between the two. We could do this by saying a single BLAST class can provide both Web and command line functionality, or by making two classes, one for each type of functionality.
  2. The executable version of blast is not one single file (though there are scripts for running through multiple types of searches), e.g. from the manual at http://www.ncbi.nlm.nih.gov/books/NBK1763/: “The blastn, blastp, blastx, tblastx, tblastn, psiblast, rpsblast, and rpstblastn are considered search applications, as they execute a BLAST search, whereas makeblastdb, blastdb_aliastool, and blastdbcmd are considered BLAST database applications, as they either create or examine BLAST databases.” It could be said that each executable is a single piece of software, though in practice I’m not sure we want to model in such detail. But where do we draw the line? Do we mention blastn and blastp separately (they are currently conflated in the textual definitions of BLAST in SWO)?
  3. This leads to another associated question: I’ve had a look in BioPortal and realised that EDAM_operation:0292 and its children cover this topic perfectly (e.g. have a look under Operation -> Alignment Construction in http://www.ebi.ac.uk/ontology-lookup/browse.do?ontName=EDAM). As we use EDAM in other locations within SWO, should we just import the sequence alignment construction hierarchy, or even the operation hierarchy, rather than creating our own? The only possible problem I see is that the operation class in EDAM has a textual definition that may not match completely, as it mentions inputs and outputs, and the information processing objective within SWO seems more abstract. Is this part of the EDAM hierarchy the right sort of thing to join up with our pattern discovery task, or even with its parent class, information processing objective?

Answers

The general answer to questions 1 and 2 is that we model what BLAST (or any) software can do by listing the tasks it can perform. This means that the greater complexity of hierarchy is within the tasks section of the ontology. In the case of BLAST, pairwise and multiple alignment classes are required. The fact that a run of BLAST might be pairwise comes from the type of data input, so that information would belong within the data hierarchy. In theory, therefore, the nucleotide pairwise alignment could be inferred, but that doesn’t need to be done at the moment. It is enough right now to state that BLAST is capable of certain types of alignment tasks.

With regard to EDAM and question 3, the response from the SWORD mailing list was this:

Robert: “One thing we’ve been talking about is aligning EDAM with SWO. This would mean us openly talking about the future of EDAM and SWO. I’m not keen on replicating work, so on the whole I’d like to use others. However, we should separate out the inputs/outputs from the task….”

James: “I agree with Robert and I’ve chatted with Jon about this. That part of EDAM is more problematic as there is a definite conflation of several of the axes in SWO. One thing we might do is duplicate it in SWO as we would model it and then compare with EDAM. A pattern of conversion may emerge perhaps?”

Therefore, right now we will model these classes within SWO separately, and add EDAM classes at a later date.

Modelling performed

I was unsure whether the most appropriate task hierarchy was ‘data mining task’ or ‘data processing task’, as the two don’t seem to be mutually exclusive. The definitions provided by the external ontology to which these classes belong do not give a clear answer. Because ‘pattern discovery task’, further down the ‘data mining task’ hierarchy, seems to match best, I decided to place ‘sequence alignment task’ as a subclass of ‘pattern discovery task’:

  1. Class: ‘sequence alignment task’ (http://www.ebi.ac.uk/swo/objective/SWO_7000002)
    SubClassOf: ‘pattern discovery task’
  2. Class: ‘multiple sequence alignment task’ (http://www.ebi.ac.uk/swo/objective/SWO_7000003)
    SubClassOf:
    ‘sequence alignment task’
  3. Class: ‘pairwise sequence alignment task’ (http://www.ebi.ac.uk/swo/objective/SWO_7000004)
    SubClassOf:
    ‘sequence alignment task’
  4. ‘achieves objective’ some ‘multiple sequence alignment’
    ‘achieves objective’ some ‘pairwise sequence alignment’

    A link has been made between the BLAST software and the tasks achieved by the software.
  5. DataProperty: ‘has documentation’ (http://www.ebi.ac.uk/swo/SWO_0000043)
    To model associated publications such as papers and user manuals, a data property was created with the identifier SWO_0000043 within the core OWL file.
  6. DataProperty: ‘has download location’ (http://www.ebi.ac.uk/swo/SWO_0000046)
    To model the locations the software can be retrieved from, a data property was created with the identifier SWO_0000046 within the core OWL file.
  7. Class: ‘BLAST+ 2.2.26’ (http://www.ebi.ac.uk/swo/SWO_0000044)
    SubClassOf:
    BLAST
    ‘has version’ value “BLAST+ 2.2.26” (http://www.ebi.ac.uk/swo/versions/SWO_2000027)
    ‘has interface’ some ‘Command Line Interface’
    ‘has publication’ value “http://dx.doi.org/10.1016/S0022-2836(05)80360-2”
    ‘has download location’ value “ftp://ftp.ncbi.nih.gov/blast/executables/”

    I have created this class to describe a specific type of this software – it is the command line version made by the NCBI. I have chosen this version simply because it is the current version.
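
To make the effect of these axioms a little more concrete, here is a rough sketch in Python of what a reasoner should be able to conclude from them. It is not how SWO is stored or queried: the dictionaries and the `superclasses` and `achieves` helpers are names I have made up for illustration, and I have assumed the ‘achieves objective’ fillers in item 4 refer to the task classes from items 2 and 3.

```python
# Illustrative sketch only: plain dictionaries standing in for the OWL axioms.
# All names below are hypothetical helpers, not part of SWO or any OWL library.

# Asserted SubClassOf axioms from items 1-3 and 7.
SUBCLASS_OF = {
    "sequence alignment task": "pattern discovery task",
    "multiple sequence alignment task": "sequence alignment task",
    "pairwise sequence alignment task": "sequence alignment task",
    "BLAST+ 2.2.26": "BLAST",
}

# 'achieves objective' some <task>, asserted on the BLAST class (item 4).
ACHIEVES_OBJECTIVE = {
    "BLAST": {
        "multiple sequence alignment task",
        "pairwise sequence alignment task",
    },
}


def superclasses(cls):
    """Return the asserted superclasses of cls, nearest first."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def achieves(software, task):
    """True if software (or a class it inherits from) achieves task or a subclass of task."""
    for cls in [software] + superclasses(software):
        for asserted in ACHIEVES_OBJECTIVE.get(cls, ()):
            if asserted == task or task in superclasses(asserted):
                return True
    return False


print(achieves("BLAST+ 2.2.26", "sequence alignment task"))
print(achieves("BLAST+ 2.2.26", "data processing task"))
```

The point of the sketch is that ‘BLAST+ 2.2.26’ needs no task axioms of its own: it inherits them from BLAST, and anything that achieves a pairwise or multiple alignment task also achieves the more general ‘sequence alignment task’ and ‘pattern discovery task’.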

Future Work

License, ‘developed by’, ‘published by’ and development status axioms will follow as required. Inputs (e.g. FASTA) and outputs (e.g. XML, HTML and text file formats) also need to be added. However, BLAST allows many output formats, as well as customised output formats. Should we list every possible output format or just those most commonly used? I think it should be the latter (see http://www.compbio.ox.ac.uk/analysis_tools/BLAST/BLAST_blastall/blastall_examples.shtml).

Identifiers created today

Unless otherwise stated, all identifiers are prefixed with the “http://www.ebi.ac.uk/swo/” URI.

  1. objective/SWO_7000002 : sequence alignment (class)
  2. objective/SWO_7000003 : multiple sequence alignment (class)
  3. objective/SWO_7000004 : pairwise sequence alignment (class)
  4. SWO_0000043 : has publication (data property)
  5. SWO_0000044 : BLAST+ 2.2.26 (class)
  6. versions/SWO_2000027 : BLAST+ version 2.2.26 (instance)
  7. SWO_0000046 : has download location (data property)
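
For anyone who wants the full URIs rather than the short forms listed above, the expansion is just the prefix plus the local identifier. A minimal sketch in Python, where `SWO_BASE`, `LOCAL_IDS` and `expand` are names of my own choosing:

```python
# Hypothetical helper for expanding the short identifiers listed above.
SWO_BASE = "http://www.ebi.ac.uk/swo/"

LOCAL_IDS = [
    "objective/SWO_7000002",
    "objective/SWO_7000003",
    "objective/SWO_7000004",
    "SWO_0000043",
    "SWO_0000044",
    "versions/SWO_2000027",
    "SWO_0000046",
]


def expand(local_id, base=SWO_BASE):
    """Prefix a local identifier with the SWO base URI."""
    return base + local_id.strip()


for local_id in LOCAL_IDS:
    print(expand(local_id))
```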

About Allyson Lister

Find me at https://orcid.org/0000-0002-7702-4495 and https://www.eng.ox.ac.uk/people/allyson-lister/