Biological Relevance
Find evolutionarily related sequences for phylogenetic analysis: use BLAST to identify similar sequences, then manually review the hits to select sequences for multiple sequence alignment.
Primary persona: Jane
Jane is a first-year bioinformatics master's student, female, 28 years old. She worked for a health research institute for 4 years. She has prior knowledge of computers but not of bioinformatics, and has attended 6 months' worth of introductory bioinformatics courses.
Secondary persona: Professor Bob
Professor Bob is the project leader. He is 38 years old and has been involved in sequencing projects for the past ten years. He knows the advanced BLAST settings.
Scenario 1.
Start with a sequence in GenBank format displayed in a web browser. Use the sequence as input to a BLAST similarity search of the "nr" database. Select the hits with an E-value better than 1e-70 and export those sequence records in FASTA format, for use by CLUSTALW on Jane's machine.
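The selection and export step of this scenario could be sketched as follows. This is only an illustration: the `Hit` record, its field names, and the 60-column line wrap are assumptions made for the example, and "better than 1e-70" is read here as strictly smaller than the cutoff, since a smaller E-value indicates a more significant hit.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical hit record; a real system would fill this from BLAST output."""
    identifier: str
    description: str
    evalue: float
    sequence: str


E_VALUE_CUTOFF = 1e-70  # threshold named in Scenario 1


def select_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep hits whose E-value is better (smaller) than the cutoff."""
    return [hit for hit in hits if hit.evalue < cutoff]


def to_fasta(hits, width=60):
    """Format hits as FASTA records, wrapping sequence lines at `width` columns."""
    lines = []
    for hit in hits:
        lines.append(f">{hit.identifier} {hit.description}")
        for start in range(0, len(hit.sequence), width):
            lines.append(hit.sequence[start:start + width])
    # One newline per line; an empty selection yields an empty string.
    return "".join(line + "\n" for line in lines)
```

The string returned by `to_fasta(select_hits(hits))` would be what Jane downloads and passes to CLUSTALW.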
Scenario 2.
Start with a sequence record in FASTA format in the file "abc.fasta" on Jane's machine and search the "kinase" database. From the graphical overview of the BLAST results Jane sees that one hit aligns to a different part of the target than the others. She's curious and wants to find out more information about that outlier.
Scenario 3.
Jane got an urgent SMS from Professor Bob to find more information about NBN:12345, a sequence record in the local NBN database. Email him the results of a BLAST search against the "local" database.
System Requirements
- This will be developed as a web application
- The similarity search will be done with BLASTN. All of the BLAST
parameters will be pre-selected by Professor Bob so that Jane only inputs
the query sequence and target database.
- The query sequence may be:
  - a pasted GenBank record (input starts with "LOCUS ")
  - a pasted FASTA record (input starts with ">")
  - a local database identifier (input starts with "NBN:")
  - the raw sequence (anything else, removing all characters except for the letters [A-Za-z])
  - a local file in GenBank format
  - a local file in FASTA format
- The target database may be:
  - "local" - a private local database
  - "nr" - the non-redundant sequences from GenBank
  - "kinase" - a set of kinase genes
- Jane is trained in reading BLAST results and is used to reading the output file itself. There is no need to reformat the output into something more readable.
- The BLAST result must have a graphical overview similar to what NCBI uses.
- BLAST hits must have a hyperlink to more information about the given sequence
- The output must have a way to select a subset of the similar target sequences, for download to Jane's machine. There must be a way to get the selected sequence records in FASTA and GenBank formats. There should be a way to get the selected hits in comma-separated format ("CSV") for use in a spreadsheet.
- The system will eventually run on the internet so there should be a way to ask for a username and password. These will be administered by the IT department. The password page must have a "mailto" link to let people know where to apply for a password. Password and account administration is outside the scope of this project.
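The prefix rules given above for pasted query input ("LOCUS ", ">", "NBN:", otherwise raw sequence) can be sketched as a small classifier. This is a minimal illustration, not a definitive implementation: the function name `classify_query` and the returned labels are hypothetical, and the choice to trim whitespace around a local identifier is an assumption the requirements do not state.

```python
import re


def classify_query(text):
    """Return (kind, payload) for pasted query text, following the prefix rules."""
    if text.startswith("LOCUS "):
        return ("genbank", text)
    if text.startswith(">"):
        return ("fasta", text)
    if text.startswith("NBN:"):
        # Assumption: surrounding whitespace is not part of the identifier.
        return ("local_id", text.strip())
    # Anything else is treated as raw sequence: keep only the letters A-Z, a-z.
    return ("raw", re.sub(r"[^A-Za-z]", "", text))
```

For example, `classify_query("1 acgt acgt\n61 ttga")` yields the kind `"raw"` with the digits and whitespace removed, so numbered sequence text pasted from a record still produces a usable query.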
Copyright © 2001-2020 Andrew Dalke Scientific AB