Path: funic!fuug!mcsun!uunet!bionet!GENBANK.BIO.NET!kristoff
From: kristoff@GENBANK.BIO.NET (Dave Kristofferson)
Newsgroups: bionet.molbio.genbank
Subject: IMPORTANT - New GenBank BLAST Database Search E-mail Server
Message-ID:
Date: 3 Sep 91 18:13:29 GMT
Sender: kristoff@genbank.bio.net
Distribution: bionet
Lines: 348

GenBank is pleased to announce the availability of a new e-mail
server for database similarity searches. The BLAST program has been
made available to us by NCBI (reference below) and instructions for
its use are appended below.

Currently the server allows searches of the latest quarterly releases
of GenBank, PIR, and SWISS-PROT. Access to the EMBL database and the
daily GenBank and EMBL updates will be added soon. Currently the
blastn and blastp programs for nucleic acid and protein searches are
available, although other options may also be added in the near
future as computing resources permit.

We are starting this off by limiting execution to two queues, one for
nucleic acid searches (i queue) and one for protein searches (h
queue). Only one job will execute at a time in each queue. BLAST is
very fast, so we do not expect the queues to become very lengthy, but
we will monitor the situation and make adjustments as needed.

Although we believe that the system is operating correctly, please
assist us by reporting any bugs or other problems to
blast-req@genbank.bio.net.

The FASTA e-mail server (search@genbank.bio.net) also continues in
operation (see note below).

                                Sincerely,

                                David Kristofferson, Ph.D.
                                GenBank Manager

                                kristoff@genbank.bio.net

----------------------------------------------------------------------

BLAST Mail Server Help Document

BLAST - Basic Local Alignment Search Tool

BLAST was developed by the National Center for Biotechnology
Information at the National Library of Medicine and kindly made
available for use on the GenBank On-line Service.
The program employs a heuristic search algorithm to compare an amino
acid query sequence against a protein sequence database or a
nucleotide query sequence against a nucleotide sequence database.
The BLAST program compares sequences with databases using an ungapped
alignment algorithm:

    S. F. Altschul, W. Gish, W. Miller, E. W. Myers and D. J. Lipman
    (1990) J. Mol. Biol. 215, 403-410.

If you use BLAST as a research tool, we ask that this reference be
cited in your paper.

You can access the GenBank BLAST Mail Server through a number of
different networks, including Internet, BITNET, EARN, NETNORTH and
JANET. The GenBank BLAST server allows you to send a specially
formatted mail message containing the nucleic acid or protein query
sequence to the BLAST Server at GenBank. A BLAST sequence similarity
search is then performed against the specified database using the
BLAST algorithm.

**** DISCLAIMER ****

GenBank provides access to several different search algorithms.
Please note that we make no claims that the results from either one
of our servers (BLAST vs. FASTA) are to be preferred over the other.
While BLAST is faster than FASTA, users should come to their own
conclusions about search sensitivity based on a comparison of their
own results before deciding which algorithm is suited for their
purposes. Yet another search algorithm (FASTDB) is available only
for interactive use on GOS. Please be aware that all of these
programs may produce somewhat different search results.

To obtain instructions for the FASTA e-mail server, send the message
HELP to the address search@genbank.bio.net (leave the Subject: line
blank).

********************

Accessing the BLAST program

To access the program, send an electronic mail message containing the
formatted query sequence (as described below) to the following
Internet address:

    BLAST@GENBANK.BIO.NET

If you are not on Internet, you may need to change the format of the
address.
Consult your systems manager to determine the correct address format.

Obtaining Help

If you would like to receive instructions on using the BLAST program,
send a mail message to the address above containing the word "HELP"
on a single line of the mail message. Leave the Subject line in the
mail header blank.

Appended to the end of the help text is the BLAST manual page, which
describes specific BLAST program functions.

For additional help on using BLAST, contact GenBank at (415) 962-7307
or send an electronic mail message to the address:

    CONSULTANT@GENBANK.BIO.NET

Databases for use with BLAST

The following databases are currently available for BLAST searches:

    Designator    Database
    ----------    --------
    GenBank       Latest GenBank quarterly release.
    SWISS-PROT    All of the SWISS-PROT protein database.
    PIR           All of the PIR protein database.

GenBank is a nucleic acid sequence database; SWISS-PROT and PIR are
protein sequence databases. Currently BLAST (which uses a compressed
database format) does not search the daily GenBank updates. This
remains to be implemented. Other databases will also be added soon.

Formatting a Query

Queries consist of a mail message with search parameters identifying
the program (blastp for proteins or blastn for nucleic acids), the
database to be searched, values related to the search, and the query
sequence to be used in the search.

The mail message has three mandatory lines, one optional line, and a
line identifying the query sequence as described below. These lines
are typed into the body of the mail message in the order shown below:

    Search Parameter  Mandatory  Explanation
    ----------------  ---------  -----------
    BLASTPROGRAM      Yes        Indicates whether to perform a
                                 nucleic acid (BLASTN) or protein
                                 (BLASTP) search.

    DATALIB           Yes        This line specifies the database to
                                 be searched (see section below under
                                 "Sending the Query Sequence") and
                                 must be included in the message.

    MATCH             No         Scoring value for a match
                                 (applicable for BLASTN only).
                                 BLASTP uses the PAM120 scoring
                                 matrix.

    BEGIN             Yes        This line must be included in the
                                 message. No other information is
                                 typed on it. The remainder of the
                                 message contains the query sequence
                                 in FASTA format (described below; a
                                 complete sample query is also
                                 provided).

*NOTE*: all lines must be LESS THAN 80 characters in length; longer
lines will be truncated.

Only one query sequence is allowed per mail message and your sequence
must be in FASTA format. IntelliGenetics format and GenBank database
file format are not currently accepted; however, it is possible to
use an editor to change the file to FASTA format.

The format includes a mandatory comment line beginning with a
greater-than sign ">" followed by the name of the sequence, a space,
and an optional note about the sequence. The sequence data begin on
the next line without the greater-than sign. For example:

>AGREP4 Monkey SV40-like genomic segment promoting transcription.
ccccttcaaatctattacaaggtgagcgtctcgccaaggcaatgaaatcgcaatatgatg
tttccatttactttggattatacgtcattataaa

Sending the Query Sequence

Use your local mail program to send GenBank your query sequence.
Most mail programs allow you to import a file containing your
sequence into the mail message. You should import your sequence file
into the mail message on the line after "BEGIN". Please follow the
format in the following example of a BLAST request PRECISELY, but
note that the program is case-insensitive, i.e., either upper or
lower case letters may be used.

BLAST MAIL QUERY EXAMPLE

Note that the first four lines in the example below are a mail header
that is automatically created when you address a mail message.
Nothing need be entered for the Subject.

NOTE: the text that you enter into the body of the message begins
with the "BLASTPROGRAM" keyword below (do not add blank lines in the
message). Each line of information must be less than 80 characters
in length. Longer lines will be truncated.

From: drgene@someaddress.somewhere.edu Tue Jun 14 21:36:38 1988
Date: 14 Jun 1988 2129:02-PDT
To: BLAST@GENBANK.BIO.NET
Subject:

BLASTPROGRAM blastn
DATALIB genbank
BEGIN
>BOVPRL GenBank entry BOVPRL from gbmam file. 907 nucleotides.
tgcttggctgaggagccataggacgagagcttcctggtgaagtgtgtttcttgaaatcat
caccaccatggacagcaaa

The example above uses the three mandatory keyword lines:
BLASTPROGRAM, DATALIB, and BEGIN. The MATCH line can be used between
the DATALIB and BEGIN lines when using the blastn program (see
above). See above for a list of choices for the DATALIB line.

The completed mail message is then sent to the BLAST Server at
GenBank. Once your message is received, it is placed in a batch
queue and processed in the order in which it was received.

If you would like to know the status of the queues being processed,
you can send a mail message to the BLAST Server address
(BLAST@GENBANK.BIO.NET) containing the word "QUEUE" on a single line
of the mail message (leave the Subject field blank). The BLASTP
queue is labeled with the letter "h"; the BLASTN queue is labeled
with "i", e.g.,

Rank   Execution Date        Owner     Job #  Queue  Job Name
1st    Aug 22, 1991 16:33    kristoff  446    h      kristoff@genbank.bio.n
1st    Aug 22, 1991 16:33    kristoff  447    i      kristoff@genbank.bio.n

Multiple jobs are currently permitted in the queues, but please limit
your zeal since others also use the service. For example, submitting
ten jobs simultaneously would definitely be in bad taste. We would
prefer it if, after submitting 2 - 4 jobs to the queues, you wait
until your results are received before submitting additional runs.
If these conventions are repeatedly violated we will be forced to
implement automatic limitations on the queue as we have for the FASTA
"e" queue.

Handling the Results of a BLAST Search

When the results are returned, use your local mail program to view
them. You can transfer the results of a BLAST search to a separate
disk file to free up space in your mail directory.
Consult the documentation for your local mail program for the
commands to read and transfer mail.

Interpreting the Results of a BLAST Search

Please consult the BLAST manual section (appended to this file) for
complete details concerning result descriptions.

How to query the BLAST server queue

Please note that e-mail retrieval server requests are placed in a
queue for processing. Thus it may take a couple of minutes to get
your entries back if many people have submitted requests at the same
time. This queuing provides efficient scheduling of resources. To
find out what requests are queued, send the word "QUEUE" to
BLAST@GENBANK.BIO.NET.

Retrieving individual entries found in BLAST searches

Database entries can be retrieved by either locus name or accession
number. To use the GenBank Retrieval System, send an electronic
message to RETRIEVE@GENBANK.BIO.NET containing as text accession
numbers and/or entry names, one per line (leave the Subject: line
blank). Multiple entries may be submitted in a single message using
the following format:

CHKTUBA
BNACYP
J02852
J02855

Each sequence found will be returned in a separate mail message.

The data banks are searched in the order: GenBank New Data, GenBank
current release, EMBL New Data, EMBL current release, GenPept New
Data, GenPept current release, and Swiss-Prot until a match is found.
If an entry exists in both GenBank and EMBL with the same accession
number (the usual case), a query on the accession number will return
the GenBank version of the entry. If the EMBL-format version is
required, it can be retrieved from the file server at
NETSERV@EMBL-Heidelberg.DE (for instructions send a message
containing the line HELP to that address).

To retrieve GenPept entries, use the LOCUS name of the corresponding
GenBank entry followed by _1, or by _n where n represents the nth
coding region in that GenBank entry.
For example, ASNTUBBA_1 is the GenPept LOCUS name for the translation
of the first coding region from GenBank entry ASNTUBBA.

Please note that e-mail retrieval server requests are now placed in a
queue for processing rather than being handled immediately as was the
case in the past. Thus it may now take a couple of minutes longer to
get your entries back if many people have submitted requests at the
same time. This inconvenience was necessary due to the increasing
popularity of the service. All batch queues on GOS may be monitored
by sending the word QUEUE to SEARCH@GENBANK.BIO.NET. Retrieval
requests are entered into the "g" queue.

IF YOU DO NOT FIND YOUR ENTRY ON THE SERVER:

Authors often request that data be held in confidence until after
publication even though they have already been assigned an accession
number for their data. This adds an additional delay in data release
because the databank staff must ascertain that the data have appeared
in print. If you have a reference to sequence data but cannot
retrieve the data from the e-mail server, please send the literature
reference and accession number (or locus name) to
UPDATE@GENOME.LANL.GOV and the data will be released to the server as
soon as verification of publication is made.

DATA SUBMISSION

An electronic version of the sequence data submission form used by
the sequence data banks is also available through the RETRIEVE
server. To receive a copy, send a message containing the word
DATASUB as the only line. Instructions for completing and submitting
the form are included. We would appreciate it if you would use this
form only if you cannot use our free Authorin data submission
software for the IBM PC and Macintosh. Copies of Authorin may be
requested by sending e-mail to authorin@genbank.bio.net.

If you have any questions or comments, feel free to mail them to
RETRIEVE-REQUEST@GENBANK.BIO.NET.

Obtaining BLAST

BLAST is available by anonymous ftp from ncbi.nlm.nih.gov
[130.14.20.1] in the pub/blast directory.
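
Sketch: composing a query message body

As a rough illustration of the query format described under
"Formatting a Query" above, the following Python sketch assembles the
body of a request and checks the 80-character limit. It is not part
of the GenBank service. The function name build_blast_query, its
parameters, the 60-character sequence wrap (copied from the sample
query), and the "MATCH <value>" layout of the optional line are all
assumptions made for illustration; the help text does not spell out
the exact syntax of the MATCH line.

```python
VALID_PROGRAMS = {"blastn", "blastp"}
MAX_LINE = 79  # the help text requires lines of LESS THAN 80 characters


def build_blast_query(program, database, name, sequence, note="", match=None):
    """Return the body of a BLAST server mail message as one string.

    Hypothetical helper: names and argument order are illustrative only.
    """
    program = program.lower()
    if program not in VALID_PROGRAMS:
        raise ValueError("program must be blastn or blastp")
    if match is not None and program != "blastn":
        raise ValueError("MATCH applies to blastn only")

    # Keyword lines in the documented order: BLASTPROGRAM, DATALIB,
    # optional MATCH, then BEGIN on a line by itself.
    lines = ["BLASTPROGRAM " + program, "DATALIB " + database]
    if match is not None:
        # Assumption: keyword followed by its value, like the other lines.
        lines.append("MATCH " + str(match))
    lines.append("BEGIN")

    # FASTA comment line: ">" plus the name, then a space and optional note.
    lines.append(">" + name + (" " + note if note else ""))

    # Strip whitespace and wrap the sequence at 60 characters per line.
    seq = "".join(sequence.split())
    lines.extend(seq[i:i + 60] for i in range(0, len(seq), 60))

    # Refuse to build a message the server would silently truncate.
    for line in lines:
        if len(line) > MAX_LINE:
            raise ValueError("line too long, would be truncated: " + line[:20])
    return "\n".join(lines) + "\n"
```

The returned string can be pasted into a mail message addressed to
BLAST@GENBANK.BIO.NET with the Subject line left blank. Raising an
error on an over-long line, rather than wrapping it, is a deliberate
choice here: a comment line cannot be split without changing its
meaning.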
End of BLAST Server Help (BLAST man page omitted from here)