FASTA
FASTA is a sequence alignment package first described (as FASTP) by David J. Lipman and William R. Pearson in 1985 in the article "Rapid and sensitive protein similarity searches". The package contains several programs for aligning protein sequences and DNA sequences.
A sequence in FASTA format begins with a single-line description, followed by one or more lines of sequence data. The description line is distinguished from the sequence data by a greater-than (">") symbol in the first column. The word immediately following the ">" symbol is the identifier of the sequence, and the rest of the line is the description; both are optional. There should be no space between the ">" and the first letter of the identifier. It is recommended that all lines of text be shorter than 80 characters. A sequence ends when another line starting with ">" appears, which marks the start of the next sequence, or when the file ends. An example sequence in FASTA format:
>gi|5524211|gb|AAD44166.1| cytochrome b Elephas maximus maximus
LCLYTHIGRNIYYGSYLYSETWNTGIMLLLITMATAFMGYVLPWGQMSFWGATVITNLFSAIPYIGTNLV
EWIWGGFSVDKATLNRFFAFHFILPFTMVALAGVHLTFLHETGSNNPLGLTSDSDKIPFHPYYTIKDFLG
LLILILLLLLLALLSPDMLGDPDNHMPADPLNTPLHIKPEWYFLFAYAILRSVPNKLGGVLALFLSIVIL
GLMPFLHTSKHRSMMLRPLSQALFWTLTMDLLTLTWIGSQPVEYPYTIIGQMASILYFSIILAFLPIAGX
IENY
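The format rules above can be sketched as a small reader and writer. This is a minimal illustration in Python, not the code used by the FASTA package itself; the names `parse_fasta`, `format_fasta`, and `Record` are hypothetical, and the line width of 70 is an arbitrary choice that stays below the 80-character recommendation.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str]  # (identifier, description, sequence)


def _make_record(header: str, chunks: List[str]) -> Record:
    # The identifier is the word directly after ">". If the header starts
    # with whitespace, the identifier is treated as empty (an assumption).
    if header[:1].isspace():
        identifier, description = "", header.strip()
    else:
        parts = header.split(None, 1)
        identifier = parts[0] if parts else ""
        description = parts[1].strip() if len(parts) > 1 else ""
    return identifier, description, "".join(chunks)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one (identifier, description, sequence) tuple per record."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # A new ">" line ends the previous sequence, if there was one.
            if header is not None:
                yield _make_record(header, chunks)
            header = line[1:]
            chunks = []
        elif header is not None and line.strip():
            # Sequence data may span several lines; join them later.
            chunks.append(line.strip())
    if header is not None:
        yield _make_record(header, chunks)


def format_fasta(record: Record, width: int = 70) -> str:
    """Write one record, wrapping the sequence at `width` characters."""
    identifier, description, sequence = record
    header = ">" + identifier + (" " + description if description else "")
    body = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([header] + body) + "\n"
```

Passing the lines of a file (for example, an open file object) to `parse_fasta` yields the records one at a time, so a large file does not need to be held in memory all at once.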
Sequences are expected to be represented in the standard IUB/IUPAC amino acid and nucleic acid codes, with these exceptions: lower-case letters are accepted and are mapped into upper-case; a single hyphen or dash can be used to represent a gap of indeterminate length; and in amino acid sequences, U and * are acceptable letters (see below). Before submitting a request, any numerical digits in the query sequence should either be removed or replaced by appropriate letter codes (e.g., N for unknown nucleic acid residue or X for unknown amino acid residue).
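The clean-up described above can be sketched as a short helper that maps lower-case letters to upper-case and removes digits or replaces them with a letter code. The function name `normalize_sequence` and its `unknown` parameter are hypothetical, and stripping whitespace is an added assumption not stated in the text.

```python
import re


def normalize_sequence(seq: str, unknown: str = "") -> str:
    """Prepare a query sequence for submission.

    Digits are replaced by `unknown` (for example "N" for nucleic acids
    or "X" for amino acids); the default empty string removes them.
    """
    seq = re.sub(r"\s+", "", seq)      # assumption: whitespace is not data
    seq = re.sub(r"\d", unknown, seq)  # remove or replace numerical digits
    return seq.upper()                 # lower-case is mapped to upper-case
```

For example, `normalize_sequence("ac9gt", unknown="N")` replaces the digit with `N`, while calling it without `unknown` simply drops the digit.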
The nucleic acid codes supported are:
| Nucleic Acid Code | Meaning |
|---|---|
| A | Adenine |
| C | Cytosine |
| G | Guanine |
| T | Thymine |
| U | Uracil |
| R | G A (puRine) |
| Y | T C (pYrimidine) |
| K | G T (Ketone) |
| M | A C (aMino group) |
| S | G C (Strong interaction) |
| W | A T (Weak interaction) |
| B | G T C (not A) (B comes after A) |
| D | G A T (not C) (D comes after C) |
| H | A C T (not G) (H comes after G) |
| V | G C A (not T, not U) (V comes after U) |
| N | A G C T (aNy) |
| - | gap of indeterminate length |
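One way to use the table above is to store it as a lookup from each code to the bases it can stand for. The sketch below does this and checks whether a concrete sequence is consistent with a pattern written in ambiguity codes. The names `IUPAC_NUCLEOTIDES`, `expand`, and `matches` are hypothetical. The sketch also assumes that U only matches U (it is not treated as T) and that the gap character is not handled, so `expand("-")` raises a `KeyError`.

```python
# Each code maps to the set of bases listed for it in the table.
IUPAC_NUCLEOTIDES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "GA", "Y": "TC", "K": "GT", "M": "AC", "S": "GC", "W": "AT",
    "B": "GTC", "D": "GAT", "H": "ACT", "V": "GCA", "N": "AGCT",
}


def expand(code: str) -> frozenset:
    """Return the bases that a single nucleic acid code can represent."""
    return frozenset(IUPAC_NUCLEOTIDES[code.upper()])


def matches(pattern: str, sequence: str) -> bool:
    """True if every base in `sequence` is allowed by the code at the
    same position in `pattern`. Both must have the same length."""
    if len(pattern) != len(sequence):
        return False
    return all(base.upper() in expand(code)
               for code, base in zip(pattern, sequence))
```

With these definitions, `matches("RYN", "ACG")` is true because A is a purine, C is a pyrimidine, and N accepts any base, whereas `matches("RY", "CT")` is false because C is not a purine.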
The amino acid codes supported are:
| Amino Acid Code | Meaning |
|---|---|
| A | Alanine |
| B | Aspartate or Asparagine |
| C | Cysteine |
| D | Aspartate |
| E | Glutamate |
| F | Phenylalanine |
| G | Glycine |
| H | Histidine |
| I | Isoleucine |
| K | Lysine |
| L | Leucine |
| M | Methionine |
| N | Asparagine |
| P | Proline |
| Q | Glutamine |
| R | Arginine |
| S | Serine |
| T | Threonine |
| U | Selenocysteine |
| V | Valine |
| W | Tryptophan |
| Y | Tyrosine |
| Z | Glutamate or Glutamine |
| X | any |
| * | translation stop |
| - | gap of indeterminate length |
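The table above can also serve as a simple validity check for protein sequences. The following sketch reports every character that is not one of the listed codes. The names `AMINO_ACID_CODES` and `invalid_residues` are hypothetical, and accepting lower-case input follows the rule stated earlier that lower-case letters are mapped to upper-case.

```python
# All codes listed in the amino acid table, including "*" and "-".
AMINO_ACID_CODES = frozenset("ABCDEFGHIKLMNPQRSTUVWYZX*-")


def invalid_residues(seq: str) -> list:
    """Return (position, character) pairs for characters that are not
    in the amino acid table. Positions are 0-based."""
    return [(i, ch) for i, ch in enumerate(seq)
            if ch.upper() not in AMINO_ACID_CODES]
```

An empty result means that every character in the sequence is a supported code. Letters such as J and O, which do not appear in the table, are reported together with their positions.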
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.


