=============================================================================

               ***********************************************
               *                   GePSAN                    *
               * Geneva Protein Sequence Analysis Newsletter *
               ***********************************************

Published by: Amos Bairoch
              Dept. Medical Biochemistry / University of Geneva. Switzerland

Volume 1, Number 1 / January 1991

To subscribe (or unsubscribe) to this newsletter: gepsan@cgecmu51.bitnet
To send comments/suggestions/criticisms:          bairoch@cgecmu51.bitnet

Data bases availability summary

+------------+-------+------------------------------------------------------+
| Data base  | Rel.  | Email              FTP       FTP    Tape   CD-ROM    |
|            |       | EMBL File Server   GenBank   NCBI                    |
+------------+-------+------------------------------------------------------+
| SWISS-PROT | 16.0  | Yes (by entry)     Soon      Yes    Yes    Yes       |
| ENZYME     | 3.0   | Yes                Yes       Yes    Yes    Yes       |
| PROSITE    | 6.0   | Yes                Yes       Yes    Yes    Yes       |
| SEQANALREF | 13.5  | Yes                Yes       Yes    No     No        |
+------------+-------+------------------------------------------------------+

SWISS-PROT/PROSITE/ENZYME tapes or CD-ROM subscription: datalib@embl.bitnet
EMBL file server email address:      netserv@embl.bitnet
GenBank On-line Service FTP address: genbank.bio.net  (or 134.172.1.160)
NCBI FTP address:                    ncbi.nlm.nih.gov (or 130.14.20.1)

=============================================================================
=============================================================================

                             TABLE OF CONTENTS
                     Volume 1, Number 1 / January 1991

   1. What is GePSAN.
   2. SWISS-PROT news.
   3. Cross-references to OMIM in SWISS-PROT and ENZYME.
   4. Biomolecular databases integration: current status.
   5. NCBI, the GenInfo Backbone Database, and the ASN.1 syntax.
   6. PROSITE news.
   7. Updated list of public domain programs which make use of PROSITE.
   8. ENZYME news.
   9. Specialized databases part 1: the P450 database.

=============================================================================
=============================================================================
Section: 1
Title  : What is GePSAN.

GePSAN is a newsletter that deals with aspects of protein sequence analysis
that are relevant to the data bases that are maintained at the Department
of Medical Biochemistry (DMB) of the University of Geneva, namely:

   SWISS-PROT: An annotated protein sequence data base. A joint project of
               the DMB and of the EMBL Data Library.
   PROSITE   : A dictionary of sites and patterns in proteins.
   ENZYME    : An enzyme nomenclature data base.
   SEQANALREF: A sequence analysis bibliographic reference data base.

This newsletter will also attempt to report new developments in the field
of protein sequence analysis.

=============================================================================
=============================================================================
Section: 2
Title  : SWISS-PROT news.

1) Release 16
=============

Release 16.0 of SWISS-PROT contains 18364 sequence entries, comprising
5'986'949 amino acids abstracted from 17763 references. This represents an
increase of 9% over release 15. More than 1400 sequences have been added
since release 15, the sequence data of 271 existing entries has been
updated and the annotations of 3500 entries have been revised.
In particular we have used review articles to update the annotations of the
following groups or families of proteins:

 - Alpha and beta adrenergic receptors
 - Arrestins
 - Chromogranins / secretogranins
 - CTF/NF-I family
 - ClpP proteases
 - ets family
 - GABA(A) receptors
 - Gram-positive cocci surface proteins
 - Hexokinases
 - Integrins alpha and beta chains
 - NMePhe pili proteins
 - p53 proteins
 - Poly(ADP-ribose) polymerase
 - Profilins
 - S-Adenosylmethionine synthetases
 - Site-specific recombinases
 - Synaptobrevins
 - Type-II membrane antigens
 - UDP-glucuronosyl transferases
 - Uteroglobin family
 - LBP / BPI / CETP family

We have finished adding cross-references to human protein sequence entries
which are represented in the latest edition of OMIM (see the next section
for full details).

2) Future developments
======================

One question many users of SWISS-PROT ask me is: what is the exact extent
of the overlap between SWISS-PROT and PIR? Up to now, cross-references (DR
lines) were provided only to entries in the annotated section of PIR (now
called PIR1), for which our coverage is complete. Only a few
cross-references were provided to entries in the unannotated sections of
PIR (which used to be called "NEW", but are now known as PIR2 and PIR3). We
started to add these cross-references in release 16; this task will
continue in release 17 and be completed for release 18. At that point,
users who do not want to scan two protein data banks will be able to
extract automatically from PIR2/PIR3 all the sequences that are not present
in SWISS-PROT, and so produce a file that complements SWISS-PROT. In a
future issue of this newsletter we will explain this process in detail and
also describe exactly what the differences between SWISS-PROT and PIR are.

In release 18 we will invert the order of the information in the OS line.
Currently we have 'English common name (Latin name)'; we will switch to
'Latin name (English common name)'.
Example:

   OS   HUMAN (HOMO SAPIENS).

will be changed to:

   OS   HOMO SAPIENS (HUMAN).

We also hope to provide in release 18 cross-references to TFD (the
relational database of transcription factors from David Ghosh (NCBI / USA)).

3) News concerning SWISS-PROT availability
==========================================

a) New SWISS-PROT entries and updates to existing entries are now available
   in between regular releases from the EMBL File Server. They are not
   provided on a daily basis like new nucleotide entries, but we intend to
   make at least two or three sets of incremental updates between each
   release.

b) SWISS-PROT is now available for download by FTP from the NCBI server.
   All the files are in the \repository\SWISS-PROT directory.

c) SWISS-PROT will soon also be available by FTP from the GenBank On-line
   Service (GOS) server.

=============================================================================
=============================================================================
Section: 3
Title  : Cross-references to OMIM in SWISS-PROT and ENZYME.

OMIM is the on-line version of Mendelian Inheritance in Man (MIM), the
famous book by Victor McKusick [1], which holds clinical data on a range of
human genetic diseases as well as all known gene loci. During the last five
months we have implemented cross-references to OMIM in both SWISS-PROT and
ENZYME.

[1] McKusick Victor A.
    Mendelian Inheritance in Man
    Catalogs of autosomal dominant, autosomal recessive, and X-linked
    phenotypes
    Ninth edition
    Johns Hopkins University Press, Baltimore, (1990).

In practice, the following has been done in SWISS-PROT:

1) In each human protein entry whose gene was found to be described in
   OMIM, a DR (cross-reference) line was added that points to the OMIM
   six-digit catalog number.

   Example:

   DR   MIM; 261600; NINTH EDITION.

   Currently (in release 16.0 of SWISS-PROT) there are 840 human protein
   sequence entries with one or more DR lines that point to OMIM.
   A new document file, called MIMTOSP.TXT, is provided with SWISS-PROT. It
   is a sorted list of the MIM catalog entries cross-referenced in
   SWISS-PROT and the corresponding protein sequence entry names.

2) If the protein is associated with a genetic defect or disease, this has
   been indicated in the CC lines using the "DISEASE" topic.

   Examples:

   CC   -!- DISEASE: THIS ENZYME IS DEFICIENT IN TWO GENETIC DISEASES: THE
   CC       LESCH-NYHAN SYNDROME, IN WHICH THERE IS NO ENZYME ACTIVITY; AND
   CC       HYPERURICEMIA WITH AN EARLY ONSET OF GOUT, IN WHICH THERE IS
   CC       PARTIAL ENZYME ACTIVITY.

   CC   -!- DISEASE: DEFICIENCY OF THE ENZYME CAUSES PHENYLKETONURIA (PKU),
   CC       THE MOST COMMON INBORN ERROR OF AMINO ACID METABOLISM.

3) If variants of the sequences are known, they have been indicated in the
   feature table using the "VARIANT" key.

   Example:

   FT   VARIANT     103    103       S -> R (GOUT MUNICH).

The following is an example of a SWISS-PROT sequence entry which contains
all three types of MIM-related enhancements described above.

ID   CAH2$HUMAN     STANDARD;      PRT;   259 AA.
AC   P00918;
DT   21-JUL-1986 (REL. 01, CREATED)
DT   21-JUL-1986 (REL. 01, LAST SEQUENCE UPDATE)
DT   01-NOV-1990 (REL. 16, LAST ANNOTATION UPDATE)
DE   CARBONIC ANHYDRASE II (EC 4.2.1.1) (CARBONATE DEHYDRATASE II) (GENE
DE   NAME: CA2).
OS   HUMAN (HOMO SAPIENS).
OC   EUKARYOTA; METAZOA; CHORDATA; VERTEBRATA; TETRAPODA; MAMMALIA;
OC   EUTHERIA; PRIMATES.
RN   [1] (SEQUENCE FROM N.A.)
RA   MONTGOMERY J.C., VENTA P.J., TASHIAN R.E., HEWETT-EMMETT D.;
RL   NUCLEIC ACIDS RES. 15:4687-4687(1987).
RN   [2] (SEQUENCE FROM N.A.)
RA   MURAKAMI H., MARELICH G.P., GRUBB J.H., KYLE J.W., SLY W.S.;
RL   GENOMICS 1:159-166(1987).
RN   [3] (SEQUENCE)
RA   HENDERSON L.E., HENRIKSSON D., NYMAN P.O.;
RL   J. BIOL. CHEM. 251:5457-5463(1976).
RN   [4] (SEQUENCE)
RA   LIN K.-T.D., DEUTSCH H.F.;
RL   J. BIOL. CHEM. 249:2329-2337(1974).
RN   [5] (SEQUENCE OF 1-76 FROM N.A.)
RA   VENTA P.J., MONTGOMERY J.C., HEWETT-EMMETT D., TASHIAN R.E.;
RL   BIOCHIM. BIOPHYS. ACTA 826:195-201(1985).
RN   [6] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA   LILJAS A., KANNAN K.K., BERGSTEN P.-C., WAARA I., FRIDBORG K.,
RA   STRANDBERG B., CARLBOM U., JARUP L., LOVGREN S., PETEF M.;
RL   NATURE NEW BIOL. 235:131-137(1972).
RN   [7] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA   ERIKSSON A.E., JONES T.A., LILJAS A.;
RL   PROTEINS 4:274-282(1988).
RN   [8] (X-RAY CRYSTALLOGRAPHY, 2.0 ANGSTROMS)
RA   ERIKSSON A.E., KYLSTEN P.M., JONES T.A., LILJAS A.;
RL   PROTEINS 4:283-293(1988).
RN   [9] (JOGJAKARTA VARIANT)
RA   JONES G.L., SOFRO A.S.M., SHAW D.C.;
RL   BIOCHEM. GENET. 20:979-1000(1982).
RN   [10] (MELBOURNE VARIANT)
RA   JONES G.L., SHAW D.C.;
RL   HUM. GENET. 63:392-399(1983).
CC   -!- CATALYTIC ACTIVITY: H(2)CO(3) = CO(2) + H(2)O (REVERSIBLE
CC       HYDRATION OF CARBON DIOXIDE).
CC   -!- THERE ARE AT LEAST 6 ENZYMATIC FORMS OF CARBONIC ANHYDRASE: CA-I
CC       (OR B), CA-II (OR C), CA-III (OR M), CA-IV, CA-V AND CA-VI.
CC   -!- DISEASE: DEFECTS IN CA2 ARE THE CAUSE OF OSTEOPETROSIS WITH RENAL
CC       TUBULAR ACIDOSIS (MARBLE BRAIN DISEASE).
DR   EMBL; Y00339; HSCA2.
DR   EMBL; X03251; HSCAII.
DR   EMBL; J03037; HSCAIIA.
DR   PIR; A01141; CRHU2.
DR   PIR; A23202; A23202.
DR   PIR; A27175; A27175.
DR   PDB; 1CA2; 15-JAN-90.
DR   PDB; 2CA2; 15-APR-90.
DR   PDB; 3CA2; 15-APR-90.
DR   MIM; 259730; NINTH EDITION.
KW   LYASE; ACETYLATION; ZINC; 3D-STRUCTURE.
FT   INIT_MET      0      0
FT   MOD_RES       1      1       ACETYLATION.
FT   ACT_SITE     63     63
FT   ACT_SITE     66     66
FT   METAL        93     93       ZINC, CATALYTIC.
FT   METAL        95     95       ZINC, CATALYTIC.
FT   METAL       118    118       ZINC, CATALYTIC.
FT   ACT_SITE    126    126
FT   ACT_SITE    196    198
FT   VARIANT      17     17       K -> E (JOGJAKARTA).
FT   VARIANT     235    235       P -> H (MELBOURNE).
FT   VARIANT     251    251       N -> D.
SQ   SEQUENCE   259 AA;  29115 MW;  365693 CN;
     SHHWGYGKHN GPEHWHKDFP IAKGERQSPV DIDTHTAKYD PSLKPLSVSY DQATSLRILN
     NGHAFNVEFD DSQDKAVLKG GPLDGTYRLI QFHFHWGSLD GQGSEHTVDK KKYAAELHLV
     HWNTKYGDFG KAVQQPDGLA VLGIFLKVGS AKPGLQKVVD VLDSIKTKGK SADFTNFDPR
     GLLPESLDYW TYPGSLTTPP LLECVTWIVL KEPISVSSEQ VLKFRKLNFN GEGEPEELMV
     DNWRPAQPLK NRQIKASFK
//

In ENZYME we have added a "DI" (DIsease) line for all enzymes which are
known to be associated with a genetic defect, as shown in the following
example:

DI   PHENYLKETONURIA; MIM:261600.

Here is an example of an ENZYME entry with a DI line:

ID   4.2.1.1
DE   CARBONIC DEHYDRATASE.
AN   CARBONIC ANHYDRASE.
CA   H(2)CO(3) = CO(2) + H(2)O.
CF   ZINC.
DI   OSTEOPETROSIS-RENAL TUBULAR ACIDOSIS SYNDROME; MIM:259730.
DR   P00917, CAH1$HORSE;  P00915, CAH1$HUMAN;  P00916, CAH1$MACMU;
DR   P13634, CAH1$MOUSE;  P07452, CAH1$RABIT;  P00921, CAH2$BOVIN;
DR   P07630, CAH2$CHICK;  P00918, CAH2$HUMAN;  P00920, CAH2$MOUSE;
DR   P00919, CAH2$RABIT;  P00922, CAH2$SHEEP;  P07450, CAH3$HORSE;
DR   P07451, CAH3$HUMAN;  P16015, CAH3$MOUSE;  P14141, CAH3$RAT ;
DR   P18915, CAH6$BOVIN;  P18761, CAH6$MOUSE;  P08060, CAH6$SHEEP;
DR   P17067, CAHC$PEA  ;  P16016, CAHC$SPIOL;
//

=============================================================================
=============================================================================
Section: 4
Title  : Biomolecular databases integration: current status.

In the last six months there have been a number of developments in the
integration of biomolecular databases:

1) The EMBL Nucleotide Sequence Database is now fully cross-referenced to
   SWISS-PROT.

2) SWISS-PROT and ENZYME are now cross-referenced to MIM (see section 3 of
   this newsletter).

3) Cross-references have been added in SWISS-PROT to REBASE, the type II
   restriction enzymes data base.

4) The new release (9012) of the Drosophila Genetic Maps (DMAP) database
   from Michael Ashburner (Cambridge / U.K.) is now cross-referenced to
   EMBL/GenBank, SWISS-PROT and PIR.
5) The new release (2.0) of the Transcription Factors Database (TFD) from
   David Ghosh (NCBI / USA) is now cross-referenced to EMBL/GenBank,
   SWISS-PROT and PIR.

The current status of the relationships between the biomolecular databases
is shown in the following schematic:

                                                       *********************
                        *********************** <----- * EPD [Promoters]   *
                        *   EMBL Nucleotide   *        *********************
*****************       *    Sequence Data    *
*     DMAP      * ----> *       Library       *        *********************
* [Drosophila   *       *********************** <----- * ECD [E.coli]      *
* Genetic maps] *              ^    |     ^            *********************
***************** -------+     |    |     |
                         |     |    |     |            *********************
Version: Jan. 10         |     |    |     +----------- * TFD [Trans.fact.] *
         1991            |     |    |     |            *********************
                         |     |    |     |
*****************        v     |    v     v            *********************
*    PROSITE    * <---- *********************** <----- * ENZYME [Nomencl.] *
*  [Patterns]   * ----> *     SWISS-PROT      *        *********************
*****************       *  Protein Sequence   *                  |
                        *      Data Bank      *                  |
*****************       ***********************                  v
*    REBASE     *         |        |       |           *********************
* [Restriction  * <-------+        |       +---------> * OMIM [Diseases]   *
*  enzymes]     *                  |                   *********************
*****************                  v
                        ***********************
                        * PDB [3D structures] *
                        ***********************

We believe that it is now possible for software developers to start
building hypertext-oriented software packages that can navigate between the
different biomolecular data banks.

=============================================================================
=============================================================================
Section: 5
Title  : NCBI, the GenInfo Backbone Database, and the ASN.1 syntax.

The National Center for Biotechnology Information (NCBI), at the National
Library of Medicine (NLM) (Washington D.C.), is involved in the development
of a database building system that addresses the problems of integrated
information as well as currency and accessibility.
One of their projects is the production of an integrated nucleic acid and
protein sequence database, called the GenInfo Backbone Database
('Backbone'), that accurately reflects the journal literature. The Backbone
will include all protein sequences of at least three amino acids and
nucleotide sequences of at least nine bases. The annotations provided by
the Backbone are minimal; it is meant to reflect the data presented in the
scientific literature, not to model biological reality. The Backbone is a
database which will, hopefully, help to build and maintain fully annotated
databases such as SWISS-PROT, PROSITE or ENZYME.

As the Backbone is a database on which to build other databases, the NCBI
had to select a reliable data exchange standard to facilitate the exchange
of information between biomolecular databases. The standard which has been
chosen is called ASN.1 (Abstract Syntax Notation 1), also known as ISO
8824. ASN.1 is specifically designed to allow a formal, precise definition
of what is exchanged between two applications without specifying how it is
to be represented or used by either application.

The NCBI is also committed to developing and distributing a software
toolbox that will help software and database developers to interact with
the ASN.1 notation and the Backbone.

As a user of biomolecular databases you will probably not have to deal
directly with the Backbone or with the ASN.1 format, unless you want to
develop a new specialized biomolecular database. You should nevertheless be
aware of the existence of such projects and of the many positive
consequences for the scientific community of such an endeavor, if it is
successful.

As we believe in the scientific validity and relevance of these projects we
have decided to participate.
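
To make the idea of one abstract definition with several concrete
representations a little more tangible, here is a minimal Python sketch. It
is only an illustration and is not part of any NCBI toolbox: the class name
ECNumber and its method names are hypothetical, and the two output layouts
simply imitate the dotted EC number of the ENZYME flat file and the
'ecnumb' value shown in the ASN.1 example of section 8.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ECNumber:
    """Abstract content of an EC number: four integers, no layout implied.

    Hypothetical helper for illustration only.
    """
    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: int

    @classmethod
    def from_flat(cls, text: str) -> "ECNumber":
        """Parse the dotted form used in the flat file, e.g. '1.4.3.14'.

        Assumption: all four fields are plain integers; anything else is
        rejected rather than guessed at.
        """
        parts = text.strip().split(".")
        if len(parts) != 4:
            raise ValueError(f"expected four dot-separated fields: {text!r}")
        return cls(*(int(p) for p in parts))

    def to_flat(self) -> str:
        """Render the value in the dotted flat-file style."""
        return (f"{self.enzyme_class}.{self.subclass}."
                f"{self.sub_subclass}.{self.serial}")

    def to_asn1_value(self) -> str:
        """Render the same value in the style of the 'ecnumb' field."""
        return ("{ class %d , subclass %d , sub-subclass %d , serial-numb %d }"
                % (self.enzyme_class, self.subclass,
                   self.sub_subclass, self.serial))


if __name__ == "__main__":
    ec = ECNumber.from_flat("1.4.3.14")
    print(ec.to_flat())
    print(ec.to_asn1_value())
```

The four integers are the information being exchanged; the dotted string
and the braced value are just two ways of writing it down, and a program
that holds the abstract value can produce either one.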
Our participation will take at least two forms: we will provide SWISS-PROT,
ENZYME, and PROSITE in the ASN.1 syntax (the existing format will not be
discontinued) and we will start to use the Backbone as a source of primary
(literature) data for SWISS-PROT.

As a first step we have produced an ASN.1 specification for the ENZYME data
bank and will soon start to distribute an ASN.1 version of that database
(see section 8 of this newsletter).

=============================================================================
=============================================================================
Section: 6
Title  : PROSITE news.

1) Release 6.0
==============

Release 6.0 of PROSITE contains 375 documentation chapters that describe
433 different patterns. Since release 5.1, 77 new chapters have been added
and 131 have been updated. Release 6.0 is fully cross-referenced with
release 16 of SWISS-PROT. There have been no changes in the format of the
files of the data base.

2) Future developments
======================

 - Release 6.10 will come out in March 1991 with release 17 of SWISS-PROT.
   As was the case for release 5.10, it will not be a "real" update; it
   will only update pointers to SWISS-PROT for sequence entries whose names
   have been modified from release 16 to 17.

 - Release 7.0 will come out with release 18 of SWISS-PROT in early summer
   1991. There will be lots of new pattern entries.
   We can already announce the following ones (as they are either ready or
   being written):

    - 6-phosphogluconate dehydrogenase signature
    - Catalase signatures
    - Peroxidases signature
    - Acyltransferases ChoActase / COT / CPT-II family signatures
    - Chalcone synthase and resveratrol synthase signature
    - Glutamine amidotransferases class-I active site
    - Glutamine amidotransferases class-II active site
    - Polyprenyl synthetases signature
    - Eukaryotic RNA polymerases 30 to 40 Kd subunits signature
    - Prokaryotic carbohydrate kinases signature
    - DNA polymerase family A signature
    - Clostridium cellulases repeated domain signature
    - ATP synthase a subunit signature
    - Aconitase signature
    - Guanylate cyclases signature
    - FKBP peptidyl-prolyl cis-trans isomerase signatures
    - Sodium symporters signatures
    - Natriuretic peptides receptors signature
    - PF4/IL-8 cytokines signatures
    - Myotoxins signature
    - Pathogenesis-related proteins BetvI family signature

We have a large (and growing) list of new patterns to add. Some of those
that are currently in the `pipeline' are listed below.

    - SH2 and SH3 domains
    - Animal lectin domain
    - Bacterial sensory transduction proteins signatures
    - Alpha-macroglobulin family signature
    - Clusterins signature
    - Plants 2S seed storage proteins signature
    - TNF/NGF receptors family signature
    - Small heat shock proteins (HSP20).

But this is far from being a complete list!

We have not yet received any matrices from any source, so the introduction
of matrices in PROSITE is probably not for release 7.0.

3) On-line experts
==================

We have added, in the PROSITE documentation file (PROSITE.DOC), the email
addresses of experts in specific fields. This information is present in the
following format:

-Expert(s) to contact by email: Name X.Y.
                                name@location.network

As you can see from the following table, our current list of experts is
still very small, so I would like again to call for volunteers (the
`requirements' to be fulfilled to become an on-line expert are listed at
the end of this section). Please don't be shy!

Field of expertise          Name                Email address
--------------------------- ------------------  --------------------------
Alcohol dehydrogenases      Bengt P.            bengt@medfys.ki.se
Aldehyde dehydrogenases     Bengt P.            bengt@medfys.ki.se
Apolipoproteins             Boguski M.S.        boguski@ncbi.nlm.nih.gov
Arrestins                   Kolakowski L.F. Jr. lfk@athena.mit.edu
Bacteriophage P4            Halling C.          chh9@midway.uchicago.edu
Beta-lactamases             Brannigan J.        bafm1@cluster.sussex.ac.uk
Chitinases                  Henrissat B.        cermav@frgren81.bitnet
CTF/NF-I                    Mermod N.           nmermod@clsuni51.bitnet
EF-hand calcium-binding     Cox J.A.            cox@cgeuge52.bitnet
                            Kretsinger R.H.     rhk5i@virginia.bitnet
Glucanases                  Henrissat B.        cermav@frgren81.bitnet
                            Beguin P.           phycel@pasteur.bitnet
Eryf1-type zinc-fingers     Boguski M.S.        boguski@ncbi.nlm.nih.gov
G-protein coupled receptors Chollet A.          chollet@clients.switch.ch
Inorganic pyrophosphatases  Kolakowski L.F. Jr. lfk@athena.mit.edu
Integrases                  Roy P.H.            2020000@lavalvx1.bitnet
Protein kinases             Hanks S.            hanks@vuctrvax
Restriction-modification    Bickle T.           bickle@urz.unibas.ch
                            Roberts R.J.        roberts@cshl.org
Ring-cleavage dioxygenases  Harayama S.         harayama@cgecmu51.bitnet
Subtilisin family proteases Brannigan J.        bafm1@cluster.sussex.ac.uk
Thiol proteases             Turks B.            turk@ijs.ac.mail.yu
Thiol proteases inhibitors  Turks B.            turk@ijs.ac.mail.yu
TPR repeats                 Boguski M.S.        boguski@ncbi.nlm.nih.gov
Transit peptides            von Heijne G.       gunnar@cbts.sunet.se
Type-II membrane antigens   Levy S.             levy@cellbio.stanford.edu

Requirements to fulfill to become an on-line expert
===================================================

An expert should be a scientist who works with one or more specific
families of proteins (or specific domains) and who would:

a) Review the protein sequences in SWISS-PROT and the patterns/matrices in
   PROSITE relevant to their field of research.

b) Agree to be contacted by people who have obtained new sequence(s) which
   seem to belong to "their" family(ies) of proteins.

c) Have access to electronic mail and be willing to use it to send and
   receive data.

If you are willing to be part of this scheme please contact me (but please,
by email exclusively!).

=============================================================================
=============================================================================
Section: 7
Title  : Updated list of public domain programs which make use of PROSITE.

I have been made aware of the development of the following public domain
software packages that make use of PROSITE.

1) MacPattern
=============

Apple Macintosh application. Offers features like a pattern list for
pattern selection, direct access to documentation in PROSITE, pattern sets,
pattern entering by keyboard, etc. It can read SWISS-PROT, PIR, DNA
Strider, DNAid, Pearson and plain ASCII sequences. MacPattern can also use
any other pattern database adhering to the PROSITE syntax, even DNA
patterns. No special hardware or software is required.

Contact  : Rainer Fuchs
           fuchs@embl.bitnet
Version  : 1.1
Available: On the EMBL File Server: MAC_SOFTWARE:MACPATTERN.HQX

2) Scrutineer
=============

SCRUTINEER is a sophisticated pattern searching and database analysis
program written by Peter Sibbald at EMBL. The program is written in Pascal
and comes complete with source, manual and on-line help. SCRUTINEER is
described in the following reference:

Sibbald P.R., Argos P.
"Scrutineer: a computer program that flexibly seeks and describes motifs
and profiles in protein sequence databases."
CABIOS 6:279-288(1990).

SCRUTINEER works on VAXes, and apparently can be made to run on UNIX
systems. The November 1990 version of SCRUTINEER adds, among other
enhancements, the possibility of searching for all of the PROSITE patterns
in one or more protein sequences.

Contact  : Peter Sibbald
           sibbald@embl.bitnet
Version  : Nov. 1990
Available: On the EMBL File Server: VAX_SOFTWARE:MACPATTERN.UAA

3) ProSearch
============

A program, written mostly in AWK, that runs under Unix and that will search
a protein sequence for all of the PROSITE patterns. Note: it will also run
under MS-DOS and VMS if you have access to a public domain or commercial
version of AWK on such systems.

Contact  : Lee F. Kolakowski
           lfk@athena.mit.edu
Version  : 1.1
Available: On the EMBL File Server: UNIX_SOFTWARE:PROSEARCH.UUE

4) CREGEX
=========

CREGEX creates, from the native PROSITE data bank, a file containing valid
AWK regular expressions that can then be used with the ProSearch program.

Contact  : Jack Leunissen
           jackl@caos.caos.kun.nl
Version  : 1.1
Available: On the EMBL File Server: UNIX_SOFTWARE:CREGEX.C

5) PROINDEX
===========

VAX-Fortran program to create an index built from the information stored in
the DE lines of the PROSITE.DAT file.

Contact  : Steve Clark
           clark@utoroci.bitnet or clark@mshri.utoronto.ca
Available: On the EMBL File Server: VAX_SOFTWARE:PROINDEX.UUE

6) PROSITEC
===========

VAX-Pascal program to convert the PROSITE files into GCG FIND-format.

Contact  : Kay Hofmann
           akc01@dk0rrzk1.bitnet
Version  : 1.1
Available: On the EMBL File Server: VAX_SOFTWARE:PROSITEC.UUE

7) ProDoc
=========

VAX program for the GCG package to display documentation entries in the
PROSITE.DOC file, given a documentation entry number.

Contact  : Anne Marie Quinn
           quinn@salk.bitnet
Available: By anonymous ftp on: SALK-SC2.SDSC.EDU

8) BISANCE system
=================

A program to interrogate PROSITE is available on-line on the BISANCE system
of the French CITI2 biocomputing resource.

Contact  : Philippe Dessen
           dessen@frciti51.bitnet

=============================================================================
=============================================================================
Section: 8
Title  : ENZYME news.

There are a few things we want to point out about release 3.0 of ENZYME as
well as about future releases.

1) Completeness
===============

Currently the data bank contains full information about the recommended
name, alternative name(s), catalytic activity, and cofactor(s) of ALL 3071
enzymes. The ENZYME data bank can now be considered as fully operational.

2) The DI line
==============

As described in section 3 of this newsletter, a new line type 'DI'
(= DIsease) was implemented (starting with release 2.0) so as to add
cross-references to MIM (Mendelian Inheritance in Man). The precise format
of the DI line is:

DI   DISEASE_NAME; MIM:NUMBER.

Where 'NUMBER' is the MIM catalog number of the disease (or phenotype).

Examples:

DI   XANTHINURIA; MIM:278300.
DI   PHENYLKETONURIA; MIM:261600.

3) Future releases
==================

Until new enzyme nomenclature data is published, we only plan to update the
SWISS-PROT pointers at each release of the protein sequence data bank,
correct any errors, and complete the information concerning synonyms and
cofactors using the literature.

4) An ASN.1 version of ENZYME
=============================

We will soon start to distribute a version of ENZYME in the ASN.1 syntax
which has been selected by the NCBI to facilitate the exchange of
information between biomolecular databases (see section 5 of this
newsletter). We will continue to distribute ENZYME in its current format,
but there will be two additional files:

   ECSPEC.ASN: ENZYME database ASN.1 specification.
               This file describes the syntax used by the ASN.1 version of
               the ENZYME data base.
   ENZYME.ASN: ENZYME database in ASN.1 notation.

We will not list here the full ENZYME database ASN.1 specification, but
just to give you a "flavor" of ASN.1, here is an example of an entry in
both the original and the ASN.1 format:

ID   1.4.3.14
DE   L-LYSINE OXIDASE.
AN   LYSYL OXIDASE.
CA   L-LYSINE + O(2) + H(2)O = 2-OXO-6-AMINOHEXANOATE + NH(3) + H(2)O(2).
CF   COPPER; PQQ.
CC   -!- ALSO ACTS, MORE SLOWLY, ON L-ORNITHINE, L-PHENYLALANINE, L-ARGININE,
CC       AND L-HISTIDINE.
DI   CUTIS LAXA (EHLERS-DANLOS SYNDROME IX); MIM:304150.
DI   LYSINE INTOLERANCE; MIM:247900.
DR   P16636, LYOX$RAT ;
//

This entry is represented in the ASN.1 notation, following the
specifications that we have developed for it, by:

Enzyme-activity ::= {
  ecnumb {
    class 1 ,
    subclass 4 ,
    sub-subclass 3 ,
    serial-numb 14 } ,
  status
    data {
      name "L-LYSINE OXIDASE." ,
      synonyms {
        "LYSYL OXIDASE." } ,
      reaction
        reac-equa {
          left {
            { stoich "1" , compound { chem-name "L-LYSINE" } } ,
            { stoich "1" , compound { chem-name "O(2)" } } ,
            { stoich "1" , compound { chem-name "H(2)O" } } } ,
          right {
            { stoich "1" , compound { chem-name "2-OXO-6-AMINOHEXANOATE" } } ,
            { stoich "1" , compound { chem-name "NH(3)" } } ,
            { stoich "1" , compound { chem-name "H(2)O(2)" } } } } ,
      cofactors {
        { chem-name "COPPER" } ,
        { chem-name "PQQ" } } ,
      comments {
        "ALSO ACTS, MORE SLOWLY, ON L-ORNITHINE, L-PHENYLALANINE, L-ARGININE, AND L-HISTIDINE."
      } ,
      disease {
        { disease-name "CUTIS LAXA (EHLERS-DANLOS SYNDROME IX)",
          MIM-numb 304150 } ,
        { disease-name "LYSINE INTOLERANCE",
          MIM-numb 247900 } } ,
      x-ref {
        { db-name "SPROT", ident-1 "P16636", ident-2 "LYOX$RAT" } } } }

=============================================================================
=============================================================================
Section: 9
Title  : Specialized databases part 1: the P450 database.

We will use this section to describe specialized biomolecular databases
which, in our opinion, are important, yet not very well known. In this
first issue we briefly describe:

                     ********************************
                     * The cytochrome P450 database *
                     ********************************

Produced by the group of Alexander Archakov at the Institute of Biological
and Medical Chemistry of the USSR Academy of Medical Sciences in Moscow,
this database contains a wealth of information on cytochromes P450: names,
sequences, genome location, inducers, substrates, etc. The database
supplements the book of A.I. Archakov and G.I. Bachmanova: "Cytochrome
P-450 and active oxygen", published by Taylor and Francis Ltd in 1990.

The database is distributed, for MS/PC-DOS based systems, in two forms: the
first one, called DBCPD, runs under dBase III plus; the second one, called
RBCPD, runs under Rbase. Both forms are menu-driven and are very easy to
use.

The group of Archakov can be contacted at the following address:

   Prof. A.I. Archakov
   Institute of Biological and Medical Chemistry
   USSR Academy of Medical Sciences
   Pogodinskaya str. 10
   119838 Moscow
   USSR

   Fax: (+7) (095) 938 21 23
        (+7) (095) 245 08 57

=============================================================================
====== End of GePSAN Newsletter Volume 1 - Number 1 =========================