Posts

Showing posts from December, 2025

Nucleic Acid databases

Nucleotide sequence databases are data repositories that accept nucleic acid sequence data and make it freely available to the public. The data in these repositories are heterogeneous with respect to the source of material, quality, annotation, and intended completeness of the sequence relative to its biological target.

Sequence databases are of two types:
1) Primary sequence databases: GenBank, EMBL, DDBJ, TrEMBL
2) Secondary sequence databases: Swiss-Prot, PROSITE, PDB

The International Nucleotide Sequence Database Collaboration consists of three databases: GenBank, EMBL, and DDBJ. These three databases exchange and update data on a daily basis to stay synchronized.

GenBank
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. It is a primary nucleotide database. GenBank is accessed and searched through the Entrez gateway at NCBI. User can ...

SWISS-PROT

SWISS-PROT is a secondary database that provides detailed sequence annotation, including structure, function, and protein family assignment. It was established in 1986 and is maintained collaboratively by the SIB (Swiss Institute of Bioinformatics) and EBI/EMBL. It aims to be minimally redundant and is linked to many other resources, including other sequence databases. The sequence data are mainly derived from TrEMBL, a database of translated nucleic acid sequences stored in the EMBL database. The annotation of each entry is curated by human experts and is therefore of good quality. It covers protein function, domain structure, catalytic sites, cofactor binding, post-translational modifications, variants, metabolic pathway information, disease association, and similarity with other sequences.

Multiple Sequence Alignment using CLUSTAL W

Aim
To show the phylogenetic relationships of sequences by creating a tree.

Description
A multiple sequence alignment is an alignment that contains more than two sequences. It is important for finding similar domains in a set of sequences and for subsequent phylogenetic analysis. There are two methods of multiple sequence alignment: progressive and iterative. CLUSTAL W is an example of the progressive method. It produces multiple sequence alignments of divergent sequences. Evolutionary relationships are shown through a cladogram.

Procedure
STEP 1: Obtain a sequence from NCBI for multiple sequence alignment. Go to the NCBI homepage, select the nucleotide/protein database, and type the query. Select the hit in FASTA format for the similarity search.
STEP 2: Select the BLASTn option from NCBI BLAST.
STEP 3: Run BLAST.
STEP 4: Select three or four sequences similar to the query and download them in FASTA format.
STE...
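The sequences downloaded in STEP 4 are plain FASTA text, so a quick check can be scripted before or after running the alignment. The sketch below is only an illustration: it parses FASTA text and reports percent identity between rows that are already aligned to equal length. It is not CLUSTAL W's own algorithm, and the sequence names and residues in `EXAMPLE` are hypothetical.

```python
from itertools import combinations


def parse_fasta(text):
    """Return {identifier: sequence} for FASTA-formatted text."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()[0]  # keep only the first word of the header
            records[header] = ""
        elif header is not None:
            records[header] += line.upper()
    return records


def percent_identity(a, b):
    """Percent of identical columns between two equal-length aligned rows.

    Columns where both rows contain a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def pairwise_identities(records):
    """Percent identity for every pair of records."""
    return {(p, q): percent_identity(records[p], records[q])
            for p, q in combinations(records, 2)}


# Hypothetical, already-aligned sequences used only for illustration.
EXAMPLE = """>seq1 hypothetical
ATGCATGC
>seq2 hypothetical
ATGCATGA
>seq3 hypothetical
ATG-ATGA
"""

if __name__ == "__main__":
    for pair, identity in pairwise_identities(parse_fasta(EXAMPLE)).items():
        print(pair, f"{identity:.1f}%")
```

Pairs with higher identity would be expected to sit closer together in the resulting tree, which gives a rough sanity check on the cladogram.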

RASMOL/RASWIN

Show information and Background

Aim
To display information about the selected protein and to change the background color.

Description
RasMol is a computer program written for molecular graphics visualization, intended and used mainly to depict and explore biological macromolecule structures such as those found in the Protein Data Bank. It was originally developed by Roger Sayle in the early 1990s. RasMol includes a scripting language for functions such as selecting certain protein chains and changing colours. The Jmol and Sirius programs have incorporated this language into their commands.

Procedure
STEP 1: Open RasMol.
STEP 2: Open a new PDB file of the protein.

Command
RasMol> show information
RasMol> background white

Output (take print)

Result
Information about the selected protein was displayed and the background color was changed to white.

Show Sequenc...

SNP

Aim
To retrieve single nucleotide polymorphisms (SNPs) of the given gene.

Description
Single nucleotide polymorphisms, frequently called SNPs, are the most common type of genetic variation among people. A SNP is a variation in a single nucleotide that occurs at a specific position in the genome. SNPs occur normally throughout a person's DNA. They can act as biological markers, helping scientists to locate genes that are associated with disease.

Procedure
STEP 1: Open https://www.ncbi.nlm.nih.gov/snp/
STEP 2: Select SNP from the dropdown list, type the gene name in the search box, and click Go.
STEP 3: Select three hits from the displayed gene SNPs.
STEP 4: Note down the accession number, chromosome number, allele number, and clinical significance.
STEP 5: Save the page.
STEP 6: Close the window.

Result
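The same search can also be expressed as a request to NCBI's E-utilities instead of the web form. The sketch below only builds the request URL and does not contact the server. The gene name `BRCA1` and the helper name `snp_search_url` are assumptions for illustration, and the free-text `term` is a simplification of what the web search box does.

```python
from urllib.parse import urlencode

# ESearch endpoint of NCBI E-utilities.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def snp_search_url(gene, max_hits=3):
    """Build an ESearch URL that looks up a gene name in the SNP database.

    max_hits mirrors STEP 3 of the procedure (three hits).
    """
    params = {
        "db": "snp",        # search dbSNP
        "term": gene,       # free-text query; hypothetical gene name supplied by the caller
        "retmax": max_hits, # number of identifiers to return
        "retmode": "json",
    }
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(snp_search_url("BRCA1"))
```

Opening the printed URL in a browser returns a list of identifiers; the details noted in STEP 4 still have to be read from each record.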

KEGG

Aim
To retrieve the "cysteine metabolism" pathway from Oryza sativa.

Description
KEGG (Kyoto Encyclopedia of Genes and Genomes) is a database resource for understanding high-level functions and utilities of the biological system, such as the cell, the organism, and the ecosystem, from genomic and molecular-level information. The most distinctive data objects in KEGG are the molecular networks: molecular interaction, reaction, and relation networks representing systemic functions of the cell and the organism. The KEGG database has been in development by Kanehisa Laboratories since 1995 and is now a prominent reference knowledge base for the integration and interpretation of large-scale molecular data sets generated by genome sequencing and other high-throughput experimental technologies.

Procedure
STEP 1: Access https://www.genome.jp/kegg/
STEP 2: Select KEGG pathway.
STEP 3: Enter the organism name and "cysteine metabolism" as the keyword.
STEP 4: Click g...
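KEGG also exposes a REST interface, so the keyword search can be written as URLs rather than clicks. The sketch below only builds the URLs and makes no network call. The organism code `osa` (Oryza sativa japonica) and the pathway entry `osa00270` are assumptions made for illustration and should be confirmed against the search results.

```python
KEGG_REST = "https://rest.kegg.jp"


def kegg_find_pathway_url(keywords):
    """URL that searches pathway names for the given keywords.

    Multiple keywords are joined with '+', as in KEGG's own find examples.
    """
    return f"{KEGG_REST}/find/pathway/{'+'.join(keywords.split())}"


def kegg_list_pathways_url(organism_code):
    """URL that lists all pathways for one organism code."""
    return f"{KEGG_REST}/list/pathway/{organism_code}"


def kegg_get_url(entry_id):
    """URL that retrieves one database entry as a flat file."""
    return f"{KEGG_REST}/get/{entry_id}"


if __name__ == "__main__":
    print(kegg_find_pathway_url("cysteine metabolism"))
    print(kegg_list_pathways_url("osa"))  # assumed code for Oryza sativa japonica
    print(kegg_get_url("osa00270"))       # assumed entry for the cysteine pathway
```

Keeping URL construction separate from fetching makes the query easy to inspect in a browser before it is automated.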

PIR

Aim
To retrieve the amino acid sequence for heat shock protein HSP70 in tomato.

Description
The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource to support genomic and proteomic research and scientific studies. PIR was established in 1984 by the National Biomedical Research Foundation (NBRF) as a resource to assist researchers and consumers in the identification and interpretation of protein sequence information. Prior to that, the NBRF compiled the first comprehensive collection of macromolecular sequences in the Atlas of Protein Sequence and Structure, published from 1965 to 1978 under the editorship of Margaret Dayhoff. Dr. Dayhoff and her research group pioneered the development of computer methods for the comparison of protein sequences, for the detection of distantly related sequences and duplications within sequences, and for the inference of evolutionary histories from al...

CATH

Aim
To retrieve the structural neighbourhood of a given protein.

Description
CATH classifies proteins based on the automated structural alignment program SSAP as well as manual comparison. Structural domain separation is also carried out through the combined effort of a human expert and computer programs. Individual domain structures are classified at five major levels: class, architecture, fold/topology, homologous superfamily, and homologous family. The definition of class in CATH is similar to that in SCOP and is based on secondary structure content. Architecture is a level unique to CATH, intermediate between fold and class. It describes the overall packing and arrangement of secondary structure. The topology level is equivalent to the fold level in SCOP; it describes the overall orientation of the secondary structure and takes into account the sequence connectivity between secondary structure elements. The homologous superfamily and homologous family are equivalent to the superfamily and family leve...
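CATH writes a domain's position in this hierarchy as a dotted numeric code, one number per level. The sketch below splits such a code into the first four levels described above (class, architecture, topology, homologous superfamily) and lists its parent nodes. The example code `3.40.50.300` is used only as an illustration, and the helper names are hypothetical.

```python
from typing import NamedTuple

# Names of the main CATH classes, keyed by class number.
CATH_CLASSES = {
    1: "Mainly Alpha",
    2: "Mainly Beta",
    3: "Alpha Beta",
    4: "Few Secondary Structures",
}


class CathNode(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    superfamily: int


def parse_cath_id(node_id):
    """Split a four-part CATH code such as '3.40.50.300' into its levels."""
    parts = node_id.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers (C.A.T.H)")
    return CathNode(*(int(p) for p in parts))


def lineage(node):
    """Codes of every level from class down to homologous superfamily."""
    c, a, t, h = node
    return [f"{c}", f"{c}.{a}", f"{c}.{a}.{t}", f"{c}.{a}.{t}.{h}"]


if __name__ == "__main__":
    node = parse_cath_id("3.40.50.300")
    print(CATH_CLASSES[node.cath_class], lineage(node))
```

Two domains that share a longer prefix of this code sit closer together in the classification, which is one simple way to read a structural neighbourhood.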

SCOP

Aim
To learn about the SCOP database.

Description
SCOP2 is a successor to the Structural Classification of Proteins (SCOP) database. Like SCOP, SCOP2 organizes structurally characterized proteins according to their structural and evolutionary relationships. Its main focus is on knowledge-based expert analysis and classification of proteins that are structurally characterized and deposited in the Protein Data Bank. The relationships in SCOP2 fall into four major categories: Protein types, Evolutionary events, Structural classes, and Protein relationships. The first two categories do not have counterparts in SCOP.

Procedure
STEP 1: Access https://scop2.mrc-lmb.cam.ac.uk/
STEP 2: Type the protein of interest (kinase) and click search.
STEP 3: Note down the fold, superfamily, and family.
STEP 4: Select any one of the hits from the family.
STEP 5: Select any one of the hits from the selected family and note down domain, species and ...

PDB

Aim
To retrieve structure information for a protein sequence.

Description
PDB is a worldwide central repository of structural information on biological macromolecules and is currently managed by the Research Collaboratory for Structural Bioinformatics (RCSB). In addition, the PDB website provides a number of services for structure submission and for data searching and retrieval. PDB is a centralized database for three-dimensional structures of biological macromolecules. It archives atomic coordinates of macromolecules (both proteins and nucleic acids) determined by X-ray crystallography and NMR. It uses a flat file format to represent protein name, authors, experimental details, secondary structure, cofactors, and atomic coordinates. The web interface of PDB also provides viewing tools for simple image manipulation. The coordinate information is required to be deposited in the Protein Data Bank (PDB, www.rcsb.org/pdb/) as a condition of publication of a journal pape...
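The flat file format mentioned above stores each atom on one fixed-width line, so coordinates can be read by column position. The sketch below parses a single ATOM record using the standard column layout. The example record is built field by field and its values are illustrative, not taken from a real entry.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float
    element: str


def parse_atom_line(line):
    """Parse one ATOM/HETATM record using fixed column positions."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
        element=line[76:78].strip(),   # columns 77-78
    )


# Illustrative record assembled field by field so the column widths are explicit.
EXAMPLE_LINE = (
    "ATOM  " + "    1" + " " + " N  " + " " + "MET" + " " + "A" + "   1" + "    "
    + "  27.340" + "  24.430" + "   2.614" + "  1.00" + "  9.67" + " " * 10 + " N"
)

if __name__ == "__main__":
    print(parse_atom_line(EXAMPLE_LINE))
```

Slicing by column rather than splitting on whitespace matters here because adjacent fields in real files can touch without a space between them.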

DNA Data Bank of Japan (DDBJ)

Aim
To retrieve information for a given organism from DDBJ.

Description
The DNA Data Bank of Japan (DDBJ) is a primary nucleotide sequence database in Japan. It is a biological database that collects DNA sequences and is located at the National Institute of Genetics (NIG) in Shizuoka Prefecture, Japan. It is also a member of the International Nucleotide Sequence Database Collaboration (INSDC). It exchanges its data with the European Molecular Biology Laboratory at the European Bioinformatics Institute and with GenBank at the National Center for Biotechnology Information on a daily basis. At present, sequence submission to GenBank, EMBL, or DDBJ is a precondition for publication in most scientific journals, which ensures that the fundamental molecular data are made freely available.

Procedure
ST...

EMBL (The European Molecular Biology Laboratory)

Aim
To retrieve information for a given nucleotide sequence associated with a given disease from EMBL.

Description
The European Molecular Biology Laboratory (EMBL) nucleotide sequence database is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank (USA). Data are exchanged among the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. EMBL is a primary nucleotide sequence database in Europe. Network services allow free access to the most up-to-date data collection via Internet and WWW interfaces. EBI's Sequence Retrieval System (SRS) is a network browser for databanks in molecular biology, integrating and linking the main nucleotide and protein databases plus many specialised databases. For sequence similarity searching a variety of tools (e.g., BLITZ, FASTA,...

Genbank

Aim
To retrieve information for a given organism from GenBank.

Description
GenBank is the NIH genetic sequence database, an annotated collection of all publicly available DNA sequences. It is a primary nucleotide database. GenBank is part of the International Nucleotide Sequence Database Collaboration, which comprises the DNA Data Bank of Japan (DDBJ), the European Nucleotide Archive (ENA), and GenBank at NCBI. These three organizations exchange data on a daily basis. A GenBank release occurs every two months and is available from the FTP site. The release notes for the current version of GenBank provide detailed information about the release and notifications of upcoming changes to GenBank. Release notes for previous GenBank releases are also available. In a GenBank record, the information starts with the line containing the word "LOCUS".

Procedure
STEP 1: Open https://www.ncbi.nlm.nih.gov/genbank/
STEP 2: Select Genban...
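Because every record begins with a LOCUS line, that line is a convenient place to start when reading a saved GenBank flat file. The sketch below finds the first LOCUS line and splits it on whitespace into a few fields. The record shown is hypothetical, and splitting on whitespace is a simplification that assumes the locus name contains no spaces.

```python
def parse_locus_line(line):
    """Extract name, length, division and date from a LOCUS line."""
    fields = line.split()
    if len(fields) < 5 or fields[0] != "LOCUS":
        raise ValueError("not a LOCUS line")
    return {
        "name": fields[1],
        "length": int(fields[2]),  # sequence length
        "unit": fields[3],         # e.g. bp
        "division": fields[-2],    # GenBank division code
        "date": fields[-1],        # last modification date
    }


def first_locus(flat_file_text):
    """Return the parsed LOCUS line of the first record, or None if absent."""
    for line in flat_file_text.splitlines():
        if line.startswith("LOCUS"):
            return parse_locus_line(line)
    return None


# Hypothetical record header used only for illustration.
EXAMPLE_RECORD = """LOCUS       EXAMPLE01    1200 bp    DNA     linear   PLN 01-JAN-2000
DEFINITION  Hypothetical example sequence.
"""

if __name__ == "__main__":
    print(first_locus(EXAMPLE_RECORD))
```

Reading the division and date from the end of the line keeps the parser working whether or not the optional molecule topology field is present.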