Internet Resources and Methods for Protein Sequence and Structure Studies

Dr. Russell Yamazaki
last updated 09/18/07
DATABASES AND ON-LINE ANALYSES
 
PubMed

http://www.ncbi.nlm.nih.gov/sites/entrez?otool=waynelib

PubMed is the site to use to access biomedical literature. This site is also useful for retrieving nucleotide sequences, protein sequences and structure (PDB) files by text words, gene name or accession number. If you are not connected to the Wayne State University network, you will need to either enter your Wayne AccessID and password or delete the 'otool=waynelib' portion of the URL.
BLAST

http://www.ncbi.nlm.nih.gov/BLAST/

NIH site for finding nucleotide or protein sequences similar to a query. See the BLAST notes further down this page for more details.
Proteomics tools at Expasy

http://us.expasy.org/tools/

A collection of web tools for determining such things as tryptic maps and pI values, and for identifying domains with ProSite.
Swiss-Prot

http://us.expasy.org/sprot/

Annotated database of protein data
PDB

http://www.rcsb.org/pdb/

Protein Data Bank (PDB) 3D structural files. See the PDB notes further down this page for more information.
DSSP

ftp://ftp.cmbi.kun.nl/pub/molbio/data/dssp/

Database of protein secondary structure information derived from PDB files. It can be used with GeneDoc to highlight secondary structures in multiple sequence alignments. The files are accessed through the PDB file name. This site is slow.
EMBL-EBI

http://www.ebi.ac.uk/services/

EMBL European Bioinformatics Institute services including access to various protein analysis tools
NIH Modeling

http://cmm.info.nih.gov/modeling/tools.html

NIH - links to databases and program sites
Swiss Model

http://www.expasy.org/swissmod/

Server for modeling 3D structures for proteins without a PDB file. Used in conjunction with Swiss PDB Viewer program (below).
SOFTWARE
 
Wayne State download site

http://computing.wayne.edu/software/index.php

WSU site for downloading free versions of IE browser, email programs, anti-virus software, bibliographic programs (Mac and Windows)
EBI software site

http://www.ebi.ac.uk/FTP/

European Bioinformatics Institute catalog of software programs
Antheprot

http://antheprot-pbil.ibcp.fr

program to run ProSite searches, titrations, cleavages, etc. on a Windows machine
Chimera

http://www.rbvi.ucsf.edu/chimera/

program to view msf files, secondary structural features and protein 3D structures (Windows and Mac)
Clustal X

http://bips.u-strasbg.fr/fr/Documentation/ClustalX/

program (both Mac and Windows) for carrying out multiple sequence alignments; includes Help files. This is the program that I use for alignments.
Note added 9/16/07 - The Mac version of Clustal X from this site seems to be faulty under OS X 10.4 (Tiger). Instead, use the EBI software site above to get to the ClustalW2 folder and download the Clustalx-mac-univeral-2.0.gz file.
GeneDoc

http://www.nrbsc.org/downloads/

program for viewing msf sequence alignment files (Windows only)
Jalview

http://www.jalview.org/

Jalview is a Java applet that allows you to view msf, tree and structural files.
SeaView

http://pbil.univ-lyon1.fr/software/seaview.html

program to display msf files on both Windows and Macs.
Treeview

http://taxonomy.zoology.gla.ac.uk/rod/treeview.html

program to display phylogenetic trees showing relationships between organisms (Windows and Mac)
RasMol and Protein Explorer

http://www.umass.edu/microbio/rasmol/

RasMol, a 3D viewing program for protein structures (Windows); Protein Explorer (works with a browser on Mac and Windows); and Chime. See the RasMol notes further down this page for further information.
Swiss PDB Viewer

http://us.expasy.org/spdbv/

program (Mac and Windows) for viewing 3D protein structures (PDB files), carrying out "magic fits" superimposing 3D structures and preparing data for Swiss Model submission
USEFUL LINKS AND SEARCH ENGINES
 
Google

http://www.google.com/

Web search engine
Garrett Morris page

http://www.scripps.edu/pub/olson-web/people/gmm/

Home page of Garrett Morris at Scripps with links to many programs and resources for 3D protein structures


BLAST (Basic Local Alignment Search Tool). Use this to find nucleotide or protein sequences similar to your own. Data are entered through a web form.

First copy your sequence to the clipboard: open it in a Windows program such as MS Word, highlight the sequence text with the cursor, and click EDIT COPY in the menu bar (or hold down the Control key and press the C key). Then paste the text into the form by clicking in the box and clicking EDIT PASTE (or holding down the Control key and pressing the V key).

The sequence (nucleotide or protein) must be in FASTA format: the first line of text must start with the character > followed by identification information (Example >My favorite protein ) and the second and remaining lines must contain only sequence text with no numbers (Example MSKTAQKRLLKEL and so on to the end). You may have to do some editing of the text in a word processor before copying it to the clipboard. For protein sequences, the Program box must be set to blastp.
You can view tutorials on BLAST at
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/information3.html
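
The FASTA clean-up described above can also be done with a short script instead of a word processor. The following is only a minimal sketch in Python: the function name to_fasta and the 60-letter line width are assumptions made for illustration, not something BLAST requires. It strips position numbers and spaces from a pasted sequence and adds the > header line.

```python
import re

def to_fasta(name, raw_sequence, width=60):
    """Return a FASTA record: a '>' header line, then sequence-only lines.

    'width' (letters per line) is an assumed convention, not a BLAST rule.
    """
    # Drop everything that is not a letter: position numbers, spaces, line breaks.
    cleaned = re.sub(r"[^A-Za-z]", "", raw_sequence).upper()
    if not cleaned:
        raise ValueError("no sequence letters found")
    # Break the cleaned sequence into lines of at most 'width' letters.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name.strip() + "\n" + "\n".join(lines) + "\n"

# The short fragment used as the example above, with a position number
# and a space of the kind often found in copied sequence text.
record = to_fasta("My favorite protein", "1 MSKTAQ KRLLKEL")
print(record)
```

The printed text can be copied and pasted into the BLAST form in the same way as text prepared in a word processor.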
 

Protein Data Bank (PDB). This is the primary site for the atomic coordinates of macromolecules (proteins, nucleic acids). Use the 3D Browser to find your molecule of interest. Use the Save As function in the top FILE menu to save the file to your hard drive or floppy disc. You will probably want to use a name meaningful to you, such as UBC7.PDB rather than 2UCZ.PDB. You can also get to PDB files through the Entrez browser.
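
If you prefer to fetch and rename a coordinate file with a script rather than through the browser menus, something like the following Python sketch may work. The download address pattern is an assumption (it is not given anywhere on this page and may change), and the function names pdb_url and save_pdb are made up for illustration.

```python
import urllib.request

# Assumption: the address pattern RCSB currently uses for file downloads.
DOWNLOAD_TEMPLATE = "https://files.rcsb.org/download/{pdb_id}.pdb"

def pdb_url(pdb_id):
    """Build the (assumed) download address for a 4-character PDB ID."""
    pdb_id = pdb_id.strip().upper()
    if len(pdb_id) != 4 or not pdb_id.isalnum():
        raise ValueError("expected a 4-character PDB ID, got %r" % pdb_id)
    return DOWNLOAD_TEMPLATE.format(pdb_id=pdb_id)

def save_pdb(pdb_id, local_name):
    """Download one entry and save it under a name of your choosing."""
    urllib.request.urlretrieve(pdb_url(pdb_id), local_name)
    return local_name

# Only builds the address; nothing is downloaded until save_pdb is called.
print(pdb_url("2ucz"))
```

Calling save_pdb("2UCZ", "UBC7.PDB") would then store the file under the more meaningful name, provided the assumed address is still valid and you are connected to the Internet.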
 

USING PROTEIN VIEWERS SUCH AS RasMol TO VIEW PROTEIN STRUCTURES.

  1. You need the atomic coordinates file for the structure. As an example, we will use the structure for the ubiquitin-conjugating protein Ubc4. 
  2. Start up IE and go to the Entrez Browser http://www.ncbi.nlm.nih.gov/entrez/query.fcgi
  3. Click on the Search box to change from "PubMed" to "structure".
  4. Enter ubc4 in the box and click on the Go button. 
  5. Select the file name 1QCQ next to PDB. 
  6. 1QCQ is the Protein Data Bank file. Click on 1QCQ to get to the Protein Data Bank.
  7. Click on View structure and then choose RasMol.
  8. You can save the file to your disc (use the file extension .pdb) or view it directly if your version of IE is configured to use RasMol as a helper application.
  9. If you have saved the file, start up RasMol and OPEN the file. 
  10. You must click on the RasMol Command icon at the bottom of the screen to be able to type in commands. You will need access to the HELP file for RasMol for the commands and their usage. You will probably want to resize the Command window to be able to see the structure and Command windows simultaneously.
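
The atomic coordinates file saved in the steps above is plain text, with one ATOM line per atom laid out in fixed columns. As a rough illustration of what a viewer such as RasMol reads, here is a minimal Python sketch that pulls the atom name, residue, chain and x, y, z coordinates out of those lines. The function names are hypothetical, the two demonstration records use invented coordinates, and the sketch ignores HETATM records and does not separate multiple models.

```python
def parse_atoms(lines):
    """Collect ATOM records from an iterable of PDB-format text lines."""
    atoms = []
    for line in lines:
        if line.startswith("ATOM  "):
            atoms.append({
                "name": line[12:16].strip(),      # columns 13-16: atom name
                "res_name": line[17:20].strip(),  # columns 18-20: residue name
                "chain": line[21],                # column 22: chain identifier
                "res_seq": int(line[22:26]),      # columns 23-26: residue number
                "x": float(line[30:38]),          # columns 31-38
                "y": float(line[38:46]),          # columns 39-46
                "z": float(line[46:54]),          # columns 47-54
            })
    return atoms

def count_residues(atoms):
    """Count distinct (chain, residue number) pairs."""
    return len({(a["chain"], a["res_seq"]) for a in atoms})

# Two made-up ATOM records (coordinates invented for illustration only),
# built with a format string so the columns line up as the parser expects.
template = "ATOM  %5d %-4s %3s %1s%4d    %8.3f%8.3f%8.3f"
demo_lines = [
    template % (1, "N", "MET", "A", 1, 1.0, 2.0, 3.0),
    template % (2, "CA", "MET", "A", 1, 2.5, 2.0, 3.0),
    "END",
]
demo_atoms = parse_atoms(demo_lines)
print(len(demo_atoms), "atoms in", count_residues(demo_atoms), "residue(s)")
```

To try it on a real file, open the saved file (for example the 1QCQ file saved with the .pdb extension) and pass the open file to parse_atoms in place of demo_lines.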