The publication of the draft human genome sequence in both Science
and Nature, announced on February 12, 2001, marks a historic point in our
understanding of the molecular basis of life. The sequence described in those issues,
available free online, contains all the information necessary to encode
a human being. According to the central
dogma (DNA makes RNA, which makes protein), all of our proteins are
determined by the sequence of nucleotides that is now known with 99.6%
accuracy. And proteins are the targets of nearly all traditional drugs—including
all ten of the top-selling prescription drugs. Yet, nearly a year after
the first announcement of the completion of the human genome, we do not
seem any closer to developing drugs based on that profound information.
Genomics has provided spectacular amounts of data, but most of it remains
uninterpretable at our current level of understanding. In some ways, genomics
raises more questions than it answers. The emerging field of proteomics
promises to answer some of those questions by systematically studying
all of the proteins encoded by the genome.
The term proteomics encompasses a broad set of
disciplines aimed at understanding and monitoring proteins. This includes
work correlating genetic sequence with three-dimensional protein structure
and 3D structure with protein function, development of protein separation
and protein profiling techniques, and investigation of protein-protein
interactions. Proteomics promises to discover unanticipated targets for
drug design by determining the function of thousands of unidentified proteins
expected to be found in the human genome. Proteomics will also provide
experimental data to improve computer-modeling programs that predict protein
structure from DNA sequence; it will help us learn to interpret the information
contained in the genome. A breakdown of the field and companies working
in it is presented below.
Protein Profiling and Separation Techniques:
Analysis of mRNA transcripts on so-called "gene chips" has
provided dynamic information regarding which genes are expressed in cells
under a given set of experimental conditions, yielding clues as to which
proteins are involved in certain pathways and disease states. However,
differences in the half-lives of RNA and proteins, as well as post-translational
modifications important to protein function, prevent mRNA profiles from
correlating perfectly with the cells’ actual protein profiles. Direct protein
expression profiling, in which the proteins from one cell population are
compared to those of another to identify differences in expression at the
protein level, would provide a new means of identifying disease states. It
could also help pharmaceutical companies identify the individuals who would
benefit most from certain treatments, as well as those most likely to
experience unwanted side effects. To perform such analyses, efficient and
reproducible protein separation techniques must be developed.
The classic, and still most common, technique for protein separation is
two-dimensional polyacrylamide gel electrophoresis (2D PAGE). Proteins are
first separated according to their charge (more precisely, their isoelectric
point, pI) by isoelectric focusing, and then in the second dimension according
to their size (molecular weight). After separation, the gel is stained so that
protein spots can be visualized. A 2D gel is shown to the right:

(www.incyte.com)
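A spot's position on a 2D gel is set by two properties of the protein: its isoelectric point (the pH at which its net charge is zero) and its molecular weight. The Python sketch below estimates both from an amino acid sequence. It assumes one set of approximate pKa values and average residue masses; other tools, such as ExPASy's Compute pI/Mw, use a different pKa set, so their numbers will differ somewhat, and real spot positions also shift with post-translational modifications that a bare sequence cannot capture.

```python
# Approximate pKa values (an assumption: published sets differ between tools).
PKA_POSITIVE = {"N_TERM": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"C_TERM": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}

# Average residue masses in daltons (free amino acid mass minus one water).
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH, using the Henderson-Hasselbalch equation."""
    # Termini: one free amino group (positive) and one carboxyl group (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE["N_TERM"]))
    charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE["C_TERM"] - ph))
    for aa in seq:
        if aa in PKA_POSITIVE:  # K, R, H: add the fraction protonated
            charge += 1.0 / (1.0 + 10 ** (ph - PKA_POSITIVE[aa]))
        elif aa in PKA_NEGATIVE:  # D, E, C, Y: subtract the fraction deprotonated
            charge -= 1.0 / (1.0 + 10 ** (PKA_NEGATIVE[aa] - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Find the pH where net charge crosses zero, by bisection over pH 0-14."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2


def molecular_weight(seq: str) -> float:
    """Average molecular weight: residue masses plus one water for the termini."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER_AVG


def gel_coordinates(seq: str) -> tuple[float, float]:
    """Predicted (pI, molecular weight) position of an unmodified protein."""
    return isoelectric_point(seq), molecular_weight(seq)


if __name__ == "__main__":
    for peptide in ("DEEDAE", "KRKAKR"):  # one acidic and one basic toy peptide
        print(peptide, gel_coordinates(peptide))
```

Bisection works here because the net charge falls steadily as the pH rises, so the acidic peptide lands on the low-pH side of the gel and the basic one on the high-pH side.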
Spots are then cut out of the gel, and the proteins they contain are digested
(typically with trypsin) into short peptide fragments, which are analyzed by
mass spectrometry (MS). The characteristic mass spectrum is used to identify
the protein, either by matching the measured peptide masses against those
predicted from known sequences or by fragmenting individual peptides to read
out their amino acid sequence. Amino acid sequence information can then be
linked to DNA sequence information using bioinformatics software. The 2D gel
patterns and specific spots can be used as a profile to generate expression
patterns for analysis as described above. See the diagram
below:

(www.incyte.com)
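Matching measured peptide masses against the peptides predicted from known sequences is called peptide mass fingerprinting. The minimal Python sketch below assumes trypsin's classic rule (cleave after K or R unless the next residue is P), neutral monoisotopic masses, and a toy database whose protein names and sequences are hypothetical. Real search engines also handle missed cleavages, modifications, ion charge (an [M+H]+ ion is about 1.007 Da heavier than the neutral peptide), and the statistical significance of a match.

```python
# Monoisotopic residue masses in daltons (free amino acid mass minus one water).
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MONO = 18.01056


def tryptic_digest(seq: str) -> list[str]:
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(seq):
        next_aa = seq[i + 1] if i + 1 < len(seq) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        peptides.append(seq[start:])  # C-terminal peptide that does not end in K/R
    return peptides


def peptide_mass(peptide: str) -> float:
    """Neutral monoisotopic mass of a peptide: residues plus one water."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER_MONO


def fingerprint_score(observed: list[float], sequence: str, tol: float = 0.02) -> int:
    """Count observed masses that match some predicted peptide within tol (Da)."""
    predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
    return sum(any(abs(m - p) <= tol for p in predicted) for m in observed)


def identify(observed: list[float], database: dict[str, str], tol: float = 0.02):
    """Rank database entries by how many observed peptide masses they explain."""
    scores = [(name, fingerprint_score(observed, seq, tol)) for name, seq in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy database; names and sequences are invented for illustration.
    database = {"protein_A": "MKWVTFISLLRGDEKAAR", "protein_B": "GASPKLLNDERYWHKT"}
    observed = [peptide_mass(p) for p in tryptic_digest(database["protein_A"])]
    print(identify(observed, database))
```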
Oxford GlycoSciences
(OGS) is a self-described "data factory" with industrial-scale throughput
capacity for the screening of proteins by 2D PAGE. Their business plan is to
discover and patent protein drug targets, and to build a pipeline of
proprietary small-molecule and antibody drug and diagnostic products based on
those targets. OGS has developed collaborations with ten companies and
research institutions, including Pfizer, Merck, Bayer, Medarex, Pioneer
Hi-Bred/DuPont, and Oxford University.
Large Scale Biology/Large
Scale Proteomics (LSP) was founded sixteen years ago, and is thus
a pioneer in the field of 2D PAGE. They have developed a proprietary technology
platform capable of one million protein analyses per week with excellent
resolution, quantitation, and reproducibility. LSP collaborates on specific
proteomics projects in pharmaceutical research and development with companies
such as Glaxo Wellcome PLC, Procter & Gamble Co., Novartis AG, Genentech, Inc.,
Gemini Genomics PLC, and BioSite Diagnostics, Inc.
A number of problems remain for 2D PAGE as a protein separation and expression
profiling technique. 2D PAGE requires significant processing of cell extracts
before separation, and many of these steps must be optimized for the cell type
being analyzed. It works poorly for hydrophobic proteins (such as transmembrane
receptors), which are difficult to solubilize in the aqueous buffer system in
which the gels are run. Also, proteins present at low concentrations in the
cell are generally not detected on the gels.
Furthermore, each spot on a 2D gel is likely to represent more than one
protein, since the 30,000+ proteins present in a cell at any given time
cannot be completely resolved on a single gel. Perhaps most damagingly,
the reproducibility of 2D PAGE experiments is generally poor. This has
prevented 2D gels from becoming the equivalent of a "gene chip"
for protein profiling experiments, since it is difficult to compare data
from multiple experiments.
Protein Chips:
Several companies are developing protein chip separation technologies
that they hope will rival 2D PAGE as the method of choice for proteome
analysis.
Ciphergen produces ProteinChip Arrays with spots containing either a chemical
(ionic, hydrophobic, hydrophilic, etc.) or biochemical (antibody, receptor,
DNA, etc.) surface designed to capture proteins of interest. A crude protein
sample is washed over the chip surface, and the captured proteins are analyzed
by surface-enhanced laser desorption/ionization mass spectrometry (SELDI-MS).
The ProteinChip technology improves the speed and reproducibility of protein
separations relative to 2D PAGE, and is currently being used to identify
protein biomarkers of diseases such as Alzheimer’s and ovarian cancer.
Pierce Milwaukee, Inc. (a division of Pierce Chemical) has a patent pending on
a method for high-throughput, sub-milligram-capacity protein purification in a
microwell filter plate. This allows small samples to be separated using
traditional chromatography media.
One promising concept for protein separation is to develop a monoclonal
antibody (mAb) for each protein in a cell, and then pattern these
mAbs onto different spots on a "protein chip." mAbs are an ideal
recognition element for protein profiling, since they each bind strongly
and specifically to a single protein. They can even distinguish between
copies of the same protein carrying different post-translational modifications
(phosphorylation, glycosylation, ubiquitination, etc.). These modifications
can make a big difference in the function of a protein, but the physical
changes they impart are often too subtle for 2D PAGE and other techniques
to identify. Furthermore, mAbs can be generated in practically infinite
variety by shuffling the DNA sequences that encode them: the human body
is capable of generating over 100,000,000 different antibodies through
gene shuffling!
Companies pursuing mAbs for use on protein chips
must overcome several problems, however. First, they must be able to generate
mAbs (or antibody mimics—proteins that specifically bind other proteins,
but aren’t based on the DNA sequence of a natural antibody) against a
given protein and then identify the DNA sequence that encodes the best
antibody. Identification is achieved by use of a tagging technique
that allows researchers to quickly recognize the antibody they select.
Since most tagging techniques are patented, this becomes a major issue
for any company wishing to develop a mAb chip. Another problem with the
use of mAbs for protein separation is that a substantial amount of pure
protein is required to develop and select the mAb specific to that protein.
This effectively prevents the use of mAbs for discovering new proteins,
and makes them difficult to use for detecting rare proteins.
Nonetheless, mAb chips will be quite useful for protein profiling techniques
that seek to compare the expression levels of important known proteins.
Dyax is developing phage display techniques to produce and select monoclonal
antibodies (mAbs) via high-throughput in vitro screens. Phage display is a
tagging technique that works by fusing the DNA sequence of an antibody to the
sequence encoding a coat protein of a bacteriophage virus (phage), so that
when the phage grows it presents the antibody on its surface. A library of
antibody-displaying phage can be washed over a column containing the protein
of interest, and the phage that bind can be isolated and amplified, creating
a large sample of useful antibodies. To further improve binding, the DNA
sequence encoding the antibody can be mutated at a few sites, and the new set
of phage can again be selected to identify the mAbs that bind most strongly
to the protein of interest.

This picture illustrates the phage (on the left) with the monoclonal
antibodies displayed on its outer surface (on the right). Each phage presents
a unique antibody, together creating a pool of millions of antibodies.
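The select, amplify, and mutate cycle used in phage display is a form of directed evolution. The Python toy model below is a sketch under loud assumptions: binders are short strings, and "affinity" is simply the number of positions matching a hypothetical target sequence (`TARGET`), standing in for real binding measurements. Each round keeps the strongest binders (panning), amplifies them, and randomizes a few positions in the copies.

```python
import random

# Hypothetical "ideal" binder; real affinity is measured, not string-matched.
TARGET = "WYRDSHEK"
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def affinity(binder: str) -> int:
    """Toy affinity score: number of positions matching the hypothetical TARGET."""
    return sum(a == b for a, b in zip(binder, TARGET))


def mutate(binder: str, rng: random.Random, n_sites: int = 2) -> str:
    """Randomize a few positions, mimicking mutagenesis of the antibody gene."""
    residues = list(binder)
    for pos in rng.sample(range(len(residues)), n_sites):
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def pan(library: list[str], keep: int) -> list[str]:
    """One panning step: retain the phage whose antibodies bind most strongly."""
    return sorted(library, key=affinity, reverse=True)[:keep]


def evolve(rounds: int = 5, library_size: int = 200, keep: int = 20, seed: int = 0):
    """Repeat select-amplify-mutate; return the best binder and per-round best scores."""
    rng = random.Random(seed)
    library = ["".join(rng.choice(AMINO_ACIDS) for _ in TARGET) for _ in range(library_size)]
    history = []
    for _ in range(rounds):
        binders = pan(library, keep)
        history.append(affinity(binders[0]))
        # Amplify: the selected parents survive, and mutated copies refill the library.
        mutants = [mutate(rng.choice(binders), rng) for _ in range(library_size - keep)]
        library = binders + mutants
    return pan(library, 1)[0], history


if __name__ == "__main__":
    best, history = evolve()
    print(best, affinity(best), history)
```

Because the selected parents are carried into the next round unchanged, the best score in `history` can never decrease; in a real panning experiment, losses and nonspecific binding make progress much less tidy.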
Biovation
makes recombinant human antibodies bearing a unique barcode preceded by
a protease cleavage site, for eventual use in protein microarray technology.
These antibodies are exposed to a target protein, and those that bind
are treated with protease to release their barcodes, which are then sequenced
using mass spectrometry. The use of mass spectrometry solves the tagging
problem without the need for phage display, and thus avoids the technical
and patent issues surrounding display technologies.
Phylos
uses their proprietary "PROfusion" technology to directly link
a protein to the mRNA that encodes it. This technology is used to create
a library of proteins—antibody mimics—that bind cellular proteins with
high affinity. The antibody mimics are immobilized onto a solid substrate
to create Phylos’s "HIP" chip, a protein array that can be used
for protein separation or protein-protein interaction assays. See
diagrams below:
Phylos' technique of linking the mRNA to its protein

Phylos' "HIP" chip is an array of PROfused molecules
Biacore’s surface plasmon resonance (SPR) systems detect changes in refractive
index caused by mass binding to a sensor surface, enabling sensitive detection
of protein binding without the need for labels or dyes. Biacore now
manufactures SPR instruments that can monitor binding on 100x100 arrays, which
can be used to analyze protein expression profiles.
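SPR binding curves (sensorgrams) are commonly interpreted with a simple 1:1 binding model: during injection the response rises toward an equilibrium level set by the analyte concentration and the equilibrium dissociation constant K_D = kd/ka, and after the analyte is washed away it decays as exp(-kd * t). The Python sketch below assumes that model and uses hypothetical rate constants; in practice the model is fitted to measured curves, and effects such as mass transport or heterogeneous binding can require more complex models.

```python
import math


def equilibrium_response(conc: float, ka: float, kd: float, rmax: float) -> float:
    """Steady-state response for 1:1 binding: Rmax * C / (C + K_D), with K_D = kd / ka."""
    k_d_eq = kd / ka  # equilibrium dissociation constant (M)
    return rmax * conc / (conc + k_d_eq)


def association(t: float, conc: float, ka: float, kd: float, rmax: float) -> float:
    """Response t seconds after analyte injection starts (1:1 model)."""
    k_obs = ka * conc + kd  # observed rate constant (1/s)
    return equilibrium_response(conc, ka, kd, rmax) * (1.0 - math.exp(-k_obs * t))


def dissociation(t: float, r0: float, kd: float) -> float:
    """Response t seconds after switching to buffer, starting from response r0."""
    return r0 * math.exp(-kd * t)


def estimate_kd(times: list[float], responses: list[float]) -> float:
    """Estimate kd as minus the least-squares slope of ln(response) versus time."""
    logs = [math.log(r) for r in responses]
    n = len(times)
    mean_t, mean_y = sum(times) / n, sum(logs) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    var = sum((t - mean_t) ** 2 for t in times)
    return -cov / var


if __name__ == "__main__":
    # Hypothetical constants: ka = 1e5 /M/s and kd = 1e-3 /s, so K_D = 10 nM.
    ka, kd, rmax, conc = 1e5, 1e-3, 100.0, 1e-8
    plateau = association(3000.0, conc, ka, kd, rmax)
    times = [0.0, 60.0, 120.0, 180.0, 240.0]
    decay = [dissociation(t, plateau, kd) for t in times]
    print(plateau, estimate_kd(times, decay))
```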
Bioinformatics:
These companies are most closely linked to genomics efforts. Their goal
is to provide data-mining and warehousing capabilities to allow the prediction
of protein structure and function based on DNA sequence. Predictions are
made by comparing novel protein sequences to sequences for which the protein
structure and function are known. This is feasible because proteins can
be grouped into families that share similar function, regardless of the
species in which the protein is found. It should be noted, however, that
the relationships among proteins can be complex: relatively different
DNA sequences can lead to similar structural motifs, while some functionally
similar proteins have quite different structures, and vice versa.
As the field of bioinformatics matures, it will improve researchers’ ability
to identify related proteins, and will extend comparisons to include common
structural motifs in disparate proteins.
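Comparing a novel sequence to a known one starts with an alignment. The Python sketch below implements the classic Needleman-Wunsch global alignment with simple, assumed scores (match +1, mismatch -1, gap -2). Production search tools such as BLAST add heuristics for speed, substitution matrices such as BLOSUM62, and affine gap penalties, so this illustrates the idea rather than how those tools score.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of a and b; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap

    def pair(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns holding identical residues."""
    same = sum(x == y and x != "-" for x, y in zip(aligned_a, aligned_b))
    return 100.0 * same / len(aligned_a)


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("HEAGAWGHEE", "PAWHEAE")
    print(total, top, bottom, percent_identity(top, bottom))
```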
Currently, much of the effort in bioinformatics revolves around developing
software to improve the reliability—and thus utility—of 2D PAGE data.
The Swiss Institute
of Bioinformatics and the European Bioinformatics Institute are
collaborating to develop ExPASy (Expert Protein Analysis
System), which provides a variety of search tools for comparing novel
proteins or sequences to known proteins. Their Human Proteome Initiative
(HPI) aims to fully annotate all known human sequences. They will provide
extensive information on every known protein including a description of
its function, domain structure, subcellular location, post-translational
modifications, variants, and similarities to other proteins.
Compugen’s patented Z3 software technology aligns different 2D gels so that
their data can be compared despite the poor reproducibility of 2D PAGE. The
software also includes spot detection capabilities.
The figure to the right shows two superimposed gels from a differential
protein expression experiment. The yellow circles identify proteins
that are differentially expressed in the two samples; the pink circle
identifies a protein that is only expressed in one of the samples.
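Compugen's Z3 algorithm is proprietary and is not reproduced here. As a hypothetical illustration of the general idea, the Python sketch below maps one gel's spots onto another using a per-axis linear fit to a few landmark spots present on both gels, pairs spots greedily by nearest neighbor, and flags pairs whose intensities differ by at least a chosen fold change, plus spots seen on only one gel. Real gel warping is usually nonlinear, and the tolerance and fold threshold here are arbitrary assumptions.

```python
import math

Spot = tuple[float, float, float]  # (x, y, intensity)


def fit_axis(src: list[float], dst: list[float]) -> tuple[float, float]:
    """Least-squares fit of dst = scale * src + offset along one gel axis."""
    n = len(src)
    mean_s, mean_d = sum(src) / n, sum(dst) / n
    var = sum((s - mean_s) ** 2 for s in src)
    cov = sum((s - mean_s) * (d - mean_d) for s, d in zip(src, dst))
    scale = cov / var
    return scale, mean_d - scale * mean_s


def align_gel(spots_b: list[Spot], landmarks) -> list[Spot]:
    """Map gel-B spots into gel-A coordinates; landmarks are ((xa, ya), (xb, yb)) pairs."""
    sx, ox = fit_axis([b[0] for _, b in landmarks], [a[0] for a, _ in landmarks])
    sy, oy = fit_axis([b[1] for _, b in landmarks], [a[1] for a, _ in landmarks])
    return [(sx * x + ox, sy * y + oy, inten) for x, y, inten in spots_b]


def compare_gels(spots_a: list[Spot], spots_b: list[Spot], landmarks,
                 tol: float = 2.0, fold: float = 2.0) -> dict:
    """Greedy nearest-neighbor spot matching after alignment; intensity ratio is B / A."""
    aligned_b = align_gel(spots_b, landmarks)
    results = {"differential": [], "only_in_a": [], "only_in_b": []}
    unmatched_b = list(range(len(aligned_b)))
    for i, (xa, ya, ia) in enumerate(spots_a):
        best, best_dist = None, tol
        for j in unmatched_b:
            dist = math.hypot(xa - aligned_b[j][0], ya - aligned_b[j][1])
            if dist <= best_dist:
                best, best_dist = j, dist
        if best is None:
            results["only_in_a"].append(i)  # no gel-B spot within tol
            continue
        unmatched_b.remove(best)
        ratio = aligned_b[best][2] / ia
        if ratio >= fold or ratio <= 1 / fold:
            results["differential"].append((i, best, ratio))
    results["only_in_b"] = unmatched_b
    return results
```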
Incyte’s
LifeExpress protein expression database is unique in its correlation of
RNA and protein expression information. In collaboration with Oxford GlycoSciences,
Incyte developed a 2D gel technique that they consider sufficiently reproducible
for inter-gel comparisons. Their LifeExpress database allows the identification
of proteins based on either their location on the gel, or on mass spec
data obtained from spots on the gel. These proteins or amino acid sequences
are then compared to expressed sequence tags (ESTs) in Incyte’s LifeSeq human gene sequence
database, allowing the user to move quickly between proteomics and genomics
data.
GeneData’s
protein analysis software, GeneData Impressionist, supports 2D gel data
as well as data generated by various other technologies. The pattern-extraction
software features a variety of statistical algorithms and interactive
graphical tools that enable the user to perform quality control and analyze
large, complex experimental series.
High-Throughput Protein Production and Structure
Determination:
In order to study proteins that have been identified as important targets
by separation and profiling efforts, large quantities of those proteins must
be obtained in pure form. Ideally, an X-ray crystal structure will also be
determined for every new protein discovered. This structural information is
crucial for rational drug design, which relies on it to design small
molecules that bind to proteins.

Structure diagram of the viral HN protein
(celera.com/genomics/news/articles/11_00)
Harvard
Institute of Proteomics Research is developing the FLEX (Full
Length Expression) Repository to provide researchers with
the complete set of known genes and open reading frames (ORFs) in a robot-accessible
array of cDNA clones. Their system obviates the need to design a unique
cloning method for each protein by providing simple in-frame shuttling
of each gene into any of a variety of expression vectors. Such a system
makes it possible to design high-throughput parallel protein screens, both to
create protein microarrays and to facilitate structure determination.
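An open reading frame (ORF) runs from a start codon (ATG) to the next stop codon (TAA, TAG, or TGA) in the same frame, and a sequence is "in frame" when it is read in steps of three from the correct starting base. The Python sketch below scans the three forward reading frames of a DNA string for ORFs. It is a simplification, not part of the FLEX system: it ignores the reverse strand and applies an arbitrary minimum length, `min_codons`, that counts the stop codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> list[tuple[int, int, int]]:
    """Return (start, end, frame) for forward-strand ORFs; end is exclusive."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # the three possible reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a new reading frame at the start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # the frame closes at the first in-frame stop
    return orfs


if __name__ == "__main__":
    seq = "CCATGAAACCCTAGGG"
    for begin, end, frame in find_orfs(seq):
        print(frame, seq[begin:end])
```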
Vertex Pharmaceuticals
uses a combination of NMR and X-ray crystallography to solve protein structures.
They also group proteins into families that share similar active site
structures, allowing them to speed both structure determination and lead
compound identification.
MediChem
integrates protein expression, crystallization, and 3D structural determination
technologies with work in computational chemistry and crystallization
databasing.
Integrative
Proteomics specializes in high throughput purification of proteins,
as well as structure and function determination.
Protein Analysis for Drug Development:
These companies work most closely with pharmaceutical developers, providing
valuable information regarding how proteins interact with each other.
They provide insight into biological pathways, allowing target validation
and discovery of novel drug targets.
Hybrigenics
develops Protein-Protein Interaction Maps (PIMs) using two-hybrid interaction
assays. They currently have complete protein-protein interaction maps, along
with associated proprietary drug targets they have selected and small-molecule
lead compounds, for Helicobacter pylori, hepatitis C virus, and Saccharomyces
cerevisiae, all of which are available for licensing. Hybrigenics will also
enter into strategic alliances for drug discovery, covering every step from
establishing a protein-protein interaction map, through selecting or
validating drug targets and identifying lead compounds, to biological
validation. ("PIM" and "PIMs" are registered trademarks and trademarks of
Hybrigenics.)
CuraGen Corporation
uses their PathCalling technology, which combines two-hybrid assays with
bioinformatics software, to identify protein-protein interactions and
biological pathways. Their substantial use of bioinformatics allows comparison
of protein interactions in different species, as well as the expansion
of pathways by incorporating information on second-degree interactions.
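Two-hybrid results map naturally onto a graph whose nodes are proteins and whose edges are observed interactions. The Python sketch below is not CuraGen's PathCalling method; it is a generic breadth-first search that, under that graph assumption, lists a protein's direct (first-degree) partners and the partners of those partners (second-degree interactions). Protein names in any usage are hypothetical.

```python
from collections import deque


def build_interaction_map(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Undirected graph: each protein maps to the set of its observed partners."""
    graph: dict[str, set[str]] = {}
    for a, b in pairs:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def neighborhood(graph: dict[str, set[str]], protein: str, max_degree: int = 2) -> dict[str, int]:
    """Breadth-first search returning {partner: degree} up to max_degree steps away."""
    degrees = {protein: 0}
    queue = deque([protein])
    while queue:
        current = queue.popleft()
        if degrees[current] == max_degree:
            continue  # do not expand beyond the requested degree
        for partner in graph.get(current, ()):
            if partner not in degrees:
                degrees[partner] = degrees[current] + 1
                queue.append(partner)
    del degrees[protein]  # exclude the query protein itself
    return degrees
```

For example, with hypothetical interactions A-B, B-C, C-D, and A-E, the neighborhood of A contains B and E at degree 1 and C at degree 2; D, three steps away, is excluded.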
Myriad Genetics
also uses a high-throughput two-hybrid system for identification of protein-protein
interactions. They have also developed a database of these interactions
that combines their proprietary information with published interactions.
Molecular Simulations,
Inc. predicts the structure and function of novel protein targets
using their Target Explorer software. This technology is designed to assign
function to protein sequences and deliver these potential targets to specialized
protein simulation and engineering tools. It allows users to identify
key targets directly from DNA sequence information.
Areas for Improvement:
Proteomics efforts are currently hampered by the
fact that the field’s main techniques, including 2D PAGE and X-ray crystallography,
require highly skilled operators. Means of improving and automating these
processes must be developed in order for the field to progress rapidly.
Databases containing current information on protein
sequence, structure, and function are just becoming available. Techniques
for linking and interpreting the vast amounts of data produced by genomics
and proteomics efforts are still far from optimized.
As the field of proteomics matures, it will become
possible to design more sophisticated experiments to determine protein
function, and to investigate how proteins fit into complex biological
pathways. Proteomics will multiply the number of known drug targets 100-fold,
putting pressure on the pharmaceutical industry to capitalize on that
new information. Protein profiling will also make advanced diagnostics
a possibility, but it will be a challenge for the medical field to navigate
the expanse of new data available to it.