
Proteomics

by Jennifer Phend
phend@fas.harvard.edu

The publication of the human genome in both Science and Nature on February 12, 2001 marks a historic point in our understanding of the molecular basis of life. The pages of those issues, available free online, contain all the information necessary to encode a human being. According to the central dogma (DNA makes RNA which makes Protein), all of our proteins are determined by the sequence of nucleotides that is now known with 99.6% accuracy. And proteins are the targets of nearly all traditional drugs—including all ten of the top-selling prescription drugs. Yet, nearly a year after the first announcement of the completion of the human genome, we do not seem any closer to developing drugs based on that profound information. Genomics has provided spectacular amounts of data, but most of it remains uninterpretable at our current level of understanding. In some ways, genomics raises more questions than it answers. The emerging field of proteomics promises to answer some of those questions by systematically studying all of the proteins encoded by the genome.

The term proteomics encompasses a broad set of disciplines aimed at understanding and monitoring proteins. This includes work correlating genetic sequence with three-dimensional protein structure and 3D structure with protein function, development of protein separation and protein profiling techniques, and investigation of protein-protein interactions. Proteomics promises to discover unanticipated targets for drug design by determining the function of thousands of unidentified proteins expected to be found in the human genome. Proteomics will also provide experimental data to improve computer-modeling programs that predict protein structure from DNA sequence; it will help us learn to interpret the information contained in the genome. A breakdown of the field and companies working in it is presented below.

 

Protein Profiling and Separation Techniques:

Analysis of mRNA transcripts on so-called "gene chips" has provided dynamic information regarding which genes are expressed in cells under a given set of experimental conditions, yielding clues as to which proteins are involved in certain pathways and disease states. However, differences in the half-lives of RNA and proteins, as well as post-translational modifications important to protein function, prevent mRNA profiles from correlating perfectly with the cells’ actual protein profiles. Direct protein expression profiling, in which the proteins from one cell population are compared to those of another to identify disparate expression patterns at the protein level, would provide a new means of identifying disease states. It could also help pharmaceutical companies identify the individuals who would benefit most from certain treatments, as well as those most likely to experience unwanted side effects. To perform such analyses, efficient, reproducible protein separation techniques must be developed.

The classic, and still most common, technique for protein separation is two-dimensional polyacrylamide gel electrophoresis (2D PAGE). Proteins are separated in the first dimension according to their charge (more precisely, their isoelectric point, pI), and in the second dimension according to their size. After separation, the gel is stained so that protein spots can be visualized.

Figure: a 2D gel (source: www.incyte.com).

Spots are then cut out from the gel, and the proteins are digested into short peptide fragments and analyzed by mass spectrometry (MS). The characteristic mass spectrum obtained is used to determine the amino acid sequence of the protein. Amino acid sequence information can then be linked to DNA sequence information using bioinformatics software. The 2D gel patterns and specific spots can also serve as profiles for the expression analyses described above.

Figure: diagram of the process described above (source: www.incyte.com).
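The digest-and-weigh step can be sketched in a few lines of Python. The article does not say exactly how the mass spectrum is interpreted, so this sketch assumes one common approach, peptide mass fingerprinting: a candidate sequence is cut in silico the way trypsin would cut it (after K or R, but not before P), and the resulting peptide masses are compared with the observed ones. The function names, the two toy sequences, and the 0.5 Da tolerance are all hypothetical choices made for illustration.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(sequence):
    """Cut after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def count_matches(observed_masses, candidate, tolerance=0.5):
    """Number of observed masses explained by the candidate's digest."""
    theoretical = [peptide_mass(p) for p in tryptic_digest(candidate)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def best_match(observed_masses, database):
    """Name of the database entry that explains the most observed masses."""
    return max(database, key=lambda name: count_matches(observed_masses, database[name]))


# Hypothetical toy "database"; real searches run against full sequence databases.
KNOWN_PROTEINS = {
    "protein_1": "MKTAYIAKQRQISFVK",
    "protein_2": "GASPVKTCLLNRDEQW",
}

print(tryptic_digest("AGKRPLMKC"))  # → ['AGK', 'RPLMK', 'C']
```

Real search tools also account for missed cleavages, chemical modifications, and instrument error, all of which this sketch ignores.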

 

Oxford GlycoSciences (OGS) is a self-described "data factory" with industrial scale throughput capacity for the screening of proteins by 2D PAGE. Their business plan is to discover and patent protein drug targets, and build a pipeline of proprietary small molecule and antibody drug and diagnostic products based on those targets. OGS has developed collaborations with ten companies and research institutions, including Pfizer, Merck, Bayer, Medarex, Pioneer Hi-Bred/Dupont, and Oxford University.

Large Scale Biology/Large Scale Proteomics (LSP) was founded sixteen years ago, and is thus a pioneer in the field of 2D PAGE. They have developed a proprietary technology platform capable of one million protein analyses per week with excellent resolution, quantitation, and reproducibility. LSP collaborates on specific proteomics projects in pharmaceutical research and development with companies such as Glaxo Wellcome PLC, Procter & Gamble Co., Novartis AG, Genentech, Inc., Gemini Genomics PLC, and BioSite Diagnostics, Inc.

A number of problems remain for 2D PAGE as a protein separation and expression profiling technique. 2D PAGE requires significant pre-separation processing of cell extracts, and many of the processing steps must be optimized for the cell type being analyzed. It does not work for hydrophobic proteins (such as transmembrane receptors), since they cannot be incorporated into the aqueous buffer system in which the gels are run. Proteins present at low concentrations in the cell are generally not detected on the gels. Furthermore, each spot on a 2D gel is likely to represent more than one protein, since the 30,000+ proteins present in a cell at any given time cannot be completely resolved on a single gel. Perhaps most damagingly, the reproducibility of 2D PAGE experiments is generally poor. Because it is difficult to compare data from multiple experiments, 2D gels have not become the equivalent of a "gene chip" for protein profiling.

 

Protein Chips:

Several companies are developing protein chip separation technologies that they hope will rival 2D PAGE as the method of choice for proteome analysis.

Ciphergen produces ProteinChip Arrays with spots containing either a chemical (ionic, hydrophobic, hydrophilic, etc.) or biochemical (antibody, receptor, DNA, etc.) surface designed to capture proteins of interest. A crude protein sample is washed over the chip surface, and the captured proteins are analyzed by surface-enhanced laser desorption/ionization mass spectrometry (SELDI-MS). The ProteinChip technology improves the speed and reproducibility of protein separations relative to 2D PAGE, and is currently being used to identify protein biomarkers of diseases such as Alzheimer’s and ovarian cancer.

Pierce Milwaukee, Inc. (a division of Pierce Chemical) has a patent pending on a method for high-throughput, sub-milligram capacity protein purification in a microwell filter plate. This allows separations of small samples to be achieved using traditional chromatography media.

 

One promising concept for protein separation is to develop a monoclonal antibody (mAb) for each protein in a cell, and then pattern these mAbs onto different spots on a "protein chip." mAbs are an ideal recognition element for protein profiling, since each binds strongly and specifically to a single protein. They even differentiate between copies of the same protein carrying different post-translational modifications (phosphorylation, glycosylation, ubiquitination, etc.). These modifications can make a big difference in the function of a protein, but the physical changes they impart are often too subtle for 2D PAGE and other techniques to identify. Furthermore, mAbs can be generated with practically infinite variety by shuffling the DNA sequences that encode them: the human body is capable of generating over 100,000,000 different antibodies through gene shuffling!

Companies pursuing mAbs for use on protein chips must overcome several problems, however. First, they must be able to generate mAbs (or antibody mimics—proteins that specifically bind other proteins, but aren’t based on the DNA sequence of a natural antibody) against a given protein and then identify the DNA sequence that encodes the best antibody. Identification is achieved by use of a tagging technique that allows researchers to quickly recognize the antibody they select. Since most tagging techniques are patented, this becomes a major issue for any company wishing to develop a mAb chip. Another problem with the use of mAbs for protein separation is that a substantial amount of pure protein is required to develop and select the mAb specific to that protein. This effectively prevents the use of mAbs for discovery of new proteins, and makes it a difficult technology to use for detecting rare proteins. Nonetheless, mAb chips will be quite useful for protein profiling techniques that seek to compare the expression levels of important known proteins.

Dyax is developing phage display techniques to produce and select monoclonal antibodies (mAbs) via in vitro high-throughput screens. Phage display is a tagging technique that works by attaching the DNA sequence of an antibody to the sequence encoding a coat protein of a bacteriophage virus (phage) so that when the phage grows it presents the antibody on its surface. A library of phage-displaying mAbs can be washed over a column containing the protein of interest, and those phage that bind can be isolated and reproduced, creating a large sample of useful antibodies. To further improve binding, the DNA sequence encoding the antibody can be mutated at a few sites and the new set of phage produced can again be selected to identify those mAbs that bind most strongly to the protein of interest.
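The select, amplify, and mutate cycle described above can be illustrated with a toy simulation. This is not Dyax's procedure: the "affinity" here is a made-up score (the number of positions at which a binder matches a hypothetical ideal motif), whereas real selection depends on physical binding to the target on a column. All names and parameters are assumptions chosen for illustration.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_library(size, length, seed=0):
    """A starting pool of random binder sequences (stand-in for a phage library)."""
    rng = random.Random(seed)
    return ["".join(rng.choice(AMINO_ACIDS) for _ in range(length)) for _ in range(size)]


def affinity(binder, target_motif):
    """Toy score: positions where the binder matches a hypothetical ideal motif."""
    return sum(b == t for b, t in zip(binder, target_motif))


def mutate(binder, rng, n_sites=1):
    """Replace n_sites randomly chosen positions with random residues."""
    residues = list(binder)
    for pos in rng.sample(range(len(residues)), n_sites):
        residues[pos] = rng.choice(AMINO_ACIDS)
    return "".join(residues)


def pan(library, target_motif, rounds=5, keep=10, offspring=20, seed=0):
    """Repeat selection and mutation; return the final binders and best score per round."""
    rng = random.Random(seed)
    best_per_round = []
    for _ in range(rounds):
        # Selection: only the strongest binders "stay on the column".
        selected = sorted(library, key=lambda b: affinity(b, target_motif), reverse=True)[:keep]
        best_per_round.append(affinity(selected[0], target_motif))
        # Amplification with mutation: parents are kept alongside mutated copies.
        library = selected + [mutate(rng.choice(selected), rng) for _ in range(offspring)]
    return selected, best_per_round


binders, history = pan(random_library(200, 8, seed=1), "ACDEFGHI", rounds=6)
print(history)  # best toy score in each round
```

Because the selected parents are carried into the next round, the best score in this sketch can never decrease from one round to the next.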

This picture illustrates the phage (on the left) with the monoclonal antibodies displayed on its outer surface (on the right).  Each phage presents a unique antibody, together creating a pool of millions of antibodies.

 

Biovation makes recombinant human antibodies bearing a unique barcode preceded by a protease cleavage site, for eventual use in protein microarray technology. These antibodies are exposed to a target protein, and those that bind are treated with protease to release their barcodes, which are then sequenced using mass spectrometry. The use of mass spectrometry solves the tagging problem without the need for phage display, and thus avoids the technical and patent issues surrounding display technologies. 

Phylos uses their proprietary "PROfusion" technology to directly link a protein to the mRNA that encodes it. This technology is used to create a library of proteins—antibody mimics—that bind cellular proteins with high affinity. The antibody mimics are immobilized onto a solid substrate to create Phylos’s "HIP" chip, a protein array that can be used for protein separation or protein-protein interaction assays.  See diagrams below:


Phylos' technique of linking the mRNA to its protein


Phylos' "HIP" chip is an array of PROfused molecules

Biacore’s surface plasmon resonance (SPR) systems detect mass changes on a surface, enabling sensitive detection of protein binding without the need for labels or dyes. Biacore now manufactures SPR instruments that monitor binding on 100x100 arrays, which can be used to analyze protein expression profiles.

 

Bioinformatics:

These companies are most closely linked to genomics efforts. Their goal is to provide data-mining and warehousing capabilities to allow the prediction of protein structure and function based on DNA sequence. Predictions are made by comparing novel protein sequences to sequences for which the protein structure and function are known. This is feasible because proteins can be grouped into families that share similar function, regardless of the species in which the protein is found. It should be noted, however, that the relationships among proteins can be complex: relatively different DNA sequences can lead to similar structural motifs, while some functionally similar proteins have quite different structures, and vice versa. As the field of bioinformatics matures, it will improve researchers’ ability to identify related proteins, and will extend comparisons to include common structural motifs in disparate proteins.
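The comparison of a novel sequence against sequences of known function can be sketched with a basic global alignment score (the Needleman-Wunsch dynamic program). This is a minimal illustration, not the method of any company named here: the scoring values (+1 match, -1 mismatch, -1 gap) and the toy sequences are assumptions, and production tools use substitution matrices and faster heuristics.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch score of the best global alignment of a and b."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def closest_known(query, known):
    """Name of the known sequence with the highest alignment score to the query."""
    return max(known, key=lambda name: global_alignment_score(query, known[name]))


# Hypothetical toy sequences standing in for annotated protein families.
KNOWN = {
    "family_A": "MKTAYIAKQR",
    "family_B": "GASPVLLNRD",
}

print(closest_known("MKTAYLAKQR", KNOWN))  # → family_A
```

A high score only suggests a relationship; as noted above, similar sequences do not always mean similar structure or function.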

Currently, much of the effort in bioinformatics revolves around developing software to improve the reliability—and thus utility—of 2D PAGE data.

The Swiss Institute of Bioinformatics and the European Bioinformatics Institute are collaborating to develop ExPASy (Expert Protein Analysis System), which offers a range of search tools for comparing novel proteins or sequences to known proteins. Their Human Proteome Initiative (HPI) aims to fully annotate all known human sequences. They will provide extensive information on every known protein, including a description of its function, domain structure, subcellular location, post-translational modifications, variants, and similarities to other proteins.

Compugen’s patented Z3 software technology aligns different 2D gels so that their data can be compared despite the poor reproducibility of 2D PAGE. The software includes spot detection capabilities.

The figure to the right shows two superimposed gels from a differential protein expression experiment. The yellow circles identify proteins that are differentially expressed in the two samples; the pink circle identifies a protein that is only expressed in one of the samples.
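The kind of comparison shown in that figure can be sketched as follows. This is not Compugen's Z3 algorithm, whose details are not given here; it is a simple greedy nearest-neighbour match that assumes the two gels have already been aligned. Spots are hypothetical `(x, y, intensity)` tuples, and the distance and fold-change thresholds are arbitrary illustrative values.

```python
import math


def match_spots(gel_a, gel_b, max_distance=2.0):
    """Pair each spot in gel_a with the nearest unused spot in gel_b.

    Returns (pairs, only_a, only_b). Spots are (x, y, intensity) tuples.
    """
    unused = set(range(len(gel_b)))
    pairs, only_a = [], []
    for spot in gel_a:
        nearest = min(unused, key=lambda j: math.dist(spot[:2], gel_b[j][:2]), default=None)
        if nearest is not None and math.dist(spot[:2], gel_b[nearest][:2]) <= max_distance:
            pairs.append((spot, gel_b[nearest]))
            unused.discard(nearest)
        else:
            only_a.append(spot)  # no counterpart: expressed in sample A only
    only_b = [gel_b[j] for j in sorted(unused)]
    return pairs, only_a, only_b


def differential(pairs, min_fold=2.0):
    """Matched spots whose intensity changes at least min_fold in either direction."""
    flagged = []
    for a, b in pairs:
        fold = b[2] / a[2]  # assumes positive intensities
        if fold >= min_fold or fold <= 1 / min_fold:
            flagged.append((a, b, fold))
    return flagged


gel_a = [(10, 10, 100), (20, 30, 50), (40, 40, 80)]
gel_b = [(10.5, 9.8, 105), (20.4, 30.2, 200), (70, 70, 60)]

pairs, only_a, only_b = match_spots(gel_a, gel_b)
print(len(pairs), only_a, only_b)  # → 2 [(40, 40, 80)] [(70, 70, 60)]
```

In the figure's terms, the entries returned by `differential` correspond to the yellow circles, and the unmatched spots in `only_a` or `only_b` correspond to the pink circle.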

 

Incyte’s LifeExpress protein expression database is unique in its correlation of RNA and protein expression information. In collaboration with Oxford GlycoSciences, Incyte developed a 2D gel technique that they consider sufficiently reproducible for inter-gel comparisons. Their LifeExpress database allows the identification of proteins based on either their location on the gel, or on mass spec data obtained from spots on the gel. These proteins or amino acid sequences are then compared to ESTs found in Incyte’s LifeSeq human gene sequence database, allowing the user to move quickly between proteomics and genomics data.

GeneData’s protein analysis software, GeneData Impressionist, supports 2D gel data as well as data generated by various other technologies. The pattern-extraction software features a variety of statistical algorithms and interactive graphical tools that enable the user to perform quality control and analyze large, complex experimental series.

 

High-Throughput Protein Production and Structure Determination:

In order to study proteins that have been identified as important targets by separation and profiling efforts, large quantities of those proteins must be obtained in pure form. Ideally, an X-ray crystal structure will also be determined for every new protein discovered. This structural information is crucial for rational drug design efforts, which rely on it to design small molecules that bind to proteins.

Figure: structure diagram of the viral HN protein (source: celera.com/genomics/news/articles/11_00).

Harvard Institute of Proteomics Research is developing the FLEX (Full Length Expression) Repository to provide researchers with the complete set of known genes and open reading frames (ORFs) in a robot-accessible array of cDNA clones. Their system obviates the need to design a unique cloning method for each protein by providing simple in-frame shuttling of each gene into any of a variety of expression vectors. With such a system, high-throughput parallel screens of proteins for the creation of protein microarrays and the facilitation of structural determinations can be designed.

Vertex Pharmaceuticals uses a combination of NMR and X-ray crystallography to solve protein structures. They also group proteins into families that share similar active site structures, allowing them to speed both structure determination and lead compound identification.

MediChem integrates protein expression, crystallization, and 3D structural determination technologies with work in computational chemistry and crystallization databasing.

Integrative Proteomics specializes in high throughput purification of proteins, as well as structure and function determination.

 

Protein Analysis for Drug Development:

These companies work most closely with pharmaceutical developers, providing valuable information regarding how proteins interact with each other. They provide insight into biological pathways, allowing target validation and discovery of novel drug targets.

Hybrigenics develops Protein-Protein Interaction Maps (PIMs) using two-hybrid interaction assays. They currently have complete protein-protein interaction maps, with associated proprietary drug targets and small molecule lead compounds, for Helicobacter pylori, Hepatitis C virus, and Saccharomyces cerevisiae, all of which are available for licensing. Hybrigenics will also enter into strategic alliances for drug discovery, covering all the steps from establishment of a protein-protein interaction map, through selection or validation of drug targets and identification of lead compounds, to biological validation. ("PIM" and "PIMs" are registered trademarks and trademarks of Hybrigenics.)

CuraGen Corporation uses their PathCalling technology, which combines two-hybrid assays with bioinformatics software, to identify protein-protein interactions and biological pathways. Their substantial use of bioinformatics allows comparison of protein interactions in different species, as well as the expansion of pathways by incorporating information on second-degree interactions.
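The idea of second-degree interactions can be shown with a small graph sketch. This is only an illustration of the concept, not CuraGen's PathCalling software: the protein names and interaction pairs below are hypothetical, and each pair stands for one interaction reported by a two-hybrid assay.

```python
from collections import defaultdict


def build_interaction_map(pairs):
    """Undirected interaction graph: protein -> set of direct partners."""
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def second_degree(graph, protein):
    """Proteins reachable through one intermediate partner, excluding direct partners."""
    direct = graph.get(protein, set())
    indirect = set()
    for partner in direct:
        indirect |= graph[partner]
    return indirect - direct - {protein}


# Hypothetical two-hybrid hits.
hits = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
interaction_map = build_interaction_map(hits)

print(second_degree(interaction_map, "A"))  # → {'C'}
```

Here protein C never binds A directly, but it is linked to A through their shared partner B, which makes it a candidate member of the same pathway.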

Myriad Genetics also uses a high-throughput two-hybrid system for identification of protein-protein interactions. They have also developed a database of these interactions that combines their proprietary information with published interactions.

Molecular Simulations, Inc. predicts the structure and function of novel protein targets using their Target Explorer software. This technology is designed to assign function to protein sequences and deliver these potential targets to specialized protein simulation and engineering tools. It allows users to identify key targets directly from DNA sequence information.

Areas for Improvement:

Proteomics efforts are currently hampered by the fact that the field’s main techniques, including 2D PAGE and X-ray crystallography, require highly skilled operators. Means of improving and automating these processes must be developed in order for the field to progress rapidly.

Databases containing current information on protein sequence, structure, and function are just becoming available. Techniques for linking and interpreting the vast amounts of data produced by genomics and proteomics efforts are still far from optimized.

As the field of proteomics matures, it will become possible to design more sophisticated experiments to determine protein function, and to investigate how proteins fit into complex biological pathways. Proteomics will multiply the number of known drug targets 100-fold, putting pressure on the pharmaceutical industry to capitalize on that new information. Protein profiling will also make advanced diagnostics a possibility, but it will be a challenge for the medical field to navigate the expanse of new data available to it.