What is a protein?

The word protein is derived from the Greek proteios, meaning “of the first rank”. The term was coined in 1838 by the Swedish scientist Jöns Berzelius, to reflect the importance of this group of molecules.

A stretch of DNA called a gene carries the information required to build a protein. It is believed that there are between 20,000 and 25,000 genes in the human genome (1), but over 1 million proteins in the human proteome (2), making proteins the most diverse class of biological molecules. The difference between the number of genes and the number of proteins is due to the fact that one gene can give rise to more than one protein (for example, through alternative splicing), and that once produced, proteins can be chemically modified (usually by other proteins) to change their properties and activities.

The building blocks of proteins are amino acids. There are twenty naturally occurring amino acids (see the table "The naturally occurring amino acids") from which all natural proteins are constructed. All twenty are based on a common structure and differ in the chemical properties of their side chains. Some (e.g., tryptophan and phenylalanine) are strongly hydrophobic, while others (e.g., lysine and aspartic acid) carry an ionic charge at physiological pH, making them hydrophilic. Amino acids are linked together by peptide bonds to form protein chains. The sequence of amino acids in a protein and the way the protein chain is folded determine its properties.
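The distinction between charged and uncharged side chains can be illustrated with a short Python sketch. It estimates the net side-chain charge of a sequence written in one-letter codes. The function name and the charge table are illustrative assumptions: lysine and arginine are counted as +1, aspartic acid and glutamic acid as -1, histidine is treated as neutral, and the chain termini are ignored, so the result is only a rough approximation.

```python
# Approximate side-chain charges near physiological pH (about 7.4).
# Assumptions: histidine is treated as neutral, the N- and C-termini are
# ignored, and every residue is fully ionized or fully neutral.
SIDE_CHAIN_CHARGE = {"K": +1, "R": +1, "D": -1, "E": -1}


def net_side_chain_charge(sequence: str) -> int:
    """Sum the approximate side-chain charges of a one-letter sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(residue, 0) for residue in sequence.upper())


if __name__ == "__main__":
    # One lysine (K) and no acidic residues gives a net charge of +1.
    print(net_side_chain_charge("MAKWF"))
```

A real estimate would use the pKa of each ionizable group; this sketch only counts residues.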

The advances made in molecular biology over the past few decades have greatly improved the study of proteins. Previously, the only way to obtain a specific protein was to purify it from the natural source, a procedure that was often extremely inefficient and time-consuming. With the advent of recombinant molecular biological techniques, it is possible to clone the DNA that encodes the protein of interest into an expression vector and express the protein in bacteria, often E. coli. The universality of the genetic code that translates a DNA sequence into a protein allows proteins from any organism to be expressed quickly and in large amounts.
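The way the genetic code maps a DNA sequence onto a protein sequence can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code, a coding-strand DNA sequence that starts in frame, and translation that stops at the first stop codon. The names `CODON_TABLE` and `translate` and the example sequence are hypothetical and chosen only for this sketch.

```python
from itertools import product

# Standard genetic code, listed with bases in the order T, C, A, G.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate coding-strand DNA into one-letter amino acid codes.

    Reads codons from the first base and stops at the first stop codon.
    Any trailing one or two bases are ignored. Characters other than
    A, C, G, and T raise a KeyError.
    """
    dna = dna.upper()
    protein = []
    for start in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[start:start + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAGTGGTTTTAA"))  # MAKWF
```

Because the same table applies in nearly all organisms, a coding sequence from one species yields the same amino acid sequence when it is expressed in a host such as E. coli.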

This section describes procedures for expression, analysis, detection, and assay of proteins.

The naturally occurring amino acids
Amino acid       3-letter code    1-letter code
Alanine          Ala              A
Arginine         Arg              R
Asparagine       Asn              N
Aspartic acid    Asp              D
Cysteine         Cys              C
Glutamic acid    Glu              E
Glutamine        Gln              Q
Glycine          Gly              G
Histidine        His              H
Isoleucine       Ile              I
Leucine          Leu              L
Lysine           Lys              K
Methionine       Met              M
Phenylalanine    Phe              F
Proline          Pro              P
Serine           Ser              S
Threonine        Thr              T
Tryptophan       Trp              W
Tyrosine         Tyr              Y
Valine           Val              V
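
The two code systems can be converted into each other with a simple lookup. The following Python sketch is one way to do this, using the standard one-letter and three-letter codes for the twenty amino acids. The names `THREE_LETTER` and `to_three_letter` are hypothetical and used only for illustration.

```python
# Standard one-letter to three-letter amino acid codes.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "E": "Glu", "Q": "Gln", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def to_three_letter(sequence: str) -> str:
    """Convert a one-letter sequence to hyphen-separated three-letter codes.

    Letters outside the twenty standard codes raise a KeyError.
    """
    return "-".join(THREE_LETTER[residue] for residue in sequence.upper())


if __name__ == "__main__":
    print(to_three_letter("MAKWF"))  # Met-Ala-Lys-Trp-Phe
```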