Case 6
The human immune system presents several layers of defence against pathogens
such as bacteria and viruses. On the front line there's the skin,
a huge protective barrier that keeps out the vast majority of intruders.
Of course there are routes past this barrier -- the body needs a lot of
things to pass in and out in order to survive -- and a second line of
defence is presented by the mucous membranes and linings of the
gastro-intestinal, respiratory and urinary tracts.
Beyond those, there are the two levels that we usually think of when
talking about immunity: the innate immune system, which provides
non-specific defences against a broad range of pathogens, and the
adaptive immune system, which learns to combat novel
infections.
There are many elements to innate immunity: different mechanisms that have
evolved to combat various classes of potential threat. Anti-microbial
peptides and the double-stranded RNA response exploited
by RNAi both fall into this category, but perhaps most crucial are the
phagocytes, such as the macrophages: cells that engulf and digest
other cells. These cells devour various classes of intruder directly,
but they also devour native body cells when they die -- as a natural
part of the process of cleaning up and recycling -- or when they release
certain distress signals to indicate they are damaged by infection.
Digested cells -- including any contained pathogens -- are rendered down
into small peptide fragments or antigens. These bind to protein
complexes called MHCs and are returned to the cell surface, where they
become fodder for the adaptive immune system.
Adaptive immunity is mediated by two main types of cells, known as
B-cells (made in the bone marrow) and T-cells (made in the thymus).
T-cells are responsible for recognising pathogens; B-cells for
remembering them and generating antibodies, Y-shaped
molecules that bind to the pathogens and disrupt their function.
The antigens presented on a cell surface may be perfectly legitimate
ones that are found in the body all the time; or they may be
characteristic of some alien invader. T-cells make that call,
distinguishing self from non-self. When non-self fragments
are found, the T-cell does one of two things depending on its type.
Killer T-cells go off looking for pathogens matching the peptide
sequence and kill them. Helper T-cells go off looking for B-cells
that can manufacture corresponding antibodies and stimulate them to do
so.
Antibodies recognise pathogen shape in a complementary fashion.
The antigen surface to which the antibody binds is known as the
epitope; the corresponding surface on the antibody itself is the
paratope.
Clearly, the T-cell ability to distinguish self from non-self is crucial
-- its failure can lead to auto-immune diseases, in which the
immune system attacks the body's own healthy cells -- but it is not
fully understood.
The traditional model is clonal selection theory, in which
T-cells are produced in the thymus with essentially random binding
affinities, and then negatively selected: if they respond to
known self cells they're destroyed; otherwise, they're released. Vast
numbers of variant B-cells are manufactured, each kind capable of
producing antibodies to different hypothetical epitopes. The
system is degenerate: there are millions of different possible
defences, only a fraction of which will ever actually be triggered.
There are problems with clonal selection, in that the negative selection
criteria are meant to be laid down early -- in the foetus or early
neonate -- but the body changes drastically at various times in its
existence -- for example during puberty and pregnancy -- producing whole
new collections of antigens without normally stimulating an immune
response. Tumours often present new antigens without causing rejection,
and rejection of tissue transplants is quite variable -- e.g., liver
transplants are less frequently rejected than skin grafts. Further, the
immune system doesn't attack food particles, which are clearly non-self.
An alternative explanation, known as danger theory [1], proposes that there is no
neonatal definition of self, but rather that T-cells respond locally to
danger signals from cells being stressed or damaged. Different
signals prompt different kinds of response and a whole library of
responses is gradually developed that way.
Yet a third model is that of the cognitive immune system, in
which the system learns appropriate responses by incremental adaptation,
making connections between recognition fragments to construct some kind
of immune intelligence that is part of a holistic healing
mechanism. An analogy is made to the connection-based cognitive
processes by which our visual system develops. However, there is no
biological evidence for this theory.
One characteristic of the adaptive immune system is that its antigenic
responses are swifter and often more effective the second time around,
leading to acquired immunity. This is the basis of vaccination,
which aims to prevent or mitigate diseases by stimulating the initial
response without going through full-scale infection.
Although vaccination was originally a wholly empirical business,
understanding and modelling the workings of the immune system provide a
theoretical basis for better vaccine design and development.
Bacteria and even viruses are pretty complicated things, with a great
many potentially recognisable bits -- potential epitopes -- most
of which will turn out to be no use at all as vaccines. Perhaps they are
not distinguishable enough; perhaps they do not bind well to the various
molecules used in the recognition process; perhaps they are well
recognised, but always hidden away deep inside the pathogen so that
T-cells and antibodies never get a glimpse of them.
The goal of vaccine design is to find good epitopes in amongst
everything else. Knowing what constitutes "good" can help narrow the
search and make locating promising candidates more likely. Given the
amount of time and effort required -- both in terms of development and
official approval -- to go from such candidates to a functioning vaccine
that can be used legally in humans, the earlier you can weed out the
millions of useless ones, the better. A reliable computational model for
epitope prediction can save a great deal of biochemical assay work.
A good epitope needs a high binding affinity for one of the MHCs --
major histocompatibility complexes -- that will display it to the
T-cells. There are two main MHC classes, with different binding
characteristics. Class I occur in nearly all cells and bind short peptide
sequences of 8-11 residues; class II occur only in specialized
antigen-presenting cells and bind longer sequences of 12-25.
Class I are easier to model.
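To make the search space concrete: for class I, the raw candidate list is simply every 8-11-residue window of a pathogen protein. A minimal sketch (the sequence here is made up for illustration):

```python
# Enumerate candidate MHC class I epitopes by sliding a window of
# length 8-11 residues along a protein sequence. Before any filtering,
# every such window is a candidate -- hence the need to weed them out.
def candidate_epitopes(protein, min_len=8, max_len=11):
    """Yield every contiguous peptide of length min_len..max_len."""
    for length in range(min_len, max_len + 1):
        for start in range(len(protein) - length + 1):
            yield protein[start:start + length]

seq = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up 22-residue sequence
peptides = list(candidate_epitopes(seq))
print(len(peptides))  # prints 54: even a tiny protein yields dozens
```

Scaled up to a whole proteome, this is where the "millions of useless ones" come from.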
Several bioinformatic methods have been used for epitope mapping using
the protein sequences of the pathogens. Motif approaches search
for sequences with preferred characteristics, particularly at the
anchor residues -- the fixed points at which the MHC grabs onto
the peptide, where only a couple of amino acids are favoured; such
approaches are simple to implement but unreliable, producing a lot of
false positives and false negatives. Machine learning techniques such as
Hidden Markov Models and Support Vector Machines are more
complex, essentially building statistical models to classify sequences
as good or bad binders on the basis of known examples. These can achieve
something like 80% accuracy given a good body of training data.
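The motif idea is easy to sketch. Below is a toy filter for 9-mers that demands favoured residues at two anchor positions; the positions and residue sets are illustrative (loosely modelled on published HLA-A2 preferences), not a validated rule:

```python
# Toy motif filter for 9-mer peptides: require favoured amino acids at
# the two anchor positions (position 2 and the C-terminus here).
# The residue sets are illustrative assumptions, not a real predictor.
ANCHOR_P2 = {"L", "M"}   # hydrophobic residues favoured at position 2
ANCHOR_P9 = {"V", "L"}   # favoured C-terminal residues

def passes_motif(peptide):
    return (len(peptide) == 9
            and peptide[1] in ANCHOR_P2
            and peptide[8] in ANCHOR_P9)

candidates = ["SLYNTVATL", "GILGFVFTL", "AAAAAAAAA"]
hits = [p for p in candidates if passes_motif(p)]
print(hits)  # prints ['SLYNTVATL']
```

Note that GILGFVFTL -- a well-known influenza epitope that genuinely binds HLA-A2, with isoleucine rather than leucine at position 2 -- fails this crude rule: exactly the sort of false negative such approaches produce.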
Such approaches try to classify directly from the linear sequence data.
Structure-based approaches, on the other hand, in particular
molecular dynamics, try to closely model the actual physical
interactions between molecules. These can be very successful, and we'll
return to them in a different context shortly, but they are
spectacularly computationally intensive and thus problematic to
apply to the large-scale classification of many thousands of candidate
epitopes.
An alternative to all these epitope mapping techniques is reverse
vaccinology, which proceeds from the open reading frames [2] of the pathogen's genetic
sequence to subcellular location prediction. In other words, it
tries to identify those peptide subsequences which will actually be
exposed on the pathogen's surface, and thus available as targets for
T-cells and antibodies. While predicting location from sequence is an
unreliable process, results so far are promising.
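The first step of that pipeline -- finding the ORFs -- can be sketched minimally. This scans one strand only and checks nothing beyond the basic syntactic requirements; real reverse-vaccinology tools do far more (both strands, length thresholds, and the location prediction itself):

```python
# Minimal ORF scan: walk each of the three reading frames on one strand
# and report ATG...stop runs. Purely illustrative -- real pipelines also
# scan the reverse complement and apply sensible length cut-offs.
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":                  # potential start
                j = i + 3
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 <= len(dna) and (j - i) // 3 >= min_codons:
                    orfs.append(dna[i:j + 3])          # include stop codon
                i = j + 3                              # resume past the run
            else:
                i += 3
    return orfs

print(find_orfs("ATGAAATGA"))  # prints ['ATGAAATGA']
```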
A further issue is that, at least as far as antibodies are
concerned, an epitope is a fragment of shape on the pathogen
surface, not a fragment of sequence. It is possible for the area
of recognition to consist of non-contiguous parts of the sequence that
are brought together by folding, or conceivably even parts of distinct
sequences in a complex [3].
At present there is no feasible way of considering such targets.
For a physical scientist coming into this sort of problem domain, the
philosophy underlying biomedical endeavours can seem
disconcertingly antiquated [4]. Rutherford famously divided
all of science into physics and stamp collecting, and it's clear which
side of the border biology spends most of its time. Where physics has
embraced the Popperian notion of falsifiability as the
cornerstone of science, the life sciences cling to Baconian
induction: theory springing forth from the weight of accumulated
evidence. Experiments are done, observations made, but in the spirit of
see what happens rather than test theoretical prediction.
In this environment, hard mathematical modelling offers a shred of
disciplinary comfort, and molecular dynamics is very hard indeed.
The basic equations -- school-level Newtonian mechanics -- verge on
trivial when applied to one or two objects, but start causing trouble at
three. Trying to work out what might happen to molecules in a
typical biological context -- just a tiny one, cut off from everything
else -- in which millions of atoms are subject to millions
of interactions -- is rather more difficult.
In order to make the process at all possible, it needs to be handled
numerically -- which is to say, doing the calculations iteratively in
discrete steps. In order to make the results plausible, those
steps have to be small enough that the quantization doesn't introduce
significant errors, which is to say very, very small. Of the
order of 10^-15 seconds -- a femtosecond. Which in turn means
that you have to go through a lot of steps in order to model
anything happening on a biological timescale (let's say, for the sake of
argument, some tens of nanoseconds).
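The arithmetic, and the kind of stepping involved, look like this. The integrator below is a velocity-Verlet step for a single particle on a spring -- the same scheme, in spirit, that MD codes apply to millions of atoms, though everything here is a toy with made-up units:

```python
# Back-of-envelope step count: tens of nanoseconds at femtosecond
# resolution means on the order of ten million integration steps.
timestep = 1e-15           # 1 fs, the order of magnitude quoted above
duration = 10e-9           # 10 ns, "for the sake of argument"
steps = round(duration / timestep)
print(steps)               # prints 10000000

# A toy velocity-Verlet step: one particle, harmonic potential, unit
# mass and spring constant. Illustrative only, not a real force field.
def verlet_step(x, v, dt, k=1.0, m=1.0):
    a = -k * x / m                         # acceleration from F = -kx
    x_new = x + v * dt + 0.5 * a * dt * dt
    a_new = -k * x_new / m
    v_new = v + 0.5 * (a + a_new) * dt
    return x_new, v_new

x, v = 1.0, 0.0
for _ in range(1000):
    x, v = verlet_step(x, v, dt=0.01)
# the symplectic integrator keeps total energy very close to 0.5
print(0.5 * v * v + 0.5 * x * x)
```

Small steps keep the quantization error in check; in a real simulation each step also costs a full force evaluation over millions of pairwise interactions, which is where the computer power goes.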
So. A lot of number crunching for a lot of particles for a lot of time
steps. That's going to need a lot of computer power. This is a problem
that isn't going to go away -- you can always add another macromolecule
to the simulation -- but grid computing offers some hope.
Distributed computational resources can be made available across
geographical and administrative boundaries, each node given a little
piece of the same big problem: many hands make light work. Various
scientific grid systems are up and running, allowing calculations to be
multiplexed with some degree of transparency, and they're
expanding all the time. Not all problems can be sensibly parallelised,
but molecular dynamics can [5].
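The farming-out pattern, in miniature: the same scoring function mapped over many candidates by a pool of workers. Here the "grid" is just local threads and score() is a trivial stand-in (real work would be hours of MD per candidate); all names are illustrative, not a real grid API:

```python
# Grid computing in miniature: map an independent per-candidate
# calculation over a pool of workers. score() is a placeholder for a
# genuinely expensive binding calculation.
from concurrent.futures import ThreadPoolExecutor

def score(peptide):
    # stand-in "binding score" so the example runs instantly
    return sum(ord(c) for c in peptide) % 100

candidates = ["SLYNTVATL", "GILGFVFTL", "LLFGYPVYV"]
with ThreadPoolExecutor(max_workers=3) as pool:
    scores = list(pool.map(score, candidates))  # one candidate per worker
print(dict(zip(candidates, scores)))
```

Because each candidate is scored independently, the problem parallelises cleanly: doubling the nodes roughly halves the wall-clock time, which is exactly what large-scale epitope screening needs.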
Given adequate computing power, how might one apply this technique to
practical biomedical problems? One area of potential value is in the
choice of drugs for treating people with HIV.
HIV is a retrovirus, which means that it copies itself into the
host's own genome by a process known as reverse transcription. At
some point thereafter, the infected cell starts manufacturing copies of
the virus, and Bob's your uncle. Well, almost.
In fact, the cell doesn't manufacture the whole virus in a single bound.
Viruses are a lot simpler even than bacteria, but like all living
systems their life cycles are complicated. What the cell manufactures
initially is more like a construction kit: further assembly is
required to make actual HIV. Various parts of this assembly process are
potential drug targets: block them and the virus can't reproduce.
One popular drug target is HIV protease, an enzyme that
cleaves one of the main polyprotein chains the virus forces the
cell to create into the separate proteins needed to build a new virus.
Protease inhibitors bind to the enzyme site at which cleaving would
otherwise occur, meaning the building blocks aren't available and new
virus particles can't be made. To function, such drugs must fit neatly
into the binding site.
Unfortunately, reverse transcription is a rather error-prone process,
which means HIV mutates very quickly, accumulating many variations
within a single patient. Of course, many of these mutations will have no
effect, or may even be harmful to the virus, but occasionally one might
modify the protease behaviour to make it less susceptible to a
particular inhibitor. If that inhibitor is in use, there will be rapid
selection for the resistant mutant, since it will be able to reproduce
while other strains will not. Resistance is an ongoing problem for HIV
treatment: there are known resistant strains for all 8
currently approved protease-inhibiting drugs.
Clinically, then, it is important to know which drugs will be most
effective for any individual patient, which in turn depends on the
specific shapes of the viral enzymes in relation to the shapes of the
drugs. Such interactions are dynamic: static crystallographic
structural information is insufficient. Molecular dynamics, on the other
hand, can calculate goodness of fit, given appropriate genomic
information from the patient, and thereby provide a useful indication as
to drug efficacy.
Of course, there would need to be substantial infrastructural support to
make such patient-specific molecular modelling a routine part of the
treatment procedure for HIV, as well as a change in mindset for
clinicians, for whom such approaches have not previously been available.
[1] Invented by a former Playboy bunny and dog trainer called Polly Matzinger, who notoriously published a paper in the Journal of Experimental Medicine co-credited to one Galadriel Mirkwood, later revealed to be Matzinger's Afghan Hound!
[2] ORFs are the sections of the genetic sequence that could potentially code for a protein, inasmuch as they meet some very basic syntactic requirements. Though pretty well defined for prokaryotes, ORFs are a significantly less straightforward proposition in eukaryotes because the transcriptional mechanics are a lot more complicated. But if you're designing a vaccine your target is almost certainly a bacterium or virus, so you don't have to worry about that.
[3] This seems a little shaky, given that the recognition process is initiated by short MHC-bound peptide fragments. I can see how antibodies could bind to spatially proximate but sequentially remote combinations, but I can't see how that sort of macrostructural response would ever begin. Unless the idea is to design a vaccine that mimics in its short sequence a structure that the pathogen exhibits by conformational coincidence? That would be a neat trick, but its difficulty really does boggle the mind.
[4] Atypically, I'm attempting to capture the presenter's points here -- albeit paraphrased -- rather than editorialising. I am not a physical scientist and my philosophical take on these matters isn't so clear cut.
[5] I'm currently up to my elbows in work for Case 4, so it's difficult to resist drawing analogies with cellular automata, but I shall.