Case 6
The human immune system presents several layers of defence against pathogens
such as bacteria and viruses. On the front line there's the skin,
a huge protective barrier that keeps out the vast majority of intruders.
Of course there are routes past this barrier -- the body needs a lot of
things to pass in and out in order to survive -- and a second line of
defence is presented by the mucous membranes and linings of the
gastro-intestinal, respiratory and urinary tracts.
Beyond those, there are the two levels that we usually think of when
talking about immunity: the innate immune system, which provides
non-specific defences against a broad range of pathogens, and the
adaptive immune system, which learns to combat novel
infections.
There are many elements to innate immunity: different mechanisms that have
evolved to combat various classes of potential threat. Anti-microbial
peptides and the double-stranded RNA response exploited
by RNAi both fall into this category, but perhaps most crucial are the
phagocytes, such as the macrophages: cells that engulf and digest
other cells. These cells devour various classes of intruder directly,
but they also devour native body cells when they die -- as a natural
part of the process of cleaning up and recycling -- or when they release
certain distress signals to indicate they are damaged by infection.
Digested cells -- including any contained pathogens -- are rendered down
into small peptide fragments or antigens. These bind to protein
complexes called MHCs and are returned to the cell surface, where they
become fodder for the adaptive immune system.
Adaptive immunity is mediated by two main types of cells, known as
B-cells (made in the bone marrow) and T-cells (made in the thymus).
T-cells are responsible for recognising pathogens; B-cells for
remembering them and generating antibodies, Y-shaped
molecules that bind to the pathogens and disrupt their function.
The antigens presented on a cell surface may be perfectly legitimate
ones that are found in the body all the time; or they may be
characteristic of some alien invader. T-cells make that call,
distinguishing self from non-self. When non-self fragments
are found, the T-cell does one of two things depending on its type.
Killer T-cells go off looking for pathogens matching the peptide
sequence and kill them. Helper T-cells go off looking for B-cells
that can manufacture corresponding antibodies and stimulate them to do
so.
Antibodies recognise pathogen shape in a complementary fashion.
The antigen surface to which the antibody binds is known as the
epitope; the corresponding surface on the antibody itself is the
paratope.
Clearly, the T-cell ability to distinguish self from non-self is crucial
-- its failure can lead to auto-immune diseases, in which the
immune system attacks the body's own healthy cells -- but it is not
fully understood.
The traditional model is clonal selection theory, in which
T-cells are produced in the thymus with essentially random binding
affinities, and then negatively selected: if they respond to
known self cells they're destroyed; otherwise, they're released. Vast
numbers of variant B-cells are manufactured, each kind capable of
producing antibodies to different hypothetical epitopes. The
system is degenerate: there are millions of different possible
defences, only a fraction of which will ever actually be triggered.
There are problems with clonal selection, in that the negative selection
criteria are meant to be laid down early -- in the foetus or early
neonate -- but the body changes drastically at various times in its
existence -- for example during puberty and pregnancy -- producing whole
new collections of antigens without normally stimulating an immune
response. Tumours often present new antigens without causing rejection,
and rejection of tissue transplants is quite variable -- e.g., liver
transplants are less frequently rejected than skin grafts. Further, the
immune system doesn't attack food particles, which are clearly non-self.
An alternative explanation, known as danger theory [1], proposes that there is no
neonatal definition of self, but rather that T-cells respond locally to
danger signals from cells being stressed or damaged. Different
signals prompt different kinds of response and a whole library of
responses is gradually developed that way.
Yet a third model is that of the cognitive immune system, in
which the system learns appropriate responses by incremental adaptation,
making connections between recognition fragments to construct some kind
of immune intelligence that is part of a holistic healing
mechanism. An analogy is made to the connection-based cognitive
processes by which our visual system develops. However, there is no
biological evidence for this theory.
One characteristic of the adaptive immune system is that its antigenic
responses are swifter and often more effective the second time around,
leading to acquired immunity. This is the basis of vaccination,
which aims to prevent or mitigate diseases by stimulating the initial
response without going through full-scale infection.
Although vaccination was originally a wholly empirical business,
understanding and modelling the workings of the immune system provide a
theoretical basis for better vaccine design and development.
Bacteria and even viruses are pretty complicated things, with a great
many potentially recognisable bits -- potential epitopes -- most
of which will turn out to be no use at all as vaccines. Perhaps they are
not distinguishable enough; perhaps they do not bind well to the various
molecules used in the recognition process; perhaps they are well
recognised, but always hidden away deep inside the pathogen so that
T-cells and antibodies never get a glimpse of them.
The goal of vaccine design is to find good epitopes in amongst
everything else. Knowing what constitutes "good" can help narrow the
search and make locating promising candidates more likely. Given the
amount of time and effort required -- both in terms of development and
official approval -- to go from such candidates to a functioning vaccine
that can be used legally in humans, the earlier you can weed out the
millions of useless ones, the better. A reliable computational model for
epitope prediction can save a great deal of biochemical assay work.
A good epitope needs a high binding affinity for one of the MHCs --
major histocompatibility complexes -- that will display it to the
T-cells. There are two main MHC classes, with different binding
characteristics. Class I occur in nearly all cells and bind short peptide
sequences of 8-11 residues; class II occur only in specialized
antigen-presenting cells and bind longer sequences of 12-25.
Class I are easier to model.
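To make the search space concrete: for class I, the raw candidate list is simply every 8-11-residue window of a pathogen protein. A minimal sketch (the sequence here is made up for illustration):

```python
# Enumerate candidate MHC class I epitopes by sliding a window of
# length 8-11 residues along a protein sequence. Before any filtering,
# every such window is a candidate -- hence the need to weed them out.
def candidate_epitopes(protein, min_len=8, max_len=11):
    """Yield every contiguous peptide of length min_len..max_len."""
    for length in range(min_len, max_len + 1):
        for start in range(len(protein) - length + 1):
            yield protein[start:start + length]

seq = "MKTAYIAKQRQISFVKSHFSRQ"  # made-up 22-residue sequence
peptides = list(candidate_epitopes(seq))
print(len(peptides))  # prints 54: even a tiny protein yields dozens
```

Scaled up to a whole proteome, this is where the "millions of useless ones" come from.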
Several bioinformatic methods have been used for epitope mapping using
the protein sequences of the pathogens. Motif approaches search
for sequences with preferred characteristics, particularly at the
anchor residues -- the fixed points at which the MHC grabs onto
the peptide, where only a couple of amino acids are favoured; such
approaches are simple to implement but unreliable, producing a lot of
false positives and false negatives. Machine learning techniques such as
Hidden Markov Models and Support Vector Machines are more
complex, essentially building statistical models to classify sequences
as good or bad binders on the basis of known examples. These can achieve
something like 80% accuracy given a good body of training data.
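The motif idea is easy to sketch. Below is a toy filter for 9-mers that demands favoured residues at two anchor positions; the positions and residue sets are illustrative (loosely modelled on published HLA-A2 preferences), not a validated rule:

```python
# Toy motif filter for 9-mer peptides: require favoured amino acids at
# the two anchor positions (position 2 and the C-terminus here).
# The residue sets are illustrative assumptions, not a real predictor.
ANCHOR_P2 = {"L", "M"}   # hydrophobic residues favoured at position 2
ANCHOR_P9 = {"V", "L"}   # favoured C-terminal residues

def passes_motif(peptide):
    return (len(peptide) == 9
            and peptide[1] in ANCHOR_P2
            and peptide[8] in ANCHOR_P9)

candidates = ["SLYNTVATL", "GILGFVFTL", "AAAAAAAAA"]
hits = [p for p in candidates if passes_motif(p)]
print(hits)  # prints ['SLYNTVATL']
```

Note that GILGFVFTL -- a well-known influenza epitope that genuinely binds HLA-A2, with isoleucine rather than leucine at position 2 -- fails this crude rule: exactly the sort of false negative such approaches produce.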
Such approaches try to classify directly from the linear sequence data.
Structure-based approaches, on the other hand, in particular
molecular dynamics, try to closely model the actual physical
interactions between molecules. These can be very successful, and we'll
return to them in a different context shortly, but they are
spectacularly computationally intensive and thus problematic to
apply to the large-scale classification of many thousands of candidate
epitopes.
An alternative to all these epitope mapping techniques is reverse
vaccinology, which proceeds from the open reading frames [2] of the pathogen's genetic
sequence to subcellular location prediction. In other words, it
tries to identify those peptide subsequences which will actually be
exposed on the pathogen's surface, and thus available as targets for
T-cells and antibodies. While predicting location from sequence is an
unreliable process, results so far are promising.
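The first step of that pipeline -- finding the ORFs -- can be sketched minimally. This scans one strand only and checks nothing beyond the basic syntactic requirements; real reverse-vaccinology tools do far more (both strands, length thresholds, and the location prediction itself):

```python
# Minimal ORF scan: walk each of the three reading frames on one strand
# and report ATG...stop runs. Purely illustrative -- real pipelines also
# scan the reverse complement and apply sensible length cut-offs.
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(dna):
            if dna[i:i + 3] == "ATG":                  # potential start
                j = i + 3
                while j + 3 <= len(dna) and dna[j:j + 3] not in STOPS:
                    j += 3
                if j + 3 <= len(dna) and (j - i) // 3 >= min_codons:
                    orfs.append(dna[i:j + 3])          # include stop codon
                i = j + 3                              # resume past the run
            else:
                i += 3
    return orfs

print(find_orfs("ATGAAATGA"))  # prints ['ATGAAATGA']
```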
A further issue is that, at least as far as antibodies are
concerned, an epitope is a fragment of shape on the pathogen
surface, not a fragment of sequence. It is possible for the area
of recognition to consist of non-contiguous parts of the sequence that
are brought together by folding, or conceivably even parts of distinct
sequences in a complex [3].
At present there is no feasible way of considering such targets.
For a physical scientist coming into this sort of problem domain, the
philosophy underlying biomedical endeavours can seem
disconcertingly antiquated [4]. Rutherford famously divided
all of science into physics and stamp collecting, and it's clear which
side of the border biology spends most of its time. Where physics has
embraced the Popperian notion of falsifiability as the
cornerstone of science, the life sciences cling to Baconian
induction: theory springing forth from the weight of accumulated
evidence. Experiments are done, observations made, but in the spirit of
see what happens rather than test theoretical prediction.
In this environment, hard mathematical modelling offers a shred of
disciplinary comfort, and molecular dynamics is very hard indeed.
The basic equations -- school-level Newtonian mechanics -- verge on
trivial when applied to one or two objects, but start causing trouble at
three. Trying to work out what might happen to molecules in a
typical biological context -- just a tiny one, cut off from everything
else -- in which millions of atoms are subject to millions
of interactions -- is rather more difficult.
In order to make the process at all possible, it needs to be handled
numerically -- which is to say, doing the calculations iteratively in
discrete steps. In order to make the results plausible, those
steps have to be small enough that the quantization doesn't introduce
significant errors, which is to say very, very small. Of the
order of 10^-15 seconds -- a femtosecond. Which in turn means
that you have to go through a lot of steps in order to model
anything happening on a biological timescale (let's say, for the sake of
argument, some tens of nanoseconds).
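The arithmetic, and the kind of stepping involved, look like this. The integrator below is a velocity-Verlet step for a single particle on a spring -- the same scheme, in spirit, that MD codes apply to millions of atoms, though everything here is a toy with made-up units:

```python
# Back-of-envelope step count: tens of nanoseconds at femtosecond
# resolution means on the order of ten million integration steps.
timestep = 1e-15           # 1 fs, the order of magnitude quoted above
duration = 10e-9           # 10 ns, "for the sake of argument"
steps = round(duration / timestep)
print(steps)               # prints 10000000

# A toy velocity-Verlet step: one particle, harmonic potential, unit
# mass and spring constant. Illustrative only, not a real force field.
def verlet_step(x, v, dt, k=1.0, m=1.0):
    a = -k * x / m                         # acceleration from F = -kx
    x_new = x + v * dt + 0.5 * a * dt * dt
    a_new = -k * x_new / m
    v_new = v + 0.5 * (a + a_new) * dt
    return x_new, v_new

x, v = 1.0, 0.0
for _ in range(1000):
    x, v = verlet_step(x, v, dt=0.01)
# the symplectic integrator keeps total energy very close to 0.5
print(0.5 * v * v + 0.5 * x * x)
```

Small steps keep the quantization error in check; in a real simulation each step also costs a full force evaluation over millions of pairwise interactions, which is where the computer power goes.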
So. A lot of number crunching for a lot of particles for a lot of time
steps. That's going to need a lot of computer power. This is a problem
that isn't going to go away -- you can always add another macromolecule
to the simulation -- but grid computing offers some hope.
Distributed computational resources can be made available across
geographical and administrative boundaries, each node given a little
piece of the same big problem: many hands make light work. Various
scientific grid systems are up and running, allowing calculations to be
multiplexed with some degree of transparency, and they're
expanding all the time. Not all problems can be sensibly parallelised,
but molecular dynamics can [5].
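The farming-out pattern, in miniature: the same scoring function mapped over many candidates by a pool of workers. Here the "grid" is just local threads and score() is a trivial stand-in (real work would be hours of MD per candidate); all names are illustrative, not a real grid API:

```python
# Grid computing in miniature: map an independent per-candidate
# calculation over a pool of workers. score() is a placeholder for a
# genuinely expensive binding calculation.
from concurrent.futures import ThreadPoolExecutor

def score(peptide):
    # stand-in "binding score" so the example runs instantly
    return sum(ord(c) for c in peptide) % 100

candidates = ["SLYNTVATL", "GILGFVFTL", "LLFGYPVYV"]
with ThreadPoolExecutor(max_workers=3) as pool:
    scores = list(pool.map(score, candidates))  # one candidate per worker
print(dict(zip(candidates, scores)))
```

Because each candidate is scored independently, the problem parallelises cleanly: doubling the nodes roughly halves the wall-clock time, which is exactly what large-scale epitope screening needs.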
Given adequate computing power, how might one apply this technique to
practical biomedical problems? One area of potential value is in the
choice of drugs for treating people with HIV.
HIV is a retrovirus, which means that it copies itself into the
host's own genome by a process known as reverse transcription. At
some point thereafter, the infected cell starts manufacturing copies of
the virus, and Bob's your uncle. Well, almost.
In fact, the cell doesn't manufacture the whole virus in a single bound.
Viruses are a lot simpler even than bacteria, but like all living
systems their life cycles are complicated. What the cell manufactures
initially is more like a construction kit: further assembly is
required to make actual HIV. Various parts of this assembly process are
potential drug targets: block them and the virus can't reproduce.
One popular drug target is HIV protease, an enzyme that
cleaves one of the main polyprotein chains the virus forces the
cell to create into the separate proteins needed to build a new virus.
Protease inhibitors bind to the enzyme site at which cleaving would
otherwise occur, meaning the building blocks aren't available and new
virus particles can't be made. To function, such drugs must fit neatly
into the binding site.
Unfortunately, reverse transcription is a rather error-prone process,
which means HIV mutates very quickly, accumulating many variations
within a single patient. Of course, many of these mutations will have no
effect, or may even be harmful to the virus, but occasionally one might
modify the protease behaviour to make it less susceptible to a
particular inhibitor. If that inhibitor is in use, there will be rapid
selection for the resistant mutant, since it will be able to reproduce
while other strains will not. Resistance is an ongoing problem for HIV
treatment: there are known resistant strains for all 8
currently approved protease-inhibiting drugs.
Clinically, then, it is important to know which drugs will be most
effective for any individual patient, which in turn depends on the
specific shapes of the viral enzymes in relation to the shapes of the
drugs. Such interactions are dynamic: static crystallographic
structural information is insufficient. Molecular dynamics, on the other
hand, can calculate goodness of fit, given appropriate genomic
information from the patient, and thereby provide a useful indication as
to drug efficacy.
Of course, there would need to be substantial infrastructural support to
make such patient-specific molecular modelling a routine part of the
treatment procedure for HIV, as well as a change in mindset for
clinicians, for whom such approaches have not previously been available.
[1] Invented by a former Playboy bunny and dog trainer called Polly Matzinger, who notoriously published a paper in the Journal of Experimental Medicine co-credited to one Galadriel Mirkwood, later revealed to be Matzinger's Afghan Hound!
[2] ORFs are the sections of the genetic sequence that could potentially code for a protein, inasmuch as they meet some very basic syntactic requirements. Though pretty well defined for prokaryotes, ORFs are a significantly less straightforward proposition in eukaryotes because the transcriptional mechanics are a lot more complicated. But if you're designing a vaccine your target is almost certainly a bacterium or virus, so you don't have to worry about that.
[3] This seems a little shaky, given that the recognition process is initiated by short MHC-bound peptide fragments. I can see how antibodies could bind to spatially proximate but sequentially remote combinations, but I can't see how that sort of macrostructural response would ever begin. Unless the idea is to design a vaccine that mimics in its short sequence a structure that the pathogen exhibits by conformational coincidence? That would be a neat trick, but its difficulty really does boggle the mind.
[4] Atypically, I'm attempting to capture the presenter's points here -- albeit paraphrased -- rather than editorialising. I am not a physical scientist and my philosophical take on these matters isn't so clear cut.
[5] I'm currently up to my elbows in work for Case 4, so it's difficult to resist drawing analogies with cellular automata, but I shall.