Case 5
Although the activities of a cell are largely mediated by proteins,
cells also use and produce a whole range of "small" molecules1 known as
metabolites. The host of metabolites in a cell is known
collectively as the metabolome, and the study of that is
metabolomics2.
Now, cells are scarily complicated assemblies of self-managing
machinery, and while we have a good idea how many individual pieces
work, a lot of the processes are understood in only the sketchiest ways.
Metabolomics constitutes one of a number of different perspectives on
the problem, casting the cell as a metabolite factory3 -- a somewhat limited view,
but one that can add useful pieces to the jigsaw puzzle.
Determining the functions of genes, as discussed before, is a hit-and-miss affair,
typically done by knocking genes out one by one4 and seeing what happens.
Often the functional consequences of a deletion are difficult to
identify, and this is where metabolomics can help.
About three quarters of the genes in E. coli have had (at least
some of) their functions identified by the deletion technique, and half
of those are involved in metabolic control, so it seems plausible5 that a similar proportion
of the unknown genes may have metabolic consequences. If it were
possible to get a decent measure of a gene's effect on the metabolome,
that could go some way toward working out what the gene is for6. As it turns out, it
is possible, to some extent.
The problem has a number of components. To begin with, there's a whole
biochemical obstacle course to be negotiated in order to culture the
bacteria and extract the metabolome from in amongst all the other stuff
down there -- extracellular material, membrane, all those clumsy big
molecules -- over which let us draw a discreet veil.
The resulting soup of molecules then needs to be separated, identified
and measured. This is no mean feat even once -- there are hundreds of
different metabolites -- but to provide useful information it has to be
done a lot of times, so the process mustn't be too slow.
The chosen technique in this case is7 capillary
electrophoresis: the solution is drawn through a very fine glass
tube by an electrical field. The field affects the molecules differently
according to their charge and size, so different species wind up going
through the tube at different times, although the separation isn't
absolute. En route, their absorption spectra are measured over a visible
to near-UV range.
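
To see why charge and size translate into arrival time, here is a minimal sketch, assuming the crudest possible model: a spherical ion dragged through water by the field, with Stokes friction as the only opposing force. Real capillaries also have electro-osmotic flow, which this ignores, and the field strength, capillary length and radii below are hypothetical numbers chosen only to give plausible magnitudes.

```python
import math

ELEMENTARY_CHARGE = 1.602e-19  # coulombs
WATER_VISCOSITY = 1.0e-3       # Pa*s, roughly water at room temperature


def migration_time(charge_number, radius_m, field_v_per_m, length_m,
                   viscosity=WATER_VISCOSITY):
    """Seconds for a spherical ion to travel length_m (Stokes drag only)."""
    if charge_number == 0:
        raise ValueError("a neutral molecule is not moved by the field in this model")
    # Electrophoretic mobility: charge over Stokes friction, q / (6 * pi * eta * r)
    charge = abs(charge_number) * ELEMENTARY_CHARGE
    mobility = charge / (6 * math.pi * viscosity * radius_m)
    velocity = mobility * field_v_per_m
    return length_m / velocity


# Hypothetical set-up: 30 kV across a 0.5 m capillary.
LENGTH = 0.5
FIELD = 30_000 / LENGTH

for name, z, r in [("small, doubly charged", 2, 0.3e-9),
                   ("small, singly charged", 1, 0.3e-9),
                   ("larger, singly charged", 1, 0.6e-9)]:
    print(f"{name}: {migration_time(z, r, FIELD, LENGTH):.0f} s")
```

In this model doubling the charge halves the transit time and doubling the radius doubles it, which is the whole basis of the separation; the overlap between species with similar charge-to-size ratios is why the separation isn't absolute.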
The output from this is not, alas, a neat list of all the metabolites
and their quantities; it's a whopping great table of absorption
coefficients for different wavelengths over time. Getting from the
latter to the former is another lengthy saga that we'll skip breezily
over with only the following notelets: spectra are mapped into a space
of known absorptions whose basis is neither orthogonal nor spanning;
fractal compression is used to reduce the data size; efficient lookup
algorithms are possible using differences from the basis, but there
isn't yet a complete spectral library in which to look.
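
The first of those notelets is easier to picture with numbers. This is a minimal sketch of one plausible reading, not the method actually used (the source doesn't say how the mapping is done): an ordinary least-squares fit of a measured spectrum against two known reference spectra. The reference spectra, the measured values and the function name `fit_two_components` are all invented for illustration. The references overlap, so they are not orthogonal, and neither absorbs at the fourth wavelength, so they don't span the space.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def fit_two_components(basis_1, basis_2, observed):
    """Least-squares amounts of two reference spectra, plus the unexplained residual."""
    # Normal equations G c = r; the off-diagonal term g12 is non-zero
    # precisely because the basis is not orthogonal.
    g11, g12, g22 = dot(basis_1, basis_1), dot(basis_1, basis_2), dot(basis_2, basis_2)
    r1, r2 = dot(basis_1, observed), dot(basis_2, observed)
    det = g11 * g22 - g12 * g12
    c1 = (r1 * g22 - g12 * r2) / det
    c2 = (g11 * r2 - g12 * r1) / det
    fitted = [c1 * a + c2 * b for a, b in zip(basis_1, basis_2)]
    residual = [y - f for y, f in zip(observed, fitted)]
    return (c1, c2), residual


# Hypothetical absorptions at four wavelengths.
KNOWN_1 = [1.0, 0.5, 0.0, 0.0]
KNOWN_2 = [0.0, 0.5, 1.0, 0.0]
MEASURED = [2.0, 2.5, 3.0, 0.4]  # includes something at the fourth wavelength

amounts, leftover = fit_two_components(KNOWN_1, KNOWN_2, MEASURED)
print("amounts:", amounts)
print("residual:", leftover)
```

Whatever ends up in the residual is absorption that no library entry accounts for, which is one way of seeing why an incomplete spectral library matters.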
Having gone through all that, the real question becomes: how can
we relate the metabolome measurements to gene function?
Metabolites participate in convoluted sequences of chemical reactions
known as metabolic pathways. That, in ateleological essence, is what
they're for. It's where they come from and where they're going.
The rates of those reactions depend on the concentrations of the
reactants, but also on the presence of appropriate proteins, serving as
catalysts, and hence on the genes. Which -- oh, it sounds so simple! --
is how the genome controls metabolism.
If we abstract away all the cellular machinery, we can view the cell
metabolomically as a network of transformations of small
molecules. Each metabolite is connected to others by the reactions they
have in common and each reaction has an associated rate. These rates,
and the consequent chemical concentrations, are constrained by basic
conservation laws: atoms can't magically appear or disappear. The
network is dynamic -- always in flux -- but settles overall into a
steady state where the reactions balance out8, and the cell hums along
harmoniously.
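
The bookkeeping in that picture can be made concrete with a toy example. What follows is a minimal sketch, not anything taken from the actual study: the three-metabolite network, the mass-action rate laws and the rate constants are all made up. Each row of the stoichiometric matrix says which reactions produce or consume one metabolite, and the steady state is the point where production and consumption cancel for every row.

```python
# Toy network (hypothetical): -> A, A -> B, A -> C, B ->, C ->
METABOLITES = ["A", "B", "C"]

# Rows are metabolites; columns are the five reactions in the order above.
# +1 means the reaction produces the metabolite, -1 means it consumes it.
STOICH = [
    [1, -1, -1, 0, 0],   # A
    [0, 1, 0, -1, 0],    # B
    [0, 0, 1, 0, -1],    # C
]

RATE_CONSTANTS = [2.0, 1.0, 0.5, 1.0, 0.25]  # arbitrary illustrative values


def rates(conc, k=RATE_CONSTANTS):
    """Reaction rates: constant influx, then simple mass-action steps."""
    a, b, c = conc
    return [k[0], k[1] * a, k[2] * a, k[3] * b, k[4] * c]


def net_change(conc, k=RATE_CONSTANTS):
    """Stoichiometric matrix times rate vector: d(conc)/dt for each metabolite."""
    v = rates(conc, k)
    return [sum(s * r for s, r in zip(row, v)) for row in STOICH]


def steady_state(conc, k=RATE_CONSTANTS, dt=0.01, tol=1e-10, max_steps=200_000):
    """Step forward with Euler's method until nothing is changing any more."""
    conc = list(conc)
    for _ in range(max_steps):
        change = net_change(conc, k)
        if max(abs(x) for x in change) < tol:
            break
        conc = [c + dt * x for c, x in zip(conc, change)]
    return conc


settled = steady_state([0.0, 0.0, 0.0])
for name, value in zip(METABOLITES, settled):
    print(f"{name}: {value:.4f}")
```

At the settled point every reaction is still running, but each metabolite is being made exactly as fast as it is used up, which is all "steady state" means here.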
If we modify the black box mechanics of the network by deleting a
(metabolically relevant) gene, there will be some corresponding change
in the reaction rates, leading to a change in the steady state. The
concentrations of the metabolites -- the stuff we can measure -- will be
different.
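
As a rough illustration of what such a shift looks like, here is a sketch on an invented branched pathway (-> A, A -> B, A -> C, B ->, C ->) with constant influx and mass-action steps, simple enough that the steady state can be written down directly. The enzyme names and activity values are hypothetical, and a "deletion" is modelled, very crudely, as setting one activity to zero.

```python
def steady_state(enzymes, influx=2.0):
    """Closed-form steady state of the toy network for given enzyme activities."""
    # A is drained by both branches; B and C each balance their own branch.
    # Assumes at least one branch out of A, and both outflows, are still active.
    a = influx / (enzymes["A_to_B"] + enzymes["A_to_C"])
    b = enzymes["A_to_B"] * a / enzymes["B_out"]
    c = enzymes["A_to_C"] * a / enzymes["C_out"]
    return {"A": a, "B": b, "C": c}


def knock_out(enzymes, name):
    """Copy of the enzyme table with one activity removed entirely."""
    return dict(enzymes, **{name: 0.0})


WILD_TYPE = {"A_to_B": 1.0, "A_to_C": 0.5, "B_out": 1.0, "C_out": 0.25}

before = steady_state(WILD_TYPE)
after = steady_state(knock_out(WILD_TYPE, "A_to_B"))

for metabolite in before:
    print(f"{metabolite}: {before[metabolite]:.3f} -> {after[metabolite]:.3f}")
```

Removing the A -> B step empties B, but it also backs up A and pushes more material down the other branch into C. The measurable change is spread across the network rather than sitting neatly at the deleted step, which is why locating the perturbation takes some work.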
Of course, turning the measured concentrations into reaction rates --
reconstructing the perturbed network from experimental data -- is a
rather hairy inverse problem. It would probably be impossible
were it not that the actual pathways have already been mapped, so we
know pretty much what the network topology must be. Deleting a gene is
never going to add new reactions; it's only going to rebalance
the ones already there, perhaps in some cases reducing them to
non-occurrence. The conservation constraints must still hold throughout.
Identifying the location of the perturbation -- which is to say, the
principal effects of deleting the gene -- uses a technique called
co-response analysis, based on determining the response
and control coefficients (basically, the sensitivities of the
concentrations to one another and to the reaction rates, calculated as
partial derivatives) of different units (sub-nets) of the overall
network.
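
To give the partial derivatives something to bite on, here is a minimal numerical sketch on an invented branched pathway (-> A, A -> B, A -> C, B ->, C ->). It is not the published procedure. It assumes the usual scaled form of a control coefficient, the percentage change in a steady-state concentration per percentage change in an enzyme activity, and takes a co-response coefficient to be the ratio of two such coefficients for the same perturbation. The enzyme names, values and function names are hypothetical.

```python
import math


def steady_state(enzymes, influx=2.0):
    """Closed-form steady state of the toy network for given enzyme activities."""
    a = influx / (enzymes["A_to_B"] + enzymes["A_to_C"])
    return {"A": a,
            "B": enzymes["A_to_B"] * a / enzymes["B_out"],
            "C": enzymes["A_to_C"] * a / enzymes["C_out"]}


def control_coefficient(enzymes, metabolite, enzyme, h=1e-4):
    """Scaled sensitivity d ln(concentration) / d ln(enzyme), by central difference."""
    up = dict(enzymes, **{enzyme: enzymes[enzyme] * (1 + h)})
    down = dict(enzymes, **{enzyme: enzymes[enzyme] * (1 - h)})
    d_ln_conc = (math.log(steady_state(up)[metabolite])
                 - math.log(steady_state(down)[metabolite]))
    d_ln_enzyme = math.log(1 + h) - math.log(1 - h)
    return d_ln_conc / d_ln_enzyme


def co_response(enzymes, metabolite_1, metabolite_2, enzyme):
    """How two concentrations move relative to each other under one perturbation."""
    # Undefined (division by zero) if metabolite_2 does not respond at all.
    return (control_coefficient(enzymes, metabolite_1, enzyme)
            / control_coefficient(enzymes, metabolite_2, enzyme))


WILD_TYPE = {"A_to_B": 1.0, "A_to_C": 0.5, "B_out": 1.0, "C_out": 0.25}

for metabolite in ("A", "B", "C"):
    value = control_coefficient(WILD_TYPE, metabolite, "A_to_B")
    print(f"control of {metabolite} by A_to_B: {value:+.3f}")
print(f"co-response B:A under A_to_B: {co_response(WILD_TYPE, 'B', 'A', 'A_to_B'):+.3f}")
```

The appeal of the ratio is that it depends only on the two measured concentrations, not on knowing how big the perturbation was, so in principle the pattern of ratios across a sub-net can act as a fingerprint for where the change happened.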
This is considerably more analytic and considerably less
statistical than I was expecting, and it seems at first sight
that there are a number of inadequately justified steps to the argument,
but I am assured that the technique has been comprehensively tested
against a variety of known and unknown data sources and can persuasively
locate network perturbations in both real and simulated data sets with a
high degree of confidence.
1 All molecules are small, obviously, but some are considerably smaller than others. The big ones, as far as biology is concerned, are the polymers -- proteins and nucleic acids -- which can contain millions of atoms. Pretty much everything else is considered "small".
2 If you're thinking that's a pig-ugly word, well, you're not alone. It's also arguably a pretty artificial division, but we humans do cherish our territorial boundaries.
3 Other points of view include proteomics (the cell is a protein factory), genomics (it's a gene factory, which is tantamount to being a cell factory) and, with a name clearly invented just to outdo metabolomics in the ugly stakes, transcriptomics (an RNA factory). All of these assertions are, of course, absolutely true -- in a shadows on the cave wall kind of way.
4 In this context, by physically changing the DNA rather than suppressing its products as in the earlier RNA interference discussion. The two techniques work to different ends. Besides, the organism under consideration today will be E. coli, which is a prokaryote and thus lacking the eukaryotic machinery exploited by RNAi. (There have been reports of an analogous process in prokaryotes, but I don't think it has been put to equivalent use.)
5 Whether a gene's function is known is obviously not a random variable, so there's no basis for making anything but the most shoulder-shrugging claims about this.
6 #include <stddisclaimer.h>
7 Mass spectrometry is also used, to cross-check the electrophoresis results, but it doesn't scale: mass spectrometers are bulky and extremely expensive, so it's not feasible to run hundreds of them in parallel; capillaries, on the other hand, are cheap as chips.
8 Approximately. One of the potential issues with this whole scheme is the steady state assumption, but I'm told it is biologically relevant.