Case 2
The human visual system is pretty remarkable in various ways, many of
which probably didn't become altogether apparent until people started
trying to reproduce them technologically. Even defining just what
constitutes vision is not always easy; a reasonable working definition
(due to important early machine vision researcher David Marr) holds
vision to be the process of forming a description of a scene --
the objects present, their surface qualities, the lighting conditions,
etc -- given an image of it (eg, on the retina of the eye). This
turns out to be extremely difficult because only a fraction of the
relevant information is directly present in the image; the rest
must be deduced, with the deductive process recruiting all sorts of
domain-specific knowledge.
Marr argued that, rather than starting with the particular ways vision
is implemented by different existing creatures, as a biologist or
neuroscientist might, it would make more sense to try to map the space
of potential solutions to the vision problem in the abstract and then
consider the implementation details on the basis of that higher level
understanding. This approach can lead to models that are rather far
removed from the physiological mechanisms of seeing.
The subset of vision under consideration here is the perception of
motion, and in particular that of second-order motion,
where the movement is carried not by luminance boundaries (the edges of
shapes, let's say) but by changes in texture, contrast, flicker etc.
(See here
for an example that may clarify the idea.) Such motion is
interesting because, while people readily perceive it, some standard
models are either incapable of explaining it or require significant
fudging to do so. (The latter does not necessarily invalidate them --
nature fudges stuff all the bloody time -- but it at least casts doubt.)
Models of motion perception consider the input across the visual field
over time, typically represented as a space-time diagram (most often
with only one spatial dimension, like a single scan-line, so the diagram
can be nicely 2D). Motion is recognised when neurons are stimulated by
particular patterns in this space-time domain.
A simple model of how this might work is the Reichardt correlation
model, which is actually pretty similar to the Jeffress model for
hearing mentioned in the previous case
notes: a particular neuron -- or, less prejudicially, a
filter -- tuned to motion in a particular direction is fed by
stimuli from different points in the visual field via various delay
lines; when the stimuli arrive simultaneously, it registers motion of
the corresponding velocity.
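As a concrete sketch (in Python, with an invented toy stimulus; nothing here comes from any particular vision library), a single rightward-tuned Reichardt subunit just multiplies the delayed sample at one point by the current sample a little further along, and sums:

```python
def reichardt(frames, dx=1, dt=1):
    """One Reichardt subunit: correlate the signal at x, delayed by dt
    frames, with the signal at x + dx now. A feature moving rightward
    at dx/dt pixels per frame makes both taps fire together."""
    response = 0.0
    for t in range(dt, len(frames)):
        for x in range(len(frames[0]) - dx):
            response += frames[t - dt][x] * frames[t][x + dx]
    return response

# Toy stimuli: a bright bar stepping one pixel per frame, right or left.
right = [[1.0 if x == t else 0.0 for x in range(6)] for t in range(6)]
left  = [[1.0 if x == 5 - t else 0.0 for x in range(6)] for t in range(6)]
# The rightward-tuned subunit responds only to the rightward bar.
```

The delay dt and offset dx together fix the preferred velocity, dx/dt; a bank of such subunits with different delays and offsets covers a range of speeds.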
There are many problems with this model that we shall ignore, but one
important one is that the input stimuli are unlikely to be conveniently
arranged for the viewer's benefit: there will often be spatial
periodicity as well as motion, in which case there is more or less
limitless scope for aliasing. To reduce this, we incorporate some
form of directional differencing -- ie, subtracting the effects in one
direction from those in the opposite direction, cancelling periodic
aspects and leaving (more of) the things we're interested in.
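To see the point of the opponent subtraction, here is a hedged Python sketch (toy stimuli again): a grating whose spatial period is twice its per-frame step is perfectly ambiguous -- it aliases -- so the two opposed subunits respond identically and their difference correctly reports nothing, while a genuinely moving bar survives the subtraction.

```python
def subunit(frames, dx, dt=1):
    """Reichardt-style correlator tuned to motion of dx pixels per dt
    frames; only interior positions are used, so the mirror-image
    subunit sees exactly the same set of taps."""
    T, W = len(frames), len(frames[0])
    total = 0.0
    for t in range(dt, T):
        for x in range(abs(dx), W - abs(dx)):
            total += frames[t - dt][x] * frames[t][x + dx]
    return total

def opponent(frames, dx=1, dt=1):
    """Directional differencing: rightward minus leftward response."""
    return subunit(frames, dx, dt) - subunit(frames, -dx, dt)

# Period-2 grating shifted 1 px/frame: rightward and leftward motion are
# indistinguishable, and the opponent output cancels to zero.
grating = [[1.0 if (x - t) % 2 == 0 else 0.0 for x in range(8)]
           for t in range(8)]
# A single bar actually moving rightward survives the differencing.
bar = [[1.0 if x == t else 0.0 for x in range(8)] for t in range(8)]
```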
Given arbitrary connectivity within such a setup -- which is to say,
imagining that each such motion-sensing filter can be tuned to
any combination of temporal and spatial correspondences -- we
could probably identify most forms of motion. However, nature tends to
be rather parsimonious, while artificial implementations are always
constrained by computational complexity, so we really need a
higher-level model of the sort of pattern such filters might
reasonably pick up. (The directional differencing mentioned above is a
rudimentary example of such a pattern.)
A common model (or really the basis for a whole range of them) is the
motion energy model, in which a pair of filters, oriented
differently in space-time -- assume orthogonal for simplicity -- are
combined. The motion energy is then the sum of the squares of the two
filter outputs -- the differences taken along the two orientations.
Often this model is transformed
Fourierwise into the frequency domain, but either way it can't spot
second-order motion.
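A minimal Python sketch of the energy idea (the filters and stimulus are my own toy constructions, not any canonical published filter set): take a pair of space-time Gabor kernels oriented along a preferred trajectory -- one even-phase, one odd-phase -- square the two responses and sum them, and the result signals motion in that direction regardless of where exactly the pattern sits.

```python
import math

def gabor_st(v, phase, size=7, wavelength=4.0, sigma=1.5):
    """A space-time Gabor kernel oriented along the trajectory x = v*t;
    phase=0 gives the even filter, phase=pi/2 its odd partner."""
    c = size // 2
    return [[math.exp(-((x - c)**2 + (t - c)**2) / (2 * sigma**2))
             * math.cos(2 * math.pi * ((x - c) - v * (t - c)) / wavelength
                        + phase)
             for x in range(size)]
            for t in range(size)]

def energy(frames, v):
    """Oriented motion energy: even response squared plus odd response
    squared, summed over every space-time position the kernel fits."""
    even, odd = gabor_st(v, 0.0), gabor_st(v, math.pi / 2)
    size = len(even)
    total = 0.0
    for t0 in range(len(frames) - size + 1):
        for x0 in range(len(frames[0]) - size + 1):
            re = sum(even[t][x] * frames[t0 + t][x0 + x]
                     for t in range(size) for x in range(size))
            ro = sum(odd[t][x] * frames[t0 + t][x0 + x]
                     for t in range(size) for x in range(size))
            total += re * re + ro * ro
    return total

# A sinusoidal grating drifting rightward at 1 px/frame.
frames = [[math.cos(2 * math.pi * (x - t) / 4.0) for x in range(16)]
          for t in range(16)]
rightward = energy(frames, +1.0)
leftward = energy(frames, -1.0)   # much smaller: wrong preferred direction
```

Because the stimulus here is a plain luminance grating, the model does fine; feed it a contrast-modulated (second-order) pattern instead and the oriented energy no longer picks out the movement.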
A possible response to this failure is to posit additional parallel
perceptual channels tuned to the different kinds of motion stimulus: one
vision subsystem identifies the first-order movement, while another is
tuned to second-order. Such a model can, with appropriate mucking around
(surely "rectification"? -- Ed), be persuaded to work, but it
reeks of epicycles.
A preferable model would recognise both first- and second-order motion
in a single processing pathway, and this can be managed by having the
sensors tuned to gradient differences rather than luminance
differences. In mathematical terms, motion in this model is detected
from the ratio of partial derivatives of signal intensity with
respect to space and time.
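In code, the core of the gradient scheme is nothing more than a ratio of finite differences (Python sketch; the drifting sinusoid is an invented test input):

```python
import math

def gradient_speed(frames, t, x):
    """Velocity from the ratio of the temporal to the spatial
    derivative, estimated by central differences. Blows up (division
    by zero) wherever the spatial gradient vanishes."""
    dI_dt = (frames[t + 1][x] - frames[t - 1][x]) / 2.0
    dI_dx = (frames[t][x + 1] - frames[t][x - 1]) / 2.0
    return -dI_dt / dI_dx

# A smooth pattern drifting rightward at 0.5 px/frame.
v_true = 0.5
frames = [[math.sin(0.3 * (x - v_true * t)) for x in range(20)]
          for t in range(20)]
v_est = gradient_speed(frames, 10, 8)   # close to 0.5
```

Note that nothing in the ratio cares whether the moving quantity is luminance itself or some derived measure like contrast, which is exactly why a single pathway of this kind can handle both orders of motion.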
Now, this is a dismayingly abstract model, and it gets worse: to get
around the perceptual singularities that would be introduced whenever
the denominator is zero, it is necessary to add in higher-order
derivatives both above and below the line. Such hackery makes for a
fairly robust model, able to correctly identify both kinds of motion and
even make a good stab at such degenerate inputs as simultaneous
overlapping coherent motion in different directions (which to a human
observer suggests transparency: seeing through a layer of things moving
in one direction to another layer moving in the other), but it's asking
a lot of the visual cortex to be constantly doing all that partial
differentiation.
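To make the "higher-order derivatives above and below the line" concrete, here is one hedged way such a regularised ratio can be written (a generic least-squares construction, not necessarily the specific model these notes have in mind). Differentiating the constraint $I_x v + I_t = 0$ with respect to $x$, treating $v$ as locally constant, gives a second constraint $I_{xx} v + I_{xt} = 0$; solving both together in the least-squares sense yields

```latex
v \;\approx\; -\,\frac{I_x I_t + I_{xx} I_{xt}}{I_x^{2} + I_{xx}^{2}}
```

so the estimate stays finite unless the first and second spatial derivatives happen to vanish at the same point.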
As it turns out, at least some of this mathematical complexity
can be rendered into more physiologically-plausible terms as
convolution kernels -- the space-time patterns mentioned
earlier to which neurons might be tuned. There is experimental evidence
that brain cells can be receptive to input patterns cognate with the
multiple orders of partial derivatives in this gradient model.
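A quick Python sketch of that rendering (stdlib only; the sine-wave signal is a toy input): the derivative estimates the gradient model needs are exactly what small convolution kernels compute, so a cell whose receptive-field weights have the right profile is effectively taking a derivative without doing anything that looks like formal calculus.

```python
import math

def convolve1d(signal, kernel):
    """Valid-mode 1-D convolution (the kernel is applied flipped,
    as convolution requires)."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + k - 1 - j] for j in range(k))
            for i in range(len(signal) - k + 1)]

h = 0.1                                 # sample spacing
xs = [h * i for i in range(50)]
sig = [math.sin(x) for x in xs]

# Receptive-field weights equivalent to first- and second-derivative
# operators (central differences, scaled for the spacing h).
d1 = [0.5 / h, 0.0, -0.5 / h]           # yields (f[i+1] - f[i-1]) / 2h
d2 = [1 / h**2, -2 / h**2, 1 / h**2]    # yields (f[i+1] - 2f[i] + f[i-1]) / h^2

est1 = convolve1d(sig, d1)              # approximates cos(x)
est2 = convolve1d(sig, d2)              # approximates -sin(x)
```

Output element i of each estimate corresponds to position xs[i + 1], since the valid-mode convolution trims one sample from each end.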
Still, there do seem to be steps along the way where this model is far too
enslaved by its own -- inevitably approximate -- mathematics. Nature, as
we've already observed, is profoundly pragmatic, and she'll perpetrate
any kind of sleazy kludge in the pursuit of evolutionary fitness. Why on
Earth would she mess around with the human formalities of arithmetic,
let alone calculus, when a bunch of plastic lookup tables might achieve
the same results?
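In the spirit of that closing jab, a toy Python illustration (entirely invented; no claim whatever about actual neural machinery): precompute the gradient ratio for coarsely quantised derivative pairs once, and thereafter "perceive" velocity by lookup rather than by division.

```python
STEP = 0.05          # quantisation step for the derivative values

# Build the table once: each quantised (dI/dt, dI/dx) cell stores the
# ratio -dI/dt / dI/dx. Cells with dI/dx at zero are simply left out --
# the singular cases the gradient model has to fudge.
table = {(i, j): -(i * STEP) / (j * STEP)
         for i in range(-20, 21)
         for j in range(1, 21)}

def lut_speed(d_dt, d_dx):
    """Velocity by table lookup; None where no entry exists."""
    return table.get((round(d_dt / STEP), round(d_dx / STEP)))
```

A table like this is crude, but it is also plastic: adjusting entries in response to experience is a much more biologically comfortable operation than re-deriving a formula.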