Extending Chaos and Complexity Theory to Address Life, Brain and Quantum Foundations
Paul J. Werbos
National Science Foundation, Room 675
pwerbos@nsf.gov
Introduction
Years ago, many researchers proposed that chaos theory could provide a kind of universal theory of the qualitative behavior of all dynamical systems. Thus it could provide a solid mathematical foundation for unifying efforts to address the three really fundamental questions of basic science: (1) what is life?; (2) what is mind or intelligence?; (3) what are the underlying physical laws of the universe?
Chaos theory is essentially a recent branch of a larger field of mathematics: nonlinear system dynamics (NSD). It mainly addresses systems governed by Ordinary Differential Equations (ODE) or their discrete-time cousins. Because the vast bulk of models now used in computational neuroscience are ODE models, it is only natural that neuroscience has developed great interest in the use of NSD as a tool for analyzing and designing neural network models.
Walter Freeman, in particular, has played a seminal role in describing how chaos theory can be used in computational neuroscience, through his classic models of the olfactory system, described in Scientific American, in many scientific journals, and in IJCNN proceedings. His colleague Morris Hirsch in mathematics has helped provide a solid foundation for this work. This work has helped to inspire many efforts throughout the world to use chaos as a design principle in computer systems and in artificial neural network (ANN) design.
Recently, however, several issues or questions have begun to emerge. First, the pure ODE models of the olfactory system did not display the level of long-term persistence of chaotic behavior required to match Freeman's empirical data on the olfactory system. Freeman and Kozma, working together, addressed this by adding a small stochastic disturbance term to the model, as discussed below.
Second, chaos theory as such focuses on the existence of low-dimensional attractors. But brains -- like stock markets or economies or turbulent flows or the bodies of organisms -- are inherently high-dimensional systems. In other words, the space of states actually visited is a very high-dimensional space, even though it is a tiny fraction of the larger space of states which are theoretically available to the systems. For this reason, many researchers, at the Santa Fe Institute and elsewhere, have called for the development of a "complexity theory" which would address that larger class of dynamical systems. The efforts to produce such a "complexity theory" have produced many interesting results, but the public enthusiasm for the field has far outpaced the development of new unifying mathematics. It is reasonably obvious why the brain is a complex system, not reducible to truly low-dimensional attractors over the long term: the information content which the brain can handle is linked to the richness or dimensionality of the states which it visits.
Third, the brain is not just one of many possible dynamical systems. It is a particular type of dynamical system: a system evolved to achieve a high level of functionality in certain very broad and challenging information-processing tasks. Thus, in addition to the general mathematics of nonlinear dynamical systems (which does not provide what Grossberg would call a "strong filter"), it is crucial to also use the more specific mathematics of functional decision-making systems, as described, for example, in the IJCNN2000 session on Adaptive Critic Designs (ACD). Furthermore, in engineering applications, ACDs can provide a design methodology to actually locate "chaos controllers" which improve performance over classical fixed controllers, for those environments in which chaos controllers have a potential performance advantage. ACD designs are normally used to tune the parameters of controller networks, but the identical mathematics can be used to adapt actual physical design parameters, in simulation, so that the advantages of high-gain, high-performance chaotic physical designs can also be captured in such a design approach.
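To make this last point concrete, here is a minimal numerical sketch, under strong simplifying assumptions: the "controller network" is reduced to a single gain, the "physical design" to a single plant parameter, and the toy plant itself is hypothetical. A genuine ACD would obtain gradients from an adaptive critic via backpropagation rather than by finite differences, and would constrain physical parameters to realizable values.

```python
# A hypothetical toy problem, not an actual ACD: tune a controller gain k and
# a physical design parameter theta jointly, by gradient descent on the cost
# of a simulated rollout of the plant x' = theta*x + u with feedback u = -k*x.
import numpy as np

def rollout_cost(k, theta, T=200, dt=0.01):
    """Accumulated quadratic cost of one simulated rollout."""
    x, cost = 1.0, 0.0
    for _ in range(T):
        u = -k * x                      # "controller network" reduced to one gain
        x += dt * (theta * x + u)       # plant with physical design parameter theta
        cost += dt * (x * x + 0.1 * u * u)
    return cost

def fd_grad(f, z, eps=1e-5):
    """Finite-difference gradient; an ACD would use backpropagated derivatives."""
    g = np.zeros_like(z)
    for i in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[i] += eps
        zm[i] -= eps
        g[i] = (f(zp) - f(zm)) / (2 * eps)
    return g

z = np.array([0.5, 1.0])                # z = [controller gain k, plant parameter theta]
for _ in range(100):                    # adapt both in simulation, as argued above
    z -= 0.05 * fd_grad(lambda w: rollout_cost(w[0], w[1]), z)
print("tuned gain k = %.3f, physical parameter theta = %.3f" % (z[0], z[1]))
```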
Fourth, it is unclear whether ODE models are powerful enough to capture the essential higher-order capabilities of the brain. Pribram has argued, for example, that PDE or field-effect models may be essential in some cases [1,2]. Others have argued that some sort of quantum computing may be useful, at least in developing new higher-level intelligent systems. (The presence or absence of quantum computing effects in the human brain is highly speculative and controversial at present [3].) Field effects are certainly important, in any case, to the evolution of biological organisms and to the foundations of physics.
In the past few months, Robert Kozma and I have held extensive discussions by email about the implications of these questions. It is our view that the existing mathematics available from chaos theory and complexity research does not really live up to its long-term potential of providing a better foundation for a more rigorous understanding of life, the brain and quantum foundations. We also have some ideas about what could be done to extend this mathematics, to come closer to living up to that potential. This paper will give my personal, tentative views of what is needed here, and of how the pieces fit together.
These views will certainly not represent any official views of my employers; in fact, because they are tentative, they may not even represent my own views by the time this goes into print. However, I hope that the questions raised here will be useful in stimulating new thoughts and better concepts by other players in the research community.
An Immediate Suggestion: Develop "Stochastic Chaos Theory"
The work of Freeman and Kozma leads to a straightforward conclusion: we need to develop a "stochastic chaos theory," a body of mathematics which does for their kind of model the same kind of service which ordinary chaos theory does for ODE models.
The need for such a development seems extremely obvious. In my view, the real puzzle is why this body of mathematics does not exist, or why, if it exists in some form, it has not been collected together, unified and disseminated in the same way as ordinary chaos theory has been.
Years ago, ordinary NSD and chaos theory faced amazing tribulation and resistance in their development. In the early years, some mathematicians talked about the need for a "qualitative theory of ODE." They emphasized the need to develop a new field of mathematics, complementary to the more traditional mathematics of ODE, to address questions about qualitative behavior which the earlier theory (however valuable) simply did not address. After decades of effort, NSD now provides a large and growing literature on the qualitative behavior of systems of the form $\partial_t x = f(x,W)$ or $x(t+1) = f(x(t),W)$, where $x$ is a vector, where $W$ is a set of weights or parameters, and where "$\partial_t$" is physics notation for differentiation with respect to time.
Freeman's new model is not so far away from his earlier ODE models. It may be written (schematically) as $\partial_t x = f(x,W) + e$, where $e$ is a fairly small stochastic disturbance vector. In effect, Freeman has been pleading with mathematicians to provide an extension of chaos theory, to help him rigorously analyze the properties of models in this class.
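To make this class of model concrete, here is a minimal simulation sketch using the Euler-Maruyama method. The Lorenz equations stand in for $f$ here purely for familiarity; Freeman's actual olfactory model is different and far more structured.

```python
# A minimal sketch of a model of the class dx/dt = f(x,W) + e, integrated by
# the Euler-Maruyama method. The Lorenz system is only a familiar stand-in
# for f; it is not Freeman's olfactory model.
import numpy as np

def f(x, W):
    sigma, rho, beta = W                # the "weights" W of this toy system
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

rng = np.random.default_rng(0)
W = (10.0, 28.0, 8.0 / 3.0)
dt, steps, amp = 0.001, 50_000, 0.1     # amp sets the size of the disturbance e
x = np.array([1.0, 1.0, 1.0])
traj = np.empty((steps, 3))
for t in range(steps):
    e = amp * np.sqrt(dt) * rng.standard_normal(3)
    x = x + dt * f(x, W) + e            # one Euler-Maruyama step
    traj[t] = x
print("sample mean of the state:", traj.mean(axis=0))
```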
The main obstacle to the development of this theory seems to be a matter of historical circumstances. One group of mathematicians is committed to a probabilistic view of the world, and they prove theorems which address questions analogous to those which were addressed by the mathematics of ODE before chaos theory was developed. Another group has spent their lives working with a deterministic version of chaos, and is often sympathetic to the view that "noise" should always be represented as the outcome of some underlying deterministic process. The challenge, in part, is how to settle new territory in the no-man's-land between these groups, a land which no one has claimed as yet.
These extreme alternatives do not fully address the needs of biology or of several other fields. Very often, a model of the form $\partial_t x = f(x,W) + e$ will be far more parsimonious (and easier to test) than a model which requires a detailed, explicit deterministic account of every source of microscopic or quantum noise which results in $e$. Likewise, from the viewpoint of pure mathematics, it is a reasonable and well-posed question to ask what the qualitative behavior of such a system would be.
The first suggestion, then, is to try to reproduce the achievements of chaos theory for this more general class of systems, particularly for the case where $e$ is small.
In formal terms, mathematicians in this area would not actually write "$\partial_t x = f(x,W) + e$." They would write an expression which is essentially the same, but somewhat more rigorous. Systems of this sort are called Stochastic Differential Equations (SDE) [4]. The need here is for a more unified and comprehensive qualitative theory of SDE, analogous to the modern qualitative theory of ODE, capable at a minimum of assisting (and simplifying) the analysis of models like that of Freeman and Kozma.
The term "stochastic chaos theory" is the best term we can think of, based on discussions among Kozma, Freeman and myself.
Beyond this core body of mathematics, however, there may also be some need for other related types of mathematical tools.
Qualitative Theory of Mixed Forwards-Backwards SDE or 0+1-D MRFs
The qualitative theory of ordinary SDE would already address Freeman's current concerns, and might well provide a fully adequate foundation for computational neuroscience (coupled with the ACD mathematics discussed previously). However, SDE themselves are a special case of a larger class of systems.
SDE assume that the random disturbance $e(t)$ is not correlated with $x(\tau)$ for $\tau \le t$. This is called the "causality assumption" in statistics. (More precisely, it is assumed that $e(t)$ is statistically independent of prior values of $x$ and $e$.) Conventional time-series models, formulated as $x(t+1) = f(x(t), e(t))$, typically make the same assumption [5]. There are very, very complex books on martingale theory and the like which consistently maintain the same assumption. Note that $e(t)$ typically is correlated with $x(\tau)$ for $\tau > t$, because random disturbances typically do change the state of the system at later times.
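This asymmetry is easy to check numerically. The sketch below simulates a toy linear special case of $x(t+1) = f(x(t), e(t))$ and estimates the correlation of $e(t)$ with the state one step in the past and one step in the future:

```python
# A numerical check of the causality assumption, for a toy AR(1) process
# x(t) = 0.8*x(t-1) + e(t): the disturbance is uncorrelated with past states
# but clearly correlated with future ones.
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
e = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + e[t]        # e(t) enters the state at time t

past = np.corrcoef(e[2:], x[1:-1])[0, 1]    # corr(e(t), x(t-1)): near zero
future = np.corrcoef(e[1:-1], x[2:])[0, 1]  # corr(e(t), x(t+1)): clearly nonzero
print("correlation with past state:   %+.3f" % past)
print("correlation with future state: %+.3f" % future)
```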
More recently, mathematicians have become interested in the formal properties of mixed forwards-backwards SDE [4]. In such models, random disturbances can cause effects both at later times and at earlier times. They claim [4] that such models are useful both in optimal control and in economics; for example, they claim that such models may lead to capabilities very similar to those of Dual Heuristic Programming, discussed in the ACD session of IJCNN2000 and related to the Pontryagin equation.
In my view, mixed forwards-backwards systems will be crucial to a more rigorous and concise reformulation of quantum field theory. This issue is extremely complex, and beyond the scope of this paper [3,6-9]. There are certain critical experiments in physics, called "Bell's Theorem" experiments (based on a theorem of Clauser et al.), which in my view demonstrate that causality actually runs forwards and backwards through time, symmetrically, at the microscopic level [6,3]. The earliest papers on this were published by de Beauregard and myself, and later cited by Penrose [9] among others, although Von Neumann's classic tract on quantum mechanics clearly evinced a similar intuition. Most researchers today were brought up on the ancient "first quantization" (Schrödinger's equation over 3+1-D space-time) or on the "second quantization" of the 1950s (wave functions propagating over Fock space); however, modern high-energy physics is based on a more general formulation, the "functional integral formalism" [10,7,3], which brings out more clearly the underlying symmetries with respect to time. In the "quantum computing world," Yanhua Shih and others have recently performed experiments which strongly support this intuition.
If the physical universe we live in is actually a mixed forwards-backwards system of some kind, then we need to understand the mathematics of such systems better than we now do. This is true regardless of which formalism or theory survives future tests; indeed, deeper mathematical insight will be crucial to understanding the theories well enough to test them!
Forwards-backwards SDE involve continuous time. To develop the mathematics, we may also consider the related issue of discrete-time systems. The relevant time-forwards discrete systems can generally be written as $x(t+1) = f(x(t), \ldots, x(t-k), e)$, with the causality assumption applied to $e$. To define the forwards-backwards generalization of such systems, consider systems defined as Markov Random Fields (MRF) over the set of time points $t = -\infty, \ldots, -2, -1, 0, 1, 2, \ldots, \infty$. I would call this the "0+1-D" special case of a more general space-time MRF system [7]; the "0+1" refers to zero space dimensions and one time dimension.
The literature on space-like (n+0-D) MRFs is huge and diverse. It ranges from old discussions of lattice dynamics and spin-glass neural networks in physics, through image processing technology, through to recent work on learning in graphical models [11].
There are two main ways to specify a particular MRF model, defined over a regular lattice of points $t$ in space-time. The literature uses the terms "cliques" and "elites," but I regret that I tend to confuse these terms because of their lack of intuition; instead, I use "neighborhood" and "driver set."
For a regular MRF model over lattice points $t$ in space-time, we may write:

$$\Pr\big(\varphi(t) \mid \{\varphi(\tau) : \tau \in N(t)\}\big) = \Pr\big(\varphi(t) \mid \{\varphi(\tau) : \tau \neq t\}\big) \qquad (1)$$

and we implicitly assume:

$$\Pr\big(\varphi(t) \mid \{\varphi(\tau) : \tau \in N(t)\}\big) = \Pr\big(\varphi(t+d) \mid \{\varphi(\tau) : \tau \in N(t+d)\}\big) \quad \text{for all } d \qquad (2)$$
Equation 1 says that there is a set of points, $N(t)$, the neighborhood of $t$, such that the probability of $\varphi$ having any particular value at $t$ depends only on the values of $\varphi$ at other points in the neighborhood of $t$; in other words, there is no independent or direct correlation between the value of $\varphi(t)$ and the value of $\varphi$ at any other point outside the neighborhood. Equation 2 basically says that the dynamics of the system are the same at all points. It is possible to specify an MRF model simply by specifying the probability distribution on the left-hand side of equation 1; I would call this a neighborhood representation.
Alternatively, one may define an MRF model by what I would call a driver representation. In a driver representation, we specify a function $P$, which is a different sort of probability distribution, $P(\varphi(t) \mid \{\varphi(\tau) : \tau \in D(t)\})$, where $D(t)$ is a "driver set" of points near $t$. When a driver and a neighborhood representation are both possible, the neighborhood is usually roughly twice as wide as the driver set. Thus equation 1 does not hold for the driver set! Instead, the driver set has a property which may be represented crudely as:

$$\Pr\big(\{\varphi(t) : t \in R\} \mid \{\varphi(t) : t \in \partial R\}\big) = \frac{1}{Z} \prod_{t \in R} P\big(\varphi(t) \mid \{\varphi(\tau) : \tau \in D(t)\}\big) \qquad (3)$$
where $R$ is essentially any region of space-time, where $\partial R$ is a "border region" (the set of points immediately surrounding $R$), and $Z$ is a scaling factor used to make sure that the probabilities add up to one. The ordinary time-forwards dynamic system described above is a special case of equation 3, in effect, where the driver set $D(t)$ is the set of time points $t-1, \ldots, t-k$. (In the time-series example, notice that $x(t)$ will normally be correlated with $x(t+1)$; thus $t+1$ will be in $N(t)$ but not in $D(t)$.)
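As a concrete 0+1-D example, the sketch below takes a two-state Markov chain, whose driver set is $D(t) = \{t-1\}$, and verifies by brute-force enumeration that equation 1 holds with the neighborhood $N(t) = \{t-1, t+1\}$:

```python
# A two-state Markov chain as a 0+1-D MRF: the driver representation uses
# D(t) = {t-1}, while the neighborhood of equation 1 is N(t) = {t-1, t+1}.
# Verified by brute-force enumeration over 5-point sequences.
import itertools
import numpy as np

P = np.array([[0.9, 0.1],               # driver distribution P(phi(t) | phi(t-1))
              [0.3, 0.7]])
pi = np.array([0.75, 0.25])             # stationary distribution of P

def prob(seq):
    """Joint probability of a sequence started from the stationary law."""
    p = pi[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= P[a, b]
    return p

# Pr(phi(2) | all other points) must depend only on phi(1) and phi(3).
for phi0, phi1, phi3, phi4 in itertools.product([0, 1], repeat=4):
    joint = [prob((phi0, phi1, v, phi3, phi4)) for v in (0, 1)]
    cond = joint[1] / sum(joint)        # Pr(phi(2)=1 | everything else)
    nbr = P[phi1, 1] * P[1, phi3] / sum(P[phi1, v] * P[v, phi3] for v in (0, 1))
    assert abs(cond - nbr) < 1e-12      # equation 1 with N(t) = {t-1, t+1}
print("neighborhood property verified for all configurations")
```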
For biology and physics, a key research goal is to better understand the "arrow of time," and then to understand the interface between microscopic forwards-backwards symmetry and macroscopic time-forwards causality. This suggests one warm-up question at the 0+1-D level: when can a given dynamical process be represented equivalently as a time-forwards Markov process, as a time-backwards Markov process, and as a mixed forwards-backwards process?
My preliminary results cannot be proven in six pages, and in any case a real pure mathematician could do a much better job. My rough impressions are as follows. In the 0+1-D case, when $\varphi$ is taken from a finite set of possible values, a time-forwards Markov process splits the set of possible values into two subsets: the set of transient states (where probability always goes to zero, regardless of initial state) and the set of ergodic states. The ergodic core of this process, the process restricted to the ergodic states, can be represented equivalently as a time-forwards, time-backwards and mixed forwards-backwards MRF, in a driver representation, and it admits a neighborhood representation. But the presence of transient states (boundary conditions) in any of these three situations destroys any possibility of such equivalence or of a neighborhood representation; it is like nonstationarity in statistics.
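The forwards/backwards equivalence on the ergodic core is easy to illustrate numerically for a small ergodic chain: the time-reversed process is again a Markov process, with transition probabilities obtained from the stationary distribution by Bayes' rule. A minimal check:

```python
# For an ergodic finite Markov chain with stationary law pi, the time-reversed
# process is again a Markov chain, with kernel P_rev[i,j] = pi[j]*P[j,i]/pi[i];
# this is the forwards/backwards equivalence on the ergodic core.
import numpy as np

P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

w, v = np.linalg.eig(P.T)               # stationary law: left eigenvector for 1
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

P_rev = (pi[None, :] * P.T) / pi[:, None]   # P_rev[i,j] = pi[j] * P[j,i] / pi[i]
assert np.allclose(P_rev.sum(axis=1), 1.0)  # a valid stochastic matrix

# The two-point statistics agree: Pr(x_t=i, x_{t+1}=j) is the same whether the
# process is generated forwards by P or backwards by P_rev.
fwd = pi[:, None] * P
bwd = (pi[:, None] * P_rev).T
assert np.allclose(fwd, bwd)
print("stationary law:", np.round(pi, 4))
```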
One might imagine that the present state of our universe is something like a "transient state" in a time-forwards march towards greater entropy. But this is not so obvious, either from the viewpoint of physics [12] or of mathematics. In actuality, the addition of space changes the analysis quite radically; the 0+1-D analysis is an important prerequisite, but should not be extrapolated too far. A process which is local in space in forwards time will not, in general, be local in space when represented equivalently in backwards time, or vice-versa. (Simple linear examples illustrate this, and illustrate that local mixed MRFs do not reduce to local Markov processes.) The next section will argue that the concepts above are not yet powerful enough to capture our intuitive notion of an "arrow of time," in examples like the "earth-in-a-box" system. Nevertheless, 0+1-D analysis can be an excellent starting point for developing the mathematical machinery we will need in the full space-time case.
For example, a key question in complexity theory is as follows: what kinds of patterns -- like whirlpools, organisms or solitons -- will evolve and persist in any given dynamical system, in stable equilibrium? If we know the equilibrium probability distribution, $p^*(\{\varphi(x,t),\ \text{for all } x\})$, we can theoretically "read out" whether there is a high probability of states which contain organized patterns.
To find the equilibrium probability distribution, we can begin by imagining an initial probability distribution of states, $p$, deriving the dynamic equation for how $p$ changes over time, and solving for an equilibrium. The notion of "stability" needs to be modified substantially when we consider systems with causality running forwards and backwards in time [8], but it remains important to ask about the dynamics of $p(\text{state})$.
For systems $\partial_t \varphi = f(\varphi)$, the dynamic equation for $p(\varphi)$ is simply the well-known dispersion or diffusion equation of physics:

$$0 = \partial_t p + \operatorname{div}(p v) = \partial_t p + \sum_i \partial_i \big( p\, f_i(\varphi) \big) \qquad (4)$$

where $\partial_i$ is an abbreviation for the partial derivative with respect to $\varphi_i$. For systems of the form $\partial_t \varphi = f(\varphi) + e$, where $\langle e e^T \rangle = Q$ (schematically) and $Q$ is necessarily a symmetric matrix, we can assume without loss of generality that $Q$ is diagonal, because we can always rotate the definition of $\varphi$. In that case, $p(\varphi)$ follows the dynamics given in equation 4 with an additional term added:

$$\sum_i c_i\, \partial_i^2\, p \qquad (5)$$

where $\partial_i^2$ is an abbreviation for the second derivative with respect to $\varphi_i$, and where the constants $c_i$ depend on, and have the same sign as, $Q_{ii}$.
For an ordinary SDE, $Q$ is positive semidefinite; for a backwards-time SDE, it is negative semidefinite; and so on. This probability equation has similarities to the classical heat equation, running forwards or backwards in time; however, it is not obvious a priori that certain signs of $c$ really imply an "arrow of time" in the ordinary sense, because the function $f$ and correlations in space are also important to the qualitative behavior of the overall system. Formally, it may be necessary in the future to map the concept of an "arrow of time" into very distinct sets of mathematical concepts, all of some interest, but none all-encompassing. It is straightforward but tedious (exceeding page limits) to extend this to systems involving "$g(\varphi)e$", or to PDE systems, where $\varphi$ is replaced by a vector in function space, in effect; likewise, defining the entropy $S = \log p$, an equivalent dynamical equation for $S$ results; likewise, if $p$ or $S$ are represented as Taylor/Volterra series, the Taylor series coefficients look like wave functions of Fock space, and equations 4+5 lead directly to a kind of Schrödinger equation. In [13, section IV], I derived similar equations for the statistical moments of a field, starting from the ODE case, building up to the PDE case, and a "reification" procedure for closing them; the duality between those equations and these equations is probably of significant computational use.
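As a one-dimensional illustration of equations 4+5 (a single state variable $\varphi$, a hypothetical double-well drift $f$, and one diffusion constant $c$), the sketch below evolves $p(\varphi, t)$ by explicit finite differences and compares the result against the closed-form equilibrium $p^* \propto \exp(-V/c)$ which happens to be available in this special case:

```python
# Equations 4+5 in one dimension: evolve p(phi, t) under
#   dp/dt = -d(p f)/dphi + c d2p/dphi2
# with the hypothetical double-well drift f = -V', V = phi^4/4 - phi^2/2,
# and compare against the closed-form equilibrium p* ~ exp(-V/c).
import numpy as np

n, L = 400, 3.0
phi = np.linspace(-L, L, n)
h = phi[1] - phi[0]
f = phi - phi**3                        # drift f = -V'
c = 0.25                                # diffusion constant (one c_i, from Q)

p = np.exp(-phi**2 / 0.5)               # an arbitrary initial distribution
p /= p.sum() * h
dt = 0.2 * h**2 / c                     # a stable explicit step size
for _ in range(50_000):
    flux = p * f                                        # the "p v" term of equation 4
    dflux = (np.roll(flux, -1) - np.roll(flux, 1)) / (2 * h)
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / h**2
    p = p + dt * (-dflux + c * lap)                     # equations 4 + 5 combined
    p[0] = p[-1] = 0.0                                  # p is negligible at the edges

p /= p.sum() * h
p_star = np.exp(-(phi**4 / 4 - phi**2 / 2) / c)
p_star /= p_star.sum() * h
print("max |p - p*| after relaxation:", np.abs(p - p_star).max())
```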
Because the MRF literature is so huge, and I have not even tried to survey it all, I apologize in advance to anyone whose work I may have reinvented in whole or in part.
In section 3.3 of [3], I argue that the interface between microscopic symmetry and macroscopic time-forwards causality ends up being very simple, for experiments based on what I call "the standard paradigm" of physics. These are experiments where all output channels eventually face some kind of particle counter or equivalent, for which the particle mass-energy $E \gg kT$. Thus it is clear why any form of real backwards-time communication is impossible for any setup of that kind. This analysis may lead to an empirical basis for testing different theories about the macro/micro interface, and perhaps even identifying differences between a "natural" version of functional integral QFT and the second quantization.
Entropy, Order and Correlation Across Space
Once again, a key question for complexity theory is: when does the equilibrium probability distribution $p^*$ (or equivalently $S^* = \log p^*$) imply a high probability or persistence of interesting spatial patterns, like life?
Because we lack the machinery to answer such questions, we are essentially reduced to total brute speculation when asked questions like: What kinds of planetary environments would permit the evolution of some form of life? What forms of life are actually possible in our universe? Could alternative conditions on earth lead to evolutionary breakthroughs, such as new forms of archaea which seriously threaten other forms of life? It is not clear that such questions can ever be translated into closed-form mathematics, but it would be desirable to develop the best mathematical machinery we can to approach such questions.
At the WCNN94 conference, a speaker from the Santa Fe Institute presented a number of interesting artificial life simulations. One speaker from the audience essentially accused her of lying, arguing that "thermodynamics proves that such patterns cannot possibly persist in long-term equilibrium..."
This illustrates the widespread, fundamental misunderstanding of thermodynamics. In essence, ordinary thermodynamics is a strategy for coping with the intractability of equations 4+5 in the general case, which allow $p^*$ and $S^*$ to be any (nonlocal) function of the state at time $t$. As a simplifying assumption, we may restrict $S$ to be a sum, over $x$, of a local entropy, $s(\varphi(x))$, which does not even depend on the spatial derivatives of $\varphi$. This simplification makes the math easier, but it eliminates, simply by a priori assumption, any possibility of correlations across space (like life). This simplification is simply not suitable for answering the kinds of questions posed here.
But what else can we do, computationally, when we face an equation like 4+5, which is incredibly complex in the PDE or MRF case? Actually, equations 4+5 are analogous to the Bellman equation, discussed in the ACD session of this conference, which is also intractable. Many of the same approximation approaches which have led to a breakthrough in optimal control might also be applied here. In fact, there is a certain kind of deep equivalence between building mathematical machinery for us to use in approximating and understanding $S^*$ or $p^*$, and building machinery which tries to understand and adapt to the universe it lives in.
One example of an abstract, hypothetical universe which could possibly support life indefinitely is the "earth-in-a-box" model, constructed as follows. Imagine the planet earth, surrounded by a perfect spherical boundary a million miles away, where the boundary absorbs incoming radiation and sends out "solar light" forever, as a basic "law of the universe" (boundary condition). Organisms would live in the "ergodic core" states of this dynamical system; by the previous section, this means that a backwards-time representation of the dynamics is possible in principle, but in practice we know that there would be a time-forwards arrow of time dominating the everyday lives of these organisms. See [12] for further examples.
Prigogine has suggested that such systems might be analyzed as "open systems," and that the usual notion of entropy can be replaced by another local function, "entropy production." But because this is still a local function, it does not have the power required here. Prigogine, in the paper which appears next to [12], now argues that interesting spatial patterns will evolve in any realistic PDE system, over time, based on a very different mathematical approach; however, this may be taking the pendulum too far in the opposite direction.
In most engineering applications, people do not really exploit the idea that the Gibbsian local entropy function can be added up to yield $S^*$. Rather, they use the idea that the sum of Gibbsian entropy always increases; it is used as a kind of local Lyapunov function. How can we know when a dynamical system or universe admits such a local Lyapunov function? In fact, the methods discussed in the ACD session of this conference include methods for locating Lyapunov functions from any family of functions, when such Lyapunov functions exist. See [7] for some other work on local Lyapunov functions.
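The sketch below illustrates the flavor of such a search, though it is a generic regression approach and not the specific ACD method: fit a function $V$ from a small parametric family so that its time derivative along trajectories matches a negative-definite target, then check the Lyapunov conditions on fresh samples.

```python
# A generic sketch (not the specific ACD method): locate a Lyapunov function
# for a hypothetical stable system by regression over a family of candidates.
import numpy as np

def f(x):                               # a toy nonlinear system, stable near 0
    return np.array([-x[0] + x[1]**2, -x[1]])

def g(x):                               # candidate family: quadratic monomials
    return np.array([x[0]**2, x[0] * x[1], x[1]**2])

def g_grad(x):                          # gradient of each basis function
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

rng = np.random.default_rng(2)
X = rng.uniform(-0.5, 0.5, size=(500, 2))
A = np.array([g_grad(x) @ f(x) for x in X])   # rows: Vdot basis terms at each x
b = -np.array([x @ x for x in X])             # target: Vdot(x) = -|x|^2
w, *_ = np.linalg.lstsq(A, b, rcond=None)     # fit V(x) = w . g(x)

# Verify the Lyapunov conditions on fresh samples.
Y = rng.uniform(-0.5, 0.5, size=(1000, 2))
V = np.array([w @ g(x) for x in Y])
Vdot = np.array([w @ (g_grad(x) @ f(x)) for x in Y])
print("V > 0 on %d/1000 samples; Vdot < 0 on %d/1000 samples"
      % ((V > 0).sum(), (Vdot < 0).sum()))
```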
Finally, let us come back to the old question: what is the interface between microscopic time symmetry and macroscopic forwards-time causality? The old adage that "time is like a river" may be a surprisingly powerful metaphor here. A river running from north to south imposes north-south asymmetry, even though the physics we see standing by the river does not. The asymmetry is ultimately due to global boundary conditions (rain on mountains in the north, and ocean to the south), but this does not tell us when and why a perturbation like a rock sitting in the river causes effects mostly downstream, not upstream. Formally, this suggests that the local causal flow is a property of the first variation of the water-flow equations: linear nonautonomous equations where the impacts of a perturbation eventually go to zero both upstream and downstream, such that one can definitively compare the size of upstream and downstream effects. But one does not expect upstream effects to always be exactly zero; and perhaps, when we understand our universe better at the quantum level, we might find the same here as well, even in macroscopic experiments.
References
1. K. Pribram, Brain and Perception: Holonomy and Structure in Figural Processing, Erlbaum, 1991.
2. K. Pribram, ed., Rethinking Neural Networks: Quantum Fields and Biological Evidence, Erlbaum, 1993.
3. P. Werbos, "What Do Neural Nets and Quantum Theory Tell Us About Mind and Reality?", in K. Yasue, T. Della Senta & M. Jibu, eds., Towards a Science of Consciousness (Tokyo '99 Proceedings), Benjamin Books, 2000. (Approximate citation, forthcoming.)
4. N. El-Karoui & L. Mazliak, eds., Backward Stochastic Differential Equations, Addison-Wesley, 1997.
5. G.E.P. Box & G.M. Jenkins, Time-Series Analysis: Forecasting and Control, Holden-Day, 1970.
6. P. Werbos,
7. P. Werbos, "New Approaches to Soliton Quantization and Existence for Particle Physics," xxx.lanl.gov/abs/patt-sol/9804003, section 6.
8. P. Werbos, "Can 'soliton' attractors exist in realistic 3+1-D conservative systems?", Chaos, Solitons and Fractals, Vol. 10, No. 11, July 1999, section 6.
9. R. Penrose, Shadows of the Mind, Oxford University Press, 1994.
10. J. Zinn-Justin, Quantum Field Theory and Critical Phenomena, 3rd ed., Oxford University Press, 1996.
11. M.I. Jordan, ed., Learning in Graphical Models, Kluwer Academic, 1998.
12. P. Werbos, "Self-organization: Re-examining the basics and an alternative to the Big Bang," in K. Pribram, ed., Origins: Brain and Self-Organization, Erlbaum, 1994.
13. P. Werbos, "Chaotic solitons and the foundations of physics: a potential revolution," Applied Mathematics and Computation, Vol. 56, pp. 289-340, July 1993.