Extending Chaos and Complexity Theory to Address Life, Brain and Quantum Foundations
Paul J. Werbos
National Science Foundation, Room 675
pwerbos@nsf.gov
Introduction
Years ago, many researchers proposed that chaos theory could provide a kind of universal theory of the qualitative behavior of all dynamical systems. Thus it could provide a solid mathematical foundation for unifying efforts to address the three really fundamental questions of basic science: (1) what is life?; (2) what is mind or intelligence?; (3) what are the underlying physical laws of the universe?
Chaos theory is essentially a recent branch of a larger field of mathematics: nonlinear system dynamics (NSD). It mainly addresses systems governed by Ordinary Differential Equations (ODE) or their discrete-time cousins. Because the vast bulk of models now used in computational neuroscience are ODE models, it is only natural that neuroscience has developed great interest in the use of NSD as a tool for analyzing and designing neural network models.
Walter Freeman, in particular, has played a seminal role in describing how chaos theory can be used in computational neuroscience, through his classic models of the olfactory system, described in Scientific American, in many scientific journals, and in IJCNN proceedings. His colleague Morris Hirsch in mathematics has helped provide a solid foundation for this work. This work has helped to inspire many efforts throughout the world to use chaos as a design principle in computer systems and in artificial neural network (ANN) design.
Recently, however, several issues or questions have begun to emerge. First, the pure ODE models of the olfactory system did not display the level of long-term persistence of chaotic behavior required to match Freeman's empirical data on the olfactory system. Freeman and Kozma, working together, addressed this by adding a small stochastic disturbance term to the model, as discussed below.
Second, chaos theory as such focuses on the existence of low-dimensional attractors. But brains -- like stock markets or economies or turbulent flows or the bodies of organisms -- are inherently high-dimensional systems. In other words, the space of states actually visited is a very high-dimensional space, even though it is a tiny fraction of the larger space of states which are theoretically available to the systems. For this reason, many researchers, at the Santa Fe Institute and elsewhere, have called for the development of a "complexity theory" which would address that larger class of dynamical systems. The efforts to produce such a "complexity theory" have produced many interesting results, but the public enthusiasm for the field has far outpaced the development of new unifying mathematics. It is reasonably obvious why the brain is a complex system, not reducible to truly low-dimensional attractors over the long term: the information content which the brain can handle is linked to the richness or dimensionality of the states which it visits.
Third, the brain is not just one of many possible dynamical systems. It is a particular type of dynamical system: a system evolved to achieve a high level of functionality in certain very broad and challenging information-processing tasks. Thus, in addition to the general mathematics of nonlinear dynamical systems (which does not provide what Grossberg would call a "strong filter"), it is crucial to also use the more specific mathematics of functional decision-making systems, as described, for example, in the IJCNN2000 session on Adaptive Critic Designs (ACD). Furthermore, in engineering applications, ACDs can provide a design methodology to actually locate "chaos controllers" which improve performance over classical fixed controllers, for those environments in which chaos controllers have a potential performance advantage. ACD designs are normally used to tune the parameters of controller networks, but the identical mathematics can be used to adapt actual physical design parameters, in simulation, so that the advantages of high-gain, high-performance chaotic physical designs can also be captured in such a design approach.
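To make this last point concrete, here is a minimal numerical sketch, under strong simplifying assumptions: the "controller network" is reduced to a single gain, the "physical design" to a single plant parameter, and the toy plant itself is hypothetical. A genuine ACD would obtain gradients from an adaptive critic via backpropagation rather than by finite differences, and would constrain physical parameters to realizable values.

```python
# A hypothetical toy problem, not an actual ACD: tune a controller gain k and
# a physical design parameter theta jointly, by gradient descent on the cost
# of a simulated rollout of the plant x' = theta*x + u with feedback u = -k*x.
import numpy as np

def rollout_cost(k, theta, T=200, dt=0.01):
    """Accumulated quadratic cost of one simulated rollout."""
    x, cost = 1.0, 0.0
    for _ in range(T):
        u = -k * x                      # "controller network" reduced to one gain
        x += dt * (theta * x + u)       # plant with physical design parameter theta
        cost += dt * (x * x + 0.1 * u * u)
    return cost

def fd_grad(f, z, eps=1e-5):
    """Finite-difference gradient; an ACD would use backpropagated derivatives."""
    g = np.zeros_like(z)
    for i in range(len(z)):
        zp, zm = z.copy(), z.copy()
        zp[i] += eps
        zm[i] -= eps
        g[i] = (f(zp) - f(zm)) / (2 * eps)
    return g

z = np.array([0.5, 1.0])                # z = [controller gain k, plant parameter theta]
for _ in range(100):                    # adapt both in simulation, as argued above
    z -= 0.05 * fd_grad(lambda w: rollout_cost(w[0], w[1]), z)
print("tuned gain k = %.3f, physical parameter theta = %.3f" % (z[0], z[1]))
```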
Fourth, it is unclear whether ODE models are powerful enough to capture the essential higher-order capabilities of the brain. Pribram has argued, for example, that PDE or field-effect models may be essential in some cases [1,2]. Others have argued that some sort of quantum computing may be useful, at least in developing new higher-level intelligent systems. (The presence or absence of quantum computing effects in the human brain is highly speculative and controversial at present [3].) Field effects are certainly important, in any case, to the evolution of biological organisms and to the foundations of physics.
In the past few months, Robert Kozma and I have held extensive discussions by email about the implications of these questions. It is our view that the existing mathematics available from chaos theory and complexity research does not really live up to its long-term potential of providing a better foundation for a more rigorous understanding of life, the brain and quantum foundations. We also have some ideas about what could be done to extend this mathematics, to come closer to living up to that potential. This paper will give my personal, tentative views of what is needed here, and of how the pieces fit together.
These views will certainly not represent any official views of my employers; in fact, because they are tentative, they may not even represent my own views by the time this goes into print. However, I hope that the questions raised here will be useful in stimulating new thoughts and better concepts by other players in the research community.
An Immediate Suggestion: Develop "Stochastic Chaos Theory"
The work of Freeman and Kozma leads to a straightforward conclusion: we need to develop a "stochastic chaos theory," a body of mathematics which does for their kind of model the same kind of service which ordinary chaos theory does for ODE models.
The need for such a development seems extremely obvious. In my view, the real puzzle is why this body of mathematics does not exist, or why, if it exists in some form, it has not been collected together, unified and disseminated in the same way as ordinary chaos theory has been.
Years ago, ordinary NSD and chaos theory faced amazing tribulation and resistance in their development. In the early years, some mathematicians talked about the need for a "qualitative theory of ODE." They emphasized the need to develop a new field of mathematics, complementary to the more traditional mathematics of ODE, to address questions about qualitative behavior which the earlier theory (however valuable) simply did not address. After decades of effort, NSD now provides a large and growing literature on the qualitative behavior of systems of the form $\partial_t x = f(x,W)$ or $x(t+1) = f(x(t),W)$, where $x$ is a vector, where $W$ is a set of weights or parameters, and where "$\partial_t$" is physics notation for differentiation with respect to time.
Freeman's new model is not so far away from his earlier ODE models. It may be written (schematically) as $\partial_t x = f(x,W) + e$, where $e$ is a fairly small stochastic disturbance vector. In effect, Freeman has been pleading with mathematicians to provide an extension of chaos theory, to help him rigorously analyze the properties of models in this class.
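To make this class of model concrete, here is a minimal simulation sketch using the Euler-Maruyama method. The Lorenz equations stand in for $f$ here purely for familiarity; Freeman's actual olfactory model is different and far more structured.

```python
# A minimal sketch of a model of the class dx/dt = f(x,W) + e, integrated by
# the Euler-Maruyama method. The Lorenz system is only a familiar stand-in
# for f; it is not Freeman's olfactory model.
import numpy as np

def f(x, W):
    sigma, rho, beta = W                # the "weights" W of this toy system
    return np.array([sigma * (x[1] - x[0]),
                     x[0] * (rho - x[2]) - x[1],
                     x[0] * x[1] - beta * x[2]])

rng = np.random.default_rng(0)
W = (10.0, 28.0, 8.0 / 3.0)
dt, steps, amp = 0.001, 50_000, 0.1     # amp sets the size of the disturbance e
x = np.array([1.0, 1.0, 1.0])
traj = np.empty((steps, 3))
for t in range(steps):
    e = amp * np.sqrt(dt) * rng.standard_normal(3)
    x = x + dt * f(x, W) + e            # one Euler-Maruyama step
    traj[t] = x
print("sample mean of the state:", traj.mean(axis=0))
```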
The main obstacle to the development of this theory seems to be a matter of historical circumstances. One group of mathematicians is committed to a probabilistic view of the world, and they prove theorems which address questions analogous to those which were addressed by the mathematics of ODE before chaos theory was developed. Another group has spent their lives working with a deterministic version of chaos, and is often sympathetic to the view that "noise" should always be represented as the outcome of some underlying deterministic process. The challenge, in part, is how to settle new territory in the no-man's-land between these groups, a land which no one has claimed as yet.
These extreme alternatives do not fully address the needs of biology or of several other fields. Very often, a model of the form $\partial_t x = f(x,W) + e$ will be far more parsimonious (and easier to test) than a model which requires a detailed, explicit deterministic account of every source of microscopic or quantum noise which results in $e$. Likewise, from the viewpoint of pure mathematics, it is a reasonable and well-posed question to ask what the qualitative behavior of such a system would be.
The first suggestion, then, is to try to reproduce the achievements of chaos theory for this more general class of systems, particularly for the case where $e$ is small.
In formal terms, mathematicians in this area would not actually write "$\partial_t x = f(x,W) + e$." They would write an expression which is essentially the same, but somewhat more rigorous. Systems of this sort are called Stochastic Differential Equations (SDE) [4]. The need here is for a more unified and comprehensive qualitative theory of SDE, analogous to the modern qualitative theory of ODE, capable at a minimum of assisting (and simplifying) the analysis of models like that of Freeman and Kozma.
The term "stochastic chaos theory" is the best term we can think of, based on discussions among Kozma, Freeman and myself.
Beyond this core body of mathematics, however, there may also be some need for other related types of mathematical tools.
Qualitative Theory of Mixed Forwards-Backwards SDE or 0+1-D MRFs
The qualitative theory of ordinary SDE would already address Freeman's current concerns, and might well provide a fully adequate foundation for computational neuroscience (coupled with the ACD mathematics discussed previously). However, SDE themselves are a special case of a larger class of systems.
SDE assume that the random disturbance $e(t)$ is not correlated with $x(\tau)$ for $\tau \le t$. This is called the "causality assumption" in statistics. (More precisely, it is assumed that $e(t)$ is statistically independent of prior values of $x$ and $e$.) Conventional time-series models, formulated as $x(t+1) = f(x(t), e(t))$, typically make the same assumption [5]. There are very, very complex books on martingale theory and the like which consistently maintain the same assumption. Note that $e(t)$ typically is correlated with $x(\tau)$ for $\tau > t$, because random disturbances typically do change the state of the system at later times.
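This asymmetry is easy to check numerically. The sketch below simulates a toy linear special case of $x(t+1) = f(x(t), e(t))$ and estimates the correlation of $e(t)$ with the state one step in the past and one step in the future:

```python
# A numerical check of the causality assumption, for a toy AR(1) process
# x(t) = 0.8*x(t-1) + e(t): the disturbance is uncorrelated with past states
# but clearly correlated with future ones.
import numpy as np

rng = np.random.default_rng(1)
T = 100_000
e = rng.standard_normal(T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = 0.8 * x[t - 1] + e[t]        # e(t) enters the state at time t

past = np.corrcoef(e[2:], x[1:-1])[0, 1]    # corr(e(t), x(t-1)): near zero
future = np.corrcoef(e[1:-1], x[2:])[0, 1]  # corr(e(t), x(t+1)): clearly nonzero
print("correlation with past state:   %+.3f" % past)
print("correlation with future state: %+.3f" % future)
```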
More recently, mathematicians have become interested in the formal properties of mixed forwards-backwards SDE [4]. In such models, random disturbances can cause effects both at later times and at earlier times. They claim [4] that such models are useful both in optimal control and in economics; for example, they claim that such models may lead to capabilities very similar to those of Dual Heuristic Programming, discussed in the ACD session of IJCNN2000 and related to the Pontryagin equation.
In my view, mixed forwards-backwards systems will be crucial to a more rigorous and concise reformulation of quantum field theory. This issue is extremely complex, and beyond the scope of this paper [3,6-9]. There are certain critical experiments in physics, called "Bell's Theorem" experiments (based on a theorem of Clauser et al.), which in my view demonstrate that causality actually runs forwards and backwards through time, symmetrically, at the microscopic level [6,3]. The earliest papers on this were published by de Beauregard and myself, and later cited by Penrose [9] among others, although Von Neumann's classic tract on quantum mechanics clearly evinced a similar intuition. Most researchers today were brought up on the ancient "first quantization" (Schrödinger's equation over 3+1-D space-time) or on the "second quantization" of the 1950s (wave functions propagating over Fock space); however, modern high-energy physics is based on a more general formulation, the "functional integral formalism" [10,7,3], which brings out more clearly the underlying symmetries with respect to time. In the "quantum computing world," Yanhua Shih and others have recently performed experiments which strongly support this intuition.
If the physical universe we live in is actually a mixed forwards-backwards system of some kind, then we need to understand the mathematics of such systems better than we now do. This is true regardless of which formalism or theory survives future tests; indeed, deeper mathematical insight will be crucial to understanding the theories well enough to test them!
Forwards-backwards SDE involve continuous time. To develop the mathematics, we may also consider the related issue of discrete-time systems. The relevant time-forwards discrete systems can generally be written as $x(t+1) = f(x(t), \ldots, x(t-k), e)$, with the causality assumption applied to $e$. To define the forwards-backwards generalization of such systems, consider systems defined as Markov Random Fields (MRF) over the set of time points $t = -\infty, \ldots, -2, -1, 0, 1, 2, \ldots, \infty$. I would call this the "0+1-D" special case of a more general space-time MRF system [7]; the "0+1" refers to zero space dimensions and one time dimension.
The literature on space-like (n+0-D) MRFs is huge and diverse. It ranges from old discussions of lattice dynamics and spin-glass neural networks in physics, through image processing technology, through to recent work on learning in graphical models [11].
There are two main ways to specify a particular MRF model, defined over a regular lattice of points $t$ in space-time. The literature uses the terms "cliques" and "elites," but I regret that I tend to confuse these terms because of their lack of intuition; instead, I use "neighborhood" and "driver set."
For a regular MRF model over lattice points $t$ in space-time, we may write:

$$\Pr\big(\varphi(t) \mid \{\varphi(\tau) : \tau \in N(t)\}\big) = \Pr\big(\varphi(t) \mid \{\varphi(\tau) : \tau \neq t\}\big) \qquad (1)$$

and we implicitly assume:

$$\Pr\big(\varphi(t) \mid \{\varphi(\tau) : \tau \in N(t)\}\big) = \Pr\big(\varphi(t+d) \mid \{\varphi(\tau) : \tau \in N(t+d)\}\big) \quad \text{for all } d \qquad (2)$$
Equation 1 says that there is a set of points, $N(t)$, the neighborhood of $t$, such that the probability of $\varphi$ having any particular value at $t$ depends only on the values of $\varphi$ at other points in the neighborhood of $t$; in other words, there is no independent or direct correlation between the value of $\varphi(t)$ and the value of $\varphi$ at any other point outside the neighborhood. Equation 2 basically says that the dynamics of the system are the same at all points. It is possible to specify an MRF model simply by specifying the probability distribution on the left-hand side of equation 1; I would call this a neighborhood representation.
Alternatively, one may define an MRF model by what I would call a driver representation. In a driver representation, we specify a function $P$, which is a different sort of probability distribution, $P(\varphi(t) \mid \{\varphi(\tau) : \tau \in D(t)\})$, where $D(t)$ is a "driver set" of points near $t$. When a driver and a neighborhood representation are both possible, the neighborhood is usually roughly twice as wide as the driver set. Thus equation 1 does not hold for the driver set! Instead, the driver set has a property which may be represented crudely as:

$$\Pr\big(\{\varphi(t) : t \in R\} \mid \{\varphi(t) : t \in \partial R\}\big) = \frac{1}{Z} \prod_{t \in R} P\big(\varphi(t) \mid \{\varphi(\tau) : \tau \in D(t)\}\big) \qquad (3)$$
where $R$ is essentially any region of space-time, where $\partial R$ is a "border region" (the set of points immediately surrounding $R$), and $Z$ is a scaling factor used to make sure that the probabilities add up to one. The ordinary time-forwards dynamic system described above is a special case of equation 3, in effect, where the driver set $D(t)$ is the set of time points $t-1, \ldots, t-k$. (In the time-series example, notice that $x(t)$ will normally be correlated with $x(t+1)$; thus $t+1$ will be in $N(t)$ but not in $D(t)$.)
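As a concrete 0+1-D example, the sketch below takes a two-state Markov chain, whose driver set is $D(t) = \{t-1\}$, and verifies by brute-force enumeration that equation 1 holds with the neighborhood $N(t) = \{t-1, t+1\}$:

```python
# A two-state Markov chain as a 0+1-D MRF: the driver representation uses
# D(t) = {t-1}, while the neighborhood of equation 1 is N(t) = {t-1, t+1}.
# Verified by brute-force enumeration over 5-point sequences.
import itertools
import numpy as np

P = np.array([[0.9, 0.1],               # driver distribution P(phi(t) | phi(t-1))
              [0.3, 0.7]])
pi = np.array([0.75, 0.25])             # stationary distribution of P

def prob(seq):
    """Joint probability of a sequence started from the stationary law."""
    p = pi[seq[0]]
    for a, b in zip(seq, seq[1:]):
        p *= P[a, b]
    return p

# Pr(phi(2) | all other points) must depend only on phi(1) and phi(3).
for phi0, phi1, phi3, phi4 in itertools.product([0, 1], repeat=4):
    joint = [prob((phi0, phi1, v, phi3, phi4)) for v in (0, 1)]
    cond = joint[1] / sum(joint)        # Pr(phi(2)=1 | everything else)
    nbr = P[phi1, 1] * P[1, phi3] / sum(P[phi1, v] * P[v, phi3] for v in (0, 1))
    assert abs(cond - nbr) < 1e-12      # equation 1 with N(t) = {t-1, t+1}
print("neighborhood property verified for all configurations")
```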
For biology and physics, a key research goal is to better understand the "arrow of time," and then to understand the interface between microscopic forwards-backwards symmetry and macroscopic time-forwards causality. This suggests one warm-up question at the 0+1-D level: when can a given dynamical process be represented equivalently as a time-forwards Markov process, as a time-backwards Markov process, and as a mixed forwards-backwards process?
My preliminary results cannot be proven in six pages, and in any case a real pure mathematician could do a much better job. My rough impressions are as follows. In the 0+1-D case, when $\varphi$ is taken from a finite set of possible values, a time-forwards Markov process splits the set of possible values into two subsets: the set of transient states (where probability always goes to zero, regardless of initial state) and the set of ergodic states. The ergodic core of this process, the process restricted to the ergodic states, can be represented equivalently as a time-forwards, time-backwards and mixed forwards-backwards MRF, in a driver representation, and it admits a neighborhood representation. But the presence of transient states (boundary conditions) in any of these three situations destroys any possibility of such equivalence or of a neighborhood representation; it is like nonstationarity in statistics.
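The forwards/backwards equivalence on the ergodic core is easy to illustrate numerically for a small ergodic chain: the time-reversed process is again a Markov process, with transition probabilities obtained from the stationary distribution by Bayes' rule. A minimal check:

```python
# For an ergodic finite Markov chain with stationary law pi, the time-reversed
# process is again a Markov chain, with kernel P_rev[i,j] = pi[j]*P[j,i]/pi[i];
# this is the forwards/backwards equivalence on the ergodic core.
import numpy as np

P = np.array([[0.5, 0.4, 0.1],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

w, v = np.linalg.eig(P.T)               # stationary law: left eigenvector for 1
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

P_rev = (pi[None, :] * P.T) / pi[:, None]   # P_rev[i,j] = pi[j] * P[j,i] / pi[i]
assert np.allclose(P_rev.sum(axis=1), 1.0)  # a valid stochastic matrix

# The two-point statistics agree: Pr(x_t=i, x_{t+1}=j) is the same whether the
# process is generated forwards by P or backwards by P_rev.
fwd = pi[:, None] * P
bwd = (pi[:, None] * P_rev).T
assert np.allclose(fwd, bwd)
print("stationary law:", np.round(pi, 4))
```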
One might imagine that the present state of our universe is something like a "transient state" in a time-forwards march towards greater entropy. But this is not so obvious, either from the viewpoint of physics [12] or of mathematics. In actuality, the addition of space changes the analysis quite radically; the 0+1-D analysis is an important prerequisite, but should not be extrapolated too far. A process which is local in space in forwards time will not, in general, be local in space when represented equivalently in backwards time, or vice-versa. (Simple linear examples illustrate this, and illustrate that local mixed MRFs do not reduce to local Markov processes.) The next section will argue that the concepts above are not yet powerful enough to capture our intuitive notion of an "arrow of time," in examples like the "earth-in-a-box" system. Nevertheless, 0+1-D analysis can be an excellent starting point for developing the mathematical machinery we will need in the full space-time case.
For example, a key question in complexity theory is as follows: what kinds of patterns -- like whirlpools, organisms or solitons -- will evolve and persist in any given dynamical system, in stable equilibrium? If we know the equilibrium probability distribution, $p^*(\{\varphi(x,t),\ \text{for all } x\})$, we can theoretically "read out" whether there is a high probability of states which contain organized patterns.
To find the equilibrium probability distribution, we can begin by imagining an initial probability distribution of states, $p$, deriving the dynamic equation for how $p$ changes over time, and solving for an equilibrium. The notion of "stability" needs to be modified substantially when we consider systems with causality running forwards and backwards in time [8], but it remains important to ask about the dynamics of $p(\text{state})$.
For systems $\partial_t \varphi = f(\varphi)$, the dynamic equation for $p(\varphi)$ is simply the well-known dispersion or diffusion equation of physics:

$$0 = \partial_t p + \operatorname{div}(p v) = \partial_t p + \sum_i \partial_i \big( p\, f_i(\varphi) \big) \qquad (4)$$

where $\partial_i$ is an abbreviation for the partial derivative with respect to $\varphi_i$. For systems of the form $\partial_t \varphi = f(\varphi) + e$, where $\langle e e^T \rangle = Q$ (schematically) and $Q$ is necessarily a symmetric matrix, we can assume without loss of generality that $Q$ is diagonal, because we can always rotate the definition of $\varphi$. In that case, $p(\varphi)$ follows the dynamics given in equation 4 with an additional term added:

$$\sum_i c_i\, \partial_i^2\, p \qquad (5)$$

where $\partial_i^2$ is an abbreviation for the second derivative with respect to $\varphi_i$, and where the constants $c_i$ depend on, and have the same sign as, $Q_{ii}$.
For an ordinary SDE, $Q$ is positive semidefinite; for a backwards-time SDE, it is negative semidefinite; and so on. This probability equation has similarities to the classical heat equation, running forwards or backwards in time; however, it is not obvious a priori that certain signs of $c$ really imply an "arrow of time" in the ordinary sense, because the function $f$ and correlations in space are also important to the qualitative behavior of the overall system. Formally, it may be necessary in the future to map the concept of an "arrow of time" into very distinct sets of mathematical concepts, all of some interest, but none all-encompassing. It is straightforward but tedious (exceeding page limits) to extend this to systems involving "$g(\varphi)e$", or to PDE systems, where $\varphi$ is replaced by a vector in function space, in effect; likewise, defining the entropy $S = \log p$, an equivalent dynamical equation for $S$ results; likewise, if $p$ or $S$ are represented as Taylor/Volterra series, the Taylor series coefficients look like wave functions of Fock space, and equations 4+5 lead directly to a kind of Schrödinger equation. In [13, section IV], I derived similar equations for the statistical moments of a field, starting from the ODE case, building up to the PDE case, and a "reification" procedure for closing them; the duality between those equations and these equations is probably of significant computational use.
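As a one-dimensional illustration of equations 4+5 (a single state variable $\varphi$, a hypothetical double-well drift $f$, and one diffusion constant $c$), the sketch below evolves $p(\varphi, t)$ by explicit finite differences and compares the result against the closed-form equilibrium $p^* \propto \exp(-V/c)$ which happens to be available in this special case:

```python
# Equations 4+5 in one dimension: evolve p(phi, t) under
#   dp/dt = -d(p f)/dphi + c d2p/dphi2
# with the hypothetical double-well drift f = -V', V = phi^4/4 - phi^2/2,
# and compare against the closed-form equilibrium p* ~ exp(-V/c).
import numpy as np

n, L = 400, 3.0
phi = np.linspace(-L, L, n)
h = phi[1] - phi[0]
f = phi - phi**3                        # drift f = -V'
c = 0.25                                # diffusion constant (one c_i, from Q)

p = np.exp(-phi**2 / 0.5)               # an arbitrary initial distribution
p /= p.sum() * h
dt = 0.2 * h**2 / c                     # a stable explicit step size
for _ in range(50_000):
    flux = p * f                                        # the "p v" term of equation 4
    dflux = (np.roll(flux, -1) - np.roll(flux, 1)) / (2 * h)
    lap = (np.roll(p, -1) - 2 * p + np.roll(p, 1)) / h**2
    p = p + dt * (-dflux + c * lap)                     # equations 4 + 5 combined
    p[0] = p[-1] = 0.0                                  # p is negligible at the edges

p /= p.sum() * h
p_star = np.exp(-(phi**4 / 4 - phi**2 / 2) / c)
p_star /= p_star.sum() * h
print("max |p - p*| after relaxation:", np.abs(p - p_star).max())
```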
Because the MRF literature is so huge, and I have not even tried to survey it all, I apologize in advance to anyone whose work I may have reinvented in whole or in part.
In section 3.3 of [3], I argue that the interface between microscopic symmetry and macroscopic time-forwards causality ends up being very simple, for experiments based on what I call "the standard paradigm" of physics. These are experiments where all output channels eventually face some kind of particle counter or equivalent, for which the particle mass-energy $E \gg kT$. Thus it is clear why any form of real backwards-time communication is impossible for any setup of that kind. This analysis may lead to an empirical basis for testing different theories about the macro/micro interface, and perhaps even identifying differences between a "natural" version of functional integral QFT and the second quantization.
Entropy, Order and Correlation Across Space
Once again, a key question for complexity theory is: when does the equilibrium probability distribution $p^*$ (or equivalently $S^* = \log p^*$) imply a high probability or persistence of interesting spatial patterns, like life?
Because we lack the machinery to answer such questions, we are essentially reduced to total brute speculation when asked questions like: What kinds of planetary environments would permit the evolution of some form of life? What forms of life are actually possible in our universe? Could alternative conditions on earth lead to evolutionary breakthroughs, such as new forms of archaea which seriously threaten other forms of life? It is not clear that such questions can ever be translated into closed-form mathematics, but it would be desirable to develop the best mathematical machinery we can to approach such questions.
At the WCNN94 conference, a speaker from the Santa Fe Institute presented a number of interesting artificial life simulations. One speaker from the audience essentially accused her of lying, arguing that "thermodynamics proves that such patterns cannot possibly persist in long-term equilibrium..."
This illustrates the widespread, fundamental misunderstanding of thermodynamics. In essence, ordinary thermodynamics is a strategy for coping with the intractability of equations 4+5 in the general case, which allow $p^*$ and $S^*$ to be any (nonlocal) function of the state at time $t$. As a simplifying assumption, we may restrict $S$ to be a sum, over $x$, of a local entropy, $s(\varphi(x))$, which does not even depend on the spatial derivatives of $\varphi$. This simplification makes the math easier, but it eliminates, simply by a priori assumption, any possibility of correlations across space (like life). This simplification is simply not suitable for answering the kinds of questions posed here.
But what else can we do, computationally, when we face an equation like 4+5, which is incredibly complex in the PDE or MRF case? Actually, equations 4+5 are analogous to the Bellman equation, discussed in the ACD session of this conference, which is also intractable. Many of the same approximation approaches which have led to a breakthrough in optimal control might also be applied here. In fact, there is a certain kind of deep equivalence between building mathematical machinery for us to use in approximating and understanding $S^*$ or $p^*$, and building machinery which tries to understand and adapt to the universe it lives in.
One example of an abstract, hypothetical universe which could possibly support life indefinitely is the "earth-in-a-box" model, constructed as follows. Imagine the planet earth, surrounded by a perfect spherical boundary a million miles away, where the boundary absorbs incoming radiation and sends out "solar light" forever, as a basic "law of the universe" (boundary condition). Organisms would live in the "ergodic core" states of this dynamical system; by the previous section, this means that a backwards-time representation of the dynamics is possible in principle, but in practice we know that there would be a time-forwards arrow of time dominating the everyday lives of these organisms. See [12] for further examples.
Prigogine has suggested that such systems might be analyzed as "open systems," and that the usual notion of entropy can be replaced by another local function, "entropy production." But because this is still a local function, it does not have the power required here. Prigogine, in the paper which appears next to [12], now argues that interesting spatial patterns will evolve in any realistic PDE system, over time, based on a very different mathematical approach; however, this may be taking the pendulum too far in the opposite direction.
In most engineering applications, people do not really exploit the idea that the Gibbsian local entropy function can be added up to yield $S^*$. Rather, they use the idea that the sum of Gibbsian entropy always increases; it is used as a kind of local Lyapunov function. How can we know when a dynamical system or universe admits such a local Lyapunov function? In fact, the methods discussed in the ACD session of this conference include methods for locating Lyapunov functions from any family of functions, when such Lyapunov functions exist. See [7] for some other work on local Lyapunov functions.
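The sketch below illustrates the flavor of such a search, though it is a generic regression approach and not the specific ACD method: fit a function $V$ from a small parametric family so that its time derivative along trajectories matches a negative-definite target, then check the Lyapunov conditions on fresh samples.

```python
# A generic sketch (not the specific ACD method): locate a Lyapunov function
# for a hypothetical stable system by regression over a family of candidates.
import numpy as np

def f(x):                               # a toy nonlinear system, stable near 0
    return np.array([-x[0] + x[1]**2, -x[1]])

def g(x):                               # candidate family: quadratic monomials
    return np.array([x[0]**2, x[0] * x[1], x[1]**2])

def g_grad(x):                          # gradient of each basis function
    return np.array([[2 * x[0], 0.0],
                     [x[1], x[0]],
                     [0.0, 2 * x[1]]])

rng = np.random.default_rng(2)
X = rng.uniform(-0.5, 0.5, size=(500, 2))
A = np.array([g_grad(x) @ f(x) for x in X])   # rows: Vdot basis terms at each x
b = -np.array([x @ x for x in X])             # target: Vdot(x) = -|x|^2
w, *_ = np.linalg.lstsq(A, b, rcond=None)     # fit V(x) = w . g(x)

# Verify the Lyapunov conditions on fresh samples.
Y = rng.uniform(-0.5, 0.5, size=(1000, 2))
V = np.array([w @ g(x) for x in Y])
Vdot = np.array([w @ (g_grad(x) @ f(x)) for x in Y])
print("V > 0 on %d/1000 samples; Vdot < 0 on %d/1000 samples"
      % ((V > 0).sum(), (Vdot < 0).sum()))
```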
Finally, let us come back to the old question: what is the interface between microscopic time symmetry and macroscopic forwards-time causality? The old adage that "time is like a river" may be a surprisingly powerful metaphor here. A river running from north to south imposes north-south asymmetry, even though the physics we see standing by the river does not. The asymmetry is ultimately due to global boundary conditions (rain on mountains in the north, and ocean to the south), but this does not tell us when and why a perturbation like a rock sitting in the river causes effects mostly downstream, not upstream. Formally, this suggests that the local causal flow is a property of the first variation of the water-flow equations: linear nonautonomous equations where the impacts of a perturbation eventually go to zero both upstream and downstream, such that one can definitively compare the size of upstream and downstream effects. But one does not expect upstream effects to always be exactly zero; and perhaps, when we understand our universe better at the quantum level, we might find the same here as well, even in macroscopic experiments.
References
1. K. Pribram, Brain and Perception: Holonomy and Structure in Figural Processing, Erlbaum, 1991.
2. K. Pribram, ed., Rethinking Neural Networks: Quantum Fields and Biological Evidence, Erlbaum, 1993.
3. P. Werbos, "What Do Neural Nets and Quantum Theory Tell Us About Mind and Reality?", in K. Yasue, T. Della Senta & M. Jibu, eds., Towards a Science of Consciousness (Tokyo '99 Proceedings), Benjamin Books, 2000. (Approximate citation, forthcoming.)
4. N. El-Karoui & L. Mazliak, eds., Backward Stochastic Differential Equations, Addison-Wesley, 1997.
5. G.E.P. Box & G.M. Jenkins, Time-Series Analysis: Forecasting and Control, Holden-Day, 1970.
6. P. Werbos,
7. P. Werbos, "New Approaches to Soliton Quantization and Existence for Particle Physics," xxx.lanl.gov/abs/patt-sol/9804003, section 6.
8. P. Werbos, "Can 'soliton' attractors exist in realistic 3+1-D conservative systems?", Chaos, Solitons and Fractals, Vol. 10, No. 11, July 1999, section 6.
9. R. Penrose, Shadows of the Mind, Oxford University Press, 1994.
10. J. Zinn-Justin, Quantum Field Theory and Critical Phenomena, 3rd ed., Oxford University Press, 1996.
11. M.I. Jordan, ed., Learning in Graphical Models, Kluwer Academic, 1998.
12. P. Werbos, "Self-organization: Re-examining the basics and an alternative to the Big Bang," in K. Pribram, ed., Origins: Brain and Self-Organization, Erlbaum, 1994.
13. P. Werbos, "Chaotic solitons and the foundations of physics: a potential revolution," Applied Mathematics and Computation, Vol. 56, pp. 289-340, July 1993.