What is Mind?
What is Consciousness?
How Can We Build and Understand
Intelligent Systems?
New in 2009: Intelligence in the Brain: A theory of how it works and how to build it, Neural Networks, in press (electronic version available online March 29, 2009).
Understanding the Mind: The Subjective
Approach Versus the Objective Approach
At a conference in
New in 2008: People have been studying the brain and the mind for centuries, of course. Why is it that human society has yet to come up with a basic functional understanding of how brains work, one that would also fit with and explain our subjective experience? The papers posted here actually do provide a new understanding – but if you are an expert on brain research, you will want to know what new approach makes this possible. In April, 2008, we had a meeting for all the people in the
New in 2007: Click here for an updated technical review on intelligence in the brain, new research opportunities, and the connection between the scientific and the subjective points of view. Also new: Here are the slides and here you can find the video transcript of a talk on “mathematics and the brain” for high school mathematics students. The video transcript may be easier to follow if you bring up the slides on your computer at the same time.
A Scientific/Engineering View of
Consciousness and How to Build It
At the first big international conference on consciousness,
held at the United Nations University in
Here is one of the key ideas in that paper: intelligence or “consciousness,” as we see it in nature, is not an “either-or” sort of thing. Nor is it just a matter of degree, like temperature or IQ. Rather, we generally see a kind of staircase, of ever-higher levels of intelligence. (To be precise – once we actually start building these systems, we see something more like a kind of ordered lattice, but let me start from the simpler version of the story.) Each level represents a kind of fundamental qualitative advance over the one below it, just as color television is a fundamental improvement over black and white television.
The central challenge to solid mathematical, scientific understanding of intelligence today is to understand and replicate the highest level of intelligence that we can find even in the brain of the smallest mouse. This is clearly far beyond the qualitative level of capabilities or functionality that we find in even the most advanced engineering systems today. It is even further beyond the simple “Hebbian” models or the “Q-learning” families of models in vogue today in neuroscience, models which typically could not handle even a typical 5-variable real world nonlinear control problem. The brain is living proof that it is possible to develop systems to learn, predict and act upon their environment in a way which is far more powerful and far more universal than one would expect, based on the conventional wisdoms in statistics and control theory. (In every discipline I have ever studied, it is important to learn the difference between the powerful ideas, the real knowledge and the common popular conventional wisdoms.)
Another key idea in that paper: the brain as a whole system is an “intelligent controller,” a system which learns to output external actions as a function of sensor inputs and memory and internal actions, so as to achieve the “best” possible results. What is “best” is defined by a kind of performance measure or utility function, which is defined by the user when we build these systems, and defined by a kind of inborn “primary motivation system” when we look at organisms in nature. (In fact, however, the performance measure inserted into an artificial brain looks “inborn” from the viewpoint of that brain.) In actuality, I have been trying to figure out how to build these kinds of systems since I was 14 or 15, based in part on the inspiration of John Von Neumann’s classic book, The Theory of Games and Economic Behavior, the first book to formulate a clear notion of rationality and of cardinal utility functions. The claim is not that mammals are “born rational,” but that they are born with a learning mechanism evolved to strive ever closer to the outcomes desired by the organism, as much as possible.
Click here for a simplified explanation of how we learn in big jumps and small jumps; this little piece is subtitled “creativity, backpropagation and evolutionary computing made simple.” It also addresses everyday human experience as well as the kind of intelligence we see in economic and political systems.
In this view, capabilities like expectation, memory, emotional value assessment and derivative calculation are all essential subsystems of the overall intelligent control architecture. Evolution also tries to provide the best possible starting point for the learning process, but this does not imply that the organism could not have learned it all on its own, with enough time and enough motivation.
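To make the “intelligent controller” framing concrete in code, here is a minimal sketch of that loop – actions computed as a function of sensor inputs, adapted so that a user-supplied utility function accumulates as high a total as possible. The environment, parameterization and learning rule here are toy illustrations of my own, not any of the specific designs discussed below.

    # Minimal sketch of the "intelligent controller" loop described above.
    # All names are illustrative; the real designs (ADP, critics, etc.)
    # are described in the papers listed further down this page.
    import numpy as np

    rng = np.random.default_rng(0)

    def utility(state):
        """User-supplied performance measure: here, staying near the origin."""
        return -float(state ** 2)

    def run_episode(weights, steps=50):
        """Controller: action = f(sensor input, parameters); returns total utility."""
        state, total = 1.0, 0.0
        for _ in range(steps):
            sensor = state                                       # what the system observes
            action = weights[0] * sensor + weights[1]
            state = 0.9 * state + action + 0.05 * rng.normal()   # toy environment
            total += utility(state)
        return total

    # Crude learning rule: random hill-climbing on cumulative utility,
    # a stand-in for the gradient-based ADP methods discussed below.
    weights = np.zeros(2)
    best = run_episode(weights)
    for _ in range(200):
        candidate = weights + 0.1 * rng.normal(size=2)
        score = run_episode(candidate)
        if score > best:
            weights, best = candidate, score

    print("learned weights:", weights, "cumulative utility:", best)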
Some people believe that effects from quantum mechanics are essential to understanding even the lowest levels of consciousness or intelligence. Others believe that “quantum computing” effects cannot possibly be useful at all, in any kind of intelligence. In my view, the truth is somewhere between these two extremes. Even the human brain probably does not use “quantum computing” effects, but we could build computers at a higher level of consciousness which do. If you are interested in that aspect, see comments on quantum mind.
1. Intelligence Up to the “Mouse
Level”
New in September 2007: slides on how to get more accurate prediction (slides only, no text, 300K), a keynote talk for the 2007 international competition in forecasting, which plans to post the audio recording of the talk. (Accurate) “cognitive prediction” is one of the two most important streams of research leading us towards brain-like intelligence, as in the recent NSF funding announcement on COPN.
The bulk of my own work in the neural network field is aimed at replicating this “basic mammal level” of intelligence. Thus I will first list and describe some papers aimed at that level:
1. A general tutorial on neural networks presented at the IEEE Industrial Electronics Conference (IECON) 2005, slightly updated from the version I gave at the International Joint Conference on Neural Networks (IJCNN) 2005. (4 megabyte file).
(Here is a shortened 1 meg, 60-slide pdf talk to be given in
2. A general
current overview of Adaptive or Approximate Dynamic Programming (ADP), the lead
chapter of the Handbook of Learning and Approximate Dynamic Programming, IEEE
Press, 2004. (Si, Barto, Powell and Wunsch eds.) For more complete information
on ADP, and for important background papers leading up to that book, see www.eas.asu.edu/~nsfadp. The idea of
building “reinforcement learning systems” by use of ADP was first
proposed in my
paper “The elements of
intelligence,” Cybernetica (
3. A review
of how to calculate and use ordered derivatives when building complex
intelligent systems. I first proved
the chain rule for ordered derivatives back in 1974, as part of my Harvard PhD
thesis. The concept propagated from there in many directions, sometimes called
“backpropagation,” sometimes called the “reverse method or
second adjoint method for automatic differentiation,” and sometimes used
in “adjoint circuits.” But it is a general and powerful principle,
far more powerful than is understood by those who have only encountered
second-hand or popularized versions. The review here was published in Martin Bücker, George Corliss, Paul Hovland, Uwe Naumann & Boyana Norris (eds), Automatic Differentiation: Applications, Theory and Implementations, Springer (LNCS),
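For readers who have only met the popularized versions, here is a tiny illustration of what the chain rule for ordered derivatives does: run the computation forward as an ordered sequence of steps, then accumulate the derivative of the final result with respect to every earlier quantity by sweeping backwards through those steps. The particular function is an arbitrary toy example of my own; the finite-difference check at the end is just a sanity test.

    # Illustrative sketch: ordered derivatives via reverse accumulation.
    # Forward pass: an ordered sequence of elementary steps.
    # Backward pass: accumulate dF/d(each earlier variable) in reverse order.
    import math

    def forward(x1, x2):
        z1 = x1 * x2          # step 1
        z2 = math.sin(z1)     # step 2
        z3 = z2 + x1          # step 3 (final result F)
        return z1, z2, z3

    def ordered_derivatives(x1, x2):
        z1, z2, z3 = forward(x1, x2)
        # Start from dF/dF = 1 and work backwards through the ordered steps.
        dF_dz3 = 1.0
        dF_dz2 = dF_dz3 * 1.0                  # z3 = z2 + x1
        dF_dz1 = dF_dz2 * math.cos(z1)         # z2 = sin(z1)
        dF_dx1 = dF_dz3 * 1.0 + dF_dz1 * x2    # x1 feeds both step 1 and step 3
        dF_dx2 = dF_dz1 * x1
        return dF_dx1, dF_dx2

    # Check against a finite-difference approximation.
    x1, x2, eps = 0.7, 1.3, 1e-6
    g1, g2 = ordered_derivatives(x1, x2)
    num1 = (forward(x1 + eps, x2)[2] - forward(x1, x2)[2]) / eps
    print(g1, num1)   # the two numbers should agree to several decimal places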
4. P. Werbos, Backpropagation through time: what it does and how to do it. (1 megabyte). Proc. IEEE, Vol. 78, No. 10, October 1990. A slightly updated version appears in my book The Roots of Backpropagation: From Ordered Derivatives to Neural Networks and Political Forecasting, Wiley 1994. Among the most impressive real-world applications of neural networks to date is the work at Ford Research, which has developed a bullet-proof package for using backpropagation through time to train time-lagged recurrent networks (TLRNs). In one simulation study, they showed how TLRNs trained with their package can estimate unobserved state variables with higher accuracy than extended Kalman filters, and can do as well as particle filters at much less computational cost:
Lee A. Feldkamp and Danil V. Prokhorov, Recurrent neural networks for state estimation, in Proc. of the Workshop on Adaptive and Learning Systems,
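The Ford package itself is not reproduced here, but the bare mechanics of backpropagation through time can be sketched in a few lines: unroll a small recurrent network over the time steps of a sequence, sweep the ordered derivatives backwards through the unrolled computation, and use the resulting exact gradient to adjust the recurrent weights. The network size, data and learning rate below are toy choices for illustration only.

    # Minimal illustration of backpropagation through time (BPTT) for a tiny
    # recurrent network with a single hidden unit; not the Ford/TLRN package,
    # just the bare mechanics of unrolling and sweeping derivatives backwards.
    import numpy as np

    rng = np.random.default_rng(0)
    T = 30
    inputs = rng.normal(size=T)
    targets = np.array([0.5 * np.sin(3 * inputs[t]) for t in range(T)])  # toy target

    w_in, w_rec, w_out = 0.1, 0.1, 0.1   # parameters to be trained
    lr = 0.05

    for epoch in range(300):
        # ---- forward pass: unroll the recurrence over all T time steps ----
        h = np.zeros(T + 1)          # hidden state, h[0] = 0
        y = np.zeros(T)              # network outputs
        for t in range(T):
            h[t + 1] = np.tanh(w_in * inputs[t] + w_rec * h[t])
            y[t] = w_out * h[t + 1]

        # ---- backward pass: ordered derivatives of the total squared error ----
        g_in = g_rec = g_out = 0.0
        dE_dh_next = 0.0             # derivative flowing back from later time steps
        for t in reversed(range(T)):
            dE_dy = 2.0 * (y[t] - targets[t])
            g_out += dE_dy * h[t + 1]
            dE_dh = dE_dy * w_out + dE_dh_next
            dE_dpre = dE_dh * (1.0 - h[t + 1] ** 2)   # through the tanh
            g_in += dE_dpre * inputs[t]
            g_rec += dE_dpre * h[t]
            dE_dh_next = dE_dpre * w_rec              # pass derivative to step t-1

        w_in -= lr * g_in
        w_rec -= lr * g_rec
        w_out -= lr * g_out

    print("final squared error:", float(np.sum((y - targets) ** 2)))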
5. Chapters 3, 10 and 13 of the Handbook of Intelligent Control, White and Sofge eds, Van Nostrand, 1992. Those chapters provide essential mathematical details and concepts which even now cannot be found elsewhere. (1-2 megabyte files).
Chapter 3 provides a kind of general introduction to ADP, and to how to integrate the two major kinds of subsystem it requires – a trained model of the environment (the real focus of chapter 10), and the “Critic” and “Action” components (chapter 13) which provide what some call “values,” “emotions,” “shadow prices,” “cathexis” or a “control Liapunov function”, and a “controller” or “motor system” or “policy” or “stimulus-response system.” The mathematics itself is universal, and essentially provides one way to unify all these disparate-sounding concepts. There are some advocates of reinforcement learning who like simple designs which require no model of the environment at all; however, the ability to adapt to complex environments does require some use of a trained model or “expectation system,” and half the experiments in animal learning basically elaborate on how the expectation systems in animal brains actually behave. ADP design does strive to be highly robust with respect to the model of its environment, but it cannot escape the need to have such a model.
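As a purely structural sketch – a simplification of my own, not the designs worked out in the Handbook itself – here is how those subsystems fit together: a trained model predicts the next state, a Critic scores states (the “values” or “shadow prices”), and the Action component is adapted so that immediate utility plus the Critic’s evaluation of the predicted next state comes out as large as possible.

    # Structural sketch (a simplification, not the Handbook designs themselves):
    # the three trained components of an ADP controller and how they connect.
    import numpy as np

    def model(state, action, w_model):
        """Trained model of the environment: predicts the next state."""
        return np.tanh(w_model[0] * state + w_model[1] * action)

    def critic(state, w_critic):
        """Critic: long-term value ("emotion", "shadow price") of a state."""
        return w_critic[0] * state + w_critic[1] * state ** 2

    def action(state, w_action):
        """Action / policy / "motor system": maps the observed state to an action."""
        return np.tanh(w_action[0] * state + w_action[1])

    def utility(state, act):
        """Immediate performance measure supplied by the designer."""
        return -(state ** 2) - 0.1 * (act ** 2)

    def evaluate_action(state, w_model, w_critic, w_action, gamma=0.9):
        """How a candidate action is scored: immediate utility plus the Critic's
        discounted evaluation of the state the model predicts will follow.
        The Action component is adapted so that this score increases."""
        a = action(state, w_action)
        next_state = model(state, a, w_model)
        return utility(state, a) + gamma * critic(next_state, w_critic)

    # Example evaluation with arbitrary parameter values.
    print(evaluate_action(0.5, w_model=[0.8, 0.5], w_critic=[0.0, -1.0],
                          w_action=[-0.5, 0.0]))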
Chapter 3 also mentions
a design idea which I call “syncretism,” which I have written about
in many obscure venues, starting from my paper “Advanced forecasting for global crisis
warning and models of intelligence,” General
Systems Yearbook, 1977 issue. (See also my list of supplemental papers.)
I deeply regret that I have not had time to work more on this idea, because it
plays a central role in overcoming certain all-pervasive dilemmas in
intelligent systems and in statistics, and it is essential to understanding
certain aspects of human experience as described by Freud. In essence –
even today, there is a division between those who try to make predictions based
on some kind of trained or estimated general model of their environment, and
those who try to make predictions based on the similarity of present cases to
past examples. Examples of the former are econometric or time-series models or Time-Lagged Recurrent Networks (TLRNs), or hybrid S/TLRN, the most general neural form of that class of model (see chapter 10). Examples of the latter are heteroassociative memory systems, Kohonen’s prototype systems, and most of the modern “kernel” methods.
But in actuality, both
extreme cases have many crucial limitations. To overcome these limitations,
when learning a simple static input-output relationship, one needs to combine
the two in a principled way. That is what syncretism is about. One maintains a
global model, which is continually updated based on learning from current
experience and on learning from memory.
(Chris Atkeson has at least implemented a part of
this, and shown how useful it is in robotics.) But one also monitors how well
the global model has explained or fitted each important memory so far. When one
encounters a situation similar to an unexplained or undigested memory,
one’s expectation is a kind of weighted combination of what the global
model and the past example would predict. In the end, we move towards a mathematical
version of Freud’s image of the interplay between the “ego”
and the “id,” between a person’s integrated global
understanding and the (undigested) memories which perturb it. However, in this
model, global understanding may be perturbed just as easily by undigested
euphoric experiences (outcomes more pleasant than expected) as by traumatic
experiences. Again, see the UNU
paper for citations to papers which discuss the neuroscience and psychology
correlates in more detail.
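The cited papers give the real treatment of syncretism; purely as an illustration of the weighted-combination idea described above, here is one toy way a prediction could blend a global model with a similar, still-undigested memory. The similarity kernel and weighting formula are illustrative choices of my own, not the designs in those papers.

    # Bare illustration of the weighted-combination idea behind "syncretism":
    # blend a global model's prediction with the outcome stored in a similar,
    # still-unexplained memory.
    import numpy as np

    def global_model(x, w):
        """Continually updated global model of the input-output relationship."""
        return w[0] + w[1] * x

    # Memory of past cases: (input, observed outcome, how badly the global
    # model explained that case when it was last checked).
    memories = [
        (0.2, 0.35, 0.02),   # well explained: small residual error
        (1.5, 2.80, 0.90),   # "undigested": the model badly mispredicted this one
    ]

    def syncretic_prediction(x, w, length_scale=0.5):
        base = global_model(x, w)
        prediction = base
        for mx, my, unexplained in memories:
            similarity = np.exp(-((x - mx) ** 2) / (2 * length_scale ** 2))
            # The more similar the situation and the less digested the memory,
            # the more the stored outcome perturbs the global model's prediction.
            alpha = similarity * unexplained / (1.0 + similarity * unexplained)
            prediction = (1 - alpha) * prediction + alpha * my
        return base, prediction

    w = [0.1, 1.0]
    print(syncretic_prediction(0.2, w))   # near a well-explained memory: ~unchanged
    print(syncretic_prediction(1.4, w))   # near the undigested memory: pulled toward it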
Chapter 10 mainly aims
at the issue of learning to understand the dynamics of one’s
environment. For example, it addresses how to reconcile traditional
probabilistic notions of how to learn time-series dynamics, versus “robust” approaches which have recently become known as “empirical risk approaches” a la Vapnik. Chapter 10 shows
how a pure “robust” method of his type – first formulated and
applied in my 1974 PhD thesis – substantially outperforms the usual least
squares methods in predicting simulated and actual chemical plants. It also
explains how the “pure
robust” method is substantially different from both the
“parallel” and “series” system identification methods
used in adaptive control, even though it seems similar at first glance. I
referred to that chapter, and to the example of ridge regression, when –
in visiting Johann Suykens years ago – I suggested to him that one could
substantially improve the well-known SVM methods by accounting for these more
general principles. Suykens’ modified version of SVM is now part of the
best state of the art in data mining – but it is only a first step in
this direction. Today’s conventional wisdom in data mining has not yet
faced up to key issues regarding causality and unobserved variables which
world-class econometrics already knew about decades ago; chapter 10 cites and
builds upon that knowledge. Chapter 10 also gives details about how to apply
the chain rule for ordered derivatives in a variety of ways.
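Chapter 10 itself must be consulted for the “pure robust” method; purely as a generic illustration of the distinction drawn above between least-squares fitting and Vapnik-style empirical-risk criteria, the toy comparison below shows how one gross outlier pulls a least-squares slope estimate much further than an epsilon-insensitive criterion does. The data and epsilon value are illustrative choices of my own.

    # Generic illustration (not the chapter-10 "pure robust" method itself) of the
    # difference between a least-squares criterion and a Vapnik-style
    # epsilon-insensitive empirical-risk criterion, on data with one gross outlier.
    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0, 1, 20)
    y = 2.0 * x + 0.05 * rng.normal(size=20)
    y[10] += 5.0                      # one badly corrupted observation

    def least_squares_loss(slope):
        r = y - slope * x
        return np.sum(r ** 2)

    def eps_insensitive_loss(slope, eps=0.1):
        r = np.abs(y - slope * x)
        return np.sum(np.maximum(r - eps, 0.0))

    slopes = np.linspace(0.0, 4.0, 401)
    ls_best = slopes[np.argmin([least_squares_loss(s) for s in slopes])]
    ri_best = slopes[np.argmin([eps_insensitive_loss(s) for s in slopes])]
    print("least-squares slope:", ls_best)        # pulled noticeably by the outlier
    print("epsilon-insensitive slope:", ri_best)  # stays close to the true slope of 2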
Chapter 13 mainly addresses ways to adapt Critic
and Action systems. It shows how to adapt any kind of parameterized
differentiable Critic or Action system; it uses a notation which allows one to
plug in a neural network, an adaptable fuzzy logic system, an econometric
system, a soft gain-scheduling system, or anything else of that sort. It describes a variety of designs in
some detail, including Dual Heuristic Programming (DHP), which has so far
outperformed all the other reinforcement learning systems in controlling
systems which involve continuous variables. I first proposed DHP in my 1977
paper (cited above), but chapter 13 contains the first real consistency proof
– and it contains the specific details needed to meet the terms of that
proof.
Chapter 13 also shows how the idea of
a Critic network can be turned on its head, to provide a real-time
time-forwards way to train the time-lagged recurrent networks described in
chapter 10. It also describes the Stochastic Encoder/Decoder/Predictor design,
which addresses the full general challenge of adaptive nonlinear stochastic
time-series modeling.
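Chapter 13 should be consulted for DHP and its consistency conditions; as a much simpler sketch of what “adapting Critic and Action systems” means in practice, here is an HDP-style update pair on a toy scalar problem: the Critic is nudged toward immediate utility plus its own discounted evaluation of the next state, and the Action parameters are nudged uphill on that same quantity, differentiating through the model and the Critic. All parameterizations and learning rates below are illustrative choices of my own.

    # A much simplified HDP-style sketch of adapting a Critic and an Action
    # system on a toy scalar control problem; NOT the DHP design or the
    # consistency proof described in chapter 13.
    import numpy as np

    rng = np.random.default_rng(0)
    gamma, lr_c, lr_a = 0.9, 0.01, 0.005

    def step(s, a):
        return 0.9 * s + a                      # toy (known) model of the environment

    def utility(s, a):
        return -(s ** 2 + 0.1 * a ** 2)         # performance measure to be maximized

    w_c = 0.0    # Critic: J_hat(s) = w_c * s^2
    w_a = 0.0    # Action: a(s)     = w_a * s

    for episode in range(2000):
        s = rng.uniform(-2, 2)                  # re-excite the state each episode
        for t in range(20):
            a = w_a * s + 0.05 * rng.normal()   # small exploration noise
            s_next = step(s, a)

            # Critic adaptation: move J_hat(s) toward U + gamma * J_hat(s_next).
            target = utility(s, a) + gamma * w_c * s_next ** 2
            td_error = w_c * s ** 2 - target
            w_c -= lr_c * td_error * s ** 2

            # Action adaptation: nudge the policy uphill on U + gamma * J_hat(s_next),
            # differentiating through the model and the Critic.
            dQ_da = -0.2 * a + gamma * (2 * w_c * s_next)
            w_a += lr_a * dQ_da * s

            s = s_next

    print("Critic weight w_c:", round(w_c, 3), " Action gain w_a:", round(w_a, 3))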
6. Finally, for a more complete
history and a more robust extension of these methods, see my 1998 paper on the mathematical relations between ADP and modern
control theory (optimal, robust and adaptive). A brief summary of the important
conclusions of that paper may be found in my supplemental papers.
But again, all of this addresses the
“subsymbolic” kind of intelligence that can be found in the brain of the smallest mouse. Thus I claim that even the smallest mouse experiences the
interplay between traumatic experiences and understanding that Freud talked
about. Even the smallest mouse experiences the flow of “cathexis”
or emotional value signals that Freud talked about. These provide the
foundation which higher intelligence is built upon. It builds upon these
systems, and does not replace or transcend them.
If you look closely at this work, you
will see that by 1992 I was beginning to question the relatively simple model
of intelligence as an emergent phenomenon which I first sketched out in “Building and understanding
adaptive systems: A statistical/numerical approach to factory automation and
brain research,” IEEE Trans. SMC,
Jan./Feb. 1987. In 1997, I sketched out a kind of alternative, more complex
model which was essentially a unification of my previous model with some
of the ideas in a classic paper by Jim Albus. The goal was to attach the minimum
possible a priori structure, while still coping with the prodigious
complexity in time and space that organisms must deal with. But I now begin to
see a third alternative, intermediate between the two.
A clue to this new alternative came in the plenary
talk by Petrides at IJCNN in 2005. A crude simplification of his talk:
“We have studied very carefully the real functioning of the very highest
areas of the frontal lobes. The two most advanced areas appear to be designed
to answer the two most fundamental questions in human life: (1) where did I
leave my car this time in the parking lot; and (2) what was it I was
trying to do anyway?” The newly discovered “loops” between
the cerebral cortex, the basal ganglia and the thalamus do not represent
hierarchical levels, as was once thought; rather, they represent alternative
qualitative types of window into the system (alternative types of output with
corresponding error measures). The appearance of a hierarchy of tasks within
tasks, or time levels within time levels, is actually generated by a system
that looks more like a kind of generalized “subroutine calling”
procedure, in which “hierarchy” is an emergent result of recursive
function calls. Again – I would have written more about this by now, if
we were not living in a world whose very survival is at stake or if I had no
way of improving the global probability of survival.
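As a toy illustration of that last point – nothing in the sketch below stores an explicit hierarchy, and the goal names are purely invented for illustration, yet the trace of recursive “subroutine calls” reads like a hierarchy of tasks within tasks:

    # Toy illustration: a flat table plus recursive calls, with apparent
    # hierarchy emerging only in the trace of the calls themselves.
    def pursue(goal, subgoals, depth=0):
        print("  " * depth + "working on: " + goal)
        for sub in subgoals.get(goal, []):
            pursue(sub, subgoals, depth + 1)       # a recursive "subroutine call"
        print("  " * depth + "done: " + goal)

    # A flat table of which goals call which subgoals (illustrative content).
    subgoals = {
        "get home": ["recall where the car is", "drive home"],
        "recall where the car is": ["query working memory"],
        "drive home": ["leave the lot", "follow the route"],
    }
    pursue("get home", subgoals)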
2. Beyond the “Mouse Level”
We do not have functioning designs or
true mathematical models of the higher levels of intelligence, but I claim that
we can develop a much better preliminary or qualitative understanding of what we
can build at those higher levels by fully accounting for what we have learned
at the subsymbolic level.
Many of my thoughts on these lines are
somewhat premature, scientifically, and I find it hard to take time to write up
details which people are simply not yet ready for. For now I include just three
papers here:
“third
person” viewpoint described in Kuhn’s famous book on the philosophy
and history of science.
firmly
scientific in spirit (as in Francis Bacon’s historic efforts), it does
represent a “first person viewpoint.” There are many
philosophers,
like Chalmers and the existentialists, who stress the fact that we all
ultimately start from a first-person viewpoint.
In my homepage, I mentioned my view that a rational person should never feel obliged to “choose a theory” and be committed to it forever. Rather, we should entertain a menu of vivid, clear, different theories; we should constantly strive to improve all of them, and we should constantly be ready to adapt the level of probability we assign to them. This follows the concept of rationality developed by John Von Neumann, which is also the foundation for “decision analysis” as promulgated by Howard Raiffa.
Yet when we observe human behavior all around us, we see people who “choose” between possible theories like a vain person in a clothing store – trying them on, preening, looking at themselves in the mirror, and then buying one. (The best dressed of all dress up like Cardinals.) They then feel obliged to fight to the death for a particular theory as if it were part of their body, regardless of evidence or of objective, good judgment. Is this really just one of many examples proving that human brains (let alone mouse brains) totally lack any kind of tendency at all towards rationality or intelligence as I have described it? Does it totally invalidate this class of model of the mind?
Not really – and I was fully aware of such behavior when I first started to develop this kind of theory. What it shows is that humans are halfway between two classical views of the human mind. In one view (espoused by B.F. Skinner and one side of the philosopher Wittgenstein), humans play “word games” without any natural tendency to treat words and symbols as if they had any meaning at all; words and theories are truly treated on an equal footing with other objects seen in the world, like pants and dresses. In the opposite view (espoused by the “other” Wittgenstein!), humans are born with a kind of natural tendency to do “symbolic reasoning,” in which words have meanings and the meanings are always respected; however, because modern artificial intelligence often treats “symbolic reasoning” as if symbols were devoid of meaning, it is now more precise to call this “semiotic intelligence.” (There are major schools of semiotics within artificial intelligence, and even Marvin Minsky has given talks about how to fill in the gap involving “meaning.”)
My claim is that the first is a good way of understanding mouse-level intelligence. But human intelligence is a kind of halfway point between the two. Humanity is a kind of early prototype species, like the other prototype species which have occurred in the early stages of a “quantum leap” in evolution, as described by the great scientist George Gaylord Simpson, who originated many of the ideas now attributed to Stephen Jay Gould. (Though Gould did, of course, have important new ideas as well.) As an early prototype, it has enough of the new capabilities to “conquer the world” as a single species, but not enough to really perfect these capabilities in a mature or stable way.
Unfortunately, the power of our new technology – nuclear technology, especially, but others as well – is so great that the continued survival of this species may require a higher level of intelligence than what we are born with. Only by learning to emulate semiotic intelligence (or even something higher) do we have much of a chance of survival. The “semiotic” level of intelligence has a close relation to Freud’s notion of “sanity.” Unfortunately, Freud sometimes uses the word “ego” to represent global understanding, sometimes to represent the symbolic level of human intelligence, and sometimes in other ways; however, the deep and empirically-rooted insights there are well worth trying to disentangle.
We do not yet know exactly what a fully evolved “semiotic intelligence” or sapient would really look like. Some things have to be learned, because of their complexity. (For example, probability theory has to be learned, before the “symbolic” level of our mind can keep up with the subsymbolic level, in paying attention to the uncertainties in our life.) Sometimes the best that evolution can do is to create a strong predisposition and ability to learn something. But certainly we humans have a lot to learn, in order to cope more effectively with all of the megachallenges listed on my homepage.
Finally, I should summarize my own personal view of the levels of intelligence, stretching from the “mouse level” up to the highest that I can usefully imagine. First, between the mouse and the human there are actually some sublevels, as I discussed in some of these papers; there are evolutionary stages like the first development of “mirror neurons,” learning from vicarious experience, transmission of experience through trance-like dances, languages which function as “word movies,” and so on. Above the level of today’s human is the sapient level. Somewhere above that is when the sapient level gets coupled with the full power of quantum computing effects. Much higher than that is the best that one could do by building a truly integrated “multiagent system” made up of components at the quantum sapient level, with some sort of matrix to hold them together; I think of this as “multimodular intelligence,” and I feel that it would be radically different in many respects from today’s relatively feeble multiagent systems and from the conflict-prone or tightly regulated social organizations we see all around us in everyday life. Still, it would have to rely heavily on the same universal principles of feedback, learning and so on. But as one goes up the ladder, one does begin to get further away from what we really know as yet…