CDC02-INV1801
Neural Networks for Control:
Research Opportunities and Recent Developments
Paul J. Werbos
National Science Foundation*, Room 675
Arlington, VA 22203
Abstract
Artificial neural networks offer both a challenge to control theory and some ways to help meet that challenge. We need new efforts/proposals from control theorists and others to make progress towards the key long-term challenge: to design generic families of intelligent controllers such that one system (like the mammal brain) has a general-purpose ability to adapt to a wide variety of large nonlinear stochastic environments, and learn a strategy of action to maximize some measure of utility across time. New results in nonlinear function approximation and approximate dynamic programming put this goal in sight, but many parallel efforts will be needed to get there. Concepts from optimal control, adaptive control and robust control need to be unified more effectively (as has been done in some of the recent work on stability).
The Challenge to Researchers: Context and Motivation
From the view of an NSF Program Director*, neural network control is, first and foremost, a crucial challenge to the research community. The ECS Division has long been seeking more proposals – especially cross-disciplinary proposals, well grounded in control theory – which can rise more effectively to this challenge.
Somehow or other, we know that the smallest mammal brain achieves a high degree of competence in learning to perform very complex, novel tasks in a highly nonlinear environment fraught with all kinds of uncertainties. It does so in a general-purpose way, without the use of formal symbolic logic (except in one or two species, in some situations). The effort to understand how this could be possible is one of the key challenges to basic mathematical science in this century.
Neuroscience is unlikely to answer these questions without some sort of cross-disciplinary collaboration. A well-known neuroscientist once stated: “I have asked myself what would have happened if we had used our methods to try to understand how a radio works. First we would pull out a capacitor, watch the radio whine, and publish a paper announcing the discovery of ‘the whine center.’ Then, on a new grant, we would buy a new radio, pull out a resistor, and announce ‘the buzz center.’ A thousand radios later, we would have a complete map of the functional centers of the radio...” Some of the more modern methods may be more like doing a spectral analysis of the radiation emitted by a CPU while the PC is in various states, like boot-up, idle, word-processing and so on.
Many neuroscientists have reached out to the system dynamics community or the physics community, in search of ideas to guide the development of mathematical models. There is growing interest in “complex adaptive systems,” not only in biology, but in engineering areas of growing national attention, such as management of critical infrastructures and the “system of systems” in the wake of 9/11.
In the end, any effort to reverse-engineer or understand the higher capabilities of the brain in serious mathematical terms requires some specification of what kinds of capabilities we are looking for. We need an operational definition of what we are aiming at. With an appropriate definition, it should be obvious to many people in the control community both that the problem is very challenging, and that the CDC community has a critical role to play in meeting the challenge. The next subsection will propose such an operational definition.
In general, the electrical engineering community faces major challenges in attracting and retaining the best graduate students, who (like Congress) need to be convinced that new work in this area can be exciting and of fundamental importance, both to scientific understanding and to the emerging needs of humanity. Facing up to these challenges will be important to the health of the profession.
A Specific Challenge and Associated Issues
There are many debates [1,2] about the exact nature of higher intelligence in the mammal brain. Those debates clearly go well beyond engineering. However, consider the following concept or challenge to engineering: to develop a family of intelligent control designs, such that any member of the family has a general-purpose ability to learn the “optimal” strategy of action in any “well-behaved” complex nonlinear stochastic environment, when the system is given only three specific pieces of information: (1) a vector of observations y(t) at each sampling time t; (2) a vector of controls u(t), which it decides on itself; and (3) a utility function U(y), whose expected value over future time the system tries to maximize. The challenge is to achieve all this in a design which fits at least the major gross hardware constraint we know the brain does meet: real-time operation implemented in a highly parallel distributed “computer” made up of billions of relatively simple, modular processing elements (“neurons”).
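For concreteness, here is a minimal sketch, in Python, of the interface this challenge implies. The class and method names are my own illustration, not from the paper; the only inputs the learner sees are the three pieces of information named above.

```python
# A minimal sketch of the interface implied by the challenge. All names
# (IntelligentController, act, learn) are illustrative assumptions;
# only y(t), u(t) and U(y) come from the text.
from typing import Callable
import numpy as np

class IntelligentController:
    """General-purpose learning controller: given only U(y), it must
    learn, online, a strategy u(t) that maximizes expected future U."""

    def __init__(self, utility: Callable[[np.ndarray], float]):
        self.U = utility  # (3) the utility function U(y)

    def act(self, y: np.ndarray) -> np.ndarray:
        # (1) receive the observation vector y(t);
        # (2) return the control vector u(t), decided by the system itself.
        raise NotImplementedError("supplied by a concrete learning design")

    def learn(self, y: np.ndarray, u: np.ndarray, y_next: np.ndarray) -> None:
        # Adapt the internal strategy (and model) from experience alone,
        # with no domain information beyond U(y).
        raise NotImplementedError
```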
Even though the brain is not an exact utility maximizer, it is clear that this formulation captures much of what we see in the higher levels of intelligence in the mammal brain [1,2]. The brain has capabilities which are hard to believe from the perspective of today’s technology, yet it proves that capabilities of this sort do exist.
Before discussing the strategy of how to reach this goal, we need to examine the goal itself in more detail.
First, as a general matter, when I try to evaluate the potential impact of an effort in this area, I ask myself: “What difference would this particular work make to the expected delay time between now and the time when we really meet the full challenge?” The best effort will usually not be an effort aimed at reaching the final goal in one easy step. That is impossible. There are many possible parallel efforts, each representing just one big step beyond the present state of the art and providing pieces of what we will need to achieve the ultimate goal. Much of what we really need today is new general-purpose mathematics, applicable to nonlinear systems in general – including artificial neural networks (ANNs) as a special case, but not limited to them. Much of the best work in ANNs has actually been using neural networks as a context for developing that kind of more general mathematics. Indeed, backpropagation itself – the most widely used algorithm in the ANN field – is actually a more general mathematical algorithm [3].
Second, we need to think about the role of prior information and domain-dependent knowledge. There have been many extreme, polarized debates in the past between people who believe in learning or data-driven approaches and those who believe in genetically determined ideas or prior knowledge. Both in engineering and in neuroscience, the extreme positions are untenable, in my view. In the most challenging applications, the ideal strategy may be to look for a learning system as powerful as possible – a system able to converge to the optimal strategy without any prior knowledge at all – and then initialize that system to an initial strategy and model as close as possible to the most extensive prior knowledge we can find. (See also [4, foreword].) Some research is needed to get the best possible results in learning “without cheating.” In each application domain, research is also needed to find out how to “cheat” most effectively. The first kind of research is most important to fundamental scientific progress, but the second is also needed as part of the effort to deliver products of importance to the needs of society. In biology, many people have argued that cells that perform edge detection, for example, appear very early in the life of an organism; yet researchers have shown that cells in the lateral part of the brain can learn to take over as edge detectors, after damage to the usual visual areas. Powerful learning and prior information are both needed. But for higher intelligence, we are looking more for the ability to learn and adapt in a general-purpose way.
Third, we cannot expect the brain or any other physical device to guarantee an exactly optimal strategy of action in the general case. That is too hard for any physically realizable system. We will probably never be able to build a device to play a perfect game of chess or a perfect game of Go; in computer science terms, those problems are all “NP-hard.” But in engineering and in biology, we do not need or ask for absolutely perfect solutions. We look for the best possible approximations, trying to be as exact as we can, but not giving up on the true nonlinear problems of real interest.
Fourth, the notion of “well-behaved” is extremely subtle, and itself points towards one of the parallel strands of research that needs to be taken further. Decades ago, statisticians realized that it is impossible to learn very much from streams of time-series data if there are billions of variables and one imposes the usual “flat priors” of maximum-likelihood statistics. Even simple ANNs are possible only because there are some implicit notions of “Occam’s Razor” priors which allow inference, both in brains and in ANNs. Almost all theorems about nonlinear function approximators make similar implicit assumptions about the “smoothness” of the function to be approximated; there are some control applications where the usual notions about smoothness break down, and the usual nonlinear function approximators perform very badly, compared to others that are less well-known. Issues of this kind need to be explored further [3; 4, chapter 10].
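To illustrate the point about priors with a toy example of my own (not from the paper): under a flat prior, a plain least-squares fit using as many basis functions as data points simply interpolates the noise, while even a crude Occam-style smoothness penalty (weight decay, i.e. ridge regularization) restores the ability to generalize.

```python
# Toy illustration of an "Occam's Razor" prior. The data, basis and
# penalty strength are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 12)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)   # noisy samples

Phi = np.vander(x, 12)   # 12 samples, 12 polynomial basis functions

# Flat prior: exact interpolation, so the fit follows the noise.
w_flat = np.linalg.lstsq(Phi, y, rcond=None)[0]

# Occam prior: penalize large weights (ridge / weight decay).
lam = 1e-2
w_occam = np.linalg.solve(Phi.T @ Phi + lam * np.eye(12), Phi.T @ y)

x_test = np.linspace(-1, 1, 200)
Phi_t = np.vander(x_test, 12)
err_flat = np.mean((Phi_t @ w_flat - np.sin(3 * x_test)) ** 2)
err_occam = np.mean((Phi_t @ w_occam - np.sin(3 * x_test)) ** 2)
print(f"test MSE, flat prior: {err_flat:.3f}; with weight decay: {err_occam:.3f}")
```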
Fifth, the issues of stability and safety are subsumed, in this formulation, into the choice of utility function U. When the world is modeled, mathematically, as the truly uncertain place it really is, we can never give a 100% guarantee that bad events are absolutely impossible. The brain evolved to minimize the probability of sudden death in an environment where an absolute guarantee cannot be achieved. Many practical users of control systems would rather be certain that the probability of accidents is minimized, in a full-up stochastic simulation of the real world, than have iron-clad guarantees that accidents could never happen if only the world were simple and linear. Stability theory will be an important tool in developing learning systems which can actually converge to strategies that minimize the probability of accidents, and it will be important to our ability to obtain and understand experience in using learning-based designs on complex real-world systems. But it is only one of several important strands of research relevant to the larger goal. At higher levels of systems design and management, the President’s Economic Adviser has recently urged engineers to place more weight on performance issues, and to address the tradeoffs between performance and safety in a more balanced way, grounded in modern risk analysis (i.e., in the maximization of total expected utility [5]).
Sixth, I would agree with the classical AI researchers who argue that the highest levels of intelligence seen in brains on earth are based on symbolic reasoning, not the subsymbolic intelligence I am talking about here. But 99% of the human brain is identical in its underlying wiring and learning abilities to the brain of the smallest mouse. Before science can truly understand how symbolic reasoning works in the human brain, it must first develop a deeper understanding of the remaining 99% of the brain. From a larger viewpoint, it is a good thing that many people do research on symbolic reasoning, even before scientific closure is possible on those issues; however, research aimed at subsymbolic intelligence is clearly on the critical path to developing a deeper understanding of such higher levels of intelligence.
Strategies, Tasks and Tools
Most CDC members will immediately see that the challenge above is a challenge in optimal control. It may seem, at first, that the challenge here is simply the old challenge of “solving the curse of dimensionality” in dynamic programming. But it is more than that. The challenge is also to learn the model of the environment and to solve the dynamic programming problem as accurately as possible, concurrently. (Some computer scientists advocate a purely model-free approach, without any learning of how to predict or even do state estimation; this does not scale well to large problems, and is not consistent with what we know about brains or animal learning [2,4,6,7,8].)
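For reference, the underlying recurrence is the Bellman equation of dynamic programming; the discounted stochastic form below is the one standard in the ADP literature (J is the value function, r an interest/discount rate, and angle brackets denote expectation; the notation is the standard one, not copied from this paper):

```latex
% Bellman equation, standard discounted stochastic ADP form;
% x(t) denotes the state of the environment (notation assumed).
J(\mathbf{x}(t)) \;=\; \max_{\mathbf{u}(t)}
  \left[\, U(\mathbf{x}(t),\mathbf{u}(t))
  \;+\; \frac{1}{1+r}\,\big\langle J(\mathbf{x}(t+1)) \big\rangle \,\right]
```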
It is also well known that the general nonlinear robust control problem is equivalent to the problem of solving a nonlinear “Hamilton-Jacobi-Bellman” equation as accurately as possible. If one allows off-line learning, then the challenge posed above is equivalent to the challenge of giving nonlinear robust control the tools it needs to address the general nonlinear case as accurately as possible. Many of the near-term opportunities to achieve practical results with neural network control do involve a clever use of off-line learning, in part because of verification and validation issues [9,10]. We may be entering a period where the difference between nonlinear robust control and neural network control becomes more semantic and emotional than real and mathematical.
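In continuous time, and in the game-theoretic setting usual in nonlinear robust control, the corresponding stationary equation is often written in a min-max (Hamilton-Jacobi-Isaacs) form; the notation below is the standard textbook one, not taken from this paper:

```latex
% Stationary Hamilton-Jacobi equation in min-max (Isaacs) form.
% Dynamics \dot{x} = f(x,u,w); u is the control, w the disturbance,
% \ell the running cost (all notation assumed, not from the paper).
0 \;=\; \min_{\mathbf{u}} \, \max_{\mathbf{w}}
  \left[\, \ell(\mathbf{x},\mathbf{u},\mathbf{w})
  \;+\; \nabla J(\mathbf{x})^{\top} f(\mathbf{x},\mathbf{u},\mathbf{w}) \,\right]
```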
Adaptive control is relevant because of the need for systems based on learning or adaptation. Curiously enough, it now seems that methods derived from neural network control may finally solve the old problem of universal stability in adaptive control for the linear MIMO case; however, even the preliminary theorems on those lines [7] make heavy use of quadratic stability concepts from the linear robust control world. Greater collaboration between experts in linear robust control and adaptive control may be necessary to grasp this new opportunity, close at hand as it is. Clearly this is one of several very important open research opportunities.
The greater challenge here clearly depends on our ability to bring together the capabilities of all three of these communities more effectively.
The most important breakthrough which makes this a viable direction for research is the development of the field which some people now call neuro-dynamic programming [11] or, more recently, Adaptive Dynamic Programming (ADP). ADP originated in three previously independent small strands of research, led by Bernard Widrow (“adaptive critics”), Andrew Barto (“reinforcement learning”) and myself (“approximate dynamic programming” and “reinforcement learning”), which came together in the first major workshop on neural networks for control, back in 1990 [12]. (See [7] for a review of the actual history, and for mathematical details of new adaptation methods important to strong stability.) The 1981 international conference paper which first described backpropagation in detail, as a method for adapting multilayer neural networks, also gave the general form of the method for arbitrary nonlinear systems, and described how to use it as part of a parallel distributed design for model-based ADP [15, chapter 7].
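As a concrete, if toy, illustration of the simplest adaptive-critic design in this family, the sketch below implements Heuristic Dynamic Programming (HDP): a critic J_hat(x) is trained toward the Bellman target U(x) + gamma*J_hat(x(t+1)), and the action is chosen by a one-step model-based lookahead. The one-dimensional plant, the quadratic critic parameterization and all constants are my own assumptions for illustration, not from the paper.

```python
# Toy sketch of Heuristic Dynamic Programming (HDP), the simplest
# adaptive-critic / ADP design. Plant, critic and constants are assumed.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95                      # discount factor, i.e. 1/(1+r)
w = np.zeros(2)                   # critic weights: J_hat(x) = w[0]*x^2 + w[1]

def features(x):
    return np.array([x * x, 1.0])

def J_hat(x):
    return float(w @ features(x))

def utility(x):
    return -x * x                 # maximizing U drives the state to zero

def model(x, u):
    return 0.9 * x + u            # known (or separately learned) plant model

def plant(x, u):
    return model(x, u) + 0.01 * rng.standard_normal()  # true stochastic plant

x = 1.0
for t in range(5000):
    # Model-based action: one-step lookahead over candidate controls.
    candidates = np.linspace(-0.5, 0.5, 11)
    u = max(candidates,
            key=lambda c: utility(model(x, c)) + gamma * J_hat(model(x, c)))
    x_next = plant(x, u)
    # HDP critic update: move J_hat(x) toward the Bellman target.
    target = utility(x) + gamma * J_hat(x_next)
    w += 0.01 * (target - J_hat(x)) * features(x)
    x = x_next

print(f"final state {x:.3f}, critic weights {w}")
```

In a full model-based ADP design the hand-coded model above would itself be learned online, and the action would come from an adapted action network rather than a search over candidates; the sketch keeps only the critic adaptation, which is the defining element of HDP.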
Since then, however, the various schools have drifted apart to some degree. Reinforcement learning methods have become amazingly popular in AI, where they are commonly regarded as “the answer” to higher-level decision-making and planning problems. Yet the simple model-free designs in general use do not really address continuous-variable problems, and have difficulty scaling up to large problems, as has been noted many times in engineering applications in the past [4,12,13]. Even their performance in game-playing applications has been somewhat overstated; the only researcher who has ever achieved human expert-level performance in a difficult strategic game, based on learning without heavy prior knowledge, actually used evolutionary computing to train the “Critic” network in his system [14].
Clearly, engineers have a critical role to play in developing designs which can scale up to larger problems, and which can address the issue of partially observed systems by combining learning-based system identification and ADP. This work has already begun, but considerably more remains to be done.
In the CDC talk, the author will mention some of the recent progress in model-based ADP work, and the further research challenges important to areas like the control of complex network systems such as electric power grids [6]. Particularly notable are the recent success of Wunsch, Harley and Venayagamoorthy in controlling a physical network of turbogenerators, maintaining robust operation in the face of disturbances much greater than the previous state of the art allowed; the success of Balakrishnan in benchmark evaluations on difficult missile-interception problems; the success of Ferrari and Stengel in improving performance over well-tuned classical methods in aircraft control; and the success of Lendaris’ group at Portland State in tasks ranging from simulated vehicle skid control through to logistics control – all using model-based ADP methods.
Major new results in stability have also been achieved, some by presenters in this session and some by the Hittle/Young/Anderson group (with application to improved energy efficiency in buildings), among others. See the references for mathematics, algorithms, and further citations.
References
1. K. H. Pribram, ed., Brain and Values, Erlbaum, 1998.
2. K. Yasue, M. Jibu & T. Della Senta, eds., No Matter, Never Mind: Proceedings of Toward a Science of Consciousness: Fundamental Approaches (Tokyo ’99), John Benjamins Pub. Co., 2002.
3. P. Werbos, Backpropagation: General principles and issues for biology. In M. Arbib, ed., Handbook of Brain Theory and Neural Networks, Second Edition, MIT Press, 2002.
4. D. White & D. Sofge, eds., Handbook of Intelligent Control, Van Nostrand, 1992.
5. H. Raiffa, Decision Analysis, Addison-Wesley, 1968.
6. wwwimacm.org
7. P. Werbos, Stable Adaptive Control Using New Critic Designs, xxx.lanl.gov/abs/adap-org/9810001.
8. P. Werbos, Neurocontrollers. In J. Webster, ed., Encyclopedia of Electrical and Electronics Engineering, Wiley, 1999.
9. J. S. Baras and N. S. Patel, Information state for robust control of set-valued discrete time systems, Proc. 34th Conf. on Decision and Control (CDC), IEEE, 1995, p. 2302.
10. M. Motter, ed., Special Session on Intelligent Flight Control, Proc. American Control Conference (ACC), 2001.
11. D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.
12. W. T. Miller, R. Sutton & P. Werbos, eds., Neural Networks for Control, MIT Press, 1990; now available in paperback.
13. S. Haykin, Neural Networks: A Comprehensive Foundation, Second Edition, Prentice-Hall, 1998.
14. K. Chellapilla & D. B. Fogel, Anaconda defeats Hoyle 6-0: a case study competing an evolved checkers program against commercially available software. In Proc. Congress on Evolutionary Computation (CEC2000), IEEE Press, 2000.
15. P. Werbos, The Roots of Backpropagation, Wiley, 1994. Contains complete reprints.