CDC02-INV1801
Neural Networks for Control:
Research Opportunities and Recent Developments
Paul J. Werbos
National Science Foundation*, Room 675
Arlington, VA 22203
Abstract
Artificial neural networks offer both a challenge to control theory and some ways to help meet that challenge. We need new efforts/proposals from control theorists and others to make progress towards the key long-term challenge: to design generic families of intelligent controllers such that one system (like the mammal brain) has a general-purpose ability to adapt to a wide variety of large nonlinear stochastic environments, and learn a strategy of action to maximize some measure of utility across time. New results in nonlinear function approximation and approximate dynamic programming put this goal in sight, but many parallel efforts will be needed to get there. Concepts from optimal control, adaptive control and robust control need to be unified more effectively (as has been done in some of the recent work on stability).
The Challenge to Researchers: Context and Motivation
From the view of an NSF Program Director*, neural network control is, first and foremost, a crucial challenge to the research community. The ECS Division has long been seeking more proposals – especially cross-disciplinary proposals, well grounded in control theory – which can rise more effectively to this challenge.
Somehow or other, we know that the smallest mammal brain achieves a high degree of competence in learning to perform very complex, novel tasks in a highly nonlinear environment fraught with all kinds of uncertainties. It does so in a general-purpose way, without the use of formal symbolic logic (except in one or two species, in some situations). The effort to understand how this could be possible is one of the key challenges to basic mathematical science in this century.
Neuroscience is unlikely to answer these questions without some sort of cross-disciplinary collaboration. A well-known neuroscientist once stated: “I have asked myself what would have happened if we had used our methods to try to understand how a radio works. First we would pull out a capacitor, watch the radio whine, and publish a paper announcing the discovery of ‘the whine center.’ Then, on a new grant, we would buy a new radio, pull out a resistor, and announce ‘the buzz center.’ A thousand radios later, we would have a complete map of the functional centers of the radio...” Some of the more modern methods may be more like doing a spectral analysis of the radiation emitted by a CPU while the PC is in various states, like boot-up, idle, word-processing and so on.
Many neuroscientists have reached out to the system dynamics community or the physics community, in search of ideas to guide the development of mathematical models. There is growing interest in “complex adaptive systems,” not only in biology, but in engineering areas of growing national attention, such as management of critical infrastructures and the “system of systems” in the wake of 9/11.
In the end, any effort to reverse-engineer or understand the higher capabilities of the brain in serious mathematical terms requires some specification of what kinds of capabilities we are looking for. We need an operational definition of what we are aiming at. With an appropriate definition, it should be obvious to many people in the control community both that the problem is very challenging, and that the CDC community has a critical role to play in meeting the challenge. The next subsection will propose such an operational definition.
In general, the electrical engineering community faces major challenges in attracting and retaining the best graduate students, who (like Congress) need to be convinced that new work in this area can be exciting and of fundamental importance, both to scientific understanding and to the emerging needs of humanity. Facing up to these challenges will be important to the health of the profession.
A Specific Challenge and Associated Issues
There are many debates [1,2] about the exact nature of higher intelligence in the mammal brain. Those debates clearly go well beyond engineering. However, consider the following concept or challenge to engineering: to develop a family of intelligent control designs, such that any member of the family has a general-purpose ability to learn the “optimal” strategy of action in any “well-behaved” complex nonlinear stochastic environment, when the system is given only three specific pieces of information: (1) a vector of observations y(t) at each sampling time t; (2) a vector of controls u(t), which it decides on itself; and (3) a utility function U(y), whose expected value over future time the system tries to maximize. The challenge is to achieve all this in a design which fits at least the major gross hardware constraint we know the brain does meet: real-time operation implemented in a highly parallel distributed “computer” made up of billions of relatively simple, modular processing elements (“neurons”).
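For concreteness, here is a minimal sketch, in Python, of the interface this challenge implies. The class and method names are my own illustration, not from the paper; the only inputs the learner sees are the three pieces of information named above.

```python
# A minimal sketch of the interface implied by the challenge. All names
# (IntelligentController, act, learn) are illustrative assumptions;
# only y(t), u(t) and U(y) come from the text.
from typing import Callable
import numpy as np

class IntelligentController:
    """General-purpose learning controller: given only U(y), it must
    learn, online, a strategy u(t) that maximizes expected future U."""

    def __init__(self, utility: Callable[[np.ndarray], float]):
        self.U = utility  # (3) the utility function U(y)

    def act(self, y: np.ndarray) -> np.ndarray:
        # (1) receive the observation vector y(t);
        # (2) return the control vector u(t), decided by the system itself.
        raise NotImplementedError("supplied by a concrete learning design")

    def learn(self, y: np.ndarray, u: np.ndarray, y_next: np.ndarray) -> None:
        # Adapt the internal strategy (and model) from experience alone,
        # with no domain information beyond U(y).
        raise NotImplementedError
```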
Even though the brain is not an exact utility maximizer, it is clear that this formulation captures much of what we see in the higher levels of intelligence in the mammal brain [1,2]. The brain has capabilities which are hard to believe from the perspective of today’s technology, yet it proves that capabilities of this sort do exist.
Before discussing the strategy of how to reach this goal, we need to examine the goal itself in more detail.
First, as a general matter, when I try to evaluate the potential impact of an effort in this area, I ask myself: “What difference would this particular work make to the expected delay time between now and the time when we really meet the full challenge?” The best effort will usually not be an effort aimed at reaching the final goal in one easy step. That is impossible. There are many possible parallel efforts, each representing just one big step beyond the present state of the art and providing pieces of what we will need to achieve the ultimate goal. Much of what we really need today is new general-purpose mathematics, applicable to nonlinear systems in general – including artificial neural networks (ANNs) as a special case, but not limited to them. Much of the best work in ANNs has actually been using neural networks as a context for developing that kind of more general mathematics. Indeed, backpropagation itself – the most widely used algorithm in the ANN field – is actually a more general mathematical algorithm [3].
Second, we need to think about the role of prior information and domain-dependent knowledge. There have been many extreme, polarized debates in the past between people who believe in learning or data-driven approaches and those who believe in genetically determined ideas or prior knowledge. Both in engineering and in neuroscience, the extreme positions are untenable, in my view. In the most challenging applications, the ideal strategy may be to look for a learning system as powerful as possible – a system able to converge to the optimal strategy without any prior knowledge at all – and then initialize that system to an initial strategy and model as close as possible to the most extensive prior knowledge we can find. (See also [4, foreword].) Some research is needed to get the best possible results in learning “without cheating.” In each application domain, research is also needed to find out how to “cheat” most effectively. The first kind of research is most important to fundamental scientific progress, but the second is also needed as part of the effort to deliver products of importance to the needs of society. In biology, many people have argued that cells that perform edge detection, for example, appear very early in the life of an organism; yet researchers have shown that cells in the lateral part of the brain can learn to take over as edge detectors, after damage to the usual visual areas. Powerful learning and prior information are both needed. But for higher intelligence, we are looking more for the ability to learn and adapt in a general-purpose way.
Third, we cannot expect the brain or any other physical device to guarantee an exactly optimal strategy of action in the general case. That is too hard for any physically realizable system. We will probably never be able to build a device to play a perfect game of chess or a perfect game of Go; in computer science terms, those problems are all “NP-hard.” But in engineering and in biology, we do not need or ask for absolutely perfect solutions. We look for the best possible approximations, trying to be as exact as we can, but not giving up on the true nonlinear problems of real interest.
Fourth, the notion of “well-behaved” is extremely subtle, and itself points towards one of the parallel strands of research that needs to be taken further. Decades ago, statisticians realized that it is impossible to learn very much from streams of time-series data if there are billions of variables and one imposes the usual “flat priors” of maximum-likelihood statistics. Even simple ANNs are possible only because there are some implicit notions of “Occam’s Razor” priors which allow inference, both in brains and in ANNs. Almost all theorems about nonlinear function approximators make similar implicit assumptions about the “smoothness” of the function to be approximated; there are some control applications where the usual notions about smoothness break down, and the usual nonlinear function approximators perform very badly, compared to others that are less well-known. Issues of this kind need to be explored further [3; 4, chapter 10].
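To illustrate the point about priors with a toy example of my own (not from the paper): under a flat prior, a plain least-squares fit using as many basis functions as data points simply interpolates the noise, while even a crude Occam-style smoothness penalty (weight decay, i.e. ridge regularization) restores the ability to generalize.

```python
# Toy illustration of an "Occam's Razor" prior. The data, basis and
# penalty strength are made up for illustration.
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 12)
y = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)   # noisy samples

Phi = np.vander(x, 12)   # 12 samples, 12 polynomial basis functions

# Flat prior: exact interpolation, so the fit follows the noise.
w_flat = np.linalg.lstsq(Phi, y, rcond=None)[0]

# Occam prior: penalize large weights (ridge / weight decay).
lam = 1e-2
w_occam = np.linalg.solve(Phi.T @ Phi + lam * np.eye(12), Phi.T @ y)

x_test = np.linspace(-1, 1, 200)
Phi_t = np.vander(x_test, 12)
err_flat = np.mean((Phi_t @ w_flat - np.sin(3 * x_test)) ** 2)
err_occam = np.mean((Phi_t @ w_occam - np.sin(3 * x_test)) ** 2)
print(f"test MSE, flat prior: {err_flat:.3f}; with weight decay: {err_occam:.3f}")
```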
Fifth, the issues of stability and safety are subsumed, in this formulation, into the choice of utility function U. When the world is modeled, mathematically, as the truly uncertain place it really is, we can never give a 100% guarantee that bad events are absolutely impossible. The brain evolved to minimize the probability of sudden death in an environment where an absolute guarantee cannot be achieved. Many practical users of control systems would rather be certain that the probability of accidents is minimized, in a full-up stochastic simulation of the real world, than have iron-clad guarantees that accidents could never happen if only the world were simple and linear. Stability theory will be an important tool in developing learning systems which can actually converge to strategies that minimize the probability of accidents, and it will be important to our ability to obtain and understand experience in using learning-based designs on complex real-world systems. But it is only one of several important strands of research relevant to the larger goal. At higher levels of systems design and management, the President’s Economic Adviser has recently urged engineers to place more weight on performance issues, and to address the tradeoffs between performance and safety in a more balanced way, grounded in modern risk analysis (i.e., in the maximization of total expected utility [5]).
Sixth, I would agree with the classical AI researchers who argue that the highest levels of intelligence seen in brains on earth are based on symbolic reasoning, not the subsymbolic intelligence I am talking about here. But 99% of the human brain is identical in its underlying wiring and learning abilities to the brain of the smallest mouse. Before science can truly understand how symbolic reasoning works in the human brain, it must first develop a deeper understanding of the remaining 99% of the brain. From a larger viewpoint, it is a good thing that many people do research on symbolic reasoning, even before scientific closure is possible on those issues; however, research aimed at subsymbolic intelligence is clearly on the critical path to developing a deeper understanding of such higher levels of intelligence.
Strategies, Tasks and Tools
Most CDC members will immediately see that the challenge above is a challenge in optimal control. It may seem, at first, that the challenge here is simply the old challenge of “solving the curse of dimensionality” in dynamic programming. But it is more than that. The challenge is also to learn the model of the environment and to solve the dynamic programming problem as accurately as possible, concurrently. (Some computer scientists advocate a purely model-free approach, without any learning of how to predict or even do state estimation; this does not scale well to large problems, and is not consistent with what we know about brains or animal learning [2,4,6,7,8].)
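For reference, the underlying recurrence is the Bellman equation of dynamic programming; the discounted stochastic form below is the one standard in the ADP literature (J is the value function, r an interest/discount rate, and angle brackets denote expectation; the notation is the standard one, not copied from this paper):

```latex
% Bellman equation, standard discounted stochastic ADP form;
% x(t) denotes the state of the environment (notation assumed).
J(\mathbf{x}(t)) \;=\; \max_{\mathbf{u}(t)}
  \left[\, U(\mathbf{x}(t),\mathbf{u}(t))
  \;+\; \frac{1}{1+r}\,\big\langle J(\mathbf{x}(t+1)) \big\rangle \,\right]
```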
It is also well known that the general nonlinear robust control problem is equivalent to the problem of solving a nonlinear “Hamilton-Jacobi-Bellman” equation as accurately as possible. If one allows off-line learning, then the challenge posed above is equivalent to the challenge of giving nonlinear robust control the tools it needs to address the general nonlinear case as accurately as possible. Many of the near-term opportunities to achieve practical results with neural network control do involve a clever use of off-line learning, in part because of verification and validation issues [9,10]. We may be entering a period where the difference between nonlinear robust control and neural network control becomes more semantic and emotional than real and mathematical.
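In continuous time, and in the game-theoretic setting usual in nonlinear robust control, the corresponding stationary equation is often written in a min-max (Hamilton-Jacobi-Isaacs) form; the notation below is the standard textbook one, not taken from this paper:

```latex
% Stationary Hamilton-Jacobi equation in min-max (Isaacs) form.
% Dynamics \dot{x} = f(x,u,w); u is the control, w the disturbance,
% \ell the running cost (all notation assumed, not from the paper).
0 \;=\; \min_{\mathbf{u}} \, \max_{\mathbf{w}}
  \left[\, \ell(\mathbf{x},\mathbf{u},\mathbf{w})
  \;+\; \nabla J(\mathbf{x})^{\top} f(\mathbf{x},\mathbf{u},\mathbf{w}) \,\right]
```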
Adaptive control is relevant because of the need for systems based on learning or adaptation. Curiously enough, it now seems that methods derived from neural network control may finally solve the old problem of universal stability in adaptive control for the linear MIMO case; however, even the preliminary theorems on those lines [7] make heavy use of quadratic stability concepts from the linear robust control world. Greater collaboration between experts in linear robust control and adaptive control may be necessary to grasp this new opportunity, close at hand as it is. Clearly this is one of several very important open research opportunities.
The greater challenge here clearly depends on our ability to bring together the capabilities of all three of these communities more effectively.
The most important breakthrough which makes this a viable direction for research is the development of the field which some people now call neuro-dynamic programming [11] or, more recently, Adaptive Dynamic Programming (ADP). ADP originated in three previously independent small strands of research, led by Bernard Widrow (“adaptive critics”), Andrew Barto (“reinforcement learning”) and myself (“approximate dynamic programming” and “reinforcement learning”), which came together in the first major workshop on neural networks for control, back in 1990 [12]. (See [7] for a review of the actual history, and for mathematical details of new adaptation methods important to strong stability.) The 1981 international conference paper which first described backpropagation in detail, as a method for adapting multilayer neural networks, also gave the general form of the method for arbitrary nonlinear systems, and described how to use it as part of a parallel distributed design for model-based ADP [15, chapter 7].
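As a concrete, if toy, illustration of the simplest adaptive-critic design in this family, the sketch below implements Heuristic Dynamic Programming (HDP): a critic J_hat(x) is trained toward the Bellman target U(x) + gamma*J_hat(x(t+1)), and the action is chosen by a one-step model-based lookahead. The one-dimensional plant, the quadratic critic parameterization and all constants are my own assumptions for illustration, not from the paper.

```python
# Toy sketch of Heuristic Dynamic Programming (HDP), the simplest
# adaptive-critic / ADP design. Plant, critic and constants are assumed.
import numpy as np

rng = np.random.default_rng(0)
gamma = 0.95                      # discount factor, i.e. 1/(1+r)
w = np.zeros(2)                   # critic weights: J_hat(x) = w[0]*x^2 + w[1]

def features(x):
    return np.array([x * x, 1.0])

def J_hat(x):
    return float(w @ features(x))

def utility(x):
    return -x * x                 # maximizing U drives the state to zero

def model(x, u):
    return 0.9 * x + u            # known (or separately learned) plant model

def plant(x, u):
    return model(x, u) + 0.01 * rng.standard_normal()  # true stochastic plant

x = 1.0
for t in range(5000):
    # Model-based action: one-step lookahead over candidate controls.
    candidates = np.linspace(-0.5, 0.5, 11)
    u = max(candidates,
            key=lambda c: utility(model(x, c)) + gamma * J_hat(model(x, c)))
    x_next = plant(x, u)
    # HDP critic update: move J_hat(x) toward the Bellman target.
    target = utility(x) + gamma * J_hat(x_next)
    w += 0.01 * (target - J_hat(x)) * features(x)
    x = x_next

print(f"final state {x:.3f}, critic weights {w}")
```

In a full model-based ADP design the hand-coded model above would itself be learned online, and the action would come from an adapted action network rather than a search over candidates; the sketch keeps only the critic adaptation, which is the defining element of HDP.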
Since then, however, the various schools have drifted apart to some degree. Reinforcement learning methods have become amazingly popular in AI, where they are commonly regarded as “the answer” to higher-level decision-making and planning problems. Yet the simple model-free designs in general use do not really address continuous-variable problems, and have difficulty scaling up to large problems, as has been noted many times in engineering applications in the past [4,12,13]. Even their performance in game-playing applications has been somewhat overstated; the only researcher who has ever achieved human expert-level performance in a difficult strategic game, based on learning without heavy prior knowledge, actually used evolutionary computing to train the “Critic” network in his system [14].
Clearly, engineers have a critical role to play in developing designs which can scale up to larger problems, and which can address the issue of partially observed systems by combining learning-based system identification and ADP. This work has already begun, but considerably more remains to be done.
In the CDC talk, the author will mention some of the recent progress in model-based ADP work, and the further research challenges important to areas like the control of complex network systems such as electric power grids [6]. Particularly notable are the recent success of Wunsch, Harley and Venayagamoorthy in controlling a physical network of turbogenerators, maintaining robust operation in the face of disturbances much greater than the previous state of the art allowed; the success of Balakrishnan in benchmark evaluations on difficult missile-interception problems; the success of Ferrari and Stengel in improving performance over well-tuned classical methods in aircraft control; and the success of Lendaris’ group at Portland State in tasks ranging from simulated vehicle skid control through to logistics control – all using model-based ADP methods.
Major new results in stability have also been achieved, some by presenters in this session and some by the Hittle/Young/Anderson group (with application to improved energy efficiency in buildings), among others. See the references for mathematics, algorithms, and further citations.
References
1. K. H. Pribram, ed., Brain and Values, Erlbaum, 1998.
2. K. Yasue, M. Jibu & T. Della Senta, eds., No Matter, Never Mind: Proceedings of Toward a Science of Consciousness: Fundamental Approaches (Tokyo ’99), John Benjamins Pub. Co., 2002.
3. P. Werbos, Backpropagation: General principles and issues for biology. In M. Arbib, ed., Handbook of Brain Theory and Neural Networks, Second Edition, MIT Press, 2002.
4. D. White & D. Sofge, eds., Handbook of Intelligent Control, Van Nostrand, 1992.
5. H. Raiffa, Decision Analysis, Addison-Wesley, 1968.
6. wwwimacm.org
7. P. Werbos, Stable Adaptive Control Using New Critic Designs, xxx.lanl.gov/abs/adap-org/9810001.
8. P. Werbos, Neurocontrollers. In J. Webster, ed., Encyclopedia of Electrical and Electronics Engineering, Wiley, 1999.
9. J. S. Baras and N. S. Patel, Information state for robust control of set-valued discrete time systems, Proc. 34th Conf. on Decision and Control (CDC), IEEE, 1995, p. 2302.
10. M. Motter, ed., Special Session on Intelligent Flight Control, Proc. American Control Conference (ACC), 2001.
11. D. P. Bertsekas and J. N. Tsitsiklis, Neuro-Dynamic Programming, Athena Scientific, 1996.
12. W. T. Miller, R. Sutton & P. Werbos, eds., Neural Networks for Control, MIT Press, 1990; now available in paperback.
13. S. Haykin, Neural Networks: A Comprehensive Foundation, Second Edition, Prentice-Hall, 1998.
14. K. Chellapilla & D. B. Fogel, Anaconda defeats Hoyle 6-0: a case study competing an evolved checkers program against commercially available software. In Proc. Congress on Evolutionary Computation (CEC2000), IEEE Press, 2000.
15. P. Werbos, The Roots of Backpropagation, Wiley, 1994. Contains complete reprints.