Elastic Fuzzy Logic: A Better Fit to Neurocontrol and True Intelligence

Paul J. Werbos
Room 1151, National Science Foundation*
Washington, D.C., USA 20550

Abstract
Since 1990 or
so[1], there has been great interest in using adaptive methods taken
from neural network theory to adapt fuzzy-logic networks or other fuzzy
systems. The basic idea is to use fuzzy logic as a kind of translation
technology, to go back and forth between the words of a human expert and the
equations of a controller, classifier, or other useful system. One can then use
neural network methods to adapt that system, so as to improve
performance.
Designs of
this sort have already been useful. However, the existing designs do not live
up to the full potential of this approach, and they do not achieve anything
like brain-style "intelligence." This paper will propose a two-fold
approach to achieving the full potential of such hybrid systems: (1) the use of
elastic fuzzy logic (ELF), a new extension of fuzzy logic which makes it
possible to combine the best of fuzzy logic and neural networks; (2) the use of
advanced learning techniques -- using some ELF components -- which make
it possible to perform true planning or optimization over time on an adaptive
basis[2]. Pseudocode examples will be given to help in application. The paper
will also discuss symbolic reasoning, and links to the Real-time Control System
(RCS) of Albus [3], which represents the state of the art in classical AI for
control.
1. Introduction
How can we build artificial systems which are truly intelligent, in a brain-like sense, incorporating both learning and the efficient use of distributed systems?
How can we build useful systems
capable of learning and implementing strategies of action
or plans to accomplish difficult tasks or solve problems over time?
______________________________________________________
*The views herein are those of the author, developed and
written up on personal time, and not the views of NSF. BehavHeuristic, Inc.
(BHI) of College Park, Maryland, has patents pending on ELF and on some of the
adaptation techniques discussed here, as well as more advanced techniques.
This paper will describe a two-pronged approach to these classical problems, based on elastic fuzzy logic (ELF) and advanced learning designs taken from
neurocontrol. (I use "ELF" instead of the "correct"
abbreviation EFL, because "ELF" is easier to pronounce.) It will also
explain why the existing approaches to blending fuzzy logic and neural networks
do not quite solve these problems.
The approach
here is very broad and very general, as I will describe in Section 4. However,
most engineers prefer to begin with a simple and useful example; therefore, the
first part of this paper will discuss ELF in the context of conventional,
simple fuzzy control. Most readers familiar with classical fuzzy control would
regard this as the general case, rather than a specific example;
however, section 3 will go on, very briefly, to describe more general control tasks and designs; it will also review some concepts from neurocontrol, and explain why the learning approaches used in most applications today are very limited in their power, compared with the best state of the art.
This paper
does not claim that ELF is the best design to use on all problems requiring
some version of "intelligent control;" however, in those applications
where fuzzy control is useful (of which many, many examples have been
published), I claim that ELF does permit the use of well-studied, well-proven
generic methods from the neural network field, whose link to brain-like intelligence
has been discussed at very great length elsewhere. To prove that ELF permits
the use of those methods, I will simply specify how to implement these hybrid
designs.
2. ELF in Conventional Fuzzy Control
This section
will first review conventional fuzzy control, and then define what ELF is
in that context.
It will give details on one way of implementing it. Then
it will explain the advantages of ELF compared with other forms of adaptable
fuzzy control. The implementation details may seem unnecessarily complete to
some readers, but they should make it easier to understand and implement the
more powerful designs to be discussed in section 3. Pseudocode will be
presented in the Appendix.
Review of Conventional Fuzzy Control
In
conventional fuzzy control, the expert provides a set of rules -- expressed in
words -- and some information about what the words in the rules actually mean.
Standard fuzzy logic procedures are used to translate this information
into a set of equations used to define a controller. We can think of these
equations as a kind of neural network with two hidden layers. Conventional
fuzzy control is one way of addressing the more general task of
"cloning" or imitating a pre-existing expert.
At each time
t, the controller will have access to a set of input variables or sensor
readings, X1(t) through Xm(t), which we can think of as a
vector X(t). It will emit a set of actuator or motor or action
variables, u1(t) through un(t), which we can think of as
a vector u(t). The human expert must first be told what inputs X
and controls u are available, before describing a controller.
(Of course, the expert himself may specify the inputs and controls required for
the particular application.)
After the
expert knows what the inputs and outputs are, he then gives you a list of r
rules, expressed in words. Each rule must be of the form:
If A and
B and ... and C then do D,
where A through D are words or phrases. (D is called the
"verb" in this rule.)
Usually, the same word A appears in more than one rule,
but each rule uses a different subset of the available words. There is no
requirement that rules "divide up" the input space into distinct
partitions;
rules usually overlap each other, and one rule may be
more specific than other rules, and so on.
Many, many examples of this appear in the literature.
For each word
or phrase used in the rule, the expert must also supply a definition of
what the word means, relative to the actual input variables X and
control variables u available to the controller. Thus for each
input word A, he must specify a membership function μA(X)
which indicates the degree to which the word A is true for each
situation X. For each verb, D, he provides the membership
function μD(u), in theory; in practice, he
usually just specifies u'(D), the control settings which best
match the word D.
(A strict fuzzy logician would describe u'(D)
as the centroid of the function μD(u).)
The
information from the expert is translated into a two-hidden-layer network as
follows.
First, we must
do some bookkeeping. The set of input words across the entire system is put into an ordered list. The first word may be called A_1, the second A_2, and so on, up to the last word, A_{na}. The rules also form a list, from rule number 1 to rule number r. For each rule, rule number j, we must look up each input word on the overall list of words A_i; thus if "B" is the second word in rule number j, then word B should appear as A_k on the overall list, for some value of k. We may define i_{j,2} as that value of k. More generally, we may define i_{j,η} as that value of k such that A_k matches the ηth input word in rule number j. Using this notation, rule
number j may be expressed as:
If A_{i_{j,1}} and A_{i_{j,2}} and ... and A_{i_{j,n_j}} then do u'(j),   (1)

where n_j is the number of input words in rule number j, and where u'(j) refers to u'(D) for the verb D of rule number j.

Using this notation, the information from the expert is translated into a two-hidden-layer network as follows. The first hidden layer is the membership layer:

x_i = μ_{A_i}(X(t)),   i = 1, ..., na   (2)

The next hidden layer is the layer of rule activation, which calculates the degree to which rule number j applies to situation X:

R_j = ∏_{k=1}^{n_j} x_{i_{j,k}}   (3)

The output layer is the simple defuzzification rule used in most practical applications[4]:

u(t) = R_Σ · Σ_{j=1}^{r} R_j · u'(j)   (4)

where I define:

R_Σ = 1 / Σ_{j=1}^{r} R_j   (5)
Theoretical papers often discuss alternative versions of
this, such as the use of the minimum function instead of a product in equation
3; however, empirical comparisons reported in Japan (discussed at length in the
1992 Iizuka conference) have shown that the simple form shown in equation 3
usually leads to better results. This is what I would expect, because equation
3 is smooth and differentiable (more like human experts), but the traditional
minimum rule is crisper and more artificial.
In the
Appendix, I give pseudocode for a subroutine, FUZZ(u,X), which implements these
equations.
Naturally, the subroutine inputs the array X, and outputs
the array u.
Elastic Fuzzy Control: the Basics
In elastic
fuzzy logic, the words coming from the expert can be translated initially into
equations which are absolutely equivalent to those above. However,
additional parameters are inserted into the system for use in later adaptation.
Equation 3 is replaced by the equation:
R_j = γ_{j,0} · ∏_{k=1}^{n_j} (x_{i_{j,k}})^{γ_{j,k}}   (6)
where the gamma parameters are all set to one initially.
The Appendix describes how to code this up as a new subroutine,
ELF(u,X,gamma,uprime), very similar to our old subroutine FUZZ.
Intuitively,
the γj,0 parameters represent the strength or degree of validity
of each rule. The parameters γj,k represent the importance
of each condition (input word) to the applicability of each rule. For example,
when γj,k equals 2, in ELF, this would be equivalent to having
that word appear twice as an input condition, in conventional fuzzy
control.
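To make the network concrete, the forward pass of equations 2 and 4 through 6 can be sketched in a few lines of NumPy. This is only an illustration, not the pseudocode of the Appendix; the data layout (a list of per-rule index lists, one gamma array per rule, and an n-by-r array uprime) is an assumption chosen for brevity, and the membership values x are taken as already computed:

import numpy as np

def elf_forward(x, idx, gamma, uprime):
    """ELF forward pass, equations 2 and 4-6.

    x      : (na,) membership values x_i = mu_{A_i}(X), assumed precomputed
    idx    : list of r index lists; idx[j] holds the word indices i_{j,k}
    gamma  : list of r arrays; gamma[j][0] is the rule strength gamma_{j,0},
             gamma[j][1:] holds the word exponents gamma_{j,k}
    uprime : (n, r) array whose column j is the recommended action u'(j)
    """
    R = np.empty(len(idx))
    for j in range(len(idx)):
        # Equation 6: R_j = gamma_{j,0} * prod_k x_{i_{j,k}} ** gamma_{j,k}
        R[j] = gamma[j][0] * np.prod(x[idx[j]] ** gamma[j][1:])
    # Equations 4 and 5: normalized, rule-weighted average of the u'(j)
    return (uprime @ R) / R.sum(), R

# Example: two rules over three words. With all gammas equal to one,
# this reproduces conventional fuzzy control (equation 3) exactly.
x = np.array([0.9, 0.4, 0.7])
idx = [[0, 1], [2]]
gamma = [np.ones(3), np.ones(2)]
uprime = np.array([[0.2, 0.8]])
u, R = elf_forward(x, idx, gamma, uprime)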
At the end of
this section, I will discuss the advantages of this approach in more detail,
relative to the many alternatives in existence. First, however, I will describe
an example of how to use it.
Adapting ELF by Supervised Control
At present,
when people adapt fuzzy control systems, they generally use a simple
adaptation procedure which is called "supervised control" in the
neurocontrol field[2]. Supervised control has very limited capabilities.
Section 3 will describe its limitations, and how to do better. However, as a
basic introduction, I will now describe how to adapt an ELF system in
supervised control. Supervised control, in turn, is merely one example of how
to perform the task of cloning a preexisting expert or controller; section 3
will describe other tasks of interest in control.
In supervised
control, the user first creates a database of "correct" control
actions.
For each example, example number t, the user provides a vector of sensor inputs X(t) and a vector of desired or correct control actions, u*(t). The weights in the system are usually adapted so as to minimize:

E_tot = Σ_t E(t),   E(t) = Σ_{k=1}^{n} (u_k(t) − u*_k(t))²   (7)
In ELF, we would most often define the weights as the combination
of the gamma parameters and the u' vectors.
To minimize Etot
as a function of the weights, we can use the conventional neural-network
technique of backpropagation, which I first developed in 1974 [5]. This can be
described as an iterative approach.
(See [2,5] for real-time versions as well.) On the first
iteration, we initialize the gamma parameters to one, and initialize uprime to
the values given by the expert. On each subsequent iteration, we can take the
following steps:
1. Initialize the arrays of derivatives F_gamma_total and F_uprime_total to zero.
2. For each example t do:
   2a. CALL ELF(u,X(t),gamma,uprime)
   2b. Calculate the vector of derivatives of E(t) with respect to the components of u(t):
           F_u(t) = 2*(u(t) - u*(t))
   2c. Using backpropagation -- the chain rule for ordered derivatives -- work the derivatives back to calculate F_gamma(t) and F_uprime(t), the derivatives of E(t) with regard to the gamma and uprime parameters.
   2d. Update the array F_gamma_total to F_gamma_total plus F_gamma(t), and likewise for F_uprime_total.
3. Update the arrays of parameters:
       new gamma = old gamma - LR1 * F_gamma_total
       new uprime = old uprime - LR2 * F_uprime_total,
where LR1 and LR2 are positive scalar "learning rates" chosen for convenience. Note how the assignment statements here all refer to array operations rather than scalar operations.
Pseudocode for this algorithm is given in the Appendix.
All of this is
very straightforward, and there are many ways known in the neural network field
to improve performance[2]. The one key challenge remaining is to program the dual
subroutine, F_ELF, which inputs the derivatives in the array F_u and outputs
the derivatives in the arrays F_gamma and F_uprime.
(That subroutine must also have access to other
information, but I will not list that explicitly as additional
arguments, because that might confuse the underlying idea
here.)
In order to
calculate the derivatives efficiently, starting from knowledge of F_u, we can
use the chain rule for ordered derivatives[2,5] to derive the following
equations:
F_u'_k(j) = F_u_k · R_j · R_Σ,   k = 1, ..., n   (8)

F_R_j = R_Σ · (F_u · u'(j) − F_u · u)   (9)

F_γ_{j,k} = F_R_j · R_j · ln(x_{i_{j,k}}),   k = 1, ..., n_j   (10)

F_γ_{j,0} = F_R_j · R_j / γ_{j,0}   (11)
where the centered dot represents a vector dot product,
and where F_u' is a vector.
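Continuing the elf_forward sketch above (again an illustration, not the Appendix pseudocode), equations 8 through 11 translate directly into a dual routine:

import numpy as np

def f_elf(F_u, u, x, idx, gamma, uprime, R):
    """Dual subroutine for elf_forward: equations 8-11.

    F_u is dE/du; u and R are the two outputs of elf_forward; the other
    arguments follow the elf_forward conventions. Returns the derivatives
    of E with respect to the gamma and uprime parameters.
    """
    R_sigma = 1.0 / R.sum()                    # equation 5
    base = F_u @ u                             # the dot product F_u . u
    F_uprime = np.outer(F_u, R) * R_sigma      # equation 8
    F_R = R_sigma * (F_u @ uprime - base)      # equation 9, all rules at once
    F_gamma = []
    for j in range(len(idx)):
        g = np.empty(len(idx[j]) + 1)
        g[1:] = F_R[j] * R[j] * np.log(x[idx[j]])   # equation 10
        g[0] = F_R[j] * R[j] / gamma[j][0]          # equation 11
        F_gamma.append(g)
    return F_gamma, F_uprime

# In the supervised loop: for each example t, call elf_forward, set
# F_u = 2*(u - ustar[t]), accumulate the outputs of f_elf into the
# totals, and then apply the step-3 update to gamma and uprime.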
This dual
subroutine could be expanded further, so as to output F_X, the derivatives of E
with respect to the inputs X(t); however, that would require
knowledge of the membership functions (or another dual subroutine, F_MU).
Practical Advantages of Elastic Fuzzy Control
ELF is
certainly not the only form of adaptable fuzzy logic proposed in the past few
years; however, it has a number of unique advantages.
The most
common version of adaptable fuzzy logic is based on putting parameters into the
membership functions μ
rather than the rules. This has two disadvantages.
First, by
changing the function μA(X), we are changing the definition
of the word A. Thus the computer is no longer defining words in the same way as
the expert. This could reduce our ability to explain to the expert what
the adapted version of the controller is doing, or even what was changed in
adaptation. When A is a simple word, representing a function of only one
sensor input Xi, then this may not be a real problem; however, for
more complex words, it could be a problem.
Second -- and
more important -- changing the membership functions does not allow you to
change the rules themselves; thus the scope for adaptation is very
limited. This kind of adaptation can give better results than fixed logic, but
it is still quite limited. It does not provide a way to change the basic
structure of the "fuzzy partition" of the input space.
When the word
A depends on X in a complex way, you might try a very different
approach to adapting membership functions. You might present the expert with
many different examples of X, and ask him to say how much
the word A applies in each example. You could then use a simple neural network
to learn the mapping from X to μ_A. Neural networks are often described as "black boxes," but in this application a network adapted this way can yield a membership function which matches the expert's language more accurately; it therefore makes the overall controller more of a "white box" to the expert.
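The paper leaves the form of that network open. As one minimal, hypothetical illustration, a logistic-regression "network" keeps its output between 0 and 1, as a membership value should, and can be fitted to the expert's graded examples by gradient descent:

import numpy as np

def fit_membership(Xs, ys, steps=5000, lr=0.1):
    """Fit an approximate membership function mu_A from expert examples.

    Xs : (N, m) array of example situations X
    ys : (N,) expert grades in [0, 1], how much the word A applies
    Returns a callable approximating mu_A(X).
    """
    w, b = np.zeros(Xs.shape[1]), 0.0
    for _ in range(steps):
        mu = 1.0 / (1.0 + np.exp(-(Xs @ w + b)))   # candidate mu_A values
        err = mu - ys                              # cross-entropy gradient
        w -= lr * (Xs.T @ err) / len(ys)
        b -= lr * err.mean()
    return lambda X: 1.0 / (1.0 + np.exp(-(X @ w + b)))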
In the past,
it has been suggested that the control vectors, u'(j), for each
rule j, be replaced by a neural network, inputting the vector X.
This gives considerable flexibility, but the adapted version is then still a
black box, to some extent. Could this be any better than simply using a single
neural network for the entire controller? In fact, it could be better, because
each of the rule-specific neural networks would be "local"; as with
the work of Jacobs et al[6], one might expect faster real-time learning in such
a system. Local networks have many practical advantages[2]. Nevertheless, this
scheme does not result in a white box model. Also, this scheme could actually
be combined with elastic fuzzy logic where needed.
Other researchers have proposed something like ELF, but without the γ_{j,k} exponents. These exponents play a crucial role in adapting the content of each rule; therefore, they are crucial in providing more complete adaptability.
Crucial to the
use of ELF is our ability to explain the adapted controller back to the
expert. The γj,0
parameters can be reported back as the
"strength" or "degree of validity" of each rule. The
parameters γj,k can be described as the "importance" of each
condition (input word) to the applicability of the rule. In fact, the
parameters γj,k correspond exactly to the "elasticities" used
by economists; the whole apparatus used by economists to explain the idea of
"elasticity" can be used here as well. If economists can understand
intuitively what an elasticity is, then engineers (the
usual experts in control applications) should not have great difficulty with
them. Also, as a further guide to explaining them, Ron Yager has pointed out
that a γj,k of 2 is equivalent to having the word appear twice
in the rule. In general, a γ_{j,k} equal to a positive integer m is equivalent to having the word appear m times in the rule.
Another
advantage of ELF is the possibility of adaptive adding and pruning
of rules, and of words within rules. When γ parameters adapt to values near zero, the corresponding word or rule can be removed. This is really just a special case of the general procedure of pruning connections and neurons in neural networks -- a well-established technique. Likewise, new connections or rules can be tested out safely by inserting them with γ's initialized to zero, so that they become effective only as adaptation moves them away from zero. In summary, neural network
techniques can be used with ELF nets to adapt the very structure of the
controller. The potential scope for adaptation is very great.
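As a small, hypothetical sketch of the pruning half of this idea, in the data layout of the elf_forward example from section 2, rules whose adapted strength γ_{j,0} is near zero are simply dropped from the rule base:

def prune_rules(gamma, uprime, idx, tol=1e-3):
    """Drop rules whose adapted strength gamma_{j,0} is near zero."""
    keep = [j for j in range(len(gamma)) if abs(gamma[j][0]) > tol]
    return ([gamma[j] for j in keep],   # surviving gamma arrays
            uprime[:, keep],            # surviving action columns u'(j)
            [idx[j] for j in keep])     # surviving word-index lists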
The scope for
adaptation, while great, is not unlimited. The choice of words, Ai,
does limit the allowed set of rules. In any particular application, one can
test whether this limitation has measurable consequences simply by using a
neural network instead (or making the u' into neural networks),
and seeing whether better performance can be had that way. Similar approaches
might be used to develop tools to help the expert expand his vocabulary, using
display techniques similar in flavor to exploratory data analysis[7].
The ELF
network has some very interesting capabilities as a type of neural network,
even apart from its link to fuzzy logic. It combines the ability to cope with
many input variables, in a smooth and adaptive way, together with an
essentially "local" structure. This kind of capability has been very
important (and hard to find) in real-time control applications of neural
networks[8].
3. Alternative Learning Designs and Neurocontrol
Introduction
Broadly
speaking, neural networks have been used in four different ways in useful
applications in control[2,9]:
1. As subsystems of a larger system, used for pattern recognition, diagnostics, sensor fusion, dynamic system identification, etc.;
2. As "clones" which learn to imitate human experts or other existing controllers;
3. As "tracking" systems, which make a robot arm track a desired trajectory in space, make a chemical plant stay at a desired setpoint, make a plant track a desired reference model, etc.;
4. As optimization systems, which learn strategies of action that maximize some measure of utility (or performance or profit or goal satisfaction, etc.) or minimize some measure of cost over future time.
Every one of these four tasks is associated with a large
body of learning designs, specifically aimed at each task.
Only the
fourth set of designs has any hope of explaining or replicating true brain-like
intelligence[11-13]. This follows from an obvious process of elimination: the
brain uses neural nets to compute actual control signals, not just as
sources of information to an external controller. The brain has some subtle capabilities
involving imitation, but human learning involves more than just slavish cloning;
children do not simply implement every rule their parents tell them, when their
parents tell them, and their parents do not give them complete "training
sets" of how to move every muscle at every moment in time. Likewise,
parents do not give their children "reference trajectories" of where
their muscles should be at every moment in time. This leaves the optimization
designs -- among all the existing, working neural designs -- as the only known
global organizing principles able to encompass all this behavior.
As a practical
matter[2], it is often useful to begin by using a simpler design to
develop a controller, and then use the resulting controller as the initial
state of a controller to be adapted by optimization methods. In a similar
way (but far more complex[1]), one may postulate special mechanisms in the
human brain that would use imitation and words to help "initialize"
the optimal control systems in the brain. The fit between these optimization
designs and the brain is discussed further in [11-13].
Supervised Fuzzy Control
The vast majority of adaptive fuzzy controllers today are limited to the second task above -- cloning.
The reason for this is very simple: to construct a
database of desired actions (as discussed in section 2), one must already
have some kind of controller or expert available to specify the desired
actions, to be inserted into the database.
Fuzzy clones
can still be useful in some applications. One can first ask an expert how to
control a process; thus one initializes the controller to what the expert says.
Then one can adapt the controller to match what the expert actually does
in controlling the plant. There are many, many variations of this
technique[1,2].
Unfortunately,
simple supervised control has important limitations even for the task of
cloning experts.
There are many difficult tasks which require that an
expert develop a sense of dynamics over time, based on some intuitive
understanding of phenomena which are not directly observed. In fact, if an
automatic controller for some plant must incorporate dynamics over time,
for the sake of stability or performance, then a human must also respond
to such dynamics in order to be competent. (Even humans cannot escape the laws
of mathematics!) No design which is based on static mapping from X(t)
to u(t) could adequately capture the behavior of the human expert
in that kind of application.
In that
application, one must first find a way to talk to the expert about dynamics. Of
course, it is easy enough in principle to define words which depend on time
t-1, or which are self-referring, etc.
There is a new, large literature on fuzzy semantics for
phrases like "quickly growing," etc. After one has such a dynamic
fuzzy controller, how does one adapt it to what the expert does? At that point,
one is basically engaging in adaptive system identification. One is
modelling the operator himself, using the same kinds of system identification
methods one normally uses on a plant. Adaptive system identification turns out
to be a very tricky business; some elementary approaches are given in [9], but
more robust and reliable methods are given in chapter 10 of [2]. Most failures
of neurocontrol in the past few years appear to be due to the use of elementary
approaches in system identification, in applications which demand more
robustness.
Tracking and Optimization
There are many
challenging control problems which human experts do not know how to control, or
which demand higher performance than human experts have exhibited. (It is very irritating when people jump to the conclusion that "neural networks do not work" in some application, when they have only tried out simpler designs which are not adequate for that application!) Also, to imitate human
learning abilities, artificial systems should be able to learn a control
strategy for the first time, without having to depend 100% on copying someone
else. Tracking and optimization designs provide a way of doing this.
As a practical
matter, the best existing designs for tracking[2] are based on treating the
task as a task in optimization. (For example, one can try to minimize tracking
error or energy consumption or some combination of the two[2,10].) Also, human
learning is not based on some kind of explicit step-by-step reference
trajectory for their lives. There are many manufacturing systems, as well,
for which the desired product is known but the trajectory which produces it at
maximum quality and minimum cost is not known. For all these reasons, I will
focus here on learning techniques for optimizing performance over time.
Strictly
speaking, supervised control can still be of use in problems which require
optimization. For example, one can ask the human expert for rules of action.
This gives the initial state of an ELF network. One can then adapt this
network, so as to imitate what the expert does instead of what he says.
One can then use the resulting controller as the initial state of a
controller adapted to optimize performance.
This multi-step approach can be very important, both in
reducing the overall difficulty of learning and in
minimizing the effects of the unavoidable real-world
difficulties involving local minima.
But how does
one train a network to go beyond the training database provided by the expert,
to improve performance beyond that of the human?
Two families
of designs are known to perform this task in a useful, efficient manner, and
can scale to large problems: (1) the backpropagation of utility; (2) adaptive
critics. The Handbook of Intelligent Control[2] describes both families
in great detail. It gives pseudocode for "main programs" which can be
used to adapt any network or system for which the dual subroutine is
known. Because this paper provides pseudocode for the ELF and F_ELF
subroutines, the reader should be able to plug in elastic fuzzy logic directly
into those main programs (though the F_X derivatives need to be added in some
cases).
In the past
few years, a variety of empirical studies have shown that these optimization
methods can indeed handle problems which had been too difficult for earlier
methods. For example, the composites division at McDonnell Douglas had spent
millions of dollars in past years, trying to control a continuous-production
system to make very high quality composite parts. (The cost of these parts is a
major problem in the aircraft industry.) AI, classical control, and extensive
physical modelling had not solved the problem, either at McDonnell-Douglas or
elsewhere. After reading [10] and [14], White and Sofge[2] decided to try
neurocontrol. The basic tracking methods did not work. But White and
Sofge did not publish a paper saying "neural networks don't work"; they decided instead to climb up the ladder of designs, as suggested in [14], and
try out a simple adaptive critic method -- the Barto, Sutton and Anderson
method[15] -- on a simplified version of the problem. When that worked,
but did not scale to the larger problem, they remembered the parts about
scaling in [10] and [14]; thus they moved on up to an advanced adaptive critic,
combining backpropagation together with an adaptive critic design, in an
appropriate way[2]. That really solved the problem. In subsequent work, White
and Sofge have shown that a similar architecture can solve critical problems in
flight control, in thermal control and elsewhere, of even larger real-world
importance[2].
In very recent work[17], Jameson has performed
simulations showing that more complex critic designs -- using both
backpropagation and a model emulator -- can solve a simple but difficult robot
control problem which was resistant even to the kind of design used by White
and Sofge; working with me[17], Santiago -- now at BehavHeuristics -- has
shown faster, more robust learning in a simple broom-balancing application,
when more advanced critic designs are used. One of the reviewers of this paper
mentions that Jang has actually combined backpropagation with an adaptive
critic design for fuzzy systems.
The
backpropagation of utility is usually not quite so powerful; however,
Hrycej[16] of Daimler-Benz and McAvoy of Maryland[2] have shown some impressive
real-world results on automotive and chemical applications, with implications
for the environment.
Some of the
most exciting new applications of these methods have yet to be fully published,
because they are being done for commercial application. For example,
BehavHeuristics and USAir announced a new contract, months ago, which is now in
the final implementation stage, in which an advanced adaptive critic is
optimizing revenue management across all the seats flown by USAir. This is not
a small-scale broom-balancing problem! For obvious reasons, the details have
not all been published, but some sort of detailed presentation is scheduled for
WCNN94, the World Congress on Neural Networks in San Diego. A variety of
important papers have been published by Feldkamp and Puskorius and others from
Ford, in WCNN93 and elsewhere. Accurate Automation of Chattanooga, Tennessee,
has applied critics to a robust controller for the main arm of the space
shuttle, and for a prototype controller of a prototype National Aerospace
Plane, a physical prototype which they are now building for near-term flight
testing. (Contact the company directly for copies of the relevant reports.)
White and Sofge, at Neurodyne, have also succeeded in bench-scale tests of a
new controller to improve efficiency and reduce pollution in an internal
combustion engine. Applications which improve efficiency and reduce pollution
have had many real-world applications, discussed by the corporate members of
McAvoy's Neural Net Club [2].
An Example: Adapting ELF by Backpropagating Utility
Even though
the generic concepts are given in [2], the reader may find it somewhat easier
to get started if I give a more explicit example. (In fact, some readers of [2]
have told me that they found [2] much easier to implement after reading [9],
which offered a similar example, but for neural networks.) The adaptive critic
designs are more brain-like, but they are also more complicated. Therefore, I
will give an example based on the backpropagation of utility through time,
which is a good starting point.
Let us suppose
that we are trying to adapt a fuzzy controller which inputs X(t)
and outputs u(t), as before. Let us suppose that we are starting
from a fixed initial state X(0).
(It is easy to deal with the more general case, as in [2]
and [5], but I will assume one fixed starting value for now for the sake of
simplicity.) Let us suppose that we are trying to minimize:
Σ_{t=1}^{T} U(X(t))   (12)
for a known utility function U. Let us also suppose that X(t+1)
depends only on X(t) and u(t), without noise. How
could we adapt our ELF controller?
To use the
backpropagation of utility, we must first develop an explicit model of the
plant. For example, using the techniques in [9] or in chapter 10 of [2], we
could adapt an artificial neural network which inputs X(t) and u(t)
and outputs a prediction of X(t+1). We can program that network
into a computer subroutine, MODEL(Xold,u,Xnew). For the most common neural
network models, both [9] and [2] describe how to program the dual subroutine
for such a model, F_MODEL(F_Xold,F_u,F_Xnew);
that subroutine inputs F_Xnew and outputs F_u and F_Xold.
It is crucial to understand that you only need one dual subroutine for any
network, regardless of whether you are using it to calculate the
derivatives of error, the derivatives of utility, or anything else. (However,
you do have to do a little extra work to make sure it outputs derivatives with
respect to both weights and inputs; in this example, I only show
those derivatives which I need in this learning design.)
To adapt the
ELF controller, we could iterate over the following steps:
1. Initialize F_gamma_total, F_uprime_total, and F_X(T+1) to zeroes.
2. For each time, t=1 to time T, calculate X(t) and U(X(t)) by calling three subroutines in order:
      CALL ELF(u(t-1),X(t-1),gamma,uprime)   (to calculate u(t-1))
      CALL MODEL(X(t-1),u(t-1),X(t))   (to calculate X(t))
      CALL U(X(t))
3. For each time, starting from t=T and working back to t=0, perform the following calculations in order:
      CALL F_MODEL(F0_X(t),F_u(t),F_X(t+1))
      F_X(t) = F0_X(t) + F_U(X(t))
      CALL F_ELF(F_gamma,F_uprime,F_u(t))
      F_gamma_total = F_gamma_total + F_gamma
      F_uprime_total = F_uprime_total + F_uprime
4. Adapt gamma and uprime:
      new gamma = old gamma - LR1*F_gamma_total
      new uprime = old uprime - LR2*F_uprime_total
The five assignment statements in this algorithm all represent the addition or subtraction of arrays, rather than scalars.
Once again,
the subroutine F_ELF used here is exactly the same subroutine as the one
described in section 2, even though it is used here to calculate a different
kind of derivative. Also, F_U represents a subroutine which inputs X(t) and outputs the derivatives of U(X(t)) with respect to X(t).
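Pulling these steps together, the whole iteration can be written compactly in the style of the earlier NumPy sketches, reusing elf_forward and f_elf. The helpers mu, model, f_model, and f_U are assumptions standing in for the membership functions, the MODEL subroutine, its dual, and the utility-gradient subroutine; as noted above, the F_X pathway back through the controller's membership functions is omitted:

import numpy as np

def utility_gradients(X0, T, mu, model, f_model, f_U, idx, gamma, uprime):
    """One iteration of steps 1-3: simulate forward in time, then sweep
    the derivatives of total utility backward through MODEL and ELF."""
    Xs, us, Rs, xs = [X0], [], [], []
    for t in range(T):                           # step 2 (forward pass)
        x = mu(Xs[t]); xs.append(x)
        u, R = elf_forward(x, idx, gamma, uprime)
        us.append(u); Rs.append(R)
        Xs.append(model(Xs[t], u))               # X(t+1) from X(t), u(t)
    F_gamma_total = [np.zeros_like(g) for g in gamma]   # step 1
    F_uprime_total = np.zeros_like(uprime)
    F_X = f_U(Xs[T])                             # F_X(T); F_X(T+1) is zero
    for t in range(T - 1, -1, -1):               # step 3 (backward pass)
        F0_X, F_u = f_model(F_X, Xs[t], us[t])   # dual of MODEL
        F_g, F_up = f_elf(F_u, us[t], xs[t], idx, gamma, uprime, Rs[t])
        for j in range(len(gamma)):
            F_gamma_total[j] += F_g[j]
        F_uprime_total += F_up
        F_X = F0_X + f_U(Xs[t])                  # F_X(t) = F0_X(t) + F_U(X(t))
    return F_gamma_total, F_uprime_total         # feed the step-4 update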
The algorithm
above should be very straightforward to implement, just like the one in section
2. To get maximum performance, however, there are many, many tricks which have
been learned through experience[2]. For example, one can actually start out by
using possible values for X(T-1) as a starting point, instead of X(0);
one can gradually work one's way back in time. Also, one must pay careful
attention to the quality of the MODEL (perhaps by testing for performance in
simulations where the model generating the simulations is known). Convergence can be sped up by using adaptive
learning rates; for example, as in [2], we could use the update rule:
new LR = old LR · (a + b · (F_new · F_old) / (F_old · F_old))   (13)

for some "arbitrary" a and b (such as 0.2 and 0.9), where F_new and F_old denote the total gradient arrays from the current and previous iterations, and the centered dot is again a dot product.
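A minimal sketch of such a rule, under the assumption that the gradients are flattened into vectors ([2] gives the full treatment):

import numpy as np

def adapt_lr(lr, F_new, F_old, a=0.2, b=0.9):
    """Grow the learning rate while successive gradients agree in
    direction; cut it sharply when they conflict (equation 13)."""
    denom = float(np.dot(F_old, F_old))
    if denom == 0.0:
        return lr
    return lr * (a + b * float(np.dot(F_new, F_old)) / denom)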
A few tips regarding debugging are given in [2] and [9]. Among the most important are perturbation tests to check the validity of the calculated derivatives, and the use of spreadsheet versions of the algorithm on simplified test problems whenever the computer program has difficulties with such tests.
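Such a perturbation check is easy to write down; the sketch below compares each derivative produced by a dual subroutine against a central finite difference (the flat parameter vector and the tolerance are illustrative assumptions):

import numpy as np

def check_derivatives(f, w, analytic, eps=1e-6, tol=1e-4):
    """Compare analytic derivatives of the scalar function f at w
    against central finite differences, one component at a time."""
    for k in range(len(w)):
        wp, wm = w.copy(), w.copy()
        wp[k] += eps; wm[k] -= eps
        numeric = (f(wp) - f(wm)) / (2.0 * eps)
        if abs(numeric - analytic[k]) > tol * (1.0 + abs(numeric)):
            print(f"mismatch at component {k}: {numeric} vs {analytic[k]}")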
4. Reasoning, Planning and Intelligence
Lotfi Zadeh has
argued that any capability which exists in classical AI can be replicated as
well in fuzzy logic. One can simply "fuzzify" the AI system, whatever
it may be. In exactly the same way, one could "elasticize" that fuzzy
system, by using the alternative "AND" operator described above.
("OR" operators follow trivially from AND, if we define
"NOT" as one minus the original truth value.) That, in turn, permits
one to use neural network learning methods to adapt any kind of AI system,
including systems used for complex reasoning and planning.
When inference
structures become complex, Zadeh has pointed out that they are not always
feedforward. Sometimes we must solve a system of simultaneous equations,
in order to calculate the output of such an inference structure. Nevertheless,
the procedures of section 2 (and 3) still work out, in a straightforward way.
For each example of reasoning, t, we can still calculate the derivatives
of error (i.e. actual minus desired output) with reference to the structure
used at time t, by backpropagating through that inference structure. If
the inference structure involves simultaneous equations, one can use the
procedures given in chapter 3 of [2] to backpropagate through
simultaneous-recurrent systems.
As a practical
matter, classical AI has found it very difficult to address the complex issues
posed by real-time control applications. Many people believe that Albus'
Real-time Control System (RCS)[3] represents the best capabilities now in existence. RCS includes a number of important user interfaces, and labelling
conventions, etc.; however, underneath all of that, the system is basically
built up from a set of tables of if-then rules. Each individual
table is exactly like the simple if-then rules used in conventional fuzzy
control, except that each table may input from or output to a shared memory, instead of just X or u. Clearly it should be
straightforward to fuzzify such tables. The resulting system then turns out to
be a recurrent network, instead of a feedforward network, because of the shared
memory; however, [2] gives methods to adapt recurrent structures.
Strictly
speaking, many of us believe that planning and reasoning can be achieved by
neural networks -- as in the brain -- without any need to hardwire the
complex notions stressed in AI. This is a complex subject, discussed at great
length in [1,2,11-13]. The neurocontrol designs now in use clearly do not
display those kinds of capabilities; however, a ladder of designs has been spelled
out, at least on paper, which should be able to carry us that high, if only we
have the will to climb higher up that ladder, one step at a time. Insights from
AI may yet be of great value to us as we try to climb up that ladder, by
suggesting more sophisticated neural network designs[2,8].
References
1. P.Werbos,
Neurocontrol and fuzzy logic: connections and designs, Int'l J. Approx.
Reasoning, Feb. 1992. A less complete version was in R.Lea &
J.Villareal, eds., Proc. Second Joint Technology Workshop on Neural Networks
and Fuzzy Logic (April 1990). NASA Conference Pub. 10061, Feb. 1991.
2. D.White
& D. Sofge, eds, Handbook of Intelligent Control: Neural, Fuzzy and
Adaptive Approaches, Van Nostrand, September 1992.
3. M.Herman,
J.Albus and T.Hong, Intelligent control for multiple autonomous undersea
vehicles,
in W.Miller, Sutton & Werbos, eds, Neural Networks
for Control, MIT Press, 1990.
4. Yasuhiko
Dote, Fuzzy and neural network controllers, in M.Padgett, ed, Proc. Second Workshop on Neural Networks:
Academic/Industrial/NASA/Defense, Society for Computer Simulation, 1991.
5. P.Werbos, Beyond
Regression, Harvard Ph.D. thesis, 1974, reprinted in P.Werbos, The Roots
of Backpropagation: From Ordered Derivatives to Neural Networks and Political
Forecasting, Wiley, 1993.
6. R.Jacobs et
al, Adaptive mixtures of local experts, Neural Computation, 3:(1), 1991.
7. J.W.Tukey, Exploratory
Data Analysis, Addison-Wesley, 1977.
8. P.J.Werbos,
Supervised learning: can it escape from its local minimum,
WCNN93 Proceedings.
Washington DC: INNS Press (/Erlbaum), 1993.
9. P.Werbos,
Backpropagation through time: what it does and how to do it, Proc. of the
IEEE, October 1990 issue. Also reprinted with [5].
10. W. Miller,
Sutton & Werbos, Neural Networks for Control, MIT Press, 1990.
11. P.Werbos,
Neural networks and the human mind: new mathematics fits humanistic insight.
In Proc. IEEE Conf. Systems, Man and Cybernetics
(Chicago 1992), IEEE, 1992. Also reprinted with [5].
12. P.G.Madhavan et al, eds., Neuroscience and Neural Networks: Efforts Towards an Empirical Synthesis (CNS 92 Proceedings). Preliminary edition available from Madhavan, Dept. of Electrical Engineering, Indiana University-Purdue University at Indianapolis (IUPUI), Indiana.
13. P.Werbos,
The cytoskeleton: why it may be crucial to human learning and to
neurocontrol,
Nanobiology,
Vol. 1, No. 1, 1992.
14. P.Werbos,
Backpropagation and neurocontrol: a review and prospectus, in IJCNN
Proceedings
(Washington D.C.), IEEE, June 1989.
15. A.Barto, Sutton & Anderson, Neuronlike elements that can solve difficult learning control problems, IEEE Trans. Systems, Man, Cyber., 13(5):835-846, 1983.
16. T.Hrycej,
Model-based training method for neural controllers, in I.Aleksander &
J.Taylor, eds, Artificial Neural Networks 2, North Holland, 1992.
17. P. Werbos
and R. Santiago, Neurocontrol, Above Threshold (INNS), Vol. 2, No. 2,
1993.
APPENDIX: PSEUDOCODE FOR THE PROCEDURES ABOVE
This Appendix
provides pseudocode for many of the designs discussed above. By the time this
paper appears in print, additional pseudocode and actual running code will
probably be available from BehavHeuristics, Inc. (BHI), of College Park,
Maryland, which has patents pending both for ELF and for a number of the
adaptation techniques mentioned here.
Equations 2
through 5 -- giving conventional fuzzy control -- can be expressed very easily
in pseudocode, in something which looks like a computer subroutine:
SUBROUTINE FUZZ(u,X);
REAL u(n),X(m),x(na),R(r),RSIGMA,uprime(n,r),running_product,running_sum;
REAL FUNCTION MU(i,X);
INTEGER j,k,nj(r),i(r,na);
/* First implement equation 2. Use k instead of i as the loop index,
   because i is the name of an array. */
FOR k=1 TO na;
    x(k) = MU(k,X);
end;
/* Next implement equation 3. */
FOR j=1 TO r;
    running_product=1;
    FOR k=1 TO nj(j);
        running_product=running_product*x(i(j,k));
    end;
    R(j)=running_product;
end;
/* Next implement equation 5 */
running_sum=0;
FOR j=1 TO r;
    running_sum=running_sum + R(j);
end;
RSIGMA=1/running_sum;
/* Next implement equation 4 */
FOR k=1 TO n;
    running_sum=0;
    FOR j=1 TO r;
        running_sum=running_sum+R(j)*uprime(k,j);
    end;
    u(k)=running_sum*RSIGMA;
end;
end;
The subroutine above inputs the sensor array X and outputs the control array u. The arrays uprime and i and the function MU represent u'(j), i_{j,k}, and the set of membership functions, respectively; they need to be generated in additional, supplementary computer code. In this code, I allocated a huge amount of space for the array "i", just to keep things simple; in actuality, an efficient program would be different, and harder to understand. This pseudocode is given only to help explain the basic ideas in this paper.
To implement elastic fuzzy control, we can write a new subroutine ELF(u,X,gamma,uprime). This new subroutine is exactly like the subroutine FUZZ above, except that we can allocate space for the array gamma by:

REAL gamma(r,0:na)

and we replace the block which implemented equation 3 by:

/* Implement equation 6 */
FOR j=1 TO r;
    running_product = gamma(j,0);
    FOR k=1 TO nj(j);
        running_product=running_product*(x(i(j,k))**gamma(j,k));
    end;
    R(j)=running_product;
end;
To adapt this ELF network in a supervised control situation, we may use the following program, which corresponds to the algorithm described in section 2:

INTEGER iter,t,T,k,j,nj(r);
REAL gamma(r,0:na),uprime(n,r),F_gamma_total(r,0:na),F_uprime_total(n,r),
     F_gamma(r,0:na),F_uprime(n,r),X(m,T),ustar(n,T),u(n),F_u(n),lr1,lr2;
DO iter=1 TO maximum_iterations;
    /* First implement step 1 */
    FOR j=1 TO r;
        FOR k=0 TO nj(j);
            F_gamma_total(j,k)=0;
        end;
        FOR k=1 TO n;
            F_uprime_total(k,j)=0;
        end;
    end;
    /* Next implement step 2, starting with 2a */
    FOR t=1 TO T;
        CALL ELF(u,X(,t),gamma,uprime);
        /* Next implement step 2b */
        FOR k=1 TO n;
            F_u(k)=2*(u(k) - ustar(k,t));
        end;
        /* Express step 2c as a subroutine */
        CALL F_ELF(F_gamma,F_uprime,F_u);
        /* Implement step 2d */
        FOR j=1 TO r;
            FOR k=0 TO nj(j);
                F_gamma_total(j,k)=F_gamma_total(j,k)+F_gamma(j,k);
            end;
            FOR k=1 TO n;
                F_uprime_total(k,j)=F_uprime_total(k,j)+F_uprime(k,j);
            end;
        end;
    end;
    /* Finally, step 3 */
    FOR j=1 TO r;
        FOR k=0 TO nj(j);
            gamma(j,k)=gamma(j,k)-lr1*F_gamma_total(j,k);
        end;
        FOR k=1 TO n;
            uprime(k,j)=uprime(k,j)-lr2*F_uprime_total(k,j);
        end;
    end;
end;
To program the subroutine F_ELF, which implements equations 8 through 11, you can start from the following pseudocode:

SUBROUTINE F_ELF(F_gamma,F_uprime,F_u);
REAL u(n),X(m),x(na),gamma(r,0:na),uprime(n,r),base,F_gamma(r,0:na),
     F_uprime(n,r),running_sum,F_u(n),R(r),RSIGMA,F_R(r);
INTEGER nj(r),k,j,i(r,na);
/* First calculate F_u dot u, the scalar "base". */
running_sum=0;
FOR k=1 TO n;
    running_sum=running_sum + F_u(k)*u(k);
end;
base=running_sum;
/* Next, implement equations 8 through 11 for each rule j */
FOR j=1 TO r;
    /* Equation 8 */
    FOR k=1 TO n;
        F_uprime(k,j)=F_u(k)*R(j)*RSIGMA;
    end;
    /* Equation 9 */
    running_sum=0;
    FOR k=1 TO n;
        running_sum=running_sum+F_u(k)*uprime(k,j);
    end;
    F_R(j)=RSIGMA*(running_sum - base);
    /* Equation 10 */
    FOR k=1 TO nj(j);
        F_gamma(j,k)=F_R(j)*R(j)*log(x(i(j,k)));
    end;
    /* Equation 11 */
    F_gamma(j,0)=F_R(j)*R(j)/gamma(j,0);
end;
end;

(As noted in section 2, this subroutine also needs access to the values of u, x, R, and RSIGMA from the preceding call to ELF, which I have not listed explicitly as arguments.)