Elastic Fuzzy Logic: A Better Fit To Neurocontrol and True Intelligence

Paul J. Werbos
Room 1151, National Science Foundation*
Washington, D.C., USA 20550

Abstract
Since 1990 or
so[1], there has been great interest in using adaptive methods taken
from neural network theory to adapt fuzzy-logic networks or other fuzzy
systems. The basic idea is to use fuzzy logic as a kind of translation
technology, to go back and forth between the words of a human expert and the
equations of a controller, classifier, or other useful system. One can then use
neural network methods to adapt that system, so as to improve
performance.
Designs of
this sort have already been useful. However, the existing designs do not live
up to the full potential of this approach, and they do not achieve anything
like brain-style "intelligence." This paper will propose a two-fold
approach to achieving the full potential of such hybrid systems: (1) the use of
elastic fuzzy logic (ELF), a new extension of fuzzy logic which makes it
possible to combine the best of fuzzy logic and neural networks; (2) the use of
advanced learning techniques -- using some ELF components -- which make
it possible to perform true planning or optimization over time on an adaptive
basis[2]. Pseudocode examples will be given to help in application. The paper
will also discuss symbolic reasoning, and links to the Real-time Control System
(RCS) of Albus [3], which represents the state of the art in classical AI for
control.
1. Introduction
How can we
build artificial systems which are truly intelligent, in a brain-like sense,
and which incorporate both learning and the efficient use of distributed systems?
How can we build useful systems
capable of learning and implementing strategies of action
or plans to accomplish difficult tasks or solve problems over time?
______________________________________________________
*The views herein are those of the author, developed and
written up on personal time, and not the views of NSF. BehavHeuristics, Inc.
(BHI) of College Park, Maryland, has patents pending on ELF and on some of the
adaptation techniques discussed here, as well as more advanced techniques.
This paper
will describe a two-pronged approach to these classical problems, based on
elastic fuzzy logic (ELF) and advanced learning designs taken from
neurocontrol. (I use "ELF" instead of the "correct"
abbreviation EFL, because "ELF" is easier to pronounce.) It will also
explain why the existing approaches to blending fuzzy logic and neural networks
do not quite solve these problems.
The approach
here is very broad and very general, as I will describe in Section 4. However,
most engineers prefer to begin with a simple and useful example; therefore, the
first part of this paper will discuss ELF in the context of conventional,
simple fuzzy control. Most readers familiar with classical fuzzy control would
regard this as the general case, rather than a specific example;
however, section 3 will go on to describe more general control tasks and
designs, very briefly. Section 3 will review some concepts from neurocontrol,
and explain why the learning approaches used in most applications today are
very limited in their power, compared with the best state of the art.
This paper
does not claim that ELF is the best design to use on all problems requiring
some version of "intelligent control;" however, in those applications
where fuzzy control is useful (of which many, many examples have been
published), I claim that ELF does permit the use of well-studied, well-proven
generic methods from the neural network field, whose link to brain-like intelligence
has been discussed at very great length elsewhere. To prove that ELF permits
the use of those methods, I will simply specify how to implement these hybrid
designs.
2. ELF in Conventional Fuzzy Control
This section
will first review conventional fuzzy control, and then define what ELF is
in that context.
It will give details on one way of implementing it. Then
it will explain the advantages of ELF compared with other forms of adaptable
fuzzy control. The implementation details may seem unnecessarily complete to
some readers, but they should make it easier to understand and implement the
more powerful designs to be discussed in section 3. Pseudocode will be
presented in the Appendix.
Review of Conventional Fuzzy Control
In
conventional fuzzy control, the expert provides a set of rules -- expressed in
words -- and some information about what the words in the rules actually mean.
Standard fuzzy logic procedures are used to translate this information
into a set of equations used to define a controller. We can think of these
equations as a kind of neural network with two hidden layers. Conventional
fuzzy control is one way of addressing the more general task of
"cloning" or imitating a pre-existing expert.
At each time
t, the controller will have access to a set of input variables or sensor
readings, X_{1}(t) through X_{m}(t), which we can think of as a
vector X(t). It will emit a set of actuator or motor or action
variables, u_{1}(t) through u_{n}(t), which we can think of as
a vector u(t). The human expert must first be told what inputs X
and controls u are available, before describing a controller.
(Of course, the expert himself may specify the inputs and controls required for
the particular application.)
After the
expert knows what the inputs and outputs are, he then gives you a list of r
rules, expressed in words. Each rule must be of the form:
If A and
B and ... and C then do D,
where A through D are words or phrases. (D is called the
"verb" in this rule.)
Usually, the same word A appears in more than one rule,
but each rule uses a different subset of the available words. There is no
requirement that rules "divide up" the input space into distinct
partitions;
rules usually overlap each other, and one rule may be
more specific than other rules, and so on.
Many, many examples of this appear in the literature.
For each word
or phrase used in the rule, the expert must also supply a definition of
what the word means, relative to the actual input variables X and
control variables u available to the controller. Thus for each
input word A, he must specify a membership function μ_{A}(X)
which indicates the degree to which the word A is true for each
situation X. For each verb, D, he provides the membership
function μ_{D}(u), in theory; in practice, he
usually just specifies u'(D), the control settings which best
match the word D.
(A strict fuzzy logician would describe u'(D)
as the centroid of the function μ_{D}(u).)
The
information from the expert is translated into a two-hidden-layer network as
follows.
First, we must
do some bookkeeping. The set of input words across the entire system are
put into an ordered list. The first word may be called A_{1}, the
second A_{2}, and so on, up to the last word, A_{na}.
The rules also form a list, from rule number 1 to rule
number r. For each rule, rule number j, we must look up each input word on the
overall list of words A_{i}; thus if "B" is the second word
in rule number j, then word B should appear as A_{k} on the overall
list, for some value of k. We may define "i_{j,2}" as that
value of k. More generally, we may define i_{j,η} as the value of k such that A_{k} matches the ηth input word in rule number j. Using this notation, rule
number j may be expressed as:
    If A_{i_{j,1}} and A_{i_{j,2}} and ... and A_{i_{j,n_{j}}}, then u = u'(j),    (1)

where n_{j} is the number of input words in rule number j,
and where u'(j) refers to u'(D) for the verb D of
rule number j.
Using this
notation, the information from the expert is translated into a two-hidden-layer
network as follows. The first hidden layer is the membership layer:

    μ_{i}(t) = μ_{A_{i}}(X(t)),  i = 1, ..., na.    (2)

The next hidden layer is the layer of rule activation,
which calculates the degree to which rule number j applies to situation X:

    R_{j}(t) = ∏_{η=1}^{n_{j}} μ_{i_{j,η}}(t).    (3)

The output layer is the simple defuzzification rule used
in most practical applications[4]:
    u(t) = (1/R) Σ_{j=1}^{r} R_{j}(t) u'(j),    (4)
where I define:
    R = Σ_{j=1}^{r} R_{j}(t).    (5)
Theoretical papers often discuss alternative versions of
this, such as the use of the minimum function instead of a product in equation
3; however, empirical comparisons reported in Japan (discussed at length in the
1992 Iizuka conference) have shown that the simple form shown in equation 3
usually leads to better results. This is what I would expect, because equation
3 is smooth and differentiable (more like human experts), but the traditional
minimum rule is crisper and more artificial.
In the
Appendix, I give pseudocode for a subroutine, FUZZ(u,X), which implements these
equations.
Naturally, the subroutine inputs the array X, and outputs
the array u.
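As a concrete illustration, here is a minimal Python sketch of the FUZZ subroutine of equations 2-5. The triangular membership functions, the three-rule rule base, and all numeric values are my own illustrative assumptions, not taken from the paper or its Appendix.

```python
import numpy as np

def tri(x, left, center, right):
    """Triangular membership function: 1 at center, 0 outside [left, right]."""
    return max(0.0, min((x - left) / (center - left),
                        (right - x) / (right - center)))

# Word definitions mu_A(X); shapes and rules below are illustrative assumptions.
mu = [
    lambda X: tri(X[0], -2.0, -1.0, 0.0),   # A1: "error is negative"
    lambda X: tri(X[0], -1.0,  0.0, 1.0),   # A2: "error is near zero"
    lambda X: tri(X[0],  0.0,  1.0, 2.0),   # A3: "error is positive"
]
rule_words = [[0], [1], [2]]                 # i_{j,eta}: word indices per rule
uprime = np.array([[1.0], [0.0], [-1.0]])    # u'(j): action for each rule

def FUZZ(X):
    """Equations 2-5: memberships, rule activations R_j, defuzzified output u."""
    R = np.array([np.prod([mu[k](X) for k in idx]) for idx in rule_words])  # (3)
    return (R[:, None] * uprime).sum(axis=0) / R.sum()                      # (4),(5)
```

For example, `FUZZ(np.array([-0.5]))` blends the "negative" and "near zero" rules equally. Note that the division by R assumes at least one rule fires; inputs outside the covered range would need a guard.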
Elastic Fuzzy Control: the Basics
In elastic
fuzzy logic, the words coming from the expert can be translated initially into
equations which are absolutely equivalent to those above. However,
additional parameters are inserted into the system for use in later adaptation.
Equation 3 is replaced by the equation:
    R_{j}(t) = γ_{j,0} ∏_{η=1}^{n_{j}} ( μ_{i_{j,η}}(t) )^{γ_{j,η}},    (6)
where the gamma parameters are all set to one initially.
The Appendix describes how to code this up as a new subroutine,
ELF(u,X,gamma,uprime), very similar to our old subroutine FUZZ.
Intuitively,
the γ_{j,0} parameters represent the strength or degree of validity
of each rule. The parameters γ_{j,k} represent the importance
of each condition (input word) to the applicability of each rule. For example,
when γ_{j,k} equals 2, in ELF, this would be equivalent to having
that word appear twice as an input condition, in conventional fuzzy
control.
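A small Python sketch of equation 6 makes the elasticity concrete. The gaussian membership functions and the one-word-per-rule base are illustrative assumptions; the point is only that setting every gamma to one recovers conventional fuzzy control, and that an exponent of 2 reproduces a word appearing twice.

```python
import numpy as np

# Gaussian memberships and the three one-word rules are illustrative assumptions.
# (The c=c default argument freezes each center inside its lambda.)
mu = [lambda X, c=c: float(np.exp(-(X[0] - c) ** 2)) for c in (-1.0, 0.0, 1.0)]
rule_words = [[0], [1], [2]]                 # i_{j,eta}: word indices per rule
uprime = np.array([[1.0], [0.0], [-1.0]])    # u'(j) from the expert

def ELF(X, gamma):
    """Equation 6: R_j = gamma_{j,0} * prod_eta mu_{i_{j,eta}}(X)**gamma_{j,eta}."""
    R = np.array([
        g[0] * np.prod([mu[k](X) ** ge for k, ge in zip(idx, g[1:])])
        for g, idx in zip(gamma, rule_words)
    ])
    return (R[:, None] * uprime).sum(axis=0) / R.sum()

# With every gamma set to one, ELF reproduces conventional fuzzy control exactly.
gamma0 = np.ones((3, 2))     # gamma[j] = [gamma_{j,0}, gamma_{j,1}]
```

Calling `ELF(np.array([0.4]), gamma0)` gives the same output as the plain product form of equation 3; raising an exponent gamma_{j,1} to 2 squares the corresponding membership, exactly as if the word appeared twice in the rule.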
At the end of
this section, I will discuss the advantages of this approach in more detail,
relative to the many alternatives in existence. First, however, I will describe
an example of how to use it.
Adapting ELF by Supervised Control
At present,
when people adapt fuzzy control systems, they generally use a simple
adaptation procedure which is called "supervised control" in the
neurocontrol field[2]. Supervised control has very limited capabilities.
Section 3 will describe its limitations, and how to do better. However, as a
basic introduction, I will now describe how to adapt an ELF system in
supervised control. Supervised control, in turn, is merely one example of how
to perform the task of cloning a preexisting expert or controller; section 3
will describe other tasks of interest in control.
In supervised
control, the user first creates a database of "correct" control
actions.
For each example, example number t, the user provides a
vector of sensor inputs X(t) and a vector of desired or correct
control actions, u^{*}(t). The weights in the system are
usually adapted so as to minimize:
    E_{tot} = Σ_{t} E(t) = Σ_{t} Σ_{i=1}^{n} ( u_{i}(t) - u_{i}^{*}(t) )²    (7)
In ELF, we would most often define the weights as the combination
of the gamma parameters and the u' vectors.
To minimize E_{tot}
as a function of the weights, we can use the conventional neural-network
technique of backpropagation, which I first developed in 1974 [5]. This can be
described as an iterative approach.
(See [2,5] for real-time versions as well.) On the first
iteration, we initialize the gamma parameters to one, and initialize uprime to
the values given by the expert. On each subsequent iteration, we can take the
following steps:
1. Initialize
the arrays of derivatives F_gamma_total(j,k) and
F_uprime_total(j,k) to zero.
2. For each
example t do:
2a. CALL
ELF(u,X(t),gamma,uprime)
2b. Calculate
the vector of derivatives of E(t) with respect to the
components of u(t):
F_u(t) = 2*(u(t) - u^{*}(t))
2c.
Using backpropagation -- the chain rule for ordered derivatives --
work the
derivatives back to calculate F_gamma(t) and F_uprime(t),
the
derivatives of E(t) with regard to the gamma and uprime parameters.
2d. Update
the array F_gamma_total to F_gamma_total plus F_gamma(t), and
likewise
for F_uprime_total.
3. Update the
arrays of parameters:
new gamma
= old gamma - LR1 * F_gamma_total
new
uprime = old uprime - LR2 * F_uprime_total,
where LR1 and LR2 are positive scalar "learning
rates" chosen for convenience. Note how the
assignment statements here all refer to array
operations rather than scalar operations.
Pseudocode for this algorithm is given in the Appendix.
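The iteration above can be sketched in Python as follows. This is a minimal stand-in, not the Appendix pseudocode: the gaussian memberships and one-word rules are illustrative assumptions, and central differences are used in place of the F_ELF dual subroutine, which computes the same derivatives far more cheaply via backpropagation.

```python
import numpy as np

# Toy setup: gaussian memberships, one word per rule (illustrative assumptions).
mu = [lambda X, c=c: float(np.exp(-(X[0] - c) ** 2)) for c in (-1.0, 0.0, 1.0)]
rule_words = [[0], [1], [2]]

def ELF(X, gamma, uprime):
    R = np.array([g[0] * np.prod([mu[k](X) ** ge for k, ge in zip(idx, g[1:])])
                  for g, idx in zip(gamma, rule_words)])
    return (R[:, None] * uprime).sum(axis=0) / R.sum()

def E_tot(gamma, uprime, data):                      # equation 7
    return sum(float(((ELF(X, gamma, uprime) - u_star) ** 2).sum())
               for X, u_star in data)

def train(gamma, uprime, data, LR1=0.05, LR2=0.05, iters=100, h=1e-6):
    """Steps 1-3 of the text; finite differences stand in for F_ELF."""
    for _ in range(iters):
        for arr, LR in ((gamma, LR1), (uprime, LR2)):
            grad = np.zeros_like(arr)                # F_..._total accumulator
            for i in np.ndindex(arr.shape):
                old = arr[i]
                arr[i] = old + h; Ep = E_tot(gamma, uprime, data)
                arr[i] = old - h; Em = E_tot(gamma, uprime, data)
                arr[i] = old
                grad[i] = (Ep - Em) / (2 * h)
            arr -= LR * grad                         # step 3: array update
    return gamma, uprime
```

Starting from gamma = 1 and the expert's uprime, a few dozen iterations visibly reduce E_tot on a small training database.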
All of this is
very straightforward, and there are many ways known in the neural network field
to improve performance[2]. The one key challenge remaining is to program the dual
subroutine, F_ELF, which inputs the derivatives in the array F_u and outputs
the derivatives in the arrays F_gamma and F_uprime.
(That subroutine must also have access to other
information, but I will not list that explicitly as additional
arguments, because that might confuse the underlying idea
here.)
In order to
calculate the derivatives efficiently, starting from knowledge of F_u, we can
use the chain rule for ordered derivatives[2,5] to derive the following
equations:
    F_R_{j}(t) = (1/R) F_u(t) · ( u'(j) - u(t) )    (8)

    F_γ_{j,0} = F_R_{j}(t) R_{j}(t) / γ_{j,0}    (9)

    F_γ_{j,η} = F_R_{j}(t) R_{j}(t) ln μ_{i_{j,η}}(t)    (10)

    F_u'(j) = ( R_{j}(t) / R ) F_u(t)    (11)
where the centered dot represents a vector dot product,
and where F_u' is a vector.
This dual
subroutine could be expanded further, so as to output F_X, the derivatives of E
with respect to the inputs X(t); however, that would require
knowledge of the membership functions (or another dual subroutine, F_MU).
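Equations 8-11 translate directly into a dual subroutine. The toy memberships and rule base below are illustrative assumptions; the derivative formulas themselves follow the chain rule for the forward equations above, and the test checks them against finite differences.

```python
import numpy as np

# Toy system (gaussian memberships, one word per rule): illustrative assumptions.
mu = [lambda X, c=c: float(np.exp(-(X[0] - c) ** 2)) for c in (-1.0, 0.0, 1.0)]
rule_words = [[0], [1], [2]]

def ELF(X, gamma, uprime):
    """Forward pass; returns rule activations R and the output u."""
    R = np.array([g[0] * np.prod([mu[k](X) ** ge for k, ge in zip(idx, g[1:])])
                  for g, idx in zip(gamma, rule_words)])
    return R, (R[:, None] * uprime).sum(axis=0) / R.sum()

def F_ELF(F_u, X, gamma, uprime):
    """Dual subroutine for equations 8-11: from F_u = dE/du, recover
    F_gamma and F_uprime, the derivatives of E w.r.t. the parameters."""
    R, u = ELF(X, gamma, uprime)
    Rtot = R.sum()
    F_gamma = np.zeros_like(gamma)
    F_uprime = np.zeros_like(uprime)
    for j, (g, idx) in enumerate(zip(gamma, rule_words)):
        F_R = float(F_u @ (uprime[j] - u)) / Rtot                # equation 8
        F_gamma[j, 0] = F_R * R[j] / g[0]                        # equation 9
        for eta, k in enumerate(idx):
            F_gamma[j, 1 + eta] = F_R * R[j] * np.log(mu[k](X))  # equation 10
        F_uprime[j] = (R[j] / Rtot) * F_u                        # equation 11
    return F_gamma, F_uprime
```

A perturbation check of this kind (compare each returned derivative against a finite-difference estimate) is exactly the debugging step recommended later in the paper.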
Practical Advantages of Elastic Fuzzy Control
ELF is
certainly not the only form of adaptable fuzzy logic proposed in the past few
years; however, it has a number of unique advantages.
The most
common version of adaptable fuzzy logic is based on putting parameters into the
membership functions μ
rather than the rules. This has two disadvantages.
First, by
changing the function μ_{A}(X), we are changing the definition
of the word A. Thus the computer is no longer defining words in the same way as
the expert. This could reduce our ability to explain to the expert what
the adapted version of the controller is doing, or even what was changed in
adaptation. When A is a simple word, representing a function of only one
sensor input X_{i}, then this may not be a real problem; however, for
more complex words, it could be a problem.
Second -- and
more important -- changing the membership functions does not allow you to
change the rules themselves; thus the scope for adaptation is very
limited. This kind of adaptation can give better results than fixed logic, but
it is still quite limited. It does not provide a way to change the basic
structure of the "fuzzy partition" of the input space.
When the word
A depends on X in a complex way, you might try a very different
approach to adapting membership functions. You might present the expert with
many different examples of X, and ask him to say how much
the word A applies in each example. You could then use a simple neural network
to learn the mapping from X to μ_{A}. Neural
networks are often described as "black boxes," but in this
application the network can yield a membership function which
matches the expert's language more accurately; it therefore makes the overall
controller more of a "white box" to the expert.
In the past,
it has been suggested that the control vectors, u'(j), for each
rule j, be replaced by a neural network, inputting the vector X.
This gives considerable flexibility, but the adapted version is then still a
black box, to some extent. Could this be any better than simply using a single
neural network for the entire controller? In fact, it could be better, because
each of the rule-specific neural networks would be "local"; as with
the work of Jacobs et al[6], one might expect faster real-time learning in such
a system. Local networks have many practical advantages[2]. Nevertheless, this
scheme does not result in a white box model. Also, this scheme could actually
be combined with elastic fuzzy logic where needed.
Other
researchers have proposed something like ELF, but without the γ_{j,k} exponents.
These exponents play a crucial role in adapting the content of each
rule; therefore, they are crucial in providing more complete adaptability.
Crucial to the
use of ELF is our ability to explain the adapted controller back to the
expert. The γ_{j,0}
parameters can be reported back as the
"strength" or "degree of validity" of each rule. The
parameters γ_{j,k} can be described as the "importance" of each
condition (input word) to the applicability of the rule. In fact, the
parameters γ_{j,k} correspond exactly to the "elasticities" used
by economists; the whole apparatus used by economists to explain the idea of
"elasticity" can be used here as well. If economists can understand
intuitively what an elasticity is, then engineers (the
usual experts in control applications) should not have great difficulty with
them. Also, as a further guide to explaining them, Ron Yager has pointed out
that a γ_{j,k} of 2 is equivalent to having the word appear twice
in the rule. In general, a γ_{j,k} equal to a
positive integer m is equivalent to having the word appear m times in the rule.
Another
advantage of ELF is the possibility of adaptive adding and pruning
of rules, and of words within rules. When γ parameters are near zero, then the corresponding word or
rule can be removed. This is really just a special case of the general
procedure of pruning connections and neurons in neural networks -- a
well-established technique. Likewise, new connections or rules could be tested
out safely, by inserting them with γ's initialized to zero, and made effective only as
adaptation makes them different from zero. In summary, neural network
techniques can be used with ELF nets to adapt the very structure of the
controller. The potential scope for adaptation is very great.
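A minimal sketch of this gamma-based structure adaptation follows; the threshold value and the plain-list bookkeeping are my own assumptions (rules vary in length, so ragged lists are more natural here than arrays).

```python
# Prune a rule whose strength gamma_{j,0} has adapted to near zero, and prune
# a word within a rule whose exponent gamma_{j,eta} has. Threshold eps is an
# illustrative choice, not a value from the paper.
def prune(gamma, rule_words, uprime, eps=1e-3):
    """gamma[j] = [gamma_{j,0}, gamma_{j,1}, ...]; returns the pruned system."""
    new_gamma, new_words, new_uprime = [], [], []
    for g, words, up in zip(gamma, rule_words, uprime):
        if abs(g[0]) <= eps:
            continue                                  # drop the whole rule
        kept = [(ge, k) for ge, k in zip(g[1:], words) if abs(ge) > eps]
        new_gamma.append([g[0]] + [ge for ge, _ in kept])
        new_words.append([k for _, k in kept])
        new_uprime.append(up)
    return new_gamma, new_words, new_uprime
```

Adding a candidate rule is the mirror image: append it with its gammas initialized to zero, and let adaptation decide whether it becomes effective.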
The scope for
adaptation, while great, is not unlimited. The choice of words, A_{i},
does limit the allowed set of rules. In any particular application, one can
test whether this limitation has measurable consequences simply by using a
neural network instead (or making the u' into neural networks),
and seeing whether better performance can be had that way. Similar approaches
might be used to develop tools to help the expert expand his vocabulary, using
display techniques similar in flavor to exploratory data analysis[7].
The ELF
network has some very interesting capabilities as a type of neural network,
even apart from its link to fuzzy logic. It combines the ability to cope with
many input variables, in a smooth and adaptive way, together with an
essentially "local" structure. This kind of capability has been very
important (and hard to find) in real-time control applications of neural
networks[8].
3. Alternative Learning Designs and Neurocontrol
Introduction
Broadly
speaking, neural networks have been used in four different ways in useful
applications in control[2,9]:
1. As subsystems of a larger system, used for pattern recognition, diagnostics,
sensor fusion, dynamic system identification, etc.;

2. As "clones" which learn to imitate human experts or other existing
controllers;

3. As "tracking" systems, which make a robot arm track a desired trajectory in
space, make a chemical plant stay at a desired setpoint, make a plant track a
desired reference model, etc.;

4. As optimization systems, which learn strategies of action which maximize
some measure of utility (or performance or profit or goal satisfaction, etc.)
or minimize some measure of cost over future time.
Every one of these four tasks is associated with a large
body of learning designs, specifically aimed at each task.
Only the
fourth set of designs has any hope of explaining or replicating true brain-like
intelligence[11-13]. This follows from an obvious process of elimination: the
brain uses neural nets to compute actual control signals, not just as
sources of information to an external controller. The brain has some subtle capabilities
involving imitation, but human learning involves more than just slavish cloning;
children do not simply implement every rule their parents tell them, when their
parents tell them, and their parents do not give them complete "training
sets" of how to move every muscle at every moment in time. Likewise,
parents do not give their children "reference trajectories" of where
their muscles should be at every moment in time. This leaves the optimization
designs -- among all the existing, working neural designs -- as the only known
global organizing principles able to encompass all this behavior.
As a practical
matter[2], it is often useful to begin by using a simpler design to
develop a controller, and then use the resulting controller as the initial
state of a controller to be adapted by optimization methods. In a similar
way (but far more complex[1]), one may postulate special mechanisms in the
human brain that would use imitation and words to help "initialize"
the optimal control systems in the brain. The fit between these optimization
designs and the brain is discussed further in [11-13].
Supervised Fuzzy Control
The vast bulk
of adaptive fuzzy controllers today are limited to the second task above --
cloning.
The reason for this is very simple: to construct a
database of desired actions (as discussed in section 2), one must already
have some kind of controller or expert available to specify the desired
actions, to be inserted into the database.
Fuzzy clones
can still be useful in some applications. One can first ask an expert how to
control a process; thus one initializes the controller to what the expert says.
Then one can adapt the controller to match what the expert actually does
in controlling the plant. There are many, many variations of this
technique[1,2].
Unfortunately,
simple supervised control has important limitations even for the task of
cloning experts.
There are many difficult tasks which require that an
expert develop a sense of dynamics over time, based on some intuitive
understanding of phenomena which are not directly observed. In fact, if an
automatic controller for some plant must incorporate dynamics over time,
for the sake of stability or performance, then a human must also respond
to such dynamics in order to be competent. (Even humans cannot escape the laws
of mathematics!) No design which is based on static mapping from X(t)
to u(t) could adequately capture the behavior of the human expert
in that kind of application.
In that
application, one must first find a way to talk to the expert about dynamics. Of
course, it is easy enough in principle to define words which depend on time
t-1, or which are self-referring, etc.
There is a new, large literature on fuzzy semantics for
phrases like "quickly growing," etc. After one has such a dynamic
fuzzy controller, how does one adapt it to what the expert does? At that point,
one is basically engaging in adaptive system identification. One is
modelling the operator himself, using the same kinds of system identification
methods one normally uses on a plant. Adaptive system identification turns out
to be a very tricky business; some elementary approaches are given in [9], but
more robust and reliable methods are given in chapter 10 of [2]. Most failures
of neurocontrol in the past few years appear to be due to the use of elementary
approaches in system identification, in applications which demand more
robustness.
Tracking and Optimization
There are many
challenging control problems which human experts do not know how to control, or
which demand higher performance than human experts have exhibited. (It is very
irritating when a few people arrive at the conclusion that "neural networks
do not work" in some application, when they have only tried out simpler
designs which are not adequate for that application!) Also, to imitate human
learning abilities, artificial systems should be able to learn a control
strategy for the first time, without having to depend 100% on copying someone
else. Tracking and optimization designs provide a way of doing this.
As a practical
matter, the best existing designs for tracking[2] are based on treating the
task as a task in optimization. (For example, one can try to minimize tracking
error or energy consumption or some combination of the two[2,10].) Also, human
learning is not based on some kind of explicit step-by-step reference
trajectory for their lives. There are many manufacturing systems, as well,
for which the desired product is known but the trajectory which produces it at
maximum quality and minimum cost is not known. For all these reasons, I will
focus here on learning techniques for optimizing performance over time.
Strictly
speaking, supervised control can still be of use in problems which require
optimization. For example, one can ask the human expert for rules of action.
This gives the initial state of an ELF network. One can then adapt this
network, so as to imitate what the expert does instead of what he says.
One can then use the resulting controller as the initial state of a
controller adapted to optimize performance.
This multi-step approach can be very important, both in
reducing the overall difficulty of learning and in
minimizing the effects of the unavoidable real-world
difficulties involving local minima.
But how does
one train a network to go beyond the training database provided by the expert,
to improve performance beyond that of the human?
Two families
of designs are known to perform this task in a useful, efficient manner, and
can scale to large problems: (1) the backpropagation of utility; (2) adaptive
critics. The Handbook of Intelligent Control[2] describes both families
in great detail. It gives pseudocode for "main programs" which can be
used to adapt any network or system for which the dual subroutine is
known. Because this paper provides pseudocode for the ELF and F_ELF
subroutines, the reader should be able to plug in elastic fuzzy logic directly
into those main programs (though the F_X derivatives need to be added in some
cases).
In the past
few years, a variety of empirical studies have shown that these optimization
methods can indeed handle problems which had been too difficult for earlier
methods. For example, the composites division at McDonnell Douglas had spent
millions of dollars in past years, trying to control a continuous-production
system to make very high quality composite parts. (The cost of these parts is a
major problem in the aircraft industry.) AI, classical control, and extensive
physical modelling had not solved the problem, either at McDonnell-Douglas or
elsewhere. After reading [10] and [14], White and Sofge[2] decided to try
neurocontrol. The basic tracking methods did not work. But White and
Sofge did not publish a paper saying "neural networks don't work"; they
decided instead to climb up the ladder of designs, as suggested in [14], and
try out a simple adaptive critic method -- the Barto, Sutton and Anderson
method[15] -- on a simplified version of the problem. When that worked,
but did not scale to the larger problem, they remembered the parts about
scaling in [10] and [14]; thus they moved on up to an advanced adaptive critic,
combining backpropagation together with an adaptive critic design, in an
appropriate way[2]. That really solved the problem. In subsequent work, White
and Sofge have shown that a similar architecture can solve critical problems in
flight control, in thermal control and elsewhere, of even larger real-world
importance[2].
In very recent work[17], Jameson has performed
simulations showing that more complex critic designs -- using both
backpropagation and a model emulator -- can solve a simple but difficult robot
control problem which was resistant even to the kind of design used by White
and Sofge; working with me[17], Santiago -- now at BehavHeuristics -- has
shown faster, more robust learning in a simple broom-balancing application,
when more advanced critic designs are used. One of the reviewers of this paper
mentions that Jang has actually combined backpropagation with an adaptive
critic design for fuzzy systems.
The
backpropagation of utility is usually not quite so powerful; however,
Hrycej[16] of Daimler-Benz and McAvoy of Maryland[2] have shown some impressive
real-world results on automotive and chemical applications, with implications
for the environment.
Some of the
most exciting new applications of these methods have yet to be fully published,
because they are being done for commercial application. For example,
BehavHeuristics and USAir announced a new contract, months ago, which is now in
the final implementation stage, in which an advanced adaptive critic is
optimizing revenue management across all the seats flown by USAir. This is not
a small-scale broom-balancing problem! For obvious reasons, the details have
not all been published, but some sort of detailed presentation is scheduled for
WCNN94, the World Congress on Neural Networks in San Diego. A variety of
important papers have been published by Feldkamp and Puskorius and others from
Ford, in WCNN93 and elsewhere. Accurate Automation of Chattanooga, Tennessee,
has applied critics to a robust controller for the main arm of the space
shuttle, and for a prototype controller of a prototype National Aerospace
Plane, a physical prototype which they are now building for near-term flight
testing. (Contact the company directly for copies of the relevant reports.)
White and Sofge, at Neurodyne, have also succeeded in bench-scale tests of a
new controller to improve efficiency and reduce pollution in an internal
combustion engine. Applications which improve efficiency and reduce pollution
have had many real-world applications, discussed by the corporate members of
McAvoy's Neural Net Club [2].
An Example: Adapting ELF by Backpropagating Utility
Even though
the generic concepts are given in [2], the reader may find it somewhat easier
to get started if I give a more explicit example. (In fact, some readers of [2]
have told me that they found [2] much easier to implement after reading [9],
which offered a similar example, but for neural networks.) The adaptive critic
designs are more brain-like, but they are also more complicated. Therefore, I
will give an example based on the backpropagation of utility through time,
which is a good starting point.
Let us suppose
that we are trying to adapt a fuzzy controller which inputs X(t)
and outputs u(t), as before. Let us suppose that we are starting
from a fixed initial state X(0).
(It is easy to deal with the more general case, as in [2]
and [5], but I will assume one fixed starting value for now for the sake of
simplicity.) Let us suppose that we are trying to minimize:
    Σ_{t=1}^{T} U(X(t))    (12)
for a known utility function U. Let us also suppose that X(t+1)
depends only on X(t) and u(t), without noise. How
could we adapt our ELF controller?
To use the
backpropagation of utility, we must first develop an explicit model of the
plant. For example, using the techniques in [9] or in chapter 10 of [2], we
could adapt an artificial neural network which inputs X(t) and u(t)
and outputs a prediction of X(t+1). We can program that network
into a computer subroutine, MODEL(Xold,u,Xnew). For the most common neural
network models, both [9] and [2] describe how to program the dual subroutine
for such a model, F_MODEL(F_Xold,F_u,F_Xnew);
that subroutine inputs F_Xnew and outputs F_u and F_Xold.
It is crucial to understand that you only need one dual subroutine for any
network, regardless of whether you are using it to calculate the
derivatives of error, the derivatives of utility, or anything else. (However,
you do have to do a little extra work to make sure it outputs derivatives with
respect to both weights and inputs; in this example, I only show
those derivatives which I need in this learning design.)
To adapt the
ELF controller, we could iterate over the following steps:
1. Initialize F_gamma_total, F_uprime_total, and F_X(T+1) to zeroes.
2. For each
time, t=1 to time T, calculate X(t) and U(X(t)) by
calling
three
subroutines in order:
CALL ELF(u(t-1),X(t-1),gamma,uprime)   (to calculate u(t-1))
CALL MODEL(X(t-1),u(t-1),X(t))   (to calculate X(t))
CALL U(X(t))
3. For each
time, starting from t=T and working back to t=0,
perform the
following calculations in order:
CALL F_MODEL(F0_X(t),F_u(t),F_X(t+1))
F_X(t) = F0_X(t) + F_U(X(t))
CALL F_ELF(F_gamma,F_uprime,F_u(t))
F_gamma_total = F_gamma_total + F_gamma
F_uprime_total = F_uprime_total + F_uprime
4. Adapt gamma
and uprime:
new gamma
= old gamma - LR1*F_gamma_total
new uprime
= old uprime - LR2*F_uprime_total
The five assignment statements in this algorithm all represent the addition or subtraction of arrays, rather than scalars. Once again, the subroutine F_ELF used here is exactly the same subroutine as the one described in section 2, even though it is used here to calculate a different kind of derivative. Also, F_U represents a subroutine which inputs X(t) and outputs the derivatives of U(X(t)) with respect to X(t).
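To make the backward sweep concrete, here is a minimal Python sketch of the same forward/backward pattern on a toy problem: a linear plant X(t+1) = A·X(t) + B·u(t) stands in for MODEL, a linear controller u(t) = -K·X(t) stands in for the ELF network, and U(X) = X·X is the utility. The names A, B, K, total_cost and bptt_gradient are mine, not from the paper; the sketch carries derivatives back through both the model and the controller's dependence on the state.

```python
import numpy as np

def total_cost(A, B, K, X0, T):
    """Roll the toy plant forward from X0 and sum U(X)=X.X over t=1..T."""
    X, J = X0, 0.0
    for _ in range(T):
        u = -K @ X
        X = A @ X + B @ u
        J += X @ X
    return J

def bptt_gradient(A, B, K, X0, T):
    """Backpropagation through time: a forward pass (step 2), then a
    backward dual sweep (step 3) that accumulates the gradient of the
    total cost with respect to the controller weights K."""
    X, u = [X0], []
    for t in range(T):                       # forward pass
        u.append(-K @ X[t])
        X.append(A @ X[t] + B @ u[t])
    F_K = np.zeros_like(K)
    F_X = 2.0 * X[T]                         # derivative of U at the final state
    for t in range(T - 1, -1, -1):           # backward (dual) sweep
        F_u = B.T @ F_X                      # dual of the model w.r.t. u(t)
        F_K += np.outer(F_u, -X[t])          # dual of the controller w.r.t. K
        F_X = A.T @ F_X - K.T @ F_u + 2.0 * X[t]  # dual w.r.t. X(t), incl. path through controller
    return F_K
```

A perturbation check of bptt_gradient against finite differences of total_cost is the standard way to validate such code.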
The algorithm
above should be very straightforward to implement, just like the one in section
2. To get maximum performance, however, there are many, many tricks which have
been learned through experience[2]. For example, one can actually start out by
using possible values for X(T-1) as a starting point, instead of X(0);
one can gradually work one's way back in time. Also, one must pay careful
attention to the quality of the MODEL (perhaps by testing for performance in
simulations where the model generating the simulations is known). Convergence can be sped up by using adaptive
learning rates; for example, as in [2], we could use the update rule:
    new LR = old LR * ( a + b * ( F(new) . F(old) ) / ( F(old) . F(old) ) )        (13)

for some "arbitrary" a and b (such as 0.2 and 0.9), where F(new) and F(old) denote the total gradient vectors from the current and previous iterations.
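As a minimal Python sketch of such an adaptive-learning-rate rule (the exact form of equation 13 should be taken from [2]; the function below, with names of my choosing, only illustrates the idea of growing the rate when successive total gradients agree and shrinking it when they oppose):

```python
import numpy as np

def adapt_learning_rate(lr, grad, prev_grad, a=0.2, b=0.9):
    """Illustrative adaptive learning rate: scale lr by a + b*c, where c
    is the projection of the new total gradient onto the old one,
    normalized by the old gradient's squared length. Aligned gradients
    (c near 1) raise the rate; opposed gradients (c near -1) cut it.
    The constants a and b play the role of the "arbitrary" a and b."""
    denom = prev_grad @ prev_grad
    if denom == 0.0:
        return lr
    c = (grad @ prev_grad) / denom
    return lr * max(a + b * c, 0.1 * a)   # floor keeps the rate positive
```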
A few tips regarding debugging are given in [2] and [9]. Among the most important are the use of perturbations to check the validity of derivatives as calculated, and the use of spreadsheet versions of the algorithm on simplified test problems whenever the computer program has difficulties with such tests.
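The perturbation check mentioned above can be packaged as a small utility. A minimal Python sketch (the function names are mine):

```python
import numpy as np

def check_gradient(f, grad, w, eps=1e-6):
    """Perturbation check: compare an analytic gradient grad(w) of a
    scalar function f(w) against central finite differences, one weight
    at a time. Returns the worst absolute discrepancy found."""
    g = grad(w)
    worst = 0.0
    for k in range(len(w)):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[k] += eps
        w_minus[k] -= eps
        num = (f(w_plus) - f(w_minus)) / (2 * eps)
        worst = max(worst, abs(num - g[k]))
    return worst
```

If the worst discrepancy is much larger than the finite-difference noise floor, the dual subroutine has a bug.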
4.
Reasoning, Planning and Intelligence
Lotfi Zadeh has
argued that any capability which exists in classical AI can be replicated as
well in fuzzy logic. One can simply "fuzzify" the AI system, whatever
it may be. In exactly the same way, one could "elasticize" that fuzzy
system, by using the alternative "AND" operator described above.
("OR" operators follow trivially from AND, if we define
"NOT" as one minus the original truth value.) That, in turn, permits
one to use neural network learning methods to adapt any kind of AI system,
including systems used for complex reasoning and planning.
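As a minimal Python sketch of this "elasticizing" (function names mine): the elastic AND below follows equation 6, and OR is built from AND and NOT = 1 - truth, as just described.

```python
import numpy as np

def elastic_and(x, gamma0, gamma):
    """Elastic AND (equation 6): gamma0 * prod_k x_k**gamma_k.
    With gamma0 = 1 and every gamma_k = 1, this reduces to the ordinary
    product AND of conventional fuzzy logic."""
    return gamma0 * np.prod(np.asarray(x) ** np.asarray(gamma))

def elastic_or(x, gamma0, gamma):
    """OR derived from AND via NOT = 1 - truth (De Morgan)."""
    return 1.0 - elastic_and(1.0 - np.asarray(x), gamma0, gamma)
```

Because both operators are smooth in gamma0 and the gamma_k, any inference structure built from them can be adapted by gradient methods.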
When inference
structures become complex, Zadeh has pointed out that they are not always
feedforward. Sometimes we must solve a system of simultaneous equations,
in order to calculate the output of such an inference structure. Nevertheless,
the procedures of section 2 (and 3) still work out, in a straightforward way.
For each example of reasoning, t, we can still calculate the derivatives
of error (i.e. actual minus desired output) with reference to the structure
used at time t, by backpropagating through that inference structure. If
the inference structure involves simultaneous equations, one can use the
procedures given in chapter 3 of [2] to backpropagate through
simultaneous-recurrent systems.
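A minimal Python sketch of that idea, for a one-variable inference structure y = tanh(w*y + v*x) (a stand-in of my own, not from the paper): relax the forward equation to its fixed point, then relax the corresponding dual equation to obtain the derivative.

```python
import numpy as np

def settle(w, v, x, iters=200):
    """Relax y = tanh(w*y + v*x) to a fixed point: the 'simultaneous
    equation' defining the output of a tiny recurrent structure."""
    y = 0.0
    for _ in range(iters):
        y = np.tanh(w * y + v * x)
    return y

def grad_w(w, v, x, iters=200):
    """Backpropagate through the fixed point by relaxing the dual
    equation F = a*F + s, where a is the Jacobian of the update at the
    solution and s the direct derivative w.r.t. w. This converges to
    dy/dw = s/(1-a) whenever |a| < 1."""
    y = settle(w, v, x, iters)
    a = (1.0 - y * y) * w          # d(next y)/d(y) at the fixed point
    s = (1.0 - y * y) * y          # direct derivative w.r.t. w
    F = 0.0
    for _ in range(iters):
        F = a * F + s
    return F
```

The same relaxation pattern scales to vector-valued simultaneous-recurrent systems.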
As a practical matter, classical AI has found it very difficult to address the complex issues posed by real-time control applications. Many people believe that Albus' Real-time Control System (RCS)[3] represents the best capabilities now in existence. RCS includes a number of important user interfaces, labelling conventions, etc.; however, underneath all of that, the system is basically built up from a set of tables of if-then rules. Each individual table is exactly like the simple if-then rules used in conventional fuzzy control, except that each table may input or output from a shared memory instead of just X or Y. Clearly it should be straightforward to fuzzify such tables. The resulting system then turns out to be a recurrent network, instead of a feedforward network, because of the shared memory; however, [2] gives methods to adapt recurrent structures.
Strictly speaking, many of us believe that planning and reasoning can be achieved by neural networks -- as in the brain -- without any need to hardwire the complex notions stressed in AI. This is a complex subject, discussed at great length in [1,2,11,13]. The neurocontrol designs now in use clearly do not display those kinds of capabilities; however, a ladder of designs has been spelled out, at least on paper, which should be able to carry us that high, if only we have the will to climb that ladder, one step at a time. Insights from AI may yet be of great value to us as we climb, by suggesting more sophisticated neural network designs[2,8].
References
1. P.Werbos, Neurocontrol and fuzzy logic: connections and designs, Int'l J. Approx. Reasoning, Feb. 1992. A less complete version appeared in R.Lea & J.Villareal, eds., Proc. Second Joint Technology Workshop on Neural Networks and Fuzzy Logic (April 1990), NASA Conference Pub. 10061, Feb. 1991.
2. D.White
& D. Sofge, eds, Handbook of Intelligent Control: Neural, Fuzzy and
Adaptive Approaches, Van Nostrand, September 1992.
3. M.Herman,
J.Albus and T.Hong, Intelligent control for multiple autonomous undersea
vehicles,
in W.Miller, Sutton & Werbos, eds, Neural Networks
for Control, MIT Press, 1990.
4. Yasuhiko
Dote, Fuzzy and neural network controllers, in M.Padgett, ed, Proc. Second Workshop on Neural Networks:
Academic/Industrial/NASA/Defense, Society for Computer Simulation, 1991.
5. P.Werbos, Beyond
Regression, Harvard Ph.D. thesis, 1974, reprinted in P.Werbos, The Roots
of Backpropagation: From Ordered Derivatives to Neural Networks and Political
Forecasting, Wiley, 1993.
6. R.Jacobs et al, Adaptive mixtures of local experts, Neural Computation, Vol. 3, No. 1, 1991.
7. J.W.Tukey, Exploratory
Data Analysis, Addison-Wesley, 1977.
8. P.J.Werbos,
Supervised learning: can it escape from its local minimum,
WCNN93 Proceedings.
Washington DC: INNS Press (/Erlbaum), 1993.
9. P.Werbos,
Backpropagation through time: what it does and how to do it, Proc. of the
IEEE, October 1990 issue. Also reprinted with [5].
10. W. Miller,
Sutton & Werbos, Neural Networks for Control, MIT Press, 1990.
11. P.Werbos,
Neural networks and the human mind: new mathematics fits humanistic insight.
In Proc. IEEE Conf. Systems, Man and Cybernetics
(Chicago 1992), IEEE, 1992. Also reprinted with [5].
12. P.G.Madhavan et al, eds., Neuroscience
and Neural Networks: Efforts Towards an Empirical Synthesis (CNS 92
Proceedings). Preliminary edition available from Madhavan, Dept. of Electrical
Engineering, Indiana-University-Purdue-University-at-Indianapolis (IUPUI),
Indiana.
13. P.Werbos,
The cytoskeleton: why it may be crucial to human learning and to
neurocontrol,
Nanobiology,
Vol. 1, No. 1, 1992.
14. P.Werbos,
Backpropagation and neurocontrol: a review and prospectus, in IJCNN
Proceedings
(Washington D.C.), IEEE, June 1989.
15. A.Barto,
Sutton & Anderson, Neuronlike elements that can solve difficult learning
control
problems, IEEE Trans. Systems, Man, Cyber., 13(5):835-846, 1983.
16. T.Hrycej,
Model-based training method for neural controllers, in I.Aleksander &
J.Taylor, eds, Artificial Neural Networks 2, North Holland, 1992.
17. P. Werbos
and R. Santiago, Neurocontrol, Above Threshold (INNS), Vol. 2, No. 2,
1993.
APPENDIX:
PSEUDOCODE FOR THE PROCEDURES ABOVE
This Appendix
provides pseudocode for many of the designs discussed above. By the time this
paper appears in print, additional pseudocode and actual running code will
probably be available from BehavHeuristics, Inc. (BHI), of College Park,
Maryland, which has patents pending both for ELF and for a number of the
adaptation techniques mentioned here.
Equations 2
through 5 -- giving conventional fuzzy control -- can be expressed very easily
in pseudocode, in something which looks like a computer subroutine:
SUBROUTINE FUZZ(u,X);
    REAL u(n),X(m),x(na),R(r),RSIGMA,uprime(n,r),
         running_product,running_sum;
    REAL FUNCTION MU(i,X);
    INTEGER j,k,nj(r),i(r,na);
/*  First implement equation 2. Use k instead of i for the computer. */
    FOR k=1 TO na;
        x(k) = MU(k,X);
    end;
/*  Next implement equation 3. */
    FOR j=1 TO r;
        running_product=1;
        FOR k=1 TO nj(j);
            running_product=running_product*x(i(j,k));
        end;
        R(j)=running_product;
    end;
/*  Next implement equation 5. */
    running_sum=0;
    FOR j=1 TO r;
        running_sum=running_sum + R(j);
    end;
    RSIGMA=1/running_sum;
/*  Next implement equation 4. */
    FOR k=1 TO n;
        running_sum=0;
        FOR j=1 TO r;
            running_sum=running_sum+R(j)*uprime(k,j);
        end;
        u(k)=running_sum*RSIGMA;
    end;
end;
The subroutine above inputs the sensor array X and outputs the control array u. The arrays uprime and i and the function MU represent u'(j), i_{j,k} and the set of membership functions, respectively; they need to be generated in additional, supporting computer code. In this code, I allocated a huge amount of space for the array "i", just to keep things simple; in actuality, an efficient program would be different, and harder to understand. This pseudocode is given only to help explain the basic ideas in this paper.
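For readers who prefer a runnable version, here is a compact Python sketch of the same subroutine (the function and argument names are mine); NumPy arrays replace the FORTRAN-style declarations.

```python
import numpy as np

def fuzz(X, mu, rules, uprime):
    """Conventional fuzzy controller (equations 2 through 5).

    X      : sensor array, shape (m,)
    mu     : list of na membership functions; x[k] = mu[k](X)
    rules  : list of r index lists; rules[j] holds the membership
             indices i(j,k) entering rule j
    uprime : recommended actions, shape (n, r)
    Returns the control vector u, shape (n,).
    """
    x = np.array([f(X) for f in mu])                  # equation 2
    R = np.array([np.prod(x[idx]) for idx in rules])  # equation 3
    return uprime @ R / R.sum()                       # equations 4 and 5
```

For example, with two complementary memberships on one sensor, the output interpolates smoothly between the two recommended actions.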
To implement elastic fuzzy control, we can write a new subroutine ELF(u,X,gamma,uprime). This new subroutine is exactly like the subroutine FUZZ above, except that we allocate space for the array gamma by:

    REAL gamma(r,0:na);

and we replace the block which implemented equation 3 by:

/*  Implement equation 6. */
    FOR j=1 TO r;
        running_product = gamma(j,0);
        FOR k=1 TO nj(j);
            running_product=running_product*(x(i(j,k))**gamma(j,k));
        end;
        R(j)=running_product;
    end;
To adapt this ELF network in a supervised control situation, we may use the following program, which corresponds to the algorithm described in section 2:

INTEGER iter,t,T,k,j,nj(r);
REAL gamma(r,0:na),uprime(n,r),F_gamma_total(r,0:na),F_uprime_total(n,r),
     F_gamma(r,0:na),F_uprime(n,r),X(m,T),ustar(n,T),u(n),F_u(n),lr1,lr2;
DO iter=1 TO maximum_iterations;
/*  First implement step 1. */
    FOR j=1 TO r;
        FOR k=0 TO nj(j);
            F_gamma_total(j,k)=0;
        end;
        FOR k=1 TO n;
            F_uprime_total(k,j)=0;
        end;
    end;
/*  Next implement step 2, starting with step 2a. */
    FOR t=1 TO T;
        CALL ELF(u,X(,t),gamma,uprime);
/*      Next implement step 2b. */
        FOR k=1 TO n;
            F_u(k)=2*(u(k) - ustar(k,t));
        end;
/*      Express step 2c as a subroutine. */
        CALL F_ELF(F_gamma,F_uprime,F_u);
/*      Implement step 2d. */
        FOR j=1 TO r;
            FOR k=0 TO nj(j);
                F_gamma_total(j,k)=F_gamma_total(j,k)+F_gamma(j,k);
            end;
            FOR k=1 TO n;
                F_uprime_total(k,j)=F_uprime_total(k,j)+F_uprime(k,j);
            end;
        end;
    end;
/*  Finally, step 3. */
    FOR j=1 TO r;
        FOR k=0 TO nj(j);
            gamma(j,k)=gamma(j,k)-lr1*F_gamma_total(j,k);
        end;
        FOR k=1 TO n;
            uprime(k,j)=uprime(k,j)-lr2*F_uprime_total(k,j);
        end;
    end;
end;
To program the subroutine F_ELF, which implements equations 8 through 11, you can start from the following pseudocode. (Here u, x, R, RSIGMA, gamma and uprime are assumed to be shared with the most recent call to ELF.)

SUBROUTINE F_ELF(F_gamma,F_uprime,F_u);
    REAL u(n),x(na),gamma(r,0:na),uprime(n,r),base,F_gamma(r,0:na),
         F_uprime(n,r),running_sum,F_u(n),R(r),RSIGMA,F_R(r);
    INTEGER nj(r),k,j,i(r,na);
/*  First calculate F_u dot u, the scalar "base". */
    running_sum=0;
    FOR k=1 TO n;
        running_sum=running_sum + F_u(k)*u(k);
    end;
    base=running_sum;
/*  Next, implement equations 8 through 11 for each rule j. */
    FOR j=1 TO r;
/*      Equation 8. */
        FOR k=1 TO n;
            F_uprime(k,j)=F_u(k)*R(j)*RSIGMA;
        end;
/*      Equation 9. */
        running_sum=0;
        FOR k=1 TO n;
            running_sum=running_sum+F_u(k)*uprime(k,j);
        end;
        F_R(j)=RSIGMA*(running_sum - base);
/*      Equation 10. */
        FOR k=1 TO nj(j);
            F_gamma(j,k)=F_R(j)*R(j)*log(x(i(j,k)));
        end;
/*      Equation 11. */
        F_gamma(j,0)=F_R(j)*R(j)/gamma(j,0);
    end;
end;
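Finally, a runnable Python sketch of the ELF forward pass (equations 6, 5 and 4) and its dual (equations 8 through 11) may be useful for the perturbation checks recommended earlier; the function names and the data layout (one gamma array per rule) are my own.

```python
import numpy as np

def elf_forward(x, rules, gamma, uprime):
    """ELF controller forward pass.
    x      : membership values, shape (na,)
    rules  : list of r index lists (the i(j,k) of each rule)
    gamma  : list of r arrays; gamma[j][0] is the rule's gain and
             gamma[j][1:] are the exponents of its antecedents
    uprime : recommended actions, shape (n, r)
    """
    R = np.array([g[0] * np.prod(x[idx] ** g[1:])
                  for idx, g in zip(rules, gamma)])   # equation 6
    rsigma = 1.0 / R.sum()                            # equation 5
    u = uprime @ R * rsigma                           # equation 4
    return u, R, rsigma

def f_elf(F_u, u, R, rsigma, x, rules, gamma, uprime):
    """Dual subroutine F_ELF (equations 8 through 11)."""
    base = F_u @ u                                    # the scalar "base"
    F_uprime = np.outer(F_u, R) * rsigma              # equation 8
    F_R = rsigma * (F_u @ uprime - base)              # equation 9
    F_gamma = []
    for j, (idx, g) in enumerate(zip(rules, gamma)):
        fg = np.empty_like(g)
        fg[1:] = F_R[j] * R[j] * np.log(x[idx])       # equation 10
        fg[0] = F_R[j] * R[j] / g[0]                  # equation 11
        F_gamma.append(fg)
    return F_gamma, F_uprime
```

With a scalar objective s = F_u · u, the returned arrays should match finite differences of s with respect to every gamma and uprime entry; that is the perturbation test any implementation should pass.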