Better Sand Castles - A Model for Adaptive Systems
A.G.Booth, 3 July 1996

This paper was first published (in Japanese) in the Japanese journal Revue de la Pensée d'Aujourd'hui, 1996, vol. 24-11, pp. 103-109.
This paper is about how we think of designing information processing mechanisms, and particularly about how we might extend commonplace ideas to handle the design of complex self adaptive systems. Reference is made to naturally occurring adaptive systems such as some types of neural networks, but the explicit objective is synthesis and realisation in engineering design, not the analysis of natural systems.
First I would like to recommend a certain simple pattern of ideas which I believe to be helpful in understanding many different sorts of creative thought.
Next I suggest that there is a connection with elementary unsupervised adaptive processes on the basis that their operation may be regarded as creative in a way which corresponds to this pattern of ideas. From this connection there arises an opportunity to develop the concept by applying statistical information theory to realisable elements of self adaptive mechanism.
I then wish to make a first few tentative steps exploring possible elements of unsupervised adaptive mechanism in the light of this correspondence of ideas.
Finally I wish to suggest a way of linking together the mental and physical aspects of these models to form a basis for practical work with complex systems.
Creative thought relies heavily upon the use of models, though not in the same way as does deductive reasoning. In the latter case the models are in place at the outset; they form the axiomatic basis. Conversely, in the essential part of creative thought the working models are being proposed, tested and revised.
If we are to find models for the creative process itself they will need to be of a general nature and will have to handle the changing of working models under some criterion of improvement; in this respect they must be meta-models.
Working models which are to be considered as material for creative thought, unlike the fixed axiomatic basis for deductive thought, benefit less from rigour and more from elegance through their simplicity. This is because a model which is too well developed in its particular or specific aspects can easily resist the advance of alternatives based upon less well developed models. It does this merely by virtue of its completeness in detail rather than its radical overall properties.
By drawing analogies first between two sorts of working processes or models and then between two sorts of purposes a pattern can be formed.
Domain of model: Mental model / Physical model.
Domain of purpose: Exploratory purpose / Regulatory purpose.
Combining the two views in each of these two domains distinguishes four sorts of action.
The term "model" in the above usage is perhaps not entirely satisfactory in that the way it relates to mental processes is not simply comparable to how it relates to the processes of a physical system. For clarification we might use the idea of an "agency" in that a mental model achieves the ability to act through the use of expressible symbols whilst the physical model is an agent through its being some sort of constructed causal mechanism.
Using these conceptual associations I want to explore the bottom left quarter of this diagram. Some support for such ideas can be found in the work by Maturana and Varela [HMa70], [MV80] and [MV92] regarding self organising biological processes and "autopoiesis".
Rather than look at macroscopic complex structures I wish to formulate a model of minimal elementary processes. Complexity is involved, but only in a general sense as the operational content of such processes. Their constant structure, though perhaps unusual or novel and applied by the millions, must not itself be particularly complex if the model is to be a useful aid to comprehension.
Thus we set out to study the elements of a class of artificial adaptive mechanisms suitable for information processing.
First let's get an idea of the components of such mechanisms, and then formulate a thermodynamic basis for assessing their ultimate economic performance.
Looking at only the broad needs of an adaptive element we might take examples from a wide variety of naturally occurring types ranging from crude happenstance structures such as grains in a mass of sand, through refined and specialised elements such as constitute the social and evolutionary functions of living systems, to concentrated information processing systems like neural arrays.
To recognise the issues of economics of operation at their most compact it appears that the more concentrated form of processing found in the neural system would be the best example from such a range, though others may give rise to interesting parallels. On this basis I shall be thinking here of an elemental adaptive mechanism which is expected to contain:
a) A primary signal system with complex routing paths between large numbers of cells, and non-linear dynamic operations to derive the output from each cell. Cf. the neural axonal firing criterion, amplifier and transmission.
b) A summation structure accepting the large number of inputs within each cell.
c) An array of adaptive state memory elements providing secondary signals to modulate the primary inputs to each cell according to the adaptive state. Cf. the neuronal synaptic weighting array.
d) An adjustment mechanism operating on the adaptive state memory which uses local influences but is distinct regarding each element of adaptive state.
For a general introduction to the terminology and forms of neural networks refer to S.Haykin [SHa94].
All of the processes in which we are interested may be rendered into and analysed according to the terms of thermodynamics.
The purpose of an approach at the level of these laws of natural economics is twofold. First it gives some idea of whether a given proposal is at risk of being unrealisable due to fundamental limitations, and second it suggests in various ways whether a given scheme has been bold enough in taking advantage of the statistics of large numbers. After all, because of the prospect of a "self programming" effect, we might look forward to the possibility of employing these structures in very large numbers indeed.
Thus we may observe that the fundamental element of an amplification process is to magnify a small difference in a statistical parameter of a species so as to produce a greater difference in total probability. For this to be done there is a necessary cost: some quantity of supplied negative entropy must be consumed (i.e. a rare species or "fuel" is destroyed). For a given probabilistic performance of the amplified signal there is, in terms of entropy requirement, a lower bound to the quantity of this fuel which must be used.
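One concrete expression of such a lower bound (a reference point of my own; the text does not name it) is Landauer's limit, which sets a minimum free-energy cost of kT·ln 2 per bit of uncertainty resolved:

```python
import math

# Landauer's limit: a minimum of kT*ln(2) of free energy must be dissipated
# per bit of uncertainty resolved, whatever the amplifying mechanism.
K_BOLTZMANN = 1.380649e-23   # Boltzmann's constant, J/K
T = 300.0                    # absolute temperature, K (room temperature)

e_per_bit = K_BOLTZMANN * T * math.log(2.0)
print(f"minimum cost per bit at {T:.0f} K: {e_per_bit:.2e} J")
```

At room temperature this is a few times 10^-21 joules per bit, which sets the scale for all of the entropy budgets discussed below.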
This type of amplification process will be necessary in any adaptive mechanism, but it is not complex enough on its own because its operation needs to be guided to its purpose by a structure whose definition itself has a cost. In other words, where adaptation must occur the cost of structural alteration must also be borne.
A switch is a discretely modulated conductor. An amplifier in general may be regarded as a continuously modulated conduction path controlling a fluid consisting of a statistical species with relatively high negative entropy (i.e. at high potential). Modulation in this sense is a structure change and requires manipulation of some form of physical stress as the basis for control of flow of the high potential fluid. The adaptation process must encompass the means for this control of variable structure.
If we are to make structure change depend upon another state variable then this state must be implemented as the charge level of a second statistical species. It is vital that this charge should suffer relatively little erosion whilst it interacts with the flow of the primary fluid or its stored information will be lost like a child's sand castle in the oncoming tide. It is for the latter reason that the structure defining species must be of a type with a relatively high barrier (e.g. quantum energy) against disturbance by the thermal environment, and notably by the primary fluid which it controls.
If the structural state storage medium is of particularly high quantum level then the thermal cost of any given amount of adjustment is thereby also made high. Conversely if the controlling structure quantum level margin above the effective operating thermal level is reduced then the rate of spontaneous adaptive degradation in use is made higher instead. Thus there is the need to optimise the design of an adaptive mechanism to compromise between these two deleterious effects. On the above basis a practical design is likely to need its adaptive medium to be chosen for an energy level of quantum interactions about one hundred times the ambient mean thermal energy level.
This places the potential barrier against erosion of the adaptive variables at ten standard deviations of the main thermal variables, and therefore reduces the likelihood of disturbance by a ratio in the order of exp(100), i.e. around 10^43. For room temperature operation of the primary operational fluid in such a system this energy level corresponds broadly to the quantum level of visible radiation.
Note that the value for the mean thermal energy level at room temperature, i.e. the product of Boltzmann's constant and absolute room temperature, is approximately 4×10^-21 joules. As an example consider the human brain, having something in the region of 10^14 synapses. If each of these required an average of 10,000 high energy quantum sites to store the adaptive state then on the above basis the total energy required would be some 0.4 joules. To set this in proportion as a total adaptive energy packing effectiveness for the human brain, it can be seen to be a very small fraction of the total energy that would be released by oxidation of the fabric of a human brain, which is in the region of 2 megajoules.
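The arithmetic can be checked directly; the synapse count, sites per synapse and the 100×kT barrier are the text's assumed figures, not measurements:

```python
# Direct check of the energy-budget estimate in the text; all the input
# figures are the paper's own assumptions, not measured values.
K_BOLTZMANN = 1.380649e-23   # Boltzmann's constant, J/K
T_ROOM = 290.0               # absolute room temperature, K (approx.)

kT = K_BOLTZMANN * T_ROOM    # mean thermal energy, ~4e-21 J
barrier = 100.0 * kT         # adaptive-site quantum level, ~100x thermal

synapses = 1e14              # assumed synapses in a human brain
sites = 1e4                  # assumed high energy quantum sites per synapse

total = synapses * sites * barrier
print(f"kT = {kT:.2e} J; total adaptive storage energy = {total:.2f} J")
```

The result, about 0.4 J, confirms that the adaptive state store is energetically a tiny fraction of the roughly 2 MJ bound in the brain's fabric.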
In a similar manner the minimum feasible rate of consumption of entropy needed to support both the cell output amplifiers and any required rate of settling of the adaptation processes may be estimated.
If we think of the operational activity of a network as transforming given input signals into output signals then the process of adaptation can be viewed as making adjustments to the way that the transformation maps points in its input signal vector space to points in its output space. See also Matsuno [KMa89]. When a local basis for adaptation is used it can usually be described in terms of creating a tendency for the given individual point signals in the input vector space to produce mappings in the output vector space which drift towards preferred points. These preferred points are then called attractors because the actual output signal values keep on drifting towards them under the influence of the adaptation process.
Note that the existence of attractors and indeed the specific positions of these attractors in the output signal space of the adaptive transformation are independent of the particular set of signals applied. It is in this sense that the system can be said to be unsupervised in its adaptation. It is merely the tendency of the various cases of actual individual input signal vectors to become spontaneously associated with these attractor points in the output space that manifests the required effect.
One way to view the value of this process of unsupervised association of inputs with output attractor points is that it corresponds to the expression of given arbitrary real signals in terms which are standardised and separated, and are therefore suitable to be treated in some sense as symbols. It is the ability for these symbol-like signals to be developed and evolved entirely automatically which provides the sense of a creative process. Note that the adaptation process will provide valid but different results following each start-up unless it is always started in a deterministically identical state and follows an identical sequence of data, and that is not really a practical possibility in very complex cases.
Using only soft nonlinearity (i.e. having smooth high order derivatives) in association with the amplification in the output signal paths of a transformation and within the adaptive control loop, attractors can be realised in the transformation output signal vector space.
Soft nonlinearities, as opposed to those with abrupt characteristics, are especially attractive here because they are generally the more economical to realise under thermodynamic constraints. Also the number of paths to be processed in this way is many times fewer than the number of input weighting operations, which is attractive because the weighting operation is thermodynamically less extravagant than either the amplification or the provision of a nonlinear transmission characteristic. Note that the specially high energy costs of adaptive adjustment discussed above are relatively less of a problem because the adjustment needs only to operate much more slowly than the actions of the primary signal transmissions.
Practical networks with such attractor characteristics exist in the mathematical sense and can be realised in the engineering sense. See an earlier demonstration of this type of neuron by the author of this paper [ABo89]. In the case of the mapping of a vector onto a single scalar output variable (c.f. the function of a single neuron) the number of attractors will normally be only two in this scalar output space. However it must be remembered that there can be exponentially large numbers of such attractor points in practical adaptive output signal vector spaces of reasonable sizes.
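The two-attractor behaviour of a single scalar output can be illustrated with a minimal sketch (my own construction, not the circuit of [ABo89]): a linear unit passed through a tanh soft nonlinearity, with an adaptation rule that nudges the weights so that the output drifts toward the nearer of the attractors at +1 and -1.

```python
import math
import random

# Minimal sketch of a bistable adaptive unit: output y = tanh(w.x) uses a
# soft nonlinearity, and the adaptation loop drives y toward the nearer of
# the two scalar attractors, +1 or -1. (Hypothetical example, not [ABo89].)
random.seed(0)
d = 8                                               # order of the input vector
w = [random.uniform(-0.1, 0.1) for _ in range(d)]   # adaptive state (weights)
x = [random.choice([-1.0, 1.0]) for _ in range(d)]  # one fixed input pattern

def output(w, x):
    return math.tanh(sum(wi * xi for wi, xi in zip(w, x)))

eta = 0.05
for _ in range(500):
    y = output(w, x)
    target = 1.0 if y >= 0 else -1.0                # nearer attractor
    w = [wi + eta * (target - y) * xi for wi, xi in zip(w, x)]

print(round(abs(output(w, x)), 3))                  # settles close to 1.0
```

Whichever sign the initial output happens to take is amplified by the adaptation, so different random starts settle on different attractors, in keeping with the non-determinism noted above.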
I have talked of attractors in the output vector space of a signal transformation. Now consider what happens when many values of input signal are applied to the transformation process in rapid succession, without time for the process to achieve complete resolution of its output to specific attractor points. This will cause the process either to become completely confused or, if the quantity of intermixed values of input signal points is not altogether too great, to hold onto an average state which allows moderately good approximation of its output signals to the output space attractor values.
In fact if the number of different values of input vector in the succession is smaller than the order of the input vector space then after sufficient time of continuing application of these signals and subject to reasonable restriction on the choice of the set of input signal values, the transformation will be capable of finding exact attractor points. However the process can operate approximately with the number a few times greater than the order of the input vector space.
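This capacity claim can be illustrated with a deliberately crude stand-in (an assumption-level sketch using a plain linear unit and the Widrow-Hoff rule, not the paper's adaptive mechanism): with fewer patterns than the order of the input space the unit can place every pattern exactly on its target attractor value, whilst with many more patterns it can only approximate.

```python
import random

# A linear unit trained by the Widrow-Hoff (LMS) rule to map n_patterns
# random input vectors onto target attractor values of +1/-1. When
# n_patterns <= d the mapping can be made exact; when n_patterns >> d the
# residual errors remain large. (Hypothetical stand-in for the paper's
# adaptive transformation.)
random.seed(1)

def worst_error(d, n_patterns, epochs=1000, eta=0.05):
    xs = [[random.gauss(0.0, 1.0) for _ in range(d)]
          for _ in range(n_patterns)]
    ts = [random.choice([-1.0, 1.0]) for _ in range(n_patterns)]
    w = [0.0] * d
    for _ in range(epochs):
        for x, t in zip(xs, ts):
            y = sum(wi * xi for wi, xi in zip(w, x))
            w = [wi + eta * (t - y) * xi for wi, xi in zip(w, x)]
    return max(abs(t - sum(wi * xi for wi, xi in zip(w, x)))
               for x, t in zip(xs, ts))

under = worst_error(d=10, n_patterns=8)    # fewer patterns than the order
over = worst_error(d=10, n_patterns=40)    # several times the order
print(f"worst error, 8 patterns: {under:.4f}; 40 patterns: {over:.4f}")
```

The first figure converges toward zero while the second stays of order unity, mirroring the distinction drawn in the text between exact and approximate attractor points.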
If such a compromise state is stable then it may in turn be viewed as an attractor point, but this will have to be thought of as in another space. This space is in any case a very much bigger one, and it may be thought of either on the basis of a constant ensemble of input signal points or else (the biggest space of all) on the basis of any ensemble of given input signal points.
When handling the case of a constant ensemble of input signal values which are to be applied in mixed succession within the adaptive settling period it is convenient to use the vector of adaptive state variables which are internal to the transformation as the basis of the space in which attractors may be considered. For the case of either one or a group of neurons the elements of this vector are the values of the individual synapse adaptive bias levels.
If it is necessary to deal with the more general case where perhaps the short term definition of the input signal ensemble has to change with the passage of time then the attractors can be considered as located in the biggest space of all. This is spanned by the possible output values corresponding to the actual set of given input signals. We are not now talking of the space of a single signal vector (and that alone can be pretty big!), nor even the vast space of practical adaptive state vectors, but the enormous space defined by all sets of signals which could constitute an ensemble of a given size to be applied to the transformation within its period of adaptive settling or convergence.
I hope I have managed to indicate the way in which really large spaces can be associated with rather modest realisable networks, because the power of self adaptive processes to produce an exploratory or, shall we say, a creative effect depends upon their ability to find opportunities for useful structure whilst moving in such high dimensional spaces. To do this it only requires that attractor points shall rather densely populate the adaptation state space in order to make the discovery of a preferred state adequately easy.
Also it is possible, by suitable choice of the form of the non-linear processing of the output signal within the adaptation loop, to arrange that the fraction of the total population of input vectors to a given cell which cause a response is reduced so as to spread the cost in terms of state memory variance components. By this means the number of input values (i.e. the cardinality of the input signal ensemble) accommodated under adaptation can be many times the order of the input vector whilst still maintaining useful approximation to attractor output values. Putting it the other way round, this creates the possibility of a spatial separation of signal flow paths dependent upon signal value, and thereby enables large increases in the number of adaptive degrees of freedom without inducing total interaction between all regions of the signal space under the adaptation process.
Imagine that somehow a sympathetic action comes about between selection and survival processes such that the domain of selection is restricted to a space which is unusually rich in practical options. This is equivalent to there being grammatical rules in the adaptation channel, albeit mutually adaptable rules, which restrict the range of possible adaptive changes in the operational transformations to the more useful ones. These rules thereby economise on information required for adaptation by making it have low redundancy in the code space of valid grammar of adaptive change expressions.
There would appear to be a class of mechanisms whereby an independent and fairly complex model acts by a sort of proxy (like a tournament as a model of battle) to determine the way in which a main system shall adapt or evolve. It is as though this model makes a mutual adjustment with the main system such as to characterise a useful set of adaptive modes, and thereby saves the need for otherwise intolerably large amounts of feedback data to achieve the adaptation. In such a mechanism the additional degrees of freedom in the model act to help in finding simplified modes of adaptation. There is a useful class of feasible systems of this sort where the additional data required for the "tournament" model to operate present relatively little burden on the system. In fact these additional data can offer advantage by selecting on an opportunistic basis from the large number of options available.
Could this be achieved by coding only a translation rule to carry "goodness" results from operation into changes in the operational transformation? With this mechanism perhaps we could use a quantity of information which is independent of the order of the adaptation state vector. The trouble with that type of suggestion is that it would then be necessary for a second major adaptation process to exist to control continuously the translation rule.
Instead consider a model which is just one level more compounded as follows. There is an allusion in this diagram to the ancient but rather well known model in Buddhist Theravadin philosophy called in pali "paticcasamuppada", and it does in fact have some relevance here [RJo79] [VTR91].
I submit that a model with this level of complexity, and in particular one which makes a compound of the discrete and the continuous, the words and the reality, the symbolic and the physical etc., offers much prospect.
This model has potential to handle the sorts of complexity cited above from natural examples. Rather more evidently there are areas of human endeavour wherein the left side of the diagram may be viewed as corresponding to the world of human mental or symbolic forms to great effect, and design itself is one of these.
With this glimpse of a vast domain of prospective interest and work I close my thesis.
I wish to thank Koichiro Matsuno of Nagaoka Technical University for his helpful comments and encouragement in the preparation of this paper.
[ABo89] A.G.Booth, "A Demonstration of Unsupervised Learning in a Model Neuron", International Association for Cybernetics 12th International Congress on Cybernetics, Namur, Belgium, August 1989.
[SHa94] S.Haykin, "Neural Networks: A Comprehensive Foundation", Macmillan/IEEE Computer Society Press, 1994. ISBN 0-02-352761-7.
[RJo79] R.E.A.Johansson, "The Dynamic Psychology of Early Buddhism", Scandinavian Institute of Asian Studies, Copenhagen, 1979. ISBN 0-7007-0114-1.
[HMa70] H.R.Maturana, "The Biology of Cognition", BCL Report no. 9.0, 1970.
[KMa89] K.Matsuno, "Protobiology: Physical Basis of Biology", CRC Press, Boca Raton, Florida, 1989. ISBN 0-8493-6403-5.
[MV80] H.R.Maturana and F.J.Varela, "Autopoiesis and Cognition: The Realization of the Living", D. Reidel, Boston, 1980.
[MV92] H.R.Maturana and F.J.Varela, "The Tree of Knowledge: The Biological Roots of Human Understanding", Shambhala, Boston, 1992. ISBN 0-87773-642-1.
[VTR91] F.J.Varela, E.Thompson and E.Rosch, "The Embodied Mind: Cognitive Science and Human Experience", MIT Press, 1991. ISBN 0-262-72021-3.
Copyright © A.G.Booth 1996-1999. All rights reserved.