Niels Ole Finnemann: Thought, Sign and Machine, Chapter 6 © 1999 by Niels Ole Finnemann.

6. The breakthrough of information theory

The thread which runs from the physical to the later information concepts is not continuous. One of the breaks in it is expressed by the lack of interest in this area in the years up to World War II. In 1949, Warren Weaver, in his interpretation of Claude Shannon's mathematical information theory, mentioned, in addition to Leo Szilard, who *extended [Boltzmann's] idea to a general discussion of information in physics*, only one other work on information theory, namely that of John von Neumann on the information concept in quantum mechanics and particle physics.[1]

A lack of interest is in itself a kind of break, but Shannon's interest in the information concept led to another, namely a break with the view of information as a mere function of energy. He took the first step towards this in 1938 when he introduced the idea of using mathematical-logical principles (symbolic analysis based on Boolean logic) in the construction of electrical circuits.[2] Although no theory proper was presented here, there was an implicit view of electrical circuits as functions of a logical - and not physical - order.
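The idea of reading a circuit as a logical expression can be sketched in a few lines of Python. This is only an illustration and not Shannon's own notation: it assumes the convention that `True` stands for a closed (conducting) contact, and the names `series`, `parallel` and `circuit` are hypothetical.

```python
from itertools import product

def series(a: bool, b: bool) -> bool:
    """Two contacts in series conduct only if both are closed (logical AND)."""
    return a and b

def parallel(a: bool, b: bool) -> bool:
    """Two contacts in parallel conduct if at least one is closed (logical OR)."""
    return a or b

def circuit(x: bool, y: bool, z: bool) -> bool:
    """Hypothetical network: contact x in series with the parallel pair (y, z)."""
    return series(x, parallel(y, z))

# The physical layout is read as a logical expression and tabulated as such.
truth_table = {
    inputs: circuit(*inputs) for inputs in product([False, True], repeat=3)
}
```

Under these assumptions the behaviour of the network is fully described by the expression `x and (y or z)`, without reference to voltages or currents.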

It had long been known that it was possible to use electricity to transfer messages in a not directly perceptible form, which could both be analogue, such as in the telephone and the radio or handled with a discrete notation system such as in telegraphy where the Morse alphabet was used. But Shannon's idea of describing the electrical circuit as a logical mechanism paved the way for a new, more complex use of electricity for symbolic purposes. Whereas the Morse alphabet transferred messages as a sequence of individual signals one by one, the logical description of the relay makes it possible to introduce conditional relationships between the individual signals. A given signal can thus produce a change in the following signals and the following signals can produce a change in the effect of previous signals before the total transport has been completed.

Shannon's description involved a leap to a higher level in the symbolic utilization of electricity. Where this utilization had formerly been based on a definition of physical threshold values for symbolic notation units, Shannon's description contained the basis for a formal, syntactic organization of electrical circuits.

In itself, the linking of the mechanical and logical is related to the link Turing had made in his description of the universal computer. But Shannon did not have the same acute sense of theoretical reach and was also far more concerned with the practical use of logic for handling electricity. Nevertheless - or precisely because of this - his contribution contained a theoretical element which is not found in Turing, but which would be of great importance to the later computer technology.

Where Turing had worked on the basis of a traditional physical machine which was controlled by a logical description, Shannon's description contained the elements of a machine with a built-in syntactic structure. Such a machine is a far more complex physical construction than the Turing machine. Nor is it immediately obvious that it can possess the same universal properties, as the syntactic structure contains a set of restrictions which are not contained in the physical construction of the Turing machine.

There is another difference, however, as the syntactic structure which is incorporated in the machine prepares the ground for a conceptual distinction between the syntactic and semantic levels, whereas Turing's implicit precondition was that the syntactic and semantic levels coincided and were expressed in the programme.

While Turing's point lay in the description of how an entire class of mathematical and logical operations could be performed by traditional mechanical means, Shannon's point lay in the description of the way in which mechanical processes could be subordinated to a symbolic organization at a formal syntactic level. He thereby paved the way for the complex physical construction which is now found in modern computers.

That his description also contained one of the germs of a new information theory first became evident when a group of American scientists, with mathematical, technical and biological backgrounds - urged on by the advent of World War II - discussed the more long-term scientific perspectives which would be connected with the new computer technology.

After some more informal contacts during the first war years, on the initiative of mathematician Norbert Wiener, a number of scientists gathered in the winter of 1943-44 at a seminar, where Wiener himself tried out his ideas for describing intentional systems as based on feedback mechanisms. On the same occasion J.W. Tukey introduced the term "bit" (binary digit) for the smallest informational unit, corresponding to the idea of a quantity of information as a quantity of yes-or-no answers.[3] In continuation of this meeting, *The Teleological Society*[4] was formed in 1944; after the war its name was changed to the *Cybernetics Group*. Among its members were such figures as anthropologists Gregory Bateson and Margaret Mead, engineer Julian H. Bigelow, neuro-psychiatrist Warren McCulloch, physiologist Arthur Rosenblueth, and mathematicians Walter Pitts, John von Neumann and Norbert Wiener.

Through discussions in this society the cybernetic paradigm, named by Norbert Wiener, and - in a relatively vague and general form - the idea of information as an abstract, quantifiable entity became crystallized.

According to Wiener's formulation of the cybernetic paradigm, it should be understood as a joint, basic paradigm for describing physical, biological, psychological and sociological "systems". It was believed that mathematical description, which had been of such use in physics, could now be used in a similar way to describe living systems. The central point lay in mathematical description, but the decisive innovation lay in the idea that with the feedback mechanism there was now finally a general (neo)mechanical basis for describing self-regulating systems, including consciousness - which did not lead back to the old reductionist rut.

As far as can be seen, there are only sporadic discussions of the epistemological problems inherent in the application of mathematics to energy physics. Norbert Wiener thus saw Werner Heisenberg's statistical quantum mechanics as a realistic and exhaustive synthesis of Newtonian particle mechanics and Planck-Bohr's quantum mechanics. But that he contented himself with the statistical and non-phenomenological character of this description is exclusively due to the technical utility of the statistical description which is bound to specific communication systems.

*The relation of these mechanisms to time demands careful study. It is clear, of course, that the relation in-output is a consecutive one in time and involves a definite past-future order. What is perhaps not so clear is that the theory of the sensitive automata is a statistical one. We are scarcely ever interested in the performance of a communication-engineering machine for a single input. To function adequately, it must give a satisfactory performance for a whole class of inputs, and this means a statistically satisfactory performance for the class of input which it is statistically expected to receive. Thus its theory belongs to the Gibbsian statistical mechanics rather than to the classical Newtonian mechanics*.[5]

The epistemological problems are limited to a criticism of Bergson's vitalistic objections to classical mechanics. Although it is true that the vitalists, according to Wiener, correctly claimed that the reversible time of classical mechanics was not suitable as a basis for a mechanical description of living organisms, the thermodynamic and quantum mechanical descriptions offered new images of irreversible and self-regulating mechanisms:

*Thus the modern automaton exists in the same sort of Bergsonian time as the living organisms and hence there is no reason in Bergson's considerations why the essential mode of functioning of the living organism should not be the same as that of the automaton of this type. Vitalism has won to the extent that even mechanisms correspond to the time structure of vitalism but as we have said, the victory is a complete defeat, for from every point of view which has the slightest relation to morality or religion, the new mechanics is fully as mechanistic as the old.*[6]

Wiener was correct in claiming that the new feedback mechanism on the one hand was equally as mechanistic as the old, while on the other hand it contained a considerable expansion of the concept of mechanical procedures. But he was not aware that the growth in the number of mutually disunited mechanical paradigms in physics had at the same time raised other completely different problems for a realistic interpretation of any form of mechanical description. He thus completely avoided the question - so critical for Boltzmann and other physicists - of how it could be possible to connect a mechanical description of a closed, local system to a realistic and universalistic interpretation of the mechanical paradigm.

A partial explanation probably lies in the pragmatic and technological perspective, but the fact that the cybernetic theory was presented as a universally realistic description model is probably to an equal degree due to the underlying idea of a unified science which, however, rapidly proved a failure.

The cybernetic society held ten conferences with invited guests during the years 1946-1953, after which it was dissolved. After the eighth meeting Wiener and von Neumann left and, at the tenth meeting, writes Steve J. Heims, the participants had nothing new to say to each other.[7]

The reason for this is obvious today. If we wish to describe different domains with the same conceptual apparatus, we must either ignore the differences or modify the conceptual apparatus so that it becomes capable of representing the differences. If we use the same procedure, such as the feedback mechanism, to describe both physical and psychological processes, we avoid the risk of taking one kind of phenomenon as a model for another by identifying the two with each other. The procedure simply ignores the difference which makes the comparison possible and the failure to make a distinction erroneous.

The history of the cybernetic society also appears as the history of a convergence, which immediately changes to a divergence. It was, as von Neumann expressed it at one of the meetings, far from given that it was possible to describe consciousness within the same logical categories as those used to describe other, less complex phenomena. On the contrary, it could be imagined that the phenomena of consciousness possessed an analytical irreducibility, that an object of consciousness comprised its own smallest description.[8]

The ambition of cybernetics to be the unifying science undoubtedly contributed to an overestimation of the explanatory force of the feedback mechanism, but the idea of a new - almost cosmological - world description influenced a number of sciences, with the creation of information theoretical paradigms in such areas as mathematics, biology, psychology, anthropology and linguistics as a consequence. In spite of its short history, cybernetic thinking thus comprises one of the central points of departure for the still ongoing technical-scientific revolution which began around World War II. It is therefore not surprising that this conceptual framework has also played a central role in the description of the symbolic properties of digital computers.

It is not only true in a general sense that the computer has been seen as the incarnation of a cybernetic system based on the use of the feedback mechanism as a conditional clause, but also in the sense that the symbolic processes which are performed in the computer have been described as formal procedures in line with other forms of algorithmic, mathematical and logical procedures.

While such a description is adequate for any single, finite procedure which can be performed automatically, it is not adequate to describe the way such procedures are performed in the computer, as the formal procedure, as shown in chapter 5, can only be performed in a computer if it is represented in a notation system which is *not* subject to the semantic restrictions which characterize a formal notation system.

At the time, nobody seems to have attached much importance to the difference in principle between formal and informational notation, much less to have seen the new notation system as the most far-reaching or general innovation. The most important reason for this seems to lie in the widespread idea that the central point of this project was to perfect control theory, as this idea assumes that the innovation lay in a more comprehensive and perfect representation of the "world" rather than in the development of a new system of representation, which contained a number of questions regarding the relationship between what was represented and the forms of the representation.

The decisive point, however, is that it is neither possible to maintain the idea of a direct equivalence in the relationship between formal and informational notation, nor in the relationship between the symbolic and the physical process. On the contrary, informational notation comprises an independent link between the symbolic and the physical-mechanical.

The fact that this link - the informational notation system - was actually a new alphabet which formed the basis for a new sign structure different to any previously known sign structure - as will be explained in the following chapters - only emerged during the course of later developments. The same goes for the understanding of the new semantic potentialities and constraints of this sign system compared to any other hitherto known systems.

Although the acknowledgement of these aspects has only occurred slowly compared with the speed of technical innovations, there are sound reasons for assuming that they will represent the most far-reaching historical innovations prompted by the appearance of computers. First, the development of a sign system is not subject to that technological obsolescence which affects specific technical innovations (such as individual articulations in a sign system compared to the system as such). Second, the properties of the new sign system are more general and inclusive than those of any other known system, which implies, among other things, that the latter can be represented in the former.

Since anything represented in a computer is represented in a sequential form and in the very same alphabet and sign structure, we can speak of a new kind of textual technology.

And since we can represent knowledge in any known form, whether expressed in common language, formal languages or in pictorial or auditive forms and integrate the basic functions in the handling of knowledge, such as production, editing, processing, retrieving, copying, validating, distribution and communication, we can also state that the computer has the capacity to become a new general or universal medium for the representation of knowledge.

Even if we cannot predict much regarding the knowledge content which will be expressed in this medium in the future, we can predict that the expression of knowledge in this medium will have a cultural and social impact of the same reach as the invention of modern printing technology, i.e. that computerization implies a change in the basic means of knowledge representation in modern societies, that is, a slowly but steadily developing change in the very infrastructure of society.

Although the informational notation system possesses properties which distinguish it from formal notation systems, it is nevertheless a product of the efforts to extend the area of use of formal representation and to bring about a generalized, universal system for formal representation, just as it is also such efforts which have produced the most important methods for using the new notation system.

These methods have primarily been developed at two levels: An algorithmic level, where there is both a quantitative and qualitative development of new, arbitrary syntactic methods of treatment and at the level of notation, where there are new methods for handling informational notation.

As these notation handling methods are themselves algorithmic, they can as such be regarded as a special branch of development at the algorithmic level, which will be discussed in chapter 8. They are also interesting because they can help to reveal the characteristic properties of informational notation.

6.2 Information as random variation

A pioneering work within this area appeared in 1948 in the form of an article by Claude Shannon under the ambitious title: *The Mathematical Theory of Communication*. The article aroused a great deal of interest and was reprinted in the following year together with Warren Weaver's comprehensive comments and his views of its perspectives.[9]

In essence the article was of a purely mathematical and technical character. Shannon's ambition was to demonstrate that it would always be possible to find a mathematical method for carrying out the optimum compression of a given message on the basis of a statistical knowledge of the system in which the message was expressed.

*The main point at issue is the effect of statistical knowledge about the source in reducing the required capacity of the channel, by the use of proper encoding of the information.*[10]

If the source is a linguistic message, in alphabetical or Morse alphabetical form, the question would then be whether there was a mathematical method for performing an optimum compression of the expression so that it would be possible to omit the transferral of as many individual symbols as possible, without the content being lost.

This, neither more nor less, is the subject of Shannon's theory and it thus appears quite directly, although it has often been passed over, that it is a theory of the compression of symbolic *notation* systems. The background for the theory included the well known fact that the different symbols in a notation system, such as the letters in the alphabet, occur with irregular frequency and that many sequences of letters are defined by the structure of the given language, while others are "freely chosen", i.e. defined by the specific message.

The postulate was that expressions in a notation system can be regarded as the result of a stochastic process where a given, discrete symbol is handled as a unit which appears with a statistical probability characteristic of the chosen notation system as a whole, notwithstanding the symbol sequence which comprises the specific message.

No particularly precise description of the statistical structure as such is required. For instance, it can very well be assumed that the individual figures appear with equally great statistical probability, although the optimization of the compression will increase proportionally with the statistical knowledge of varying frequency.
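The gain from knowing the frequencies can be illustrated with a small sketch. It uses Huffman's coding procedure, a later technique swapped in here for convenience and not the encoding Shannon describes, and the four-symbol alphabet with its probabilities is invented for the example.

```python
import heapq
from math import log2

def huffman_code_lengths(probs: dict[str, float]) -> dict[str, int]:
    """Return the code length in bits that Huffman coding assigns to each symbol."""
    # Heap entries: (probability, unique tie-breaker, {symbol: depth so far}).
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: d + 1 for s, d in {**left, **right}.items()}
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

# Hypothetical source statistics, chosen only for illustration.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
lengths = huffman_code_lengths(probs)

# Average bits per symbol when the frequencies are exploited ...
average_bits = sum(probs[s] * lengths[s] for s in probs)
# ... versus a fixed-length code that treats all symbols as equally likely.
fixed_bits = log2(len(probs))
```

With these invented frequencies the variable-length code needs 1.75 bits per symbol on average, against 2 bits for the fixed-length code that assumes equal probabilities.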

Now it is also true of common language that many individual symbols frequently appear in fixed constellations with surrounding symbols, such as "wh" and "th" and many suffixes in English. In these cases the probability of a given symbol occurrence is thus determined by the preceding symbol occurrence(s). In order to exploit the possibility for compression here, Shannon used a special class of stochastic processes, Markoff processes, which calculate probability with regard to the preceding "events".
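As a minimal sketch of what "probability with regard to the preceding event" means, the following Python fragment estimates the probability of each symbol given the symbol before it. The function name `transition_probabilities` and the sample phrase are assumptions made for the example, and a real estimate would need a far larger sample.

```python
from collections import Counter, defaultdict

def transition_probabilities(text: str) -> dict[str, dict[str, float]]:
    """Estimate P(next symbol | current symbol) from adjacent symbol pairs."""
    pair_counts: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(text, text[1:]):
        pair_counts[current][following] += 1
    return {
        current: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
        for current, counts in pair_counts.items()
    }

# Hypothetical sample; "t" is followed by "h" in three of its four occurrences.
sample = "that thing there"
model = transition_probabilities(sample)
p_h_after_t = model["t"]["h"]
```

In this tiny sample the estimate for "h" following "t" is 0.75, far above the overall frequency of "h", which is the kind of dependency the passage describes.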

It is thus possible, writes Shannon, to regard any message, any source which appears in a discrete form, as a stochastic process and any stochastic process can conversely be regarded as a source which generates a message, as every step in the process can be regarded as the production of a new symbol - by which is meant here a notation unit.

Shannon believed that it would be possible to arrive at such a comprehensive description of language on a probabilistic basis that it could be claimed that language was actually governed by a stochastic process.

He carried out a number of calculations which were intended to show that a relatively simple stochastic model (where first, letter frequency was taken into account, then dependency between two and then three succeeding symbols, then word frequency and finally the dependency between two immediately succeeding words in ordinary English) could produce linguistically plausible symbol sequences to an extent that was twice as great as the statistical basis of calculation. If, for example, the statistical bond between up to three succeeding symbols was taken into account, linguistically plausible symbol sequences of up to six letters could be produced and if the statistical bond between two immediately succeeding words was taken into account, it was possible to produce linguistically plausible expressions of up to four words.
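The series of approximations can be imitated with a short sketch that builds a table of which symbol follows each context of a given length and then draws new symbols from it. The names `build_model` and `generate`, the toy corpus and the fixed random seed are all assumptions for illustration, and Shannon's own samples were drawn from much larger bodies of English text.

```python
import random
from collections import defaultdict

def build_model(text: str, order: int) -> dict[str, list[str]]:
    """Map each context of `order` symbols to the symbols observed after it."""
    model: dict[str, list[str]] = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model

def generate(model: dict[str, list[str]], seed: str, length: int,
             rng: random.Random) -> str:
    """Extend `seed` one symbol at a time, conditioned on the preceding context."""
    order = len(seed)
    out = seed
    while len(out) < length:
        followers = model.get(out[-order:])
        if not followers:  # context never seen in the sample: stop early
            break
        # Drawing from the list reproduces the observed conditional frequencies.
        out += rng.choice(followers)
    return out

# Hypothetical toy corpus; each symbol depends on the two preceding symbols.
corpus = "the theory of the thing is that the thought is there "
model = build_model(corpus, order=2)
text = generate(model, seed="th", length=40, rng=random.Random(0))
```

Every three-symbol window of the generated text occurs somewhere in the corpus, while longer stretches are new combinations, which mirrors the observation that plausibility extends somewhat beyond the span of the statistics used.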

The idea of a complete approximation of the linguistically plausible has given rise to many subsequent works, but as Shannon remarked, the work on the next step would be colossal and he refrained from continuing the series of approximations himself.[11]

This experiment in linguistic analysis, however, also served another, more specific purpose. It would show that it was also possible to increase the precision of a statistical analysis of apparently indeterministic systems by using a special class of Markoff processes, namely ergodic processes, in which the statistical structure of a - reasonably large - sample of a sequence is the same as for the entire sequence. It follows from this that all the sequences which can be produced in an ergodic system have the same statistical properties.
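The ergodic property can be checked numerically on a small example. The sketch below assumes a hypothetical two-symbol source with invented transition probabilities and compares the frequency of one symbol in a sample with its frequency in the whole sequence. It illustrates the property only and is not a test of ergodicity in general.

```python
import random
from collections import Counter

# Hypothetical two-symbol source: P(next | current), chosen only for illustration.
TRANSITIONS = {"A": {"A": 0.9, "B": 0.1}, "B": {"A": 0.5, "B": 0.5}}

def run_chain(length: int, rng: random.Random, start: str = "A") -> str:
    """Generate `length` symbols from the Markov chain defined above."""
    symbols, state = [], start
    for _ in range(length):
        symbols.append(state)
        nxt = TRANSITIONS[state]
        state = rng.choices(list(nxt), weights=list(nxt.values()))[0]
    return "".join(symbols)

def frequency(sequence: str, symbol: str) -> float:
    """Relative frequency of `symbol` in `sequence`."""
    return Counter(sequence)[symbol] / len(sequence)

sequence = run_chain(100_000, random.Random(1))
whole = frequency(sequence, "A")                    # entire sequence
sample = frequency(sequence[40_000:60_000], "A")    # a reasonably large sample
```

For this chain the long-run frequency of "A" is 5/6, and both the sample and the whole sequence should come close to that value.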

If we can thus - through a series of approximations - describe a given message with the help of an ergodic process, we can also describe a complete set of possible messages which can be handled in the same way. The question then is whether it is possible to exploit this statistical knowledge to compress - or re-code - all the messages which belong to the given set.

In its simplest form this is a question of finding a standard formula for the certainty with which we can predict what the next symbol will be, purely on the basis of the probability with which each symbol occurs in a given set.

It is reasonable, writes Shannon, to claim that such a standard in the given case must fulfil three demands:

- The measure of uncertainty (H) should be a continuous function of the probabilities of the occurrence, (p_i).

- If all p_i are equal, then H should be a monotonic increasing function of the number of symbols. With equally likely events there is more choice, or uncertainty, when there are more possible events.

- If a choice be broken down into successive choices, the original H should be the weighted sum of the individual values of H.

The only H satisfying the above assumptions is of the form

$$H = -K \sum_{i=1}^{n} p_i \log p_i$$

where K is a constant related to the choice of the unit of measurement and p_i is the calculated probability for the occurrence of a given symbol.[12] The formula describes the uncertainty, "the entropy", of the total system as a function of the uncertainty valid for each occurrence. The fact that it is the logarithmic value of the probability which appears as a factor is due to an arbitrary choice, intuitively motivated by engineering and practical considerations, as a great number of functions vary linearly with the logarithm of the number of possibilities, just as the use of logarithmic values simplifies the mathematical calculation.

An illustration of the usefulness of the choice is that by using the logarithmic function with base 2 we get a binary unit of measurement corresponding to a relay with two stable positions which can contain one bit of information, while a system with N relays, which thus has 2^{N} possible states, can correspondingly contain N bits, as log (base 2) of 2^{N} = N. By using the logarithm for base 2 as a factor, we thereby obtain uncertainty expressed in bits.
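The relay example can be verified with a minimal sketch that computes Shannon's measure with the constant K set to 1 and logarithms taken to base 2. The function names `entropy_bits` and `relay_bank_entropy` are hypothetical, and the assumption that all relay states are equally likely is made only for the illustration.

```python
from math import log2

def entropy_bits(probabilities: list[float]) -> float:
    """H = -sum(p_i * log2(p_i)): Shannon's measure with K = 1, in bits."""
    # Terms with p = 0 are skipped, since p * log p tends to 0 as p tends to 0.
    return -sum(p * log2(p) for p in probabilities if p > 0)

def relay_bank_entropy(n_relays: int) -> float:
    """Entropy of N two-position relays when all 2**N states are equally likely."""
    states = 2 ** n_relays
    return entropy_bits([1 / states] * states)

one_relay = relay_bank_entropy(1)
five_relays = relay_bank_entropy(5)
```

Under the equal-likelihood assumption one relay yields 1 bit and five relays yield 5 bits, in agreement with the logarithm of the number of states.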

In its mathematical structure Shannon's formula is equivalent to Boltzmann's measure of physical entropy, although the constant in Boltzmann's formula is a physical constant. Shannon provides no proof of this theorem:

*It is chiefly given to lend a certain plausibility to some of our later definitions. The real justification of the definitions, however, will reside in their implications*.[13]

The value of the formula is thus connected with its intuitively relevant properties: H becomes zero if all signals except one appear with probability zero and that one with 100% probability (or probability 1, when uncertainty is expressed as a positive value between 0 and 1). In all other cases, H is positive. H is conversely maximum in the case where all symbols occur with equally great probability and H increases in this case linearly with the logarithm of the number of symbols.
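These properties can be followed in a short worked example, given here as a sketch for a system with only two symbols, with K = 1 and logarithms to base 2, and with the usual convention that 0 log 0 = 0. The symbols p and n are introduced for the example.

```latex
% Two symbols with probabilities p and 1 - p:
H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)

% One symbol is certain, so there is no uncertainty:
H(1) = -1 \cdot \log_2 1 - 0 \cdot \log_2 0 = 0

% Both symbols are equally likely, which gives the maximum:
H\left(\tfrac{1}{2}\right) = -\tfrac{1}{2}\log_2 \tfrac{1}{2} - \tfrac{1}{2}\log_2 \tfrac{1}{2} = 1 \text{ bit}

% n equally likely symbols: H grows linearly with the logarithm of n:
H = -\sum_{i=1}^{n} \tfrac{1}{n} \log_2 \tfrac{1}{n} = \log_2 n
```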

It can further be demonstrated that the formula accords with the intuitive supposition that the uncertainty of the joint occurrence of two given symbols in a sequence is less than or equal to the sum of the uncertainties of the individual occurrences of the two symbols.
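In terms of the measure H, Shannon states this property as H(x, y) ≤ H(x) + H(y), with equality when the two symbols are independent. The following sketch checks the inequality for one invented joint distribution. The distribution and the helper names are assumptions made for the example.

```python
from math import log2

def entropy_bits(probabilities) -> float:
    """H = -sum(p * log2(p)), in bits; zero-probability terms are skipped."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

def marginal(joint: dict, axis: int) -> dict:
    """Sum a joint distribution P(x, y) over the other variable."""
    out: dict = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

# Hypothetical joint distribution of two dependent symbols.
dependent = {("a", "a"): 0.4, ("a", "b"): 0.1, ("b", "a"): 0.1, ("b", "b"): 0.4}
# The same marginals, but with the two symbols independent of each other.
independent = {pair: 0.25 for pair in dependent}

h_x = entropy_bits(marginal(dependent, 0).values())
h_y = entropy_bits(marginal(dependent, 1).values())
h_xy_dependent = entropy_bits(dependent.values())
h_xy_independent = entropy_bits(independent.values())
```

For the dependent pair the joint uncertainty falls below the sum of the two individual uncertainties, while for the independent pair the two quantities coincide.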

In the remainder of the article Shannon demonstrates how to apply the general mathematical measurement for uncertainty, entropy, or what is identical here: the amount of information in a given message, *where the information concept is identified with the relative improbability which is valid for a decision as to what can occur as the next symbol*. If we use base 2 for the logarithmic value, this corresponds to expressing uncertainty as the number of bits necessary to identify a signal.

If we define the uncertainty of a given set of messages as an average of the uncertainties that are valid step by step for the next symbol, weighted with regard to the probabilities which are valid for the occurrence of the individual permissible symbols, it is also possible to show that, through a series of approximations, the uncertainty of the total system can be calculated with as great an approximate precision as desired, simply on the basis of the total statistical structure of the message, i.e. without taking into account the variations which are connected with the transitions between the individual steps.

If a given message thus contains a given, large number of signals, the permissible symbols will occur individually with a probability which approaches that probability valid for its occurrence in the total set of possible messages.

That uncertainty which is valid for a specific message will always be less than that uncertainty which is valid for the entire set of possible messages in the same stochastic structure and the relationship between these uncertainties comprises what Shannon calls relative entropy.

This standard defines the amount of information contained in a given message relative to the degree of freedom which is valid for the total expression system. If a given message uses 80%, for example, of the free choices which are permitted in a given system, the relative entropy is 0.8 and, at the same time, comprises the maximum compression of the amount of information contained.

Correspondingly, the system's redundancy, i.e. that amount which is not available to free choice, is determined as a residual factor of the *relative* entropy.
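The two quantities can be computed with a minimal sketch. It takes relative entropy in Shannon's sense, as the ratio of the actual entropy to the maximum possible with the same symbols (not the later Kullback-Leibler usage of the term), and redundancy as one minus that ratio. The function names and the example probabilities are assumptions.

```python
from math import log2

def entropy_bits(probabilities) -> float:
    """H = -sum(p * log2(p)), in bits; zero-probability terms are skipped."""
    return -sum(p * log2(p) for p in probabilities if p > 0)

def relative_entropy(probabilities) -> float:
    """Actual entropy divided by the maximum for the same number of symbols."""
    probabilities = list(probabilities)
    return entropy_bits(probabilities) / log2(len(probabilities))

def redundancy(probabilities) -> float:
    """The residual: one minus the relative entropy."""
    return 1.0 - relative_entropy(probabilities)

# Hypothetical source with unequal symbol frequencies.
skewed = [0.5, 0.25, 0.125, 0.125]
uniform = [0.25, 0.25, 0.25, 0.25]
```

For the skewed source the relative entropy is 0.875 and the redundancy 0.125, whereas for the uniform source the redundancy is zero.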

This definition, however, does not appear to be quite clear, as relative entropy expresses a relationship between the maximum possible and actually utilized freedom of choice. Redundancy is thus defined here as a measure of the freedom of choice not utilized and not as a measure of the number of occurrences which are not accessible to choice.

Shannon's exemplification fails to clear up the problem, as he refers to a number of investigations into the statistical structure of the English language, from which it appears that redundancy in "ordinary English" is around 50% and the amount of information therefore also 50%. Compared with this, Basic English, which is characterized by a limitation of the vocabulary to 850 words, has very high redundancy, while James Joyce is chosen to represent the linguistic contrast with very low redundancy. In all these examples, the redundancy concept is used of the number of symbol sequences that are determined by the language structure and not of the maximum number of free choices utilized.

That there are two different standards becomes clear if we consider a message in ordinary English, where relative entropy approaches 1, i.e. the amount of information approaches the maximum possible for the set of messages which belong to ordinary English. While redundancy as residual between the eligible and ineligible signals is, according to Shannon's statement, constant, equal to 0.5, when seen as residual to the relative entropy, it becomes very low, approaching 0 as the relative entropy approaches 1. If, conversely, we compare the possible choices used in ordinary English to the maximum number of choices the English language permits as a whole, the result will be that the relative entropy, the relationship between the possible choices utilized and not utilized, approaches 0, while the relationship between eligible and ineligible signals in the message remains 1:1 and the relative entropy is 0.5.

The relationship between the potential amount of information and the amount used is not included in Shannon's (nor in Weaver's) examples, but it is not possible on the basis of Shannon's definition to speak of higher or lower redundancy without placing it in relationship to a common - and maximum - standard for the possible free choices, of which any given text only utilizes some part or other.

If we therefore look at the different variants of English as different degrees of approximation of the maximum number of possible choices in the total system, the redundancy, which is expressed as residual to the relative entropy, would be low for Basic English because Basic English only uses a small number of the possible choices, while Joyce uses a greater number with a correspondingly higher redundancy. In order to ascertain this portion, however, we must take our point of departure in the maximum number of possible choices the English language allows for symbol sequences, which far exceeds any usage and has nothing to do with the 50-50 ratio which is considered typical for the relationship between redundancy and information in ordinary English, as in ordinary English nobody uses 50% of the maximum number of free choices in the English language as a whole.

So we have here two different definitions of the concept of redundancy. On the one hand, redundancy is defined as that part of the expression which is determined by the structure of the language, as Shannon assumes that a message can be divided into two portions, one of which is determined by the language system while the other is the part that is accessible to a free, meaning-bearing choice. This redundancy is defined in direct contrast to the meaning. The definition leads to a paradox, as Shannon here identifies redundancy with the system and rule determined part of an expression.

On the other hand, redundancy is defined as the unused possible choices in a given text. Here, redundancy is also defined in contrast to the meaning of the text, but where, in the first definition, this contrast was between the system determined and the freely chosen parts, in the second definition it is drawn between the freely chosen part and the unused, alternative possible choices in the given language.

In order to arrive at this result Shannon first had to use a third definition, as he could only determine the unused possible choices which characterize a given text by first carrying out a statistical analysis of the relationship between regularly occurring and irregularly occurring signals in a - representative - section of texts from the given language.

This - statistically expressed - redundancy differs from the two preceding definitions as it is not defined in contrast to, but completely without regard to the meaning of the text. The statistical standard is carried out on the entire set of messages no matter whether the individual notations occur because they are determined by the rule structure or belong to the specific, eligible content of the message. This determination also allows Shannon to avoid the question of how it is possible to differentiate between rule and meaning determined letters in a perfectly ordinary word.
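The statistically expressed redundancy can be illustrated with a minimal sketch. It assumes the crudest possible stochastic description, one that looks only at the frequency of single symbols, whereas Shannon's own estimates for English rest on longer-range statistics. The function names, the sample string and the 27-symbol alphabet (26 letters plus space) are illustrative assumptions, not taken from Shannon's paper.

```python
import math
from collections import Counter

def entropy_per_symbol(text):
    """Entropy in bits per symbol, estimated from single-symbol frequencies only."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy(text, alphabet_size):
    """One minus the relative entropy: the share of the maximum not used.

    alphabet_size must be greater than 1, since the maximum entropy
    log2(alphabet_size) is used as the divisor.
    """
    maximum = math.log2(alphabet_size)
    return 1.0 - entropy_per_symbol(text) / maximum

# Hypothetical sample; the calculation never asks what the text means.
sample = "the quick brown fox jumps over the lazy dog and the cat"
print(round(entropy_per_symbol(sample), 3), "bits per symbol")
print(round(redundancy(sample, 27), 3), "redundancy relative to 27 symbols")
```

The calculation treats every occurrence alike, whether it is dictated by spelling rules or freely chosen by the writer, which is exactly the indifference to meaning described above.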

Where redundancy in the first two definitions is defined in contrast to the meaning, redundancy in the third definition is defined quite independently of meaning. And while redundancy in the first definition is defined as the system determined part of the expression, in the two others it is defined as the used and unused parts respectively which are accessible to a free choice.

It is the first definition which comes closest to the traditional definition of redundancy as repetitively occurring, superfluous structures which are of no importance for the content of the message. But this definition too is distorted by Shannon, as in this connection he simply defines the superfluous, meaningless structures as the regular occurrences, whereby they also come to include the occurrences determined by the rules of the system in which the message is expressed.

To the traditional definition of redundancy

- as a repetitively occurring, superfluous structure which is of no importance for the content of the message,

Shannon thus adds three new definitions:

- as the regularly occurring, system determined parts (in contrast to meaning)

- as the eligible, but unused parts (the alternative possible meanings - in another contrast to meaning)

- as the statistically determined parts (without regard to meaning at all)

The relationship between these different definitions gives occasion for a more detailed analysis of the redundancy concept which will be taken up in section 7.5.

We are also left, however, with the first part of a new definition of informational quantities, as these quantities are determined as residuals to quantities which are established in a statistical structure. The smallest informational unit is defined here by its degree of unexpectedness in relationship to a specified expectancy structure which, in its stringent form, can be described as a stochastic procedure.

The paradox in this definition appears when we become aware that the informational quantity is a statistical function which itself has no specific manifestation. Uncertainty is concerned with the degree of unpredictability whereby a symbol occurs at a given time, or the degree of unpredictability whereby the total number of symbols occurs in a sequence, or is concerned with the average number of units necessary to specify a symbol within a class of possible symbols. The smallest amount of information here is thus not the same as the smallest expression unit in a notation system.

The amount of information is, on the contrary, a specific attribute, or property, of individual symbols or of sequences of symbols, as the amount indicates the degree of unexpectedness in the occurrence of a given form. But this is a peculiar property which only appears occasionally and, in principle, independently of the existing message, as the degree of unexpectedness is not a property of the message, but a function of the stochastic procedure which is chosen to characterize the message. The same message thus contains different amounts of information if it is described or calculated on the basis of two different stochastic procedures. Other things being equal, the more complicated the procedure with which it is handled, the less information is contained in the source.

Characteristically, this is almost diametrically opposed to the result of a semantic treatment, in which the main rule would be that the more complicated interpreter yields more information than the less complicated.
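The dependence of the measured amount on the chosen stochastic procedure can be made concrete in a small sketch. It applies three procedures of increasing complexity to one and the same message: equally probable symbols, single-symbol frequencies, and frequencies conditioned on the preceding symbol. All three are fitted to the message itself, which is a simplifying assumption, and the function names and the message are hypothetical.

```python
import math
from collections import Counter

def bits_uniform(message, alphabet_size):
    """All symbols assumed equally probable."""
    return len(message) * math.log2(alphabet_size)

def bits_unigram(message):
    """Each symbol weighted by its own frequency in the message."""
    counts = Counter(message)
    total = len(message)
    return -sum(math.log2(counts[s] / total) for s in message)

def bits_bigram(message):
    """Each symbol weighted by its frequency after the preceding symbol."""
    counts = Counter(message)
    pairs = Counter(zip(message, message[1:]))
    contexts = Counter(message[:-1])
    # The first symbol has no predecessor and is charged at its unigram cost.
    bits = -math.log2(counts[message[0]] / len(message))
    for previous, current in zip(message, message[1:]):
        bits -= math.log2(pairs[(previous, current)] / contexts[previous])
    return bits

message = "abababababab"
print(bits_uniform(message, 2))  # simplest procedure
print(bits_unigram(message))     # same amount here: both symbols are equally frequent
print(bits_bigram(message))      # far less: every symbol follows from its predecessor
```

The message is unchanged throughout; only the procedure varies, and the procedure that captures most of the statistical structure leaves the least information.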

There is nothing surprising in the fact that the statistically defined amount of information is independent of the semantic content, as this was the aim itself. The interesting thing is rather that, quite contrary to his own expectations, Shannon shows that the information which can be produced by a stochastic interpreter is not only independent of the meaning content of the message, but also of its notation structure, as the amount of information depends solely on the interpreter. The less the interpreter is capable of specifying the statistical properties of a message, the more information there is.

Of course it is correct that we can speak, in a certain vague manner, of such a connection at the semantic level. The less we know, the more, in a sense, we have the opportunity to learn. But this vague analogy obscures a significant discrepancy because more knowledge in the statistical theory is identical with less information. The knowledge which is absorbed in the statistical procedure is thus for the very same reason no longer possible information. There is only information in so far as it is missing.

Shannon's problem here is that signals which occur with great statistical regularity may well occur as the consequence of a free choice. Statistical regularity, which is not information, can therefore both be the result of a system determined order and of a semantic choice.

It is the definition of information as - the degree of - uncertainty which is the source of these paradoxical implications. In her book, *Chaos Bound*, N. Katherine Hayles writes of Shannon's conceptualization that it contains a transformation of the thermodynamic concept of uncertainty. While uncertainty in the thermodynamic description is seen as an actual micro-physical disorder, i.e. a state which cannot be known and where it is only possible to describe the statistical probabilities of possible states, uncertainty in Shannon's theory is understood as a degree of the "unexpectedness" of an actual event.[14] In Shannon's theory the micro-state is completely unambiguous and given. There is a message in the form of a fixed sequence of given, individual signals. Here, it could be added, the result of the measurement varies solely with the measurement procedure.

For the same reason Shannon's conceptualization of the information concept is only directly justified as part of the description of statistical properties in connection with the occurrence of symbols in different notation systems. The stochastic interpreter contains neither a description of the syntactic structure of the message nor of its informational content and can, precisely for this reason, be used on any set of physical forms, including physical notation forms.

As mentioned, there is no doubt that Shannon himself imagined that it was possible to design complicated stochastic models which could describe the syntactical structure of a common language, for example, as such a model would also make it possible to define a standard for the possible content of a message expressed as the degree of freedom in the choice of any succeeding symbol. The consequence of this, however, would be that the ability of a language to express a content decreases with the increasing complexity of linguistic rules.

As Shannon's paper indicates that he believed that we can always discriminate between rule determined and meaning determined notation occurrences, it would have been more obvious to use the model on formal languages in which such discrimination is obligatory, but this would have provided no confirmation. Although any notation in formal systems occurs as a consequence of a free - and semantically meaning-bearing - choice, the individual notations can nevertheless appear in a multiplicity of repetitive patterns.

It would not be difficult to construct a stochastic procedure which could produce plausible formal expressions, but it would be difficult to convince anybody that such a procedure thereby had any descriptive validity at all.

We need not, however, go down these blind alleys in order to derive some benefit from Shannon's theory and they were only of esoteric significance for his main purpose, which was to formulate mathematical means of optimizing transmission capacity in energy-based information media.

The definition of information as the degree of uncertainty of the occurrence of a signal comprised, in this connection, only one of the interesting new features.

Another lay in his account of how entropy per symbol in a text could be converted to an expression of the frequency with which a source produces entropy per unit of time. This conversion follows almost of its own accord providing that the stochastic procedure is seen as a generator which produces symbols *at a given speed*.

By treating statistical uncertainty, the information, as a function which can be expressed in physical duration, measured in time, Shannon in addition incorporates the information concept into a general physical scale. *The purely formal, statistical definition of the informational amount is thereby transformed into a definition of the informational amount as a physically determined entity.*

This means that informational entropy can be measured on the same scale as any other physical signal defined by a time function, whether it occurs with complete certainty or with some statistical (im)probability, as time is a general measure of duration. The concept of informational entropy thereby becomes a common unit of measure for a comparison of the degree of uncertainty of different stochastic procedures.
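The conversion from entropy per symbol to entropy per unit of time can be written out as a short worked equation. The symbols $p_i$ (probability of symbol $i$), $H$, $m$ and $H'$ follow the notation commonly used for Shannon's theory; the numerical values are invented purely for illustration.

```latex
\begin{align*}
H  &= -\sum_i p_i \log_2 p_i && \text{(bits per symbol)} \\
H' &= m\,H && \text{(bits per second, for a generator producing $m$ symbols per second)} \\
H' &= 100 \times 1.75 = 175 && \text{(example: $m = 100$ symbols per second, $H = 1.75$ bits per symbol)}
\end{align*}
```

The only physical assumption entering the second line is the generator's speed $m$; the statistical quantity $H$ is unchanged.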

There was nothing new in describing a physical symbol structure on the basis of the duration of the signals. On the contrary, this had been taken up by many engineers since the introduction of the Morse alphabet for telegraphic purposes, because the duration of the signals was a decisive factor for the capacity of the transmission channel. The Morse alphabet was itself an example of how a discrete notation system, such as the alphabet, where duration is not distinctive, could be advantageously converted to a system which uses only duration as a distinctive element. This is probably also the reason why Shannon himself introduced the measure of time for informational entropy without noting that this implies a reinterpretation of the concept, as it now becomes an expression of that frequency (measured in time) whereby a more or less unexpected, but distinct, *physical* phenomenon (which can also be measured in time) occurs.

Informational entropy can, as a physical time function, only be determined on the precondition that the time scale which defines the signals as physically distinct signals is unambiguously connected with the time scale which defines the frequency of unexpected occurrences.

This connection is guaranteed by the chosen stochastic procedure when it is regarded as a product of a mechanical generator which operates at a known speed. The generator thereby establishes both the code which separates the signals as distinct physical values and the code which defines the average frequency of the unexpected occurrences of such distinct signals.[15]

By utilizing the time scale established by the generator, informational entropy can both be measured 1) at the level which is concerned with omitting the signals determined by the statistical structure, 2) at the level concerned with minimizing the time taken to transfer the symbols which are not determined by the structure, and 3) at the level which is concerned with specifying the individual symbols in the most economical form with regard to transport, for example by calculating the average necessary number of bits which are needed for unambiguous identification (which makes subsequent re-coding possible).

With the definition of informational entropy as the optimum, i.e. least possible, physical duration, Shannon arrived at an expression for the entropy of a given source which could be made operational in relationship to the transmission capacity of the channel, as this could also be expressed as a function of the possible messages per time unit.

Shannon then attempted to demonstrate that there is always at least one mathematical method to calculate the optimum compression of messages which are manifested in a discrete notation system. The demonstration is carried out partly by showing that the average transmission speed cannot be greater than *the relationship between the channel's capacity per time unit and the source's entropy per symbol*, while it can conversely be optimized so that it coincides with this - calculable - value with an uncertainty that is almost non-existent.

Shannon mentions two different methods of carrying out such an optimization. One method involves a division between the more probable messages which are transmitted as they are, while the less probable messages are separated and sent in a different code. In the other method the messages are organized according to their degree of probability, after which they are re-coded in binary form, where the more probable messages are represented by a short code and the less probable by a long code. In both cases it can be shown that for messages of a certain greater length, the upper limit for average transmission speed will be determined by the relationship between the channel's capacity and the uncertainty of the source.
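The second method can be sketched roughly as follows. The sketch orders the messages by decreasing probability, gives each a code length of the rounded-up negative binary logarithm of its probability, and reads the code off the binary expansion of the cumulative probability of the messages before it. The four-symbol source, the function names and the channel capacity of 1000 bits per second are illustrative assumptions.

```python
import math

def shannon_code(probabilities):
    """Assign short binary codes to probable symbols, long codes to improbable ones."""
    ordered = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    codes = {}
    cumulative = 0.0
    for symbol, p in ordered:
        length = math.ceil(-math.log2(p))
        # Binary expansion of the cumulative probability, cut off after `length` places.
        fraction = cumulative
        digits = []
        for _ in range(length):
            fraction *= 2
            digit = int(fraction)
            digits.append(str(digit))
            fraction -= digit
        codes[symbol] = "".join(digits)
        cumulative += p
    return codes

def entropy(probabilities):
    """Entropy of the source in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities.values() if p > 0)

def average_length(probabilities, codes):
    """Average number of binary digits spent per symbol."""
    return sum(p * len(codes[s]) for s, p in probabilities.items())

source = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}  # illustrative source
codes = shannon_code(source)
print(codes)
print(average_length(source, codes), "binary digits per symbol on average")
print(entropy(source), "bits of entropy per symbol")

channel_capacity = 1000.0  # bits per second, an assumed value
print(channel_capacity / entropy(source), "symbols per second at most")
```

For this particular source the average code length coincides with the entropy, so the upper limit given by the channel's capacity divided by the source's entropy is actually reached; for probabilities that are not powers of one half the code only approaches the limit.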

In practice the result must be modified, however, because the code procedure itself, which elapses in time, builds upon a calculation of probabilities. Coding requires an analysis of the structure of the message. The code mechanism must thus contain a store with a certain capacity. As a consequence of this the optimization of coding is always carried out at the expense of a certain delay in transmission. The same effect is produced at the other end of the channel as well, where the transmitted signals must be coded back to their original form.

In addition to this, there is yet another problem. As the theory is formulated up to this point, it is concerned with transmission through an idealized channel where the signals transmitted are supposed to move undisturbed through empty space. In this case, no transmission at all is possible because all signals are defined on the basis of some kind of physical manifestation in a medium - if nothing else then in the apparatus that registers the signal. It is therefore necessary to investigate how the determination of the optimum compression is influenced by physical noise in the channel.

This problem in itself would be of a purely technical nature if it were not precisely that the technical definition of informational entropy is identical with the technical definition of physical noise. As a consequence of this coincidence in the technical definitions, the technical possibility of distinguishing between information and noise depends upon conditions which lie outside the definition. The question is which?

6.3 Information and noise

With his definition of information as a random variation that can be described relative to an order defined by an arbitrary stochastic procedure, Shannon laid the foundation of a conceptual pattern which has since been the object of considerable attention. The heart of this conceptual pattern can be summarized as the thesis that it is possible to regard random variation as the basis of an order at a higher level. In Shannon this idea comes to direct expression, as without any detailed account, he assumes that the relative randomness which can be observed in the occurrence of a symbol sequence in a given message is a direct manifestation of the distinct content of the message. The disorganization which exists when the message is regarded as a sequence of individual symbols thus creates the basis for its meaning at the higher, semantic level of observation.

There is therefore a partial justification for this interpretation in what Shannon writes, but no reason is provided and his own analysis gives, on the contrary, several indications that it is a wrong approach.

One of these indications lies, as has already been discussed, in the description of informational entropy as a function of an arbitrarily chosen stochastic process, according to which the amount of information decreases with the increasing precision of the description of the message. It is immediately obvious that this relationship in itself prohibits any reference to the semantic content of the expression from being ascribed to the concept of informational entropy. Informational entropy can either be regarded as an arbitrary statistical function, such as is the case with the description of the source of the message, or as a function of time in a temporally defined signal system, such as is the case with both the description of the transmission channel and of the stochastic procedure as a mechanical signal generator.

Another indication appears from a more detailed observation of the coincidence in the definition of information and noise. Shannon uses the same definition of both phenomena because he is particularly interested in the mechanical transport of information, where doubt can arise as to whether a signal appears because it has been transmitted, or is a consequence of noise in the transmission channel. He is thus - in this connection - not interested in irregular noise which does not distort the signal transferred to such an extent that the receiving mechanism cannot distinguish the transmitted signal, nor is he interested in regular noise which always produces a certain distortion, except in those cases where this distortion can result in two different signals being received as the same.

On the other hand, he is particularly interested in how to determine with the greatest efficiency whether a received signal, with a legitimate physical form, stems from the transmitter or is due to noise during transmission.

In these cases, writes Shannon, it is reasonable to assume that the signal received can be understood as a function of two transmitted signals, one being the transmitted signal, the other being the noise signal. As both these signals can be understood as random variables, it can also be assumed in continuation of the preceding analysis, that they can be individually represented by appropriate stochastic processes.

This train of thought can be illustrated by imagining that the transmission is sent in binary code where the question is how it is possible to be certain that a transmitted 0 or 1 is also received as 0 or 1, if the physical noise in the channel sometimes means that a 0 is actually received for a transmitted 1. The idea therefore is to regard the transmitter and the noise source as two generators each operating with a measurable entropy.

This idea assumes that both the "informational entropy" and the noise are manifested in the same physical form as the physical signal - namely expressed in duration or amperage, conceived of as bits, for example.

The theoretical identification of noise and information has thus special relevance for the mechanical handling of notation systems where it is not immediately possible to use semantic criteria in the interpretation of the legitimacy of the notation unit and where the individual members of the notation system are defined on a common physical scale of values, because the relevant noise for the receiver occurs as though it came from the transmitter, completely on a par with and inseparable from the legitimate signals which comprise part of the message.

In noisy channels of this type a completely correct transmission is impossible and the question therefore was whether a coding procedure could be found which would enable a reduction in the frequency of errors or ambiguities in the received result to as great a degree as desired. One possibility would be to send the same message a great number of times and let the receiving apparatus carry out a statistical analysis of the individual messages in order to separate out the most probable, correct version. The principle of the method is simply to increase redundancy in the total set of transmitted, identical signals, which would imply a correspondingly great reduction of the effective capacity of the channel. Shannon could show, however, that it was possible to code the transmitted message in such a way as to minimize the limitation which lay in this increased redundancy by introducing a correction mechanism.

This mechanism was conceived of as an extra coding which would be added to or incorporated into the original message and the question therefore was whether it would be possible to determine the optimum reduction of the channel capacity which this extra coding would bring about.

For this purpose Shannon defined the effective transmission rate as the difference between the information transmitted and the information lacking at the receiver - due to noise. This missing information thus expressed the average uncertainty ("the equivocation") which obtained for the signals received. It therefore also expressed, wrote Shannon, "the additional information" necessary to correct the message and this measure consequently indicated the necessary capacity for correction.

According to this, it is possible to carry out a coding which ensures as close an approximation of the correct transmission as desired and which only limits the capacity of the channel with that uncertainty whereby noise is produced in the system.
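The effective transmission rate can be illustrated for the binary case described above. The sketch assumes a source emitting 0 and 1 with equal probability and a channel which flips each signal independently with a fixed probability; under those assumptions the equivocation equals the binary entropy of the error probability. The function names and the figures (1000 signals per second, one error per hundred signals) are chosen for illustration.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a two-way choice with probabilities p and 1 - p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def effective_rate(signals_per_second, error_probability):
    """Information transmitted minus information lacking at the receiver.

    Assumes equally probable 0s and 1s (one bit per signal at the source)
    and independent, symmetric errors.
    """
    source_entropy = 1.0
    equivocation = binary_entropy(error_probability)
    return signals_per_second * (source_entropy - equivocation)

print(round(binary_entropy(0.01), 4), "bits of equivocation per signal")
print(round(effective_rate(1000, 0.01), 1), "bits per second effectively transmitted")
```

The loss is considerably larger than the one per cent of signals actually distorted, because the receiver does not know which signals are the wrong ones; the equivocation measures the additional information needed to locate them.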

At this point Shannon's analysis has a theoretical character, as he only supplies proof that on the basis of the given premises it is theoretically possible to find a coding procedure which can optimize the transmission. The procedure which can fulfil this condition, however, depends on the specific message.

This also appears from Shannon's own example of such an efficient re-coding, in which he shows how a sequence of seven binary signals, x1, x2 ... x7 (where the individual signal has one of two possible values), can be coded. Of the seven signals, four (x3, x5, x6, x7) comprise the content of the message, while three (x1, x2, x4) - in Shannon's terminology, redundant signals - comprise the necessary number of signals which are used to correct the message, if it is assumed that this block has either been transmitted free of error or with one error and that these eight possibilities are equally probable.

The value of the redundant signals is determined by a simple addition (modulo 2) of the binary numerical values, as

x1 is defined so that x1 + x3 + x5 + x7 = 0

x2 is defined so that x2 + x3 + x6 + x7 = 0

x4 is defined so that x4 + x5 + x6 + x7 = 0

If one (and only one) error occurs during transmission, this will appear from the fact that one, two or three of the three sums above yield 1 instead of 0 when the same test procedure is carried out by the receiver. If one of the redundant signals has been distorted, there will be *one* 1 value; if x3, x5 or x6 has been distorted, there will be *two* (a different pair in each case); if x7 has been distorted, there will be *three* 1 values.

As this is a question of a binary system, the wrong signal can therefore be corrected automatically.[16]
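The example can be followed step by step in a minimal sketch, assuming that the additions are carried out modulo 2. The function names are hypothetical. Reading the three test results as a binary number, with the x4 test as the most significant digit, gives the position of the distorted signal directly, which is one convenient way of carrying out the automatic correction.

```python
def encode(data):
    """Place four message signals at x3, x5, x6, x7 and add x1, x2, x4."""
    x3, x5, x6, x7 = data
    x1 = (x3 + x5 + x7) % 2  # so that x1 + x3 + x5 + x7 = 0
    x2 = (x3 + x6 + x7) % 2  # so that x2 + x3 + x6 + x7 = 0
    x4 = (x5 + x6 + x7) % 2  # so that x4 + x5 + x6 + x7 = 0
    return [x1, x2, x3, x4, x5, x6, x7]

def correct(block):
    """Repeat the three tests and repair at most one distorted signal.

    Returns the repaired block and the position (1 to 7) of the error,
    or 0 if all three tests come out as 0.
    """
    x1, x2, x3, x4, x5, x6, x7 = block
    test_x4 = (x4 + x5 + x6 + x7) % 2
    test_x2 = (x2 + x3 + x6 + x7) % 2
    test_x1 = (x1 + x3 + x5 + x7) % 2
    position = 4 * test_x4 + 2 * test_x2 + test_x1
    repaired = list(block)
    if position:
        repaired[position - 1] ^= 1  # binary system: the wrong value is simply flipped
    return repaired, position

def message_of(block):
    """Remove the redundant signals after reception."""
    return [block[2], block[4], block[5], block[6]]

sent = encode([1, 0, 1, 1])
received = list(sent)
received[4] ^= 1  # distort x5 during transmission
repaired, position = correct(received)
print(position, message_of(repaired))
```

Nothing in the procedure depends on what the four message signals stand for; it depends only on their being treated as numbers that can be added.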

Even though coding can be carried out mechanically, it is nevertheless based on a semantic description of the message, as the binary signal values are interpreted as numerical values which can be added to each other. In other words, coding is brought about by ascribing a certain semantic value to the individual notation units. Although the message is interpreted through a formal semantics which is independent of the language in which the message appears, a semantic interpretation is still necessary to ensure the legitimacy of the notation unit. This is thus not a question of an asemantic coding, but on the contrary of the use of a formal semantics in the coding of the necessary notational redundancy. The redundant notations are then also redundant when regarded in relationship to the meaning content of the message itself, while both as transmitted physical signals and as notations in the formal code they are equally as distinctive notation units as the others.

Here, Shannon formulated a fifth definition of the redundancy concept where redundancy is determined as:

- a formal control code which can be defined by subjecting the message to a formal calculation, the result of which is added during transmission and removed after reception.

The - calculated - redundancy is thus not contained in the original message's expression and has no relation to the meaning or rule structure of the message. Whether the individual x's in the example above represent the letters in a word, an arithmetical problem or the result of a physical measurement of the temperature of sea water, or something completely different, is of no significance whatsoever for the determination of the formal control code. This is precisely where the advantage lies, because the method hereby becomes generally usable.

The fact that Shannon understood this solution as an asemantic solution to the problem of noise is first and foremost because of a confusion in his information concept, as he uses the concept both of a mathematically quantified expression for a meaning content and of a simple physically defined notation unit. Both views aim at an asemantic description of semantic phenomena. But in the first case the information concept is defined as the specific meaning-bearing part of the expression seen in contrast to the rule determined part. In the second case the information concept is defined quite independently of the meaning content, as the physical view is valid for any notation, whether it belongs to a repetitive redundancy structure or not.

In other words this is a question of two different definitions of the superfluous or "meaningless". In the first case it is determined as the repetitive structure which, for the same reason, can be omitted from the transmission. In the second case it is determined as the unintended occurrence of one of the physically defined notation forms used. In the first case the noise is thus identical with the rule-determined, in the second with signals occurring at random which are only distinct from legitimate signals by not being intended.

On the other hand, there is also a certain inner connection between the two definitions, as the use of the first definition as a means for eliminating the redundant notations becomes definitively limited by the second noise problem concerning the question of how to decide whether the occurrence of a legitimate physical form is intended or not.

While Shannon's idea that redundant notation sequences are an expression of the rule-determined structure of the given symbol language must be rejected, because it - as will be considered in greater detail in chapter 7 - is neither valid for formal nor written language expression forms, his noise theoretical analysis shows that the occurrence of redundant notations is necessary to stabilize the recognition of the legitimacy of the physical form as notation form.

There is also a reason to attach importance to the fact that Shannon's physical noise problem also has a general background, because each notation can only manifest itself in a physically possible form. The specific coincidence between noise and information which is treated in the theory is thus also a specific manifestation of the fact that any notation form can coincide with a physical form which is not intended. From this it appears indirectly that there is always an intentional and symbolic element in the definition of a physical form as a legitimate notation unit, despite the precision in the definition of the physical form. In other words, it is not possible to maintain Shannon's idea of a purely physically defined, asemantic notation.

6.4 A generalization of the physical information concept?

Warren Weaver's comments on Shannon's theory give it the appearance of a generalization of Boltzmann-Szilard's physical information theory, because Shannon defined "informational entropy" in the same mathematical form, but independently of the physical medium in which the information was embedded. The same formula, however, describes two completely different relationships. While thermodynamic entropy describes how a number of molecules, whose individual motions are unknown, can be expected to act as a whole, informational entropy is a measure for the irregular recurrence, but actual occurrence, of individually identifiable single signals.

Shannon's definition of information as entropy, a degree of uncertainty, is not a more general definition of the thermodynamic entropy concept, but another specific use of the same mathematical expression. It does not differ, however, by being a mathematical definition instead of a physical definition, as informational entropy is a yardstick which is only used on - certain types - of physically manifested signals.

Shannon could not have found a more inappropriate title for his work if he had tried. It is not *the* mathematical theory, nor is it a purely *mathematical* theory and it only concerns *communication* in the very special sense of mechanical transmission.

Not the mathematical theory

That it is not *the* mathematical theory appears from the later mathematical-physical discussion, in which two different limitations were introduced. First, the theory contains no definition of the phenomenon it can express in quantified form, namely the concept of information and second, it has not made the need for other quantified information measures superfluous.[17]

Donald Mackay describes this limitation by differentiating between quantifications based on *selection* from a set of preconstructed forms, such as Shannon's, and quantifications based on *form construction*, exemplified by the construction of the form of a TV picture with the help of spots of light.[18] The decisive point in Mackay's argument is that the question "how much information" must be answered in different ways depending on the given forms of the information which are relevant in a given context. Constructive and selective information measures - among which Shannon's is just one of many possible - do not therefore represent competing theories of information either. On the contrary, they represent quantifications of an information concept which cannot be defined by one or other of the quantified measures for an amount of information.

*It would be clearly absurd to regard these various measures of amount of information as rivals. They are no more rivals than are length, area and volume as measures of size. By the same token, it would be manifestly inept to take any of them as definitions of the concept of information itself.*[19]

Even though these quantifications can be regarded as complementary, continues Mackay, it is not possible as a matter of course to define an information concept through an abstraction based on the complementarity between different quantified information measures. The quantified theories also have in common the fact that they work with an operationally defined information concept which only allows a definition of information as "that which determines form". Common to these theories is thus that they refer to processes in which the time-spatial form of one set of events (a place) determines the form of another set without taking into account the energy process involved. Information is thus defined only as "something" which flows from one place to another.

According to Mackay, this view builds upon a false analogy, as by determining information through what it does (determines form) we look at information in the same way as we look at energy in physics, namely through what it does (produces acceleration) and not through what it is, namely some kind of specific physical energy process.

Mackay bases his criticism of this analogy on the difference between what energy is said to do (perform work of a physical character) and what information is said to do (perform work of a logical character).

*In talking about information, there is always a suppressed reference to a third party, as in the physical theory of relativity we have to relate our definitions to an observer, actual or potential, before they become operationally precise*.[20]

The third party, not overtly referred to but waiting here in the wings, pops up precisely because the information concept, as will be discussed in more detail in the following section, must necessarily contain a semantic dimension connected to the choice of the viewpoint of the process.

While Mackay - on a par with George A. Miller[21] in this question - takes his point of departure in the need for other quantitative information measures, Peter Elias adds that the many different uses of Shannon's information measure depend on specific conditions, purposes and connections in each case. Validity does not depend on the mathematical measure, but on the character of the given way the problem presents itself and the theory can only hold true of transformations in which the reversible coding of a set of sequences to another occurs.[22]

This is a central limitation. The theory not only lacks a definition of the information concept; it concerns only a *re-coding* of an already physically defined message.

Myron Tribus can also refer to a private conversation where Shannon in 1961 was supposed to have expressed scepticism regarding the use of the theory outside the context of communication theory and acknowledged that it does not contain a definition of the information concept.[23] The contribution that the theory makes to this does not lie at a mathematical level at all, but at a physical level, as it is solely concerned with the physical dimension of the symbolic expression form. That Shannon assumed there was an unambiguous equivalence between the individual physical notation and the content of the message was perhaps due to the fact that he regarded the formal notation form as typical.

Not a purely mathematical theory

Shannon's theory is not *the* mathematical theory, but nor is it a purely *mathematical* theory. It is a mathematical-physical theory. Seen in comparison to thermodynamic theory it is a matter of a different description of the relationship between energy and information, as the new theory not only refers to the special case where a certain amount of energy "contains" a certain amount of information (on the same amount of energy) but on the contrary to all cases where an arbitrary meaning content has been manifested in a physically well-defined notation system. Considered in the light of Szilard's "narrow" information concept this represents a great expansion, as the informational notation unit is not only emancipated from a certain meaning content, but also from the physical, natural form. Although Shannon defines informational notation through certain physical values, this physical form is distinct from the physical, natural forms in two ways, as the informational entity is limited both in relationship to physical forms which are not identical and in relationship to physical forms which are identical, but not intended. Each of the two definitions is thus connected with its own noise problem: one that concerns the physical form and one that concerns the legitimacy of the physical form as a valid member of the message.

The mathematical definition is thus valid only for physical systems in which it is possible, on the basis of rules, that is instrumentally, to install the lower noise limit which will ensure a stable distinctiveness between non-informational physical noise variation and informationally significant, physical variation.

In this sense, Shannon's theory is only valid in connection with symbolic systems which operate with a well-defined and invariant noise limit. The quantified amount of information is at all stages determined relative to a controlled quantity of energy, where the noise which does not exceed the critical threshold separating the symbol from the medium can be ignored.
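The role of a fixed critical threshold can be illustrated with a minimal sketch. The signal levels, the threshold value and the function names below are illustrative assumptions, not taken from Shannon; the point is only that disturbances which stay below the threshold leave the symbols untouched, whereas disturbances which cross it produce other, equally well-formed symbols.

```python
# Hypothetical physical levels: 0.0 stands for the symbol 0, 1.0 for the symbol 1.
LEVELS = {0: 0.0, 1: 1.0}
THRESHOLD = 0.5  # assumed critical threshold, fixed before any message is sent


def transmit(bits, noise):
    """Map each symbol to its nominal level and add a per-symbol disturbance."""
    return [LEVELS[b] + n for b, n in zip(bits, noise)]


def read(values):
    """Recover symbols by comparing each physical value with the fixed threshold."""
    return [1 if v >= THRESHOLD else 0 for v in values]


message = [1, 0, 1, 1, 0]
small_noise = [0.1, -0.2, 0.3, -0.4, 0.2]  # never crosses the threshold
large_noise = [0.1, 0.7, 0.3, -0.6, 0.2]   # crosses it at two positions

recovered_small = read(transmit(message, small_noise))
recovered_large = read(transmit(message, large_noise))
```

In the first case the message is recovered unchanged. In the second case the reader still obtains a sequence of legitimate symbols, but nothing in the physical values themselves indicates that two of them were not intended.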

The physical character of the theory therefore manifests itself not only as a physically determined limitation of the possible applications; it also manifests itself in the sense that the theory only concerns informational entities which are available in a *physically defined* form, because the critical threshold which is the basis of the distinction between noise and information is brought about as a definition of physical threshold values. The physical definition of the informational entity comprises - as was also the case with the Turing machine - a necessary, but not complete, condition for carrying out the - presumably asemantic - re-codings, which at the time constituted a highly dramatic innovation.

That this is a mathematical-physical and not a purely mathematical theory follows for the same reasons, which meant that it was not *the* mathematical theory, but one among many possible mathematical information measures. That different forms of mathematical quantification can exist is due to the fact that the individual methods each measure different physical features of the symbolic expression units.

The whole point of mathematical theories of information is connected with the circumstance that they allow a mathematical handling of symbol systems solely on the basis of the physical properties which are used to define the physical form of the symbols. On the one hand, this justifies distinguishing between those theories of physically defined symbols and physical theories which describe physical distinctions independently of whether they are used for symbolic purposes. But, on the other hand, it does not justify ignoring the fact that the theories do handle the physical properties of symbols with mathematical methods.

Shannon's quantified information entities are bound by the physical definition. But it is also a question of a new determination of this bond. The bond is no longer, as in Szilard, naturally given as a causal connection or isomorphic combination. Within certain limits it is an open, arbitrary bond. Information is no longer seen as a simple, mathematically expressed function of energy, energy is seen, on the contrary, as a function of notation. This also implies that informational notation is subject to the demand that the notation units used be defined on the same scale of physical value. This is thus a question of a far more rigorous demand on the definition of the notation's physical form than the demand which obtains for written and formal notation, where the demand on the physical form is related to sensory recognizability, while informational notation is subject to the demand that it function as a mechanically effective entity. The relationship between the different notation systems is discussed in greater detail in chapter 7 and sections 8.1-8.3.

Shannon himself perhaps passed over the physical features of the theory because, in all essentials, the first physical noise problem was concerned with practical problems which were of no significance to engineering, as they could be solved with familiar mathematical-physical methods, while the remaining questions concerned the optimization of time consumption and the handling of the second noise problem which has no physical solution.

The problem of noise and the ability to generate symbols

The basic demand on a physical expression form is the well-defined lower physical limitation of the informational entities relative to the variability of the physical medium.

The demand for such a lower limit is true in principle of all symbol systems. For digital systems it is thus true that there is no isomorphism between the informational process and the physical process through which the former is performed. Turing touched on the same when he pointed out in his article in 1950 that certain mechanical systems could advantageously be regarded *as though* they were discrete-state-systems, although physically considered they are continuous.

On the face of it, it might appear as though the demand for a lower noise limit does not have the same validity for analogue systems, which are characterized by equivalence or isomorphism between physical and informational variability. But this equivalence can only be brought about at a certain macro-level. That isomorphism between an analogue symbol and the supporting physical structure depends on a previous coding of the physical structure appears from the fact that the same physical structure can be described independently of the sign bearing structure. As any symbol formation is bound to a physical manifestation, the smallest symbol unit cannot be smaller than the smallest organized physical variability, but it cannot - according to existing physical knowledge - be equal to the smallest physical variability either, as the micro-physical description here assumes the existence of irreducible noise.

The demand for a lower noise limit is thus valid not only for digital, but also for analogue symbol systems, nor does it appear possible to limit this demand to a demand which is only valid for certain technical information systems. We can assume that it is also valid for human perception and information processing. On this point Shannon's noise theoretical results therefore appear to be general. Symbolic activity assumes both the ability to separate the symbolic expression forms from the physical noise in the physical (or physiological) medium used and from identical physical forms. The question then is, how the critical threshold can be brought about and work in different biological, human and artificial information systems.

Here, a fundamental difference between the artificial systems covered by Shannon's theory and human information systems makes its appearance, as the latter possess the ability to produce codes of both the first and second orders (and many more), while the former are characterized by only being able to perform re-codings to the second order, if a model in the form of codes of the first order already exists. The most significant point is not that the one system can produce several types of code, but that it also possesses the ability to *produce* the critical thresholds which are a condition for symbol systems of both the first and second order. In Shannon's theory, the critical threshold is defined prior to and independently of the system. Artifactitious systems assume that there is already a coding of the first order *and* a defined critical threshold which makes re-coding to another order system possible.

To all appearances, only certain types of physical information systems possess the ability to produce codes of the first order themselves, namely those systems which are traditionally described as biological. As - some of - these systems have the ability to bring about the critical threshold itself, which is a condition for the first symbol formation, it is difficult to imagine that these systems should not retain this ability.

In that case the biological systems which possess the ability to create consciousness are characterized by the fact that they are not subject to the demand for a preceding, once-and-for-all established threshold for a given system which determines the condition for the distinction between physical noise and information as a functional condition for the artifactitious systems. It is therefore more plausible to assume that the conscious systems not only retain the ability to produce symbols of the first order, but that they also, at some stage of biological history, have generated an ability to release themselves from the established noise thresholds, for example in the form of a semantic re-interpretation or exploitation of "noise".

There can be little doubt that many biological information systems are capable of maintaining a given critical threshold for a certain period. Human beings can certainly maintain similarly critical mental thresholds when we calculate, draw deductive conclusions and carry out other systematic thought processes. In such cases, however, we usually prefer to use externalized aids precisely because they help to stabilize or maintain invariant thresholds for defining valid informational entities during the performance of well-delimited tasks. The concentration required and the difficulty involved in maintaining these thresholds for a certain time show, however, that for consciousness to have defined thresholds as a characteristic, its constitutive properties must include the ability to vary and even create thresholds.

The human system has thereby a symbol producing property that computers do not have, namely the ability itself to establish the critical threshold which makes it possible to separate informational physical variations from physical noise variation.

The exact delimitation of what noise is, relative to a critical threshold defined outside the system, can therefore not be transferred to human consciousness. An analysis of Shannon's noise theoretical results thus confirms the relevance of the symbol generative test criterion for an understanding of consciousness and intelligence, as discussed in sections 5.8-5.9.

The reach of this appears, among other things, from the fact that it is incompatible with a main assumption in the later theories of cognition based on information theory, namely the idea that cognitive activity can be described as a closed informational system which can either be understood as isomorphic in relation to the neuro-physiological system, or as a self-supporting, learned or inherent, logos which organizes the underlying biological and physical components. These assumptions not only ignore that the biological theory of the origin of the higher organisms also includes consciousness, but also that the restrictions which apply to physical information systems of Shannon's type cannot be reconciled with our knowledge of our own ability to produce codes and symbols.

The mathematical-physical determination of the critical threshold first arose as a relevant theme - both in the understanding of analogue and digital systems - in connection with the appearance of a physical-technical potential for - invisible - symbol handling based on the technical mastery of micro-physical energy processes. This concerns performing symbol handling independently of the perceptual and cognitive potential which provides the basis for human symbol production.

It was this difference which gave Shannon's 'choice theoretical' information considerations far-reaching practical and thereby also cultural and theoretical significance, as it pointed out the possibility of compensating for physical noise by increasing redundancy in the transferral of messages in not completely reliable electronic systems. The benefit lay in going beyond the direct equivalence between energy and information amount.
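How added redundancy can compensate for an unreliable channel may be sketched with the simplest possible scheme, a repetition code with majority voting. This scheme is chosen here only for illustration and is far less efficient than the codes Shannon's theory points towards; the function names and the error positions are assumptions.

```python
def encode(bits, repeat=3):
    """Add redundancy by sending every symbol `repeat` times."""
    return [b for b in bits for _ in range(repeat)]


def decode(received, repeat=3):
    """Recover each symbol by majority vote over its block of repetitions."""
    blocks = [received[i:i + repeat] for i in range(0, len(received), repeat)]
    return [1 if sum(block) * 2 > len(block) else 0 for block in blocks]


def flip(bits, positions):
    """Simulate channel noise by inverting the symbols at the given positions."""
    return [b ^ 1 if i in positions else b for i, b in enumerate(bits)]


message = [1, 0, 1]
sent = encode(message)

# At most one error per block: the redundancy absorbs the noise.
decoded_ok = decode(flip(sent, {1, 5}))

# Two errors in the same block: the majority vote now settles on the wrong symbol.
decoded_bad = decode(flip(sent, {0, 1}))
```

The sketch also shows the limit of the method: the receiver cannot tell from the damaged signal alone whether the vote has restored the intended symbol or confirmed an unintended one.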

With the mathematical measure for technical compression, Shannon created an obvious technical advance within the area of information transport. But looked at from the point of view of information theory it has a grave defect because this solution assumes that the theory can only concern re-coding of already coded messages. It is also one of the two reasons which mean that it is not a theory of *communication* either.

... nor a communication theory

A reasonable demand on a communication theory is naturally that it concerns an exchange of meaning or signification. Meaning is also included in Shannon's theory in the sense that it is concerned with discovering an economical method of transferring messages without loss of meaning. Meaning is included, however, only as the ultimate test criterion of the success of the communication and the heart of the theory did not lie in the exchange of meaning between the producer and the interpreter, but in the transfer of already formed messages, as Shannon took his point of departure in the manifest expression form of the message.

*Frequently the messages have* meaning; *that is they refer to or are correlated according to some system with certain physical or conceptual entities. These semantic aspects of communication are irrelevant to the engineering problem. The significant aspect is that the actual message is one selected from a set of possible messages. The system must be designed to operate for each possible selection, not just the one which will actually be chosen since this is unknown at the time of design*.[24]

That Shannon, in spite of this motivation of semantic irrelevance (well-founded with regard to engineering), is still concerned with a communication theory is due not only to the ultimate criterion of understanding (in the final analysis the message must be recognized by a human interpreter), but also to the fact that he actually uses a theoretical model of communication as a point of departure for his engineer's perspective.

That he again here - and now tacitly - can ignore the semantic dimension is because he is only concerned with the reversible re-coding of existing codings.

The model comprises a functional unit which includes five components: 1) a source of information, 2) a transmitter with a built-in code set, 3) a channel, 4) a receiving apparatus with a de-coder and 5) a receiver. As we have seen, the weight of the theory lies in making transport more efficient, i.e. in the movement from the second to the fourth component. The fifth component, the final receiver, is included because all processes from 1 to 5 must operate in such a way as to ensure that not only the signals, but also the meaning reaches the receiver. The central operation, the establishment of the stochastic procedure which must be used in the re-coding procedure, is carried out, however, before the first step.
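The five components can be laid out as a chain of functions in a minimal sketch. The four-letter alphabet, the two-digit code set and the noiseless channel are all hypothetical simplifications introduced for illustration; the sketch only shows that the transmitter and the receiving apparatus perform a reversible re-coding of a message which already exists in coded form before the process begins.

```python
# Hypothetical code set, fixed in advance and shared by transmitter and de-coder.
CODE = {"a": "00", "b": "01", "c": "10", "d": "11"}
DECODE = {signal: letter for letter, signal in CODE.items()}


def source():
    """1) Source of information: the message is already fully formed."""
    return "abad"


def transmitter(message):
    """2) Transmitter: re-codes the message with the built-in code set."""
    return "".join(CODE[letter] for letter in message)


def channel(signal):
    """3) Channel: assumed noiseless in this sketch."""
    return signal


def receiving_apparatus(signal):
    """4) Receiving apparatus: reverses the re-coding, two digits at a time."""
    return "".join(DECODE[signal[i:i + 2]] for i in range(0, len(signal), 2))


def receiver(message):
    """5) Receiver: the place where the message is finally interpreted."""
    return message


delivered = receiver(receiving_apparatus(channel(transmitter(source()))))
```

Nothing in the chain refers to what the letters mean; the code set is established before the first step and is simply applied and reversed.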

Shannon indicates that the theory includes a number of different types of message, namely:

- Sequences of letters (such as in the telegraph and teleprinter).

- Sequences expressed in a single time function (such as the radio and telephone).

- Sequences expressed in a time function and other variables (such as black and white television with two spatial co-ordinates).

- Sequences expressed in two or more time functions ("three-dimensional sound transmission" and multiplex).

- Sequences with several functions of several variables (such as colour television).

- Various combinations (such as television coupled with sound).[25]

It is true of all these examples that the communicative process occurs as a sequential, linear physical process through one or more separate channels. It is thus not only assumed that the message is available in a fully formulated state before it is transmitted, but also that there must be no meaning exchange or informational interference between the transmitter and the receiver during the process. There is no room here for confidential conversations, or telling glances and gesticulatory articulations of meaning.

These limitations are not of a temporary character. It is not the case that they could be modified through an extension of the theory. They comprise its constitutive basis, as they are contained in the demand this type of system makes on the physical definition of informational entities. That these demands do not apply to all systems and especially not to human communication appears, for example, from the fact that we cannot mistake what we can see on a television screen, namely physically defined symbols with well-defined critical thresholds, for what we can see on the spot, where the definition is left entirely to the observer. What we can see on the spot has not been filtered by the coding the transmitter must carry out in order to transmit a television picture. There is thus a difference, because in the one case something is being communicated which is not being communicated in the other.

Although some of this missing "information" could perhaps be analysed and brought into a formal description, it would not remove the difference. Touch, for example, implies something other than a symbolic representation communicated through an electronic medium. The difference exists, among other things, because in spite of all general definitions, any information system is always a specific physical system implemented in a definite form.[26]

The artifactitious system does not cover the entire human perceptual potential because it depends on the establishment of a well defined critical threshold. Human perception is not subject to the same demands for a well-defined lower limit between noise and information. The lack of such a definition is, on the contrary, rather a constitutive condition for meaning production.

Shannon conceals the problem of the first coding behind the second problem, that of re-coding by simply filling up the box of information sources with a list of different means of information transport.

As the theory with regard to definition ignores the meaning dimension both in connection with production and exchange, it is impossible to consider it a legitimate candidate within the area of communication theory.

But it would be equally misleading to simply dismiss the theory with reference to Shannon's asemantic symbol concept. Although the theory neither includes meaning production nor oral communication, to a great degree it sheds light on the understanding of the physical dimensions of both alphabetical and pictorial symbol manifestations. And, although not formulated as such, Shannon's theory also presents the first theoretical attempt to describe a notation system independent of the senses, where the relationship between the individual notations is mutually conditioned. Whether this could have been done on the basis of a semantic approach cannot be decided. Shannon showed, apparently unwittingly, that it could be done with the point of departure in a physical symbol definition.

6.5 The semantic ghost

If Shannon's information concept is seen as a quantified measure of the meaning-bearing parts of an expression, the problem arises that the amount of information in the message grows in inverse proportion to the organization of the expression and that the most meaning-bearing expressions are identical with the least structured. Or in Katherine Hayles' words, the most muddled expression contains the maximum possible information.[27]
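The paradox can be made concrete with a small calculation. The sketch below estimates the entropy per symbol from single-symbol frequencies only, a deliberately crude assumption which ignores any dependence between neighbouring symbols; the three example strings are invented for the purpose.

```python
from collections import Counter
from math import log2


def entropy_per_symbol(text):
    """Shannon entropy in bits per symbol, estimated from symbol frequencies."""
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * log2(c / total) for c in counts.values())


ordered = "aaaaaaaa"    # one symbol repeated: nothing is left to choose
patterned = "abababab"  # two symbols, equally frequent
scrambled = "abcdefgh"  # eight symbols, each used once

h_ordered = entropy_per_symbol(ordered)
h_patterned = entropy_per_symbol(patterned)
h_scrambled = entropy_per_symbol(scrambled)
```

With these inputs the measure rises from zero bits for the fully ordered string to one bit for the patterned string and three bits for the string in which no symbol recurs. The measure counts the freedom of choice between symbols and says nothing about whether the result means anything.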

Nevertheless, in a considerable part of the later literature, the conceptual connection between noise and information has been maintained, as this has been joined to Shannon's definition and the information content of the received message described as the sum of two messages. But where Shannon distinguishes between these two messages by operating with two mechanical generators working within the same physical system, a distinction has been introduced between that part of the received message which is intended by the transmitter and that part which is not. Shannon's distinction is thus ascribed a semantic foundation joined to an intention.[28] As Hayles remarks, this re-interpretation assumes the introduction of an interpreter, who can see the system from outside. As the transmitter and the receiver in Shannon's theory do not enter into a semantic relationship (the receiver is not aware of the transmitter's intention, but only of the message received), this description can only be carried out by introducing an outside observer of the total system.

This observer is the only one capable of making the distinction. On the other hand, he can therefore see the information which is not intended either as "destructive" noise or as "constructive" additional information, measured not in relationship to the message, but in relationship to the total system.

Within this idea lies a real emphasis of the fact that Shannon's theory assumes both a tacit semantic interpretation and an outside observer.

The semantic interpretation is assumed because there must be a valid, noise-free message as a starting point and because no asemantic criterion can be formulated for deciding the legitimacy of the received signal. This distinction can be ensured, as we have seen, with the help of the redundancy function, as the message can either be sent a great number of times, or be analysed and a set of control codes prepared through which the legitimacy of the individual signal can be determined by the surrounding signals. These codes cannot be prepared without some kind of semantic analysis of the message.

The outside observer is assumed in the sense that there is not only a transmitter and a receiver at each end of the system, but also a proof-reader, who can observe both the total system and describe the noise structure of the channel.

These descriptions mean that the signal process must be observed from several places. The informational entropy, which is measured at the message source, is thus distinct from the informational entropy which is transmitted from the noise source and both are distinct from the informational entropy in the received message. It is also evident from this that informational entropy varies exclusively with the interpreter and that the different measures can only be connected on the condition that the system is seen from outside, that is, that a meta-interpreter also exists.
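That the three entropies are distinct quantities can be shown with a minimal numerical sketch. It assumes a two-valued source and a channel in which the noise source flips each symbol independently with a fixed probability (a binary symmetric channel); both the channel model and the two probabilities are assumptions chosen for illustration.

```python
from math import log2


def binary_entropy(p):
    """Entropy in bits of a two-valued source which emits 1 with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


p_source = 0.1  # assumed probability that the message source emits a 1
p_noise = 0.2   # assumed probability that the noise source flips a symbol

# A received 1 is either a sent 1 left untouched or a sent 0 which was flipped.
p_received = p_source * (1 - p_noise) + (1 - p_source) * p_noise

h_source = binary_entropy(p_source)      # measured at the message source
h_noise = binary_entropy(p_noise)        # measured at the noise source
h_received = binary_entropy(p_received)  # measured at the received message
```

Each value is obtained from a different observation point, and relating them requires knowledge of both probabilities, which is available neither to the transmitter nor to the receiver alone.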

It is also only this meta-interpreter who is capable of differentiating between information and noise, as this distinction is only relevant because the two elements are identical in the system itself. Although this reading of Shannon's theory takes its point of departure in a demonstration that a meta-interpreter is also assumed in Shannon, it still contains a semantic short-circuit which obscures the point which lay in considering the signal independently of its semantic content. Instead of limiting the reach of the theory by maintaining this, constructively seen, rational point, Shannon's noise concept is interpreted as though it did *not* depend on an outside interpreter.

The whole point of Shannon's theory, however, lay in the fact that the concept of noise and information coincided inasmuch as only the physical properties which characterized the signal as a member of a symbolic notation system should be taken into account. If the interpreter observes the noise as a source of a new organization, he is actually looking at another system, in which Shannon's problem is not of importance. Shannon only had a problem *if there was noise which had no potential meaning at all*.

Although Shannon was incorrect in his postulate that the meaning of the message was irrelevant from the point of view of engineering, it is not tenable to re-interpret his definition of the noise concept as a potential meaning-bearing phenomenon. Shannon was not wrong because he ignored the fact that both noise and information were meaning-bearing, but because he overlooked the fact that the dividing line itself between information and noise can only be a *semantic* distinction in the - relevant - case where noise is manifested in the same physical form as the information. Here, on the other hand, it is indispensable.

Shannon's theory concerns separating those elements which physically seen manifest themselves exactly like the intended information, but which do not constitute information. In Shannon, therefore, noise is not something which can be added *to* the original message, but exactly "something" which is lacking, namely the knowledge of the legitimacy of the physical form.

However, Shannon not only contributed to a confusion of his own concepts with the postulate that it was possible to establish an asemantic point of view, he created just as much confusion by describing the information concept as though it were a physically indefinite element. The theory exclusively concerns how to handle symbols which are defined on the basis of their physical form. This is also the only reason why the theory need concern itself with transmission efficiency and noise, just as it is the explanation why the theory distinguishes between discrete and continuous transmission systems.

Strangely, Shannon claimed not only that he could ignore the semantic content, but also that he could compress a message so that it *only* contained those symbols which contained the semantic content. He thus spoke both for and against the idea that there was a connection between his definition of the information in the message and its meaning.

This ambiguity has manifested itself in two different directions in the reception of his theory. On the one hand, as mentioned, an interpretation which sought to maintain that there is a connection, which not only interpreted the information concept, but also the noise concept as signals which contain a meaning in themselves. On the other hand, there has been a widespread tendency, especially within linguistics, to take the asemantic postulate at face value, as here - contrary to the idea of viewing randomness as order at a higher level - the inclination has rather been to say that the engineering point of view is irrelevant precisely because it ignores the semantic aspect. This view is discussed in more detail in chapter 7.

On the face of it, neither of these two directions seems satisfactory. It is not satisfactory to regard the most arrant nonsense as the optimum achievable information. But nor is it satisfactory to assume that a mathematical theory on the treatment of physically defined notation systems should have no relevance to an understanding of notation systems.

A third possibility is to regard the engineering point of view as a contribution to an understanding of the notation concept and, in this case especially informational notation, through an approach which, as a starting point, places semantic coding in parenthesis.

The motivation for this point of view cannot simply be that it is a more moderate middle path between over-interpretation and under-interpretation; it is also motivated by Shannon's actual results.

Although, in the preceding analysis of Shannon's theory, it has been argued that its validity should be greatly limited and re-interpreted, it has also been claimed that the theory still has general implications for an understanding of notation systems as a "meeting place" between the physical and the symbolic. There are reasons to emphasize three points in particular here.

The first is his demonstration of the specific noise theoretical problem which is connected with the possible occurrence of legitimate, but unintended physical forms. While the precise physical definition of the signal contains a solution to what could be called the general noise problem (namely the separation of physical forms, which can be included as legitimate signal values, from illegitimate physical forms) it also produces another specific noise problem in connection with the legitimate physical forms, as all physically defined signals necessarily have a physical form which can exist without having a symbolic value. In other words, the physical definition excludes in principle the possibility of deciding whether a given, legitimate physical form is noise or information, which again implies that we can draw the conclusion that it is impossible in principle to formulate a purely physical theory of symbolic expression forms.

Although Shannon supplied all the premises for this conclusion, it also went against his own efforts to formulate an asemantic information concept. But the noise theoretical problem also has a more general character which holds true of all physically manifested signals. The question thereby arises as to how this noise problem manifests itself and is solved in different notation systems.

The second is his demonstration of the significance of the redundancy function for ensuring the stability of the message. In spite of the fact that Shannon uses the redundancy concept with several mutually unconnected meanings, his analysis shows that, as far as informational notation is concerned, it is possible to work with different forms of redundancy, as some of them can substitute for each other and perform the same stabilizing function. The analysis thus produces a need both for a more consistent definition of the redundancy concept and for a closer analysis of the significance of the redundancy function for the stability of notation systems.

The third is his demonstration that it is possible to stabilize informational notation with the help of a formal semantics which is independent of the semantics in which the original message is presented and which therefore does not depend on the content of the message either. His analysis thereby shows that it is possible to stabilize informational notation with a semantic component which is quite independent of the semantic content represented. It also shows that it is possible to use formal procedures as a redundancy function equivalent to other forms of redundancy, which in turn implies that the formal procedure in informational notation can act as a semantically empty, or meaning-indifferent, procedure relative to the message contained in the informational expression.
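The idea of a formal procedure which stabilizes a message without regard to its content can be illustrated with a very simple case. The sketch below is an illustration chosen here, not a construction taken from Shannon's text: it appends a single even-parity bit to a sequence of binary signals. The function names (`add_parity`, `is_consistent`, `flip`) and the sample message are hypothetical.

```python
def add_parity(bits):
    """Append one even-parity bit; the rule never inspects what the bits mean."""
    return bits + [sum(bits) % 2]


def is_consistent(word):
    """Return True if the word contains an even number of 1s (passes the formal check)."""
    return sum(word) % 2 == 0


def flip(word, index):
    """Simulate a single physical disturbance by inverting one bit."""
    return word[:index] + [1 - word[index]] + word[index + 1:]


# Arbitrary content: any sequence of 0s and 1s would be treated the same way.
message = [1, 0, 1, 1, 0, 0, 1]
sent = add_parity(message)

# One disturbance breaks the formal rule and can therefore be detected.
received = flip(sent, 2)

# A second disturbance restores a legitimate form, although the message has changed.
received_twice = flip(received, 5)
```

The check is purely formal: it works in the same way whatever the seven message bits are taken to represent. It also has a limit which echoes the first point above. After two disturbances the received sequence is again a legitimate form that passes the check, so the procedure alone cannot decide whether that form is noise or information.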

Although the asemantic view of the notation system thus ends by allowing the return of semantics, it does not return as it was when abandoned. Shannon's analysis makes it clear that there is a semantic component in the expression substance of informational notation and that the description of this notation form must therefore be concerned with semantic properties on at least three levels: 1) the level which establishes the notation system as a notation system, 2) the level which establishes the syntactic structure of the notation and 3) the level which is concerned with the semantic interpretation of the content of the informational notation. While the last level concerns definite messages, the first two have to do with the general properties of the notation system and thereby its semantic potential.

Together, the first two levels of the description of notation systems indicate the curious circumstance that a given semantic potential always builds upon a semantic restriction at another, underlying level. That we can ignore the content of the expression is due to the fact that the form itself has meaning on another level: the meaning which makes it possible to distinguish any piece of information from any other, identical physical form.

Rather than claim that Shannon was mistaken in one or another of his two contradictory postulates, there is thus reason to claim that he was partly right because he was partly mistaken in both.

**Notes, chapter 6**

- Shannon and Weaver (1949). Weaver refers to Szilard, 1929, but incorrectly gives the year 1925, when Szilard had published another article, in which he gave a phenomenological interpretation of statistical thermodynamics and ignored the information which could reside in the individual, arbitrary deviations. Szilard, 1925: 757, note. John von Neumann (1932), especially chapter 5.

- Shannon, (1938) 1976. Davis, 1988b: 319.

- Cf. Shannon, 1949: 32 and Wiener, 1962: 6 ff. and 1964: 269. The idea of using binary representation is older. It is not clear who originated it. Wiener, 1962: 4, writes that the idea was accepted in accordance with a practice used by Bell Telephone Laboratories in another technical area. H. Goldstine, 1972: 123, wrote that John Atanasoff used binary representation in a calculating machine from 1940 and later claimed to be its originator. It was the German computer pioneer, Konrad Zuse, however, who came first. Zuse used binary representation in his first computer (Z1), which he developed during the years 1936-1938, but only for numerical notation. Cf. Williams, 1985: 216 ff. Zuse's work, incidentally, was not known outside - and received little attention in - Germany until much later.

- H. Goldstine, 1972: 275.

- Wiener, (1948) 1962: 37-44. Quotation: 43.

- Wiener, (1948) 1962: 44.

- Steve J. Heims, 1988: 75.

- Quoted here from Heims, 1988: 73. In a later discussion (an incomplete, posthumously published manuscript), von Neumann emphasizes a number of differences between the digital computer and consciousness (understood as the neurophysiological system) and concludes, "the Language of the Brain [is] not the Language of Mathematics". J. von Neumann, 1958: 80. Among his reasons for this conclusion is that the neurophysiological system consists of an interplay between digitalized and analogue processes. It therefore cannot operate with the same numerical precision and is not subject to the same vulnerability to singular signal disturbances as digital computers. Ibid. 68-78.

- Shannon and Weaver (1949) 1969.

- Shannon, (1949) 1969: 39.

- Shannon, 1951, Burton and Licklider, 1955, Mandler, 1955.

- Ibid. 49-50.

- Ibid. 50.

- N. Katherine Hayles, 1990: 37, 54.

- It is incidentally also worth recalling these principles in connection with the discussion of informational theories of cognition, as they give occasion to ask the question of the degree to which - parts of - the brain or consciousness operate with criteria of this sort, as a - hitherto unanswered - empirical question.

- Shannon, (1949) 1969: 80.

- Shannon's theory has given rise to discussions which are still continuing. A survey, with summaries of various main viewpoints, can be found in Machlup and Mansfield, 1983 and Hayles, 1990, among others.

- Such measures were developed by, among others, Ronald A. Fisher in 1935 and Dennis Gabor in 1946. Gabor, who was later awarded the Nobel Prize for his work on holography, used the concept of a "logon" as a measure of an amount of information, as the number of logons in a signal represented the amount of freedom in the structure, or the smallest number of independent measures mathematically necessary for defining the form of the signal under the limiting conditions of frequencies, bandwidths and duration. Mackay, 1983: 487.

- Mackay, 1983: 488.

- Mackay, 1983: 486.

- George A. Miller, 1983: 493-497.

- Peter Elias, 1983: 500-502.

- Myron Tribus, 1983: 475.

- Shannon (1948) 1969: 31.

- Shannon (1948): 33.

- This difference also holds true of virtual reality systems which offer a symbolically mediated, mechanical sensory effect on an arbitrary part of the body. The mechanical effect, it is true, will also be accompanied by physical noise, but this noise is relative to the influencing medium and therefore different for different sensory media.

- Hayles, 1990: 55.

- Hayles, 1990: 56, where the later paradigmatic transformation of Shannon's theory is described as an extension of the significance of the noise concept, as noise is not simply seen as a potential destruction of the message, but also as a potential source for the reorganization of the system.