A gentle introduction to Turing machines for CogSci

What does it mean to say that something is computable?  We generally reply to this question by saying that something is computable if we can write a computer program for it.  But does it matter what programming language we use?  Might it happen that something is computable in Lisp but not in C# (or the other way around)?

The answer generally is "no".  All modern general purpose computer languages are equivalent in that anything you can write in one language can be written in any other.  There is, in fact, a (fairly) simple theorem in the theory of computation to that effect.  What is true, of course, is that some things are easier in one language than they might be in another.  It is possible, for example, to write a compiler for C++ in the programming language COBOL (COmmon Business Oriented Language), but nobody in their right mind would do such a thing except to make a point.

But suppose we want to prove things about computing and computation.  What language should we use?  What machine architecture should we use?  How can we speak about computing in general?  One answer is the Turing Machine.  It gives a formal (mathematical) model for computation that is generally accepted to be as powerful as we will ever need, yet is simple enough that we can understand how it works.

To start with, let us return to the Turing paper once again and to his discussion on discrete state machines.  In his example (in section 5), he asks us to consider a machine with three states labeled q1, q2 and q3.  There are two inputs, labeled i0 and i1.  Finally, there are two outputs, labeled o0 and o1.  The inputs could be considered as turning a switch on or off, and the outputs might say if a light is turned on or off.

Let us suppose  that the light is on if the machine is in state 3, and is off otherwise.  Then the light is on or off depending only on the current state.  An input causes us to change from one state to another, according to the following table:

input \ current state    state 1          state 2          state 3
input 0                  go to state 2    go to state 3    go to state 1
input 1                  go to state 1    go to state 2    go to state 3

(Refer to the diagram in part 5 of the Turing paper.)  To simplify our discussion we will assume (along with the assumption that state 3 corresponds to the light being turned on) that input 0 is actually a 0, and that input 1 is actually a 1.  The way to read this table is to say, for example, that if we are in state 2 and see an input of 0, we go to state 3 (and turn on the light).
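
To make the table concrete, here is a minimal sketch of how such a machine can be simulated (in Python; any language would do).  The transition dictionary is copied from the table above; the function name and the sample run are my own illustrative choices.

    # Transition table: (current state, input) -> next state,
    # taken directly from the table above.
    transitions = {
        (1, 0): 2, (2, 0): 3, (3, 0): 1,
        (1, 1): 1, (2, 1): 2, (3, 1): 3,
    }

    def run(start_state, inputs):
        """Feed a sequence of 0/1 inputs to the machine and report the light."""
        state = start_state
        for symbol in inputs:
            state = transitions[(state, symbol)]
        return state, state == 3          # the light is on only in state 3

    # Example: start in state 1 and feed the inputs 0, 0, 1.
    print(run(1, [0, 0, 1]))              # -> (3, True): the light is on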

Exercise (not to turn in):  Suppose that we were in state 2 and had inputs in the order 1, 0, 1.  In which state would we find ourselves?  Would the light be on or off?

Exercise (not to turn in):  Suppose we are in state 1.  What inputs would we require, and in what order, to get to state 3?

Exercise (not to turn in):  It is useful to be able to draw a picture of this.  We can do this in the following way:

  1. On a piece of paper, draw a circle for each of the three states, and identify them by the state number.
  2. For any state (circle), and for each of the two possible inputs, draw an arc with an arrowhead from the state to the state to which the input carries us, and label the arc with that input.  For example, from the circle labeled "2", we would have a directed arc labeled "0" to the state labeled "3".  From state 2 we would also have a directed arc labeled "1" back to state 2.

Try this.


More examples:  Although this model for computation is not powerful enough for all of our needs (it is not possible, for example, to construct a machine consisting only of a (finite) table for a (finite) variety of inputs and outputs which will recognize balanced parentheses), it is useful for many practical applications.  To see how this might work, let's look at two more examples.

In these examples, we identify one state as our starting state, and a set of states as final states.  In our diagrams, we identify the start state by placing a "greater than" (">") sign before it, and indicate the final states by using a "double circle".  Transitions from these final states are still allowed.  The augmented rule will be that we begin at the start state and proceed through the inputs.  If we end up at a final state having exhausted all of our inputs, we say that the discrete state machine (now called a Finite State Automaton (FSA) or Transition Network) has "accepted" the string.  Otherwise the input string is not accepted.

Example 1:  Consider a machine whose inputs consist of the digits (0 - 9), '+', '-', and '.' (decimal point), and whose state table looks like:

input \ state          1        2        3
digit                  3        3        3
+, -                   2        error    error
. (decimal point)      error    error    error

State 1 is the start state, and state 3 is the only final state.
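
To see how the acceptance rule plays out, here is a small sketch of Example 1 in Python.  The encoding of the table as a dictionary (with the "error" entries simply left out) and the helper that classifies each character are my own choices, not part of the formal definition.

    # Example 1: (state, input category) -> next state.
    # 'error' entries are omitted; hitting one rejects the string.
    table = {
        (1, 'digit'): 3, (2, 'digit'): 3, (3, 'digit'): 3,
        (1, 'sign'): 2,
    }
    start_state, final_states = 1, {3}

    def categorize(ch):
        if ch.isdigit():
            return 'digit'
        if ch in '+-':
            return 'sign'
        return 'point' if ch == '.' else None

    def accepts(string):
        state = start_state
        for ch in string:
            key = (state, categorize(ch))
            if key not in table:          # an 'error' entry in the table
                return False
            state = table[key]
        return state in final_states      # must end in a final state

    print(accepts('-42'))     # True: a sign followed by digits
    print(accepts('3.14'))    # False: this machine has no transition for '.'
    print(accepts('+'))       # False: state 2 is not a final state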

Assignment (Due Monday, February 21 (after the exam)):

(there will be further parts to this assignment given below)

Example 2:  Consider a machine whose inputs are the same as Example 1, but whose state/transition table is as follows:

input \ state          1        2        3        4        5
digit                  2        2        2        5        5
+, -                   3        error    error    error    error
'.' (decimal)          error    4        error    error    error

Assignment (continued):

For the following examples, assume that 'a' and 'the' are (the only) determiners, that 'big', 'small', 'large', 'fierce', 'timid', 'red', and 'blue' are (the only) adjectives, and that 'dog' and 'cat' are (again, the only) nouns.  Finally, let 'chased' and 'bit' be the only verbs.

Assignment (continued)


Turing Machines

So - finite state automata such as the above have the capability to recognize simple patterns of symbols, including simple declarative sentences in English.  Such simple patterns are sometimes called regular expressions.
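
As a concrete (if anachronistic) illustration, the number patterns of Examples 1 and 2 can be written as regular expressions in a modern language.  The two patterns below are my reading of those tables (including my assumption about which states of Example 2 count as final), not something given above.

    import re

    # Example 1, as I read it: an optional sign followed by one or more digits.
    integer = re.compile(r'[+-]?[0-9]+$')
    # Example 2, as I read it: the same, with an optional fractional part.
    real = re.compile(r'[+-]?[0-9]+(\.[0-9]+)?$')

    print(bool(integer.match('-42')))    # True
    print(bool(real.match('3.14')))      # True
    print(bool(real.match('3.')))        # False: a digit must follow the decimal point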

As powerful as these machines are, they are incapable of counting.  For example, it is not possible to construct a finite state automaton which can check that the parentheses in an algebraic expression (or an s-expression in Lisp) are correctly balanced (try it!).  This (provable) statement is one of those really neat theorems in mathematics showing that something simply cannot be done (like squaring the circle, or trisecting an angle, with only ruler and compass).  The theorem which demonstrates this is called the "pumping lemma".

However, a simple modification of the finite state automaton gives us the power we need:  If we add a stack to the finite state automaton, we get what is called a "Push-Down Automaton" or PDA.

A stack is similar to a stack of dishes in the cafeteria: the governing rule is that you put new items on the top of the stack (the "push" operation) and take items only from the top of the stack (the "pop" operation).  The difference between a stack in a push-down automaton and a stack of plates in the cafeteria is that in a PDA, the items on the stack are the symbols being processed by the automaton.

This really very simple addition adds considerable computing power to the basic finite state automaton.  In fact, the syntax of essentially every formal language we use (logic, mathematics, computer programs) can be recognized by a push-down automaton.  Later on we will learn to refer to the languages recognized by push-down automata as "context-free languages".
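
Here is a minimal sketch of the stack idea at work, with an ordinary Python list standing in for the PDA's stack.  The function name and the test strings are mine; this shows the stack discipline, not a full formal PDA.

    def balanced(expr):
        """Return True if the parentheses in expr are correctly balanced."""
        stack = []
        for ch in expr:
            if ch == '(':
                stack.append(ch)          # "push" an open parenthesis
            elif ch == ')':
                if not stack:             # a ')' with nothing left to match
                    return False
                stack.pop()               # "pop" the matching '('
        return not stack                  # balanced only if nothing is left over

    print(balanced('(a + (b * c))'))      # True
    print(balanced('(()'))                # False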

So:  Finite state automata for simple (regular) patterns such as integers and real numbers (ones with a decimal point), and even for simple declarative sentences; push-down automata for more complicated expressions, including programs in a modern programming language.  How much further does this go?

We don't know for sure.  The current belief of most computer scientists is given in the Turing-Church Hypothesis: anything computable in any reasonable sense of the word is computable by a Turing Machine.

A Turing Machine (TM) begins with the mechanisms of a finite state machine:  We have a finite number of states, and a finite table which tells us what to do next.  In addition, however, we have an unbounded tape, divided into cells each of which holds a single symbol, and a read/write head positioned over one cell of the tape at a time.

(neat picture here eventually)

The state table says what to do given the current state and the symbol under the read/write head.

And, for each such state/symbol combination, the table tells us what symbol to write (if any), whether to move the head left or right, and which state to go to next (possibly stopping).

This is the finite state machine Turing had in mind in his 1950 paper on computing machinery and intelligence.

Programming a Turing Machine is a painful exercise, matched only by microcoding.  Here is a brief, informal example:

Suppose we want to write a program for a Turing machine which will add together two numbers.  We begin by deciding how the numbers are to be represented.  One way is to place three sorts of symbols on the infinite tape:  B (for blank), 0, and 1.  We will represent a number by an appropriate number of '1's, use the 0 to separate numbers, and place a B on the surrounding cells.  So, for example, if we want to add the numbers 3 and 5, we would have something like the following on our tape:

B B 1 1 1 0 1 1 1 1 1 B

We need to decide on a starting point.  Let us position the read-write head over the last '1' of the second number:

                    V  
B B 1 1 1 0 1 1 1 1 1 B

Then we can summarize the steps involved in "adding" the two numbers as follows:

  1. Move the R/W head left until we encounter a 0
              V            
    B B 1 1 1 0 1 1 1 1 1 B
  2. Change the 0 to a 1
              V            
    B B 1 1 1 1 1 1 1 1 1 B
  3. We now have too many 1's (making it look like 3+5 = 9).  So we next move left until we encounter a blank
      V
    B B 1 1 1 1 1 1 1 1 1 B
  4. Then move right one
        V                  
    B B 1 1 1 1 1 1 1 1 1 B
  5. And replace the '1' we find there by a 'B'
        V                  
    B B B 1 1 1 1 1 1 1 1 B
  6. We are now done.  The number of 1's remaining (eight of them) represents the sum of 3 and 5.

Of course, an informal description is not enough.  Here is a small Turing Machine program which demonstrates what we need to do.  We will use six states (1 - 6).  The table describes what to do for each state and for each tape symbol:

state \ input symbol    '1'                           '0'              B
1                       MOVE LEFT                     GO TO STATE 2    ERROR
2                       MOVE LEFT, GO TO STATE 3      WRITE A '1'      ERROR
3                       MOVE LEFT                     ERROR            GO TO STATE 4
4                       ERROR                         ERROR            MOVE RIGHT, GO TO STATE 5
5                       WRITE A 'B', GO TO STATE 6    ERROR            ERROR
6                       STOP (ERROR)                  STOP (ERROR)     STOP (SUCCESS)

Tedious and artificial, but sufficient to do anything we can do with a modern computer and (we think - the Turing-Church Hypothesis again) sufficient for anything we think of as computable (see Penrose, The Emperor's New Mind, for a contrary position).  The example above was the one we did in class (2/21/05).
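
For the curious, here is a sketch (in Python) of a simulator that runs exactly the table above.  The encoding of each entry as (symbol to write, head move, next state), and the use of a tape just long enough for this example rather than an infinite one, are implementation choices of mine.

    # (state, symbol) -> (symbol to write, head move, next state).
    # A missing entry is an ERROR; reaching state 6 stops the machine.
    L, R, S = -1, +1, 0                     # move left, move right, stay put
    table = {
        (1, '1'): ('1', L, 1),  (1, '0'): ('0', S, 2),
        (2, '1'): ('1', L, 3),  (2, '0'): ('1', S, 2),
        (3, '1'): ('1', L, 3),  (3, 'B'): ('B', S, 4),
        (4, 'B'): ('B', R, 5),
        (5, '1'): ('B', S, 6),
    }

    def run(tape, head, state=1, stop_state=6):
        tape = list(tape)
        while True:
            symbol = tape[head]
            if state == stop_state:
                return ('SUCCESS' if symbol == 'B' else 'ERROR'), ''.join(tape)
            if (state, symbol) not in table:
                return 'ERROR', ''.join(tape)
            write, move, state = table[(state, symbol)]
            tape[head] = write
            head += move

    # 3 + 5, with the head starting over the last '1' of the second number.
    print(run('BB111011111B', head=10))     # ('SUCCESS', 'BBB11111111B'): eight 1's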

Does this mean that we need to write our programs for Turing Machines?  Not very likely.  It's hard enough to add two numbers, and nearly impossible to do anything more complicated.  But we don't need to, as demonstrated by the following (very informal) discussion.

The Turing-Church Hypothesis says that anything computable is Turing Computable.  Hence any program we write for a computer can be implemented on a Turing Machine.

We can easily simulate a Turing Machine on a modern computer (there are lots of programs that do this:  I have written one in Lisp).  The problem of the "infinite tape" can be resolved by adding additional storage as necessary.

Therefore (at least informally), Turing Computable means that I can write a program for it on a modern computer, provided I have an unlimited budget for additional disk/tape storage.

Why, then, a Turing Machine?  The Turing machine serves as a mathematical model for computation.  We can prove theorems about computation using a Turing Machine.  Envisioned just as the construction of (reasonably) modern computers was starting, the mathematical model of computation (the Turing Machine) started giving us insights into what would theoretically be possible.

So:  A brief summary:

The basic premise of Cognitive Science is that cognition is computable.  The Turing-Church hypothesis claims that what is computable is Turing Computable and therefore (informally) computable by a modern computer.


Physical Symbol Systems

When we write programs, we write them on a modern computer using a modern computing language, confident that the programs we write could be written for a Turing Machine.  We write using modern programming languages instead of a Turing Machine for exactly the same reasons that we no longer write in machine language - except for very special circumstances, we don't need to, and writing programs in machine language can be slow.  So - when we think about writing programs to perform intelligent actions in the most general way, we will probably not want to use a Turing Machine.  We want something more, and one possible model is given by Newell and Simon:  The Physical Symbol System.

Quotes, citations, and the like are taken from: Newell, Allen, and Simon, Herbert A., "Computer Science as Empirical Inquiry: Symbols and Search," 1975 ACM Turing Award Lecture, reprinted in Mind Design, John Haugeland, ed.:

The knowledge representation schemes we will look at shortly describe how we may manipulate symbols in the computer.  These symbols are combined into ever more complicated symbol structures.  The symbols are realizable in some physical structure (they exist on computers, and some correspond to physical entities).  Some of the symbols relate to the "outside world" in that they represent things like doors, or the process of opening a door.  The computer, with these symbol representations and the algorithms for them, exists in a broader context (the world around us).  Some of these symbol structures are instructions for manipulating the set of symbols (i.e., a "program" or an "algorithm").

The following is adapted from a set of powerpoint slides for a lecture on the Physical Symbol System Hypothesis.  I will clean it up as time permits.

Recall Turing Machines

What is a Symbol?

Symbols

Physical Symbol Systems

From Dawson, Page 23:  A PSS consists of

The Physical Symbol System Hypothesis

(Newell and Simon 1975)

"A physical symbol system has the necessary and sufficient means for general intelligent action"

To understand the meaning of these terms, we need a bit of logic.

If we have the statement "If p then q", where p and q are statements claiming a fact, then we call p the "sufficient condition" for q since, if p is true, then so is q (so long as the statement "If p then q" is true).  We call q the "necessary condition" for p since if q is not true, then neither is p.  To say that p is both necessary and sufficient for q is to assert two propositions: "if p then q" and "if q then p".

With this in mind, in some more detail:
        p is sufficient for q:  p => q.  If we know p then we know q.  Saying
        that a pss is sufficient for intelligent action is to say that if we have
        a pss, then it is capable of intelligent action.

        p is necessary for q:  q => p (or, equivalently, ~p => ~q).  Saying
        that a pss is necessary for intelligent action is to say that if we have
        something that exhibits intelligent action (q) then it must be a pss (p).
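
Since these are just the two directions of an "if and only if", a two-line check makes the symbols concrete (in Python, "not p or q" is one way to write p => q); this is only an illustration, not part of Newell and Simon's argument.

    # (p => q) and (q => p) hold together exactly when p and q agree.
    for p in (False, True):
        for q in (False, True):
            both = (not p or q) and (not q or p)    # p => q  and  q => p
            print(p, q, both, both == (p == q))     # the last column is always True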
 

What is the evidence for this (according to Newell and Simon)?

In my opinion, the second is more difficult to accept than the first.


The Heuristic Search Hypothesis

"The solution to problems are represented as symbol structures. A physical-symbol system exercises its intelligence in problem-solving by search - that is, by generating and progressively modifying symbol structures until it produces a solution structure" (Newell and Simon)


The Hope: