A gentle introduction to Turing machines for CogSci

What does it mean to say that something is computable?  We generally reply to this question by saying that something is computable if we can write a computer program for it.  But does it matter what programming language we use?  Might it happen that something is computable in Lisp but not in C# (or the other way around)?

The answer generally is "no".  All modern general purpose computer languages are equivalent in that anything you can write in one language can be written in any other.  There is, in fact, a (fairly) simple theorem in the theory of computation to that effect.  What is true, of course, is that some things are easier in one language than they might be in another.  It is possible, for example, to write a compiler for C++ in the programming language COBOL (COmmon Business Oriented Language), but nobody in their right mind would do such a thing except to make a point.

But suppose we want to prove things about computing and computation.  What language should we use?  What machine architecture should we use?  How can we speak about computing in general?  One answer is the Turing Machine.  It gives a formal (mathematical) model for computation that is generally accepted to be as powerful as we will ever need, yet is simple enough that we can understand how it works.

To start with, let us return to the Turing paper once again and to his discussion on discrete state machines.  In his example (in section 5), he asks us to consider a machine with three states labeled q1, q2 and q3.  There are two inputs, labeled i0 and i1.  Finally, there are two outputs, labeled o0 and o1.  The inputs could be considered as turning a switch on or off, and the outputs might say if a light is turned on or off.

Let us suppose  that the light is on if the machine is in state 3, and is off otherwise.  Then the light is on or off depending only on the current state.  An input causes us to change from one state to another, according to the following table:

input \ current state    state 1          state 2          state 3
input 0                  go to state 2    go to state 3    go to state 1
input 1                  go to state 1    go to state 2    go to state 3

(Refer to the diagram in part 5 of the Turing paper.)  To simplify our discussion we will assume (along with the assumption that state 3 corresponds to the light being turned on) that input 0 is actually a 0, and that input 1 is actually a 1.  The way to read this table is to say, for example, that if we are in state 2 and see an input of 0, we go to state 3 (and turn on the light).
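
To make the table concrete, here is a minimal sketch of how such a machine can be simulated (in Python; any language would do).  The transition dictionary is copied from the table above; the function name and the sample run are my own illustrative choices.

    # Transition table: (current state, input) -> next state,
    # taken directly from the table above.
    transitions = {
        (1, 0): 2, (2, 0): 3, (3, 0): 1,
        (1, 1): 1, (2, 1): 2, (3, 1): 3,
    }

    def run(start_state, inputs):
        """Feed a sequence of 0/1 inputs to the machine and report the light."""
        state = start_state
        for symbol in inputs:
            state = transitions[(state, symbol)]
        return state, state == 3          # the light is on only in state 3

    # Example: start in state 1 and feed the inputs 0, 0, 1.
    print(run(1, [0, 0, 1]))              # -> (3, True): the light is on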

Exercise (not to turn in):  Suppose that we were in state 2 and had inputs in the order 1, 0, 1.  In which state would we find ourselves?  Would the light be on or off?

Exercise (not to turn in):  Suppose we are in state 1.  What inputs would we require, and in what order, to get to state 3?

Exercise (not to turn in):  It is useful to be able to draw a picture of this.  We can do this in the following way:

  1. On a piece of paper, draw a circle for each of the three states, and identify them by the state number.
  2. For any state (circle), and for each of the two possible inputs, draw an arc with an arrowhead from the state to the state to which the input carries us, and label the arc with that input.  For example, from the circle labeled "2", we would have a directed arc labeled "0" to the state labeled "3".  From state 2 we would also have a directed arc labeled "1" back to state 2.

Try this.


More examples:  Although this model for computation is not powerful enough for all of our needs (it is not possible, for example, to construct a machine consisting only of a (finite) table for a (finite) variety of inputs and outputs which will recognize balanced parentheses), it is useful for many practical applications.  To see how this might work, let's look at two more examples.

In these examples, we identify one state as our starting state, and a set of states as final states.  In our diagrams, we identify the start state by placing a "greater than" (">") sign before it, and indicate the final states by using a "double circle".  Transitions from these final states are still allowed.  The augmented rule will be that we begin at the start state and proceed through the inputs.  If we end up at a final state having exhausted all of our inputs, we say that the discrete state machine (now called a Finite State Automaton (FSA) or Transition Network) has "accepted" the string.  Otherwise the input string is not accepted.

Example 1:  Consider a machine whose inputs consist of the digits (0 - 9), '+', '-', and '.' (decimal point), and whose state table looks like:

input \ state          1        2        3
digit                  3        3        3
+, -                   2        error    error
. (decimal point)      error    error    error

State 1 is the start state, and state 3 is the only final state.
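
To see how the acceptance rule plays out, here is a small sketch of Example 1 in Python.  The encoding of the table as a dictionary (with the "error" entries simply left out) and the helper that classifies each character are my own choices, not part of the formal definition.

    # Example 1: (state, input category) -> next state.
    # 'error' entries are omitted; hitting one rejects the string.
    table = {
        (1, 'digit'): 3, (2, 'digit'): 3, (3, 'digit'): 3,
        (1, 'sign'): 2,
    }
    start_state, final_states = 1, {3}

    def categorize(ch):
        if ch.isdigit():
            return 'digit'
        if ch in '+-':
            return 'sign'
        return 'point' if ch == '.' else None

    def accepts(string):
        state = start_state
        for ch in string:
            key = (state, categorize(ch))
            if key not in table:          # an 'error' entry in the table
                return False
            state = table[key]
        return state in final_states      # must end in a final state

    print(accepts('-42'))     # True: a sign followed by digits
    print(accepts('3.14'))    # False: this machine has no transition for '.'
    print(accepts('+'))       # False: state 2 is not a final state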

Assignment (Due Monday, February 21 (after the exam)):

(there will be further parts to this assignment given below)

Example 2:  Consider a machine whose inputs are the same as Example 1, but whose state/transition table is as follows:

input \ state          1        2        3        4        5
digit                  2        2        2        5        5
+, -                   3        error    error    error    error
'.' (decimal)          error    4        error    error    error

Assignment (continued):

For the following examples, assume that 'a' and 'the' are (the only) determiners, that 'big', 'small', 'large', 'fierce', 'timid', 'red', and 'blue' are (the only) adjectives, and that 'dog' and 'cat' are (again, the only) nouns.  Finally, let 'chased' and 'bit' be the only verbs.

Assignment (continued)


Turing Machines

So - finite state automata such as the above have the capability to recognize simple patterns of symbols, including simple declarative sentences in English.  Such simple patterns are sometimes called regular expressions.
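
As a concrete (if anachronistic) illustration, the number patterns of Examples 1 and 2 can be written as regular expressions in a modern language.  The two patterns below are my reading of those tables (including my assumption about which states of Example 2 count as final), not something given above.

    import re

    # Example 1, as I read it: an optional sign followed by one or more digits.
    integer = re.compile(r'[+-]?[0-9]+$')
    # Example 2, as I read it: the same, with an optional fractional part.
    real = re.compile(r'[+-]?[0-9]+(\.[0-9]+)?$')

    print(bool(integer.match('-42')))    # True
    print(bool(real.match('3.14')))      # True
    print(bool(real.match('3.')))        # False: a digit must follow the decimal point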

As powerful as these machines are, they are incapable of counting.  For example, it is not possible to construct a finite state automaton which can check that the parentheses in an algebraic expression (or an s-expression in Lisp) are correctly balanced (try it!).  This (provable) statement is one of those really neat theorems in mathematics showing that something simply cannot be done (like squaring the circle, or trisecting an angle, with only ruler and compass).  The theorem which demonstrates this is called the "pumping lemma".

However, a simple modification of the finite state automaton gives us the power we need:  If we add a stack to the finite state automaton, we get what is called a "Push-Down Automaton" or PDA.

A stack is similar to a stack of dishes in the cafeteria: the governing rule is that you put new items on the top of the stack (the "push" operation) and take items only from the top of the stack (the "pop" operation).  The difference between a stack in a push-down automaton and a stack of plates in the cafeteria is that in a PDA, the items on the stack are the symbols being processed by the automaton.

This really very simple addition adds considerable computing power to the basic finite state automaton.  In fact, the syntax of essentially every formal language we use (logic, mathematics, computer programs) can be recognized by a push-down automaton.  Later on we will learn to refer to the languages recognized by push-down automata as "context-free languages".
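
Here is a minimal sketch of the stack idea at work, with an ordinary Python list standing in for the PDA's stack.  The function name and the test strings are mine; this shows the stack discipline, not a full formal PDA.

    def balanced(expr):
        """Return True if the parentheses in expr are correctly balanced."""
        stack = []
        for ch in expr:
            if ch == '(':
                stack.append(ch)          # "push" an open parenthesis
            elif ch == ')':
                if not stack:             # a ')' with nothing left to match
                    return False
                stack.pop()               # "pop" the matching '('
        return not stack                  # balanced only if nothing is left over

    print(balanced('(a + (b * c))'))      # True
    print(balanced('(()'))                # False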

So:  Finite state automata for simple (regular) patterns such as integers and real numbers (ones with a decimal point), and even for simple declarative sentences; push-down automata for more complicated expressions, including programs in a modern programming language.  How much further does this go?

We don't know for sure.  The current belief of most computer scientists is given in the Turing-Church Hypothesis: anything computable in any reasonable sense of the word is computable by a Turing Machine.

A Turing Machine (TM) begins with the mechanisms of a finite state machine:  We have a finite number of states, and a finite table which tells us what to do next.  In addition, however, we have an unbounded tape, divided into cells each of which holds a single symbol, and a read/write head positioned over one cell of the tape at a time.

(neat picture here eventually)

The state table says what to do given the current state and the symbol under the read/write head.

And, for each such state/symbol combination, the table tells us what symbol to write (if any), whether to move the head left or right, and which state to go to next (possibly stopping).

This is the finite state machine Turing had in mind in his 1950 paper on computing machinery and intelligence.

Programming a Turing Machine is a painful exercise, matched only by microcoding.  Here is a brief, informal example:

Suppose we want to write a program for a Turing machine which will add together two numbers.  We begin by deciding how the numbers are to be represented.  One way is to place three sorts of symbols on the infinite tape:  B (for blank), 0, and 1.  We will represent a number by an appropriate number of '1's, use the 0 to separate numbers, and place a B on the surrounding cells.  So, for example, if we want to add the numbers 3 and 5, we would have something like the following on our tape:

B B 1 1 1 0 1 1 1 1 1 B

We need to decide on a starting point.  Let us position the read-write head over the last '1' of the second number:

                    V  
B B 1 1 1 0 1 1 1 1 1 B

Then we can summarize the steps involved in "adding" the two numbers as follows:

  1. Move the R/W head left until we encounter a 0
              V            
    B B 1 1 1 0 1 1 1 1 1 B
  2. Change the 0 to a 1
              V            
    B B 1 1 1 1 1 1 1 1 1 B
  3. We now have too many 1's (making it look like 3+5 = 9).  So we next move left until we encounter a blank
      V
    B B 1 1 1 1 1 1 1 1 1 B
  4. Then move right one
        V                  
    B B 1 1 1 1 1 1 1 1 1 B
  5. And replace the '1' we find there by a 'B'
        V                  
    B B B 1 1 1 1 1 1 1 1 B
  6. We are now done.  The number of 1's remaining (eight of them) represents the sum of 3 and 5.

Of course, an informal description is not enough.  Here is a small Turing Machine program which demonstrates what we need to do.  We will use six states (1 - 6).  The table describes what to do for each state and for each tape symbol:

state \ input symbol    '1'                           '0'              B
1                       MOVE LEFT                     GO TO STATE 2    ERROR
2                       MOVE LEFT, GO TO STATE 3      WRITE A '1'      ERROR
3                       MOVE LEFT                     ERROR            GO TO STATE 4
4                       ERROR                         ERROR            MOVE RIGHT, GO TO STATE 5
5                       WRITE A 'B', GO TO STATE 6    ERROR            ERROR
6                       STOP (ERROR)                  STOP (ERROR)     STOP (SUCCESS)

Tedious and artificial, but sufficient to do anything we can do with a modern computer and (we think - the Turing-Church Hypothesis again) sufficient for anything we think of as computable (see Penrose, The Emperor's New Mind, for a contrary position).  The example above was the one we did in class (2/21/05).
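
For the curious, here is a sketch (in Python) of a simulator that runs exactly the table above.  The encoding of each entry as (symbol to write, head move, next state), and the use of a tape just long enough for this example rather than an infinite one, are implementation choices of mine.

    # (state, symbol) -> (symbol to write, head move, next state).
    # A missing entry is an ERROR; reaching state 6 stops the machine.
    L, R, S = -1, +1, 0                     # move left, move right, stay put
    table = {
        (1, '1'): ('1', L, 1),  (1, '0'): ('0', S, 2),
        (2, '1'): ('1', L, 3),  (2, '0'): ('1', S, 2),
        (3, '1'): ('1', L, 3),  (3, 'B'): ('B', S, 4),
        (4, 'B'): ('B', R, 5),
        (5, '1'): ('B', S, 6),
    }

    def run(tape, head, state=1, stop_state=6):
        tape = list(tape)
        while True:
            symbol = tape[head]
            if state == stop_state:
                return ('SUCCESS' if symbol == 'B' else 'ERROR'), ''.join(tape)
            if (state, symbol) not in table:
                return 'ERROR', ''.join(tape)
            write, move, state = table[(state, symbol)]
            tape[head] = write
            head += move

    # 3 + 5, with the head starting over the last '1' of the second number.
    print(run('BB111011111B', head=10))     # ('SUCCESS', 'BBB11111111B'): eight 1's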

Does this mean that we need to write our programs for Turing Machines?  Not very likely.  It's hard enough to add two numbers, and nearly impossible to do anything more complicated.  But we don't need to, as demonstrated by the following (very informal) discussion.

The Turing-Church Hypothesis says that anything computable is Turing Computable.  Hence any program we write for a computer can be implemented on a Turing Machine.

We can easily simulate a Turing Machine on a modern computer (there are lots of programs that do this:  I have written one in Lisp).  The problem of the "infinite tape" can be resolved by adding additional storage as necessary.

Therefore (at least informally), Turing Computable means that I can write a program for it on a modern computer, provided I have an unlimited budget for additional disk/tape storage.

Why, then, a Turing Machine?  The Turing machine serves as a mathematical model for computation.  We can prove theorems about computation using a Turing Machine.  Envisioned just as the construction of (reasonably) modern computers was starting, the mathematical model of computation (the Turing Machine) started giving us insights into what would theoretically be possible.

So:  A brief summary:

The basic premise of Cognitive Science is that cognition is computable.  The Turing-Church hypothesis claims that what is computable is Turing Computable and therefore (informally) computable by a modern computer.


Physical Symbol Systems

When we write programs, we write them on a modern computer using a modern computing language, confident that the programs we write could be written for a Turing Machine.  We write using modern programming languages instead of a Turing Machine for exactly the same reasons that we no longer write in machine language - except for very special circumstances, we don't need to, and writing programs in machine language can be slow.  So - when we think about writing programs to perform intelligent actions in the most general way, we will probably not want to use a Turing Machine.  We want something more, and one possible model is given by Newell and Simon:  The Physical Symbol System.

Quotes, citations, and the like are taken from: Newell, Allen, and Simon, Herbert A., "Computer Science as Empirical Inquiry: Symbols and Search," 1975 ACM Turing Award Lecture, reprinted in Mind Design, John Haugeland, ed.:

The knowledge representation schemes we will look at shortly describe how we may manipulate symbols in the computer.  These symbols are combined into ever more complicated symbol structures.  The symbols are realizable in some physical structure (they exist on computers, and some correspond to physical entities).  Some of the symbols relate to the "outside world" in that they represent things like doors, or the process of opening a door.  The computer, with these symbol representations and the algorithms for them, exists in a broader context (the world around us).  Some of these symbol structures are instructions for manipulating the set of symbols (i.e., a "program" or an "algorithm").

The following is adapted from a set of powerpoint slides for a lecture on the Physical Symbol System Hypothesis.  I will clean it up as time permits.

Recall Turing Machines

What is a Symbol?

Symbols

Physical Symbol Systems

From Dawson, Page 23:  A PSS consists of

The Physical Symbol System Hypothesis

(Newell and Simon 1975)

"A physical symbol system has the necessary and sufficient means for general intelligent action"

To understand the meaning of these terms, we need a bit of logic.

If we have the statement "If p then q", where p and q are statements claiming a fact, then we call p the "sufficient condition" for q since, if p is true, then so is q (so long as the statement "If p then q" is true).  We call q the "necessary condition" for p since if q is not true, then neither is p.  To say that p is both necessary and sufficient for q is to assert two propositions: "if p then q" and "if q then p".

With this in mind, in some more detail:
        p is sufficient for q:  p => q.  If we know p then we know q.  Saying
        that a pss is sufficient for intelligent action is to say that if we have
        a pss, then it is capable of intelligent action.

        p is necessary for q:  q => p (or, equivalently, ~p => ~q).  Saying
        that a pss is necessary for intelligent action is to say that if we have
        something that exhibits intelligent action (q) then it must be a pss (p).
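
Since these are just the two directions of an "if and only if", a two-line check makes the symbols concrete (in Python, "not p or q" is one way to write p => q); this is only an illustration, not part of Newell and Simon's argument.

    # (p => q) and (q => p) hold together exactly when p and q agree.
    for p in (False, True):
        for q in (False, True):
            both = (not p or q) and (not q or p)    # p => q  and  q => p
            print(p, q, both, both == (p == q))     # the last column is always True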
 

What is the evidence for this (according to Newell and Simon)?

In my opinion, the second is more difficult to accept than the first.


The Heuristic Search Hypothesis

"The solution to problems are represented as symbol structures. A physical-symbol system exercises its intelligence in problem-solving by search - that is, by generating and progressively modifying symbol structures until it produces a solution structure" (Newell and Simon)


The Hope: