Self-Describing Halting Problems (Web Services)

This page attempts to collect some thoughts on the theory (read: fantasy)
of "self-describing interfaces".  I believe that being able to invent a
descriptive/declarative way to truly and fully describe the operational
semantics of an arbitrary software interface in a machine-readable way
is equivalent to solving the Halting Problem, but I'm not sure if I have
a good, formal way of expressing or proving this.

The attitude of many WSDL fanboys out there seems to be "Halting problem,
Schmalting problem!"

-----=-----
Date: Fri, 8 Dec 2000 21:50:47 -0500
Message-Id: <200012090250.VAA20769@tux.cubewerx.com>
From: Craig Bruce 
To: ip2000@opengis.org
Subject: Re: ALL-religious schism (was Re: WMS Capabilities XML)

Jeff de La Beaujardiere  wrote:

> I actually fully share your reservations.  I believe that
> automatic usage of a service that a client has never seen
> before is a very long way off and may not belong in
> present-day implementation specs.

Automatic usage of a service that a client has never seen before seems to
be more of an artifical-intelligence problem than a distributed-protocol
one.  How do you describe general semantics without using a programming
language?  (Is this impossible in the general case because of the halting
problem?)

> However, there are others
> in WMT2 who disagree and who want to at least leave the door
> open for self-describing services. My XML merely tries to
> what give an example compliant with what seems to be the
> direction favored by others in WMT2.

Two things seem to be missing from the  section that are
present in the 1.0.x series: the list of protocols that can be used (i.e.,
HTTP-CGI-GET and HTTP-XML-POST), and the URL of the service.

The  fields dive into the syntax of the HTTP-CGI-GET
method, but what happens if other methods are available, for example,
HTTP-XML-POST for the still-incomplete XML GetMap request described in
the SLD documentation.  The HTTP-CGI-GET has a fairly simple syntax,
but the XML syntax description would be significantly more complex.
Strange that the BSM group didn't use XML-Schema like the WFS group.
(Not that XML-Schema would be particularly appealing for picking out the
few bits and pieces of information, such as the supported image formats,
that a client would actually use).

Is the URL of the service going to be implied or intended to be passed
around out-of-line?  We have found that allowing the capabilities document
and the server to reside at different base URLs to be quite useful in
the past, for dealing with map servers that report invalid capabilites
documents, among other things.

BTW, Neither version 1.1.0 nor 1.0.5 descriptions are available at:

http://www.digitalearth.gov/wmt/xml/

> Some aspects of the proposed scheme are useful.  For
> example, listing Vendor-Specific Parameters or allowed
> formats is easy without putting them in a DTD.

The allowed formats in the 1.1.0 example are MIME strings, which could
be described in a DTD as generic strings.  I'm not sure how amenable
Vendor-specific parameters would be to being described in the 1.1.0
 syntax, as vendor-specific information can be arbitrarily
complex with arbitrarily complex semantics, and there will likely be
information that needs to be passed to a vendor client that doesn't
materialize as a request parameter.  The full power of XML should be
available to describe this information, rather than attempting to flatten
it into linear CGI-parameter definitions.

--------------------------+------------------------+--------------------------
Dr. Craig S. Bruce        | Ph.: 819-771-8303 x205 |             CubeWerx Inc.
Senior Software Developer |   Fax: 819-771-8388    |      Hull, Québec, Canada
csbruce@cubewerx.com      | http://www.csbruce.com |  http://www.cubewerx.com/
--------------------------+------------------------+--------------------------
                "I don't suffer from stress.  I'm a carrier."

-----=-----
Date: Wed, 16 May 2001 11:11:17 -0400
Message-Id: <200105161511.LAA07333@tux.cubewerx.com>
From: Craig Bruce 
To: "servicemodel@opengis.org" 
Subject: Re: developerWorks : Web services

Jeff de La Beaujardiere  wrote:

> That article points out that "the UDDI business registry
> [...] implies [...] design-time service discovery.  We need
> design-time discovery since it is often not feasible to
> implement dynamic discovery at run-time due to overwhelming
> complexity."
>
> I am in full agreement--you need to know about the service
> while designing your application.  The idea implied in ISO
> 19119 that you can describe enough of the service for
> run-time discovery and invocation is too simplistic, IMHO.

.From a computer-science perspective, as far as I can tell, the
"overwhelming complexity" here is otherwise known as the Halting Problem.
Predicated on the notion that there is no declarative way to describe
general semantics, the only general way to describe semantics is with
a Turing-complete imperative language (i.e., a language that executes
commands and has the same "power" as any and all other general-purpose
programming languages).  With Turing-complete languages, one runs into
the halting problem, which has a mathematically rigorous proof and which
roughly generalizes into "there exists no algorithm that can determine
what another algorithm will do".

Back to the first predicate, if you could prove that general semantics can
be described in a declarative way by using a language that is different
from the class of Turing-complete languages, write up a short note and
submit it to any university since you'd be a shoe-in for a Ph.D. in
Computer Science and almost certainly, ironically enough, a Turing Award
(the CS equivalent of a Nobel Prize).

So, setting "general semantics" aside, a declarative language is left
to describing "special-purpose" semantics.  A semantic-declaration
mechanism that is general enough to be useful, but without violating
any laws of computer science, would be interesting to see, but would
be quite difficult.  Perhaps this is what the author above means by
"overwhelming complexity".  Syntactic declaration, which is all I have
ever seen these service-registry or "self-describing-data" mechanisms do,
is pretty easy to do, but is insufficient, certainly insufficient for
"run-time" service discovery.

--------------------------+------------------------+--------------------------
Dr. Craig S. Bruce        | Ph.: 819-771-8303 x205 |             CubeWerx Inc.
Senior Software Developer |   Fax: 819-771-8388    |      Hull, Québec, Canada
csbruce@cubewerx.com      | http://www.csbruce.com |  http://www.cubewerx.com/
--------------------------+------------------------+--------------------------
        "Oddly enough, 'Luke, I am your father' is not the actual line
                               from the movie."

-----=-----
Date: Thu, 17 May 2001 15:15:57 -0400
Message-Id: <200105171915.PAA25367@tux.cubewerx.com>
From: Craig Bruce 
To: servicemodel@opengis.org
Subject: Re: developerWorks : Web services

> I agree that the Halting Problem is overwhelming.

Technically, the description would be "mathematically impossible".

> Perhaps a simple use case could help get things started. So, what
> would it take to discover (at runtime) a WFS for vegetation features
> in a given bounding box B?  (1) We already have a schema for features
> (gml:AbstractFeatureType).  (2) Given that something A claims to be a
> feature (i.e., conforms to the feature schema), we know how to discover
> what kind of feature it is, and what its bounding box is. (3) We know
> how to do spatial queries (convert coordinates, intersect regions,
> etc.) using bounding boxes. So what is missing?

Um, my understanding is that the issue is "service" description and
discovery, not "content" description and discovery, which is what you
are discussing.  For content, you can do effective searches by spatial
constraints and keywords/phrases in the descriptive metadata.

I assume that we are distinguishing "services" from "content".  Otherwise,
I'm very confused.

One could do metadata searches for services in the same was as for
content, but the issue I was addressing is that service descriptions
like WDSL are not very helpful for describing the interface to a service.
A syntactic declaration of message arguments, etc., has little practical
utility because won't allow a computer program to use an interface that
it hasn't already been programmed for.  This is what I was addressing
in my previous message.  Some unknown-option declarations for requests
can be used to present a selection box to a user on a GUI, but that's
about it.  The major semantics of an interface cannot be described in
a meaningful way.  The only truly meaningful declarations for the major
semantics are "this is an OGC OWS service with a WMS interface" and "it
is located at this URL", with the requirement that a program already knows
what this means.  The rest of the syntactic declarations are mostly fluff,
syntactic sugar, disposable wrappings.

My understanding is that WDSL, etc. don't address "content" searching at
all (and, in fact, don't even have a concept of "content").  Am I mistaken?
"Content" would seem to be a pretty important concept to me.  Or are we
hoping to twist these service-description systems into describing content?

--------------------------+------------------------+--------------------------
Dr. Craig S. Bruce        | Ph.: 819-771-8303 x205 |             CubeWerx Inc.
Senior Software Developer |   Fax: 819-771-8388    |      Hull, Québec, Canada
csbruce@cubewerx.com      | http://www.csbruce.com |  http://www.cubewerx.com/
--------------------------+------------------------+--------------------------
              "The sooner we finish the initial implementation,
                     the sooner we can start fixing it."

-----=-----
Date: Mon, 12 Aug 2002 19:10:16 -0400
Message-Id: <200208122310.TAA24944@tux.cubewerx.com>
From: Craig Bruce 
To: xxxxxxx@xxxxxxx.uwaterloo.ca
Subject: Self-describing halting problems

Hi Xxxxx,

I have been dealing with a group of WSDL 'fanboys' for a while who seem to
be taken into the notion that the operational semantics of an arbitrary
web-service interface can be fully described in a declarative way.
Now, WSDL is obviously inadequate to the task since it only describes
the syntax of message formats using XML-Schema and makes no attempt
to tackle semantics, but I have a very strong suspicion that only a
Turing-complete, executed language can possibly be 'powerful' enough to
'describe' a general interface, and that assuming that a declarative
language can fully describe what a Turing-complete language will 'do' is
equivalent to solving the Halting Problem.  Or, from a different angle,
the interface itself *is* to a Turing machine and to be able to describe
it fully with a finite set of declarations that can be understood by a
machine also would be equivalent to solving the Halting problem since it
seems to imply machine decidibility about the arbitrary Turing-complete
algorithm implemented inside of the interface.

Also, if one uses a Turing-complete language to describe an interface,
it seems that one runs into nightmarish practical problems about how to
design a generic-enough framework for arbitrary code to execute and do
useful things and tie into a 'fully-generic' application.  This path kind
of seems like an AI-complete problem by itself.

Do you know if there has been any formal treatment (or a one-paragraph
dismissal) of this subject in the literature or do you have any thoughts
on the matter?

Thanks for any help.

--------------------------+------------------------+--------------------------
Dr. Craig S. Bruce        | Tel: 819-771-8303 x205 |             CubeWerx Inc.
Senior Software Developer |   Home: 613-565-1344   |  Gatineau, Québec, Canada
csbruce@cubewerx.com      | http://www.csbruce.com |  http://www.cubewerx.com/
--------------------------+------------------------+--------------------------
         "Computer science is as much about computers as astronomy is
                   about telescopes." -- Edsger W. Dijkstra

-----=-----
The following is from:

http://london.pm.org/pipermail/bots/2002-June/000093.html

[Bots]  HIT THE COMPUTER HIT THE COMPUTER
Simon Wistow bots@london.pm.org
Tue, 11 Jun 2002 16:24:08 +0100

    * Previous message: [Bots]  HIT THE COMPUTER HIT THE COMPUTER
    * Next message: [Bots]  HIT THE COMPUTER HIT THE COMPUTER
    * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

On Tue, Jun 11, 2002 at 03:51:32PM +0100, Matt Webb said:
> Not in terms of technology available, just the kind of things 
> that have been built (or at least the stuff that's been build 
> that I've seen).

The way Paul often uses dipsy is to open up a private query window and
speak directly with it. I sometimes do the same especially when I'm
trying to feed stuff in. This should be pretty similar to having many
people talking to the same bot via different IM clients. Think of it as
lots of people, none of them speaking in the main IRC window just all
speaking to the bot in private message windows.

The community memory at that point does become an, err, issue. But in
some ways it's quite nice - the bot learns behind your back. 

> I've got AIM + MSN working, but not in a pretty way. Every 
> client on every platform understands a subtly different form of 
> html for the messages.

This is what Jabber is for. As well as being a messaging 'plasma' and
having a IM client of its own it's also capable of translating between
networks transparently. Including IRC private messages :)

So our bot gets *all* the IM neyworks for free. w00t!

> State is really important.
> 
> When I've been modeling the conversational UIs so far, each 
> location on the interface landscape is a node. A node may expect 
> to hear a set of phrases: it's the node that looks after what 
> phrases are valid or not (and I've tended to make the syntax 
> reasonably strict here). A matching phrase moves the user to 
> another node. Fortunately my graph scribbles can be converted 
> into POE really easily.

Or in fact a - turing machine (also known as a state machine) :)

Wiz zis bot vee can solve ze Entscheidungsproblem ...

... I'm sorry. I've had too much coffee.

Basically, for those who wonder what the fuck I'm talking about here :
you've described a Turing machine which was the original work that
Turing did on computers. And Turing machines are in themselves [0]
programs. So a Turing machine is easy to map to a program. Using POE
would probably be overkill - you can do it just as well with an event
loop and a state variable. This is also how most web 'applications' are
done - you have a look at the current state and the new variables and
decide which state to go to next.

> This is probably  easier with ad-hoc diagrams made out of beermats and
> pens. Or maybe not.

Whiteboards. more pubs need whiteboards.

Simon

[0] A quick crash course ...

A Turing machine is an infinitely long TAPE split into squares. Each
square contains a SYMBOL from an ALPHABET. [1] The HEAD of the machine
sits over the starting SQUARE. The head has a STATE. Depending on the
SYMBOL under the HEAD and the current STATE the head changes the SYMBOL
to something else in the alphabet, CHANGES its state and either moves
left or right (actually it doesn't have to do anything but you get the
point).

That's a really simplified version but what you can do is prove that you
can set up a state machine and with that you can solve pretty much every
problem [1]

[1] Using self describing alphabets Godel was able to prove that any
sufficently logical language will always be able to describe a
contradiction - this is his Incompleteness Theorem and basically
destroyed Whitedhead and Russell's attempt to redefine maths and logic
from scratch in Principia Mathematica. 

Similarly Turing showed you can't build a Turing machine that, given an
algorithm and its initial arguments (i.e a Turing machine), will be
able to tell if the algorithm ever halts or if it runs forever.

This doesn't sound like much but if you can reduce another problem to
this Halting problem then you can prove it can't be solved.  

Christ. I'm sorry, I'll shut up now.

-----=-----
Date: Tue, 13 Aug 2002 04:00:56 -0400
Message-Id: <200208130800.EAA27993@tux.cubewerx.com>
From: Craig Bruce 
To: ows1.2@opengis.org, wwwmap.sig@opengis.org
Subject: [Ows1.2] Re: [Wwwmap.sig] Fwd: [rest-discuss] A Sane Approach to Migrati...

Ron Lake  wrote:

> My point is that without a formal specification of these messages
> (i,e. something like WSDL) we have only an informal statement of
> this fact.

The big issue is not so much whether to specify the syntax of an interface
in a precise, formal way (all implementation specs should strive for
this, in part), but more why to pollute the run-time information that is
exchanged between systems with syntax descriptions which are *necessarily*
of low or no utility.

The problem of fully describing a web-service interface has two components:
syntax and operational semantics.  The problem of describing the syntax
is immanentyly solveable and has been solved in any number of ways,
including using WSDL and XML Schema.  The problem of describing operational
semantics, however, is NOT SOLVABLE.  It is directly equivalent to the
problem of giving a complete description of an arbitrary algorithm in a
machine-understandable way (since the web service implements some kind
of an algorithm inside of itself).

Church's Thesis states that any instance of an algorithm is equivalent to
some instance of a Turing machine.  This is a 'Thesis' and not a 'Theorem'
because it is merely a definition since the term "algorithm" does not
have a precise definition.  However, the only practical definition of an
"algorithm" for us is that of a procedure that can be implemented on the
computers that we have.

"A Turing machine has an infinitely long TAPE split into squares.
Each square contains a SYMBOL from an ALPHABET.  The HEAD of the machine
sits over the starting SQUARE.  The head has a STATE.  Depending on the
SYMBOL under the HEAD and the current STATE the head changes the SYMBOL to
something else in the alphabet, CHANGES its state and either moves left or
right (actually it doesn't have to do anything but you get the point)." [1]

This kind of machine may seem extremely abstract and hypothetical, but it
implies numerous important results in computing theory and mathematics,
including that all modern computers are fundamentally equivalent to
Turing machines in terms of what they can 'do', and it also implies the
Halting Problem.

The Halting Problem is the name for a mathematical proof that there
exists no Turing machine that can determine whether another, arbitrary
Turing machine will ever finish a computation (halt) for an arbitrary
input data-set string (tape).  It also implies that many other related
problems are also unsolvable (uncomputable), including that any other
problem that can be reduced to the Halting Problem is also unsolvable,
such as "there exists no algorithm that can determine what an arbitrary
other algorithm will 'do'", where 'halt' is a proper subset of 'do'.
A truly 'machine-understandable' description of an algorithm necessarily
violates this last assertion.

The inability to describe the operational semantics of an interface makes
the capability to describe the syntax at run-time much less useful,
if not completely useless.  What good is finding out the syntax if an
application needs to be pre-programmed at development time to make any
'intelligent' use of an interface?  What is the use of being able to
automatically generate function stubs for when an interface changes when
you must go in and manually change all of the code inside of the old
stubs to make it cope with the new stub interfaces?  What is the use
of validating messages at run-time when all you can do if they fail is
blow up?  Validation is a software-development/debugging activity.

Various people seem to think that providing syntax or assertions or whatnot
at run-time are "steps in the right direction" of fully describing an
interface, but they don't seem to realize that their ultimate goal *IS
MATHEMATICALLY IMPOSSIBLE*.  It is equivalent to trying to invent a
perpetual-motion machine or a method of recursive lossless compression
of random bit strings (there are numerous patents for the latter even
though it is provably impossible).

The alternate approach would seem to be to exchange Turing-complete
languages for the descriptions of the interfaces that are executed
at run-time (thus not requiring the computer to 'understand' the
descriptions), but I think that this also runs into 'AI-complete'
or unsolvable problems when you attempt to have computers dynamically
integrate executable sub-components together with no a-priori knowledge.
This would seem to reduce to the first problem since the downloadable
component would be a black box that does the 'right stuff' inside but has
its own interface to the host system which the downloading system must
'understand' in an automatic way, so basically this only accomplishes
moving the undescribable black box from the server to the client.

[1] Simon Wistow, "http://london.pm.org/pipermail/bots/2002-June/000093.html"

--------------------------+------------------------+--------------------------
Dr. Craig S. Bruce        | Tel: 819-771-8303 x205 |             CubeWerx Inc.
Senior Software Developer |   Home: 613-565-1344   |  Gatineau, Québec, Canada
csbruce@cubewerx.com      | http://www.csbruce.com |  http://www.cubewerx.com/
--------------------------+------------------------+--------------------------
         "Computer science is as much about computers as astronomy is
                   about telescopes." -- Edsger W. Dijkstra

[Go to my Homepage] [Send me mail]