All,
This past week I have been thinking about a document format and hashing
method that allows arbitrary-level partial disclosure of a verifiably signed
document, as well as formal reference/reuse of subdocuments inside larger
documents. Basically, it is a technique for using a hash tree within a
standard document format.
Below is a description of a system that satisfies some of my needs. I would
appreciate some feedback. Thanks.
--Sean Hastings
--mailto:sean@havenco.com
--vmsg/fax:1.800.why.sean
- - -
<html>
<head>
<title>VOXP - Value and Obligation eXchange Processing</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor="#FFFFFF">
<h1>Sectional Parameter Hashing</h1>
<p>A description of a possible format and uses for documents that are
designed
to behave like Hash Trees.</p>
<h2>0. Contents:</h2>
<blockquote>
<p>0.) Contents</p>
<p>1.) Format</p>
<blockquote>
<p>1.1) Sections</p>
<p>1.2) Digests</p>
<p>1.3) White Space</p>
</blockquote>
<p>2.) Storage</p>
<blockquote>
<p>2.1) Parameterization</p>
<p>2.2) Text Files</p>
<p>2.3) Database</p>
</blockquote>
<p>3.) Parameters</p>
<p>4.) Types</p>
<blockquote>
<p>4.1) Signatures</p>
<p>4.2) Certificates</p>
<p>4.3) Vehicles</p>
</blockquote>
</blockquote>
<h2>1. Format:</h2>
<blockquote>
<p>The system makes use of a document format and digest technique called
Sectional
Parameter Hashing (SPH).</p>
<p>The general document format is designed to support a hashing technique
in which the subdocument under a heading may be reduced to (or expanded
from) a text representation of its digest, without altering the overall
document's digest value. This is done by using a strictly structured
document format and an agreed-upon technique for computing the document's
digest.</p>
<h3>1.1 Sections:</h3>
<blockquote>
<p>A section is indicated by a heading followed by a colon and a newline.
This is normally followed by one or more lines of indented text, which may
include subsections. Where no indented text follows the heading, the
section is considered to be empty. Where subsections are present, they are
always located after any body text, and they are ordered by the numeric
character values of their headings (comparing headings character by
character by their numeric codes). For example:</p>
<pre><code>report from alpha group
received:
	2001:02:25:21:52:07:258
report:
	We have met the enemy, and they have won.
	Regrouping to await further orders.
	casualties:
		heavy
	kia:
		Sgt. Ben Kilroy
		Pvt. John Brown
		Gen. May Ham
	mia:
	situation:
		fubar</code></pre>
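<p>As a minimal sketch of this layout in Python (the <code>Node</code> class and <code>to_text</code> function are hypothetical, not part of SPH), the following renders a document tree as text with one tab per indentation level. Body text comes before subsections, and headings are sorted by the UTF-8 bytes of their characters, which is one way to realize ordering by numeric character value:</p>

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical in-memory (sub)document: body text lines plus named subsections."""
    body: list = field(default_factory=list)
    sections: dict = field(default_factory=dict)  # heading -> Node


def heading_key(heading):
    # Compare headings by the numeric values of their characters (UTF-8 bytes).
    return heading.encode("utf-8")


def to_text(node, depth=0):
    """Return the node as a list of lines: body first, then sorted subsections."""
    indent = "\t" * depth
    lines = [indent + line for line in node.body]
    for heading in sorted(node.sections, key=heading_key):
        lines.append(indent + heading + ":")
        lines.extend(to_text(node.sections[heading], depth + 1))
    return lines


report = Node(
    body=["We have met the enemy, and they have won.",
          "Regrouping to await further orders."],
    sections={  # deliberately unsorted; to_text puts them in order
        "situation": Node(["fubar"]),
        "kia": Node(["Sgt. Ben Kilroy", "Pvt. John Brown", "Gen. May Ham"]),
        "mia": Node(),
        "casualties": Node(["heavy"]),
    },
)
doc = Node(["report from alpha group"],
           {"report": report, "received": Node(["2001:02:25:21:52:07:258"])})
print("\n".join(to_text(doc)))
```

<p>Printing the result reproduces the example above, with the headings in order even though they were supplied unsorted.</p>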
</blockquote>
<h3>1.2 Digests:</h3>
<blockquote>
<p>The idea of a digest (or hash) function is that a section of text can
be reduced to a short value that is, for practical purposes, unique to
that text. This is used in digital signatures, where the document to be
signed is first reduced to its digest, and then that number is encrypted
with the signing party's private key. The resulting ciphertext can be
decrypted with the corresponding public key to obtain the digest value,
and if the document is known, it can be used to recompute the digest and
check that the two values match. This allows any party to verify that the
holder of a certain private key has signed a given document. It also
allows any party to hold the record of a signature without (yet) having
access to the document that was signed. At a later time the document text
can be produced and the signature verified.</p>
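<p>A small Python illustration of the digest half of this process, using SHA-1 as assumed in section 1.3 (the signing step itself needs a public-key library and is left out here): a party holding only the digest can later check a produced document against it.</p>

```python
import hashlib


def digest(text):
    """SHA-1 of the UTF-8 encoded text as 40 uppercase hex characters."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


document = "We have met the enemy, and they have won."
held = digest(document)  # all that a party needs to keep for now

# Later, the document text is produced and anyone can recompute and compare.
print(held == digest(document))                         # True
print(held == digest(document.replace("won", "lost")))  # False
```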
<p>SPH uses a section-by-section hashing protocol that allows a final
digest of any document (or subdocument) to be produced (and possibly
signed) that does not change when a particular subsection is expanded, or
collapsed to its own digest value. This allows verifiable signatures to be
produced and disclosed with great flexibility as to how much of the
document needs to be revealed to a particular party.</p>
<p>This is done as follows. The digest of the entire document is computed
after first reducing each contained subdocument to a single line of the
form HEADINGNAME:TEXTHASHCODE. The digest of each subdocument is computed
the same way, with the digests of its own subdocuments computed first.
Where a subsection is already expressed as a digest, no work is done. By
following this recursive process, the same digest is obtained for any
document, regardless of which headings and subheadings are revealed in
plain text and which are represented only by their digest value.</p>
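<p>The recursive procedure can be sketched in Python as follows. The exact byte serialization is an assumption, not something fixed by this description: each (sub)document is hashed as its body lines followed by one HEADINGNAME:DIGEST line per subsection in heading order, each line ending in a newline, encoded as UTF-8, with SHA-1 written as uppercase hex. A subsection may be given either expanded, as a nested <code>(body_lines, sections)</code> tuple, or collapsed, as its digest string; <code>sph_digest</code> and <code>collapse</code> are hypothetical names.</p>

```python
import hashlib


def sph_digest(node):
    """Digest of a (body_lines, sections) tuple under the assumed serialization.

    `sections` maps heading -> nested (body_lines, sections) tuple (expanded)
    or a 40-character hex digest string (collapsed).
    """
    body, sections = node
    lines = list(body)
    for heading in sorted(sections, key=lambda h: h.encode("utf-8")):
        sub = sections[heading]
        # Expanded subsections are reduced to their digest first; collapsed
        # ones already are digests, so no work is done.
        sub_digest = sub if isinstance(sub, str) else sph_digest(sub)
        lines.append(heading + ":" + sub_digest)
    text = "".join(line + "\n" for line in lines)
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def collapse(node):
    """Replace every immediate subsection of a node by its digest."""
    body, sections = node
    return body, {h: s if isinstance(s, str) else sph_digest(s)
                  for h, s in sections.items()}


received = (["2001:02:25:21:52:07:258"], {})
report = (["We have met the enemy, and they have won.",
           "Regrouping to await further orders."],
          {"casualties": (["heavy"], {}),
           "kia": (["Sgt. Ben Kilroy", "Pvt. John Brown", "Gen. May Ham"], {}),
           "mia": ([], {}),
           "situation": (["fubar"], {})})
title = ["report from alpha group"]

views = [
    (title, {"received": received, "report": report}),              # fully expanded
    (title, {"received": received, "report": collapse(report)}),    # report's subsections collapsed
    (title, {"received": received, "report": sph_digest(report)}),  # report collapsed
    collapse((title, {"received": received, "report": report})),    # everything collapsed
]
print(len({sph_digest(view) for view in views}))  # 1: every view has the same digest
```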
<p>So our sample document above can be displayed as follows, where each
TEXTHASHCODE stands for the actual digest of the corresponding
subsection:</p>
<pre><code>report from alpha group
received:
	2001:02:25:21:52:07:258
report:
	We have met the enemy, and they have won.
	Regrouping to await further orders.
	casualties:TEXTHASHCODE
	kia:TEXTHASHCODE
	mia:TEXTHASHCODE
	situation:TEXTHASHCODE</code></pre>
<p>OR:</p>
<pre><code>report from alpha group
received:
	2001:02:25:21:52:07:258
report:TEXTHASHCODE</code></pre>
<p>OR:</p>
<pre><code>report from alpha group
received:TEXTHASHCODE
report:TEXTHASHCODE</code></pre>
<p>Whatever the combination of collapsed or expanded headings, the digest
computed for the entire document using the SPH technique remains the same:
it is always the hash of the last example above, regardless of how much or
how little of the document is known.</p>
</blockquote>
<h3>1.3 White Space:</h3>
<blockquote>
<p>Headings must not begin or end with a whitespace character, but they
can include whitespace inside them, and they must be terminated by a
colon. Any text after the colon is interpreted as the digest of the
subdocument under the heading. If this text does not fit the format of the
hash (SHA-1 in hex: 40 characters drawn from 0-9 and A-F), the whole line
is interpreted as body text instead; but if the line is not properly
positioned to be body text (at the beginning of an indented section,
before any headings), or if a heading is not in proper order, the document
is considered malformed.</p>
<p>Indentation under each heading takes the form of a single additional
&lt;Tab&gt; character at the beginning of each line. Additional tabs or
whitespace beyond the expected level at the beginning of a line are
considered part of the body text when they occur at the beginning of a
document or subdocument, before any headings. Anywhere else, they indicate
a malformed document.</p>
<p>These rules of heading indentation, naming, and ordering by character
value are designed to ensure that there is only one possible way to
rebuild a stored document.</p>
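<p>A Python sketch of a parser that applies these rules, producing nested <code>(body_lines, sections)</code> tuples in which collapsed subsections are kept as digest strings. The names <code>parse_document</code> and <code>MalformedDocument</code> are hypothetical, and treating a line as a heading only when it ends in a colon, or in a colon followed by a 40-character uppercase hex digest, is one reading of the rules above:</p>

```python
import re

DIGEST = re.compile(r"[0-9A-F]{40}")


class MalformedDocument(ValueError):
    """Raised when text breaks the layout rules described above."""


def split_heading(text):
    """Return (heading, digest or None) if text is a heading line, else None."""
    name, sep, rest = text.rpartition(":")
    if not sep or not name or name != name.strip():
        return None                      # no colon, or leading/trailing whitespace
    if rest == "":
        return name, None                # expanded: indented content may follow
    if DIGEST.fullmatch(rest):
        return name, rest                # collapsed: digest given on the same line
    return None                          # anything else is body text


def parse(lines, depth=0, pos=0):
    """Parse lines at the given tab depth; return ((body, sections), next_pos)."""
    prefix = "\t" * depth
    body, sections, last = [], {}, None
    while pos < len(lines):
        line = lines[pos]
        if not line.startswith(prefix):
            break                        # dedent: this subdocument has ended
        text = line[len(prefix):]
        head = split_heading(text)
        if head is None:
            if sections:
                raise MalformedDocument("misplaced body text: " + repr(line))
            body.append(text)            # extra leading whitespace stays in the body
            pos += 1
            continue
        name, sub_digest = head
        if last is not None and last.encode("utf-8") >= name.encode("utf-8"):
            raise MalformedDocument("heading out of order: " + repr(name))
        last = name
        pos += 1
        if sub_digest is None:
            sections[name], pos = parse(lines, depth + 1, pos)
        else:
            sections[name] = sub_digest
    return (body, sections), pos


def parse_document(text):
    node, _ = parse(text.splitlines())
    return node


sample = ("report from alpha group\n"
          "received:\n"
          "\t2001:02:25:21:52:07:258\n"
          "report:\n"
          "\tWe have met the enemy, and they have won.\n"
          "\tcasualties:\n"
          "\t\theavy\n"
          "\tmia:\n"
          "\tsituation:" + "A" * 40 + "\n")
doc = parse_document(sample)
print(doc[1]["report"][1]["situation"])  # the collapsed digest string

try:
    parse_document("b:\na:\n")
except MalformedDocument as err:
    print(err)  # heading out of order: 'a'
```

<p>Note that a body line such as the timestamp under "received" contains colons but is not mistaken for a heading, because the text after its last colon is not a 40-character digest.</p>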
</blockquote>
</blockquote>
<h2>2. Storage:</h2>
<blockquote>
<p>The text representation of the digest of a document can be used as a
name
by which the document can be referenced. This allows SPH documents to be
stored
in whole or part, broken into known component pieces.</p>
<p>In addition to the advantages mentioned above concerning disclosure of
only some portions of a signed document, the other big reason for storing
documents in multiple pieces is document reference and reuse. For example,
a party might create a document to describe himself, containing both a
public key and personal data such as name, address, email, and so on. This
document could then be referenced or included in other documents to refer
to that party. Stored documents take the form of trees of document
references, with different trees possibly sharing some of the same
branches.</p>
<h3>2.1 Parameterization:</h3>
<blockquote>
<p>Documents may be broken apart and stored internally as parameters with
subparameters. The following represents a parameter breakdown of the
sample document above:</p>
<pre><code>DOC.VALUE="report from alpha group"
DOC.SEC1.NAME="received"
DOC.SEC1.VALUE="2001:02:25:21:52:07:258"
DOC.SEC2.NAME="report"
DOC.SEC2.VALUE="We have met the enemy, and they have won.\nRegrouping to await further orders."
DOC.SEC2.SEC1.NAME="casualties"
DOC.SEC2.SEC1.VALUE="heavy"
DOC.SEC2.SEC2.NAME="kia"
DOC.SEC2.SEC2.VALUE="Sgt. Ben Kilroy\nPvt. John Brown\nGen. May Ham"
DOC.SEC2.SEC3.NAME="mia"
DOC.SEC2.SEC3.VALUE=""
DOC.SEC2.SEC4.NAME="situation"
DOC.SEC2.SEC4.VALUE="fubar"</code></pre>
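<p>A Python sketch of this flattening, assuming fully expanded <code>(body_lines, sections)</code> tuples and numbering subsections in heading order. The function name <code>to_params</code> is hypothetical, and <code>json.dumps</code> is used only to quote each value and escape its newlines as <code>\n</code>:</p>

```python
import json


def to_params(node, prefix="DOC"):
    """Flatten a (body_lines, sections) tuple into PREFIX.SECn.NAME/VALUE pairs."""
    body, sections = node
    params = {prefix + ".VALUE": "\n".join(body)}
    ordered = sorted(sections, key=lambda h: h.encode("utf-8"))
    for n, heading in enumerate(ordered, start=1):
        sec = prefix + ".SEC" + str(n)
        params[sec + ".NAME"] = heading
        params.update(to_params(sections[heading], sec))
    return params


doc = (["report from alpha group"],
       {"received": (["2001:02:25:21:52:07:258"], {}),
        "report": (["We have met the enemy, and they have won.",
                    "Regrouping to await further orders."],
                   {"casualties": (["heavy"], {}),
                    "kia": (["Sgt. Ben Kilroy", "Pvt. John Brown", "Gen. May Ham"], {}),
                    "mia": ([], {}),
                    "situation": (["fubar"], {})})})

for key, value in to_params(doc).items():
    print(key + "=" + json.dumps(value))
```

<p>The printed lines follow the same order as the breakdown above, because each section's NAME is recorded before its VALUE and subsections.</p>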
</blockquote>
<h3>2.2 Text Files:</h3>
<blockquote>
<p>Documents can easily be stored as text files, with filenames
corresponding to the text representation of the contained document's
digest. The format is as follows:</p>
<pre><code>FILENAME=TEXTHASHCODE.voxp
body text
heading1:TEXTHASHCODE-1
heading2:TEXTHASHCODE-2
headingN:TEXTHASHCODE-N</code></pre>
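<p>A Python sketch of this storage scheme. It assumes a serialization in which a (sub)document is hashed exactly as its file is written (body lines, then HEADINGNAME:DIGEST lines in heading order, each newline-terminated, UTF-8, SHA-1 as uppercase hex), so every file's content hashes to its own filename. The functions <code>save</code> and <code>load</code> are hypothetical:</p>

```python
import hashlib
import pathlib
import tempfile

HEX = set("0123456789ABCDEF")


def save(node, folder):
    """Write node and its expanded subsections as DIGEST.voxp files; return the digest."""
    body, sections = node
    lines = list(body)
    for heading in sorted(sections, key=lambda h: h.encode("utf-8")):
        sub = sections[heading]
        sub_digest = sub if isinstance(sub, str) else save(sub, folder)
        lines.append(heading + ":" + sub_digest)
    content = "".join(line + "\n" for line in lines).encode("utf-8")
    digest = hashlib.sha1(content).hexdigest().upper()
    (folder / (digest + ".voxp")).write_bytes(content)  # bytes: no newline translation
    return digest


def load(digest, folder):
    """Rebuild an expanded (body_lines, sections) tuple from DIGEST.voxp files."""
    text = (folder / (digest + ".voxp")).read_bytes().decode("utf-8")
    body, sections = [], {}
    for line in text.split("\n")[:-1]:  # every line ends with "\n"
        name, sep, rest = line.rpartition(":")
        if sep and name and len(rest) == 40 and set(rest) <= HEX:
            sections[name] = load(rest, folder)
        else:
            body.append(line)
    return body, sections


doc = (["report from alpha group"],
       {"received": (["2001:02:25:21:52:07:258"], {}),
        "report": (["We have met the enemy, and they have won."],
                   {"mia": ([], {}), "situation": (["fubar"], {})})})

with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    top = save(doc, folder)
    print(len(list(folder.iterdir())))  # 5 files, one per (sub)document
    print(load(top, folder) == doc)     # True
```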
</blockquote>
<h3>2.3 Database:</h3>
<blockquote>
<p>Documents can be stored in two simple database tables defined as
follows:</p>
<pre><code>CREATE TABLE documents (
    digest   CHAR(40),     -- 40 characters for hex SHA-1
    bodytext TEXT          -- body text of the document
);
CREATE TABLE sections (
    doc      CHAR(40),     -- digest of parent document
    heading  VARCHAR(80),  -- heading name; 80 is an arbitrary max length
    sub      CHAR(40)      -- digest of subdocument
);</code></pre>
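<p>A Python sketch using the standard <code>sqlite3</code> module with these two tables, showing how a shared subdocument (here a hypothetical party self-description of the kind mentioned at the start of this section) is stored only once even when two documents include it. Storing body text as lines joined by newlines, and the digest serialization (body lines, then HEADINGNAME:DIGEST lines in heading order, each newline-terminated, SHA-1 as uppercase hex), are assumptions; <code>store</code> is a hypothetical name:</p>

```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE documents (digest CHAR(40), bodytext TEXT);
CREATE TABLE sections (doc CHAR(40), heading VARCHAR(80), sub CHAR(40));
"""


def store(db, node):
    """Insert a (body_lines, sections) tuple and its subsections; return its digest."""
    body, sections = node
    lines, rows = list(body), []
    for heading in sorted(sections, key=lambda h: h.encode("utf-8")):
        sub_digest = store(db, sections[heading])
        rows.append((heading, sub_digest))
        lines.append(heading + ":" + sub_digest)
    text = "".join(line + "\n" for line in lines)
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest().upper()
    seen = db.execute("SELECT 1 FROM documents WHERE digest = ?", (digest,)).fetchone()
    if seen is None:  # identical subdocuments are stored only once
        db.execute("INSERT INTO documents VALUES (?, ?)", (digest, "\n".join(body)))
        db.executemany("INSERT INTO sections VALUES (?, ?, ?)",
                       [(digest, heading, sub) for heading, sub in rows])
    return digest


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)

# Hypothetical self-description of a party, referenced by two reports.
party = (["PUBLIC-KEY-PLACEHOLDER"], {"name": (["Alice"], {})})
alpha = (["report from alpha group"], {"author": party, "situation": (["fubar"], {})})
bravo = (["report from bravo group"], {"author": party, "situation": (["all clear"], {})})
store(db, alpha)
store(db, bravo)

print(db.execute("SELECT COUNT(*) FROM documents").fetchone()[0])  # 6, not 8
```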
</blockquote>
</blockquote>
<h2>3. Parameters</h2>
<blockquote>
<p>A number of standard parameters can be defined for general purposes.
Some possibilities are: Author, Title, Date, Version, Type. There is no
structural difference between a parameter and a section containing a
subdocument; the only obvious difference is that a parameter would be a
much smaller piece of data, for example a date, a number, or a short
string. Note that Author above could be a full certificate that describes
the author, or just the digest of that certificate.</p>
<p>Since parameters like the number "10" produce well-known digests
(anyone can hash every likely value and compare), a protocol for adding
random noise to them should be included if concealing them is necessary.
For example, "10" and "10 + 'SOMERANDOMNOISE'" could be parsed the same way
by an application, but the latter would have an unrecognizable digest.</p>
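<p>A short Python illustration of why this matters and of the noise idea. The " + " separator mimics the example in the text; the exact encoding of the noise is an assumption, and <code>secrets.token_hex</code> is just one convenient source of randomness:</p>

```python
import hashlib
import secrets


def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


# A bare small number is easy to recover: hash every likely value and compare.
leaked = sha1_hex("10")
guess = next(str(n) for n in range(1000) if sha1_hex(str(n)) == leaked)
print(guess)  # 10

# With random noise appended, the same search finds nothing, while an
# application still reads the value before the (hypothetical) " + " separator.
salted = "10 + '" + secrets.token_hex(16) + "'"
print(any(sha1_hex(str(n)) == sha1_hex(salted) for n in range(1000)))  # False
print(salted.split(" + ", 1)[0])  # 10
```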
<p>Using a standard set of parameters such as Author, Title, and Version
when storing documents could create an additional mechanism for
identifying a document other than its hash code. If the document storage
and transfer system defines some standard parameters for such purposes, it
might be a good idea to prefix them with a designation and version number,
like "SPH0.1-Title" or some such.</p>
</blockquote>
<h2>4. Types:</h2>
<blockquote>
<p>Once a standard parameter called "Type" is defined, it makes
sense
to define a few types. A few possibly useful types are (somewhat)
defined
here.</p>
<h3>4.1 Signatures:</h3>
<blockquote>
<p>Signature Documents make use of public key encryption techniques to
show
that a party has digitally signed a given document. The body text
represents
a digital signature of an SPH document digest. The defined headings
following
this signature are Certificate and Document. The Certificate heading
contains
a sub document identifying the signing party (see below). The Document
heading
contains the document from which the signed SPH digest was
computed.</p>
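<p>A toy Python sketch of assembling and checking such a Signature document. It uses textbook RSA with the tiny textbook primes 61 and 53, which is completely insecure and only stands in for a real public-key scheme. The digest serialization, the certificate layout (public key written as "n e" in the body text), and the functions <code>sign</code> and <code>verify</code> are all assumptions. Because the Document section may be collapsed to its digest without changing that digest, the signature also verifies when only the digest of the signed document is disclosed:</p>

```python
import hashlib

# Toy textbook RSA key (p = 61, q = 53); far too small for real use.
N, E, D = 3233, 17, 2753


def sph_digest(node):
    """SPH digest of a (body_lines, sections) tuple, or a digest string as-is."""
    if isinstance(node, str):
        return node
    body, sections = node
    lines = list(body)
    for heading in sorted(sections, key=lambda h: h.encode("utf-8")):
        lines.append(heading + ":" + sph_digest(sections[heading]))
    text = "".join(line + "\n" for line in lines)
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def sign(document, certificate, private_exponent):
    """Build a Signature document: body = signature, headings Certificate and Document."""
    n, _ = map(int, certificate[0][0].split())
    m = int(sph_digest(document), 16) % n  # toy: shrink the digest to fit the key
    signature = pow(m, private_exponent, n)
    return [str(signature)], {"Certificate": certificate, "Document": document}


def verify(signature_doc):
    """Check the body signature against the Certificate key and the Document digest."""
    body, sections = signature_doc
    n, e = map(int, sections["Certificate"][0][0].split())
    m = int(sph_digest(sections["Document"]), 16) % n
    return pow(int(body[0]), e, n) == m


certificate = ([f"{N} {E}"], {"Name": (["Alice"], {})})
report = (["report from alpha group"], {"situation": (["fubar"], {})})

signed = sign(report, certificate, D)
print(verify(signed))  # True

# Disclose only the digest of the signed document; the signature still checks.
body, sections = signed
partial = (body, {"Certificate": certificate, "Document": sph_digest(report)})
print(verify(partial))  # True
```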
</blockquote>
<h3>4.2 Certificates:</h3>
<blockquote>
<p>Certificate Documents associate real-world obligations and identities
with specific public keys. The body text of the certificate contains text
representing a public key. The subsections of the certificate are called
the Assertions. These are such things as Name, Address, Phone Number,
Email, etc., and information on how the certified party plays certain
defined roles in Vehicle execution or other defined document types, such
as rates charged for certain services, whom they are willing to trade
with, or a definition of their currency and the real-world goods/services
for which their issued currency can be redeemed. A currency can therefore
be expressed simply as the digest of the certificate that defines it, or
perhaps more appropriately as the digest of a Signature document in which
the issuer self-signs the certificate that defines it.</p>
</blockquote>
<h3>4.3 Vehicles:</h3>
<blockquote>
<p>Vehicles contain information concerning the exchange of obligations
between parties. A vehicle is an executable document that specifies
exchanges of a user-defined currency. A vehicle may also contain an
agreement binding on some of the signing parties.</p>
</blockquote>
</blockquote>
<h1>etc... (work in progress)</h1>
</body>
</html>