Sectional Parameter Hashing

6 Mar 2001

      All,

This past week I have been thinking about a document format and hashing
method that allows arbitrary level partial disclosure of a verifiably signed
document, as well as formal reference/reuse of subdocuments inside of larger
documents. Basically it is a technique for using a hash tree within a
standard document format.

Below is a description of a system that satisfies some of my needs. I would
appreciate some feedback. Thanks.

--Sean Hastings
--mailto:sean@havenco.com
--vmsg/fax:1.800.why.sean

 - - -

<html>
<head>
<title>VOXP - Value and Obligation eXchange Processing</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>

<body bgcolor="#FFFFFF">
<h1>Sectional Parameter Hashing</h1>
<p>A description of a possible format and uses for documents that are
designed
  to behave like Hash Trees.</p>
<h2>0. Contents:</h2>
<blockquote>
  <p>0.) Contents</p>
  <p>1.) Format</p>
  <blockquote>
    <p>1.1) Sections</p>
    <p>1.2) Digests</p>
    <p>1.3) White Space</p>
  </blockquote>
  <p>2.) Storage</p>
  <blockquote>
    <p>2.1) Parameterization</p>
    <p>2.2) Text Files</p>
    <p>2.3) Database</p>
  </blockquote>
  <p>3.) Parameters</p>
  <p>4.) Types</p>
  <blockquote>
    <p>4.1) Signatures</p>
    <p>4.2) Certificates</p>
    <p>4.3) Vehicles</p>
  </blockquote>
  </blockquote>
<h2>1. Format:</h2>
<blockquote>
  <p>The system makes use of a document format and digest technique called
Sectional
    Parameter Hashing (SPH).</p>
  <p>The general document format is designed to support a hashing technique in
    which the subdocument under any heading may be reduced to (or expanded from)
    a text representation of its digest without altering the overall document's
    digest value. This is done by combining a structured document format with an
    agreed technique for computing the document's digest.</p>
  <h3>1.1 Sections:</h3>
  <blockquote>
    <p>A section is indicated by a heading, followed by a colon, followed by a
      newline. The heading is normally followed by one or more lines of indented
      text, which may include subsections; where no indented text follows the
      heading, the section is considered to be empty. Where subsections are
      present, they always come after any body text of the enclosing section, and
      they are sorted by the numeric character values of their headings, compared
      character by character. For example:</p>
    <pre><code>report from alpha group
received:
	2001:02:25:21:52:07:258
report:
	We have met the enemy, and they have won.
	Regrouping to await further orders.
	casualties:
		heavy
	kia:
		Sgt. Ben Kilroy
		Pvt. John Brown
		Gen. May Ham
	mia:
	situation:
		fubar</code></pre>
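    <p>Because the layout is line-oriented, it can be read mechanically. The
      following is a minimal Python sketch, not part of SPH itself, that parses
      tab-indented SPH text into nested (body lines, subsections) tuples. The
      function names, the tuple representation, and the assumption that collapsed
      digests are 40 upper-case hex characters are illustrative choices; heading
      order and the white space rules are not checked here.</p>
<pre><code>
```python
import re

# Assumed digest format: 40 upper-case hex characters (a SHA-1 digest).
SHA1_HEX = re.compile(r"[0-9A-F]{40}")


def heading_of(line):
    """Return (name, digest or None) if `line` is a heading line, else None."""
    if not line or line[0] in " \t":
        return None  # leading whitespace: body text (or malformed)
    name, sep, rest = line.rpartition(":")
    if not sep or not name:
        return None  # no colon, or nothing before it
    if rest == "":
        return name, None  # "heading:" opens an expanded (or empty) section
    if SHA1_HEX.fullmatch(rest):
        return name, rest  # "heading:HEX" is a collapsed section
    return None  # e.g. "2001:02:25:21:52:07:258" is ordinary body text


def parse(lines):
    """Parse de-indented SPH lines into (body_lines, {heading: subsection}).

    A subsection is a nested (body_lines, {...}) tuple or a digest string.
    Heading order and the white space rules are NOT validated here.
    """
    body, subs, i = [], {}, 0
    # Body text: every line before the first heading.
    while i < len(lines) and heading_of(lines[i]) is None:
        body.append(lines[i])
        i += 1
    while i < len(lines):
        found = heading_of(lines[i])
        if found is None:
            raise ValueError(f"body text after a heading: {lines[i]!r}")
        name, digest = found
        i += 1
        # Lines indented by one more tab belong to this section.
        block = []
        while i < len(lines) and lines[i].startswith("\t"):
            block.append(lines[i][1:])
            i += 1
        if digest is not None and block:
            raise ValueError(f"collapsed section {name!r} has indented content")
        subs[name] = digest if digest is not None else parse(block)
    return body, subs
```
</code></pre>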
  </blockquote>
  <h3>1.2 Digests:</h3>
  <blockquote>
    <p>The idea of a digest (or hash) function is that a piece of text can be
      reduced to a short, fixed-length value that is, for practical purposes,
      unique to that text. This is used in digital signatures: the document to be
      signed is first reduced to its digest, and that number is then encrypted
      with the signing party's private key. The resulting ciphertext can be
      decrypted with the corresponding public key to recover the digest value,
      and anyone who has the document can recompute its digest and check that the
      two match. This allows any party to verify that the holder of a certain
      private key has signed a given document. It also allows any party to hold
      the record of a signature without (yet) having access to the document that
      was signed; the document text can be produced and the signature verified at
      a later time.</p>
    <p>SPH uses a section-by-section hashing protocol that allows a final hash of
      any document (or subdocument) to be produced (and possibly signed) that does
      not change when a particular subsection is expanded, or collapsed to its own
      digest value. This allows verifiable signatures to be produced and disclosed
      with great flexibility as to how much of the document needs to be revealed
      to a particular party.</p>
    <p>This is done as follows. The digest of the entire document is computed
      after first reducing each contained subdocument to a single line of the form
      HEADINGNAME:TEXTHASHCODE. The digest of each subdocument is computed the
      same way, with the digests of its own subdocuments computed first. Where a
      subsection is already expressed as a digest, no work is needed. By following
      this recursive process, the same digest is obtained for any document,
      regardless of which headings and subheadings are revealed in plain text and
      which are represented only by their digest values.</p>
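    <p>The recursion can be sketched in a few lines of Python. This is an
      illustration under stated assumptions rather than a fixed definition: text
      is encoded as UTF-8, lines are joined with a single newline and no trailing
      newline, SHA-1 is used (matching the 40-character hex format named in
      section 1.3), and the hex output is upper-case. A tree is represented as a
      (body lines, subsections) tuple, where each subsection is either another
      such tuple or an already-collapsed digest string.</p>
<pre><code>
```python
import hashlib


def sph_digest(body, subs):
    """Sketch of the recursive SPH digest.

    body: list of body-text lines of this (sub)document.
    subs: dict mapping heading -> nested (body, subs) tuple, or an
          already-collapsed 40-character hex digest string.
    """
    lines = list(body)
    for heading in sorted(subs):  # character-value order, as in section 1.1
        sub = subs[heading]
        # Collapsed sections are used as-is; expanded ones are hashed first.
        digest = sub if isinstance(sub, str) else sph_digest(*sub)
        lines.append(f"{heading}:{digest}")
    # Assumptions: UTF-8, "\n" between lines, no trailing newline,
    # upper-case hex output.
    text = "\n".join(lines)
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()
```
</code></pre>
    <p>Because a collapsed subsection contributes exactly the same
      HEADINGNAME:TEXTHASHCODE line that its expanded form would produce,
      replacing any subtree by its digest leaves every digest above it
      unchanged.</p>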
    <p>So our sample document above can be displayed as:</p>
    <pre><code>report from alpha group
received:
	2001:02:25:21:52:07:258
report:
	We have met the enemy, and they have won.
	Regrouping to await further orders.
	casualties:TEXTHASHCODE
	kia:TEXTHASHCODE
	mia:TEXTHASHCODE
	situation:TEXTHASHCODE</code></pre>
    <p>OR:</p>
    <pre><code>report from alpha group
received:
	2001:02:25:21:52:07:258
report:TEXTHASHCODE</code></pre>
    <p>OR:</p>
    <pre><code>report from alpha group
received:TEXTHASHCODE
report:TEXTHASHCODE</code></pre>
    <p> Whatever the combination of collapsed or expanded headings, the
digest
      computed for the entire document using the SPH technique will remain
the
      same. The final digest value will always represent a hash of the last
example
      above, regardless of how much or little of the document is known.</p>
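    <p>The three displays above can be produced and checked with a small helper.
      In this sketch (same assumptions as a plain SPH digest: UTF-8,
      newline-joined lines, upper-case SHA-1 hex), the hypothetical
      <code>collapse</code> function replaces the sections named by heading paths
      such as ("report", "kia") with their digests. Every resulting view, with
      real digests in place of TEXTHASHCODE, hashes to the same value.</p>
<pre><code>
```python
import hashlib


def sph_digest(body, subs):
    """Recursive SPH digest (UTF-8, newline-joined, upper-case SHA-1 hex)."""
    lines = list(body)
    for heading in sorted(subs):
        sub = subs[heading]
        lines.append(f"{heading}:{sub if isinstance(sub, str) else sph_digest(*sub)}")
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest().upper()


def collapse(body, subs, paths):
    """Copy a tree, replacing each section whose heading path is in `paths`
    (e.g. ("report",) or ("report", "kia")) with its digest."""
    new_subs = {}
    for heading, sub in subs.items():
        if isinstance(sub, str):
            new_subs[heading] = sub  # already collapsed
        elif (heading,) in paths:
            new_subs[heading] = sph_digest(*sub)
        else:
            deeper = {p[1:] for p in paths if len(p) > 1 and p[0] == heading}
            new_subs[heading] = collapse(*sub, deeper)
    return list(body), new_subs


report = (["We have met the enemy, and they have won.",
           "Regrouping to await further orders."],
          {"casualties": (["heavy"], {}),
           "kia": (["Sgt. Ben Kilroy", "Pvt. John Brown", "Gen. May Ham"], {}),
           "mia": ([], {}),
           "situation": (["fubar"], {})})
doc = (["report from alpha group"],
       {"received": (["2001:02:25:21:52:07:258"], {}), "report": report})

views = [
    doc,
    collapse(*doc, {("report", h) for h in ("casualties", "kia", "mia", "situation")}),
    collapse(*doc, {("report",)}),
    collapse(*doc, {("received",), ("report",)}),
]
print(len({sph_digest(*v) for v in views}))  # 1: all four views share one digest
```
</code></pre>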
  </blockquote>
  <h3>1.3 White Space:</h3>
  <blockquote>
    <p>Headings must not begin or end with a whitespace character, but they can
      include white space inside them, and they must be terminated by a colon.
      Any text after the colon will be interpreted as the digest of the
      subdocument under the heading. If this text does not fit the format of the
      hash (SHA-1 in hex: 40 characters drawn from 0-9 and A-F), the line will be
      interpreted as body text. However, if such a line is not positioned at the
      beginning of an indented section, where body text is allowed, or if the
      heading is out of order, the document will be considered malformed.</p>
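    <p>The line-level rule can be expressed as a small classifier. This Python
      sketch assumes digests are written as exactly 40 upper-case hex characters
      (lower-case hex is treated as body text here, a choice the text does not
      settle) and that the line's expected indentation has already been removed.
      Whether a "body" line is legal depends on where it appears, which is left to
      the caller.</p>
<pre><code>
```python
import re

# Assumption: digests are 40 upper-case hex characters (SHA-1).
SHA1_HEX = re.compile(r"[0-9A-F]{40}")


def classify(line):
    """Classify one SPH line whose expected indentation has been removed.

    Returns ("heading", name) for "name:", ("digest", name, hex) for
    "name:HEX", and ("body", line) for anything else.
    """
    name, sep, rest = line.rpartition(":")
    # Headings need a colon and a name with no leading/trailing white space.
    if not sep or not name or name != name.strip():
        return ("body", line)
    if rest == "":
        return ("heading", name)
    if SHA1_HEX.fullmatch(rest):
        return ("digest", name, rest)
    # Text after the colon is not a digest, so the line is body text.
    return ("body", line)
```
</code></pre>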
    <p>Indentation under each heading takes the form of a single additional tab
      character at the beginning of each line. Additional tabs or other
      whitespace beyond the expected level at the beginning of a document or
      subdocument, before any headings, will be considered part of the body text.
      Anywhere else, they indicate a malformed document.</p>
    <p>These rules of heading indentation, naming, and ordering by character
      value are designed to ensure that there is only one possible way to rebuild
      a stored document.</p>
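    <p>Taken together, the indentation, naming and ordering rules can be checked
      recursively. The <code>validate</code> function below is a hypothetical
      Python sketch of such a check: body text (including extra leading tabs) is
      accepted only before the first heading, headings must appear in strictly
      increasing character-value order, and a collapsed heading may not have
      indented content. Requiring strictly increasing order also rejects
      duplicate headings; that is an assumption, since two sections with the same
      heading could otherwise appear in either order.</p>
<pre><code>
```python
import re

SHA1_HEX = re.compile(r"[0-9A-F]{40}")  # assumed digest format


def heading_name(line):
    """Return the heading name if `line` is "name:" or "name:HEX", else None."""
    name, sep, rest = line.rpartition(":")
    if sep and name and name == name.strip() and (rest == "" or SHA1_HEX.fullmatch(rest)):
        return name
    return None


def validate(lines):
    """Raise ValueError if de-indented SPH lines break the layout rules."""
    i = 0
    # Body text, including extra leading tabs, is only allowed before headings.
    while i < len(lines) and heading_name(lines[i]) is None:
        i += 1
    previous = None
    while i < len(lines):
        line = lines[i]
        name = heading_name(line)
        if name is None:
            raise ValueError(f"misplaced text or bad indentation: {line!r}")
        # Strictly increasing character-value order (also rules out duplicates).
        if previous is not None and previous >= name:
            raise ValueError(f"heading {name!r} is out of order after {previous!r}")
        previous = name
        i += 1
        block = []
        while i < len(lines) and lines[i].startswith("\t"):
            block.append(lines[i][1:])
            i += 1
        if block and not line.endswith(":"):
            raise ValueError(f"collapsed heading {name!r} has indented content")
        validate(block)
```
</code></pre>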
  </blockquote>
</blockquote>
<h2>2. Storage:</h2>
<blockquote>
  <p>The text representation of the digest of a document can be used as a
name
    by which the document can be referenced. This allows SPH documents to be
stored
    in whole or part, broken into known component pieces.</p>
  <p>In addition to the advantages mentioned above concerning disclosure of only
    some portions of a signed document, the other big reason for storing
    documents in multiple pieces is document reference and reuse. For example, a
    party might create a document describing himself, containing both a public
    key and personal data such as name, address, email, and so on. This document
    could then be referenced or included in other documents to refer to that
    party. Stored documents take the form of document references, with different
    trees possibly sharing some of the same branches.</p>
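  <p>Naming each piece by its digest gives a content-addressed store, in which a
    subdocument shared by several documents is stored only once. This Python
    sketch uses an in-memory dict as a stand-in for real storage; the
    <code>put</code> function and the party document are hypothetical, and each
    stored value is the collapsed text (body lines followed by heading:digest
    lines) whose SHA-1 is its key.</p>
<pre><code>
```python
import hashlib


def put(store, body, subs):
    """Store every node of a tree in `store` (digest -> collapsed text).

    Each node's text is its body lines followed by one "heading:digest"
    line per subsection, and its key is the SHA-1 of that text, so an
    identical subtree always lands under the same key.
    """
    lines = list(body)
    for heading in sorted(subs):
        sub = subs[heading]
        lines.append(f"{heading}:{sub if isinstance(sub, str) else put(store, *sub)}")
    text = "\n".join(lines)
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest().upper()
    store[digest] = text
    return digest


# Hypothetical party description reused by two different documents.
party = (["PUBLIC-KEY-TEXT"], {"email": (["alpha@example.com"], {}),
                               "name": (["Alpha Group"], {})})
store = {}
memo1 = put(store, ["first memo"], {"author": party})
memo2 = put(store, ["second memo"], {"author": party})
print(len(store))  # 5 nodes: the shared party subtree (3 nodes) is stored once
```
</code></pre>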
  <h3>2.1 Parameterization:</h3>
  <blockquote>
    <p>A document may be broken apart and stored internally as parameters with
      subparameters. The following represents a parameter breakdown of the sample
      document above:</p>
    <pre><code>DOC.VALUE="report from alpha group"
DOC.SEC1.NAME="received"
DOC.SEC1.VALUE="2001:02:25:21:52:07:258"
DOC.SEC2.NAME="report"
DOC.SEC2.VALUE="We have met the enemy, and they have won.\nRegrouping to await further orders."
DOC.SEC2.SEC1.NAME="casualties"
DOC.SEC2.SEC1.VALUE="heavy"
DOC.SEC2.SEC2.NAME="kia"
DOC.SEC2.SEC2.VALUE="Sgt. Ben Kilroy\nPvt. John Brown\nGen. May Ham"
DOC.SEC2.SEC3.NAME="mia"
DOC.SEC2.SEC3.VALUE=""
DOC.SEC2.SEC4.NAME="situation"
DOC.SEC2.SEC4.VALUE="fubar"</code></pre>
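    <p>A breakdown like this can be generated from a parsed tree. In this Python
      sketch (hypothetical names; it assumes a fully expanded tree of (body lines,
      subsections) tuples), sections are numbered from 1 in heading order and
      values are written as JSON string literals, so line breaks appear as \n
      escapes.</p>
<pre><code>
```python
import json


def flatten(body, subs, prefix="DOC"):
    """Flatten a fully expanded (body, subs) tree into PREFIX.VALUE and
    PREFIX.SECn.NAME / PREFIX.SECn.VALUE lines, numbering sections from 1
    in heading order. Values are written as JSON string literals."""
    value = json.dumps("\n".join(body), ensure_ascii=False)  # line breaks become \n
    out = [f"{prefix}.VALUE={value}"]
    for n, heading in enumerate(sorted(subs), start=1):
        section = f"{prefix}.SEC{n}"
        out.append(f"{section}.NAME={json.dumps(heading, ensure_ascii=False)}")
        out.extend(flatten(*subs[heading], prefix=section))
    return out


sample = (["report from alpha group"],
          {"received": (["2001:02:25:21:52:07:258"], {}),
           "report": (["We have met the enemy, and they have won.",
                       "Regrouping to await further orders."],
                      {"casualties": (["heavy"], {}),
                       "kia": (["Sgt. Ben Kilroy", "Pvt. John Brown", "Gen. May Ham"], {}),
                       "mia": ([], {}),
                       "situation": (["fubar"], {})})})
print("\n".join(flatten(*sample)))
```
</code></pre>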
  </blockquote>
  <h3>2.2 Text Files:</h3>
  <blockquote>
    <p>Documents can easily be stored as text files, with filenames corresponding
      to the text representation of the contained document's digest. The format
      is as follows:</p>
    <pre><code>FILENAME=TEXTHASHCODE.voxp
  body text
  heading1:TEXTHASHCODE-1
  heading2:TEXTHASHCODE-2
  headingN:TEXTHASHCODE-N</code></pre>
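    <p>The following Python sketch writes each node of a fully expanded tree to
      its own DIGEST.voxp file holding the collapsed text (body lines, then one
      heading:digest line per subsection), and checks on reading that the contents
      still hash to the file name. The .voxp suffix follows the format above; the
      function names, UTF-8 encoding and upper-case SHA-1 hex are assumptions.</p>
<pre><code>
```python
import hashlib
import os
import tempfile


def sha1_hex(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def save_tree(body, subs, directory):
    """Write every node of a fully expanded tree as DIGEST.voxp and return
    the root digest. Each file holds the node's collapsed text."""
    lines = list(body)
    for heading in sorted(subs):
        lines.append(f"{heading}:{save_tree(*subs[heading], directory)}")
    text = "\n".join(lines)
    digest = sha1_hex(text)
    # newline="" keeps the file bytes identical to the text that was hashed.
    with open(os.path.join(directory, digest + ".voxp"), "w",
              encoding="utf-8", newline="") as f:
        f.write(text)
    return digest


def load_verified(digest, directory):
    """Read DIGEST.voxp and check that its contents still hash to its name."""
    with open(os.path.join(directory, digest + ".voxp"),
              encoding="utf-8", newline="") as f:
        text = f.read()
    if sha1_hex(text) != digest:
        raise ValueError(f"{digest}.voxp does not match its name")
    return text


with tempfile.TemporaryDirectory() as folder:
    root = save_tree(["report from alpha group"],
                     {"received": (["2001:02:25:21:52:07:258"], {})}, folder)
    print(sorted(os.listdir(folder)))
    print(load_verified(root, folder))
```
</code></pre>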
  </blockquote>
  <h3>2.3 Database:</h3>
  <blockquote>
    <p>Documents can be stored in two simple database tables defined as
follows:</p>
    <pre><code>CREATE TABLE documents (
  digest   CHAR(40) PRIMARY KEY,  -- 40 for hex SHA-1
  bodytext TEXT
);

CREATE TABLE sections (
  doc      CHAR(40),  -- digest of parent document
  heading  CHAR(80),  -- 80: arbitrary max length for heading name
  sub      CHAR(40),  -- digest of sub document
  PRIMARY KEY (doc, heading)
);</code></pre>
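    <p>Here is one way the two tables might be filled and read back, sketched with
      Python's built-in sqlite3 module. The schema is an SQLite-compatible
      rendering of the tables above; <code>store</code> and <code>rebuild</code>
      are hypothetical names, the tree is assumed fully expanded, and
      <code>rebuild</code> relies on ORDER BY heading, which under SQLite's
      default BINARY collation (byte order, matching character-value order for
      UTF-8 text) returns headings in the required order.</p>
<pre><code>
```python
import hashlib
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    digest CHAR(40) PRIMARY KEY,  -- hex SHA-1 of the node's collapsed text
    bodytext TEXT);
CREATE TABLE IF NOT EXISTS sections (
    doc CHAR(40),      -- digest of parent document
    heading CHAR(80),  -- arbitrary max length for heading name
    sub CHAR(40),      -- digest of sub document
    PRIMARY KEY (doc, heading));
"""


def store(conn, body, subs):
    """Insert a fully expanded (body, subs) tree and return its digest."""
    children = [(h, store(conn, *subs[h])) for h in sorted(subs)]
    text = "\n".join(list(body) + [f"{h}:{d}" for h, d in children])
    digest = hashlib.sha1(text.encode("utf-8")).hexdigest().upper()
    # INSERT OR IGNORE: a subtree shared with another document is stored once.
    conn.execute("INSERT OR IGNORE INTO documents VALUES (?, ?)",
                 (digest, "\n".join(body)))
    conn.executemany("INSERT OR IGNORE INTO sections VALUES (?, ?, ?)",
                     [(digest, h, d) for h, d in children])
    return digest


def rebuild(conn, digest, depth=0):
    """Recreate the tab-indented lines of a stored document."""
    (body_text,) = conn.execute(
        "SELECT bodytext FROM documents WHERE digest = ?", (digest,)).fetchone()
    indent = "\t" * depth
    lines = [indent + line for line in body_text.split("\n")] if body_text else []
    rows = conn.execute(
        "SELECT heading, sub FROM sections WHERE doc = ? ORDER BY heading",
        (digest,)).fetchall()
    for heading, sub in rows:
        lines.append(f"{indent}{heading}:")
        lines.extend(rebuild(conn, sub, depth + 1))
    return lines


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
sample = (["report from alpha group"],
          {"received": (["2001:02:25:21:52:07:258"], {}),
           "report": (["We have met the enemy, and they have won."],
                      {"kia": (["Sgt. Ben Kilroy", "Pvt. John Brown"], {}),
                       "mia": ([], {})})})
root = store(conn, *sample)
print("\n".join(rebuild(conn, root)))
```
</code></pre>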
  </blockquote>
</blockquote>
<h2>3. Parameters</h2>
<blockquote>
  <p>A number of standard parameters can be defined for general purposes. Some
    possibilities are: Author, Title, Date, Version, Type. There is no real
    difference between a parameter and a section containing a subdocument; the
    only obvious difference is that a parameter would be a much smaller piece of
    data, for example a date, a number, or a short string. Note that Author above
    could be a full certificate that describes the author, or just a hash of that
    certificate.</p>
  <p>Since parameters like the number "10" would produce well-known digests
    (anyone can hash a list of likely values and compare), a protocol for adding
    random noise to them should be included if concealing them is necessary. For
    example, "10" and "10 + 'SOMERANDOMNOISE'" could be parsed the same way by an
    application, but the latter would have an unrecognizable digest.</p>
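  <p>To illustrate why the noise matters, the Python sketch below recovers an
    unsalted small number from its digest simply by hashing candidates, then shows
    that the same search fails once noise is appended. The "VALUE + 'NOISE'"
    layout follows the example in the text, but the exact convention (a
    lower-case hex token from the secrets module, stripped by a regular
    expression) is an assumption.</p>
<pre><code>
```python
import hashlib
import re
import secrets

# Assumed layout, modelled on "10 + 'SOMERANDOMNOISE'": VALUE + ' + ' + quoted hex.
NOISE = re.compile(r" \+ '[0-9a-f]+'$")


def digest(text):
    return hashlib.sha1(text.encode("utf-8")).hexdigest().upper()


def add_noise(value, nbytes=16):
    """Append a random hex token so the digest cannot be guessed."""
    return f"{value} + '{secrets.token_hex(nbytes)}'"


def strip_noise(text):
    """What a reading application would do: drop the noise suffix, if any."""
    return NOISE.sub("", text)


def guess(target, candidates):
    """Dictionary attack: hash likely values until one matches `target`."""
    for candidate in candidates:
        if digest(str(candidate)) == target:
            return str(candidate)
    return None


plain, noisy = "10", add_noise("10")
print(guess(digest(plain), range(1000)))  # found by brute force
print(guess(digest(noisy), range(1000)))  # None: this search cannot find it
print(strip_noise(noisy))                 # the application still reads 10
```
</code></pre>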
  <p>Using a standard set of parameters such as Author, Title and Version when
    storing documents could provide an additional mechanism for uniquely
    identifying a document, other than its hash code alone. If the document
    storage and transfer system defines standard parameters for such purposes, it
    might be a good idea to prefix them with a designation and version number,
    such as "SPH0.1-Title" or similar.</p>
</blockquote>
<h2>4. Types:</h2>
<blockquote>
  <p>Once a standard parameter called "Type" is defined, it makes
sense
    to define a few types. A few possibly useful types are (somewhat)
defined
    here.</p>
  <h3>4.1 Signatures:</h3>
  <blockquote>
    <p>Signature Documents make use of public key encryption techniques to
show
      that a party has digitally signed a given document. The body text
represents
      a digital signature of an SPH document digest. The defined headings
following
      this signature are Certificate and Document. The Certificate heading
contains
      a sub document identifying the signing party (see below). The Document
heading
      contains the document from which the signed SPH digest was
computed.</p>
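    <p>The check a recipient would perform can be sketched in Python. Public-key
      signing is not part of Python's standard library, so <code>verify</code> is
      passed in as a callable (public key text, digest, signature text, returning
      a bool) standing in for whatever signature scheme is actually used; the
      function names and tuple representation are hypothetical. Because the
      signed value is the SPH digest, the Document heading may be disclosed fully,
      partly, or only as a digest, and the same signature still checks out.</p>
<pre><code>
```python
import hashlib


def sph_digest(body, subs):
    """Recursive SPH digest (UTF-8, newline-joined, upper-case SHA-1 hex)."""
    lines = list(body)
    for heading in sorted(subs):
        sub = subs[heading]
        lines.append(f"{heading}:{sub if isinstance(sub, str) else sph_digest(*sub)}")
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest().upper()


def make_signature_doc(signature_text, certificate, document):
    """Signature document: body is the signature, headings Certificate and Document."""
    return ([signature_text], {"Certificate": certificate, "Document": document})


def check_signature(sig_doc, verify):
    """Check a Signature document.

    `verify(public_key_text, digest_hex, signature_text) -> bool` stands in
    for a real public-key library; SPH only decides which digest is signed.
    """
    body, subs = sig_doc
    certificate = subs["Certificate"]
    if isinstance(certificate, str):
        raise ValueError("the Certificate must be expanded to read the public key")
    public_key = "\n".join(certificate[0])  # certificate body = public key text
    document = subs["Document"]
    # The Document may be fully, partly, or not at all expanded.
    digest = document if isinstance(document, str) else sph_digest(*document)
    return verify(public_key, digest, "\n".join(body))
```
</code></pre>
    <p>A real deployment would replace <code>verify</code> with a call into a
      signature library; the sketch only shows where the SPH digest enters that
      call.</p>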
  </blockquote>
  <h3>4.2 Certificates:</h3>
  <blockquote>
    <p>Certificate Documents associate real world obligations and identities with
      specific public keys. The body text of the certificate contains text
      representing a public key. The subsections of the certificate are called
      Assertions. These include such things as Name, Address, Phone Number, and
      Email, as well as information on how the certified party plays certain
      defined roles in Vehicle execution or other defined document types: rates
      charged for certain services, whom they are willing to trade with, or a
      definition of their currency and the real world goods or services for which
      that currency can be redeemed. A currency can therefore be expressed simply
      as the digest of the certificate that defines it, or perhaps more
      appropriately as the digest of a Signature document self-signing the
      certificate that defines it.</p>
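    <p>As a sketch of the last point, the Python below builds a certificate
      (public key text as body, one subsection per Assertion) and uses its SPH
      digest as a currency identifier. The assertion names and key text are
      made-up placeholders. Since the digest is computed the SPH way, an assertion
      such as Email can be withheld as a digest without changing the currency's
      identifier.</p>
<pre><code>
```python
import hashlib


def sph_digest(body, subs):
    """Recursive SPH digest (UTF-8, newline-joined, upper-case SHA-1 hex)."""
    lines = list(body)
    for heading in sorted(subs):
        sub = subs[heading]
        lines.append(f"{heading}:{sub if isinstance(sub, str) else sph_digest(*sub)}")
    return hashlib.sha1("\n".join(lines).encode("utf-8")).hexdigest().upper()


def make_certificate(public_key_text, assertions):
    """Certificate: body = public key text; one subsection per Assertion."""
    return ([public_key_text],
            {name: (list(values), {}) for name, values in assertions.items()})


def currency_id(certificate):
    """Name a currency by the SPH digest of the certificate defining it."""
    return sph_digest(*certificate)


cert = make_certificate("PUBLIC-KEY-TEXT", {
    "Name": ["Alpha Group"],
    "Email": ["alpha@example.com"],
    "Redeemable For": ["1 hour of consulting"],
})
# Withhold the Email assertion by replacing it with its digest.
redacted = (cert[0], dict(cert[1], Email=sph_digest(*cert[1]["Email"])))
print(currency_id(cert) == currency_id(redacted))  # True
```
</code></pre>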
  </blockquote>
  <h3>4.3 Vehicles:</h3>
  <blockquote>
    <p>Vehicles contain information concerning the exchange of obligations
      between parties. A vehicle is an executable document that specifies
      exchanges of a user-defined currency. A vehicle may also contain an
      agreement binding on some of the signing parties.</p>
  </blockquote>
</blockquote>
<h1>etc... (work in progress)</h1>
</body>
</html>