It sounds like Bugajsky creates a generative grammar and then stores a list of productions that specifies a walk on the tree to extract the data. This is a form of Kolmogorov complexity compression, which has been expanded upon most notably by Chaitin. In the general case, the program could be for a Turing-complete machine: e.g., if I want to compress 3.14159265..., the compression algorithm could recognize the sequence and give essentially infinite compression if you want an infinite number of digits. (That's right, my patented algorithm can compress your data, as found inside pi, to 0.0000000000000% of its original size! Oops, pi can't be patented; well, then I'll have to use the mumble secret sequence, which can be patented! ;^)

I wonder whether Bugajsky includes the size of his grammar in the compressed size... if he doesn't, then his 0.5% lossless compression figure is overstated.

Barnsley fractal compression achieves very high compression ratios like this, but it can take days to compress one picture (of course, faster compression algorithms exist that don't compress as much). JPEG can get very high compression ratios if loss of exact data is acceptable (which it is for pictures). Yet another lossy example is Sony's MiniDisc, which biases the loss of data toward components that are difficult for most humans to perceive.

Paul E. Baclace
peb@procase.com

Bib:
Chaitin. "Algorithmic Information Theory", Cambridge University Press, 1987.
Kolmogorov. "Three Approaches to the Quantitative Definition of Information", Problems of Information Transmission 1, 1-7, 1965.
Kolmogorov. "On the Logical Foundations of Information Theory", Problems of Information Transmission 5, 3-7, 1969.
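P.S. A toy sketch (not Bugajsky's actual method, and the names below are made up) of the grammar-size accounting point: a "compressor" that recognizes prefixes of pi can report an absurdly small output, but an honest compressed size must also count the decompressor's table or generating program.

```python
# Toy illustration of Kolmogorov-style compression accounting.
# The "secret sequence" here is just a hardcoded prefix of pi;
# a real scheme would generate digits with a program.
PI_DIGITS = "3.14159265358979323846264338327950288"

def compress(data):
    # If the data is a prefix of our table, emit only its length.
    if PI_DIGITS.startswith(data):
        return ("PI", len(data))   # a few bytes, no matter how long data is
    return ("RAW", data)           # fallback: store verbatim

def decompress(token):
    kind, payload = token
    return PI_DIGITS[:payload] if kind == "PI" else payload

msg = PI_DIGITS[:20]
token = compress(msg)
assert decompress(token) == msg

# Naive "compressed size": just the emitted token.
naive_size = len(str(token[1]))
# Honest size: the token PLUS the table the decompressor depends on.
honest_size = naive_size + len(PI_DIGITS)
print(naive_size, honest_size)  # the table dominates for short inputs
```

The same accounting applies to a grammar-based scheme: the production list may be tiny, but the grammar itself is part of the message unless sender and receiver already share it.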