Article 41931 of comp.sys.cbm:
Xref: undergrad.math.uwaterloo.ca comp.sys.cbm:41931
Newsgroups: comp.sys.cbm
Path: undergrad.math.uwaterloo.ca!csbruce
From: csbruce@ccnga.uwaterloo.ca (Craig Bruce)
Subject: Re: VBM
Message-ID: <DEHp14.8Ip@undergrad.math.uwaterloo.ca>
Sender: news@undergrad.math.uwaterloo.ca (system PRIVILEGED account)
Nntp-Posting-Host: ccnga.uwaterloo.ca
Organization: University of Waterloo, Canada (eh!)
Date: Wed, 6 Sep 1995 15:15:03 GMT

Alan Jones <alan.jones@qcs.org> writes:

>From: 057184449-0001@btxgate.de (Arndt Dettke)
>Hi Alan, I finished both a VBM-loader and saver for GoDot (640x400 saver)
>and will post it anywhere in the internet when I know how to do it.
>
>I just received this message from Arndt Dettke.  We now have a way to
>create modest size VBM images using a C64.  I don't have these
>routines, but I do have the GoDot demo package and it seems to work
>well.  The docs have not been translated into English yet.  Arndt is
>still struggling with very limited Internet access.

Programming for the VBM format isn't really very difficult.  The basic idea
is to have a header followed by data laid out in essentially the way the
C128's VDC chip stores bitmaps (the format is officially named for "VDC
BitMap").  Unfortunately, there are three different versions of the format:
version #2, version #3 uncompressed, and version #3 compressed.  The version
#2 format exists because I didn't get the format right the first time.  You
can tell which format the VBM file is in by reading the header.  The header
of all the formats is as follows (you can extract all of this information
from the "pbmtovbm" version 1.99 conversion program):

POS   SIZ   DESC
---   ---   -----
  0     1   the character 'b': $42
  1     1   the character 'm': $4d
  2     1   the binary value $cb
  3     1   the VBM format version number: $02 or $03
  4     2   the width (X) of the image in Hi/Lo format
  6     2   the height (Y) of the image in Hi/Lo format

If the image is in version #2 format, then this is it for the header.
Version #3 images have the following additional header information:

POS   SIZ   DESC
---   ---   -----
  8     1   data-encoding type: $00=uncompressed, $01=RLE-compressed
  9     1   byte code for general RLE repetitions
 10     1   byte code for repeated zeroes
 11     1   byte code for repeated $ff values
 12     1   byte code for two repeated zeroes
 13     1   byte code for two repeated $ff values
 14     2   reserved := 0
 16     2   length of comment text (0 == no comment text)  (Hi/Lo format)
 18     n   characters of comment text in PETSCII

If the data-encoding type is "uncompressed", then the "repeated" fields are
ignored; otherwise, they contain the binary byte code that is to be used to
trigger an RLE expansion when uncompressing.  For version #3, I allowed the
data to be either compressed or uncompressed because uncompressed data can
be processed and displayed faster, while compressed data is shorter.  Then
again, depending on the storage device and the image involved, the
compressed format may turn out to be faster since fewer bytes have to be
read from a slow I/O device.
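
As a rough illustration of the two header tables, here is a minimal Python
sketch of a header reader.  The function name and the dictionary it returns
are made up for this example and are not taken from "pbmtovbm".  It also
assumes that byte 0 of the signature is $42, the code for 'b' that pairs
with $4d for 'm'.

```python
import struct

# Signature bytes: 'b', 'm', $cb.  Byte 0 = $42 is an assumption (see above).
MAGIC = b"\x42\x4d\xcb"


def parse_vbm_header(data: bytes) -> dict:
    """Parse the VBM header at the start of `data` (hypothetical helper)."""
    if len(data) < 8 or data[:3] != MAGIC:
        raise ValueError("not a VBM file")
    version = data[3]
    if version not in (2, 3):
        raise ValueError("unknown VBM version %d" % version)
    # Width and height are 16-bit values stored Hi/Lo (big-endian).
    width, height = struct.unpack(">HH", data[4:8])
    info = {
        "version": version,
        "width": width,
        "height": height,
        "compressed": False,
        "codes": None,        # the five RLE code bytes, version 3 only
        "comment": b"",       # raw PETSCII bytes, not decoded here
        "data_offset": 8,     # where the image data starts
    }
    if version == 3:
        if len(data) < 18:
            raise ValueError("truncated version 3 header")
        if data[8] not in (0, 1):
            raise ValueError("unknown data-encoding type")
        info["compressed"] = data[8] == 1
        # Positions 9..13: general, rep $00, rep $ff, two $00, two $ff.
        info["codes"] = data[9:14]
        (comment_len,) = struct.unpack(">H", data[16:18])
        info["comment"] = data[18:18 + comment_len]
        info["data_offset"] = 18 + comment_len
    return info
```

The image data then starts at info["data_offset"] in the file.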

For the uncompressed formats, the raw data is stored row by row from top to
bottom, with each row stored left to right, eight pixels per byte.  The
most-significant bit of the byte will contain the left-most pixel, and the
least-significant bit, the right-most pixel.  Where an image has a width in
pixels that is not evenly divisible by eight, the image is padded with black
pixels to make it so.  Thus, each row of the image occupies an integral
number of uncompressed bytes.  This is also roughly the format that the VDC
chip uses to store bitmaps.
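
The row layout described above can be sketched in a few lines of Python.
The helper names are hypothetical; the arithmetic follows directly from
"eight pixels per byte, most-significant bit left-most, rows padded to a
whole byte".

```python
def row_bytes(width: int) -> int:
    """Bytes per row: the width rounded up to a multiple of eight pixels."""
    return (width + 7) // 8


def get_bit(bitmap: bytes, width: int, x: int, y: int) -> int:
    """Return the raw bit (0 or 1) stored for pixel (x, y).

    `bitmap` is the uncompressed image data with the header removed.
    """
    byte = bitmap[y * row_bytes(width) + x // 8]
    # Bit 7 holds the left-most pixel of the byte, bit 0 the right-most.
    return (byte >> (7 - (x % 8))) & 1
```

For example, a 640-pixel-wide image uses 80 bytes per row, and a
641-pixel-wide image uses 81.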

The difference between version #2 and uncompressed version #3 images is that
for version #2 images, "1" bits mean black pixels and "0" bits mean white
pixels, after the X-Windows format.  For version #3 images, the bits have
the opposite meanings.  Yes, this is an unfortunate blunder, but easy to
fix; you just scan through the version #2 bytes of display data and EOR them
with $ff before using them on a display with a black background color and a
white foreground color.
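
In Python, the EOR pass over version #2 data might look like this (the
function name is made up for the example):

```python
def invert_bitmap(raw: bytes) -> bytes:
    """EOR every byte of display data with $ff, flipping each pixel."""
    return bytes(b ^ 0xFF for b in raw)
```

Applying it twice gives back the original data, so the same routine
converts in either direction.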

The compressed format uncompresses into exactly the uncompressed format
(sic).  To uncompress compressed data, you read one byte of the data at a
time and compare it against all five of the "repeated" byte values given in
the header.  If the byte doesn't match any of them, then you display the
byte as-is; it's a literal.  Otherwise, if the byte is the code for the
"general repetition", then you read the next byte value which is the literal
value that will be repeated and then read the next byte after that which
gives the number of repetitions to make.  This is sufficient to give you
Run-Length Encoding compression, but I found that I got better performance
if I included additional, shorter RLE sequences for repeated $00 and $ff
strings of both general length and of specific length 2.  You always use the
one that gives you the shortest data.  The sequences are summarized as
follows:

CODE          SEQUENCE
----          ---------
general rep   <code> <literal-byte value> <repetition count>
rep $00       <code> <repetition count>
two $00       <code>
rep $ff       <code> <repetition count>
two $ff       <code>

If you run into a string of only one $00 or $ff byte, then you encode it
simply as a literal.  If you have a literal that you want to encode but it
equals one of the RLE repetition codes, then you must encode it using the
general repetition sequence.  Since this one literal must be encoded using
three bytes, it is possible that you could end up with data that is longer
when compressed than when it is uncompressed, although this is extremely
unlikely.
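
A simple greedy encoder that follows these rules could be sketched as
below.  It is an illustration under stated assumptions rather than the
"pbmtovbm" algorithm: the function name is hypothetical, the codes are
passed in header order, and runs are capped at 255 so that the count fits
in one byte.

```python
def rle_encode(raw: bytes, codes: bytes) -> bytes:
    """Compress image data with the five VBM RLE sequences (a sketch).

    `codes` holds the five code bytes in header order:
    general, rep $00, rep $ff, two $00, two $ff.
    """
    general, rep_zero, rep_ff, two_zero, two_ff = codes
    out = bytearray()
    i = 0
    while i < len(raw):
        value = raw[i]
        # Measure the run of equal bytes starting here, at most 255 long.
        run = 1
        while run < 255 and i + run < len(raw) and raw[i + run] == value:
            run += 1
        if value in (0x00, 0xFF) and run >= 2:
            if run == 2:
                out.append(two_zero if value == 0x00 else two_ff)
            else:
                out += bytes([rep_zero if value == 0x00 else rep_ff, run])
        elif run >= 4 or value in codes:
            # Long runs pay off; bytes equal to a code must be escaped.
            out += bytes([general, value, run])
        else:
            out += bytes([value]) * run  # short run: plain literals
        i += run
    return bytes(out)
```

With the general code set to $31, a lone $31 byte in the image comes out as
the three bytes $31 $31 $01, which is the expansion case mentioned above.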

You can examine the "pbmtovbm" program for algorithms for compressing and
uncompressing the data according to this scheme.  You can statistically
determine the best code values to use for these five RLE code sequences
for each file you compress, but I have found that the following values work
quite well for the sample data I have tried (mostly dithered images): $31,
$8c, $39, $cc, and $9c, respectively.
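
One plausible way to pick the codes statistically, sketched in Python, is
to take the five byte values that occur least often in the uncompressed
data, since each occurrence of a code value as a literal costs three bytes.
This is an assumption about what "statistically" could mean here, not a
description of what "pbmtovbm" does, and the function name is made up.

```python
from collections import Counter


def pick_codes(raw: bytes) -> bytes:
    """Return five rarely used byte values to serve as RLE codes (a sketch).

    $00 and $ff are skipped (an assumption) so that lone $00 and $ff
    bytes can always be stored as plain literals.
    """
    counts = Counter(raw)
    # Sort by frequency, then by value so that ties break predictably.
    candidates = sorted(range(1, 255), key=lambda v: (counts[v], v))
    return bytes(candidates[:5])
```

The result would be written to header positions 9 to 13 of the compressed
file.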

>BTW Bruce, you said "A pair beats no cards."  I think you meant, "An
>ACE beats no cards." ;)       alan.jones@qcs.org

My name is _CRAIG_.

Keep on Hackin'!

-Craig Bruce
csbruce@ccnga.uwaterloo.ca
"The irony, of course, is that if the entire herd of deer were to turn and
 trample the tiger, the tiger wouldn't stand a chance."