Random Thoughts on the new BOS multitasking operating system
------------------------------------------------------------------------------
[Wednesday, July 30, 1997]

You could make your RTOS maintain a cookie jar for each process.  Into this
cookie jar goes every resource that the process owns.  Memory could be
allocated dynamically with each block being pointed to by a resource in the
cookie jar, and each allocation having:

cookie ptr
size

This is inefficient for small allocations.
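
To put a number on "inefficient", here is a rough sketch in C (not 6502
code).  It assumes a 2-byte cookie pointer and a 2-byte size field, which
the note above does not actually specify, and the names AllocHeader and
overhead_percent are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-allocation header holding the two fields listed above.
   The 2-byte widths are an assumption for a 16-bit address space. */
typedef struct {
    uint16_t cookie_ptr;  /* which cookie-jar entry owns this block */
    uint16_t size;        /* payload size in bytes */
} AllocHeader;

static_assert(sizeof(AllocHeader) == 4, "header assumed to be 4 bytes");

/* Percentage of the block's total footprint spent on the header,
   rounded down. */
static unsigned overhead_percent(unsigned payload_bytes)
{
    unsigned header = (unsigned)sizeof(AllocHeader);
    return (100u * header) / (header + payload_bytes);
}
```

Under these assumptions a 4-byte allocation spends half of its footprint
on bookkeeping, while a 252-byte one spends about 1%.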
------------------------------------------------------------------------------
[Sunday, October 5, 1997]

Cookie-based operating system.

Cookie: 8-bit index value into kernel structure.

Error return (unless otherwise noted): .CC=no error, .CS=error + .A=errorCode

KERNEL SERVER: process, thread, semaphore, and cookie control.

ProcessCreate( args? ) : .X=processCookie
ProcessDestroy( .X=processCookie )
ProcessWait( .X=processCookie ) : .A=exitCode

ThreadCreate( [zp]=addr, .X=processCookie, args? ) : .X=threadCookie
ThreadDestroy( .X=threadCookie )

SemCreate( ) : .X=semCookie
SemDestroy( .X=semCookie )
SemWait( .X=semCookie )
SemSignal( .X=semCookie )
SemSignalBarrier( .X=semCookie, .AY=threadCount )

ServerOpen( .AY=nameString, .X=mode ) : .X=serverCookie
ServerClose( .X=serverCookie )
ServerRegister( .AY=nameStr )
ServerRemove( .X=serverCookie )
FunctionLookup( .AY=funcNameStr, .X=serverCookie ) : .AYX=funcAddr

CookieInherit( .X=cookieOfServer, .Y=serverCookie )
CookieRegister( .X=cookie, [zp]=cookieLiteral )
CookieDestroy( .X=cookie )
CookieLookup( .X=cookie, .Y=zpBufIndex )

--maintains a "process map" of all processes in system, up to 256
----process id, process addr, process name
----a process can be a 
--server maps to namespace as "/kernel"

MEMORY SERVER: allocates memory.

MemFetch    ( [zp]=farPtr, [zw]=nearPtr, .AY=len );
MemStash    ( [zp]=farPtr, [zw]=nearPtr, .AY=len );
MemAlloc    ( .AY=bytes ) : [zp]=ptr
MemFree     ( [zp]=farPtr );
MemPageAlloc( .A=flags, .X=pages ) : [zp]=ptr
MemPageFree ( [zp]=farPtr, .X=pages )

--memory is allocated in a byte-wise fashion.
--system memory is cut into two types: ramdisk and dynamic, and they don't meet
--pointer is 4 bytes: addr_low, addr_high, bank, type
--types: $00=internal, $01=reu, $02=ramlink
--memory is managed at three levels: type, bank, and byte.
--each type of memory has a bank-allocation map with up to 256 entries.
--this points to the first free memory block on the bank
----or null if there are no free blocks
----or a special value if whole bank is allocated to a process
--a free block is: next ptr (2), size (2)
--an allocated block is: proc next ptr (2), proc prev ptr (2), size (2)
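
A minimal C sketch of how the per-bank free list described above might be
walked, simulating one 64K bank as a byte array.  Several things here are
my assumptions and not part of the notes: first-fit search, $FFFF as the
"null" offset, the size field counting the whole block including its
header, and all of the names.  The per-process next/prev links are left
unset and freeing is not shown.

```c
#include <assert.h>
#include <stdint.h>

#define BANK_SIZE 65536u
#define NIL       0xFFFFu  /* assumed "null" offset; the notes pick no value */
#define FREE_HDR  4u       /* free block: next ptr (2), size (2) */
#define ALLOC_HDR 6u       /* allocated: proc next (2), proc prev (2), size (2) */

static uint8_t  bank[BANK_SIZE];
static uint16_t first_free = NIL;  /* stands in for the bank-allocation map entry */

static uint16_t rd16(uint16_t off)
{
    return (uint16_t)(bank[off] | (bank[off + 1] << 8));
}

static void wr16(uint16_t off, uint16_t v)
{
    bank[off]     = (uint8_t)(v & 0xFF);
    bank[off + 1] = (uint8_t)(v >> 8);
}

/* Turn [start, start+size) into a single free block. */
static void bank_init(uint16_t start, uint16_t size)
{
    assert(size >= FREE_HDR && (uint32_t)start + size <= BANK_SIZE);
    wr16(start, NIL);                    /* next ptr */
    wr16((uint16_t)(start + 2), size);   /* size, header included */
    first_free = start;
}

/* First-fit allocation.  Returns the payload offset, or NIL on failure. */
static uint16_t bank_alloc(uint16_t nbytes)
{
    uint32_t need = (uint32_t)nbytes + ALLOC_HDR;
    uint16_t prev = NIL, cur = first_free;

    while (cur != NIL) {
        uint16_t next = rd16(cur);
        uint16_t size = rd16((uint16_t)(cur + 2));
        if (size >= need) {
            uint16_t rest = (uint16_t)(size - need);
            uint16_t link = next;
            if (rest >= FREE_HDR) {      /* split: the tail stays free */
                link = (uint16_t)(cur + need);
                wr16(link, next);
                wr16((uint16_t)(link + 2), rest);
            } else {
                need = size;             /* too small to split: take it all */
            }
            if (prev == NIL) first_free = link; else wr16(prev, link);
            wr16(cur, NIL);                              /* proc next (unused here) */
            wr16((uint16_t)(cur + 2), NIL);              /* proc prev (unused here) */
            wr16((uint16_t)(cur + 4), (uint16_t)need);   /* size */
            return (uint16_t)(cur + ALLOC_HDR);
        }
        prev = cur;
        cur = next;
    }
    return NIL;
}
```

Note that a 2-byte size field cannot describe a full 65536-byte block,
which may be one reason for the special "whole bank" value in the map.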

LOADER: loads executable programs / libraries / drivers.

NAME SERVER: records system-object names and mount points.

COMMODORE FILE SYSTEM SERVER: accesses commodore file systems.

COMMODORE DISK-BLOCK SERVERS: reads and writes raw blocks to commodore drives.

RAMDISK DISK-BLOCK SERVER: reads/writes raw blocks.

COMMODORE KERNAL SERVER: interfaces kernal calls.

NETWORK SERVER: communication between hosts.

MODEM SERVER: serial communication.

CONSOLE SERVER: primitive console calls.

WINDOW SERVER: higher-level console calls.
------------------------------------------------------------------------------
[Friday, November 7, 1997]

Kernel cookie structure:

16 bits: server id ($00==kernel), others as loaded in nameserver
16 bits: extra
32 bits: object id
-------
64 bits  total
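
For illustration only, the 64-bit layout above written as a C struct, with
a helper that serializes it to 8 bytes.  The little-endian byte order and
the field order in memory are my assumptions (they match how the 6502
stores words, but the notes do not say), and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint16_t server_id;  /* $0000 == kernel, others as loaded in nameserver */
    uint16_t extra;
    uint32_t object_id;
} KernelCookie;

static_assert(sizeof(KernelCookie) == 8, "cookie should be 64 bits");

static int cookie_is_kernel(const KernelCookie *c)
{
    return c->server_id == 0;
}

/* Write the cookie as 8 bytes, low byte first within each field. */
static void cookie_pack(const KernelCookie *c, uint8_t out[8])
{
    out[0] = (uint8_t)(c->server_id & 0xFF);
    out[1] = (uint8_t)(c->server_id >> 8);
    out[2] = (uint8_t)(c->extra & 0xFF);
    out[3] = (uint8_t)(c->extra >> 8);
    out[4] = (uint8_t)(c->object_id & 0xFF);
    out[5] = (uint8_t)((c->object_id >> 8) & 0xFF);
    out[6] = (uint8_t)((c->object_id >> 16) & 0xFF);
    out[7] = (uint8_t)(c->object_id >> 24);
}
```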
------------------------------------------------------------------------------
[Friday, February 13, 1998]

I should really be finishing my thesis right now, but this is much more
interesting to think about.

- Context switching on a 6502 with an MMU is very slow, and therefore, we
  want to do it as little as possible.  I envision that only rarely will
  there be more than one thread needing to execute at a time.

- We want to be able to call system services and device/kernel services with
  as little overhead as possible.

system concepts:

* process: executable code, variables, allocated memory, ownership of system
           resources.  It can also be called a "library" or "server",
           depending on exactly how the program is used.

* thread:  a virtual CPU that executes code.  A thread is "owned" by one
           process, but it can, under special circumstances, call functions
           of other processes and make use of resources of both processes.
           This facility allows system and device-driver calls to be made
           with no context switching.

* semaphore: a locking/mutual-exclusion mechanism.

* cookie: a special certificate, owned by a process and handled internally
          by the kernel, that gives the threads of a process access to
          resources held on servers in the system.  This mechanism controls
          security and process clean-up.

* program: the binary image of a program, plus relocation information,
           stored in a file, in a special format that the loader understands.

System servers:

* Kernel server: process and thread control, semaphores and cookies, plus
                 simple library management.

* Memory server: memory management is sufficiently complicated that it
                 deserves its own server.  Its purpose is to allocate and
                 free pages and/or byte-aligned memory blocks in all of the
                 types of memory supported: regular internal RAM, REU RAM,
                 RAMLink RAM, and SuperCPU RAM.  It will also provide
                 functions for Fetch and Stash to the memory types.

* Loader server: load and relocate programs.  This is a complicated job and
                 doesn't need to be in the kernel proper.

* Name server:   will keep track of loaded processes, the root file system,
                 and various named objects used for interprocess
                 communication.

Device drivers (servers):

* Console: high-level screen/keyboard accessing, including window management.

* Screen: as distinguished from the console server, this gives low-level
          access functions to the screen.  There is a different one for the
          VDC and VIC screens.

* Disk-Block: reads and writes raw blocks to various drive types
              using the fastest methods available.  Devices supported
              will be: 1541, 1571, 1581, CMD FDs, CMD HDs (serial/parallel
              cables), RAMLink, and custom RAM-disk pages.  The RAMdisk
              image will not disappear when the system exits, but only when
              the power is turned off.

* File system: high-level file-system stuff.  It will use the Commodore-DOS
               file format and read/write disk blocks through the Disk-Block
               server.

Program memory format:
offst  siz  description
-----  ---  ------------
$0000  (3)  JMP initialize (constructor)
$0003  (3)  JMP clean_up   (destructor)
$0006  (2)  flags
$0008  (2)  number of entry functions
$000A  (2)  near address of 8-bit-func far jump table
$000C  (2)  near address of 16-bit-func far jump table
$000E  (2)  reserved
$0010  (x)  code/data/jump tables
$XXXX  (0)  END+1
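
A sketch in C of the 16-byte header above, mainly to check that the
offsets add up.  Byte arrays are used so the compiler adds no padding.
The plausibility check assumes both JMPs are absolute jumps (opcode $4C);
the table does not say which JMP form is used, so treat that as a guess.
All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t jmp_init[3];     /* $0000  JMP initialize (constructor) */
    uint8_t jmp_cleanup[3];  /* $0003  JMP clean_up   (destructor)  */
    uint8_t flags[2];        /* $0006 */
    uint8_t n_entry[2];      /* $0008  number of entry functions */
    uint8_t tab8[2];         /* $000A  near addr of 8-bit-func far jump table */
    uint8_t tab16[2];        /* $000C  near addr of 16-bit-func far jump table */
    uint8_t reserved[2];     /* $000E */
} ProgramHeader;

static_assert(offsetof(ProgramHeader, jmp_cleanup) == 0x03, "offset");
static_assert(offsetof(ProgramHeader, flags)       == 0x06, "offset");
static_assert(offsetof(ProgramHeader, n_entry)     == 0x08, "offset");
static_assert(offsetof(ProgramHeader, tab8)        == 0x0A, "offset");
static_assert(offsetof(ProgramHeader, tab16)       == 0x0C, "offset");
static_assert(offsetof(ProgramHeader, reserved)    == 0x0E, "offset");
static_assert(sizeof(ProgramHeader) == 0x10, "code/data starts at $0010");

/* Read a little-endian 16-bit field. */
static uint16_t le16(const uint8_t b[2])
{
    return (uint16_t)(b[0] | (b[1] << 8));
}

/* Assumed sanity check: both vectors start with JMP absolute ($4C). */
static int header_plausible(const ProgramHeader *h)
{
    return h->jmp_init[0] == 0x4C && h->jmp_cleanup[0] == 0x4C;
}
```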

------------------------------------------------------------------------------
[Wednesday, July 1, 1998]

modules for BOS-128:

kernel  - kernel.s   - kernel
mem     - memory.s   - memory
kbd     - keyboard.s - keyboard/mouse/joystick driver
vdc     - vdc.s      - screen driver
con     - console.s  - console driver
swift   - swift.s    - swiftlink driver
name    - name.s     - name server
block   - block.s    - block drivers/cache controller
file    - filesys.s  - file system
par     - par.s      - parallel-port printer driver
usr232  - usr232.s   - user-port serial

init    - init.s     - process spawner
sh      - sh.s       - command shell

hello   - hello.s    - hello world program
cp      - cp.s       - file copy
rm      - rm.s       - remove
wc      - wc.s       - word count
tr      - tr.s       - translate
crc32   - crc32.s    - crc computer

uue     - uue.s      - uucode encoder
uud     - uud.s      - uucode decoder
pbm     - pbm.s      - pbm viewer
fx      - fx.s       - file exchange
as      - as.s       - assembler
more    - more.s     - more
z       - z.s        - zed text editor

-----=-----

SemCreate( void ) : .AY=semid
SemDestroy( .AY=semid )
SemSignal( .AY=semid )
SemWait( .AY=semid )

ThreadCreate( .AYX=addr, stacksize ) : .AY=threadid
ThreadDestroy( .AY=threadid )
ThreadSuspend( .AY=threadid )
ThreadResume( .AY=threadid )
--------------------------------------------------------------------------------
[Monday, January 3, 2000]

Time to get serious about producing this software.  I think I should start
using 16-bit register .AX instead of .AY.  This will allow things like:

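   ldy #10        ; example offset of a 16-bit field
   sta (ptr),y    ; store the low byte from .A
   iny
   txa            ; high byte is in .X, so .Y stays free as the index
   sta (ptr),y    ; store the high byte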
--------------------------------------------------------------------------------
[Monday, August 7, 2000]

Okay, really time to get serious about things.  BOS should be a
semaphore-based microkernel and it should offer the following services:

- process & thread (task?) management
- low-level memory management
- some real-time stuff
- some miscellaneous stuff for the services model

Should I have pure processes & threads or should I have a single thing
that is a combination of the two?  On a non-protected memory system,
there may not be all that much difference.  The Amiga had a similar
hardware architecture and it had "Tasks".  I do need another concept,
however, that of a loaded program, or "module", or "service".  Tasks and
loaded programs need to be tied together into a services model, and I need
a means of doing that.  How do I identify services?  What is my memory
model?

Well, the memory model should be an easy decision... I'll just use the
one from ACE.  I hit the nail on the head there.  Basically, all memory
will be accessed through the kernel using 32-bit pointers.  The structure
of a pointer is:

   ptr+0  low byte of address
   ptr+1  high byte of address
   ptr+2  bank of memory
   ptr+3  type of memory

There will be a few different types of memory: $00=NULL, $01=Internal,
$02=REU, $03=RamLink, and any more types that surface.
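
The pointer layout and type codes above, written out as a small C sketch.
The struct mirrors ptr+0..ptr+3 byte for byte; the helper names are made
up for illustration.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    MEM_NULL     = 0x00,
    MEM_INTERNAL = 0x01,
    MEM_REU      = 0x02,
    MEM_RAMLINK  = 0x03
} MemType;

typedef struct {
    uint8_t addr_lo;  /* ptr+0  low byte of address  */
    uint8_t addr_hi;  /* ptr+1  high byte of address */
    uint8_t bank;     /* ptr+2  bank of memory       */
    uint8_t type;     /* ptr+3  type of memory       */
} FarPtr;

static_assert(sizeof(FarPtr) == 4, "far pointer should be 32 bits");

static FarPtr farptr_make(MemType type, uint8_t bank, uint16_t addr)
{
    FarPtr p;
    p.addr_lo = (uint8_t)(addr & 0xFF);
    p.addr_hi = (uint8_t)(addr >> 8);
    p.bank    = bank;
    p.type    = (uint8_t)type;
    return p;
}

static uint16_t farptr_addr(FarPtr p)
{
    return (uint16_t)(p.addr_lo | (p.addr_hi << 8));
}

/* Only the type byte decides nullness in this sketch. */
static int farptr_is_null(FarPtr p)
{
    return p.type == MEM_NULL;
}
```

One side effect of making $00 the NULL type is that four zero bytes are
automatically a null pointer.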

So, where do I do memory allocation and deallocation?  I think it should be
split into two layers, like it is in ACE.  The lower layer handles page-
oriented allocations for a process and the higher layer handles byte-
oriented allocations.  In ACE, this higher-level layer is implemented
in the standard library of an application, although I think it should
be implemented as a system service in BOS to avoid redundancy.

As part of my services model, I am going to need some means of invoking
a service.  A service is an OS manager of some sort, a device driver, a
user program, or whatever; it is code that can be executed.  Basically,
with a task/semaphore system, all services are "self-serve" in that
the thread that wants the action performed actually enters the external
program and executes the code itself.  I will need a kernel call to invoke
an external service.  I will need a pointer to the code to be executed,
some sort of an object identifier for the service to make use of, and maybe
permissions or something for what operations are permitted, or maybe this
can be contained inside of the object.  This "object" would be kind of
like a capability or cookie and its definition and usage would be up to
the service.

I also need to have the system be involved in managing objects so that if a
process is killed, all service objects associated with it are cleaned up.
I think that the process can do this itself in a posthumous execution
state.  Actually, this cleanup could be done with a mechanism whereby
the service makes a process-manager call to register a destructor for
the process for its posthumous cleanup.
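
A rough C sketch of the destructor-registration idea.  Everything here is
hypothetical: the fixed-size table, running destructors in reverse order
of registration, and the names.  The example destructor only stands in
for whatever a real service would do to release its object.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DESTRUCTORS 8

typedef void (*Destructor)(void *object);

typedef struct {
    Destructor fn[MAX_DESTRUCTORS];
    void      *object[MAX_DESTRUCTORS];
    size_t     count;
    int        alive;
} Process;

/* Called by a service when it creates an object on behalf of process p.
   Returns 0 on success, -1 if p is already dead or the table is full. */
static int process_register_destructor(Process *p, Destructor fn, void *object)
{
    assert(fn != NULL);
    if (!p->alive || p->count == MAX_DESTRUCTORS)
        return -1;
    p->fn[p->count]     = fn;
    p->object[p->count] = object;
    p->count++;
    return 0;
}

/* Posthumous cleanup: run every registered destructor, newest first. */
static void process_kill(Process *p)
{
    p->alive = 0;
    while (p->count > 0) {
        p->count--;
        p->fn[p->count](p->object[p->count]);
    }
}

/* Example service destructor: marks a made-up handle as closed. */
static void close_handle(void *object)
{
    *(int *)object = 0;
}
```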

So, clearly, I have two kinds of objects, loaded programs and tasks,
that I need to tie together in a convenient way.  I guess I should
use the term of "process" to describe a loaded program and the term of
"thread" to describe the active program-execution entities.  Here's where
things get a little funky; threads need to be a kernel-level entity, but
processes do not need to be.  Processes can be managed by a higher-level
OS service.  Also, Threads will be owned by Processes, but external
Threads are allowed to enter and exit processes at any time.  This is
needed as part of the service model.  These external threads can be a
pesky problem when it comes time to destroy a process, so I will probably
need some higher-level controls when it comes to external invocations.
This can probably be tied into the external-function-call-binding mechanism.

So, threads are owned by processes, and when all threads of a process
exit, the process will normally be destroyed itself.  However, device
drivers and other services will need the process to continue to live
without any threads.

One more thing: semaphores.  Should they be locking or counting semaphores
and what term should I use to describe them?  Locking semaphores are
simpler to handle and are sufficient for most purposes, so I think I'll
go with them.  Should I call them "semaphores" or "locks"?  The latter
term will be more meaningful to some people.

How about a stab at the full set of kernel calls:

zp = 16-bit pointer
zw = 16-bit word
mp = 32-bit memory/object pointer
fp = 32-bit (24-bit) function pointer

BosStartup( )
BosShutdown( )

BosThreadCreate( [fp]=addr, .A=zpsize, .X=stksize, (zw)=processId ):.AX=threadId
BosThreadDestroy( .AX=threadId )
BosThreadSleep( .Y=waitFlag($40=abs), .AX=jiffies )

BosLockCreate( void ) : .AX=lockId
BosLockDestroy( .AX=lockId )
BosLockAcquire( .AX=lockId, .Y=waitFlag($80=yes,$40=abs), (zw)=jiffies )
BosLockRelease( .AX=lockId )

BosMemFetch( [mp]=farMemoryPtr, (zp)=nearBuffer, .AX=length )
BosMemStash( [mp]=farMemoryPtr, (zp)=nearBuffer, .AX=length )
BosMemPageAlloc( .AX=nPages ) : [mp]=farMemoryPtr
BosMemPageFree( [mp]=farMemoryPtr, .AX=nPages )

BosLocateNameService( ) : [fp]=nameServiceLookup
BosCallFar( [fp]=functionPtr, etc. ) : etc.
BosCallSystem( (fp)=functionPtr, etc. ) : etc.
BosGetJiffies( ) : .AX=jiffies
BosGetSystemType( ) : .A=sysTypeCode

I think this is about all that I need for the microkernel component.  Mind
you, there are many things not included here, such as process management,
loading & relocating programs, device drivers, and a file system.
These will be provided by higher-level system components.  The services
above could be used to implement a statically-linked embedded system.
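
One possible reading of the waitFlag bits on BosLockAcquire and
BosThreadSleep, sketched in C.  The interpretation is my assumption: $80
means "block if the lock is busy" and $40 means "jiffies is an absolute
time, not a delay".  The 16-bit jiffy counter wraps, so the deadline test
uses modular arithmetic.  The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define WAIT_YES 0x80u  /* assumed: caller is willing to block */
#define WAIT_ABS 0x40u  /* assumed: jiffies is an absolute time */

/* Returns 1 and stores the wake-up time if the caller should block,
   or 0 if the call should only poll and return at once. */
static int wait_deadline(uint8_t flags, uint16_t jiffies, uint16_t now,
                         uint16_t *deadline)
{
    assert(deadline != 0);
    if (!(flags & WAIT_YES))
        return 0;
    *deadline = (flags & WAIT_ABS) ? jiffies : (uint16_t)(now + jiffies);
    return 1;
}

/* Wrap-safe check, valid while deadlines are less than 32768 jiffies
   away from the current time. */
static int deadline_passed(uint16_t deadline, uint16_t now)
{
    return (uint16_t)(now - deadline) < 0x8000u;
}
```

At 60 jiffies per second a 16-bit counter wraps roughly every 18 minutes,
so a relative wait near the wrap point yields a numerically small
deadline, which the modular comparison handles.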

The intention is to have the kernel take care of most of the hardware
dependencies, and for higher-level system services to take care of
the rest.  The intention of BOS is to make things portable to a number
of different hardware platforms, including:

SYSTYPECODE   PLATFORM
-----------   --------
          0   C-64
          1   SuperCPU C-64
          2   C-128
          3   SuperCPU C-128
          4   VIC-20
          5   Plus/4
          6   C-16
          7   PET/CBM 40-col
          8   PET/CBM 80-col

The kernel itself will be fairly small.
--------------------------------------------------------------------------------
[Sunday, August 27, 2000]

Houston, we have a problem.  I want to make this system so that a thread
of one process can casually execute code in another process, but there
is a problem with this.  If a thread from process A is executing code
inside of process B and has various global variables inside of process B
in some confused state and then I kill the thread (or the whole process
A), then the thread will disappear and the state of process B will remain
all mixed up.

There is great advantage in allowing threads of one process to execute
code inside of another.
--------------------------------------------------------------------------------
[Monday, February 12, 2001]

I want to write a micro-kernel-like system that only uses semaphores
and does as little context switching as possible.  Minimizing context
switching is the only way to make this system practical on machines that
came before the C128.  So, I want a system where threads can enter into
the code of remote processes, but this needs to be done safely, in case
the calling process is killed.

The best way to achieve this is to make it so that a calling thread
is temporarily "inherited" by the called (server) process while it is
executing inside of it, and then is released back to the calling process
when the thread finishes.  This will loosely be an emulation of message
passing, where a thread inside of a server carrying out a request for
a client will likely finish and attempt to send back a Reply() message
before finding out that the calling process has died.  Thus, the internal
state of the server will be put into a consistent state before the death
is dealt with.

If a thread were temporarily inherited, it could continue executing inside
of the server process until it attempted to return to its parent.  At that
time, the thread would be terminated.  The two major advantages of this
approach over message passing are that no expensive context switching is
needed between client/server threads, and there is no need to guess about
how many server threads to have idling inside of the server process; the
right level of multithreading inside of the server process will always
be achieved.

The inherited thread will likely hold references to its parent (client)
process' memory.  So, the client process will need to be zombified until all
inherited threads return.  This would be needed in a shared-memory
message-passing system also (such as JOS).

This is relatively clean when the client dies, but what happens when
the server dies?  In a message-passing system, the client thread would
probably return from its Send() call with an error.  I think that I'm going
to need to implement the semantics of inter-process thread jumping to be
very similar to message passing.

I'm going to need ThreadRemoteProcessCall() and ThreadRemoteProcessReturn()
calls.  Well, I'm kind of going to need these things anyway, to make a
subroutine call to a "far" internal memory bank.  These calls will need to
tweak the process information as well, to indicate that a thread of one
process has been inherited by another process.  Note that no Receive()
call is needed, since the thread entering the remote process has the same
effect.

With this, all I need to do is standardize the error-return method from
a ThreadRemoteProcessCall() (for either the mechanism failing or the remote
call failing), and I can simply make a thread return with an error if the
remote process it calls dies while it is doing stuff.  The internal state
of the remote process won't matter in this case, so I can just stop the
thread without worrying about that.
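
The bookkeeping for thread inheritance might look something like this C
sketch.  It models only the state changes, not the actual far call, and
it does not allow nested remote calls.  The struct fields, result codes,
and function names are all made up; only the behaviour (terminate the
thread on return if its home died, return an error if the server died) is
taken from the notes above.

```c
#include <assert.h>

typedef enum {
    CALL_OK = 0,
    CALL_ERR_SERVER_DEAD,    /* remote process died: caller gets an error */
    CALL_THREAD_TERMINATED   /* home process died: thread ends on return */
} CallResult;

typedef struct {
    int alive;
    int guests;  /* threads of other processes currently inside this one */
    int away;    /* own threads currently inside other processes */
} Proc;

typedef struct {
    Proc *home;     /* owning (client) process */
    Proc *current;  /* process whose code the thread is executing */
    int   running;
} Thread;

static CallResult thread_remote_call(Thread *t, Proc *server)
{
    assert(t->current == t->home);  /* no nested calls in this sketch */
    if (!server->alive)
        return CALL_ERR_SERVER_DEAD;
    t->current = server;            /* thread is "inherited" by the server */
    server->guests++;
    t->home->away++;
    return CALL_OK;
}

static CallResult thread_remote_return(Thread *t)
{
    Proc *server = t->current;
    server->guests--;
    t->home->away--;
    t->current = t->home;
    if (!t->home->alive) {          /* client died while we were away */
        t->running = 0;
        return CALL_THREAD_TERMINATED;
    }
    return server->alive ? CALL_OK : CALL_ERR_SERVER_DEAD;
}

/* A dead process stays a zombie until nobody is inside it or away from it. */
static int proc_can_reclaim(const Proc *p)
{
    return !p->alive && p->away == 0 && p->guests == 0;
}
```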

Inside of a process, the usual semaphore mechanism can be used to keep
all threads in line and manage the consistency of the internal state of
the process.

In a multi-computer environment, this type of RPC could also be used
if threads virtually passed from computer to computer by creating a
thread on the remote machine to execute stuff and then killing this
thread when it is done its work, and passing control back and forth to
make it seem as though the local thread executed on the remote machine.
This could be made more efficient if a pool of zombie threads is kept on
the computers for handling these remote requests, to reduce the cost of
constantly creating and destroying threads.  Not that this will help with
far/remote-memory access, though this can be handled if all far-memory
accessing is done in chunks through the kernel, such as in my design.

Here are the pieces of the Operating System that I am envisioning:

Kernel - thread management, far-memory access, interrupts, low-level things
Process Manager - process management, program loading, namespace management
Memory Manager - memory management, byte-oriented dynamic memory

These are similar to many microkernel designs.  One minor thing will
be that these three "processes" will need to be assembled together,
for bootstrapping purposes, since I can't very well dynamically load
either the process manager or the memory manager since they are needed
to perform process loading.

There's also the matter of cleaning up resources in a server process after
a client process dies.  I like the idea in JOS about "connections".  All I
need is a simple call to the process manager for a thread to register its
"home" process(es) as having open resources on the current process.
Then I need a mechanism for the server process to be notified when the
client dies.  Perhaps I could have a standard server interface for this
and the process manager could commandeer a thread of a dying process
(or make one) and call the entry point of the server.  Should there be
a global one for a process or should info be kept for each open "object"
that a process holds on a server?

Note that in my design, "process", "client", "server", "library", and
"device driver" all refer to the same kind of object: a process.