
Specification

dfgordon edited this page Oct 15, 2025 · 2 revisions

Specifications

This provides specifications for various serialized objects that are either generated or consumed by a2kit.

File Image Informal Description

The a2kit file system layer relies on a common description of a file, in all its details, that works for any supported file system. This is the FileImage structure. It is exposed to the user as a JSON string. a2kit uses file images under the hood all the time, although you can often ignore this fact.

One may wonder, why not use AppleSingle as the native file image? Actually, AppleSingle falls far short of what we want to capture. It does not allow for sparse structure, nor does it handle CP/M, nor does it handle Pascal. It does not strictly handle Apple DOS 3.x, although a2kit will make it appear that it does, by inserting a ProDOS transformation.

To specify that an item is a file image, use the any type. As an example, suppose we have a binary file named thechip containing the 4-byte sequence 6, 5, 0, 2. We can get the file image using

a2kit get -f thechip -t any -d mydos33.dsk --indent 4

Assuming console output, this would display

{
    "fimg_version": "2.1.0",
    "file_system": "a2 dos",
    "chunk_len": 256,
    "eof": "",
    "fs_type": "04",
    "aux": "",
    "access": "",
    "accessed": "",
    "created": "",
    "modified": "",
    "version": "",
    "min_version": "",
    "full_path": "thechip",
    "chunks": {
        "0": "00030400060500020000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
    }
}

For DOS, most of the metadata is empty. In this case there is only one "chunk," but generally there could be many. The same file retrieved from ProDOS would look different:

{
    "fimg_version": "2.1.0",
    "file_system": "prodos",
    "chunk_len": 512,
    "eof": "040000",
    "fs_type": "06",
    "aux": "0003",
    "access": "E3",
    "accessed": "",
    "created": "842D1C0A",
    "modified": "842D1C0A",
    "version": "24",
    "min_version": "00",
    "full_path": "thechip",
    "chunks": {
        "0": "0605000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
    }
}

First let us note a few things about "chunks":

  • A chunk represents the data from a file system allocation block. Chunks are identified using an abstract chunk key, which determines the relative ordering of the chunks.
  • Unlike blocks, chunks are not tied to a specific sector or set of sectors on the disk.
  • The chunk keys do not have to be in any kind of sequence, or even be numbers. The file system implementation must know how to interpret its own keys.
  • Generally sequential files will have keys in an unbroken sequence, while sparse files will have "missing" keys.

A few other things to note:

  • Most value fields are hex strings. The interpretation of metadata depends on the file system, but wherever possible, the bytes are in direct correspondence with what is stored on disk.
  • For DOS 3.3, the starting address and length of a binary file's data are in the first two 16-bit words of chunk 0 (0003 and 0400 in the example above). This is a characteristic of the file system, not the file image representation.
  • For ProDOS, the starting address is in the aux value, and the length is in the eof value.
  • The full_path must start at the root directory. Leading slashes (FAT), user numbers (CP/M), and volume names (ProDOS) are optional.
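
The two notes on where the starting address and length live can be sketched in Python. This is a minimal illustration, not part of a2kit: the helper names le_int and binary_info are hypothetical, little-endian byte order is an assumption (it is consistent with the example values above), and the payload is assumed to fit in chunk 0.

```python
def le_int(hex_str: str) -> int:
    """Interpret a hex string as a little-endian unsigned integer (assumed order)."""
    return int.from_bytes(bytes.fromhex(hex_str), "little")


def binary_info(fimg: dict) -> tuple:
    """Return (start address, length, payload) for a small binary file image.

    Hypothetical helper: only chunk "0" is examined, so the file is assumed
    to fit in a single chunk.
    """
    chunk0 = bytes.fromhex(fimg["chunks"]["0"])
    fs = fimg["file_system"]
    if fs == "a2 dos":
        # DOS 3.3 keeps address and length in the first two words of the data.
        addr = int.from_bytes(chunk0[0:2], "little")
        length = int.from_bytes(chunk0[2:4], "little")
        return addr, length, chunk0[4:4 + length]
    if fs == "prodos":
        # ProDOS keeps them in metadata: aux holds the address, eof the length.
        addr = le_int(fimg["aux"])
        length = le_int(fimg["eof"])
        return addr, length, chunk0[:length]
    raise ValueError(f"no binary header convention sketched for {fs!r}")


# Abbreviated versions of the two examples above (last chunk truncated at EOF).
dos = {"file_system": "a2 dos", "chunks": {"0": "0003040006050002"}}
prodos = {"file_system": "prodos", "aux": "0003", "eof": "040000",
          "chunks": {"0": "06050002"}}

for image in (dos, prodos):
    addr, length, payload = binary_info(image)
    print(hex(addr), length, payload.hex().upper())
```

Both images yield the same address, length, and payload, which shows that the difference between the two JSON examples is only in where each file system stores the header.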

You can pass file images through the a2kit pipeline the same as any other object, but when writing, the file_system key must match the file system found on the target disk image. You can use an unpack node to work around this requirement:

# the following is an error because Apple is on the left and MS-DOS is on the right
a2kit get -f apple_fimg.json | a2kit put -d msdos.imd -t any -f myfile
# inserting the unpack node and committing to a type allows the copy to proceed
a2kit get -f apple_fimg.json | a2kit unpack -t txt | a2kit put -d msdos.imd -t txt -f myfile.txt

File Image Specification

Overall Structure

  • The serialized FileImage is a JSON object
  • Root keys shall be valid JavaScript identifiers; all keys shall consist of visible ASCII characters
  • The root values are either strings or numbers, except for chunks, whose value is an object
  • Binary data shall be represented by upper-case hex strings with no prefix or byte-separators
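
As a rough illustration of these rules, the following Python sketch checks a parsed file image. It is not a2kit's validator. The function names are hypothetical, and several details are assumptions: "visible ASCII" is taken to mean codes 0x21 through 0x7E, identifiers are limited to an ASCII subset of what JavaScript allows, and hex strings are required to contain whole bytes.

```python
import re

# Upper-case hex, whole bytes, no prefix or separators; the empty string also matches.
HEX_RE = re.compile(r"(?:[0-9A-F]{2})*")
# ASCII-only subset of JavaScript identifiers (an assumption; real JS allows more).
IDENT_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*")
# Root keys that this page describes as hex strings (or empty strings).
HEX_KEYS = {"eof", "fs_type", "aux", "access", "accessed", "created",
            "modified", "version", "min_version"}


def is_hex(value) -> bool:
    return isinstance(value, str) and HEX_RE.fullmatch(value) is not None


def is_visible_ascii(text: str) -> bool:
    # Assumes "visible" means the printable range 0x21..0x7E (no space).
    return bool(text) and all(0x21 <= ord(c) <= 0x7E for c in text)


def check_structure(fimg) -> list:
    """Return a list of problems found; an empty list means none were found."""
    if not isinstance(fimg, dict):
        return ["root is not a JSON object"]
    problems = []
    for key, value in fimg.items():
        if IDENT_RE.fullmatch(key) is None:
            problems.append(f"root key {key!r} is not an identifier")
        if key == "chunks":
            if not isinstance(value, dict):
                problems.append("chunks is not an object")
                continue
            for ckey, cval in value.items():
                if not is_visible_ascii(ckey):
                    problems.append(f"chunk key {ckey!r} is not visible ASCII")
                if not is_hex(cval):
                    problems.append(f"chunk {ckey!r} is not upper-case hex")
        elif isinstance(value, bool) or not isinstance(value, (str, int, float)):
            problems.append(f"{key} is neither a string nor a number")
        elif key in HEX_KEYS and not is_hex(value):
            problems.append(f"{key} is not upper-case hex")
    return problems
```

A typical use would be to run json.loads on the output of a2kit get -t any and pass the result to check_structure before further processing.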

Gotchas

  • Binary data is normally stored exactly as found on disk; an exception is the CP/M EOF
  • If you want to sort the chunk keys, parse them as numbers first
  • Don't confuse the access key (permissions) with the accessed key (timestamp)
  • The version and min_version keys refer to file system version, while fimg_version is the version of this specification
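
The sorting gotcha is easy to demonstrate. JSON object keys are always strings, so a plain sort is lexicographic. The sketch below assumes the chunk keys are decimal integers, as in the examples on this page; ordered_chunks is a hypothetical helper name.

```python
def ordered_chunks(chunks: dict) -> list:
    """Return (key, hex data) pairs ordered by the numeric value of the key.

    Assumes every key is a decimal integer string; int() raises ValueError
    for any other kind of key.
    """
    return sorted(chunks.items(), key=lambda item: int(item[0]))


chunks = {"10": "AA", "2": "BB", "0": "CC"}

# Sorting the raw strings puts "10" ahead of "2".
print(sorted(chunks))                            # ['0', '10', '2']
# Parsing the keys as numbers first gives the intended order.
print([key for key, _ in ordered_chunks(chunks)])  # ['0', '2', '10']
```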

File Image Version

This gives the version of the file image format. This specification describes version 2.1.0.

  • key fimg_version
  • value "<major>.<minor>.<patch>", for this spec it should be "2.1.0"
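
A consumer might parse this value before reading the rest of the image. The Python sketch below is illustrative only: the function names are hypothetical, and treating an equal major number as "compatible" is an assumed policy, not something this specification states.

```python
SPEC_VERSION = (2, 1, 0)  # the version this page describes


def parse_fimg_version(text: str) -> tuple:
    """Split "<major>.<minor>.<patch>" into a tuple of three integers."""
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isascii() and p.isdigit() for p in parts):
        raise ValueError(f"not a <major>.<minor>.<patch> string: {text!r}")
    return tuple(int(p) for p in parts)


def same_major(text: str) -> bool:
    # Assumed policy: only the major number must match. Not taken from a2kit.
    return parse_fimg_version(text)[0] == SPEC_VERSION[0]


print(parse_fimg_version("2.1.0"))  # (2, 1, 0)
print(same_major("2.0.3"))          # True
print(same_major("1.9.9"))          # False
```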

File System Version

This gives the version of the home file system of this file image.

  • key version
  • value is either
    • an empty string in case the FS does not maintain a version
    • a hex string containing the binary representation as found on disk

File System Minimum Version

This gives the minimum version of the home file system needed for this file image.

  • key min_version
  • value is either
    • an empty string in case the FS does not maintain a minimum version
    • a hex string containing the binary representation as found on disk

File System

This gives the home file system for this file image.

  • key file_system
  • value is one of "prodos", "a2 dos", "a2 pascal", "cpm", or "fat"

Chunk Length

This is the size in bytes of the file system's allocation blocks.

  • key chunk_len
  • value is a number dictated by the file system
    • for CP/M the value can vary from vendor to vendor

End of File

This refers only to an EOF that is stored separately from the file's data. Some file systems do not have this.

  • key eof
  • value is either
    • an empty string in case the FS does not maintain the EOF apart from the file's data (e.g. Apple DOS)
    • a hex string containing the binary representation as found on disk

File System Type

This can refer either to a file type extension, or a binary type code. Some file systems include access bits with the type.

  • key fs_type
  • value is either
    • a hex string containing the binary representation as found on disk
    • a hex string containing the binary representation of the file extension

Auxiliary

This is a catch-all for other data maintained with the file. For ProDOS it maps directly to "auxiliary." This refers only to auxiliary data that is stored separately from the file's data.

  • key aux
  • value is either
    • an empty string in case there is no auxiliary data, or if auxiliary data is packed with the file's data
    • a hex string containing the binary representation as found on disk

Access

This gives the access privileges or other file attributes.

  • key access
  • value is either
    • an empty string if there is no access control, or if access bits are stored in fs_type
    • a hex string containing the file name bytes, if access bits are kept with the file name
    • a hex string containing the binary representation as found on disk

Time Stamps

  • keys accessed, created, modified
  • values are either
    • an empty string if the time is not stamped
    • a hex string containing the binary representation as found on disk

Full Path

  • key full_path
  • value is the path where the file came from, or where it is intended to go
    • it is acceptable to have a bare file name, in which case root is implied
    • CP/M user number can be added as a colon-delimited prefix

Chunks

This is the actual data of the file, organized into allocation blocks.

  • outer key chunks
  • outer value is an object
  • inner keys are abstract chunk identifiers, usually ordered unsigned integers
  • inner values are hex strings containing the data in the allocation block
    • it is acceptable to truncate the last hex string at the EOF provided the subsequent values are known to be unimportant

A sequential file is usually identified as one whose chunk identifiers form an unbroken sequence of integers starting with zero. Anything else is a sparse file.
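
The rule in the preceding paragraph can be sketched in Python as follows. The function names are hypothetical, and the code assumes that numeric chunk keys are written as plain decimal digits; an image with any other kind of key is simply reported as not sequential. An image with no chunks is treated as sequential here, which is a choice made for this sketch.

```python
def is_sequential(chunks: dict) -> bool:
    """True if the keys are exactly 0, 1, ..., n-1 in some order."""
    if not all(k.isascii() and k.isdigit() for k in chunks):
        return False  # non-numeric keys: cannot be an unbroken integer sequence
    keys = sorted(int(k) for k in chunks)
    return keys == list(range(len(keys)))


def missing_keys(chunks: dict) -> list:
    """List the integer keys below the largest key that have no chunk.

    Assumes decimal integer keys; int() raises ValueError otherwise.
    """
    present = {int(k) for k in chunks}
    return [i for i in range(max(present, default=-1) + 1) if i not in present]


sequential = {"0": "AA", "1": "BB", "2": "CC"}
sparse = {"0": "AA", "3": "DD"}

print(is_sequential(sequential), missing_keys(sequential))  # True []
print(is_sequential(sparse), missing_keys(sparse))          # False [1, 2]
```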
