Specification
This provides specifications for various serialized objects that are either generated or consumed by a2kit.
The a2kit file system layer relies on having a common description of a file, in all its details, that works for any supported file system. This is the FileImage structure. It is exposed to the user as a JSON string. File images are being used under the hood all the time, even though you can often ignore this fact.
One may wonder, why not use AppleSingle as the native file image? Actually, AppleSingle falls far short of what we want to capture. It does not allow for sparse structure, nor does it handle CP/M, nor does it handle Pascal. It does not strictly handle Apple DOS 3.x, although a2kit will make it appear that it does, by inserting a ProDOS transformation.
To specify that an item is a file image, use the any type. As an example, suppose we have a binary file named thechip containing the 4-byte sequence 6, 5, 0, 2. We can get the file image using
a2kit get -f thechip -t any -d mydos33.dsk --indent 4
Assuming console output, this would display
{
"fimg_version": "2.1.0",
"file_system": "a2 dos",
"chunk_len": 256,
"eof": "",
"fs_type": "04",
"aux": "",
"access": "",
"accessed": "",
"created": "",
"modified": "",
"version": "",
"min_version": "",
"full_path": "thechip",
"chunks": {
"0": "00030400060500020000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
}
}
For DOS, most of the metadata is empty. In this case there is only one "chunk," but generally there could be many. The same file retrieved from ProDOS would look different:
{
"fimg_version": "2.1.0",
"file_system": "prodos",
"chunk_len": 512,
"eof": "040000",
"fs_type": "06",
"aux": "0003",
"access": "E3",
"accessed": "",
"created": "842D1C0A",
"modified": "842D1C0A",
"version": "24",
"min_version": "00",
"full_path": "thechip",
"chunks": {
"0": "0605000200000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000"
}
}
First let us note a few things about "chunks":
- A chunk represents the data from a file system allocation block. Chunks are identified using an abstract chunk key, which determines the relative ordering of the chunks.
- Unlike blocks, chunks are not tied to a specific sector or set of sectors on the disk.
- The chunk keys do not have to be in any kind of sequence, or even be numbers. The file system implementation must know how to interpret its own keys.
- Generally sequential files will have keys in an unbroken sequence, while sparse files will have "missing" keys.
A few other things to note:
- Most value fields are hex strings. The interpretation of metadata depends on the file system, but wherever possible, the bytes are in direct correspondence with what is stored on disk.
- For DOS 3.3, the starting address and length of the data are in the first two words of chunk 0. This is a characteristic of the file system, not the file image representation.
- For ProDOS, the starting address is in the aux value, and the length is in the eof value.
- The full_path must start at the root directory. Leading slashes (FAT), user numbers (CP/M), and volume names (ProDOS) are optional.
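The difference between the two examples above can be made concrete with a short Python sketch that recovers the starting address and length from either file image. The function name is hypothetical, and the little-endian byte order is inferred from the sample values shown above (for instance "0400" for a length of 4) rather than stated by this specification. The DOS chunk is rebuilt here from the first eight bytes of the sample plus zero padding.

```python
def load_address_and_length(fimg: dict) -> tuple[int, int]:
    """Return (start_address, length) for a binary file image.

    Hypothetical helper; little-endian order is inferred from the samples.
    """
    fs = fimg["file_system"]
    if fs == "a2 dos":
        # DOS 3.3 keeps both values in the first two words of chunk 0.
        header = bytes.fromhex(fimg["chunks"]["0"])[:4]
        addr = int.from_bytes(header[0:2], "little")
        length = int.from_bytes(header[2:4], "little")
    elif fs == "prodos":
        # ProDOS keeps them in the metadata, apart from the data.
        addr = int.from_bytes(bytes.fromhex(fimg["aux"]), "little")
        length = int.from_bytes(bytes.fromhex(fimg["eof"]), "little")
    else:
        raise ValueError(f"not handled in this sketch: {fs}")
    return addr, length


# Trimmed-down versions of the two file images shown above.
dos_fimg = {
    "file_system": "a2 dos",
    "chunks": {"0": "0003040006050002" + "00" * 248},
}
prodos_fimg = {
    "file_system": "prodos",
    "eof": "040000",
    "aux": "0003",
    "chunks": {"0": "06050002" + "00" * 508},
}

print(load_address_and_length(dos_fimg))     # (768, 4), i.e. address 0x0300
print(load_address_and_length(prodos_fimg))  # (768, 4)
```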
You can pass file images through the a2kit pipeline the same as any other object, but when writing it is required that the file_system key match the file system found on the disk image. You can use an unpack node to work around this requirement:
# the following is an error because Apple is on the left and MS-DOS is on the right
a2kit get -f apple_fimg.json | a2kit put -d msdos.imd -t any -f myfile
# inserting the unpack node and committing to a type allows the copy to proceed
a2kit get -f apple_fimg.json | a2kit unpack -t txt | a2kit put -d msdos.imd -t txt -f myfile.txt
- The serialized FileImage is a JSON object
- Root keys shall be valid JavaScript identifiers, all keys shall be visible ASCII characters
- The values are either strings or numbers
- Binary data shall be represented by upper-case hex strings with no prefix or byte-separators
- Binary data is normally stored exactly as found on disk; an exception is the CP/M EOF
- If you want to sort the chunk keys, parse them as numbers first
- Don't confuse the access key (permissions) with the accessed key (timestamp)
- The version and min_version keys refer to file system version, while fimg_version is the version of this specification
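Two of the rules above lend themselves to a quick check in code. The following Python sketch uses hypothetical helper names. It assumes that hex strings encode whole bytes (an even number of digits) and that chunk keys are decimal integers, as in the examples; a file system with non-numeric keys would need its own ordering.

```python
import re

# Upper-case hex digits only, in whole bytes, no prefix and no separators.
# An empty string also matches, since empty metadata values are allowed.
HEX_RE = re.compile(r"(?:[0-9A-F]{2})*")


def is_valid_hex(value: str) -> bool:
    """Check a string against the hex-string rule (whole bytes assumed)."""
    return HEX_RE.fullmatch(value) is not None


def sorted_chunk_keys(chunks: dict) -> list[str]:
    """Sort chunk keys by numeric value rather than as strings."""
    return sorted(chunks, key=int)


chunks = {"10": "AA", "2": "BB", "0": "CC"}
print(sorted(chunks))             # ['0', '10', '2'], string order is wrong
print(sorted_chunk_keys(chunks))  # ['0', '2', '10']
print(is_valid_hex("E3"), is_valid_hex("e3"), is_valid_hex("0xE3"))
# True False False
```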
This gives the version of the file image. This spec is for version 2.1.0.
- key fimg_version
- value "<major>.<minor>.<patch>", for this spec it should be "2.1.0"
This gives the version of the home file system of this file image.
- key version
- value is either
  - an empty string in case the FS does not maintain a version
  - a hex string containing the binary representation as found on disk
This gives the minimum version of the home file system needed for this file image.
- key min_version
- value is either
  - an empty string in case the FS does not maintain a minimum version
  - a hex string containing the binary representation as found on disk
This gives the home file system for this file image.
- key file_system
- value is one of "prodos", "a2 dos", "a2 pascal", "cpm", or "fat"
This is the size in bytes of the file system's allocation blocks.
- key chunk_len
- value is a number dictated by the file system
- for CP/M the value can vary from vendor to vendor
This refers only to an EOF that is stored separately from the file's data. Some file systems do not have this.
- key eof
- value is either
  - an empty string in case the FS does not maintain the EOF apart from the file's data (e.g. Apple DOS)
  - a hex string containing the binary representation as found on disk
This can refer either to a file type extension, or a binary type code. Some file systems include access bits with the type.
- key fs_type
- value is either
  - a hex string containing the binary representation as found on disk
  - a hex string containing the binary representation of the file extension
This is a catch-all for other data maintained with the file. For ProDOS it maps directly to "auxiliary." This refers only to auxiliary data that is stored separately from the file's data.
- key aux
- value is either
  - an empty string in case there is no auxiliary data, or if auxiliary data is packed with the file's data
  - a hex string containing the binary representation as found on disk
The access privileges or other file attributes.
- key access
- value is either
  - empty if there is no access control, or if access bits are stored in fs_type
  - a hex string containing the file name bytes, if access bits are kept with the file name
  - a hex string containing the binary representation as found on disk
- keys accessed, created, modified
- values are either
  - an empty string if the time is not stamped
  - a hex string containing the binary representation as found on disk
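Because timestamps are stored as found on disk, decoding them is the job of the consumer and depends on the file system. As an illustration, the following Python sketch decodes the created value "842D1C0A" from the ProDOS example above. It assumes the standard ProDOS date and time layout (a little-endian date word holding a 7-bit year, 4-bit month, and 5-bit day, followed by a minute byte and an hour byte), which is not defined by this specification. The function name is hypothetical, and the year is returned as the raw 7-bit value without choosing a century.

```python
def decode_prodos_timestamp(hex_str: str) -> dict:
    """Decode a 4-byte ProDOS date/time given as a hex string.

    Assumes the standard ProDOS layout; other file systems differ.
    """
    raw = bytes.fromhex(hex_str)
    if len(raw) != 4:
        raise ValueError("expected 4 bytes of ProDOS date and time")
    date = int.from_bytes(raw[0:2], "little")  # bits: yyyyyyym mmmddddd
    return {
        "year": date >> 9,            # raw 7-bit year, century not applied
        "month": (date >> 5) & 0x0F,
        "day": date & 0x1F,
        "hour": raw[3] & 0x1F,
        "minute": raw[2] & 0x3F,
    }


print(decode_prodos_timestamp("842D1C0A"))
# {'year': 22, 'month': 12, 'day': 4, 'hour': 10, 'minute': 28}
```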
- key full_path
- value is the path where the file came from, or where it is intended to go
- it is acceptable to have a bare file name, in which case root is implied
- CP/M user number can be added as a colon-delimited prefix
This is the actual data of the file, organized into allocation blocks.
- outer key chunks
- outer value is an object
- inner keys are abstract chunk identifiers, usually ordered unsigned integers
- inner values are hex strings containing the data in the allocation block
- it is acceptable to truncate the last hex string at the EOF provided the subsequent values are known to be unimportant
Sequential files will usually be identified as those where the chunk identifiers form an unbroken sequence of integers starting with zero. Anything else is a sparse file.
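The sequential test just described can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch assumes that numeric keys are written as decimal integers; any key that does not parse as an integer is treated as making the file sparse, in keeping with "anything else."

```python
def is_sequential(chunks: dict) -> bool:
    """True if the chunk keys are exactly 0, 1, 2, ... with no gaps."""
    try:
        keys = sorted(int(k) for k in chunks)
    except ValueError:
        return False  # non-numeric keys: treat as sparse
    return keys == list(range(len(keys)))


def missing_keys(chunks: dict) -> list[int]:
    """List the integer keys absent below the highest key present."""
    present = {int(k) for k in chunks}
    if not present:
        return []
    return [k for k in range(max(present)) if k not in present]


sequential = {"0": "0605", "1": "0002"}
sparse = {"0": "0605", "3": "0002"}

print(is_sequential(sequential))  # True
print(is_sequential(sparse))      # False
print(missing_keys(sparse))       # [1, 2]
```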