Akeeba Backup for Joomla

Appendix A. The JPA archive format, v.1.3


Design goals

The JPA format is a compressed archive format designed specifically to be created efficiently by PHP scripts. It is similar in design to the PKZIP format, with a few notable differences:

CRC32 is not used; calculating file checksums is time-consuming and can lead to errors when attempted on large files.
The only compression methods in general use are store and deflate (Bzip2 is also defined, but discouraged).
There is no Central Directory (this simplifies management of the file).
File permissions (UNIX style) are stored within the file.

Even though JPA is designed for use by PHP scripts, it is still possible to create a command-line utility, a programming library or even a GUI program for it in any other language. JPA is not meant to achieve high compression ratios, nor to be as secure and error-tolerant as other archive formats. It is merely an attempt to provide the best compromise for creating archives of very large directory trees using nothing but PHP code.

This is an open format. You may use it in any commercial or non-commercial application royalty-free. Even though the PHP implementation is GPL-licensed, we can provide it under commercial-friendly licenses, e.g. LGPL v3. Please ask us if you want to use it in your own software.

Migration notes

Migrating from 1.2 to 1.3

Tools that read or extract version 1.2 archives will continue to work as before, as long as they are programmed to ignore the extra fields they do not understand. As was already the case, they will not be able to extract files of 4GiB (4294967296 bytes) or larger, nor correctly report the size of archives whose total compressed and/or uncompressed size is 4GiB (4294967296 bytes) or larger.

Tools supporting version 1.3 archives MUST also support version 1.2 archives. They MAY support version 1.1 archives. They MUST NOT support version 1.0 archives (it was an internal revision).

For fields that exist in both unsigned long (32-bit) and unsigned long long (64-bit) form, tools supporting version 1.3 MUST discard the 32-bit values they have read and use the 64-bit values, introduced in version 1.3, whenever those are present.
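A minimal Python sketch of this override rule follows. The function name `effective_size` and the convention of passing `None` when the 64-bit field is absent are assumptions made for illustration; they are not part of the format.

```python
from typing import Optional


def effective_size(value32: int, value64: Optional[int]) -> int:
    """Return the size a version 1.3 reader should use.

    value32: the unsigned 32-bit value read from the version 1.2 field.
    value64: the unsigned 64-bit value from the version 1.3 field, or None
             when that field is absent (a hypothetical convention).
    """
    if value64 is not None:
        # The 64-bit value always replaces the 32-bit one when available.
        return value64
    # Version 1.2 archive, or the 64-bit field is missing.
    return value32


five_gib = 5 * 1024 ** 3
# Whatever the 32-bit field holds is ignored once the 64-bit value is known.
print(effective_size(0xFFFFFFFF, five_gib) == five_gib)  # True
print(effective_size(1234, None))  # 1234
```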

Structure of an archive

An archive consists of exactly one Standard Header, optionally followed by Extra Header Fields, and one or more Entity Blocks. Each Entity Block consists of exactly one Entity Description Block and at most one File Data Block. All values are stored in little-endian byte order, unless otherwise specified.

All textual data, e.g. file names and symlink targets, must be written as UTF-8 encoded strings without a trailing null, for the widest compatibility possible. UTF-8 is byte-oriented, so no byte order applies to it.

Standard Header

The function of the Standard Header is to allow identification of the archive format and to supply the client with general information about the archive at hand. It is a binary block that appears only once, at the very beginning of the archive file. It consists of the following data (in order of appearance):

Signature, 3 bytes: The bytes 0x4A 0x50 0x41 (uppercase ASCII string “JPA”) used for identification purposes.
Header length, 2 bytes: Unsigned short integer represented as two bytes, holding the size of the header in bytes. Without extra header fields it is 19 bytes; the field exists to allow for forward compatibility. When extra header fields are present, this value will be 19 + the length of all extra fields.
Major version, 1 byte: Unsigned integer represented as a single byte, holding the archive format major version, e.g. 0x01 for version 1.3.
Minor version, 1 byte: Unsigned integer represented as a single byte, holding the archive format minor version, e.g. 0x03 for version 1.3.
File count, 4 bytes: Unsigned long integer represented as four bytes, holding the number of files present in the archive.
Uncompressed size, 4 bytes: Unsigned long integer represented as four bytes, holding the total size of the archive's files when uncompressed.
Compressed size, 4 bytes: Unsigned long integer represented as four bytes, holding the total size of the archive's files in their stored (compressed) form.
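The Standard Header maps directly onto a fixed-size little-endian record. The following Python sketch uses the standard `struct` module; the helper names are hypothetical. It does not decide what a writer should put in the 32-bit size fields once a total reaches 4GiB, so `struct` raises `struct.error` for such values.

```python
import struct

# Little-endian, no padding: signature (3s), header length (H), major (B),
# minor (B), file count (I), uncompressed size (I), compressed size (I).
STANDARD_HEADER = struct.Struct("<3sHBBIII")
assert STANDARD_HEADER.size == 19


def pack_standard_header(file_count, uncompressed, compressed, extra_length=0):
    """Build a version 1.3 Standard Header.

    extra_length is the combined size of the Extra Header Fields that
    follow (0 if there are none); it is added to the 19-byte base length.
    """
    return STANDARD_HEADER.pack(
        b"JPA", 19 + extra_length, 1, 3, file_count, uncompressed, compressed
    )


def parse_standard_header(data):
    """Read the Standard Header from the first 19 bytes of data."""
    sig, hlen, major, minor, count, usize, csize = STANDARD_HEADER.unpack_from(data)
    if sig != b"JPA":
        raise ValueError("not a JPA archive")
    return {
        "header_length": hlen,
        "version": (major, minor),
        "file_count": count,
        "uncompressed": usize,
        "compressed": csize,
    }


hdr = pack_standard_header(2, 300, 120)
print(len(hdr), parse_standard_header(hdr)["version"])  # 19 (1, 3)
```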

Extra Header Field - Spanned Archive Marker

This is an optional field, written after the Standard Header but before the first Entity Block, denoting that the current archive spans multiple files. Its structure is:

Signature, 4 bytes: The bytes 0x4A, 0x50, 0x01, 0x01
Extra Field Length, 2 bytes: The length of the extra field, not counting the signature. Its value is fixed and equals the decimal number 4.
Number of parts, 2 bytes: The total number of parts this archive consists of.
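A sketch of building and checking the marker in Python. The fixed length of 4 is read here as the 2-byte length field plus the 2-byte part count, which is consistent with the value 18 used by the Extended Size Attributes field; the helper names are hypothetical.

```python
import struct

SPAN_SIGNATURE = b"\x4a\x50\x01\x01"
# Signature (4s), extra field length (H, fixed at 4), number of parts (H).
SPAN_MARKER = struct.Struct("<4sHH")


def pack_span_marker(parts):
    # The length counts itself and the part count: 2 + 2 = 4 bytes.
    return SPAN_MARKER.pack(SPAN_SIGNATURE, 4, parts)


def parse_span_marker(data):
    """Return the number of parts, or raise if data is not a marker."""
    sig, length, parts = SPAN_MARKER.unpack_from(data)
    if sig != SPAN_SIGNATURE or length != 4:
        raise ValueError("not a Spanned Archive Marker")
    return parts


print(pack_span_marker(3).hex())  # 4a50010104000300
```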

When creating spanned archives, the first file (part) of the archive set has an extension of .j01, the next part has an extension of .j02 and so on. The last file of the archive set has the extension .jpa.
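The naming rule can be expressed as a small helper. `part_names` and its `base` parameter are hypothetical; how parts beyond `.j99` are named is not specified above, and this sketch simply keeps counting.

```python
def part_names(base, parts):
    """Return the file names of a spanned archive set, in order.

    base: archive name without extension (hypothetical parameter).
    parts: the number of parts, as stored in the Spanned Archive Marker.
    """
    if parts < 1:
        raise ValueError("an archive has at least one part")
    # Every part except the last is .j01, .j02, ...; the last one is .jpa.
    names = [f"{base}.j{i:02d}" for i in range(1, parts)]
    names.append(f"{base}.jpa")
    return names


print(part_names("site-backup", 3))
# ['site-backup.j01', 'site-backup.j02', 'site-backup.jpa']
```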

When creating spanned archives you must ensure that each Entity Description Block fits within a single part, i.e. its contents must not cross part boundaries. The File Data Block, on the other hand, may cross one or more part boundaries.
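On the writing side, this rule amounts to checking, before each Entity Description Block is written, whether it still fits in the current part. The function and parameter names below are hypothetical, and the sketch assumes that a single Entity Description Block is never larger than a whole part.

```python
def must_start_new_part(used, part_size, edb_length):
    """Return True if the writer has to close the current part first.

    used: bytes already written to the current part.
    part_size: maximum size of a part (writer configuration).
    edb_length: size of the next Entity Description Block.
    The Entity Description Block may not straddle two parts; the File
    Data Block that follows it may, so it is not checked here.
    """
    return used + edb_length > part_size


print(must_start_new_part(90, 100, 10))  # False: ends exactly at the limit
print(must_start_new_part(91, 100, 10))  # True
```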

Extra Header Field - Extended Size Attributes

Added in version 1.3.

This is an optional field, written after the Standard Header but before the first Entity Block, adding support for 64-bit values for the total uncompressed and compressed size of the files in the archive. Its structure is:

Signature, 4 bytes: The bytes 0x4A, 0x50, 0x01, 0x02
Extra Field Length, 2 bytes: The length of the extra field, not counting the signature. Its value is fixed and equals the decimal number 18.
Uncompressed size, 8 bytes: Unsigned long long integer (64-bit) represented as eight bytes, little endian order, holding the total size of the archive's files when uncompressed.
Compressed size, 8 bytes: Unsigned long long integer (64-bit) represented as eight bytes, little endian order, holding the total size of the archive's files in their stored (compressed) form.
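A Python sketch of this field, using the `struct` format `Q` for the unsigned 64-bit values; the helper names are hypothetical. The whole field is 22 bytes: the 4-byte signature plus the 18 bytes counted by the length field.

```python
import struct

EXT_SIZE_SIGNATURE = b"\x4a\x50\x01\x02"
# Signature (4s), length (H, fixed at 18), uncompressed (Q), compressed (Q).
EXT_SIZE_FIELD = struct.Struct("<4sHQQ")


def pack_extended_sizes(uncompressed, compressed):
    # 18 = 2 (length field) + 8 (uncompressed size) + 8 (compressed size)
    return EXT_SIZE_FIELD.pack(EXT_SIZE_SIGNATURE, 18, uncompressed, compressed)


def parse_extended_sizes(data):
    """Return (uncompressed, compressed) totals from the field."""
    sig, length, uncompressed, compressed = EXT_SIZE_FIELD.unpack_from(data)
    if sig != EXT_SIZE_SIGNATURE or length != 18:
        raise ValueError("not an Extended Size Attributes field")
    return uncompressed, compressed
```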

Entity Block

An Entity Block is merely the aggregation of an Entity Description Block and at most one File Data Block. An Entity can currently be a File, a Directory or a Symbolic Link. If the Entity is a zero-length File or a Directory, the File Data Block is omitted. In every other case, including symbolic links, the File Data Block must exist.

Entity Description Block

The function of the Entity Description Block is to provide the client with information about an Entity included in the archive. The client can then use this information to reconstruct a copy of the Entity on the client's file system. It is a binary block consisting of the following data (in order of appearance):

Signature, 3 bytes

The bytes 0x4A, 0x50, 0x46 (uppercase ASCII string “JPF”) used for identification purposes.

Block length, 2 bytes

Unsigned short integer, represented as 2 bytes, holding the total size of this Entity Description Block.

Length of entity path, 2 bytes

Unsigned short integer, represented as 2 bytes, holding the size in bytes of the Entity path data that follows.

Entity path data, variable length

Holds the complete (relative) path of the Entity as a UTF-8 encoded string, without a trailing null. The path separator must be a forward slash (“/”), even on systems which use a different path separator, e.g. Windows.
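A sketch of producing the Entity path data and the value of its length field, assuming UTF-8 as required for textual data in general. `encode_entity_path` is a hypothetical name; `PurePath.as_posix()` is used because it yields forward slashes regardless of the host's native separator.

```python
from pathlib import PurePath


def encode_entity_path(relative_path):
    """Return (path bytes, value for the 2-byte path length field)."""
    # as_posix() converts Windows separators; on POSIX hosts a backslash is
    # an ordinary file name character and is left untouched.
    data = PurePath(relative_path).as_posix().encode("utf-8")  # no null
    if len(data) > 0xFFFF:
        raise ValueError("path does not fit in the 2-byte length field")
    return data, len(data)


print(encode_entity_path("images/café.png"))
```

The length counts bytes, not characters: "é" takes two bytes in UTF-8.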

Entity type, 1 byte

0x00 for directories (instructs the client to recursively create the directory specified in Entity path data).
0x01 for files (instructs the client to reconstruct the file specified in Entity path data).
0x02 for symbolic links (instructs the client to create a symbolic link whose target is stored, uncompressed, as the entity's File Data Block). When the type is 0x02 the Compression Type MUST be 0x00 as well.

Compression type, 1 byte

0x00 for no compression; the data contained in the File Data Block should be written as-is to the file. Also used for directories, symbolic links and zero-sized files.
0x01 for deflate compression; the data contained in the File Data Block must be inflated (decompressed) before being written to the file.
0x02 for Bzip2 compression; the data contained in the File Data Block must be decompressed using Bzip2 before being written to the file. This is generally discouraged, as both the archiving and unarchiving scripts must be run in a PHP environment which supports the bzip2 library.

Compressed size, 4 bytes

An unsigned long integer representing the size of the File Data Block in bytes. For directories and zero-sized files it is zero (0x00000000); for symbolic links it is the length of the stored link target.

Uncompressed size, 4 bytes

An unsigned long integer representing the size of the resulting file in bytes. For directories and zero-sized files it is zero (0x00000000); for symbolic links, which are stored uncompressed, it equals the Compressed size.

Entity permissions, 4 bytes

UNIX-style permissions of the stored entity.
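The fixed-layout parts of an Entity Description Block can be parsed with two `struct` formats around the variable-length path. This Python sketch is illustrative only: the function name and the returned dictionary keys are hypothetical, it decodes the path as UTF-8, and it leaves the extra fields as raw bytes.

```python
import struct

EDB_PREFIX = struct.Struct("<3sHH")  # signature, block length, path length
EDB_FIXED = struct.Struct("<BBIII")  # type, compression, csize, usize, perms


def parse_entity_description(data, offset=0):
    """Parse one Entity Description Block that starts at data[offset]."""
    sig, block_len, path_len = EDB_PREFIX.unpack_from(data, offset)
    if sig != b"JPF":
        raise ValueError("bad Entity Description Block signature")
    pos = offset + EDB_PREFIX.size
    path = data[pos:pos + path_len].decode("utf-8")
    pos += path_len
    etype, ctype, csize, usize, perms = EDB_FIXED.unpack_from(data, pos)
    pos += EDB_FIXED.size
    # Anything left before the end of the block is the extra fields area.
    end = offset + block_len
    return {
        "path": path,
        "type": etype,
        "compression": ctype,
        "compressed": csize,
        "uncompressed": usize,
        "permissions": perms,
        "extra": data[pos:end],
        "data_offset": end,  # the File Data Block (if any) starts here
    }
```

The File Data Block, when present, is the `compressed` bytes starting at `data_offset`.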

Extra fields data, variable length

The extra fields for each file are stored here. The total length of the extra fields is included in the Block length above.

Each Extra Field consists of:

Extra Field Identifier, 2 bytes: A signature denoting the data stored in the extra field
Extra Field Length, 2 bytes: The total length (in bytes) of the Extra Field, including the 4 bytes of the identifier and length fields; e.g. 0x08 for the Timestamp Extra Field, whose data is 4 bytes long
Extra Field Data, variable length: The internal structure varies by the type of the Extra Field, as noted in the Extra Field Identifier
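A sketch of walking the extra fields area in Python. It treats the Extra Field Length as the size of the whole field, identifier and length included, which is what the fixed values of the fields defined below imply (0x08 for a 4-byte timestamp, 0x14 for 16 bytes of sizes). Identifiers are kept as raw bytes so that callers can skip any they do not understand; the function name is hypothetical.

```python
import struct


def iter_extra_fields(extra):
    """Yield (identifier, payload) pairs from an Entity's extra fields area."""
    pos = 0
    while pos + 4 <= len(extra):
        identifier = extra[pos:pos + 2]  # raw bytes, e.g. b"\x00\x01"
        (length,) = struct.unpack_from("<H", extra, pos + 2)
        # length covers the identifier and length fields themselves.
        if length < 4 or pos + length > len(extra):
            raise ValueError("malformed extra field")
        yield identifier, extra[pos + 4:pos + length]
        pos += length
```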

Timestamp Extra Field

Its purpose is to store the date and time the file was last modified. This extra field should be omitted for directories and symlinks or, if present, its Timestamp should be set to 0x00000000. Its format is:

Extra Field Identifier, 2 bytes: The bytes 0x00 0x01
Extra Field Length, 2 bytes: The value 0x08 (the whole field: 2 + 2 + 4 bytes) stored in little-endian format
Timestamp, 4 bytes: A 4-byte UNIX timestamp of the file's modification time, as returned by PHP's filemtime().
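Building the field in Python might look like this. The helper names are hypothetical, and `os.path.getmtime()` stands in for PHP's `filemtime()`. Values outside the unsigned 32-bit range make `struct.pack` raise `struct.error`.

```python
import os
import struct

TIMESTAMP_ID = b"\x00\x01"


def timestamp_field(mtime):
    # Identifier (2 bytes), length 8 (H), UNIX timestamp (I), little-endian.
    return TIMESTAMP_ID + struct.pack("<HI", 8, int(mtime))


def timestamp_field_for(path):
    # os.path.getmtime() returns a float; the field keeps whole seconds.
    return timestamp_field(os.path.getmtime(path))


print(timestamp_field(0).hex())  # 0001080000000000
```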

Long Long File Sizes Extra Field

Added in version 1.3.

This field stores the file sizes (compressed and uncompressed) as 64-bit unsigned long long integers. Its format is:

Extra Field Identifier, 2 bytes: The bytes 0x00 0x02
Extra Field Length, 2 bytes: The value 0x14 (decimal 20, the whole field: 2 + 2 + 8 + 8 bytes) stored in little-endian format
Compressed size, 8 bytes: An unsigned long long (64-bit) integer in little-endian format representing the size of the File Data Block in bytes. For directories and zero-sized files it is zero (0x0000000000000000); for symbolic links it is the length of the stored link target.
Uncompressed size, 8 bytes: An unsigned long long (64-bit) integer in little-endian format representing the size of the resulting file in bytes. For directories and zero-sized files it is zero (0x0000000000000000); for symbolic links it equals the Compressed size.
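The same `struct` approach covers the 64-bit sizes. Note the order: compressed size first, then uncompressed size. The helper names are hypothetical.

```python
import struct

LONG_SIZES_ID = b"\x00\x02"


def long_sizes_field(compressed, uncompressed):
    # 0x14 = 20 bytes: identifier (2) + length (2) + two 8-byte sizes.
    return LONG_SIZES_ID + struct.pack("<HQQ", 0x14, compressed, uncompressed)


def parse_long_sizes(payload):
    """payload: the 16 bytes after the identifier and length fields."""
    compressed, uncompressed = struct.unpack("<QQ", payload)
    return compressed, uncompressed
```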

File Data Block

The File Data Block is only present if the Entity is a file with a non-zero file size or a symbolic link. It consists of exactly one of the following, depending on the Compression Type (CT):

Binary dump of file contents or textual representation of the symlink's target, for CT=0x00
Deflate compression output as a raw DEFLATE stream, i.e. without the zlib header and the trailing Adler-32 checksum, for CT=0x01
Bzip2 compression output, for CT=0x02
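A decoding sketch in Python, using the standard `zlib` and `bz2` modules. It assumes that CT=0x01 data is a raw DEFLATE stream with no zlib or gzip header or trailer; verify this against a real archive before relying on it. A negative `wbits` value tells zlib to read or write raw DEFLATE. The function names are hypothetical.

```python
import bz2
import zlib


def decode_file_data(block, compression_type):
    """Turn a File Data Block back into file contents (or a symlink target)."""
    if compression_type == 0x00:
        return block  # stored as-is
    if compression_type == 0x01:
        return zlib.decompress(block, -15)  # raw DEFLATE, no header/trailer
    if compression_type == 0x02:
        return bz2.decompress(block)
    raise ValueError(f"unknown compression type {compression_type:#04x}")


def deflate_for_jpa(data):
    """Writer side of the same assumption: emit a raw DEFLATE stream."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


print(decode_file_data(deflate_for_jpa(b"abc"), 0x01))  # b'abc'
```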

Change Log

Revision History
June 2009, NKD: Updated to format version 1.1; fixed incorrect descriptions of header signatures.
May 2023, NKD: Updated to format version 1.3.
