The JPA format strives to be a compressed archive format designed specifically for efficiency of creation by a PHP script. It is similar in design to the PKZIP format, with a few notable differences:
CRC32 is not used; calculation of file checksums is time consuming and can lead to errors when attempted on large files.
The only allowed compression methods are store and deflate.
There is no Central Directory (simplifies management of the file).
File permissions (UNIX style) are stored within the file.
Even though JPA is designed for use by PHP scripts, creating a command-line utility, a programming library or even a GUI program in any other language is still possible. JPA is not supposed to have high compression ratios, or to be as secure and error-tolerant as other archive formats. It is merely an attempt to provide the best compromise for creating archives of very large directory trees using nothing but PHP code to do it.
This is an open format. You may use it in any commercial or non-commercial application royalty-free. Even though the PHP implementation is GPL-licensed, we can provide it under commercial-friendly licenses, e.g. LGPL v3. Please ask us if you want to use it in your own software.
Tools reading / extracting version 1.2 archives will continue working the same as long as they are programmed to ignore the extra fields they do not understand. They will not be able to extract archives with file sizes over 4 GiB (4294967296 bytes), or correctly report the size of archives with a total compressed and/or uncompressed size over 4 GiB (4294967296 bytes), something which was already the case.
Tools supporting version 1.3 archives MUST also support version 1.2 archives. They MAY support version 1.1 archives. They MUST NOT support version 1.0 archives (it was an internal revision).
Tools supporting version 1.3 MUST replace the values they read from fields available in both unsigned long (32-bit) and unsigned long long (64-bit) format with the values conveyed in the 64-bit format, which was introduced in version 1.3.
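The version rules above can be sketched as a small gate in a reader. This is only an illustration: the function name `is_supported_version` and the `allow_1_1` switch are hypothetical and not part of the format.

```python
def is_supported_version(major: int, minor: int, allow_1_1: bool = True) -> bool:
    """Decide whether a version 1.3 reader should accept an archive.

    Hypothetical helper: 1.3 and 1.2 are always accepted, 1.1 is
    optional (MAY), and 1.0 is always rejected (MUST NOT).
    """
    version = (major, minor)
    if version in ((1, 3), (1, 2)):
        return True
    if version == (1, 1):
        return allow_1_1
    # Version 1.0 and any version this sketch does not know about.
    return False
```

Rejecting unknown versions is a design choice of this sketch; the text above does not say how future versions should be treated.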
An archive consists of exactly one Standard Header and one or more Entity Blocks. Each Entity Block consists of exactly one Entity Description Block and at most one File Data Block. All values are stored in little-endian byte order, unless otherwise specified.
All textual data, e.g. file names and symlink targets, must be written as UTF-8 strings without a null terminator, for the widest compatibility possible.
The function of the Standard Header is to allow identification of the archive format and supply the client with general information regarding the archive at hand. It is a binary block appearing at the beginning of the archive file and there alone. It consists of the following data (in order of appearance):
The bytes 0x4A 0x50 0x41 (uppercase ASCII string “JPA”) used for identification purposes.
Unsigned short integer represented as two bytes, holding the size of the header in bytes. This is now fixed to 19 bytes, but this variable is here to allow for forward compatibility. When extra header fields are present, this value will be 19 + the length of all extra fields.
Unsigned integer represented as a single byte, holding the archive format major version, e.g. 0x01 for version 1.2.
Unsigned integer represented as a single byte, holding the archive format minor version, e.g. 0x02 for version 1.2.
Unsigned long integer represented as four bytes, holding the number of files present in the archive.
Unsigned long integer represented as four bytes, holding the total size of the archive's files when uncompressed.
Unsigned long integer represented as four bytes, holding the total size of the archive's files in their stored (compressed) form.
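As an illustration of the layout listed above, the fixed part of the Standard Header could be decoded along these lines. This is a minimal sketch using Python's `struct` module; the names `STANDARD_HEADER` and `parse_standard_header` and the dictionary keys are hypothetical.

```python
import struct

# Field order taken from the list above: signature (3 bytes), header
# length (2), major version (1), minor version (1), file count (4),
# uncompressed size (4), compressed size (4).
# "<" selects little-endian byte order without padding.
STANDARD_HEADER = struct.Struct("<3sHBBIII")


def parse_standard_header(data: bytes) -> dict:
    """Decode the fixed 19-byte part of a JPA Standard Header."""
    if len(data) < STANDARD_HEADER.size:
        raise ValueError("not enough data for a JPA Standard Header")
    (signature, header_length, major, minor,
     file_count, uncompressed, compressed) = STANDARD_HEADER.unpack_from(data)
    if signature != b"JPA":
        raise ValueError("not a JPA archive")
    return {
        "header_length": header_length,
        "version": (major, minor),
        "file_count": file_count,
        "uncompressed_size": uncompressed,
        "compressed_size": compressed,
    }


# Usage: build a header for a version 1.2 archive with 3 files, then decode it.
sample = STANDARD_HEADER.pack(b"JPA", 19, 1, 2, 3, 4096, 1024)
header = parse_standard_header(sample)
```

The packed size of this layout is 19 bytes, which matches the fixed header length stated above.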
This is an optional field, written after the Standard Header but before the first Entity Block, denoting that the current archive spans multiple files. Its structure is:
The bytes 0x4A, 0x50, 0x01, 0x01
The length of the extra field, without counting the signature length. Its value is fixed and equals the decimal number 4.
The total number of parts this archive consists of.
When creating spanned archives, the first file (part) of the archive set has an extension of .j01, the next part has an extension of .j02 and so on. The last file of the archive set has the extension .jpa.
When creating spanned archives you must ensure that the Entity Description Block is within the limits of a single part, i.e. the contents of the Entity Description Block must not cross part boundaries. The File Data Block data can cross one or multiple part blocks.
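The spanning rules above can be sketched as follows. The text does not state the width of the length and part-count values; this sketch assumes both are 2-byte unsigned shorts, which is consistent with the fixed length of 4. The names `SPAN_FIELD`, `read_total_parts` and `part_filenames` are hypothetical.

```python
import struct

SPAN_SIGNATURE = b"\x4a\x50\x01\x01"
# Assumption: 4-byte signature, 2-byte length, 2-byte part count.
SPAN_FIELD = struct.Struct("<4sHH")


def read_total_parts(data: bytes, offset: int = 0):
    """Return the number of parts, or None if the field is not at offset."""
    if len(data) - offset < SPAN_FIELD.size:
        return None
    signature, length, total_parts = SPAN_FIELD.unpack_from(data, offset)
    if signature != SPAN_SIGNATURE or length != 4:
        return None
    return total_parts


def part_filenames(base_name: str, total_parts: int) -> list:
    """List part file names in order: .j01, .j02, ... and finally .jpa."""
    if total_parts < 1:
        raise ValueError("an archive has at least one part")
    names = [f"{base_name}.j{index:02d}" for index in range(1, total_parts)]
    names.append(f"{base_name}.jpa")
    return names


# Usage: a three-part archive set named "site".
field = SPAN_FIELD.pack(SPAN_SIGNATURE, 4, 3)
names = part_filenames("site", read_total_parts(field))
```

How extensions are formed beyond 99 numbered parts is not covered by the text above; the sketch simply keeps counting.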
Added in version 1.3.
This is an optional field, written after the Standard Header but before the first Entity Block, adding support for 64-bit values for the total uncompressed and compressed size of the files in the archive. Its structure is:
The bytes 0x4A, 0x50, 0x01, 0x02
The length of the extra field, without counting the signature length. Its value is fixed and equals the decimal number 18.
Unsigned long long integer (64-bit) represented as eight bytes, little endian order, holding the total size of the archive's files when uncompressed.
Unsigned long long integer (64-bit) represented as eight bytes, little endian order, holding the total size of the archive's files in their stored (compressed) form.
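A reader might apply this field on top of the 32-bit totals roughly as shown below. The sketch assumes the length value is a 2-byte unsigned short (2 + 8 + 8 = 18 matches the fixed length); `apply_size64_field` and the dictionary keys are hypothetical names.

```python
import struct

SIZE64_SIGNATURE = b"\x4a\x50\x01\x02"
# Assumption: 4-byte signature, 2-byte length, then two 64-bit totals.
SIZE64_FIELD = struct.Struct("<4sHQQ")


def apply_size64_field(header: dict, data: bytes, offset: int = 0) -> dict:
    """Return a copy of header with the 64-bit totals replacing the 32-bit ones.

    If the field is not found at the given offset, the copy is unchanged.
    """
    updated = dict(header)
    if len(data) - offset < SIZE64_FIELD.size:
        return updated
    signature, length, uncompressed, compressed = SIZE64_FIELD.unpack_from(
        data, offset)
    if signature != SIZE64_SIGNATURE or length != 18:
        return updated
    updated["uncompressed_size"] = uncompressed
    updated["compressed_size"] = compressed
    return updated


# Usage: 32-bit totals that wrapped around are replaced by the 64-bit ones.
totals_32 = {"uncompressed_size": 0, "compressed_size": 1}
field64 = SIZE64_FIELD.pack(SIZE64_SIGNATURE, 18, 5 * 2**32, 2**32 + 1)
totals = apply_size64_field(totals_32, field64)
```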
An Entity Block is merely the aggregation of an Entity Description Block and at most one File Data Block. An Entity can be at present either a File or a Directory. If the entity is a File of zero length or if it is a Directory the File Data Block is omitted. In any other case, the File Data Block must exist.
The function of the Entity Description Block is to provide the client information about an Entity included in the archive. The client can then use this information in order to reconstruct a copy of the Entity on the client's file system. It is a binary block consisting of the following data (in order of appearance):
The bytes 0x4A, 0x50, 0x46 (uppercase ASCII string “JPF”) used for identification purposes.
Unsigned short integer, represented as 2 bytes, holding the total size of this Entity Description Block.
Unsigned short integer, represented as 2 bytes, holding the size of the entity path data below.
Holds the complete (relative) path of the Entity as a UTF-8 encoded string, without trailing null. The path separator must be a forward slash (“/”), even on systems which use a different path separator, e.g. Windows.
0x00 for directories (instructs the client to recursively create the directory specified in Entity path data).
0x01 for files (instructs the client to reconstruct the file specified in Entity path data).
0x02 for symbolic links (instructs the client to create a symbolic link whose target is stored, uncompressed, as the entity's File Data Block). When the type is 0x02 the Compression Type MUST be 0x00 as well.
0x00 for no compression; the data contained in File Data Block should be written as-is to the file. Also used for directories, symbolic links and zero-sized files.
0x01 for deflate (Gzip) compression; the data contained in File Data Block must be inflated (decompressed) using Gzip before being written to the file.
0x02 for Bzip2 compression; the data contained in File Data Block must be uncompressed using Bzip2 before being written to the file. This is generally discouraged, as both the archiving and unarchiving scripts must be run in a PHP environment which supports the bzip2 library.
An unsigned long integer representing the size of the File Data Block in bytes. For directories, symlinks and zero-sized files it is zero (0x00000000).
An unsigned long integer representing the size of the resulting file in bytes. For directories, symlinks and zero-sized files it is zero (0x00000000).
UNIX-style permissions of the stored entity.
The extra fields for each file are stored here. The total length of extra fields is included in the Block Length above.
Each Extra Field consists of:
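The field list above could be decoded along the following lines. Several widths are not stated in the text and are assumptions of this sketch: the entity type and compression type are taken to be one byte each, and the permissions a 4-byte unsigned long. The path is decoded as UTF-8, per the rule for textual data given earlier. All names are hypothetical.

```python
import struct

EDB_PREFIX = struct.Struct("<3sHH")  # signature, block length, path length
# Assumption: type (1 byte), compression (1 byte), compressed size (4),
# uncompressed size (4), permissions (4).
EDB_FIXED = struct.Struct("<BBIII")


def parse_entity_description(data: bytes, offset: int = 0) -> dict:
    """Decode one Entity Description Block starting at offset."""
    signature, block_length, path_length = EDB_PREFIX.unpack_from(data, offset)
    if signature != b"JPF":
        raise ValueError("missing JPF signature")
    path_start = offset + EDB_PREFIX.size
    path_end = path_start + path_length
    entity_type, compression, compressed, uncompressed, permissions = (
        EDB_FIXED.unpack_from(data, path_end))
    block_end = offset + block_length
    return {
        "path": data[path_start:path_end].decode("utf-8"),
        "type": entity_type,
        "compression": compression,
        "compressed_size": compressed,
        "uncompressed_size": uncompressed,
        "permissions": permissions,
        # Whatever remains inside the block is extra-field data.
        "extra_fields": data[path_end + EDB_FIXED.size:block_end],
        "next_offset": block_end,
    }


# Usage: a deflate-compressed file entry without extra fields.
path = "images/logo.png".encode("utf-8")
block_size = EDB_PREFIX.size + len(path) + EDB_FIXED.size
sample_block = (EDB_PREFIX.pack(b"JPF", block_size, len(path)) + path
                + EDB_FIXED.pack(0x01, 0x01, 120, 300, 0o644))
entity = parse_entity_description(sample_block)
```

Using the block length to find the end of the block, rather than summing the known fields, lets a reader skip extra fields it does not understand.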
A signature denoting the data stored in the extra field
The length (in bytes) of the Extra Field Data
The internal structure varies by the type of the Extra Field, as noted in the Extra Field Identifier
Its purpose is to store the date and time the file was modified. This extra field should be ignored for directories and symlinks, or - if present - the Timestamp should be set to 0x00000000. Its format is:
The bytes 0x00 0x01
The value 0x08 stored in little-endian format
A 4-byte UNIX timestamp of the file's modification time, as returned by filemtime().
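A minimal sketch of decoding this extra field follows. It assumes the length value is a 2-byte unsigned short and the timestamp an unsigned 32-bit integer; under those assumptions the whole field occupies 8 bytes, which matches the stated length value of 0x08. The function name is hypothetical.

```python
import struct
from datetime import datetime, timezone

TIMESTAMP_SIGNATURE = b"\x00\x01"
# Assumption: 2-byte signature, 2-byte length, 4-byte unsigned timestamp.
TIMESTAMP_FIELD = struct.Struct("<2sHI")


def parse_timestamp_field(data: bytes, offset: int = 0) -> datetime:
    """Return the modification time stored in a Timestamp extra field (UTC)."""
    signature, length, timestamp = TIMESTAMP_FIELD.unpack_from(data, offset)
    if signature != TIMESTAMP_SIGNATURE:
        raise ValueError("not a Timestamp extra field")
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# Usage: encode a known UNIX timestamp and read it back.
ts_field = TIMESTAMP_FIELD.pack(TIMESTAMP_SIGNATURE, 0x08, 1684108800)
modified = parse_timestamp_field(ts_field)
```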
Added in Version 1.3.
This field stores the file sizes (compressed and uncompressed) as 64-bit unsigned long long integers. Its format is:
The bytes 0x00 0x02
The value 0x14 stored in little-endian format
An unsigned long long (64-bit) integer in little endian format representing the size of the File Data Block in bytes. For directories, symlinks and zero-sized files it is zero (0x00000000).
An unsigned long long (64-bit) integer in little endian format representing the size of the resulting file in bytes. For directories, symlinks and zero-sized files it is zero (0x00000000).
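The 64-bit file size field can be read in the same way. The sketch assumes a 2-byte length value; with that assumption the field occupies 20 bytes in total, matching the stated length value of 0x14. The function name is hypothetical.

```python
import struct

FILE_SIZE_SIGNATURE = b"\x00\x02"
# Assumption: 2-byte signature, 2-byte length, then two 64-bit sizes.
FILE_SIZE_FIELD = struct.Struct("<2sHQQ")


def parse_file_size_field(data: bytes, offset: int = 0) -> tuple:
    """Return (compressed_size, uncompressed_size) from the 64-bit size field."""
    signature, length, compressed, uncompressed = FILE_SIZE_FIELD.unpack_from(
        data, offset)
    if signature != FILE_SIZE_SIGNATURE:
        raise ValueError("not a File Size extra field")
    return compressed, uncompressed


# Usage: a file whose sizes do not fit in 32 bits.
size_field = FILE_SIZE_FIELD.pack(FILE_SIZE_SIGNATURE, 0x14, 6 * 2**30, 9 * 2**30)
sizes = parse_file_size_field(size_field)
```

A version 1.3 reader would use these values in place of the 32-bit sizes read from the Entity Description Block.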
The File Data Block is only present if the Entity is a file with a non-zero file size. It can consist of one and only one of the following, depending on the Compression Type:
Binary dump of file contents or textual representation of the symlink's target, for CT=0x00
Gzip compression output, without the trailing Adler32 checksum, for CT=0x01
Bzip2 compression output, for CT=0x02
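The three cases above might be handled as sketched below. One point is an assumption: the text only says the Adler32 checksum is absent, and this sketch further assumes the data is a raw DEFLATE stream with no zlib header either. If a reader finds a 2-byte zlib header in real archives, the `wbits` argument would need to change. The function name is hypothetical.

```python
import bz2
import zlib


def decode_file_data(block: bytes, compression_type: int) -> bytes:
    """Turn a File Data Block back into the original file contents."""
    if compression_type == 0x00:
        # Stored as-is.
        return block
    if compression_type == 0x01:
        # Assumption: raw DEFLATE, i.e. no zlib header and no checksum.
        return zlib.decompress(block, wbits=-15)
    if compression_type == 0x02:
        return bz2.decompress(block)
    raise ValueError(f"unknown compression type: {compression_type:#04x}")


# Usage: round-trip some data through a raw DEFLATE stream.
original = b"JPA example data " * 50
compressor = zlib.compressobj(wbits=-15)
raw_deflate = compressor.compress(original) + compressor.flush()
restored = decode_file_data(raw_deflate, 0x01)
```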
Revision History

Date | Author | Description
---|---|---
June 2009 | NKD | Updated to format version 1.1, fixed incorrect descriptions of header signatures
May 2023 | NKD | Updated to format version 1.3