| File: APPNOTE.TXT - .ZIP File Format Specification | |
| Version: 6.3.3 | |
| Status: Final - replaces version 6.3.2 | |
| Revised: September 1, 2012 | |
| Copyright (c) 1989 - 2012 PKWARE Inc., All Rights Reserved. | |
| 1.0 Introduction | |
| --------------- | |
| 1.1 Purpose | |
| ----------- | |
| 1.1.1 This specification is intended to define a cross-platform, | |
| interoperable file storage and transfer format. Since its | |
| first publication in 1989, PKWARE, Inc. ("PKWARE") has remained | |
| committed to ensuring the interoperability of the .ZIP file | |
| format through periodic publication and maintenance of this | |
| specification. We trust that all .ZIP compatible vendors and | |
| application developers that use and benefit from this format | |
| will share and support this commitment to interoperability. | |
| 1.2 Scope | |
| --------- | |
| 1.2.1 ZIP is one of the most widely used compressed file formats. It is | |
| universally used to aggregate, compress, and encrypt files into a single | |
| interoperable container. No specific use or application need is | |
| defined by this format and no specific implementation guidance is | |
| provided. This document provides details on the storage format for | |
| creating ZIP files. Information is provided on the records and | |
| fields that describe what a ZIP file is. | |
| 1.3 Trademarks | |
| -------------- | |
| 1.3.1 PKWARE, PKZIP, SecureZIP, and PKSFX are registered trademarks of | |
| PKWARE, Inc. in the United States and elsewhere. PKPatchMaker, | |
| Deflate64, and ZIP64 are trademarks of PKWARE, Inc. Other marks | |
| referenced within this document appear for identification | |
| purposes only and are the property of their respective owners. | |
| 1.4 Permitted Use | |
| ----------------- | |
| 1.4.1 This document, "APPNOTE.TXT - .ZIP File Format Specification", is the | |
| exclusive property of PKWARE. Use of the information contained in this | |
| document is permitted solely for the purpose of creating products, | |
| programs and processes that read and write files in the ZIP format | |
| subject to the terms and conditions herein. | |
| 1.4.2 Use of the content of this document within other publications is | |
| permitted only through reference to this document. Any reproduction | |
| or distribution of this document in whole or in part without prior | |
| written permission from PKWARE is strictly prohibited. | |
| 1.4.3 Certain technological components provided in this document are the | |
| patented proprietary technology of PKWARE and as such require a | |
| separate, executed license agreement from PKWARE. Applicable | |
| components are marked with the following, or similar, statement: | |
| 'Refer to the section in this document entitled "Incorporating | |
| PKWARE Proprietary Technology into Your Product" for more information'. | |
| 1.5 Contacting PKWARE | |
| --------------------- | |
| 1.5.1 If you have questions on this format, its use, or licensing, or if you | |
| wish to report defects, request changes or additions, please contact: | |
| PKWARE, Inc. | |
| 648 N. Plankinton Avenue, Suite 220 | |
| Milwaukee, WI 53203 | |
| +1-414-289-9788 | |
| +1-414-289-9789 FAX | |
| zipformat@pkware.com | |
| 1.5.2 Information about this format and copies of this document are publicly | |
| available at: | |
| http://www.pkware.com/appnote | |
| 1.6 Disclaimer | |
| -------------- | |
| 1.6.1 Although PKWARE will attempt to supply current and accurate | |
| information relating to its file formats, algorithms, and the | |
| subject programs, the possibility of error or omission cannot | |
| be eliminated. PKWARE therefore expressly disclaims any warranty | |
| that the information contained in the associated materials relating | |
| to the subject programs and/or the format of the files created or | |
| accessed by the subject programs and/or the algorithms used by | |
| the subject programs, or any other matter, is current, correct or | |
| accurate as delivered. Any risk of damage due to any possible | |
| inaccurate information is assumed by the user of the information. | |
| Furthermore, the information relating to the subject programs | |
| and/or the file formats created or accessed by the subject | |
| programs and/or the algorithms used by the subject programs is | |
| subject to change without notice. | |
| 2.0 Revisions | |
| -------------- | |
| 2.1 Document Status | |
| -------------------- | |
| 2.1.1 If the STATUS of this file is marked as DRAFT, the content | |
| defines proposed revisions to this specification which may consist | |
| of changes to the ZIP format itself, or that may consist of other | |
| content changes to this document. Versions of this document and | |
| the format in DRAFT form may be subject to modification prior to | |
| publication STATUS of FINAL. DRAFT versions are published periodically | |
| to provide notification to the ZIP community of pending changes and to | |
| provide opportunity for review and comment. | |
| 2.1.2 Versions of this document having a STATUS of FINAL are | |
| considered to be in the final form for that version of the document | |
| and are not subject to further change until a new, higher version | |
| numbered document is published. Newer versions of this format | |
| specification are intended to remain interoperable with all prior | |
| versions whenever technically possible. | |
| 2.2 Change Log | |
| -------------- | |
| Version Change Description Date | |
| ------- ------------------ ---------- | |
| 5.2 -Single Password Symmetric Encryption 06/02/2003 | |
| storage | |
| 6.1.0 -Smartcard compatibility 01/20/2004 | |
| -Documentation on certificate storage | |
| 6.2.0 -Introduction of Central Directory 04/26/2004 | |
| Encryption for encrypting metadata | |
| -Added OS X to Version Made By values | |
| 6.2.1 -Added Extra Field placeholder for 04/01/2005 | |
| POSZIP using ID 0x4690 | |
| -Clarified size field on | |
| "zip64 end of central directory record" | |
| 6.2.2 -Documented Final Feature Specification 01/06/2006 | |
| for Strong Encryption | |
| -Clarifications and typographical | |
| corrections | |
| 6.3.0 -Added tape positioning storage 09/29/2006 | |
| parameters | |
| -Expanded list of supported hash algorithms | |
| -Expanded list of supported compression | |
| algorithms | |
| -Expanded list of supported encryption | |
| algorithms | |
| -Added option for Unicode filename | |
| storage | |
| -Clarifications for consistent use | |
| of Data Descriptor records | |
| -Added additional "Extra Field" | |
| definitions | |
| 6.3.1 -Corrected standard hash values for 04/11/2007 | |
| SHA-256/384/512 | |
| 6.3.2 -Added compression method 97 09/28/2007 | |
| -Documented InfoZIP "Extra Field" | |
| values for UTF-8 file name and | |
| file comment storage | |
| 6.3.3 -Formatting changes to support 09/01/2012 | |
| easier referencing of this APPNOTE | |
| from other documents and standards | |
| 3.0 Notations | |
| ------------- | |
| 3.1 Use of the term MUST or SHALL indicates a required element. | |
| 3.2 MUST NOT or SHALL NOT indicates an element is prohibited from use. | |
| 3.3 SHOULD indicates a RECOMMENDED element. | |
| 3.4 SHOULD NOT indicates an element NOT RECOMMENDED for use. | |
| 3.5 MAY indicates an OPTIONAL element. | |
| 4.0 ZIP Files | |
| ------------- | |
| 4.1 What is a ZIP file | |
| ---------------------- | |
| 4.1.1 ZIP files MAY be identified by the standard .ZIP file extension | |
| although use of a file extension is not required. Use of the | |
| extension .ZIPX is also recognized and MAY be used for ZIP files. | |
| Other common file extensions using the ZIP format include .JAR, .WAR, | |
| .DOCX, .XLSX, .PPTX, .ODT, .ODS, .ODP and others. Programs reading or | |
| writing ZIP files SHOULD rely on internal record signatures described | |
| in this document to identify files in this format. | |
| 4.1.2 ZIP files SHOULD contain at least one file and MAY contain | |
| multiple files. | |
| 4.1.3 Data compression MAY be used to reduce the size of files | |
| placed into a ZIP file, but is not required. This format supports the | |
| use of multiple data compression algorithms. When compression is used, | |
| one of the documented compression algorithms MUST be used. Implementors | |
| are advised to experiment with their data to determine which of the | |
| available algorithms provides the best compression for their needs. | |
| Compression method 8 (Deflate) is the method used by default by most | |
| ZIP compatible application programs. | |
| 4.1.4 Data encryption MAY be used to protect files within a ZIP file. | |
| Keying methods supported for encryption within this format include | |
| passwords and public/private keys. Either MAY be used individually | |
| or in combination. Encryption MAY be applied to individual files. | |
| Additional security MAY be used through the encryption of ZIP file | |
| metadata stored within the Central Directory. See the section on the | |
| Strong Encryption Specification for information. Refer to the section | |
| in this document entitled "Incorporating PKWARE Proprietary Technology | |
| into Your Product" for more information. | |
| 4.1.5 Data integrity MUST be provided for each file using CRC32. | |
| 4.1.6 Additional data integrity MAY be included through the use of | |
| digital signatures. Individual files MAY be signed with one or more | |
| digital signatures. The Central Directory, if signed, MUST use a | |
| single signature. | |
| 4.1.7 Files MAY be placed within a ZIP file uncompressed or stored. | |
| The term "stored" as used in the context of this document means the file | |
| is copied into the ZIP file uncompressed. | |
| 4.1.8 Each data file placed into a ZIP file MAY be compressed, stored, | |
| encrypted or digitally signed independent of how other data files in the | |
| same ZIP file are archived. | |
| 4.1.9 ZIP files MAY be streamed, split into segments (on fixed or on | |
| removable media) or "self-extracting". Self-extracting ZIP | |
| files MUST include extraction code for a target platform within | |
| the ZIP file. | |
| 4.1.10 Extensibility is provided for platform or application specific | |
| needs through extra data fields that MAY be defined for custom | |
| purposes. Extra data definitions MUST NOT conflict with existing | |
| documented record definitions. | |
| 4.1.11 Common uses for ZIP MAY also include the use of manifest files. | |
| Manifest files store application specific information within a file stored | |
| within the ZIP file. This manifest file SHOULD be the first file in the | |
| ZIP file. This specification does not provide any information or guidance on | |
| the use of manifest files within ZIP files. Refer to the application developer | |
| for information on using manifest files and for any additional profile | |
| information on using ZIP within an application. | |
| 4.1.12 ZIP files MAY be placed within other ZIP files. | |
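
As a minimal sketch of the CRC32 requirement in 4.1.5, the following Python snippet computes the value a writer would place in the crc-32 field. It assumes Python's standard zlib module, whose crc32 function computes the same CRC-32 used by ZIP. The helper names crc32_of and crc32_of_chunks are illustrative only and are not defined by this specification.

```python
import zlib
from typing import Iterable


def crc32_of(data: bytes) -> int:
    """Return the CRC-32 of data as an unsigned 32-bit value."""
    return zlib.crc32(data) & 0xFFFFFFFF


def crc32_of_chunks(chunks: Iterable[bytes]) -> int:
    """Compute the same CRC-32 incrementally, e.g. while streaming a file."""
    crc = 0
    for chunk in chunks:
        # zlib.crc32 accepts the running value as its second argument.
        crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF
```

The incremental form is useful when a file is too large to hold in memory, since the CRC can be accumulated while the data is being compressed or stored.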
| 4.2 ZIP Metadata | |
| ---------------- | |
| 4.2.1 ZIP files are identified by metadata consisting of defined record types | |
| containing the storage information necessary for maintaining the files | |
| placed into a ZIP file. Each record type MUST be identified using a header | |
| signature that identifies the record type. Signature values begin with the | |
| two byte constant marker of 0x4b50, representing the characters "PK". | |
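
The signature convention in 4.2.1 can be sketched in Python as follows. The table of signature values is taken from the record layouts in section 4.3 of this document. The function name identify_record and the label "unknown PK record" are hypothetical. The sketch identifies a single record at a given offset; it does not by itself decide whether a whole file is a ZIP file, since a self-extracting file need not begin with a record.

```python
import struct
from typing import Optional

PK_MARKER = 0x4B50  # low-order two bytes of every signature: "PK"

# Signature values as listed in section 4.3 of this document.
KNOWN_SIGNATURES = {
    0x04034B50: "local file header",
    0x08064B50: "archive extra data record",
    0x02014B50: "central file header",
    0x05054B50: "digital signature",
    0x06064B50: "zip64 end of central directory record",
    0x07064B50: "zip64 end of central directory locator",
    0x06054B50: "end of central directory record",
    0x08074B50: "data descriptor (commonly adopted, see 4.3.9.3)",
}


def identify_record(buf: bytes, offset: int = 0) -> Optional[str]:
    """Name the record starting at offset, or return None if no PK marker."""
    if offset + 4 > len(buf):
        return None
    # Signatures are stored little-endian, like all other values.
    (signature,) = struct.unpack_from("<I", buf, offset)
    if signature & 0xFFFF != PK_MARKER:
        return None
    return KNOWN_SIGNATURES.get(signature, "unknown PK record")
```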
| 4.3 General Format of a .ZIP file | |
| --------------------------------- | |
| 4.3.1 A ZIP file MUST contain an "end of central directory record". A ZIP | |
| file containing only an "end of central directory record" is considered an | |
| empty ZIP file. Files may be added or replaced within a ZIP file, or deleted. | |
| A ZIP file MUST have only one "end of central directory record". Other | |
| records defined in this specification MAY be used as needed to support | |
| storage requirements for individual ZIP files. | |
| 4.3.2 Each file placed into a ZIP file MUST be preceded by a "local | |
| file header" record for that file. Each "local file header" MUST be | |
| accompanied by a corresponding "central directory header" record within | |
| the central directory section of the ZIP file. | |
| 4.3.3 Files MAY be stored in arbitrary order within a ZIP file. A ZIP | |
| file MAY span multiple volumes or it MAY be split into user-defined | |
| segment sizes. All values MUST be stored in little-endian byte order unless | |
| otherwise specified in this document for a specific data element. | |
| 4.3.4 Compression MUST NOT be applied to a "local file header", an "encryption | |
| header", or an "end of central directory record". Individual "central | |
| directory records" MUST NOT be compressed, but the aggregate of all central | |
| directory records MAY be compressed. | |
| 4.3.5 File data MAY be followed by a "data descriptor" for the file. Data | |
| descriptors are used to facilitate ZIP file streaming. | |
| 4.3.6 Overall .ZIP file format: | |
| [local file header 1] | |
| [encryption header 1] | |
| [file data 1] | |
| [data descriptor 1] | |
| . | |
| . | |
| . | |
| [local file header n] | |
| [encryption header n] | |
| [file data n] | |
| [data descriptor n] | |
| [archive decryption header] | |
| [archive extra data record] | |
| [central directory header 1] | |
| . | |
| . | |
| . | |
| [central directory header n] | |
| [zip64 end of central directory record] | |
| [zip64 end of central directory locator] | |
| [end of central directory record] | |
| 4.3.7 Local file header: | |
| local file header signature 4 bytes (0x04034b50) | |
| version needed to extract 2 bytes | |
| general purpose bit flag 2 bytes | |
| compression method 2 bytes | |
| last mod file time 2 bytes | |
| last mod file date 2 bytes | |
| crc-32 4 bytes | |
| compressed size 4 bytes | |
| uncompressed size 4 bytes | |
| file name length 2 bytes | |
| extra field length 2 bytes | |
| file name (variable size) | |
| extra field (variable size) | |
| 4.3.8 File data | |
| Immediately following the local header for a file | |
| SHOULD be placed the compressed or stored data for the file. | |
| If the file is encrypted, the encryption header for the file | |
| SHOULD be placed after the local header and before the file | |
| data. The series of [local file header][encryption header] | |
| [file data][data descriptor] repeats for each file in the | |
| .ZIP archive. | |
| Zero-byte files, directories, and other file types that | |
| contain no content MUST NOT include file data. | |
| 4.3.9 Data descriptor: | |
| crc-32 4 bytes | |
| compressed size 4 bytes | |
| uncompressed size 4 bytes | |
| 4.3.9.1 This descriptor MUST exist if bit 3 of the general | |
| purpose bit flag is set (see below). It is byte aligned | |
| and immediately follows the last byte of compressed data. | |
| This descriptor SHOULD be used only when it was not possible to | |
| seek in the output .ZIP file, e.g., when the output .ZIP file | |
| was standard output or a non-seekable device. For ZIP64(tm) format | |
| archives, the compressed and uncompressed sizes are 8 bytes each. | |
| 4.3.9.2 When compressing files, compressed and uncompressed sizes | |
| SHOULD be stored in ZIP64 format (as 8 byte values) when a | |
| file's size exceeds 0xFFFFFFFF. However, ZIP64 format MAY be | |
| used regardless of the size of a file. When extracting, if | |
| the zip64 extended information extra field is present for | |
| the file the compressed and uncompressed sizes will be 8 | |
| byte values. | |
| 4.3.9.3 Although not originally assigned a signature, the value | |
| 0x08074b50 has commonly been adopted as a signature value | |
| for the data descriptor record. Implementors should be | |
| aware that ZIP files may be encountered with or without this | |
| signature marking data descriptors and SHOULD account for | |
| either case when reading ZIP files to ensure compatibility. | |
| 4.3.9.4 When writing ZIP files, implementors SHOULD include the | |
| signature value marking the data descriptor record. When | |
| the signature is used, the fields currently defined for | |
| the data descriptor record will immediately follow the | |
| signature. | |
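
A reader that accepts data descriptors both with and without the 0x08074b50 signature (4.3.9.3), and a writer that always emits it (4.3.9.4), might look like the following Python sketch. The function names are hypothetical. The sketch assumes the caller already knows whether the entry uses ZIP64 sizes. A CRC value can coincidentally equal the signature, so a careful reader would cross-check the result against the central directory; that check is omitted here.

```python
import struct
from typing import Tuple

DATA_DESCRIPTOR_SIGNATURE = 0x08074B50


def parse_data_descriptor(
    buf: bytes, offset: int = 0, zip64: bool = False
) -> Tuple[int, int, int, int]:
    """Return (crc32, compressed_size, uncompressed_size, bytes_consumed)."""
    start = offset
    (first,) = struct.unpack_from("<I", buf, offset)
    if first == DATA_DESCRIPTOR_SIGNATURE:
        offset += 4  # optional signature present; skip it
    # Sizes are 8 bytes each for ZIP64 archives, otherwise 4 bytes each.
    layout = struct.Struct("<IQQ" if zip64 else "<III")
    crc32, compressed_size, uncompressed_size = layout.unpack_from(buf, offset)
    return crc32, compressed_size, uncompressed_size, offset + layout.size - start


def build_data_descriptor(
    crc32: int, compressed_size: int, uncompressed_size: int, zip64: bool = False
) -> bytes:
    """Build a descriptor that includes the signature, as 4.3.9.4 recommends."""
    fmt = "<IIQQ" if zip64 else "<IIII"
    return struct.pack(
        fmt, DATA_DESCRIPTOR_SIGNATURE, crc32, compressed_size, uncompressed_size
    )
```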
| 4.3.9.5 An extensible data descriptor will be released in a | |
| future version of this APPNOTE. This new record is intended to | |
| resolve conflicts with the use of this record going forward, | |
| and to provide better support for streamed file processing. | |
| 4.3.9.6 When the Central Directory Encryption method is used, | |
| the data descriptor record is not required, but MAY be used. | |
| If present, and bit 3 of the general purpose bit field is set to | |
| indicate its presence, the values in fields of the data descriptor | |
| record MUST be set to binary zeros. See the section on the Strong | |
| Encryption Specification for information. Refer to the section in | |
| this document entitled "Incorporating PKWARE Proprietary Technology | |
| into Your Product" for more information. | |
| 4.3.10 Archive decryption header: | |
| 4.3.10.1 The Archive Decryption Header is introduced in version 6.2 | |
| of the ZIP format specification. This record exists in support | |
| of the Central Directory Encryption Feature implemented as part of | |
| the Strong Encryption Specification as described in this document. | |
| When the Central Directory Structure is encrypted, this decryption | |
| header MUST precede the encrypted data segment. | |
| 4.3.10.2 The encrypted data segment SHALL consist of the Archive | |
| extra data record (if present) and the encrypted Central Directory | |
| Structure data. The format of this data record is identical to the | |
| Decryption header record preceding compressed file data. If the | |
| central directory structure is encrypted, the location of the start of | |
| this data record is determined using the Start of Central Directory | |
| field in the Zip64 End of Central Directory record. See the | |
| section on the Strong Encryption Specification for information | |
| on the fields used in the Archive Decryption Header record. | |
| Refer to the section in this document entitled "Incorporating | |
| PKWARE Proprietary Technology into Your Product" for more information. | |
| 4.3.11 Archive extra data record: | |
| archive extra data signature 4 bytes (0x08064b50) | |
| extra field length 4 bytes | |
| extra field data (variable size) | |
| 4.3.11.1 The Archive Extra Data Record is introduced in version 6.2 | |
| of the ZIP format specification. This record MAY be used in support | |
| of the Central Directory Encryption Feature implemented as part of | |
| the Strong Encryption Specification as described in this document. | |
| When present, this record MUST immediately precede the central | |
| directory data structure. | |
| 4.3.11.2 The size of this data record SHALL be included in the | |
| Size of the Central Directory field in the End of Central | |
| Directory record. If the central directory structure is compressed, | |
| but not encrypted, the location of the start of this data record is | |
| determined using the Start of Central Directory field in the Zip64 | |
| End of Central Directory record. Refer to the section in this document | |
| entitled "Incorporating PKWARE Proprietary Technology into Your | |
| Product" for more information. | |
| 4.3.12 Central directory structure: | |
| [central directory header 1] | |
| . | |
| . | |
| . | |
| [central directory header n] | |
| [digital signature] | |
| File header: | |
| central file header signature 4 bytes (0x02014b50) | |
| version made by 2 bytes | |
| version needed to extract 2 bytes | |
| general purpose bit flag 2 bytes | |
| compression method 2 bytes | |
| last mod file time 2 bytes | |
| last mod file date 2 bytes | |
| crc-32 4 bytes | |
| compressed size 4 bytes | |
| uncompressed size 4 bytes | |
| file name length 2 bytes | |
| extra field length 2 bytes | |
| file comment length 2 bytes | |
| disk number start 2 bytes | |
| internal file attributes 2 bytes | |
| external file attributes 4 bytes | |
| relative offset of local header 4 bytes | |
| file name (variable size) | |
| extra field (variable size) | |
| file comment (variable size) | |
| 4.3.13 Digital signature: | |
| header signature 4 bytes (0x05054b50) | |
| size of data 2 bytes | |
| signature data (variable size) | |
| With the introduction of the Central Directory Encryption | |
| feature in version 6.2 of this specification, the Central | |
| Directory Structure MAY be stored both compressed and encrypted. | |
| Although not required, it is assumed when encrypting the | |
| Central Directory Structure, that it will be compressed | |
| for greater storage efficiency. Information on the | |
| Central Directory Encryption feature can be found in the section | |
| describing the Strong Encryption Specification. The Digital | |
| Signature record will be neither compressed nor encrypted. | |
| 4.3.14 Zip64 end of central directory record | |
| zip64 end of central dir | |
| signature 4 bytes (0x06064b50) | |
| size of zip64 end of central | |
| directory record 8 bytes | |
| version made by 2 bytes | |
| version needed to extract 2 bytes | |
| number of this disk 4 bytes | |
| number of the disk with the | |
| start of the central directory 4 bytes | |
| total number of entries in the | |
| central directory on this disk 8 bytes | |
| total number of entries in the | |
| central directory 8 bytes | |
| size of the central directory 8 bytes | |
| offset of start of central | |
| directory with respect to | |
| the starting disk number 8 bytes | |
| zip64 extensible data sector (variable size) | |
| 4.3.14.1 The value stored into the "size of zip64 end of central | |
| directory record" should be the size of the remaining | |
| record and should not include the leading 12 bytes. | |
| Size = SizeOfFixedFields + SizeOfVariableData - 12. | |
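
As a worked example of the formula in 4.3.14.1, the fixed fields of the Version 1 record listed above total 56 bytes, so a record with no extensible data sector stores 56 - 12 = 44 in its size field. The Python sketch below derives the same number from the field layout; the names used are illustrative only.

```python
import struct

# signature, size of record, version made by, version needed, this disk,
# disk with central directory start, entries on this disk, total entries,
# central directory size, central directory offset
ZIP64_EOCD_FIXED = struct.Struct("<IQHHIIQQQQ")  # 56 bytes
LEADING_BYTES = 12  # signature (4 bytes) + the size field itself (8 bytes)


def zip64_eocd_size_field(extensible_data_sector: bytes = b"") -> int:
    """Value to store in 'size of zip64 end of central directory record'."""
    return ZIP64_EOCD_FIXED.size + len(extensible_data_sector) - LEADING_BYTES
```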
| 4.3.14.2 The above record structure defines Version 1 of the | |
| zip64 end of central directory record. Version 1 was | |
| implemented in versions of this specification preceding | |
| 6.2 in support of the ZIP64 large file feature. The | |
| introduction of the Central Directory Encryption feature | |
| implemented in version 6.2 as part of the Strong Encryption | |
| Specification defines Version 2 of this record structure. | |
| Refer to the section describing the Strong Encryption | |
| Specification for details on the version 2 format for | |
| this record. Refer to the section in this document entitled | |
| "Incorporating PKWARE Proprietary Technology into Your Product" | |
| for more information applicable to use of Version 2 of this | |
| record. | |
| 4.3.14.3 Special purpose data MAY reside in the zip64 extensible | |
| data sector field following either a V1 or V2 version of this | |
| record. To ensure identification of this special purpose data | |
| it MUST include an identifying header block consisting of the | |
| following: | |
| Header ID - 2 bytes | |
| Data Size - 4 bytes | |
| The Header ID field indicates the type of data that is in the | |
| data block that follows. | |
| Data Size identifies the number of bytes that follow for this | |
| data block type. | |
| 4.3.14.4 Multiple special purpose data blocks MAY be present. | |
| Each MUST be preceded by a Header ID and Data Size field. Current | |
| mappings of Header ID values supported in this field are as | |
| defined in APPENDIX C. | |
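
The block structure described in 4.3.14.3 and 4.3.14.4 (a 2 byte Header ID, a 4 byte Data Size, then the data) can be walked with a loop such as the following Python sketch. The function name is hypothetical, and the sketch only splits the sector into blocks; it does not interpret any Header ID value.

```python
import struct
from typing import Iterator, Tuple

_BLOCK_HEADER = struct.Struct("<HI")  # Header ID (2 bytes), Data Size (4 bytes)


def iter_zip64_extensible_blocks(sector: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (header_id, data) for each block in a zip64 extensible data sector."""
    offset = 0
    while offset < len(sector):
        if offset + _BLOCK_HEADER.size > len(sector):
            raise ValueError("truncated block header")
        header_id, data_size = _BLOCK_HEADER.unpack_from(sector, offset)
        offset += _BLOCK_HEADER.size
        if offset + data_size > len(sector):
            raise ValueError("block data extends past end of sector")
        yield header_id, sector[offset:offset + data_size]
        offset += data_size
```

Note that the Data Size field here is 4 bytes wide, so this loop is specific to the zip64 extensible data sector.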
| 4.3.15 Zip64 end of central directory locator | |
| zip64 end of central dir locator | |
| signature 4 bytes (0x07064b50) | |
| number of the disk with the | |
| start of the zip64 end of | |
| central directory 4 bytes | |
| relative offset of the zip64 | |
| end of central directory record 8 bytes | |
| total number of disks 4 bytes | |
| 4.3.16 End of central directory record: | |
| end of central dir signature 4 bytes (0x06054b50) | |
| number of this disk 2 bytes | |
| number of the disk with the | |
| start of the central directory 2 bytes | |
| total number of entries in the | |
| central directory on this disk 2 bytes | |
| total number of entries in | |
| the central directory 2 bytes | |
| size of the central directory 4 bytes | |
| offset of start of central | |
| directory with respect to | |
| the starting disk number 4 bytes | |
| .ZIP file comment length 2 bytes | |
| .ZIP file comment (variable size) | |
| 4.4 Explanation of fields | |
| -------------------------- | |
| 4.4.1 General notes on fields | |
| 4.4.1.1 All fields unless otherwise noted are unsigned and stored | |
| in Intel low-byte:high-byte, low-word:high-word order. | |
| 4.4.1.2 String fields are not null terminated, since the length | |
| is given explicitly. | |
| 4.4.1.3 The entries in the central directory may not necessarily | |
| be in the same order that files appear in the .ZIP file. | |
| 4.4.1.4 If one of the fields in the end of central directory | |
| record is too small to hold required data, the field should be | |
| set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record | |
| should be created. | |
| 4.4.1.5 The end of central directory record and the Zip64 end | |
| of central directory locator record MUST reside on the same | |
| disk when splitting or spanning an archive. | |
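
The overflow rule in 4.4.1.4 can be sketched in Python as shown below. The helper names are hypothetical. The sketch assumes that a value exactly equal to 0xFFFF or 0xFFFFFFFF is also treated as overflow, because a reader cannot distinguish it from the marker value; this document does not state that case explicitly.

```python
def eocd_field_value(value: int, width_bytes: int) -> int:
    """Return value, or the all-ones marker if it does not fit the field."""
    if value < 0:
        raise ValueError("fields are unsigned")
    marker = (1 << (8 * width_bytes)) - 1  # 0xFFFF or 0xFFFFFFFF
    # Assumption: a value equal to the marker is also redirected to ZIP64.
    return value if value < marker else marker


def needs_zip64_record(total_entries: int, cd_size: int, cd_offset: int) -> bool:
    """True if any end of central directory field would hold the marker."""
    return (
        eocd_field_value(total_entries, 2) == 0xFFFF
        or eocd_field_value(cd_size, 4) == 0xFFFFFFFF
        or eocd_field_value(cd_offset, 4) == 0xFFFFFFFF
    )
```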
| 4.4.2 version made by (2 bytes) | |
| 4.4.2.1 The upper byte indicates the compatibility of the file | |
| attribute information. If the external file attributes | |
| are compatible with MS-DOS and can be read by PKZIP for | |
| DOS version 2.04g then this value will be zero. If these | |
| attributes are not compatible, then this value will | |
| identify the host system on which the attributes are | |
| compatible. Software can use this information to determine | |
| the line record format for text files etc. | |
| 4.4.2.2 The current mappings are: | |
| 0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems) | |
| 1 - Amiga | |
| 2 - OpenVMS | |
| 3 - UNIX | |
| 4 - VM/CMS | |
| 5 - Atari ST | |
| 6 - OS/2 H.P.F.S. | |
| 7 - Macintosh | |
| 8 - Z-System | |
| 9 - CP/M | |
| 10 - Windows NTFS | |
| 11 - MVS (OS/390 - Z/OS) | |
| 12 - VSE | |
| 13 - Acorn Risc | |
| 14 - VFAT | |
| 15 - alternate MVS | |
| 16 - BeOS | |
| 17 - Tandem | |
| 18 - OS/400 | |
| 19 - OS X (Darwin) | |
| 20 thru 255 - unused | |
| 4.4.2.3 The lower byte indicates the ZIP specification version | |
| (the version of this document) supported by the software | |
| used to encode the file. The value/10 indicates the major | |
| version number, and the value mod 10 is the minor version | |
| number. | |
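
The split of the "version made by" field into an upper and a lower byte, and the value/10 and value mod 10 rule in 4.4.2.3, can be illustrated with the following Python sketch. The function name and the "unused" fallback label are illustrative; the host system names are copied from the mappings in 4.4.2.2.

```python
from typing import Tuple

HOST_SYSTEMS = {
    0: "MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)",
    1: "Amiga", 2: "OpenVMS", 3: "UNIX", 4: "VM/CMS", 5: "Atari ST",
    6: "OS/2 H.P.F.S.", 7: "Macintosh", 8: "Z-System", 9: "CP/M",
    10: "Windows NTFS", 11: "MVS (OS/390 - Z/OS)", 12: "VSE",
    13: "Acorn Risc", 14: "VFAT", 15: "alternate MVS", 16: "BeOS",
    17: "Tandem", 18: "OS/400", 19: "OS X (Darwin)",
}


def decode_version_made_by(value: int) -> Tuple[str, int, int]:
    """Return (host system name, major version, minor version)."""
    host = (value >> 8) & 0xFF   # upper byte: attribute compatibility
    spec = value & 0xFF          # lower byte: specification version
    return HOST_SYSTEMS.get(host, "unused"), spec // 10, spec % 10
```

For example, a stored value of 0x031E has an upper byte of 3 (UNIX) and a lower byte of 30, which denotes version 3.0.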
| 4.4.3 version needed to extract (2 bytes) | |
| 4.4.3.1 The minimum supported ZIP specification version needed | |
| to extract the file, mapped as above. This value is based on | |
| the specific format features a ZIP program MUST support to | |
| be able to extract the file. If multiple features are | |
| applied to a file, the minimum version MUST be set to the | |
| feature having the highest value. New features or feature | |
| changes affecting the published format specification will be | |
| implemented using higher version numbers than the last | |
| published value to avoid conflict. | |
| 4.4.3.2 Current minimum feature versions are as defined below: | |
| 1.0 - Default value | |
| 1.1 - File is a volume label | |
| 2.0 - File is a folder (directory) | |
| 2.0 - File is compressed using Deflate compression | |
| 2.0 - File is encrypted using traditional PKWARE encryption | |
| 2.1 - File is compressed using Deflate64(tm) | |
| 2.5 - File is compressed using PKWARE DCL Implode | |
| 2.7 - File is a patch data set | |
| 4.5 - File uses ZIP64 format extensions | |
| 4.6 - File is compressed using BZIP2 compression* | |
| 5.0 - File is encrypted using DES | |
| 5.0 - File is encrypted using 3DES | |
| 5.0 - File is encrypted using original RC2 encryption | |
| 5.0 - File is encrypted using RC4 encryption | |
| 5.1 - File is encrypted using AES encryption | |
| 5.1 - File is encrypted using corrected RC2 encryption** | |
| 5.2 - File is encrypted using corrected RC2-64 encryption** | |
| 6.1 - File is encrypted using non-OAEP key wrapping*** | |
| 6.2 - Central directory encryption | |
| 6.3 - File is compressed using LZMA | |
| 6.3 - File is compressed using PPMd+ | |
| 6.3 - File is encrypted using Blowfish | |
| 6.3 - File is encrypted using Twofish | |
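
The rule in 4.4.3.1 that the field takes the highest value among the features applied to a file can be sketched as follows. The dictionary keys are hypothetical labels chosen for this example, and only a subset of the features listed above is included; the numbers are the listed versions stored as major * 10 + minor.

```python
from typing import Iterable

# Subset of the minimum feature versions listed in 4.4.3.2.
FEATURE_VERSIONS = {
    "default": 10,
    "volume_label": 11,
    "directory": 20,
    "deflate": 20,
    "traditional_encryption": 20,
    "deflate64": 21,
    "dcl_implode": 25,
    "patch_data": 27,
    "zip64": 45,
    "bzip2": 46,
    "aes": 51,
    "central_directory_encryption": 62,
    "lzma": 63,
    "ppmd": 63,
}


def version_needed_to_extract(features: Iterable[str]) -> int:
    """Return the highest version required by any of the given features."""
    versions = [FEATURE_VERSIONS["default"]]
    versions.extend(FEATURE_VERSIONS[name] for name in features)
    return max(versions)
```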
| 4.4.3.3 Notes on version needed to extract | |
| * Early 7.x (pre-7.2) versions of PKZIP incorrectly set the | |
| version needed to extract for BZIP2 compression to be 50 | |
| when it should have been 46. | |
| ** Refer to the section on Strong Encryption Specification | |
| for additional information regarding RC2 corrections. | |
| *** Certificate encryption using non-OAEP key wrapping is the | |
| intended mode of operation for all versions beginning with 6.1. | |
| Support for OAEP key wrapping MUST only be used for | |
| backward compatibility when sending ZIP files to be opened by | |
| versions of PKZIP older than 6.1 (5.0 or 6.0). | |
| + Files compressed using PPMd MUST set the version | |
| needed to extract field to 6.3, however, not all ZIP | |
| programs enforce this and may be unable to decompress | |
| data files compressed using PPMd if this value is set. | |
| When using ZIP64 extensions, the corresponding value in the | |
| zip64 end of central directory record MUST also be set. | |
| This field should be set appropriately to indicate whether | |
| Version 1 or Version 2 format is in use. | |
| 4.4.4 general purpose bit flag: (2 bytes) | |
| Bit 0: If set, indicates that the file is encrypted. | |
| (For Method 6 - Imploding) | |
| Bit 1: If the compression method used was type 6, | |
| Imploding, then this bit, if set, indicates | |
| an 8K sliding dictionary was used. If clear, | |
| then a 4K sliding dictionary was used. | |
| Bit 2: If the compression method used was type 6, | |
| Imploding, then this bit, if set, indicates | |
| 3 Shannon-Fano trees were used to encode the | |
| sliding dictionary output. If clear, then 2 | |
| Shannon-Fano trees were used. | |
| (For Methods 8 and 9 - Deflating) | |
| Bit 2 Bit 1 | |
| 0 0 Normal (-en) compression option was used. | |
| 0 1 Maximum (-exx/-ex) compression option was used. | |
| 1 0 Fast (-ef) compression option was used. | |
| 1 1 Super Fast (-es) compression option was used. | |
| (For Method 14 - LZMA) | |
| Bit 1: If the compression method used was type 14, | |
| LZMA, then this bit, if set, indicates | |
| an end-of-stream (EOS) marker is used to | |
| mark the end of the compressed data stream. | |
| If clear, then an EOS marker is not present | |
| and the compressed data size must be known | |
| to extract. | |
| Note: Bits 1 and 2 are undefined if the compression | |
| method is any other. | |
| Bit 3: If this bit is set, the fields crc-32, compressed | |
| size and uncompressed size are set to zero in the | |
| local header. The correct values are put in the | |
| data descriptor immediately following the compressed | |
| data. (Note: PKZIP version 2.04g for DOS only | |
| recognizes this bit for method 8 compression, newer | |
| versions of PKZIP recognize this bit for any | |
| compression method.) | |
| Bit 4: Reserved for use with method 8, for enhanced | |
| deflating. | |
| Bit 5: If this bit is set, this indicates that the file is | |
| compressed patched data. (Note: Requires PKZIP | |
| version 2.70 or greater) | |
| Bit 6: Strong encryption. If this bit is set, you MUST | |
| set the version needed to extract value to at least | |
| 50 and you MUST also set bit 0. If AES encryption | |
| is used, the version needed to extract value MUST | |
| be at least 51. See the section describing the Strong | |
| Encryption Specification for details. Refer to the | |
| section in this document entitled "Incorporating PKWARE | |
| Proprietary Technology into Your Product" for more | |
| information. | |
| Bit 7: Currently unused. | |
| Bit 8: Currently unused. | |
| Bit 9: Currently unused. | |
| Bit 10: Currently unused. | |
| Bit 11: Language encoding flag (EFS). If this bit is set, | |
| the filename and comment fields for this file | |
| MUST be encoded using UTF-8. (see APPENDIX D) | |
| Bit 12: Reserved by PKWARE for enhanced compression. | |
| Bit 13: Set when encrypting the Central Directory to indicate | |
| selected data values in the Local Header are masked to | |
| hide their actual values. See the section describing | |
| the Strong Encryption Specification for details. Refer | |
| to the section in this document entitled "Incorporating | |
| PKWARE Proprietary Technology into Your Product" for | |
| more information. | |
| Bit 14: Reserved by PKWARE. | |
| Bit 15: Reserved by PKWARE. | |
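The bit assignments above can be illustrated with a short Python sketch. It is a minimal example only: the function name `describe_flags` and the dictionary keys are hypothetical and are not defined by this specification. Bit 0 is taken to be the encryption bit that the description of bit 6 refers to, and only bits described in this section are decoded.

```python
# Hypothetical helper; the names below are illustrative only.
DEFLATE_OPTIONS = {
    0b00: "normal",
    0b01: "maximum",
    0b10: "fast",
    0b11: "super fast",
}


def describe_flags(flags: int, method: int) -> dict:
    """Decode selected general purpose bit flag bits for one entry."""
    info = {
        "encrypted": bool(flags & (1 << 0)),
        "data_descriptor": bool(flags & (1 << 3)),
        "patched_data": bool(flags & (1 << 5)),
        "strong_encryption": bool(flags & (1 << 6)),
        "utf8_names": bool(flags & (1 << 11)),
        "masked_local_header": bool(flags & (1 << 13)),
    }
    if method in (8, 9):
        # The table lists "Bit 2 Bit 1", so bit 2 is the high bit of the pair.
        info["deflate_option"] = DEFLATE_OPTIONS[(flags >> 1) & 0b11]
    elif method == 14:
        # For LZMA, bit 1 signals an end-of-stream marker.
        info["lzma_eos_marker"] = bool(flags & (1 << 1))
    # Bits 1 and 2 are undefined for any other method, so they are not reported.
    return info
```

For example, `describe_flags(0x0808, 8)` reports a data descriptor, UTF-8 names, and the normal Deflate option.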
| 4.4.5 compression method: (2 bytes) | |
| 0 - The file is stored (no compression) | |
| 1 - The file is Shrunk | |
| 2 - The file is Reduced with compression factor 1 | |
| 3 - The file is Reduced with compression factor 2 | |
| 4 - The file is Reduced with compression factor 3 | |
| 5 - The file is Reduced with compression factor 4 | |
| 6 - The file is Imploded | |
| 7 - Reserved for Tokenizing compression algorithm | |
| 8 - The file is Deflated | |
| 9 - Enhanced Deflating using Deflate64(tm) | |
| 10 - PKWARE Data Compression Library Imploding (old IBM TERSE) | |
| 11 - Reserved by PKWARE | |
| 12 - File is compressed using BZIP2 algorithm | |
| 13 - Reserved by PKWARE | |
| 14 - LZMA (EFS) | |
| 15 - Reserved by PKWARE | |
| 16 - Reserved by PKWARE | |
| 17 - Reserved by PKWARE | |
| 18 - File is compressed using IBM TERSE (new) | |
| 19 - IBM LZ77 z Architecture (PFS) | |
| 97 - WavPack compressed data | |
| 98 - PPMd version I, Rev 1 | |
| 4.4.6 date and time fields: (2 bytes each) | |
| The date and time are encoded in standard MS-DOS format. | |
| If input came from standard input, the date and time are | |
| those at which compression was started for this data. | |
| If encrypting the central directory and general purpose bit | |
| flag 13 is set indicating masking, the value stored in the | |
| Local Header will be zero. | |
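The following Python sketch shows one way to convert the date and time fields. It assumes the customary MS-DOS packing, which this section does not restate: in the time word, bits 0-4 hold seconds divided by two, bits 5-10 minutes, and bits 11-15 hours; in the date word, bits 0-4 hold the day, bits 5-8 the month, and bits 9-15 the year counted from 1980. The function names are hypothetical.

```python
from datetime import datetime


def dos_to_datetime(dos_date: int, dos_time: int) -> datetime:
    """Convert the 2 byte date and 2 byte time fields to a datetime."""
    day = dos_date & 0x1F
    month = (dos_date >> 5) & 0x0F
    year = ((dos_date >> 9) & 0x7F) + 1980
    second = (dos_time & 0x1F) * 2          # stored with 2 second resolution
    minute = (dos_time >> 5) & 0x3F
    hour = (dos_time >> 11) & 0x1F
    return datetime(year, month, day, hour, minute, second)


def datetime_to_dos(dt: datetime) -> tuple:
    """Pack a datetime into (date, time) words; odd seconds round down."""
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time
```

A masked value of zero has no valid month or day, so `dos_to_datetime(0, 0)` raises `ValueError`; callers that expect masked Local Headers would need to test for zero first.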
| 4.4.7 CRC-32: (4 bytes) | |
| The CRC-32 algorithm was generously contributed by | |
| David Schwaderer and can be found in his excellent | |
| book "C Programmers Guide to NetBIOS" published by | |
| Howard W. Sams & Co. Inc. The 'magic number' for | |
| the CRC is 0xdebb20e3. The proper CRC pre and post | |
| conditioning is used, meaning that the CRC register | |
| is pre-conditioned with all ones (a starting value | |
| of 0xffffffff) and the value is post-conditioned by | |
| taking the one's complement of the CRC residual. | |
| If bit 3 of the general purpose flag is set, this | |
| field is set to zero in the local header and the correct | |
| value is put in the data descriptor and in the central | |
| directory. When encrypting the central directory, if the | |
| local header is not in ZIP64 format and general purpose | |
| bit flag 13 is set indicating masking, the value stored | |
| in the Local Header will be zero. | |
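The pre- and post-conditioning described above can be sketched in Python as follows. The reflected polynomial constant 0xEDB88320 is not given in this section; it is assumed here as the commonly used CRC-32 polynomial, and the result is compared against `zlib.crc32` from the Python standard library. The function names are hypothetical.

```python
import zlib

POLY_REFLECTED = 0xEDB88320  # assumption: not stated in this section


def crc_register(data: bytes, register: int = 0xFFFFFFFF) -> int:
    """Run the CRC register over data, starting from all ones."""
    for byte in data:
        register ^= byte
        for _ in range(8):
            register = (register >> 1) ^ (POLY_REFLECTED if register & 1 else 0)
    return register


def zip_crc32(data: bytes) -> int:
    """Post-condition the register by taking its one's complement."""
    return crc_register(data) ^ 0xFFFFFFFF


def residual_ok(data: bytes, crc: int) -> bool:
    """Check the 'magic number': data followed by its CRC (low byte
    first) leaves 0xDEBB20E3 in the register before post-conditioning."""
    return crc_register(data + crc.to_bytes(4, "little")) == 0xDEBB20E3


sample = b"123456789"
print(hex(zip_crc32(sample)), zip_crc32(sample) == zlib.crc32(sample))
```

The bitwise loop is written for clarity rather than speed; a table-driven routine or `zlib.crc32` would normally be used in practice.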
| 4.4.8 compressed size: (4 bytes) | |
| 4.4.9 uncompressed size: (4 bytes) | |
| The size of the file compressed (4.4.8) and uncompressed | |
| (4.4.9), respectively. When a decryption header is present it | |
| will be placed in front of the file data and the value of the | |
| compressed file size will include the bytes of the decryption | |
| header. If bit 3 of the general purpose bit flag is set, | |
| these fields are set to zero in the local header and the | |
| correct values are put in the data descriptor and | |
| in the central directory. If an archive is in ZIP64 format | |
| and the value in this field is 0xFFFFFFFF, the size will be | |
| in the corresponding 8 byte ZIP64 extended information | |
| extra field. When encrypting the central directory, if the | |
| local header is not in ZIP64 format and general purpose bit | |
| flag 13 is set indicating masking, the value stored for the | |
| uncompressed size in the Local Header will be zero. | |
| 4.4.10 file name length: (2 bytes) | |
| 4.4.11 extra field length: (2 bytes) | |
| 4.4.12 file comment length: (2 bytes) | |
| The length of the file name, extra field, and comment | |
| fields respectively. The combined length of any | |
| directory record and these three fields should not | |
| generally exceed 65,535 bytes. If input came from standard | |
| input, the file name length is set to zero. | |
| 4.4.13 disk number start: (2 bytes) | |
| The number of the disk on which this file begins. If an | |
| archive is in ZIP64 format and the value in this field is | |
| 0xFFFF, the disk number will be in the corresponding 4 byte zip64 | |
| extended information extra field. | |
| 4.4.14 internal file attributes: (2 bytes) | |
| Bits 1 and 2 are reserved for use by PKWARE. | |
| 4.4.14.1 The lowest bit of this field indicates, if set, | |
| that the file is apparently an ASCII or text file. If not | |
| set, the file apparently contains binary data. | |
| The remaining bits are unused in version 1.0. | |
| 4.4.14.2 The 0x0002 bit of this field indicates, if set, that | |
| a 4 byte variable record length control field precedes each | |
| logical record indicating the length of the record. The | |
| record length control field is stored in little-endian byte | |
| order. This flag is independent of text control characters, | |
| and if used in conjunction with text data, includes any | |
| control characters in the total length of the record. This | |
| value is provided for mainframe data transfer support. | |
| 4.4.15 external file attributes: (4 bytes) | |
| The mapping of the external attributes is | |
| host-system dependent (see 'version made by'). For | |
| MS-DOS, the low order byte is the MS-DOS directory | |
| attribute byte. If input came from standard input, this | |
| field is set to zero. | |
| 4.4.16 relative offset of local header: (4 bytes) | |
| This is the offset from the start of the first disk on | |
| which this file appears, to where the local header should | |
| be found. If an archive is in ZIP64 format and the value | |
| in this field is 0xFFFFFFFF, the offset will be in the | |
| corresponding 8 byte zip64 extended information extra field. | |
| 4.4.17 file name: (Variable) | |
| 4.4.17.1 The name of the file, with optional relative path. | |
| The path stored MUST NOT contain a drive or | |
| device letter, or a leading slash. All slashes | |
| MUST be forward slashes '/' as opposed to | |
| backwards slashes '\' for compatibility with Amiga | |
| and UNIX file systems etc. If input came from standard | |
| input, there is no file name field. | |
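The naming rules in 4.4.17.1 can be checked with a small Python sketch such as the one below. The function name `check_entry_name` is hypothetical, and the drive test assumes a single letter followed by a colon (for example "C:"); other device naming schemes are not covered.

```python
import re


def check_entry_name(name: str) -> list:
    """Return a list of rule violations for a stored file name."""
    problems = []
    if "\\" in name:
        problems.append("contains a backslash")
    if name.startswith("/"):
        problems.append("has a leading slash")
    # Assumption: a drive is a single ASCII letter followed by ':'.
    if re.match(r"[A-Za-z]:", name):
        problems.append("starts with a drive letter")
    return problems


print(check_entry_name("docs/readme.txt"))
print(check_entry_name("C:\\temp\\a.txt"))
```

An empty list means the name passed these three checks; it does not imply that the name is otherwise safe to extract.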
| 4.4.17.2 If using the Central Directory Encryption Feature and | |
| general purpose bit flag 13 is set indicating masking, the file | |
| name stored in the Local Header will not be the actual file name. | |
| A masking value consisting of a unique hexadecimal value will | |
| be stored. This value will be sequentially incremented for each | |
| file in the archive. See the section on the Strong Encryption | |
| Specification for details on retrieving the encrypted file name. | |
| Refer to the section in this document entitled "Incorporating PKWARE | |
| Proprietary Technology into Your Product" for more information. | |
| 4.4.18 file comment: (Variable) | |
| The comment for this file. | |
| 4.4.19 number of this disk: (2 bytes) | |
| The number of this disk, which contains the central | |
| directory end record. If an archive is in ZIP64 format | |
| and the value in this field is 0xFFFF, the disk number will | |
| be in the corresponding 4 byte zip64 end of central | |
| directory field. | |
| 4.4.20 number of the disk with the start of the central | |
| directory: (2 bytes) | |
| The number of the disk on which the central | |
| directory starts. If an archive is in ZIP64 format | |
| and the value in this field is 0xFFFF, the disk number will | |
| be in the corresponding 4 byte zip64 end of central | |
| directory field. | |
| 4.4.21 total number of entries in the central dir on | |
| this disk: (2 bytes) | |
| The number of central directory entries on this disk. | |
| If an archive is in ZIP64 format and the value in | |
| this field is 0xFFFF, the entry count will be in the | |
| corresponding 8 byte zip64 end of central | |
| directory field. | |
| 4.4.22 total number of entries in the central dir: (2 bytes) | |
| The total number of files in the .ZIP file. If an | |
| archive is in ZIP64 format and the value in this field | |
| is 0xFFFF, the entry count will be in the corresponding 8 byte | |
| zip64 end of central directory field. | |
| 4.4.23 size of the central directory: (4 bytes) | |
| The size (in bytes) of the entire central directory. | |
| If an archive is in ZIP64 format and the value in | |
| this field is 0xFFFFFFFF, the size will be in the | |
| corresponding 8 byte zip64 end of central | |
| directory field. | |
| 4.4.24 offset of start of central directory with respect to | |
| the starting disk number: (4 bytes) | |
| Offset of the start of the central directory on the | |
| disk on which the central directory starts. If an | |
| archive is in ZIP64 format and the value in this | |
| field is 0xFFFFFFFF, the offset will be in the | |
| corresponding 8 byte zip64 end of central | |
| directory field. | |
| 4.4.25 .ZIP file comment length: (2 bytes) | |
| The length of the comment for this .ZIP file. | |
| 4.4.26 .ZIP file comment: (Variable) | |
| The comment for this .ZIP file. ZIP file comment data | |
| is stored unsecured. No encryption or data authentication | |
| is applied to this area at this time. Confidential information | |
| should not be stored in this section. | |
| 4.4.27 zip64 extensible data sector (variable size) | |
| (currently reserved for use by PKWARE) | |
| 4.4.28 extra field: (Variable) | |
| This SHOULD be used for storage expansion. If additional | |
| information needs to be stored within a ZIP file for special | |
| application or platform needs, it SHOULD be stored here. | |
| Programs supporting earlier versions of this specification can | |
| then safely skip the file, and find the next file or header. | |
| This field will be 0 length in version 1.0. | |
| Existing extra fields are defined in the section | |
| Extensible data fields that follows. | |
| 4.5 Extensible data fields | |
| -------------------------- | |
| 4.5.1 In order to allow different programs and different types | |
| of information to be stored in the 'extra' field in .ZIP | |
| files, the following structure MUST be used for all | |
| programs storing data in this field: | |
| header1+data1 + header2+data2 . . . | |
| Each header should consist of: | |
| Header ID - 2 bytes | |
| Data Size - 2 bytes | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| The Header ID field indicates the type of data that is in | |
| the following data block. | |
| Header IDs of 0 thru 31 are reserved for use by PKWARE. | |
| The remaining IDs can be used by third party vendors for | |
| proprietary usage. | |
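The header1+data1 + header2+data2 layout can be walked with a short Python sketch like the following. The generator name `iter_extra_fields` is hypothetical; the sketch assumes the complete extra field has already been read into a bytes object.

```python
import struct


def iter_extra_fields(extra: bytes):
    """Yield (header_id, data) pairs from a raw extra field."""
    offset = 0
    # Fewer than 4 trailing bytes cannot hold a header and are ignored.
    while offset + 4 <= len(extra):
        # Header ID and Data Size are 2 bytes each, low byte first.
        header_id, data_size = struct.unpack_from("<HH", extra, offset)
        offset += 4
        if offset + data_size > len(extra):
            raise ValueError("extra field data runs past the end of the buffer")
        yield header_id, extra[offset:offset + data_size]
        offset += data_size


raw = struct.pack("<HH", 0x0001, 8) + bytes(8) + struct.pack("<HH", 0x000A, 0)
for header_id, data in iter_extra_fields(raw):
    print(hex(header_id), len(data))
```

Because each block carries its own size, a reader can skip Header IDs it does not recognize without understanding their contents.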
| 4.5.2 The current Header ID mappings defined by PKWARE are: | |
| 0x0001 Zip64 extended information extra field | |
| 0x0007 AV Info | |
| 0x0008 Reserved for extended language encoding data (PFS) | |
| (see APPENDIX D) | |
| 0x0009 OS/2 | |
| 0x000a NTFS | |
| 0x000c OpenVMS | |
| 0x000d UNIX | |
| 0x000e Reserved for file stream and fork descriptors | |
| 0x000f Patch Descriptor | |
| 0x0014 PKCS#7 Store for X.509 Certificates | |
| 0x0015 X.509 Certificate ID and Signature for | |
| individual file | |
| 0x0016 X.509 Certificate ID for Central Directory | |
| 0x0017 Strong Encryption Header | |
| 0x0018 Record Management Controls | |
| 0x0019 PKCS#7 Encryption Recipient Certificate List | |
| 0x0065 IBM S/390 (Z390), AS/400 (I400) attributes | |
| - uncompressed | |
| 0x0066 Reserved for IBM S/390 (Z390), AS/400 (I400) | |
| attributes - compressed | |
| 0x4690 POSZIP 4690 (reserved) | |
| 4.5.3 -Zip64 Extended Information Extra Field (0x0001): | |
| The following is the layout of the zip64 extended | |
| information "extra" block. If one of the size or | |
| offset fields in the Local or Central directory | |
| record is too small to hold the required data, | |
| a Zip64 extended information record is created. | |
| The order of the fields in the zip64 extended | |
| information record is fixed, but the fields MUST | |
| only appear if the corresponding Local or Central | |
| directory record field is set to 0xFFFF or 0xFFFFFFFF. | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (ZIP64) 0x0001 2 bytes Tag for this "extra" block type | |
| Size 2 bytes Size of this "extra" block | |
| Original | |
| Size 8 bytes Original uncompressed file size | |
| Compressed | |
| Size 8 bytes Size of compressed data | |
| Relative Header | |
| Offset 8 bytes Offset of local header record | |
| Disk Start | |
| Number 4 bytes Number of the disk on which | |
| this file starts | |
| This entry in the Local header MUST include BOTH original | |
| and compressed file size fields. If encrypting the | |
| central directory and bit 13 of the general purpose bit | |
| flag is set indicating masking, the value stored in the | |
| Local Header for the original file size will be zero. | |
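Because the fields appear in a fixed order but only when the corresponding record field holds 0xFFFF or 0xFFFFFFFF, a reader has to consult the record while parsing the block. The Python sketch below illustrates this under stated assumptions: `parse_zip64_extra` is a hypothetical name, `body` is the block content after the 4 byte tag and size header, and the remaining arguments are the values read from the Local or Central directory record.

```python
import struct


def parse_zip64_extra(body, uncompressed_size, compressed_size,
                      header_offset, disk_start):
    """Replace sentinel record values with those from the ZIP64 block."""
    pos = 0

    def take(fmt):
        # Read one little-endian value and advance the cursor.
        nonlocal pos
        size = struct.calcsize(fmt)
        if pos + size > len(body):
            raise ValueError("ZIP64 block is shorter than the record implies")
        (value,) = struct.unpack_from(fmt, body, pos)
        pos += size
        return value

    if uncompressed_size == 0xFFFFFFFF:
        uncompressed_size = take("<Q")
    if compressed_size == 0xFFFFFFFF:
        compressed_size = take("<Q")
    if header_offset == 0xFFFFFFFF:
        header_offset = take("<Q")
    if disk_start == 0xFFFF:
        disk_start = take("<L")
    return uncompressed_size, compressed_size, header_offset, disk_start
```

For a Local header, which has no offset or disk number fields, a caller could pass zero for the last two arguments so that only the two size fields are read.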
| 4.5.4 -OS/2 Extra Field (0x0009): | |
| The following is the layout of the OS/2 attributes "extra" | |
| block. (Last Revision 09/05/95) | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (OS/2) 0x0009 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size for the following data block | |
| BSize 4 bytes Uncompressed Block Size | |
| CType 2 bytes Compression type | |
| EACRC 4 bytes CRC value for uncompressed block | |
| (var) variable Compressed block | |
| The OS/2 extended attribute structure (FEA2LIST) is | |
| compressed and then stored in its entirety within this | |
| structure. There will only ever be one "block" of data in | |
| VarFields[]. | |
| 4.5.5 -NTFS Extra Field (0x000a): | |
| The following is the layout of the NTFS attributes | |
| "extra" block. (Note: At this time the Mtime, Atime | |
| and Ctime values MAY be used on any WIN32 system.) | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (NTFS) 0x000a 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size of the total "extra" block | |
| Reserved 4 bytes Reserved for future use | |
| Tag1 2 bytes NTFS attribute tag value #1 | |
| Size1 2 bytes Size of attribute #1, in bytes | |
| (var) Size1 Attribute #1 data | |
| . | |
| . | |
| . | |
| TagN 2 bytes NTFS attribute tag value #N | |
| SizeN 2 bytes Size of attribute #N, in bytes | |
| (var) SizeN Attribute #N data | |
| For NTFS, values for Tag1 through TagN are as follows: | |
| (currently only one set of attributes is defined for NTFS) | |
| Tag Size Description | |
| ----- ---- ----------- | |
| 0x0001 2 bytes Tag for attribute #1 | |
| Size1 2 bytes Size of attribute #1, in bytes | |
| Mtime 8 bytes File last modification time | |
| Atime 8 bytes File last access time | |
| Ctime 8 bytes File creation time | |
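A reader for the NTFS block might look like the Python sketch below. This section does not state the unit of the 8 byte time values; the sketch assumes they are Windows FILETIME values (100 nanosecond intervals since January 1, 1601 UTC). The function names are hypothetical, and `body` is the block content after the 4 byte tag and TSize header.

```python
import struct
from datetime import datetime, timedelta, timezone

# Assumption: times are Windows FILETIME values counted from 1601-01-01 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(filetime: int) -> datetime:
    """Convert 100 ns ticks to a datetime (sub-microsecond part dropped)."""
    return FILETIME_EPOCH + timedelta(microseconds=filetime // 10)


def parse_ntfs_times(body: bytes):
    """Return the times from attribute tag 0x0001, or None if absent."""
    offset = 4                                # skip the Reserved field
    while offset + 4 <= len(body):
        tag, size = struct.unpack_from("<HH", body, offset)
        offset += 4
        if tag == 0x0001 and size >= 24 and offset + 24 <= len(body):
            mtime, atime, ctime = struct.unpack_from("<QQQ", body, offset)
            return {
                "mtime": filetime_to_datetime(mtime),
                "atime": filetime_to_datetime(atime),
                "ctime": filetime_to_datetime(ctime),
            }
        offset += size                        # skip attributes not handled here
    return None
```

Unknown attribute tags are skipped by their declared size, which mirrors how unknown Header IDs are skipped in the enclosing extra field.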
| 4.5.6 -OpenVMS Extra Field (0x000c): | |
| The following is the layout of the OpenVMS attributes | |
| "extra" block. | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (VMS) 0x000c 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size of the total "extra" block | |
| CRC 4 bytes 32-bit CRC for remainder of the block | |
| Tag1 2 bytes OpenVMS attribute tag value #1 | |
| Size1 2 bytes Size of attribute #1, in bytes | |
| (var) Size1 Attribute #1 data | |
| . | |
| . | |
| . | |
| TagN 2 bytes OpenVMS attribute tag value #N | |
| SizeN 2 bytes Size of attribute #N, in bytes | |
| (var) SizeN Attribute #N data | |
| OpenVMS Extra Field Rules: | |
| 4.5.6.1. There will be one or more attributes present, which | |
| will each be preceded by the above TagX & SizeX values. | |
| These values are identical to the ATR$C_XXXX and ATR$S_XXXX | |
| constants which are defined in ATR.H under OpenVMS C. Neither | |
| of these values will ever be zero. | |
| 4.5.6.2. No word alignment or padding is performed. | |
| 4.5.6.3. A well-behaved PKZIP/OpenVMS program should never produce | |
| more than one sub-block with the same TagX value. Also, there will | |
| never be more than one "extra" block of type 0x000c in a particular | |
| directory record. | |
| 4.5.7 -UNIX Extra Field (0x000d): | |
| The following is the layout of the UNIX "extra" block. | |
| Note: all fields are stored in Intel low-byte/high-byte | |
| order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (UNIX) 0x000d 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size for the following data block | |
| Atime 4 bytes File last access time | |
| Mtime 4 bytes File last modification time | |
| Uid 2 bytes File user ID | |
| Gid 2 bytes File group ID | |
| (var) variable Variable length data field | |
| The variable length data field will contain file type | |
| specific data. Currently the only values allowed are | |
| the original "linked to" file names for hard or symbolic | |
| links, and the major and minor device node numbers for | |
| character and block device nodes. Since device nodes | |
| cannot be either symbolic or hard links, only one set of | |
| variable length data is stored. Link files will have the | |
| name of the original file stored. This name is NOT NULL | |
| terminated. Its size can be determined by checking TSize - | |
| 12. Device entries will have eight bytes stored as two 4 | |
| byte entries (in little endian format). The first entry | |
| will be the major device number, and the second the minor | |
| device number. | |
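The layout above, including the TSize - 12 rule for the variable length data, can be sketched in Python as follows. The function names are hypothetical, `body` is the block content after the 4 byte tag and TSize header, and the time values are returned as raw integers because this section does not state their unit.

```python
import struct


def parse_unix_extra(body: bytes) -> dict:
    """Split the UNIX extra block into fixed fields and variable data."""
    if len(body) < 12:
        raise ValueError("UNIX extra block needs at least 12 bytes")
    atime, mtime, uid, gid = struct.unpack_from("<LLHH", body, 0)
    return {
        "atime": atime,
        "mtime": mtime,
        "uid": uid,
        "gid": gid,
        # Everything after the 12 fixed bytes (TSize - 12 bytes).
        "variable": body[12:],
    }


def device_numbers(variable: bytes) -> tuple:
    """Decode (major, minor) from the 8 byte device node data."""
    if len(variable) != 8:
        raise ValueError("device data must be exactly 8 bytes")
    return struct.unpack("<LL", variable)
```

Whether the variable data is a link target or a pair of device numbers has to be decided from the file type, which is recorded elsewhere; the sketch leaves that choice to the caller.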
| 4.5.8 -PATCH Descriptor Extra Field (0x000f): | |
| 4.5.8.1 The following is the layout of the Patch Descriptor | |
| "extra" block. | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (Patch) 0x000f 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size of the total "extra" block | |
| Version 2 bytes Version of the descriptor | |
| Flags 4 bytes Actions and reactions (see below) | |
| OldSize 4 bytes Size of the file about to be patched | |
| OldCRC 4 bytes 32-bit CRC of the file to be patched | |
| NewSize 4 bytes Size of the resulting file | |
| NewCRC 4 bytes 32-bit CRC of the resulting file | |
| 4.5.8.2 Actions and reactions | |
| Bits Description | |
| ---- ---------------- | |
| 0 Use for auto detection | |
| 1 Treat as a self-patch | |
| 2-3 RESERVED | |
| 4-5 Action (see below) | |
| 6-7 RESERVED | |
| 8-9 Reaction (see below) to absent file | |
| 10-11 Reaction (see below) to newer file | |
| 12-13 Reaction (see below) to unknown file | |
| 14-15 RESERVED | |
| 16-31 RESERVED | |
| 4.5.8.2.1 Actions | |
| Action Value | |
| ------ ----- | |
| none 0 | |
| add 1 | |
| delete 2 | |
| patch 3 | |
| 4.5.8.2.2 Reactions | |
| Reaction Value | |
| -------- ----- | |
| ask 0 | |
| skip 1 | |
| ignore 2 | |
| fail 3 | |
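The bit ranges and value tables above can be combined as in the following Python sketch, which decodes the 4 byte Flags value of the Patch Descriptor. The function name and dictionary keys are hypothetical; reserved bits are ignored.

```python
ACTIONS = {0: "none", 1: "add", 2: "delete", 3: "patch"}
REACTIONS = {0: "ask", 1: "skip", 2: "ignore", 3: "fail"}


def decode_patch_flags(flags: int) -> dict:
    """Decode the Flags field of the Patch Descriptor extra block."""
    return {
        "auto_detect": bool(flags & 0x1),               # bit 0
        "self_patch": bool(flags & 0x2),                # bit 1
        "action": ACTIONS[(flags >> 4) & 0x3],          # bits 4-5
        "on_absent": REACTIONS[(flags >> 8) & 0x3],     # bits 8-9
        "on_newer": REACTIONS[(flags >> 10) & 0x3],     # bits 10-11
        "on_unknown": REACTIONS[(flags >> 12) & 0x3],   # bits 12-13
    }


print(decode_patch_flags(0x2D31))
```

Each two-bit field is isolated with a shift and a mask of 0x3, so a value of 0x2D31 decodes to auto detection, action "patch", and reactions "skip", "fail", and "ignore".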
| 4.5.8.3 Patch support is provided by PKPatchMaker(tm) technology | |
| and is covered under U.S. Patents and Patents Pending. The use or | |
| implementation in a product of certain technological aspects set | |
| forth in the current APPNOTE, including those with regard to | |
| strong encryption or patching requires a license from PKWARE. | |
| Refer to the section in this document entitled "Incorporating | |
| PKWARE Proprietary Technology into Your Product" for more | |
| information. | |
| 4.5.9 -PKCS#7 Store for X.509 Certificates (0x0014): | |
| This field MUST contain information about each of the certificates | |
| that files may be signed with. When the Central Directory Encryption | |
| feature is enabled for a ZIP file, this record will appear in | |
| the Archive Extra Data Record, otherwise it will appear in the | |
| first central directory record and will be ignored in any | |
| other record. | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (Store) 0x0014 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size of the store data | |
| TData TSize Data about the store | |
| 4.5.10 -X.509 Certificate ID and Signature for individual file (0x0015): | |
| This field contains the information about which certificate in | |
| the PKCS#7 store was used to sign a particular file. It also | |
| contains the signature data. This field can appear multiple | |
| times, but can only appear once per certificate. | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (CID) 0x0015 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size of data that follows | |
| TData TSize Signature Data | |
| 4.5.11 -X.509 Certificate ID and Signature for central directory (0x0016): | |
| This field contains the information about which certificate in | |
| the PKCS#7 store was used to sign the central directory structure. | |
| When the Central Directory Encryption feature is enabled for a | |
| ZIP file, this record will appear in the Archive Extra Data Record, | |
| otherwise it will appear in the first central directory record. | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (CDID) 0x0016 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size of data that follows | |
| TData TSize Data | |
| 4.5.12 -Strong Encryption Header (0x0017): | |
| Value Size Description | |
| ----- ---- ----------- | |
| 0x0017 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size of data that follows | |
| Format 2 bytes Format definition for this record | |
| AlgID 2 bytes Encryption algorithm identifier | |
| Bitlen 2 bytes Bit length of encryption key | |
| Flags 2 bytes Processing flags | |
| CertData TSize-8 Certificate decryption extra field data | |
| (refer to the explanation for CertData | |
| in the section describing the | |
| Certificate Processing Method under | |
| the Strong Encryption Specification) | |
| See the section describing the Strong Encryption Specification | |
| for details. Refer to the section in this document entitled | |
| "Incorporating PKWARE Proprietary Technology into Your Product" | |
| for more information. | |
| 4.5.13 -Record Management Controls (0x0018): | |
| Value Size Description | |
| ----- ---- ----------- | |
| (Rec-CTL) 0x0018 2 bytes Tag for this "extra" block type | |
| CSize 2 bytes Size of total extra block data | |
| Tag1 2 bytes Record control attribute 1 | |
| Size1 2 bytes Size of attribute 1, in bytes | |
| Data1 Size1 Attribute 1 data | |
| . | |
| . | |
| . | |
| TagN 2 bytes Record control attribute N | |
| SizeN 2 bytes Size of attribute N, in bytes | |
| DataN SizeN Attribute N data | |
| 4.5.14 -PKCS#7 Encryption Recipient Certificate List (0x0019): | |
| This field MAY contain information about each of the certificates | |
| used in encryption processing and it can be used to identify who is | |
| allowed to decrypt encrypted files. This field should only appear | |
| in the archive extra data record. This field is not required and | |
| serves only to aid archive modifications by preserving public | |
| encryption key data. Individual security requirements may dictate | |
| that this data be omitted to deter information exposure. | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (CStore) 0x0019 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size of the store data | |
| TData TSize Data about the store | |
| TData: | |
| Value Size Description | |
| ----- ---- ----------- | |
| Version 2 bytes Format version number - must be 0x0001 at this time | |
| CStore (var) PKCS#7 data blob | |
| See the section describing the Strong Encryption Specification | |
| for details. Refer to the section in this document entitled | |
| "Incorporating PKWARE Proprietary Technology into Your Product" | |
| for more information. | |
| 4.5.15 -MVS Extra Field (0x0065): | |
| The following is the layout of the MVS "extra" block. | |
| Note: Some fields are stored in Big Endian format. | |
| All text is in EBCDIC format unless otherwise specified. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (MVS) 0x0065 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size for the following data block | |
| ID 4 bytes EBCDIC "Z390" 0xE9F3F9F0 or | |
| "T4MV" for TargetFour | |
| (var) TSize-4 Attribute data (see APPENDIX B) | |
| 4.5.16 -OS/400 Extra Field (0x0065): | |
| The following is the layout of the OS/400 "extra" block. | |
| Note: Some fields are stored in Big Endian format. | |
| All text is in EBCDIC format unless otherwise specified. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (OS400) 0x0065 2 bytes Tag for this "extra" block type | |
| TSize 2 bytes Size for the following data block | |
| ID 4 bytes EBCDIC "I400" 0xC9F4F0F0 or | |
| "T4MV" for TargetFour | |
| (var) TSize-4 Attribute data (see APPENDIX A) | |
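Since the MVS and OS/400 blocks share Header ID 0x0065, a reader can tell them apart by the 4 byte EBCDIC ID. The Python sketch below is one way to do so; it assumes the standard library codec "cp037" is an acceptable EBCDIC code page for these letters and digits, and the function name and returned labels are hypothetical.

```python
def classify_0x0065(body: bytes) -> str:
    """Identify a 0x0065 block from its 4 byte EBCDIC ID."""
    if len(body) < 4:
        raise ValueError("block is too short to hold the ID")
    # cp037 maps every byte value, so decoding cannot fail.
    ident = body[:4].decode("cp037")
    return {
        "Z390": "MVS",
        "I400": "OS/400",
        "T4MV": "TargetFour",
    }.get(ident, "unknown")


print(classify_0x0065(bytes.fromhex("E9F3F9F0")))
print(classify_0x0065(bytes.fromhex("C9F4F0F0")))
```

The attribute data that follows the ID is described in the appendices and is not interpreted here.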
| 4.6 Third Party Mappings | |
| ------------------------ | |
| 4.6.1 Third party mappings commonly used are: | |
| 0x07c8 Macintosh | |
| 0x2605 ZipIt Macintosh | |
| 0x2705 ZipIt Macintosh 1.3.5+ | |
| 0x2805 ZipIt Macintosh 1.3.5+ | |
| 0x334d Info-ZIP Macintosh | |
| 0x4341 Acorn/SparkFS | |
| 0x4453 Windows NT security descriptor (binary ACL) | |
| 0x4704 VM/CMS | |
| 0x470f MVS | |
| 0x4b46 FWKCS MD5 (see below) | |
| 0x4c41 OS/2 access control list (text ACL) | |
| 0x4d49 Info-ZIP OpenVMS | |
| 0x4f4c Xceed original location extra field | |
| 0x5356 AOS/VS (ACL) | |
| 0x5455 extended timestamp | |
| 0x554e Xceed unicode extra field | |
| 0x5855 Info-ZIP UNIX (original, also OS/2, NT, etc) | |
| 0x6375 Info-ZIP Unicode Comment Extra Field | |
| 0x6542 BeOS/BeBox | |
| 0x7075 Info-ZIP Unicode Path Extra Field | |
| 0x756e ASi UNIX | |
| 0x7855 Info-ZIP UNIX (new) | |
| 0xa220 Microsoft Open Packaging Growth Hint | |
| 0xfd4a SMS/QDOS | |
| Detailed descriptions of Extra Fields defined by third | |
| party mappings will be documented as information on | |
| these data structures is made available to PKWARE. | |
| PKWARE does not guarantee the accuracy of any published | |
| third party data. | |
| 4.6.2 Third-party Extra Fields must include a Header ID using | |
| the format defined in the section of this document | |
| titled Extensible Data Fields (section 4.5). | |
| The Data Size field indicates the size of the following | |
| data block. Programs can use this value to skip to the | |
| next header block, passing over any data blocks that are | |
| not of interest. | |
| Note: As stated above, the size of the entire .ZIP file | |
| header, including the file name, comment, and extra | |
| field should not exceed 64K in size. | |
| 4.6.3 In case two different programs should appropriate the same | |
| Header ID value, each program SHOULD place a unique | |
| signature of at least two bytes in | |
| size (and preferably 4 bytes or bigger) at the start of | |
| each data area. Every program SHOULD verify that its | |
| unique signature is present, in addition to the Header ID | |
| value being correct, before assuming that it is a block of | |
| known type. | |
| Third-party Mappings: | |
| 4.6.4 -ZipIt Macintosh Extra Field (long) (0x2605): | |
| The following is the layout of the ZipIt extra block | |
| for Macintosh. The local-header and central-header versions | |
| are identical. This block must be present if the file is | |
| stored MacBinary-encoded and it should not be used if the file | |
| is not stored MacBinary-encoded. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (Mac2) 0x2605 Short tag for this extra block type | |
| TSize Short total data size for this block | |
| "ZPIT" beLong extra-field signature | |
| FnLen Byte length of FileName | |
| FileName variable full Macintosh filename | |
| FileType Byte[4] four-byte Mac file type string | |
| Creator Byte[4] four-byte Mac creator string | |
| 4.6.5 -ZipIt Macintosh Extra Field (short, for files) (0x2705): | |
| The following is the layout of a shortened variant of the | |
| ZipIt extra block for Macintosh (without "full name" entry). | |
| This variant is used by ZipIt 1.3.5 and newer for entries of | |
| files (not directories) that do not have a MacBinary encoded | |
| file. The local-header and central-header versions are identical. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (Mac2b) 0x2705 Short tag for this extra block type | |
| TSize Short total data size for this block (12) | |
| "ZPIT" beLong extra-field signature | |
| FileType Byte[4] four-byte Mac file type string | |
| Creator Byte[4] four-byte Mac creator string | |
| fdFlags beShort attributes from FInfo.fdFlags, | |
| may be omitted | |
| 0x0000 beShort reserved, may be omitted | |
| 4.6.6 -ZipIt Macintosh Extra Field (short, for directories) (0x2805): | |
| The following is the layout of a shortened variant of the | |
| ZipIt extra block for Macintosh used only for directory | |
| entries. This variant is used by ZipIt 1.3.5 and newer to | |
| save some optional Mac-specific information about directories. | |
| The local-header and central-header versions are identical. | |
| Value Size Description | |
| ----- ---- ----------- | |
| (Mac2c) 0x2805 Short tag for this extra block type | |
| TSize Short total data size for this block (12) | |
| "ZPIT" beLong extra-field signature | |
| frFlags beShort attributes from DInfo.frFlags, may | |
| be omitted | |
| View beShort ZipIt view flag, may be omitted | |
| The View field specifies ZipIt-internal settings as follows: | |
| Bits of the Flags: | |
| bit 0 if set, the folder is shown expanded (open) | |
| when the archive contents are viewed in ZipIt. | |
| bits 1-15 reserved, zero; | |
| 4.6.7 -FWKCS MD5 Extra Field (0x4b46): | |
| The FWKCS Contents_Signature System, used in | |
| automatically identifying files independent of file name, | |
| optionally adds and uses an extra field to support the | |
| rapid creation of an enhanced contents_signature: | |
| Header ID = 0x4b46 | |
| Data Size = 0x0013 | |
| Preface = 'M','D','5' | |
| followed by 16 bytes containing the uncompressed file's | |
| 128_bit MD5 hash(1), low byte first. | |
| When FWKCS revises a .ZIP file central directory to add | |
| this extra field for a file, it also replaces the | |
| central directory entry for that file's uncompressed | |
| file length with a measured value. | |
| FWKCS provides an option to strip this extra field, if | |
| present, from a .ZIP file central directory. In adding | |
| this extra field, FWKCS preserves .ZIP file Authenticity | |
| Verification; if stripping this extra field, FWKCS | |
| preserves all versions of AV through PKZIP version 2.04g. | |
| FWKCS, and FWKCS Contents_Signature System, are | |
| trademarks of Frederick W. Kantor. | |
| (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer | |
| Science and RSA Data Security, Inc., April 1992. | |
| ll.76-77: "The MD5 algorithm is being placed in the | |
| public domain for review and possible adoption as a | |
| standard." | |
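As an illustration of the layout above, the following Python sketch builds an extra field block of this type. It assumes that the byte order produced by `hashlib.md5(...).digest()` corresponds to the "low byte first" ordering mentioned above; the function name is hypothetical and the sketch is not derived from FWKCS itself.

```python
import hashlib
import struct


def build_fwkcs_md5_field(uncompressed: bytes) -> bytes:
    """Build the 0x4b46 block: header, 'MD5' preface, 16 byte hash."""
    digest = hashlib.md5(uncompressed).digest()   # 16 bytes
    body = b"MD5" + digest                        # 3 + 16 = 0x0013 bytes
    return struct.pack("<HH", 0x4B46, len(body)) + body


field = build_fwkcs_md5_field(b"")
print(len(field), field[:7])
```

The Data Size of 0x0013 follows directly from the 3 byte preface plus the 16 byte hash, giving a 23 byte block including the 4 byte header.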
| 4.6.8 -Info-ZIP Unicode Comment Extra Field (0x6375): | |
| Stores the UTF-8 version of the file comment as stored in the | |
| central directory header. (Last Revision 20070912) | |
| Value Size Description | |
| ----- ---- ----------- | |
| (UCom) 0x6375 Short tag for this extra block type ("uc") | |
| TSize Short total data size for this block | |
| Version 1 byte version of this extra field, currently 1 | |
| ComCRC32 4 bytes Comment Field CRC32 Checksum | |
| UnicodeCom Variable UTF-8 version of the entry comment | |
| Currently Version is set to the number 1. If there is a need | |
| to change this field, the version will be incremented. Changes | |
| may not be backward compatible so this extra field should not be | |
| used if the version is not recognized. | |
| The ComCRC32 is the standard zip CRC32 checksum of the File Comment | |
| field in the central directory header. This is used to verify that | |
| the comment field has not changed since the Unicode Comment extra field | |
| was created. This can happen if a utility changes the File Comment | |
| field but does not update the UTF-8 Comment extra field. If the CRC | |
| check fails, this Unicode Comment extra field should be ignored and | |
| the File Comment field in the header should be used instead. | |
| The UnicodeCom field is the UTF-8 version of the File Comment field | |
| in the header. As UnicodeCom is defined to be UTF-8, no UTF-8 byte | |
| order mark (BOM) is used. The length of this field is determined by | |
| subtracting the size of the previous fields from TSize. If both the | |
| File Name and Comment fields are UTF-8, the new General Purpose Bit | |
| Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate | |
| both the header File Name and Comment fields are UTF-8 and, in this | |
| case, the Unicode Path and Unicode Comment extra fields are not | |
| needed and should not be created. Note that, for backward | |
| compatibility, bit 11 should only be used if the native character set | |
| of the paths and comments being zipped up is already UTF-8. It is | |
| expected that the same file comment storage method, either general | |
| purpose bit 11 or extra fields, be used in both the Local and Central | |
| Directory Header for a file. | |
| 4.6.9 -Info-ZIP Unicode Path Extra Field (0x7075): | |
| Stores the UTF-8 version of the file name field as stored in the | |
| local header and central directory header. (Last Revision 20070912) | |
| Value Size Description | |
| ----- ---- ----------- | |
| (UPath) 0x7075 Short tag for this extra block type ("up") | |
| TSize Short total data size for this block | |
| Version 1 byte version of this extra field, currently 1 | |
| NameCRC32 4 bytes File Name Field CRC32 Checksum | |
| UnicodeName Variable UTF-8 version of the entry File Name | |
| Currently Version is set to the number 1. If there is a need | |
| to change this field, the version will be incremented. Changes | |
| may not be backward compatible so this extra field should not be | |
| used if the version is not recognized. | |
| The NameCRC32 is the standard zip CRC32 checksum of the File Name | |
| field in the header. This is used to verify that the header | |
| File Name field has not changed since the Unicode Path extra field | |
| was created. This can happen if a utility renames the File Name but | |
| does not update the UTF-8 path extra field. If the CRC check fails, | |
| this UTF-8 Path Extra Field should be ignored and the File Name field | |
| in the header should be used instead. | |
| The UnicodeName is the UTF-8 version of the contents of the File Name | |
| field in the header. As UnicodeName is defined to be UTF-8, no UTF-8 | |
| byte order mark (BOM) is used. The length of this field is determined | |
| by subtracting the size of the previous fields from TSize. If both | |
| the File Name and Comment fields are UTF-8, the new General Purpose | |
| Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to | |
| indicate that both the header File Name and Comment fields are UTF-8 | |
| and, in this case, the Unicode Path and Unicode Comment extra fields | |
| are not needed and should not be created. Note that, for backward | |
| compatibility, bit 11 should only be used if the native character set | |
| of the paths and comments being zipped up is already UTF-8. It is | |
| expected that the same file name storage method, either general | |
| purpose bit 11 or extra fields, be used in both the Local and Central | |
| Directory Header for a file. | |
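The version and CRC checks described for the Unicode Path extra field can be sketched in Python as below. The function name is hypothetical, the input is assumed to be the block payload that follows the 0x7075 tag and TSize, and `zlib.crc32` is assumed to be the "standard zip CRC32". The Unicode Comment field (0x6375) has the same layout and could be handled the same way.

```python
import struct
import zlib

def parse_unicode_path(payload: bytes, header_name: bytes):
    """Return the UTF-8 name, or None when the field should be ignored.

    payload     - extra block data after the tag and TSize
    header_name - raw File Name bytes from the same header
    """
    if len(payload) < 5:
        return None
    version, name_crc = struct.unpack_from("<BI", payload, 0)
    if version != 1:
        return None                      # unrecognized version: do not use
    if name_crc != zlib.crc32(header_name):
        return None                      # header name changed: fall back
    # Length of UnicodeName = TSize minus the 5 bytes already consumed.
    return payload[5:].decode("utf-8")

# Example: a CP437 header name and its UTF-8 counterpart.
header_name = b"caf\x82.txt"
payload = struct.pack("<BI", 1, zlib.crc32(header_name)) + "café.txt".encode("utf-8")
name = parse_unicode_path(payload, header_name)
stale = parse_unicode_path(payload, b"renamed.txt")
```

Returning None in both failure cases mirrors the text: the caller then uses the File Name field from the header.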
| 4.6.10 -Microsoft Open Packaging Growth Hint (0xa220): | |
| Value Size Description | |
| ----- ---- ----------- | |
| 0xa220 Short tag for this extra block type | |
| TSize Short size of Sig + PadVal + Padding | |
| Sig Short verification signature (A028) | |
| PadVal Short Initial padding value | |
| Padding variable filled with NULL characters | |
| 4.7 Manifest Files | |
| ------------------ | |
| 4.7.1 Applications using ZIP files may have a need for additional | |
| information that must be included with the files placed into | |
| a ZIP file. Application specific information that cannot be | |
| stored using the defined ZIP storage records SHOULD be stored | |
| using the extensible Extra Field convention defined in this | |
| document. However, some applications may use a manifest | |
| file as a means for storing additional information. One | |
| example is the META-INF/MANIFEST.MF file used in ZIP formatted | |
| files having the .JAR extension (JAR files). | |
| 4.7.2 A manifest file is a file created for the application process | |
| that requires this information. A manifest file MAY be of any | |
| file type required by the defining application process. It is | |
| placed within the same ZIP file as files to which this information | |
| applies. By convention, this file is typically the first file placed | |
| into the ZIP file and it may include a defined directory path. | |
| 4.7.3 Manifest files may be compressed or encrypted as needed for | |
| application processing of the files inside the ZIP files. | |
| Manifest files are outside of the scope of this specification. | |
| 5.0 Explanation of compression methods | |
| -------------------------------------- | |
| 5.1 UnShrinking - Method 1 | |
| -------------------------- | |
| 5.1.1 Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm | |
| with partial clearing. The initial code size is 9 bits, and the | |
| maximum code size is 13 bits. Shrinking differs from conventional | |
| Dynamic Ziv-Lempel-Welch implementations in several respects: | |
| 5.1.2 The code size is controlled by the compressor, and is | |
| not automatically increased when codes larger than the current | |
| code size are created (but not necessarily used). When | |
| the decompressor encounters the code sequence 256 | |
| (decimal) followed by 1, it should increase the code size | |
| read from the input stream to the next bit size. No | |
| blocking of the codes is performed, so the next code at | |
| the increased size should be read from the input stream | |
| immediately after where the previous code at the smaller | |
| bit size was read. Again, the decompressor should not | |
| increase the code size used until the sequence 256,1 is | |
| encountered. | |
| 5.1.3 When the table becomes full, total clearing is not | |
| performed. Rather, when the compressor emits the code | |
| sequence 256,2 (decimal), the decompressor should clear | |
| all leaf nodes from the Ziv-Lempel tree, and continue to | |
| use the current code size. The nodes that are cleared | |
| from the Ziv-Lempel tree are then re-used, with the lowest | |
| code value re-used first, and the highest code value | |
| re-used last. The compressor can emit the sequence 256,2 | |
| at any time. | |
| 5.2 Expanding - Methods 2-5 | |
| --------------------------- | |
| 5.2.1 The Reducing algorithm is actually a combination of two | |
| distinct algorithms. The first algorithm compresses repeated | |
| byte sequences, and the second algorithm takes the compressed | |
| stream from the first algorithm and applies a probabilistic | |
| compression method. | |
| 5.2.2 The probabilistic compression stores an array of 'follower | |
| sets' S(j), for j=0 to 255, corresponding to each possible | |
| ASCII character. Each set contains between 0 and 32 | |
| characters, to be denoted as S(j)[0],...,S(j)[m], where m<32. | |
| The sets are stored at the beginning of the data area for a | |
| Reduced file, in reverse order, with S(255) first, and S(0) | |
| last. | |
| 5.2.3 The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] }, | |
| where N(j) is the size of set S(j). N(j) can be 0, in which | |
| case the follower set for S(j) is empty. Each N(j) value is | |
| encoded in 6 bits, followed by N(j) eight bit character values | |
| corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If | |
| N(j) is 0, then no values for S(j) are stored, and the value | |
| for N(j-1) immediately follows. | |
| 5.2.4 The compressed data stream immediately follows the follower | |
| sets. It can be interpreted for the probabilistic decompression | |
| as follows: | |
| let Last-Character <- 0. | |
| loop until done | |
| if the follower set S(Last-Character) is empty then | |
| read 8 bits from the input stream, and copy this | |
| value to the output stream. | |
| otherwise if the follower set S(Last-Character) is non-empty then | |
| read 1 bit from the input stream. | |
| if this bit is not zero then | |
| read 8 bits from the input stream, and copy this | |
| value to the output stream. | |
| otherwise if this bit is zero then | |
| read B(N(Last-Character)) bits from the input | |
| stream, and assign this value to I. | |
| Copy the value of S(Last-Character)[I] to the | |
| output stream. | |
| assign the last value placed on the output stream to | |
| Last-Character. | |
| end loop | |
| B(N(j)) is defined as the minimal number of bits required to | |
| encode the value N(j)-1. | |
| 5.2.5 The decompressed stream from above can then be expanded to | |
| re-create the original file as follows: | |
| let State <- 0. | |
| loop until done | |
| read 8 bits from the input stream into C. | |
| case State of | |
| 0: if C is not equal to DLE (144 decimal) then | |
| copy C to the output stream. | |
| otherwise if C is equal to DLE then | |
| let State <- 1. | |
| 1: if C is non-zero then | |
| let V <- C. | |
| let Len <- L(V) | |
| let State <- F(Len). | |
| otherwise if C is zero then | |
| copy the value 144 (decimal) to the output stream. | |
| let State <- 0 | |
| 2: let Len <- Len + C | |
| let State <- 3. | |
| 3: move backwards D(V,C) bytes in the output stream | |
| (if this position is before the start of the output | |
| stream, then assume that all the data before the | |
| start of the output stream is filled with zeros). | |
| copy Len+3 bytes from this position to the output stream. | |
| let State <- 0. | |
| end case | |
| end loop | |
| The functions F, L, and D are dependent on the 'compression | |
| factor', 1 through 4, and are defined as follows: | |
| For compression factor 1: | |
| L(X) equals the lower 7 bits of X. | |
| F(X) equals 2 if X equals 127 otherwise F(X) equals 3. | |
| D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1. | |
| For compression factor 2: | |
| L(X) equals the lower 6 bits of X. | |
| F(X) equals 2 if X equals 63 otherwise F(X) equals 3. | |
| D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1. | |
| For compression factor 3: | |
| L(X) equals the lower 5 bits of X. | |
| F(X) equals 2 if X equals 31 otherwise F(X) equals 3. | |
| D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1. | |
| For compression factor 4: | |
| L(X) equals the lower 4 bits of X. | |
| F(X) equals 2 if X equals 15 otherwise F(X) equals 3. | |
| D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1. | |
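The four sets of definitions follow one pattern: compression factor k keeps the lower (8 - k) bits of X for the length and the upper k bits for the distance. A minimal Python sketch, where `make_reduce_functions` is a hypothetical name:

```python
def make_reduce_functions(factor: int):
    """Return (L, F, D) for compression factor 1 through 4."""
    if factor not in (1, 2, 3, 4):
        raise ValueError("compression factor must be 1..4")
    low_bits = 8 - factor            # 7, 6, 5 or 4 length bits
    mask = (1 << low_bits) - 1       # 127, 63, 31 or 15

    def L(x):
        return x & mask              # lower bits of X

    def F(length):
        return 2 if length == mask else 3

    def D(x, y):
        return (x >> low_bits) * 256 + y + 1   # upper bits of X select the 256-byte page

    return L, F, D

L1, F1, D1 = make_reduce_functions(1)
L4, F4, D4 = make_reduce_functions(4)
```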
| 5.3 Imploding - Method 6 | |
| ------------------------ | |
| 5.3.1 The Imploding algorithm is actually a combination of two | |
| distinct algorithms. The first algorithm compresses repeated byte | |
| sequences using a sliding dictionary. The second algorithm is | |
| used to compress the encoding of the sliding dictionary output, | |
| using multiple Shannon-Fano trees. | |
| 5.3.2 The Imploding algorithm can use a 4K or 8K sliding dictionary | |
| size. The dictionary size used can be determined by bit 1 in the | |
| general purpose flag word; a 0 bit indicates a 4K dictionary | |
| while a 1 bit indicates an 8K dictionary. | |
| 5.3.3 The Shannon-Fano trees are stored at the start of the | |
| compressed file. The number of trees stored is defined by bit 2 in | |
| the general purpose flag word; a 0 bit indicates two trees stored, | |
| a 1 bit indicates three trees are stored. If 3 trees are stored, | |
| the first Shannon-Fano tree represents the encoding of the | |
| Literal characters, the second tree represents the encoding of | |
| the Length information, the third represents the encoding of the | |
| Distance information. When 2 Shannon-Fano trees are stored, the | |
| Length tree is stored first, followed by the Distance tree. | |
| 5.3.4 The Literal Shannon-Fano tree, if present, is used to represent | |
| the entire ASCII character set, and contains 256 values. This | |
| tree is used to compress any data not compressed by the sliding | |
| dictionary algorithm. When this tree is present, the Minimum | |
| Match Length for the sliding dictionary is 3. If this tree is | |
| not present, the Minimum Match Length is 2. | |
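The flag-dependent parameters from 5.3.2 through 5.3.4 can be collected in one small helper. This is a sketch only: the function name is hypothetical, and 4K and 8K are taken to mean 4096 and 8192 bytes.

```python
def implode_parameters(general_purpose_flags: int):
    """Return (dictionary_size, tree_count, minimum_match_length)."""
    # Bit 1: 0 = 4K dictionary, 1 = 8K dictionary.
    dictionary_size = 8192 if general_purpose_flags & 0x0002 else 4096
    # Bit 2: 0 = two trees (Length, Distance), 1 = three trees (Literal first).
    has_literal_tree = bool(general_purpose_flags & 0x0004)
    tree_count = 3 if has_literal_tree else 2
    # The Minimum Match Length depends on whether the Literal tree is present.
    minimum_match_length = 3 if has_literal_tree else 2
    return dictionary_size, tree_count, minimum_match_length
```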
| 5.3.5 The Length Shannon-Fano tree is used to compress the Length | |
| part of the (length,distance) pairs from the sliding dictionary | |
| output. The Length tree contains 64 values, ranging from the | |
| Minimum Match Length, to 63 plus the Minimum Match Length. | |
| 5.3.6 The Distance Shannon-Fano tree is used to compress the Distance | |
| part of the (length,distance) pairs from the sliding dictionary | |
| output. The Distance tree contains 64 values, ranging from 0 to | |
| 63, representing the upper 6 bits of the distance value. The | |
| distance values themselves will be between 0 and the sliding | |
| dictionary size, either 4K or 8K. | |
| 5.3.7 The Shannon-Fano trees themselves are stored in a compressed | |
| format. The first byte of the tree data represents the number of | |
| bytes of data representing the (compressed) Shannon-Fano tree | |
| minus 1. The remaining bytes represent the Shannon-Fano tree | |
| data encoded as: | |
| High 4 bits: Number of values at this bit length + 1. (1 - 16) | |
| Low 4 bits: Bit Length needed to represent value + 1. (1 - 16) | |
| 5.3.8 The Shannon-Fano codes can be constructed from the bit lengths | |
| using the following algorithm: | |
| 1) Sort the Bit Lengths in ascending order, while retaining the | |
| order of the original lengths stored in the file. | |
| 2) Generate the Shannon-Fano trees: | |
| Code <- 0 | |
| CodeIncrement <- 0 | |
| LastBitLength <- 0 | |
| i <- number of Shannon-Fano codes - 1 (either 255 or 63) | |
| loop while i >= 0 | |
| Code = Code + CodeIncrement | |
| if BitLength(i) <> LastBitLength then | |
| LastBitLength=BitLength(i) | |
| CodeIncrement = 1 shifted left (16 - LastBitLength) | |
| ShannonCode(i) = Code | |
| i <- i - 1 | |
| end loop | |
| 3) Reverse the order of all the bits in the above ShannonCode() | |
| vector, so that the most significant bit becomes the least | |
| significant bit. For example, the value 0x1234 (hex) would | |
| become 0x2C48 (hex). | |
| 4) Restore the order of Shannon-Fano codes as originally stored | |
| within the file. | |
| Example: | |
| This example will show the encoding of a Shannon-Fano tree | |
| of size 8. Notice that the actual Shannon-Fano trees used | |
| for Imploding are either 64 or 256 entries in size. | |
| Example: 0x02, 0x42, 0x01, 0x13 | |
| The first byte indicates 3 values in this table. Decoding the | |
| bytes: | |
| 0x42 = 5 codes of 3 bits long | |
| 0x01 = 1 code of 2 bits long | |
| 0x13 = 2 codes of 4 bits long | |
| This would generate the original bit length array of: | |
| (3, 3, 3, 3, 3, 2, 4, 4) | |
| There are 8 codes in this table for the values 0 thru 7. Using | |
| the algorithm to obtain the Shannon-Fano codes produces: | |
| Reversed Order Original | |
| Val Sorted Constructed Code Value Restored Length | |
| --- ------ ----------------- -------- -------- ------ | |
| 0: 2 1100000000000000 11 101 3 | |
| 1: 3 1010000000000000 101 001 3 | |
| 2: 3 1000000000000000 001 110 3 | |
| 3: 3 0110000000000000 110 010 3 | |
| 4: 3 0100000000000000 010 100 3 | |
| 5: 3 0010000000000000 100 11 2 | |
| 6: 4 0001000000000000 1000 1000 4 | |
| 7: 4 0000000000000000 0000 0000 4 | |
| The values in the Val, Order Restored and Original Length columns | |
| now represent the Shannon-Fano encoding tree that can be used for | |
| decoding the Shannon-Fano encoded data. How to parse the | |
| variable length Shannon-Fano values from the data stream is beyond | |
| the scope of this document. (See the references listed at the end of | |
| this document for more information.) However, traditional decoding | |
| schemes used for Huffman variable length decoding, such as the | |
| Greenlaw algorithm, can be successfully applied. | |
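The tree encoding from 5.3.7 and the four construction steps from 5.3.8 can be sketched in Python as follows. The function names are hypothetical. The sketch relies on Python's `sorted` being stable for step 1 and returns each code as a bit string of its own length, which corresponds to the "Order Restored" column of the example.

```python
def decode_tree_lengths(tree_data: bytes) -> list:
    """Expand the stored nibble pairs into one bit length per value."""
    byte_count = tree_data[0] + 1            # first byte holds "bytes - 1"
    lengths = []
    for packed in tree_data[1:1 + byte_count]:
        count = (packed >> 4) + 1            # high 4 bits: number of values - 1
        bit_length = (packed & 0x0F) + 1     # low 4 bits: bit length - 1
        lengths.extend([bit_length] * count)
    return lengths

def reverse16(value: int) -> int:
    """Reverse the bit order of a 16-bit value (step 3)."""
    return int(format(value, "016b")[::-1], 2)

def shannon_fano_codes(lengths: list) -> list:
    # Step 1: stable sort by bit length, remembering each original value.
    order = sorted(range(len(lengths)), key=lambda v: lengths[v])
    codes = [""] * len(lengths)
    code = increment = last_length = 0
    # Step 2: walk the sorted list from the last entry down to the first.
    for i in range(len(order) - 1, -1, -1):
        code += increment
        bit_length = lengths[order[i]]
        if bit_length != last_length:
            last_length = bit_length
            increment = 1 << (16 - bit_length)
        # Steps 3 and 4: reverse the bits, store under the original value.
        codes[order[i]] = format(reverse16(code), "0{}b".format(bit_length))
    return codes

lengths = decode_tree_lengths(bytes([0x02, 0x42, 0x01, 0x13]))
codes = shannon_fano_codes(lengths)
```

Run on the example bytes, `lengths` and `codes` correspond to the Original Length and Order Restored columns of the table above.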
| 5.3.9 The compressed data stream begins immediately after the | |
| compressed Shannon-Fano data. The compressed data stream can be | |
| interpreted as follows: | |
| loop until done | |
| read 1 bit from input stream. | |
| if this bit is non-zero then (encoded data is literal data) | |
| if Literal Shannon-Fano tree is present | |
| read and decode character using Literal Shannon-Fano tree. | |
| otherwise | |
| read 8 bits from input stream. | |
| copy character to the output stream. | |
| otherwise (encoded data is sliding dictionary match) | |
| if 8K dictionary size | |
| read 7 bits for offset Distance (lower 7 bits of offset). | |
| otherwise | |
| read 6 bits for offset Distance (lower 6 bits of offset). | |
| using the Distance Shannon-Fano tree, read and decode the | |
| upper 6 bits of the Distance value. | |
| using the Length Shannon-Fano tree, read and decode | |
| the Length value. | |
| Length <- Length + Minimum Match Length | |
| if Length = 63 + Minimum Match Length | |
| read 8 bits from the input stream, | |
| add this value to Length. | |
| move backwards Distance+1 bytes in the output stream, and | |
| copy Length characters from this position to the output | |
| stream. (if this position is before the start of the output | |
| stream, then assume that all the data before the start of | |
| the output stream is filled with zeros). | |
| end loop | |
| 5.4 Tokenizing - Method 7 | |
| ------------------------- | |
| 5.4.1 This method is not used by PKZIP. | |
| 5.5 Deflating - Method 8 | |
| ------------------------ | |
| 5.5.1 The Deflate algorithm is similar to the Implode algorithm using | |
| a sliding dictionary of up to 32K with secondary compression | |
| from Huffman/Shannon-Fano codes. | |
| 5.5.2 The compressed data is stored in blocks with a header describing | |
| the block and the Huffman codes used in the data block. The header | |
| format is as follows: | |
| Bit 0: Last Block bit This bit is set to 1 if this is the last | |
| compressed block in the data. | |
| Bits 1-2: Block type | |
| 00 (0) - Block is stored - All stored data is byte aligned. | |
| Skip bits until next byte, then next word = block | |
| length, followed by the one's complement of the block | |
| length word. Remaining data in block is the stored | |
| data. | |
| 01 (1) - Use fixed Huffman codes for literal and distance codes. | |
| Lit Code Bits Dist Code Bits | |
| --------- ---- --------- ---- | |
| 0 - 143 8 0 - 31 5 | |
| 144 - 255 9 | |
| 256 - 279 7 | |
| 280 - 287 8 | |
| Literal codes 286-287 and distance codes 30-31 are | |
| never used but participate in the Huffman construction. | |
| 10 (2) - Dynamic Huffman codes. (See expanding Huffman codes) | |
| 11 (3) - Reserved - Flag an "Error in compressed data" if seen. | |
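As an illustration of the block header, the sketch below reads a single stored block in Python. It assumes the block starts on a byte boundary and that header bits are taken from the least significant bit of the byte upward, as in RFC 1951. The function name is hypothetical, and compressed block types are deliberately left unhandled.

```python
import struct
import zlib

def read_stored_block(data: bytes):
    """Return (is_last_block, payload) for a stored block at offset 0."""
    header = data[0]
    is_last = bool(header & 0x01)         # Bit 0: Last Block bit
    block_type = (header >> 1) & 0x03     # Bits 1-2: Block type
    if block_type == 3:
        raise ValueError("Error in compressed data")
    if block_type != 0:
        raise NotImplementedError("this sketch only reads stored blocks")
    # The remaining bits of the first byte are skipped; the block length
    # and its one's complement follow as two little-endian words.
    length, complement = struct.unpack_from("<HH", data, 1)
    if length ^ complement != 0xFFFF:
        raise ValueError("block length does not match its complement")
    return is_last, data[5:5 + length]

# A hand-built final stored block holding five bytes.
block = bytes([0x01]) + struct.pack("<HH", 5, 5 ^ 0xFFFF) + b"hello"
is_last, payload = read_stored_block(block)
# Cross-check against zlib reading the same bytes as a raw Deflate stream.
reference = zlib.decompress(block, wbits=-15)
```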
| 5.5.3 Expanding Huffman Codes | |
| If the data block is stored with dynamic Huffman codes, the Huffman | |
| codes are sent in the following compressed format: | |
| 5 Bits: # of Literal codes sent - 257 (257 - 286) | |
| All other codes are never sent. | |
| 5 Bits: # of Dist codes - 1 (1 - 32) | |
| 4 Bits: # of Bit Length codes - 4 (4 - 19) | |
| The Huffman codes are sent as bit lengths and the codes are built as | |
| described in the implode algorithm. The bit lengths themselves are | |
| compressed with Huffman codes. There are 19 bit length codes: | |
| 0 - 15: Represent bit lengths of 0 - 15 | |
| 16: Copy the previous bit length 3 - 6 times. | |
| The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6) | |
| Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will | |
| expand to 12 bit lengths of 8 (1 + 6 + 5) | |
| 17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length) | |
| 18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length) | |
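The repeat codes can be expanded with a few lines of Python. In this sketch the input is assumed to be a list of already decoded (code, extra) pairs, where extra is the value of the extra bits that follow the code (0 when there are none). The function name is hypothetical.

```python
def expand_code_lengths(symbols):
    """Expand bit length codes 0-18 into a flat list of bit lengths."""
    lengths = []
    for code, extra in symbols:
        if 0 <= code <= 15:
            lengths.append(code)                        # literal bit length
        elif code == 16:
            if not lengths:
                raise ValueError("code 16 has no previous bit length")
            lengths.extend([lengths[-1]] * (3 + extra))  # 2 extra bits: 3 - 6
        elif code == 17:
            lengths.extend([0] * (3 + extra))            # 3 extra bits: 3 - 10
        elif code == 18:
            lengths.extend([0] * (11 + extra))           # 7 extra bits: 11 - 138
        else:
            raise ValueError("invalid bit length code")
    return lengths

# The example from the text: 8, 16 (+2 bits 11), 16 (+2 bits 10).
example = expand_code_lengths([(8, 0), (16, 0b11), (16, 0b10)])
```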
| The lengths of the bit length codes are sent packed 3 bits per value | |
| (0 - 7) in the following order: | |
| 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15 | |
| The Huffman codes should be built as described in the Implode algorithm | |
| except codes are assigned starting at the shortest bit length, i.e. the | |
| shortest code should be all 0's rather than all 1's. Also, codes with | |
| a bit length of zero do not participate in the tree construction. The | |
| codes are then used to decode the bit lengths for the literal and | |
| distance tables. | |
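One way to realize this construction is the canonical code assignment given in RFC 1951, sketched below in Python. The function name is hypothetical. Codes are returned as bit strings with the most significant code bit first, and zero-length entries receive no code.

```python
def huffman_codes(lengths):
    """Map each symbol with a non-zero bit length to its code string."""
    max_len = max(lengths)
    # Count how many codes exist for each bit length (length 0 is skipped).
    bl_count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            bl_count[n] += 1
    # Smallest code value for each bit length, shortest lengths first.
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Hand out consecutive values within each bit length, in symbol order.
    codes = {}
    for symbol, n in enumerate(lengths):
        if n:
            codes[symbol] = format(next_code[n], "0{}b".format(n))
            next_code[n] += 1
    return codes

# Bit lengths of the fixed literal table from 5.5.2.
fixed_lengths = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8
fixed = huffman_codes(fixed_lengths)
small = huffman_codes([3, 3, 3, 3, 3, 2, 4, 4])
```

With the bit lengths from the Implode example, the shortest code comes out as "00" rather than "11", which is the difference the paragraph above describes.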
| The bit lengths for the literal tables are sent first with the number | |
| of entries sent described by the 5 bits sent earlier. There are up | |
| to 286 literal characters; the first 256 represent the respective 8 | |
| bit character, code 256 represents the End-Of-Block code, the remaining | |
| 29 codes represent copy lengths of 3 thru 258. There are up to 30 | |
| distance codes representing distances from 1 thru 32k as described | |
| below. | |
| Length Codes | |
| ------------ | |
| Extra Extra Extra Extra | |
| Code Bits Length Code Bits Lengths Code Bits Lengths Code Bits Length(s) | |
| ---- ---- ------ ---- ---- ------- ---- ---- ------- ---- ---- --------- | |
| 257 0 3 265 1 11,12 273 3 35-42 281 5 131-162 | |
| 258 0 4 266 1 13,14 274 3 43-50 282 5 163-194 | |
| 259 0 5 267 1 15,16 275 3 51-58 283 5 195-226 | |
| 260 0 6 268 1 17,18 276 3 59-66 284 5 227-257 | |
| 261 0 7 269 2 19-22 277 4 67-82 285 0 258 | |
| 262 0 8 270 2 23-26 278 4 83-98 | |
| 263 0 9 271 2 27-30 279 4 99-114 | |
| 264 0 10 272 2 31-34 280 4 115-130 | |
| Distance Codes | |
| -------------- | |
| Extra Extra Extra Extra | |
| Code Bits Dist Code Bits Dist Code Bits Distance Code Bits Distance | |
| ---- ---- ---- ---- ---- ------ ---- ---- -------- ---- ---- -------- | |
| 0 0 1 8 3 17-24 16 7 257-384 24 11 4097-6144 | |
| 1 0 2 9 3 25-32 17 7 385-512 25 11 6145-8192 | |
| 2 0 3 10 4 33-48 18 8 513-768 26 12 8193-12288 | |
| 3 0 4 11 4 49-64 19 8 769-1024 27 12 12289-16384 | |
| 4 1 5,6 12 5 65-96 20 9 1025-1536 28 13 16385-24576 | |
| 5 1 7,8 13 5 97-128 21 9 1537-2048 29 13 24577-32768 | |
| 6 2 9-12 14 6 129-192 22 10 2049-3072 | |
| 7 2 13-16 15 6 193-256 23 10 3073-4096 | |
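Both tables are regular enough to be generated rather than typed in: each code's base value is the previous base plus 2 raised to the previous code's extra-bit count. A Python sketch, in which all names are hypothetical:

```python
# Extra-bit counts for length codes 257-284 and distance codes 0-29.
LENGTH_EXTRA = [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4
DIST_EXTRA = [0] * 4 + [bits for bits in range(1, 14) for _ in range(2)]

def _bases(first, extra_bits):
    """Return the smallest value represented by each code."""
    bases = []
    for bits in extra_bits:
        bases.append(first)
        first += 1 << bits
    return bases

LENGTH_BASE = _bases(3, LENGTH_EXTRA)   # code 257 starts at length 3
DIST_BASE = _bases(1, DIST_EXTRA)       # code 0 starts at distance 1

def copy_length(code, extra=0):
    """Length for a length code (257-285) and the value of its extra bits."""
    if code == 285:
        return 258                      # code 285 has no extra bits
    index = code - 257
    if not 0 <= index < len(LENGTH_BASE) or extra >> LENGTH_EXTRA[index]:
        raise ValueError("invalid length code or extra bits")
    return LENGTH_BASE[index] + extra

def copy_distance(code, extra=0):
    """Distance for a distance code (0-29) and the value of its extra bits."""
    if not 0 <= code < len(DIST_BASE) or extra >> DIST_EXTRA[code]:
        raise ValueError("invalid distance code or extra bits")
    return DIST_BASE[code] + extra
```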
| 5.5.4 The compressed data stream begins immediately after the | |
| compressed header data. The compressed data stream can be | |
| interpreted as follows: | |
| do | |
| read header from input stream. | |
| if stored block | |
| skip bits until byte aligned | |
| read count and 1's complement of count | |
| copy count bytes from the data block to the output stream | |
| otherwise | |
| loop until end of block code sent | |
| decode literal character from input stream | |
| if literal < 256 | |
| copy character to the output stream | |
| otherwise | |
| if literal = end of block | |
| break from loop | |
| otherwise | |
| derive length from the decoded code and its extra bits | |
| decode distance from input stream | |
| move backwards distance bytes in the output stream, and | |
| copy length characters from this position to the output | |
| stream. | |
| end loop | |
| while not last block | |
| if data descriptor exists | |
| skip bits until byte aligned | |
| read crc and sizes | |
| endif | |
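In practice the loop above is usually delegated to an existing Deflate library. The sketch below uses Python's `zlib` module with a negative `wbits` value, which selects a raw Deflate stream without the zlib header and trailer. It assumes that is how method 8 data appears in a ZIP file, and the function names are hypothetical.

```python
import zlib

def inflate_raw(compressed: bytes) -> bytes:
    """Decompress a raw Deflate stream (no zlib or gzip wrapper)."""
    decoder = zlib.decompressobj(wbits=-15)   # 32K window, raw stream
    return decoder.decompress(compressed) + decoder.flush()

def deflate_raw(data: bytes) -> bytes:
    """Produce a raw Deflate stream for the round trip below."""
    encoder = zlib.compressobj(level=6, method=zlib.DEFLATED, wbits=-15)
    return encoder.compress(data) + encoder.flush()

original = b"ZIP method 8 " * 20
roundtrip = inflate_raw(deflate_raw(original))
```

The CRC and sizes from the header or data descriptor still have to be checked separately, since a raw stream carries no checksum of its own.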
| 5.6 Enhanced Deflating - Method 9 | |
| --------------------------------- | |
| 5.6.1 The Enhanced Deflating algorithm is similar to Deflate but uses | |
| a sliding dictionary of up to 64K. Deflate64(tm) is supported | |
| by the Deflate extractor. | |
| 5.7 BZIP2 - Method 12 | |
| --------------------- | |
| 5.7.1 BZIP2 is an open-source data compression algorithm developed by | |
| Julian Seward. Information and source code for this algorithm | |
| can be found on the internet. | |
| 5.8 LZMA - Method 14 | |
| -------------------- | |
| 5.8.1 LZMA is a block-oriented, general purpose data compression | |
| algorithm developed and maintained by Igor Pavlov. It is a derivative | |
| of LZ77 that utilizes Markov chains and a range coder. Information and | |
| source code for this algorithm can be found on the internet. Consult | |
| with the author of this algorithm for information on terms or | |
| restrictions on use. | |
| Support for LZMA within the ZIP format is defined as follows: | |
| 5.8.2 The Compression method field within the ZIP Local and Central | |
| Header records will be set to the value 14 to indicate data was | |
| compressed using LZMA. | |
| 5.8.3 The Version needed to extract field within the ZIP Local and | |
| Central Header records will be set to 6.3 to indicate the minimum | |
| ZIP format version supporting this feature. | |
| 5.8.4 File data compressed using the LZMA algorithm must be placed | |
| immediately following the Local Header for the file. If a standard | |
| ZIP encryption header is required, it will follow the Local Header | |
| and will precede the LZMA compressed file data segment. The location | |
| of LZMA compressed data segment within the ZIP format will be as shown: | |
| [local header file 1] | |
| [encryption header file 1] | |
| [LZMA compressed data segment for file 1] | |
| [data descriptor 1] | |
| [local header file 2] | |
| 5.8.5 The encryption header and data descriptor records may | |
| be conditionally present. The LZMA Compressed Data Segment | |
| will consist of an LZMA Properties Header followed by the | |
| LZMA Compressed Data as shown: | |
| [LZMA properties header for file 1] | |
| [LZMA compressed data for file 1] | |
| 5.8.6 The LZMA Compressed Data will be stored as provided by the | |
| LZMA compression library. Compressed size, uncompressed size and | |
| other file characteristics about the file being compressed must be | |
| stored in standard ZIP storage format. | |
| 5.8.7 The LZMA Properties Header will store specific data required | |
| to decompress the LZMA compressed Data. This data is set by the | |
| LZMA compression engine using the function WriteCoderProperties() | |
| as documented within the LZMA SDK. | |
| 5.8.8 Storage fields for the property information within the LZMA | |
| Properties Header are as follows: | |
| LZMA Version Information 2 bytes | |
| LZMA Properties Size 2 bytes | |
| LZMA Properties Data variable, defined by "LZMA Properties Size" | |
| 5.8.8.1 LZMA Version Information - this field identifies which version | |
| of the LZMA SDK was used to compress a file. The first byte will | |
| store the major version number of the LZMA SDK and the second | |
| byte will store the minor number. | |
| 5.8.8.2 LZMA Properties Size - this field defines the size of the | |
| remaining property data. Typically this size should be determined by | |
| the version of the SDK. This size field is included as a convenience | |
| and to help avoid any ambiguity should it arise in the future due | |
| to changes in this compression algorithm. | |
| 5.8.8.3 LZMA Property Data - this variable sized field records the | |
| required values for the decompressor as defined by the LZMA SDK. | |
| The data stored in this field should be obtained using the | |
| WriteCoderProperties() function in the version of the SDK defined by | |
| the "LZMA Version Information" field. | |
| 5.8.8.4 The layout of the "LZMA Properties Data" field is a function of | |
| the LZMA compression algorithm. It is possible that this layout may be | |
| changed by the author over time. The data layout in version 4.3 of the | |
| LZMA SDK defines a 5 byte array that uses 4 bytes to store the dictionary | |
| size in little-endian order. This is preceded by a single packed byte as | |
| the first element of the array that contains the following fields: | |
| PosStateBits | |
| LiteralPosStateBits | |
| LiteralContextBits | |
| Refer to the LZMA documentation for a more detailed explanation of | |
| these fields. | |
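A Python sketch of reading the LZMA Properties Header is shown below. Several points are assumptions rather than statements from this document: the size field is read as little-endian, and the packed byte is unpacked using the formula from the LZMA SDK, (PosStateBits * 5 + LiteralPosStateBits) * 9 + LiteralContextBits. The function name and the dictionary keys are hypothetical.

```python
import struct

def parse_zip_lzma_header(data: bytes) -> dict:
    """Decode the version, size and 5-byte property array."""
    major, minor, props_size = struct.unpack_from("<BBH", data, 0)
    props = data[4:4 + props_size]
    if props_size != 5 or len(props) != 5:
        raise ValueError("this sketch only understands the 5-byte layout")
    packed = props[0]
    if packed >= 9 * 5 * 5:
        raise ValueError("packed property byte out of range")
    # Assumed packing (LZMA SDK): (pb * 5 + lp) * 9 + lc
    literal_context_bits = packed % 9
    literal_pos_state_bits = (packed // 9) % 5
    pos_state_bits = packed // 45
    (dictionary_size,) = struct.unpack_from("<I", props, 1)
    return {
        "sdk_version": (major, minor),
        "lc": literal_context_bits,
        "lp": literal_pos_state_bits,
        "pb": pos_state_bits,
        "dictionary_size": dictionary_size,
        "data_offset": 4 + props_size,   # where the LZMA Compressed Data starts
    }

# Illustrative header: SDK 9.20, packed byte 0x5D, 8 MiB dictionary.
sample = bytes([9, 20]) + struct.pack("<H", 5) + bytes([0x5D]) + struct.pack("<I", 1 << 23)
info = parse_zip_lzma_header(sample)
```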
| 5.8.9 Data compressed with method 14, LZMA, may include an end-of-stream | |
| (EOS) marker ending the compressed data stream. This marker is not | |
| required, but its use is highly recommended to facilitate processing | |
| and implementers should include the EOS marker whenever possible. | |
| When the EOS marker is used, general purpose bit 1 must be set. If | |
| general purpose bit 1 is not set, the EOS marker is not present. | |
| 5.9 WavPack - Method 97 | |
| ----------------------- | |
| 5.9.1 Information describing the use of compression method 97 is | |
| provided by WinZIP International, LLC. This method relies on the | |
| open source WavPack audio compression utility developed by David Bryant. | |
| Information on WavPack is available at www.wavpack.com. Please consult | |
| with the author of this algorithm for information on terms and | |
| restrictions on use. | |
| 5.9.2 WavPack data for a file begins immediately after the end of the | |
| local header data. This data is the output from WavPack compression | |
| routines. Within the ZIP file, the use of WavPack compression is | |
| indicated by setting the compression method field to a value of 97 | |
| in both the local header and the central directory header. The Version | |
| needed to extract and version made by fields use the same values as are | |
| used for data compressed using the Deflate algorithm. | |
| 5.9.3 An implementation note for storing digital sample data when using | |
| WavPack compression within ZIP files is that all of the bytes of | |
| the sample data should be compressed. This includes any unused | |
| bits up to the byte boundary. An example is a 2 byte sample that | |
| uses only 12 bits for the sample data with 4 unused bits. If only | |
| 12 bits are passed as the sample size to the WavPack routines, the 4 | |
| unused bits will be set to 0 on extraction regardless of their original | |
| state. To avoid this, the full 16 bits of the sample data size | |
| should be provided. | |
| 5.10 PPMd - Method 98 | |
| --------------------- | |
| 5.10.1 PPMd is a data compression algorithm developed by Dmitry Shkarin | |
| which includes a carryless rangecoder developed by Dmitry Subbotin. | |
| This algorithm is based on prediction by partial matching (PPM) on | |
| multiple order contexts. Information and source code for this algorithm | |
| can be found on the internet. Consult with the author of this | |
| algorithm for information on terms or restrictions on use. | |
| 5.10.2 Support for PPMd within the ZIP format currently is provided only | |
| for version I, revision 1 of the algorithm. Storage requirements | |
| for using this algorithm are as follows: | |
| 5.10.3 Parameters needed to control the algorithm are stored in the two | |
| bytes immediately preceding the compressed data. These bytes are | |
| used to store the following fields: | |
| Model order - sets the maximum model order, default is 8, possible | |
| values are from 2 to 16 inclusive | |
| Sub-allocator size - sets the size of sub-allocator in MB, default is 50, | |
| possible values are from 1MB to 256MB inclusive | |
| Model restoration method - sets the method used to restart context | |
| model at memory insufficiency, values are: | |
| 0 - restarts model from scratch - default | |
| 1 - cut off model - decreases performance by as much as 2x | |
| 2 - freeze context tree - not recommended | |
| 5.10.4 An example for packing these fields into the 2 byte storage field is | |
| illustrated below. These values are stored in Intel low-byte/high-byte | |
| order. | |
| wPPMd = (Model order - 1) + | |
| ((Sub-allocator size - 1) << 4) + | |
| (Model restoration method << 12) | |
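| The packing expression above can be sketched in Python as follows. This is | |
| a minimal illustration, not a reference implementation; the function names | |
| pack_ppmd_params and unpack_ppmd_params are hypothetical, and the range | |
| checks simply mirror the limits listed in 5.10.3. | |

```python
import struct


def pack_ppmd_params(order: int, mem_mb: int, restore: int) -> bytes:
    """Pack the three PPMd parameters into the 2-byte field (little-endian)."""
    if not 2 <= order <= 16:
        raise ValueError("model order must be 2..16")
    if not 1 <= mem_mb <= 256:
        raise ValueError("sub-allocator size must be 1..256 MB")
    if restore not in (0, 1, 2):
        raise ValueError("model restoration method must be 0, 1 or 2")
    w = (order - 1) + ((mem_mb - 1) << 4) + (restore << 12)
    return struct.pack("<H", w)  # Intel low-byte/high-byte order


def unpack_ppmd_params(data: bytes) -> tuple:
    """Recover (order, mem_mb, restore) from the 2-byte field."""
    (w,) = struct.unpack("<H", data[:2])
    order = (w & 0x0F) + 1           # bits 0-3
    mem_mb = ((w >> 4) & 0xFF) + 1   # bits 4-11
    restore = w >> 12                # bits 12-15
    return order, mem_mb, restore
```

| With the defaults (order 8, 50 MB, method 0) the packed word is 0x0317, | |
| which is stored as the bytes 0x17 0x03. | |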
| 6.0 Traditional PKWARE Encryption | |
| ---------------------------------- | |
| 6.0.1 The following information discusses the decryption steps | |
| required to support traditional PKWARE encryption. This | |
| form of encryption is considered weak by today's standards | |
| and its use is recommended only for situations with | |
| low security needs or for compatibility with older .ZIP | |
| applications. | |
| 6.1 Traditional PKWARE Decryption | |
| --------------------------------- | |
| 6.1.1 PKWARE is grateful to Mr. Roger Schlafly for his expert | |
| contribution towards the development of PKWARE's traditional | |
| encryption. | |
| 6.1.2 PKZIP encrypts the compressed data stream. Encrypted files | |
| must be decrypted before they can be extracted to their original | |
| form. | |
| 6.1.3 Each encrypted file has an extra 12 bytes stored at the start | |
| of the data area defining the encryption header for that file. The | |
| encryption header is originally set to random values, and then | |
| itself encrypted, using three 32-bit keys. The key values are | |
| initialized using the supplied encryption password. After each byte | |
| is encrypted, the keys are then updated using pseudo-random number | |
| generation techniques in combination with the same CRC-32 algorithm | |
| used in PKZIP and described elsewhere in this document. | |
| 6.1.4 The following are the basic steps required to decrypt a file: | |
| 1) Initialize the three 32-bit keys with the password. | |
| 2) Read and decrypt the 12-byte encryption header, further | |
| initializing the encryption keys. | |
| 3) Read and decrypt the compressed data stream using the | |
| encryption keys. | |
| 6.1.5 Initializing the encryption keys | |
| Key(0) <- 305419896 | |
| Key(1) <- 591751049 | |
| Key(2) <- 878082192 | |
| loop for i <- 0 to length(password)-1 | |
| update_keys(password(i)) | |
| end loop | |
| Where update_keys() is defined as: | |
| update_keys(char): | |
| Key(0) <- crc32(Key(0),char) | |
| Key(1) <- Key(1) + (Key(0) & 000000ffH) | |
| Key(1) <- Key(1) * 134775813 + 1 | |
| Key(2) <- crc32(Key(2),Key(1) >> 24) | |
| end update_keys | |
| Where crc32(old_crc,char) is a routine that given a CRC value and a | |
| character, returns an updated CRC value after applying the CRC-32 | |
| algorithm described elsewhere in this document. | |
| 6.1.6 Decrypting the encryption header | |
| The purpose of this step is to further initialize the encryption | |
| keys, based on random data, to render a plaintext attack on the | |
| data ineffective. | |
| Read the 12-byte encryption header into Buffer, in locations | |
| Buffer(0) thru Buffer(11). | |
| loop for i <- 0 to 11 | |
| C <- Buffer(i) ^ decrypt_byte() | |
| update_keys(C) | |
| Buffer(i) <- C | |
| end loop | |
| Where decrypt_byte() is defined as: | |
| unsigned char decrypt_byte() | |
| local unsigned short temp | |
| temp <- Key(2) | 2 | |
| decrypt_byte <- (temp * (temp ^ 1)) >> 8 | |
| end decrypt_byte | |
| After the header is decrypted, the last 1 or 2 bytes in Buffer | |
| should be the high-order word/byte of the CRC for the file being | |
| decrypted, stored in Intel low-byte/high-byte order. Versions of | |
| PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is | |
| used on versions after 2.0. This can be used to test if the password | |
| supplied is correct or not. | |
| 6.1.7 Decrypting the compressed data stream | |
| The compressed data stream can be decrypted as follows: | |
| loop until done | |
| read a character into C | |
| Temp <- C ^ decrypt_byte() | |
| update_keys(Temp) | |
| output Temp | |
| end loop | |
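| Steps 6.1.5 through 6.1.7 can be combined into one short Python sketch. | |
| It is an illustration under stated assumptions, not a reference | |
| implementation: the class TraditionalCipher and the function | |
| decrypt_member are hypothetical names, the 1 byte CRC check is the one | |
| applied, and the encrypt() method is simply the inverse of the decryption | |
| loop, included so the sketch can be exercised. The raw CRC-32 step is | |
| obtained from zlib.crc32 by undoing its built-in inversion. | |

```python
import zlib


def crc32_byte(old_crc: int, byte: int) -> int:
    """Raw one-byte CRC-32 update, without the usual pre/post inversion."""
    # zlib.crc32 inverts its running value on entry and on exit; undo both.
    return zlib.crc32(bytes([byte]), old_crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF


class TraditionalCipher:
    def __init__(self, password: bytes):
        # 6.1.5: initialize the keys, then mix in each password byte.
        self.keys = [305419896, 591751049, 878082192]
        for b in password:
            self.update_keys(b)

    def update_keys(self, byte: int) -> None:
        k = self.keys
        k[0] = crc32_byte(k[0], byte)
        k[1] = (k[1] + (k[0] & 0xFF)) & 0xFFFFFFFF
        k[1] = (k[1] * 134775813 + 1) & 0xFFFFFFFF
        k[2] = crc32_byte(k[2], k[1] >> 24)

    def decrypt_byte(self) -> int:
        temp = (self.keys[2] | 2) & 0xFFFF  # "unsigned short" truncation
        return ((temp * (temp ^ 1)) >> 8) & 0xFF

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            plain = c ^ self.decrypt_byte()
            self.update_keys(plain)  # keys always track the plaintext byte
            out.append(plain)
        return bytes(out)

    def encrypt(self, data: bytes) -> bytes:
        # Assumption: mirror image of decrypt(), used here for round trips.
        out = bytearray()
        for plain in data:
            out.append(plain ^ self.decrypt_byte())
            self.update_keys(plain)
        return bytes(out)


def decrypt_member(password: bytes, encrypted: bytes, crc32_value: int) -> bytes:
    """Decrypt the 12-byte header, test the password, decrypt the data."""
    cipher = TraditionalCipher(password)
    header = cipher.decrypt(encrypted[:12])
    # 1 byte check: last header byte equals the high-order byte of the CRC.
    if header[11] != (crc32_value >> 24) & 0xFF:
        raise ValueError("password check byte does not match")
    return cipher.decrypt(encrypted[12:])
```

| Because the check covers a single byte, a wrong password can still pass | |
| it by chance; the CRC of the extracted data remains the final test. | |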
| 7.0 Strong Encryption Specification | |
| ----------------------------------- | |
| 7.0.1 Portions of the Strong Encryption technology defined in this | |
| specification are covered under patents and pending patent applications. | |
| Refer to the section in this document entitled "Incorporating | |
| PKWARE Proprietary Technology into Your Product" for more information. | |
| 7.1 Strong Encryption Overview | |
| ------------------------------ | |
| 7.1.1 Version 5.x of this specification introduced support for strong | |
| encryption algorithms. These algorithms can be used with either | |
| a password or an X.509v3 digital certificate to encrypt each file. | |
| This format specification supports either password or certificate | |
| based encryption to meet the security needs of today, to enable | |
| interoperability between users within both PKI and non-PKI | |
| environments, and to ensure interoperability between different | |
| computing platforms that are running a ZIP program. | |
| 7.1.2 Password based encryption is the most common form of encryption | |
| people are familiar with. However, inherent weaknesses with | |
| passwords (e.g. susceptibility to dictionary/brute force attack) | |
| as well as password management and support issues make certificate | |
| based encryption a more secure and scalable option. Industry | |
| efforts and support are defining and moving towards more advanced | |
| security solutions built around X.509v3 digital certificates and | |
| Public Key Infrastructures (PKI) because of the greater scalability, | |
| administrative options, and more robust security over traditional | |
| password based encryption. | |
| 7.1.3 Most standard encryption algorithms are supported with this | |
| specification. Reference implementations for many of these | |
| algorithms are available from either commercial or open source | |
| distributors. Readily available cryptographic toolkits make | |
| implementation of the encryption features straightforward. | |
| This document is not intended to provide a treatise on data | |
| encryption principles or theory. Its purpose is to document the | |
| data structures required for implementing interoperable data | |
| encryption within the .ZIP format. It is strongly recommended that | |
| you have a good understanding of data encryption before reading | |
| further. | |
| 7.1.4 The algorithms introduced in Version 5.0 of this specification | |
| include: | |
| RC2 40 bit, 64 bit, and 128 bit | |
| RC4 40 bit, 64 bit, and 128 bit | |
| DES | |
| 3DES 112 bit and 168 bit | |
| Version 5.1 adds support for the following: | |
| AES 128 bit, 192 bit, and 256 bit | |
| 7.1.5 Version 6.1 introduces encryption data changes to support | |
| interoperability with Smartcard and USB Token certificate storage | |
| methods which do not support the OAEP strengthening standard. | |
| 7.1.6 Version 6.2 introduces support for encrypting metadata by compressing | |
| and encrypting the central directory data structure to reduce information | |
| leakage. Information leakage can occur in legacy ZIP applications | |
| through exposure of information about a file even though that file is | |
| stored encrypted. The information exposed consists of file | |
| characteristics stored within the records and fields defined by this | |
| specification. This includes data such as a file's name, its original | |
| size, timestamp and CRC32 value. | |
| 7.1.7 Version 6.3 introduces support for encrypting data using the Blowfish | |
| and Twofish algorithms. These are symmetric block ciphers developed | |
| by Bruce Schneier. Blowfish supports using a variable length key from | |
| 32 to 448 bits. Block size is 64 bits. Implementations should use 16 | |
| rounds and the only mode supported within ZIP files is CBC. Twofish | |
| supports key sizes 128, 192 and 256 bits. Block size is 128 bits. | |
| Implementations should use 16 rounds and the only mode supported within | |
| ZIP files is CBC. Information and source code for both Blowfish and | |
| Twofish algorithms can be found on the internet. Consult with the author | |
| of these algorithms for information on terms or restrictions on use. | |
| 7.1.8 Central Directory Encryption provides greater protection against | |
| information leakage by encrypting the Central Directory structure and | |
| by masking key values that are replicated in the unencrypted Local | |
| Header. ZIP compatible programs that cannot interpret an encrypted | |
| Central Directory structure cannot rely on the data in the corresponding | |
| Local Header for decompression information. | |
| 7.1.9 Extra Field records that may contain sensitive information about a | |
| file should not be stored in the Local Header; they should only be | |
| written to the Central Directory, where they can be encrypted. This | |
| design currently does not support streaming. Information in the End of | |
| Central Directory record, the Zip64 End of Central Directory Locator, | |
| and the Zip64 End of Central Directory record is not encrypted. Access | |
| to view data on files within a ZIP file with an encrypted Central Directory | |
| requires the appropriate password or private key for decryption prior to | |
| viewing any files, or any information about the files, in the archive. | |
| 7.1.10 Older ZIP compatible programs not familiar with the Central Directory | |
| Encryption feature will no longer be able to recognize the Central | |
| Directory and may assume the ZIP file is corrupt. Programs that | |
| attempt streaming access using Local Headers will see invalid | |
| information for each file. Central Directory Encryption need not be | |
| used for every ZIP file. Its use is recommended for greater security. | |
| ZIP files not using Central Directory Encryption should operate as | |
| in the past. | |
| 7.1.11 This strong encryption feature specification is intended to provide for | |
| scalable, cross-platform encryption needs ranging from simple password | |
| encryption to authenticated public/private key encryption. | |
| 7.1.12 Encryption provides data confidentiality and privacy. It is | |
| recommended that you combine X.509 digital signing with encryption | |
| to add authentication and non-repudiation. | |
| 7.2 Single Password Symmetric Encryption Method | |
| ----------------------------------------------- | |
| 7.2.1 The Single Password Symmetric Encryption Method using strong | |
| encryption algorithms operates similarly to the traditional | |
| PKWARE encryption defined in this format. Additional data | |
| structures are added to support the processing needs of the | |
| strong algorithms. | |
| The Strong Encryption data structures are: | |
| 7.2.2 General Purpose Bits - Bits 0 and 6 of the General Purpose bit | |
| flag in both local and central header records. Both bits set | |
| indicates strong encryption. Bit 13, when set, indicates the Central | |
| Directory is encrypted and that selected fields in the Local Header | |
| are masked to hide their actual value. | |
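| As a small illustration, the bit tests described in 7.2.2 might look like | |
| the following Python sketch. The constant names and the function | |
| describe_encryption are hypothetical; only the bit positions come from | |
| the text above. | |

```python
FLAG_ENCRYPTED = 1 << 0          # bit 0: file is encrypted
FLAG_STRONG_ENCRYPTION = 1 << 6  # bit 6: strong encryption (with bit 0)
FLAG_CD_ENCRYPTED = 1 << 13      # bit 13: Central Directory is encrypted


def describe_encryption(gp_flags: int) -> dict:
    """Interpret the encryption-related general purpose bits."""
    encrypted = bool(gp_flags & FLAG_ENCRYPTED)
    return {
        "encrypted": encrypted,
        # Strong encryption requires both bit 0 and bit 6 to be set.
        "strong": encrypted and bool(gp_flags & FLAG_STRONG_ENCRYPTION),
        # Bit 13 also means selected Local Header fields are masked.
        "central_directory_encrypted": bool(gp_flags & FLAG_CD_ENCRYPTED),
    }
```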
| 7.2.3 Extra Field 0x0017 in central header only. | |
| Fields to consider in this record are: | |
| 7.2.3.1 Format - the data format identifier for this record. The only | |
| value allowed at this time is the integer value 2. | |
| 7.2.3.2 AlgId - integer identifier of the encryption algorithm from the | |
| following range | |
| 0x6601 - DES | |
| 0x6602 - RC2 (version needed to extract < 5.2) | |
| 0x6603 - 3DES 168 | |
| 0x6609 - 3DES 112 | |
| 0x660E - AES 128 | |
| 0x660F - AES 192 | |
| 0x6610 - AES 256 | |
| 0x6702 - RC2 (version needed to extract >= 5.2) | |
| 0x6720 - Blowfish | |
| 0x6721 - Twofish | |
| 0x6801 - RC4 | |
| 0xFFFF - Unknown algorithm | |
| 7.2.3.3 Bitlen - Explicit bit length of key | |
| 32 - 448 bits | |
| 7.2.3.4 Flags - Processing flags needed for decryption | |
| 0x0001 - Password is required to decrypt | |
| 0x0002 - Certificates only | |
| 0x0003 - Password or certificate required to decrypt | |
| Values > 0x0003 reserved for certificate processing | |
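| The AlgId, Bitlen and Flags values listed above lend themselves to a | |
| simple lookup. The Python sketch below is illustrative only; the names | |
| ALG_IDS, CREDENTIALS and describe_0x0017 are hypothetical, and masking | |
| Flags with 0x0003 to separate the password/certificate selection from the | |
| reserved certificate processing bits is an assumption of this sketch. | |

```python
ALG_IDS = {
    0x6601: "DES",
    0x6602: "RC2 (version needed to extract < 5.2)",
    0x6603: "3DES 168",
    0x6609: "3DES 112",
    0x660E: "AES 128",
    0x660F: "AES 192",
    0x6610: "AES 256",
    0x6702: "RC2 (version needed to extract >= 5.2)",
    0x6720: "Blowfish",
    0x6721: "Twofish",
    0x6801: "RC4",
    0xFFFF: "Unknown algorithm",
}

CREDENTIALS = {
    0x0001: "password required",
    0x0002: "certificates only",
    0x0003: "password or certificate",
}


def describe_0x0017(alg_id: int, bit_len: int, flags: int) -> str:
    """Summarize the AlgId, Bitlen and Flags fields of a 0x0017 record."""
    if alg_id not in ALG_IDS:
        raise ValueError(f"AlgId 0x{alg_id:04X} is not in the defined range")
    if not 32 <= bit_len <= 448:
        raise ValueError("Bitlen must be between 32 and 448 bits")
    # Assumption: the low two bits select the credential type.
    credential = CREDENTIALS.get(flags & 0x0003, "unspecified")
    return f"{ALG_IDS[alg_id]}, {bit_len}-bit key, {credential}"
```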
| 7.2.4 Decryption header record preceding compressed file data. | |
| -Decryption Header: | |
| Value Size Description | |
| ----- ---- ----------- | |
| IVSize 2 bytes Size of initialization vector (IV) | |
| IVData IVSize Initialization vector for this file | |
| Size 4 bytes Size of remaining decryption header data | |
| Format 2 bytes Format definition for this record | |
| AlgID 2 bytes Encryption algorithm identifier | |
| Bitlen 2 bytes Bit length of encryption key | |
| Flags 2 bytes Processing flags | |
| ErdSize 2 bytes Size of Encrypted Random Data | |
| ErdData ErdSize Encrypted Random Data | |
| Reserved1 4 bytes Reserved certificate processing data | |
| Reserved2 (var) Reserved for certificate processing data | |
| VSize 2 bytes Size of password validation data | |
| VData VSize-4 Password validation data | |
| VCRC32 4 bytes Standard ZIP CRC32 of password validation data | |
| 7.2.4.1 IVData - The size of the IV should match the algorithm block size. | |
| The IVData can be completely random data. If the size of | |
| the randomly generated data does not match the block size | |
| it should be padded with zeros or truncated as | |
| necessary. If IVSize is 0, then IV = CRC32 + Uncompressed | |
| File Size (as a 64 bit little-endian, unsigned integer value). | |
| 7.2.4.2 Format - the data format identifier for this record. The only | |
| value allowed at this time is the integer value 3. | |
| 7.2.4.3 AlgId - integer identifier of the encryption algorithm from the | |
| following range | |
| 0x6601 - DES | |
| 0x6602 - RC2 (version needed to extract < 5.2) | |
| 0x6603 - 3DES 168 | |
| 0x6609 - 3DES 112 | |
| 0x660E - AES 128 | |
| 0x660F - AES 192 | |
| 0x6610 - AES 256 | |
| 0x6702 - RC2 (version needed to extract >= 5.2) | |
| 0x6720 - Blowfish | |
| 0x6721 - Twofish | |
| 0x6801 - RC4 | |
| 0xFFFF - Unknown algorithm | |
| 7.2.4.4 Bitlen - Explicit bit length of key | |
| 32 - 448 bits | |
| 7.2.4.5 Flags - Processing flags needed for decryption | |
| 0x0001 - Password is required to decrypt | |
| 0x0002 - Certificates only | |
| 0x0003 - Password or certificate required to decrypt | |
| Values > 0x0003 reserved for certificate processing | |
| 7.2.4.6 ErdData - Encrypted random data is used to store random data that | |
| is used to generate a file session key for encrypting | |
| each file. SHA1 is used to calculate hash data used to | |
| derive keys. File session keys are derived from a master | |
| session key generated from the user-supplied password. | |
| If the Flags field in the decryption header contains | |
| the value 0x4000, then the ErdData field must be | |
| decrypted using 3DES. If the value 0x4000 is not set, | |
| then the ErdData field must be decrypted using AlgId. | |
| 7.2.4.7 Reserved1 - Reserved for certificate processing, if value is | |
| zero, then Reserved2 data is absent. See the explanation | |
| under the Certificate Processing Method for details on | |
| this data structure. | |
| 7.2.4.8 Reserved2 - If present, the size of the Reserved2 data structure | |
| is located by skipping the first 4 bytes of this field | |
| and using the next 2 bytes as the remaining size. See | |
| the explanation under the Certificate Processing Method | |
| for details on this data structure. | |
| 7.2.4.9 VSize - This size value will always include the 4 bytes of the | |
| VCRC32 data and will be greater than 4 bytes. | |
| 7.2.4.10 VData - Random data for password validation. VData and VCRC32 | |
| together are VSize in length, and VSize must be a multiple of the | |
| encryption block size. VCRC32 is a checksum value of VData. | |
| VData and VCRC32 are stored encrypted and start the | |
| stream of encrypted data for a file. | |
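| The Decryption Header layout in 7.2.4 can be walked with a few struct | |
| reads. The Python sketch below is a hedged illustration: the class | |
| DecryptionHeader and the function parse_decryption_header are | |
| hypothetical names, all integers are assumed to be little-endian, and | |
| only the password case (Reserved1 equal to zero, so Reserved2 absent) is | |
| handled. VData and VCRC32 are returned still encrypted. | |

```python
import struct
from dataclasses import dataclass


@dataclass
class DecryptionHeader:
    iv: bytes
    size: int          # Size of remaining decryption header data
    format: int
    alg_id: int
    bit_len: int
    flags: int
    erd: bytes         # Encrypted Random Data
    reserved1: int
    validation: bytes  # VData followed by VCRC32, still encrypted


def parse_decryption_header(buf: bytes) -> DecryptionHeader:
    pos = 0
    (iv_size,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    iv = buf[pos:pos + iv_size]
    pos += iv_size
    # Size(4) Format(2) AlgID(2) Bitlen(2) Flags(2) ErdSize(2) = 14 bytes
    size, fmt, alg_id, bit_len, flags, erd_size = struct.unpack_from(
        "<IHHHHH", buf, pos)
    pos += 14
    if fmt != 3:
        raise ValueError("only Format value 3 is defined for this record")
    erd = buf[pos:pos + erd_size]
    pos += erd_size
    (reserved1,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    if reserved1 != 0:
        # Reserved2 (certificate data) follows; not handled in this sketch.
        raise NotImplementedError("certificate data is not parsed here")
    (v_size,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    if v_size <= 4 or pos + v_size > len(buf):
        raise ValueError("VSize must exceed 4 and fit inside the buffer")
    validation = buf[pos:pos + v_size]
    return DecryptionHeader(iv, size, fmt, alg_id, bit_len, flags,
                            erd, reserved1, validation)
```

| After decrypting the validation bytes with the file session key, the | |
| last 4 bytes would be compared against the CRC32 of the preceding VData. | |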
| 7.2.5 Useful Tips | |
| 7.2.5.1 Strong Encryption is always applied to a file after compression. The | |
| block oriented algorithms all operate in Cipher Block Chaining (CBC) | |
| mode. The block size used for AES and Twofish encryption is 16 bytes. All | |
| other block algorithms use a block size of 8 bytes. Two IDs are defined for RC2 to | |
| account for a discrepancy found in the implementation of the RC2 | |
| algorithm in the cryptographic library on Windows XP SP1 and all | |
| earlier versions of Windows. It is recommended that zero length files | |
| not be encrypted; however, programs should be prepared to extract them | |
| if they are found within a ZIP file. | |
| 7.2.5.2 A pseudo-code representation of the encryption process is as follows: | |
| Password = GetUserPassword() | |
| MasterSessionKey = DeriveKey(SHA1(Password)) | |
| RD = CryptographicStrengthRandomData() | |
| For Each File | |
| IV = CryptographicStrengthRandomData() | |
| VData = CryptographicStrengthRandomData() | |
| VCRC32 = CRC32(VData) | |
| FileSessionKey = DeriveKey(SHA1(IV + RD)) | |
| ErdData = Encrypt(RD,MasterSessionKey,IV) | |
| Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV) | |
| Done | |
| 7.2.5.3 The function names and parameter requirements will depend on | |
| the choice of the cryptographic toolkit selected. Almost any | |
| toolkit supporting the reference implementations for each | |
| algorithm can be used. The RSA BSAFE(r), OpenSSL, and Microsoft | |
| CryptoAPI libraries are all known to work well. | |
| 7.3 Single Password - Central Directory Encryption | |
| -------------------------------------------------- | |
| 7.3.1 Central Directory Encryption is achieved within the .ZIP format by | |
| encrypting the Central Directory structure. This encapsulates the metadata | |
| most often used for processing .ZIP files. Additional metadata is stored for | |
| redundancy in the Local Header for each file. The process of concealing | |
| metadata by encrypting the Central Directory does not protect the data within | |
| the Local Header. To avoid information leakage from the exposed metadata | |
| in the Local Header, the fields containing information about a file are masked. | |
| 7.3.2 Local Header | |
| Masking replaces the true content of the fields for a file in the Local | |
| Header with false information. When masked, the Local Header is not | |
| suitable for streaming access and the options for data recovery of damaged | |
| archives are reduced. Extra Data fields that may contain confidential | |
| data should not be stored within the Local Header. The value set into | |
| the Version needed to extract field should be the correct value needed to | |
| extract the file without regard to Central Directory Encryption. The fields | |
| within the Local Header targeted for masking when the Central Directory is | |
| encrypted are: | |
| Field Name Mask Value | |
| ------------------ --------------------------- | |
| compression method 0 | |
| last mod file time 0 | |
| last mod file date 0 | |
| crc-32 0 | |
| compressed size 0 | |
| uncompressed size 0 | |
| file name (variable size) Base 16 value from the | |
| range 1 - 0xFFFFFFFFFFFFFFFF | |
| represented as a string whose | |
| size will be set into the | |
| file name length field | |
| The Base 16 value assigned as a masked file name is simply a sequentially | |
| incremented value for each file starting with 1 for the first file. | |
| Modifications to a ZIP file may cause different values to be stored for | |
| each file. For compatibility, the file name field in the Local Header | |
| should never be left blank. As of Version 6.2 of this specification, | |
| the Compression Method and Compressed Size fields are not yet masked. | |
| Fields having a value of 0xFFFF or 0xFFFFFFFF for the ZIP64 format | |
| should not be masked. | |
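| The sequential masked file names described above could be generated as in | |
| the following Python sketch. The function names are hypothetical, and the | |
| use of upper-case hexadecimal digits without padding is an assumption, | |
| since the text only calls for a Base 16 value represented as a string. | |

```python
def masked_file_name(sequence_number: int) -> bytes:
    """Return the masked Local Header file name for the Nth file (from 1)."""
    if not 1 <= sequence_number <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("masked name value must be 1..0xFFFFFFFFFFFFFFFF")
    # Assumption: upper-case hex digits, no prefix, no zero padding.
    return format(sequence_number, "X").encode("ascii")


def masked_file_names(file_count: int) -> list:
    """Masked names for an archive, numbered sequentially starting at 1."""
    return [masked_file_name(n) for n in range(1, file_count + 1)]
```

| The length of each returned name is the value that would be written to | |
| the file name length field. | |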
| 7.3.3 Encrypting the Central Directory | |
| Encryption of the Central Directory does not include encryption of the | |
| Central Directory Signature data, the Zip64 End of Central Directory | |
| record, the Zip64 End of Central Directory Locator, or the End | |
| of Central Directory record. The ZIP file comment data is never | |
| encrypted. | |
| Before encrypting the Central Directory, it may optionally be compressed. | |
| Compression is not required, but for storage efficiency it is assumed | |
| this structure will be compressed before encrypting. Similarly, this | |
| specification supports compressing the Central Directory without | |
| requiring that it also be encrypted. Early implementations of this | |
| feature will assume the encryption method applied to files matches the | |
| encryption applied to the Central Directory. | |
| Encryption of the Central Directory is done in a manner similar to | |
| that of file encryption. The encrypted data is preceded by a | |
| decryption header. The decryption header is known as the Archive | |
| Decryption Header. The fields of this record are identical to | |
| the decryption header preceding each encrypted file. The location | |
| of the Archive Decryption Header is determined by the value in the | |
| Start of the Central Directory field in the Zip64 End of Central | |
| Directory record. When the Central Directory is encrypted, the | |
| Zip64 End of Central Directory record will always be present. | |
| The layout of the Zip64 End of Central Directory record for all | |
| versions starting with 6.2 of this specification will follow the | |
| Version 2 format. The Version 2 format is as follows: | |
| The leading fixed size fields within the Version 1 format for this | |
| record remain unchanged. The record signature for both Version 1 | |
| and Version 2 will be 0x06064b50. Immediately following the last | |
| byte of the field known as the Offset of Start of Central | |
| Directory With Respect to the Starting Disk Number will begin the | |
| new fields defining Version 2 of this record. | |
| 7.3.4 New fields for Version 2 | |
| Note: all fields stored in Intel low-byte/high-byte order. | |
| Value Size Description | |
| ----- ---- ----------- | |
| Compression Method 2 bytes Method used to compress the | |
| Central Directory | |
| Compressed Size 8 bytes Size of the compressed data | |
| Original Size 8 bytes Original uncompressed size | |
| AlgId 2 bytes Encryption algorithm ID | |
| BitLen 2 bytes Encryption key length | |
| Flags 2 bytes Encryption flags | |
| HashID 2 bytes Hash algorithm identifier | |
| Hash Length 2 bytes Length of hash data | |
| Hash Data (variable) Hash data | |
| The Compression Method accepts the same range of values as the | |
| corresponding field in the Central Header. | |
| The Compressed Size and Original Size values will not include the | |
| data of the Central Directory Signature which is compressed or | |
| encrypted. | |
| The AlgId, BitLen, and Flags fields accept the same range of values as | |
| the corresponding fields within the 0x0017 record. | |
| Hash ID identifies the algorithm used to hash the Central Directory | |
| data. This data does not have to be hashed, in which case the | |
| values for both the HashID and Hash Length will be 0. Possible | |
| values for HashID are: | |
| Value Algorithm | |
| ------ --------- | |
| 0x0000 none | |
| 0x0001 CRC32 | |
| 0x8003 MD5 | |
| 0x8004 SHA1 | |
| 0x8007 RIPEMD160 | |
| 0x800C SHA256 | |
| 0x800D SHA384 | |
| 0x800E SHA512 | |
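| The Version 2 fields in 7.3.4 can be read with a single struct call. The | |
| Python sketch below is illustrative; HASH_IDS and parse_zip64_eocd_v2 are | |
| hypothetical names, and the caller is assumed to pass the offset of the | |
| first Version 2 field. With the Version 1 layout defined elsewhere in | |
| this document, the leading fixed size fields are taken to occupy 56 bytes | |
| from the start of the record signature. | |

```python
import struct

HASH_IDS = {
    0x0000: "none",
    0x0001: "CRC32",
    0x8003: "MD5",
    0x8004: "SHA1",
    0x8007: "RIPEMD160",
    0x800C: "SHA256",
    0x800D: "SHA384",
    0x800E: "SHA512",
}

V2_FIXED = "<HQQHHHHH"  # little-endian, 28 bytes in total


def parse_zip64_eocd_v2(buf: bytes, offset: int = 56) -> dict:
    """Parse the Version 2 fields that follow the Version 1 fixed fields."""
    (method, compressed_size, original_size, alg_id, bit_len, flags,
     hash_id, hash_len) = struct.unpack_from(V2_FIXED, buf, offset)
    start = offset + struct.calcsize(V2_FIXED)
    hash_data = buf[start:start + hash_len]
    if len(hash_data) != hash_len:
        raise ValueError("buffer ends before the end of Hash Data")
    return {
        "compression_method": method,
        "compressed_size": compressed_size,
        "original_size": original_size,
        "alg_id": alg_id,
        "bit_len": bit_len,
        "flags": flags,
        "hash": HASH_IDS.get(hash_id, "unrecognized"),
        "hash_data": hash_data,
    }
```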
| 7.3.5 When the Central Directory data is signed, the same hash algorithm | |
| used to hash the Central Directory for signing should be used. | |
| This is recommended for processing efficiency; however, it is | |
| permissible for any of the above algorithms to be used independent | |
| of the signing process. | |
| The Hash Data will contain the hash data for the Central Directory. | |
| The length of this data will vary depending on the algorithm used. | |
| The Version Needed to Extract should be set to 62. | |
| The value for the Total Number of Entries on the Current Disk will | |
| be 0. These records will no longer support random access when | |
| encrypting the Central Directory. | |
| 7.3.6 When the Central Directory is compressed and/or encrypted, the | |
| End of Central Directory record will store the value 0xFFFFFFFF | |
| as the value for the Total Number of Entries in the Central | |
| Directory. The value stored in the Total Number of Entries in | |
| the Central Directory on this Disk field will be 0. The actual | |
| values will be stored in the equivalent fields of the Zip64 | |
| End of Central Directory record. | |
| 7.3.7 Decrypting and decompressing the Central Directory is accomplished | |
| in the same manner as decrypting and decompressing a file. | |
| 7.4 Certificate Processing Method | |
| --------------------------------- | |
| The Certificate Processing Method for ZIP file encryption | |
| defines the following additional data fields: | |
| 7.4.1 Certificate Flag Values | |
| Additional processing flags that can be present in the Flags field of both | |
| the 0x0017 field of the central directory Extra Field and the Decryption | |
| header record preceding compressed file data are: | |
| 0x0007 - reserved for future use | |
| 0x000F - reserved for future use | |
| 0x0100 - Indicates non-OAEP key wrapping was used. If this | |
| flag is set, the version needed to extract must | |
| be at least 61. This means OAEP key wrapping is not | |
| used when generating a Master Session Key using | |
| ErdData. | |
| 0x4000 - ErdData must be decrypted using 3DES-168, otherwise use the | |
| same algorithm used for encrypting the file contents. | |
| 0x8000 - reserved for future use | |
| 7.4.2 CertData - Extra Field 0x0017 record certificate data structure | |
| The data structure used to store certificate data within the section | |
| of the Extra Field defined by the CertData field of the 0x0017 | |
| record is as shown: | |
| Value Size Description | |
| ----- ---- ----------- | |
| RCount 4 bytes Number of recipients. | |
| HashAlg 2 bytes Hash algorithm identifier | |
| HSize 2 bytes Hash size | |
| SRList (var) Simple list of recipients' hashed public keys | |
| RCount This defines the number of intended recipients whose | |
| public keys were used for encryption. This identifies | |
| the number of elements in the SRList. | |
| HashAlg This defines the hash algorithm used to calculate | |
| the public key hash of each public key used | |
| for encryption. This field currently supports | |
| only the following value for SHA-1 | |
| 0x8004 - SHA1 | |
| HSize This defines the size of a hashed public key. | |
| SRList This is a variable length list of the hashed | |
| public keys for each intended recipient. Each | |
| element in this list is HSize. The total size of | |
| SRList is determined using RCount * HSize. | |
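| The CertData structure in 7.4.2 can be split into its hashed public keys | |
| as in the Python sketch below. The function name parse_cert_data is | |
| hypothetical, and little-endian byte order is assumed for the integer | |
| fields, in line with the rest of this format. | |

```python
import struct


def parse_cert_data(buf: bytes) -> dict:
    """Parse RCount, HashAlg, HSize and the SRList of hashed public keys."""
    rcount, hash_alg, hsize = struct.unpack_from("<IHH", buf, 0)
    pos = 8  # 4 + 2 + 2 bytes of fixed fields
    total = rcount * hsize  # total size of SRList
    if pos + total > len(buf):
        raise ValueError("SRList extends past the end of the buffer")
    sr_list = [buf[pos + i * hsize: pos + (i + 1) * hsize]
               for i in range(rcount)]
    return {"hash_alg": hash_alg, "hsize": hsize, "sr_list": sr_list}
```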
| 7.4.3 Reserved1 - Certificate Decryption Header Reserved1 Data | |
| Value Size Description | |
| ----- ---- ----------- | |
| RCount 4 bytes Number of recipients. | |
| RCount This defines the number of intended recipients whose | |
| public keys were used for encryption. This defines | |
| the number of elements in the REList field defined below. | |
| 7.4.4 Reserved2 - Certificate Decryption Header Reserved2 Data Structures | |
| Value Size Description | |
| ----- ---- ----------- | |
| HashAlg 2 bytes Hash algorithm identifier | |
| HSize 2 bytes Hash size | |
| REList (var) List of recipient data elements | |
| HashAlg This defines the hash algorithm used to calculate | |
| the public key hash of each public key used | |
| for encryption. This field currently supports | |
| only the following value for SHA-1 | |
| 0x8004 - SHA1 | |
| HSize This defines the size of a hashed public key | |
| defined in REHData. | |
| REList This is a variable length list of recipient data. | |
| Each element in this list consists of a Recipient | |
| Element data structure as follows: | |
| Recipient Element (REList) Data Structure: | |
| Value Size Description | |
| ----- ---- ----------- | |
| RESize 2 bytes Size of REHData + REKData | |
| REHData HSize Hash of recipient's public key | |
| REKData (var) Simple key blob | |
| RESize This defines the size of an individual REList | |
| element. This value is the combined size of the | |
| REHData field + REKData field. REHData is defined by | |
| HSize. REKData is variable and can be calculated | |
| for each REList element using RESize and HSize. | |
| REHData Hashed public key for this recipient. | |
| REKData Simple Key Blob. The format of this data structure | |
| is identical to that defined in the Microsoft | |
| CryptoAPI and generated using the CryptExportKey() | |
| function. The version of the Simple Key Blob | |
| supported at this time is 0x02 as defined by | |
| Microsoft. | |
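| A reader for the Reserved2 structure in 7.4.4 might iterate the REList as | |
| in the Python sketch below. This follows the field tables literally and | |
| is an illustration only: parse_reserved2 is a hypothetical name, | |
| little-endian byte order is assumed, and the recipient count is taken | |
| from the RCount value stored in Reserved1. | |

```python
import struct


def parse_reserved2(buf: bytes, rcount: int) -> dict:
    """Parse HashAlg, HSize and rcount Recipient Elements from Reserved2."""
    hash_alg, hsize = struct.unpack_from("<HH", buf, 0)
    pos = 4
    recipients = []
    for _ in range(rcount):
        (re_size,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        # RESize is the combined size of REHData and REKData.
        if re_size < hsize or pos + re_size > len(buf):
            raise ValueError("recipient element has an invalid RESize")
        reh_data = buf[pos:pos + hsize]            # hashed public key
        rek_data = buf[pos + hsize:pos + re_size]  # Simple Key Blob
        pos += re_size
        recipients.append((reh_data, rek_data))
    return {"hash_alg": hash_alg, "hsize": hsize, "recipients": recipients}
```

| The size of each REKData is recovered as RESize minus HSize, as described | |
| for the RESize field. | |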
| 7.5 Certificate Processing - Central Directory Encryption | |
| --------------------------------------------------------- | |
| 7.5.1 Central Directory Encryption using Digital Certificates will | |
| operate in a manner similar to that of Single Password Central | |
| Directory Encryption. The Archive Extra Data record will only be present | |
| when there is data to place into it. Currently, data is placed into this | |
| record when digital certificates are used for either encrypting | |
| or signing the files within a ZIP file. When only password | |
| encryption is used with no certificate encryption or digital | |
| signing, this record is not currently needed. When present, this | |
| record will appear before the start of the actual Central Directory | |
| data structure and will be located immediately after the Archive | |
| Decryption Header if the Central Directory is encrypted. | |
| 7.5.2 The Archive Extra Data record will be used to store the following | |
| information. Additional data may be added in future versions. | |
| Extra Data Fields: | |
| 0x0014 - PKCS#7 Store for X.509 Certificates | |
| 0x0016 - X.509 Certificate ID and Signature for central directory | |
| 0x0019 - PKCS#7 Encryption Recipient Certificate List | |
| The 0x0014 and 0x0016 Extra Data records would otherwise be | |
| located in the first record of the Central Directory for digital | |
| certificate processing. When encrypting or compressing the Central | |
| Directory, the 0x0014 and 0x0016 records must be located in the | |
| Archive Extra Data record and they should not remain in the first | |
| Central Directory record. The Archive Extra Data record will also | |
| be used to store the 0x0019 data. | |
| 7.5.3 When present, the size of the Archive Extra Data record will be | |
| included in the size of the Central Directory. The data of the | |
| Archive Extra Data record will also be compressed and encrypted | |
| along with the Central Directory data structure. | |
| 7.6 Certificate Processing Differences | |
| -------------------------------------- | |
| 7.6.1 The Certificate Processing Method of encryption differs from the | |
| Single Password Symmetric Encryption Method as follows. Instead | |
| of using a user-defined password to generate a master session key, | |
| cryptographically random data is used. The key material is then | |
| wrapped using standard key-wrapping techniques. This key material | |
| is wrapped using the public key of each recipient that will need | |
| to decrypt the file using their corresponding private key. | |
| 7.6.2 This specification currently assumes digital certificates will follow | |
| the X.509 V3 format for 1024 bit and higher RSA format digital | |
| certificates. Implementation of this Certificate Processing Method | |
| requires supporting logic for key access and management. This logic | |
| is outside the scope of this specification. | |
| 7.7 OAEP Processing with Certificate-based Encryption | |
| ----------------------------------------------------- | |
| 7.7.1 OAEP stands for Optimal Asymmetric Encryption Padding. It is a | |
| strengthening technique used for small encoded items such as decryption | |
| keys. This is commonly applied in cryptographic key-wrapping techniques | |
| and is supported by PKCS #1. Versions 5.0 and 6.0 of this specification | |
| were designed to support OAEP key-wrapping for certificate-based | |
| decryption keys for additional security. | |
| 7.7.2 Support for private keys stored on Smartcards or Tokens introduced | |
| a conflict with this OAEP logic. Most card and token products do | |
| not support the additional strengthening applied to OAEP key-wrapped | |
| data. In order to resolve this conflict, versions 6.1 and above of this | |
| specification will no longer support OAEP when encrypting using | |
| digital certificates. | |
| 7.7.3 Versions of PKZIP available during initial development of the | |
| certificate processing method set a value of 61 into the | |
| version needed to extract field for a file. This indicates that | |
| non-OAEP key wrapping is used. This affects certificate encryption | |
| only, and password encryption functions should not be affected by | |
| this value. This means values of 61 may be found on files encrypted | |
| with certificates only, or on files encrypted with both password | |
| encryption and certificate encryption. Files encrypted with both | |
| methods can safely be decrypted using the password methods documented. | |
| 8.0 Splitting and Spanning ZIP files | |
| ------------------------------------- | |
| 8.1 Spanned ZIP files | |
| 8.1.1 Spanning is the process of segmenting a ZIP file across | |
| multiple removable media. This support has typically only | |
| been provided for DOS formatted floppy diskettes. | |
| 8.2 Split ZIP files | |
| 8.2.1 File splitting is a newer derivation of spanning. | |
| Splitting follows the same segmentation process as | |
| spanning; however, it does not require writing each | |
| segment to a unique removable medium and instead supports | |
| placing all pieces onto local or non-removable locations | |
| such as file systems, local drives, folders, etc. | |
| 8.3 File Naming Differences | |
| 8.3.1 A key difference between spanned and split ZIP files is | |
| that all pieces of a spanned ZIP file have the same name. | |
| Since each piece is written to a separate volume, no name | |
| collisions occur and each segment can reuse the original | |
| .ZIP file name given to the archive. | |
| 8.3.2 Sequence ordering for DOS spanned archives uses the DOS | |
| volume label to determine segment numbers. Volume labels | |
| for each segment are written using the form PKBACK#xxx, | |
| where xxx is the segment number written as a decimal | |
| value from 001 - nnn. | |
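The volume label convention in 8.3.2 can be sketched as a pair of helpers. This is a minimal illustration, not PKZIP's own code: the function names are hypothetical, and tolerating whitespace after the "#" when parsing is an assumption made for leniency, not something this section states.

```python
import re
from typing import Optional

# Matches labels of the form PKBACK#xxx. Allowing optional whitespace after
# '#' is an assumption for lenient parsing; the text only gives PKBACK#xxx.
_LABEL_RE = re.compile(r"^PKBACK#\s*(\d+)$")


def spanned_volume_label(segment_number: int) -> str:
    """Build the volume label for a spanned segment (numbering starts at 001)."""
    if segment_number < 1:
        raise ValueError("segment numbers start at 1")
    return f"PKBACK#{segment_number:03d}"


def segment_number_from_label(label: str) -> Optional[int]:
    """Return the segment number encoded in a label, or None if it does not match."""
    match = _LABEL_RE.match(label.strip())
    return int(match.group(1)) if match else None
```

A reader could call segment_number_from_label on the label of each inserted volume to confirm that the segments are presented in order.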
| 8.3.3 Split ZIP files are typically written to the same location | |
| and are subject to name collisions if the spanned name | |
| format is used since each segment will reside on the same | |
| drive. To avoid name collisions, split archives are named | |
| as follows. | |
| Segment 1 = filename.z01 | |
| Segment n-1 = filename.z(n-1) | |
| Segment n = filename.zip | |
| 8.3.4 The .ZIP extension is used on the last segment to support | |
| quickly reading the central directory. The segment number | |
| n should be a decimal value. | |
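The naming rule in 8.3.3 and 8.3.4 can be sketched as follows. The function name is hypothetical, and zero-padding the segment number to at least two digits is inferred from the "filename.z01" example rather than stated as a rule.

```python
from typing import List


def split_segment_names(base: str, segment_count: int) -> List[str]:
    """Return the file names for a split archive with segment_count segments.

    Segments 1 .. n-1 are named base.z01, base.z02, ...; the last segment
    keeps the .zip extension so the central directory can be located quickly.
    """
    if segment_count < 1:
        raise ValueError("an archive has at least one segment")
    # Two-digit padding is an assumption taken from the '.z01' example.
    names = [f"{base}.z{i:02d}" for i in range(1, segment_count)]
    names.append(f"{base}.zip")
    return names
```

For example, a three-segment archive named "backup" would be written as backup.z01, backup.z02, and backup.zip.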
| 8.4 Spanned Self-extracting ZIP Files | |
| 8.4.1 Spanned ZIP files may be PKSFX Self-extracting ZIP files. | |
| PKSFX files may also be split; however, in this case | |
| the first segment must be named filename.exe. The first | |
| segment of a split PKSFX archive must be large enough to | |
| include the entire executable program. | |
| 8.5 Capacities and Markers | |
| 8.5.1 Capacities for split archives are as follows: | |
| Maximum number of segments = 4,294,967,295 - 1 | |
| Maximum .ZIP segment size = 4,294,967,295 bytes | |
| Minimum segment size = 64K | |
| Maximum PKSFX segment size = 2,147,483,647 bytes | |
| 8.5.2 Segment sizes may be different; however, by convention, all | |
| segment sizes should be the same with the exception of the | |
| last, which may be smaller. Local and central directory | |
| header records must never be split across a segment boundary. | |
| When writing a header record, if the number of bytes remaining | |
| within a segment is less than the size of the header record, | |
| end the current segment and write the header at the start | |
| of the next segment. The central directory may span segment | |
| boundaries, but no single record in the central directory | |
| should be split across segments. | |
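The boundary rule in 8.5.2 can be sketched as a small placement function. This is an illustration under stated assumptions: the names are hypothetical, "64K" is read as 64 * 1024 bytes, and the caller is assumed to track the current segment index and write offset.

```python
from typing import Tuple

MIN_SEGMENT_SIZE = 64 * 1024        # "64K", read here as 65,536 bytes (assumption)
MAX_SEGMENT_SIZE = 4_294_967_295    # maximum .ZIP segment size from 8.5.1


def place_header(segment_index: int, offset_in_segment: int,
                 header_size: int, segment_size: int) -> Tuple[int, int]:
    """Return (segment_index, offset) at which a header record should start.

    If the bytes remaining in the current segment cannot hold the whole
    header, the current segment is ended and the header starts the next one.
    """
    if not MIN_SEGMENT_SIZE <= segment_size <= MAX_SEGMENT_SIZE:
        raise ValueError("segment size outside the documented capacities")
    if header_size > segment_size:
        raise ValueError("header record is larger than a whole segment")
    remaining = segment_size - offset_in_segment
    if remaining < header_size:
        return segment_index + 1, 0
    return segment_index, offset_in_segment
```

A header that exactly fills the remaining bytes stays in the current segment; only a header that would overflow is moved.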
| 8.5.3 Spanned/Split archives created using PKZIP for Windows | |
| (V2.50 or greater), PKZIP Command Line (V2.50 or greater), | |
| or PKZIP Explorer will include a special spanning | |
| signature as the first 4 bytes of the first segment of | |
| the archive. This signature (0x08074b50) will be | |
| followed immediately by the local header signature for | |
| the first file in the archive. | |
| 8.5.4 A special spanning marker may also appear in spanned/split | |
| archives if the spanning or splitting process starts but | |
| only requires one segment. In this case the 0x08074b50 | |
| signature will be replaced with the temporary spanning | |
| marker signature of 0x30304b50. Split archives can | |
| only be uncompressed by other versions of PKZIP that | |
| know how to create a split archive. | |
| 8.5.5 The signature value 0x08074b50 is also used by some | |
| ZIP implementations as a marker for the Data Descriptor | |
| record. Conflict in this alternate assignment can be | |
| avoided by checking the position of the signature | |
| within the ZIP file to determine the use for which it | |
| is intended. | |
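The marker handling in 8.5.3 through 8.5.5 can be sketched as below. The function names and returned labels are hypothetical. The local file header signature value (0x04034b50) and the little-endian byte order come from the record definitions elsewhere in this specification, not from this section.

```python
import struct

SPANNING_SIGNATURE = 0x08074B50      # also used by some tools for the Data Descriptor
TEMP_SPANNING_MARKER = 0x30304B50    # written when only one segment was needed
LOCAL_HEADER_SIGNATURE = 0x04034B50  # defined in the local file header section


def classify_leading_signature(first_bytes: bytes) -> str:
    """Classify the first 4 bytes of the first segment of an archive."""
    if len(first_bytes) < 4:
        return "unknown"
    (signature,) = struct.unpack_from("<I", first_bytes, 0)
    if signature == SPANNING_SIGNATURE:
        return "spanning"
    if signature == TEMP_SPANNING_MARKER:
        return "temporary-spanning-marker"
    if signature == LOCAL_HEADER_SIGNATURE:
        return "local-header"
    return "unknown"


def meaning_of_08074b50(segment_index: int, offset: int) -> str:
    """Disambiguate 0x08074b50 by its position, as suggested in 8.5.5."""
    if segment_index == 0 and offset == 0:
        return "spanning"
    return "data-descriptor"
```

On disk the two markers appear as the byte sequences "PK\x07\x08" and "PK00", because the values are stored least significant byte first.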
| 9.0 Change Process | |
| ------------------ | |
| 9.1 In order for the .ZIP file format to remain a viable technology, this | |
| specification should be considered as open for periodic review and | |
| revision. Although this format was originally designed with a | |
| certain level of extensibility, not all changes in technology | |
| (present or future) were or will be necessarily considered in its | |
| design. | |
| 9.2 If your application requires new definitions to the | |
| extensible sections in this format, or if you would like to | |
| submit new data structures or new capabilities, please forward | |
| your request to zipformat@pkware.com. All submissions will be | |
| reviewed by the ZIP File Specification Committee for possible | |
| inclusion into future versions of this specification. | |
| 9.3 Periodic revisions to this specification will be published as | |
| DRAFT or as FINAL status to ensure interoperability. We encourage | |
| comments and feedback that may help improve clarity or content. | |
| 10.0 Incorporating PKWARE Proprietary Technology into Your Product | |
| ------------------------------------------------------------------ | |
| 10.1 The Use or Implementation in a product of APPNOTE technological | |
| components pertaining to either strong encryption or patching requires | |
| a separate, executed license agreement from PKWARE. Please contact | |
| PKWARE at zipformat@pkware.com or +1-414-289-9788 with regard to | |
| acquiring such a license. | |
| 10.2 Additional information regarding PKWARE proprietary technology is | |
| available at http://www.pkware.com/appnote. | |
| 11.0 Acknowledgements | |
| --------------------- | |
| In addition to the above mentioned contributors to PKZIP and PKUNZIP, | |
| PKWARE would like to extend special thanks to Robert Mahoney for | |
| suggesting the extension .ZIP for this software. | |
| 12.0 References | |
| --------------- | |
| Fiala, Edward R., and Greene, Daniel H., "Data compression with | |
| finite windows", Communications of the ACM, Volume 32, Number 4, | |
| April 1989, pages 490-505. | |
| Held, Gilbert, "Data Compression, Techniques and Applications, | |
| Hardware and Software Considerations", John Wiley & Sons, 1987. | |
| Huffman, D.A., "A method for the construction of minimum-redundancy | |
| codes", Proceedings of the IRE, Volume 40, Number 9, September 1952, | |
| pages 1098-1101. | |
| Nelson, Mark, "LZW Data Compression", Dr. Dobbs Journal, Volume 14, | |
| Number 10, October 1989, pages 29-37. | |
| Nelson, Mark, "The Data Compression Book", M&T Books, 1991. | |
| Storer, James A., "Data Compression, Methods and Theory", | |
| Computer Science Press, 1988. | |
| Welch, Terry, "A Technique for High-Performance Data Compression", | |
| IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19. | |
| Ziv, J. and Lempel, A., "A universal algorithm for sequential data | |
| compression", IEEE Transactions on Information Theory, | |
| Volume 23, Number 3, May 1977, pages 337-343. | |
| Ziv, J. and Lempel, A., "Compression of individual sequences via | |
| variable-rate coding", IEEE Transactions on Information Theory, | |
| Volume 24, Number 5, September 1978, pages 530-536. | |
| APPENDIX A - AS/400 Extra Field (0x0065) Attribute Definitions | |
| -------------------------------------------------------------- | |
| A.1 Field Definition Structure: | |
| a. field length including length 2 bytes | |
| b. field code 2 bytes | |
| c. data x bytes | |
| A.2 Field Code Description | |
| 4001 Source type i.e. CLP etc | |
| 4002 The text description of the library | |
| 4003 The text description of the file | |
| 4004 The text description of the member | |
| 4005 x'F0' or 0 is PF-DTA, x'F1' or 1 is PF_SRC | |
| 4007 Database Type Code 1 byte | |
| 4008 Database file and fields definition | |
| 4009 GZIP file type 2 bytes | |
| 400B IFS code page 2 bytes | |
| 400C IFS Creation Time 4 bytes | |
| 400D IFS Access Time 4 bytes | |
| 400E IFS Modification time 4 bytes | |
| 005C Length of the records in the file 2 bytes | |
| 0068 GZIP two words 8 bytes | |
| APPENDIX B - z/OS Extra Field (0x0065) Attribute Definitions | |
| ------------------------------------------------------------ | |
| B.1 Field Definition Structure: | |
| a. field length including length 2 bytes | |
| b. field code 2 bytes | |
| c. data x bytes | |
| B.2 Field Code Description | |
| 0001 File Type 2 bytes | |
| 0002 NonVSAM Record Format 1 byte | |
| 0003 Reserved | |
| 0004 NonVSAM Block Size 2 bytes Big Endian | |
| 0005 Primary Space Allocation 3 bytes Big Endian | |
| 0006 Secondary Space Allocation 3 bytes Big Endian | |
| 0007 Space Allocation Type 1 byte flag | |
| 0008 Modification Date Retired with PKZIP 5.0 + | |
| 0009 Expiration Date Retired with PKZIP 5.0 + | |
| 000A PDS Directory Block Allocation 3 bytes Big Endian binary value | |
| 000B NonVSAM Volume List variable | |
| 000C UNIT Reference Retired with PKZIP 5.0 + | |
| 000D DF/SMS Management Class 8 bytes EBCDIC Text Value | |
| 000E DF/SMS Storage Class 8 bytes EBCDIC Text Value | |
| 000F DF/SMS Data Class 8 bytes EBCDIC Text Value | |
| 0010 PDS/PDSE Member Info. 30 bytes | |
| 0011 VSAM sub-filetype 2 bytes | |
| 0012 VSAM LRECL 13 bytes EBCDIC "(num_avg num_max)" | |
| 0013 VSAM Cluster Name Retired with PKZIP 5.0 + | |
| 0014 VSAM KSDS Key Information 13 bytes EBCDIC "(num_length num_position)" | |
| 0015 VSAM Average LRECL 5 bytes EBCDIC num_value padded with blanks | |
| 0016 VSAM Maximum LRECL 5 bytes EBCDIC num_value padded with blanks | |
| 0017 VSAM KSDS Key Length 5 bytes EBCDIC num_value padded with blanks | |
| 0018 VSAM KSDS Key Position 5 bytes EBCDIC num_value padded with blanks | |
| 0019 VSAM Data Name 1-44 bytes EBCDIC text string | |
| 001A VSAM KSDS Index Name 1-44 bytes EBCDIC text string | |
| 001B VSAM Catalog Name 1-44 bytes EBCDIC text string | |
| 001C VSAM Data Space Type 9 bytes EBCDIC text string | |
| 001D VSAM Data Space Primary 9 bytes EBCDIC num_value left-justified | |
| 001E VSAM Data Space Secondary 9 bytes EBCDIC num_value left-justified | |
| 001F VSAM Data Volume List variable EBCDIC text list of 6-character Volume IDs | |
| 0020 VSAM Data Buffer Space 8 bytes EBCDIC num_value left-justified | |
| 0021 VSAM Data CISIZE 5 bytes EBCDIC num_value left-justified | |
| 0022 VSAM Erase Flag 1 byte flag | |
| 0023 VSAM Free CI % 3 bytes EBCDIC num_value left-justified | |
| 0024 VSAM Free CA % 3 bytes EBCDIC num_value left-justified | |
| 0025 VSAM Index Volume List variable EBCDIC text list of 6-character Volume IDs | |
| 0026 VSAM Ordered Flag 1 byte flag | |
| 0027 VSAM REUSE Flag 1 byte flag | |
| 0028 VSAM SPANNED Flag 1 byte flag | |
| 0029 VSAM Recovery Flag 1 byte flag | |
| 002A VSAM WRITECHK Flag 1 byte flag | |
| 002B VSAM Cluster/Data SHROPTS 3 bytes EBCDIC "n,y" | |
| 002C VSAM Index SHROPTS 3 bytes EBCDIC "n,y" | |
| 002D VSAM Index Space Type 9 bytes EBCDIC text string | |
| 002E VSAM Index Space Primary 9 bytes EBCDIC num_value left-justified | |
| 002F VSAM Index Space Secondary 9 bytes EBCDIC num_value left-justified | |
| 0030 VSAM Index CISIZE 5 bytes EBCDIC num_value left-justified | |
| 0031 VSAM Index IMBED 1 byte flag | |
| 0032 VSAM Index Ordered Flag 1 byte flag | |
| 0033 VSAM REPLICATE Flag 1 byte flag | |
| 0034 VSAM Index REUSE Flag 1 byte flag | |
| 0035 VSAM Index WRITECHK Flag 1 byte flag Retired with PKZIP 5.0 + | |
| 0036 VSAM Owner 8 bytes EBCDIC text string | |
| 0037 VSAM Index Owner 8 bytes EBCDIC text string | |
| 0038 Reserved | |
| 0039 Reserved | |
| 003A Reserved | |
| 003B Reserved | |
| 003C Reserved | |
| 003D Reserved | |
| 003E Reserved | |
| 003F Reserved | |
| 0040 Reserved | |
| 0041 Reserved | |
| 0042 Reserved | |
| 0043 Reserved | |
| 0044 Reserved | |
| 0045 Reserved | |
| 0046 Reserved | |
| 0047 Reserved | |
| 0048 Reserved | |
| 0049 Reserved | |
| 004A Reserved | |
| 004B Reserved | |
| 004C Reserved | |
| 004D Reserved | |
| 004E Reserved | |
| 004F Reserved | |
| 0050 Reserved | |
| 0051 Reserved | |
| 0052 Reserved | |
| 0053 Reserved | |
| 0054 Reserved | |
| 0055 Reserved | |
| 0056 Reserved | |
| 0057 Reserved | |
| 0058 PDS/PDSE Member TTR Info. 6 bytes Big Endian | |
| 0059 PDS 1st LMOD Text TTR 3 bytes Big Endian | |
| 005A PDS LMOD EP Rec # 4 bytes Big Endian | |
| 005B Reserved | |
| 005C Max Length of records 2 bytes Big Endian | |
| 005D PDSE Flag 1 byte flag | |
| 005E Reserved | |
| 005F Reserved | |
| 0060 Reserved | |
| 0061 Reserved | |
| 0062 Reserved | |
| 0063 Reserved | |
| 0064 Reserved | |
| 0065 Last Date Referenced 4 bytes Packed Hex "yyyymmdd" | |
| 0066 Date Created 4 bytes Packed Hex "yyyymmdd" | |
| 0068 GZIP two words 8 bytes | |
| 0071 Extended NOTE Location 12 bytes Big Endian | |
| 0072 Archive device UNIT 6 bytes EBCDIC | |
| 0073 Archive 1st Volume 6 bytes EBCDIC | |
| 0074 Archive 1st VOL File Seq# 2 bytes Binary | |
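Field codes 0065 and 0066 above store dates as 4 bytes of Packed Hex "yyyymmdd". A minimal decoding sketch follows, assuming that "Packed Hex" means one decimal digit per 4-bit nibble; the function name is hypothetical and that reading of the layout is an interpretation, not something the table spells out.

```python
from datetime import date


def decode_packed_date(raw: bytes) -> date:
    """Decode a 4-byte packed "yyyymmdd" value into a date.

    Assumes one decimal digit per nibble, so the bytes 20 12 09 01 (hex)
    are read as the digit string "20120901".
    """
    if len(raw) != 4:
        raise ValueError("packed date must be exactly 4 bytes")
    digits = raw.hex()
    if not digits.isdigit():
        raise ValueError("packed date contains non-decimal nibbles")
    return date(int(digits[0:4]), int(digits[4:6]), int(digits[6:8]))
```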
| APPENDIX C - Zip64 Extensible Data Sector Mappings | |
| --------------------------------------------------- | |
| -Z390 Extra Field: | |
| The following is the general layout of the attributes for the | |
| ZIP 64 "extra" block for extended tape operations. | |
| Note: some fields are stored in Big Endian format. All text is | |
| in EBCDIC format unless otherwise specified. | |
| Value           Size     Description | |
| -----           ----     ----------- | |
| (Z390) 0x0065   2 bytes  Tag for this "extra" block type | |
| Size            4 bytes  Size for the following data block | |
| Tag             4 bytes  EBCDIC "Z390" | |
| Length71        2 bytes  Big Endian | |
| Subcode71       2 bytes  Enote type code | |
| FMEPos          1 byte | |
| Length72        2 bytes  Big Endian | |
| Subcode72       2 bytes  Unit type code | |
| Unit            1 byte   Unit | |
| Length73        2 bytes  Big Endian | |
| Subcode73       2 bytes  Volume1 type code | |
| FirstVol        1 byte   Volume | |
| Length74        2 bytes  Big Endian | |
| Subcode74       2 bytes  FirstVol file sequence | |
| FileSeq         2 bytes  Sequence | |
| APPENDIX D - Language Encoding (EFS) | |
| ------------------------------------ | |
| D.1 The ZIP format has historically supported only the original IBM PC character | |
| encoding set, commonly referred to as IBM Code Page 437. This limits storing | |
| file name characters to only those within the original MS-DOS range of values | |
| and does not properly support file names in other character encodings, or | |
| languages. To address this limitation, this specification will support the | |
| following change. | |
| D.2 If general purpose bit 11 is unset, the file name and comment should conform | |
| to the original ZIP character encoding. If general purpose bit 11 is set, the | |
| filename and comment must support The Unicode Standard, Version 4.1.0 or | |
| greater using the character encoding form defined by the UTF-8 storage | |
| specification. The Unicode Standard is published by The Unicode | |
| Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files | |
| is expected to not include a byte order mark (BOM). | |
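The rule in D.2 can be sketched as a single decoding helper. The function and constant names are hypothetical; "cp437" is the name of Python's built-in codec for IBM Code Page 437.

```python
UTF8_FLAG = 1 << 11  # general purpose bit 11 (language encoding flag)


def decode_zip_text(raw: bytes, general_purpose_flags: int) -> str:
    """Decode a file name or comment field according to bit 11."""
    encoding = "utf-8" if general_purpose_flags & UTF8_FLAG else "cp437"
    return raw.decode(encoding)
```

The same bytes can decode to different text depending on the flag, which is why a reader has to consult bit 11 before interpreting the name or comment.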
| D.3 Applications may choose to supplement this file name storage through the use | |
| of the 0x0008 Extra Field. Storage for this optional field is currently | |
| undefined, however it will be used to allow storing extended information | |
| on source or target encoding that may further assist applications with file | |
| name, or file content encoding tasks. Please contact PKWARE with any | |
| requirements on how this field should be used. | |
| D.4 The 0x0008 Extra Field storage may be used with either setting for general | |
| purpose bit 11. Examples of the intended usage for this field are to store | |
| whether "modified-UTF-8" (JAVA) is used, or UTF-8-MAC. Similarly, other | |
| commonly used character encoding (code page) designations can be indicated | |
| through this field. Formalized values for use of the 0x0008 record remain | |
| undefined at this time. The definition for the layout of the 0x0008 field | |
| will be published when available. Use of the 0x0008 Extra Field provides | |
| for storing data within a ZIP file in an encoding other than IBM Code | |
| Page 437 or UTF-8. | |
| D.5 General purpose bit 11 will not imply any encoding of file content or | |
| password. Values defining character encoding for file content or | |
| password must be stored within the 0x0008 Extended Language Encoding | |
| Extra Field. | |
| D.6 Ed Gordon of the Info-ZIP group has defined a pair of "extra field" records | |
| that can be used to store UTF-8 file name and file comment fields. These | |
| records can be used for cases when the general purpose bit 11 method | |
| for storing UTF-8 data in the standard file name and comment fields is | |
| not desirable. A common case for this alternate method is if backward | |
| compatibility with older programs is required. | |
| D.7 Definitions for the record structure of these fields are included above | |
| in the section on 3rd party mappings for "extra field" records. These | |
| records are identified by Header ID's 0x6375 (Info-ZIP Unicode Comment | |
| Extra Field) and 0x7075 (Info-ZIP Unicode Path Extra Field). | |
| D.8 The choice of which storage method to use when writing a ZIP file is left | |
| to the implementation. Developers should expect that a ZIP file may | |
| contain either method and should provide support for reading data in | |
| either format. Use of general purpose bit 11 reduces storage requirements | |
| for file name data by not requiring additional "extra field" data for | |
| each file, but can result in older ZIP programs not being able to extract | |
| files. Use of the 0x6375 and 0x7075 records will result in a ZIP file | |
| that should always be readable by older ZIP programs, but requires more | |
| storage per file to write file name and/or file comment fields. |
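A reader that supports both storage methods might resolve a file name along the following lines. This is a sketch under stated assumptions, not a required algorithm: the function names are hypothetical, the order of preference (bit 11 first, then the 0x7075 record, then Code Page 437) is one possible choice, and the 0x7075 layout used here (1-byte version, 4-byte CRC-32 of the standard file name field, then the UTF-8 name) is taken from the third-party extra field mappings earlier in this specification.

```python
import struct
import zlib
from typing import Optional

UTF8_FLAG = 1 << 11
UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def find_extra_field(extra: bytes, header_id: int) -> Optional[bytes]:
    """Return the data of the first extra field record with the given Header ID."""
    pos = 0
    while pos + 4 <= len(extra):
        field_id, size = struct.unpack_from("<HH", extra, pos)
        data = extra[pos + 4:pos + 4 + size]
        if len(data) < size:
            return None  # truncated record; stop scanning
        if field_id == header_id:
            return data
        pos += 4 + size
    return None


def resolve_file_name(raw_name: bytes, flags: int, extra: bytes) -> str:
    """Pick the best available decoding for a file name."""
    if flags & UTF8_FLAG:
        return raw_name.decode("utf-8")
    data = find_extra_field(extra, UNICODE_PATH_ID)
    if data is not None and len(data) >= 5:
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # The record is trusted only if its CRC matches the header file name.
        if version == 1 and name_crc == (zlib.crc32(raw_name) & 0xFFFFFFFF):
            return data[5:].decode("utf-8")
    return raw_name.decode("cp437")
```

The CRC check guards against a stale record: if another tool renamed the entry without updating the extra field, the standard file name is used instead.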