From fc9c1727b5b3483ce49c3cb668e8332fb001b8a7 Mon Sep 17 00:00:00 2001 From: Luigi 'Comio' Mantellini Date: Mon, 8 Sep 2008 02:46:13 +0200 Subject: Add support for LZMA uncompression algorithm. Signed-off-by: Luigi 'Comio' Mantellini Signed-off-by: Jean-Christophe PLAGNIOL-VILLARD --- lib_generic/lzma/lzma.txt | 663 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 663 insertions(+) create mode 100644 lib_generic/lzma/lzma.txt (limited to 'lib_generic/lzma/lzma.txt') diff --git a/lib_generic/lzma/lzma.txt b/lib_generic/lzma/lzma.txt new file mode 100644 index 0000000..c40d133 --- /dev/null +++ b/lib_generic/lzma/lzma.txt @@ -0,0 +1,663 @@ +LZMA SDK 4.57 +------------- + +LZMA SDK Copyright (C) 1999-2007 Igor Pavlov + +LZMA SDK provides the documentation, samples, header files, libraries, +and tools you need to develop applications that use LZMA compression. + +LZMA is default and general compression method of 7z format +in 7-Zip compression program (www.7-zip.org). LZMA provides high +compression ratio and very fast decompression. + +LZMA is an improved version of famous LZ77 compression algorithm. +It was improved in way of maximum increasing of compression ratio, +keeping high decompression speed and low memory requirements for +decompressing. + + + +LICENSE +------- + +LZMA SDK is available under any of the following licenses: + +1) GNU Lesser General Public License (GNU LGPL) +2) Common Public License (CPL) +3) Simplified license for unmodified code (read SPECIAL EXCEPTION) +4) Proprietary license + +It means that you can select one of these four options and follow rules of that license. + + +1,2) GNU LGPL and CPL licenses are pretty similar and both these +licenses are classified as + - "Free software licenses" at http://www.gnu.org/ + - "OSI-approved" at http://www.opensource.org/ + + +3) SPECIAL EXCEPTION + +Igor Pavlov, as the author of this code, expressly permits you +to statically or dynamically link your code (or bind by name) +to the files from LZMA SDK without subjecting your linked +code to the terms of the CPL or GNU LGPL. +Any modifications or additions to files from LZMA SDK, however, +are subject to the GNU LGPL or CPL terms. + +SPECIAL EXCEPTION allows you to use LZMA SDK in applications with closed code, +while you keep LZMA SDK code unmodified. + + +SPECIAL EXCEPTION #2: Igor Pavlov, as the author of this code, expressly permits +you to use this code under the same terms and conditions contained in the License +Agreement you have for any previous version of LZMA SDK developed by Igor Pavlov. + +SPECIAL EXCEPTION #2 allows owners of proprietary licenses to use latest version +of LZMA SDK as update for previous versions. + + +SPECIAL EXCEPTION #3: Igor Pavlov, as the author of this code, expressly permits +you to use code of the following files: +BranchTypes.h, LzmaTypes.h, LzmaTest.c, LzmaStateTest.c, LzmaAlone.cpp, +LzmaAlone.cs, LzmaAlone.java +as public domain code. + + +4) Proprietary license + +LZMA SDK also can be available under a proprietary license which +can include: + +1) Right to modify code without subjecting modified code to the +terms of the CPL or GNU LGPL +2) Technical support for code + +To request such proprietary license or any additional consultations, +send email message from that page: +http://www.7-zip.org/support.html + + +You should have received a copy of the GNU Lesser General Public +License along with this library; if not, write to the Free Software +Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA + +You should have received a copy of the Common Public License +along with this library. + + +LZMA SDK Contents +----------------- + +LZMA SDK includes: + + - C++ source code of LZMA compressing and decompressing + - ANSI-C compatible source code for LZMA decompressing + - C# source code for LZMA compressing and decompressing + - Java source code for LZMA compressing and decompressing + - Compiled file->file LZMA compressing/decompressing program for Windows system + +ANSI-C LZMA decompression code was ported from original C++ sources to C. +Also it was simplified and optimized for code size. +But it is fully compatible with LZMA from 7-Zip. + + +UNIX/Linux version +------------------ +To compile C++ version of file->file LZMA, go to directory +C/7zip/Compress/LZMA_Alone +and type "make" or "make clean all" to recompile all. + +In some UNIX/Linux versions you must compile LZMA with static libraries. +To compile with static libraries, change string in makefile +LIB = -lm +to string +LIB = -lm -static + + +Files +--------------------- +C - C source code +CPP - CPP source code +CS - C# source code +Java - Java source code +lzma.txt - LZMA SDK description (this file) +7zFormat.txt - 7z Format description +7zC.txt - 7z ANSI-C Decoder description (this file) +methods.txt - Compression method IDs for .7z +LGPL.txt - GNU Lesser General Public License +CPL.html - Common Public License +lzma.exe - Compiled file->file LZMA encoder/decoder for Windows +history.txt - history of the LZMA SDK + + +Source code structure +--------------------- + +C - C files + Compress - files related to compression/decompression + Lz - files related to LZ (Lempel-Ziv) compression algorithm + Lzma - ANSI-C compatible LZMA decompressor + + LzmaDecode.h - interface for LZMA decoding on ANSI-C + LzmaDecode.c - LZMA decoding on ANSI-C (new fastest version) + LzmaDecodeSize.c - LZMA decoding on ANSI-C (old size-optimized version) + LzmaTest.c - test application that decodes LZMA encoded file + LzmaTypes.h - basic types for LZMA Decoder + LzmaStateDecode.h - interface for LZMA decoding (State version) + LzmaStateDecode.c - LZMA decoding on ANSI-C (State version) + LzmaStateTest.c - test application (State version) + + Branch - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code + + Archive - files related to archiving + 7z_C - 7z ANSI-C Decoder + + +CPP -- CPP files + + Common - common files for C++ projects + Windows - common files for Windows related code + 7zip - files related to 7-Zip Project + + Common - common files for 7-Zip + + Compress - files related to compression/decompression + + LZ - files related to LZ (Lempel-Ziv) compression algorithm + + Copy - Copy coder + RangeCoder - Range Coder (special code of compression/decompression) + LZMA - LZMA compression/decompression on C++ + LZMA_Alone - file->file LZMA compression/decompression + + Branch - Filters for x86, IA-64, ARM, ARM-Thumb, PowerPC and SPARC code + + Archive - files related to archiving + + Common - common files for archive handling + 7z - 7z C++ Encoder/Decoder + + Bundles - Modules that are bundles of other modules + + Alone7z - 7zr.exe: Standalone version of 7z.exe that supports only 7z/LZMA/BCJ/BCJ2 + Format7zR - 7zr.dll: Reduced version of 7za.dll: extracting/compressing to 7z/LZMA/BCJ/BCJ2 + Format7zExtractR - 7zxr.dll: Reduced version of 7zxa.dll: extracting from 7z/LZMA/BCJ/BCJ2. + + UI - User Interface files + + Client7z - Test application for 7za.dll, 7zr.dll, 7zxr.dll + Common - Common UI files + Console - Code for console archiver + + + +CS - C# files + 7zip + Common - some common files for 7-Zip + Compress - files related to compression/decompression + LZ - files related to LZ (Lempel-Ziv) compression algorithm + LZMA - LZMA compression/decompression + LzmaAlone - file->file LZMA compression/decompression + RangeCoder - Range Coder (special code of compression/decompression) + +Java - Java files + SevenZip + Compression - files related to compression/decompression + LZ - files related to LZ (Lempel-Ziv) compression algorithm + LZMA - LZMA compression/decompression + RangeCoder - Range Coder (special code of compression/decompression) + +C/C++ source code of LZMA SDK is part of 7-Zip project. + +You can find ANSI-C LZMA decompressing code at folder + C/7zip/Compress/Lzma +7-Zip doesn't use that ANSI-C LZMA code and that code was developed +specially for this SDK. And files from C/7zip/Compress/Lzma do not need +files from other directories of SDK for compiling. + +7-Zip source code can be downloaded from 7-Zip's SourceForge page: + + http://sourceforge.net/projects/sevenzip/ + + +LZMA features +------------- + - Variable dictionary size (up to 1 GB) + - Estimated compressing speed: about 1 MB/s on 1 GHz CPU + - Estimated decompressing speed: + - 8-12 MB/s on 1 GHz Intel Pentium 3 or AMD Athlon + - 500-1000 KB/s on 100 MHz ARM, MIPS, PowerPC or other simple RISC + - Small memory requirements for decompressing (8-32 KB + DictionarySize) + - Small code size for decompressing: 2-8 KB (depending from + speed optimizations) + +LZMA decoder uses only integer operations and can be +implemented in any modern 32-bit CPU (or on 16-bit CPU with some conditions). + +Some critical operations that affect to speed of LZMA decompression: + 1) 32*16 bit integer multiply + 2) Misspredicted branches (penalty mostly depends from pipeline length) + 3) 32-bit shift and arithmetic operations + +Speed of LZMA decompressing mostly depends from CPU speed. +Memory speed has no big meaning. But if your CPU has small data cache, +overall weight of memory speed will slightly increase. + + +How To Use +---------- + +Using LZMA encoder/decoder executable +-------------------------------------- + +Usage: LZMA inputFile outputFile [...] + + e: encode file + + d: decode file + + b: Benchmark. There are two tests: compressing and decompressing + with LZMA method. Benchmark shows rating in MIPS (million + instructions per second). Rating value is calculated from + measured speed and it is normalized with AMD Athlon 64 X2 CPU + results. Also Benchmark checks possible hardware errors (RAM + errors in most cases). Benchmark uses these settings: + (-a1, -d21, -fb32, -mfbt4). You can change only -d. Also you + can change number of iterations. Example for 30 iterations: + LZMA b 30 + Default number of iterations is 10. + + + + + -a{N}: set compression mode 0 = fast, 1 = normal + default: 1 (normal) + + d{N}: Sets Dictionary size - [0, 30], default: 23 (8MB) + The maximum value for dictionary size is 1 GB = 2^30 bytes. + Dictionary size is calculated as DictionarySize = 2^N bytes. + For decompressing file compressed by LZMA method with dictionary + size D = 2^N you need about D bytes of memory (RAM). + + -fb{N}: set number of fast bytes - [5, 273], default: 128 + Usually big number gives a little bit better compression ratio + and slower compression process. + + -lc{N}: set number of literal context bits - [0, 8], default: 3 + Sometimes lc=4 gives gain for big files. + + -lp{N}: set number of literal pos bits - [0, 4], default: 0 + lp switch is intended for periodical data when period is + equal 2^N. For example, for 32-bit (4 bytes) + periodical data you can use lp=2. Often it's better to set lc0, + if you change lp switch. + + -pb{N}: set number of pos bits - [0, 4], default: 2 + pb switch is intended for periodical data + when period is equal 2^N. + + -mf{MF_ID}: set Match Finder. Default: bt4. + Algorithms from hc* group doesn't provide good compression + ratio, but they often works pretty fast in combination with + fast mode (-a0). + + Memory requirements depend from dictionary size + (parameter "d" in table below). + + MF_ID Memory Description + + bt2 d * 9.5 + 4MB Binary Tree with 2 bytes hashing. + bt3 d * 11.5 + 4MB Binary Tree with 3 bytes hashing. + bt4 d * 11.5 + 4MB Binary Tree with 4 bytes hashing. + hc4 d * 7.5 + 4MB Hash Chain with 4 bytes hashing. + + -eos: write End Of Stream marker. By default LZMA doesn't write + eos marker, since LZMA decoder knows uncompressed size + stored in .lzma file header. + + -si: Read data from stdin (it will write End Of Stream marker). + -so: Write data to stdout + + +Examples: + +1) LZMA e file.bin file.lzma -d16 -lc0 + +compresses file.bin to file.lzma with 64 KB dictionary (2^16=64K) +and 0 literal context bits. -lc0 allows to reduce memory requirements +for decompression. + + +2) LZMA e file.bin file.lzma -lc0 -lp2 + +compresses file.bin to file.lzma with settings suitable +for 32-bit periodical data (for example, ARM or MIPS code). + +3) LZMA d file.lzma file.bin + +decompresses file.lzma to file.bin. + + +Compression ratio hints +----------------------- + +Recommendations +--------------- + +To increase compression ratio for LZMA compressing it's desirable +to have aligned data (if it's possible) and also it's desirable to locate +data in such order, where code is grouped in one place and data is +grouped in other place (it's better than such mixing: code, data, code, +data, ...). + + +Using Filters +------------- +You can increase compression ratio for some data types, using +special filters before compressing. For example, it's possible to +increase compression ratio on 5-10% for code for those CPU ISAs: +x86, IA-64, ARM, ARM-Thumb, PowerPC, SPARC. + +You can find C/C++ source code of such filters in folder "7zip/Compress/Branch" + +You can check compression ratio gain of these filters with such +7-Zip commands (example for ARM code): +No filter: + 7z a a1.7z a.bin -m0=lzma + +With filter for little-endian ARM code: + 7z a a2.7z a.bin -m0=bc_arm -m1=lzma + +With filter for big-endian ARM code (using additional Swap4 filter): + 7z a a3.7z a.bin -m0=swap4 -m1=bc_arm -m2=lzma + +It works in such manner: +Compressing = Filter_encoding + LZMA_encoding +Decompressing = LZMA_decoding + Filter_decoding + +Compressing and decompressing speed of such filters is very high, +so it will not increase decompressing time too much. +Moreover, it reduces decompression time for LZMA_decoding, +since compression ratio with filtering is higher. + +These filters convert CALL (calling procedure) instructions +from relative offsets to absolute addresses, so such data becomes more +compressible. Source code of these CALL filters is pretty simple +(about 20 lines of C++), so you can convert it from C++ version yourself. + +For some ISAs (for example, for MIPS) it's impossible to get gain from such filter. + + +LZMA compressed file format +--------------------------- +Offset Size Description + 0 1 Special LZMA properties for compressed data + 1 4 Dictionary size (little endian) + 5 8 Uncompressed size (little endian). -1 means unknown size + 13 Compressed data + + +ANSI-C LZMA Decoder +~~~~~~~~~~~~~~~~~~~ + +To compile ANSI-C LZMA Decoder you can use one of the following files sets: +1) LzmaDecode.h + LzmaDecode.c + LzmaTest.c (fastest version) +2) LzmaDecode.h + LzmaDecodeSize.c + LzmaTest.c (old size-optimized version) +3) LzmaStateDecode.h + LzmaStateDecode.c + LzmaStateTest.c (zlib-like interface) + + +Memory requirements for LZMA decoding +------------------------------------- + +LZMA decoder doesn't allocate memory itself, so you must +allocate memory and send it to LZMA. + +Stack usage of LZMA decoding function for local variables is not +larger than 200 bytes. + +How To decompress data +---------------------- + +LZMA Decoder (ANSI-C version) now supports 5 interfaces: +1) Single-call Decompressing +2) Single-call Decompressing with input stream callback +3) Multi-call Decompressing with output buffer +4) Multi-call Decompressing with input callback and output buffer +5) Multi-call State Decompressing (zlib-like interface) + +Variant-5 is similar to Variant-4, but Variant-5 doesn't use callback functions. + +Decompressing steps +------------------- + +1) read LZMA properties (5 bytes): + unsigned char properties[LZMA_PROPERTIES_SIZE]; + +2) read uncompressed size (8 bytes, little-endian) + +3) Decode properties: + + CLzmaDecoderState state; /* it's 24-140 bytes structure, if int is 32-bit */ + + if (LzmaDecodeProperties(&state.Properties, properties, LZMA_PROPERTIES_SIZE) != LZMA_RESULT_OK) + return PrintError(rs, "Incorrect stream properties"); + +4) Allocate memory block for internal Structures: + + state.Probs = (CProb *)malloc(LzmaGetNumProbs(&state.Properties) * sizeof(CProb)); + if (state.Probs == 0) + return PrintError(rs, kCantAllocateMessage); + + LZMA decoder uses array of CProb variables as internal structure. + By default, CProb is unsigned_short. But you can define _LZMA_PROB32 to make + it unsigned_int. It can increase speed on some 32-bit CPUs, but memory + usage will be doubled in that case. + + +5) Main Decompressing + +You must use one of the following interfaces: + +5.1 Single-call Decompressing +----------------------------- +When to use: RAM->RAM decompressing +Compile files: LzmaDecode.h, LzmaDecode.c +Compile defines: no defines +Memory Requirements: + - Input buffer: compressed size + - Output buffer: uncompressed size + - LZMA Internal Structures (~16 KB for default settings) + +Interface: + int res = LzmaDecode(&state, + inStream, compressedSize, &inProcessed, + outStream, outSize, &outProcessed); + + +5.2 Single-call Decompressing with input stream callback +-------------------------------------------------------- +When to use: File->RAM or Flash->RAM decompressing. +Compile files: LzmaDecode.h, LzmaDecode.c +Compile defines: _LZMA_IN_CB +Memory Requirements: + - Buffer for input stream: any size (for example, 16 KB) + - Output buffer: uncompressed size + - LZMA Internal Structures (~16 KB for default settings) + +Interface: + typedef struct _CBuffer + { + ILzmaInCallback InCallback; + FILE *File; + unsigned char Buffer[kInBufferSize]; + } CBuffer; + + int LzmaReadCompressed(void *object, const unsigned char **buffer, SizeT *size) + { + CBuffer *bo = (CBuffer *)object; + *buffer = bo->Buffer; + *size = MyReadFile(bo->File, bo->Buffer, kInBufferSize); + return LZMA_RESULT_OK; + } + + CBuffer g_InBuffer; + + g_InBuffer.File = inFile; + g_InBuffer.InCallback.Read = LzmaReadCompressed; + int res = LzmaDecode(&state, + &g_InBuffer.InCallback, + outStream, outSize, &outProcessed); + + +5.3 Multi-call decompressing with output buffer +----------------------------------------------- +When to use: RAM->File decompressing +Compile files: LzmaDecode.h, LzmaDecode.c +Compile defines: _LZMA_OUT_READ +Memory Requirements: + - Input buffer: compressed size + - Buffer for output stream: any size (for example, 16 KB) + - LZMA Internal Structures (~16 KB for default settings) + - LZMA dictionary (dictionary size is encoded in stream properties) + +Interface: + + state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize); + + LzmaDecoderInit(&state); + do + { + LzmaDecode(&state, + inBuffer, inAvail, &inProcessed, + g_OutBuffer, outAvail, &outProcessed); + inAvail -= inProcessed; + inBuffer += inProcessed; + } + while you need more bytes + + see LzmaTest.c for more details. + + +5.4 Multi-call decompressing with input callback and output buffer +------------------------------------------------------------------ +When to use: File->File decompressing +Compile files: LzmaDecode.h, LzmaDecode.c +Compile defines: _LZMA_IN_CB, _LZMA_OUT_READ +Memory Requirements: + - Buffer for input stream: any size (for example, 16 KB) + - Buffer for output stream: any size (for example, 16 KB) + - LZMA Internal Structures (~16 KB for default settings) + - LZMA dictionary (dictionary size is encoded in stream properties) + +Interface: + + state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize); + + LzmaDecoderInit(&state); + do + { + LzmaDecode(&state, + &bo.InCallback, + g_OutBuffer, outAvail, &outProcessed); + } + while you need more bytes + + see LzmaTest.c for more details: + + +5.5 Multi-call State Decompressing (zlib-like interface) +------------------------------------------------------------------ +When to use: file->file decompressing +Compile files: LzmaStateDecode.h, LzmaStateDecode.c +Compile defines: +Memory Requirements: + - Buffer for input stream: any size (for example, 16 KB) + - Buffer for output stream: any size (for example, 16 KB) + - LZMA Internal Structures (~16 KB for default settings) + - LZMA dictionary (dictionary size is encoded in stream properties) + +Interface: + + state.Dictionary = (unsigned char *)malloc(state.Properties.DictionarySize); + + + LzmaDecoderInit(&state); + do + { + res = LzmaDecode(&state, + inBuffer, inAvail, &inProcessed, + g_OutBuffer, outAvail, &outProcessed, + finishDecoding); + inAvail -= inProcessed; + inBuffer += inProcessed; + } + while you need more bytes + + see LzmaStateTest.c for more details: + + +6) Free all allocated blocks + + +Note +---- +LzmaDecodeSize.c is size-optimized version of LzmaDecode.c. +But compiled code of LzmaDecodeSize.c can be larger than +compiled code of LzmaDecode.c. So it's better to use +LzmaDecode.c in most cases. + + +EXIT codes +----------- + +LZMA decoder can return one of the following codes: + +#define LZMA_RESULT_OK 0 +#define LZMA_RESULT_DATA_ERROR 1 + +If you use callback function for input data and you return some +error code, LZMA Decoder also returns that code. + + + +LZMA Defines +------------ + +_LZMA_IN_CB - Use callback for input data + +_LZMA_OUT_READ - Use read function for output data + +_LZMA_LOC_OPT - Enable local speed optimizations inside code. + _LZMA_LOC_OPT is only for LzmaDecodeSize.c (size-optimized version). + _LZMA_LOC_OPT doesn't affect LzmaDecode.c (speed-optimized version) + and LzmaStateDecode.c + +_LZMA_PROB32 - It can increase speed on some 32-bit CPUs, + but memory usage will be doubled in that case + +_LZMA_UINT32_IS_ULONG - Define it if int is 16-bit on your compiler + and long is 32-bit. + +_LZMA_SYSTEM_SIZE_T - Define it if you want to use system's size_t. + You can use it to enable 64-bit sizes supporting + + + +C++ LZMA Encoder/Decoder +~~~~~~~~~~~~~~~~~~~~~~~~ +C++ LZMA code use COM-like interfaces. So if you want to use it, +you can study basics of COM/OLE. + +By default, LZMA Encoder contains all Match Finders. +But for compressing it's enough to have just one of them. +So for reducing size of compressing code you can define: + #define COMPRESS_MF_BT + #define COMPRESS_MF_BT4 +and it will use only bt4 match finder. + + +--- + +http://www.7-zip.org +http://www.7-zip.org/support.html -- cgit v1.1