This article is about the Gotcha Force PZZ file format and the ongoing research on it.
This file format is almost completely documented and its structure is well known.
PZZ files are archives that pack several files together, each of which can optionally be compressed.
Format
PZZ files have a fixed-length header of 0x800 bytes (2048 bytes).
The packed files are stored directly after the header, one after the other (see note). Each file can be compressed or not. In theory a pzz file can hold at most 511 files (see Header). In Gotcha Force, pzz files are big endian (this is not the case in Wii games).
Note: Before being packed, every file is aligned to a multiple of 0x800 bytes (2048) with Null padding ("\x00").
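Because files follow the header back to back and each one occupies a whole number of 0x800-byte blocks, the position of every packed file can be derived from the file_descriptors described in the Header section. The following is a minimal sketch under that assumption; the function name `file_offsets` and the constants are illustrative and are not taken from pzztool.py.

```python
HEADER_SIZE = 0x800       # fixed header length
BLOCK_SIZE = 0x800        # alignment unit of packed files
LENGTH_MASK = 0x3FFFFFFF  # bits 0 to 29 of a file_descriptor


def file_offsets(file_descriptors: list[int]) -> list[tuple[int, int]]:
    """Return (offset, length) in bytes for each packed file, in order."""
    offset = HEADER_SIZE  # the first file starts right after the header
    result = []
    for descriptor in file_descriptors:
        length = (descriptor & LENGTH_MASK) * BLOCK_SIZE
        result.append((offset, length))
        offset += length  # files are stored one after the other
    return result
```

An empty file (descriptor 0) takes no space, so the next file starts at the same offset.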
Header
Header:
- 4 bytes - uint file_count - Total number of files packed in the pzz.
- 4 bytes * file_count - uint file_descriptors - One descriptor per file, giving its length and compression state.
Note: The header is 0x800 bytes (2048) long and each file_descriptor takes 4 bytes. After the 4-byte file_count there is room for at most 2048/4 - 1 = 511 file_descriptors, so a PZZ cannot contain more than 511 files. All pzz files from Gotcha Force contain fewer than 20 files.
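The header layout above can be read with a few lines of Python. This is a minimal sketch, assuming the archive is already loaded as a bytes object and that fields are big endian as in Gotcha Force; `parse_header`, `HEADER_SIZE` and `MAX_FILES` are illustrative names, not part of pzztool.py.

```python
import struct

HEADER_SIZE = 0x800               # fixed header length: 2048 bytes
MAX_FILES = HEADER_SIZE // 4 - 1  # 511 descriptors fit after file_count


def parse_header(data: bytes) -> list[int]:
    """Return the raw file_descriptors stored in a PZZ header (big endian)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("PZZ data is shorter than the 0x800-byte header")
    # First 4 bytes: file_count as a big-endian unsigned int.
    (file_count,) = struct.unpack_from(">I", data, 0)
    if file_count > MAX_FILES:
        raise ValueError(f"file_count {file_count} exceeds {MAX_FILES}")
    # Next file_count * 4 bytes: one unsigned int descriptor per packed file.
    return list(struct.unpack_from(f">{file_count}I", data, 4))
```

For a Wii pzz the `>` format prefix would presumably have to become `<`, since the article states those files are not big endian.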
File descriptors format
If we assign an index to each bit of a file_descriptor from 0 (least significant bit) to 31 (most significant bit):
- Bit 31 - Unused.
- Bit 30 - Compression flag: set to 1 if the file is compressed else 0.
- Bits 0 to 29 (30 bits) - File length in units of 0x800 bytes (2048).
So the max length of a pzz packed file is (2^30 - 1) * 2048 bytes: just under 2 TiB (2 199 023 253 504 bytes). In other words there is no practical file length restriction.
File length (in bytes) is computed as follows:
- (file_descriptor & 0x3FFFFFFF) * 0x800
Here 0x3FFFFFFF is a mask that keeps the 30 least significant bits of the 32-bit file_descriptor.
The compression flag is retrieved with another mask:
- file_descriptor & 0x40000000
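The two masks can be combined into one small helper. This is a sketch for illustration only; `decode_descriptor` and the constant names are made up here and do not come from pzztool.py.

```python
LENGTH_MASK = 0x3FFFFFFF       # bits 0 to 29: length in 0x800-byte blocks
COMPRESSION_FLAG = 0x40000000  # bit 30: set when the file is compressed
BLOCK_SIZE = 0x800


def decode_descriptor(file_descriptor: int) -> tuple[int, bool]:
    """Return (length_in_bytes, is_compressed) for one file_descriptor."""
    length = (file_descriptor & LENGTH_MASK) * BLOCK_SIZE
    is_compressed = (file_descriptor & COMPRESSION_FLAG) != 0
    return length, is_compressed


print(decode_descriptor(0x40000006))  # (12288, True)
```

Note that the decoded length is always a multiple of 0x800, so it includes any alignment padding.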
To better understand how a file descriptor works, here is an example:
The first file to be packed has a length of about 12 kB (12 000 bytes) and is packed compressed. So its file descriptor is equal to:
- ceil(12000 / 0x800) + 0x40000000 = 6 + 0x40000000 = 0x40000006
So the bytes 40 00 00 06 are stored right after the file_count field of the header. A file descriptor sometimes describes an empty file; in that case the file_descriptor is equal to "00 00 00 00" and it is still counted in the file_count field.
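The worked example can be reproduced in code. This is a minimal sketch that applies the article's formula to a byte length; `encode_descriptor` is a hypothetical helper name, and `length` is simply the value the formula is applied to (12 000 in the example).

```python
BLOCK_SIZE = 0x800
LENGTH_MASK = 0x3FFFFFFF
COMPRESSION_FLAG = 0x40000000


def encode_descriptor(length: int, compressed: bool) -> bytes:
    """Build the 4-byte big-endian file_descriptor for a packed file."""
    # Ceiling division: number of 0x800-byte blocks needed to hold length.
    blocks = (length + BLOCK_SIZE - 1) // BLOCK_SIZE
    if blocks > LENGTH_MASK:
        raise ValueError("length does not fit in 30 bits")
    descriptor = blocks | (COMPRESSION_FLAG if compressed else 0)
    return descriptor.to_bytes(4, "big")


print(encode_descriptor(12000, True).hex(" "))  # 40 00 00 06
```

An empty file gives `encode_descriptor(0, False)`, which is the four null bytes mentioned above.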
Files padding
When a packed file is stored uncompressed, padding is appended to its end because of the 0x800 alignment. In this case it is impossible to know the padding length, and therefore how to remove it exactly, since the original file could itself end with Null bytes.
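The ambiguity is easy to demonstrate. The sketch below pads data to the 0x800 alignment and shows two different inputs that become identical once padded; `pad_to_block` is an illustrative name, not a function from pzztool.py.

```python
BLOCK_SIZE = 0x800


def pad_to_block(data: bytes) -> bytes:
    """Append Null bytes so that len(data) becomes a multiple of 0x800."""
    padding = -len(data) % BLOCK_SIZE  # 0 when already aligned
    return data + b"\x00" * padding


# Two different files that are indistinguishable after padding:
file_a = b"DATA"
file_b = b"DATA\x00\x00"
assert pad_to_block(file_a) == pad_to_block(file_b)
```

Stripping all trailing Null bytes on unpack would turn `file_b` into `file_a`, which is why the original length cannot be recovered from the archive alone.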
Compression algorithm
The compression algorithm has to be investigated.
Observations
Files and compression
All stxx.pzz -> 000 packed files are stored uncompressed, as are the firstld.pzz -> 000, 001, 002 and 005 files. All other pzz packed files are stored compressed. (NTSC/USA version)
stxx.pzz files (40)
- 001 -> same files as hitxxx.bin
- 002 -> same files as hitxxx.bin
- 003 -> same files as hitxxx.bin
- 004 -> ?
The same position can match several hitxxx.bin files; hitxxx.bin files are in fact sometimes duplicated in afs_data.afs. HITS files start with the magic number "STIH" (big endian string).
gets.pzz files
- 000 -> ?
- 001 -> ?
- 002 -> ?
- 003 -> ?
- 004 -> ?
- 005 -> ?
- 006 -> ?
- 007 -> ?
- 008 -> hit000.bin from afs_data
- 009 -> hit001.bin from afs_data
- 010 -> hit002.bin from afs_data
firstld.pzz file
- 000 -> snd_com04.tsb (padding problem when unpacked because stored uncompressed "U")
- 001 -> snd_com04.chd (padding problem when unpacked because stored uncompressed "U")
- 002 -> ?
- 003 -> icon.bin from afs_data
- 004 -> ?
- 005 -> mc_msg00.mdt from afs_data (unpacked name has to be implemented in pzztool.py)
- 006 -> as_icon.tpl from afs_data (unpacked name has to be implemented in pzztool.py)
efct.tpl file
- 000 -> efct00.tpl from afs_data
- 001 -> efct00_mdl.arc from afs_data
- 002 -> efct01_mdl.arc from afs_data
cmn_data.pzz file
- 000 -> comhit.bin (unpacked name has to be implemented in pzztool.py)
- 001 -> comhit2.bin (unpacked name has to be implemented in pzztool.py)
- 002 -> dm0000mot.bin (unpacked name has to be implemented in pzztool.py)
- 003 -> plcmndata.bin (unpacked name has to be implemented in pzztool.py)
Software
Virtual World RE provides the Python script pzztool.py, which handles unpacking, uncompressing, compressing and packing PZZ files. It is inspired by a PS2 pzz file handling algorithm.