The technology
Current compressors see bytes. PZIP sees structure. Here's what we can tell you while our lawyers finish the paperwork.
Type-aware compression
A CSV has columns and types. A log file has repeating templates. Your JSON has a discoverable schema. We detect all of it and compress accordingly.
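To make "detecting structure" concrete, here is a minimal sketch of format and column-type detection using only the Python standard library. It is not PZIP's detector, which is unpublished; `detect_kind` and `column_types` are hypothetical helper names, and the rules (consistent field counts for CSV, narrowest type that parses for each column) are simplifying assumptions.

```python
import csv
import json


def detect_kind(text: str) -> str:
    """Guess whether text is JSON, CSV, or plain text (hypothetical helper)."""
    stripped = text.strip()
    if not stripped:
        return "unknown"
    # JSON: the whole document must parse as an object or an array.
    if stripped[0] in "{[":
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass
    # CSV: at least two rows, all with the same field count (two or more).
    lines = stripped.splitlines()
    for delimiter in (",", "\t", ";"):
        rows = list(csv.reader(lines, delimiter=delimiter))
        widths = {len(row) for row in rows}
        if len(rows) >= 2 and len(widths) == 1 and widths.pop() >= 2:
            return "csv"
    return "text"


def _narrowest_type(values: list[str]) -> str:
    """Return 'int', 'float', or 'str', whichever is the tightest fit."""
    for name, caster in (("int", int), ("float", float)):
        try:
            for value in values:
                caster(value)
            return name
        except ValueError:
            continue
    return "str"


def column_types(csv_text: str) -> list[str]:
    """Infer one type per column, assuming the first row is a header."""
    rows = list(csv.reader(csv_text.strip().splitlines()))
    header, body = rows[0], rows[1:]
    return [_narrowest_type([row[i] for row in body]) for i in range(len(header))]


print(detect_kind('{"user": "ada", "id": 7}'))
print(column_types("id,price,name\n1,2.5,widget\n2,3,gadget"))
```

A real detector would also need to handle binary formats, quoting edge cases, and files too large to parse whole; this sketch only shows the idea of recovering structure before compressing.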
151 specialized strategies
We call them "weapons." Each one is designed for a specific data pattern: timestamps, floating point, dictionaries, templates, sequences, and more.
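As an illustration of what a pattern-specific strategy can look like, the sketch below applies delta encoding to a column of timestamps before handing it to LZMA. Delta encoding is a well-known technique chosen here for illustration; the source does not say which transforms PZIP's strategies use. The timestamp data is synthetic and the helper names are hypothetical.

```python
import lzma
import struct


def pack_u64(values):
    """Serialize non-negative integers as little-endian 64-bit words."""
    return struct.pack(f"<{len(values)}Q", *values)


def unpack_u64(blob):
    """Inverse of pack_u64."""
    return list(struct.unpack(f"<{len(blob) // 8}Q", blob))


def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for delta in deltas:
        total += delta
        out.append(total)
    return out


# Synthetic, strictly increasing Unix timestamps (illustrative data only).
steps = [1, 1, 2, 1, 5]
timestamps = [1_700_000_000]
for i in range(9_999):
    timestamps.append(timestamps[-1] + steps[i % len(steps)])

raw_size = len(lzma.compress(pack_u64(timestamps)))
delta_blob = lzma.compress(pack_u64(delta_encode(timestamps)))
delta_size = len(delta_blob)

# The transform is lossless: decoding restores every timestamp exactly.
restored = delta_decode(unpack_u64(lzma.decompress(delta_blob)))
print(raw_size, delta_size, restored == timestamps)
```

The deltas form a short repeating pattern, so the general-purpose compressor has far less to encode than it does with the raw 64-bit values. This version assumes the values never decrease, since the deltas are packed as unsigned integers.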
46 theoretical papers
Built on information physics and the minimum description length principle. Not heuristics. Not ML magic. Math.
The pipeline
Input File
    |
    v
[DETECT] ──── What type of file is this?
    |         CSV? JSON? Log? Image?
    v
[SELECT] ──── Pick the best weapons
    |         (from 151 strategies)
    v
[EXTRACT] ─── Separate structure from variation
    |         Structure = tiny. Variation = residual.
    v
[COMPRESS] ── Compress the residual
    |         (still beats LZMA because it's cleaner)
    v
[VERIFY] ──── Decompress and compare byte-for-byte
    |         ALWAYS. EVERY. TIME.
    v
Output .pz file (never larger than LZMA)
The secret sauce
Patent pending. We'll publish the full details once the filing is complete. In the meantime, the benchmarks speak for themselves: 55 wins, 0 losses on 65 real-world files.
If you're a compression researcher, we have 46 papers and would love to talk. If you're a datacenter operator, you don't need to know how an engine works to drive the car.
"Our compression algorithm is so good, we compressed the explanation. Please wait for decompression (patent filing)."
Guarantees
Never-Worse
If PZIP can't beat LZMA on your file, it outputs LZMA. You literally cannot lose.
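The fallback logic can be sketched in a few lines: compress with both codecs and keep the smaller result. This is an assumption-laden illustration, not PZIP's container format. `candidate_compress` is a stand-in (plain zlib here) for a structure-aware codec, and the one-byte tag is a hypothetical way to record which codec won.

```python
import lzma
import zlib

TAG_LZMA = b"L"       # hypothetical marker: payload is plain LZMA
TAG_CANDIDATE = b"C"  # hypothetical marker: payload came from the candidate codec


def candidate_compress(data: bytes) -> bytes:
    """Stand-in for a structure-aware codec; zlib is used purely as a placeholder."""
    return zlib.compress(data, 9)


def candidate_decompress(payload: bytes) -> bytes:
    return zlib.decompress(payload)


def never_worse_compress(data: bytes) -> bytes:
    """Keep the candidate output only if it is strictly smaller than LZMA."""
    baseline = lzma.compress(data)
    candidate = candidate_compress(data)
    if len(candidate) < len(baseline):
        return TAG_CANDIDATE + candidate
    return TAG_LZMA + baseline


def never_worse_decompress(blob: bytes) -> bytes:
    """Dispatch on the tag byte written by never_worse_compress."""
    tag, payload = blob[:1], blob[1:]
    if tag == TAG_CANDIDATE:
        return candidate_decompress(payload)
    if tag == TAG_LZMA:
        return lzma.decompress(payload)
    raise ValueError("unknown codec tag")


sample = b"timestamp,level,message\n" * 500
packed = never_worse_compress(sample)
print(len(packed) <= len(lzma.compress(sample)) + 1)
```

In this sketch the tag costs one byte, so the output is at most one byte larger than plain LZMA. How PZIP itself records the fallback is not described in the source.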
Round-Trip Correct
decompress(compress(file)) == file. Byte-exact. Verified on every single operation. No exceptions.
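A verify-on-write check like this is simple to express. The sketch below wraps any compress/decompress pair and refuses to return output that does not round-trip. `verified_compress` and `RoundTripError` are hypothetical names, and LZMA is only the default codec for the demonstration.

```python
import lzma


class RoundTripError(Exception):
    """Raised when decompressing the output does not reproduce the input."""


def verified_compress(data: bytes, compress=lzma.compress, decompress=lzma.decompress) -> bytes:
    """Compress, then immediately decompress and compare byte-for-byte."""
    blob = compress(data)
    if decompress(blob) != data:
        raise RoundTripError("decompressed output differs from input")
    return blob


payload = b"2024-01-01T00:00:00Z INFO service started\n" * 200
blob = verified_compress(payload)
print(lzma.decompress(blob) == payload)
```

The cost is one extra decompression per write, which is the price of never emitting a file that cannot be restored.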
Deterministic
Same input always produces the same output. No randomness, no ML inference, no surprises.
916 Tests
Automated test suite covering edge cases, stress scenarios, and regressions. Zero manual overrides.
Use cases
Data Lakes & Warehouses
CSV, Parquet, JSON — the files that fill your data lake. 20-45% smaller with PZIP.
Log Storage
Petabytes of logs with repeating templates. PZIP exploits the structure LZMA ignores.
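To show what "repeating templates" means in practice, here is a minimal sketch that splits log lines into a small template table plus per-line variables. It is an illustration under simple assumptions, not PZIP's method: numeric runs are treated as the only variable fields, a NUL character serves as the placeholder (so lines must not contain one), and all function names are hypothetical.

```python
import re

# Runs of digits, optionally dotted (IP addresses, decimals), are treated as variables.
VARIABLE = re.compile(r"\d+(?:\.\d+)*")
PLACEHOLDER = "\x00"  # assumes log lines never contain a NUL character


def split_line(line):
    """Return (template, variables) for one log line."""
    return VARIABLE.sub(PLACEHOLDER, line), VARIABLE.findall(line)


def join_line(template, variables):
    """Re-insert the variables into the template's placeholder slots."""
    parts = template.split(PLACEHOLDER)
    out = [parts[0]]
    for value, part in zip(variables, parts[1:]):
        out.append(value)
        out.append(part)
    return "".join(out)


def encode(lines):
    """Store each distinct template once; each line becomes (template id, variables)."""
    template_ids = {}
    records = []
    for line in lines:
        template, variables = split_line(line)
        tid = template_ids.setdefault(template, len(template_ids))
        records.append((tid, variables))
    return list(template_ids), records


def decode(templates, records):
    return [join_line(templates[tid], variables) for tid, variables in records]


logs = [
    "GET /api/users/42 200 12.5ms",
    "GET /api/users/7 200 3.1ms",
    "conn from 10.0.0.1:8080 closed",
]
templates, records = encode(logs)
print(len(templates), decode(templates, records) == logs)
```

The template table stays tiny no matter how many lines share a template, and the variables can then be compressed column by column.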
Backup & Archive
The never-worse guarantee means zero risk: your backups are never larger than they would be with LZMA.
CDN & Edge
Smaller files = faster edge delivery = better user experience. Drop-in replacement.
Communications
Packet compression, message stores, protocol buffers. Every byte matters at scale.
Scientific Data
HDF5, NetCDF, FITS — scientific formats with predictable structure. Coming soon.
Impact calculator
Projected savings
Estimates based on published storage pricing and average PZIP improvement ratios from our benchmark suite. Actual savings depend on your data composition. Transfer and bandwidth savings not included.
Licensing
Free
Web demo + CLI. Try it on your data. No limits during beta.
Per-TB
Python SDK, REST API, on-prem deployment. Volume discounts.
License
Embed PZIP in your product. Cloud providers, CDNs, storage vendors.
Coming soon: Desktop app for Mac, Windows, and Linux. Mobile SDKs. Once the patent is filed, we're shipping everywhere.
Contact us for enterprise pricing and evaluation.
Environmental impact
Datacenters consume 1-2% of global electricity. Less stored data means fewer hard drives, less power, less cooling, and less CO2.
A 20% reduction across a 10 PB datacenter means fewer physical drives to manufacture, power, and eventually recycle. Better compression isn't just efficient; it's climate action.
10 PB × 20% savings = 2 PB less data stored
≈ 250 fewer 8 TB drives
≈ 15 kWh/year per drive × 250 = 3,750 kWh/year saved
≈ 1.6 metric tons CO2/year (US grid average)
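The same arithmetic can be reproduced for other datacenter sizes with a short script. This is a sketch that reuses the figures above; the function name is hypothetical, 1 PB is taken as 1,000 TB, and the emissions factor of 0.43 kg CO2 per kWh is an assumption back-solved from the 1.6 metric ton figure rather than a number stated on this page.

```python
def drive_savings(stored_pb, reduction_percent, drive_tb=8,
                  kwh_per_drive_year=15, kg_co2_per_kwh=0.43):
    """Estimate drives, energy, and CO2 avoided by storing less data.

    Assumes decimal units (1 PB = 1,000 TB) and that saved capacity maps
    directly onto whole drives that no longer need to run.
    """
    saved_tb = stored_pb * 1000 * reduction_percent / 100
    drives = saved_tb / drive_tb
    kwh_per_year = drives * kwh_per_drive_year
    tonnes_co2_per_year = kwh_per_year * kg_co2_per_kwh / 1000
    return {
        "saved_tb": saved_tb,
        "drives": drives,
        "kwh_per_year": kwh_per_year,
        "tonnes_co2_per_year": tonnes_co2_per_year,
    }


estimate = drive_savings(stored_pb=10, reduction_percent=20)
for name, value in estimate.items():
    print(f"{name}: {value:,.1f}")
```

Swapping in your own drive size, per-drive power draw, or regional grid factor changes the result proportionally.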