The technology

General-purpose compressors see bytes. PZIP sees structure. Here's what we can tell you while our lawyers finish the paperwork.

Type-aware compression

A CSV has columns and types. A log file has repeating templates. Your JSON has a discoverable schema. We detect all of it and compress accordingly.
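Here is a rough illustration of what detecting structure can look like: a toy type sniffer in Python. It is a hypothetical heuristic and not PZIP's detector, which isn't described here. It tries a JSON parse, checks for a timestamp prefix on every line to spot logs, and looks for consistent delimiter counts to spot tables.

```python
import json
import re

# Hypothetical log-line prefix: an ISO-style date and time at the start of the line.
LOG_PREFIX = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")


def detect_type(text: str) -> str:
    """Coarse type guess from a text sample (a toy heuristic, not PZIP's detector)."""
    stripped = text.strip()
    # JSON: starts like an object or array and parses cleanly.
    if stripped and stripped[0] in "{[":
        try:
            json.loads(stripped)
            return "json"
        except json.JSONDecodeError:
            pass
    lines = stripped.splitlines()
    # Log: every line starts with a timestamp.
    if lines and all(LOG_PREFIX.match(line) for line in lines):
        return "log"
    # Delimited table: the same nonzero delimiter count on every line.
    for delim in (",", "\t", ";", "|"):
        counts = {line.count(delim) for line in lines}
        if len(lines) >= 2 and len(counts) == 1 and counts != {0}:
            return "csv"
    return "text"
```

A real detector would also need to handle quoted delimiters, binary formats, and mixed content. The sketch only shows the idea of classifying before compressing.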

151 specialized strategies

We call them "weapons." Each one targets a specific data pattern: timestamps, floating-point numbers, dictionaries, templates, sequences, and more.

46 theoretical papers

Built on information physics and the minimum description length principle. Not heuristics. Not ML magic. Math.
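For reference, the two-part form of the minimum description length principle works as follows. From a set of candidate models (script M below), it picks the model M that minimizes L(M), the bits needed to describe the model, plus L(D | M), the bits needed to describe the data D given that model. This page doesn't say how PZIP applies MDL. Reading L(M) as the "structure" and L(D | M) as the "residual" in the pipeline is an interpretation, not a statement about PZIP's internals.

```latex
\hat{M} \;=\; \arg\min_{M \in \mathcal{M}} \Big[\, L(M) + L(D \mid M) \,\Big]
```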

The pipeline

  Input File
    |
    v
  [DETECT]  ───  What type of file is this?
    |             CSV? JSON? Log? Image?
    v
  [SELECT]  ───  Pick the best weapons
    |             (from 151 strategies)
    v
  [EXTRACT] ───  Separate structure from variation
    |             Structure = tiny. Variation = residual.
    v
  [COMPRESS] ──  Compress the residual
    |             (cleaner than raw bytes, so it compresses smaller)
    v
  [VERIFY]  ───  Decompress and compare byte-for-byte
    |             ALWAYS. EVERY. TIME.
    v
  Output .pz file (never larger than LZMA)
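Below is a minimal Python sketch of the detect, extract, compress, verify flow with an LZMA fallback. PZIP's actual strategies aren't published, so the extract step here is a stand-in: a hypothetical column regrouping (`to_columns` / `from_columns`) for uniform comma-separated text, which places similar values next to each other before LZMA runs. The one-byte tag that records which path was taken is also an assumption of this sketch. The page doesn't describe how PZIP marks its fallback.

```python
import lzma
from typing import Optional

RAW, COLUMNAR = b"L", b"C"  # one-byte tag recording which path was taken


def to_columns(data: bytes) -> Optional[bytes]:
    """Hypothetical EXTRACT step: regroup comma-separated rows column by column.

    Returns None when the input is not a uniform table, which doubles as a
    crude DETECT step.
    """
    if b"\x00" in data:
        return None  # \x00 is reserved as the column separator
    trailing = data.endswith(b"\n")
    body = data[:-1] if trailing else data
    rows = [line.split(b",") for line in body.split(b"\n")]
    width = len(rows[0])
    if width < 2 or any(len(row) != width for row in rows):
        return None
    columns = [b"\n".join(row[i] for row in rows) for i in range(width)]
    # First byte remembers whether the input ended with a newline.
    return (b"1" if trailing else b"0") + b"\x00".join(columns)


def from_columns(blob: bytes) -> bytes:
    """Inverse of to_columns: rebuild the original rows byte for byte."""
    trailing, payload = blob[:1] == b"1", blob[1:]
    columns = [col.split(b"\n") for col in payload.split(b"\x00")]
    body = b"\n".join(b",".join(row) for row in zip(*columns))
    return body + (b"\n" if trailing else b"")


def decompress(packed: bytes) -> bytes:
    tag, payload = packed[:1], lzma.decompress(packed[1:])
    return from_columns(payload) if tag == COLUMNAR else payload


def compress(data: bytes) -> bytes:
    baseline = RAW + lzma.compress(data)  # plain LZMA path
    columns = to_columns(data)  # DETECT + EXTRACT (stand-in)
    if columns is not None:
        candidate = COLUMNAR + lzma.compress(columns)  # COMPRESS the regrouped data
        # VERIFY: keep the candidate only if it is smaller and round-trips exactly.
        if len(candidate) < len(baseline) and decompress(candidate) == data:
            return candidate
    return baseline  # never-worse fallback to LZMA
```

Because of the tag byte, this sketch's fallback output is one byte larger than bare LZMA output. Any real container format has to budget for a marker like this somewhere.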

The secret sauce

Patent pending. We'll publish the full details once the filing is complete. In the meantime, the benchmarks speak for themselves: 3,117 wins out of 3,184 real-world files spanning 20 file types.

If you're a compression researcher, we have 46 papers and would love to talk. If you're a datacenter operator, you don't need to know how an engine works to drive the car.

Guarantees

Never-Worse

If PZIP can't beat LZMA on your file, it outputs LZMA. You literally cannot lose.

Round-Trip Correct

decompress(compress(file)) == file. Byte-exact. Verified on every single operation. No exceptions.

Deterministic

Same input always produces the same output. No randomness, no ML inference, no surprises.

3,184 Tests

Automated test suite covering edge cases, stress tests, and regression tests. Zero manual overrides.
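The sketch below shows in Python how automated checks for the round-trip, deterministic, and never-worse guarantees might look. The compressor is a stand-in and not PZIP: it emits tagged bz2 or LZMA output, whichever is smaller. `check_guarantees` is a hypothetical helper. Its never-worse check allows one extra byte for the stand-in's tag.

```python
import bz2
import lzma
import os


def stand_in_compress(data: bytes) -> bytes:
    """Stand-in compressor (not PZIP): tagged bz2 or LZMA, whichever is smaller."""
    lz = b"L" + lzma.compress(data)
    bz = b"B" + bz2.compress(data)
    return bz if len(bz) < len(lz) else lz


def stand_in_decompress(packed: bytes) -> bytes:
    tag, body = packed[:1], packed[1:]
    return bz2.decompress(body) if tag == b"B" else lzma.decompress(body)


def check_guarantees(compress, decompress, samples):
    """Return (sample index, failed check) pairs; an empty list means all passed."""
    failures = []
    for i, data in enumerate(samples):
        packed = compress(data)
        if decompress(packed) != data:
            failures.append((i, "round-trip"))
        if compress(data) != packed:
            failures.append((i, "deterministic"))
        if len(packed) > len(lzma.compress(data)) + 1:  # +1 allows for the tag byte
            failures.append((i, "never-worse"))
    return failures


# Edge cases: empty input, highly repetitive input, random bytes, a tiny table.
samples = [b"", b"a" * 10_000, os.urandom(4096), b"ts,level\n" * 500]
print(check_guarantees(stand_in_compress, stand_in_decompress, samples))  # → []
```

The same harness could run against any compress/decompress pair, so a regression in any one guarantee would show up as a labeled failure for a specific file.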

Use cases

Data Lakes & Warehouses

CSV, Parquet, JSON — the files that fill your data lake. 20-45% smaller with PZIP.

Log Storage

Petabytes of logs with repeating templates. PZIP exploits the structure LZMA ignores.

Backup & Archive

The never-worse guarantee means zero risk: your backups are never larger than they would be with LZMA.

CDN & Edge

Smaller files = faster edge delivery = better user experience. Drop-in replacement.

Communications

Packet compression, message stores, protocol buffers. Every byte matters at scale.

Scientific Data

HDF5, NetCDF, FITS — scientific formats with predictable structure. Contact us for early access.

Impact calculator

Inputs: total stored data (1 TB to 10 PB), cloud provider, and primary data type.

Projected savings

Monthly cost (before): $2,355

Monthly savings: $283

Annual savings: $3,391

Fewer 8TB drives

CO2 saved/year: 0.0 t

Estimates based on published storage pricing and median PZIP improvement ratios from our benchmark suite. Actual savings depend on your data composition. Transfer and bandwidth savings not included.
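The calculator's formula isn't shown, so here is a hedged Python reconstruction of the cost arithmetic. The inputs are assumptions: 100 TB stored, $0.023 per GB-month, and a 12% reduction, with 1 TB counted as 1,024 GB. With those values the sketch happens to reproduce the example figures ($2,355, $283, $3,391). The real calculator may use different inputs or rounding.

```python
def projected_savings(stored_tb: float, price_per_gb_month: float, reduction: float) -> dict[str, int]:
    """Storage-cost arithmetic; every input is a caller-supplied assumption."""
    gb = stored_tb * 1024  # 1 TB taken as 1,024 GB (an assumption of this sketch)
    monthly_before = gb * price_per_gb_month
    monthly_savings = monthly_before * reduction
    return {
        "monthly_before": round(monthly_before),
        "monthly_savings": round(monthly_savings),
        # Annualized from the unrounded monthly figure, so it is not exactly 12 x $283.
        "annual_savings": round(monthly_savings * 12),
    }


print(projected_savings(100, 0.023, 0.12))
# → {'monthly_before': 2355, 'monthly_savings': 283, 'annual_savings': 3391}
```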

Licensing

Beta: Free

Web demo + CLI. Try it on your data. No limits during beta.

Enterprise: Per-TB pricing

Python SDK, REST API, on-prem deployment. Volume discounts.

OEM: License

Embed PZIP in your product. For cloud providers, CDNs, and storage vendors.

Roadmap: Desktop app for Mac, Windows, and Linux. Python SDK. REST API. Contact us for early access.

Environmental impact

Datacenters consume 1-2% of global electricity. Less stored data means fewer hard drives, less power, less cooling, and less CO2.

A 20% reduction across a 10 PB datacenter means fewer physical drives to manufacture, power, and eventually recycle. Better compression isn't just efficient; it's climate action.

10 PB × 20% savings = 2 PB fewer stored
2 PB ÷ 8 TB per drive ≈ 250 fewer 8TB drives
250 drives × ~15 kWh/year per drive ≈ 3,750 kWh/year saved
3,750 kWh × US grid average emissions ≈ 1.6 metric tons CO2/year
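The same estimate can be written as a small Python function. This lets you plug in other fleet sizes, drive capacities, or per-drive energy figures. The 15 kWh/year per drive comes from this page. The grid intensity of 0.43 kg CO2/kWh is an assumption: it is back-calculated from the page's own 1.6 t and 3,750 kWh figures, not taken from a published grid dataset.

```python
def drive_footprint(
    stored_pb: float,
    reduction: float,
    drive_tb: float = 8.0,
    kwh_per_drive_year: float = 15.0,  # the page's per-drive figure
    kg_co2_per_kwh: float = 0.43,  # assumed; implied by 1.6 t / 3,750 kWh
) -> tuple[float, float, float]:
    """Return (drives avoided, kWh/year saved, tonnes CO2/year saved)."""
    saved_tb = stored_pb * 1000 * reduction  # decimal units: 1 PB = 1,000 TB
    drives = saved_tb / drive_tb
    kwh = drives * kwh_per_drive_year
    tonnes = kwh * kg_co2_per_kwh / 1000
    return drives, kwh, tonnes


drives, kwh, tonnes = drive_footprint(10, 0.20)
print(f"{drives:.0f} drives, {kwh:,.0f} kWh/year, {tonnes:.1f} t CO2/year")
# → 250 drives, 3,750 kWh/year, 1.6 t CO2/year
```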