Lossy compression can be scary, as valuable information or features of the data may be lost.
By using safeguards to guarantee your safety requirements, lossy compression can be applied safely and without fear.
With the compression-safeguards package, you can:
- preserve properties over individual data elements (pointwise) or data neighbourhoods (stencil)
- preserve properties over quantities of interest (QoIs) over the data
- preserve regionally varying properties with regions of interest (RoIs)
- combine safeguards arbitrarily with logical combinators
- apply safeguards to any existing compressor or post-hoc to already-compressed data
- What are safeguards?
- Design and Guarantees
- Provided safeguards
- Installation
- Usage
- How to safeguard ...?
- Limitations
- Related Projects
- Citation
- License
- Funding
Safeguards are a declarative way to describe the safety requirements that you have for lossy compression. They range from simple (e.g. error bounds on the data, preserving special values and data signs) to complex (e.g. error bounds on derived quantities over data neighbourhoods, preserving monotonic sequences).
By declaring your safety requirements as safeguards, we can guarantee that any lossy compression protected by these safeguards will always uphold your safety requirements.
The compression-safeguards package provides several Safeguards with which you can express your safety requirements. Please refer to the provided safeguards section for a complete list of the supported safeguards.
We also provide the following integrations of the safeguards with popular compression APIs:
- numcodecs-safeguards: provides the SafeguardsCodec meta-compressor that conveniently applies safeguards to any compressor using the numcodecs.abc.Codec API.
- xarray-safeguards: provides functionality to use safeguards with (chunked) xarray.DataArrays and cross-chunk boundary conditions.
The safeguards can be adopted easily:
- any existing (lossy) compressor can be safeguarded, e.g. with the numcodecs-safeguards frontend, allowing users to try out different (untrusted) compressors as the safeguards guarantee that the safety requirements are always upheld
- already compressed data can be safeguarded post-hoc as long as the original uncompressed data still exists, e.g. with the xarray-safeguards frontend
- the safeguards-corrections to the compressed data can be stored inline (alongside the lossy-compressed data, e.g. with the numcodecs-safeguards frontend) or outline (e.g. in a separate file, with the xarray-safeguards frontend)
- the safeguards can be combined with other meta-compression approaches, e.g. progressive data compression and retrieval
- safeguard: Declares a safety requirement and enforces that it is met after (lossy) compression.
- pointwise safeguard: A safety requirement that concerns just a single data element and can be checked and guaranteed independently for each data point.
- stencil safeguard: A safety requirement that is formulated over a neighbourhood of nearby points for each data element.
- combinator safeguard: A meta-safeguard that combines several other safeguards with a logical combinator such as logical 'and' or 'or'.
- parameter: A configuration option for a safeguard that is provided when declaring the safeguard and cannot be changed afterwards.
- late-bound parameter: A configuration option for a safeguard that is not constant but depends on the data being compressed. At declaration time, a late-bound parameter is only given a name but not a value. When the safeguards are later applied to data, all late-bound parameters must be resolved by providing their values. The compression-safeguards, numcodecs-safeguards, and xarray-safeguards frontends also provide a few built-in late-bound constants automatically, including $x to refer to the data as a constant. When configuring a numcodecs_safeguards.SafeguardsCodec, late-bound parameters are provided as fixed constants that must be compatible with any data that is encoded by the codec.
- quantity of interest (QoI): We are often not just interested in data itself, but also in quantities derived from it. For instance, we might later plot the data logarithm, compute a derivative, or apply a smoothing kernel. In these cases, we often want to safeguard not just properties on the data but also on these derived quantities of interest.
- region of interest (RoI): Sometimes we have regionally varying safety requirements, e.g. because a region has interesting behaviour that we want to especially preserve.
The safeguards are designed to be convenient to apply to any lossy compression task:
- They are guaranteed to always uphold the safety property they describe.
- They are designed to minimise the overhead in compressed message size for elements where the safety requirements were already satisfied.
They should ideally be applied to every lossy compression task since they have only a small overhead in the happy case (all safety requirements are already fulfilled) and give you peace of mind by reasserting the requirements if necessary (e.g. if the lossy compressor does not provide them or e.g. has an implementation bug).
Note that the packages in this repository are provided as reference implementations of the compression safeguards framework. Therefore, their implementations prioritise simplicity, portability, and readability over performance. Please refer to the related projects section for alternatives with different design considerations.
This package currently implements the following safeguards:
- eb (error bound): The pointwise error is guaranteed to be less than or equal to the provided bound. Three types of error bounds can be enforced: abs (absolute), rel (relative), and ratio (ratio / decimal). For the relative and ratio error bounds, zero values are preserved with the same bit pattern. For the ratio error bound, the sign of the data is preserved. Infinite values are preserved with the same bit pattern. The safeguard can be configured such that NaN values are preserved with the same bit pattern, or that correcting a NaN value to a NaN value with a different bit pattern also satisfies the error bound.
- qoi_eb_pw (error bound on quantities of interest): The error on a derived pointwise quantity of interest (QoI) is guaranteed to be less than or equal to the provided bound. Three types of error bounds can be enforced: abs (absolute), rel (relative), and ratio (ratio / decimal). The non-constant quantity of interest expression can contain the addition, multiplication, division, comparison, square root, exponentiation, logarithm, sign, rounding, trigonometric, hyperbolic, and logical operations over integer and floating-point constants and the pointwise data value. For the ratio error bound, the sign of the quantity of interest is preserved. Infinite quantities of interest are preserved with the same bit pattern. NaN quantities of interest remain NaN though not necessarily with the same bit pattern.
- qoi_eb_stencil (error bound on quantities of interest over a neighbourhood): The error on a derived quantity of interest (QoI) over a neighbourhood of data points is guaranteed to be less than or equal to the provided bound. Three types of error bounds can be enforced: abs (absolute), rel (relative), and ratio (ratio / decimal). The non-constant quantity of interest expression can contain the addition, multiplication, division, comparison, square root, exponentiation, logarithm, sign, rounding, trigonometric, hyperbolic, array sum, matrix transpose, matrix multiplication, finite difference, and logical operations over integer and floating-point constants and arrays and the data neighbourhood. The quantity of interest over a neighbourhood can also be used to bound the pointwise error of the finite-difference-approximated derivative, and to preserve the monotonicity of a sequence of values. If applied to data with more dimensions than the data neighbourhood of the QoI requires, the data neighbourhood is applied independently along these extra axes. If the data neighbourhood uses the valid boundary condition along an axis, only data neighbourhoods centred on data points that have sufficient points before and after are safeguarded. If the axis is smaller than required by the neighbourhood along this axis, the data is not safeguarded at all. Using a different boundary condition ensures that all data points are safeguarded. For the ratio error bound, the sign of the quantity of interest is preserved. Infinite quantities of interest are preserved with the same bit pattern. NaN quantities of interest remain NaN though not necessarily with the same bit pattern.
- same (value preserving): If an element has a special value in the input, that element is guaranteed to also have bitwise the same value in the decompressed output. This safeguard can be used for preserving e.g. zero values, missing values, pre-computed extreme values, or any other value of importance. By default, elements that do not have the special value in the input may still have the value in the output. It is also possible to enforce that an element in the output has the special value if and only if it also has the value in the input, e.g. to ensure that only missing values in the input have the missing value bit pattern in the output. Beware that +0.0 and -0.0 are semantically equivalent in floating-point but have different bit patterns. To preserve both, two same value safeguards are needed, one for each bit pattern.
- sign (sign-preserving): Values are guaranteed to have the same sign (-1, 0, +1) in the decompressed output as they have in the input data. NaN values are preserved as NaN values with the same sign bit. This safeguard can be configured to preserve the sign relative to a custom offset, e.g. to preserve global minima and maxima. This safeguard should be combined with e.g. an error bound, as it by itself accepts any value with the same sign.
- all (logical all / and): For each element, all of the combined safeguards' guarantees are upheld. At the moment, only pointwise and stencil safeguards and combinations thereof can be combined by this all-combinator.
- any (logical any / or): For each element, at least one of the combined safeguards' guarantees is upheld. At the moment, only pointwise and stencil safeguards and combinations thereof can be combined by this any-combinator.
- assume_safe (logical truth): All elements are assumed to always meet their guarantees and are thus always safe. This truth-combinator can be used with the select combinator to express regions that are not of interest, i.e. where no additional safety requirements are imposed.
- select (logical select / switch case): For each element, the guarantees of the pointwise selected safeguard are upheld. This combinator allows selecting between several safeguards with per-element granularity. It can be used to describe simple regions of interest where different safeguards, e.g. with different error bounds, are applied to different parts of the data. At the moment, only pointwise and stencil safeguards and combinations thereof can be combined by this select-combinator.
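The remark about +0.0 and -0.0 in the same safeguard can be checked with plain NumPy. The snippet below only illustrates the bit patterns involved and does not use the compression-safeguards API; all variable names are chosen for this example.

```python
import numpy as np

zeros = np.array([+0.0, -0.0], dtype=np.float64)

# both zeros compare equal as floating-point values ...
values_equal = bool(zeros[0] == zeros[1])

# ... but reinterpreting the same bytes as unsigned integers shows that
# the sign bit (the most significant bit) differs
bits = zeros.view(np.uint64)
bits_equal = bool(bits[0] == bits[1])

print(values_equal)  # True
print(bits_equal)  # False
print([f"{int(b):#018x}" for b in bits])
# ['0x0000000000000000', '0x8000000000000000']
```

A value-preserving requirement on one of the two bit patterns therefore says nothing about the other, which is why one same safeguard per bit pattern is needed.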
The compression-safeguards package can be installed from PyPI using pip:
pip install compression-safeguards
The integrations can be installed similarly:
pip install numcodecs-safeguards
pip install xarray-safeguards
We provide the lower-level compression-safeguards package and the user-facing numcodecs-safeguards and xarray-safeguards frontend packages, which can all be used to apply safeguards. We generally recommend using the safeguards through one of their integrations with popular (compression) APIs, e.g. numcodecs-safeguards for quickly getting started with a ready-made compressor for non-chunked arrays, or xarray-safeguards for adopting safeguards post-hoc and applying them to already compressed (chunked) data arrays.
You can get started quickly with the numcodecs-compatible SafeguardsCodec meta-compressor for non-chunked arrays:
import numpy as np
from numcodecs.fixedscaleoffset import FixedScaleOffset
from numcodecs_safeguards import SafeguardsCodec
# use any numcodecs-compatible codec
# here we quantize data >= -10 with one decimal digit
lossy_codec = FixedScaleOffset(
    offset=-10, scale=10, dtype="float64", astype="uint8",
)
# wrap the codec in the `SafeguardsCodec` and specify the safeguards to apply
sg_codec = SafeguardsCodec(codec=lossy_codec, safeguards=[
    # guarantee a relative error bound of 1%:
    #   |x - x'| <= |x| * 0.01
    dict(kind="eb", type="rel", eb=0.01),
    # guarantee that the sign is preserved:
    #   sign(x) = sign(x')
    dict(kind="sign"),
])
# some n-dimensional data
data = np.linspace(-10, 10, 21)
# encode and decode the data
encoded = sg_codec.encode(data)
decoded = sg_codec.decode(encoded)
# the safeguard properties are guaranteed to hold
assert np.all(np.abs(data - decoded) <= np.abs(data) * 0.01)
assert np.all(np.sign(data) == np.sign(decoded))
Please refer to the numcodecs-safeguards documentation for further information.
If you are working with large chunked datasets, want to post-hoc adopt safeguards for existing already-compressed data, or need extra control over how the safeguards-produced corrections are stored, you can use the xarray-safeguards frontend:
import numpy as np
import xarray as xr
from xarray_safeguards import apply_data_array_correction, produce_data_array_correction
# some (chunked) n-dimensional data array
da = xr.DataArray(np.linspace(-10, 10, 21), name="da").chunk(10)
# lossy-compressed prediction for the data, here all zeros
da_prediction = xr.DataArray(np.zeros_like(da.values), name="da").chunk(10)
da_correction = produce_data_array_correction(
    data=da,
    prediction=da_prediction,
    # guarantee an absolute error bound of 0.1:
    #   |x - x'| <= 0.1
    safeguards=[dict(kind="eb", type="abs", eb=0.1)],
)
## (a) manual correction ##
da_corrected = apply_data_array_correction(da_prediction, da_correction)
np.testing.assert_allclose(da_corrected.values, da.values, rtol=0, atol=0.1)
## (b) automatic correction with xarray accessors ##
# combine the lossy prediction and the correction into one dataset
# e.g. by loading them from different files using `xarray.open_mfdataset`
ds = xr.Dataset({
    da_prediction.name: da_prediction,
    da_correction.name: da_correction,
})
# access the safeguarded dataset that applies all corrections
ds_safeguarded: xr.Dataset = ds.safeguarded
np.testing.assert_allclose(ds_safeguarded["da"].values, da.values, rtol=0, atol=0.1)
Please also refer to the xarray-safeguards documentation and the chunked.ipynb example for further information.
You can also use the lower-level compression-safeguards API directly:
import numpy as np
from compression_safeguards import Safeguards
# create the `Safeguards`
sg = Safeguards(safeguards=[
    # guarantee an absolute error bound of 0.1:
    #   |x - x'| <= 0.1
    dict(kind="eb", type="abs", eb=0.1),
])
# generate some random data to compress
data = np.random.normal(size=(10, 10, 10))
## compression
# compress and decompress the data using *some* compressor
# (`compress` and `decompress` are placeholders for your compressor's API)
compressed = compress(data)
decompressed = decompress(compressed)
# compute the correction that the safeguards would need to apply to
# guarantee the selected safety requirements
correction = sg.compute_correction(data, decompressed)
# now the compressed data and correction can be stored somewhere
# ...
# and loaded again to decompress
## decompression
decompressed = decompress(compressed)
decompressed = sg.apply_correction(decompressed, correction)
# the safeguard properties are now guaranteed to hold
assert np.all(np.abs(data - decompressed) <= 0.1)
Please refer to the compression-safeguards documentation for further examples.
The safeguards can also fill the role of a quantizer, which is part of many (predictive) (error-bounded) compressors. If you currently use e.g. a linear quantizer module in your compressor to provide an absolute error bound, you could instead adapt the Safeguards, quantize to their Safeguards.compute_correction values, and thereby offer a larger selection of safety requirements that your compressor can then guarantee. Note, however, that only pointwise safeguards can be used when quantizing data elements one-by-one.
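For context, the kind of linear quantizer module that this paragraph refers to can be sketched with NumPy alone. This is not how Safeguards.compute_correction is implemented; the function names linear_quantize and linear_dequantize are hypothetical, and the sketch assumes finite data of moderate magnitude. Since floating-point rounding can push a reconstructed value slightly outside the bound, the sketch re-checks the bound and keeps the original value for any element that would violate it.

```python
import numpy as np


def linear_quantize(data: np.ndarray, eb: float):
    """Quantize finite `data` into integer bins of width 2 * eb."""
    bins = np.rint(data / (2 * eb)).astype(np.int64)
    # reconstruct exactly as the decoder would and re-check the bound
    recon = bins * (2 * eb)
    outliers = ~(np.abs(data - recon) <= eb)
    # elements that miss the bound are stored losslessly
    return bins, outliers, data[outliers]


def linear_dequantize(bins, outliers, exact, eb: float) -> np.ndarray:
    """Reconstruct the data from the bins and the lossless outliers."""
    recon = bins * (2 * eb)
    recon[outliers] = exact
    return recon


rng = np.random.default_rng(42)
data = rng.normal(size=1000)
eb = 0.1

bins, outliers, exact = linear_quantize(data, eb)
decoded = linear_dequantize(bins, outliers, exact, eb)

# the absolute error bound holds for every element
assert np.all(np.abs(data - decoded) <= eb)
```

A quantizer like this can only ever offer the one guarantee it was written for, whereas quantizing to the safeguards' corrections would let the same compressor offer any pointwise safeguard.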
- ... a pointwise absolute / relative / ratio error bound on the data?
  Use the eb safeguard and configure it with the type and range of the error bound.
- ... a pointwise normalised (NOA) or range-relative absolute error bound?
  Use the eb safeguard for an absolute error bound but provide a late-bound parameter for the bound value. Since the data range is tightly tied to the data itself, it makes sense to only fill in the actual bound value when applying the safeguards to the actual data. You can either compute the range yourself and then provide it as a late_bound binding when computing the safeguard corrections, or use the qoi_eb_pw safeguard with the '(x - c["$x_min"]) / (c["$x_max"] - c["$x_min"])' QoI. Note that we are using the late-bound constants c["$x_min"] and c["$x_max"] for the data minimum and maximum, which are automatically provided by numcodecs-safeguards and xarray-safeguards.
- ... a global error bound, e.g. a mean error, mean squared error, root mean square error, or peak signal to noise ratio?
  The compression-safeguards do not currently support global safeguards. However, you can emulate a global error bound using a pointwise error bound, which provides a stricter guarantee. For all of the global error bounds mentioned below, use the eb safeguard with a pointwise absolute error bound of
  - $\epsilon_{abs} = |\epsilon_{ME}|$ for the mean error
  - $\epsilon_{abs} = \sqrt{\epsilon_{MSE}}$ for the mean square error
  - $\epsilon_{abs} = \epsilon_{RMSE}$ for the root mean square error
  - $\epsilon_{abs} = (\text{max}(X) - \text{min}(X)) \cdot {10}^{-\text{PSNR} / 20}$ for the peak signal to noise ratio where $\text{PSNR}$ is given in dB
- ... a missing value?
  If missing values are encoded as NaNs, the eb safeguards already guarantee that NaN values are preserved (if any NaN value works, be sure to enable the equal_nan flag). For other values, use the same safeguard and enable its exclusive flag.
- ... global extrema (minimum / maximum)?
  Use the sign safeguard with the offset corresponding to the extremum to ensure the extremum itself and its relationship to other values is preserved. When using the numcodecs-safeguards or xarray-safeguards frontend, the offset can be set to the automatically-provided "$x_min" or "$x_max" late-bound parameters directly. Note that two sign safeguards are necessary to preserve both the global minimum and global maximum.
- ... local extrema or other topological features?
  Identifying local topological features, especially for noisy data, is a hard problem that you are likely using a custom algorithm for. To safeguard that algorithm's results, you should apply it to the original data before compression and identify how tolerant the algorithm is to errors around the extrema, giving you a regionally varying error bound that is tighter around the topological features, i.e. your regions of interest. Then, you can use the eb safeguard but provide a late-bound parameter for the bound value. At compression time, you then bind your regionally varying error tolerance to this parameter. If you also need to preserve whether values around the features are above/below a value (see also isolines / isosurfaces below), you can use a sign safeguard with a matching offset for each local feature and select over them using the select combinator and a late-bound selector mask that is based on your a priori analysis. If regions of interest overlap, you can combine several select combinators. For regions that are not of interest, the select combinator can fall back to the assume_safe safeguard, which imposes no additional safety requirements.
- ... isolines / isosurfaces?
  Isolines or isosurfaces can be preserved by using a sign safeguard with a matching offset for each surface value that should be kept. These sign safeguards should generally be combined with an error-bounding safeguard, unless any values that preserve the isosurfaces are acceptable.
- ... the monotonicity of a sequence?
  The qoi_eb_stencil safeguard can be used to preserve the monotonicity of a sequence of values, i.e. to guarantee that a sequence that was originally strictly/weakly monotonically increasing/decreasing/constant still is. The sequence can be arbitrary within the stencil neighbourhood, e.g. along a single axis, in a zigzag, etc. Preserving the monotonicity of multiple sequences, e.g. along several axes, requires multiple stencil QoI safeguards. For instance, the 'all(X[1:] > X[:-1]) == all(C["$X"][1:] > C["$X"][:-1])' QoI guarantees that all strictly increasing sequences along a single axis stay strictly increasing. More monotonicity QoIs, including strict vs weak monotonicity and constant sequences, can be found in test_monotonicity.py.
- ... a data distribution histogram?
  The compression-safeguards do not currently support global safeguards. However, we can preserve the histogram bin that each data element falls into using the qoi_eb_pw safeguard, which provides a stricter guarantee. For instance, the 'round_ties_even(100 * (x - c["$x_min"]) / (c["$x_max"] - c["$x_min"]))' QoI would preserve the index amongst 100 bins. Note that we are using the late-bound constants c["$x_min"] and c["$x_max"] for the data minimum and maximum, which are automatically provided by numcodecs-safeguards and xarray-safeguards.
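The conversions from a global error bound to a pointwise absolute error bound listed above can be written out in a few lines of NumPy. This sketch only computes the value that would then be used as the bound of an absolute eb safeguard. The helper names abs_eb_from_mse, abs_eb_from_psnr, and psnr are hypothetical and not part of compression-safeguards, and the PSNR is assumed to be defined relative to the data range, as in the formula above.

```python
import numpy as np


def abs_eb_from_mse(mse: float) -> float:
    """Pointwise absolute bound that implies the given mean square error."""
    return float(np.sqrt(mse))


def abs_eb_from_psnr(data: np.ndarray, psnr_db: float) -> float:
    """Pointwise absolute bound that implies at least the given PSNR in dB."""
    data_range = float(np.max(data) - np.min(data))
    return data_range * 10.0 ** (-psnr_db / 20.0)


def psnr(data: np.ndarray, decoded: np.ndarray) -> float:
    """Range-based peak signal to noise ratio in dB."""
    data_range = np.max(data) - np.min(data)
    rmse = np.sqrt(np.mean((data - decoded) ** 2))
    return float(20.0 * np.log10(data_range / rmse))


rng = np.random.default_rng(0)
data = rng.normal(size=10_000)

eb = abs_eb_from_psnr(data, psnr_db=60.0)

# stand-in for a lossy compressor whose pointwise error stays within eb
decoded = data + rng.uniform(-0.9 * eb, 0.9 * eb, size=data.shape)

achieved = psnr(data, decoded)
assert achieved >= 60.0
```

The achieved PSNR is typically noticeably higher than requested, which reflects that the pointwise bound is stricter than the global one and may cost some compression ratio.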
- printer problem: The compression-safeguards need to know about all safety requirements that they should uphold. If the data is first safeguarded with an absolute error bound, and then later the safeguards-corrected data is safeguarded with a relative error bound, the second safeguard may violate the guarantees provided by the first. Even applying the same safeguard twice in a row can violate the guarantees. This is also known as the printer problem: every time a document is copied (safeguarded) from a previously copied and printed (safeguarded) document, new artifacts are added and accumulate over time. Several safeguards should instead be combined into one using the (logical) combinator safeguards provided by the compression-safeguards package. Furthermore, the safeguards should always be given the original, uncompressed and unsafeguarded, reference data in relation to which the safety requirements must be upheld. The numcodecs-safeguards and xarray-safeguards frontends catch some trivial cases of the printer problem, e.g. wrapping a SafeguardsCodec inside a SafeguardsCodec or applying safeguards to an already safeguards-corrected DataArray. In the future, a community standard for marking lossy-compressed (and safeguarded) data with metadata could help with preventing accidental compression error accumulation.
- biased corrections: The compression-safeguards do not currently provide a safeguard to guarantee that the compression errors after safeguarding are unbiased. For instance, if a compressor, which produces biased decompressed values that are within the safeguarded error bound, is safeguarded, the biased values are not corrected by the safeguards. Furthermore, the safeguard corrections themselves may introduce bias in the compression error. Please refer to error-distribution.ipynb for some examples. We are working on a bias safeguard that would optionally provide these guarantees.
- suboptimal one-shot corrections: The compression-safeguards sometimes cannot provide optimal and easily compressible corrections. For instance, using a stencil safeguard that spans a local neighbourhood requires the safeguard to conservatively assume that the worst cases from each individual element could accumulate. Since the compression-safeguards compute the corrections for all elements simultaneously (instead of incrementally or by testing an initial correction that is repeatedly adjusted if it leads to a violation elsewhere), even a single violation can require conservative corrections for many data elements. In the future, the compression-safeguards API could support computing corrections incrementally such that stencil safeguards could make use of earlier already-corrected data elements and restrictions imposed by pointwise safeguards to provide better corrections for later elements. If you would like a peek at how safeguards could be applied incrementally, you can have a look at the incremental.ipynb example. A minimal form of iterative corrections can be activated with the unstable compute=dict(unstable_iterative=True) configuration of the SafeguardsCodec.
- no global safeguards: The compression-safeguards do not currently support global safeguards, such as preserving mean errors or global data distributions. In many cases, it is possible to preserve these properties using stricter pointwise safeguards, at the cost of achieving lower compression ratios. Please refer to the How to safeguard section above for further details and examples.
- only real data: The compression-safeguards only support data of the following extended real data types: uint8, int8, uint16, int16, uint32, int32, uint64, int64, float16, float32, float64. We appreciate contributions for supporting further, e.g. complex, data types.
- single variable only: The compression-safeguards do not support multi-variable safeguarding. If several variables should be safeguarded together, e.g. as inputs to a multi-variable quantity of interest, the variables can be stacked along a new dimension and then used as input for a stencil quantity of interest, e.g. as shown in the kinetic-energy.ipynb example. Note that compressing stacked variables with different data distributions might require prior normalisation.
- no unstructured grids: The compression-safeguards do not support unstructured grids for stencil safeguards (pointwise safeguards can be applied to any data). However, irregularly spaced grids are supported, e.g. by providing the coordinates as a late-bound parameter to a quantity of interest, e.g. for an arbitrary grid spacing to a finite difference. Please reach out if you want to collaborate on bringing support for unstructured grids to the compression-safeguards.
- expensive: The compression-safeguards require substantial computation at compression time, since safety requirements have to be checked, potentially re-established, and then checked again. Safeguarding a quantity of interest can be particularly expensive since rounding errors have to be checked for at every step. However, these extensive checks allow the safeguards to provide hard safety guarantees that users can rely on.
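The printer problem from the list above can be reproduced without any real compressor. In this sketch, worst_case_lossy is a hypothetical stand-in for a lossy step (a compressor or a safeguard correction) that uses its absolute error bound in full; the values are chosen to be exactly representable in binary floating-point so that the arithmetic is exact.

```python
import numpy as np

eb = 0.125  # absolute error bound, exactly representable in binary


def worst_case_lossy(reference: np.ndarray) -> np.ndarray:
    """Hypothetical lossy step that shifts every value by exactly eb."""
    return reference + eb


original = np.arange(5, dtype=np.float64)

# first generation: bounded relative to the original data
first = worst_case_lossy(original)
# second generation: bounded only relative to the first generation
second = worst_case_lossy(first)

# each step satisfies the bound relative to its own input ...
assert np.all(np.abs(first - original) <= eb)
assert np.all(np.abs(second - first) <= eb)

# ... but the error relative to the original data has accumulated
max_err = float(np.max(np.abs(second - original)))
print(max_err)  # 0.25
```

Giving the second step the original data as its reference, instead of the first generation, is what prevents this accumulation.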
The SZ3 compressor version >=3.2.0 provides the CmprAlgo=ALGO_NOPRED option, with which the compression error of another lossy compressor can itself be compressed with an (absolute) error bound.
SZ3's error compression can provide higher compression ratios if most data elements are expected to violate the error bound, e.g. when wrapping a lossy compressor that does not bound its errors. However, SZ3 has a higher byte overhead than numcodecs-safeguards if all elements already satisfy the bound.
TLDR: You can use SZ3 to transform a known unbounded lossy compressor into an (absolute) error-bounded compressor. Use compression-safeguards to guarantee a variety of safety requirements for any compressor (unbounded, best-effort bounded, or strictly bounded), including SZ3.
Liu, J., Di, S., Zhao, K., Liang, X., Jin, S., Jian, Z., Huang, J., Wu, S., Chen, Z., & Cappello, F. (2024). High-performance Effective Scientific Error-bounded Lossy Compression with Auto-tuned Multi-component Interpolation. Proceedings of the ACM on Management of Data, 2(1), 1–27. Available from: doi:10.1145/3639259.
Zhao, K., Di, S., Dmitriev, M., Tonellot, T. D., Chen, Z., & Cappello, F. (2021). Optimizing Error-Bounded Lossy Compression for Scientific Data by Dynamic Spline Interpolation. 2021 IEEE 37th International Conference on Data Engineering (ICDE), 1643–1654. Available from: doi:10.1109/icde51399.2021.00145.
Liang, X., Zhao, K., Di, S., Li, S., Underwood, R., Gok, A. M., Tian, J., Deng, J., Calhoun, J. C., Tao, D., Chen, Z., & Cappello, F. (2022). SZ3: A modular framework for composing Prediction-Based Error-Bounded lossy compressors. IEEE Transactions on Big Data, 9(2), 485–498. Available from: doi:10.1109/tbdata.2022.3201176.
You can easily try out SZ3 using the numcodecs-wasm-sz3 Python package.
The SPERR compressor bounds the pointwise absolute error of its wavelet-based lossy compression by correcting any outlier points that exceed the error bound. For each outlier, where the error bound is violated, a lossy integer correction, which represents a multiple of the absolute error bound, is stored. With this correction, outliers are corrected back within the error bounds. The SPERR compressor is tuned to produce around 2% outliers, which minimises the combined cost of compression and correction.
Note that SPERR is known to sometimes violate its pointwise absolute error bound even after the corrections have been applied. We thus recommend using SPERR with safeguards to guarantee that the error bound is never violated.
TLDR: You can use SPERR to (mostly) bound a (globally constant) pointwise absolute error, for which SPERR uses an efficient outlier encoding. Use compression-safeguards to guarantee a variety of safety requirements, including locally varying pointwise absolute errors, for any compressor, including SPERR.
Li, S., Lindstrom, P., & Clyne, J. (2023). Lossy Scientific Data Compression With SPERR. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 1007–1017. Available from: doi:10.1109/ipdps54959.2023.00104.
You can easily try out SPERR using the numcodecs-wasm-sperr Python package.
The EBCC (Error Bounded Climate-data) compressor bounds the pointwise absolute or range-relative error of its JPEG2000-based lossy compression by encoding the residual using a discrete wavelet transform. The wavelet coefficients are encoded into a hierarchical bitstream that is truncated once the global error bound is met. The EBCC compressor is tuned to minimise the combined cost of compression and the sparse residual encoding.
TLDR: You can use EBCC to bound a (globally constant) pointwise absolute or range-relative error, for which EBCC uses efficient residual compression. Use compression-safeguards to guarantee a variety of safety requirements, for any compressor, including EBCC.
Huang, L., Fusco, L., Scheidl, F., Zibell, J., Sprenger, M. A., Schemm, S., & Hoefler, T. (2025). Error bounded compression for weather and climate applications. arXiv. Available from: doi:10.48550/arxiv.2510.22265.
LC is a framework for building custom lossless and lossy error-bounded compressors from an extensive collection of components. LC takes particular care with handling all edge cases of floating point lossy compression correctly and reproducibly across both CPU and GPU implementations. The framework is written in C and C++ with Python scripts that search for an optimal compressor pipeline, either exhaustively or using a genetic algorithm.
LC implements lossy error-bounded compression by providing specific quantizers for absolute / relative / pointwise normalised error bounds. During decompression, these quantizers can optionally decorrelate the resulting compression error by randomising the decompressed values within their quantisation bins.
TLDR: You can use LC to build a custom compressor with guaranteed error bounds across different CPUs and GPUs. Use compression-safeguards to guarantee a variety of safety requirements, including arbitrary combinations of different error bounds, for any compressor, including those created with LC.
Fallin, A., & Burtscher, M. (2024). Lessons learned on the path to guaranteeing the error bound in lossy quantizers. arXiv. Available from: doi:10.48550/arxiv.2407.15037.
The QoI-SZ3 compressor extends the SZ3 compressor by analytically deriving per-point absolute data error bounds that bound the absolute error over a derived quantity of interest. QoI-SZ3 supports quantities of interest that contain polynomials, logarithms, square roots, or regional averages, as well as isosurfaces. QoI-SZ3 quantizes and stores the analytically derived per-point absolute error bound and then uses it to quantize the prediction error from SZ3.
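To illustrate what analytically deriving a per-point data error bound can look like, consider the quantity of interest ln(x) for positive data. The derivation and the helper name abs_eb_for_log_qoi are this example's own and are not taken from QoI-SZ3 or compression-safeguards. Requiring |ln(x') - ln(x)| <= tau is equivalent to x' lying in [x e^{-tau}, x e^{tau}], and the symmetric bound |x' - x| <= x (1 - e^{-tau}) is sufficient since 1 - e^{-tau} <= e^{tau} - 1.

```python
import numpy as np


def abs_eb_for_log_qoi(data: np.ndarray, tau: float) -> np.ndarray:
    """Per-point absolute data error bound for the QoI ln(x), x > 0."""
    # -expm1(-tau) equals 1 - exp(-tau) but avoids cancellation for small tau
    return -np.expm1(-tau) * data


rng = np.random.default_rng(1)
data = rng.uniform(0.5, 100.0, size=1000)
tau = 0.01  # absolute error bound on ln(x)

eb = abs_eb_for_log_qoi(data, tau)

# stand-in for a lossy compressor that respects the per-point data bound;
# it uses at most 99% of the bound to leave headroom for rounding errors
decoded = data + rng.uniform(-0.99, 0.99, size=data.shape) * eb

qoi_err = np.abs(np.log(decoded) - np.log(data))
assert np.all(qoi_err <= tau)
```

The derivation holds in real arithmetic only. A compressor that uses the bound in full would additionally need to account for floating-point rounding, which is why the sketch leaves some headroom.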
TLDR: You can use QoI-SZ3 to preserve an absolute error bound over simple quantities of interest for which an error bound can be derived analytically. Use compression-safeguards to guarantee a greater variety of safety requirements, for any compressor, including SZ3.
Jiao, P., Di, S., Guo, H., Zhao, K., Tian, J., Tao, D., Liang, X., & Cappello, F. (2022). Toward Quantity-of-Interest preserving lossy compression for scientific data. Proceedings of the VLDB Endowment, 16(4), 697–710. Available from: doi:10.14778/3574245.3574255.
The QPET compressor is the successor to QoI-SZ3 and bounds the absolute or range-relative error over a derived quantity of interest by deriving approximate per-point absolute data error bounds based on the symbolic derivative over the quantity of interest. QPET supports pointwise and blockwise quantities of interest that contain addition, multiplication, exponentiation, logarithm, (non-inverse) trigonometric and hyperbolic functions, sign, or the absolute value, as well as isosurfaces. QPET can be adapted for existing compressors, e.g. as QPET-SZ and QPET-SPERR. QPET auto-tunes a new global error bound based on the per-point error bounds to (a) use fewer distinct error bounds for compressors that support per-point error bounds, e.g. in QPET-SZ (where per-point error bounds are stored as in QoI-SZ3), or (b) produce a new global error bound, e.g. in QPET-SPERR. QPET losslessly encodes outlier data points for which the approximate data error bounds result in a violation of the error bound over the quantity of interest.
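The combination of approximate, derivative-based data error bounds with lossless outlier correction can be sketched as follows. This is a simplified illustration under stated assumptions and not QPET's algorithm: the derivative is passed in by hand instead of being derived symbolically, snapping to a uniform grid stands in for a real lossy compressor, there is no auto-tuning, and the names `approx_data_error_bound` and `roundtrip_with_outliers` are hypothetical.

```python
import math


def approx_data_error_bound(dfdx, x, tau):
    """First-order estimate: |f(x + e) - f(x)| ~ |f'(x)| * |e| <= tau."""
    slope = abs(dfdx(x))
    return tau / slope if slope > 0 else math.inf


def roundtrip_with_outliers(values, f, dfdx, tau):
    """Lossily round-trip `values` and correct violations of the QoI bound.

    Returns the decoded values and a dict {index: original value} of the
    outliers that had to be kept losslessly.
    """
    decoded, outliers = [], {}
    for i, x in enumerate(values):
        eps = approx_data_error_bound(dfdx, x, tau)
        if math.isfinite(eps):
            # Stand-in lossy step: snap to a grid of width 2 * eps.
            x_hat = round(x / (2 * eps)) * (2 * eps)
        else:
            x_hat = 0.0
        # The first-order bound is only approximate, so verify the actual
        # QoI error and fall back to the exact value when it is violated.
        if not abs(f(x_hat) - f(x)) <= tau:
            outliers[i] = x
            x_hat = x
        decoded.append(x_hat)
    return decoded, outliers


# Example with the quantity of interest f(x) = exp(x), where f'(x) = exp(x).
data = [0.0915, 0.5, 1.0, 2.0]
decoded, outliers = roundtrip_with_outliers(data, math.exp, math.exp, tau=0.1)
```

Because $\exp$ is convex, the first-order estimate is slightly too loose whenever the decoded value overshoots the original, so some points can violate the bound and are then kept exactly.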
TLDR: You can use QPET to preserve an absolute or range-relative error bound over a variety of differentiable quantities of interest. QPET's approximate data error bounds and outlier correction result in high compression ratios. QPET can be combined with the compression-safeguards to guarantee an even greater variety of safety requirements.
Liu, J., Jiao, P., Zhao, K., Liang, X., Di, S., & Cappello, F. (2025). QPET: a versatile and portable Quantity-of-Interest-Preservation Framework for Error-Bounded Lossy Compression. Proceedings of the VLDB Endowment, 18(8), 2440–2453. Available from: doi:10.14778/3742728.3742739.
You can easily try out QPET-SPERR using the numcodecs-wasm-qpet-sperr Python package.
Please cite this work as follows:
Tyree, J., Köhler, D., Underwood, R., Bouvier, C., Järvinen, H. J., and Klöwer, M. (2026). Compression Safeguards – Towards Safe and Fearless Lossy Compression. Available from: https://github.com/juntyr/compression-safeguards
Please also refer to the CITATION.cff file, and see https://citation-file-format.github.io for how to extract the citation in a format of your choice.
Licensed under the Mozilla Public License, Version 2.0 (LICENSE or https://www.mozilla.org/en-US/MPL/2.0/).
The compression-safeguards, numcodecs-safeguards, and xarray-safeguards packages have been developed as part of ESiWACE3, the third phase of the Centre of Excellence in Simulation of Weather and Climate in Europe.
Juniper Tyree and Heikki J. Järvinen are funded by the ESiWACE3 Centre of Excellence. Funded by the European Union. This work has received funding from the European High Performance Computing Joint Undertaking (JU) under grant agreement No 101093054.
Daniel Köhler is funded by the University of Helsinki Doctoral School.
Robert Underwood is funded by the National Science Foundation (NSF) CSSI "FZ" project with Grant #2311875.
Clément Bouvier was funded by the European Union's Destination Earth Initiative and the Research Council of Finland (grant nos. 338615 and 337549).
Milan Klöwer acknowledges funding from Schmidt Sciences.
Footnotes
-
Lossy compression methods reduce data size by only storing an approximation of the data. In contrast to lossless compression methods, lossy compression loses information about the data, e.g. by reducing its resolution (only store every $n$th element), reducing its precision (only store $n$ digits after the decimal point), smoothing, etc. Therefore, lossy compression methods provide a tradeoff between size reduction and quality preservation. ↩
-
See doi:10.1109/TVCG.2023.3327186 for a general meta-compressor approach that enables progressive decompression to satisfy a compression error that is user-chosen at decompression time. ↩
-
See e.g. Figure 4 in doi:10.14778/3574245.3574255 for inspiration on how incremental safeguard corrections might work. ↩
-
The extended real values -inf and +inf and NaN (not a number) are supported for floating-point input data types. ↩
-
Fallin, A., & Burtscher, M. (2024). Lessons learned on the path to guaranteeing the error bound in lossy quantizers. arXiv. Available from: doi:10.48550/arxiv.2407.15037. ↩