Skip to content

Conversation

@omaskery
Copy link

Hi, thanks for writing this library!

I am using it in a hobby project to decompress some 3rd party data, but their compression implementation uses a different max_freq value of 0x4000. It took a while to figure out what was different about their implementation!

I have put together a little PR exposing the max_freq value and passing it down to where it's needed.

Feel free to discard this PR if it doesn't seem useful to upstream, and I'm happy to make small adjustments based on your feedback :)

@dfgordon
Copy link
Owner

Yes good idea. Based on a quick look a the diff I don't see any problems, but I'll hold off merging until I can have a closer look.

@omaskery
Copy link
Author

omaskery commented Jun 1, 2025

No worries, I'm not in a rush 😄

Something I noticed is the adaptive Huffman code initialises a variable to 0xffff with a comment saying it must be larger than max_freq.

I should probably change this too?

@dfgordon
Copy link
Owner

dfgordon commented Jun 1, 2025

That approach got carried over from LZHUF.C, presumably 0xffff was chosen based on a 16 bit computer. I'm not 100% sure, but I think that can simply be replaced with usize::MAX to make it more flexible, and if we want to be pedantic put an assert!(max_freq < usize::MAX) somewhere. Then test something with max_freq > 0xffff to make sure it works.

Note this PR is a breaking change due to adding a public structure field, i.e. the major version number will have to be advanced (tip: cargo semver-checks is very useful for detecting this). So if anything else seems as if it could or should be similarly exposed it would be good to handle at the same time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants