[CODEC-333] Add separate methods for strict Standard and URL Safe Base64 decoding #419


Open

bsanchezb wants to merge 2 commits into apache:master from bsanchezb:CODEC-333

+411 −10

bsanchezb commented Dec 30, 2025

This PR implements strict Base64 decoding mechanisms, as explained in CODEC-333.

The proposed implementation keeps the current default behavior, which accepts both "standard" and URL Safe Base64 characters when decoding, and adds supplementary methods that enforce strict processing for either the "standard" Base64 alphabet or the URL Safe Base64 alphabet.

Unit tests demonstrating the difference in behavior are also included.

NOTE: During testing, I also found that the Base64#decodeBase64 method silently skips unsupported characters. In my opinion, decoding should fail in that case by throwing an exception rather than continuing silently. However, this PR does not address that issue, and the new methods implement the same behavior.
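For readers unfamiliar with the two alphabets, the difference between strict "standard" and strict URL Safe decoding can be sketched with the JDK's own java.util.Base64, which is a different library from Commons Codec and is used here only for illustration. The class and method names below (StrictBase64Sketch, isStrictStandard, isStrictUrlSafe) are hypothetical and are not part of this PR.

```java
import java.util.Base64;

// Illustration only: uses the JDK decoders, not the Commons Codec API from this PR.
final class StrictBase64Sketch {

    /** Returns true if the input decodes with the strict "standard" alphabet ('+' and '/'). */
    static boolean isStrictStandard(final String input) {
        return accepts(Base64.getDecoder(), input);
    }

    /** Returns true if the input decodes with the strict URL Safe alphabet ('-' and '_'). */
    static boolean isStrictUrlSafe(final String input) {
        return accepts(Base64.getUrlDecoder(), input);
    }

    /** The JDK decoders throw IllegalArgumentException on characters outside their alphabet. */
    private static boolean accepts(final Base64.Decoder decoder, final String input) {
        try {
            decoder.decode(input);
            return true;
        } catch (final IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(final String[] args) {
        // The bytes 0xFB 0xFF map to the alphabet indices 62, 63 and 60,
        // so the two encodings differ in their first two characters.
        final byte[] raw = {(byte) 0xFB, (byte) 0xFF};
        final String standard = Base64.getEncoder().encodeToString(raw);   // "+/8="
        final String urlSafe = Base64.getUrlEncoder().encodeToString(raw); // "-_8="

        System.out.println(standard + " strict standard: " + isStrictStandard(standard));
        System.out.println(standard + " strict URL Safe: " + isStrictUrlSafe(standard));
        System.out.println(urlSafe + " strict standard: " + isStrictStandard(urlSafe));
        System.out.println(urlSafe + " strict URL Safe: " + isStrictUrlSafe(urlSafe));
    }
}
```

Unlike the current Base64#decodeBase64 behavior described above, the JDK decoders used in this sketch reject characters outside their alphabet instead of skipping them.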


          Add separate methods for strict Standard and URL Safe Base64 decoding

c2358e9

garydgregory changed the title ~~CODEC-333 : Add separate methods for strict Standard and URL Safe Base64 decoding~~ [CODEC-333] Add separate methods for strict Standard and URL Safe Base64 decoding

garydgregory requested changes


Member

garydgregory left a comment

Hello @bsanchezb

Thank you for the PR. Overall, this is great. I have a few small comments scattered throughout for you to address.

I am concerned we are increasing the public footprint in this class by a lot. For example, I do not see a public use case for:

isBase64Standard(byte)
isBase64Url(byte)

If you disagree, please explain. The idea is that we should YAGNI the API here.

Thank you!

src/main/java/org/apache/commons/codec/binary/Base64.java Outdated

    
                   */

                  private static final byte[] DECODE_TABLE = {

                      //   0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F

                          //   0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F

Member

garydgregory Dec 30, 2025

Please don't edit this line, it's a poor-man's column header.

src/main/java/org/apache/commons/codec/binary/Base64.java Outdated

    
                  /**

                   * Returns whether or not the {@code octet} is in the base 64 alphabet.

                   * <p>

                   * Note: this method threats both characters '+' and '/' and  '-' and '_' as valid base64 characters.

Member

garydgregory Dec 30, 2025

The phrasing is off here: Using "both" implies 2, but you list 4, each with the word 'and'. What do you really mean?

src/main/java/org/apache/commons/codec/binary/Base64.java Outdated

    
                   * Tests a given byte array to see if it contains only valid characters within the Base64 alphabet. Currently the

                   * method treats whitespace as valid.

                   * <p>

                   * Note: this method threats both characters '+' and '/' and  '-' and '_' as valid base64 characters.

Member

garydgregory Dec 30, 2025

The phrasing is off here: Using "both" implies 2, but you list 4, each with the word 'and'. What do you really mean?

src/main/java/org/apache/commons/codec/binary/Base64.java Outdated

    
                   * Tests a given String to see if it contains only valid characters within the Base64 alphabet. Currently the

                   * method treats whitespace as valid.

                   * <p>

                   * Note: this method threats both characters '+' and '/' and  '-' and '_' as valid base64 characters.

Member

garydgregory Dec 30, 2025

The phrasing is off here: Using "both" implies 2, but you list 4, each with the word 'and'. What do you really mean?

src/main/java/org/apache/commons/codec/binary/Base64.java

    
                       * both URL_SAFE and STANDARD base64.

                       * </p>

                       *

                       * @param format table format to be used on Base64 decoding.

Member

garydgregory Dec 30, 2025

Document null input resets to the default.

src/main/java/org/apache/commons/codec/binary/Base64.java Outdated

    
                       * This method allows to explicitly state whether a "standard" or "URL Safe" Base64 decoding is expected.

                       * <p>

                       * Note: By default, the implementation uses the MIXED approach, allowing a seamless handling of

                       * both URL_SAFE and STANDARD base64.

Member

garydgregory Dec 30, 2025

Use @link when referring to elements in this class. For example {@link DecodeTableFormat#URL_SAFE}.

src/main/java/org/apache/commons/codec/binary/Base64.java Outdated

    
                   * in Table 1 of RFC 2045) into their 6-bit positive integer equivalents. Characters that are not in the Base64

                   * alphabet but fall within the bounds of the array are translated to -1.

                   * <p>

                   * Note: This decoding table handles only the "standard" base64 characters, such as '+' and '/'.

Member

garydgregory Dec 30, 2025

Remove "Note: ".

src/main/java/org/apache/commons/codec/binary/Base64.java Outdated

    
                   * Characters that are not in the Base64 URL Safe alphabet but fall within the bounds of the array

                   * are translated to -1.

                   * <p>

                   * Note: This decoding table handles only the "URL Safe" base64 characters, such as '-' and '_'.

Member

garydgregory Dec 30, 2025

Remove "Note: ".

src/main/java/org/apache/commons/codec/binary/Base64.java Outdated

    
                   * <p>

                   * <strong>Note:</strong> this method seamlessly handles data encoded in URL-safe or normal mode.

                   * For enforcing verification against strict standard Base64 or Base64 URL Safe tables,

                   * please use {@code #decodeBase64Standard} or {@code decodeBase64Url} methods respectively.

Member

garydgregory Dec 30, 2025

Use @link where you can instead of @code. Saying {@code #... doesn't mean anything in Javadoc, only in {@link #....
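A minimal sketch of the Javadoc form being requested, assuming the two strict methods take a String argument (the real signatures in the PR may differ). The class JavadocLinkSketch and the method bodies are hypothetical stand-ins that use the JDK's java.util.Base64 so the example compiles on its own.

```java
import java.util.Base64;

// Hypothetical stand-in class; only the Javadoc reference syntax is the point here.
final class JavadocLinkSketch {

    /**
     * Decodes Base64 data written in either the standard or the URL Safe alphabet.
     * <p>
     * To enforce a single alphabet, use {@link #decodeBase64Standard(String)} or
     * {@link #decodeBase64Url(String)} instead.
     * </p>
     *
     * @param input the Base64 text to decode.
     * @return the decoded bytes.
     */
    static byte[] decodeBase64(final String input) {
        // Map the URL Safe characters onto the standard ones, then decode.
        return Base64.getDecoder().decode(input.replace('-', '+').replace('_', '/'));
    }

    /**
     * Decodes Base64 data, accepting only the standard alphabet.
     *
     * @param input the Base64 text to decode.
     * @return the decoded bytes.
     */
    static byte[] decodeBase64Standard(final String input) {
        return Base64.getDecoder().decode(input);
    }

    /**
     * Decodes Base64 data, accepting only the URL Safe alphabet.
     *
     * @param input the Base64 text to decode.
     * @return the decoded bytes.
     */
    static byte[] decodeBase64Url(final String input) {
        return Base64.getUrlDecoder().decode(input);
    }
}
```

With {@link #...}, the Javadoc tool resolves the target and renders a hyperlink, and it reports a broken reference if the method is renamed; {@code #...} only renders the text in a code font.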

src/main/java/org/apache/commons/codec/binary/Base64.java

    
                      }

                      /**

                       * Sets the format of the decoding table.

Member

garydgregory Dec 30, 2025

This method and Builder.setUrlSafe(boolean) should each make it clear that neither affects the action of the other. For example, Builder.setUrlSafe(boolean) already says that it only affects encoding, but we should also say what to call if you care about decoding. I think we want to make it clear what to call for each use case. It might be worth mentioning at the class level and at the Builder class level as well.
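The separation of concerns asked for here, together with the earlier request to document that null input resets to the default, could be sketched roughly as follows. This is a simplified stand-in, not the PR's code: the setter name setDecodeTableFormat, the getters, and the class BuilderSketch are assumptions made for illustration, while DecodeTableFormat, STANDARD, URL_SAFE, MIXED, and setUrlSafe(boolean) are the names mentioned in this review.

```java
// Simplified, hypothetical stand-in for the builder discussed in this review.
final class BuilderSketch {

    /** Which alphabet(s) the decoder accepts. */
    enum DecodeTableFormat { STANDARD, URL_SAFE, MIXED }

    static final class Builder {

        /** Affects encoding only. */
        private boolean urlSafe;

        /** Affects decoding only; MIXED accepts both alphabets. */
        private DecodeTableFormat decodeTableFormat = DecodeTableFormat.MIXED;

        /**
         * Sets the alphabet used for encoding. Does not affect decoding;
         * to restrict decoding, call {@link #setDecodeTableFormat(DecodeTableFormat)}.
         */
        Builder setUrlSafe(final boolean urlSafe) {
            this.urlSafe = urlSafe;
            return this;
        }

        /**
         * Sets the alphabet(s) accepted when decoding. Does not affect encoding;
         * to change the encoding alphabet, call {@link #setUrlSafe(boolean)}.
         *
         * @param format the decoding table format; null resets to {@link DecodeTableFormat#MIXED}.
         */
        Builder setDecodeTableFormat(final DecodeTableFormat format) {
            this.decodeTableFormat = format != null ? format : DecodeTableFormat.MIXED;
            return this;
        }

        boolean isUrlSafe() {
            return urlSafe;
        }

        DecodeTableFormat getDecodeTableFormat() {
            return decodeTableFormat;
        }
    }
}
```

Keeping the two settings in separate fields makes the independence explicit: a caller who sets URL Safe encoding still gets mixed decoding unless a decoding format is chosen as well.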


          fixed javadocs + resolved other comments

4c71766

Author

bsanchezb commented Dec 30, 2025

Hi @garydgregory ,

Thank you for the thoughtful review.

I added changes according to your comments.

Regarding this comment:

I am concerned we are increasing the public footprint in this class by a lot. For example, I do not see a public use case for:

isBase64Standard(byte)
isBase64Url(byte)

If you disagree, please explain. The idea is that we should YAGNI the API here.

I do not have a strong opinion on this, but I was trying to apply the same design pattern used for the Base64#isBase64 methods, which also include the #isBase64(final byte[] arrayOctet) method, so that people find a familiar architecture if they decide to switch to one of the new implementations. I will leave it up to you to decide.

Member

garydgregory commented Dec 30, 2025

Hello @bsanchezb
Hm, yeah, I missed those old APIs. In retrospect, having them probably breaks YAGNI, so I understand your POV! Thanks. I'll review again later today.

