@anonrig (Member) commented Nov 24, 2025

Fixes #1022

tests/ada_c.cpp (outdated diff)

// to_raw_string preserves %20 encoding for spaces
ada_owned_string raw_str = ada_search_params_to_raw_string(out);
ASSERT_EQ(convert_string(raw_str), "a=b%20c&d=e%20f");

@raoxiaoyan Nov 27, 2025

@anonrig Thanks for your effort. This is wonderful support for the feature we (Kong) asked for. I was wondering whether it is possible to remove only b and keep all the other parts exactly as they were before.
Before: a=b c&b=remove&d=e+f
Removed b: a=b c&d=e+f (What we expected.)
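One way to read this expectation is as a purely textual operation on the query string. The sketch below is not ada code; `remove_key_verbatim` is a hypothetical helper that drops every `&`-separated pair whose key matches byte-for-byte and copies every other pair exactly as given.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical helper, not part of ada: drops every '&'-separated pair whose
// raw key equals `key` and copies all other pairs (including empty ones)
// verbatim. No '+' or %XX decoding is done, so "%62=x" does NOT match "b".
std::string remove_key_verbatim(std::string_view query, std::string_view key) {
  std::string out;
  bool first = true;
  size_t start = 0;
  while (start <= query.size()) {
    size_t end = query.find('&', start);
    if (end == std::string_view::npos) end = query.size();
    std::string_view pair = query.substr(start, end - start);
    // The key is everything before the first '=', or the whole pair if none.
    std::string_view pair_key = pair.substr(0, pair.find('='));
    if (pair_key != key) {
      if (!first) out += '&';
      out.append(pair.data(), pair.size());
      first = false;
    }
    start = end + 1;
  }
  return out;
}
```

With the example above, `remove_key_verbatim("a=b c&b=remove&d=e+f", "b")` returns `a=b c&d=e+f`. The trade-off of byte-exact matching is that a key spelled `%62` is not recognized as `b`.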

@anonrig (Member, Author)

I'm not following. Can you recommend a test case?

@bungle Dec 1, 2025

@anonrig,

I wrote LuaJIT bindings for this, and it is almost what we are after. Here is some example code:

local search = require("resty.ada.search").parse("a=%20&b=,&c=remove&e=+&f=a b")
local normalized = search:remove("c"):tostring()
local raw = search:to_raw_string()
print("NORMALIZED: ", normalized)
print("RAW: ", raw)

This outputs:

NORMALIZED: a=+&b=%2C&e=+&f=a+b
RAW: a=%20&b=,&e=%20&f=a%20b

So RAW still seems to normalize spaces: both + and a literal space are turned into %20. I was actually expecting it to turn them into +, as in the NORMALIZED version. But it is probably best if no normalization at all happens in raw mode, so that the output would look like this:

RAW: a=%20&b=,&e=+&f=a b
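For reference, the NORMALIZED line is what a WHATWG-style `application/x-www-form-urlencoded` round trip produces: parsing turns `+` into a space and decodes `%XX`, and serializing writes a space as `+` and percent-encodes every byte outside ASCII alphanumerics and `*-._`. The standalone sketch below uses hypothetical names (`form_parse`, `form_serialize`, etc.) and is independent of ada's implementation; it reproduces that output and shows why `%20`, `+` and a literal space cannot be told apart once parsed.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<std::string, std::string>>;

int hex_value(char c) {
  if (c >= '0' && c <= '9') return c - '0';
  if (c >= 'a' && c <= 'f') return c - 'a' + 10;
  if (c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

// '+' becomes a space and valid %XX escapes become bytes; invalid escapes
// such as "%zz" are kept as-is.
std::string form_decode(std::string_view in) {
  std::string out;
  for (size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '+') {
      out += ' ';
    } else if (in[i] == '%' && i + 2 < in.size() &&
               hex_value(in[i + 1]) >= 0 && hex_value(in[i + 2]) >= 0) {
      out += static_cast<char>(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
      i += 2;
    } else {
      out += in[i];
    }
  }
  return out;
}

// Splits on '&', skips empty sequences, splits each one at the first '='.
Pairs form_parse(std::string_view query) {
  Pairs pairs;
  size_t start = 0;
  while (start <= query.size()) {
    size_t end = query.find('&', start);
    if (end == std::string_view::npos) end = query.size();
    std::string_view seq = query.substr(start, end - start);
    if (!seq.empty()) {
      size_t eq = seq.find('=');
      std::string_view name = seq.substr(0, eq);
      std::string_view value =
          eq == std::string_view::npos ? std::string_view() : seq.substr(eq + 1);
      pairs.emplace_back(form_decode(name), form_decode(value));
    }
    start = end + 1;
  }
  return pairs;
}

// Space -> '+', ASCII alphanumerics and *-._ kept, every other byte -> %XX.
std::string form_encode(std::string_view in) {
  const char* hex = "0123456789ABCDEF";
  std::string out;
  for (unsigned char c : in) {
    bool keep = (c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') ||
                (c >= 'a' && c <= 'z') || c == '*' || c == '-' || c == '.' ||
                c == '_';
    if (c == ' ') {
      out += '+';
    } else if (keep) {
      out += static_cast<char>(c);
    } else {
      out += '%';
      out += hex[c >> 4];
      out += hex[c & 0x0F];
    }
  }
  return out;
}

std::string form_serialize(const Pairs& pairs) {
  std::string out;
  for (size_t i = 0; i < pairs.size(); ++i) {
    if (i > 0) out += '&';
    out += form_encode(pairs[i].first) + "=" + form_encode(pairs[i].second);
  }
  return out;
}
```

Parsing `a=%20&b=,&c=remove&e=+&f=a b`, erasing the `c` pair and serializing gives `a=+&b=%2C&e=+&f=a+b`. Because `%20`, `+` and a literal space all decode to the same byte, any serializer that starts from the parsed pairs, whether it re-encodes them or not, cannot restore the original spelling.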

@anonrig (Member, Author)

Sounds good. If we remove the percent-encoding calls above, we can make it more raw. Does that work for you? (Just to double-check.)

@anonrig that would work for us! Thank you. I was thinking about exactly the same thing (removing the percent-encoding in raw).

@anonrig (Member, Author)

@bungle I've updated the implementation. Please take a look. Once we're OK with the result, I'll land it and make a new release.

@bungle Dec 2, 2025

@anonrig, actually, it now seems to normalize spaces to a literal space (and perhaps other characters too, i.e. it outputs the decoded form rather than the form that was given).

input:

a=%20&b=,&c=remove&e=+&f=a b

raw/unsafe output:

a= &b=,&e= &f=a b

So it seems Ada percent-decodes (and +-decodes) the input internally and then outputs that decoded form. Several different spellings decode to the same thing, so we lose the original form (my bad, I didn't notice this earlier). What we are looking for is a way to take a query like a=%20&b=,&c=remove&e=+&f=a b and, when we remove e.g. c, not touch the rest of it at all: it should just remove &c=remove and that's it. Everything else we would like to keep exactly as given, i.e. a=%20&b=,&e=+&f=a b from to_raw/unsafe_string, with no processing applied to the pairs we are not touching. Removal is our current use case, and it would be fine if it handled at least that for now.

So the previous implementation was closer to our goal. I am not sure whether this is easy to do in the Ada code base. I hope my answer has not caused you a lot of pain.

Or, to take a more complete example of removing c:

Input:

a=%20&b=,&c=remove&ä=ö&%C3%A4=%C3%B6&e=+&f=a b
NORMALIZED (to_string): a=+&b=%2C&%C3%A4=%C3%B6&%C3%A4=%C3%B6&e=+&f=a+b
   RAW (to_raw_string): a= &b=,&ä=ö&ä=ö&e= &f=a b
                WANTED: a=%20&b=,&ä=ö&%C3%A4=%C3%B6&e=+&f=a b

We are looking to retain as much of the original as possible, i.e. a non-destructive modification.

I am afraid this turned out to be more difficult than originally anticipated.

I think what we are seeing is that Ada decodes the input, and in that process we lose information about the original: e.g. whether there was a + or %20 (or even an invalid literal space), or whether there was a , or %2C. The decoding is needed for the APIs to work naturally, though; e.g. remove should match keys in their decoded form. Perhaps what we need, then, is ada_parse_unsafe_search_params in addition to ada_search_params_to_unsafe_string? (Though how would ada_search_params_remove be implemented then, since the key still needs to be normalized for it to work?)
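One possible answer to the closing question is to keep two things per `&`-separated sequence: the original bytes, used for output, and the decoded key, used for matching. The sketch below is a standalone illustration of that idea, not ada's API; `RawSearchParams` and `serialize_verbatim` are hypothetical names, and only removal is supported, since that is the use case described above.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical sketch: search params that remember how each pair was written.
class RawSearchParams {
 public:
  explicit RawSearchParams(std::string_view query) {
    size_t start = 0;
    while (start <= query.size()) {
      size_t end = query.find('&', start);
      if (end == std::string_view::npos) end = query.size();
      std::string_view seq = query.substr(start, end - start);
      // Keep the sequence verbatim; decode only its key for lookups.
      entries_.push_back({std::string(seq), decode(seq.substr(0, seq.find('=')))});
      start = end + 1;
    }
  }

  // Removes every pair whose *decoded* key equals `key`, so keys written as
  // "ä" and as "%C3%A4" both match "ä".
  void remove(std::string_view key) {
    entries_.erase(
        std::remove_if(entries_.begin(), entries_.end(),
                       [&](const Entry& e) { return std::string_view(e.key) == key; }),
        entries_.end());
  }

  // Joins the untouched original sequences. Not standard compliant: whatever
  // the input contained (literal spaces, commas, ...) is emitted as-is.
  std::string serialize_verbatim() const {
    std::string out;
    for (size_t i = 0; i < entries_.size(); ++i) {
      if (i > 0) out += '&';
      out += entries_[i].raw;
    }
    return out;
  }

 private:
  struct Entry {
    std::string raw;  // exactly as given, e.g. "%C3%A4=%C3%B6"
    std::string key;  // decoded key, e.g. "ä"
  };

  static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
  }

  // '+' -> space, valid %XX -> byte; invalid escapes are kept as-is.
  static std::string decode(std::string_view in) {
    std::string out;
    for (size_t i = 0; i < in.size(); ++i) {
      if (in[i] == '+') {
        out += ' ';
      } else if (in[i] == '%' && i + 2 < in.size() &&
                 hex_value(in[i + 1]) >= 0 && hex_value(in[i + 2]) >= 0) {
        out += static_cast<char>(hex_value(in[i + 1]) * 16 + hex_value(in[i + 2]));
        i += 2;
      } else {
        out += in[i];
      }
    }
    return out;
  }

  std::vector<Entry> entries_;
};
```

Constructed from the complete example input, `remove("c")` yields the WANTED string, and a further `remove("ä")` drops both the literal and the percent-encoded ä pairs. Pairs that are not touched stay byte-identical; adding or setting pairs would need a separate policy (for example, encoding new pairs the way to_string() does), which this sketch leaves out.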

@anonrig (Member, Author) commented Dec 1, 2025

@lemire would you mind taking a look?

@lemire (Member) commented Dec 1, 2025

Looking

@lemire (Member) commented Dec 1, 2025

@anonrig I pushed a new test.

It seems that we want to allow the production of a URL such as...

http://mo.com/?a=&&b=?&b=+

My recommendation:

  • I think to_raw_string() should become to_unsafe_string() to indicate that it is unsafe. Users bear full responsibility because we are no longer following the standard; this can lead to atrocious bugs, and if it does, the fault is not ada's.
  • The documentation should reflect this:
  /**
   * Returns a serialized query string without normalizing the key-value pairs.
   * Unlike to_string(), this method does not apply additional transformations
   * to the percent-encoded output. The result is not standard compliant and
   * is therefore unsafe.
   */

Where the responsibility lies must be clear. It is fine to deliberately break the standard, but we must be absolutely explicit so that it does not come back to haunt us.

⚠️ I don't want to read "Cloudflare crashed half of the Internet because the ada library failed to follow the standard." ⚠️
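To make the risk concrete: once decoded values are emitted without escaping, the output may no longer parse back into the same pairs. The small standalone sketch below uses hypothetical helpers (`join_unescaped`, `count_pairs`); it is not ada code, only an illustration of why such a serializer is non-compliant.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

using Pairs = std::vector<std::pair<std::string, std::string>>;

// Hypothetical "unsafe" serializer: writes decoded names and values with no
// percent-encoding at all.
std::string join_unescaped(const Pairs& pairs) {
  std::string out;
  for (size_t i = 0; i < pairs.size(); ++i) {
    if (i > 0) out += '&';
    out += pairs[i].first + "=" + pairs[i].second;
  }
  return out;
}

// Counts the non-empty '&'-separated sequences, i.e. how many name/value
// pairs a WHATWG-style form parser would produce from `query`.
size_t count_pairs(std::string_view query) {
  size_t count = 0;
  size_t start = 0;
  while (start <= query.size()) {
    size_t end = query.find('&', start);
    if (end == std::string_view::npos) end = query.size();
    if (end > start) ++count;
    start = end + 1;
  }
  return count;
}
```

A single pair whose value is `x&admin=1` is written as `q=x&admin=1`, which `count_pairs` sees as two pairs; a standard serializer would have emitted `q=x%26admin%3D1` instead. For the test URL above, the empty sequence between `&&` is skipped, so its query `a=&&b=?&b=+` yields three pairs.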

@lemire (Member) left a comment

I recommend that @anonrig considers my comment before merging this (specifically, renaming the function with the word unsafe in it).

@anonrig (Member, Author) commented Dec 2, 2025

Yes, I agree. Let's rename the function to to_unsafe_string

@anonrig (Member, Author) commented Dec 2, 2025

@copilot can you rename to_raw_string to to_unsafe_string?

Copilot AI commented Dec 2, 2025

@anonrig I've opened a new pull request, #1030, to work on those changes. Once the pull request is ready, I'll request review from you.

Development

Successfully merging this pull request may close these issues.

Search manipulation without normalization

5 participants