Implement JSON parser using scanner #768
Conversation
It's good to verify my suspicion that […]. One concern if/when we switch the main driver for […]: I don't think it's worth depending on a new library. The JSON grammar is so simple that the "faster than aeson" parser can be written by hand, pretty simply, and be vendored in […]. I feel that […]. Thanks for looking […]
Also let's first resolve #766; I'll hopefully get to it next weekend.
@phadej Creating a separate parser for strict bytestrings makes sense to me. And if we include […]. I'm not sure what you meant here:

Could you please elaborate? To make sure we are on the same page: we'll wait for #766 and then we'll decide on this PR. Is that right?
Yes, let's do #766 first.
Using #772 one can now indeed see that this is faster: my machine was noisy, but still, a 10% speedup is quite nice! (Note: #772 does not resolve #766, though it sets the scene quite well.)
```haskell
\w -> w /= DOUBLE_QUOTE && w /= BACKSLASH && not (testBit w 7)

w <- lookAhead'
case w of
  DOUBLE_QUOTE -> Scanner.anyWord8 $> Text.decodeUtf8 s
```
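For context, the fragment above is a fast path for unescaped ASCII string content: scan bytes until a quote, backslash, or non-ASCII byte, then branch on the next byte. A minimal standalone sketch of the same idea, using plain `Data.ByteString` instead of the `scanner` library (the helper name `scanSimpleString` is mine, not from the PR):

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Minimal model of the scanner fast path for JSON strings:
-- consume the longest run of "simple" bytes (no quote, no
-- backslash, ASCII only), then inspect the next byte.
import           Data.Bits          (testBit)
import qualified Data.ByteString    as B
import qualified Data.Text          as T
import qualified Data.Text.Encoding as TE
import           Data.Word          (Word8)

doubleQuote, backslash :: Word8
doubleQuote = 34  -- '"'
backslash   = 92  -- '\\'

-- A byte needs no special handling if it is ASCII and is
-- neither '"' nor '\\'.
isSimple :: Word8 -> Bool
isSimple w = w /= doubleQuote && w /= backslash && not (testBit w 7)

-- Hypothetical helper: scan one fully unescaped ASCII JSON
-- string, returning its text and the input after the closing
-- quote. Escapes, non-ASCII bytes, and EOF fall through to
-- Nothing (the slow path in real code).
scanSimpleString :: B.ByteString -> Maybe (T.Text, B.ByteString)
scanSimpleString bs =
  let (simple, rest) = B.span isSimple bs
  in case B.uncons rest of
       Just (w, rest') | w == doubleQuote ->
         Just (TE.decodeUtf8 simple, rest')  -- consume closing quote
       _ -> Nothing
```

With the real `scanner` library the leftover input is handled incrementally, but the branching logic is the same.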
Please try `Text.decodeLatin1` here.
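The suggestion works because this fast path only accepts bytes below 0x80, where Latin-1 and UTF-8 coincide, so `decodeLatin1` can replace `decodeUtf8` without a UTF-8 validation pass. A small self-contained check of that claim (my own illustration, not code from the PR):

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Every byte below 0x80 maps to the same character under
-- Latin-1 and UTF-8, so for ASCII-only input decodeLatin1 is a
-- validation-free substitute for decodeUtf8.
import qualified Data.ByteString    as B
import qualified Data.Text.Encoding as TE

-- True when every byte is ASCII.
asciiOnly :: B.ByteString -> Bool
asciiOnly = B.all (< 0x80)

-- For ASCII-only input the two decoders must agree.
latin1MatchesUtf8 :: B.ByteString -> Bool
latin1MatchesUtf8 s =
  not (asciiOnly s) || TE.decodeLatin1 s == TE.decodeUtf8 s
```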
Please revive this PR, it's amazing technology.
I somehow missed your comment @ethercrow |
@Yuras |
@ethercrow Indeed, and I even released a new version of […]. Now I'd like to hear from the maintainers whether they agree with the proposed approach to backward compatibility, namely: implement the parser via […].

Also let's decide whether a 10% speedup is worth the effort at all. Nowadays there are a number of JSON parsers on Hackage specifically designed for performance.
Ah, and vendoring a custom JSON parser into […]
The ideas from here are in #996 |
This PR replaces `attoparsec` with the non-backtracking parser `scanner`, which improves performance by 20%-40% (see the benchmark results below). It's still WIP; I'd like to hear people's opinion on it.

Issues
There are two main issues with this change:
Backward compatibility
Since `aeson` exposes `attoparsec` parsers from the `Data.Aeson.Parser` module, replacing the implementation will affect the API.

The possible solution is to convert the parser based on `scanner` to one based on `attoparsec` using the `toAtto` combinator from here. It requires this PR to be merged though. Note that `toAtto` preserves the speedup, i.e. a parser implemented using `scanner` and then converted to `attoparsec` is still faster than one written directly in `attoparsec`.

Alternatively, we can expose both implementations for some period of time.
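To make the compatibility idea concrete: a non-backtracking scanner can be wrapped behind an attoparsec-style interface by running it and repackaging the leftover input in an attoparsec-like result. This is a toy model, entirely my own; the real `toAtto` works on the actual `Scanner` and attoparsec `Parser` types:

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Toy model of converting a non-backtracking scanner into a
-- parser with an attoparsec-like result type. Illustration
-- only; not the real toAtto.
import qualified Data.ByteString       as B
import qualified Data.ByteString.Char8 as BC

-- A non-backtracking scanner: consume a prefix of the input,
-- returning the value and the unconsumed remainder (or an error).
type Scan a = B.ByteString -> Either String (a, B.ByteString)

-- A minimal attoparsec-like result type.
data Result a = Fail String | Done B.ByteString a
  deriving (Show, Eq)

-- Toy analogue of toAtto: run the scanner and repackage its
-- outcome as a Result.
toAttoLike :: Scan a -> B.ByteString -> Result a
toAttoLike scan input =
  case scan input of
    Left err        -> Fail err
    Right (x, rest) -> Done rest x

-- Example scanner: read a run of decimal digits as an Int.
digits :: Scan Int
digits bs =
  let (ds, rest) = BC.span (\c -> c >= '0' && c <= '9') bs
  in if B.null ds
       then Left "expected digit"
       else Right (read (BC.unpack ds), rest)
```

Because the scanner never backtracks, the wrapper only has to forward the leftover input, which is why the conversion can preserve the speedup.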
Error messages
The error messages from the new parser lack context. Here is a fragment of the output from the `ErrorMessages` test case:

The reason is that `scanner` doesn't have a combinator similar to `<?>` from `attoparsec`. It's possible to add one, but it affects performance negatively.

Note that the error messages from `attoparsec` don't contain the error location, which makes them less useful. With `scanner` we can add the error location to the error messages, which will make errors more useful.

Benchmarks
I used the bench-parse.py script for benchmarking. Here are the results on my computer. Before:
After: