-
Notifications
You must be signed in to change notification settings - Fork 3
Experimental combined / omnibus date parser #112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughAdds a shared GRAMMAR_FILE_PATH constant, centralizes grammar file resolution, implements year/day token handlers, introduces a combined Lark grammar and parser plus an OmnibusDateConverter that merges EDTF, Hebrew, and Islamic transformers, and adds tests and docs for the combined parser. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Omnibus as OmnibusDateConverter
participant Parser as Lark Parser (combined.lark)
participant CombTF as CombinedDateTransformer
participant TFs as Merged TFs (EDTF / Hebrew / Islamic)
participant Result as Undate / UndateInterval
Client->>Omnibus: parse(value)
Omnibus->>Parser: parse(value)
alt parsed tree
Parser->>CombTF: return parse tree
CombTF->>TFs: dispatch/transform(tree)
TFs->>TFs: calendar-specific transform
TFs->>CombTF: transformed Undate/UndateInterval
CombTF->>Omnibus: return first result
Omnibus->>Client: return Undate/UndateInterval
else empty or unrecognized
Parser-->>Omnibus: raise UnexpectedCharacters / no match
Omnibus-->>Client: raise ValueError
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: Repository UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #112 +/- ##
===========================================
+ Coverage 98.93% 98.96% +0.03%
===========================================
Files 38 40 +2
Lines 2057 2121 +64
===========================================
+ Hits 2035 2099 +64
Misses 22 22 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
docs/undate/converters.rst(1 hunks)src/undate/converters/__init__.py(1 hunks)src/undate/converters/base.py(1 hunks)src/undate/converters/calendars/hebrew/parser.py(1 hunks)src/undate/converters/calendars/hebrew/transformer.py(1 hunks)src/undate/converters/calendars/hijri/parser.py(1 hunks)src/undate/converters/calendars/hijri/transformer.py(1 hunks)src/undate/converters/combined.py(1 hunks)src/undate/converters/edtf/parser.py(1 hunks)src/undate/converters/edtf/transformer.py(1 hunks)src/undate/converters/grammars/combined.lark(1 hunks)src/undate/converters/grammars/hebrew.lark(1 hunks)tests/test_converters/test_combined_parser.py(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (4)
tests/test_converters/test_combined_parser.py (3)
src/undate/undate.py (2)
Undate(35-487)UndateInterval(490-579)src/undate/converters/combined.py (1)
parse(65-81)src/undate/converters/base.py (1)
parse(66-74)
src/undate/converters/combined.py (5)
src/undate/undate.py (2)
Undate(35-487)UndateInterval(490-579)src/undate/converters/base.py (3)
BaseDateConverter(59-135)parse(66-74)to_string(76-85)src/undate/converters/edtf/transformer.py (1)
EDTFTransformer(6-78)src/undate/converters/calendars/hebrew/transformer.py (1)
HebrewDateTransformer(12-40)src/undate/converters/calendars/hijri/transformer.py (1)
HijriDateTransformer(12-49)
src/undate/converters/edtf/transformer.py (3)
src/undate/converters/calendars/hebrew/transformer.py (1)
year(29-32)src/undate/converters/calendars/hijri/transformer.py (1)
year(31-35)src/undate/undate.py (1)
year(380-388)
src/undate/converters/calendars/hebrew/transformer.py (3)
src/undate/converters/calendars/hijri/transformer.py (1)
year(31-35)src/undate/converters/edtf/transformer.py (1)
year(69-72)src/undate/undate.py (1)
year(380-388)
🔇 Additional comments (21)
src/undate/converters/edtf/transformer.py (1)
69-72: LGTM!The new
yearmethod correctly mirrors the pattern used inHebrewDateTransformer.yearandHijriDateTransformer.year(per the relevant snippets), enabling consistent year handling across the combined transformer pipeline. The implementation also aligns with the existingyear_unspecifiedmethod in this same class.src/undate/converters/base.py (1)
47-57: LGTM!Clean centralization of grammar file path resolution. Using
pathlib.Path(__file__).parentensures cross-platform compatibility, and having a singleGRAMMAR_FILE_PATHconstant eliminates duplication across the various parser modules.docs/undate/converters.rst (1)
15-18: LGTM!The autoclass directive for
OmnibusDateConverteris correctly formatted and will properly document the new combined parser.src/undate/converters/calendars/hijri/parser.py (1)
3-5: LGTM!Clean refactor to use the centralized
GRAMMAR_FILE_PATH, eliminating local path construction and aligning with the pattern applied to other parsers in this PR.src/undate/converters/grammars/hebrew.lark (1)
14-30: LGTM!Formatting improvements only—the comment whitespace fix and one-alternative-per-line layout improve readability without changing grammar behavior.
src/undate/converters/edtf/parser.py (1)
3-5: LGTM! Centralized grammar path approach.The refactoring to use
GRAMMAR_FILE_PATHfrom a shared constant promotes consistency across all parsers and simplifies future grammar path management.src/undate/converters/calendars/hebrew/parser.py (1)
3-5: LGTM! Consistent with the centralized grammar path refactoring.The Hebrew parser now uses the same centralized
GRAMMAR_FILE_PATHapproach as the EDTF parser, maintaining consistency across the codebase.src/undate/converters/calendars/hebrew/transformer.py (1)
29-32: LGTM! Consistent year aggregation pattern.The new
year()method correctly aggregates year parts into a single string and returns the expected Tree structure. This implementation aligns with the patterns used inEDTFTransformer.year()andHijriDateTransformer.year(), maintaining consistency across calendar transformers.src/undate/converters/calendars/hijri/transformer.py (1)
31-41: LGTM! Year and day aggregation methods added.Both methods correctly aggregate parts into strings and return proper Tree structures. The implementations are consistent with the Hebrew transformer pattern.
The inline comments about anonymous tokens in the combined parser are helpful for future maintainers, even though the root cause isn't fully explained.
src/undate/converters/__init__.py (1)
1-3: LGTM! Public API exposure of centralized grammar path.The export of
GRAMMAR_FILE_PATHenables consistent access to the centralized grammar directory across the converter ecosystem.tests/test_converters/test_combined_parser.py (3)
9-22: Excellent test coverage across calendar systems.The test cases comprehensively cover EDTF (years with unknowns, ranges), Hebrew (month+year), and Hijri (with and without day) date formats, validating the omnibus parser's multi-calendar support.
32-34: Good approach to work around equality semantics.Using
repr()comparison is appropriate here since the comment explains that "the same unknown date is not considered strictly equal." This ensures the test validates the parsed structure rather than relying on potentially loose equality implementations.
43-45: Good validation of non-serialization constraint.This test confirms that the omnibus converter correctly raises
ValueErrorwhen serialization is attempted, aligning with the documented limitation.src/undate/converters/grammars/combined.lark (3)
1-4: LGTM! Clean grammar orchestration.The combined grammar correctly imports and coordinates the three calendar systems, with proper whitespace handling and a clear start rule.
12-24: Good use of namespaced imports.The relative imports with explicit namespace prefixes (
edtf__,hebrew__,hijri__) properly isolate each grammar's rules while enabling composition. The inline comments helpfully reference the Lark documentation for this pattern.
27-32: Thoughtful ambiguity resolution for year-only dates.The overrides to exclude year-only forms from Hebrew and Hijri grammars are well-documented and correctly address the ambiguity problem—without calendar context, a year-only date would be unclear. Defaulting to EDTF/Gregorian for such cases is reasonable and explicitly noted in the combined.py module documentation.
src/undate/converters/combined.py (5)
20-23: LGTM! Minimal delegation transformer.The
CombinedDateTransformer.start()method correctly returns the parsed children, allowing the merged sub-transformers (EDTF, Hebrew, Hijri) to handle the actual transformation logic.
31-36: LGTM! Proper transformer composition.The
merge_transformers()call correctly combines the three calendar-specific transformers with appropriate namespace prefixes matching the grammar imports in combined.lark.
40-42: LGTM! Parser initialization with strict mode.Using
Lark.open()with the combined grammar andstrict=Trueenables better error detection and validates the grammar structure.
65-81: Solid parse implementation with appropriate error handling.The method correctly:
- Validates non-empty input (lines 70-71)
- Delegates to the parser and transformer (lines 75-77)
- Converts
UnexpectedCharacterstoValueErrorwith a user-friendly message (lines 78-81)The indexing on line 77 (
[0]) is safe because theCombinedDateTransformer.start()method returns the non-empty children list from a successful parse.
83-85: LGTM! Clear non-serialization boundary.The
to_string()method appropriately raisesValueErroras documented, making the limitation explicit and preventing silent failures.
|
|
||
| ..note: base converter is pretty developer-facing documentation; | ||
| .. calendar/converter usage would be helpful to add | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RST directive syntax is malformed.
The note directive should use .. note:: (two dots, space, directive name, double colon). The current syntax will not render as a note block in Sphinx documentation.
🔎 Proposed fix
-..note: base converter is pretty developer-facing documentation;
-.. calendar/converter usage would be helpful to add
+.. note::
+
+ Base converter is pretty developer-facing documentation;
+ calendar/converter usage would be helpful to add.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ..note: base converter is pretty developer-facing documentation; | |
| .. calendar/converter usage would be helpful to add | |
| .. note:: | |
| Base converter is pretty developer-facing documentation; | |
| calendar/converter usage would be helpful to add. | |
🤖 Prompt for AI Agents
In docs/undate/converters.rst around lines 7 to 10, the RST note directive is
malformed ("..note:") and won't render; change it to the correct Sphinx syntax
(".. note::") and ensure a blank line after the directive before the note
content, and also add a short example section showing calendar/converter usage
(header, brief usage snippet or steps) to make the converter guidance more
user-facing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/undate/converters/calendars/islamic/transformer.py (1)
31-35: Consider clarifying or investigating the anonymous token behavior.The comment indicates uncertainty about why an anonymous token appears in the combined parser. While the implementation handles this correctly, it would be valuable to either:
- Document the root cause if understood, or
- Add a TODO to investigate this behavior further
This helps future maintainers understand whether this is expected or a quirk that might need addressing.
📜 Review details
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
docs/undate/converters.rst(1 hunks)src/undate/converters/base.py(2 hunks)src/undate/converters/calendars/hebrew/transformer.py(1 hunks)src/undate/converters/calendars/islamic/parser.py(1 hunks)src/undate/converters/calendars/islamic/transformer.py(1 hunks)src/undate/converters/combined.py(1 hunks)src/undate/converters/edtf/transformer.py(1 hunks)src/undate/converters/grammars/combined.lark(1 hunks)src/undate/converters/grammars/hebrew.lark(1 hunks)tests/test_converters/test_combined_parser.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/undate/converters/base.py
- docs/undate/converters.rst
- src/undate/converters/edtf/transformer.py
🧰 Additional context used
🧬 Code graph analysis (3)
tests/test_converters/test_combined_parser.py (3)
src/undate/undate.py (1)
Undate(51-678)src/undate/converters/combined.py (1)
parse(65-81)src/undate/converters/base.py (1)
parse(72-80)
src/undate/converters/calendars/islamic/transformer.py (1)
src/undate/undate.py (2)
year(484-492)day(508-517)
src/undate/converters/calendars/hebrew/transformer.py (1)
src/undate/undate.py (1)
year(484-492)
🔇 Additional comments (16)
src/undate/converters/calendars/hebrew/transformer.py (2)
29-30: LGTM! Good extensibility design.The use of
self.calendarto pass the calendar parameter enables easy subclassing (e.g., for Seleucid calendar) and aligns well with the multi-calendar architecture of this PR.
32-35: LGTM! Consistent with EDTF transformer pattern.The implementation correctly concatenates items and returns a Tree structure, following the same pattern used in the EDTF transformer.
src/undate/converters/calendars/islamic/transformer.py (1)
37-41: LGTM! Consistent implementation.The
day()method correctly follows the same pattern as theyear()method. The anonymous token concern applies here as well (see comment on Line 31-35).src/undate/converters/grammars/hebrew.lark (2)
14-14: LGTM! Minor typo fix.Corrected double space in comment.
18-30: LGTM! Improved readability.The reformatting of month alternatives across multiple lines enhances readability without changing semantics.
src/undate/converters/calendars/islamic/parser.py (1)
3-5: LGTM! Good centralization of grammar path resolution.Replacing the local pathlib-based path construction with the shared
GRAMMAR_FILE_PATHconstant improves maintainability and aligns with the centralized grammar management approach across all parsers in this PR.tests/test_converters/test_combined_parser.py (4)
9-22: LGTM! Comprehensive test coverage.The test cases appropriately cover multiple scenarios:
- EDTF formats (years, unknown digits, intervals)
- Hebrew calendar dates
- Islamic/Hijri calendar dates
- Various date component combinations (year-only, month-year, full dates)
26-34: LGTM! Appropriate use of repr() comparison.The test correctly validates the transformer behavior, and the use of
repr()comparison is well-justified by the comment explaining the non-strict equality of identical calendar representations.
38-40: LGTM! Good end-to-end integration test.This test validates the full converter path through
Undate.parse(), complementing the direct transformer tests.
43-45: LGTM! Appropriate validation of serialization restriction.The test correctly verifies that the omnibus converter raises
ValueErrorwhen attempting serialization, consistent with the documented limitation.src/undate/converters/grammars/combined.lark (2)
4-24: LGTM! Well-structured grammar composition.The use of relative imports with namespace renaming (edtf__, hebrew__, islamic__) correctly handles grammar composition. The comment referencing the Lark GitHub issue provides helpful context for the design decision.
27-32: LGTM! Smart disambiguation strategy.The overrides correctly prevent year-only ambiguity by requiring month+year or day+month+year for Hebrew and Islamic dates. The comment acknowledging potential future support for calendar-labeled years shows good forward thinking.
src/undate/converters/combined.py (4)
20-23: LGTM! Appropriate pass-through design.The
CombinedDateTransformer.start()method correctly delegates to the merged transformers by returning children unchanged, allowing the merged transformers to handle the actual transformation logic.
26-42: LGTM! Correct transformer merging and parser setup.The merged transformer correctly combines the three calendar transformers with appropriate namespace prefixes matching the grammar. The parser setup with
rel_to=__file__andstrict=Trueis appropriate. The comment about year-only interpretation provides helpful context for users.
62-81: LGTM! Robust parsing implementation.The
parse()method correctly:
- Validates non-empty input
- Handles the parse-transform workflow
- Extracts the first transformed item from the returned list
- Converts Lark's
UnexpectedCharactersto a more user-friendlyValueErrorwith a clear error message
83-85: LGTM! Correctly implements serialization restriction.The
to_string()method appropriately raisesValueErrorwith a clear message, consistent with the documented limitation that the omnibus converter does not support serialization (since it would be ambiguous which format to serialize to).
Experimental combined date converter that aggregates existing parsers into a single combined parser that can handle known formats supported by our various parsers.
Summary by CodeRabbit
New Features
Documentation
Improvements
Tests
✏️ Tip: You can customize this high-level summary in your review settings.