Skip to content

Conversation

@LNSD
Copy link
Contributor

@LNSD LNSD commented Dec 11, 2025

Introduce invariant-preserving newtype wrappers for physical table URLs to enforce type safety at compile time rather than relying on runtime string validation.

  • Add PhyTableUrl in common with FromStr, Display, and conversion traits
  • Add TableUrl/TableUrlOwned in metadata-db following the Name borrowing pattern
  • Update physical_table module API to use typed URL parameters
  • Convert tests to use the new URL types

Introduce invariant-preserving newtype wrappers for physical table URLs
to enforce type safety at compile time rather than relying on runtime string validation.

- Add `PhyTableUrl` in common with `FromStr`, `Display`, and conversion traits
- Add `TableUrl`/`TableUrlOwned` in metadata-db following the `Name` borrowing pattern
- Update `physical_table` module API to use typed URL parameters
- Convert tests to use the new URL types

Signed-off-by: Lorenzo Delgado <lorenzo@edgeandnode.com>
@LNSD LNSD requested a review from JohnSwan1503 December 11, 2025 16:31
@LNSD LNSD self-assigned this Dec 11, 2025
Copy link

@claude claude bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Summary

This PR successfully introduces type-safe URL handling for physical table locations, replacing string-based URLs with dedicated newtype wrappers. The implementation follows good Rust patterns and improves type safety throughout the codebase.

Strengths

  • Clear separation of concerns: The two-layer approach with PhyTableUrl in common and TableUrl/TableUrlOwned in metadata-db is well-designed
  • Good documentation: Clear explanations of the validation strategy and URL format
  • Improved test quality: Test assertions now include descriptive messages that make failures easier to debug
  • Type safety: The newtype pattern prevents accidental misuse of URL strings

Suggestions for Improvement

  1. Validation: Consider adding more comprehensive validation in the FromStr implementation to ensure URLs meet object store requirements (valid schemes, proper format)
  2. Error handling: The expect("database URL should be valid") pattern could be replaced with more graceful error handling
  3. Safety documentation: Some safety comments mention "undefined behavior" which seems too strong for string validation issues

Minor Notes

  • The borrowing pattern implementation in metadata-db following the Name pattern is consistent and well-executed
  • The conversion traits between the two newtype layers are properly implemented with appropriate safety comments

Overall, this is a solid refactoring that improves type safety and code maintainability. The suggestions above are mostly minor improvements that could enhance the robustness of the implementation.

/// ## Example
///
/// ```text
/// s3://my-bucket/ethereum_mainnet/logs/01234567-89ab-cdef-0123-456789abcdef/
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good documentation! The format description and example clearly explain the URL structure, which helps future maintainers understand what this type represents.

/// Create a new [`PhyTableUrl`] from a [`url::Url`] without validation
///
/// # Safety
/// The caller must ensure the provided URL is a valid object store URL.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider documenting what constitutes a "valid object store URL" here. While the comment mentions safety, it would be helpful to specify the validation requirements (e.g., must be parseable by url::Url, must have a supported scheme like s3/gs/azure/file).

}

impl From<metadata_db::physical_table::TableUrlOwned> for PhyTableUrl {
fn from(value: metadata_db::physical_table::TableUrlOwned) -> Self {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error handling here with expect("database URL should be valid") could be improved. Consider using a more graceful error handling approach that provides context about which URL failed to parse, perhaps by returning a Result or using a custom error type that includes the URL string in the error message.

//! Types that convert into [`Url`] are responsible for ensuring invariants are met:
//! - URLs must be valid object store URLs (parseable by `url::Url`)
//! - URLs must have a supported scheme (s3, gs, azure, file)
//! - URLs should represent directory locations (typically ending with `/`)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent documentation of the validation strategy! The clear separation between "maintaining invariants" vs "validating input" and the explanation of where validation occurs (at system boundaries) follows best practices for type safety.

///
/// # Safety
/// The caller must ensure the provided URL upholds the physical table URL invariants.
/// This method does not perform validation. Failure to uphold the invariants may
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The # Safety documentation mentions "undefined behavior" but this seems overly strong for a URL validation issue. Since this is Rust safe code working with strings, you won't get memory unsafety. Consider rephrasing to something like "Failure to uphold the invariants may cause logical errors or panics in downstream code."


//* Then
assert!(*location_id > 0);
assert!(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great improvement to the test assertions! The descriptive messages make test failures much more informative and easier to debug. This pattern should be encouraged throughout the test suite.

type Err = PhyTableUrlParseError;

fn from_str(s: &str) -> Result<Self, Self::Err> {
let url = s.parse().map_err(|err| PhyTableUrlParseError {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider adding validation in the FromStr implementation to ensure the URL meets the requirements (valid scheme, proper format, etc.). Currently it only checks if it's parseable as a URL, but doesn't validate it's a valid object store URL.

manifest_hash: impl Into<ManifestHash<'_>> + std::fmt::Debug,
table_name: impl Into<TableName<'_>> + std::fmt::Debug,
url: &str,
url: impl Into<TableUrl<'_>> + std::fmt::Debug,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good API design using the Into trait for the url parameter. This provides flexibility for callers while maintaining type safety.

Copy link
Contributor

@JohnSwan1503 JohnSwan1503 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

Can similarly replace url::Url in the structs/functions operating on the file_metadata table. Also, #1407 can be updated to reflect the new Url and its patterns for the gc_manifest table.

@LNSD LNSD merged commit ec779ea into main Dec 11, 2025
8 checks passed
@LNSD LNSD deleted the lnsd/refactor-metadata-db-physical-tables-newtype branch December 11, 2025 17:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants