-
Notifications
You must be signed in to change notification settings - Fork 30
Type hints #203
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Type hints #203
Conversation
|
One quick note: |
|
Absolutely, you can copy the section from tmt's pre-commit configuration, it served there well. |
| # JSON: TypeAlias = dict[str, "JSON"] | list["JSON"] | str | int | float | bool | None | ||
| DataType: TypeAlias = Union[RawDataType, ListDataType, DictDataType] | ||
| TreeData: TypeAlias = dict[str, DataType] | ||
| TreeDataPath: TypeAlias = Union[TreeData, str] # Either TreeData or path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the naming can be confusing here between TreeData, TreeDataPath and DataType. Any suggestions?
|
MyPy is up and there are issues to resolve |
|
All tests are green here right now. Any review please? |
fmf/base.py
Outdated
| return | ||
| # Use the special merge for merging dictionaries | ||
| data_val = data[key] | ||
| if type(data_val) == type(value) == dict: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was the original code, but this is not the pythonic way of checking object classes. if isinstance(data_val, dict) and isinstance(value, dict) will work better, and supportes dict subclasses - there's no reason to not allow e.g. OrderedDict to be the source for a node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about str, do we hard-check str?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In which condition? I see elif type(data[key]) == type(value) == type(""): a couple of lines below, here I would also use isinstance: elif isinstance(data[key], str) and isinstance(value, str) - is it this one you're asking about?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in those kind of situations. I think it is better to hard-check for str type in those cases, and implement a base type if we want other merge-able string subtypes. Dict one, could also get a similar treatment eventually, but I can see a benefit for changing the type there for the time being (i.e. before converting all raw dictionary types to attrs dataclasses that would include the adjust and other fmf directives).
But on that note, do you have a small snippet to confirm that isinstance works with OrderedDict and Mapping types? I don't think it would work with the latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, in those kind of situations. I think it is better to hard-check for
strtype in those cases, and implement a base type if we want other merge-able string subtypes. Dict one, could also get a similar treatment eventually, but I can see a benefit for changing the type there for the time being (i.e. before converting all raw dictionary types toattrsdataclasses that would include theadjustand otherfmfdirectives).
IUIC, a hard check would be type(value) == type(''), but what would be the benefit over isinstance(value, str)? I'm not sure we understand each other :)
But on that note, do you have a small snippet to confirm that
isinstanceworks withOrderedDictandMappingtypes? I don't think it would work with the latter.
isinstance(OrderedDict(), dict) is true, isinstance(AMappingSubclass(), dict) is not. isinstance(AMappingSubclass(), (dict, collections.abc.Mapping)) should cover this base class as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IUIC, a hard check would be
type(value) == type(''), but what would be the benefit overisinstance(value, str)? I'm not sure we understand each other :)
Yes, though I prefer type(value) == str. That would be to enforce we are only checking for str. Mostly as a sanity check since I don't think we should be calling _mergeplus() as data[key] += value in those cases, but instead check for specialized overloads.
isinstance(OrderedDict(), dict)is true,isinstance(AMappingSubclass(), dict)is not.isinstance(AMappingSubclass(), (dict, collections.abc.Mapping))should cover this base class as well.
Hmm, how about if I implement the general case for isinstance(value), but keep type(data_val) == dict for sanity check until we implement attrs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IUIC, a hard check would be
type(value) == type(''), but what would be the benefit overisinstance(value, str)? I'm not sure we understand each other :)Yes, though I prefer
type(value) == str. That would be to enforce we are only checking forstr. Mostly as a sanity check since I don't think we should be calling_mergeplus()asdata[key] += valuein those cases, but instead check for specialized overloads.
isinstance(OrderedDict(), dict)is true,isinstance(AMappingSubclass(), dict)is not.isinstance(AMappingSubclass(), (dict, collections.abc.Mapping))should cover this base class as well.Hmm, how about if I implement the general case for
isinstance(value), but keeptype(data_val) == dictfor sanity check until we implementattrs?
But that's what I don't understand: why should we enforce str, or dict, exactly? :) Why bother? As long as value is an instance of str (or one of its subclasses), it is string-like enough for our use. We can measure its length, we can call its join(), and we can add it to another string with +=.
I see no benefit in sticking to the exact type only, TBH. From my POV, if isinstance(value, (dict, Mapping)) is true, then value is simply a dictionary, maybe in a trenchcoat, but it still can be iterated over. We perform no black magic with dict internals - we copy keys and values between dictionaries, and from time to time we call get(), that's all :) It's actually quite common, BTW, e.g. depending on the parser type, ruamel.yaml may materialize a YAML file as a pile of OrderedDict instances - by allowing dict only, we would disqualify otherwise perfectly fine dict-like objects. If it walks like a dict, if it quacks like a dict, then we can merge it with another dict-like object.
The same applies to list checks, right? We don't inspect list type internals, all we do is stuff like data[key] = [item for item in data[key] if item not in value] - as long as value is list-like, it should be absolutely acceptable.
I don't understand what issues would be solved by allowing the exact type only, or what issues would it prevent. On the other hand, checking for being list-like, dict-like, or string-like opens the door to a wider array of compatible input types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
About data_val, it will always be a dict/list/str, etc., unless we open the yaml to beyond safe_loader, at which point ambiguity enters (e.g. the original type might no longer be a dict due to a yaml tag, but with fmf we want to add to the constructor). If we keep it strict on data_val, we would at least catch any issues if something else altered the yaml loader or anything else.
For value indeed they have the same interface so the operations will allow us to do the operations. It's just that we are assuming it is what the author of the dict-like, str-like object intends. I think I have an example of a possible confusion in this same PR.
Consider Tree. It has a dict-like structure through __getitem__, get(), etc., but how I have it implemented is that keys, values, etc. are wrappers to Tree.data (because it is easier to implement and because child is more natural to access using file-api), while additional child or nested items are gathered using __getitem__ and get(). This could create a confusion when using _merge_plus().
For the time being though and until there's a plugin system to add more specialized rules for this, it would be helpful to add the isinstance checks for value. But I'm not sure how to keep track or denote the places where we might want to change it to have more specialized rules. Due to type-hinting there are quite alot of isinstance that we would have to filter through.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider
Tree. It has a dict-like structure through__getitem__,get(), etc., but how I have it implemented is thatkeys,values, etc. are wrappers toTree.data(because it is easier to implement and becausechildis more natural to access using file-api), while additionalchildor nested items are gathered using__getitem__andget(). This could create a confusion when using_merge_plus().
I think this example goes further than I wanted: yes, Tree might look similar to a dictionary, it does have some methods with similar semantics, but it's not a dict subclass. Therefore it would not pass the isinstance(value, dict) test, and our code would not try to combine it with other values.
My point was to allow subclasses of dict or list, e.g. ruamel.yaml.comments.CommentedMap, not classes that look like dict or list, i.e. class tree, not class API. To replace type(data_val) == type(value) == dict with isinstance(data_val, dict) and isinstance(value, dict) - this would not open the door to dict-like classes like Tree. If by any chance a Tree instance would be passed to _merge_plus(), I would expect none of these tests to match, and such a value would be reported as something fmf does not know how to merge with the existing tree.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about isinstance(value, (dict, Mapping))? I think it's useful for Tree to have type-hint of Mapping for further down the line, e.g. if it's used in a generic map input.
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
Signed-off-by: Cristian Le <cristian.le@mpsd.mpg.de>
I don't like that the data dict type is an
Any, but this should workTreeContextfilter