-
Notifications
You must be signed in to change notification settings - Fork 0
Open
Description
We should stop considering lists to be typed arrays, because they're not.
Currently, a list of strings is treated as a typed array of strings. This is difficult as:
- Every function needs to scan the list to check that, indeed, the list only contains strings.
- Every function also needs to scan the list to check whether the list contains
Nonevalues to represent missing strings. - It also introduces ambiguity, e.g., is a list of strings to be interpreted as a typed array that can only ever contain strings or as an unstructured list of arbitrary objects? What should we guess
[]to be? (This has consequences forsingledispatch.)
So I propose that all arrays of strings should now use numpy.array with the string type, which is closer to R's character vectors than Python list. This includes, e.g., the row and column names of the BiocFrame, the levels of the Factor, and so on.
The case is even easier to make for numeric types.
Metadata
Metadata
Assignees
Labels
No labels