Stop treating lists as typed arrays #2

@LTLA

We should stop considering lists to be typed arrays, because they're not.

Currently, a list of strings is treated as a typed array of strings. This causes several problems:

  • Every function needs to scan the list to check that, indeed, the list only contains strings.
  • Every function also needs to scan the list to check whether the list contains None values to represent missing strings.
  • It also introduces ambiguity, e.g., is a list of strings to be interpreted as a typed array that can only ever contain strings or as an unstructured list of arbitrary objects? What should we guess [] to be? (This has consequences for singledispatch.)
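
To make the points above concrete, here is a minimal sketch of the check that list-based code has to repeat. The helper `is_list_of_strings` and the dispatched function `describe` are hypothetical names made up for illustration; they are not part of any existing package.

```python
from functools import singledispatch


def is_list_of_strings(x):
    """Hypothetical helper: True if x is a list holding only str or None.

    This is an O(n) scan, and every function accepting such a list
    would have to repeat it.
    """
    return isinstance(x, list) and all(
        isinstance(v, str) or v is None for v in x
    )


@singledispatch
def describe(x):
    # Fallback for types without a registered handler.
    return "unknown"


@describe.register
def _(x: list):
    # singledispatch only sees `list`; the element type must be guessed
    # by scanning the contents.
    if is_list_of_strings(x):
        return "string array"
    return "generic list"


print(describe(["a", None, "b"]))  # scanned, classified as a string array
print(describe(["a", 1]))          # scanned, classified as a generic list
print(describe([]))                # nothing to scan, so the guess is arbitrary
```

In this sketch the empty list is classified as a string array only because `all()` over an empty sequence is true; an empty list intended to hold numbers would get the same answer.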

So I propose that all arrays of strings should now be NumPy arrays (created with numpy.array) with a string dtype, which is closer to R's character vectors than a Python list is. This includes, e.g., the row and column names of the BiocFrame, the levels of the Factor, and so on.
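
As a rough illustration of what this buys, the sketch below checks the element type from the array's dtype instead of scanning the contents. The helper name `is_string_array` is hypothetical, and how missing strings would be represented is deliberately left out of the sketch.

```python
import numpy as np


def is_string_array(x):
    """Hypothetical helper: True if x is a NumPy array with a string dtype.

    The check reads the dtype, so it does not depend on the array length.
    """
    return isinstance(x, np.ndarray) and np.issubdtype(x.dtype, np.str_)


names = np.array(["gene1", "gene2", "gene3"])
empty = np.array([], dtype=str)  # an empty array still carries its type

print(is_string_array(names))
print(is_string_array(empty))
print(is_string_array(["gene1", "gene2"]))  # a plain list is not a typed array
```

Because the type travels with the array, an empty array is no longer ambiguous, and `singledispatch` can register on `numpy.ndarray` separately from `list`.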

The case is even easier to make for numeric types.
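
The same idea can be sketched for numbers. The helper `is_numeric_array` is again a hypothetical name used only for illustration.

```python
import numpy as np


def is_numeric_array(x):
    """Hypothetical helper: True if x is a NumPy array with a numeric dtype."""
    return isinstance(x, np.ndarray) and np.issubdtype(x.dtype, np.number)


ints = np.array([1, 2, 3])
floats = np.array([], dtype=np.float64)  # empty, but still known to be floating-point

print(is_numeric_array(ints))
print(is_numeric_array(floats))
print(is_numeric_array([1, 2, 3]))  # a list would need an element-by-element scan
```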
