Consider making a new command to make .arrow output files more easily viewable #324

@rebkwok

The recommended output format for ehrQL is arrow. However, if users run an action locally with opensafely run, the arrow output file is not easily viewable. (There are a couple of VS Code extensions in the marketplace, but nothing official.)

The research template currently still defaults to using the output format csv.gz; this is because we have a way for users to unzip and view a csv.gz output (with the opensafely unzip command). It would be better if we could update the template to use arrow, because users are likely to use whatever we put in the starting template, irrespective of what the docs say.

We'd like to consider adding a command to either view an arrow file or (probably more useful) convert an arrow file to csv. If done in Python (with pandas), this would be very simple, essentially just:

import pandas as pd
df = pd.read_feather("dataset.arrow")
df.to_csv("dataset.csv")

It should be possible to make this a wrapper around an opensafely exec python command to run the actual code using the python image, which already has the necessary pandas/pyarrow dependencies.
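
As a rough sketch of what the conversion script inside the python image might look like, assuming pandas and pyarrow are available as described above: the function names `default_csv_path` and `arrow_to_csv` are hypothetical, and writing the csv next to the input file is just one possible default, not a decided design.

```python
from pathlib import Path


def default_csv_path(arrow_path):
    """Return the sibling .csv path for an .arrow file (hypothetical helper)."""
    return Path(arrow_path).with_suffix(".csv")


def arrow_to_csv(arrow_path, csv_path=None):
    """Convert an arrow file to csv and return the path that was written."""
    # Imported lazily so this module can be loaded without pandas installed;
    # the python image is assumed to provide pandas and pyarrow.
    import pandas as pd

    if csv_path is None:
        csv_path = default_csv_path(arrow_path)
    df = pd.read_feather(arrow_path)
    # index=False is an assumption here: it leaves out the pandas row index,
    # which is not part of the original dataset.
    df.to_csv(csv_path, index=False)
    return Path(csv_path)
```

Calling `arrow_to_csv("output/dataset.arrow")` would write `output/dataset.csv` alongside the input. How the wrapper command passes file paths through to this script is left open.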

More details in Slack thread
