Description
The recommended output format for ehrQL is arrow. However, if users run an action locally with opensafely run, the resulting arrow output file is not easily viewable. (There are a couple of VS Code extensions in the marketplace, but nothing official.)
The research template currently still defaults to the csv.gz output format, because we already give users a way to unzip and view a csv.gz output (the opensafely unzip command). It would be better to update the template to use arrow, because users are likely to use whatever we put in the starting template, irrespective of what the docs say.
We'd like to consider adding a command to either view an arrow file or (probably more useful) convert an arrow file to csv. In Python (with pandas), this would be very simple, essentially just:
import pandas as pd
df = pd.read_feather("dataset.arrow")
df.to_csv("dataset.csv", index=False)  # index=False: don't write pandas' row index as an extra column
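That conversion could be wrapped in a small standalone script. The sketch below assumes the arrow file is in the Arrow IPC / Feather v2 format that pd.read_feather reads, and that pandas and pyarrow are installed. The function names (default_csv_path, arrow_to_csv) and the choice to write the CSV next to the input by default are hypothetical illustrations, not an existing opensafely command.

```python
import argparse
from pathlib import Path


def default_csv_path(arrow_path):
    """Hypothetical default: put the CSV next to the input, e.g. output/dataset.arrow -> output/dataset.csv."""
    return Path(arrow_path).with_suffix(".csv")


def arrow_to_csv(arrow_path, csv_path=None):
    """Convert an Arrow (Feather v2) file to CSV and return the CSV path."""
    # Imported lazily so the path helper above works without pandas;
    # pd.read_feather additionally requires pyarrow.
    import pandas as pd

    csv_path = Path(csv_path) if csv_path else default_csv_path(arrow_path)
    df = pd.read_feather(arrow_path)
    # index=False stops pandas writing its RangeIndex as an unnamed first column.
    df.to_csv(csv_path, index=False)
    return csv_path


def main(argv=None):
    parser = argparse.ArgumentParser(description="Convert an .arrow file to .csv")
    parser.add_argument("input", help="path to the .arrow file")
    parser.add_argument("output", nargs="?", help="path for the .csv file (default: alongside the input)")
    args = parser.parse_args(argv)
    print(arrow_to_csv(args.input, args.output))


if __name__ == "__main__":
    main()
```

Run as, for example, python arrow_to_csv.py output/dataset.arrow, which would write output/dataset.csv under the default above.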
It should be possible to make this a wrapper around an opensafely exec python command to run the actual code using the python image, which already has the necessary pandas/pyarrow dependencies.
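A wrapper of that kind might look roughly like the sketch below. It assumes that opensafely exec takes an image name followed by a command and forwards those arguments to the container unchanged, that the image tag python:latest is available (a hypothetical default), and that the current directory is mounted as the container's working directory so relative paths such as output/dataset.arrow resolve. If the image's entrypoint is already python, the leading "python" in the command would need dropping. The names build_exec_command and convert_via_exec are hypothetical.

```python
import subprocess
import sys

# Hypothetical image tag; check which python image tags your opensafely-cli supports.
DEFAULT_IMAGE = "python:latest"

# Inline conversion run inside the container; paths arrive via sys.argv[1:].
CONVERT_SNIPPET = (
    "import sys, pandas as pd; "
    "pd.read_feather(sys.argv[1]).to_csv(sys.argv[2], index=False)"
)


def build_exec_command(arrow_path, csv_path, image=DEFAULT_IMAGE):
    """Return the argv list for running the conversion via opensafely exec (assumed syntax)."""
    return ["opensafely", "exec", image, "python", "-c", CONVERT_SNIPPET, arrow_path, csv_path]


def convert_via_exec(arrow_path, csv_path, image=DEFAULT_IMAGE):
    """Run the conversion in the image; raises CalledProcessError if it fails."""
    subprocess.run(build_exec_command(arrow_path, csv_path, image), check=True)


if __name__ == "__main__":
    convert_via_exec(sys.argv[1], sys.argv[2])
```

Passing the command to subprocess.run as a list, rather than a single shell string, avoids quoting problems with the inline snippet and with paths containing spaces.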
More details in the Slack thread.