Improve expansion of reference records, parent records and sub-form records #136

@nickdickinson

For the convenience of the user and to reduce the load on the server, provide an explicit grammar that lets the user expand reference fields, parent records and sub-form records. The current automatic expansion would be used to implement this grammar, but server load would drop because only the columns users explicitly ask for are returned.

There are two potential approaches that can be implemented.

Approach 1: ActivityInfo label/code based selection of columns.

This would be easiest for ActivityInfo users who dabble in scripts. It would wrap getRecords(), select() and rename() so that users can select variables ActivityInfo-style, by the label and/or the code or id of each column, and choose the resulting column names in the same call.

In pseudo R to get all inhabitants of households with a reference to a Person form for all person fields:

df <- selectFormVariables(
     inhabitantFormId,
     # names are the output column names, values are ActivityInfo-style paths
     vars = c(
          'Household ID' = '@parent.[ID]',
          'Inhabitant name' = '[Inhabitant].[Full name]',
          'Mother name' = '[Inhabitant].[Mother].[Full name]',
          'Father name' = '[Inhabitant].[Father].[Full name]',
          'HH head gender' = '@parent.[Head of household].[Gender]',
          'HH head age' = '@parent.[Head of household].[Age]')
)
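To make the path syntax concrete, here is a minimal base R sketch of how such paths could be split and resolved. The helpers parse_field_path() and resolve_path() are hypothetical names, not part of any existing package, and the nested-list record shape is an assumption about how an expanded record might look.

```r
# Hypothetical helper: split a path such as
# "@parent.[Head of household].[Gender]" into its segments.
parse_field_path <- function(path) {
  # Match either a bracketed label ([...]) or a bare token such as @parent.
  pattern <- "\\[[^]]*\\]|[^.\\[\\]]+"
  segments <- regmatches(path, gregexpr(pattern, path, perl = TRUE))[[1]]
  # Drop the surrounding brackets from bracketed labels.
  gsub("^\\[|\\]$", "", segments)
}

# Hypothetical helper: walk a nested list along the parsed segments,
# returning NA when any segment is missing.
resolve_path <- function(record, path) {
  value <- record
  for (segment in parse_field_path(path)) {
    if (!is.list(value) || is.null(value[[segment]])) return(NA)
    value <- value[[segment]]
  }
  value
}

# Assumed shape of a single expanded record (illustrative data only).
record <- list(
  `@parent` = list(
    ID = "HH-1",
    `Head of household` = list(Gender = "F", Age = 41)
  ),
  Inhabitant = list(`Full name` = "Ana")
)

vars <- c(
  'Household ID' = '@parent.[ID]',
  'Inhabitant name' = '[Inhabitant].[Full name]',
  'HH head gender' = '@parent.[Head of household].[Gender]'
)

# One output value per requested variable, named by the chosen column name.
row <- lapply(vars, function(p) resolve_path(record, p))
```

Because labels are matched inside brackets as a whole, a label that itself contains a dot would still be treated as one segment in this sketch.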

Approach 2: Tidyverse verbs

This is the best fit from an R developer / data science perspective. We would implement the tidyr unnest verbs: unnest_wider() for reference fields and parent records, unnest_longer() for sub-form records, potentially unnest_auto() to choose the more appropriate of the two automatically, and potentially hoist() for more specific selection. The example below shows how the tidy select functions can then pick exactly the variables that are needed.

In pseudo R to get all inhabitants of households with a reference to a Person form for all person fields:

getRecords(inhabitantFormId) %>%
     unnest_wider(Inhabitant) %>%
     unnest_wider(Mother, names_sep = ' ') %>%
     unnest_wider(Father, names_sep = ' ') %>%
     unnest_wider(`@parent`, names_sep = ' ') %>%
     rename(`Head of household` = `@parent Head of household`) %>%
     unnest_wider(`Head of household`, names_sep = ' ') %>%
     # tidyselect combines helpers with & and | (not && or ||)
     select(`Full name`, `Mother Full name`, `Father Full name`,
            starts_with('@parent') & contains('ID'),
            contains('Head') & (contains('Gender') | contains('Age')))
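The pipeline above can be tried on a toy tibble. This sketch assumes that an unexpanded reference or parent field arrives as a list-column holding one named list per record; that shape is an assumption, and the tibble below merely stands in for getRecords().

```r
library(tibble)
library(tidyr)
library(dplyr)

# Toy stand-in for getRecords(): reference and parent fields are
# list-columns with one named list per record (assumed payload shape).
records <- tibble(
  `Full name` = c("Ana", "Ben"),
  Mother = list(
    list(`Full name` = "Maria", Age = 52),
    list(`Full name` = "Rosa", Age = 47)
  ),
  `@parent` = list(
    list(ID = "HH-1"),
    list(ID = "HH-2")
  )
)

expanded <- records %>%
  # names_sep prefixes each new column with the outer column name
  unnest_wider(Mother, names_sep = " ") %>%
  unnest_wider(`@parent`, names_sep = " ") %>%
  # keep only the columns the user explicitly asked for
  select(`Full name`, `Mother Full name`,
         starts_with("@parent") & contains("ID"))
```

With names_sep = " " the new columns are named by pasting the outer and inner names together, so the result here has the columns `Full name`, `Mother Full name` and `@parent ID`.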

Principles:

  • It would unnest columns that are unexpanded reference, parent or subform fields.
  • It would do nothing if these are already expanded.
  • It would allow arbitrary expansion, even when the references between forms contain cycles.
  • One could still use getRecords(inhabitantFormId, allColumnsStyle(maxDepth = n)) to select columns at depth n by their ActivityInfo label/code, but we would limit this practice because the number of columns can blow up and because automatic expansion does not expand forms that have already been visited.
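The first two principles amount to an idempotent unnest. A minimal sketch, assuming again that an unexpanded field is a list-column; expand_if_nested() is a hypothetical name used only for illustration.

```r
library(tibble)
library(tidyr)

# Hypothetical wrapper: expand `col` only if it is still an unexpanded
# list-column; otherwise return the data unchanged.
expand_if_nested <- function(df, col) {
  if (col %in% names(df) && is.list(df[[col]])) {
    unnest_wider(df, all_of(col), names_sep = " ")
  } else {
    df
  }
}

# Illustrative data with one unexpanded reference field.
records <- tibble(
  `Full name` = c("Ana", "Ben"),
  Mother = list(
    list(`Full name` = "Maria"),
    list(`Full name` = "Rosa")
  )
)

once <- expand_if_nested(records, "Mother")
# The second call finds no list-column named Mother and does nothing.
twice <- expand_if_nested(once, "Mother")
```

Checking for a list-column is only one possible way to detect an unexpanded field; a real implementation might rely on the form schema instead.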

Combined

Both approaches are compatible and could be chained. It is mainly about exposing the most useful API.
