col_paths is overly expensive #1035

@gmbecker

Currently col_paths is implemented as

function (x) 
{
    if (!is(coltree(x), "LayoutColTree")) {
        stop("I don't know how to extract the column paths from an object of class ", 
            class(x))
    }
    make_col_df(x, visible_only = TRUE)$path
}

The problem is that make_col_df does a lot of work unrelated to column paths. Pruning or scoring functions, if implemented naively, may call col_paths for every row of a table. For large tables we have seen these repeated col_paths calls take up to 50% of the total pruning/sorting time. Every call in those contexts is guaranteed to return the same set of paths, so that time is entirely wasted.
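To illustrate the call pattern described above, here is a minimal sketch that does not use rtables at all. The names `slow_col_paths` and `tbl` are hypothetical stand-ins: the toy function only counts how often it is invoked, so the naive per-row pattern can be compared with computing the paths once and reusing them.

```r
# Toy stand-in for an expensive col_paths(); NOT the rtables implementation.
n_calls <- 0L
slow_col_paths <- function(tbl) {
  n_calls <<- n_calls + 1L                      # count every invocation
  lapply(tbl$cols, function(nm) c("ARM", nm))   # one path per column
}

# Hypothetical "table": three columns, 1000 rows.
tbl <- list(cols = c("A", "B", "C"), rows = 1:1000)

# Naive pattern: the scoring function recomputes the paths for every row.
naive_scores <- vapply(tbl$rows, function(r) {
  as.numeric(length(slow_col_paths(tbl)) * r)
}, numeric(1))
naive_calls <- n_calls

# Hoisted pattern: compute the paths once, then reuse them for every row.
n_calls <- 0L
paths <- slow_col_paths(tbl)
hoisted_scores <- vapply(tbl$rows, function(r) {
  as.numeric(length(paths) * r)
}, numeric(1))
hoisted_calls <- n_calls
```

Both versions produce the same scores, but the naive one invokes the path computation once per row. Hoisting works as a workaround when the author of the scoring function controls the loop; it does not help when the function is invoked row by row by the framework, which is the case caching is meant to address.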

I propose we extend the InstantiatedColumnInfo class to cache its set of column paths, the way it already does for column subset expressions. This would make repeated col_paths calls acceptable, since each one would be effectively free.
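A rough sketch of what such caching could look like, under stated assumptions: `ToyColInfo`, its slot names, and `compute_paths` are hypothetical and do not reflect the actual InstantiatedColumnInfo definition. The idea is only that the paths are computed once when the object is built, so the accessor reduces to a slot lookup.

```r
library(methods)

# Hypothetical stand-in for InstantiatedColumnInfo; slot names are assumptions.
setClass("ToyColInfo",
  representation(
    col_names = "character",
    col_paths = "list"        # cache, filled in at construction time
  )
)

# Stand-in for the expensive path computation (assumption, not rtables code).
compute_paths <- function(col_names) {
  lapply(col_names, function(nm) c("ARM", nm))
}

# Constructor computes the paths once and stores them in the cache slot.
ToyColInfo <- function(col_names) {
  new("ToyColInfo",
      col_names = col_names,
      col_paths = compute_paths(col_names))
}

# Accessor is a plain slot lookup, so repeated calls cost almost nothing.
toy_col_paths <- function(cinfo) cinfo@col_paths

cinfo <- ToyColInfo(c("A", "B"))
cached <- toy_col_paths(cinfo)
```

Because S4 objects have value semantics, a cache like this would have to be rebuilt wherever the column structure changes (for example, when columns are subset), otherwise the stored paths would go stale.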

In fact, the result of make_col_df doesn't depend on font the way make_row_df does, so I think we could consider caching the full result of make_col_df rather than just the col_paths...
