Skip to content

Conversation

@gogonzo
Copy link
Contributor

@gogonzo gogonzo commented Aug 21, 2025

Closes

For more info and example please check an issue and added test.

@gogonzo gogonzo added the core label Aug 21, 2025
@github-actions
Copy link
Contributor

github-actions bot commented Aug 21, 2025

badge

Code Coverage Summary

Filename                     Stmts    Miss  Cover    Missing
-------------------------  -------  ------  -------  -------------------
R/cdisc_data.R                   1       0  100.00%
R/deprecated.R                  18       0  100.00%
R/dummy_function.R               2       2  0.00%    15-16
R/formatters_var_labels.R       61       0  100.00%
R/join_key.R                    38       0  100.00%
R/join_keys-c.R                 12       0  100.00%
R/join_keys-extract.R          128       0  100.00%
R/join_keys-names.R             15       0  100.00%
R/join_keys-parents.R           30       0  100.00%
R/join_keys-print.R             45       7  84.44%   32-38
R/join_keys-utils.R             77       3  96.10%   35-38
R/join_keys.R                   21       0  100.00%
R/teal_data-class.R             36       6  83.33%   50-52, 59, 104-105
R/teal_data-constructor.R       11       2  81.82%   48, 51
R/teal_data-extract.R            3       0  100.00%
R/teal_data-get_code.R          13       8  38.46%   114-120, 124
R/teal_data-names.R              8       1  87.50%   42
R/teal_data-show.R               5       5  0.00%    14-20
R/topological_sort.R            32       0  100.00%
R/verify.R                      42      11  73.81%   67, 97-101, 104-108
TOTAL                          598      45  92.47%

Diff against main

Filename               Stmts    Miss  Cover
-------------------  -------  ------  -------
R/join_keys-print.R        0      +7  -15.56%
R/join_keys-utils.R       +4       0  +0.21%
TOTAL                     +4      +7  -1.13%

Results for commit: 452b6dd

Minimum allowed coverage is 80%

♻️ This comment has been updated with latest results

@github-actions
Copy link
Contributor

github-actions bot commented Aug 21, 2025

Unit Tests Summary

  1 files   15 suites   2s ⏱️
164 tests 163 ✅ 1 💤 0 ❌
219 runs  218 ✅ 1 💤 0 ❌

Results for commit 452b6dd.

♻️ This comment has been updated with latest results.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 21, 2025

Unit Test Performance Difference

Additional test case details
Test Suite $Status$ Time on main $±Time$ Test Case
join_keys-extract 💀 $0.01$ $-0.01$ join_keys_i_j_doesn_t_infer_keys_between_children_if_they_don_t_have_common_key_to_parent
join_keys-extract 👶 $+0.01$ join_keys_i_j_doesn_t_infer_keys_between_datasets_if_they_don_t_refer_to_the_same_column_in_foreign_dataset
join_keys-extract 💀 $0.02$ $-0.02$ join_keys_i_j_doesn_t_infer_keys_between_grandchildren
join_keys-extract 👶 $+0.02$ join_keys_i_j_doesn_t_infer_keys_between_grandchildren_if_they_don_t_refer_to_the_same_col_in_the_foreign_dataset
join_keys-extract 💀 $0.01$ $-0.01$ join_keys_i_j_infer_keys_between_children_through_foreign_keys_to_parent._
join_keys-extract 👶 $+0.02$ join_keys_i_j_infer_keys_between_datasets_if_they_are_linked_to_the_same_col_in_their_common_foreign_dataset.
join_keys-extract 👶 $+0.02$ join_keys_i_j_infer_keys_between_datasets_through_foreign_datasets_to_parent._
join_keys-extract 💀 $0.01$ $-0.01$ join_keys_i_j_return_NULL_for_given_pair_when_no_such_key_and_no_common_parent
join_keys-extract 👶 $+0.01$ join_keys_i_j_return_NULL_for_given_pair_when_not_linked_via_foreign_datasets_and_keys
join_keys-print 💀 $0.01$ $-0.01$ format.join_keys_print_inferred_keys_for_children_sharing_parent
join_keys-print 👶 $+0.00$ format.join_keys_print_inferred_keys_for_datasets_linked_via_foreign_dataset

Results for commit 7956c88

♻️ This comment has been updated with latest results.

@gogonzo gogonzo linked an issue Oct 20, 2025 that may be closed by this pull request
@osenan osenan self-requested a review December 15, 2025 12:04
Copy link

@osenan osenan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, good changes!

My comment is on function .search.recursive. This function is important. I would slightly simplify its beginning and add internal documentation, it can be helpful for the future to debug.

- not commiting names.join_keys which confuses real items order
@gogonzo
Copy link
Contributor Author

gogonzo commented Dec 17, 2025

@osenan I've applied changes in the commented places.
I've also removed names.join_keys which I've recently added. Problem with new names.join_keys is that returns topological order while items remain in the same spot in the list. This causes some problems. Specifically, imagine situation when jk <- list(a = 1, b = 2, c = 3) and names(jk) returns for example c, a, b, then:

  • jk[1] and jk["c"] doesn't refer to the same object
  • names(jk)[1] <- "newname" fails, by replacing jk[1] and not jk["c"]

Something to think about later. Right now names(<join_keys>) returns original list order. I wish this to be ordered topologically which helps with determining merge operations (in picks).

@gogonzo
Copy link
Contributor Author

gogonzo commented Dec 18, 2025

@osenan I need to spend more time on this. .append_indirect_link is too slow for larger join_keys object:

print(defalt_cdisc_join_keys) # takes a loooooot!

osenan
osenan previously approved these changes Dec 19, 2025
Copy link

@osenan osenan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, now both documentation and the steps of the function are much easier to understand.

@osenan
Copy link

osenan commented Dec 19, 2025

@gogonzo I would create a new issue for the performance improvement. We should see if we can do the recursive search with already existing functions from other packages that are optimized (which would include extra depenencies), or don't use brute force seach if possible

@gogonzo
Copy link
Contributor Author

gogonzo commented Jan 2, 2026

I would create a new issue for the performance improvement.

@osenan I won't merge a new feature which will affect all downstream packages. Algorithm is difficult, it is still recursive (not a brute force) but continues when node is re-visited. Revisit on the node is perhaps needed to find "indirect" connection. I've tried to improve performance by skipping "indirect"

I'll come back to this issue later when I have some time off from mdr. Issue I've created is not so crucial as I found more join_keys related problems

@gogonzo gogonzo dismissed osenan’s stale review January 2, 2026 11:24

PR not ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Can't find join_keys between indirectly linked datasets

3 participants