Add group-wise inference risks #53
Conversation
@MatteoGiomi After rebasing I deleted the branch because of some conflicts, and #52 was automatically closed.
MatteoGiomi left a comment
Thanks for this second contribution @itrajanovska, I left a few comments for your consideration.
```python
if naive:
    guesses = syn.sample(n_attacks)[secret]
    guesses.index = targets.index
```
NIT: use reindex_like:
```diff
- guesses.index = targets.index
+ guesses = guesses.reindex_like(targets)
```
(untested)
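For context, a minimal sketch (with hypothetical stand-in data) of how the two approaches differ: assigning `.index` relabels positionally, while `reindex_like` aligns by label and reorders the values.

```python
import pandas as pd

# Hypothetical stand-ins for the PR's `guesses` and `targets`.
guesses = pd.Series([10, 20, 30], index=[0, 1, 2])
targets = pd.Series([0, 0, 0], index=[2, 0, 1])

# Assigning the index relabels positionally; the values keep their order.
relabelled = guesses.copy()
relabelled.index = targets.index

# reindex_like aligns by label instead, reordering the values to match
# targets.index (and inserting NaN for any labels missing from guesses).
aligned = guesses.reindex_like(targets)
```

Note the two are only equivalent when the guesses carry no meaningful index of their own; otherwise label alignment may reorder or drop values.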
```python
if not guesses.index.equals(targets.index):
    raise RuntimeError("The predictions indices do not match the target indices. Check your inference model.")
```
I suggest moving this check inside evaluate_inference_guesses, since it tests for a condition necessary for the evaluation. This way we can also add a test in test_inference_evaluator to check that the exception is correctly raised.
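A sketch of what moving the check could look like (simplified signature; the real function takes more parameters):

```python
import pandas as pd

def evaluate_inference_guesses(guesses: pd.Series, targets: pd.Series) -> pd.Series:
    """Simplified sketch: the index-consistency check lives inside the
    evaluation function, so every caller gets it for free and the
    RuntimeError can be unit-tested directly."""
    if not guesses.index.equals(targets.index):
        raise RuntimeError(
            "The predictions indices do not match the target indices. "
            "Check your inference model."
        )
    # A guess counts as a success when it equals the target value.
    return guesses == targets
```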
```diff
  aux_cols: list[str],
  secret: str,
- regression: Optional[bool] = None,
+ regression: bool = False,
```
💯
```python
    None if self._control is None else self._attack(target=self._control, naive=False, n_jobs=n_jobs,
                                                    n_attacks=self._n_attacks_control)
)
# n_attacks is effective here
```
This and the following comments are a bit cryptic; it would be good to flesh them out a little to make them clearer.
This is an old debugging comment and it is no longer relevant with the new logic.
```python
    return results.risk(baseline=baseline)


def risk_for_groups(self, confidence_level: float = 0.95) -> dict[str, tuple[EvaluationResults, PrivacyRisk]]:
    """Compute the attack risks on a group level, for every unique value of `self._data_groups`.
```
```diff
- """Compute the attack risks on a group level, for every unique value of `self._data_groups`.
+ """Compute the inference risk for each group of targets with the same value of the secret attribute.
```
or similar, since self._data_groups does not exist anymore.
```python
if not self._evaluated:
    self.evaluate(n_jobs=-2)
```
I would raise here, just as we do for the non-grouped case.
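A minimal sketch of the fail-fast behaviour being suggested (class and method names are simplified stand-ins, not the real evaluator):

```python
class EvaluatorSketch:
    """Toy stand-in illustrating the suggestion: raise instead of
    silently running evaluate() when risk_for_groups() is called too early."""

    def __init__(self):
        self._evaluated = False

    def evaluate(self):
        # The real evaluator runs the attacks here.
        self._evaluated = True

    def risk_for_groups(self):
        if not self._evaluated:
            # Mirror the non-grouped case: require an explicit evaluate() call.
            raise RuntimeError("Call evaluate() before computing group-wise risks.")
        return {}
```

Failing fast keeps the API predictable: the caller controls when the (potentially expensive) evaluation runs.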
```python
n_attacks_ori = len(data_ori)

# Count the number of successful attacks
n_success = evaluate_inference_guesses(
```
Re the other comment: here you are not checking for consistent indexes. That would come for free if the check is moved into evaluate_inference_guesses.
```python
# Get the targets for the current group
common_indices = data_ori.index.intersection(self._guesses_success.index)
# Get the guesses for the current group
data_ori = data_ori.loc[common_indices]
```
Here you are overwriting an object. Better to use a different name:
```diff
- data_ori = data_ori.loc[common_indices]
+ target_group = data_ori.loc[common_indices]
```
or something.
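For reference, a self-contained sketch of the label-intersection pattern under discussion, using toy data and the suggested non-overwriting name:

```python
import pandas as pd

# Toy stand-ins for the PR's data_ori and self._guesses_success.
data_ori = pd.DataFrame({"secret": [0, 1, 1]}, index=[10, 20, 30])
guesses_success = pd.Series([True, False], index=[20, 30])

# Keep only the rows present in both indexes, without overwriting data_ori,
# so the full original frame remains available for later groups.
common_indices = data_ori.index.intersection(guesses_success.index)
target_group = data_ori.loc[common_indices]
```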
```python
n_attacks_control = -1

# Recreate the EvaluationResults for the current group
assert n_attacks_ori == n_success
```
Why is this check here? Is this assertion always passing? n_success is the sum of a boolean array of length n_attacks_ori, so the assertion should pass only if all the elements of the array (the result of evaluate_inference_guesses) are true. I don't understand; maybe I am missing something.
I think this is an incorrect assumption. I don't remember why I tested for this, but it looks more like a buggy line than something intentional.
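The reviewer's point can be checked with a tiny numpy sketch: the sum of a boolean array equals its length only when every element is True, i.e. only when every attack succeeded.

```python
import numpy as np

# evaluate_inference_guesses returns a boolean success array;
# summing it counts the successes.
successes = np.array([True, False, True])
n_success = successes.sum()      # 2
n_attacks_ori = len(successes)   # 3

# The assertion `n_attacks_ori == n_success` would fail here,
# because not every attack succeeded.
all_succeeded = (n_attacks_ori == n_success)
```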
tests/test_inference_evaluator.py
Outdated
```python
def test_inference_evaluator_leaks(aux_cols, secret):
    ori = get_adult("ori", n_samples=10)
    evaluator = InferenceEvaluator(ori=ori, syn=ori, control=ori, aux_cols=aux_cols, secret=secret, n_attacks=10)
    ori = ori.drop_duplicates(subset=aux_cols)
```
As a bonus, it would be good to have this ability built into the get_adult function, e.g. via a parameter deduplicate_on which accepts a list of columns.
…ates; Remove old comments; Add RuntimeError test.
We need group-wise risks to be able to assess the fairness of the assigned risks within groups.
Successor of #48, implementing the computation of the group-wise risks (as well as passing a custom ML model).