
Conversation

@itrajanovska (Contributor)

We need group-wise risks to be able to assess the fairness of the assigned risks within groups.
Successor of #48, implementing the computation of the group-wise risks (as well as passing a custom ML model).
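
A rough usage sketch of the new group-wise API, pieced together from the code under review (ori, syn, control, aux_cols and secret are placeholders assumed to be prepared elsewhere; the argument values are made up):

evaluator = InferenceEvaluator(ori=ori, syn=syn, control=control,
                               aux_cols=aux_cols, secret=secret, n_attacks=500)
evaluator.evaluate(n_jobs=-2)

# New in this PR: per-group risks, one entry per group of targets sharing the
# same value of the secret attribute.
for group, (results, risk) in evaluator.risk_for_groups(confidence_level=0.95).items():
    print(group, risk)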

@itrajanovska (Contributor, Author)

@MatteoGiomi After rebasing I deleted the branch because of some conflicts, and #52 was automatically closed.

@MatteoGiomi (Member) left a comment

Thanks for this second contribution @itrajanovska! I left a few comments for your consideration.


if naive:
    guesses = syn.sample(n_attacks)[secret]
    guesses.index = targets.index
@MatteoGiomi (Member)

NIT: use reindex_like:

Suggested change
-    guesses.index = targets.index
+    guesses = guesses.reindex_like(targets)

(untested)
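
For reference, a toy pandas illustration of the difference between overwriting the index and reindex_like, which aligns by index labels; whether the two behave the same here depends on the index returned by syn.sample:

import pandas as pd

targets = pd.Series([0, 1, 2], index=[10, 11, 12])
guesses = pd.Series(["a", "b", "c"], index=[0, 1, 2])

# Overwriting the index keeps the sampled values and just relabels them.
relabelled = guesses.copy()
relabelled.index = targets.index         # "a", "b", "c" at labels 10, 11, 12

# reindex_like aligns by labels; labels absent from `guesses` become NaN.
aligned = guesses.reindex_like(targets)  # NaN at labels 10, 11, 12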

Comment on lines 42 to 43
if not guesses.index.equals(targets.index):
    raise RuntimeError("The predictions indices do not match the target indices. Check your inference model.")
@MatteoGiomi (Member)

I suggest moving this check inside evaluate_inference_guesses, since it tests for a condition that is necessary for the evaluation. This way we can also add a test in test_inference_evaluator to check that the exception is correctly raised.
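
A minimal sketch of what that could look like; the exact signature of evaluate_inference_guesses and the test layout are assumptions, and the returned comparison is only a placeholder for the real evaluation logic:

import pandas as pd
import pytest

def evaluate_inference_guesses(guesses: pd.Series, targets: pd.Series, regression: bool = False) -> pd.Series:
    # The guesses must share the targets' index, otherwise the element-wise
    # comparison performed by the evaluation is meaningless.
    if not guesses.index.equals(targets.index):
        raise RuntimeError("The predictions indices do not match the target indices. "
                           "Check your inference model.")
    # ... real evaluation logic goes here ...
    return guesses == targets  # placeholder

# In test_inference_evaluator:
def test_evaluate_inference_guesses_raises_on_index_mismatch():
    targets = pd.Series([1, 2, 3], index=[0, 1, 2])
    guesses = pd.Series([1, 2, 3], index=[10, 11, 12])
    with pytest.raises(RuntimeError):
        evaluate_inference_guesses(guesses, targets)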

    aux_cols: list[str],
    secret: str,
-   regression: Optional[bool] = None,
+   regression: bool = False,
@MatteoGiomi (Member)

💯

None if self._control is None else self._attack(target=self._control, naive=False, n_jobs=n_jobs,
                                                n_attacks=self._n_attacks_control)
)
# n_attacks is effective here
@MatteoGiomi (Member)

This and the following comments are a bit cryptic; it would be good to flesh them out a little to make them clearer.

@itrajanovska (Contributor, Author)

This is an old debugging comment; it is no longer relevant with the new logic.

    return results.risk(baseline=baseline)

def risk_for_groups(self, confidence_level: float = 0.95) -> dict[str, tuple[EvaluationResults, PrivacyRisk]]:
    """Compute the attack risks on a group level, for every unique value of `self._data_groups`.
@MatteoGiomi (Member)

Suggested change
"""Compute the attack risks on a group level, for every unique value of `self._data_groups`.
"""Compute the inference risk for each group of targets with the same value of the secret attribute.

or similar, since self._data_groups does not exist anymore.

Comment on lines 300 to 301
if not self._evaluated:
    self.evaluate(n_jobs=-2)
@MatteoGiomi (Member)

I would raise here, just as we do for the non-grouped case.
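
A minimal sketch of what I mean, as a replacement for the two lines quoted above (the exception type and the message wording are just suggestions):

if not self._evaluated:
    raise RuntimeError("The attack has not been evaluated yet. "
                       "Call evaluate() before computing the group-wise risks.")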

n_attacks_ori = len(data_ori)

# Count the number of successful attacks
n_success = evaluate_inference_guesses(
@MatteoGiomi (Member)

Re the other comment: here you are not checking for consistent indexes. That check would come for free if it is moved into evaluate_inference_guesses.

# Get the targets for the current group
common_indices = data_ori.index.intersection(self._guesses_success.index)
# Get the guesses for the current group
data_ori = data_ori.loc[common_indices]
@MatteoGiomi (Member)

Here you are overwriting an object. Better to use a different name:

Suggested change
-data_ori = data_ori.loc[common_indices]
+target_group = data_ori.loc[common_indices]

or something.

n_attacks_control = -1

# Recreate the EvaluationResults for the current group
assert n_attacks_ori == n_success
@MatteoGiomi (Member)

Why this check here? Is this assertion always passing? n_success is the sum of a boolean array of length n_attacks_ori, so the assertion should pass only if all the elements of the array (the result of evaluate_inference_guesses) are true. I don't understand; maybe I am missing something.
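
A tiny numeric illustration of this point (the values are made up):

import numpy as np

correct = np.array([True, True, False, True])  # hypothetical output of evaluate_inference_guesses
n_attacks_ori = len(correct)                   # 4
n_success = int(correct.sum())                 # 3
print(n_attacks_ori == n_success)              # False: equal only when every guess is correct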

@itrajanovska (Contributor, Author)

I think this is an incorrect assumption. I don't remember why I tested for this, but it looks more like a buggy line than an intentional check.

def test_inference_evaluator_leaks(aux_cols, secret):
    ori = get_adult("ori", n_samples=10)
    evaluator = InferenceEvaluator(ori=ori, syn=ori, control=ori, aux_cols=aux_cols, secret=secret, n_attacks=10)
    ori = ori.drop_duplicates(subset=aux_cols)
@MatteoGiomi (Member)

As a bonus, it would be good to have this ability built into the get_adult function, e.g. by adding a deduplicate_on parameter which accepts a list of columns.
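
A rough sketch of that idea; the existing get_adult signature is guessed from the test above, and _load_adult_split is a hypothetical stand-in for however the data is actually read:

from typing import Optional
import pandas as pd

def get_adult(split: str, n_samples: Optional[int] = None,
              deduplicate_on: Optional[list[str]] = None) -> pd.DataFrame:
    # Existing behaviour (sketched): load the requested split and optionally
    # subsample it. _load_adult_split is a placeholder, not the real loader.
    df = _load_adult_split(split)
    if n_samples is not None:
        df = df.head(n_samples)
    # Proposed addition: drop rows duplicated on the given columns, so tests
    # like the one above don't have to deduplicate by hand.
    if deduplicate_on is not None:
        df = df.drop_duplicates(subset=deduplicate_on)
    return df

The test above could then simply call get_adult("ori", n_samples=10, deduplicate_on=aux_cols).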
