-
Notifications
You must be signed in to change notification settings - Fork 89
[SVCS-334]Adding custom sentry sanitizer #297
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SVCS-334]Adding custom sentry sanitizer #297
Conversation
cslzchen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed, need recursion for dict type and a question on UTF-8.
| MASK = '*' * 8 | ||
|
|
||
| # Token and key added from original. Key is used by Dataverse | ||
| FIELDS = frozenset([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For phase 2 consideration: double check the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cslzchen Can you please clarify this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomBaxter My suggestion is check if there are other possible keys that may store sensitive or secret information. Here is the default keys from SanitizePasswordProcessor: https://github.com/htquach/raven-python/blob/master/raven/processors.py#L72.
FIELDS = frozenset([
'password', 'secret', 'passwd', 'authorization', 'api_key', 'apikey',
'pw', 'cred'
])We don't have 'pw', 'cred' and we added 'access_token', 'token', 'key' and 'sentry_dsn'. Other possible ones are 'refresh_token', 'code', etc. We need to understand what we need and what we don't.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. I plan on making better use of inheritance so that we don't miss any defaults in the parent. I'll also check the sentry data for Dataverse, for other possible things to sanitize. I'll definitely add refresh_token.
| if isinstance(value, str) and self.VALUES_RE.search(value): | ||
| return self.MASK | ||
|
|
||
| if isinstance(value, dict): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need recursion for dict + a test fot it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Complete, for lists as well
| # Just in case we have bytes here, we want to turn them into text | ||
| # properly without failing so we can perform our check. | ||
| if isinstance(key, bytes): | ||
| key = key.decode('utf-8', 'replace') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it guaranteed to be correctly UTF-8 encoded?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unsure. This line is directly from raven, so I assume so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the code base and it turns out we rarely try en/decoding. Let's leave it this way. In the future we may want to have a cleanup ticket that add try...except wrapper on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ill add a comment
0aae5df to
fb273ea
Compare
56d7c6b to
b776bc8
Compare
Johnetordoff
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an import order problem and some debugging stuff to get rid of.
| @@ -0,0 +1,196 @@ | |||
| import pytest | |||
| from unittest import mock | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unittest is a built-in library, import order.
tests/server/test_sanitize.py
Outdated
| result = sanitizer.sanitize('sanitize_dict', sanitize_dict) | ||
|
|
||
| # Sanity check | ||
| assert result != { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessary given the next the statement.
1 similar comment
| debug=debug, | ||
| ) | ||
| app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__) | ||
| app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to update this after bumping Update: ignore this commentsetuptools to 37.0.0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cslzchen Can you please clarify this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomBaxter Thanks for catching this. I probably was thinking about MFR. No need to change.
| MASK = '*' * 8 | ||
|
|
||
| # Token and key added from original. Key is used by Dataverse | ||
| FIELDS = frozenset([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cslzchen Can you please clarify this comment?
| debug=debug, | ||
| ) | ||
| app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__) | ||
| app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cslzchen Can you please clarify this comment?
|
|
||
| key = key.lower() | ||
| for field in self.FIELDS: | ||
| if field in key: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check key directly from self.FIELDS
|
|
||
|
|
||
| class WBSanitizer(SanitizePasswordsProcessor): | ||
| """Asterisk out things that look like passwords, keys, etc.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make better use of parent class.
|
As mentioned, this PR is closed and continues in #313. |
Ticket
https://openscience.atlassian.net/browse/SVCS-334
Purpose
Sanitize and censor tokens we get back from Dataverse that show up in local variables
Changes
Added a sanitizer based that inherits from the default raven one. It has increased functionality to parse through dictionaries, look for dataverse keys, and a larger scope of variable names to sanitize (key, token)
NOTE:
sanitize.pyandtest_sanitize.py's location in waterbutler are pretty variable. I put them in server just because that was the easiest place at the time. if they should live somewhere else, that is an easy change, and something that might need to be considered.Side effects
Some sentry things that we don't want to sanitize may end up being sanitized.
QA Notes
To test, you need to trigger an error that will log a Dataverse API token in a local variable or request some how.
An easy way to do this is if locally testing, upgrade your furl to 1.0.1 in Waterbutler with Dataverse attached to your project. You will also need to add a personal sentry account to your Waterbutler (through your raven.config file, or by manually putting it in your settings).
One thing to note: If an API key shows up in the actual error message, this is currently not censored (not sure best way to go about this, but the only time I ever encountered this was with the error :
Bad API token.. so the token is not valid anymore anyway.To trigger the above "Bad API token" functionality, attach a dataverse account. Then on dataverse, go refresh your token and refresh the page. This error will trigger until the OSF has to revalidate (Or something like that. This only happened to me once or twice).
There are other manual ways to test locally, such as raising errors in odd places with variables named things like 'key' or having a variable value that looks like a dataverse token etc.
Deployment Notes