Skip to content

Conversation

@AddisonSchiller
Copy link
Contributor

Ticket

https://openscience.atlassian.net/browse/SVCS-334

Purpose

Sanitize and censor tokens we get back from Dataverse that show up in local variables

Changes

Added a sanitizer based that inherits from the default raven one. It has increased functionality to parse through dictionaries, look for dataverse keys, and a larger scope of variable names to sanitize (key, token)

NOTE: sanitize.py and test_sanitize.py's location in waterbutler are pretty variable. I put them in server just because that was the easiest place at the time. if they should live somewhere else, that is an easy change, and something that might need to be considered.

Side effects

Some sentry things that we don't want to sanitize may end up being sanitized.

QA Notes

To test, you need to trigger an error that will log a Dataverse API token in a local variable or request some how.
An easy way to do this is if locally testing, upgrade your furl to 1.0.1 in Waterbutler with Dataverse attached to your project. You will also need to add a personal sentry account to your Waterbutler (through your raven.config file, or by manually putting it in your settings).

One thing to note: If an API key shows up in the actual error message, this is currently not censored (not sure best way to go about this, but the only time I ever encountered this was with the error :Bad API token.. so the token is not valid anymore anyway.

To trigger the above "Bad API token" functionality, attach a dataverse account. Then on dataverse, go refresh your token and refresh the page. This error will trigger until the OSF has to revalidate (Or something like that. This only happened to me once or twice).

There are other manual ways to test locally, such as raising errors in odd places with variables named things like 'key' or having a variable value that looks like a dataverse token etc.

Deployment Notes

@coveralls
Copy link

coveralls commented Nov 8, 2017

Coverage Status

Coverage increased (+0.04%) to 89.141% when pulling 853afcc on AddisonSchiller:feature/SVCS-334-davaverse-tokens into 473191c on CenterForOpenScience:develop.

@cslzchen cslzchen self-requested a review November 10, 2017 19:45
Copy link
Contributor

@cslzchen cslzchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, need recursion for dict type and a question on UTF-8.

MASK = '*' * 8

# Token and key added from original. Key is used by Dataverse
FIELDS = frozenset([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For phase 2 consideration: double check the list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cslzchen Can you please clarify this comment?

Copy link
Contributor

@cslzchen cslzchen Dec 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomBaxter My suggestion is check if there are other possible keys that may store sensitive or secret information. Here is the default keys from SanitizePasswordProcessor: https://github.com/htquach/raven-python/blob/master/raven/processors.py#L72.

    FIELDS = frozenset([
        'password', 'secret', 'passwd', 'authorization', 'api_key', 'apikey',
        'pw', 'cred'
    ])

We don't have 'pw', 'cred' and we added 'access_token', 'token', 'key' and 'sentry_dsn'. Other possible ones are 'refresh_token', 'code', etc. We need to understand what we need and what we don't.

Copy link
Member

@TomBaxter TomBaxter Dec 26, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood. I plan on making better use of inheritance so that we don't miss any defaults in the parent. I'll also check the sentry data for Dataverse, for other possible things to sanitize. I'll definitely add refresh_token.

if isinstance(value, str) and self.VALUES_RE.search(value):
return self.MASK

if isinstance(value, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need recursion for dict + a test fot it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Complete, for lists as well

# Just in case we have bytes here, we want to turn them into text
# properly without failing so we can perform our check.
if isinstance(key, bytes):
key = key.decode('utf-8', 'replace')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it guaranteed to be correctly UTF-8 encoded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure. This line is directly from raven, so I assume so.

Copy link
Contributor

@cslzchen cslzchen Nov 17, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked the code base and it turns out we rarely try en/decoding. Let's leave it this way. In the future we may want to have a cleanup ticket that add try...except wrapper on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ill add a comment

@coveralls
Copy link

coveralls commented Nov 16, 2017

Coverage Status

Coverage increased (+0.03%) to 89.973% when pulling fa72098 on AddisonSchiller:feature/SVCS-334-davaverse-tokens into 26bf209 on CenterForOpenScience:develop.

@coveralls
Copy link

coveralls commented Nov 17, 2017

Coverage Status

Coverage increased (+0.05%) to 89.994% when pulling 12f163e on AddisonSchiller:feature/SVCS-334-davaverse-tokens into 26bf209 on CenterForOpenScience:develop.

@AddisonSchiller AddisonSchiller force-pushed the feature/SVCS-334-davaverse-tokens branch from 0aae5df to fb273ea Compare November 17, 2017 16:09
@AddisonSchiller AddisonSchiller force-pushed the feature/SVCS-334-davaverse-tokens branch from 56d7c6b to b776bc8 Compare November 17, 2017 16:13
@coveralls
Copy link

coveralls commented Nov 17, 2017

Coverage Status

Coverage increased (+0.05%) to 89.994% when pulling b776bc8 on AddisonSchiller:feature/SVCS-334-davaverse-tokens into 26bf209 on CenterForOpenScience:develop.

Copy link
Contributor

@Johnetordoff Johnetordoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an import order problem and some debugging stuff to get rid of.

@@ -0,0 +1,196 @@
import pytest
from unittest import mock
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unittest is a built-in library, import order.

result = sanitizer.sanitize('sanitize_dict', sanitize_dict)

# Sanity check
assert result != {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not necessary given the next the statement.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 89.994% when pulling 1df226a on AddisonSchiller:feature/SVCS-334-davaverse-tokens into 26bf209 on CenterForOpenScience:develop.

1 similar comment
@coveralls
Copy link

Coverage Status

Coverage increased (+0.05%) to 89.994% when pulling 1df226a on AddisonSchiller:feature/SVCS-334-davaverse-tokens into 26bf209 on CenterForOpenScience:develop.

debug=debug,
)
app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__)
app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__,
Copy link
Contributor

@cslzchen cslzchen Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to update this after bumping setuptools to 37.0.0. Update: ignore this comment

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cslzchen Can you please clarify this comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TomBaxter Thanks for catching this. I probably was thinking about MFR. No need to change.

MASK = '*' * 8

# Token and key added from original. Key is used by Dataverse
FIELDS = frozenset([
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cslzchen Can you please clarify this comment?

debug=debug,
)
app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__)
app.sentry_client = AsyncSentryClient(settings.SENTRY_DSN, release=waterbutler.__version__,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cslzchen Can you please clarify this comment?


key = key.lower()
for field in self.FIELDS:
if field in key:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Check key directly from self.FIELDS



class WBSanitizer(SanitizePasswordsProcessor):
"""Asterisk out things that look like passwords, keys, etc."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make better use of parent class.

@cslzchen
Copy link
Contributor

cslzchen commented Jan 3, 2018

As mentioned, this PR is closed and continues in #313.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants