PURL Qualifiers are encoded twice #215

@mciccarone

Description

I have a script that extracts attributes from a DLL, such as the Author and Product name, and uses them to create a PURL that can later be decoded back to a UTF-8 string. I have identified a case where these attributes are encoded twice within the PackageURL.

Here are some values that reproduce the issue, using file attributes from DotNetNuke.DLL as an example.

The 'dll' contains the following methods:

import urllib.parse

def get_product_name():
    product_name = "https://dnncommunity.org" # This is the unencoded value
    return urllib.parse.quote(product_name, safe='') # This is the encoded value "https%3A%2F%2Fdnncommunity.org"

def get_author():
    author = ".NET Foundation" # This is the unencoded value
    return urllib.parse.quote(author, safe='') # This is the encoded value ".NET%20Foundation"

I need to combine the content in a forward slash ('/') separated format so that Nexus can understand it.
e.g. <author>/<product_name>

purlattrs = f'{dll.get_author()}%2F{dll.get_product_name()}'
print(purlattrs) # output = '.NET%20Foundation%2Fhttps%3A%2F%2Fdnncommunity.org'
# This is the correctly encoded URL safe string

from packageurl import PackageURL

_qualifiers = {'Attr1': purlattrs, 'Attr2': 'Foo'}
purl = PackageURL(type='generic', name="DotNetNuke.dll", version="9.11.0.46", qualifiers=_qualifiers)
print(purl)

The purl that is printed is
"pkg:generic/DotNetNuke.dll@9.11.0.46?Attr1=.NET%2520Foundation%252Fhttps%253A%252F%252Fdnncommunity.org&Attr2=Foo"

  • As you can see, the space characters are now encoded as %2520 instead of %20.
  • The forward slash is now %252F instead of %2F.
  • The colon is now %253A instead of %3A.
  • In each case, the % character of the already-encoded value is being encoded again to %25.
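
The double encoding can be reproduced with the standard library alone. This is a minimal sketch, assuming the library percent-encodes whatever qualifier value it receives; the second quote() call only stands in for that step and is not the library's actual code.

```python
from urllib.parse import quote

author = ".NET Foundation"
product_name = "https://dnncommunity.org"

# First pass: what the script does before building the qualifier value.
once = f"{quote(author, safe='')}%2F{quote(product_name, safe='')}"

# Second pass: stand-in (assumption) for the encoding applied when the
# purl is serialized. Every '%' from the first pass becomes '%25'.
twice = quote(once, safe='')

print(once)
print(twice)
```

The second printed value matches the Attr1 qualifier in the purl above.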

If I pass in the raw string value to the PackageURL like below:

purlattrs = ".NET Foundation/https://dnncommunity.org"
print(purlattrs) # output = ".NET Foundation/https://dnncommunity.org"

I get the following output from print() when I pass in the raw string value.
"pkg:generic/DotNetNuke.dll@9.11.0.46?Attr1=.NET%20Foundation/https://dnncommunity.org&Attr2=Foo"

  • In this scenario the Encoding works for the Space, but does not work for the Slashes or Colon.

  • Recommend changing the behavior of the PURL encoding to use urllib.parse.quote and urllib.parse.unquote, or eliminating the encoding portion and having the PackageURL user perform the encoding/decoding.
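
The second option, letting the caller own encoding and decoding, amounts to encoding each value exactly once and decoding it exactly once. A rough sketch with the standard library follows; encode_once and decode_once are hypothetical helper names used for illustration and are not part of packageurl.

```python
from urllib.parse import quote, unquote

def encode_once(raw: str) -> str:
    # Percent-encode everything outside the unreserved set, including '/' and ':'.
    return quote(raw, safe='')

def decode_once(encoded: str) -> str:
    # Reverse a single round of percent-encoding.
    return unquote(encoded)

raw = ".NET Foundation/https://dnncommunity.org"
encoded = encode_once(raw)
decoded = decode_once(encoded)

print(encoded)
print(decoded == raw)
```

Until the behavior changes, a possible workaround on the caller side is to run unquote() over any pre-encoded value before passing it as a qualifier, so that the library only ever sees the raw string.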
