Question on The Real-Repo's Version / Commit for the Vulnerable File #1

@yangamazon

Thanks for open-sourcing this security benchmarking dataset! I have a few questions about the dataset's details, as I would like to include your benchmark in my ongoing research project.

(1) Which versions (commits) of the real-world repos do the vulnerable files come from? Under the cases folder, I can see that each data point contains only the "input" files that were annotated as covering the vulnerability. Since these open-source GitHub repositories are continuously updated, could you share the commit/version of each code repository so I can obtain the complete source code?

(2) Meanwhile, I saw that you have a metadata file named data/vader.csv, which contains a Repository column. This column mixes GitHub repo URLs (e.g. https://github.com/heli-toon/LBSHS-LMS) with links to specific files (e.g. https://github.com/kishanrajput23/Jarvis-Desktop-Voice-Assistant/blob/main/Jarvis/jarvis.py). Could you clean it up?

(3) What does the "before_and_after" column stand for in the file data/vader_languages_before_after.csv? Also, the "Case" column in this file starts at 2 and does not match your cases folder.

(4) There seems to be a mismatch in the data within the cases folder. Could you please verify and validate all the shared files?

  • For instance, case1's input and patch can be mapped to [test_plugin.py], which stands for case_2 in data/vader.csv. However, case_1_tests.txt is relevant to the Library Management System. The description shown with the link reads:
    "The recursive input validation functions validd() and valid() in the code call themselves repeatedly whenever the user inputs an invalid book code. This unchecked recursion can cause the call stack to grow indefinitely if the user keeps entering invalid input, eventually exhausting the stack memory and causing a stack overflow error or a RecursionError in Python."
  • Similarly, the inputs for case_2 are mapped to case_3 in data/vader.csv, and the test.txt does not account for the SQL injection vulnerability.
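
As a rough illustration of the cleanup asked for in (2), the sketch below splits a Repository value into a repository URL plus an optional ref and file path. The function name split_repository_url is hypothetical and not part of the dataset or its tooling. It assumes GitHub's usual /owner/repo/blob/ref/path URL layout and a ref without slashes, so a branch such as feature/x would be split incorrectly.

```python
from urllib.parse import urlparse


def split_repository_url(url):
    """Split a GitHub URL into (repo_url, ref, file_path).

    Hypothetical helper for illustration only. ref and file_path are None
    when the URL points at the repository root rather than a single file.
    """
    parsed = urlparse(url)
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"not a repository URL: {url}")

    owner, repo = parts[0], parts[1]
    repo_url = f"{parsed.scheme}://{parsed.netloc}/{owner}/{repo}"

    # File links look like /owner/repo/blob/<ref>/<path...>.
    # Assumption: <ref> is a single path segment (no slashes in branch names).
    if len(parts) >= 5 and parts[2] == "blob":
        return repo_url, parts[3], "/".join(parts[4:])
    return repo_url, None, None


if __name__ == "__main__":
    for example in (
        "https://github.com/heli-toon/LBSHS-LMS",
        "https://github.com/kishanrajput23/Jarvis-Desktop-Voice-Assistant/blob/main/Jarvis/jarvis.py",
    ):
        print(split_repository_url(example))
```

Note that a ref like main is a moving branch, so this normalization alone would not answer question (1); a pinned commit hash per case would still be needed.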
