Adversarial malware generator (AMG) for attacking the GBDT classifier. This repository contains the source code for my final thesis, defended at CTU in Prague, and for a research paper published in JICV (https://doi.org/10.1007/s11416-024-00516-2).
To install all necessary Python libraries, use the provided Conda environment file amg-env.yml.
Place your binary samples into the gym_malware/envs/utils/samples folder. Create three subfolders there: train, test, and all (all samples combined). To switch between the validation and final testing phases, change the contents of the test folder.
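As a convenience, the expected folder layout can be created with a short script like the following (a minimal sketch; the path matches the folder described above, and it assumes you run it from the repository root):

```python
from pathlib import Path

# Expected sample layout: samples/train, samples/test, samples/all
base = Path("gym_malware/envs/utils/samples")
for split in ("train", "test", "all"):
    # parents=True creates missing intermediate folders;
    # exist_ok=True makes the script safe to re-run
    (base / split).mkdir(parents=True, exist_ok=True)
```

After running it, copy your binary samples into train and test, and place the combined set into all.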
For training, use the RAY_train_from_config.py file. For example, python RAY_train_from_config.py --agent=DQN --params=params_malware-ember-v0.json --stop-value=50 trains a DQN agent with the default configuration against GBDT (specified in the --params file) for 50 training iterations. All available parameters are listed in the help guide.
For optimizing hyperparameters, use the tune-hyperparams.py file. Usage: python tune-hyperparams.py --agent=DQN --train-env=malware-train-ember-v0 --stop-value=100. The hyperparameters to optimize are specified directly in the source code.
For testing, use the RAY_test_from_checkpoint.py file. Usage: python RAY_test_from_checkpoint.py --agent=DQN --params={PATH_TO_CONFIG} --checkpoint={PATH_TO_CHECKPOINT} --save-files=True.
For the config and checkpoint arguments, you can use the files from the BEST_AGENTS folder, which contains config files and checkpoints for each of the tested RL algorithms (DQN, PG, PPO).
Recommendation: by uncommenting line 165 in the RAY_test_from_checkpoint.py file, you can run experiments with a random AMG agent, i.e., an agent that chooses AMG file manipulations at random. This agent has been shown to be superior to the trained agent against an antivirus engine [1].
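For illustration, a random AMG agent amounts to a policy that ignores the observation and samples a manipulation uniformly from the discrete action space. The sketch below is hypothetical (class and method names are not from this repository) and only shows the idea:

```python
import random

class RandomAMGAgent:
    """Toy sketch of a random policy: picks one of n_actions file
    manipulations uniformly at random, ignoring the observation.
    (Names are illustrative, not taken from the repository code.)"""

    def __init__(self, n_actions, seed=None):
        self.n_actions = n_actions
        self.rng = random.Random(seed)  # seeded for reproducibility

    def compute_action(self, observation):
        # The observation is ignored; the choice is purely random
        return self.rng.randrange(self.n_actions)
```

With n_actions set to the number of implemented file manipulations, compute_action can stand in for a trained policy's action selection during testing.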