[code-generator](https://github.com/pytorch-ignite/code-generator)

# Text Classification Template

This template is ported from the [Transformers example with PyTorch-Ignite](https://github.com/pytorch/ignite/tree/master/examples/contrib/transformers).

<details>
<summary>
Table of Contents
</summary>

- [Getting Started](#getting-started)
- [Training](#training)
- [Configurations](#configurations)

</details>

## Getting Started

<details>
<summary>
Detailed Directory List
</summary>

```bash
text_classification
├── README.md
├── config.py
├── dataset.py
├── main.py
├── models.py
├── requirements.txt
├── test_all.py
├── trainers.py
└── utils.py
```

</details>

- Install the dependencies with `pip`:

  ```sh
  pip install -r requirements.txt --progress-bar off -U
  ```
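
If the install succeeded, the core libraries should import cleanly. The snippet below is only an optional sanity check; the package set is an assumption based on the ported Transformers example, not a copy of `requirements.txt`.

```python
# Optional sanity check (assumed core dependencies; see requirements.txt for the full pinned list).
import torch
import ignite
import transformers

print("torch", torch.__version__)
print("pytorch-ignite", ignite.__version__)
print("transformers", transformers.__version__)
```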

> **💡 TIP**
>
> To quickly adapt to the generated code structure, the files contain TODO comments at the places that need to be edited.
> [PyCharm TODO comments](https://www.jetbrains.com/help/pycharm/using-todo.html) or
> [VSCode Todo Tree](https://marketplace.visualstudio.com/items?itemName=Gruntfuggly.todo-tree)
> can help you find them easily.

## Training

{% if not use_distributed_training %}

### Single Node, Single GPU

```bash
python main.py --verbose
```
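
For reference, the entry point of such a template is typically built on `ignite.distributed`: when no `--backend` is given, `idist.Parallel(backend=None)` simply runs the training function in the current process, which is the single-GPU case above. This is a minimal sketch with a placeholder `training` function, not the generated `main.py`.

```python
import ignite.distributed as idist

def training(local_rank, config):
    # Placeholder body: the generated code builds dataloaders, model, optimizer
    # and the Ignite engines here, then calls trainer.run(...).
    print(f"local rank {local_rank}, backend {idist.backend()}")

if __name__ == "__main__":
    config = {"max_epochs": 3, "batch_size": 16}  # normally parsed from the CLI options below
    # backend=None -> no distributed context, run in this single process
    with idist.Parallel(backend=None) as parallel:
        parallel.run(training, config)
```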

{% else %}

{% if nnodes < 2 %}

### Single Node, Multiple GPUs

{% if use_distributed_launcher %}

- Using `torch.distributed.launch` (preferred)

  ```bash
  python -m torch.distributed.launch \
    --nproc_per_node={{nproc_per_node}} \
    --use_env main.py \
    --backend="nccl" \
    --verbose
  ```

{% else %}

- Spawning the processes from inside the code (see the sketch below)

  ```bash
  python main.py \
    --backend="nccl" \
    --nproc_per_node={{nproc_per_node}} \
    --verbose
  ```
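
Under the hood, this spawn mode typically relies on `ignite.distributed`: when `main.py` hands a backend and `nproc_per_node` to `idist.Parallel`, it spawns one worker process per GPU and calls the training function once per rank. A minimal sketch with a placeholder `training` function (not the generated `main.py`):

```python
import ignite.distributed as idist

def training(local_rank, config):
    # Placeholder body: the generated code builds dataloaders, model and engines here.
    print(f"rank {idist.get_rank()} of {idist.get_world_size()}")

if __name__ == "__main__":
    config = {"max_epochs": 3, "batch_size": 16}
    # With a backend and nproc_per_node given, Parallel spawns the worker processes itself.
    with idist.Parallel(backend="nccl", nproc_per_node={{nproc_per_node}}) as parallel:
        parallel.run(training, config)
```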

 {% endif %}

{% else %}

### Multiple Nodes, Multiple GPUs

Let's start training on {{nnodes}} nodes with {{nproc_per_node}} GPUs each:

- Execute on the master node

  ```bash
  python -m torch.distributed.launch \
    --nnodes={{nnodes}} \
    --nproc_per_node={{nproc_per_node}} \
    --node_rank=0 \
    --master_addr={{master_addr}} \
    --master_port={{master_port}} \
    --use_env main.py \
    --backend="nccl" \
    --verbose
  ```

- Execute on the worker nodes

  ```bash
  python -m torch.distributed.launch \
    --nnodes={{nnodes}} \
    --nproc_per_node={{nproc_per_node}} \
    --node_rank=<node_rank> \
    --master_addr={{master_addr}} \
    --master_port={{master_port}} \
    --use_env main.py \
    --backend="nccl" \
    --verbose
  ```

 {% endif %}
 {% endif %}

## Configurations

```bash
usage: main.py [-h] [--use_amp] [--resume_from RESUME_FROM] [--seed SEED] [--verbose] [--backend BACKEND]
               [--nproc_per_node NPROC_PER_NODE] [--node_rank NODE_RANK] [--nnodes NNODES]
               [--master_addr MASTER_ADDR] [--master_port MASTER_PORT] [--epoch_length EPOCH_LENGTH]
               [--save_every_iters SAVE_EVERY_ITERS] [--n_saved N_SAVED] [--log_every_iters LOG_EVERY_ITERS]
               [--with_pbars WITH_PBARS] [--with_pbar_on_iters WITH_PBAR_ON_ITERS]
               [--stop_on_nan STOP_ON_NAN] [--clear_cuda_cache CLEAR_CUDA_CACHE]
               [--with_gpu_stats WITH_GPU_STATS] [--patience PATIENCE] [--limit_sec LIMIT_SEC]
               [--output_dir OUTPUT_DIR] [--logger_log_every_iters LOGGER_LOG_EVERY_ITERS]
               [--data_dir DATA_DIR] [--model {bert-base-uncased}] [--model_dir MODEL_DIR]
               [--tokenizer_dir TOKENIZER_DIR] [--num_classes NUM_CLASSES] [--dropout DROPOUT] [--n_fc N_FC]
               [--max_length MAX_LENGTH] [--batch_size BATCH_SIZE] [--weight_decay WEIGHT_DECAY]
               [--num_workers NUM_WORKERS] [--max_epochs MAX_EPOCHS] [--learning_rate LEARNING_RATE]
               [--num_warmup_epochs NUM_WARMUP_EPOCHS] [--validate_every VALIDATE_EVERY]
               [--checkpoint_every CHECKPOINT_EVERY] [--eval_epoch_length EVAL_EPOCH_LENGTH]

optional arguments:
  -h, --help            show this help message and exit
  --use_amp             use torch.cuda.amp for automatic mixed precision. Default: False
  --resume_from RESUME_FROM
                        path to the checkpoint file to resume, can also be a URL starting with https.
                        Default: None
  --seed SEED           seed to use in ignite.utils.manual_seed(). Default: 666
  --verbose             use logging.INFO in ignite.utils.setup_logger. Default: False
  --backend BACKEND     backend to use for distributed training. Default: None
  --nproc_per_node NPROC_PER_NODE
                        number of processes to launch on each node, for GPU training this is recommended to
                        be set to the number of GPUs in your system so that each process can be bound to a
                        single GPU. Default: None
  --node_rank NODE_RANK
                        rank of the node for multi-node distributed training. Default: None
  --nnodes NNODES       number of nodes to use for distributed training. Default: None
  --master_addr MASTER_ADDR
                        master node TCP/IP address for torch native backends. Default: None
  --master_port MASTER_PORT
                        master node port for torch native backends. Default: None
  --train_epoch_length EPOCH_LENGTH
                        epoch_length of Engine.run() for training. Default: None
  --eval_epoch_length EVAL_EPOCH_LENGTH
                        epoch_length of Engine.run() for evaluation. Default: None
  --save_every_iters SAVE_EVERY_ITERS
                        Saving iteration interval. Default: 1000
  --n_saved N_SAVED     number of best models to store. Default: 2
  --log_every_iters LOG_EVERY_ITERS
                        Argument to log batch loss every log_every_iters iterations. 0 to disable it.
                        Default: 100
  --with_pbars WITH_PBARS
                        show epoch-wise and iteration-wise progress bars. Default: False
  --with_pbar_on_iters WITH_PBAR_ON_ITERS
                        show iteration progress bar or not. Default: True
  --stop_on_nan STOP_ON_NAN
                        stop the training if engine output contains NaN/inf values. Default: True
  --clear_cuda_cache CLEAR_CUDA_CACHE
                        clear the cuda cache at the end of every epoch. Default: True
  --with_gpu_stats WITH_GPU_STATS
                        show gpu information, requires pynvml. Default: False
  --patience PATIENCE   number of events to wait if no improvement and then stop the training. Default: None
  --limit_sec LIMIT_SEC
                        maximum time before training terminates in seconds. Default: None
  --output_dir OUTPUT_DIR
                        directory to save all outputs. Default: ./logs
  --logger_log_every_iters LOGGER_LOG_EVERY_ITERS
                        logging interval for experiment tracking system. Default: 100
  --data_dir DATA_DIR   Dataset cache directory. Default: ./
  --model {bert-base-uncased}
                        Model name (from transformers) to set up model, tokenizer and config to train.
                        Default: bert-base-uncased
  --model_dir MODEL_DIR
                        Cache directory to download the pretrained model. Default: ./
  --tokenizer_dir TOKENIZER_DIR
                        Tokenizer cache directory. Default: ./tokenizer
  --num_classes NUM_CLASSES
                        Number of target classes. Default: 1
  --dropout DROPOUT     Dropout probability. Default: 0.3
  --n_fc N_FC           Number of neurons in the last fully connected layer. Default: 768
  --max_length MAX_LENGTH
                        Maximum number of tokens for the inputs to the transformer model. Default: 256
  --batch_size BATCH_SIZE
                        Total batch size. Default: 16
  --weight_decay WEIGHT_DECAY
                        Weight decay. Default: 0.01
  --num_workers NUM_WORKERS
                        Number of workers in the data loader. Default: 2
  --max_epochs MAX_EPOCHS
                        Number of epochs to train the model. Default: 3
  --learning_rate LEARNING_RATE
                        Peak of piecewise linear learning rate scheduler. Default: 5e-05
  --num_warmup_epochs NUM_WARMUP_EPOCHS
                        Number of warm-up epochs before learning rate decay. Default: 0
  --validate_every VALIDATE_EVERY
                        Run model's validation every validate_every epochs. Default: 1
  --checkpoint_every CHECKPOINT_EVERY
                        Store training checkpoint every checkpoint_every iterations. Default: 1000
```
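
Several of these flags correspond to standard Ignite handlers. The sketch below shows roughly how `--stop_on_nan`, `--limit_sec` and `--patience` are typically wired with `ignite.handlers`; the `trainer`/`evaluator` engines, the `config` namespace and the metric name are placeholders, not the generated code.

```python
from ignite.engine import Events
from ignite.handlers import EarlyStopping, TerminateOnNan, TimeLimit

def attach_stopping_handlers(trainer, evaluator, config):
    # --stop_on_nan: abort if the trainer's output contains NaN/inf values.
    if config.stop_on_nan:
        trainer.add_event_handler(Events.ITERATION_COMPLETED, TerminateOnNan())

    # --limit_sec: stop training once the wall-clock budget is exceeded.
    if config.limit_sec is not None:
        trainer.add_event_handler(Events.ITERATION_COMPLETED, TimeLimit(limit_sec=config.limit_sec))

    # --patience: early-stop when the validation score stops improving.
    if config.patience is not None:
        def score_fn(engine):
            return engine.state.metrics["accuracy"]  # placeholder metric name

        evaluator.add_event_handler(
            Events.COMPLETED,
            EarlyStopping(patience=config.patience, score_function=score_fn, trainer=trainer),
        )
```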