
Commit 1c112fc

trsvchn, ydcjeff, and vfdev-5 authored

Add transformer template: text classification task (#19)

* Add transformer minimal template
* Refactor template structure to fit the latest app updates
* Yet another refactoring
* Refactoring
* Finalize
* Fixing tests
* Reduce batch size
* Fix trainer run
* Adjust tests run for text_classifier
* Increase batch size
* Add eval_epoch_length param
* Apply suggestions from code review
  Co-authored-by: Jeff Yang <32727188+ydcjeff@users.noreply.github.com>
  Co-authored-by: vfdev <vfdev.5@gmail.com>
* Move params from toml to py
* Fix linting issues
* Fix indentations
* Improve params label
* Use default specifier to indicate default values
* Indicate default values
* Use validate_every, fix eval metrics logging
* Finalizing
* Reduce batch size to 4
* isort and black
* Update templates/text_classification/main.py

Co-authored-by: Jeff Yang <32727188+ydcjeff@users.noreply.github.com>
Co-authored-by: vfdev <vfdev.5@gmail.com>

1 parent e4c654b · commit 1c112fc

File tree: 12 files changed, +1222 −0 lines changed


app/streamlit_app.py — 1 addition & 0 deletions

```diff
@@ -12,6 +12,7 @@
 
 FOLDER_TO_TEMPLATE_NAME = {
     "Image Classification": "image_classification",
+    "Text Classification": "text_classification",
     "Generative Adversarial Network": "gan",
     "Single Model, Single Optimizer": "single",
 }
```
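Not part of the commit diff — a minimal sketch of how the selected sidebar entry could be resolved through `FOLDER_TO_TEMPLATE_NAME` to a template folder under `./templates` and its sidebar configs loaded. The importlib-based loader, the `_sidebar` module path, and the helper name are assumptions for illustration, not the app's actual loading code:

```python
# Illustrative only: resolving a display name to a template folder and
# loading that template's sidebar configs (hypothetical helper).
import importlib
import sys

sys.path.append("./templates")

FOLDER_TO_TEMPLATE_NAME = {
    "Image Classification": "image_classification",
    "Text Classification": "text_classification",
    "Generative Adversarial Network": "gan",
    "Single Model, Single Optimizer": "single",
}


def load_template_configs(display_name: str) -> dict:
    folder = FOLDER_TO_TEMPLATE_NAME[display_name]
    # Assumes each template ships a `_sidebar.py` exposing `get_configs()`.
    sidebar = importlib.import_module(f"{folder}._sidebar")
    return sidebar.get_configs()


# e.g. config = load_template_configs("Text Classification")
```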
New file (text classification template README) — 226 additions & 0 deletions
[![Code-Generator](https://badgen.net/badge/Template%20by/Code-Generator/ee4c2c?labelColor=eaa700)](https://github.com/pytorch-ignite/code-generator)

# Text Classification Template

This template is ported from [Transformers Example with PyTorch-Ignite example](https://github.com/pytorch/ignite/tree/master/examples/contrib/transformers).

<details>
<summary>
Table of Contents
</summary>

- [Getting Started](#getting-started)
- [Training](#training)
- [Configurations](#configurations)

</details>

## Getting Started

<details>
<summary>
Detailed Directory List
</summary>

```bash
text_classification
├── README.md
├── config.py
├── dataset.py
├── main.py
├── models.py
├── requirements.txt
├── test_all.py
├── trainers.py
└── utils.py
```

</details>

- Install the dependencies with `pip`:

```sh
pip install -r requirements.txt --progress-bar off -U
```

> **💡 TIP**
>
> To quickly adapt to the generated code structure, there are TODOs in the files that need to be edited.
> [PyCharm TODO comments](https://www.jetbrains.com/help/pycharm/using-todo.html) or
> [VSCode Todo Tree](https://marketplace.visualstudio.com/items?itemName=Gruntfuggly.todo-tree)
> can help you find them easily.

## Training

{% if not use_distributed_training %}

### Single Node, Single GPU

```bash
python main.py --verbose
```

{% else %}

{% if nnodes < 2 %}

### Single Node, Multiple GPUs

{% if use_distributed_launcher %}

- Using `torch.distributed.launch` (preferred)

```bash
python -m torch.distributed.launch \
  --nproc_per_node={{nproc_per_node}} \
  --use_env main.py \
  --backend="nccl" \
  --verbose
```

{% else %}

- Using function spawn inside the code (see the sketch after this block)

```bash
python main.py \
  --backend="nccl" \
  --nproc_per_node={{nproc_per_node}} \
  --verbose
```
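Not part of the committed README — a minimal sketch of what spawn-based launching typically looks like with PyTorch-Ignite's `ignite.distributed.Parallel` helper. The `training` function and the `config` dict are placeholders, not the template's actual `main.py`:

```python
# Sketch only: internal process spawning with ignite.distributed.Parallel.
# When a backend and nproc_per_node are given without torch.distributed.launch,
# Parallel spawns one process per GPU and calls `training` in each of them.
import ignite.distributed as idist


def training(local_rank: int, config: dict) -> None:
    # placeholder for the per-process training logic
    print(f"rank {idist.get_rank()} / world size {idist.get_world_size()}")


if __name__ == "__main__":
    config = {"batch_size": 4}  # hypothetical config
    with idist.Parallel(backend="nccl", nproc_per_node=2) as parallel:
        parallel.run(training, config)
```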

{% endif %}

{% else %}

### Multiple Nodes, Multiple GPUs

Let's start training on {{nnodes}} nodes with {{nproc_per_node}} gpus each:

- Execute on master node

```bash
python -m torch.distributed.launch \
  --nnodes={{nnodes}} \
  --nproc_per_node={{nproc_per_node}} \
  --node_rank=0 \
  --master_addr={{master_addr}} \
  --master_port={{master_port}} \
  --use_env main.py \
  --backend="nccl" \
  --verbose
```

- Execute on worker nodes

```bash
python -m torch.distributed.launch \
  --nnodes={{nnodes}} \
  --nproc_per_node={{nproc_per_node}} \
  --node_rank=<node_rank> \
  --master_addr={{master_addr}} \
  --master_port={{master_port}} \
  --use_env main.py \
  --backend="nccl" \
  --verbose
```

{% endif %}
{% endif %}

## Configurations

```bash
usage: main.py [-h] [--use_amp] [--resume_from RESUME_FROM] [--seed SEED] [--verbose] [--backend BACKEND]
               [--nproc_per_node NPROC_PER_NODE] [--node_rank NODE_RANK] [--nnodes NNODES]
               [--master_addr MASTER_ADDR] [--master_port MASTER_PORT] [--epoch_length EPOCH_LENGTH]
               [--save_every_iters SAVE_EVERY_ITERS] [--n_saved N_SAVED] [--log_every_iters LOG_EVERY_ITERS]
               [--with_pbars WITH_PBARS] [--with_pbar_on_iters WITH_PBAR_ON_ITERS]
               [--stop_on_nan STOP_ON_NAN] [--clear_cuda_cache CLEAR_CUDA_CACHE]
               [--with_gpu_stats WITH_GPU_STATS] [--patience PATIENCE] [--limit_sec LIMIT_SEC]
               [--output_dir OUTPUT_DIR] [--logger_log_every_iters LOGGER_LOG_EVERY_ITERS]
               [--data_dir DATA_DIR] [--model {bert-base-uncased}] [--model_dir MODEL_DIR]
               [--tokenizer_dir TOKENIZER_DIR] [--num_classes NUM_CLASSES] [--dropout DROPOUT] [--n_fc N_FC]
               [--max_length MAX_LENGTH] [--batch_size BATCH_SIZE] [--weight_decay WEIGHT_DECAY]
               [--num_workers NUM_WORKERS] [--max_epochs MAX_EPOCHS] [--learning_rate LEARNING_RATE]
               [--num_warmup_epochs NUM_WARMUP_EPOCHS] [--validate_every VALIDATE_EVERY]
               [--checkpoint_every CHECKPOINT_EVERY] [--eval_epoch_length EVAL_EPOCH_LENGTH]

optional arguments:
  -h, --help            show this help message and exit
  --use_amp             use torch.cuda.amp for automatic mixed precision. Default: False
  --resume_from RESUME_FROM
                        path to the checkpoint file to resume, can also url starting with https. Default:
                        None
  --seed SEED           seed to use in ignite.utils.manual_seed(). Default: 666
  --verbose             use logging.INFO in ignite.utils.setup_logger. Default: False
  --backend BACKEND     backend to use for distributed training. Default: None
  --nproc_per_node NPROC_PER_NODE
                        number of processes to launch on each node, for GPU training this is recommended to
                        be set to the number of GPUs in your system so that each process can be bound to a
                        single GPU. Default: None
  --node_rank NODE_RANK
                        rank of the node for multi-node distributed training. Default: None
  --nnodes NNODES       number of nodes to use for distributed training. Default: None
  --master_addr MASTER_ADDR
                        master node TCP/IP address for torch native backends. Default: None
  --master_port MASTER_PORT
                        master node port for torch native backends. Default: None
  --train_epoch_length EPOCH_LENGTH
                        epoch_length of Engine.run() for training. Default: None
  --eval_epoch_length EVAL_EPOCH_LENGTH
                        epoch_length of Engine.run() for evaluation. Default: None
  --save_every_iters SAVE_EVERY_ITERS
                        Saving iteration interval. Default: 1000
  --n_saved N_SAVED     number of best models to store. Default: 2
  --log_every_iters LOG_EVERY_ITERS
                        Argument to log batch loss every log_every_iters iterations. 0 to disable it.
                        Default: 100
  --with_pbars WITH_PBARS
                        show epoch-wise and iteration-wise progress bars. Default: False
  --with_pbar_on_iters WITH_PBAR_ON_ITERS
                        show iteration progress bar or not. Default: True
  --stop_on_nan STOP_ON_NAN
                        stop the training if engine output contains NaN/inf values. Default: True
  --clear_cuda_cache CLEAR_CUDA_CACHE
                        clear cuda cache every end of epoch. Default: True
  --with_gpu_stats WITH_GPU_STATS
                        show gpu information, requires pynvml. Default: False
  --patience PATIENCE   number of events to wait if no improvement and then stop the training. Default: None
  --limit_sec LIMIT_SEC
                        maximum time before training terminates in seconds. Default: None
  --output_dir OUTPUT_DIR
                        directory to save all outputs. Default: ./logs
  --logger_log_every_iters LOGGER_LOG_EVERY_ITERS
                        logging interval for experiment tracking system. Default: 100
  --data_dir DATA_DIR   Dataset cache directory. Default: ./
  --model {bert-base-uncased}
                        Model name (from transformers) to setup model, tokenize and config to train.
                        Default: bert-base-uncased
  --model_dir MODEL_DIR
                        Cache directory to download the pretrained model. Default: ./
  --tokenizer_dir TOKENIZER_DIR
                        Tokenizer cache directory. Default: ./tokenizer
  --num_classes NUM_CLASSES
                        Number of target classes. Default: 1
  --dropout DROPOUT     Dropout probability. Default: 0.3
  --n_fc N_FC           Number of neurons in the last fully connected layer. Default: 768
  --max_length MAX_LENGTH
                        Maximum number of tokens for the inputs to the transformer model. Default: 256
  --batch_size BATCH_SIZE
                        Total batch size. Default: 16
  --weight_decay WEIGHT_DECAY
                        Weight decay. Default: 0.01
  --num_workers NUM_WORKERS
                        Number of workers in the data loader. Default: 2
  --max_epochs MAX_EPOCHS
                        Number of epochs to train the model. Default: 3
  --learning_rate LEARNING_RATE
                        Peak of piecewise linear learning rate scheduler. Default: 5e-05
  --num_warmup_epochs NUM_WARMUP_EPOCHS
                        Number of warm-up epochs before learning rate decay. Default: 0
  --validate_every VALIDATE_EVERY
                        Run model's validation every validate_every epochs. Default: 1
  --checkpoint_every CHECKPOINT_EVERY
                        Store training checkpoint every checkpoint_every iterations. Default: 1000
```
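Not part of the committed README — a minimal argparse sketch showing how a few of the options and defaults listed above could be declared. It mirrors only options that appear in the help text and is not the template's actual `config.py`:

```python
# Illustrative sketch: declaring a subset of the options above with argparse.
# Defaults follow the README's help output; this is not the template's config.py.
from argparse import ArgumentParser

parser = ArgumentParser("main.py")
parser.add_argument("--model", default="bert-base-uncased", choices=["bert-base-uncased"],
                    help="Model name (from transformers) to setup model, tokenize and config to train")
parser.add_argument("--num_classes", type=int, default=1, help="Number of target classes")
parser.add_argument("--max_length", type=int, default=256,
                    help="Maximum number of tokens for the inputs to the transformer model")
parser.add_argument("--batch_size", type=int, default=16, help="Total batch size")
parser.add_argument("--max_epochs", type=int, default=3, help="Number of epochs to train the model")
parser.add_argument("--learning_rate", type=float, default=5e-05,
                    help="Peak of piecewise linear learning rate scheduler")
parser.add_argument("--validate_every", type=int, default=1,
                    help="Run model's validation every validate_every epochs")

# e.g. `python main.py --batch_size 4 --max_epochs 1` overrides the defaults above
config = parser.parse_args()
print(config.model, config.batch_size)
```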
New file (text classification template sidebar configuration for the Streamlit app) — 80 additions & 0 deletions
```python
import sys

import streamlit as st

sys.path.append("./templates")

from _base._sidebar import (
    default_none_options,
    distributed_options,
    ignite_handlers_options,
    ignite_loggers_options,
)


def get_configs() -> dict:
    config = {}
    config["train_epoch_length"] = None
    config["eval_epoch_length"] = None
    default_none_options(config)

    st.header("Transformer")

    st.subheader("Model Options")
    config["model"] = st.selectbox(
        "Model name (from transformers) to setup model, tokenize and config to train (model)",
        options=["bert-base-uncased"],
    )
    config["model_dir"] = st.text_input("Cache directory to download the pretrained model (model_dir)", value="./")
    config["tokenizer_dir"] = st.text_input("Tokenizer cache directory (tokenizer_dir)", value="./tokenizer")
    config["num_classes"] = st.number_input(
        "Number of target classes. Default, 1 (binary classification) (num_classes)", min_value=0, value=1
    )
    config["max_length"] = st.number_input(
        "Maximum number of tokens for the inputs to the transformer model (max_length)", min_value=1, value=256
    )
    config["dropout"] = st.number_input(
        "Dropout probability (dropout)", min_value=0.0, max_value=1.0, value=0.3, format="%f"
    )
    config["n_fc"] = st.number_input(
        "Number of neurons in the last fully connected layer (n_fc)", min_value=1, value=768
    )
    st.markdown("---")

    st.subheader("Dataset Options")
    config["data_dir"] = st.text_input("Dataset cache directory (data_dir)", value="./")
    st.markdown("---")

    st.subheader("DataLoader Options")
    config["batch_size"] = st.number_input("Total batch size (batch_size)", min_value=1, value=4)
    config["num_workers"] = st.number_input("Number of workers in the data loader (num_workers)", min_value=1, value=2)
    st.markdown("---")

    st.subheader("Optimizer Options")
    config["learning_rate"] = st.number_input(
        "Peak of piecewise linear learning rate scheduler", min_value=0.0, value=5e-5, format="%e"
    )
    config["weight_decay"] = st.number_input("Weight decay", min_value=0.0, value=0.01, format="%f")
    st.markdown("---")

    st.subheader("Training Options")
    config["max_epochs"] = st.number_input("Number of epochs to train the model", min_value=1, value=3)
    config["num_warmup_epochs"] = st.number_input(
        "Number of warm-up epochs before learning rate decay", min_value=0, value=0
    )
    config["validate_every"] = st.number_input(
        "Run model's validation every validate_every epochs", min_value=0, value=1
    )
    config["checkpoint_every"] = st.number_input(
        "Store training checkpoint every checkpoint_every iterations", min_value=0, value=1000
    )
    config["log_every_iters"] = st.number_input(
        "Argument to log batch loss every log_every_iters iterations. 0 to disable it", min_value=0, value=15
    )
    st.markdown("---")

    distributed_options(config)
    ignite_handlers_options(config)
    ignite_loggers_options(config)

    return config
```
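Not part of the commit — a short usage sketch: the returned dict carries the same parameter names as the CLI options documented in the README above, so the values picked in the sidebar feed the generated template. The call below is hypothetical and only illustrates the shape of the result:

```python
# Hypothetical usage inside a Streamlit run: get_configs() renders the sidebar
# widgets and returns their current values as a flat dict of template parameters.
config = get_configs()
print(config["model"])       # "bert-base-uncased" by default
print(config["batch_size"])  # 4 by default (min_value=1 enforced by st.number_input)
```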
