@@ -130,92 +130,77 @@ Let's start training on {{nnodes}} nodes with {{nproc_per_node}} gpus each:
 ## Configurations
 
 ``` sh
-usage: main.py [-h] [--use_amp] [--resume_from RESUME_FROM] [--seed SEED]
-               [--verbose] [--backend BACKEND]
-               [--nproc_per_node NPROC_PER_NODE] [--nnodes NNODES]
-               [--node_rank NODE_RANK] [--master_addr MASTER_ADDR]
-               [--master_port MASTER_PORT]
-               [--save_every_iters SAVE_EVERY_ITERS] [--n_saved N_SAVED]
-               [--log_every_iters LOG_EVERY_ITERS] [--with_pbars WITH_PBARS]
-               [--with_pbar_on_iters WITH_PBAR_ON_ITERS]
-               [--stop_on_nan STOP_ON_NAN]
-               [--clear_cuda_cache CLEAR_CUDA_CACHE]
-               [--with_gpu_stats WITH_GPU_STATS] [--patience PATIENCE]
-               [--limit_sec LIMIT_SEC] [--output_dir OUTPUT_DIR]
-               [--logger_log_every_iters LOGGER_LOG_EVERY_ITERS]
-               [--dataset {cifar10,lsun,imagenet,folder,lfw,fake,mnist}]
-               [--data_path DATA_PATH] [--batch_size BATCH_SIZE]
-               [--num_workers NUM_WORKERS] [--beta_1 BETA_1] [--lr LR]
-               [--max_epochs MAX_EPOCHS] [--z_dim Z_DIM]
-               [--g_filters G_FILTERS] [--d_filters D_FILTERS]
+usage: main.py [-h] [--use_amp] [--resume_from RESUME_FROM] [--seed SEED] [--verbose] [--backend BACKEND]
+               [--nproc_per_node NPROC_PER_NODE] [--node_rank NODE_RANK] [--nnodes NNODES]
+               [--master_addr MASTER_ADDR] [--master_port MASTER_PORT] [--epoch_length EPOCH_LENGTH]
+               [--save_every_iters SAVE_EVERY_ITERS] [--n_saved N_SAVED] [--log_every_iters LOG_EVERY_ITERS]
+               [--with_pbars WITH_PBARS] [--with_pbar_on_iters WITH_PBAR_ON_ITERS]
+               [--stop_on_nan STOP_ON_NAN] [--clear_cuda_cache CLEAR_CUDA_CACHE]
+               [--with_gpu_stats WITH_GPU_STATS] [--patience PATIENCE] [--limit_sec LIMIT_SEC]
+               [--output_dir OUTPUT_DIR] [--logger_log_every_iters LOGGER_LOG_EVERY_ITERS]
+               [--dataset {cifar10,lsun,imagenet,folder,lfw,fake,mnist}] [--data_path DATA_PATH]
+               [--batch_size BATCH_SIZE] [--num_workers NUM_WORKERS] [--beta_1 BETA_1] [--lr LR]
+               [--max_epochs MAX_EPOCHS] [--z_dim Z_DIM] [--g_filters G_FILTERS] [--d_filters D_FILTERS]
 
 optional arguments:
   -h, --help            show this help message and exit
-  --use_amp             use torch.cuda.amp for automatic mixed precision
+  --use_amp             use torch.cuda.amp for automatic mixed precision. Default: False
   --resume_from RESUME_FROM
-                        path to the checkpoint file to resume, can also url
-                        starting with https ( None)
-  --seed SEED           seed to use in ignite.utils.manual_seed () ( 666)
-  --verbose             use logging.INFO in ignite.utils.setup_logger
-  --backend BACKEND     backend to use for distributed training ( None)
+                        path to the checkpoint file to resume, can also url starting with https. Default:
+                        None
+  --seed SEED           seed to use in ignite.utils.manual_seed (). Default: 666
+  --verbose             use logging.INFO in ignite.utils.setup_logger. Default: False
+  --backend BACKEND     backend to use for distributed training. Default: None
   --nproc_per_node NPROC_PER_NODE
-                        number of processes to launch on each node, for GPU
-                        training this is recommended to be set to the number
-                        of GPUs in your system so that each process can be
-                        bound to a single GPU (None)
-  --nnodes NNODES       number of nodes to use for distributed training (None)
+                        number of processes to launch on each node, for GPU training this is recommended to
+                        be set to the number of GPUs in your system so that each process can be bound to a
+                        single GPU. Default: None
   --node_rank NODE_RANK
-                        rank of the node for multi-node distributed training
-                        ( None)
+                        rank of the node for multi-node distributed training. Default: None
+  --nnodes NNODES       number of nodes to use for distributed training. Default: None
   --master_addr MASTER_ADDR
-                        master node TCP/IP address for torch native backends
-                        (None)
+                        master node TCP/IP address for torch native backends. Default: None
   --master_port MASTER_PORT
-                        master node port for torch native backends (None)
+                        master node port for torch native backends. Default: None
+  --epoch_length EPOCH_LENGTH
+                        epoch_length of Engine.run (). Default: None
   --save_every_iters SAVE_EVERY_ITERS
-                        Saving iteration interval ( 1000)
-  --n_saved N_SAVED     number of best models to store (2)
+                        Saving iteration interval. Default: 1000
+  --n_saved N_SAVED     number of best models to store. Default: 2
   --log_every_iters LOG_EVERY_ITERS
-                        logging interval for iteration progress bar ( 100)
+                        logging interval for iteration progress bar. Default: 100
   --with_pbars WITH_PBARS
-                        show epoch-wise and iteration-wise progress bars
-                        (True)
+                        show epoch-wise and iteration-wise progress bars. Default: False
   --with_pbar_on_iters WITH_PBAR_ON_ITERS
-                        show iteration progress bar or not ( True)
+                        show iteration progress bar or not. Default: True
   --stop_on_nan STOP_ON_NAN
-                        stop the training if engine output contains NaN/inf
-                        values (True)
+                        stop the training if engine output contains NaN/inf values. Default: True
   --clear_cuda_cache CLEAR_CUDA_CACHE
-                        clear cuda cache every end of epoch ( True)
+                        clear cuda cache every end of epoch. Default: True
   --with_gpu_stats WITH_GPU_STATS
-                        show gpu information, requires pynvml (False)
-  --patience PATIENCE   number of events to wait if no improvement and then
-                        stop the training (None)
+                        show gpu information, requires pynvml. Default: False
+  --patience PATIENCE   number of events to wait if no improvement and then stop the training. Default: None
   --limit_sec LIMIT_SEC
-                        maximum time before training terminates in seconds
-                        (None)
+                        maximum time before training terminates in seconds. Default: None
   --output_dir OUTPUT_DIR
-                        directory to save all outputs ( ./logs)
+                        directory to save all outputs. Default: ./logs
   --logger_log_every_iters LOGGER_LOG_EVERY_ITERS
-                        logging interval for experiment tracking system ( 100)
+                        logging interval for experiment tracking system. Default: 100
   --dataset {cifar10,lsun,imagenet,folder,lfw,fake,mnist}
-                        dataset to use ( cifar10)
+                        dataset to use. Default: cifar10
   --data_path DATA_PATH
-                        datasets path (./)
+                        datasets path. Default: ./
   --batch_size BATCH_SIZE
-                        will be equally divided by number of GPUs if in
-                        distributed (4)
+                        will be equally divided by number of GPUs if in distributed. Default: 16
   --num_workers NUM_WORKERS
-                        num_workers for DataLoader (2)
-  --beta_1 BETA_1       beta_1 for Adam optimizer ( 0.5)
-  --lr LR               learning rate used by torch.optim.* ( 0.001)
+                        num_workers for DataLoader. Default: 2
+  --beta_1 BETA_1       beta_1 for Adam optimizer. Default: 0.5
+  --lr LR               learning rate used by torch.optim.* . Default: 0.001
   --max_epochs MAX_EPOCHS
-                        max_epochs of ignite.Engine.run () for training (2)
-  --z_dim Z_DIM         size of the latent z vector ( 100)
+                        max_epochs of ignite.Engine.run () for training. Default: 5
+  --z_dim Z_DIM         size of the latent z vector. Default: 100
   --g_filters G_FILTERS
-                        number of filters in the second-to-last generator
-                        deconv layer (64)
+                        number of filters in the second-to-last generator deconv layer. Default: 64
   --d_filters D_FILTERS
-                        number of filters in first discriminator conv layer
-                        (64)
+                        number of filters in first discriminator conv layer. Default: 64
 ```
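For orientation, here is a minimal launch sketch that uses only the flags documented in the help text above. It is not taken from this diff: the backend value (`nccl`), GPU/node counts, master address/port, and data path are illustrative placeholders you would replace with your own values.

``` sh
# Single node, 2 GPUs (placeholder count), torch-native backend assumed to be nccl
python main.py --backend=nccl --nproc_per_node=2 \
    --dataset=cifar10 --data_path=/path/to/data --batch_size=64

# Multi-node sketch: run the same command on every node, changing only --node_rank;
# the master address and port below are placeholders
python main.py --backend=nccl --nnodes=2 --node_rank=0 --nproc_per_node=2 \
    --master_addr=10.0.0.1 --master_port=2222 \
    --dataset=cifar10 --data_path=/path/to/data
```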