
Cannot reproduce training/prediction on offline datasets #1

@fbenites

Description


Running the Docker image from https://github.com/mortal123/autonlp_starting_kit.

Log:
sudo docker run -it -v "$(pwd):/app/codalab" wahaha909/autonlp:gpu


[ASCII art startup banner]

WARNING: You are running this container as root, which can cause new files in
mounted volumes to be created as the root user on your host machine.

To avoid this, run the container by specifying your user's userid:

$ docker run -u $(id -u):$(id -g) args...

root@4c0b0a0d44bb:/app/codalab# python run_local_test.py --dataset_dir=data/offline_data/O2/ -code_dir=scripts/DeepBlueAI/AutoNLP/
2019-12-14 21:57:25,443 INFO ingestion.py: ===== Start ingestion program.
2019-12-14 21:57:26,133 INFO ingestion.py: Time budget: 2400
2019-12-14 21:57:26,133 INFO ingestion.py: ************************************************
2019-12-14 21:57:26,134 INFO ingestion.py: ******** Processing dataset O2 ********
2019-12-14 21:57:26,134 INFO ingestion.py: ************************************************
2019-12-14 21:57:26,134 INFO ingestion.py: Reading training set and test set...
2019-12-14 21:57:26,204 INFO score.py: Detected the start of ingestion after 0 seconds. Start scoring.
read zh embedding time: 87.15087389945984s.
read en embedding time: 172.18107748031616s.
Using TensorFlow backend.
WARNING: Logging before flag parsing goes to stderr.
W1214 22:00:19.855660 140028982310720 deprecation_wrapper.py:119] From scripts/DeepBlueAI/AutoNLP/model.py:38: The name tf.ConfigProto is deprecated. Please use tf.compat.v1.ConfigProto instead.

W1214 22:00:19.855889 140028982310720 deprecation_wrapper.py:119] From scripts/DeepBlueAI/AutoNLP/model.py:43: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2019-12-14 22:00:19.879044: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-12-14 22:00:19.899515: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Could not dlopen library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory; LD_LIBRARY_PATH: /usr/local/cuda/extras/CUPTI/lib64:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
2019-12-14 22:00:19.899541: E tensorflow/stream_executor/cuda/cuda_driver.cc:318] failed call to cuInit: UNKNOWN ERROR (303)
2019-12-14 22:00:19.899565: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:163] no NVIDIA GPU device is present: /dev/nvidia0 does not exist
2019-12-14 22:00:19.942985: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 4200000000 Hz
2019-12-14 22:00:19.944683: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x103b3b360 executing computations on platform Host. Devices:
2019-12-14 22:00:19.944735: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): <undefined>, <undefined>
2019-12-14 22:00:19,950 INFO ingestion.py: [12-14 22:00:19] Importing_model success, time spent so far 173.33044409751892 sec
2019-12-14 22:00:19,950 INFO ingestion.py: Creating model...
Init Model************
2019-12-14 22:00:19,951 INFO ingestion.py: [12-14 22:00:19] Initialization success, time spent so far 173.33051562309265 sec
2019-12-14 22:00:19,951 INFO ingestion.py: Begin training the model...

--- remaining_time_budget
2400
current len mean 1844 constraint 6000
len mean 1844 FIRST CUT 1200 need cut.
sample_row= int(-90.8*len_mean_for_compute + 128960), len_mean_for_compute=1200, sample_row=20000
len mean 1200


Num of Data 11314 Sample Num: 20000
Text Length: 1844 Cut: 1200
Is Sample: 1 Is Cut: 1
Language: EN
Class Num: 20
Postive-Negtive Samples Portion: [600. 595. 584. 594. 599. 593. 590. 598. 593. 564. 377. 546. 585. 480.
591. 578. 594. 465. 597. 591.]


Running in Sample Data Stage
TRAIN EPOCH: 1
When enter the system firstly, generate data for training: 1
SAMPLE_POS_NEG: [600.0 595.0 584.0 594.0 599.0 593.0 590.0 598.0 593.0 564.0 377.0 546.0
585.0 480.0 591.0 578.0 594.0 465.0 597.0 591.0]
#################Sample From 11314 To 11314 ######################
*****************************DataNum: 11314
*****************************DataLen: 172.36697896411525
###clean 0.3239445686340332 s
###build 0.2042856216430664 s
###seq 0.09550786018371582 s
###init data tot use time 0.6237847805023193 s
************************************* [11 16 9 10 9 14 11 2 4 8 3 8 11 19 7 11 16 9 13 2]
###initail: 0.696040153503418
###Use model: CNN
Embedding Size of This Model: 64.
W1214 22:00:20.657241 140028982310720 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:74: The name tf.get_default_graph is deprecated. Please use tf.compat.v1.get_default_graph instead.

W1214 22:00:20.657757 140028982310720 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:517: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

W1214 22:00:20.660748 140028982310720 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:4138: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

W1214 22:00:20.700274 140028982310720 deprecation.py:506] From /usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use rate instead of keep_prob. Rate should be set to rate = 1 - keep_prob.
###Bluid model: 0.1275312900543213
W1214 22:00:20.795397 140028982310720 deprecation_wrapper.py:119] From /usr/local/lib/python3.6/dist-packages/keras/optimizers.py:790: The name tf.train.Optimizer is deprecated. Please use tf.compat.v1.train.Optimizer instead.

###train batch size: 128 (9051, 261) (9051,)
W1214 22:00:20.862344 140028982310720 deprecation.py:323] From /usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/math_grad.py:1250: add_dispatch_support.<locals>.wrapper (from tensorflow.python.ops.array_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.where in 2.0, which has the same broadcast rule as np.where
Epoch 1/1
2019-12-14 22:00:21.434657: W tensorflow/compiler/jit/mark_for_compilation_pass.cc:1412] (One-time warning): Not using XLA:CPU for cluster because envvar TF_XLA_FLAGS=--tf_xla_cpu_global_jit was not set. If you want XLA:CPU, either set that envvar, or use experimental_jit_scope to enable XLA:CPU. To confirm that XLA is active, pass --vmodule=xla_compilation_cache=1 (as a proper command-line flag, not via TF_XLA_FLAGS) or set the envvar XLA_FLAGS=--xla_hlo_profile.
2019-12-14 22:00:21,778 INFO ingestion.py: [12-14 22:00:21] training success, time spent so far 1.8266241550445557 sec
2019-12-14 22:00:21,778 INFO ingestion.py: Failed to run ingestion.
2019-12-14 22:00:21,778 ERROR ingestion.py: Encountered exception:
indices[72,36] = 35000 is not in [0, 35000)
[[{{node embedding_1/embedding_lookup}}]]
Traceback (most recent call last):
File "/app/codalab/AutoDL_ingestion_program/ingestion.py", line 303, in
M.train(D.get_train(), remaining_time_budget=timer.remain)
File "scripts/DeepBlueAI/AutoNLP/model.py", line 506, in train
shuffle=True)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training.py", line 1039, in fit
validation_steps=validation_steps)
File "/usr/local/lib/python3.6/dist-packages/keras/engine/training_arrays.py", line 199, in fit_loop
outs = f(ins_batch)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2715, in call
return self._call(inputs)
File "/usr/local/lib/python3.6/dist-packages/keras/backend/tensorflow_backend.py", line 2675, in _call
fetched = self._callable_fn(*array_vals)
File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1458, in call
run_metadata_ptr)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[72,36] = 35000 is not in [0, 35000)
[[{{node embedding_1/embedding_lookup}}]]
2019-12-14 22:00:21,806 INFO ingestion.py: Wrote the file end.txt marking the end of ingestion.
2019-12-14 22:00:21,806 INFO ingestion.py: [-] Done, but encountered some errors during ingestion.
2019-12-14 22:00:21,806 INFO ingestion.py: [-] Overall time spent 176.36 sec
Traceback (most recent call last):
File "/app/codalab/AutoDL_ingestion_program/ingestion.py", line 396, in
values_list = [modelname, Importing_model, Initialization, training, predicting, predictions_made, overall_time_spent,
NameError: name 'predicting' is not defined
2019-12-14 22:00:22,914 INFO score.py: Final area under learning curve for O2: 0.0000
2019-12-14 22:00:22,923 ERROR score.py: [-] Some error occurred in ingestion program. Please see output/error log of Ingestion Step.
Traceback (most recent call last):
File "/app/codalab/AutoDL_scoring_program/score.py", line 715, in
values_list = [accurcy, roc_auc, auc2, score]
NameError: name 'accurcy' is not defined
root@4c0b0a0d44bb:/app/codalab#
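
The crash itself comes from the embedding lookup: the model builds an Embedding layer with 35000 rows, but the padded input contains the token id 35000, which is outside the valid range [0, 35000). The two NameError tracebacks afterwards ('predicting' in ingestion.py and 'accurcy' in score.py) look like secondary failures, since those variables appear to be assigned only after training and prediction succeed.

Below is a minimal sketch that triggers the same InvalidArgumentError. It is not the DeepBlueAI submission's actual model code; the layer choices, variable names, and the suggested fix are assumptions, with only the numbers (vocabulary bound 35000, sequence length 261, batch size 128, 20 classes, index [72, 36]) taken from the log above.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding, GlobalAveragePooling1D, Dense

VOCAB_SIZE = 35000   # matches the bound reported in the error message
MAX_LEN = 261        # matches the padded sequence length shown in the log

# Embedding(input_dim=VOCAB_SIZE) only accepts token ids in [0, VOCAB_SIZE).
model = Sequential([
    Embedding(input_dim=VOCAB_SIZE, output_dim=64, input_length=MAX_LEN),
    GlobalAveragePooling1D(),
    Dense(20, activation='softmax'),   # 20 classes, as reported for dataset O2
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')

x = np.random.randint(0, VOCAB_SIZE, size=(128, MAX_LEN))
x[72, 36] = VOCAB_SIZE   # one out-of-range id -> "35000 is not in [0, 35000)"
y = np.random.randint(0, 20, size=(128,))

# On CPU this raises tensorflow.python.framework.errors_impl.InvalidArgumentError
# inside the embedding lookup, matching the traceback above.
model.fit(x, y, batch_size=128, epochs=1)

If the tokenizer numbers words from 1 (reserving 0 for padding), the embedding needs max_token_id + 1 rows, i.e. 35001 here; building it with input_dim equal to the raw vocabulary size would produce exactly this off-by-one failure.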
