Conversation
@felker felker commented Dec 5, 2019

felker and others added 30 commits November 21, 2019 14:05
Now, all criteria for excluding a shot from the input raw shot lists
include an "[omit]" string in their diagnostic message when they are
satisfied.

This should make searching the piped output easier.
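
As a rough illustration (the function and attribute names below are hypothetical, not the actual preprocessing code), each exclusion criterion tags its diagnostic so that, e.g., grep "\[omit\]" picks it out of the piped output:

def check_min_length(shot, min_length=200):
    # Hypothetical exclusion criterion: reject shots that are too short
    if len(shot.signal) < min_length:
        print("[omit] shot {}: only {} samples (< {})".format(
            shot.number, len(shot.signal), min_length))
        return False
    return True
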
as in tensorflow. Occurs on ALCF Theta

numpy                     1.17.2
tensorboard               1.12.2                   pypi_0    pypi
tensorflow                1.12.0                   pypi_0    pypi
tensorflow-base           1.14.0          eigen_py36hf4a566f_0
tensorflow-estimator      1.14.0                     py_0

/home/felker/FRNN_project/build/miniconda-3.6-4.5.4/miniconda3/4.5.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550:
  FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Currently, preprocessing dumps all machines/shotlists with the same
signal group hash into the same folder. There have been no file-name
collisions only because D3D and JET shot numbers do not currently overlap.

Unify implementations of get_individual_shot_file() in
utils/processing.py (fairly confident that the warning comment about
globals being incompatible with multiprocessing is no longer valid).

Use os.path.join() instead of manual += '/' concatenation.
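
A minimal sketch of the os.path.join() pattern (the signature here is illustrative, not the exact one in utils/processing.py):

import os

def get_individual_shot_file(prefix, shot_number, ext='.txt'):
    # Let os.path.join() supply the separator instead of appending '/' by hand
    return os.path.join(prefix, str(shot_number) + ext)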

Need to test these changes.
- Consider wrapping import onnx, etc. in try/except to make this an
  optional dependency that automatically runs if installed (see the
  sketch after the warning output below)
- Specify Opset=10, for now
- Only add dropout parameters to the RNN layer if CuDNNLSTM is not used

- ONNX conversion will not fail fatally if an op is not supported. Need
  to evaluate whether the CuDNNLSTM output is usable at all (or with
  non-GPU inference), given the following warning that is emitted:

WARNING:tensorflow:From
/home/kfelker/.conda/envs/frnn/lib/python3.7/site-packages/keras2onnx/subgraph.py:156:
tensor_shape_from_node_def_name (from
tensorflow.python.framework.graph_util_impl) is deprecated and will be
removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.tensor_shape_from_node_def_name`
Cannot infer shape for TFNodes1/cu_dnnlstm_1/CudnnRNN:
TFNodes1/cu_dnnlstm_1/CudnnRNN:3
Tensorflow op [TFNodes1/cu_dnnlstm_1/CudnnRNN: CudnnRNN] is not
supported
Unsupported ops: Counter({'CudnnRNN': 1})
Cannot infer shape for TFNodes/cu_dnnlstm_2/CudnnRNN:
TFNodes/cu_dnnlstm_2/CudnnRNN:3
Tensorflow op [TFNodes/cu_dnnlstm_2/CudnnRNN: CudnnRNN] is not supported
Unsupported ops: Counter({'CudnnRNN': 1})
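
A rough sketch of the optional-dependency import and opset pinning described in the list above. This is illustrative only, assuming the keras2onnx converter; the helper name and output path are hypothetical, not the actual FRNN code:

try:
    import onnx  # noqa: F401
    import keras2onnx
    HAVE_ONNX = True
except ImportError:
    HAVE_ONNX = False

def maybe_export_onnx(model, path='model.onnx'):
    """Hypothetical helper: export the Keras model only if the ONNX packages exist."""
    if not HAVE_ONNX:
        return
    # Pin the target opset to 10 for now; CuDNNLSTM layers may still trigger
    # "Unsupported ops: Counter({'CudnnRNN': 1})" warnings as shown above.
    onnx_model = keras2onnx.convert_keras(model, model.name, target_opset=10)
    keras2onnx.save_model(onnx_model, path)
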
felker and others added 23 commits December 16, 2019 11:44
Only meaningful difference in Conda YAML is the removal of the ppc64le
IBM AI Conda channel
This is not currently a valid field in an environment YAML file;
follow conda/conda#8675. Until then, use:
conda config --set channel_priority strict
Intentionally add a PEP 8 style error in order to test Travis CI email
notifications on failed builds.
Both Keras v2.3.0 and v2.3.1 on Traverse (and at least the latter on
TigerGPU) die with:

WARNING:tensorflow:From
/home/kfelker/.conda/envs/frnn/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630:
calling BaseResourceVariable.__init__ (from
tensorflow.python.ops.resource_variable_ops) with constraint is
deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Printing out pre_rnn model...
Traceback (most recent call last):
  File "mpi_learn.py", line 111, in <module>
    shot_list_test=shot_list_test)
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 1229, in __imul__
    raise RuntimeError("Variable *= value not supported. Use "
RuntimeError: Variable *= value not supported. Use `var.assign(var * value)`
to modify the variable or `var = var * value` to get a new Tensor object.

This incompatibility is likely fixed in TF >= v2.0 and/or TF's internal
Keras; see tensorflow/tensorflow#27829.

Re-check this after moving to TensorFlow's internal Keras in #43.
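
For reference, a minimal sketch of the replacement pattern that the RuntimeError message asks for (illustrative only, not the offending code path inside Keras):

import tensorflow as tf

var = tf.Variable(1.0)

# Fails with "Variable *= value not supported" on resource variables:
#     var *= 0.5
# Supported alternatives:
var.assign(var * 0.5)      # modify the variable in place
new_value = var * 0.5      # or obtain a new Tensor object
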
Add conda-forge to channels above Anaconda Cloud defaults

Need to reevaluate these choices later on
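
A sketch of the resulting channel block in the Conda environment YAML (illustrative; the real file also lists the project's dependencies):

channels:
  - conda-forge
  - defaults
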
Use "sync", not "synch", for "synchronization" abbreviation
Added by @ge-dong in #49 (refactored in #50). It is the only
MPI-specific code within builder.py.