Conversation
@felker felker commented Dec 5, 2019

felker and others added 30 commits November 21, 2019 14:05
Now, all criteria for excluding a shot from the input raw shot lists
include an "[omit]" string in their diagnostic message when they are
satisfied.

This should make searching the piped output easier.
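
As a rough illustration (the function and attribute names below are hypothetical, not the actual preprocessing code), each exclusion criterion tags its diagnostic so that, e.g., grep "\[omit\]" picks it out of the piped output:

def check_min_length(shot, min_length=200):
    # Hypothetical exclusion criterion: reject shots that are too short
    if len(shot.signal) < min_length:
        print("[omit] shot {}: only {} samples (< {})".format(
            shot.number, len(shot.signal), min_length))
        return False
    return True
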
as in tensorflow. Occurs on ALCF Theta

numpy                     1.17.2
tensorboard               1.12.2                   pypi_0    pypi
tensorflow                1.12.0                   pypi_0    pypi
tensorflow-base           1.14.0          eigen_py36hf4a566f_0
tensorflow-estimator      1.14.0                     py_0

/home/felker/FRNN_project/build/miniconda-3.6-4.5.4/miniconda3/4.5.4/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550:
  FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
Currently, preprocessing dumps all machines/shotlists with the same
signal group hash into the same folder. There have been no file-name
collisions only because D3D and JET shot numbers do not currently overlap.

Unify implementations of get_individual_shot_file() in
utils/processing.py (fairly confident that the warning comment about
globals being incompatible with multiprocessing is no longer valid).

Use os.path.join() instead of manual += '/' concatenation.
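
A minimal sketch of the os.path.join() pattern (the signature here is illustrative, not the exact one in utils/processing.py):

import os

def get_individual_shot_file(prefix, shot_number, ext='.txt'):
    # Let os.path.join() supply the separator instead of appending '/' by hand
    return os.path.join(prefix, str(shot_number) + ext)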

Need to test these changes.
- Consider wrapping import onnx, etc. in try/except to make this an
  optional dependency that automatically runs if installed (see the
  sketch after the warning output below)
- Specify Opset=10, for now
- Only add dropout parameters to the RNN layer if CuDNNLSTM is not used

- ONNX conversion will not fail fatally if an op is not supported. Need
  to evaluate whether the CuDNNLSTM output is usable at all (or with
  non-GPU inference), given the following warning that is emitted:

WARNING:tensorflow:From
/home/kfelker/.conda/envs/frnn/lib/python3.7/site-packages/keras2onnx/subgraph.py:156:
tensor_shape_from_node_def_name (from
tensorflow.python.framework.graph_util_impl) is deprecated and will be
removed in a future version.
Instructions for updating:
Use `tf.compat.v1.graph_util.tensor_shape_from_node_def_name`
Cannot infer shape for TFNodes1/cu_dnnlstm_1/CudnnRNN:
TFNodes1/cu_dnnlstm_1/CudnnRNN:3
Tensorflow op [TFNodes1/cu_dnnlstm_1/CudnnRNN: CudnnRNN] is not
supported
Unsupported ops: Counter({'CudnnRNN': 1})
Cannot infer shape for TFNodes/cu_dnnlstm_2/CudnnRNN:
TFNodes/cu_dnnlstm_2/CudnnRNN:3
Tensorflow op [TFNodes/cu_dnnlstm_2/CudnnRNN: CudnnRNN] is not supported
Unsupported ops: Counter({'CudnnRNN': 1})
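
A rough sketch of the optional-dependency import and opset pinning described in the list above. This is illustrative only, assuming the keras2onnx converter; the helper name and output path are hypothetical, not the actual FRNN code:

try:
    import onnx  # noqa: F401
    import keras2onnx
    HAVE_ONNX = True
except ImportError:
    HAVE_ONNX = False

def maybe_export_onnx(model, path='model.onnx'):
    """Hypothetical helper: export the Keras model only if the ONNX packages exist."""
    if not HAVE_ONNX:
        return
    # Pin the target opset to 10 for now; CuDNNLSTM layers may still trigger
    # "Unsupported ops: Counter({'CudnnRNN': 1})" warnings as shown above.
    onnx_model = keras2onnx.convert_keras(model, model.name, target_opset=10)
    keras2onnx.save_model(onnx_model, path)
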
felker and others added 23 commits December 16, 2019 11:44
Only meaningful difference in Conda YAML is the removal of the ppc64le
IBM AI Conda channel
This is not currently a valid field in an environment YAML file;
follow conda/conda#8675. Until then, use:
conda config --set channel_priority strict
Intentionally add a PEP 8 style error in order to test Travis CI email
notifications on failed builds.
Both Keras v2.3.0 and v2.3.1 on Traverse (and at least the latter on
TigerGPU) die with:

WARNING:tensorflow:From
/home/kfelker/.conda/envs/frnn/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630:
calling BaseResourceVariable.__init__ (from
tensorflow.python.ops.resource_variable_ops) with constraint is
deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Printing out pre_rnn model...
Traceback (most recent call last):
  File "mpi_learn.py", line 111, in <module>
    shot_list_test=shot_list_test)
  File "/home/kfelker/.conda/envs/frnn/lib/python3.6/site-packages/tensorflow_core/python/ops/resource_variable_ops.py", line 1229, in __imul__
    raise RuntimeError("Variable *= value not supported. Use "
RuntimeError: Variable *= value not supported. Use `var.assign(var * value)`
to modify the variable or `var = var * value` to get a new Tensor object.

This incompatibility is likely fixed in TF >= v2.0 and/or TF's internal
Keras; see tensorflow/tensorflow#27829.

Re-check this after moving to TensorFlow's internal Keras in #43.
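
For reference, a minimal sketch of the replacement pattern that the RuntimeError message asks for (illustrative only, not the offending code path inside Keras):

import tensorflow as tf

var = tf.Variable(1.0)

# Fails with "Variable *= value not supported" on resource variables:
#     var *= 0.5
# Supported alternatives:
var.assign(var * 0.5)      # modify the variable in place
new_value = var * 0.5      # or obtain a new Tensor object
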
Add conda-forge to channels above Anaconda Cloud defaults

Need to reevaluate these choices later on
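
A sketch of the resulting channel block in the Conda environment YAML (illustrative; the real file also lists the project's dependencies):

channels:
  - conda-forge
  - defaults
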
Use "sync", not "synch", for "synchronization" abbreviation
Added by @ge-dong in #49 (refactored in #50). It is the only
MPI-specific code within builder.py.