Conversation

@jorgeorpinel
Contributor

@shcheklein shcheklein temporarily deployed to dvc-org-pr-762 October 31, 2019 05:17 Inactive
@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 31, 2019

@iterative/engineering Hi. This WIP PR is also kind of a ticket to fix the versioning tutorial.

  • The problem we have now for this tutorial is that when I run python train.py, I get:
Using TensorFlow backend.
Traceback (most recent call last):
  File "train.py", line 122, in <module>
    save_bottlebeck_features()
  File "train.py", line 71, in save_bottlebeck_features
    model = applications.VGG16(include_top=False, weights='imagenet')
  File "/.../example-versioning/.env/lib/python3.7/site-packages/keras/applications/__init__.py", line 28, in wrapper
    return base_fun(*args, **kwargs)
...
  File "/.../example-versioning/.env/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "/.../example-versioning/.env/lib/python3.7/site-packages/keras/engine/input_layer.py", line 39, in __init__
    name = prefix + '_' + str(K.get_uid(prefix))
  File "/.../example-versioning/.env/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 74, in get_uid
    graph = tf.get_default_graph()
AttributeError: module 'tensorflow' has no attribute 'get_default_graph'

Probably a small bug from migrating to Python 3 or something, but I'm not very familiar with TensorFlow or Keras. Did one of you write this in the first place?

UPDATE: I fixed this by changing the import statements as suggested in keras-team/keras#12379 (comment).
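
For context, `tf.get_default_graph` is no longer in the top-level namespace in TensorFlow 2.x (it remains available as `tf.compat.v1.get_default_graph`), while standalone Keras 2.2.4 still calls it. Below is a minimal sketch of the kind of import change described above, assuming the fix is to use the Keras that ships inside TensorFlow. The exact lines changed in train.py are not shown in this thread, and the `try`/`except` fallback is only there so the snippet runs without TensorFlow installed:

```python
# Before (standalone Keras, fails against TensorFlow 2.x):
#   from keras import applications
#
# After (assumed fix): import the same module from tensorflow.keras.
try:
    from tensorflow.keras import applications
except ImportError:
    # TensorFlow is not installed; nothing to demonstrate.
    applications = None

if applications is not None:
    # VGG16 is the model constructor that train.py calls.
    print("VGG16 available:", hasattr(applications, "VGG16"))
```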

  • But now I get:
$ python train.py 
2019-10-31 01:32:11.382374: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-10-31 01:32:11.450687: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7fc3578c2320 executing computations on platform Host. Devices:
2019-10-31 01:32:11.450724: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): Host, Default Version
Found 1000 images belonging to 2 classes.
Found 800 images belonging to 2 classes.
Train on 1000 samples, validate on 800 samples
Epoch 1/10
1000/1000 [==============================] - 4s 4ms/sample - loss: 0.8257 - accuracy: 0.7370 - val_loss: 0.3335 - val_accuracy: 0.8537
...
Epoch 10/10
1000/1000 [==============================] - 4s 4ms/sample - loss: 0.1258 - accuracy: 0.9570 - val_loss: 0.4326 - val_accuracy: 0.8900
Traceback (most recent call last):
  File "train.py", line 123, in <module>
    train_top_model()
  File "train.py", line 118, in train_top_model
    json.dump(history.history, open("metrics.json", 'w'))
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 179, in dump
    for chunk in iterable:
...
  File "/usr/local/Cellar/python/3.7.4_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type float32 is not JSON serializable

Maybe we need to switch to pandas' to_json or another more robust method?
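
One option that avoids a new dependency is to convert the metric values to built-in floats before dumping them. The sketch below is illustrative only: `Float32Like` is a hypothetical stand-in for the float32 values in `history.history` (so the snippet runs without NumPy), and `to_builtin_floats` is a made-up helper name, not something in train.py:

```python
import json


class Float32Like:
    """Hypothetical stand-in for a float32 value: float() works, json rejects it."""

    def __init__(self, value):
        self._value = value

    def __float__(self):
        return float(self._value)


def to_builtin_floats(history):
    """Return a copy of a {metric: [values]} dict with plain Python floats."""
    return {name: [float(v) for v in values] for name, values in history.items()}


# Shaped like history.history, using the first and last epoch from the log above.
history = {
    "loss": [Float32Like(0.8257), Float32Like(0.1258)],
    "val_accuracy": [Float32Like(0.8537), Float32Like(0.8900)],
}

try:
    json.dumps(history)
except TypeError as err:
    print("fails as in the traceback:", err)

# Option 1: convert the values before dumping.
as_text = json.dumps(to_builtin_floats(history))

# Option 2: let json call float() on anything it cannot encode.
same_text = json.dumps(history, default=float)

print(as_text)
```

In train.py either option would be applied to `history.history`, ideally with the file opened in a `with` block so it is closed after writing.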

@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 31, 2019

  • BTW, the requirements for this code are pretty outdated (all from Oct, 2018). Should we also update them?
tensorflow>=1.11.0
keras==2.2.4
pillow==5.3.0

UPDATE: Extracted to treeverse/example-versioning/issues/3.

@shcheklein
Contributor

shcheklein commented Oct 31, 2019

For now, make sure that you use Python 3.6, and try to install a TF version close to the one specified in the requirements file.

@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 31, 2019

make sure that you use Python 3.6,

A lot of people will have Python 3.7, though, so we would need to specify in the tutorial that you have to use Python 3.6 specifically. I'm not sure this is reasonable, as lots of people won't know how (or want) to manage two versions of Python 3 on their system.

try to install TF version close to the one specified in the requirements file...

I changed the TF requirement to ==1.11.0 and it works now! (On Python 3.7) Will continue testing the rest of the tutorial... UPDATE: Never mind, this doesn't work.

@shcheklein
Contributor

shcheklein commented Oct 31, 2019

I think TF didn't support Python 3.7 at some point. I'm pretty sure there was a note in this tutorial about that. If it does work now, please disregard this.
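
If the tutorial does end up requiring a specific interpreter, a guard at the top of train.py could state the constraint up front instead of failing deep inside Keras. This is a minimal sketch: the helper name `check_python` and the choice of 3.6 as the only accepted minor version are assumptions based on the suggestion above, not something the example repository does:

```python
import sys

SUPPORTED = (3, 6)  # assumption: the version suggested in this thread


def check_python(version_info=None):
    """Return None if the interpreter matches SUPPORTED, else a warning string."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    if (major, minor) == SUPPORTED:
        return None
    return (
        "This example expects Python %d.%d, but you are running %d.%d."
        % (SUPPORTED + (major, minor))
    )


# Warn (rather than exit) so users on other versions can still try the script.
message = check_python()
if message:
    print(message, file=sys.stderr)
```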

jorgeorpinel added a commit to treeverse/example-versioning that referenced this pull request Oct 31, 2019
@jorgeorpinel jorgeorpinel changed the title from "tutorials/versioning: address errors in code samples [WIP]" to "tutorials/versioning: address errors in code samples" Oct 31, 2019
@jorgeorpinel jorgeorpinel changed the title from "tutorials/versioning: address errors in code samples" to "tutorials: address errors in versioning tut code samples" Oct 31, 2019
@jorgeorpinel
Contributor Author

jorgeorpinel commented Oct 31, 2019

BTW, the requirements for this code are pretty outdated (all from Oct, 2018). Should we also update them?

I think it's better to review it and update it, using either PyTorch or the latest release of TensorFlow + Keras.

Extracted discussion to treeverse/example-versioning/issues/3. Please comment, team.

@jorgeorpinel
Contributor Author

This is ready for merging, @shcheklein.

@shcheklein shcheklein merged commit 1da7de3 into master Oct 31, 2019
VenkateshGangadhar pushed a commit to VenkateshGangadhar/MLOps that referenced this pull request Jul 6, 2025