Loading speedup + TensorFlow2.0, Python3 compatibility #33
base: master
Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).
📝 Please visit https://cla.developers.google.com/ to sign.
Once you've signed (or fixed any issues), please reply here with
What to do if you already signed the CLA
Individual signers
Corporate signers
ℹ️ Googlers: Go here for more info.
The changes are the following:
1. Memoize valid_epochs: there are only two options, so this reduces the number of dict[key] = [] initializations.
2. Memoize square roots: the only inputs are 4, 9, 16, 25, 36, and 49.
3. Memoize split operations (ops_dict): enumerate all of them in advance.
4. json -> UltraJSON: pure C implementation.
5. base64 -> pybase64: about twice as fast.

Overall, the elapsed time decreases by about 40% in my environment (52 sec -> 32 sec).
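A minimal sketch of how changes 2, 4, and 5 might look, not the PR's actual diff. The names `SQRT_TABLE`, `matrix_dim`, and `decode_payload` are hypothetical, and the fallback imports are an assumption added so the snippet still runs when `ujson` or `pybase64` is not installed.

```python
import math

try:
    import ujson as json  # third-party UltraJSON, used if installed
except ImportError:
    import json  # standard-library fallback (assumption for portability)

try:
    import pybase64 as base64  # third-party, used if installed
except ImportError:
    import base64  # standard-library fallback (assumption for portability)

# Precomputed square roots for the only inputs named in the list above:
# 4, 9, 16, 25, 36, 49.
SQRT_TABLE = {n * n: n for n in range(2, 8)}


def matrix_dim(flat_length):
    """Return the side length of a square matrix stored as a flat sequence."""
    try:
        return SQRT_TABLE[flat_length]  # dictionary lookup, no sqrt call
    except KeyError:
        return math.isqrt(flat_length)  # unexpected size: compute it


def decode_payload(encoded):
    """Decode a base64-encoded JSON payload into a Python object."""
    raw = base64.b64decode(encoded)
    return json.loads(raw.decode("utf-8"))
```

The lookup table only pays off because the set of inputs is tiny and known in advance; for arbitrary sizes the fallback branch does the same work as before.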
Because loading from the TFRecord file is too slow, I introduced a serialization step on the first load. With this, users no longer have to wait several tens of seconds from the second load onward.
This PR builds on top of PR #29.
This PR makes loading faster.
In my environment, the runtime went from 120 sec to 30 sec for the full dataset.
Note that loading takes 4 sec from the second load onward thanks to pickle serialization.
Each change was tested with %timeit.
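The pickle caching described above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's code: `load_with_cache`, `parse_fn`, and the `.pkl` suffix are hypothetical names chosen for the example.

```python
import os
import pickle


def load_with_cache(source_path, parse_fn, cache_path=None):
    """Parse source_path once, then reuse a pickled copy on later loads.

    parse_fn is any callable that takes a path and returns the parsed data
    (for example, a slow TFRecord reader).
    """
    if cache_path is None:
        cache_path = source_path + ".pkl"  # hypothetical naming convention

    # Second and later loads: read the pickled result directly.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # First load: run the slow parser, then store the result for next time.
    data = parse_fn(source_path)
    with open(cache_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    return data
```

A sketch like this does not detect a changed source file, so a stale cache has to be deleted by hand. Pickle files should also only be loaded from trusted locations, since unpickling can execute arbitrary code.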