This repository was archived by the owner on Jul 3, 2024. It is now read-only.

@nabenabe0928 nabenabe0928 commented Oct 25, 2021

This PR builds on top of PR #29.
It speeds up dataset loading.
In my environment, loading the full dataset went from 120 sec to 30 sec.
From the second load onward it takes about 4 sec, thanks to pickle serialization.

Each change was timed with %timeit.
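
For readers who want to reproduce this kind of measurement outside IPython, the following is a minimal sketch using the standard `timeit` module, which is what `%timeit` wraps. The two functions being compared are hypothetical stand-ins, not code from this PR.

```python
import timeit


def build_with_append(n=1000):
    # Hypothetical "before" variant: grow a list one element at a time.
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def build_with_comprehension(n=1000):
    # Hypothetical "after" variant: same result via a list comprehension.
    return [i * i for i in range(n)]


def best_time(func, repeat=5, number=200):
    """Return the best observed per-call time in seconds."""
    # timeit.repeat returns one total (for `number` calls) per repetition;
    # the minimum is the measurement least disturbed by other processes.
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number


if __name__ == "__main__":
    for func in (build_with_append, build_with_comprehension):
        print(f"{func.__name__}: {best_time(func) * 1e6:.1f} us per call")
```

Taking the minimum rather than the mean is a common choice for micro-benchmarks, since slower runs usually reflect interference rather than the code itself.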


google-cla bot commented Oct 25, 2021

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

The changes are the following:
1. Memoize valid_epochs: there are only two options, and this reduces
   the number of dict[key] = [] assignments
2. Memoize square roots: the only inputs are 4, 9, 16, 25, 36, 49
3. Memoize split operations (ops_dict): enumerate all of them
   in advance
4. json -> UltraJSON: written in pure C
5. base64 -> pybase64: about twice as fast

Overall, the elapsed time decreases by about 40% in my environment
(52 sec -> 32 sec)
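
The memoization ideas in items 2 and 3 can be sketched roughly as follows. This is an illustration, not the code from this PR: the names `SQRT_TABLE`, `matrix_dim`, and `split_ops` are hypothetical, and the sketch assumes that the six values are lengths of flattened square matrices and that operations arrive as a comma-separated string. It also caches split results lazily, whereas the commit enumerates them in advance.

```python
import math

# The six perfect squares listed in the commit message.
SQUARE_SIZES = (4, 9, 16, 25, 36, 49)

# Precompute each square root once instead of calling sqrt per record.
SQRT_TABLE = {n: math.isqrt(n) for n in SQUARE_SIZES}


def matrix_dim(flat_length):
    """Look up the side length for a flattened square matrix (assumed use)."""
    try:
        return SQRT_TABLE[flat_length]
    except KeyError:
        raise ValueError(f"unexpected flattened length: {flat_length}") from None


# Cache of already-split operation strings (hypothetical format: "a,b,c").
_OPS_CACHE = {}


def split_ops(raw_ops):
    """Split an operations string, reusing the result for repeated inputs."""
    ops = _OPS_CACHE.get(raw_ops)
    if ops is None:
        ops = tuple(raw_ops.split(","))
        _OPS_CACHE[raw_ops] = ops
    return ops
```

Both helpers trade a small amount of memory for skipping repeated work, which pays off when the same few inputs recur across many records.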

Since loading the tfrecord is too slow, I introduced a serialization
feature that writes a pickle on the first load.
With this, users do not have to wait several tens of seconds from
the second load onward.
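
A minimal sketch of this load-once-then-cache pattern is shown below. It is not the implementation from this PR: `load_with_cache`, `parse_fn`, and the `.pkl` suffix are hypothetical names chosen for illustration.

```python
import os
import pickle


def load_with_cache(source_path, parse_fn, cache_path=None):
    """Return parsed data, reading a pickle cache when one already exists.

    `parse_fn` is a hypothetical callable that performs the slow parse of
    `source_path`; it runs only when no cache file is found.
    """
    if cache_path is None:
        cache_path = source_path + ".pkl"  # assumed naming convention

    # Fast path: a previous run already serialized the parsed data.
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    # Slow path: parse once, then write the cache for later runs.
    data = parse_fn(source_path)
    tmp_path = cache_path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
    # Rename only after a complete write so a crash leaves no partial cache.
    os.replace(tmp_path, cache_path)
    return data
```

Because unpickling can execute arbitrary code, a cache like this should only be read from locations the user controls.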