int->size_t to support large datasets more than 2G instances #2473
buaaliyi wants to merge 1 commit into BVLC:master
Conversation
Excited to see this patch pass the Travis tests... I'm running into the same issue!
.gitignore (Outdated)
This should not be part of this patch.
Thanks for the PR @buaaliyi, and for the reviewing efforts @flx42. I think we do eventually want to increase the blob size limit. A couple of comments:
Why not use
This patch has been updated based on @flx42's comments. Thank you @jeffdonahue and @flx42 for your advice. Let me do a further check to fix the other places that depend on the current blob max size, besides MemoryDataLayer.
1552c20 to 2581f18
I need this same change. I had just filed #3159 in error because I did not search properly, and will close it if I can. I would resolve this by using ssize_t (signed size_t) everywhere you use int to hold a size but are willing to forgo the highest bit in order to get special negative values. Wherever you are using unsigned int, you could use size_t. This should be a straightforward change, at least for g++ on Linux, though it will probably touch most files. I can prepare a CPU-tested change set for a pull request if one of the developers is willing to consider it.
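To make the suggested convention concrete, here is a minimal standalone sketch; the function names are hypothetical and not taken from the Caffe code base. The idea is: size_t for plain counts that can never be negative, ssize_t where a negative value is reserved as a sentinel.

```cpp
#include <sys/types.h>  // ssize_t (POSIX; available with g++ on Linux)
#include <cstddef>      // size_t
#include <vector>

// A plain element count never needs to be negative: use size_t.
size_t CountElements(const std::vector<float>& data) {
  return data.size();
}

// A lookup that uses -1 as a "not found" sentinel keeps the sign bit: use ssize_t.
ssize_t FindFirst(const std::vector<float>& data, float value) {
  for (size_t i = 0; i < data.size(); ++i) {
    if (data[i] == value) {
      return static_cast<ssize_t>(i);
    }
  }
  return -1;  // the special negative value described above
}
```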
When I was trying to use Caffe to train my large dataset (billions of instances), I found that the class 'SyncedMemory' uses the type 'size_t' to allocate memory, while blob.count_ and blob.capacity_ are of type 'int'. As a result, the allocation size was cut off to less than 2GB, and my experiment failed due to the overflow.
This patch changes the size-related types from int to size_t, which guarantees the correct size on 64-bit machines even when the dataset size exceeds 2 billion.
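As a concrete illustration of the failure mode (the shape values below are made up for the example and are not from the actual experiment): an element count accumulated in int overflows once the product passes about 2.1 billion, while the same product computed in size_t stays correct on a 64-bit build.

```cpp
#include <cstddef>
#include <cstdio>

int main() {
  // Hypothetical blob shape with ~2.56 billion elements (2500000 * 1 * 32 * 32),
  // which exceeds INT_MAX (about 2.1 billion).
  const int num = 2500000, channels = 1, height = 32, width = 32;

  // Old code path: the product is computed in int and overflows
  // (undefined behavior; on typical builds it wraps to a wrong value).
  int count_int = num * channels * height * width;

  // Patched code path: accumulate in size_t so the 64-bit value survives.
  size_t count_sz = static_cast<size_t>(num) * channels * height * width;

  std::printf("int count:    %d\n", count_int);                  // truncated / negative
  std::printf("size_t count: %zu\n", count_sz);                  // 2560000000
  std::printf("bytes needed: %zu\n", count_sz * sizeof(float));  // ~10.2 GB
  return 0;
}
```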
Thanks for the review.