
GRA-122: Data loader implementation#67

Merged
MaxVorosh merged 7 commits into main from task122_data_loader
Dec 13, 2023

Conversation

@MaxVorosh
Contributor Author

The PR turned out big, sorry...
Please pay particular attention to how I assemble the Blob, because I may have misunderstood the idea there. That is roughly one method per class. The rest is just working with the data as vectors. I wrote a couple of tests for that part (specifically for the vectors). I hope they help clarify how to use what I wrote.

@lpetrov02 left a comment

I left a lot of nonessential remarks marked DON'T CARE; otherwise, overall everything looks great.

Comment on lines 8 to +9
static std::vector<std::vector<float>> load_csv(std::string path);
static std::vector<std::pair<std::string, float>> load_labels(std::string path);

It seems these methods could be made const.
But it's up to you, since we agreed to follow the DON'T CARE principle.
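One caveat on this suggestion: in C++ a `static` member function has no `this` object, so it cannot be declared `const`. What can be tightened is the parameter, by taking the path as `const std::string&`. A minimal sketch of where `const` can and cannot go; the class name `CsvLoaderSketch` and the `rows()` member are hypothetical, since the real class body is not shown in this hunk:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the loader class under review; only the signatures matter.
class CsvLoaderSketch {
public:
    // `static ... load_csv(...) const;` would not compile: static members have no `this`.
    // Taking the path by const reference avoids copying the string on every call.
    static std::vector<std::vector<float>> load_csv(const std::string& path);
    static std::vector<std::pair<std::string, float>> load_labels(const std::string& path);

    // A non-static member that does not modify the object can be const-qualified.
    std::size_t rows() const { return rows_; }

private:
    std::size_t rows_ = 0;
};

// Compiles only because rows() is const: it is callable through a const reference.
inline std::size_t rows_of(const CsvLoaderSketch& loader) { return loader.rows(); }
```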

#include "Blob.h"

DataMarker::DataMarker(std::string path, FileExtension type, int percentage_for_train, std::size_t batch_size) {
if (percentage_for_train > 100 || percentage_for_train < 0) {

Wouldn't float be better here? Granted, nobody is likely to need exactly 20.5% for the test set, but it seems we could make this more flexible at almost no cost (and that's how torch/sklearn do it).
BUT! Since we follow DON'T CARE, feel free to skip this; the current version is fine too.
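A minimal sketch of the float-based variant, assuming a hypothetical helper `split_by_fraction` that takes the train share as a fraction in [0, 1] instead of an integer percentage (scikit-learn's `train_test_split` likewise accepts a float `train_size`/`test_size`). It splits in order and does not shuffle; how `DataMarker` actually distributes samples is not shown in this hunk.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical helper: split `data` into (train, test) by a fractional train share.
// Elements keep their original order; no shuffling is done here.
template <typename T>
std::pair<std::vector<T>, std::vector<T>> split_by_fraction(const std::vector<T>& data,
                                                            double train_fraction) {
    // Same validation idea as the 0..100 percentage check; `!(...)` also rejects NaN.
    if (!(train_fraction >= 0.0 && train_fraction <= 1.0)) {
        throw std::invalid_argument("train_fraction must be in [0, 1]");
    }
    // Round down so the train part never exceeds the requested share.
    const auto train_size = static_cast<std::ptrdiff_t>(
        std::floor(train_fraction * static_cast<double>(data.size())));
    std::vector<T> train(data.begin(), data.begin() + train_size);
    std::vector<T> test(data.begin() + train_size, data.end());
    return {std::move(train), std::move(test)};
}
```

Rounding down is one choice among several; rounding to nearest would also be reasonable, as long as train and test together still cover every sample.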


class UnshuffledImgLoader: public UnshuffledDataLoader {
private:
std::vector<std::pair<std::string, float>> data;

Here and elsewhere: this pair is used many times. Wouldn't it be better to write a small struct with descriptive field names?
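A minimal sketch of that suggestion. The struct name `LabeledImage`, its field names, and the conversion helper are hypothetical, and the sketch assumes the `std::string` holds an image path and the `float` holds its label, which is how `load_labels` appears to be used here:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical named record replacing std::pair<std::string, float>,
// so call sites read `sample.path` / `sample.label` instead of `.first` / `.second`.
struct LabeledImage {
    std::string path;  // image file path, as read from the labels file (assumed)
    float label;       // label kept as float, matching the existing pair
};

// Converts the existing pair-based representation, e.g. the result of load_labels().
inline std::vector<LabeledImage> to_labeled_images(
        const std::vector<std::pair<std::string, float>>& pairs) {
    std::vector<LabeledImage> result;
    result.reserve(pairs.size());
    for (const auto& [path, label] : pairs) {  // structured bindings require C++17
        result.push_back({path, label});
    }
    return result;
}
```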

Comment on lines +55 to +59
auto dims = shape.getDims();
int data_size = 1;
for (int i = 0; i < dims.size(); ++i) {
data_size *= dims[i];
}

I believe I saw a size() method on Shape that does exactly this.
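For reference, the loop above computes the product of all dimensions, i.e. the element count. A sketch of the same computation with `std::accumulate`, assuming `getDims()` returns a `std::vector` of sizes (its actual return type is not shown); `element_count` is a hypothetical name, and a `Shape::size()` method, if it exists as recalled, would presumably return the same value:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Product of all dimensions = number of elements in the tensor.
inline std::size_t element_count(const std::vector<std::size_t>& dims) {
    // Start from 1 (the empty product). Using size_t instead of int avoids the
    // signed/unsigned comparison in the original loop and gives more headroom.
    return std::accumulate(dims.begin(), dims.end(), std::size_t{1},
                           std::multiplies<std::size_t>());
}
```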

}
data.resize(data_size, 0);
int cur_data = 0;
for (int i = index; i < index + batch_size; ++i) {

[DON'T CARE]
As far as I understand, index is the position we start reading the batch from. Personally, I'd find it more convenient to pass the batch number, so the loop would look like
for (int i = index * batch_size; i < (index + 1) * batch_size; ++i)
But really, DON'T CARE; better to leave it as is, since it works.
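A minimal sketch of the batch-number variant, using a hypothetical helper `batch_range` that returns the half-open range `[begin, end)` of sample indices for batch number `batch_index`. Unlike the loop above, it clamps the last batch to the dataset size; whether the existing loader handles a partial final batch is not visible in this diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical helper: half-open index range [begin, end) of batch `batch_index`.
// The final batch is clamped, so it may be shorter than batch_size (or empty).
inline std::pair<std::size_t, std::size_t> batch_range(std::size_t batch_index,
                                                         std::size_t batch_size,
                                                         std::size_t dataset_size) {
    assert(batch_size > 0);
    const std::size_t begin = std::min(batch_index * batch_size, dataset_size);
    const std::size_t end = std::min(begin + batch_size, dataset_size);
    return {begin, end};
}
```

A consumer would then iterate `for (std::size_t i = r.first; i < r.second; ++i)`, which matches the loop proposed in the comment apart from the clamping.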

@MaxVorosh merged commit 52d4fa3 into main on Dec 13, 2023
@MaxVorosh deleted the task122_data_loader branch on December 13, 2023 19:00
lpetrov02 pushed a commit that referenced this pull request Dec 15, 2023
Data loader implementation
lpetrov02 pushed a commit that referenced this pull request Dec 18, 2023
Makes preparations for metrics logging on python
Functionality for c++ http added, but not working yet
Adds saving train metrics
Adds saving train metrics and responding with PNG

Adapts code for new 4D blob

cpprest CI support
Add load possibility for zip
Add load possibility for png on predict
-------
GRA-122: Data loader implementation (#67)
Data loader implementation
-------
ID-154: Loss type selection (#70)
* Add loss type selection
* Add loss type selection
* Remove layer-class loss
* Clean up Loss type
* Make format
-------
ID-171: Fix input selection (#69)
* Fix input selection
* Clean up fix input selection
-------
Change train and predict for zip file case
Starts fixing train
Fixes train with dataloader
It's not fucking working :( (x3)
server train fix
Fixes train and predict
lpetrov02 added a commit that referenced this pull request Dec 20, 2023
* Started migration from Data2dLayer to DataLayer

Makes preparations for metrics logging on python
Functionality for c++ http added, but not working yet
Adds saving train metrics
Adds saving train metrics and responding with PNG

Adapts code for new 4D blob

cpprest CI support
Add load possibility for zip
Add load possibility for png on predict
-------
GRA-122: Data loader implementation (#67)
Data loader implementation
-------
ID-154: Loss type selection (#70)
* Add loss type selection
* Add loss type selection
* Remove layer-class loss
* Clean up Loss type
* Make format
-------
ID-171: Fix input selection (#69)
* Fix input selection
* Clean up fix input selection
-------
Change train and predict for zip file case
Starts fixing train
Fixes train with dataloader
It's not fucking working :( (x3)
server train fix
Fixes train and predict

* Follow up review

* Follow up review

* ID-167: Upload zip (#74)

* Fixes graph tests

* Fixes DataLayer

---------

Co-authored-by: lpetrov02 <lpetrov02@mail.ru>
Co-authored-by: Artem Goldenberg <58527023+Artem-Goldenberg@users.noreply.github.com>
AntoxaBarin added a commit that referenced this pull request Dec 21, 2023
* Started migration from Data2dLayer to DataLayer

* Makes preparations for metrics logging on python
Functionality for c++ http added, but not working yet
Adds saving train metrics
Adds saving train metrics and responding with PNG

* Adapts code for new 4D blob

* cpprest CI support

* Add load possibility for zip

* Add load possibility for png on predict

* GRA-122: Data loader implementation (#67)

Data loader implementation

* ID-154: Loss type selection (#70)

* Add loss type selection

* Add loss type selection

* Remove layer-class loss

* Clean up Loss type

* Make format

* ID-171: Fix input selection (#69)

* Fix input selection

* Clean up fix input selection

* Change train and predict for zip file case

* Starts fixing train

* Fixes train with dataloader

* It's not fucking working :(

* It's not fucking working :(

* Input selection bug fix

* Started migration from Data2dLayer to DataLayer

Makes preparations for metrics logging on python
Functionality for c++ http added, but not working yet
Adds saving train metrics
Adds saving train metrics and responding with PNG

Adapts code for new 4D blob

cpprest CI support
Add load possibility for zip
Add load possibility for png on predict
-------
GRA-122: Data loader implementation (#67)
Data loader implementation
-------
ID-154: Loss type selection (#70)
* Add loss type selection
* Add loss type selection
* Remove layer-class loss
* Clean up Loss type
* Make format
-------
ID-171: Fix input selection (#69)
* Fix input selection
* Clean up fix input selection
-------
Change train and predict for zip file case
Starts fixing train
Fixes train with dataloader
It's not fucking working :( (x3)
server train fix
Fixes train and predict

* Follow up review

* Follow up review

* Make formats

* Fix bug (not deleting connection after deleting layer)

* Minor changes in cpp_server

* tests fix

* Fix order of addition inputs in layer

---------

Co-authored-by: lpetrov02 <lpetrov02@mail.ru>
Co-authored-by: Artem Goldenberg <st087953@student.spbu.ru>
Co-authored-by: MaxVorosh <ma_voroshilov@mail.ru>
Co-authored-by: Voroshilov Maksim <47945698+MaxVorosh@users.noreply.github.com>

2 participants