Gluon 2.0 Dataloader should support BERT training using GluonNLP

## Description
Currently, the 2.0 Dataloader cannot be used to train BERT. It is not flexible enough to support the data schema used by GluonNLP BERT. Specifically, when a nested list of variable-length NumPy arrays is passed in, construction of the dataset fails and throws NDArray conversion errors.

Here is a minimal reproducible example, which uses a data schema similar to the one used by the BERT pre-training script:

import mxnet as mx
import numpy as np
a = np.ndarray(shape=(128,))  # similar to one feature of one sequence
b = np.ndarray(shape=(19,))   # same feature for a shorter sequence
l1 = [a, b]                   # similar to one feature across all sequences
l2 = [a, b]
c = [l1, l2]                  # similar to the training instances that will be sampled from
ds = mx.gluon.data.ArrayDataset(*c)
dt = mx.gluon.data.DataLoader(dataset=ds, batch_size=1, num_workers=1, try_nopython=True)
print('ok')  # never reached: the error is raised before this line
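
The sketch below uses NumPy alone to illustrate why variable-length sequences are awkward to convert into a single array, and one possible workaround. It is an assumption about the nature of the failure, not a confirmed diagnosis of the MXNet code path, and `pad_to_max` is a hypothetical helper that is not part of MXNet or GluonNLP.

```python
import numpy as np

# Two sequences of different lengths, mirroring the shapes in the report.
a = np.zeros(128)
b = np.zeros(19)

# Ragged inputs cannot be stacked into one rectangular array.
try:
    np.stack([a, b])
    stacked_ok = True
except ValueError:
    stacked_ok = False


def pad_to_max(seqs, pad_value=0.0):
    """Hypothetical helper: right-pad 1-D arrays to a common length."""
    max_len = max(len(s) for s in seqs)
    out = np.full((len(seqs), max_len), pad_value, dtype=seqs[0].dtype)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s
    return out


# After padding, the feature forms a regular (num_sequences, max_len) array.
padded = pad_to_max([a, b])
```

Padding changes the data layout and usually requires tracking the valid lengths separately, so it may not be acceptable for the BERT pre-training pipeline; it is shown only to make the shape mismatch concrete.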

## References
https://github.com/apache/incubator-mxnet/pull/17841

Gluon 2.0 Dataloader should support BERT training using GluonNLP #18672