RuntimeError when running train.py #1

@lrizzello

Hello,

First of all, thank you for sharing your code. Unfortunately, I'm running into an exception when trying to run the train.py script. The exception is the following:

Traceback (most recent call last):
  File "dynamic_classification/train.py", line 197, in <module>
    train(args)
  File "dynamic_classification/train.py", line 90, in train
    pred = model(query, support, support_label)
  File "/home/usr-lin-ai/anaconda3/envs/dedupe_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/usr-lin-ai/Deduplication/PythonDeduper/dynamic_classification/model.py", line 298, in forward
    query_embeddings = self.embedding_dropout(self.embedding(query))
  File "/home/usr-lin-ai/anaconda3/envs/dedupe_torch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/usr-lin-ai/anaconda3/envs/dedupe_torch/lib/python3.7/site-packages/torch/nn/modules/sparse.py", line 114, in forward
    self.norm_type, self.scale_grad_by_freq, self.sparse)
  File "/home/usr-lin-ai/anaconda3/envs/dedupe_torch/lib/python3.7/site-packages/torch/nn/functional.py", line 1484, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select

I have tried this with both PyTorch 1.4 and 1.3 and with different datasets, and I get the same exception every time. I use cuDNN 7.6.5 and cudatoolkit 10.1.243. I have not modified the script in any way.
I use the same kind of conda environment for many other projects that use PyTorch or TensorFlow, and they can access my GPU successfully, so I don't think the problem comes from my setup.

I have tried fixing this but can't find the cause; I get the feeling that it comes from deeper within the code. Could you provide some insight into this problem, please?
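For reference, the message points to a device mismatch inside the embedding lookup: the embedding weights and the index tensor passed to them do not live on the same device type. The sketch below is not taken from the repository; it only illustrates, with a stand-in nn.Embedding layer and a made-up query batch, how such a mismatch can be checked and avoided. The helper names module_devices and devices_match and the tensor sizes are hypothetical, and which side (model or input) is on the wrong device in train.py is still unknown to me.

```python
import torch
import torch.nn as nn


def module_devices(module: nn.Module) -> set:
    """Return the set of device types that hold the module's parameters."""
    return {p.device.type for p in module.parameters()}


def devices_match(module: nn.Module, *tensors: torch.Tensor) -> bool:
    """True when every parameter and every input tensor share one device type."""
    found = module_devices(module) | {t.device.type for t in tensors}
    return len(found) == 1


# Pick the GPU when one is visible, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins for the model's embedding layer and a query batch
# (hypothetical sizes, not the repository's real ones).
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)
query = torch.tensor([[1, 2, 3], [4, 5, 6]])

# Move both sides explicitly so the lookup sees a single device.
embedding = embedding.to(device)
query = query.to(device)

# Fails loudly before the forward pass if the devices differ.
assert devices_match(embedding, query)
out = embedding(query)  # one embedding vector per index in the batch
```

Calling a check like devices_match just before the failing model(query, support, support_label) line should show whether the model's parameters or the input batches are the ones left on the other device.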

You can find an example of a toy dataset I tried here
