UCX ERROR Failed to allocate memory pool #11

@orensg1

### What is the problem?

The default max locked memory limit (`ulimit -l`, 8192 KB) is too low for UCX's memory pools.
When running PyTorch and allocating large amounts of memory (> 8 MB), the following error appears:

```
[1740171758.025024] [gpu6:1025296:0]           mpool.c:269  UCX  ERROR Failed to allocate memory pool (name=ud_recv_skb) chunk: Input/output error
[1740171758.025504] [gpu6:1025296:0]           ib_md.c:282  UCX  ERROR ibv_reg_mr(address=0x7ffdd8600000, length=20971520, access=0x10000f) failed: Cannot allocate memory : Please set max locked memory (ulimit -l) to 'unlimited' (current: 8192 kbytes)
```
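The failing registration is 20971520 bytes, i.e. 20480 KB, which is well above the 8192 KB limit reported in the message. A minimal sketch, assuming bash, that does this arithmetic and compares a region size against the current soft limit. The helper names `kb_needed` and `fits_memlock` are hypothetical, and the check is simplified: the memlock limit applies to the total memory a process pins, not to a single registration.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration only (not part of UCX).

# Round a byte count up to whole KB; ulimit -l reports 1024-byte units.
kb_needed() {
  local bytes=$1
  echo $(( (bytes + 1023) / 1024 ))
}

# Succeed if $1 bytes fit under the memlock limit $2 (KB or "unlimited").
# Simplification: checks one region, while the real limit is per process total.
fits_memlock() {
  local bytes=$1
  local limit_kb=$2
  if [ "$limit_kb" = "unlimited" ]; then
    return 0
  fi
  [ "$(kb_needed "$bytes")" -le "$limit_kb" ]
}

# Length taken from the ibv_reg_mr error, checked against the current limit.
len=20971520
limit=$(ulimit -l)
if fits_memlock "$len" "$limit"; then
  echo "fits: $(kb_needed "$len") KB <= $limit"
else
  echo "too small: need $(kb_needed "$len") KB, limit is $limit KB"
fi
```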

### Steps to Reproduce

```console
$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 3092588
max locked memory           (kbytes, -l) 8192
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 513293
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

$ ulimit -l unlimited

$ ulimit -a
real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) 0
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 3092588
max locked memory           (kbytes, -l) unlimited
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 513293
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited
```
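Without `-H` or `-S`, bash's `ulimit -l unlimited` sets both the soft and hard limits, and an unprivileged user cannot raise a hard limit. A minimal sketch, assuming bash, of a pre-launch step that raises only the soft limit as far as the existing hard limit allows. The launch line at the end is a hypothetical placeholder.

```shell
#!/usr/bin/env bash
# Raise the soft "max locked memory" limit for this shell and its children
# before starting the job. Raising soft up to hard needs no privilege.

hard=$(ulimit -H -l)   # hard limit: "unlimited" or a value in KB
soft=$(ulimit -S -l)   # current soft limit
echo "memlock soft=$soft, hard=$hard"

if [ "$soft" != "$hard" ]; then
  # Set the soft limit equal to the hard limit.
  if ulimit -S -l "$hard"; then
    echo "raised soft memlock limit to $(ulimit -S -l)"
  else
    echo "could not raise memlock limit" >&2
  fi
fi

# Hypothetical launch command; replace with the actual PyTorch script.
# python train.py
```

If the hard limit itself is 8192 KB, this step cannot help, and the hard limit has to be raised by an administrator.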

### Expected Results

See above

### Additional information

See above

Labels: error (Something isn't working)