
naturalsize np.int32 multiplication overflow #217

@Toprak2

What did you do?

I was using the torchio library, which relies on humanize to format the memory size of image arrays as human-readable strings. The images I processed had dimensions 512x512x166, with each pixel stored as a 32-bit (4-byte) integer.

What did you expect to happen?

torchio calls the naturalsize function with the occupied memory size in bytes and binary=True. Calculating the expected value by hand:
512 × 512 × 166 × 4 = 174,063,616 bytes
174,063,616 ÷ (1024 × 1024) = 166 MiB

So, I expected the function to return approximately 166 MiB.
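As a sanity check on the arithmetic above, the sketch below recomputes the expected size in plain Python. It also confirms that the byte count itself fits in a signed 32-bit integer, so an np.int32 can hold the input, and the value that overflows must be an intermediate one inside naturalsize. The variable names are illustrative only.

```python
# Image dimensions and element size from the report: 512 x 512 x 166, 4 bytes each
shape = (512, 512, 166)
bytes_per_pixel = 4

total_bytes = shape[0] * shape[1] * shape[2] * bytes_per_pixel
mib = total_bytes / (1024 * 1024)  # binary megabytes: 1 MiB = 2**20 bytes

INT32_MAX = 2**31 - 1  # largest value an np.int32 can represent

print(total_bytes, mib, total_bytes <= INT32_MAX)  # 174063616 166.0 True
```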

What actually happened?

Instead, the function returned -2.0 MiB, along with this warning:

```
RuntimeWarning: overflow encountered in scalar multiply
  ret: str = format % ((base * bytes_ / unit)) + s
```

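This overflow occurs because the input to the function was an np.int32 instead of Python's native int. The input itself fits in an np.int32 (174,063,616 is well below the maximum of 2^31 − 1 = 2,147,483,647), but the intermediate product base * bytes_ = 1024 × 174,063,616 = 178,241,142,784 does not, so the multiplication overflows. The product wraps around to −2^31, and dividing that by unit (here 1024^3 = 2^30) gives exactly −2.0.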
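The wraparound can be reproduced without NumPy. The sketch below uses a hypothetical helper, to_int32, that emulates signed 32-bit two's-complement wrapping on Python's unbounded ints. It assumes naturalsize uses unit = 1024**3 when it selects the MiB suffix, which is consistent with the −2.0 MiB result reported above.

```python
def to_int32(x: int) -> int:
    """Hypothetical helper: reduce x to the signed 32-bit range, as a wrapping int32 multiply would."""
    return (x + 2**31) % 2**32 - 2**31


base = 1024                   # binary=True
unit = base ** 3              # assumed unit for the MiB range
bytes_ = 512 * 512 * 166 * 4  # 174,063,616 bytes; fits in int32

exact = base * bytes_         # 178,241,142,784 with Python's arbitrary-precision ints
wrapped = to_int32(exact)     # what survives a 32-bit multiply

print(exact / unit)    # 166.0
print(wrapped)         # -2147483648
print(wrapped / unit)  # -2.0
```

Because 178,241,142,784 is exactly 41.5 × 2^32, its low 32 bits are 2^31, which a signed 32-bit integer reads as −2^31.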

Steps to Reproduce

```python
import humanize
import numpy as np

print(humanize.naturalsize(512 * 512 * 166 * 4, binary=True))
# Expected: 166.0 MiB
# Works as expected with Python's built-in int type

print(humanize.naturalsize(np.int32(512 * 512 * 166 * 4), binary=True))
# Returns: -2.0 MiB
# RuntimeWarning: overflow encountered in scalar multiply
#   ret: str = format % ((base * bytes_ / unit)) + s
```

Proposed Solutions

1. Change the Order of Operations

   Reordering the arithmetic avoids the overflow. Consider the current line:

   ```python
   ret: str = format % ((base * bytes_ / unit)) + s
   ```

   When bytes_ is an np.int32, base * bytes_ produces an np.int32 result, which overflows before it is divided by unit. Dividing either bytes_ or base by unit first means the true division produces a float, so the following multiplication happens in floating point and cannot wrap around:

   ```python
   ret: str = format % ((base * (bytes_ / unit))) + s
   # or
   ret: str = format % ((base / unit * bytes_)) + s
   ```
2. Convert Input to Float

   Alternatively, cast value to float unconditionally instead of checking its type. Currently, the cast applies only when value is a string:

   ```python
   # Current approach
   if isinstance(value, str):
       bytes_ = float(value)
   else:
       bytes_ = value
   ```

   Casting every input to float would resolve the issue, because base * bytes_ would then always be computed in floating point:

   ```python
   bytes_ = float(value)
   ```
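The sketch below compares the current expression with both proposals on the value from the report. It requires NumPy and assumes unit = 1024**3 for the MiB range. How NumPy promotes a plain Python int that meets an np.int32 scalar can depend on the NumPy version and platform, so in the current expression base is explicitly wrapped in np.int32 to force the 32-bit multiply seen in the report.

```python
import warnings

import numpy as np

base = 1024                             # binary=True
unit = base ** 3                        # assumed unit for the MiB range
bytes_ = np.int32(512 * 512 * 166 * 4)  # input type reported in the issue

# Current expression, with base pinned to int32 so the multiply is 32-bit
# regardless of how this NumPy version promotes Python ints.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", RuntimeWarning)  # hide the overflow warning
    current = np.int32(base) * bytes_ / unit

option_1a = base * (bytes_ / unit)      # true division first -> float64
option_1b = base / unit * bytes_        # Python float times int32 -> float64
option_2 = base * float(bytes_) / unit  # input cast to a Python float up front

print(current, option_1a, option_1b, option_2)  # -2.0 166.0 166.0 166.0
```

All three rewrites give 166.0 here. Option 1 confines the change to the formatting line, while option 2 normalizes the input once, so every later use of bytes_ in the function sees a Python float.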

I haven't created a pull request because I'm unsure whether you'd prefer callers to ensure their inputs are compatible (for example, by passing a Python int) or to handle this within the function.
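If the fix lands on the caller side instead, a small conversion step before calling naturalsize is enough. The sketch below is a hypothetical helper (as_builtin_number is not part of humanize or torchio). It relies on the numbers abstract base classes: NumPy registers its integer scalars with numbers.Integral and its floating scalars with numbers.Real, so an np.int32 value comes back as a Python int.

```python
import numbers
from fractions import Fraction


def as_builtin_number(value):
    """Hypothetical caller-side helper: convert numeric scalars to Python builtins.

    NumPy registers np.integer with numbers.Integral and np.floating with
    numbers.Real, so NumPy scalars such as np.int32 are converted here.
    Strings and other values pass through unchanged.
    """
    if isinstance(value, numbers.Integral):
        return int(value)  # arbitrary-precision Python int, cannot overflow
    if isinstance(value, numbers.Real):
        return float(value)
    return value


print(as_builtin_number(Fraction(3, 2)))  # 1.5
```

A caller would then write something like humanize.naturalsize(as_builtin_number(size), binary=True), where size is the byte count being formatted.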

Environment

OS: Windows 11
Python: 3.12.1
Humanize: 4.11.0
NumPy: 1.26.3

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests