@ihsaan-ullah
Collaborator

@ihsaan-ullah ihsaan-ullah commented Feb 18, 2025

@ mention of reviewers

@Didayolo

A brief description of the purpose of the changes contained in this PR.

File sizes were not handled uniformly across the platform: in some places the size was stored in bytes and in others in KiB. This is now fixed: we store bytes in the DB and use KB/MB/GB instead of KiB/MiB/GiB. Different size formatters were also used in different places. Now we have a single formatter named pretty_bytes that is declared in both JavaScript and Python.
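
For readers who want a concrete picture of what such a formatter does, here is a minimal sketch. It is not the code from this PR: only the name pretty_bytes and the KB/MB/GB convention come from the description above. The `decimals` parameter, the unit list beyond GB, and the empty-string result for `None` are assumptions made for illustration.

```python
# Hypothetical sketch of a decimal-unit size formatter (1 KB = 1000 B).
# Only the name `pretty_bytes` comes from the PR description; the signature
# and edge-case behaviour here are assumptions.
UNITS = ("B", "KB", "MB", "GB", "TB", "PB")


def pretty_bytes(num_bytes, decimals=2):
    """Return a human-readable size such as '1.54 KB' for a byte count."""
    if num_bytes is None:
        return ""  # assumption: a missing size is rendered as an empty string
    size = float(num_bytes)
    index = 0
    # Divide by 1000 until the value fits the current unit or units run out.
    while abs(size) >= 1000 and index < len(UNITS) - 1:
        size /= 1000
        index += 1
    if index == 0:
        return f"{int(size)} {UNITS[0]}"  # plain bytes need no decimals
    return f"{size:.{decimals}f} {UNITS[index]}"


print(pretty_bytes(1536))           # a size just above 1 KB
print(pretty_bytes(5_000_000_000))  # a size in the GB range
```

The sketch puts a space between the number and the unit and divides by 1000 rather than 1024, which is the difference between KB and KiB.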

Issues this PR resolves

Important Note

I have left some size unit conversions untouched in the following files because it is unclear to me what these files do. I cannot see any data in the analytics to compare against the code, but once analytics start working I will check them.

Important Todos for deployment:

We have some critical changes here, so before deployment we should run the following 3 blocks of code to get the last IDs of Data, Submission and SubmissionDetails.

# Get the maximum ID for Data
from datasets.models import Data
latest_id_data = Data.objects.latest('id').id
print("Data Last ID: ", latest_id_data)
# Get the maximum ID for Submission
from competitions.models import Submission
latest_id_submission = Submission.objects.latest('id').id
print("Submission Last ID: ", latest_id_submission)
# Get the maximum ID for Submission Detail
from competitions.models import SubmissionDetails
latest_id_submission_detail = SubmissionDetails.objects.latest('id').id
print("SubmissionDetail Last ID: ", latest_id_submission_detail)

After we have the latest IDs, we should deploy and run the 3 blocks of code below to fix the sizes, i.e. to convert all KiB values to bytes so that everything is consistent. For new files uploaded after the deployment, the sizes will be saved in bytes automatically, which is why the following code must be run for older files only.

# Run the conversion only for records with id <= latest_id
from datasets.models import Data
for data in Data.objects.filter(id__lte=latest_id_data):
    if data.file_size:
        data.file_size = data.file_size * 1024  # Convert from KiB to bytes
        data.save()
# Run the conversion only for records with id <= latest_id
from competitions.models import Submission
for sub in Submission.objects.filter(id__lte=latest_id_submission):
    updated = False  # Track if any field is updated

    if sub.prediction_result_file_size:
        sub.prediction_result_file_size = sub.prediction_result_file_size * 1024  # Convert from KiB to bytes
        updated = True
    
    if sub.scoring_result_file_size:
        sub.scoring_result_file_size = sub.scoring_result_file_size * 1024  # Convert from KiB to bytes
        updated = True

    if sub.detailed_result_file_size:
        sub.detailed_result_file_size = sub.detailed_result_file_size * 1024  # Convert from KiB to bytes
        updated = True
    
    if updated:
        sub.save()
# Run the conversion only for records with id <= latest_id
from competitions.models import SubmissionDetails
for sub_det in SubmissionDetails.objects.filter(id__lte=latest_id_submission_detail):
    if sub_det.file_size:
        sub_det.file_size = sub_det.file_size * 1024  # Convert from KiB to bytes
        sub_det.save()
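
The cutoff logic used in the three blocks above can be illustrated without a database. The following is a sketch on plain dictionaries, not on the Django models: the function name `convert_kib_to_bytes`, the record layout and the sample values are all hypothetical and exist only to show how the cutoff ID and the empty-value check behave.

```python
from decimal import Decimal

KIB = 1024  # bytes per KiB


def convert_kib_to_bytes(records, cutoff_id, fields=("file_size",)):
    """Multiply size fields by 1024 for records with id <= cutoff_id.

    Hypothetical stand-in for the model loops above; returns the number
    of records that were changed.
    """
    changed = 0
    for record in records:
        if record["id"] > cutoff_id:
            continue  # created after deployment: already stored in bytes
        updated = False
        for field in fields:
            value = record.get(field)
            if value:  # skip None and 0, as the `if obj.file_size:` checks do
                record[field] = value * KIB
                updated = True
        if updated:
            changed += 1
    return changed


# Made-up sample data: ids 1 and 2 predate the deployment, id 3 does not.
records = [
    {"id": 1, "file_size": Decimal("2.50")},  # old record, size in KiB
    {"id": 2, "file_size": None},             # old record without a size
    {"id": 3, "file_size": Decimal("4096")},  # new record, size in bytes
]
changed = convert_kib_to_bytes(records, cutoff_id=2)
print(changed, records)
```

Calling the function a second time with the same cutoff would multiply the old records by 1024 again, so the conversion has to be run exactly once.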

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCi tests are passing
  • Ready to merge

@ihsaan-ullah ihsaan-ullah mentioned this pull request Feb 18, 2025
5 tasks
@Didayolo Didayolo linked an issue Mar 20, 2025 that may be closed by this pull request
@Didayolo
Member

Didayolo commented Mar 21, 2025

@ihsaan-ullah

For the manual intervention, I replaced:

  • <latest_data_id> by latest_id_data
  • <latest_sub_id> by latest_id_submission
  • <latest_sub_det_id> by latest_id_submission_detail

Is that correct?


The manual intervention raises this error:

    if sub.file_size:
AttributeError: 'Submission' object has no attribute 'file_size'

Running makemigrations generated one more migration file for me, but the problem persists even when this migration is included:

# Generated by Django 2.2.28 on 2025-03-21 13:41

from django.db import migrations, models


class Migration(migrations.Migration):

    dependencies = [
        ('competitions', '0053_auto_20250218_1151'),
    ]

    operations = [
        migrations.AlterField(
            model_name='submissiondetails',
            name='file_size',
            field=models.DecimalField(blank=True, decimal_places=2, max_digits=15, null=True),
        ),
    ]
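
As a side note on the field definition in this migration: in Django's DecimalField, max_digits is the total number of digits and decimal_places is how many of them follow the decimal point. The short calculation below works out the largest value such a column can hold when it stores bytes. It is plain arithmetic on the two numbers from the migration; the variable names are illustrative.

```python
from decimal import Decimal

# Values taken from the AlterField operation above.
max_digits = 15
decimal_places = 2

# Digits left for the integer part of the stored byte count.
integer_digits = max_digits - decimal_places

# Largest value the column can hold.
largest = Decimal("9" * integer_digits + "." + "9" * decimal_places)

# The same value expressed in decimal terabytes (1 TB = 1000**4 bytes).
largest_tb = largest / Decimal(1000) ** 4

print(integer_digits, largest, largest_tb)
```

With sizes stored in bytes, the column can therefore hold file sizes up to just under 10 TB.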

@Didayolo Didayolo self-assigned this Mar 21, 2025
@ihsaan-ullah
Collaborator Author

For the manual intervention, I replaced:

  • <latest_data_id> by latest_id_data
  • <latest_sub_id> by latest_id_submission
  • <latest_sub_det_id> by latest_id_submission_detail

Is that correct?

Yes, I have updated the code in the main comment to use the right variable names.

@ihsaan-ullah
Collaborator Author

@Didayolo I have

  • fixed the manual intervention code
  • added the missing migration
  • resolved merge conflicts.

To deploy this PR we have to stop users from using the platform. I think @ObadaS knows how to do it.

@Didayolo
Member

@ihsaan-ullah

Great, thanks.

Not sure if it is related to this PR (or the other?), but the quota is written "B" instead of "GB":

[Screenshot, 2025-03-24 at 22:34:17]

@Didayolo Didayolo merged commit 68a1e08 into develop Mar 24, 2025
1 check passed
@Didayolo Didayolo deleted the sizes_bytes branch March 24, 2025 21:35
curious-broccoli added a commit to curious-broccoli/codabench that referenced this pull request May 27, 2025
* show server error instead of frontend parsing error

* show back button in edit only. Show help in create only

* do not allow special chars in usernames

* Enable/Disable competition forum (codalab#1774)

* forum enable/disable functionality added

* new forum_enabled field added to competition dump data

* Email in lowercase (codalab#1769)

* on signup email stored in lowercase letters. Whitelist emails converted to lowercase

* whitespace removed

* convert email to lowercased during login

* latest competition fields added to dump (codalab#1786)

Co-authored-by: Adrien Pavão <adrien.pavao@gmail.com>

* User quota is updated to GB from Bytes (codalab#1749)

* user assigned quota will now be in GB instead of bytes

* unused counter removed

* File Sizes cleanup (codalab#1752)

* sizes KiB to Bytes, size formatting functions cleanup

* space between size and unit, removed factor multiplication from size calculation

* file_size to bytes in SubmissionDetail

* added missing migration

* migration conflict resolved

* resetting file sizes task removed

* Resource interface: quota unit is now GB instead of B

* Create CODE_OF_CONDUCT.md

* Added log rolling to limit log size to 5GB for now, can be changed

* Maintenance mode option added in Caddyfile, used by creating a maintenance.on file in the maintenance_mode/ directory (codalab#1799)

* Maintenance mode option added in Caddyfile, used by creating a maintenance.on file in the maintenance_mode/ directory

* Add offline.png

* Fixed image not loading

---------

Co-authored-by: Obada Haddad <obada.haddad@lisn.fr>
Co-authored-by: didayolo <adrien.pavao@gmail.com>

* Update version.json for release 1.18.0

* Add remove button for cancelled submissions (codalab#1808)

* Add remove button for cancelled submissions

* Allow remove of cancelled submissions

* Update compute_worker.py

* Add permissions check for bulk download

* flake8 fix

* Add hide_score_output option (codalab#1838)

* Add hide_score_output option

* Update test

* Add the options for v1 bundles

* Make more generic tests (v1, v2)

* version update workflow removed

* Add hide_prediction_output feature

* Calendar lock fixed, additional check added for start and end date

* Simplify code

* Version bump

* Removed time and updated date to today

---------

Co-authored-by: Ihsan Ullah <ihsan2131@gmail.com>
Co-authored-by: Adrien Pavão <adrien.pavao@gmail.com>
Co-authored-by: Obada Haddad <obada.haddad@lisn.fr>
Co-authored-by: Obada Haddad-Soussac <11889208+ObadaS@users.noreply.github.com>
Co-authored-by: GitHub Actions <actions@github.com>
Development

Successfully merging this pull request may close these issues.

Resource tab "Size" values are NaN KB

3 participants