Skip to content

Conversation

@bbearce
Copy link
Collaborator

@bbearce bbearce commented Sep 12, 2023

Limit CPU usage on the default queue

@Didayolo

A brief description of the purpose of the changes contained in this PR.

Limit duration of submissions (DONE): We should add an option in the .env file specifying a maximum execution time limit for jobs in the default queue (main queue of the platform).

Limit submissions (DONE): This way, the main queue could not get blocked by long jobs running in parallel. This would incentivize organizers to use their own queue if they want to make more ambitious benchmarks.

Issues this PR resolves

A checklist for hand testing

  • create a queue and get it working and attached to competition (attached below) or custom comp you already have.

  • default queue:

    • MAX_EXECUTION_TIME_LIMIT=20 in .env file;
    • time.sleep(15) in score.py in attached competition or other comp;
    • set phase execution_time_limit to 10 seconds
    • submission should fail as 15 seconds is longer than 10 sec limit and below default queue limit
  • default queue:

    • MAX_EXECUTION_TIME_LIMIT=20 in .env file;
    • time.sleep(15) in score.py in attached competition or other comp;
    • set phase execution_time_limit to 35 seconds
    • submission should pass as 15 seconds is less than 35 sec limit and below default queue limit
  • default queue:

    • MAX_EXECUTION_TIME_LIMIT=20 in .env file;
    • time.sleep(25) in score.py in attached competition or other comp;
    • set phase execution_time_limit to 35 seconds
    • submission should fail as 25 seconds is longer than 20 sec default queue limit regardless of execution_time_imit
  • custom queue:

    • MAX_EXECUTION_TIME_LIMIT=20 in .env file;
    • time.sleep(15) in score.py in attached competition or other comp;
    • set phase execution_time_limit to 10 seconds
    • submission should fail as 15 seconds is longer than 10 sec execution_time_imit
  • custom queue:

    • MAX_EXECUTION_TIME_LIMIT=20 in .env file;
    • time.sleep(15) in score.py in attached competition or other comp;
    • set phase execution_time_limit to 35 seconds
    • submission should pass as 15 seconds is less than 35 sec execution_time_imit
  • custom queue:

    • MAX_EXECUTION_TIME_LIMIT=20 in .env file;
    • time.sleep(25) in score.py in attached competition or other comp;
    • set phase execution_time_limit to 35 seconds
    • submission should pass as 25 seconds is less than 35 sec execution_time_imit

Any relevant files for testing

template_v2_competition_bundle.zip

Checklist

  • Code review by me
  • Hand tested by me
  • I'm proud of my work
  • Code review by reviewer
  • Hand tested by reviewer
  • CircleCi tests are passing
  • Ready to merge

@Didayolo
Copy link
Member

Didayolo commented Sep 12, 2023

If I understand, it is here that you ignore the platform time limit if we are not using the default queue?

[tasks.py]
    if submission.phase.competition.queue:
        submission.queue_name = submission.phase.competition.queue.name or ''
        run_args['execution_time_limit'] = submission.phase.execution_time_limit
        submission.save()

I added some comments to clarify this according to my view point, correct them if my wrong.

Also, here, 600 is hard-coded and not read from the environment variable, so it may be wrong:

[_phases.tag]
Execution Time Limit (seconds)<span data-tooltip="600s if unset, 600s max with default queue."

@bbearce
Copy link
Collaborator Author

bbearce commented Sep 12, 2023

Here "submission.phase.competition.queue" is None or empty if using the default queue so you never enter this block. If you have a custom queue, it shows up in "submission.phase.competition.queue" so we know we have a custom queue. Then we take whatever was specified as phase no matter what (custom queue means you can do waht you like).

[tasks.py] - line 188
if submission.phase.competition.queue:
submission.queue_name = submission.phase.competition.queue.name or ''
run_args['execution_time_limit'] = submission.phase.execution_time_limit
submission.save()

Look here which is above a bit:
[tasks.py] - line 105
def _send_to_compute_worker(submission, is_scoring):
run_args = {
"user_pk": submission.owner.pk,
"submissions_api_url": settings.SUBMISSIONS_API_URL,
"secret": submission.secret,
"docker_image": submission.phase.competition.docker_image,
"execution_time_limit": submission.phase.execution_time_limit,
"execution_time_limit": min(MAX_EXECUTION_TIME_LIMIT, submission.phase.execution_time_limit),
"id": submission.pk,
"is_scoring": is_scoring,
}

This used to be:
"execution_time_limit": submission.phase.execution_time_limit,
but now is:
"execution_time_limit": min(MAX_EXECUTION_TIME_LIMIT, submission.phase.execution_time_limit),

It used to be the case that the phase controls the time but now it can only, if it is less than or equal to MAX_EXECUTION_TIME_LIMIT. In the case where there is a custom queue this will get overridden and set to submission.phase.execution_time_limit which participants control via the phase.

(Adriene) Also, here, 600 is hard-coded and not read from the environment variable, so it may be wrong..
(Ben) Do you mean this:
MAX_EXECUTION_TIME_LIMIT = int(os.environ.get('MAX_EXECUTION_TIME_LIMIT', 600)) # time limit of the default queue

This code above will grab the env variable from .env and if if cannot find it, will default to 600. This will be overridden by a variable though if present. I chose 600 to sync with the UI default in phase settings.

If you meant the below code or the UI, yes right now it is always 600 to begin with unless you edit it. This doesn't affect the functionality as even if it says 600 in the UI, MAX_EXECUTION_TIME_LIMIT will override this value if it is smaller. Is that the desired functionality?

[_phases.tag]
Execution Time Limit (seconds)<span data-tooltip="600s if unset, 600s max with default queue."

@Didayolo
Copy link
Member

If you meant the below code or the UI, yes right now it is always 600 to begin with unless you edit it

Yes, I meant that the UI is not aligned with the actual value of the parameter.

@ihsaan-ullah
Copy link
Collaborator

This is a guide which shows how to use redis with python
https://redis.io/docs/clients/python/

I think it should be discussed first and if decided then implemented in another PR as it is a big feature.

@Didayolo Didayolo linked an issue Sep 18, 2023 that may be closed by this pull request
3 tasks
@Didayolo Didayolo changed the title Issue 1141 cpu lmit default queue CPU limit on default queue Sep 18, 2023
@bbearce
Copy link
Collaborator Author

bbearce commented Sep 19, 2023

All, there is a way to flow up the env variables through a state variable already being used so I caught a ride on the existing process to flow this info up.

@Didayolo
Copy link
Member

Didayolo commented Feb 9, 2024

@bbearce I have a doubt: is this PR ready to be reviewed? I think this feature is very important. It may have avoided the congestion we experienced recently.

On a side note, limiting the number of daily submissions per user for a competition when using the default queue could be interesting too.

@Didayolo
Copy link
Member

@bbearce Can you confirm if this PR is ready or not? Thank you.

@Didayolo Didayolo self-assigned this Mar 21, 2024
@Didayolo Didayolo merged commit 34bbf0b into develop Apr 16, 2024
@Didayolo Didayolo deleted the issue_1141_cpu_lmit_default_queue branch April 16, 2024 13:53
@Didayolo
Copy link
Member

Hi @bbearce,

Do you think we could add more logs in addition to "Execution time limit exceeded" specifically for the case where we are using the default queue?

Something like "This competition is using the public queue, on which execution time is limited to XXX seconds."

Because it can be confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Limit CPU usage on the default queue

4 participants