perf(mcq): improve computational resource management with batching and lazy model loading #552
Conversation
📝 Walkthrough
Adds a centralized, thread-safe ModelManager for lazy, device-aware model loading/unloading and caching; implements memory-aware batching, monitoring, and cleanup in MCQ generation; converts eager global instantiation in server routes to concurrency-safe lazy getters; and adds the psutil dependency.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Client
    participant Server
    participant Generator
    participant ModelManager
    participant Cache as "Model Cache"
    participant Device
    participant Memory
    Client->>Server: request (e.g., generate_mcq)
    Server->>Generator: obtain generator (get_..._generator) and invoke generation
    Generator->>ModelManager: load_model(model_key, pretrained)
    alt model cached
        ModelManager->>Cache: retrieve(model_key)
        Cache-->>ModelManager: model
    else not cached
        ModelManager->>Memory: get_memory_usage()
        alt sufficient memory
            ModelManager->>Device: select_device()
            Device-->>ModelManager: device
            ModelManager->>ModelManager: instantiate model on device
            ModelManager->>Cache: store(model_key, model)
        else insufficient memory
            ModelManager->>Memory: cleanup_memory()
            Memory->>Memory: gc.collect(), cuda.empty_cache()
            ModelManager->>Device: select_device()
            ModelManager->>ModelManager: instantiate model on device
            ModelManager->>Cache: store(model_key, model)
        end
    end
    ModelManager-->>Generator: return model
    Generator->>Generator: run generation (batching, tokenize, model.generate, decode)
    Generator-->>Server: results
    Server->>Memory: cleanup_memory()
    Memory->>Memory: gc.collect(), cuda.empty_cache()
    Server-->>Client: response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
backend/server.py (2)
236-244: ⚠️ Potential issue | 🔴 Critical
`/get_boolean_answer` is passing the wrong payload shape.
`AnswerPredictor.predict_boolean_answer()` expects the full `input_question` list, but this route passes one string at a time. The predictor then iterates over the characters of that string, returns a non-empty list, and `if (qa_response)` makes almost every non-empty question come back `"True"`.
🛠️ Minimal fix
```diff
-    answer_pred = get_answer_predictor()
-    for question in input_questions:
-        qa_response = answer_pred.predict_boolean_answer(
-            {"input_text": input_text, "input_question": question}
-        )
-        if(qa_response):
-            output.append("True")
-        else:
-            output.append("False")
+    answers = get_answer_predictor().predict_boolean_answer(
+        {"input_text": input_text, "input_question": input_questions}
+    )
+    output = ["True" if answer else "False" for answer in answers]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 236 - 244, The route is calling AnswerPredictor.predict_boolean_answer with a single question string, causing the predictor to iterate characters; instead pass the whole input_questions list and handle the returned per-question results. Call answer_pred.predict_boolean_answer with payload {"input_text": input_text, "input_question": input_questions}, assign its result (qa_response) which should be a list of booleans, then iterate over qa_response to append "True"/"False" to output; adjust references to input_questions, input_text, predict_boolean_answer, qa_response and output accordingly.
480-486: ⚠️ Potential issue | 🔴 Critical
`/get_boolq_hard` is wired to an unsupported `answer_style`.
`QuestionGenerator.generate_qg_inputs()` only accepts `"all"`, `"sentences"`, and `"multiple_choice"`. Passing `"true_false"` forces the exception path in `QuestionGenerator.generate()` and this endpoint silently returns `[]`. Either add true/false support end-to-end or route this endpoint through `BoolQGenerator`.
QuestionGenerator.generate_qg_inputs()only accepts"all","sentences", and"multiple_choice". Passing"true_false"forces the exception path inQuestionGenerator.generate()and this endpoint silently returns[]. Either add true/false support end-to-end or route this endpoint throughBoolQGenerator.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 480 - 486, The /get_boolq_hard endpoint is passing answer_style="true_false" to get_question_generator().generate(), but QuestionGenerator.generate_qg_inputs() only supports "all", "sentences", and "multiple_choice", causing the exception path and an empty response; fix by routing this endpoint to the BoolQGenerator (e.g., instantiate/use BoolQGenerator instead of generic QuestionGenerator) or extend QuestionGenerator.generate_qg_inputs() and QuestionGenerator.generate() to accept and handle "true_false" end-to-end (implement input conversion and output formatting for true/false questions), and update the call site in backend.server (where get_question_generator() is used) to use the appropriate generator.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/Generator/main.py`:
- Around line 263-280: The branches that currently return an empty dict
(final_output = {}) break the expected response schema; instead ensure
final_output is initialized with the standard envelope keys (at minimum
"questions": [] for both MCQ and ShortQ) and for MCQ also include a "time_taken"
numeric value (e.g., 0 or measured duration) before any early return; update
both the "no keywords" path and the exception catch around
generate_multiple_choice_questions (and the analogous branch at lines 414-418)
to return final_output with these keys rather than {} so downstream code can
safely access output["questions"].
- Around line 693-700: The NLI model is loaded onto self.device but
predict_boolean_answer sends CPU tensors to it; fix by moving all input tensors
to the model device before inference. In predict_boolean_answer (and the other
similar block referenced), take the tokenized inputs/torch tensors produced for
the NLI call and do something like: inputs = {k: v.to(self.device) for k, v in
inputs.items()} (or call .to(self.device) on each tensor) immediately before
calling self.nli_model(**inputs), ensuring device consistency with the loaded
model.
- Around line 39-104: The load_model cache access is unsynchronized causing
duplicate concurrent loads; add synchronization: in __init__ create self._lock =
threading.Lock() and self._model_locks = {} (or defaultdict), then in load_model
first use self._lock to check if model_name in self._models and to create/get a
per-model lock (e.g., lock = self._model_locks.setdefault(model_name,
threading.Lock())); then acquire that per-model lock, re-check self._models for
model_name, perform the actual loading (the existing body that constructs the
model) only if still missing, store it into self._models, and finally release
the per-model lock so concurrent callers wait for the single load instead of
duplicating work.
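The per-model locking pattern the comment describes can be sketched as follows. This is an illustrative standalone version, not the repository's actual `ModelManager`; the `loader` callable and the exact attribute names are assumptions for demonstration:

```python
import threading

class ModelManager:
    """Sketch of a model cache with a main lock plus per-model load locks."""

    def __init__(self, loader):
        self._models = {}                  # cache: model_name -> model
        self._lock = threading.Lock()      # guards the cache and the lock table
        self._model_locks = {}             # one lock per model name
        self._loader = loader              # stand-in for real model construction

    def load_model(self, model_name, *args, **kwargs):
        # Fast path: return a cached model without loading.
        with self._lock:
            if model_name in self._models:
                return self._models[model_name]
            model_lock = self._model_locks.setdefault(model_name, threading.Lock())

        # Slow path: serialize loading of this one model so concurrent
        # callers wait for a single load instead of duplicating work.
        with model_lock:
            with self._lock:
                if model_name in self._models:   # another thread won the race
                    return self._models[model_name]
            model = self._loader(model_name, *args, **kwargs)
            with self._lock:                     # publish under the main lock
                self._models[model_name] = model
            return model
```

Because the expensive load happens outside `self._lock`, other models can still be fetched from the cache while one model is loading.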
In `@backend/Generator/mcq.py`:
- Around line 148-155: The logger calls in backend/Generator/mcq.py currently
write the raw extracted answer (variable answer) into logs in the try/except
around find_similar_words(s2v_model) and at the other noted locations (lines
~361-363 and ~479-480); change these logger.info/logger.error calls to avoid
printing the full answer—log a redacted form such as a truncated snippet, a
length, an identifier, or a secure hash (e.g., "answer_length=...,
answer_hash=...") and keep the existing contextual messages; update the messages
in the block containing find_similar_words(answer, s2v_model) and the other two
similar logging sites to use the redacted/hashed info instead of the raw answer
variable.
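The redaction described above can be sketched as a small helper; the name `redacted` and the exact log format are illustrative, not the project's code:

```python
import hashlib
import logging

logger = logging.getLogger(__name__)

def redacted(answer: str) -> str:
    """Return a log-safe summary of a sensitive string: length plus a short hash."""
    digest = hashlib.sha256(answer.encode("utf-8")).hexdigest()[:8]
    return f"answer_length={len(answer)}, answer_hash={digest}"

# Instead of logging the raw answer:
#   logger.error(f"Failed to generate choices for {answer}")
# log only the summary:
#   logger.error("Failed to generate choices | %s", redacted(answer))
```

The hash lets you correlate repeated failures on the same answer across log lines without ever writing the answer itself.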
- Around line 313-317: The batch_encode_plus calls in mcq.py can produce
arbitrarily large tensors because they don’t bound sequence length; update both
occurrences (the tokenizer.batch_encode_plus calls around the shown diff and at
the other occurrence) to pass truncation=True and a max_length (e.g.,
max_length=tokenizer.model_max_length or a project MAX_SEQ_LEN constant) and use
explicit padding (padding='max_length' or padding=True as appropriate) so each
input is truncated to the model limit before batching; also keep the existing
recovery/skip logic but operate on the truncated inputs so a single long snippet
cannot blow up memory for the whole batch.
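Independent of any tokenizer library, the bounding behavior that `truncation=True` plus explicit padding gives you can be sketched on plain token-id lists (this is a toy model of the effect, not a `transformers` call):

```python
def bound_batch(token_id_batches, max_length: int, pad_id: int = 0):
    """Truncate every sequence to max_length and pad each batch to uniform width.

    Mirrors what truncation + padding achieve in batch_encode_plus: a single
    long input can no longer inflate the tensor width for its whole batch.
    """
    bounded = []
    for batch in token_id_batches:
        truncated = [seq[:max_length] for seq in batch]
        width = min(max(len(s) for s in truncated), max_length)
        bounded.append([s + [pad_id] * (width - len(s)) for s in truncated])
    return bounded
```

Without the `seq[:max_length]` step, one pathological input would force every sequence in its batch to be padded out to that length, which is exactly the memory blow-up the comment warns about.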
- Around line 30-40: The batching flow drops work because answer_batches is
materialized once and adjust_batch_size() never reinserts smaller batches when
memory checks or exceptions occur; change the loop that consumes answer_batches
(and related code paths handling memory checks and exceptions) to treat batches
as requeueable work: on low memory or on OOM/encoding exceptions inside the
processing of a batch (references: answer_batches, adjust_batch_size), split the
failing batch into smaller batches using adjust_batch_size or a splitter, push
those smaller batches back onto the active queue/iterator and retry them instead
of continuing, and/or convert answer_batches into a lazy generator so subsequent
adjustments apply; ensure all exception handlers replace the discarded batch
with retried smaller units rather than silently dropping questions.
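The requeue-and-split flow can be sketched with a simple work queue; `handler` stands in for the real per-batch generation step, and the split-in-half policy is one possible stand-in for `adjust_batch_size`:

```python
from collections import deque

def process_with_requeue(items, batch_size, handler):
    """Process items in batches, splitting and retrying any batch that fails.

    handler processes one batch and may raise (standing in for OOM or
    encoding errors). A failing batch is split in half and both halves are
    pushed back, so only genuinely unprocessable single items are dropped.
    """
    queue = deque(items[i:i + batch_size] for i in range(0, len(items), batch_size))
    results, dropped = [], []
    while queue:
        batch = queue.popleft()
        try:
            results.extend(handler(batch))
        except Exception:
            if len(batch) > 1:
                mid = len(batch) // 2
                queue.appendleft(batch[mid:])   # retry both halves
                queue.appendleft(batch[:mid])
            else:
                dropped.append(batch[0])        # a single item keeps failing
    return results, dropped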
In `@backend/server.py`:
- Around line 36-108: The lazy singletons (get_mcq_generator,
get_answer_predictor, get_boolq_generator, get_shortq_generator,
get_question_generator, get_docs_service, get_file_processor, get_mediawiki,
get_qa_model) must be protected with first-use locking to avoid duplicate
initialization under concurrency: add a module-level threading.Lock (or
individual locks) and implement double-checked locking in each getter (check if
_x is None, acquire the lock, check again, then assign), ensure you import
threading and use the same lock variable (e.g., _init_lock) and the exact global
names (_mcq_gen, _answer_predictor, etc.) so initialization happens only once.
---
Outside diff comments:
In `@backend/server.py`:
- Around line 236-244: The route is calling
AnswerPredictor.predict_boolean_answer with a single question string, causing
the predictor to iterate characters; instead pass the whole input_questions list
and handle the returned per-question results. Call
answer_pred.predict_boolean_answer with payload {"input_text": input_text,
"input_question": input_questions}, assign its result (qa_response) which should
be a list of booleans, then iterate over qa_response to append "True"/"False" to
output; adjust references to input_questions, input_text,
predict_boolean_answer, qa_response and output accordingly.
- Around line 480-486: The /get_boolq_hard endpoint is passing
answer_style="true_false" to get_question_generator().generate(), but
QuestionGenerator.generate_qg_inputs() only supports "all", "sentences", and
"multiple_choice", causing the exception path and an empty response; fix by
routing this endpoint to the BoolQGenerator (e.g., instantiate/use
BoolQGenerator instead of generic QuestionGenerator) or extend
QuestionGenerator.generate_qg_inputs() and QuestionGenerator.generate() to
accept and handle "true_false" end-to-end (implement input conversion and output
formatting for true/false questions), and update the call site in backend.server
(where get_question_generator() is used) to use the appropriate generator.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 61b65e61-f4b9-46f2-aa81-e14268bf70bd
📒 Files selected for processing (4)
backend/Generator/main.py
backend/Generator/mcq.py
backend/server.py
requirements.txt
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
backend/Generator/main.py (1)
1106-1114: ⚠️ Potential issue | 🟠 Major
`_prepare_qg_inputs_MC` loads a fresh spaCy model on every call, defeating lazy loading. Line 1113 calls `en_core_web_sm.load()` directly instead of using the `ModelManager`-cached spaCy model. This creates a new model instance on each call, causing memory bloat and negating the benefits of lazy loading. Consider adding a lazy `nlp` property to `QuestionGenerator` (similar to `MCQGenerator`) and using it here.
♻️ Proposed fix
Add a lazy property to `QuestionGenerator`:

```python
@property
def nlp(self):
    """Lazy load spacy model."""
    if not hasattr(self, '_nlp') or self._nlp is None:
        self._nlp = self.model_manager.load_model(
            'spacy_model', 'spacy', model='en_core_web_sm'
        )
    return self._nlp
```

Then update `_prepare_qg_inputs_MC`:

```diff
 def _prepare_qg_inputs_MC(
     self, sentences: List[str]
 ) -> Tuple[List[str], List[str]]:
-    spacy_nlp = en_core_web_sm.load()
+    spacy_nlp = self.nlp
     docs = list(spacy_nlp.pipe(sentences, disable=["parser"]))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 1106 - 1114, The _prepare_qg_inputs_MC method currently calls en_core_web_sm.load() each time causing repeated spaCy model loads; add a lazy nlp property on QuestionGenerator that caches the model in self._nlp using model_manager.load_model('spacy_model','spacy', model='en_core_web_sm') (or similar) and then replace the direct en_core_web_sm.load() call in _prepare_qg_inputs_MC with self.nlp so the spaCy pipeline is reused across calls.
🧹 Nitpick comments (4)
backend/server.py (1)
121-128: QA model pipeline initialization could benefit from explicit device specification.
The `pipeline("question-answering")` call at line 127 uses default device selection. For consistency with the rest of the codebase (which explicitly uses `model_manager.get_device()`), consider specifying the device explicitly to ensure GPU utilization when available.
♻️ Proposed enhancement

```diff
 def get_qa_model():
     """Lazy load QA model with thread safety."""
     global _qa_model
     if _qa_model is None:
         with _init_lock:
             if _qa_model is None:
-                _qa_model = pipeline("question-answering")
+                import torch
+                device = 0 if torch.cuda.is_available() else -1
+                _qa_model = pipeline("question-answering", device=device)
     return _qa_model
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/server.py` around lines 121 - 128, The get_qa_model() lazy initializer currently calls pipeline("question-answering") without a device, so it may not use the intended accelerator; update the call inside get_qa_model (the block that sets _qa_model) to pass the explicit device obtained from model_manager.get_device() (or its integer/torch device equivalent) so the QA pipeline uses the same device as the rest of the codebase.
backend/Generator/mcq.py (3)
157-160: Use `logger.exception` for better diagnostics.
When logging inside an exception handler, `logger.exception` automatically includes the stack trace, which aids debugging without manually formatting the error.
♻️ Proposed fix

```diff
 except Exception as e:
     # Redact sensitive content from logs
     answer_hash = hashlib.sha256(answer.encode()).hexdigest()[:8]
-    logger.error(f"Failed to generate choices | answer_hash={answer_hash}. Error: {e}")
+    logger.exception(f"Failed to generate choices | answer_hash={answer_hash}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/mcq.py` around lines 157 - 160, Replace the logger.error call inside the exception handler with logger.exception so the stack trace is included automatically; keep the redaction logic (using answer and answer_hash) unchanged and still log the hashed answer for context—update the except block that currently computes answer_hash from answer and calls logger.error to instead call logger.exception with the same message including answer_hash.
373-376: Consider using `logger.exception` in exception handlers.
Multiple exception handlers use `logger.error(f"... {e}")`. Using `logger.exception()` would automatically include the full stack trace, improving debuggability for production issues.
♻️ Proposed fix for representative locations

```diff
 except Exception as e:
-    logger.error(f"Error processing question for answer: {e}")
+    logger.exception("Error processing question for answer")
     continue
```

```diff
 except Exception as e:
-    logger.error(f"Unexpected error processing batch: {e}")
+    logger.exception("Unexpected error processing batch")
     # Try to recover by cleaning memory and continuing
     cleanup_memory(device)
     continue
```

Also applies to: 402-406
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/mcq.py` around lines 373 - 376, Replace the plain error logging in the exception handlers with logger.exception to capture stack traces: locate the except blocks that currently call logger.error(f"Error processing question for answer: {e}") (and the similar handler around the other occurrence at 402-406) and change them to logger.exception("Error processing question for answer") (or an appropriate short context message) so the full stack trace is recorded; keep the except Exception as e structure but remove the manual inclusion of e in the message since logger.exception will log the exception details automatically.
24-29: `MODEL_CACHE_ENABLED` constant is defined but never used.
This constant suggests an intent to toggle model caching, but no code path references it. Either implement the toggle or remove the dead constant to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/mcq.py` around lines 24 - 29, The MODEL_CACHE_ENABLED constant is defined but unused; either remove it to avoid dead code or implement it as a guard around the module's model-caching logic—locate the code that creates/stores cached model instances (the caching block that currently unconditionally caches models) and wrap that behavior with if MODEL_CACHE_ENABLED: (or check the flag before returning/adding to the cache), ensuring the constant controls whether caching is enabled and default behavior stays consistent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/Generator/main.py`:
- Around line 135-158: unload_model and clear_all mutate CUDA state and call
gc.collect()/torch.cuda.empty_cache() without synchronization; wrap the critical
sections in self._lock to prevent concurrent mutations (use a with self._lock:
or explicit acquire/release) around model deletion/clearing, gc.collect(), and
torch.cuda.empty_cache() in both unload_model and clear_all so they cannot run
concurrently with load_model or cleanup_memory in mcq.py; ensure you reference
and use the same lock object (self._lock) used elsewhere in the class to
maintain consistent synchronization.
In `@backend/Generator/mcq.py`:
- Around line 32-42: The get_memory_usage() function currently returns 0 on any
exception from psutil.virtual_memory(), which falsely indicates no memory
pressure; change the exception handling in get_memory_usage to log the full
exception (using logger.exception or include the stack/exception) and return a
conservative high value (e.g., 100.0) instead of 0 so downstream adaptive logic
treats failures as high memory usage (alternatively re-raise if you prefer
fail-fast), keeping references to psutil.virtual_memory(), get_memory_usage, and
logger.warning/logger.exception for locating the change.
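The conservative fallback described above can be sketched as follows; the function assumes psutil is available at runtime, but fails toward caution if the probe breaks for any reason:

```python
import logging

logger = logging.getLogger(__name__)

def get_memory_usage() -> float:
    """Return system memory usage as a percentage; fail toward caution.

    On any probe failure, report 100.0 so downstream adaptive batching
    treats the failure as memory pressure instead of pretending memory
    is free (returning 0 would do the opposite).
    """
    try:
        import psutil
        return float(psutil.virtual_memory().percent)
    except Exception:
        logger.exception("Memory probe failed; assuming high memory pressure")
        return 100.0
```

Returning 100.0 makes the batcher shrink batches and clean up, which is a safe degradation; re-raising is the stricter fail-fast alternative.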
---
Outside diff comments:
In `@backend/Generator/main.py`:
- Around line 1106-1114: The _prepare_qg_inputs_MC method currently calls
en_core_web_sm.load() each time causing repeated spaCy model loads; add a lazy
nlp property on QuestionGenerator that caches the model in self._nlp using
model_manager.load_model('spacy_model','spacy', model='en_core_web_sm') (or
similar) and then replace the direct en_core_web_sm.load() call in
_prepare_qg_inputs_MC with self.nlp so the spaCy pipeline is reused across
calls.
---
Nitpick comments:
In `@backend/Generator/mcq.py`:
- Around line 157-160: Replace the logger.error call inside the exception
handler with logger.exception so the stack trace is included automatically; keep
the redaction logic (using answer and answer_hash) unchanged and still log the
hashed answer for context—update the except block that currently computes
answer_hash from answer and calls logger.error to instead call logger.exception
with the same message including answer_hash.
- Around line 373-376: Replace the plain error logging in the exception handlers
with logger.exception to capture stack traces: locate the except blocks that
currently call logger.error(f"Error processing question for answer: {e}") (and
the similar handler around the other occurrence at 402-406) and change them to
logger.exception("Error processing question for answer") (or an appropriate
short context message) so the full stack trace is recorded; keep the except
Exception as e structure but remove the manual inclusion of e in the message
since logger.exception will log the exception details automatically.
- Around line 24-29: The MODEL_CACHE_ENABLED constant is defined but unused;
either remove it to avoid dead code or implement it as a guard around the
module's model-caching logic—locate the code that creates/stores cached model
instances (the caching block that currently unconditionally caches models) and
wrap that behavior with if MODEL_CACHE_ENABLED: (or check the flag before
returning/adding to the cache), ensuring the constant controls whether caching
is enabled and default behavior stays consistent.
In `@backend/server.py`:
- Around line 121-128: The get_qa_model() lazy initializer currently calls
pipeline("question-answering") without a device, so it may not use the intended
accelerator; update the call inside get_qa_model (the block that sets _qa_model)
to pass the explicit device obtained from model_manager.get_device() (or its
integer/torch device equivalent) so the QA pipeline uses the same device as the
rest of the codebase.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a3cb1cf8-5acd-447b-9b6f-424aeda324cc
📒 Files selected for processing (3)
backend/Generator/main.py
backend/Generator/mcq.py
backend/server.py
Actionable comments posted: 1
♻️ Duplicate comments (2)
backend/Generator/main.py (2)
760-762: ⚠️ Potential issue | 🟠 Major
Don't log raw question text from request payloads.
These error paths write full user-supplied questions into logs. That can leak document content or PII; log a hash/length or question index instead.
Also applies to: 801-803
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 760 - 762, Replace logging of raw user questions in the exception handlers to avoid writing PII/content to logs: in the except block around the prediction call (the try/except that currently does logger.error(f"Error predicting answer for question '{ques}': {e}") and then answers.append("")), and the similar handler later (the block at the other occurrence), change the log to reference a non-sensitive identifier such as the question index, the question length, or a hash (e.g., SHA256 of ques) instead of the full ques, and keep the exception message; leave answers.append("") behavior unchanged. Ensure you update both occurrences in main.py (the exception handling around the prediction step and the duplicate at the later lines) so no raw question text is ever logged.
61-119: ⚠️ Potential issue | 🟠 Major
`load_model()` can still race with `clear_all()` and `unload_model()`.
The final cache publish at Line 117 is outside the same coordination used by the clear/unload paths. An in-flight load can therefore complete after `clear_all()` returns and silently reinsert the model.
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 61 - 119, The final cache insertion in load_model can race with clear_all/unload_model; after loading under model_lock, acquire the main self._lock before mutating self._models, re-check whether the model was cleared/unloaded and only then set self._models[model_name] (otherwise discard/cleanup the newly loaded model); update load_model to perform the final self._models assignment inside a with self._lock block and leave the earlier double-checks as-is to prevent reinsertion after clear_all()/unload_model(), referencing load_model, clear_all, unload_model, self._lock and self._model_locks for locating the change.
🧹 Nitpick comments (1)
backend/Generator/main.py (1)
305-308: Avoid forcing full cleanup on every successful MCQ/ShortQ request.
`generate_multiple_choice_questions()` and `generate_normal_questions()` already clean up inside `backend/Generator/mcq.py`, so these extra `gc.collect()`/`torch.cuda.empty_cache()` calls add hot-path latency and throw away allocator reuse immediately after generation.
Also applies to: 447-450
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@backend/Generator/main.py` around lines 305 - 308, The post-generation forced cleanup in backend/Generator/main.py (the torch.cuda.empty_cache() and gc.collect() block) duplicates cleanup already performed inside generate_multiple_choice_questions() and generate_normal_questions() in backend/Generator/mcq.py and should be removed to avoid hot-path latency and allocator churn; delete the conditional that checks self.device.type == 'cuda' and the gc.collect() call (and replicate the same removal at the other occurrence around the 447-450 region), or gate these calls behind a debug/config flag so they only run when explicitly requested (keep cleanup only in mcq.py functions and any error-handling paths).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@backend/Generator/main.py`:
- Around line 939-940: The decorator for the nlp property is written as a string
(backticked) instead of a Python decorator; remove the backticks so `@property`
sits directly above def nlp(self): (i.e., replace the quoted/backticked line
with a real `@property` decorator for the nlp method) and ensure
indentation/spacing matches surrounding decorators so the module can import
cleanly.
---
Duplicate comments:
In `@backend/Generator/main.py`:
- Around line 760-762: Replace logging of raw user questions in the exception
handlers to avoid writing PII/content to logs: in the except block around the
prediction call (the try/except that currently does logger.error(f"Error
predicting answer for question '{ques}': {e}") and then answers.append("")), and
the similar handler later (the block at the other occurrence), change the log to
reference a non-sensitive identifier such as the question index, the question
length, or a hash (e.g., SHA256 of ques) instead of the full ques, and keep the
exception message; leave answers.append("") behavior unchanged. Ensure you
update both occurrences in main.py (the exception handling around the prediction
step and the duplicate at the later lines) so no raw question text is ever
logged.
- Around line 61-119: The final cache insertion in load_model can race with
clear_all/unload_model; after loading under model_lock, acquire the main
self._lock before mutating self._models, re-check whether the model was
cleared/unloaded and only then set self._models[model_name] (otherwise
discard/cleanup the newly loaded model); update load_model to perform the final
self._models assignment inside a with self._lock block and leave the earlier
double-checks as-is to prevent reinsertion after clear_all()/unload_model(),
referencing load_model, clear_all, unload_model, self._lock and
self._model_locks for locating the change.
---
Nitpick comments:
In `@backend/Generator/main.py`:
- Around line 305-308: The post-generation forced cleanup in
backend/Generator/main.py (the torch.cuda.empty_cache() and gc.collect() block)
duplicates cleanup already performed inside generate_multiple_choice_questions()
and generate_normal_questions() in backend/Generator/mcq.py and should be
removed to avoid hot-path latency and allocator churn; delete the conditional
that checks self.device.type == 'cuda' and the gc.collect() call (and replicate
the same removal at the other occurrence around the 447-450 region), or gate
these calls behind a debug/config flag so they only run when explicitly
requested (keep cleanup only in mcq.py functions and any error-handling paths).
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 24c83134-1b90-45b1-a05d-7af3b5395081
📒 Files selected for processing (3)
backend/Generator/main.py
backend/Generator/mcq.py
backend/server.py
Addressed Issues:
Fixes #261
Additional Notes:
This PR improves computational resource management in the MCQ generation pipeline located in:
backend/Generator/mcq.py
The existing implementation could lead to high memory consumption and inefficient model usage when processing large input texts or running multiple models simultaneously. This PR introduces several optimizations to improve performance, stability, and scalability.
Key Improvements
1. Batch Processing
2. Lazy Model Loading
3. Memory Optimization
- `gc.collect()` for garbage collection.
- `torch.cuda.empty_cache()` when CUDA is available.
4. Resource Monitoring
5. CPU / GPU Compatibility
- `torch.cuda.is_available()`.
6. Improved Error Handling
Expected Benefits
These improvements maintain full backward compatibility with the existing MCQ generation workflow and do not modify the external API behavior.
Screenshots/Recordings:
N/A — This change focuses on backend performance optimization and resource management within the MCQ generation pipeline. No UI changes were introduced.
Checklist
Check one of the checkboxes below:
I have used the following AI models and tools: ChatGPT (for architectural guidance and code review assistance)
AI Usage Disclosure:
We encourage contributors to use AI tools responsibly when creating Pull Requests. While AI can be a valuable aid, it is essential to ensure that your contributions meet the task requirements, build successfully, include relevant tests, and pass all linters. Submissions that do not meet these standards may be closed without warning to maintain the quality and integrity of the project. Please take the time to understand the changes you are proposing and their impact. AI slop is strongly discouraged and may lead to banning and blocking. Do not spam our repos with AI slop.
Summary by CodeRabbit
New Features
Bug Fixes
Refactor