Add summary #271
add new dependencies for upcoming models
add Model class
model for image summarization and vqa
Add example notebook and small fixes
…ebook for new summary
updates: - [github.com/astral-sh/ruff-pre-commit: v0.12.10 → v0.13.0](astral-sh/ruff-pre-commit@v0.12.10...v0.13.0) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates: - [github.com/astral-sh/ruff-pre-commit: v0.13.0 → v0.13.1](astral-sh/ruff-pre-commit@v0.13.0...v0.13.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
updates: - [github.com/astral-sh/ruff-pre-commit: v0.13.1 → v0.13.3](astral-sh/ruff-pre-commit@v0.13.1...v0.13.3) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Pull Request Overview
This PR adds multimodal image summary and visual question answering (VQA) capabilities to the ammico package using the Qwen2.5-VL model. The changes address dependency compatibility issues and remove the redundant analyse_text parameter from TextDetector.
- Adds new `MultimodalSummaryModel` and `ImageSummaryDetector` classes for image captioning and VQA
- Removes the `analyse_text` boolean parameter from `TextDetector` (always enabled now)
- Updates dependencies to resolve compatibility issues with CUDA 11.8 and JupyterLab
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| pyproject.toml | Added ML dependencies (transformers, torch, etc.) and adjusted tensorflow/googletrans versions for compatibility |
| environment.yml | New conda environment specification with CUDA 11.8 and PyTorch 2.3.1 |
| ammico/utils.py | Added AnalysisType enum for categorizing analysis types |
| ammico/text.py | Removed analyse_text parameter and its validation logic |
| ammico/test/test_text.py | Updated tests to remove analyse_text parameter usage |
| ammico/test/test_model.py | New tests for MultimodalSummaryModel initialization and resource management |
| ammico/test/test_image_summary.py | New tests for image summary and VQA functionality |
| ammico/test/test_display.py | Updated display test to match new parameter structure |
| ammico/test/conftest.py | Added model fixture for test suite |
| ammico/notebooks/DemoNotebook_ammico.ipynb | Updated notebook to use new API without analyse_text parameter |
| ammico/notebooks/DemoImageSummaryVQA.ipynb | New demo notebook showcasing image summary and VQA features |
| ammico/model.py | New class implementing Qwen2.5-VL model loading and management |
| ammico/image_summary.py | New class for image captioning and visual question answering |
| ammico/display.py | Added VQA detector option and UI components for questions |
| ammico/__init__.py | Exported new MultimodalSummaryModel and ImageSummaryDetector classes |
| .pre-commit-config.yaml | Updated ruff version |
| .github/workflows/ci.yml | Excluded long-running tests from CI |
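
To illustrate the `AnalysisType` enum mentioned for `ammico/utils.py`, here is a minimal sketch of how such an enum could select between captioning and VQA. The member names and the `plan_outputs` helper are assumptions made for illustration; they are not taken from the PR diff.

```python
from enum import Enum


class AnalysisType(str, Enum):
    # Hypothetical member names; the actual enum in ammico/utils.py may differ.
    SUMMARY = "summary"
    QUESTIONS = "questions"
    SUMMARY_AND_QUESTIONS = "summary_and_questions"


def plan_outputs(analysis_type, questions=None):
    """Return (want_summary, want_vqa) for the requested analysis type.

    Hypothetical helper: accepts either an AnalysisType or its string value
    and checks that questions are supplied whenever VQA is requested.
    """
    analysis_type = AnalysisType(analysis_type)  # raises ValueError on unknown values
    want_summary = analysis_type in (
        AnalysisType.SUMMARY,
        AnalysisType.SUMMARY_AND_QUESTIONS,
    )
    want_vqa = analysis_type in (
        AnalysisType.QUESTIONS,
        AnalysisType.SUMMARY_AND_QUESTIONS,
    )
    if want_vqa and not questions:
        raise ValueError("VQA was requested but no questions were given")
    return want_summary, want_vqa
```

Deriving the enum from `str` lets callers pass plain strings (for example from a notebook widget) while the code still compares against named members.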
Comments suppressed due to low confidence (2)
ammico/notebooks/DemoNotebook_ammico.ipynb:1
- Hardcoded absolute path to credentials file exposes user-specific filesystem structure and should be parameterized or use relative paths. Consider using environment variables or a configuration file.

ammico/notebooks/DemoNotebook_ammico.ipynb:1
- The variable `data_path` is defined but appears to be unused in the subsequent code, which still references the hardcoded path in the `ammico.find_files()` call. Either remove the unused variable or update the function call to use it.
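
Both suppressed comments point at the same remedy: resolve paths once, from configuration, and reuse the resulting variable. The sketch below shows one possible way to do that in a notebook cell. The environment variable name `AMMICO_DEMO_DATA` and the helper `resolve_path` are hypothetical; the notebook's actual cells and the arguments of `ammico.find_files()` are not shown in this conversation.

```python
import os
from pathlib import Path


def resolve_path(env_var, default):
    """Read a path from an environment variable, falling back to a default.

    Hypothetical helper for illustration; keeps user-specific absolute
    paths out of the committed notebook.
    """
    raw = os.environ.get(env_var, default)
    return Path(raw).expanduser()


# Hypothetical variable name; a relative default keeps the notebook portable.
data_path = resolve_path("AMMICO_DEMO_DATA", "data")

# Pass data_path on to the file lookup instead of repeating a literal path,
# so the variable is actually used (the call itself is omitted here because
# its arguments are not shown in the review).
```

The same pattern applies to the credentials file: set its location in the environment outside the notebook rather than hardcoding it in a cell.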
This PR looks good to me. I think I will make some small changes to the structure of the class and helper methods, if that is ok, in the
Pull Request Overview
Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.
1. Feature extraction from the images: User inputs query and images are matched to that query (both text and image query)
1. Question answering
1. Question answering about image content
1. Content extractioni from the videos
Copilot AI, Oct 27, 2025
Corrected spelling of 'extractioni' to 'extraction'.
Suggested change: replace `1. Content extractioni from the videos` with `1. Content extraction from the videos`.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
for more information, see https://pre-commit.ci
@DimasfromLavoisier could you please check that my changes did not break anything? I'm still having issues running some of the tests/analysis on my laptop (I have not had time to set up a CUDA version compatible with both torch and tensorflow).



ipykernel 7.0.0 cannot send display data from threads (ipython/ipykernel#1450) by pinning ipykernel