My changes so far are limited to using the Weave library from Weights & Biases to instrument LLM function calls. LandingAI's approach of wrapping these calls in microservices is brilliant, and instrumenting them was a simple way to confirm the flow.
- `@weave.op()`: search for this string to identify which functions are currently instrumented
- next steps may include:
  - integration of fine-tuned Florence-2 models
  - use of LangGraph to coordinate multiple vision agents
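As a minimal sketch of the instrumentation pattern, the snippet below decorates a function with Weave's `@weave.op()` so its inputs and outputs are traced once a project is initialized with `weave.init(...)`. The function name and its body are hypothetical stand-ins for an LLM call; the fallback decorator only exists so the sketch runs even where Weave is not installed or no W&B credentials are configured.

```python
try:
    import weave
    op = weave.op  # Weave's tracing decorator
except ImportError:
    # Fallback no-op decorator so this sketch runs without Weave/W&B set up;
    # it simply returns the function unchanged.
    def op():
        def wrap(f):
            return f
        return wrap

@op()
def generate_caption(prompt: str) -> str:
    # Hypothetical stand-in for an LLM call; with Weave initialized
    # (weave.init("my-project")), each call is logged to Weights & Biases.
    return f"caption for: {prompt}"

print(generate_caption("a dog on a beach"))
```

With Weave active, each call to a decorated function appears as a trace in the W&B UI, which is how the instrumented flow can be confirmed.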
This repository contains tools that solve vision problems. These tools can be used in conjunction with the vision-agent.

You can use a single tool by instantiating it and calling it with keyword arguments:
```python
from PIL import Image

from vision_agent_tools.tools.qr_reader import QRReader

# load image
image = Image.open("/path/to/my/image.png")

qr_reader = QRReader()
qr_detections = qr_reader(image=image)
```

To set up a development environment, install the dependencies and the pre-commit hooks:

```bash
poetry install
poetry run pre-commit install
```

To add a new tool, you first need to add the required dependencies, marking them as optional:
```bash
poetry add <dependency> --optional
```

After adding each dependency, go to the `pyproject.toml` file and add a new group under `[tool.poetry.extras]`. This allows installing the package with specific tools, e.g. `pip install "vision-agent-tools[qr-reader]"`. You also need to manually add each dependency to the `all` group so that users can install all tools with `pip install "vision-agent-tools[all]"`. Example for the `qr-reader` tool:

```toml
[tool.poetry.extras]
all = ["qreader"]
qr-reader = ["qreader"]
```
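Beyond the dependency bookkeeping, a new tool needs a class that follows the same keyword-argument calling convention as `QRReader` above. The class below is a hypothetical example of that shape — the repository's actual base classes and result types may differ, so treat it as a sketch of the convention rather than the real interface.

```python
from PIL import Image

class ImageSizeReader:
    """Hypothetical tool mirroring the QRReader calling convention:
    instantiate the tool, then call it with keyword arguments."""

    def __call__(self, image: Image.Image) -> dict:
        # return basic metadata about the supplied image
        return {"width": image.width, "height": image.height}

tool = ImageSizeReader()
img = Image.new("RGB", (64, 48))
print(tool(image=img))
```

A tool like this would get its own extras group in `pyproject.toml` only if it pulled in optional third-party dependencies; this one needs nothing beyond Pillow.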