@utkarsharma2
Contributor

This PR is part of our larger effort to add first-class integrations to support LLMOps that was presented at Airflow Summit.

This PR adds the Weaviate provider. Weaviate is an open-source vector database that lets you store data objects and vector embeddings. Storing vector embeddings is one of the vital steps in LLMOps. This provider adds the ability to interact with the vector database using the Python client.

Example DAG:
The WeaviateIngestOperator can accept either JSON (input_json) or a callable that returns JSON (input_callable).

data = [
  {
    "Answer": "Liver",
    "Category": "SCIENCE",
    "Question": "This organ removes excess glucose from the blood & stores it as glycogen"
  },
  {
    "Answer": "Elephant",
    "Category": "ANIMALS",
    "Question": "It's the only living mammal in the order Proboseidea"
  }
]

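from airflow.providers.weaviate.operators.weaviate import WeaviateIngestOperator

# Ingest the objects in `data` into the given Weaviate class.
WeaviateIngestOperator(
    task_id="batch_data_without_vectors_xcom_data",
    conn_id="weaviate_default",
    class_name="QuestionWithOpenAIVectorizerUsingOperator",
    input_json=data,
    trigger_rule="all_done",
)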
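
The "either JSON or a callable that returns JSON" contract described above can be sketched in plain Python. This is only an illustration of the idea, not the operator's actual code: resolve_input is a hypothetical helper, and its two parameters merely mirror the input_json and input_callable names mentioned in this description.

```python
import json
from typing import Any, Callable, Dict, List, Optional, Union

Records = List[Dict[str, Any]]


def resolve_input(
    input_json: Optional[Union[str, Records]] = None,
    input_callable: Optional[Callable[[], Union[str, Records]]] = None,
) -> Records:
    """Hypothetical helper: normalize literal JSON or a callable's result to a list."""
    # Exactly one of the two sources must be supplied.
    if (input_json is None) == (input_callable is None):
        raise ValueError("Provide exactly one of input_json or input_callable")

    payload = input_callable() if input_callable is not None else input_json

    # Accept a JSON string as well as already-parsed Python objects.
    if isinstance(payload, str):
        payload = json.loads(payload)
    if not isinstance(payload, list):
        raise TypeError("Expected a JSON array of objects")
    return payload
```

Under these assumptions, resolve_input(input_json=data) returns the list unchanged, while resolve_input(input_callable=load_rows) calls the hypothetical load_rows function at run time and parses whatever it returns.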

The email discussion related to this effort can be found here: https://lists.apache.org/thread/0d669fmy4hn29h5c0wj0ottdskd77ktp

Member

@pankajastro pankajastro left a comment

LGTM

@pankajastro pankajastro force-pushed the WeAviate branch 3 times, most recently from 7312ef6 to ca5e749, November 4, 2023 18:35
utkarsharma2 and others added 4 commits November 6, 2023 19:44
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
@pankajastro pankajastro merged commit 4fe87ea into apache:main Nov 6, 2023
@bolkedebruin
Contributor

Too bad there is no decorator that just wraps the callable, like so:

@task.weaviate
def my_task() -> JsonStr:
  return JsonStr(Something)
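
For readers wondering what such a wrapper might amount to, here is a rough plain-Python sketch. No task.weaviate decorator is part of this PR; weaviate_payload below is a hypothetical stand-in that only wraps a callable and serializes its return value to a JSON string, without touching Airflow or Weaviate.

```python
import functools
import json
from typing import Any, Callable


def weaviate_payload(func: Callable[..., Any]) -> Callable[..., str]:
    """Hypothetical decorator: call func and return its result as a JSON string."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> str:
        result = func(*args, **kwargs)
        # json.dumps raises TypeError if the result is not JSON-serializable.
        return json.dumps(result)

    return wrapper


@weaviate_payload
def my_task() -> list:
    # Sample objects in the same shape as the example data in the PR description.
    return [{"Answer": "Liver", "Category": "SCIENCE"}]
```

A real integration would additionally have to hand the serialized payload to the ingest step; that part is deliberately left out because the thread does not specify it.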

romsharon98 pushed a commit to romsharon98/airflow that referenced this pull request Nov 10, 2023
* Add Weaviate Provider

* Fix docs and static checks

* Remove callable interface params from the operator

* Resolve conflicts

* Fix docs

* Resolve conflicts

* Update airflow/providers/weaviate/hooks/weaviate.py

Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>

* Update airflow/providers/weaviate/operators/weaviate.py

Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>

* Update airflow/providers/weaviate/operators/weaviate.py

Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>

* Add security.rst to docs

* Resolve conflicts

* Address PR Comments

---------

Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Nov 20, 2023
Labels

area:dev-tools area:providers changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation

7 participants