Skip to content

Conversation

@mohamedawnallah
Copy link
Contributor

@mohamedawnallah mohamedawnallah commented Jun 9, 2025

Description

  • Add Milvus Search Enrichment Handler [Vector, Keyword, Hybrid]
  • Unit Test Milvus Search Enrichment Handler
  • Integrate Test Milvus Search Enrichment Handler
  • Review & Address (can be) Flaky Test Cases
  • Add Release Note for the new Milvus Search Enrichment Handler Integration [Vector, Keyword, Hybrid]

Towards #35046.
Next #35577.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests
Go tests

See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 9, 2025

Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment assign set of reviewers

@damccorm
Copy link
Contributor

Thanks! It looks like there are a bunch of precommits failing with failures like:

<testcase classname="apache_beam.ml.rag.enrichment.milvus_search_it_test.TestMilvusSearchEnrichment" name="test_chunks_batching" time="0.010">
<error message="failed on setup with "docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))"">self = <docker.transport.unixconn.UnixHTTPConnectionPool object at 0x138666680> method = 'GET', url = '/version', body = None headers = {'User-Agent': 'docker-sdk-python/7.1.0', 'Accept-Encoding': 'gzip, deflate, zstd', 'Accept': '*/*', 'Connection': 'keep-alive'} retries = Retry(total=0, connect=None, read=False, redirect=None, status=None) redirect = False, assert_same_host = False timeout = Timeout(connect=60, read=60, total=None), pool_timeout = None release_conn = False, chunked = False, body_pos = None, preload_content = False decode_content = False, response_kw = {} parsed_url = Url(scheme=None, auth=None, host=None, port=None, path='/version', query=None, fragment=None) destination_scheme = None, conn = None, release_this_conn = True http_tunnel_required = False, err = None, clean_exit = False def urlopen( # type: ignore[override] self, method: str, url: str, body: _TYPE_BODY | None = None, headers: typing.Mapping[str, str] | None = None, retries: Retry | bool | int | None = None, redirect: bool = True, assert_same_host: bool = True, timeout: _TYPE_TIMEOUT = _DEFAULT_TIMEOUT, pool_timeout: int | None = None, release_conn: bool | None = None, chunked: bool = False, body_pos: _TYPE_BODY_POSITION | None = None, preload_content: bool = True, decode_content: bool = True, **response_kw: typing.Any, ) -> BaseHTTPResponse: """ Get a connection from the pool and perform an HTTP request. This is the lowest level call for making a request, so you'll need to specify all the raw details. .. note:: More commonly, it's appropriate to use a convenience method such as :meth:`request`. .. note:: `release_conn` will only behave as expected if `preload_content=False` because we want to make `preload_content=False` the default behaviour someday soon without breaking backwards compatibility. :param method: HTTP request method (such as GET, POST, PUT, etc.) :param url: The URL to perform the request on. :param body: Data to send in the request body, either :class:`str`, :class:`bytes`, an iterable of :class:`str`/:class:`bytes`, or a file-like object. :param headers: Dictionary of custom headers to send, such as User-Agent, If-None-Match, etc. If None, pool headers are used. If provided, these headers completely replace any pool-specific headers. :param retries: Configure the number of retries to allow before raising a :class:`~urllib3.exceptions.MaxRetryError` exception. If ``None`` (default) will retry 3 times, see ``Retry.DEFAULT``. Pass a :class:`~urllib3.util.retry.Retry` object for fine-grained control over different types of retries. Pass an integer number to retry connection errors that many times, but no other types of errors. Pass zero to never retry. If ``False``, then retries are disabled and any exception is raised immediately. Also, instead of raising a MaxRetryError on redirects, the redirect response will be returned. :type retries: :class:`~urllib3.util.retry.Retry`, False, or an int. :param redirect: If True, automatically handle redirects (status codes 301, 302, 303, 307, 308). Each redirect counts as a retry. Disabling retries will disable redirect, too. :param assert_same_host: If ``True``, will make sure that the host of the pool requests is consistent else will raise HostChangedError. When ``False``, you can use the pool on an HTTP proxy and request foreign hosts. :param timeout: If specified, overrides the default timeout for this one request. It may be a float (in seconds) or an instance of :class:`urllib3.util.Timeout`. :param pool_timeout: If set and the pool is set to block=True, then this method will block for ``pool_timeout`` seconds and raise EmptyPoolError if no connection is available within the time period. :param bool preload_content: If True, the response's body will be preloaded into memory. :param bool decode_content: If True, will attempt to decode the body based on the 'content-encoding' header. :param release_conn: If False, then the urlopen call will not release the connection back into the pool once a response is received (but will release if you read the entire contents of the response such as when `preload_content=True`). This is useful if you're not preloading the response's content immediately. You will need to call ``r.release_conn()`` on the response ``r`` to return the connection back into the pool. If None, it takes the value of ``preload_content`` which defaults to ``True``. :param bool chunked: If True, urllib3 will send the body using chunked transfer encoding. Otherwise, urllib3 will send the body using the standard content-length form. Defaults to False. :param int body_pos: Position to seek to in file-like body in the event of a retry or redirect. Typically this won't need to be set because urllib3 will auto-populate the value when needed. """ parsed_url = parse_url(url) destination_scheme = parsed_url.scheme if headers is None: headers = self.headers if not isinstance(retries, Retry): retries = Retry.from_int(retries, redirect=redirect, default=self.retries) if release_conn is None: release_conn = preload_content # Check host if assert_same_host and not self.is_same_host(url): raise HostChangedError(self, url, retries) # Ensure that the URL we're connecting to is properly encoded if url.startswith("/"): url = to_str(_encode_target(url)) else: url = to_str(parsed_url.url) conn = None # Track whether `conn` needs to be released before # returning/raising/recursing. Update this variable if necessary, and # leave `release_conn` constant throughout the function. That way, if # the function recurses, the original value of `release_conn` will be # passed down into the recursive call, and its value will be respected. # # See issue #651 [1] for details. # # [1] <https://github.com/urllib3/urllib3/issues/651> release_this_conn = release_conn http_tunnel_required = connection_requires_http_tunnel( self.proxy, self.proxy_config, destination_scheme ) # Merge the proxy headers. Only done when not using HTTP CONNECT. We # have to copy the headers dict so we can safely change it without those # changes being reflected in anyone else's copy. if not http_tunnel_required: headers = headers.copy() # type: ignore[attr-defined] headers.update(self.proxy_headers) # type: ignore[union-attr] # Must keep the exception bound to a separate variable or else Python 3 # complains about UnboundLocalError. err = None # Keep track of whether we cleanly exited the except block. This # ensures we do proper cleanup in finally. clean_exit = False # Rewind body position, if needed. Record current position # for future rewinds in the event of a redirect/retry. body_pos = set_file_position(body, body_pos) try: # Request a connection from the queue. timeout_obj = self._get_timeout(timeout) conn = self._get_conn(timeout=pool_timeout) conn.timeout = timeout_obj.connect_timeout # type: ignore[assignment] # Is this a closed/new connection that requires CONNECT tunnelling? if self.proxy is not None and http_tunnel_required and conn.is_closed: try: self._prepare_proxy(conn) except (BaseSSLError, OSError, SocketTimeout) as e: self._raise_timeout( err=e, url=self.proxy.url, timeout_value=conn.timeout ) raise # If we're going to release the connection in ``finally:``, then # the response doesn't need to know about the connection. Otherwise # it will also try to release it and we'll have a double-release # mess. response_conn = conn if not release_conn else None # Make the request on the HTTPConnection object > response = self._make_request( conn, method, url, timeout=timeout_obj, body=body, headers=headers, chunked=chunked, retries=retries, response_conn=response_conn, preload_content=preload_content, decode_content=decode_content, **response_kw, ) target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/connectionpool.py:787: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/connectionpool.py:493: in _make_request conn.request( target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/connection.py:445: in request self.endheaders() /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/http/client.py:1278: in endheaders self._send_output(message_body, encode_chunked=encode_chunked) /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/http/client.py:1038: in _send_output self.send(msg) /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/http/client.py:976: in send self.connect() _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <docker.transport.unixconn.UnixHTTPConnection object at 0x138665240> def connect(self): sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) sock.settimeout(self.timeout) > sock.connect(self.unix_socket) E FileNotFoundError: [Errno 2] No such file or directory target/.tox/py310-macos/lib/python3.10/site-packages/docker/transport/unixconn.py:26: FileNotFoundError During handling of the above exception, another exception occurred: self = <docker.transport.unixconn.UnixHTTPAdapter object at 0x138666d40> request = <PreparedRequest [GET]>, stream = False timeout = Timeout(connect=60, read=60, total=None), verify = True, cert = None proxies = OrderedDict() def send( self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None ): """Sends PreparedRequest object. Returns Response object. :param request: The :class:`PreparedRequest <PreparedRequest>` being sent. :param stream: (optional) Whether to stream the request content. :param timeout: (optional) How long to wait for the server to send data before giving up, as a float, or a :ref:`(connect timeout, read timeout) <timeouts>` tuple. :type timeout: float or tuple or urllib3 Timeout object :param verify: (optional) Either a boolean, in which case it controls whether we verify the server's TLS certificate, or a string, in which case it must be a path to a CA bundle to use :param cert: (optional) Any user-provided SSL certificate to be trusted. :param proxies: (optional) The proxies dictionary to apply to the request. :rtype: requests.Response """ try: conn = self.get_connection_with_tls_context( request, verify, proxies=proxies, cert=cert ) except LocationValueError as e: raise InvalidURL(e, request=request) self.cert_verify(conn, request.url, verify, cert) url = self.request_url(request, proxies) self.add_headers( request, stream=stream, timeout=timeout, verify=verify, cert=cert, proxies=proxies, ) chunked = not (request.body is None or "Content-Length" in request.headers) if isinstance(timeout, tuple): try: connect, read = timeout timeout = TimeoutSauce(connect=connect, read=read) except ValueError: raise ValueError( f"Invalid timeout {timeout}. Pass a (connect, read) timeout tuple, " f"or a single float to set both timeouts to the same value." ) elif isinstance(timeout, TimeoutSauce): pass else: timeout = TimeoutSauce(connect=timeout, read=timeout) try: > resp = conn.urlopen( method=request.method, url=url, body=request.body, headers=request.headers, redirect=False, assert_same_host=False, preload_content=False, decode_content=False, retries=self.max_retries, timeout=timeout, chunked=chunked, ) target/.tox/py310-macos/lib/python3.10/site-packages/requests/adapters.py:667: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/connectionpool.py:841: in urlopen retries = retries.increment( target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/util/retry.py:474: in increment raise reraise(type(error), error, _stacktrace) target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/util/util.py:38: in reraise raise value.with_traceback(tb) target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/connectionpool.py:787: in urlopen response = self._make_request( target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/connectionpool.py:493: in _make_request conn.request( target/.tox/py310-macos/lib/python3.10/site-packages/urllib3/connection.py:445: in request self.endheaders() /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/http/client.py:1278: in endheaders self._send_output(message_body, encode_chunked=encode_chunked) /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/http/client.py:1038: in _send_output self.send(msg) /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/http/client.py:976: in send self.connect() _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <docker.transport.unixconn.UnixHTTPConnection object at 0x138665240> def connect(self): sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) sock.settimeout(self.timeout) > sock.connect(self.unix_socket) E urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory')) target/.tox/py310-macos/lib/python3.10/site-packages/docker/transport/unixconn.py:26: ProtocolError During handling of the above exception, another exception occurred: self = <docker.api.client.APIClient object at 0x138665600> def _retrieve_server_version(self): try: > return self.version(api_version=False)["ApiVersion"] target/.tox/py310-macos/lib/python3.10/site-packages/docker/api/client.py:223: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ target/.tox/py310-macos/lib/python3.10/site-packages/docker/api/daemon.py:181: in version return self._result(self._get(url), json=True) target/.tox/py310-macos/lib/python3.10/site-packages/docker/utils/decorators.py:44: in inner return f(self, *args, **kwargs) target/.tox/py310-macos/lib/python3.10/site-packages/docker/api/client.py:246: in _get return self.get(url, **self._set_request_timeout(kwargs)) target/.tox/py310-macos/lib/python3.10/site-packages/requests/sessions.py:602: in get return self.request("GET", url, **kwargs) target/.tox/py310-macos/lib/python3.10/site-packages/requests/sessions.py:589: in request resp = self.send(prep, **send_kwargs) target/.tox/py310-macos/lib/python3.10/site-packages/requests/sessions.py:703: in send r = adapter.send(request, **kwargs) _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <docker.transport.unixconn.UnixHTTPAdapter object at 0x138666d40> request = <PreparedRequest [GET]>, stream = False timeout = Timeout(connect=60, read=60, total=None), verify = True, cert = None proxies = OrderedDict() def send( self, request, stream=False, timeout=None, verify=True, cert=None, proxies=None ): """Sends PreparedRequest object. Returns Response object. :param request: The :class:`PreparedRequest <PreparedRequest>` being sent. :param stream: (optional) Whether to stream the request content. :param timeout: (optional) How long to wait for the server to send data before giving up, as a float, or a :ref:`(connect timeout, read timeout) <timeouts>` tuple. :type timeout: float or tuple or urllib3 Timeout object :param verify: (optional) Either a boolean, in which case it controls whether we verify the server's TLS certificate, or a string, in which case it must be a path to a CA bundle to use :param cert: (optional) Any user-provided SSL certificate to be trusted. :param proxies: (optional) The proxies dictionary to apply to the request. :rtype: requests.Response """ try: conn = self.get_connection_with_tls_context( request, verify, proxies=proxies, cert=cert ) except LocationValueError as e: raise InvalidURL(e, request=request) self.cert_verify(conn, request.url, verify, cert) url = self.request_url(request, proxies) self.add_headers( request, stream=stream, timeout=timeout, verify=verify, cert=cert, proxies=proxies, ) chunked = not (request.body is None or "Content-Length" in request.headers) if isinstance(timeout, tuple): try: connect, read = timeout timeout = TimeoutSauce(connect=connect, read=read) except ValueError: raise ValueError( f"Invalid timeout {timeout}. Pass a (connect, read) timeout tuple, " f"or a single float to set both timeouts to the same value." ) elif isinstance(timeout, TimeoutSauce): pass else: timeout = TimeoutSauce(connect=timeout, read=timeout) try: resp = conn.urlopen( method=request.method, url=url, body=request.body, headers=request.headers, redirect=False, assert_same_host=False, preload_content=False, decode_content=False, retries=self.max_retries, timeout=timeout, chunked=chunked, ) except (ProtocolError, OSError) as err: > raise ConnectionError(err, request=request) E requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory')) target/.tox/py310-macos/lib/python3.10/site-packages/requests/adapters.py:682: ConnectionError The above exception was the direct cause of the following exception: @pytest.fixture(scope="session") def milvus_container(): # Start the container before any tests run. > container = MilvusEnrichmentTestHelper.start_milvus_search_db_container() apache_beam/ml/rag/enrichment/milvus_search_it_test.py:93: _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ apache_beam/ml/rag/enrichment/milvus_search_it_test.py:75: in start_milvus_search_db_container raise e apache_beam/ml/rag/enrichment/milvus_search_it_test.py:54: in start_milvus_search_db_container vector_db_container = MilvusContainer(image=image, port=19530) target/.tox/py310-macos/lib/python3.10/site-packages/testcontainers/milvus/__init__.py:45: in __init__ super().__init__(image=image, **kwargs) target/.tox/py310-macos/lib/python3.10/site-packages/testcontainers/core/container.py:50: in __init__ self._docker = DockerClient(**(docker_client_kw or {})) target/.tox/py310-macos/lib/python3.10/site-packages/testcontainers/core/docker_client.py:70: in __init__ self.client = docker.from_env(**kwargs) target/.tox/py310-macos/lib/python3.10/site-packages/docker/client.py:94: in from_env return cls( target/.tox/py310-macos/lib/python3.10/site-packages/docker/client.py:45: in __init__ self.api = APIClient(*args, **kwargs) target/.tox/py310-macos/lib/python3.10/site-packages/docker/api/client.py:207: in __init__ self._version = self._retrieve_server_version() _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ self = <docker.api.client.APIClient object at 0x138665600> def _retrieve_server_version(self): try: return self.version(api_version=False)["ApiVersion"] except KeyError as ke: raise DockerException( 'Invalid response from docker daemon: key "ApiVersion"' ' is missing.' ) from ke except Exception as e: > raise DockerException( f'Error while fetching server API version: {e}' ) from e E docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory')) target/.tox/py310-macos/lib/python3.10/site-packages/docker/api/client.py:230: DockerException</error>
</testcase>

Would you mind taking a look?

@github-actions
Copy link
Contributor

Assigning reviewers:

R: @liferoad for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

@mohamedawnallah
Copy link
Contributor Author

waiting on author

@mohamedawnallah
Copy link
Contributor Author

stop reviewer notifications

@github-actions
Copy link
Contributor

Stopping reviewer notifications for this pull request: requested by reviewer. If you'd like to restart, comment assign set of reviewers

@github-actions github-actions bot removed the examples label Jun 27, 2025
@github-actions github-actions bot removed the build label Jun 27, 2025
@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Jun 27, 2025

Thanks a lot @damccorm for the thoughtful feedback. I've addressed it and left one comment explaining adding setuptools as dependency. I think it is now ready for another round of review. Also I raised this follow-up PR for updating/adding docs/exmaples for milvus search #35467

@mohamedawnallah mohamedawnallah requested a review from damccorm June 27, 2025 23:30
@mohamedawnallah mohamedawnallah changed the title [1/2] sdks/python: enrich data with Milvus Search [Vector, Keyword, Hybrid] [1/3] sdks/python: enrich data with Milvus Search [Vector, Keyword, Hybrid] Jun 30, 2025
@codecov
Copy link

codecov bot commented Jun 30, 2025

Codecov Report

Attention: Patch coverage is 70.79646% with 66 lines in your changes missing coverage. Please review.

Project coverage is 56.52%. Comparing base (dd9552c) to head (65fc24b).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...hon/apache_beam/ml/rag/enrichment/milvus_search.py 69.86% 66 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #35216      +/-   ##
============================================
+ Coverage     56.51%   56.52%   +0.01%     
  Complexity     3319     3319              
============================================
  Files          1198     1199       +1     
  Lines        182870   183091     +221     
  Branches       3426     3426              
============================================
+ Hits         103347   103496     +149     
- Misses        76223    76295      +72     
  Partials       3300     3300              
Flag Coverage Δ
python 80.77% <70.79%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@mohamedawnallah
Copy link
Contributor Author

Thanks @damccorm for taking another look at this PR. I've removed the explicit setuptools and pip dependencies in sdks/python/setup.py according to your feedback. I think it is now ready for review

P.S. The failed Go tests / Go Build job workflow appears unrelated to the changes introduced by this PR.

@mohamedawnallah
Copy link
Contributor Author

mohamedawnallah commented Jun 30, 2025

Also, I plan to submit a follow-up PR perhaps after this one and that docs PR #35467? It will primarily address two pieces of feedback you provided:

  1. Handling tests that can't run on self-hosted runners due to the Docker-in-Docker environment ([1/3] sdks/python: enrich data with Milvus Search [Vector, Keyword, Hybrid] #35216 (comment)).
  2. Moving the milvus dependency into its own optional extra ([1/3] sdks/python: enrich data with Milvus Search [Vector, Keyword, Hybrid] #35216 (comment)).

Copy link
Contributor

@damccorm damccorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! This looks good to me

@damccorm damccorm merged commit 510d842 into apache:master Jun 30, 2025
102 of 103 checks passed
changliiu pushed a commit to changliiu/beam that referenced this pull request Jul 1, 2025
…ybrid] (apache#35216)

* sdks/python: add pymilvus dependency

* sdks/python: add `MilvusSearchEnrichmentHandler`

* sdks/python: test `MilvusSearchEnrichmentHandler`

* sdks/python: itest `MilvusSearchEnrichmentHandler`

* examples: add `MilvusSearchEnrichmentHandler`

* sdks/python: combine milvus search strategies in one

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* sdks/python/container: update image requirements

* sdks/python: add license for `milvus_search.py`

* sdks/python: add docstrings for `milvus_search.py`

* sdks/python: unit test milvus search handler

* sdks/python: update docstrings for milvus_search

* sdks/python: fix linting for `milvus_search.py`

* sdks/python: add more unit tests for milvus search

* sdks/python: combine test classes in one

* sdks/python: add `setuptools` as dependency

* sdks/python: update container image requirements

* sdks/python: update definition for `ANNS` field

* .github/workflows: upgrade pip & setuptools

* sdks/python: fix linting issue for `milvus_search`

* sdks/python: group I/O types together milvus_search

* .github/workflows: upgrade pip & setuptools

* .github: unify upgrading setuptools & pip

* sdks/python: fix linting for `milvus_search.py`

* sdks/python: update grpcio for py<=3.12

* sdks/python: update image requirements

* sdks/python: add `milvus-lite` manual license

* sdks/python: fix `milvus_search_it_test` failed cases

* sdks/python: unify access to sparse/dense embeddings

* sdks/python: invoke `unittest.main` on milvus search

* sdks/pyhon: make `MilvusSearchDBContainerInfo` optional for linting

* sdks/python+website: update docs

* sdks/python: fix linting issues for `milvus_search` component

* sdks/python: fix linting issues for milvus search component

* website: add missing doc for milvus search

* sdks/python: add itests for milvus search

* sdks/python: complete itests for milvus search

* sdks/python: fix linting

* sdks/python: address (can be) flaky test cases

* website: update relase version for `enrichment-milvus.md`

* sdks/python: fix failed unit tests for milvus search

* sdks/python: fix linting for milvus search itests

* website: update docs html to ref milvus enrichment handler

* sdks/python: avoid port collision for milvus container

* sdks/python: remove free port allocation for milvus search

* sdks/python: fix formatting issues for milvus search

* sdks/python: fix linting for milvus_search_it_test

* sdks/python: handle port collisions for milvus search itest

* sdks/python: increase timeout for milvus container

* sdks/python: experiment being explicit about the port solve the CI issue

* sdks+.github: experiment running ml deps CI test onubuntu solve issue

* .github/workflwos: revert python precommit ml changes

* sdks/python: fix CI issues for itests

* sdks/python: fix linting for milvus search itests

* examples/notebook: update milvus enrichment transform

* website: update milvus enrichment transform

* CHANGES.md: add note for milvus enrichment handler

* sdks/python: update itests for milvus search

* sdks/python: fix linting issues

* multi: update

* multi: update

* updatet

* update

* update

* sdks/python: fix linting issues

* sdks/python: see what CI workflows would fail

* .github: run beam_PreCommit_Python_ML only on ubuntu-20.04 runner

* .github: test workflow

* .github: revert changes

* .github: add milvus-integration-tests.yml

* .github: update milvus it workflow

* update

* .github: update milvus-tests workflow

* .github: try to use ubuntu version `ubuntu-20.04`

* .github+sdks/python: update itests

* .github: update gh runner for milvus itests

* .github: update milvus itests workflow

* .github+sdks/python: update itests

* .github: remove `milvus-integration-tests.yml` for the PR review

* sdks/python: skip itests properly if milvus db container failed to start

* skds/python: restructure the code order in the example

* sdks/python: reduce number of retries to avoid test timeout

* sdks/python: set internal testcontainer env variable for max retries

* sdks/python: update tc max retries

* sdks/python: update

* sdks/python: use dynamic milvus service and healthcheck ports

* sdks/python: fix linting issues for milvus search itest

* sdks/python: fixing linting issues for milvus search itests

* .github+sdks/python: reconfigure dependencies

* sdks/python: address Danny's feedback (2)

* examples/notebooks: update `milvus_enrichment_transform`

* website+examples: remove non-functional docs/examples

* website: revert updated `enrichment.md`

* sdks/python: remove duplicated `HybridSearchParameters`

* sdks/python: fix linting for milvus search

* sdks/python: remove examples from this PR

* .github/workflows: remove unnecesssary changes

* CHANGES.md: undo the feature template

* sdks/python: remove `pip` and `setuptools` as explicit dependency

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>
jrmccluskey pushed a commit to jrmccluskey/beam that referenced this pull request Jul 1, 2025
…ybrid] (apache#35216)

* sdks/python: add pymilvus dependency

* sdks/python: add `MilvusSearchEnrichmentHandler`

* sdks/python: test `MilvusSearchEnrichmentHandler`

* sdks/python: itest `MilvusSearchEnrichmentHandler`

* examples: add `MilvusSearchEnrichmentHandler`

* sdks/python: combine milvus search strategies in one

Co-authored-by: Danny McCormick <dannymccormick@google.com>

* sdks/python/container: update image requirements

* sdks/python: add license for `milvus_search.py`

* sdks/python: add docstrings for `milvus_search.py`

* sdks/python: unit test milvus search handler

* sdks/python: update docstrings for milvus_search

* sdks/python: fix linting for `milvus_search.py`

* sdks/python: add more unit tests for milvus search

* sdks/python: combine test classes in one

* sdks/python: add `setuptools` as dependency

* sdks/python: update container image requirements

* sdks/python: update definition for `ANNS` field

* .github/workflows: upgrade pip & setuptools

* sdks/python: fix linting issue for `milvus_search`

* sdks/python: group I/O types together milvus_search

* .github/workflows: upgrade pip & setuptools

* .github: unify upgrading setuptools & pip

* sdks/python: fix linting for `milvus_search.py`

* sdks/python: update grpcio for py<=3.12

* sdks/python: update image requirements

* sdks/python: add `milvus-lite` manual license

* sdks/python: fix `milvus_search_it_test` failed cases

* sdks/python: unify access to sparse/dense embeddings

* sdks/python: invoke `unittest.main` on milvus search

* sdks/pyhon: make `MilvusSearchDBContainerInfo` optional for linting

* sdks/python+website: update docs

* sdks/python: fix linting issues for `milvus_search` component

* sdks/python: fix linting issues for milvus search component

* website: add missing doc for milvus search

* sdks/python: add itests for milvus search

* sdks/python: complete itests for milvus search

* sdks/python: fix linting

* sdks/python: address (can be) flaky test cases

* website: update relase version for `enrichment-milvus.md`

* sdks/python: fix failed unit tests for milvus search

* sdks/python: fix linting for milvus search itests

* website: update docs html to ref milvus enrichment handler

* sdks/python: avoid port collision for milvus container

* sdks/python: remove free port allocation for milvus search

* sdks/python: fix formatting issues for milvus search

* sdks/python: fix linting for milvus_search_it_test

* sdks/python: handle port collisions for milvus search itest

* sdks/python: increase timeout for milvus container

* sdks/python: experiment being explicit about the port solve the CI issue

* sdks+.github: experiment running ml deps CI test onubuntu solve issue

* .github/workflwos: revert python precommit ml changes

* sdks/python: fix CI issues for itests

* sdks/python: fix linting for milvus search itests

* examples/notebook: update milvus enrichment transform

* website: update milvus enrichment transform

* CHANGES.md: add note for milvus enrichment handler

* sdks/python: update itests for milvus search

* sdks/python: fix linting issues

* multi: update

* multi: update

* updatet

* update

* update

* sdks/python: fix linting issues

* sdks/python: see what CI workflows would fail

* .github: run beam_PreCommit_Python_ML only on ubuntu-20.04 runner

* .github: test workflow

* .github: revert changes

* .github: add milvus-integration-tests.yml

* .github: update milvus it workflow

* update

* .github: update milvus-tests workflow

* .github: try to use ubuntu version `ubuntu-20.04`

* .github+sdks/python: update itests

* .github: update gh runner for milvus itests

* .github: update milvus itests workflow

* .github+sdks/python: update itests

* .github: remove `milvus-integration-tests.yml` for the PR review

* sdks/python: skip itests properly if milvus db container failed to start

* skds/python: restructure the code order in the example

* sdks/python: reduce number of retries to avoid test timeout

* sdks/python: set internal testcontainer env variable for max retries

* sdks/python: update tc max retries

* sdks/python: update

* sdks/python: use dynamic milvus service and healthcheck ports

* sdks/python: fix linting issues for milvus search itest

* sdks/python: fixing linting issues for milvus search itests

* .github+sdks/python: reconfigure dependencies

* sdks/python: address Danny's feedback (2)

* examples/notebooks: update `milvus_enrichment_transform`

* website+examples: remove non-functional docs/examples

* website: revert updated `enrichment.md`

* sdks/python: remove duplicated `HybridSearchParameters`

* sdks/python: fix linting for milvus search

* sdks/python: remove examples from this PR

* .github/workflows: remove unnecesssary changes

* CHANGES.md: undo the feature template

* sdks/python: remove `pip` and `setuptools` as explicit dependency

---------

Co-authored-by: Danny McCormick <dannymccormick@google.com>
@mohamedawnallah mohamedawnallah deleted the enrichWithMilvus branch July 13, 2025 10:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants