[API Update]: AlloyDBVectorWriterConfig Changes #35225

@claudevdm

Description

What needs to happen?

We've updated the AlloyDBVectorWriterConfig to make it more flexible and align it with our new PostgresVectorWriter transform. Here’s a quick guide to help you update your code.

Here's a summary of the key changes:

  • Simplified Connection Config: AlloyDBConnectionConfig has been streamlined. You no longer need to wrap your connector options. Your username, password, database, and instance URI now go directly into AlloyDBLanguageConnectorConfig.
  • New JDBC WriteConfig: Parameters like autosharding and write_batch_size have been moved out of the connection configuration and into a new WriteConfig from jdbc_common.
  • Moved Imports: The ColumnSpecsBuilder and ConflictResolution utilities have been moved from alloydb to a more general postgres_common module.
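For the moved utilities, only the import location changes; their usage stays the same. A minimal sketch of building column specs and an upsert policy from the new module (the `with_defaults()`/`build()` chain and the `ConflictResolution` argument names are assumptions carried over from the pre-move alloydb API — verify them against your Beam version):

```python
# Sketch only: builder/conflict APIs assumed unchanged apart from the module move.
from apache_beam.ml.rag.ingestion.postgres_common import (
    ColumnSpecsBuilder,
    ConflictResolution,
)

# Default id/embedding/content/metadata column specs.
specs = ColumnSpecsBuilder.with_defaults().build()

# Update the existing row when the id already exists instead of failing.
conflict_resolution = ConflictResolution(
    on_conflict_fields="id",
    action="UPDATE",
)
```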

Follow these steps to update your code

Update your imports

First, adjust your import statements. Some have been removed, and others now point to postgres_common.

Old imports

```python
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBConnectionConfig
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBLanguageConnectorConfig
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBVectorWriterConfig
from apache_beam.ml.rag.ingestion.alloydb import ColumnSpec
from apache_beam.ml.rag.ingestion.alloydb import ColumnSpecsBuilder
from apache_beam.ml.rag.ingestion.alloydb import ConflictResolution
```

New imports

```python
# New imports for JDBC and Postgres utilities
from apache_beam.ml.rag.ingestion.jdbc_common import WriteConfig
from apache_beam.ml.rag.ingestion.postgres_common import ColumnSpecsBuilder, ConflictResolution

# Existing AlloyDB imports (AlloyDBConnectionConfig is gone)
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBLanguageConnectorConfig, AlloyDBVectorWriterConfig
```

Simplify the connection and optionally add a WriteConfig

Next, update how you configure your connection. Credentials now go directly into AlloyDBLanguageConnectorConfig. Then create a WriteConfig object for write-tuning settings such as autosharding and write_batch_size.

Old configuration

```python
# Connector options were wrapped in AlloyDBConnectionConfig
connector_options = AlloyDBLanguageConnectorConfig(
    database_name="<database_name>",
    instance_name="<instance_name>",
    autosharding=True,
    write_batch_size=1
)

connection_config = AlloyDBConnectionConfig.with_language_connector(
    connector_options=connector_options,
    username="<username>",
    password="<password>"
)
```

New Configuration

```python
# Simplified connection: credentials go directly here
connection_config = AlloyDBLanguageConnectorConfig(
    username="<username>",
    password="<password>",
    database_name="<database_name>",
    instance_name="<instance_name>"
)

# New config for write-specific parameters
jdbc_write_config = WriteConfig(
    autosharding=True,
    write_batch_size=1
)
```

Update the VectorDatabaseWriteTransform

Finally, add the new write_config to your AlloyDBVectorWriterConfig instantiation within your pipeline.

Old Transform

```python
| VectorDatabaseWriteTransform(
    AlloyDBVectorWriterConfig(
        connection_config=connection_config,
        table_name=self.default_table_name,
        column_specs=specs,
        conflict_resolution=conflict_resolution
    )
)
```

New Transform

```python
| VectorDatabaseWriteTransform(
    AlloyDBVectorWriterConfig(
        connection_config=connection_config,
        table_name=self.default_table_name,
        write_config=jdbc_write_config,  # <-- Add the new WriteConfig here
        column_specs=specs,
        conflict_resolution=conflict_resolution
    )
)
```
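Putting the pieces together, the migrated configuration can be sketched as a single helper. This is a sketch under the assumptions noted in the comments: the placeholder values are yours to fill in, and `specs`/`conflict_resolution` are the objects you build elsewhere in your pipeline code.

```python
# Sketch of the migrated configuration; not a drop-in pipeline.
from apache_beam.ml.rag.ingestion.alloydb import (
    AlloyDBLanguageConnectorConfig,
    AlloyDBVectorWriterConfig,
)
from apache_beam.ml.rag.ingestion.jdbc_common import WriteConfig


def build_writer_config(specs, conflict_resolution):
    # Credentials and instance details now go directly on the connector config.
    connection_config = AlloyDBLanguageConnectorConfig(
        username="<username>",
        password="<password>",
        database_name="<database_name>",
        instance_name="<instance_name>",
    )

    # Write-tuning parameters live in the separate jdbc_common.WriteConfig.
    write_config = WriteConfig(autosharding=True, write_batch_size=1)

    return AlloyDBVectorWriterConfig(
        connection_config=connection_config,
        table_name="<table_name>",
        write_config=write_config,
        column_specs=specs,
        conflict_resolution=conflict_resolution,
    )

# Usage inside a pipeline (chunks is a PCollection of embedded chunks):
#   chunks | VectorDatabaseWriteTransform(
#       build_writer_config(specs, conflict_resolution))
```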

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
