Closed as not planned
What needs to happen?
We've updated the AlloyDBVectorWriterConfig to make it more flexible and align it with our new PostgresVectorWriter transform. Here’s a quick guide to help you update your code.
Here's a summary of the key changes:
- Simplified Connection Config: AlloyDBConnectionConfig has been streamlined. You no longer need to wrap your connector options. Your username, password, database, and instance URI now go directly into AlloyDBLanguageConnectorConfig.
- New JDBC WriteConfig: Parameters like autosharding and write_batch_size have been moved out of the connection configuration and into a new WriteConfig from jdbc_common.
- Moved Imports: The ColumnSpecsBuilder and ConflictResolution utilities have been moved from alloydb to a more general postgres_common module.
Follow these steps to update your code:
Update your imports
First, adjust your import statements. Some have been removed, and others now point to postgres_common.
Old imports
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBConnectionConfig
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBLanguageConnectorConfig
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBVectorWriterConfig
from apache_beam.ml.rag.ingestion.alloydb import ColumnSpec
from apache_beam.ml.rag.ingestion.alloydb import ColumnSpecsBuilder
from apache_beam.ml.rag.ingestion.alloydb import ConflictResolution
New imports
# New imports for JDBC and Postgres utilities
from apache_beam.ml.rag.ingestion.jdbc_common import WriteConfig
from apache_beam.ml.rag.ingestion.postgres_common import ColumnSpecsBuilder, ConflictResolution
# Existing AlloyDB imports (no more AlloyDBConnectionConfig)
from apache_beam.ml.rag.ingestion.alloydb import AlloyDBLanguageConnectorConfig, AlloyDBVectorWriterConfig
Simplify the connection config and optionally add a WriteConfig
Next, update how you configure your connection. Credentials are now passed directly to AlloyDBLanguageConnectorConfig. Then create a WriteConfig object for write-specific settings such as autosharding and write_batch_size.
Old configuration
# Connector options were wrapped in AlloyDBConnectionConfig
connector_options = AlloyDBLanguageConnectorConfig(
    database_name="<database_name>",
    instance_name="<instance_name>",
    autosharding=True,
    write_batch_size=1
)
connection_config = AlloyDBConnectionConfig.with_language_connector(
    connector_options=connector_options,
    username="<username>",
    password="<password>"
)
New configuration
# Simplified connection: credentials go directly here
connection_config = AlloyDBLanguageConnectorConfig(
    username="<username>",
    password="<password>",
    database_name="<database_name>",
    instance_name="<instance_name>"
)

# New config for write-specific parameters
jdbc_write_config = WriteConfig(
    autosharding=True,
    write_batch_size=1
)
Update the VectorDatabaseWriteTransform
Finally, add the new write_config to your AlloyDBVectorWriterConfig instantiation within your pipeline.
Old transform
| VectorDatabaseWriteTransform(
    AlloyDBVectorWriterConfig(
        connection_config=connection_config,
        table_name=self.default_table_name,
        column_specs=specs,
        conflict_resolution=conflict_resolution
    )
)
New transform
| VectorDatabaseWriteTransform(
    AlloyDBVectorWriterConfig(
        connection_config=connection_config,
        table_name=self.default_table_name,
        write_config=jdbc_write_config,  # <-- Add the new WriteConfig here
        column_specs=specs,
        conflict_resolution=conflict_resolution
    )
)
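Putting the steps together, a full new-style pipeline might look like the sketch below. This is an illustrative assembly, not a tested recipe: the import path for VectorDatabaseWriteTransform, the embedded_chunks source, and the pre-built specs and conflict_resolution values are all placeholders or assumptions based on the snippets above.

```python
import apache_beam as beam
# Import path for VectorDatabaseWriteTransform is an assumption.
from apache_beam.ml.rag.ingestion.base import VectorDatabaseWriteTransform
from apache_beam.ml.rag.ingestion.jdbc_common import WriteConfig
from apache_beam.ml.rag.ingestion.alloydb import (
    AlloyDBLanguageConnectorConfig, AlloyDBVectorWriterConfig)

# Credentials and instance details go directly on the connector config now.
connection_config = AlloyDBLanguageConnectorConfig(
    username="<username>",
    password="<password>",
    database_name="<database_name>",
    instance_name="<instance_name>",
)

# Write-specific knobs live in the separate jdbc_common.WriteConfig.
jdbc_write_config = WriteConfig(autosharding=True, write_batch_size=1)

# `embedded_chunks`, `specs`, and `conflict_resolution` are placeholders:
# a source of embedded chunks plus column/conflict settings built with
# ColumnSpecsBuilder and ConflictResolution from postgres_common, as shown
# in the import section above.
with beam.Pipeline() as p:
    _ = (
        p
        | "ReadEmbeddedChunks" >> beam.Create(embedded_chunks)
        | VectorDatabaseWriteTransform(
            AlloyDBVectorWriterConfig(
                connection_config=connection_config,
                table_name="<table_name>",
                write_config=jdbc_write_config,
                column_specs=specs,
                conflict_resolution=conflict_resolution,
            ))
    )
```

Since this wires a pipeline against a live AlloyDB instance, it is configuration you would adapt rather than run as-is.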
Issue Priority
Priority: 2 (default / most normal work should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner