-
Notifications
You must be signed in to change notification settings - Fork 173
feat(c/driver/postgres): Implement GetObjectsDbSchemas for Postgres #679
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
|
c/driver/postgresql/connection.cc
Outdated
|
|
||
| pg_result* result = result_helper.Execute(); | ||
| ExecStatusType pq_status = PQresultStatus(result); | ||
| if (pq_status == PGRES_TUPLES_OK) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pg looping is getting pretty repetitive. Feels like there is a way to refactor into the existing ResultHelper class and create an iterator over the results
c/driver/postgresql/connection.cc
Outdated
| } else { | ||
| return ADBC_STATUS_NOT_IMPLEMENTED; | ||
| if (depth >= ADBC_OBJECT_DEPTH_DB_SCHEMAS) { | ||
| RAISE_ADBC(PostgresConnectionGetSchemasImpl(conn, depth, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we are grabbing the associated schemas in a loop, but this brings up a structural question. I assume from the adbc.h header that GetInfo should maintain a hierarchy from catalog -> schema -> table -> constraints, etc...
However, the current implementation (shared with sqlite) creates the overall array structure in a call to AdbcInitConnectionObjectsSchema; at face value I don't see how this preserves the hierarchy between elements. With this expression in a loop, it ends up writing the schema name for as many databases as exist.
I could move it out of the loop to fix that, but figured I'd ask about the hierarchical intent of the result before going too far
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's supposed to be a nested data structure, so the catalog name appears once, with the schemas nested inside that row, and ditto for tables being nested into catalogs. The Flight SQL driver has code along those lines, if you want to reference it (in Go/Java).
It is a bit annoying and if I could do it again I would probably stick with the Flight SQL design (GetTables/GetCatalogs/GetSchemas). (Or possibly Calcite's design, which is more flexible; that has come up before too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFACIT those implementations generate maps that relate a table to a schema. I'm guessing elsewhere in the FlightSQL code base is where that mapping gets generated back into an Arrow structure. Any pointers on where I should look for that?
Generally I am unsure I have the right idea of how this is all supposed to be laid out. For example, if we have the following setup in the database:
[{
"catalog_name": "postgres",
"catalog_db_schema": [
{
"db_schema_name": "public",
"db_schema_tables": [...],
},
{
"db_schema_name": "template0",
"db_schema_tables": [...],
},
{
"db_schema_name": "template1",
"db_schema_tables": [...],
},
}]
}]Am I supposed to access the value of "public" via reader->array_view->children[1][0]->children[0]? I have a feeling the children[1][0] access is incorrect and there should be a way to extract the list element by index from an array via nanoarrow, but didn't see such a function.
Is that something that needs to be implemented upstream or do I have the wrong mental model?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to iterate "catalog_db_schema", you would use nanoarrow to get the array offsets of element 0 in the catalog_db_schema array, which would give you a slice of the child array to iterate through. For each of those elements, you would then again get the array offsets of that element in the db_schema_tables array, which would give you a slice of the child child array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see this starting here:
arrow-adbc/c/validation/adbc_validation.cc
Line 564 in 47775a8
| void ConnectionTest::TestMetadataGetObjectsTables() { |
The code I'm referencing for building up the structure is here: https://github.com/apache/arrow-adbc/blob/main/go/adbc/driver/internal/shared_utils.go
You could also reference the code here: apache/arrow#14082
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So to iterate "catalog_db_schema", you would use nanoarrow to get the array offsets of element 0 in the catalog_db_schema array, which would give you a slice of the child array to iterate through...
Great thanks for the guidance. I see nanoarrow has ArrowArrayViewUnionChildOffset but didn't see anything similar for Lists (though I think the member access would be the same as far as lists go). Think its worth trying to upstream that function for lists first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah...I was going to say it exists, but just realized, it's actually part of this project:
| int64_t ArrowArrayViewGetOffsetUnsafe(struct ArrowArrayView* array_view, int64_t i) { |
I think it'd be worth upstreaming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool - I'll give that a shot
| ArrowArrayViewGetStringUnsafe(reader.array_view->children[0], row); | ||
| auto val_str = std::string(val.data, val.size_bytes); | ||
| if (val_str == "postgres") { | ||
| // ASSERT_EQ(reader.array->children[1]->children[0]->length, 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commented assertion and the one on L244 relate to the question about whether or not a hierarchy should exist between the elements inspected by GetInfo
Similar to what exists for Dense/Sparse Union. This would at least be useful in the ADBC library, which currently provides its own implementation apache/arrow-adbc#679 (comment)
Related to #679 (comment) this pulls down the nanoarrow change apache/arrow-nanoarrow#197
c/driver/postgresql/connection.cc
Outdated
| CHECK_NA(INTERNAL, ArrowArrayAppendNull(catalog_db_schemas_col, 1), error); | ||
| } else { | ||
| return ADBC_STATUS_NOT_IMPLEMENTED; | ||
| } else if (!db_schema || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is messy but there is a test that provides a schema named this schema does not exist and expects a NULL value back. This doesn't work if you provide a schema name that actually does exist, but neither does the sqlite implementation AFAICT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That test expects a non-NULL, but empty, list of schemas back. Obviously, if the schema exists, we should return it.
c/validation/adbc_validation.cc
Outdated
|
|
||
| ASSERT_FALSE(ArrowArrayViewIsNull(catalog_db_schemas_list, row)) | ||
| << "Row " << row << " should have non-null catalog_db_schemas"; | ||
| // ASSERT_FALSE(ArrowArrayViewIsNull(catalog_db_schemas_list, row)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAIK postgres only allows you to retrieve the schemas from the database you are currently connected to, but we can see what other databases exist. So either this test is too restrictive being in the base tests or we should make it so that the postgres driver only lists the current database in GetObjects; I think the former option is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent is that if you request the schemas, it should come back as a (possibly empty) list, not null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah OK makes sense. Thanks for the guidance
c/validation/adbc_validation.cc
Outdated
| << std::string(catalog_name.data, catalog_name.size_bytes) | ||
| << ") should have nonempty catalog_db_schemas "; | ||
| ASSERT_FALSE(ArrowArrayViewIsNull(catalog_db_schemas_list, row)); | ||
| // ASSERT_FALSE(ArrowArrayViewIsNull(catalog_db_schemas_list, row)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to be duplicative of the test on L511 above
|
I think this is working now with proper tests. Definitely agree on the structure of this being less than ideal, so happy to move towards a more FlightSQL/Go approach here or in a follow up |
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I think we will want to refactor this before tackling columns/constraints, but things at least work.
c/driver/postgresql/connection.cc
Outdated
| PqResultRow db_row = *curr_iter; | ||
| curr_db = db_row[0].data; | ||
| } else { | ||
| return ADBC_STATUS_NOT_IMPLEMENTED; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should probably be like STATUS_IO or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed to INTERNAL for now - I don't know if there is any valid reason why this call would fail
c/driver/postgresql/connection.cc
Outdated
| assert(curr_db_helper.NumRows() == 1); | ||
| auto curr_iter = curr_db_helper.begin(); | ||
| PqResultRow db_row = *curr_iter; | ||
| curr_db = db_row[0].data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this pointer stay valid? maybe we should copy to a string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch
| } | ||
|
|
||
| int res = | ||
| StringBuilderAppend(&query, "%s%s%s", " AND nspname ='", schema_name, "'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually we should do this via query parameters instead of escaping (especially since the spec lets you use LIKE-style patterns for these filters)
lidavidm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
No description provided.