Skip to content

Conversation

@WillAyd
Copy link
Contributor

@WillAyd WillAyd commented May 11, 2023

No description provided.

@github-actions
Copy link

⚠️ Please follow the Conventional Commits format in CONTRIBUTING.md for PR titles.


pg_result* result = result_helper.Execute();
ExecStatusType pq_status = PQresultStatus(result);
if (pq_status == PGRES_TUPLES_OK) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pg looping is getting pretty repetitive. Feels like there is a way to refactor into the existing ResultHelper class and create an iterator over the results

} else {
return ADBC_STATUS_NOT_IMPLEMENTED;
if (depth >= ADBC_OBJECT_DEPTH_DB_SCHEMAS) {
RAISE_ADBC(PostgresConnectionGetSchemasImpl(conn, depth,
Copy link
Contributor Author

@WillAyd WillAyd May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we are grabbing the associated schemas in a loop, but this brings up a structural question. I assume from the adbc.h header that GetInfo should maintain a hierarchy from catalog -> schema -> table -> constraints, etc...

However, the current implementation (shared with sqlite) creates the overall array structure in a call to AdbcInitConnectionObjectsSchema; at face value I don't see how this preserves the hierarchy between elements. With this expression in a loop, it ends up writing the schema name for as many databases as exist.

I could move it out of the loop to fix that, but figured I'd ask about the hierarchical intent of the result before going too far

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's supposed to be a nested data structure, so the catalog name appears once, with the schemas nested inside that row, and ditto for tables being nested into catalogs. The Flight SQL driver has code along those lines, if you want to reference it (in Go/Java).

It is a bit annoying and if I could do it again I would probably stick with the Flight SQL design (GetTables/GetCatalogs/GetSchemas). (Or possibly Calcite's design, which is more flexible; that has come up before too.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFACIT those implementations generate maps that relate a table to a schema. I'm guessing elsewhere in the FlightSQL code base is where that mapping gets generated back into an Arrow structure. Any pointers on where I should look for that?

Generally I am unsure I have the right idea of how this is all supposed to be laid out. For example, if we have the following setup in the database:

[{
 "catalog_name": "postgres",
 "catalog_db_schema": [
 {
   "db_schema_name": "public",
   "db_schema_tables": [...],
 },
 {
   "db_schema_name": "template0",
   "db_schema_tables": [...],
 },
 {
   "db_schema_name": "template1",
   "db_schema_tables": [...],
 },
 }]
 }]

Am I supposed to access the value of "public" via reader->array_view->children[1][0]->children[0]? I have a feeling the children[1][0] access is incorrect and there should be a way to extract the list element by index from an array via nanoarrow, but didn't see such a function.

Is that something that needs to be implemented upstream or do I have the wrong mental model?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to iterate "catalog_db_schema", you would use nanoarrow to get the array offsets of element 0 in the catalog_db_schema array, which would give you a slice of the child array to iterate through. For each of those elements, you would then again get the array offsets of that element in the db_schema_tables array, which would give you a slice of the child child array.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see this starting here:

void ConnectionTest::TestMetadataGetObjectsTables() {

The code I'm referencing for building up the structure is here: https://github.com/apache/arrow-adbc/blob/main/go/adbc/driver/internal/shared_utils.go

You could also reference the code here: apache/arrow#14082

Copy link
Contributor Author

@WillAyd WillAyd May 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to iterate "catalog_db_schema", you would use nanoarrow to get the array offsets of element 0 in the catalog_db_schema array, which would give you a slice of the child array to iterate through...

Great thanks for the guidance. I see nanoarrow has ArrowArrayViewUnionChildOffset but didn't see anything similar for Lists (though I think the member access would be the same as far as lists go). Think its worth trying to upstream that function for lists first?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah...I was going to say it exists, but just realized, it's actually part of this project:

int64_t ArrowArrayViewGetOffsetUnsafe(struct ArrowArrayView* array_view, int64_t i) {

I think it'd be worth upstreaming.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool - I'll give that a shot

ArrowArrayViewGetStringUnsafe(reader.array_view->children[0], row);
auto val_str = std::string(val.data, val.size_bytes);
if (val_str == "postgres") {
// ASSERT_EQ(reader.array->children[1]->children[0]->length, 1);
Copy link
Contributor Author

@WillAyd WillAyd May 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commented assertion and the one on L244 relate to the question about whether or not a hierarchy should exist between the elements inspected by GetInfo

@WillAyd WillAyd changed the title feat:(c/driver/postgres): Implement GetObjectsDbSchemas for Postgres feat(c/driver/postgres): Implement GetObjectsDbSchemas for Postgres May 11, 2023
paleolimbot pushed a commit to apache/arrow-nanoarrow that referenced this pull request May 20, 2023
Similar to what exists for Dense/Sparse Union. This would at least be
useful in the ADBC library, which currently provides its own
implementation

apache/arrow-adbc#679 (comment)
lidavidm pushed a commit that referenced this pull request May 22, 2023
CHECK_NA(INTERNAL, ArrowArrayAppendNull(catalog_db_schemas_col, 1), error);
} else {
return ADBC_STATUS_NOT_IMPLEMENTED;
} else if (!db_schema ||
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is messy but there is a test that provides a schema named this schema does not exist and expects a NULL value back. This doesn't work if you provide a schema name that actually does exist, but neither does the sqlite implementation AFAICT

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That test expects a non-NULL, but empty, list of schemas back. Obviously, if the schema exists, we should return it.


ASSERT_FALSE(ArrowArrayViewIsNull(catalog_db_schemas_list, row))
<< "Row " << row << " should have non-null catalog_db_schemas";
// ASSERT_FALSE(ArrowArrayViewIsNull(catalog_db_schemas_list, row))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK postgres only allows you to retrieve the schemas from the database you are currently connected to, but we can see what other databases exist. So either this test is too restrictive being in the base tests or we should make it so that the postgres driver only lists the current database in GetObjects; I think the former option is better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent is that if you request the schemas, it should come back as a (possibly empty) list, not null.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah OK makes sense. Thanks for the guidance

<< std::string(catalog_name.data, catalog_name.size_bytes)
<< ") should have nonempty catalog_db_schemas ";
ASSERT_FALSE(ArrowArrayViewIsNull(catalog_db_schemas_list, row));
// ASSERT_FALSE(ArrowArrayViewIsNull(catalog_db_schemas_list, row));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks to be duplicative of the test on L511 above

@WillAyd WillAyd marked this pull request as ready for review May 24, 2023 16:57
@WillAyd
Copy link
Contributor Author

WillAyd commented May 24, 2023

I think this is working now with proper tests. Definitely agree on the structure of this being less than ideal, so happy to move towards a more FlightSQL/Go approach here or in a follow up

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think we will want to refactor this before tackling columns/constraints, but things at least work.

PqResultRow db_row = *curr_iter;
curr_db = db_row[0].data;
} else {
return ADBC_STATUS_NOT_IMPLEMENTED;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should probably be like STATUS_IO or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed to INTERNAL for now - I don't know if there is any valid reason why this call would fail

assert(curr_db_helper.NumRows() == 1);
auto curr_iter = curr_db_helper.begin();
PqResultRow db_row = *curr_iter;
curr_db = db_row[0].data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will this pointer stay valid? maybe we should copy to a string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch

}

int res =
StringBuilderAppend(&query, "%s%s%s", " AND nspname ='", schema_name, "'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually we should do this via query parameters instead of escaping (especially since the spec lets you use LIKE-style patterns for these filters)

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@lidavidm lidavidm merged commit dcc681e into apache:main May 24, 2023
@lidavidm lidavidm added this to the ADBC Libraries 0.5.0 milestone May 24, 2023
@WillAyd WillAyd deleted the pg-objects-schema branch May 24, 2023 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants