@ianton-ru

@ianton-ru ianton-ru commented Jan 23, 2026

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

DataLakeCatalog namespace filter

Documentation entry for user-facing changes

New setting `namespaces` for DataLakeCatalog: a comma-separated list of namespaces.
Supported for the `rest`, `glue` and `unity` catalog types.

namespaces='foo,bar'

The `rest` type can have nested namespaces and supports the following rules:

  • foo - tables from namespace foo, but not from its nested namespaces
  • foo.bar - tables from the nested namespace foo.bar, but not from the base namespace foo
  • foo.* - tables from all nested namespaces of foo, but not from the base namespace foo

When tables from both the base and the nested namespaces are required, list both: foo,foo.*
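The three `rest` rules can be illustrated with a small matcher. This is a hypothetical sketch, not the PR's implementation: `splitPatterns`, `matchesPattern` and `isNamespaceAllowed` are made-up names, `*` is assumed to mean "no filtering" (the default that appears later in the review thread), and `foo.*` is assumed to cover nested namespaces at any depth.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: split the `namespaces` value into patterns on ','.
std::vector<std::string> splitPatterns(const std::string & csv)
{
    std::vector<std::string> patterns;
    std::stringstream stream(csv);
    std::string item;
    while (std::getline(stream, item, ','))
        if (!item.empty())
            patterns.push_back(item);
    return patterns;
}

// Returns true if the dot-separated namespace `ns` is selected by `pattern`.
bool matchesPattern(const std::string & pattern, const std::string & ns)
{
    if (pattern == "*")
        return true; // assumption: "*" means "no filtering"

    const std::string wildcard = ".*";
    const bool is_wildcard = pattern.size() > wildcard.size()
        && pattern.compare(pattern.size() - wildcard.size(), wildcard.size(), wildcard) == 0;
    if (is_wildcard)
    {
        // "foo.*": any namespace nested under "foo" (assumed: at any depth), but not "foo" itself.
        const std::string prefix = pattern.substr(0, pattern.size() - 1); // keeps the dot: "foo."
        return ns.size() > prefix.size() && ns.compare(0, prefix.size(), prefix) == 0;
    }

    // "foo" or "foo.bar": exact match only, nested namespaces are not included.
    return pattern == ns;
}

// A namespace is allowed if any pattern in the list selects it.
bool isNamespaceAllowed(const std::vector<std::string> & patterns, const std::string & ns)
{
    for (const auto & pattern : patterns)
        if (matchesPattern(pattern, ns))
            return true;
    return false;
}
```

With `splitPatterns("foo,foo.*")`, both `foo` and `foo.bar` are allowed, while `foo.*` alone rejects `foo`, which is why the description asks for both entries.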

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@ianton-ru
Author

@codex review

@github-actions

github-actions bot commented Jan 23, 2026

Workflow [PR], commit [51cbf62]


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8b92a48e5b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 529 to 533
DB::Names RestCatalog::getTables(const std::string & base_namespace, size_t limit) const
{
    if (!allowed_namespaces.isNamespaceAllowed(base_namespace, /*nested*/ false))
        throw DB::Exception(DB::ErrorCodes::OUT_OF_SCOPE,
            "Namespace {} is filtered by `namespaces` database parameter", base_namespace);

P2: Skip filtered namespaces when probing emptiness

This new OUT_OF_SCOPE throw means RestCatalog::getTables() now fails for any namespace excluded by the namespaces filter. RestCatalog::empty() still iterates all namespaces returned by the catalog and calls getTables() in its stop condition without checking the filter, so any filtered namespace will now throw and bubble up. That makes DatabaseDataLake::empty() (used by DatabaseCatalog::detachDatabase when dropping/detaching DBs) fail whenever the catalog contains namespaces outside the filter (e.g., namespaces='dev' but prod exists). Consider skipping filtered namespaces in empty() or swallowing OUT_OF_SCOPE there.

Useful? React with 👍 / 👎.
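A minimal sketch of the suggested fix, under stated assumptions: `FakeCatalog` is a hypothetical in-memory stand-in (not the real RestCatalog), the filter is a flat set, and `std::runtime_error` stands in for `DB::Exception` with `OUT_OF_SCOPE`. `empty()` checks the filter first, so filtered namespaces are skipped instead of making `getTables()` throw.

```cpp
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a catalog; not the real RestCatalog.
struct FakeCatalog
{
    std::map<std::string, std::vector<std::string>> tables_by_namespace; // namespace -> tables
    std::set<std::string> allowed_namespaces;                             // flat filter (assumed)

    bool isNamespaceAllowed(const std::string & ns) const { return allowed_namespaces.count(ns) > 0; }

    // Mirrors the new behaviour: asking for a filtered namespace throws.
    std::vector<std::string> getTables(const std::string & ns) const
    {
        if (!isNamespaceAllowed(ns))
            throw std::runtime_error("Namespace " + ns + " is filtered by `namespaces` database parameter");
        return tables_by_namespace.at(ns);
    }

    // Suggested fix: skip filtered namespaces before calling getTables().
    bool empty() const
    {
        for (const auto & entry : tables_by_namespace)
        {
            const std::string & ns = entry.first;
            if (!isNamespaceAllowed(ns))
                continue; // calling getTables() here would throw and break DROP/DETACH
            if (!getTables(ns).empty())
                return false;
        }
        return true;
    }
};
```

For example, with namespaces `dev` (no tables) and `prod` (one table) and a filter of `dev`, `empty()` returns true instead of throwing on `prod`. Catching `OUT_OF_SCOPE` inside `empty()` would be the other option mentioned in the review; checking the filter up front avoids using an exception for control flow.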

@ianton-ru
Author

@codex review

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. 🎉

DataLakeStorageSettingsPtr settings_,
std::optional<std::string> catalog_namespaces_ = std::nullopt)
: settings(settings_)
, catalog_namespaces(catalog_namespaces_ ? *catalog_namespaces_ : std::string("*")) {}
Collaborator

consider std::optional::value_or()
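With `std::optional::value_or()`, the ternary in the constructor collapses into a single call. The class name and the settings types below are hypothetical stand-ins so the snippet compiles on its own; only the initializer of `catalog_namespaces` reflects the suggestion.

```cpp
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins so the snippet compiles on its own.
struct DataLakeStorageSettings {};
using DataLakeStorageSettingsPtr = std::shared_ptr<const DataLakeStorageSettings>;

class CatalogSettingsHolder // hypothetical class name
{
public:
    explicit CatalogSettingsHolder(
        DataLakeStorageSettingsPtr settings_,
        std::optional<std::string> catalog_namespaces_ = std::nullopt)
        : settings(std::move(settings_))
        // value_or() replaces: catalog_namespaces_ ? *catalog_namespaces_ : std::string("*")
        , catalog_namespaces(catalog_namespaces_.value_or("*"))
    {}

    const std::string & getCatalogNamespaces() const { return catalog_namespaces; }
    const DataLakeStorageSettingsPtr & getSettings() const { return settings; }

private:
    DataLakeStorageSettingsPtr settings;
    std::string catalog_namespaces;
};
```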

{
return allowed_namespaces.contains("*") || allowed_namespaces.contains(namespace_);
}

Collaborator

Not a big deal, but maybe make this method virtual and move this implementation to ICatalog to avoid duplication?
Feel free to ignore if you've considered this option and decided that it is better to keep it simple.

Author

An implementation in ICatalog would require making allowed_namespaces an ICatalog member. But allowed_namespaces has a simple type for the 'flat' case and a more complex type for the case with nested namespaces.

Collaborator

@ilejn ilejn Jan 26, 2026

> An implementation in ICatalog would require making allowed_namespaces an ICatalog member. But allowed_namespaces has a simple type for the 'flat' case and a more complex type for the case with nested namespaces.

The more complex type can live (under a different name) next to the more complex implementation in a derived class.
A member of the simple type can be a std::optional in ICatalog.

Up to you.
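One possible shape for this suggestion, as a hypothetical sketch: `ICatalog` holds an optional flat set plus a virtual `isNamespaceAllowed()` with the shared implementation, while a catalog with nested namespaces keeps its own richer member and overrides the check. `FlatCatalog`, `NestedCatalog` and `nested_prefixes` are illustrative names, not ClickHouse classes, and the `nested` flag of the real `rest` check is omitted for brevity.

```cpp
#include <optional>
#include <set>
#include <string>
#include <utility>

class ICatalog
{
public:
    virtual ~ICatalog() = default;

    // Shared implementation for the 'flat' case; catalogs with nested namespaces override it.
    virtual bool isNamespaceAllowed(const std::string & namespace_) const
    {
        if (!allowed_namespaces)
            return true; // assumption: no flat filter configured means "allow all"
        return allowed_namespaces->count("*") > 0 || allowed_namespaces->count(namespace_) > 0;
    }

protected:
    // Simple-typed member, engaged only by catalogs that use the flat filter.
    std::optional<std::set<std::string>> allowed_namespaces;
};

// Hypothetical flat catalog (glue/unity style): reuses the base check.
class FlatCatalog : public ICatalog
{
public:
    explicit FlatCatalog(std::set<std::string> namespaces) { allowed_namespaces = std::move(namespaces); }
};

// Hypothetical nested catalog (rest style): the richer type lives next to the richer implementation.
class NestedCatalog : public ICatalog
{
public:
    explicit NestedCatalog(std::set<std::string> prefixes) : nested_prefixes(std::move(prefixes)) {}

    bool isNamespaceAllowed(const std::string & namespace_) const override
    {
        for (const auto & prefix : nested_prefixes) // e.g. "foo." for the pattern "foo.*"
            if (namespace_.size() > prefix.size() && namespace_.compare(0, prefix.size(), prefix) == 0)
                return true;
        return false;
    }

private:
    std::set<std::string> nested_prefixes;
};
```

The trade-off raised by the author still applies: the base class carries a member that the nested catalog never engages.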

glue_client = std::make_unique<Aws::Glue::GlueClient>(chain, endpoint_provider, client_configuration);
}

boost::split(allowed_namespaces, settings.namespaces, [](char c){ return c == ','; });
Collaborator

Consider boost::is_any_of instead of the lambda.

Collaborator

BTW, am I right that 'aaa, bbb' would not work because of the space after the comma?
Is that OK?
To fix it, one could use
...split ... is_any_of(", "), boost::token_compress_on)
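A dependency-free sketch of the behaviour asked for here: split on commas and trim surrounding spaces, so `'aaa, bbb'` and `'aaa,bbb'` give the same list. This swaps in hand-written trimming instead of `boost::is_any_of(", ")` with `boost::token_compress_on`; note that the boost variant also splits on spaces inside an item, whereas this sketch only trims them. `splitNamespaces` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split a comma-separated list, trimming spaces around each item and
// dropping empty items, so "aaa, bbb" and "aaa,bbb" give the same result.
std::vector<std::string> splitNamespaces(const std::string & csv)
{
    std::vector<std::string> result;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t comma = csv.find(',', start);
        const std::size_t end = (comma == std::string::npos) ? csv.size() : comma;

        // Trim spaces on both sides of the current item.
        std::size_t b = start;
        std::size_t e = end;
        while (b < e && csv[b] == ' ')
            ++b;
        while (e > b && csv[e - 1] == ' ')
            --e;
        if (e > b)
            result.emplace_back(csv, b, e - b);

        if (comma == std::string::npos)
            break;
        start = comma + 1;
    }
    return result;
}
```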

M(754, UDF_EXECUTION_FAILED) \
M(755, TOO_LARGE_LIGHTWEIGHT_UPDATES) \
M(756, CANNOT_PARSE_PROMQL_QUERY) \
M(757, OUT_OF_SCOPE) \
Collaborator

Not sure, but maybe it is better to either use an existing error code (e.g. DATALAKE_DATABASE_ERROR) or come up with a more specific name, e.g. CATALOG_NAMESPACE_DISABLED?

| `aws_access_key_id` | AWS access key ID for S3/Glue access (if not using vended credentials) |
| `aws_secret_access_key` | AWS secret access key for S3/Glue access (if not using vended credentials) |
| `region` | AWS region for the service (e.g., `us-east-1`) |
| `namespaces` | Comma-separated list of namespaces, supported types: `rest`, `glue` and `unity` |
Collaborator

Initially I thought that rest, glue and unity were the supported namespaces. The target audience of this feature would probably not have this problem, but maybe '... implemented for catalog types: ..'

boost::split(list_of_nested_namespaces, ns, [](char c){ return c == '.'; });

size_t len = list_of_nested_namespaces.size();
if (!len)
Collaborator

I'm not sure I understand what this actually means.
That 'ns' is an empty string? Is that possible?

Author

Yes, some kind of useless value.
Do you mean we need to throw an exception in this case?

Collaborator

@ilejn ilejn Jan 26, 2026

I think the only possible example of a 'useless value' is a string that contains nothing except dots, right?
It can be handled in any way, e.g. as you currently do.
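A hypothetical helper that splits on '.' and reports such a 'useless value' (an empty string or dots only) as `std::nullopt` could look like this. `parseNestedNamespace` is an illustrative name; whether the caller then skips the value or throws is left open, as in the thread, and empty components inside a value like `foo..bar` are deliberately kept.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper: split a namespace such as "foo.bar" on '.'.
// Returns std::nullopt for "useless" values ("" or dots only), where every
// component is empty; empty components inside e.g. "foo..bar" are kept.
std::optional<std::vector<std::string>> parseNestedNamespace(const std::string & ns)
{
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true)
    {
        const std::size_t dot = ns.find('.', start);
        const std::size_t count = (dot == std::string::npos) ? std::string::npos : dot - start;
        parts.push_back(ns.substr(start, count));
        if (dot == std::string::npos)
            break;
        start = dot + 1;
    }

    for (const auto & part : parts)
        if (!part.empty())
            return parts;
    return std::nullopt; // the caller decides whether to skip this value or throw
}
```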

@ianton-ru
Author

ianton-ru commented Jan 27, 2026

Test test_keeper_memory_soft_limit/test.py::test_soft_limit_create failed with a timeout: the cluster could not start for some reason. The current PR shouldn't affect it; this looks like an unrelated flaky failure.
Test test_async_load_databases/test.py::test_materialized_views_cascaded_multiple also looks unrelated. It may be a race condition in filling the materialized views: that is a background process, so the query may have run before the MV changes were applied.
Test test_async_load_databases/test.py::test_materialized_views_replicated failed because the previous failed test did not drop database test_mv. Also unrelated to this PR.

@ianton-ru
Author

Failed stateless tests:

  • amd_ubsan 03221_merge_profile_events - looks unrelated. The test checks ProfileEvents['UserTimeMicroseconds'] > 0 and ProfileEvents['OSCPUVirtualTimeMicroseconds'] > 0; is it possible that the test machine is too fast and execution took less than 1 microsecond?
  • amd_msan 00160_decode_xml_component, 01732_race_condition_storage_join_long, 03595_equality_deletes_simple - Reason: server died. Killed by some timeout?


4 participants