[Feature] Support remote storage for Cold-Hot data #8531

@qidaye

Search before asking

  • I have searched the issues and found no similar issues.

Description

Background

To separate hot and cold data, the connection information of the remote storage must be specified when an OLAP table is created.
For example, if the remote storage is S3 object storage, the access key, secret key, and other settings are required. This information is reusable, and a single Doris cluster may connect to several remote storages, so the connection information should be managed as a named remote storage. Users then provide only the remote storage name when creating a table instead of repeating the full connection information each time. Likewise, when the connection information changes, only the corresponding remote storage entry needs to be updated; the table does not have to be rebuilt.

Design

Management of remote storage needs to be added, with the following functions:

  1. Create remote storage information
    1. Only S3 is supported at present; more types may be supported in the future, so the design needs to be extensible
  2. Display remote storage information
  3. Delete remote storage information
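
One way to keep type support extensible, as item 1 asks, is to dispatch on the "type" property through a small registry. The sketch below is only an illustration in Python, not the Doris implementation: the names `S3_REQUIRED_KEYS`, `VALIDATORS`, and `validate_remote_storage` are hypothetical, and which S3 properties are mandatory is an assumption, since the issue does not say.

```python
from typing import Callable, Dict, Mapping

# Assumption: these keys are treated as mandatory for S3. The issue does not
# state which properties are required, so this list is illustrative only.
S3_REQUIRED_KEYS = (
    "s3_endpoint",
    "s3_region",
    "s3_root_path",
    "s3_access_key",
    "s3_secret_key",
)


def validate_s3(props: Mapping[str, str]) -> None:
    """Raise ValueError if any assumed-mandatory S3 property is missing."""
    missing = [key for key in S3_REQUIRED_KEYS if key not in props]
    if missing:
        raise ValueError("missing S3 properties: " + ", ".join(missing))


# Hypothetical registry: supporting a new storage type means adding one entry.
VALIDATORS: Dict[str, Callable[[Mapping[str, str]], None]] = {
    "s3": validate_s3,
}


def validate_remote_storage(props: Mapping[str, str]) -> str:
    """Validate a properties map and return its normalized storage type."""
    storage_type = props.get("type", "").lower()
    validator = VALIDATORS.get(storage_type)
    if validator is None:
        raise ValueError(f"unsupported remote storage type: {storage_type!r}")
    validator(props)
    return storage_type
```

With this shape, an unknown type is rejected at creation time, and adding a second storage type would not touch the S3 code path.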

The remote storage information is stored in the catalog, and a RemoteStorageManager is added to manage this metadata. At use time, the information is looked up from the manager by remote storage name.
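
The name-based lookup can be sketched as a small in-memory registry. This is a hedged Python illustration of the idea only: the real RemoteStorageManager would live in the Doris frontend and persist to the catalog, and every method name here (`add`, `get`, `modify`, `drop`, `show`) is an assumption rather than the proposed API.

```python
from typing import Dict, List, Mapping


class RemoteStorageManager:
    """Illustrative in-memory registry of remote storages keyed by name."""

    def __init__(self) -> None:
        self._storages: Dict[str, Dict[str, str]] = {}

    def add(self, name: str, properties: Mapping[str, str]) -> None:
        if name in self._storages:
            raise ValueError(f"remote storage {name!r} already exists")
        self._storages[name] = dict(properties)

    def get(self, name: str) -> Dict[str, str]:
        if name not in self._storages:
            raise KeyError(f"remote storage {name!r} does not exist")
        return self._storages[name]

    def modify(self, name: str, properties: Mapping[str, str]) -> None:
        # Only the given keys change; other properties are kept as they are.
        self.get(name).update(properties)

    def drop(self, name: str) -> None:
        self.get(name)  # raises KeyError if the name is unknown
        del self._storages[name]

    def show(self) -> List[str]:
        return sorted(self._storages)


manager = RemoteStorageManager()
manager.add("remote_s3", {"type": "s3", "s3_max_connections": "50"})

# A table stores only the name, so a later change needs no table rebuild.
table_properties = {"remote_storage": "remote_s3"}
manager.modify("remote_s3", {"s3_max_connections": "500"})
resolved = manager.get(table_properties["remote_storage"])
print(resolved["s3_max_connections"])  # prints 500
```

Because the table holds only the name, the updated connection settings are picked up on the next lookup, which is the reuse benefit described in the background.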

When an OLAP table is created, the new remote_storage property, whose value is a remote storage name, selects the remote storage to use. The property can be changed later through ALTER TABLE.

When the cold storage medium storage_cold_medium is S3, the remote_storage property must be set. Once storage_cooldown_time is reached, the cold data is migrated to the remote S3 storage.
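
The two rules in this paragraph can be sketched as plain checks. The Python below is a minimal illustration under stated assumptions: the function names `check_table_properties` and `should_migrate` are hypothetical, rejecting an unknown storage name is an assumption, and the timestamp format is inferred from the example value "2022-04-01 20:24:00" in this issue.

```python
from datetime import datetime
from typing import Collection, Mapping

# Assumption: format inferred from the example value "2022-04-01 20:24:00".
COOLDOWN_FORMAT = "%Y-%m-%d %H:%M:%S"


def check_table_properties(
    props: Mapping[str, str], known_storages: Collection[str]
) -> None:
    """Require remote_storage when storage_cold_medium is S3."""
    if props.get("storage_cold_medium", "").upper() != "S3":
        return
    name = props.get("remote_storage")
    if not name:
        raise ValueError("remote_storage is required when storage_cold_medium is S3")
    # Assumption: a name that was never registered is rejected as well.
    if name not in known_storages:
        raise ValueError(f"remote storage {name!r} does not exist")


def should_migrate(props: Mapping[str, str], now: datetime) -> bool:
    """Return True once the current time has reached storage_cooldown_time."""
    cooldown = datetime.strptime(props["storage_cooldown_time"], COOLDOWN_FORMAT)
    return now >= cooldown


table_props = {
    "storage_medium": "HDD",
    "storage_cold_medium": "S3",
    "remote_storage": "remote_s3",
    "storage_cooldown_time": "2022-04-01 20:24:00",
}
check_table_properties(table_props, {"remote_s3", "remote_s3_1"})
print(should_migrate(table_props, datetime(2022, 4, 1, 20, 24, 0)))  # prints True
```

Running the property check at table creation and again on ALTER TABLE would catch a missing or unknown storage name before any data is due to cool down.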

Specific syntax

-- create remote storage
alter system add remote storage "remote_s3"
properties
(
    "type" = "s3",
    "s3_endpoint" = "bj",
    "s3_region" = "bj",
    "s3_root_path" = "/path/to/root",
    "s3_access_key" = "bbb",
    "s3_secret_key" = "aaaa",
    "s3_max_connections" = "50",
    "s3_request_timeout_ms" = "3000",
    "s3_connection_timeout_ms" = "1000"
);

alter system add remote storage "remote_s3_1"
properties
(
    "type" = "s3",
    "s3_endpoint" = "bj",
    "s3_region" = "bj",
    "s3_root_path" = "/path/to/root",
    "s3_access_key" = "bbb",
    "s3_secret_key" = "aaaa",
    "s3_max_connections" = "50",
    "s3_request_timeout_ms" = "3000",
    "s3_connection_timeout_ms" = "1000"
);

-- view remote storage
show remote storage;

-- delete remote storage
alter system drop remote storage "remote_s3";

-- modify remote storage
alter system modify remote storage "remote_s3"
properties
(
    "s3_max_connections" = "500"
);


-- Create OLAP table with remote storage information
-- When "storage_cold_medium" = "S3", "remote_storage" = "remote_s3" must exist
CREATE TABLE example_db.table_remote
(
    aa BIGINT,
    bb VARCHAR(64),
    cc VARCHAR(64),
    dd VARCHAR(64)
)
ENGINE=olap
DISTRIBUTED BY HASH (aa) BUCKETS 1
PROPERTIES(
    "replication_num" = "1",
    "storage_medium" = "HDD",
    "storage_cold_medium" = "S3",
    "remote_storage" = "remote_s3",
    "storage_cooldown_time" = "2022-04-01 20:24:00"
);

-- Modify the remote storage information of the OLAP table
alter table table_remote modify partition table_remote set
(
    "storage_medium" = "HDD",
    "storage_cold_medium" = "S3",
    "remote_storage" = "remote_s3_1",
    "storage_cooldown_time" = "2022-04-01 20:24:00"
);

Use case

No response

Related issues

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Labels: kind/feature (categorizes the issue or PR as related to a new feature)