Critical bug: FE image file may be written incorrectly #1964

@kangkaisen

Description

Describe the bug
Yesterday, when I restarted one OBSERVER node in our production environment, I found that the FE image file was wrong: it had lost the metadata of one database!

I repaired the FE image by dumping the in-memory metadata.

Why is the on-disk image file broken while the dumped image file is correct?

Because the checkpoint builds the on-disk image file from the CHECKPOINT Catalog instance, while the dump builds its image file from SingletonHolder.INSTANCE.
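To illustrate how two Catalog instances can drift apart, here is a minimal, hypothetical Java sketch. `MiniCatalog`, `replayCreateDb`, and the `buggy` flag are invented for illustration and are not Doris/Palo APIs; the sketch only assumes what is described above, namely that the dump reads a singleton while the checkpoint writes its image from a separate instance. It models one possible checkpoint bug (a replay path that writes to the singleton instead of the instance it was given); the issue does not establish that this is the actual cause.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Catalog; not the real Doris/Palo class.
class MiniCatalog {
    // Serving instance, analogous to SingletonHolder.INSTANCE.
    static final MiniCatalog INSTANCE = new MiniCatalog();

    final Map<Long, String> idToDb = new HashMap<>();

    // Replays a "create database" journal entry into the given target catalog.
    // When buggy is true, the entry is applied to the singleton instead,
    // modelling a replay path that ignores the instance it was called on.
    static void replayCreateDb(MiniCatalog target, long dbId, String name, boolean buggy) {
        MiniCatalog applyTo = buggy ? INSTANCE : target;
        applyTo.idToDb.put(dbId, name);
    }
}

class CheckpointDemo {
    public static void main(String[] args) {
        MiniCatalog checkpoint = new MiniCatalog(); // separate CHECKPOINT instance

        // The serving catalog already holds both databases from normal operation.
        MiniCatalog.INSTANCE.idToDb.put(1L, "db1");
        MiniCatalog.INSTANCE.idToDb.put(2L, "db2");

        // The checkpoint replays the same journal, but one replay lands on the wrong instance.
        MiniCatalog.replayCreateDb(checkpoint, 1L, "db1", false);
        MiniCatalog.replayCreateDb(checkpoint, 2L, "db2", true);

        System.out.println("dump sees " + MiniCatalog.INSTANCE.idToDb.size() + " dbs");       // 2
        System.out.println("checkpoint sees " + checkpoint.idToDb.size() + " dbs");          // 1
    }
}
```

In this model the dump stays correct because the singleton received every change, while the checkpoint image silently loses `db2`, which matches the symptom reported here.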

    public long saveDb(DataOutputStream dos, long checksum) throws IOException {
        int dbCount = idToDb.size() - nameToCluster.keySet().size(); // idToDb minus one entry per cluster
        // ... (rest of saveDb elided)
    }
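The count written to the image is simply `idToDb.size()` minus one entry per cluster, so a single missing entry in `idToDb` shortens the image by exactly one database. The sketch below mirrors that expression in a hypothetical helper `dbCountToWrite`; the cluster name `default_cluster` and the extra `information_schema` entry are illustrative assumptions, not values taken from this issue.

```java
import java.util.Map;
import java.util.Set;

class DbCountDemo {
    // Mirrors: int dbCount = idToDb.size() - nameToCluster.keySet().size();
    static int dbCountToWrite(Map<Long, String> idToDb, Set<String> clusterNames) {
        return idToDb.size() - clusterNames.size();
    }

    public static void main(String[] args) {
        // Assumed: one cluster, contributing one extra entry to idToDb.
        Set<String> clusters = Set.of("default_cluster");
        Map<Long, String> serving = Map.of(1L, "information_schema", 2L, "db1", 3L, "db2");
        Map<Long, String> checkpoint = Map.of(1L, "information_schema", 2L, "db1"); // db2 lost

        System.out.println(dbCountToWrite(serving, clusters));    // 2
        System.out.println(dbCountToWrite(checkpoint, clusters)); // 1
    }
}
```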

I am sure the cause is that idToDb lost one database.

I think there is either a concurrency issue or a bug in the Checkpoint process.
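If the cause turns out to be concurrency, one common defence is to serialize the catalog under a shared read lock and apply changes under an exclusive write lock, so the image never observes a half-applied update. The sketch below uses `java.util.concurrent.locks.ReentrantReadWriteLock`; `GuardedCatalog`, `addDb`, and `snapshotDbIds` are hypothetical names and are not taken from the Doris code base.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical catalog whose map is only touched under a read/write lock.
class GuardedCatalog {
    private final Map<Long, String> idToDb = new HashMap<>();
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Writers (for example, journal replay) take the exclusive write lock.
    void addDb(long id, String name) {
        lock.writeLock().lock();
        try {
            idToDb.put(id, name);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // The image writer copies the ids under the shared read lock,
    // so everything it serializes comes from one consistent state.
    Set<Long> snapshotDbIds() {
        lock.readLock().lock();
        try {
            return new TreeSet<>(idToDb.keySet());
        } finally {
            lock.readLock().unlock();
        }
    }
}

class GuardedCatalogDemo {
    public static void main(String[] args) throws InterruptedException {
        GuardedCatalog catalog = new GuardedCatalog();
        Thread writer = new Thread(() -> {
            for (long id = 0; id < 1000; id++) {
                catalog.addDb(id, "db" + id);
            }
        });
        writer.start();
        // Snapshots taken while the writer runs cannot see a map mid-modification.
        while (writer.isAlive()) {
            catalog.snapshotDbIds();
        }
        writer.join();
        System.out.println(catalog.snapshotDbIds().size()); // 1000
    }
}
```

Copying the key set inside the read lock keeps the critical section short; the caller can then serialize from the copy without holding the lock.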

I will add some logging to track down this issue.
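One kind of logging that could help: before the checkpoint image is saved, compare the database ids in the checkpoint catalog with those in the serving catalog and warn about any that are missing. This minimal sketch uses `java.util.logging` only to stay self-contained; `CheckpointDiagnostics`, `findMissingDbIds`, and the logger name are hypothetical. Because the checkpoint instance may legitimately lag behind the serving catalog, a reported id is a lead to investigate, not proof of corruption.

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.logging.Logger;

class CheckpointDiagnostics {
    private static final Logger LOG = Logger.getLogger("checkpoint.diagnostics");

    // Returns the ids present in the serving catalog but absent from the
    // checkpoint catalog, logging a warning for each one.
    static Set<Long> findMissingDbIds(Set<Long> servingIds, Set<Long> checkpointIds) {
        Set<Long> missing = new TreeSet<>(servingIds);
        missing.removeAll(checkpointIds);
        for (Long id : missing) {
            LOG.warning("db " + id + " is in the serving catalog but not in the checkpoint catalog");
        }
        return missing;
    }

    public static void main(String[] args) {
        Set<Long> serving = Set.of(10001L, 10002L, 10003L);
        Set<Long> checkpoint = Set.of(10001L, 10003L);
        System.out.println(findMissingDbIds(serving, checkpoint)); // [10002]
    }
}
```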

Metadata

Assignees

No one assigned

    Labels

    kind/fix

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests