Skip to content
This repository was archived by the owner on Jan 20, 2026. It is now read-only.

Add validation to prevent data corruption due to SS store misoperation#503

Merged
yzang2019 merged 2 commits intomainfrom
yzang/SEI-7320
May 8, 2024
Merged

Add validation to prevent data corruption due to SS store misoperation#503
yzang2019 merged 2 commits intomainfrom
yzang/SEI-7320

Conversation

@yzang2019
Copy link
Collaborator

@yzang2019 yzang2019 commented May 8, 2024

Describe your changes and provide context

Problem:
Currently if a node did state sync with SS disabled, and then we turn on SS, it will behave wrong due to SS not having initial version for most keys and cause data corruption. And if that node tries to create a snapshot, it will produce a bad snapshot which could cause other nodes to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is suddenly enabled, and SS doesn't have any data yet, SC must also not have any data, otherwise, we will panic during startup.

Testing performed to validate your change

Tested on arctic-1:
image

Enabled SS first, hit panic, and then disable SS again, node was running fine.

@yzang2019 yzang2019 requested review from Kbhat1 and udpatil May 8, 2024 02:15
@codecov-commenter
Copy link

codecov-commenter commented May 8, 2024

Codecov Report

Attention: Patch coverage is 0% with 4 lines in your changes are missing coverage. Please review.

Project coverage is 55.36%. Comparing base (a0b1d4a) to head (2ea7002).
Report is 1 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main     #503   +/-   ##
=======================================
  Coverage   55.36%   55.36%           
=======================================
  Files         631      631           
  Lines       54117    54144   +27     
=======================================
+ Hits        29962    29977   +15     
- Misses      22016    22025    +9     
- Partials     2139     2142    +3     
Files Coverage Δ
storev2/rootmulti/store.go 2.97% <0.00%> (-0.03%) ⬇️

... and 2 files with indirect coverage changes

@yzang2019 yzang2019 merged commit 30bcea9 into main May 8, 2024
@yzang2019 yzang2019 deleted the yzang/SEI-7320 branch May 8, 2024 03:51
yzang2019 added a commit that referenced this pull request Mar 26, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
yzang2019 added a commit that referenced this pull request Mar 26, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Mar 30, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Apr 16, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
yzang2019 added a commit that referenced this pull request Apr 21, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes

Remove noVersioning and add packages
Kbhat1 pushed a commit that referenced this pull request Apr 22, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Apr 23, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request May 30, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Jun 9, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Jun 13, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Jun 16, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Jun 20, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Jun 30, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Kbhat1 pushed a commit that referenced this pull request Jul 7, 2025
[SeiDB] Part2:Add io.closer interface for commit multistore (#374)

Problem:
With SeiDB, we need to support closing the database when self
remediation is triggered, otherwise the chain will halt due to not able
to reopen the database.

Changes:
- Add an interface for CMS (commit multistore)
- Add implementation for existing cms
- Add close logic for self remediation to close the commit store and
state store

Tested with local docker chain

[SeiDB] Part3: Adjust Snapshot and Pruning logic for storeV2 (#376)

Changes:
- Adjust and fix pruning and snapshot logic which could break seidb
- Add some logging for snapshot manager

Covered by unit test and local docker cluster

SeiDB part 4

SeiDB part 5

[SeiDB] Fix concurrent map access (#411)

This should fix the concurrent map access for storeV2 root multistore
over (rs.ckvStores). The problem is that it is currently not protected
by a read lock so when other goroutine modify the map, it will throw
panic

[SeiDB] Fix various issues from ottersec audit (#418)

Fix a bunch of minor issues from audit:
- LoadVersionAndUpgrade should not ignore version
- Simplify CacheWrap to avoid wrapping twice
- SS store query is not setting height correctly in the query response
- Add panic logic for bumpversion if it is false in commit
- Use the correct commit info in query

[SeiDB] Fix SS apply changeset version off by 1 (#424)

Problem:
When apply changeset to SS store, currently we are using the
lastCommitInfo version, which is the previous block height. This means
the version is off by 1. The correct way is to use the current block
height for the changeset version
Will add integration test for this test case

SeiDB fixes

Add metrics for storeV2 (#456)

Add two new metrics for StoreV2:
- Add metric to monitor SC commit latency
- Add metric to monitor SS store last commit version

Tested locally and see metrics appear

Fix Panic: Check For Error Before Defer (#482)

- Checks for err before `defer`
- Without this, if there is a err, the node will panic on the `defer`
because `scStore` is `nil`

- Verified on node

Add validation to prevent data corruption due to SS store misoperation (#503)

Problem:
Currently if a node did state sync with SS disabled, and then we turn on
SS, it will behave wrong due to SS not having initial version for most
keys and cause data corruption. And if that node tries to create a
snapshot, it will produce a bad snapshot which could cause other nodes
to app hash.

Solution:
In this PR, we simply add a validation check to make sure if SS is
suddenly enabled, and SS doesn't have any data yet, SC must also not
have any data, otherwise, we will panic during startup.

Tested on arctic-1:
<img width="887" alt="image"
src="https://github.com/sei-protocol/sei-cosmos/assets/50607998/9bc1a67b-7b11-4254-96a2-9e155239bb85">

Enabled SS first, hit panic, and then disable SS again, node was running
fine.

Bump seidb version

SeiDB fixes
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants