sqldb: add invoice schema and sql queries#7354
Conversation
ziggie1984
left a comment
There was a problem hiding this comment.
Hi, impressive work! :)
Did a first pass-through in preparation of the review club next week.
There was a problem hiding this comment.
question: Could you explain why we are naming the test files with the prefix test instead of the suffix test?
There was a problem hiding this comment.
yes, comes from taproot_assets but as you noticed it would be good to change it to match the Go's good practices
This feedback will be added in #7343
There was a problem hiding this comment.
question: would it be reasonable to only include this file when building for example with the kvdb_sqlite package, this came to my mind because later in the test files you mark the with specific tags like test_db_postgres or test_db_sqlite so was wondering why in the test code but not here?
sputn1ck
left a comment
There was a problem hiding this comment.
Great work @positiveblue . Added some qs and need to research the SEQUENCES stuff a bit
There was a problem hiding this comment.
I'm not sure about the implications of storing with or without the time zone. E.g. assuming I'm running my node in my home in utc+0 and later migrating that to a remote server that is in a different timezone, my gut feeling is that would be a conflict here?
There was a problem hiding this comment.
I think it's ok to use no timezone until we make sure to display all printed timestamps as UTC or alternatively transform to the user's timezone (when printing).
There was a problem hiding this comment.
@sputn1ck your gut feeling is right here: time/timezone == software engineering headache
As Andras said, the idea is to not use timezone (and everything is sitll consistent) until you want to show it to the user
There was a problem hiding this comment.
haha I checked it in our code and we call them hodl invoices
I think it's they real name hold-invoices though...
bhandras
left a comment
There was a problem hiding this comment.
Cool PR, great work so far @positiveblue 🚀 🚀
There was a problem hiding this comment.
I think it's ok to use no timezone until we make sure to display all printed timestamps as UTC or alternatively transform to the user's timezone (when printing).
65b3307 to
3972d08
Compare
0fd6898 to
c9c2e47
Compare
c902354 to
c9d9247
Compare
bhandras
left a comment
There was a problem hiding this comment.
Looking really good to me, awesome work @positiveblue 🥇
ad5df9e to
3297bf7
Compare
|
@bhandras: review reminder |
ziggie1984
left a comment
There was a problem hiding this comment.
Great PR 🤓, learned a lot how to cleanly setup sql schemas with sqlc.
Just had some nits and questions.
Regarding your remarks:
-
I would prefer int64 for the primary key, especially because sql dbs can easily deployed on other hosts and compacted on the fly so I consider the additonal space not as critical. On the other hand a migration later one is not a big deal as well?
-
So regarding the payment information when fetching an invoice, do I understand it right that you propose just adding the latest successful payment rather then all when using MPP or AMP for example?
-
Very much in favour of making the feature set space requirement more efficient but also keep the possibility to have it separate for each invoice just in case we want to be more flexible?
| amount_msat BIGINT NOT NULL, | ||
|
|
||
| -- Delta to use for the time-lock of the CLTV extended to the final hop. | ||
| cltv_delta INTEGER NOT NULL, |
There was a problem hiding this comment.
Q: For Bolt12 invoices there will be a "cltv_delta" included in the blinded path, however there will be no specific final_hop delta as in bolt11 invoices, there might be still a receiver required delta but it will be handled differently I think. So my question is, maybe we should allow a NULL value then?
There was a problem hiding this comment.
so for every update there will be a corresponding invoice_event?
There was a problem hiding this comment.
wondering why the set_id is stored as bytes but he htlc_id we go for text ? Is text more efficient that BLOB?
| -- not have leave this empty, like Keysends. | ||
| payment_request TEXT UNIQUE, | ||
|
|
||
| -- The invoice state. |
There was a problem hiding this comment.
in the next commit you filter for pending_only which only hits for 0 and 3, would it make sense then to define the states in the comments here as well?
There was a problem hiding this comment.
I think we can leave that for the application layer (Go) given that "pending_only"/"pending" is not a state by itself but an invoice is considered pending when its state is "open"/"accepted"
|
|
||
| -- name: SelectAMPInvoicePayments :many | ||
| SELECT aip.*, ip.* | ||
| FROM amp_invoice_payments aip LEFT JOIN invoice_payments ip ON aip.settled_index = ip.id |
There was a problem hiding this comment.
Q: Could you elaborate why we want to show information from the invoice_payments table here? Will those not be equal to the ones in the amp_invoice_payments table?
There was a problem hiding this comment.
Yes, sql generates a struct where fields in both tables get a standard name, fore example:
type SelectAMPInvoicePaymentsRow struct {
SetID []byte
State int16
CreatedAt time.Time
SettledIndex sql.NullInt32
InvoiceID int32
ID sql.NullInt32
SettledAt sql.NullTime
AmountPaidMsat sql.NullInt64
InvoiceID_2 sql.NullInt32
}
We could "avoid" the overlap and specify a .* for one of the tables and only the distinct columns in the other table. The problem is that next time that we add a column to the second table, we need to update this query too. Because it is not much data, I think it's better to leave it as is.
There was a problem hiding this comment.
Sounds good, now I understand thanks.
There was a problem hiding this comment.
Q: We do not use ENUM types here because we would need to redefine them in the go code which makes things more complicated than having everything in one place ?
bhandras
left a comment
There was a problem hiding this comment.
LGTM, pending @ziggie1984's comments.
Great to see this coming together, will be super nice to use SQL finally! 🚀 🚀
sqlc is a tool that generates fully type-safe idiomatic code from SQL. The result is Go code can then used execute the queries in the database. The noraml flow looks like: - The developer write some sql that will update the schema in the database: new tables, indices, etc - The developer updates the set of queries that will use the new schema. - `sqlc` generates type-safe interfaces to those queries. - The developer can then write application code that calls the methods generated by sqlc. The tool configuration needs to live in the repo's root and its name is `sqlc.yaml`. LND will support out of the box sqlite and postgres. The sql code needs to be (almost) the same for both engines, so we cannot use custom functions like `ANY` in postgres. The SQLC config file needs to define what is the target engine, we will set postgres but the generated code can be executed by sqlite too. In some specific cases, we will `match and replace` some sql lines to be sure the table definitions are valid for the targeted engine.
This is the schema for "ordinal" BOLT11 invoices. The invoices table aims to keep an entry for each invoice, BOLT11 or not, that will be supported. Invoice related HTLCs will be stored in a separete table than forwarded htlcs. SQLite does not support `SEQUENCE`. We achieve atomic autoincrementals using primary keys with autoincrement/serial. An invoice `AddIndex` translates to `invoices(id)` while `SettleIndex` is `invoice_payments(id)`.
Set of queries to deal with invoices. A couple of things to take into
account:
- Because the queries are not rewritten at runtime, we cannot have a
generic `INSERT` with different tuples.
- Because the queries are not rewritten at runtime, we cannot build
one query with only the filters that matter for that queries. The
two options are a combinatorial approach (a new query for every
permutation) or a generic query using the pattern
```
SELECT *
FROM table
WHERE (
-- Can be read as:
-- Match the filter 1 value if filter_1 != nil
column_1 >= sqlc.narg('filter_1') OR
sqlc.narg('filter_1') IS NULL
) AND (
column_2 >= sqlc.narg('filter_2') OR
sqlc.narg('filter_2') IS NULL
) ...
```
Schema for AMP invocies. AMP invoices can be paid multiple times and each payment to an AMP invoice is identified by a `set_id`. The A in AMP stands for `Atomic`. All the htlcs belonging to the same AMP payment (share the same set_id) will be resolved at the same time with the same result: settled/canceled. AMP invoices do not have an "invoice preimage". Instead, each htcl has its own hash/preimage. When a new htlc is added the hash for that htlc is attached to it. When all the htlcs of a set_id have been received we are able to compute the preimage for each one of them.
run `make sqlc` All the code in this commit is auto-generated.
322e460 to
8fcb404
Compare
sputn1ck
left a comment
There was a problem hiding this comment.
Awesome work @positiveblue 🥇
Roasbeef
left a comment
There was a problem hiding this comment.
Excellent PR! This series beings a marked improvement for the general persistence situation within lnd.
Left some non-blocking comments that can be followed up on later. Some may end up changing the schema, but we can do so in a breaking manner, as people can't yet use the definitions, etc.
LGTM 🌪️
|
|
||
| -- The encoded payment request for this invoice. Some invoice types may | ||
| -- not have leave this empty, like Keysends. | ||
| payment_request TEXT UNIQUE, |
There was a problem hiding this comment.
I think we have a bit of room to further decouple things here. Eg: in the future a logical invoice will have both a BOLT 11 and BOLT 12 invoice format, along w/ w/e might arise in the future. So we could spin this out into another table that then references this main invoice table.
Similarly, not every invoice will have a 1:1 payment hash to pre-image relationship. Eg: AMP invoices only have a payment_addr as the hash+preimage are generated by the sender.
| state SMALLINT NOT NULL, | ||
|
|
||
| -- The accumulated value of all the htlcs settled for this invoice. | ||
| amount_paid_msat BIGINT NOT NULL, |
There was a problem hiding this comment.
This in theory could be derived via a view that sums over the set of payments to the invoice itself. Another factor here is the set of active HTLCs as well.
| -- The accumulated value of all the htlcs settled for this invoice. | ||
| amount_paid_msat BIGINT NOT NULL, | ||
|
|
||
| -- This field will be true for AMP invoices. |
There was a problem hiding this comment.
So then the general invoice feature table was dropped in favor of this/
| CREATE INDEX IF NOT EXISTS invoice_htlc_custom_records_htlc_id_idx ON invoice_htlc_custom_records(htlc_id); | ||
|
|
||
| -- invoice_payments contains the information of a settled invoice payment. | ||
| CREATE TABLE IF NOT EXISTS invoice_payments ( |
There was a problem hiding this comment.
👍
In the future we can add another event table here for stuff like: cancel, accept, hodl, etc.
| @@ -0,0 +1,54 @@ | |||
| -- amp_invoices_payments | |||
| CREATE TABLE IF NOT EXISTS amp_invoice_payments ( | |||
There was a problem hiding this comment.
Perhaps also nice to have easy access to the amount here as well? Then can just read this out rather than needing to also fetch+sum over the set of invoice HTLCs.
| SELECT aip.*, ip.* | ||
| FROM amp_invoice_payments aip LEFT JOIN invoice_payments ip ON aip.settled_index = ip.id | ||
| WHERE ( | ||
| set_id = sqlc.narg('set_id') OR |
There was a problem hiding this comment.
If you look at the way the queries exist as is, I think what you also want here is based on the payment_addr as well. Then given that, you can fetch them all for a payment addr w/o an intermediate look up for the invoice itself.
There was a problem hiding this comment.
In the future if we want very strict typing here, we can make a set "enum values" table, then point to that itself (all items unique, essentially pre-populated).
You can also do stuff like:
type CHECK(type in ('THIS', THAT'))etc
This is the first PR for #6288
In this PR we define the tables, indexes and queries for our invoice schema.
We aim to support sqlite/postgres out of the box. The sql code in the migration/query files need to work for both engines with the minimum changes and we want to avoid using specific functionalities like postgres
SEQUENCEs.We will use sqlc to generate go code from SQL. The tool uses the config defined in
sqlc.yaml. By default we will set the engine to postgres but the generated code also works for sqlite.The invoice schema can be divided in three parts:
set_id) and the extra data of each htlc.Because the queries are not generated at runtime, we need to write generic ones that allow us to filter/order using different filter parameters that can be set to
NULLwhen we don't want to apply in a specific query.For filtering, we use the pattern
We also can also use
Ordering is a bit more complicated, because we cannot add
ASC/DESCin aCASE-WHEN-THEN-ELSEso we need to unroll the casesA good example of a real query using this patterns is
FilterInvocies.There are a couple of things that I am considering to add, but I would like to know others opinions.
Currently, row ids (INTEGER PRIMARY KEY) are mapped to
int32. That means that we have a counter valid until2,147,483,647which is 1 invoice/second for the next ~68 years. In Go code we useuint64for them.uint64is not a supported sql type, but we may want to use the largest integer value (int64) for this.Invoice payments are stored in their own table. Some invoice types can have a
one invoice -> many peymentsso aJOINmake sense in this case. However, it could be interesting to add information about the "latest" payment in the invoice table. In that way we would fetch the invoice and its payment information in one trip to the db, whenever it's possible.
Most of the invoices will have the same feature set, based on their type. Maybe we could add an "features_set_enum" instead of storing them for each invoice. That would also gives us the flexibility of having a different feature set for the same kind of invoice in case we add it only to new invoices.
Here is a diagram of the proposed schema.

NOTE: the structure of this PR has changed so some of the old comments apply now to other PRs. This one is only about the SQL schema.