-
Notifications
You must be signed in to change notification settings - Fork 1
oceanoverflow/alphadb
Folders and files
| Name | Name | Last commit message | Last commit date | |
|---|---|---|---|---|
Repository files navigation
----------------------------------------------------------------------------------------------------
CONSENSUS:
----------------------------------------------------------------------------------------------------
CAP theorem
In theoretical computer science, the CAP theorem, also named Brewer's theorem after computer scientist Eric Brewer,
states that it is impossible for a distributed data store to simultaneously provide more than two out of the follow-
ing three guarantees:
* Consistency: Every read receives the most recent write or an error
* Availability: Every request receives a (non-error) response - without the guarantee that it contains the most
recent write
* Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped
(or delayed) by the network between nodes
In particular, the CAP theorem implies that in the presence of a network partition, one has to choose between consi-
stency and availability. Note that consistency as defined in the CAP theorem is quite different from the consistency
guaranteed in ACID database transactions.
Explanation:
No distributed system is safe from network failures, thus network partitioning generally has to be tolerated. In the
presence of a partition, one is then left with two options: consistency or availability. When choosing consistency
over availability, the system will always process the query and try to return the most recent available version of
the information, even if it cannot gurantee it is up to date due to network partitioning.
In the absence of network failure - that is, when the distributed system is running normally - both availability and
consistency can be satisfied.
CAP is frequently misunderstood as if one has to choose to abandon one of the three guarantees at all times. In fact
, the choice is really between consistency and availability only when a network partition or failure happens; at all
other times, no trade-off has to be made.
Database systems designed with traditional ACID guarantees in mind such as RDBMS choose consistency over availability
, whereas system designed around the BASE philosophy, common in the NoSQL movement for example, choose availability
over consistency.
The PACELC theorem builds on CAP by stating that even in the absence of partitioning, anther trade-off between laten-
cy and consistency occurs.
History
According to University of California, Berkeley computer scientist Eric Brewer, the theorem first appeared in autumn
1998. It was published as the CAP principle in 1999 and presented as a conjecture by Brewer at the 2000 Symposium
on Principles of Distributed Computing (PODC). In 2002, Seth Gilbert and Nancy Lynch of MIT published a formal proof
of Brewer's conjecture, rendering it a theorem.
In 2012, Brewer clarified some of his positions, including why the often-used "two out of three" concept can be mis-
leading or misapplied, and the different definition if consistency used in CAP relative to the one used in ACID.
A similar theorem stating the trade-off between consistency and availability in distributed systems was published
by Birman and Friedman in 1996. The result of Birman and Friedman restricted this lower bound to non-commuting ope-
rations.
+--------+
|raft |
|Follower|
+--------+
/ storage \
/ engine! \
+--------+-------------+--------+
|raft | |raft |
|Leader | |Follower|
+--------+ +--------+
^
|
|
|
+------+ |
|client|------------+
+------+
replicate then reply || reply then replicate ?
for read ==> should be reply then replicate
for write ==> should be replicate then reply
______ raft follower1
/
/
client -----> server(s1) -----> state machine (db) -----> raft leader(s2)
\
\______ raft follower2
----------------------------------------------------------------------------------------------------
BUFFER:
----------------------------------------------------------------------------------------------------
+----+----+----+----+----+----+
| | | | | | |
+----+----+----+----+----+----+
| | | | | | |
+----+----+----+----+----+----+
| | | | | | |
+----+----+----+----+----+----+
| | | | | | |
+----+----+----+----+----+----+
| | | | | | |
+----+----+----+----+----+----+
----------------------------------------------------------------------------------------------------
TABLE:
----------------------------------------------------------------------------------------------------
page_id
| +----- slot_id
| base --+ |
v v v 4092B
+----+--------------+------+------+------+------+
| Nt |table_page_hdr| ptr0 | ptr1 | ptr2 | ptr3 |
+----+-+--------+---+--+---+------+------+------+
| ptr4 | ..... | ptrX |<| |
+---+--+--------+---+--+ |
| | | |
| | +------------+ |
| | | |
| +--+--------------+ |
| | | | |
| v v v |
| +--------+-----+------------------+------+
| |>|(tupleX)| ... |> (tuple4) |tuple3|
+------+-+------+-----+---------+--------+------+
|(tuple2)|> (tuple1) |> (tuple0) |
+--------+----------------------+---------------+
largest <-------- smallest
tuple_len[i] = ptr[i] - ptr[i-1]
for example: tuple_len[4] = ptr[3] - ptr[4]
special case: tuple_len[0] = INDEX_PAGE_SIZE - ptr[0]
----------------------------------------------------------------------------------------------------
INDEX:
----------------------------------------------------------------------------------------------------
Disk structure:
platter -> track -> sector (block=4096Bytes)==> addressable <track_no, sector_no> offset
page layout:
degree_: N
page_size_ = (4KB-4B)
degree can also be calculated: N = std::floor( page_size_ / (sizeof(RID) + sizeof(KEY)) )
for [internal] node:
KEY[n-1] is a PSEUDO key, RID<page, slot>, for interval node, just p works, s is not important. <===> RID<p, -1>
RID0 < KEY0 <= RID1 < KEY1 <= RID2 < KEY2 <= RID3 < KEY3 <= ...... < KEY[n-2] <= RID[n-1] < (<<<<<<<<<<KEY[n-1]>>>>>>>>>)p
| | | | |
p{0}<-----+ | | | +------>p{n-1}
| | |
p1<------+ | +------>p{3}
|
v
p{2}
for {leaf} node:
RID[n-1] is the pointer to the next leaf, KEY[n-1] is also a PSEUDO key,
other RIDi points to the record in the table, and KEYi is the index key, here i < [n-1]
(RID0, KEY0) <= (RID1, KEY1) <= (RID2, KEY2) <= (RID3, KEY3) ........... <= (RID[n-1], KEY[n-1]) <= (RID[n-1], KEY[n-1]x) +------------------------->***nextleaf***
| | | | | | |
| | | | | +----------------------+
| | | | v
| | | v (p{n-2},s{n-2})
| | v (p3,s3)
| v (p2,s2)
v (p1,s1)
(p0,s0)
logical view:
+--------------+--------------+--------------+--------------+-------------------------------------+----------------------+
| (RID0, KEY0) | (RID1, KEY1) | (RID2, KEY2) | (RID3, KEY3) | ....................................| (RID[n-1], KEY[n-1]) |
+--------------+--------------+--------------+--------------+-------------------------------------+----------------------+
physical view:
+---+------+------+------+----------+------+------+------+----------+------------------------------------------------------+
| n | RID0 | RID1 |......| RID[n-1] | KEY0 | KEY1 |......| KEY[n-1] | free_space |
+---+------+------+------+----------+------+------+------+----------+------------------------------------------------------+
+--------------+
|p0|k0|p1|k1|p2| internal_node
+--------------+
/ | \
/ | \
/ v \
+--------------+ +--------------+ +--------------+
first --> |r0|k0|r1|k1|n2|-->|r3|k3|r4|k4|n2|-->|r5|k5|r6|k6|n3| --> last (nullptr) leaf_node
+--------------+ +--------------+ +--------------+
| | | | | |
| | | | | |
v v v v v v
*** *** *** *** *** ***
degree = 7 max_node = 6; min_node = 3
1. node->move_half_to(node* receiver) [when splitting]
begin: |*******| end: |****| -> |***| ***|||insert|||***
2. node->move_all_to(node* receiver) [when the node is underflow]
begin: |****| <- |**| end: |*****| ***|||delete|||*** coalesce
3. node->move_first_to_end_of(node* receiver) [when the node is overflow, and left neighbor is not]
begin: |*****| <- |*******| end: |******| <-> |******| ***|||insert|||*** redistribute
4. node->move_last_to_front_of(node* receiver) [same as last one, but this time the right neighbor is not]
begin: |*******| -> |*****| end: |******| <-> |******| ***|||insert|||*** redistribute
----------------------------------------------------------------------------------------------------
SQL:
----------------------------------------------------------------------------------------------------
lex is a scanner generator.
input is a set of regular expressions and associated actions (written in C)
output is a table-driven scanner (lex.yy.c)
flex: an open source implementation of the original UNIX lex utility
lex: semantic analysis
-> splits the input file into tokens
yacc: yet another compiler compiler
-> parses and does semantic processing on the stream of tokens produced by lex
bison: GNU parser parser, upward compatibility with yacc.
lex input example(.l) exl.l
FIRST PART(optional) %{ #include "myscanner.h" %}
%% %%
keyword first!!! "select" select();
pattern action "hello world" printf("GOODBYE\n")
....(regular expressions) : return COLON;
%% %%
THIRD PART int yywrap(void) { return 1; }
yacc input
FIRST PART %{ C declarations; yacc definitions; %}
%% %%
production action statements: statement {printf("statement");}
.... | statement statements {printf("statements");}
;
%% %%
THIRD PART
$ yacc -d sql.y // generate y.tab.h && y.tab.c
$ lex sql.l // process the lex file to generate a scanner (gets saved as lex.yy.c)
$ gcc lex.yy.c y.tab.c -o sqls // compile the scanner and grab main() from the lex library (-ll option)
$ ./a.out // run the scanner taking input from standard input
lex pattern examples
+-----------+-------------------------------------------------------------------------------+
|abc | match the string "abc" |
+-----------+-------------------------------------------------------------------------------+
|[a-zA-Z] | match any lower or uppercase letter |
+-----------+-------------------------------------------------------------------------------+
|dog.*cat | match any string starting with dog, and ending with cat. |
+-----------+-------------------------------------------------------------------------------+
|(ab)+ | match one or more occurrences of "ab" concatenated. |
+-----------+-------------------------------------------------------------------------------+
|[^a-z]+ | matches any string of one or more characters that do not includ lower case a-z|
+-----------+-------------------------------------------------------------------------------+
|[+-]?[0-9]+| match any string of one or more digists with an optional prefix of + or -. |
+-----------+-------------------------------------------------------------------------------+
source code a = b + c * d
|
v
+----------------+ +---+
|Lexical Analyzer|<--------|Lex|<------- patterns
+----------------+ +---+
|
tokens v
+---------------+ +----+
|Syntax Analyzer|<--------|Yacc|<------ grammers (context free)
+---------------+ +----+
|
syntax tree v
/ \
id1 +
/ \
id2 *
| / \
| id3 id4
v
+--------------+
|code generator|
+--------------+
|
generated code v
load id3
mul id4
add id2
store id1
+------------+
mylang.y --> | yacc | --> y.tab.c _
+------------+ \ source
| \ code
| \ |
v +--------+ v
y.tab.h | gcc |--> mylang
| +--------+ |
v / v
+-----------+ / compiled code
mylang.l --> | lex | --> lex.yy.c _/ interpreter output
+-----------+
Supported SQL Queries
=====================
## Select Statements
```sql
SELECT name, city, *
FROM students AS t1 JOIN students AS t2 ON t1.city = t2.city
WHERE t1.grade < 2.0 AND
t2.grade > 2.0 AND
t1.city = 'Frohnau'
ORDER BY t1.grade DESC;
SELECT city, AVG(grade) AS average,
MIN(grade) AS best, MAX(grade) AS worst
FROM students
GROUP BY city;
```
## Data Definition & Modification
**Create Tables**
```sql
CREATE TABLE students (
name TEXT,
student_number INTEGER,
city TEXT,
grade DOUBLE
);
```
**Update and Delete**
```sql
UPDATE students SET name='Max Mustermann' WHERE name = 'Ralf Mustermann';
DELETE FROM students WHERE name = 'Max Mustermann';
```
A compilr or interpreter for a progromming language is often decomposed into two parts:
1. Read the source program and discover its structure.
2. Process this structure, e.g. to generate the target program.
Lex and Yacc can generate program fragments that solve the first task.
The task of discovering the source structure again is decomposed into subtasks:
1. Split the source file into tokens (Lex)
2. Find the hierarchical structure of the program (Yacc).
----------------------------------------------------------------------------------------------------
NETWORK:
----------------------------------------------------------------------------------------------------
server client
+--------+ +--------+
| socket | | socket |
+--------+ +--------+
| |
v |
+--------+ |
| bind | |
+--------+ |
| |
v |
+--------+ |
| listen | |
+--------+ |
| |
v v
+--------+ TCP 3 time +---------+
(ESTABLISHED) SYN+ACK | accept |<-------------| connect | SYN, ACK (ESTABLISHED)
+--------+ handshake +---------+
| |
v v
+--------+ +---------+
| read |<-------------| write |
+--------+ +---------+
+--------+ +---------+
| write |------------->| read |
+--------+ +---------+
| |
v v
+-------+ +-------+
| close | | close |
+-------+ +-------+
+------------+ +---------------+ +------+ +----------------+ +----------------+
| user input |---->| child process | ----> | pipe | ----> | parent process | ----> | server |
+------------+ +---------------+ +------+ +----------------+ +----------------+
^ |
| |
+------------------------+
TCP is a finite state machine, it has 11 state.
SYNseq=x
SYN_SENT |------------------>| LISTEN (listen())
(connect)| SYNseq=y,ACK=x+1 |
ESTABLISHED|<------------------| SYN_RCVD
| ACK=y+1 |
|------------------>| ESTABLISHED
write() | |___________________
| |
| seq=x+1,ACK=y+1 |
|------------------>| read()
| ACK=x+2 |
|<------------------|
| |___________________
| FINseq=x+2ACK=y+1 |
FIN_WAIT_1 |------------------>| CLOSE_WAIT
| ACK=x+3 |
FIN_WAIT_2 |<------------------| LAST_ACK
TIME_WAIT |<------------------| (close())
| FINseq=y+1 |
|------------------>|
| ACK=y+2 |
CLOSED --> SYN_SENT --> ESTABLISHED
LISTEN
SYN_RCVD CLOSE_WAIT LAST_ACK CLOSING FIN_WAIT_1 FIN_WAIT_2 TIME_WAIT
1. synchronize blocking iterative
bottleneck: disk or network I/O
for (;;) {
fd = accept(...);
read(fd, buf, n);
dosomething(buf);
write(fd, buf, n);
close(fd);
}
+-----------+
|application|
+-----------+
|
+-+
| |
| | System call - kernel context switch
+-+ -------------------------------------> +-+
| | | initiate read I/O
| +-+ ---->
| |
| |
| |
| |
| | read response
| data movement from +-+ <----
| kernel space to user space | |
+-+ <------------------------------------- +-+
| | |
| | |
+-+ |
| |
| |
| |
| |
2. multiple process
for (;;) {
fd = accept(...);
ret = fork();
swtich (ret) {
case -1:
do_err_handler();
break;
case 0: // child process
do_handler_fd(fd);
break;
default: // parent process
continue;
}
}
// need signal handling to handle zombie child process
pre-fork: process pool
3. multiple thread
void* thread_entry(void* args) {
int fd = *(int *)args;
do_handler_fd(fd);
}
for (;;) {
fd = accept();
pthread_create(..., thread_entry, &fd);
}
----------------------------------------------------------------------------------------------------
TRANSACTION:
----------------------------------------------------------------------------------------------------
In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties of
database transactions intended to guarantee validity even in the event of errors, power failures, etc.
In the context of database, a sequence of database operations that satisfies the ACID properties (and
these can be perceived as a single logical operation on the data) is called a transaction. For example,
a transfer of funds from one bank account to another, even involving multiple changes such as debiting
one account and crediting another, is a single transaction.
About
distributed relational database
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published