Skip to content

Latest commit

 

History

History
160 lines (109 loc) · 6 KB

File metadata and controls

160 lines (109 loc) · 6 KB

All the images used in this document are cited from the PATSQL paper. Please refer to the paper for further details.

Prerequisite

What is DSL?

DSL is a computer language designed specifically for solving a particular task or problem.
PATSQL uses DSLs defined by using Window for relational algebra operators such as Selecction and Projection.

Root(
	Sort(
		(
			Selection(
				BaseTable(input_table, [[9] col1:Str, [10] col2:Str, [11] col3:Int, [12] col4:Date, [13] EXTRACT(YEAR _FROM col4):Int, [14] EXTRACT(MONTH _FROM col4):Int, [15] EXTRACT(DAY _FROM col4):Int])
					, ([[9] col1:Str] <> A2)
				)
			, [[9] col1:Str, [11] col3:Int]
			)
		, [9] Asc
	)
)

What is Sketch?

Sketch is program with uninstantiated parts
PATSQL uses a sketch where only the relational algebra operator is instantiated and the table is not instantiated Then iteratively explores Sketch and determine Suitable DSL.

An example of a sketch is shown below. an uninstantiated part is denoted as the symbol '□'

Project(Select(Table(□), □), □)

Algorithm Overview

A sketcher is an Iterator includes the possible sketches from a given input and output. The expandSketch method, which expands the sketch using the following relational algebra operator if no suitable DSL is found, is implemented in sketcher.

Using the Sktcher generated by the method described later, determine the DSL that matches the input and output by the following search method

  1. Assign a table name to each sketch by the AssignTable method.

  2. Call CompleteSketch (SketchFIller) to complete all the remaining □ of the sketch.

  3. If the completion succeeds and program p is found, check again whether p is equal to the output table and return p as the result

  4. If completion does not succeed, call ExpandSketch to generate additional sketches and return to 1

  5. 1 ~ 4 is repeated until p is fount

ImplementationAlgorithm
patsql.synth.RASynthesizer>synthesize()
	Sketcher sketcher = new Sketcher(example.inputs.length, isOutputSorted);

	for (RAOperator s : sketcher) {
		for (RAOperator sketch : assignNamesOnBaseTables(s)) {
			// check the timeout of itself.
			if (Thread.currentThread().isInterrupted()) {
				return null;
			}

			if (!isValidSketch(sketch)) {
				continue;
			}
			if (Debug.isDebugMode) {
				RAUtils.printSketch(sketch);
			}
			SketchFiller filler = new SketchFiller(sketch, example, option);
			for (RAOperator program : filler.fillSketch()) {
				if (!check(program))
					continue;

				// optimize the program returned.
				program = RAOptimizer.optimize(program);
				if (Debug.isDebugMode) {
					long dur = (System.nanoTime() - startDebug) / 1000000;
					Debug.Time.doneSynth(dur);
				}
				return program;
			}
		}
	}
synth_algorithm

Algorithm to generate sketcher

The following rules were used to create the sketcher.

Restrictions on the parent-child relationship between relational algebra operators in a sketch

expand_sketch

The relational algebra operators that can appear in a sketch are restricted as shown in the table above.

For example, if the input sketch is Project(Table(□), □), the allowed operators after Project are Select, Group, Window, Join, LeftJoin, and Table, which are ✔ in the table, so

•Project(Select(Table(□), □), □)
•Project(Group(Table(□), □, □), □)
•Project(Join(Table(□), Table(□), □), □)
etc..

A total of six sketches are returned,

The reason we can limit the number of sketches that come next, as shown in the table above, is that

  • X1: Operators other than Order do not preserve the order of the records, so they only make sense if the order determined by Order is at the beginning of the sketch

  • X2: Excluded because it is considered rare in actual queries

  • X3: This is because sketches containing combinations of these are not in normal form. It seems that if we obtain a program p from a sketch that is not in normal form, we can always obtain a program equivalent to p from a sketch in normal form

Conversion rules for relational algebra

  1. A program that repeats the same operator can always be expressed as a non-repeating program

•Project(Project(T , c1), c2) → Project(T , c2)
•Select(Select(T , p1), p2) → Select(T , p1 ∧ p2)

  1. Rules for moving a Project no top of a Select or Join(The same rule applies to Group, Window, LeftJoin as well as Project)

•Select(Project(T , c), p) → Project(Select(T , p), c)
•Join(Project(T1, c),T2, p) → Project(Join(T1,T2, p), c′)

=> From rules 1 and 2, we know that a sketch can only contain at most one Project.

In addition, PatSQL uses the rule 2 to display the sketch components Order, Distinct, and Project at the top of the sketch in this order

See patsql.synth.sketcher.Sketchers.java for implementation.

Algorithm to complete sketch

The algorithm for sketch completion for each relational algebra operator is shown below, which is called by fillSketch().

sketchCompletionAlgorithm

The function CompleteSketch(s,$T_{in}$, C) is the entry point and calls an auxiliary function called Complete to recursively complete the sketch by propagating the constraint φ.

The algorithm prunes the search space based on the inclusion relation φ of the table.

It returns a program p if it satisfies the algorithm for pruning based on the inclusion relation shown above

See patsql.synth.filler.strategy for the implementation of each operator. And see patsql.synth.filler.SketchFiller.java for the implementation of SketchFiller.