Refactor alter job #1695
Conversation
Force-pushed from 3edd147 to 56478b5
| exec_mem_limit: Sets the memory limit for the load, in bytes. Default is 2GB. This is the limit for a single BE node. | ||
| A load may be spread across multiple BEs. Suppose 1GB of data needs at most 5GB of memory when processed on a single node; if that 1GB file is split across 2 nodes, each node theoretically needs 2.5GB, so this parameter can be set to 2684354560, i.e. 2.5GB. | ||
| strict mode: Whether to apply strict checks to the data. Default is true. | ||
| strict mode: Whether to apply strict checks to the data. Default is true. |
The alignment is wrong.
fixed
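For reference, the 2.5GB in the doc example converts to exactly the byte value quoted: 2.5 GB = 2.5 × 1024 × 1024 × 1024 bytes = 2,684,354,560 bytes, which matches the 2684354560 above.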
| stmt.getBrokerDesc(), originStmt); | ||
| brokerLoadJob.setJobProperties(stmt.getProperties()); | ||
| brokerLoadJob.setDataSourceInfo(db, stmt.getDataDescriptions()); | ||
| brokerLoadJob.checkAndDataSourceInfo(db, stmt.getDataDescriptions()); |
checkAnd ?? Maybe checkAndCreate is better.
Changed to checkAndSetDataSourceInfo.
| txnState.addTableIndexes(table); | ||
| } | ||
| // submit all tasks together | ||
| for (LoadTask loadTask : idToTasks.values()) { |
There are some finished tasks in idToTasks which should not be submitted.
Fixed
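For readers following along, here is a minimal, self-contained sketch of the guard being discussed; LoadTask and its finished flag below are stand-ins for illustration, not Doris's actual API:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of the guard discussed above: only tasks that are not yet
// finished get submitted. LoadTask here is a stand-in, not Doris's class.
public class SubmitPendingTasksSketch {
    static class LoadTask {
        final long id;
        final boolean finished;
        LoadTask(long id, boolean finished) { this.id = id; this.finished = finished; }
    }

    public static void main(String[] args) {
        Map<Long, LoadTask> idToTasks = new LinkedHashMap<>();
        idToTasks.put(1L, new LoadTask(1L, false));
        idToTasks.put(2L, new LoadTask(2L, true)); // already finished, must be skipped

        for (LoadTask task : idToTasks.values()) {
            if (task.finished) {
                continue; // do not re-submit a finished task
            }
            System.out.println("submitting task " + task.id);
        }
    }
}
```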
| private boolean isNegative; | ||
| private List<Long> partitionIds; | ||
| // this is a compatible param which only happens before the function of broker has been supported. | ||
| private List<String> fileFieldNames; |
Is fileFieldNames used by any other class?
Force-pushed from 10894a2 to db504f5
be/src/olap/schema_change.cpp (outdated)
| // _validate_alter_result should be outside the above while loop. | ||
| // to avoid requiring the header lock twice. | ||
| res = _validate_alter_result(new_tablet, request); |
If the previous step failed, this assignment will overwrite res and mask the failure.
Added a res check before this call.
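The control-flow point is that an unconditional assignment throws away any earlier failure. A rough sketch of the agreed fix, written in Java to match the other snippets in this thread (the real code is C++ in schema_change.cpp, and all names below are illustrative only):

```java
// Illustration only (the real code is C++ in schema_change.cpp): an earlier
// failure must not be overwritten by the later validation call.
public class PreserveEarlierErrorSketch {
    enum Status { OK, ERROR }

    // Stand-in for _validate_alter_result(new_tablet, request).
    static Status validateAlterResult() {
        return Status.OK;
    }

    public static void main(String[] args) {
        Status res = Status.ERROR; // pretend a previous step already failed

        // Guard added per the review: only validate if nothing has failed yet,
        // so a later OK cannot mask the earlier error.
        if (res == Status.OK) {
            res = validateAlterResult();
        }
        System.out.println("final status: " + res); // prints ERROR
    }
}
```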
| } | ||
| int rollupSchemaHash = Util.schemaHash(schemaVersion, rollupSchema, olapTable.getCopiedBfColumns(), | ||
| // get rollup schema hash | ||
| int rollupSchemaHash = Util.schemaHash(0 /* init schema version */, rollupSchema, olapTable.getCopiedBfColumns(), |
Don't use schema hash 0, because we do not know whether the BE uses this value.
This 0 is the schema version, not the schema hash. The schema hash is calculated inside this function.
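To make the distinction concrete, here is a toy sketch (not the real Util.schemaHash) showing that the first argument is a schema version that feeds into the hash, not the hash itself:

```java
import java.util.List;
import java.util.Objects;

// Illustration only: the first argument is a schema *version*, which is just
// one input to the hash; passing 0 does not make the resulting hash 0.
public class SchemaHashSketch {
    static int schemaHash(int schemaVersion, List<String> columns) {
        return Objects.hash(schemaVersion, columns); // hash computed from version + schema
    }

    public static void main(String[] args) {
        System.out.println(schemaHash(0, List.of("k1", "v1"))); // almost certainly not 0
    }
}
```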
| rollingUpIndex = rollupJob.getRollupIndex(partition.getId()); | ||
| List<MaterializedIndex> allIndices = null; | ||
| if (transactionState.getLoadedTblIndexes().isEmpty()) { |
Only stream load puts loaded indices, but dpp load does not, so dpp load will fail during the schema change process.
Fixed
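A small sketch of the kind of fallback suggested by the diff, with simplified types (the real classes are TransactionState and MaterializedIndex): an empty loaded-index list means all indexes of the table should be considered.

```java
import java.util.List;

// Sketch: if a load did not record which indexes it wrote to (e.g. dpp load),
// treat the empty list as "check all indexes of the table" instead of failing.
public class LoadedIndexesFallbackSketch {
    static List<String> indexesToCheck(List<String> loadedIndexes, List<String> allIndexes) {
        return loadedIndexes.isEmpty() ? allIndexes : loadedIndexes;
    }

    public static void main(String[] args) {
        List<String> all = List.of("base_index", "rollup_index");
        System.out.println(indexesToCheck(List.of(), all));              // [base_index, rollup_index]
        System.out.println(indexesToCheck(List.of("base_index"), all));  // [base_index]
    }
}
```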
| if (entry.getKey() <= endTransactionId) { | ||
| LOG.debug("find a running txn with txn_id={}, less than schema change txn_id {}", | ||
| entry.getKey(), endTransactionId); | ||
| LOG.info("find a running txn with txn_id={} on db: {}, less than watermark txn_id {}", |
This should be checked at the tablet level, not at the db level.
This PR is already too big, so I will improve this in another PR. Issue #1724 tracks it.
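For context, the check in the diff is a watermark-style wait: the job picks a transaction id as a watermark and keeps waiting while any running transaction has an id at or below it. A hypothetical, simplified sketch (real Doris tracks far more state, and the reviewer's point is that per-tablet tracking would block less than per-db):

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of the watermark wait: any running txn with an id at or below the
// watermark means the schema change job must keep waiting.
public class TxnWatermarkSketch {
    static boolean hasRunningTxnBelowWatermark(Map<Long, String> runningTxns, long watermark) {
        for (long txnId : runningTxns.keySet()) {
            if (txnId <= watermark) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<Long, String> running = new TreeMap<>();
        running.put(100L, "load_a");
        running.put(205L, "load_b");
        System.out.println(hasRunningTxnBelowWatermark(running, 200L)); // true: txn 100 still blocks
        System.out.println(hasRunningTxnBelowWatermark(running, 50L));  // false
    }
}
```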
yiguolei left a comment
Some comments
Force-pushed from 2b52e67 to 4effda0
| // eg: | ||
| // base schema is (A, B, C), and B is under schema change, so there will be a shadow column: '__doris_shadow_B' | ||
| // So the final column mapping should looks like: (A, B, C, __doris_shadow_B = B); | ||
| List<Column> fullSchema = dstTable.getFullSchema(); |
There may be a problem when the input column mapping is (tmp_a, tmp_b, tmp_c, a = tmp_a, b = tmp_b, c = tmp_c). It's better to add the shadow column after all exprs are analyzed.
I changed it: __shadow_column will be mapped to whatever its base column maps to.
In your example, it will be __shadow_column = tmp_x.
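Spelling out the resolution with the reviewer's example: if the user mapping is (tmp_a, tmp_b, tmp_c, a = tmp_a, b = tmp_b, c = tmp_c) and b is under schema change, the shadow column reuses b's mapping, i.e. __doris_shadow_b = tmp_b. A tiny sketch of that rule, with expressions simplified to plain strings (the real code works on Expr objects):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: a shadow column copies whatever expression its base column maps to.
public class ShadowColumnMappingSketch {
    static final String SHADOW_PREFIX = "__doris_shadow_";

    public static void main(String[] args) {
        // user-specified mapping after analysis: target column -> source expression
        Map<String, String> columnExprs = new LinkedHashMap<>();
        columnExprs.put("a", "tmp_a");
        columnExprs.put("b", "tmp_b");
        columnExprs.put("c", "tmp_c");

        // column "b" is under schema change, so add its shadow column
        String base = "b";
        columnExprs.put(SHADOW_PREFIX + base, columnExprs.get(base));

        System.out.println(columnExprs);
        // {a=tmp_a, b=tmp_b, c=tmp_c, __doris_shadow_b=tmp_b}
    }
}
```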
| this.txnCommitAttachment = txnCommitAttachment; | ||
| } | ||
| public void addTableIndexes(OlapTable table) { |
Can this function be called by multiple threads? If so, you should make it thread-safe; otherwise, you should add a comment about it.
This will only be called before the transaction starts running, so only one thread can access it.
I will add a comment here.
| return columnExprDescs; | ||
| } | ||
| public void addColumnExprDesc(ImportColumnDesc columnExprDesc) { |
Better to add a comment on who will use this.
It is removed
| if (properties.containsKey(LoadStmt.TIMEZONE)) { | ||
| timezone = properties.get(LoadStmt.TIMEZONE); | ||
| } |
We can get it from the session variables if the properties don't have it.
fixed
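A small sketch of the property-then-session-variable fallback described above, with hypothetical accessors (only the LoadStmt.TIMEZONE property key comes from the diff):

```java
import java.util.Map;

// Sketch: prefer the job property, fall back to the session variable.
public class TimezoneFallbackSketch {
    static final String TIMEZONE = "timezone";

    static String resolveTimezone(Map<String, String> properties, String sessionTimezone) {
        return properties.getOrDefault(TIMEZONE, sessionTimezone);
    }

    public static void main(String[] args) {
        System.out.println(resolveTimezone(Map.of(), "Asia/Shanghai"));                // Asia/Shanghai
        System.out.println(resolveTimezone(Map.of(TIMEZONE, "UTC"), "Asia/Shanghai")); // UTC
    }
}
```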
| super(JobType.ROLLUP); | ||
| } | ||
| public void addTabletIdMap(long partitionId, long rollupTabletId, long baseTabletId) { |
Does it need to be thread-safe?
No need, it is only accessed while the job is being created.
| * For hadoop load, this param is also used to persistence. | ||
| * The function in this param is copied from 'parsedColumnExprList' | ||
| */ | ||
| private Map<String, Pair<String, List<String>>> columnToHadoopFunction = Maps.newHashMap(); |
| private Map<String, Pair<String, List<String>>> columnToHadoopFunction = Maps.newHashMap(); | |
| private Map<String, Pair<String, List<String>>> columnToHadoopFunction = Maps.newTreeMap(String.CASE_INSENSITIVE_ORDER); |
Done
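The effect of the suggested change is that column-name lookups become case-insensitive. A quick illustration using the JDK TreeMap directly, which is equivalent to what Guava's Maps.newTreeMap(String.CASE_INSENSITIVE_ORDER) returns:

```java
import java.util.TreeMap;

// Case-insensitive keys: "k1" and "K1" hit the same entry.
public class CaseInsensitiveColumnMapSketch {
    public static void main(String[] args) {
        TreeMap<String, String> columnToFunction = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        columnToFunction.put("K1", "md5sum");

        System.out.println(columnToFunction.get("k1")); // md5sum
        System.out.println(columnToFunction.get("K1")); // md5sum
    }
}
```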
Force-pushed from 38d255a to 226548c
| "default_value", | ||
| "md5sum", | ||
| "replace_value", | ||
| "now", "hll_hash", |
One function per line?
OK
| TBrokerScanRangeParams params = context.params; | ||
| // there are no columns transform | ||
| List<String> sourceFileColumns = context.fileGroup.getFileFieldNames(); |
Why don't you analyze the sourceFileColumns and pathColumns in DataDescription? The originColumnNameToExprList is the final column expr list, including source file columns, path columns, and column exprs.
| } | ||
| List<Column> baseSchema = table.getBaseSchema(); | ||
| // fill the column info if user does not specify them | ||
| dataDescription.fillColumnInfoIfNotSpecified(baseSchema); |
The columns and parsedColumnExprList include the same columns.
No. In fillColumnInfoIfNotSpecified(), I just want to fill the fields which the user did not fill. After that, these fields should look as if the user had filled them, not like some "after-analyzed" results.
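A rough sketch of the "fill only what the user left out" behavior described above, with made-up names and String columns (the real fillColumnInfoIfNotSpecified works on Doris Column objects):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: if the user did not list any columns, default to the table's base schema,
// so downstream code sees the same shape as if the user had written them out.
public class FillColumnsIfNotSpecifiedSketch {
    static List<String> fillIfNotSpecified(List<String> userColumns, List<String> baseSchema) {
        return userColumns.isEmpty() ? new ArrayList<>(baseSchema) : userColumns;
    }

    public static void main(String[] args) {
        List<String> baseSchema = List.of("k1", "k2", "v1");
        System.out.println(fillIfNotSpecified(List.of(), baseSchema));           // [k1, k2, v1]
        System.out.println(fillIfNotSpecified(List.of("k1", "v1"), baseSchema)); // [k1, v1]
    }
}
```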
Force-pushed from fef75e6 to 80a1174
Force-pushed from 158a3ca to f7c34bd
Force-pushed from 8165d50 to 644d159
| slotDescByName.put(column.getName(), slotDesc); | ||
| } | ||
| } | ||
| boolean specifyFileFieldNames = streamLoadTask.getColumnExprDescs().stream().anyMatch(p -> p.isColumn()); |
why not put this in Load.initColumns?
OK
| // base schema is (A, B, C), and B is under schema change, so there will be a shadow column: '__doris_shadow_B' | ||
| // So the final column mapping should looks like: (A, B, C, __doris_shadow_B = substitute(B)); | ||
| for (Column column : tbl.getFullSchema()) { | ||
| if (column.isNameWithPrefix(SchemaChangeHandler.SHADOW_NAME_PRFIX)) { |
| if (column.isNameWithPrefix(SchemaChangeHandler.SHADOW_NAME_PRFIX)) { | |
| if (!column.isNameWithPrefix(SchemaChangeHandler.SHADOW_NAME_PRFIX)) { | |
| continue; |
OK
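The suggestion above is the usual guard-clause inversion: skip non-shadow columns at the top of the loop instead of nesting the interesting branch. A generic illustration with simplified names (the real check is column.isNameWithPrefix(SchemaChangeHandler.SHADOW_NAME_PRFIX)):

```java
import java.util.List;

// Guard-clause style: bail out early for the common case, so the
// shadow-column handling is not wrapped in an extra level of nesting.
public class GuardClauseSketch {
    static final String SHADOW_PREFIX = "__doris_shadow_";

    public static void main(String[] args) {
        for (String column : List.of("k1", "__doris_shadow_k2", "v1")) {
            if (!column.startsWith(SHADOW_PREFIX)) {
                continue; // not a shadow column, nothing to do
            }
            System.out.println("handle shadow column: " + column);
        }
    }
}
```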
| import java.util.List; | ||
| /* | ||
| * Author: Chenmingyu |
Remove Author tag
ok
imay left a comment
LGTM
ISSUE #1613