Add group_id to the sys.tasks table#8304
Conversation
| { | ||
| return "TaskStatusPlus{" + | ||
| "id='" + id + '\'' + | ||
| "groupId='" + groupId + '\'' + |
There was a problem hiding this comment.
should this have a , to match the rest of the pattern?
There was a problem hiding this comment.
oops, yes, i'll give you that comma !
jihoonson
left a comment
There was a problem hiding this comment.
@surekhasaharan thank you for adding groupId. Left a comment.
| @Nullable | ||
| private final EntryType task; | ||
| @Nullable | ||
| private final String groupId; |
There was a problem hiding this comment.
This information is duplicate since it's already in task. The caller can get groupId by calling getTask().getGroupId().
There was a problem hiding this comment.
I see, thank you, removed groupId from here.
| for (final TaskStuff taskStuff : tasks.values()) { | ||
| if (taskStuff.getStatus().isRunnable()) { | ||
| TaskInfo t = new TaskInfo( | ||
| TaskInfo t = new TaskInfo<>( |
| return ImmutableList.copyOf( | ||
| handler.getCompletedTaskInfo( | ||
| DateTimes.nowUtc().minus(durationBeforeNow == null ? config.getRecentlyFinishedThreshold() : durationBeforeNow), | ||
| DateTimes.nowUtc() |
| { | ||
| final Future<TaskStatus> taskStatusFuture = tasks.get(taskId); | ||
| final Optional<Task> task = getTaskStorage().getTask(taskId); | ||
| final String groupId = task.isPresent() ? task.orNull().getGroupId() : taskId; |
There was a problem hiding this comment.
The orNull() is misleading since it would allow the subsequent call to getGroupId() to throw an NPE. Change to task.get().getGroupId(), which is also consistent with the preceding task.isPresent().
Also, a comment explaining why the groupId should default to the taskId the caller specified would work well here.
There was a problem hiding this comment.
yeah, it should be task.get().getGroupId(), thanks. I think the default value of groupId doesn't matter here, it can be any value, i think null would be okay as well.
| return new TaskStatusResponse( | ||
| taskId, | ||
| new TaskStatusPlus( | ||
| taskId, |
There was a problem hiding this comment.
Why is taskId passed in as the groupId argument? Would using "groupId" or a GROUP_ID named constant work here?
If groupId is changed to parentId, then a named constant NO_PARENT_ID = "" would work well here.
There was a problem hiding this comment.
I guess it might be fine to use the taskId as the groupId since the default value of groupId is the taskId. But, since TaskMonitor is used to monitor sub tasks of the parallel indexing task, it could be more realistic to use the groupId of the taskId of the supervisor task.
There was a problem hiding this comment.
I used taskId because getTaskStatus only takes a taskId string, instead of a Taskobject, so there was no easy way to get the real groupId without changing the the interface IndexingServiceClient and also as Jihoon mentioned because default is taskId. This particular test works with any value, eg groupId can be any string or null.
There was a problem hiding this comment.
In this particular test, the groupId is always "groupId" as seen here. You can use it if you want.
| new TaskStatusResponse( | ||
| "mytask", | ||
| new TaskStatusPlus( | ||
| "mytask", |
There was a problem hiding this comment.
Similar comment here about id and groupId having the same value and changing it if groupId is renamed to parentId
There was a problem hiding this comment.
here, it does matter, the value of groupId is null in NoopTask, so it gets set to taskId in AbstractTask and this test checks the TaskStatusResponse later, so this will fail if groupId is not set to taskId
| |------|-----|-----| | ||
| |task_id|STRING|Unique task identifier| | ||
| |type|STRING|Task type, for example this value is "index" for indexing tasks. See [tasks-overview](../ingestion/tasks.md)| | ||
| |group_id|STRING|Task group ID for this task, for sub tasks, this value is the parent task's ID| |
There was a problem hiding this comment.
Describe the value if it is not a subtask (looks like the implementation currently uses the value of task_id)
There was a problem hiding this comment.
It depends on the task type i guess, for kafka/kinesis the group_id is a combination of type+datasource. For native index tasks, it's same as task_id. The behavior might be different for other task types. Will adjust the docs saying that it depends on the task type.
| @Nullable | ||
| private final String errorMsg; | ||
| @Nullable | ||
| private final String groupId; |
There was a problem hiding this comment.
Perhaps a more intuitive name for users to understand would be parentId? Renaming it to parentId will also resemble the parent process id concept of most operating systems that many users are familiar with. For non-subtasks, the value would default to the empty string instead of the task id.
There was a problem hiding this comment.
This sounds an interesting idea, but I'm not sure if it would make sense with Kafka/Kinesis indexing tasks. The kafka/kinesis supervisor spawns indexing tasks but the supervisor is executed using a thread in the overlord. The tasks have the same groupId which is datasource name where they ingest data into.
There was a problem hiding this comment.
Ah, I didn't know there's already a notion of task groups. For consistency, we should continue using "group" then.
| { | ||
| final Task subTask = (Task) taskObject; | ||
| try { | ||
| getTaskStorage().insert(subTask, TaskStatus.running(subTask.getId())); |
There was a problem hiding this comment.
It looks like the subtask only starts running after subTask.run() is called below. What cleans up the task running status created here if:
- The task never starts running (task is not ready, so
subTask.run()is not called) - The task starts running but unexpectedly fails during execution
If those scenarios need to be considered, then added tests for them will be useful.
There was a problem hiding this comment.
hmm, good point, I set the task status to failure if any of those condition occurs. Not sure how/where to unit test for those cases though.
There was a problem hiding this comment.
Ah, this is a test class, so my earlier comment can be disregarded
|
|
||
| String json = "[{\n" | ||
| + "\t\"id\": \"index_wikipedia_2018-09-20T22:33:44.911Z\",\n" | ||
| + "\t\"groupId\": \"index_wikipedia_2018-09-20T22:33:44.911Z\",\n" |
There was a problem hiding this comment.
Does it make sense to use a value here that's different from the "id" to confirm the implementation sets the value appropriately (i.e., the implementation does not always have this value equal to "id")?
|
Thanks @ccaominh and @jihoonson for review ! |
|
I do not understand why the coveralls reports decreased coverage after this change |
|
I'm hoping #8374 will fix the issue with the fluctuating coverage reports |
jihoonson
left a comment
There was a problem hiding this comment.
LGTM. The coveralls failure looks not legitimate.
Fixes #8200
Description
This PR adds a
group_idcolumn tosys.taskstable. The tasks API of OverlordResource will now show the group_id of tasks.Added a
groupIdfield toTaskInfoandTaskStatusPlus. The groupId is present in the Task object, now it's exposed via the API's.This PR has: