Cleanup serialization of TaskReportMap #16217
Co-authored-by: Abhishek Radhakrishnan <abhishek.rb19@gmail.com>
…nto cleanup_build_task_report
}

@Test
public void testWriteReportMapToStringAndRead() throws Exception
Could we also add a test to verify that a task report serialized with the old type Map<String, TaskReport> deserializes correctly into the new type TaskReport.ReportMap? That should address any upgrade concerns.
Ditto for the reverse roundtrip for a downgrade scenario.
Sure, that makes sense.
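The upgrade and downgrade tests would ultimately assert that an old-style Map<String, TaskReport> and a TaskReport.ReportMap carry the same reports. Below is a minimal JDK-only sketch of that final assertion. It assumes a hypothetical stand-in TaskReport interface and SimpleReport class and leaves out the Jackson serialization step entirely. It only shows that Map.equals() compares entries rather than concrete map classes, so the two shapes can be compared directly.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class ReportMapEqualityDemo
{
  // Hypothetical stand-in for Druid's TaskReport; only the report key matters here.
  interface TaskReport
  {
    String getReportKey();
  }

  // Simplified stand-in for TaskReport.ReportMap (no Jackson annotations).
  static class ReportMap extends LinkedHashMap<String, TaskReport>
  {
  }

  // Hypothetical report implementation; the same instance goes into both maps.
  static class SimpleReport implements TaskReport
  {
    private final String reportKey;

    SimpleReport(String reportKey)
    {
      this.reportKey = reportKey;
    }

    @Override
    public String getReportKey()
    {
      return reportKey;
    }
  }

  public static void main(String[] args)
  {
    TaskReport report = new SimpleReport("exampleReport");

    // Old shape: a plain Map<String, TaskReport>.
    Map<String, TaskReport> oldShape = new HashMap<>();
    oldShape.put(report.getReportKey(), report);

    // New shape: the dedicated ReportMap subclass.
    ReportMap newShape = new ReportMap();
    newShape.put(report.getReportKey(), report);

    // AbstractMap.equals() compares entries, not concrete classes, so both directions hold.
    System.out.println(oldShape.equals(newShape) && newShape.equals(oldShape)); // true
  }
}
```

In an actual roundtrip test, the JSON written from one shape would first be read back into the other shape with a Jackson ObjectMapper before this comparison.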
 * a TaskReport is serialized without the type information and cannot be
 * deserialized back into a concrete implementation.
 */
class ReportMap extends LinkedHashMap<String, TaskReport>
Are there any tests that verify the reports are indeed ordered, since we rely on a LinkedHashMap? Looking at the callers of buildTaskReports(), I can't find any.
No, but I can add a test to verify the order. That said, I don't see any actual task writing a report map that contains multiple entries. I'm also not sure why the order was considered important in the first place, since it's JSON anyway.
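The ordering guarantee comes entirely from the LinkedHashMap superclass, so a test for it can stay small. The sketch below uses a simplified ReportMap and a hypothetical buildTaskReports(TaskReport...) helper that keys each report by getReportKey() in argument order. The stand-in TaskReport interface is an assumption, and the actual TaskReport.buildTaskReports() signature may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;

public class ReportMapOrderDemo
{
  // Hypothetical stand-in for Druid's TaskReport interface.
  interface TaskReport
  {
    String getReportKey();
  }

  // Simplified ReportMap: the LinkedHashMap superclass provides insertion order.
  static class ReportMap extends LinkedHashMap<String, TaskReport>
  {
  }

  // Hypothetical minimal report implementation.
  static class SimpleReport implements TaskReport
  {
    private final String key;

    SimpleReport(String key)
    {
      this.key = key;
    }

    @Override
    public String getReportKey()
    {
      return key;
    }
  }

  // Hypothetical helper mirroring the idea of buildTaskReports(): key each report by its report key.
  static ReportMap buildTaskReports(TaskReport... reports)
  {
    ReportMap map = new ReportMap();
    for (TaskReport report : reports) {
      map.put(report.getReportKey(), report);
    }
    return map;
  }

  public static void main(String[] args)
  {
    ReportMap reports = buildTaskReports(
        new SimpleReport("zeta"),
        new SimpleReport("alpha"),
        new SimpleReport("mid")
    );
    // Iteration follows insertion order, not alphabetical or hash order.
    List<String> keys = new ArrayList<>(reports.keySet());
    System.out.println(keys); // [zeta, alpha, mid]
  }
}
```

The keys are deliberately out of alphabetical order so that a sorted map would fail the check; a plain HashMap could still pass by coincidence with so few entries, so more entries make such a test stricter.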
@Override
public ListenableFuture<Map<String, Object>> taskReportAsMap(String taskId)
{
  return Futures.immediateFuture(null);
Should this call getLiveReportsForTask(taskId)?
It was doing that originally, but that isn't needed right now, since I have added the separate method getLiveReportsForTask() just below this one.
This method is used only in the tests anyway, and I plan to fix it back up once I replace the Map<String, Object> in the OverlordClient with TaskReport.ReportMap.
  return Futures.immediateFuture(null);
}

public TaskReport.ReportMap getLiveReportsForTask(String taskId)
Suggested change:
- public TaskReport.ReportMap getLiveReportsForTask(String taskId)
+ protected TaskReport.ReportMap getLiveReportsForTask(String taskId)
@@ -546,12 +548,17 @@ public ListenableFuture<Void> runTask(String taskId, Object taskObject)

@Override
public ListenableFuture<Map<String, Object>> taskReportAsMap(String taskId)
Can taskReportAsMap() now return the concrete type TaskReport.ReportMap instead of Map<String, Object>?
Yes, I have that change in a follow-up PR. I didn't do it here, as it requires moving all the TaskReport-related classes to the druid-processing module so that OverlordClient can use them.
Thanks a lot for the review, @abhishekrb19!
abhishekrb19 left a comment
LGTM, thanks! I'm ok with doing the suggestions in a follow-up 👍
Follow up to #16217

Changes:
- Update `OverlordClient.getReportAsMap()` to return `TaskReport.ReportMap`
- Move the following classes to `org.apache.druid.indexer.report` in the `druid-processing` module
  - `TaskReport`
  - `KillTaskReport`
  - `IngestionStatsAndErrorsTaskReport`
  - `TaskContextReport`
  - `TaskReportFileWriter`
  - `SingleFileTaskReportFileWriter`
  - `TaskReportSerdeTest`
- Remove `MsqOverlordResourceTestClient` as it had only one method which is already present in `OverlordResourceTestClient` itself
Issue
While serializing a `Map` or even a `List` containing `TaskReport` objects, the `type` information is lost. Thus, the object cannot be deserialized back. This is a known issue with Jackson.
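The underlying cause is type erasure: at runtime a `Map<String, TaskReport>` is just a `LinkedHashMap` (or `HashMap`), so a serializer inspecting the object cannot see that its values are `TaskReport`s and does not apply the polymorphic `type` handling declared for them. A concrete subclass such as `ReportMap extends LinkedHashMap<String, TaskReport>` records its type arguments in its class declaration. The JDK-only sketch below shows that difference through reflection, using a hypothetical empty `TaskReport` interface and a hypothetical `valueClassOf()` helper. It does not exercise Jackson itself, which is assumed to resolve value types from such a generic superclass.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.LinkedHashMap;
import java.util.Map;

public class ErasureDemo
{
  // Hypothetical marker standing in for Druid's polymorphic TaskReport.
  interface TaskReport
  {
  }

  // The subclass bakes <String, TaskReport> into its generic superclass signature.
  static class ReportMap extends LinkedHashMap<String, TaskReport>
  {
  }

  // Returns the value class recoverable via reflection, or null if only a type variable remains.
  static Class<?> valueClassOf(Map<?, ?> map)
  {
    Type superType = map.getClass().getGenericSuperclass();
    if (superType instanceof ParameterizedType) {
      Type valueArg = ((ParameterizedType) superType).getActualTypeArguments()[1];
      // A Class means the value type survived; a TypeVariable (such as V) means it was erased.
      if (valueArg instanceof Class) {
        return (Class<?>) valueArg;
      }
    }
    return null;
  }

  public static void main(String[] args)
  {
    Map<String, TaskReport> plain = new LinkedHashMap<>();
    ReportMap reportMap = new ReportMap();

    System.out.println(valueClassOf(plain));                        // null
    System.out.println(valueClassOf(reportMap) == TaskReport.class); // true
  }
}
```

Subclassing a collection is a common workaround for Jackson's handling of root-level generic values; the alternative is to pass the full generic type explicitly on every write, for example via `ObjectMapper.writerFor(...)` with a `TypeReference`.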
Existing solution in Druid
The serialization of a `Map<String, TaskReport>` was originally fixed in #12938. The way this has been tackled in the Druid code till now is:
- `SingleFileTaskReportFileWriter.writeReportToStream()` to write out each `TaskReport` object one by one
- `TaskReport` object
Proposed solution
Add a new `ReportMap` class.
Changes
- `TaskReport.ReportMap`
- `TaskReport.buildTaskReports()` return the new class
- `Map<String, TaskReport>` with `TaskReport.ReportMap` to ensure that we always use this class for serialization of reports
- `AbstractBatchIndexTask.buildLiveIngestionStatsReport()` to reduce duplication and hard-coding of serializable field names.
Important classes
- `TaskReport`
- `AbstractBatchIndexTask`
- `SingleFileTaskReportFileWriter`
- `ParallelIndexSupervisorTask`
- `SinglePhaseSubTask`
- `IndexTask`
Rolling upgrade concerns
None