Enhance orc-extensions - use orc file schema #7282
es1220 wants to merge 1 commit into apache:master from
Oops, it looks like we have duplicated some effort (see proposal #7134 and associated PR #7138), but at least we seem to have the same goal in mind: eliminating usage of … I am no doubt 100% biased on this, but I like my approach a bit better because it supports …
Your approach works well for my case. Thanks for the reply. I'm closing my PR #7282.
`orc-extensions` uses a custom struct `typeString` (supplied by user configuration or generated automatically by the Druid parser). `typeString` is fragile and easy to get wrong (for example, column order or types). So I created `DruidOrcNewInputFormat` and the `druid_orc` parser type. Now, by changing only the `inputFormat` and the parser `type`, you can ingest ORC files as easily as with `parquet-extensions`, without any `typeString` errors.

- `DruidOrcNewInputFormat`
  - based on `OrcNewInputFormat`
  - creates a `DruidOrcRecordReader` and stores the file schema
- `DruidOrcRecordReader`
  - converts `OrcStruct` to `Map<String, Object>` using the stored file schema (this moves the existing conversion out of `OrcHadoopInputRowParser`)
- `DruidOrcHadoopInputRowParser`
  - converts the `Map` to a `MapBasedInputRow`
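The schema-driven conversion path above can be sketched in plain Java. This is a minimal, hypothetical illustration, not the code in this PR: `FileSchema`, `StructRow`, and `RowSketch` are made-up stand-ins for the ORC file schema, `OrcStruct`, and `MapBasedInputRow`, and a real reader would obtain the field names from the ORC file itself through the ORC/Hive APIs. It shows the core idea: each column name comes from the file's own schema by position, so there is no hand-written `typeString` whose column order or types can drift out of sync with the data.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of schema-driven ORC row conversion. FileSchema, StructRow and
 * RowSketch are illustrative stand-ins for the ORC file schema, OrcStruct and
 * MapBasedInputRow; they are NOT the real ORC or Druid classes.
 */
public class SchemaDrivenOrcSketch {

  /** Top-level field names in file order (a real reader would take these from the ORC file). */
  static final class FileSchema {
    final List<String> fieldNames;

    FileSchema(List<String> fieldNames) {
      this.fieldNames = Collections.unmodifiableList(new ArrayList<>(fieldNames));
    }
  }

  /** Positional column values for one row, standing in for OrcStruct. */
  static final class StructRow {
    final List<Object> values;

    StructRow(List<Object> values) {
      this.values = values;
    }
  }

  /** Minimal stand-in for an input row: parsed timestamp, dimension names, raw event map. */
  static final class RowSketch {
    final long timestampMillis;
    final List<String> dimensions;
    final Map<String, Object> event;

    RowSketch(long timestampMillis, List<String> dimensions, Map<String, Object> event) {
      this.timestampMillis = timestampMillis;
      this.dimensions = dimensions;
      this.event = event;
    }
  }

  /** Record-reader step: name each value by its position in the stored file schema. */
  static Map<String, Object> toMap(FileSchema schema, StructRow row) {
    if (schema.fieldNames.size() != row.values.size()) {
      throw new IllegalArgumentException("row has " + row.values.size()
          + " values but the file schema declares " + schema.fieldNames.size() + " fields");
    }
    Map<String, Object> event = new LinkedHashMap<>(); // keeps file column order
    for (int i = 0; i < schema.fieldNames.size(); i++) {
      event.put(schema.fieldNames.get(i), row.values.get(i));
    }
    return event;
  }

  /** Parser step: wrap the map in a row, parsing an ISO-8601 timestamp column. */
  static RowSketch toRow(Map<String, Object> event, String timestampColumn, List<String> dimensions) {
    Object ts = event.get(timestampColumn);
    if (ts == null) {
      throw new IllegalArgumentException("missing timestamp column: " + timestampColumn);
    }
    return new RowSketch(Instant.parse(ts.toString()).toEpochMilli(), dimensions, event);
  }

  public static void main(String[] args) {
    FileSchema schema = new FileSchema(Arrays.asList("timestamp", "page", "added"));
    StructRow row = new StructRow(Arrays.asList("2019-03-15T00:00:00Z", "Druid", 42L));
    Map<String, Object> event = toMap(schema, row);
    RowSketch inputRow = toRow(event, "timestamp", Arrays.asList("page"));
    System.out.println(event); // {timestamp=2019-03-15T00:00:00Z, page=Druid, added=42}
    System.out.println(inputRow.timestampMillis); // 1552608000000
  }
}
```

Because `toMap` rejects a row whose value count differs from the schema, a mismatch surfaces as an error instead of silently shifting values into the wrong columns, which is the class of `typeString` mistake the description calls out.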