Avoid write failures if metrics mode is invalid#301
Conversation
@aokolnychyi, could you review this?
aokolnychyi left a comment
I think it is a great idea not to fail jobs if the metrics config is invalid. I would also handle invalid default modes and add a test (maybe to TestMetricsModes).
public static MetricsConfig fromProperties(Map<String, String> props) {
  MetricsConfig spec = new MetricsConfig();
  String defaultModeAsString = props.getOrDefault(DEFAULT_WRITE_METRICS_MODE, DEFAULT_WRITE_METRICS_MODE_DEFAULT);
  spec.defaultMode = MetricsModes.fromString(defaultModeAsString);
Will we fail jobs if the default mode is invalid?
Would it make sense to fall back to DEFAULT_WRITE_METRICS_MODE_DEFAULT?
Good idea, we should wrap that in a try/catch as well.
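The fallback discussed above could be sketched roughly as follows. This is a minimal, self-contained illustration, not Iceberg's actual implementation: parseMode, validate, and the truncate(16) default are hypothetical stand-ins for MetricsModes.fromString and DEFAULT_WRITE_METRICS_MODE_DEFAULT.

```java
import java.util.Locale;

// Hypothetical sketch of wrapping mode parsing in try/catch so an invalid
// table property falls back to the default instead of failing the write job.
public class MetricsModeFallback {
  static final String DEFAULT_MODE = "truncate(16)";

  // Parse a metrics mode string; on an invalid value, fall back to the
  // default rather than propagating the exception to the running job.
  static String parseMode(String value) {
    try {
      return validate(value);
    } catch (IllegalArgumentException e) {
      // A typo in table tuning settings should not fail a live pipeline.
      return DEFAULT_MODE;
    }
  }

  // Minimal stand-in for MetricsModes.fromString: accepts a few known modes.
  static String validate(String value) {
    String mode = value.toLowerCase(Locale.ROOT).trim();
    if (mode.equals("none") || mode.equals("counts") || mode.equals("full")
        || mode.matches("truncate\\(\\d+\\)")) {
      return mode;
    }
    throw new IllegalArgumentException("Invalid metrics mode: " + value);
  }

  public static void main(String[] args) {
    System.out.println(parseMode("counts"));      // valid mode is kept
    System.out.println(parseMode("truncat(16)")); // typo falls back to default
  }
}
```

The same pattern applies to the default mode itself: if the configured default is unparseable, the hard-coded default is used instead.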
@aokolnychyi, could you take another look?
public static final String METADATA_COMPRESSION = "write.metadata.compression-codec";
public static final String METADATA_COMPRESSION_DEFAULT = "none";

public static final String METRICS_MODE_COLUMN_CONF_PREFIX = "write.metadata.metrics.column.";
+1 on this. Do we want to have it as WRITE_METRICS_MODE_COLUMN_CONF_PREFIX to be consistent with defaults? Is there a possibility we will have READ_METRICS_MODE_COLUMN_CONF_PREFIX? Not sure.
I think this is fine.
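For illustration, here is a minimal sketch of how keys under that column prefix could be mapped back to per-column modes. The class and method names (ColumnMetricsConfig, columnModes) are hypothetical, not Iceberg's API; only the prefix string comes from the diff above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: collect per-column metrics modes from table
// properties keyed by the METRICS_MODE_COLUMN_CONF_PREFIX shown above.
public class ColumnMetricsConfig {
  static final String PREFIX = "write.metadata.metrics.column.";

  static Map<String, String> columnModes(Map<String, String> props) {
    Map<String, String> modes = new HashMap<>();
    for (Map.Entry<String, String> entry : props.entrySet()) {
      if (entry.getKey().startsWith(PREFIX)) {
        // Strip the prefix to recover the column name.
        modes.put(entry.getKey().substring(PREFIX.length()), entry.getValue());
      }
    }
    return modes;
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("write.metadata.metrics.column.id", "full");
    props.put("write.metadata.compression-codec", "none");
    System.out.println(columnModes(props)); // only the prefixed key survives
  }
}
```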
* Add argument validation to HadoopTables#create (#298)
* Install source JAR when running install target (#310)
* Add projectStrict for Dates and Timestamps (#283)
* Correctly publish artifacts on JitPack (#321)
  The Gradle install target produces invalid POM files that are missing the dependencyManagement section and versions for some dependencies. Instead, we directly tell JitPack to run the correct Gradle target.
* Add build info to README.md (#304)
* Convert Iceberg time type to Hive string type (#325)
* Add overwrite option to write builders (#318)
* Fix out of order Pig partition fields (#326)
* Add mapping to Iceberg for external name-based schemas (#338)
* Site: Fix broken link to Iceberg API (#333)
* Add forTable method for Avro WriteBuilder (#322)
* Remove multiple literal strings check rule for scala (#335)
* Fix invalid javadoc url in README.md (#336)
* Use UnicodeUtil.truncateString for Truncate transform. (#340)
  This truncates by unicode codepoint instead of Java chars.
* Refactor metrics tests for reuse (#331)
* Spark: Add support for write-audit-publish workflows (#342)
* Avoid write failures if metrics mode is invalid (#301)
* Fix truncateStringMax in UnicodeUtil (#334)
  Fixes #328, fixes #329. Index to codePointAt should be the offset calculated by code points.
* [Vectorization] Added batch sizing, switched to BufferAllocator, other minor style fixes.
This updates the MetricsConfig class to catch exceptions thrown by MetricsModes.fromString. The intent is to avoid failing write jobs when a metrics mode is invalid, because users may change a table's properties while a pipeline that writes to it is deployed and running. A live pipeline should not fail because of a typo in table tuning settings.