diff --git a/contrib/format-httpd/README.md b/contrib/format-httpd/README.md
new file mode 100644
index 00000000000..4d45c0ac390
--- /dev/null
+++ b/contrib/format-httpd/README.md
@@ -0,0 +1,75 @@
+# Web Server Log Format Plugin (HTTPD)
+This plugin enables Drill to read and query httpd (Apache Web Server) and nginx access logs natively. This plugin uses the work by
+[Niels Basjes](https://github.com/nielsbasjes) which is available here: https://github.com/nielsbasjes/logparser.
+
+## Configuration
+There are five fields which you can configure in order for Drill to read web server logs. In general the defaults should be fine; the fields are:
+* **`logFormat`**: The log format string is the format string found in your web server configuration. If you have multiple log formats, you can add all of them in this
+  single parameter separated by a newline (`\n`). The parser will automatically select the first matching format.
+* **`timestampFormat`**: The format of timestamps in your log files. This setting is optional and is almost never needed.
+* **`extensions`**: The file extension of your web server logs. Defaults to `httpd`.
+* **`maxErrors`**: Sets the plugin's error tolerance. When set to any value less than `0`, Drill will ignore all errors. If unspecified, `maxErrors` is `0`, which causes the query to fail on the first error.
+* **`flattenWildcards`**: There are a few variables which Drill extracts into maps, such as query strings. When set to `true`, Drill flattens these maps into regular columns. Defaults to `false`.
+
+```json
+"httpd" : {
+  "type" : "httpd",
+  "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+  "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
+  "maxErrors": 0,
+  "flattenWildcards": false
+}
+```
+
+## Data Model
+The fields which Drill will return from HTTPD access logs should be fairly self-explanatory, and all are mapped to the correct data types. For instance, timestamp fields are
+all Drill `TIMESTAMP`s, and so forth.
+
+### Nested Columns
+The HTTPD parser can produce a few columns of nested data. For instance, the various `query_string` columns are parsed into Drill maps so that if you want to look for a specific
+field, you can do so.
+
+Drill allows you to directly access maps with the format of:
+```
+<table>.<column>.<field>
+```
+One note is that in order to access a map, you must assign an alias to your table, as shown below:
+```sql
+SELECT mylogs.`request_firstline_uri_query_$`.`username` AS username
+FROM dfs.test.`logfile.httpd` AS mylogs
+```
+In this example, we assign an alias of `mylogs` to the table, the column name is `request_firstline_uri_query_$`, and the individual field within that map is `username`.
+This particular example enables you to analyze items in query strings.
+
+### Flattening Maps
+In the event that you would like a map field broken into columns rather than getting the nested fields, you can set the `flattenWildcards` option to `true` and
+Drill will create columns for these fields. For example, if you have a URI query parameter called `username` and you select the `flattenWildcards` option, Drill will create a
+field called `request_firstline_uri_query_username`.
+
+**Note:** underscores in the field name are replaced with double underscores.
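+
+For example, with `flattenWildcards` enabled, the hypothetical `username` parameter above can be queried directly (the file path follows the earlier examples):
+
+```sql
+SELECT `request_firstline_uri_query_username` AS username
+FROM dfs.test.`logfile.httpd`
+WHERE `request_firstline_uri_query_username` IS NOT NULL
+```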
+
+## Useful Functions
+If you are using Drill to analyze web access logs, there are a few other useful functions which you should know about:
+
+* `parse_url()`: This function accepts a URL as an argument and returns a map of the URL's protocol, authority, host, and path.
+* `parse_query()`: This function accepts a query string and returns a key/value pairing of the variables submitted in the request.
+* `parse_user_agent()`, `parse_user_agent(<useragent>, <desired field>)`: The function `parse_user_agent()` takes a user agent string as an argument and
+  returns a map of the available fields. Note that not every field will be present in every user agent string.
+
+[Complete Docs Here](https://github.com/apache/drill/tree/master/contrib/udfs#user-agent-functions)
+
+## Implicit Columns
+Data queried by this plugin will return two implicit columns:
+
+* **`_raw`**: Returns the raw, unparsed log line.
+* **`_matched`**: Returns `true` or `false` depending on whether the line matched the configured log format.
+
+Thus, if you wanted to see which lines in your log file did not match the log format, you could use the following query:
+
+```sql
+SELECT _raw
+FROM <your file>
+WHERE _matched = false
+```
\ No newline at end of file
diff --git a/contrib/format-httpd/pom.xml b/contrib/format-httpd/pom.xml
new file mode 100644
index 00000000000..50ae6185b7b
--- /dev/null
+++ b/contrib/format-httpd/pom.xml
@@ -0,0 +1,100 @@
+<?xml version="1.0"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.19.0-SNAPSHOT</version>
+  </parent>
+
+  <artifactId>drill-format-httpd</artifactId>
+  <name>contrib/httpd-format-plugin</name>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>nl.basjes.parse.httpdlog</groupId>
+      <artifactId>httpdlog-parser</artifactId>
+      <version>5.6</version>
+      <exclusions>
+        <exclusion>
+          <groupId>commons-codec</groupId>
+          <artifactId>commons-codec</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>commons-logging</groupId>
+          <artifactId>commons-logging</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+
+    <dependency>
+      <groupId>nl.basjes.parse.useragent</groupId>
+      <artifactId>yauaa-logparser</artifactId>
+      <version>5.19</version>
+    </dependency>
+
+    <!-- Test dependencies -->
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.drill</groupId>
+      <artifactId>drill-common</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <artifactId>maven-resources-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>copy-java-sources</id>
+            <phase>process-sources</phase>
+            <goals>
+              <goal>copy-resources</goal>
+            </goals>
+            <configuration>
+              <outputDirectory>${basedir}/target/classes/org/apache/drill/exec/store/httpd</outputDirectory>
+              <resources>
+                <resource>
+                  <directory>src/main/java/org/apache/drill/exec/store/httpd</directory>
+                  <filtering>true</filtering>
+                </resource>
+              </resources>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java
new file mode 100644
index 00000000000..07f14393856
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.mapred.FileSplit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+
+public class HttpdLogBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+  private static final Logger logger = LoggerFactory.getLogger(HttpdLogBatchReader.class);
+  public static final String RAW_LINE_COL_NAME = "_raw";
+  public static final String MATCHED_COL_NAME = "_matched";
+  private final HttpdLogFormatConfig formatConfig;
+  private final int maxRecords;
+  private final EasySubScan scan;
+  private HttpdParser parser;
+  private FileSplit split;
+  private InputStream fsStream;
+  private RowSetLoader rowWriter;
+  private BufferedReader reader;
+  private int lineNumber;
+  private CustomErrorContext errorContext;
+  private ScalarWriter rawLineWriter;
+  private ScalarWriter matchedWriter;
+  private int errorCount;
+
+  public HttpdLogBatchReader(HttpdLogFormatConfig formatConfig, int maxRecords, EasySubScan scan) {
+    this.formatConfig = formatConfig;
+    this.maxRecords = maxRecords;
+    this.scan = scan;
+  }
+
+  @Override
+  public boolean open(FileSchemaNegotiator negotiator) {
+    // Open the input stream to the log file
+    openFile(negotiator);
+    errorContext = negotiator.parentErrorContext();
+    try {
+      parser = new HttpdParser(formatConfig.getLogFormat(), formatConfig.getTimestampFormat(), formatConfig.getFlattenWildcards(), scan);
+      negotiator.tableSchema(parser.setupParser(), false);
+    } catch (Exception e) {
+      throw UserException.dataReadError(e)
+        .message("Error opening HTTPD file: " + e.getMessage())
+        .addContext(errorContext)
+        .build(logger);
+    }
+
+    ResultSetLoader loader = negotiator.build();
+    rowWriter = loader.writer();
+    parser.addFieldsToParser(rowWriter);
+    rawLineWriter = addImplicitColumn(RAW_LINE_COL_NAME, MinorType.VARCHAR);
+    matchedWriter = addImplicitColumn(MATCHED_COL_NAME, MinorType.BIT);
+    return true;
+  }
+
+  @Override
+  public boolean next() {
+    while (!rowWriter.isFull()) {
+      if (!nextLine(rowWriter)) {
+        return false;
+      }
+    }
+    return true;
+  }
+
+  private boolean nextLine(RowSetLoader rowWriter) {
+    String line;
+
+    // Check if the limit has been reached
+    if (rowWriter.limitReached(maxRecords)) {
+      return false;
+    }
+
+    try {
+      line = reader.readLine();
+      if (line == null) {
+        return false;
+      } else if (line.isEmpty()) {
+        return true;
+      }
+    } catch (Exception e) {
+      throw UserException.dataReadError(e)
+        .message("Error reading HTTPD file at line number %d", lineNumber)
+        .addContext(e.getMessage())
+        .addContext(errorContext)
+        .build(logger);
+    }
+    // Start the row
+    rowWriter.start();
+
+    try {
+      parser.parse(line);
+      matchedWriter.setBoolean(true);
+    } catch (Exception e) {
+      errorCount++;
+      if (errorCount >= formatConfig.getMaxErrors()) {
+        throw UserException.dataReadError()
+          .message("Error reading HTTPD file at line number %d", lineNumber)
+          .addContext(e.getMessage())
+          .addContext(errorContext)
+          .build(logger);
+      } else {
+        matchedWriter.setBoolean(false);
+      }
+    }
+
+    // Write raw line
+    rawLineWriter.setString(line);
+
+    // Finish the row
+    rowWriter.save();
+    lineNumber++;
+
+    return true;
+  }
+
+  @Override
+  public void close() {
+    if (fsStream == null) {
+      return;
+    }
+    try {
+      fsStream.close();
+    } catch (IOException e) {
+      logger.warn("Error when closing HTTPD file: {} {}", split.getPath().toString(), e.getMessage());
+    }
+    fsStream = null;
+  }
+
+  private void openFile(FileSchemaNegotiator negotiator) {
+    split = negotiator.split();
+    try {
+      fsStream = negotiator.fileSystem().openPossiblyCompressedStream(split.getPath());
+    } catch (Exception e) {
+      throw UserException
+        .dataReadError(e)
+        .message("Failed to open input file: %s", split.getPath().toString())
+        .addContext(e.getMessage())
+        .build(logger);
+    }
+    reader = new BufferedReader(new InputStreamReader(fsStream, Charsets.UTF_8));
+  }
+
+  private ScalarWriter addImplicitColumn(String colName, MinorType type) {
+    ColumnMetadata colSchema = MetadataUtils.newScalar(colName, type, TypeProtos.DataMode.OPTIONAL);
+    colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);
+    int index = rowWriter.addColumn(colSchema);
+
+    return rowWriter.scalar(index);
+  }
+}
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
similarity index 56%
rename from exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
rename to contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
index 0aa7ecefd81..a1f56177328 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
@@ -17,33 +17,46 @@
  */
 package org.apache.drill.exec.store.httpd;
 
-import java.util.Objects;
-
-import org.apache.drill.common.PlanStringBuilder;
-import org.apache.drill.common.logical.FormatPluginConfig;
-
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonInclude;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.PlanStringBuilder;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
 
-@JsonTypeName("httpd")
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(HttpdLogFormatPlugin.DEFAULT_NAME)
 @JsonInclude(JsonInclude.Include.NON_DEFAULT)
 public class HttpdLogFormatConfig implements FormatPluginConfig {
 
   public static final String DEFAULT_TS_FORMAT = "dd/MMM/yyyy:HH:mm:ss ZZ";
+  public final String logFormat;
+  public final String timestampFormat;
+  public final List<String> extensions;
+  public final boolean flattenWildcards;
+  public final int maxErrors;
 
-  // No extensions?
-  private final String logFormat;
-  private final String timestampFormat;
 
   @JsonCreator
   public HttpdLogFormatConfig(
+      @JsonProperty("extensions") List<String> extensions,
       @JsonProperty("logFormat") String logFormat,
-      @JsonProperty("timestampFormat") String timestampFormat) {
+      @JsonProperty("timestampFormat") String timestampFormat,
+      @JsonProperty("maxErrors") int maxErrors,
+      @JsonProperty("flattenWildcards") boolean flattenWildcards
+  ) {
+
+    this.extensions = extensions == null
+        ? Collections.singletonList("httpd")
+        : ImmutableList.copyOf(extensions);
     this.logFormat = logFormat;
-    this.timestampFormat = timestampFormat == null
-        ? DEFAULT_TS_FORMAT : timestampFormat;
+    this.timestampFormat = timestampFormat;
+    this.maxErrors = maxErrors;
+    this.flattenWildcards = flattenWildcards;
   }
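+
+  /*
+   * For reference, the JSON fragment this config binds to looks like the bootstrap
+   * entry elsewhere in this PR (values shown are the defaults):
+   *
+   *   "httpd" : {
+   *     "type" : "httpd",
+   *     "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+   *     "maxErrors" : 0,
+   *     "flattenWildcards" : false
+   *   }
+   */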
 
   /**
@@ -61,23 +74,32 @@ public String getTimestampFormat() {
     return timestampFormat;
   }
 
+  public List<String> getExtensions() {
+    return extensions;
+  }
+
+  public int getMaxErrors() {
+    return maxErrors;
+  }
+
+  public boolean getFlattenWildcards() {
+    return flattenWildcards;
+  }
+
   @Override
   public int hashCode() {
-    return Objects.hash(logFormat, timestampFormat);
+    return Objects.hash(logFormat, timestampFormat, maxErrors, flattenWildcards);
   }
 
   @Override
-  public boolean equals(Object o) {
-    if (this == o) {
+  public boolean equals(Object obj) {
+    if (this == obj) {
       return true;
     }
-    if (o == null || getClass() != o.getClass()) {
+    if (obj == null || getClass() != obj.getClass()) {
       return false;
     }
-
-    HttpdLogFormatConfig that = (HttpdLogFormatConfig) o;
-    return Objects.equals(logFormat, that.logFormat) &&
-        Objects.equals(timestampFormat, that.timestampFormat);
+    HttpdLogFormatConfig other = (HttpdLogFormatConfig) obj;
+    return Objects.equals(logFormat, other.logFormat)
+        && Objects.equals(timestampFormat, other.timestampFormat)
+        && maxErrors == other.maxErrors
+        && flattenWildcards == other.flattenWildcards;
   }
 
   @Override
@@ -85,6 +107,8 @@ public String toString() {
     return new PlanStringBuilder(this)
       .field("log format", logFormat)
       .field("timestamp format", timestampFormat)
+      .field("max errors", maxErrors)
+      .field("flattenWildcards", flattenWildcards)
       .toString();
   }
 }
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
new file mode 100644
index 00000000000..674bfdb7cd6
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.hadoop.conf.Configuration;
+
+public class HttpdLogFormatPlugin extends EasyFormatPlugin<HttpdLogFormatConfig> {
+
+  protected static final String DEFAULT_NAME = "httpd";
+
+  private static class HttpdLogReaderFactory extends FileReaderFactory {
+
+    private final HttpdLogFormatConfig config;
+    private final int maxRecords;
+    private final EasySubScan scan;
+
+    private HttpdLogReaderFactory(HttpdLogFormatConfig config, int maxRecords, EasySubScan scan) {
+      this.config = config;
+      this.maxRecords = maxRecords;
+      this.scan = scan;
+    }
+
+    @Override
+    public ManagedReader<? extends FileSchemaNegotiator> newReader() {
+      return new HttpdLogBatchReader(config, maxRecords, scan);
+    }
+  }
+
+  public HttpdLogFormatPlugin(final String name,
+                              final DrillbitContext context,
+                              final Configuration fsConf,
+                              final StoragePluginConfig storageConfig,
+                              final HttpdLogFormatConfig formatConfig) {
+    super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
+  }
+
+  private static EasyFormatConfig easyConfig(Configuration fsConf, HttpdLogFormatConfig pluginConfig) {
+    EasyFormatConfig config = new EasyFormatConfig();
+    config.readable = true;
+    config.writable = false;
+    config.blockSplittable = false;
+    config.compressible = true;
+    config.supportsProjectPushdown = true;
+    config.extensions = pluginConfig.getExtensions();
+    config.fsConf = fsConf;
+    config.defaultName = DEFAULT_NAME;
+    config.readerOperatorType = UserBitShared.CoreOperatorType.HTPPD_LOG_SUB_SCAN_VALUE;
+    config.useEnhancedScan = true;
+    config.supportsLimitPushdown = true;
+    return config;
+  }
+
+  @Override
+  public ManagedReader<? extends FileSchemaNegotiator> newBatchReader(
+      EasySubScan scan, OptionManager options) {
+    return new HttpdLogBatchReader(formatConfig, scan.getMaxRecords(), scan);
+  }
+
+  @Override
+  protected FileScanFramework.FileScanBuilder frameworkBuilder(OptionManager options, EasySubScan scan) {
+    FileScanFramework.FileScanBuilder builder = new FileScanFramework.FileScanBuilder();
+    builder.setReaderFactory(new HttpdLogReaderFactory(formatConfig, scan.getMaxRecords(), scan));
+
+    initScanBuilder(builder, scan);
+    builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR));
+    return builder;
+  }
+}
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
new file mode 100644
index 00000000000..8f2c73acbb7
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
@@ -0,0 +1,482 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.Map;
+
+import nl.basjes.parse.core.Casts;
+import nl.basjes.parse.core.Parser;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.text.SimpleDateFormat;
+import java.util.Date;
+
+public class HttpdLogRecord {
+
+  private static final Logger logger = LoggerFactory.getLogger(HttpdLogRecord.class);
+
+  private final Map<String, ScalarWriter> strings = Maps.newHashMap();
+  private final Map<String, ScalarWriter> longs = Maps.newHashMap();
+  private final Map<String, ScalarWriter> doubles = Maps.newHashMap();
+  private final Map<String, ScalarWriter> dates = Maps.newHashMap();
+  private final Map<String, ScalarWriter> times = Maps.newHashMap();
+  private final Map<String, ScalarWriter> timestamps = new HashMap<>();
+  private final Map<String, TupleWriter> wildcards = Maps.newHashMap();
+  private final Map<String, String> cleanExtensions = Maps.newHashMap();
+  private final Map<String, TupleWriter> startedWildcards = Maps.newHashMap();
+  private final Map<String, TupleWriter> wildcardWriters = Maps.newHashMap();
+  private final SimpleDateFormat dateFormatter;
+  private RowSetLoader rootRowWriter;
+  private final boolean flattenWildcards;
+
+  public HttpdLogRecord(String timeFormat, boolean flattenWildcards) {
+    if (timeFormat == null) {
+      timeFormat = HttpdLogFormatConfig.DEFAULT_TS_FORMAT;
+    }
+    this.dateFormatter = new SimpleDateFormat(timeFormat);
+    this.flattenWildcards = flattenWildcards;
+  }
+
+  /**
+   * Call this method after a record has been parsed. This finishes the lifecycle of any maps that were written and
+   * clears all the entries so that the next record can be written.
+   */
+  public void finishRecord() {
+    wildcardWriters.clear();
+    startedWildcards.clear();
+  }
+
+  /**
+   * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+   * called when the value of a log field is a String data type.
+   *
+   * @param field name of field
+   * @param value value of field
+   */
+  @SuppressWarnings("unused")
+  public void set(String field, String value) {
+    if (value != null) {
+      final ScalarWriter w = strings.get(field);
+      if (w != null) {
+        logger.debug("Parsed field: {}, as string: {}", field, value);
+        w.setString(value);
+      } else {
+        logger.warn("No 'string' writer found for field: {}", field);
+      }
+    }
+  }
+
+  /**
+   * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+   * called when the value of a log field is a Long data type.
+   *
+   * @param field name of field
+   * @param value value of field
+   */
+  @SuppressWarnings("unused")
+  public void set(String field, Long value) {
+    if (value != null) {
+      final ScalarWriter w = longs.get(field);
+      if (w != null) {
+        logger.debug("Parsed field: {}, as long: {}", field, value);
+        w.setLong(value);
+      } else {
+        logger.warn("No 'long' writer found for field: {}", field);
+      }
+    }
+  }
+
+  /**
+   * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+   * called when the value of a log field is a Date data type.
+   *
+   * @param field name of field
+   * @param value value of field
+   */
+  @SuppressWarnings("unused")
+  public void setDate(String field, String value) {
+    if (value != null) {
+      final ScalarWriter w = dates.get(field);
+      if (w != null) {
+        logger.debug("Parsed field: {}, as date: {}", field, value);
+        w.setDate(new LocalDate(value));
+      } else {
+        logger.warn("No 'date' writer found for field: {}", field);
+      }
+    }
+  }
+
+  /**
+   * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+   * called when the value of a log field is a Time data type.
+   *
+   * @param field name of field
+   * @param value value of field
+   */
+  @SuppressWarnings("unused")
+  public void setTime(String field, String value) {
+    if (value != null) {
+      final ScalarWriter w = times.get(field);
+      if (w != null) {
+        logger.debug("Parsed field: {}, as time: {}", field, value);
+        w.setTime(new LocalTime(value));
+      } else {
+        logger.warn("No 'time' writer found for field: {}", field);
+      }
+    }
+  }
+
+  /**
+   * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+   * called when the value of a log field is a timestamp data type.
+   *
+   * @param field name of field
+   * @param value value of field
+   */
+  @SuppressWarnings("unused")
+  public void setTimestampFromEpoch(String field, Long value) {
+    if (value != null) {
+      final ScalarWriter w = timestamps.get(field);
+      if (w != null) {
+        logger.debug("Parsed field: {}, as timestamp: {}", field, value);
+        w.setTimestamp(new Instant(value));
+      } else {
+        logger.warn("No 'timestamp' writer found for field: {}", field);
+      }
+    }
+  }
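+
+  /*
+   * Illustrative sketch of how these reflection-based setters are used (the parser path
+   * and value below are examples only): once a setter is registered as a parse target,
+   * the logparser library calls the matching setter for every parsed log line, e.g.
+   *
+   *   parser.addParseTarget(HttpdLogRecord.class.getMethod("set", String.class, String.class),
+   *       "STRING:request.status.last");
+   *   parser.parse(record, line); // invokes record.set("STRING:request.status.last", "200")
+   */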
+
+  /**
+   * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+   * called when the value of a log field is a timestamp data type.
+   *
+   * @param field name of field
+   * @param value value of field
+   */
+  @SuppressWarnings("unused")
+  public void setTimestamp(String field, String value) {
+    if (value != null) {
+      // Convert the date string into a long
+      long ts = 0;
+      try {
+        Date d = this.dateFormatter.parse(value);
+        ts = d.getTime();
+      } catch (Exception e) {
+        // If the date formatter does not successfully create a date, the timestamp
+        // falls back to zero. Do not throw an exception.
+      }
+      final ScalarWriter tw = timestamps.get(field);
+      if (tw != null) {
+        logger.debug("Parsed field: {}, as timestamp: {}", field, value);
+        tw.setTimestamp(new Instant(ts));
+      } else {
+        logger.warn("No 'timestamp' writer found for field: {}", field);
+      }
+    }
+  }
+
+  /**
+   * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+   * called when the value of a log field is a Double data type.
+   *
+   * @param field name of field
+   * @param value value of field
+   */
+  @SuppressWarnings("unused")
+  public void set(String field, Double value) {
+    if (value != null) {
+      final ScalarWriter w = doubles.get(field);
+      if (w != null) {
+        logger.debug("Parsed field: {}, as double: {}", field, value);
+        w.setDouble(value);
+      } else {
+        logger.warn("No 'double' writer found for field: {}", field);
+      }
+    }
+  }
+
+  /**
+   * This method is referenced and called via reflection. When the parser processes a field like:
+   * HTTP.URI:request.firstline.uri.query.* where the star is an arbitrary field that the parser found, this method will be
+   * invoked.
+ * + * @param field name of field + * @param value value of field + */ + @SuppressWarnings("unused") + public void setWildcard(String field, String value) { + if (value != null) { + String cleanedField = HttpdUtils.getFieldNameFromMap(field); + if (flattenWildcards) { + String drillFieldName = HttpdUtils.drillFormattedFieldName(field); + ScalarWriter writer = getColWriter(rootRowWriter, drillFieldName, MinorType.VARCHAR); + writer.setString(value); + } else { + final TupleWriter mapWriter = getWildcardWriter(field); + logger.debug("Parsed wildcard field: {}, as String: {}", field, value); + writeStringColumn(mapWriter, cleanedField, value); + } + } + } + + /** + * This method is referenced and called via reflection. When the parser processes a field like: + * HTTP.URI:request.firstline.uri.query.* where star is an arbitrary field that the parser found this method will be + * invoked.
+ * + * @param field name of field + * @param value value of field + */ + @SuppressWarnings("unused") + public void setWildcard(String field, Long value) { + if (value != null) { + String cleanedField = HttpdUtils.getFieldNameFromMap(field); + + if (flattenWildcards) { + String drillFieldName = HttpdUtils.drillFormattedFieldName(field); + ScalarWriter writer = getColWriter(rootRowWriter, drillFieldName, MinorType.BIGINT); + writer.setLong(value); + } else { + final TupleWriter mapWriter = getWildcardWriter(field); + logger.debug("Parsed wildcard field: {}, as long: {}", field, value); + writeLongColumn(mapWriter, cleanedField, value); + } + } + } + + /** + * This method is referenced and called via reflection. When the parser processes a field like: + * HTTP.URI:request.firstline.uri.query.* where star is an arbitrary field that the parser found this method will be + * invoked.
+ * + * @param field name of field + * @param value value of field + */ + @SuppressWarnings("unused") + public void setWildcard(String field, Double value) { + if (value != null) { + String cleanedField = HttpdUtils.getFieldNameFromMap(field); + + if (flattenWildcards) { + String drillFieldName = HttpdUtils.drillFormattedFieldName(field); + ScalarWriter writer = getColWriter(rootRowWriter, drillFieldName, MinorType.FLOAT8); + writer.setDouble(value); + } else { + final TupleWriter mapWriter = getWildcardWriter(field); + logger.debug("Parsed wildcard field: {}, as double: {}", field, value); + writeFloatColumn(mapWriter, cleanedField, value); + } + } + } + + /** + * For a configuration like HTTP.URI:request.firstline.uri.query.*, a writer was created with name + * HTTP.URI:request.firstline.uri.query, we traverse the list of wildcard writers to see which one is the root of the + * name of the field passed in like HTTP.URI:request.firstline.uri.query.old. This is writer entry that is needed. + * + * @param field like HTTP.URI:request.firstline.uri.query.old where 'old' is one of many different parameter names. + * @return the writer to be used for this field. + */ + private TupleWriter getWildcardWriter(String field) { + + TupleWriter writer = startedWildcards.get(field); + if (writer == null) { + for (Map.Entry entry : wildcards.entrySet()) { + String root = entry.getKey(); + if (field.startsWith(root)) { + writer = entry.getValue(); + /* + * In order to save some time, store the cleaned version of the field extension. It is possible it will have + * unsafe characters in it. + */ + if (!cleanExtensions.containsKey(field)) { + String extension = field.substring(root.length() + 1); + String cleanExtension = HttpdUtils.drillFormattedFieldName(extension); + cleanExtensions.put(field, cleanExtension); + logger.debug("Added extension: field='{}' with cleanExtension='{}'", field, cleanExtension); + } + + /* + * We already know we have the writer, but if we have put this writer in the started list, do NOT call start + * again. + */ + if (!wildcardWriters.containsKey(root)) { + /* + * Start and store this root map writer for later retrieval. + */ + logger.debug("Starting new wildcard field writer: {}", field); + startedWildcards.put(field, writer); + wildcardWriters.put(root, writer); + } + /* + * Break out of the for loop when we find a root writer that matches the field. + */ + break; + } + } + } + + return writer; + } + + public Map getStrings() { + return strings; + } + + public Map getLongs() { + return longs; + } + + public Map getDoubles() { + return doubles; + } + + public Map getTimestamps() { + return timestamps; + } + + /** + * This record will be used with a single parser. For each field that is to be parsed a setter will be called. It + * registers a setter method for each field being parsed. It also builds the data writers to hold the data beings + * parsed. + * + * @param parser The initialized HttpdParser + * @param rowWriter An initialized RowSetLoader object + * @param type The Drill MinorType which sets the data type in the rowWriter + * @param parserFieldName The field name which is generated by the Httpd Parser. 
These are not "Drill safe" + * @param drillFieldName The Drill safe field name + * @param mappedColumns A list of columns mapped to their correct Drill data type + * @throws NoSuchMethodException Thrown in the event that the parser does not have a correct setter method + */ + public void addField(final Parser parser, + final RowSetLoader rowWriter, + final EnumSet type, + final String parserFieldName, + final String drillFieldName, + Map mappedColumns) throws NoSuchMethodException { + final boolean hasWildcard = parserFieldName.endsWith(HttpdParser.PARSER_WILDCARD); + + logger.debug("Field name: {}", parserFieldName); + rootRowWriter = rowWriter; + /* + * This is a dynamic way to map the setter for each specified field type.
+ * e.g. a TIME.EPOCH may map to a LONG while a referrer may map to a STRING + */ + if (hasWildcard) { + final String cleanName = parserFieldName.substring(0, parserFieldName.length() - HttpdParser.PARSER_WILDCARD.length()); + logger.debug("Adding WILDCARD parse target: {} as {}, with field name: {}", parserFieldName, cleanName, drillFieldName); + parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, String.class), parserFieldName); + parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, Double.class), parserFieldName); + parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, Long.class), parserFieldName); + wildcards.put(cleanName, getMapWriter(drillFieldName, rowWriter)); + } else if (type.contains(Casts.DOUBLE) || mappedColumns.get(drillFieldName) == MinorType.FLOAT8) { + parser.addParseTarget(this.getClass().getMethod("set", String.class, Double.class), parserFieldName); + doubles.put(parserFieldName, rowWriter.scalar(drillFieldName)); + } else if (type.contains(Casts.LONG) || mappedColumns.get(drillFieldName) == MinorType.BIGINT) { + parser.addParseTarget(this.getClass().getMethod("set", String.class, Long.class), parserFieldName); + longs.put(parserFieldName, rowWriter.scalar(drillFieldName)); + } else { + if (parserFieldName.startsWith("TIME.STAMP:")) { + parser.addParseTarget(this.getClass().getMethod("setTimestamp", String.class, String.class), parserFieldName); + timestamps.put(parserFieldName, rowWriter.scalar(drillFieldName)); + } else if (parserFieldName.startsWith("TIME.EPOCH:")) { + parser.addParseTarget(this.getClass().getMethod("setTimestampFromEpoch", String.class, Long.class), parserFieldName); + timestamps.put(parserFieldName, rowWriter.scalar(drillFieldName)); + } else if (parserFieldName.startsWith("TIME.DATE")) { + parser.addParseTarget(this.getClass().getMethod("setDate", String.class, String.class), parserFieldName); + dates.put(parserFieldName, rowWriter.scalar(drillFieldName)); + } else if (parserFieldName.startsWith("TIME.TIME")) { + parser.addParseTarget(this.getClass().getMethod("setTime", String.class, String.class), parserFieldName); + times.put(parserFieldName, rowWriter.scalar(drillFieldName)); + } else { + parser.addParseTarget(this.getClass().getMethod("set", String.class, String.class), parserFieldName); + strings.put(parserFieldName, rowWriter.scalar(drillFieldName)); + } + } + } + + private TupleWriter getMapWriter(String mapName, RowSetLoader rowWriter) { + int index = rowWriter.tupleSchema().index(mapName); + if (index == -1) { + index = rowWriter.addColumn(SchemaBuilder.columnSchema(mapName, TypeProtos.MinorType.MAP, TypeProtos.DataMode.REQUIRED)); + } + return rowWriter.tuple(index); + } + + /** + * Helper function to write a 1D long column + * + * @param rowWriter The row to which the data will be written + * @param name The column name + * @param value The value to be written + */ + private void writeLongColumn(TupleWriter rowWriter, String name, long value) { + ScalarWriter colWriter = getColWriter(rowWriter, name, MinorType.BIGINT); + colWriter.setLong(value); + } + + /** + * Helper function to write a 1D String column + * + * @param rowWriter The row to which the data will be written + * @param name The column name + * @param value The value to be written + */ + private void writeStringColumn(TupleWriter rowWriter, String name, String value) { + ScalarWriter colWriter = getColWriter(rowWriter, name, MinorType.VARCHAR); + colWriter.setString(value); + } + + /** + * Helper 
function to write a 1D double column
+   *
+   * @param rowWriter The row to which the data will be written
+   * @param name The column name
+   * @param value The value to be written
+   */
+  private void writeFloatColumn(TupleWriter rowWriter, String name, double value) {
+    ScalarWriter colWriter = getColWriter(rowWriter, name, MinorType.FLOAT8);
+    colWriter.setDouble(value);
+  }
+
+  private ScalarWriter getColWriter(TupleWriter tupleWriter, String fieldName, TypeProtos.MinorType type) {
+    int index = tupleWriter.tupleSchema().index(fieldName);
+    if (index == -1) {
+      ColumnMetadata colSchema = MetadataUtils.newScalar(fieldName, type, TypeProtos.DataMode.OPTIONAL);
+      index = tupleWriter.addColumn(colSchema);
+    }
+    return tupleWriter.scalar(index);
+  }
+}
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
new file mode 100644
index 00000000000..36fe949e019
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import nl.basjes.parse.core.Casts;
+import nl.basjes.parse.core.Parser;
+import nl.basjes.parse.core.exceptions.DissectionFailure;
+import nl.basjes.parse.core.exceptions.InvalidDissectorException;
+import nl.basjes.parse.core.exceptions.MissingDissectorsException;
+import nl.basjes.parse.httpdlog.HttpdLoglineParser;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class HttpdParser {
+
+  private static final Logger logger = LoggerFactory.getLogger(HttpdParser.class);
+
+  public static final String PARSER_WILDCARD = ".*";
+  public static final String REMAPPING_FLAG = "#";
+  private final Parser<HttpdLogRecord> parser;
+  private final List<SchemaPath> requestedColumns;
+  private final Map<String, MinorType> mappedColumns;
+  private final HttpdLogRecord record;
+  private final String logFormat;
+  private Map<String, String> requestedPaths;
+  private EnumSet<Casts> casts;
+
+  public HttpdParser(final String logFormat, final String timestampFormat, final boolean flattenWildcards, final EasySubScan scan) {
+
+    Preconditions.checkArgument(logFormat != null && !logFormat.trim().isEmpty(), "logFormat cannot be null or empty");
+
+    this.logFormat = logFormat;
+    this.record = new HttpdLogRecord(timestampFormat, flattenWildcards);
+
+    if (timestampFormat == null) {
+      this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat);
+    } else {
+      this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, timestampFormat);
+    }
+
+    /*
+     * The log parser has the possibility of parsing the user agent and extracting additional fields.
+     * Unfortunately, doing so negatively affects the speed of the parser. Uncommenting this line and another in
+     * HttpdLogRecord will enable these fields. We will add this functionality in a future PR.
+     * this.parser.addDissector(new UserAgentDissector());
+     */
+
+    this.requestedColumns = scan.getColumns();
+
+    if (timestampFormat != null && !timestampFormat.trim().isEmpty()) {
+      logger.info("Custom timestamp format has been specified. This is an informational note only, as custom timestamp formats are rather unusual.");
+    }
+    if (logFormat.contains("\n")) {
+      logger.info("Specified logFormat is a multiline log format: {}", logFormat);
+    }
+
+    mappedColumns = new HashMap<>();
+  }
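+
+  /*
+   * Illustrative sketch (the exact set of paths comes from the logparser library):
+   * for the common log format
+   *   %h %l %u %t "%r" %s %b
+   * parser.getPossiblePaths() returns parser paths such as
+   *   IP:connection.client.host, TIME.STAMP:request.receive.time and
+   *   STRING:request.status.last,
+   * which setupParser() below maps to the Drill column names
+   *   connection_client_host, request_receive_time and request_status_last.
+   */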
+
+  /**
+   * We do not expose the underlying parser or the record which is used to manage the writers.
+   *
+   * @param line log line to tear apart
+   * @throws DissectionFailure if there is a generic dissector failure
+   * @throws InvalidDissectorException if the dissector is not valid
+   * @throws MissingDissectorsException if the dissector is missing
+   */
+  public void parse(final String line) throws DissectionFailure, InvalidDissectorException, MissingDissectorsException {
+    parser.parse(record, line);
+    record.finishRecord();
+  }
+
+  public TupleMetadata setupParser()
+      throws NoSuchMethodException, MissingDissectorsException, InvalidDissectorException {
+
+    SchemaBuilder builder = new SchemaBuilder();
+
+    /*
+     * If the user has selected fields, then we will use them to configure the parser because this would be the most
+     * efficient way to parse the log.
+     */
+    List<String> allParserPaths = parser.getPossiblePaths();
+
+    /*
+     * Use all possible paths that the parser has determined from the specified log format.
+     */
+    requestedPaths = Maps.newConcurrentMap();
+
+    for (final String parserPath : allParserPaths) {
+      requestedPaths.put(HttpdUtils.drillFormattedFieldName(parserPath), parserPath);
+    }
+
+    /*
+     * By adding the parse target to the dummy instance we activate it for use, which we can then use to find out which
+     * paths cast to which native data types. After we are done figuring this information out, we throw this away
+     * because this will be the slowest parsing path possible for the specified format.
+     */
+    Parser<Object> dummy = new HttpdLoglineParser<>(Object.class, logFormat);
+
+    /* This is the second line to uncomment to add the user agent parsing.
+     * dummy.addDissector(new UserAgentDissector());
+     */
+    dummy.addParseTarget(String.class.getMethod("indexOf", String.class), allParserPaths);
+
+    for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
+
+      /*
+       * If the column is not requested explicitly, remove it from the requested path list.
+       */
+      if (!isRequested(entry.getKey()) &&
+          !isStarQuery() &&
+          !isMetadataQuery() &&
+          !isOnlyImplicitColumns()) {
+        requestedPaths.remove(entry.getKey());
+        continue;
+      }
+
+      /*
+       * Check the field specified by the user to see if it is supposed to be remapped.
+       */
+      if (entry.getValue().startsWith(REMAPPING_FLAG)) {
+        /*
+         * Because this field is being remapped, we need to replace the field name that the parser uses.
+ */ + entry.setValue(entry.getValue().substring(REMAPPING_FLAG.length())); + + final String[] pieces = entry.getValue().split(":"); + HttpdUtils.addTypeRemapping(parser, pieces[1], pieces[0]); + casts = Casts.STRING_ONLY; + } else { + casts = dummy.getCasts(entry.getValue()); + } + + Casts dataType = (Casts) casts.toArray()[casts.size() - 1]; + + switch (dataType) { + case STRING: + if (entry.getValue().startsWith("TIME.STAMP:")) { + builder.addNullable(entry.getKey(), MinorType.TIMESTAMP); + mappedColumns.put(entry.getKey(), MinorType.TIMESTAMP); + } else if (entry.getValue().startsWith("TIME.DATE:")) { + builder.addNullable(entry.getKey(), MinorType.DATE); + mappedColumns.put(entry.getKey(), MinorType.DATE); + } else if (entry.getValue().startsWith("TIME.TIME:")) { + builder.addNullable(entry.getKey(), MinorType.TIME); + mappedColumns.put(entry.getKey(), MinorType.TIME); + } else if (HttpdUtils.isWildcard(entry.getValue())) { + builder.addMap(entry.getValue()); + mappedColumns.put(entry.getKey(), MinorType.MAP); + } + else { + builder.addNullable(entry.getKey(), TypeProtos.MinorType.VARCHAR); + mappedColumns.put(entry.getKey(), MinorType.VARCHAR); + } + break; + case LONG: + if (entry.getValue().startsWith("TIME.EPOCH:")) { + builder.addNullable(entry.getKey(), MinorType.TIMESTAMP); + mappedColumns.put(entry.getKey(), MinorType.TIMESTAMP); + } else { + builder.addNullable(entry.getKey(), TypeProtos.MinorType.BIGINT); + mappedColumns.put(entry.getKey(), MinorType.BIGINT); + } + break; + case DOUBLE: + builder.addNullable(entry.getKey(), TypeProtos.MinorType.FLOAT8); + mappedColumns.put(entry.getKey(), MinorType.FLOAT8); + break; + default: + logger.error("HTTPD Unsupported data type {} for field {}", dataType.toString(), entry.getKey()); + break; + } + } + return builder.build(); + } + + public void addFieldsToParser(RowSetLoader rowWriter) { + for (final Map.Entry entry : requestedPaths.entrySet()) { + try { + record.addField(parser, rowWriter, casts, entry.getValue(), entry.getKey(), mappedColumns); + } catch (NoSuchMethodException e) { + logger.error("Error adding fields to parser."); + } + } + logger.debug("Added Fields to Parser"); + } + + public boolean isStarQuery() { + return requestedColumns.size() == 1 && requestedColumns.get(0).isDynamicStar(); + } + + public boolean isMetadataQuery() { + return requestedColumns.size() == 0; + } + + public boolean isRequested(String colName) { + for (SchemaPath path : requestedColumns) { + if (path.isDynamicStar()) { + return true; + } else if (path.nameEquals(colName)) { + return true; + } + } + return false; + } + + /* + This is for the edge case where a query only contains the implicit fields. + */ + public boolean isOnlyImplicitColumns() { + + // If there are more than two columns, this isn't an issue. 
+    if (requestedColumns.size() > 2) {
+      return false;
+    }
+
+    if (requestedColumns.size() == 1) {
+      return requestedColumns.get(0).nameEquals(HttpdLogBatchReader.RAW_LINE_COL_NAME) ||
+        requestedColumns.get(0).nameEquals(HttpdLogBatchReader.MATCHED_COL_NAME);
+    } else {
+      return (requestedColumns.get(0).nameEquals(HttpdLogBatchReader.RAW_LINE_COL_NAME) ||
+        requestedColumns.get(0).nameEquals(HttpdLogBatchReader.MATCHED_COL_NAME)) &&
+        (requestedColumns.get(1).nameEquals(HttpdLogBatchReader.RAW_LINE_COL_NAME) ||
+        requestedColumns.get(1).nameEquals(HttpdLogBatchReader.MATCHED_COL_NAME));
+    }
+  }
+}
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java
new file mode 100644
index 00000000000..5a975b657b2
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.httpd;
+
+import nl.basjes.parse.core.Parser;
+
+public class HttpdUtils {
+
+  public static final String PARSER_WILDCARD = ".*";
+  public static final String SAFE_WILDCARD = "_$";
+  public static final String SAFE_SEPARATOR = "_";
+
+  /**
+   * Drill cannot deal with fields with dots in them, like request.referer. For the sake of simplicity we are going
+   * to ensure the field name is cleansed. The resultant output field will look like: request_referer.
+   * Additionally, wildcards will get replaced with _$.
+   *
+   * @param parserFieldName name to be cleansed
+   * @return The field name formatted for Drill
+   */
+  public static String drillFormattedFieldName(String parserFieldName) {
+    if (parserFieldName.contains(":")) {
+      String[] fieldPart = parserFieldName.split(":");
+      return fieldPart[1].replaceAll("_", "__").replace(PARSER_WILDCARD, SAFE_WILDCARD).replaceAll("\\.", SAFE_SEPARATOR);
+    } else {
+      return parserFieldName.replaceAll("_", "__").replace(PARSER_WILDCARD, SAFE_WILDCARD).replaceAll("\\.", SAFE_SEPARATOR);
+    }
+  }
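+
+  /*
+   * Examples of the mapping performed by drillFormattedFieldName():
+   *   HTTP.URI:request.firstline.uri.query.*       -> request_firstline_uri_query_$
+   *   STRING:request.firstline.uri.query.came_from -> request_firstline_uri_query_came__from
+   *   TIME.STAMP:request.receive.time              -> request_receive_time
+   */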
+
+  /**
+   * In order to define a type remapping, the field configuration takes the form:
+   * HTTP.URI:request.firstline.uri.query.[parameter name]
+ * + * @param parser Add type remapping to this parser instance. + * @param fieldName request.firstline.uri.query.[parameter_name] + * @param fieldType HTTP.URI, etc.. + */ + public static void addTypeRemapping(final Parser parser, final String fieldName, final String fieldType) { + parser.addTypeRemapping(fieldName, fieldType); + } + + /** + * Returns true if the field is a wildcard AKA map field, false if not. + * @param fieldName The target field name + * @return True if the field is a wildcard, false if not + */ + public static boolean isWildcard(String fieldName) { + return fieldName.endsWith(PARSER_WILDCARD); + } + + /** + * The HTTPD parser formats fields using the format HTTP.URI:request.firstline.uri.query. + * For maps, we only want the last part of this, so this function returns the last bit of the + * field name. + * @param mapField The unformatted field name + * @return The last part of the field name + */ + public static String getFieldNameFromMap(String mapField) { + return mapField.substring(mapField.lastIndexOf('.') + 1); + } + +} diff --git a/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json b/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json new file mode 100644 index 00000000000..145e9474699 --- /dev/null +++ b/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json @@ -0,0 +1,37 @@ +{ + "storage":{ + "dfs": { + "type": "file", + "formats": { + "httpd" : { + "type" : "httpd", + "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"", + "maxErrors": 0, + "flattenWildcards": false + } + } + }, + "cp": { + "type": "file", + "formats": { + "httpd" : { + "type" : "httpd", + "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"", + "maxErrors": 0, + "flattenWildcards": false + } + } + }, + "s3": { + "type": "file", + "formats": { + "httpd" : { + "type" : "httpd", + "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"", + "maxErrors": 0, + "flattenWildcards": false + } + } + } + } +} diff --git a/contrib/format-httpd/src/main/resources/drill-module.conf b/contrib/format-httpd/src/main/resources/drill-module.conf new file mode 100644 index 00000000000..6236c500159 --- /dev/null +++ b/contrib/format-httpd/src/main/resources/drill-module.conf @@ -0,0 +1,23 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# + +# This file tells Drill to consider this module when class path scanning. +# This file can also include any supplementary configuration information. +# This file is in HOCON format, see https://github.com/typesafehub/config/blob/master/HOCON.md for more information. 
+ +drill.classpath.scanning.packages += "org.apache.drill.exec.store.httpd" diff --git a/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java b/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java new file mode 100644 index 00000000000..2dd97fa3630 --- /dev/null +++ b/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java @@ -0,0 +1,583 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.drill.exec.store.httpd; + +import org.apache.drill.categories.RowSetTests; +import org.apache.drill.common.exceptions.DrillRuntimeException; +import org.apache.drill.common.types.TypeProtos.MinorType; +import org.apache.drill.exec.physical.rowSet.RowSet; +import org.apache.drill.exec.physical.rowSet.RowSetBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.rpc.RpcException; +import org.apache.drill.test.ClusterFixture; +import org.apache.drill.test.ClusterTest; +import org.apache.drill.test.rowSet.RowSetComparison; +import org.apache.drill.test.rowSet.RowSetUtilities; +import org.joda.time.LocalDate; +import org.joda.time.LocalTime; +import org.junit.BeforeClass; +import org.junit.Test; +import org.junit.experimental.categories.Category; +import java.nio.file.Paths; +import static org.apache.drill.test.QueryTestUtil.generateCompressedFile; +import static org.junit.Assert.assertEquals; +import static org.apache.drill.test.rowSet.RowSetUtilities.mapArray; +import static org.junit.Assert.assertTrue; +import static org.junit.Assert.fail; + + +@Category(RowSetTests.class) +public class TestHTTPDLogReader extends ClusterTest { + + @BeforeClass + public static void setup() throws Exception { + ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher)); + + // Needed for compressed file unit test + dirTestWatcher.copyResourceToRoot(Paths.get("httpd/")); + } + + @Test + public void testDateField() throws RpcException { + String sql = "SELECT `request_receive_time` FROM cp.`httpd/hackers-access-small.httpd` LIMIT 5"; + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("request_receive_time", MinorType.TIMESTAMP) + .build(); + RowSet expected = client.rowSetBuilder(expectedSchema) + .addRow(1445742685000L) + .addRow(1445742686000L) + .addRow(1445742687000L) + .addRow(1445743471000L) + .addRow(1445743472000L) + .build(); + + RowSetUtilities.verify(expected, results); + } + + @Test + public void testDateEpochField() throws RpcException { + String sql = "SELECT `request_receive_time`, `request_receive_time_epoch` FROM 
cp.`httpd/hackers-access-small.httpd` LIMIT 5"; + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("request_receive_time", MinorType.TIMESTAMP) + .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP) + .build(); + RowSet expected = client.rowSetBuilder(expectedSchema) + .addRow(1445742685000L, 1445742685000L) + .addRow(1445742686000L, 1445742686000L) + .addRow(1445742687000L, 1445742687000L ) + .addRow(1445743471000L, 1445743471000L) + .addRow(1445743472000L, 1445743472000L) + .build(); + + RowSetUtilities.verify(expected, results); + } + + @Test + public void testCount() throws Exception { + String sql = "SELECT COUNT(*) FROM cp.`httpd/hackers-access-small.httpd`"; + long result = client.queryBuilder().sql(sql).singletonLong(); + assertEquals(10L, result); + } + + @Test + public void testSerDe() throws Exception { + String sql = "SELECT COUNT(*) AS cnt FROM cp.`httpd/hackers-access-small.httpd`"; + String plan = queryBuilder().sql(sql).explainJson(); + long cnt = queryBuilder().physical(plan).singletonLong(); + assertEquals("Counts should match",10L, cnt); + } + + @Test + public void testFlattenMap() throws Exception { + String sql = "SELECT request_firstline_original_uri_query_came__from " + + "FROM table(cp.`httpd/hackers-access-small.httpd` (type => 'httpd', logFormat => '%h %l %u %t \\\"%r\\\" %s %b \\\"%{Referer}i\\\" " + + "\\\"%{User-agent}i\\\"', " + + "flattenWildcards => true)) WHERE `request_firstline_original_uri_query_came__from` IS NOT NULL"; + + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("request_firstline_original_uri_query_came__from", MinorType.VARCHAR) + .build(); + + RowSet expected = client.rowSetBuilder(expectedSchema) + .addRow("http://howto.basjes.nl/join_form") + .build(); + + new RowSetComparison(expected).verifyAndClearAll(results); + } + + + @Test + public void testLimitPushdown() throws Exception { + String sql = "SELECT * FROM cp.`httpd/hackers-access-small.httpd` LIMIT 5"; + + queryBuilder() + .sql(sql) + .planMatcher() + .include("Limit", "maxRecords=5") + .match(); + } + + @Test + public void testMapField() throws Exception { + String sql = "SELECT data.`request_firstline_original_uri_query_$`.aqb AS aqb, data.`request_firstline_original_uri_query_$`.t AS data_time " + + "FROM cp.`httpd/example1.httpd` AS data"; + + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("aqb", MinorType.VARCHAR) + .addNullable("data_time", MinorType.VARCHAR) + .build(); + + RowSet expected = client.rowSetBuilder(expectedSchema) + .addRow("1", "19/5/2012 23:51:27 2 -120") + .build(); + + new RowSetComparison(expected).verifyAndClearAll(results); + } + + @Test + public void testSingleExplicitColumn() throws Exception { + String sql = "SELECT request_referer FROM cp.`httpd/hackers-access-small.httpd`"; + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("request_referer", MinorType.VARCHAR) + .build(); + + RowSet expected = client.rowSetBuilder(expectedSchema) + .addRow("http://howto.basjes.nl/") + .addRow("http://howto.basjes.nl/") + .addRow("http://howto.basjes.nl/join_form") + .addRow("http://howto.basjes.nl/") + .addRow("http://howto.basjes.nl/join_form") + .addRow("http://howto.basjes.nl/join_form") + .addRow("http://howto.basjes.nl/") + 
.addRow("http://howto.basjes.nl/login_form") + .addRow("http://howto.basjes.nl/") + .addRow("http://howto.basjes.nl/") + .build(); + + assertEquals(results.rowCount(), 10); + new RowSetComparison(expected).verifyAndClearAll(results); + } + + + @Test + public void testImplicitColumn() throws Exception { + String sql = "SELECT _raw FROM cp.`httpd/hackers-access-small.httpd`"; + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("_raw", MinorType.VARCHAR) + .build(); + + RowSet expected = client.rowSetBuilder(expectedSchema) + .addRow("195.154.46.135 - - [25/Oct/2015:04:11:25 +0100] \"GET /linux/doing-pxe-without-dhcp-control HTTP/1.1\" 200 24323 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0\"") + .addRow("23.95.237.180 - - [25/Oct/2015:04:11:26 +0100] \"GET /join_form HTTP/1.0\" 200 11114 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0\"") + .addRow("23.95.237.180 - - [25/Oct/2015:04:11:27 +0100] \"POST /join_form HTTP/1.1\" 302 9093 \"http://howto.basjes.nl/join_form\" \"Mozilla/5.0 (Windows NT 5.1; rv:35.0) " + + "Gecko/20100101 Firefox/35.0\"") + .addRow("158.222.5.157 - - [25/Oct/2015:04:24:31 +0100] \"GET /join_form HTTP/1.0\" 200 11114 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"") + .addRow("158.222.5.157 - - [25/Oct/2015:04:24:32 +0100] \"POST /join_form HTTP/1.1\" 302 9093 \"http://howto.basjes.nl/join_form\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"") + .addRow("158.222.5.157 - - [25/Oct/2015:04:24:37 +0100] \"GET /acl_users/credentials_cookie_auth/require_login?came_from=http%3A//howto.basjes.nl/join_form HTTP/1.1\" 200 10716 \"http://howto.basjes.nl/join_form\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"") + .addRow("158.222.5.157 - - [25/Oct/2015:04:24:39 +0100] \"GET /login_form HTTP/1.1\" 200 10543 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"") + .addRow("158.222.5.157 - - [25/Oct/2015:04:24:41 +0100] \"POST /login_form HTTP/1.1\" 200 16810 \"http://howto.basjes.nl/login_form\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"") + .addRow("5.39.5.5 - - [25/Oct/2015:04:32:22 +0100] \"GET /join_form HTTP/1.1\" 200 11114 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 5.1; rv:34.0) Gecko/20100101 Firefox/34.0\"") + .addRow("180.180.64.16 - - [25/Oct/2015:04:34:37 +0100] \"GET /linux/doing-pxe-without-dhcp-control HTTP/1.1\" 200 24323 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0\"") + .build(); + + new RowSetComparison(expected).verifyAndClearAll(results); + } + + @Test + public void testExplicitSomeQuery() throws Exception { + String sql = "SELECT request_referer_ref, request_receive_time_last_time, request_firstline_uri_protocol FROM cp.`httpd/hackers-access-small.httpd`"; + + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("request_referer_ref", MinorType.VARCHAR) + .addNullable("request_receive_time_last_time", MinorType.TIME) + .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR) + .buildSchema(); + + RowSet expected = new 
RowSetBuilder(client.allocator(), expectedSchema) + .addRow(null, new LocalTime("04:11:25"), null) + .addRow(null, new LocalTime("04:11:26"), null) + .addRow(null, new LocalTime("04:11:27"), null) + .addRow(null, new LocalTime("04:24:31"), null) + .addRow(null, new LocalTime("04:24:32"), null) + .addRow(null, new LocalTime("04:24:37"), null) + .addRow(null, new LocalTime("04:24:39"), null) + .addRow(null, new LocalTime("04:24:41"), null) + .addRow(null, new LocalTime("04:32:22"), null) + .addRow(null, new LocalTime("04:34:37"), null) + .build(); + + new RowSetComparison(expected).verifyAndClearAll(results); + } + + + @Test + public void testExplicitSomeQueryWithCompressedFile() throws Exception { + generateCompressedFile("httpd/hackers-access-small.httpd", "zip", "httpd/hackers-access-small.httpd.zip" ); + + String sql = "SELECT request_referer_ref, request_receive_time_last_time, request_firstline_uri_protocol FROM dfs.`httpd/hackers-access-small.httpd.zip`"; + + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("request_referer_ref", MinorType.VARCHAR) + .addNullable("request_receive_time_last_time", MinorType.TIME) + .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR) + .buildSchema(); + + RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema) + .addRow(null, new LocalTime("04:11:25"), null) + .addRow(null, new LocalTime("04:11:26"), null) + .addRow(null, new LocalTime("04:11:27"), null) + .addRow(null, new LocalTime("04:24:31"), null) + .addRow(null, new LocalTime("04:24:32"), null) + .addRow(null, new LocalTime("04:24:37"), null) + .addRow(null, new LocalTime("04:24:39"), null) + .addRow(null, new LocalTime("04:24:41"), null) + .addRow(null, new LocalTime("04:32:22"), null) + .addRow(null, new LocalTime("04:34:37"), null) + .build(); + + new RowSetComparison(expected).verifyAndClearAll(results); + } + + @Test + public void testStarRowSet() throws Exception { + String sql = "SELECT * FROM cp.`httpd/hackers-access-really-small.httpd`"; + + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("request_referer_ref", MinorType.VARCHAR) + .addNullable("request_receive_time_last_time", MinorType.TIME) + .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR) + .addNullable("request_receive_time_microsecond", MinorType.BIGINT) + .addNullable("request_receive_time_last_microsecond__utc", MinorType.BIGINT) + .addNullable("request_firstline_original_protocol", MinorType.VARCHAR) + .addNullable("request_firstline_original_uri_host", MinorType.VARCHAR) + .addNullable("request_referer_host", MinorType.VARCHAR) + .addNullable("request_receive_time_month__utc", MinorType.BIGINT) + .addNullable("request_receive_time_last_minute", MinorType.BIGINT) + .addNullable("request_firstline_protocol_version", MinorType.VARCHAR) + .addNullable("request_receive_time_time__utc", MinorType.TIME) + .addNullable("request_referer_last_ref", MinorType.VARCHAR) + .addNullable("request_receive_time_last_timezone", MinorType.VARCHAR) + .addNullable("request_receive_time_last_weekofweekyear", MinorType.BIGINT) + .addNullable("request_referer_last", MinorType.VARCHAR) + .addNullable("request_receive_time_minute", MinorType.BIGINT) + .addNullable("connection_client_host_last", MinorType.VARCHAR) + .addNullable("request_receive_time_last_millisecond__utc", MinorType.BIGINT) + .addNullable("request_firstline_original_uri", 
MinorType.VARCHAR) + .addNullable("request_firstline", MinorType.VARCHAR) + .addNullable("request_receive_time_nanosecond", MinorType.BIGINT) + .addNullable("request_receive_time_last_millisecond", MinorType.BIGINT) + .addNullable("request_receive_time_day", MinorType.BIGINT) + .addNullable("request_referer_port", MinorType.BIGINT) + .addNullable("request_firstline_original_uri_port", MinorType.BIGINT) + .addNullable("request_receive_time_year", MinorType.BIGINT) + .addNullable("request_receive_time_last_date", MinorType.DATE) + .addNullable("request_receive_time_last_time__utc", MinorType.TIME) + .addNullable("request_receive_time_last_hour__utc", MinorType.BIGINT) + .addNullable("request_firstline_original_protocol_version", MinorType.VARCHAR) + .addNullable("request_firstline_original_method", MinorType.VARCHAR) + .addNullable("request_receive_time_last_year__utc", MinorType.BIGINT) + .addNullable("request_firstline_uri", MinorType.VARCHAR) + .addNullable("request_referer_last_host", MinorType.VARCHAR) + .addNullable("request_receive_time_last_minute__utc", MinorType.BIGINT) + .addNullable("request_receive_time_weekofweekyear", MinorType.BIGINT) + .addNullable("request_firstline_uri_userinfo", MinorType.VARCHAR) + .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP) + .addNullable("connection_client_logname", MinorType.BIGINT) + .addNullable("response_body_bytes", MinorType.BIGINT) + .addNullable("request_receive_time_nanosecond__utc", MinorType.BIGINT) + .addNullable("request_firstline_protocol", MinorType.VARCHAR) + .addNullable("request_receive_time_microsecond__utc", MinorType.BIGINT) + .addNullable("request_receive_time_hour", MinorType.BIGINT) + .addNullable("request_firstline_uri_host", MinorType.VARCHAR) + .addNullable("request_referer_last_port", MinorType.BIGINT) + .addNullable("request_receive_time_last_epoch", MinorType.TIMESTAMP) + .addNullable("request_receive_time_last_weekyear__utc", MinorType.BIGINT) + .addNullable("request_user-agent", MinorType.VARCHAR) + .addNullable("request_receive_time_weekyear", MinorType.BIGINT) + .addNullable("request_receive_time_timezone", MinorType.VARCHAR) + .addNullable("response_body_bytesclf", MinorType.BIGINT) + .addNullable("request_receive_time_last_date__utc", MinorType.DATE) + .addNullable("request_receive_time_millisecond__utc", MinorType.BIGINT) + .addNullable("request_referer_last_protocol", MinorType.VARCHAR) + .addNullable("request_firstline_uri_query", MinorType.VARCHAR) + .addNullable("request_receive_time_minute__utc", MinorType.BIGINT) + .addNullable("request_firstline_original_uri_protocol", MinorType.VARCHAR) + .addNullable("request_referer_query", MinorType.VARCHAR) + .addNullable("request_receive_time_date", MinorType.DATE) + .addNullable("request_firstline_uri_port", MinorType.BIGINT) + .addNullable("request_receive_time_last_second__utc", MinorType.BIGINT) + .addNullable("request_referer_last_userinfo", MinorType.VARCHAR) + .addNullable("request_receive_time_last_second", MinorType.BIGINT) + .addNullable("request_receive_time_last_monthname__utc", MinorType.VARCHAR) + .addNullable("request_firstline_method", MinorType.VARCHAR) + .addNullable("request_receive_time_last_month__utc", MinorType.BIGINT) + .addNullable("request_receive_time_millisecond", MinorType.BIGINT) + .addNullable("request_receive_time_day__utc", MinorType.BIGINT) + .addNullable("request_receive_time_year__utc", MinorType.BIGINT) + .addNullable("request_receive_time_weekofweekyear__utc", MinorType.BIGINT) + 
.addNullable("request_receive_time_second", MinorType.BIGINT) + .addNullable("request_firstline_original_uri_ref", MinorType.VARCHAR) + .addNullable("connection_client_logname_last", MinorType.BIGINT) + .addNullable("request_receive_time_last_year", MinorType.BIGINT) + .addNullable("request_firstline_original_uri_path", MinorType.VARCHAR) + .addNullable("connection_client_host", MinorType.VARCHAR) + .addNullable("request_firstline_original_uri_query", MinorType.VARCHAR) + .addNullable("request_referer_userinfo", MinorType.VARCHAR) + .addNullable("request_receive_time_last_monthname", MinorType.VARCHAR) + .addNullable("request_referer_path", MinorType.VARCHAR) + .addNullable("request_receive_time_monthname", MinorType.VARCHAR) + .addNullable("request_receive_time_last_month", MinorType.BIGINT) + .addNullable("request_referer_last_query", MinorType.VARCHAR) + .addNullable("request_firstline_uri_ref", MinorType.VARCHAR) + .addNullable("request_receive_time_last_day", MinorType.BIGINT) + .addNullable("request_receive_time_time", MinorType.TIME) + .addNullable("request_status_original", MinorType.VARCHAR) + .addNullable("request_receive_time_last_weekofweekyear__utc", MinorType.BIGINT) + .addNullable("request_user-agent_last", MinorType.VARCHAR) + .addNullable("request_receive_time_last_weekyear", MinorType.BIGINT) + .addNullable("request_receive_time_last_microsecond", MinorType.BIGINT) + .addNullable("request_firstline_original", MinorType.VARCHAR) + .addNullable("request_status", MinorType.VARCHAR) + .addNullable("request_referer_last_path", MinorType.VARCHAR) + .addNullable("request_receive_time_month", MinorType.BIGINT) + .addNullable("request_referer", MinorType.VARCHAR) + .addNullable("request_receive_time_last_day__utc", MinorType.BIGINT) + .addNullable("request_referer_protocol", MinorType.VARCHAR) + .addNullable("request_receive_time_monthname__utc", MinorType.VARCHAR) + .addNullable("response_body_bytes_last", MinorType.BIGINT) + .addNullable("request_receive_time", MinorType.TIMESTAMP) + .addNullable("request_receive_time_last_nanosecond", MinorType.BIGINT) + .addNullable("request_firstline_uri_path", MinorType.VARCHAR) + .addNullable("request_firstline_original_uri_userinfo", MinorType.VARCHAR) + .addNullable("request_receive_time_date__utc", MinorType.DATE) + .addNullable("request_receive_time_last", MinorType.TIMESTAMP) + .addNullable("request_receive_time_last_nanosecond__utc", MinorType.BIGINT) + .addNullable("request_receive_time_last_hour", MinorType.BIGINT) + .addNullable("request_receive_time_hour__utc", MinorType.BIGINT) + .addNullable("request_receive_time_second__utc", MinorType.BIGINT) + .addNullable("connection_client_user_last", MinorType.VARCHAR) + .addNullable("request_receive_time_weekyear__utc", MinorType.BIGINT) + .addNullable("connection_client_user", MinorType.VARCHAR) + .add("request_firstline_original_uri_query_$", MinorType.MAP) + .add("request_referer_query_$", MinorType.MAP) + .add("request_referer_last_query_$", MinorType.MAP) + .add("request_firstline_uri_query_$", MinorType.MAP) + .build(); + + RowSet expected = client.rowSetBuilder(expectedSchema) + .addRow(null, new LocalTime("04:11:25"), null, 0, 0, "HTTP", null, "howto.basjes.nl", 10, 11, "1.1", new LocalTime("03:11:25"), null, "+01:00", 43, "http://howto.basjes" + + ".nl/", + 11, "195.154.46.135", 0, + "/linux/doing-pxe-without-dhcp-control", "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", 0, 0, 25, null, null, 2015, new LocalDate("2015-10-25"), new LocalTime("03" + + ":11:25"), + 3, "1" + + 
".1", "GET", + 2015, "/linux/doing-pxe-without-dhcp-control", "howto.basjes.nl", 11, 43, null, 1445742685000L, null, 24323, 0, "HTTP", 0, 4, null, null, 1445742685000L, 2015, "Mozilla" + + "/5" + + ".0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, "+01:00", 24323, new LocalDate("2015-10-25"), 0, "http", null, 11, null, null, new LocalDate("2015-10" + + "-25"), null, 25, + null, 25, + "October", "GET", 10, 0, 25, 2015, 43, 25, null, null, 2015, "/linux/doing-pxe-without-dhcp-control", "195.154.46.135", null, null, "October", "/", "October", 10, null, + null, 25, new LocalTime("04:11:25"), "200", 43, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, 0, "GET /linux/doing-pxe-without-dhcp-control " + + "HTTP/1.1", "200", "/", + 10, "http://howto.basjes.nl/", 25, "http", "October", 24323, 1445742685000L, 0, "/linux/doing-pxe-without-dhcp-control", null, new LocalDate("2015-10-25"), 1445742685000L, + 0, 4, 3, 25, null, 2015, null, mapArray(), mapArray(), mapArray(), mapArray()) + .build(); + + new RowSetComparison(expected).verifyAndClearAll(results); + } + + @Test + public void testExplicitAllFields() throws Exception { + String sql = "SELECT `request_referer_ref`, `request_receive_time_last_time`, `request_firstline_uri_protocol`, `request_receive_time_microsecond`, `request_receive_time_last_microsecond__utc`, `request_firstline_original_protocol`, `request_firstline_original_uri_host`, `request_referer_host`, `request_receive_time_month__utc`, `request_receive_time_last_minute`, `request_firstline_protocol_version`, `request_receive_time_time__utc`, `request_referer_last_ref`, `request_receive_time_last_timezone`, `request_receive_time_last_weekofweekyear`, `request_referer_last`, `request_receive_time_minute`, `connection_client_host_last`, `request_receive_time_last_millisecond__utc`, `request_firstline_original_uri`, `request_firstline`, `request_receive_time_nanosecond`, `request_receive_time_last_millisecond`, `request_receive_time_day`, `request_referer_port`, `request_firstline_original_uri_port`, `request_receive_time_year`, `request_receive_time_last_date`, `request_receive_time_last_time__utc`, `request_receive_time_last_hour__utc`, `request_firstline_original_protocol_version`, `request_firstline_original_method`, `request_receive_time_last_year__utc`, `request_firstline_uri`, `request_referer_last_host`, `request_receive_time_last_minute__utc`, `request_receive_time_weekofweekyear`, `request_firstline_uri_userinfo`, `request_receive_time_epoch`, `connection_client_logname`, `response_body_bytes`, `request_receive_time_nanosecond__utc`, `request_firstline_protocol`, `request_receive_time_microsecond__utc`, `request_receive_time_hour`, `request_firstline_uri_host`, `request_referer_last_port`, `request_receive_time_last_epoch`, `request_receive_time_last_weekyear__utc`, `request_user-agent`, `request_receive_time_weekyear`, `request_receive_time_timezone`, `response_body_bytesclf`, `request_receive_time_last_date__utc`, `request_receive_time_millisecond__utc`, `request_referer_last_protocol`, `request_firstline_uri_query`, `request_receive_time_minute__utc`, `request_firstline_original_uri_protocol`, `request_referer_query`, `request_receive_time_date`, `request_firstline_uri_port`, `request_receive_time_last_second__utc`, `request_referer_last_userinfo`, `request_receive_time_last_second`, `request_receive_time_last_monthname__utc`, `request_firstline_method`, `request_receive_time_last_month__utc`, `request_receive_time_millisecond`, 
`request_receive_time_day__utc`, `request_receive_time_year__utc`, `request_receive_time_weekofweekyear__utc`, `request_receive_time_second`, `request_firstline_original_uri_ref`, `connection_client_logname_last`, `request_receive_time_last_year`, `request_firstline_original_uri_path`, `connection_client_host`, `request_firstline_original_uri_query`, `request_referer_userinfo`, `request_receive_time_last_monthname`, `request_referer_path`, `request_receive_time_monthname`, `request_receive_time_last_month`, `request_referer_last_query`, `request_firstline_uri_ref`, `request_receive_time_last_day`, `request_receive_time_time`, `request_status_original`, `request_receive_time_last_weekofweekyear__utc`, `request_user-agent_last`, `request_receive_time_last_weekyear`, `request_receive_time_last_microsecond`, `request_firstline_original`, `request_status`, `request_referer_last_path`, `request_receive_time_month`, `request_receive_time_last_day__utc`, `request_referer`, `request_referer_protocol`, `request_receive_time_monthname__utc`, `response_body_bytes_last`, `request_receive_time`, `request_receive_time_last_nanosecond`, `request_firstline_uri_path`, `request_firstline_original_uri_userinfo`, `request_receive_time_date__utc`, `request_receive_time_last`, `request_receive_time_last_nanosecond__utc`, `request_receive_time_last_hour`, `request_receive_time_hour__utc`, `request_receive_time_second__utc`, `connection_client_user_last`, `request_receive_time_weekyear__utc`, `connection_client_user`, `request_firstline_original_uri_query_$`, `request_referer_query_$`, `request_referer_last_query_$`, `request_firstline_uri_query_$` FROM cp.`httpd/hackers-access-really-small.httpd`"; + + RowSet results = client.queryBuilder().sql(sql).rowSet(); + + TupleMetadata expectedSchema = new SchemaBuilder() + .addNullable("request_referer_ref", MinorType.VARCHAR) + .addNullable("request_receive_time_last_time", MinorType.TIME) + .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR) + .addNullable("request_receive_time_microsecond", MinorType.BIGINT) + .addNullable("request_receive_time_last_microsecond__utc", MinorType.BIGINT) + .addNullable("request_firstline_original_protocol", MinorType.VARCHAR) + .addNullable("request_firstline_original_uri_host", MinorType.VARCHAR) + .addNullable("request_referer_host", MinorType.VARCHAR) + .addNullable("request_receive_time_month__utc", MinorType.BIGINT) + .addNullable("request_receive_time_last_minute", MinorType.BIGINT) + .addNullable("request_firstline_protocol_version", MinorType.VARCHAR) + .addNullable("request_receive_time_time__utc", MinorType.TIME) + .addNullable("request_referer_last_ref", MinorType.VARCHAR) + .addNullable("request_receive_time_last_timezone", MinorType.VARCHAR) + .addNullable("request_receive_time_last_weekofweekyear", MinorType.BIGINT) + .addNullable("request_referer_last", MinorType.VARCHAR) + .addNullable("request_receive_time_minute", MinorType.BIGINT) + .addNullable("connection_client_host_last", MinorType.VARCHAR) + .addNullable("request_receive_time_last_millisecond__utc", MinorType.BIGINT) + .addNullable("request_firstline_original_uri", MinorType.VARCHAR) + .addNullable("request_firstline", MinorType.VARCHAR) + .addNullable("request_receive_time_nanosecond", MinorType.BIGINT) + .addNullable("request_receive_time_last_millisecond", MinorType.BIGINT) + .addNullable("request_receive_time_day", MinorType.BIGINT) + .addNullable("request_referer_port", MinorType.BIGINT) + .addNullable("request_firstline_original_uri_port", 
MinorType.BIGINT) + .addNullable("request_receive_time_year", MinorType.BIGINT) + .addNullable("request_receive_time_last_date", MinorType.DATE) + .addNullable("request_receive_time_last_time__utc", MinorType.TIME) + .addNullable("request_receive_time_last_hour__utc", MinorType.BIGINT) + .addNullable("request_firstline_original_protocol_version", MinorType.VARCHAR) + .addNullable("request_firstline_original_method", MinorType.VARCHAR) + .addNullable("request_receive_time_last_year__utc", MinorType.BIGINT) + .addNullable("request_firstline_uri", MinorType.VARCHAR) + .addNullable("request_referer_last_host", MinorType.VARCHAR) + .addNullable("request_receive_time_last_minute__utc", MinorType.BIGINT) + .addNullable("request_receive_time_weekofweekyear", MinorType.BIGINT) + .addNullable("request_firstline_uri_userinfo", MinorType.VARCHAR) + .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP) + .addNullable("connection_client_logname", MinorType.BIGINT) + .addNullable("response_body_bytes", MinorType.BIGINT) + .addNullable("request_receive_time_nanosecond__utc", MinorType.BIGINT) + .addNullable("request_firstline_protocol", MinorType.VARCHAR) + .addNullable("request_receive_time_microsecond__utc", MinorType.BIGINT) + .addNullable("request_receive_time_hour", MinorType.BIGINT) + .addNullable("request_firstline_uri_host", MinorType.VARCHAR) + .addNullable("request_referer_last_port", MinorType.BIGINT) + .addNullable("request_receive_time_last_epoch", MinorType.TIMESTAMP) + .addNullable("request_receive_time_last_weekyear__utc", MinorType.BIGINT) + .addNullable("request_user-agent", MinorType.VARCHAR) + .addNullable("request_receive_time_weekyear", MinorType.BIGINT) + .addNullable("request_receive_time_timezone", MinorType.VARCHAR) + .addNullable("response_body_bytesclf", MinorType.BIGINT) + .addNullable("request_receive_time_last_date__utc", MinorType.DATE) + .addNullable("request_receive_time_millisecond__utc", MinorType.BIGINT) + .addNullable("request_referer_last_protocol", MinorType.VARCHAR) + .addNullable("request_firstline_uri_query", MinorType.VARCHAR) + .addNullable("request_receive_time_minute__utc", MinorType.BIGINT) + .addNullable("request_firstline_original_uri_protocol", MinorType.VARCHAR) + .addNullable("request_referer_query", MinorType.VARCHAR) + .addNullable("request_receive_time_date", MinorType.DATE) + .addNullable("request_firstline_uri_port", MinorType.BIGINT) + .addNullable("request_receive_time_last_second__utc", MinorType.BIGINT) + .addNullable("request_referer_last_userinfo", MinorType.VARCHAR) + .addNullable("request_receive_time_last_second", MinorType.BIGINT) + .addNullable("request_receive_time_last_monthname__utc", MinorType.VARCHAR) + .addNullable("request_firstline_method", MinorType.VARCHAR) + .addNullable("request_receive_time_last_month__utc", MinorType.BIGINT) + .addNullable("request_receive_time_millisecond", MinorType.BIGINT) + .addNullable("request_receive_time_day__utc", MinorType.BIGINT) + .addNullable("request_receive_time_year__utc", MinorType.BIGINT) + .addNullable("request_receive_time_weekofweekyear__utc", MinorType.BIGINT) + .addNullable("request_receive_time_second", MinorType.BIGINT) + .addNullable("request_firstline_original_uri_ref", MinorType.VARCHAR) + .addNullable("connection_client_logname_last", MinorType.BIGINT) + .addNullable("request_receive_time_last_year", MinorType.BIGINT) + .addNullable("request_firstline_original_uri_path", MinorType.VARCHAR) + .addNullable("connection_client_host", MinorType.VARCHAR) + 
.addNullable("request_firstline_original_uri_query", MinorType.VARCHAR) + .addNullable("request_referer_userinfo", MinorType.VARCHAR) + .addNullable("request_receive_time_last_monthname", MinorType.VARCHAR) + .addNullable("request_referer_path", MinorType.VARCHAR) + .addNullable("request_receive_time_monthname", MinorType.VARCHAR) + .addNullable("request_receive_time_last_month", MinorType.BIGINT) + .addNullable("request_referer_last_query", MinorType.VARCHAR) + .addNullable("request_firstline_uri_ref", MinorType.VARCHAR) + .addNullable("request_receive_time_last_day", MinorType.BIGINT) + .addNullable("request_receive_time_time", MinorType.TIME) + .addNullable("request_status_original", MinorType.VARCHAR) + .addNullable("request_receive_time_last_weekofweekyear__utc", MinorType.BIGINT) + .addNullable("request_user-agent_last", MinorType.VARCHAR) + .addNullable("request_receive_time_last_weekyear", MinorType.BIGINT) + .addNullable("request_receive_time_last_microsecond", MinorType.BIGINT) + .addNullable("request_firstline_original", MinorType.VARCHAR) + .addNullable("request_status", MinorType.VARCHAR) + .addNullable("request_referer_last_path", MinorType.VARCHAR) + .addNullable("request_receive_time_month", MinorType.BIGINT) + .addNullable("request_receive_time_last_day__utc", MinorType.BIGINT) + .addNullable("request_referer", MinorType.VARCHAR) + .addNullable("request_referer_protocol", MinorType.VARCHAR) + .addNullable("request_receive_time_monthname__utc", MinorType.VARCHAR) + .addNullable("response_body_bytes_last", MinorType.BIGINT) + .addNullable("request_receive_time", MinorType.TIMESTAMP) + .addNullable("request_receive_time_last_nanosecond", MinorType.BIGINT) + .addNullable("request_firstline_uri_path", MinorType.VARCHAR) + .addNullable("request_firstline_original_uri_userinfo", MinorType.VARCHAR) + .addNullable("request_receive_time_date__utc", MinorType.DATE) + .addNullable("request_receive_time_last", MinorType.TIMESTAMP) + .addNullable("request_receive_time_last_nanosecond__utc", MinorType.BIGINT) + .addNullable("request_receive_time_last_hour", MinorType.BIGINT) + .addNullable("request_receive_time_hour__utc", MinorType.BIGINT) + .addNullable("request_receive_time_second__utc", MinorType.BIGINT) + .addNullable("connection_client_user_last", MinorType.VARCHAR) + .addNullable("request_receive_time_weekyear__utc", MinorType.BIGINT) + .addNullable("connection_client_user", MinorType.VARCHAR) + .add("request_firstline_original_uri_query_$", MinorType.MAP) + .add("request_referer_query_$", MinorType.MAP) + .add("request_referer_last_query_$", MinorType.MAP) + .add("request_firstline_uri_query_$", MinorType.MAP) + .build(); + + RowSet expected = client.rowSetBuilder(expectedSchema) + .addRow(null, new LocalTime("04:11:25"), null, 0, 0, "HTTP", null, "howto.basjes.nl", 10, 11, "1.1", new LocalTime("03:11:25"), null, "+01:00", 43, "http://howto.basjes.nl/", + 11, "195.154.46.135", 0, + "/linux/doing-pxe-without-dhcp-control", "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", 0, 0, 25, null, null, 2015, new LocalDate("2015-10-25"), new LocalTime("03" + + ":11:25"), 3, "1" + + ".1", "GET", + 2015, "/linux/doing-pxe-without-dhcp-control", "howto.basjes.nl", 11, 43, null, 1445742685000L, null, 24323, 0, "HTTP", 0, 4, null, null, 1445742685000L, 2015, "Mozilla" + + "/5" + + ".0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, "+01:00", 24323, new LocalDate("2015-10-25"), 0, "http", null, 11, null, null, new LocalDate("2015-10" + + "-25"), null, 25, null, 25, + "October", 
"GET", 10, 0, 25, 2015, 43, 25, null, null, 2015, "/linux/doing-pxe-without-dhcp-control", "195.154.46.135", null, null, "October", "/", "October", 10, null, + null, 25, new LocalTime("04:11:25"), "200", 43, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, 0, "GET /linux/doing-pxe-without-dhcp-control " + + "HTTP/1.1", "200", "/", + 10, 25, "http://howto.basjes.nl/", "http", "October", 24323, 1445742685000L, 0, "/linux/doing-pxe-without-dhcp-control", null, new LocalDate("2015-10-25"), 1445742685000L, + 0, 4, 3, 25, null, 2015, null, mapArray(), mapArray(), mapArray(), mapArray()) + .build(); + + new RowSetComparison(expected).verifyAndClearAll(results); + } + + @Test + public void testInvalidFormat() throws Exception { + String sql = "SELECT * FROM cp.`httpd/dfs-bootstrap.httpd`"; + try { + run(sql); + fail(); + } catch (DrillRuntimeException e) { + assertTrue(e.getMessage().contains("Error reading HTTPD file ")); + } + } +} diff --git a/exec/java-exec/src/test/resources/store/httpd/dfs-bootstrap.httpd b/contrib/format-httpd/src/test/resources/httpd/dfs-bootstrap.httpd similarity index 100% rename from exec/java-exec/src/test/resources/store/httpd/dfs-bootstrap.httpd rename to contrib/format-httpd/src/test/resources/httpd/dfs-bootstrap.httpd diff --git a/exec/java-exec/src/test/resources/store/httpd/example1.httpd b/contrib/format-httpd/src/test/resources/httpd/example1.httpd similarity index 100% rename from exec/java-exec/src/test/resources/store/httpd/example1.httpd rename to contrib/format-httpd/src/test/resources/httpd/example1.httpd diff --git a/contrib/format-httpd/src/test/resources/httpd/hackers-access-really-small.httpd b/contrib/format-httpd/src/test/resources/httpd/hackers-access-really-small.httpd new file mode 100644 index 00000000000..decb3c2ee40 --- /dev/null +++ b/contrib/format-httpd/src/test/resources/httpd/hackers-access-really-small.httpd @@ -0,0 +1 @@ +195.154.46.135 - - [25/Oct/2015:04:11:25 +0100] "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1" 200 24323 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0" diff --git a/exec/java-exec/src/test/resources/httpd/hackers-access-small.httpd b/contrib/format-httpd/src/test/resources/httpd/hackers-access-small.httpd similarity index 100% rename from exec/java-exec/src/test/resources/httpd/hackers-access-small.httpd rename to contrib/format-httpd/src/test/resources/httpd/hackers-access-small.httpd diff --git a/contrib/format-httpd/src/test/resources/logback-test.txt b/contrib/format-httpd/src/test/resources/logback-test.txt new file mode 100644 index 00000000000..2adcf8105a2 --- /dev/null +++ b/contrib/format-httpd/src/test/resources/logback-test.txt @@ -0,0 +1,65 @@ + + + + + + + true + 10000 + true + ${LILITH_HOSTNAME:-localhost} + + + + + + + + + + + + + + + + + + + + + + %d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n + + + + + + + + + + \ No newline at end of file diff --git a/contrib/pom.xml b/contrib/pom.xml index 22393e0f4e2..f5f60eeb571 100644 --- a/contrib/pom.xml +++ b/contrib/pom.xml @@ -46,6 +46,7 @@ format-syslog format-ltsv format-excel + format-httpd format-esri format-hdf5 format-spss diff --git a/contrib/udfs/pom.xml b/contrib/udfs/pom.xml index f41d35bd77d..a22000544f8 100644 --- a/contrib/udfs/pom.xml +++ b/contrib/udfs/pom.xml @@ -66,7 +66,7 @@ nl.basjes.parse.useragent yauaa - 5.16 + 5.19 diff --git a/distribution/pom.xml b/distribution/pom.xml index 6a1b29f14dd..c6ebecbe061 100644 --- a/distribution/pom.xml +++ 
b/distribution/pom.xml @@ -342,6 +342,11 @@ drill-format-syslog ${project.version} + + org.apache.drill.contrib + drill-format-httpd + ${project.version} + org.apache.drill.contrib drill-format-hdf5 diff --git a/distribution/src/assemble/component.xml b/distribution/src/assemble/component.xml index b9a2fce4912..2148fb8d588 100644 --- a/distribution/src/assemble/component.xml +++ b/distribution/src/assemble/component.xml @@ -46,6 +46,7 @@ org.apache.drill.contrib:drill-format-esri:jar org.apache.drill.contrib:drill-format-hdf5:jar org.apache.drill.contrib:drill-format-ltsv:jar + org.apache.drill.contrib:drill-format-httpd:jar org.apache.drill.contrib:drill-format-excel:jar org.apache.drill.contrib:drill-format-spss:jar org.apache.drill.contrib:drill-jdbc-storage:jar diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java deleted file mode 100644 index 7bcb0a4d96a..00000000000 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java +++ /dev/null @@ -1,252 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.drill.exec.store.httpd; - -import java.io.IOException; -import java.util.HashMap; -import java.util.List; - -import nl.basjes.parse.core.exceptions.DissectionFailure; -import nl.basjes.parse.core.exceptions.InvalidDissectorException; -import nl.basjes.parse.core.exceptions.MissingDissectorsException; - -import org.apache.drill.common.exceptions.ExecutionSetupException; -import org.apache.drill.common.exceptions.UserException; -import org.apache.drill.common.expression.SchemaPath; -import org.apache.drill.common.logical.StoragePluginConfig; -import org.apache.drill.exec.ExecConstants; -import org.apache.drill.exec.ops.FragmentContext; -import org.apache.drill.exec.ops.OperatorContext; -import org.apache.drill.exec.physical.impl.OutputMutator; -import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType; -import org.apache.drill.exec.planner.common.DrillStatsTable.TableStatistics; -import org.apache.drill.exec.server.DrillbitContext; -import org.apache.drill.exec.store.AbstractRecordReader; -import org.apache.drill.exec.store.RecordWriter; -import org.apache.drill.exec.store.dfs.DrillFileSystem; -import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin; -import org.apache.drill.exec.store.dfs.easy.EasyWriter; -import org.apache.drill.exec.store.dfs.easy.FileWork; -import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter; -import org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter; -import org.apache.hadoop.conf.Configuration; -import org.apache.hadoop.fs.FileSystem; -import org.apache.hadoop.fs.Path; -import org.apache.hadoop.io.LongWritable; -import org.apache.hadoop.io.Text; -import org.apache.hadoop.mapred.FileSplit; -import org.apache.hadoop.mapred.JobConf; -import org.apache.hadoop.mapred.LineRecordReader; -import org.apache.hadoop.mapred.Reporter; -import org.apache.hadoop.mapred.TextInputFormat; -import java.util.Collections; -import java.util.Map; -import org.apache.drill.exec.store.RecordReader; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -public class HttpdLogFormatPlugin extends EasyFormatPlugin { - private static final Logger logger = LoggerFactory.getLogger(HttpdLogFormatPlugin.class); - - private static final String PLUGIN_EXTENSION = "httpd"; - private static final int VECTOR_MEMORY_ALLOCATION = 4095; - - public HttpdLogFormatPlugin(final String name, final DrillbitContext context, final Configuration fsConf, - final StoragePluginConfig storageConfig, final HttpdLogFormatConfig formatConfig) { - - super(name, context, fsConf, storageConfig, formatConfig, true, false, true, true, - Collections.singletonList(PLUGIN_EXTENSION), PLUGIN_EXTENSION); - } - - @Override - public boolean supportsStatistics() { - return false; - } - - @Override - public TableStatistics readStatistics(FileSystem fs, Path statsTablePath) { - throw new UnsupportedOperationException("unimplemented"); - } - - @Override - public void writeStatistics(TableStatistics statistics, FileSystem fs, Path statsTablePath) { - throw new UnsupportedOperationException("unimplemented"); - } - - /** - * Reads httpd logs lines terminated with a newline character. 
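One note on the removed plugin's constructor above: the four positional booleans in the `super(...)` call are easy to misread. A hedged reading, with flag names taken from the `EasyFormatPlugin` constructor of this generation of Drill (treat the names as an assumption to verify against that class):

```java
// Presumed meaning of the positional flags in the super(...) call above
// (flag names are an assumption based on the EasyFormatPlugin signature):
//
//   super(name, context, fsConf, storageConfig, formatConfig,
//       true,    // readable: the plugin can read httpd files
//       false,   // writable: CTAS into httpd logs is not supported
//       true,    // blockSplittable: input can be split on line boundaries
//       true,    // compressible: compressed inputs are accepted
//       Collections.singletonList(PLUGIN_EXTENSION), PLUGIN_EXTENSION);
```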
- */ - private class HttpdLogRecordReader extends AbstractRecordReader { - - private final DrillFileSystem fs; - private final FileWork work; - private final FragmentContext fragmentContext; - private ComplexWriter writer; - private HttpdParser parser; - private LineRecordReader lineReader; - private LongWritable lineNumber; - - public HttpdLogRecordReader(final FragmentContext context, final DrillFileSystem fs, final FileWork work, final List columns) { - this.fs = fs; - this.work = work; - this.fragmentContext = context; - setColumns(columns); - } - - /** - * The query fields passed in are formatted in a way that Drill requires. - * Those must be cleaned up to work with the parser. - * - * @return Map with Drill field names as a key and Parser Field names as a - * value - */ - private Map makeParserFields() { - Map fieldMapping = new HashMap<>(); - for (final SchemaPath sp : getColumns()) { - String drillField = sp.getRootSegment().getPath(); - try { - String parserField = HttpdParser.parserFormattedFieldName(drillField); - fieldMapping.put(drillField, parserField); - } catch (Exception e) { - logger.info("Putting field: {} into map", drillField, e); - } - } - return fieldMapping; - } - - @Override - public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException { - try { - /* - * Extract the list of field names for the parser to use if it is NOT a star query. If it is a star query just - * pass through an empty map, because the parser is going to have to build all possibilities. - */ - final Map fieldMapping = !isStarQuery() ? makeParserFields() : null; - writer = new VectorContainerWriter(output); - - parser = new HttpdParser(writer.rootAsMap(), context.getManagedBuffer(), - HttpdLogFormatPlugin.this.getConfig().getLogFormat(), - HttpdLogFormatPlugin.this.getConfig().getTimestampFormat(), - fieldMapping); - - final Path path = fs.makeQualified(work.getPath()); - FileSplit split = new FileSplit(path, work.getStart(), work.getLength(), new String[]{""}); - TextInputFormat inputFormat = new TextInputFormat(); - JobConf job = new JobConf(fs.getConf()); - job.setInt("io.file.buffer.size", fragmentContext.getConfig().getInt(ExecConstants.TEXT_LINE_READER_BUFFER_SIZE)); - job.setInputFormat(inputFormat.getClass()); - lineReader = (LineRecordReader) inputFormat.getRecordReader(split, job, Reporter.NULL); - lineNumber = lineReader.createKey(); - } catch (NoSuchMethodException | MissingDissectorsException | InvalidDissectorException e) { - throw handleAndGenerate("Failure creating HttpdParser", e); - } catch (IOException e) { - throw handleAndGenerate("Failure creating HttpdRecordReader", e); - } - } - - private RuntimeException handleAndGenerate(final String s, final Exception e) { - throw UserException.dataReadError(e) - .message(s + "\n%s", e.getMessage()) - .addContext("Path", work.getPath()) - .addContext("Split Start", work.getStart()) - .addContext("Split Length", work.getLength()) - .addContext("Local Line Number", lineNumber.get()) - .build(logger); - } - - /** - * This record reader is given a batch of records (lines) to read. Next acts upon a batch of records. - * - * @return Number of records in this batch. 
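Rewinding to `makeParserFields()` above for a moment: a small, hedged illustration of the Drill-name to parser-name map it produces for a non-star query. Drill column names use `_` and the `_$` suffix where the parser uses `.` and `.*`; the `TIME.STAMP:` prefix is inferred from the timestamp handling elsewhere in this patch, and the exact entries depend on the query.

```java
import java.util.HashMap;
import java.util.Map;

class ParserFieldMappingExample {
  // Illustrative result of makeParserFields() for:
  //   SELECT request_receive_time, `request_firstline_uri_query_$` FROM ...
  static Map<String, String> example() {
    Map<String, String> fieldMapping = new HashMap<>();
    fieldMapping.put("request_receive_time", "TIME.STAMP:request.receive.time");
    fieldMapping.put("request_firstline_uri_query_$", "HTTP.URI:request.firstline.uri.query.*");
    return fieldMapping;
  }
}
```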
- */ - @Override - public int next() { - try { - final Text line = lineReader.createValue(); - - writer.allocate(); - writer.reset(); - - int recordCount = 0; - while (recordCount < VECTOR_MEMORY_ALLOCATION && lineReader.next(lineNumber, line)) { - writer.setPosition(recordCount); - parser.parse(line.toString()); - recordCount++; - } - writer.setValueCount(recordCount); - - return recordCount; - } catch (DissectionFailure | InvalidDissectorException | MissingDissectorsException | IOException e) { - throw handleAndGenerate("Failure while parsing log record.", e); - } - } - - @Override - public void close() throws Exception { - try { - if (lineReader != null) { - lineReader.close(); - } - } catch (IOException e) { - logger.warn("Failure while closing Httpd reader.", e); - } - } - - @Override - public String toString() { - return "HttpdLogRecordReader[Path=" + work.getPath() - + ", Start=" + work.getStart() - + ", Length=" + work.getLength() - + ", Line=" + lineNumber.get() - + "]"; - } - } - - /** - * This plugin supports pushing project down into the parser. Only fields - * specifically asked for within the configuration will be parsed. If no - * fields are asked for then all possible fields will be returned. - * - * @return true - */ - @Override - public boolean supportsPushDown() { - return true; - } - - @Override - public RecordReader getRecordReader(final FragmentContext context, final DrillFileSystem dfs, - final FileWork fileWork, final List columns, final String userName) { - return new HttpdLogRecordReader(context, dfs, fileWork, columns); - } - - @Override - public RecordWriter getRecordWriter(final FragmentContext context, final EasyWriter writer) { - throw new UnsupportedOperationException("Drill doesn't currently support writing HTTPd logs"); - } - - @Override - public int getReaderOperatorType() { - return CoreOperatorType.HTPPD_LOG_SUB_SCAN_VALUE; - } - - @Override - public int getWriterOperatorType() { - throw new UnsupportedOperationException(); - } -} diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java deleted file mode 100644 index 45c251de1fd..00000000000 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java +++ /dev/null @@ -1,346 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.drill.exec.store.httpd; - -import org.apache.drill.shaded.guava.com.google.common.base.Charsets; -import org.apache.drill.shaded.guava.com.google.common.collect.Maps; -import io.netty.buffer.DrillBuf; - -import java.util.EnumSet; -import java.util.HashMap; -import java.util.Map; - -import nl.basjes.parse.core.Casts; -import nl.basjes.parse.core.Parser; -import org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter; -import org.apache.drill.exec.vector.complex.writer.BigIntWriter; -import org.apache.drill.exec.vector.complex.writer.Float8Writer; -import org.apache.drill.exec.vector.complex.writer.VarCharWriter; -import org.apache.drill.exec.vector.complex.writer.TimeStampWriter; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.text.SimpleDateFormat; -import java.util.Date; - -public class HttpdLogRecord { - - private static final Logger logger = LoggerFactory.getLogger(HttpdLogRecord.class); - - private final Map strings = Maps.newHashMap(); - private final Map longs = Maps.newHashMap(); - private final Map doubles = Maps.newHashMap(); - private final Map times = new HashMap<>(); - private final Map wildcards = Maps.newHashMap(); - private final Map cleanExtensions = Maps.newHashMap(); - private final Map startedWildcards = Maps.newHashMap(); - private final Map wildcardWriters = Maps.newHashMap(); - private final SimpleDateFormat dateFormatter; - private DrillBuf managedBuffer; - private String timeFormat; - - public HttpdLogRecord(final DrillBuf managedBuffer, final String timeFormat) { - this.managedBuffer = managedBuffer; - this.timeFormat = timeFormat; - this.dateFormatter = new SimpleDateFormat(this.timeFormat); - } - - /** - * Call this method after a record has been parsed. This finished the lifecycle of any maps that were written and - * removes all the entries for the next record to be able to work. - */ - public void finishRecord() { - for (MapWriter writer : wildcardWriters.values()) { - writer.end(); - } - wildcardWriters.clear(); - startedWildcards.clear(); - } - - private DrillBuf buf(final int size) { - if (managedBuffer.capacity() < size) { - managedBuffer = managedBuffer.reallocIfNeeded(size); - } - return managedBuffer; - } - - private void writeString(VarCharWriter writer, String value) { - final byte[] stringBytes = value.getBytes(Charsets.UTF_8); - final DrillBuf stringBuffer = buf(stringBytes.length); - stringBuffer.clear(); - stringBuffer.writeBytes(stringBytes); - writer.writeVarChar(0, stringBytes.length, stringBuffer); - } - - /** - * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get - * called when the value of a log field is a String data type. - * - * @param field name of field - * @param value value of field - */ - @SuppressWarnings("unused") - public void set(String field, String value) { - if (value != null) { - final VarCharWriter w = strings.get(field); - if (w != null) { - logger.trace("Parsed field: {}, as string: {}", field, value); - writeString(w, value); - } else { - logger.warn("No 'string' writer found for field: {}", field); - } - } - } - - /** - * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get - * called when the value of a log field is a Long data type. 
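The reflection contract described in the setter javadocs above is worth seeing end to end. A hedged sketch: the parser is handed a `java.lang.reflect.Method` plus a parser field name, and invokes that method for every line it dissects. `STRING:request.status.last` is used here as a plausible string-typed field; the exact field list depends on the configured log format.

```java
import nl.basjes.parse.core.Parser;
import nl.basjes.parse.httpdlog.HttpdLoglineParser;

class ParseTargetSketch {
  static Parser<HttpdLogRecord> wire(String logFormat) throws NoSuchMethodException {
    Parser<HttpdLogRecord> parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat);
    // Register set(String, String) as the callback for a string-typed field.
    // During parsing, the library reflectively invokes
    // record.set("STRING:request.status.last", <parsed value>) per line.
    parser.addParseTarget(
        HttpdLogRecord.class.getMethod("set", String.class, String.class),
        "STRING:request.status.last");
    return parser;
  }
}
```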
- * - * @param field name of field - * @param value value of field - */ - @SuppressWarnings("unused") - public void set(String field, Long value) { - if (value != null) { - final BigIntWriter w = longs.get(field); - if (w != null) { - logger.trace("Parsed field: {}, as long: {}", field, value); - w.writeBigInt(value); - } else { - logger.warn("No 'long' writer found for field: {}", field); - } - } - } - - /** - * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get - * called when the value of a log field is a timesstamp data type. - * - * @param field name of field - * @param value value of field - */ - @SuppressWarnings("unused") - public void setTimestamp(String field, String value) { - if (value != null) { - //Convert the date string into a long - long ts = 0; - try { - Date d = this.dateFormatter.parse(value); - ts = d.getTime(); - } catch (Exception e) { - //If the date formatter does not successfully create a date, the timestamp will fall back to zero - //Do not throw exception - } - final TimeStampWriter tw = times.get(field); - if (tw != null) { - logger.trace("Parsed field: {}, as time: {}", field, value); - tw.writeTimeStamp(ts); - } else { - logger.warn("No 'timestamp' writer found for field: {}", field); - } - } - } - - /** - * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get - * called when the value of a log field is a Double data type. - * - * @param field name of field - * @param value value of field - */ - @SuppressWarnings("unused") - public void set(String field, Double value) { - if (value != null) { - final Float8Writer w = doubles.get(field); - if (w != null) { - logger.trace("Parsed field: {}, as double: {}", field, value); - w.writeFloat8(value); - } else { - logger.warn("No 'double' writer found for field: {}", field); - } - } - } - - /** - * This method is referenced and called via reflection. When the parser processes a field like: - * HTTP.URI:request.firstline.uri.query.* where star is an arbitrary field that the parser found this method will be - * invoked.
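A worked example of this wildcard path may help. The request line and values below are invented; the doubled underscore in the map key matches the behavior exercised by `testFlattenMap` earlier in this patch.

```java
// Request line:  GET /login?username=alice&came_from=/home HTTP/1.1
// Parser emits:  setWildcard("HTTP.URI:request.firstline.uri.query.username", "alice")
//                setWildcard("HTTP.URI:request.firstline.uri.query.came_from", "/home")
// Drill output:  map column `request_firstline_uri_query_$` containing
//                {"username": "alice", "came__from": "/home"}
```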
- * - * @param field name of field - * @param value value of field - */ - @SuppressWarnings("unused") - public void setWildcard(String field, String value) { - if (value != null) { - final MapWriter mapWriter = getWildcardWriter(field); - logger.trace("Parsed wildcard field: {}, as string: {}", field, value); - final VarCharWriter w = mapWriter.varChar(cleanExtensions.get(field)); - writeString(w, value); - } - } - - /** - * This method is referenced and called via reflection. When the parser processes a field like: - * HTTP.URI:request.firstline.uri.query.* where star is an arbitrary field that the parser found this method will be - * invoked.
- * - * @param field name of field - * @param value value of field - */ - @SuppressWarnings("unused") - public void setWildcard(String field, Long value) { - if (value != null) { - final MapWriter mapWriter = getWildcardWriter(field); - logger.trace("Parsed wildcard field: {}, as long: {}", field, value); - final BigIntWriter w = mapWriter.bigInt(cleanExtensions.get(field)); - w.writeBigInt(value); - } - } - - /** - * This method is referenced and called via reflection. When the parser processes a field like: - * HTTP.URI:request.firstline.uri.query.* where star is an arbitrary field that the parser found this method will be - * invoked.
- * - * @param field name of field - * @param value value of field - */ - @SuppressWarnings("unused") - public void setWildcard(String field, Double value) { - if (value != null) { - final MapWriter mapWriter = getWildcardWriter(field); - logger.trace("Parsed wildcard field: {}, as double: {}", field, value); - final Float8Writer w = mapWriter.float8(cleanExtensions.get(field)); - w.writeFloat8(value); - } - } - - /** - * For a configuration like HTTP.URI:request.firstline.uri.query.*, a writer was created with name - * HTTP.URI:request.firstline.uri.query, we traverse the list of wildcard writers to see which one is the root of the - * name of the field passed in like HTTP.URI:request.firstline.uri.query.old. This is writer entry that is needed. - * - * @param field like HTTP.URI:request.firstline.uri.query.old where 'old' is one of many different parameter names. - * @return the writer to be used for this field. - */ - private MapWriter getWildcardWriter(String field) { - MapWriter writer = startedWildcards.get(field); - if (writer == null) { - for (Map.Entry entry : wildcards.entrySet()) { - final String root = entry.getKey(); - if (field.startsWith(root)) { - writer = entry.getValue(); - - /** - * In order to save some time, store the cleaned version of the field extension. It is possible it will have - * unsafe characters in it. - */ - if (!cleanExtensions.containsKey(field)) { - final String extension = field.substring(root.length() + 1); - final String cleanExtension = HttpdParser.drillFormattedFieldName(extension); - cleanExtensions.put(field, cleanExtension); - logger.debug("Added extension: field='{}' with cleanExtension='{}'", field, cleanExtension); - } - - /** - * We already know we have the writer, but if we have put this writer in the started list, do NOT call start - * again. - */ - if (!wildcardWriters.containsKey(root)) { - /** - * Start and store this root map writer for later retrieval. - */ - logger.debug("Starting new wildcard field writer: {}", field); - writer.start(); - startedWildcards.put(field, writer); - wildcardWriters.put(root, writer); - } - - /** - * Break out of the for loop when we find a root writer that matches the field. - */ - break; - } - } - } - - return writer; - } - - public Map getStrings() { - return strings; - } - - public Map getLongs() { - return longs; - } - - public Map getDoubles() { - return doubles; - } - - public Map getTimes() { - return times; - } - - /** - * This record will be used with a single parser. For each field that is to be parsed a setter will be called. It - * registers a setter method for each field being parsed. It also builds the data writers to hold the data beings - * parsed. - * - * @param parser - * @param mapWriter - * @param type - * @param parserFieldName - * @param drillFieldName - * @throws NoSuchMethodException - */ - public void addField(final Parser parser, final MapWriter mapWriter, final EnumSet type, final String parserFieldName, final String drillFieldName) throws NoSuchMethodException { - final boolean hasWildcard = parserFieldName.endsWith(HttpdParser.PARSER_WILDCARD); - - /** - * This is a dynamic way to map the setter for each specified field type.
- /* - * This is a dynamic way to map the setter for each specified field type. - * e.g. a TIME.STAMP may map to a LONG while a referrer may map to a STRING - */ - if (hasWildcard) { - final String cleanName = parserFieldName.substring(0, parserFieldName.length() - HttpdParser.PARSER_WILDCARD.length()); - logger.debug("Adding WILDCARD parse target: {} as {}, with field name: {}", parserFieldName, cleanName, drillFieldName); - parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, String.class), parserFieldName); - parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, Double.class), parserFieldName); - parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, Long.class), parserFieldName); - wildcards.put(cleanName, mapWriter.map(drillFieldName)); - } else if (type.contains(Casts.DOUBLE)) { - logger.debug("Adding DOUBLE parse target: {}, with field name: {}", parserFieldName, drillFieldName); - parser.addParseTarget(this.getClass().getMethod("set", String.class, Double.class), parserFieldName); - doubles.put(parserFieldName, mapWriter.float8(drillFieldName)); - } else if (type.contains(Casts.LONG)) { - logger.debug("Adding LONG parse target: {}, with field name: {}", parserFieldName, drillFieldName); - parser.addParseTarget(this.getClass().getMethod("set", String.class, Long.class), parserFieldName); - longs.put(parserFieldName, mapWriter.bigInt(drillFieldName)); - } else { - logger.debug("Adding STRING parse target: {}, with field name: {}", parserFieldName, drillFieldName); - if (parserFieldName.startsWith("TIME.STAMP:")) { - parser.addParseTarget(this.getClass().getMethod("setTimestamp", String.class, String.class), parserFieldName); - times.put(parserFieldName, mapWriter.timeStamp(drillFieldName)); - } else { - parser.addParseTarget(this.getClass().getMethod("set", String.class, String.class), parserFieldName); - strings.put(parserFieldName, mapWriter.varChar(drillFieldName)); - } - } - } -} diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java deleted file mode 100644 index 7da7a95d1f5..00000000000 --- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java +++ /dev/null @@ -1,441 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License.
- */ -package org.apache.drill.exec.store.httpd; - -import org.apache.drill.shaded.guava.com.google.common.base.Preconditions; -import org.apache.drill.shaded.guava.com.google.common.collect.Maps; -import io.netty.buffer.DrillBuf; -import nl.basjes.parse.core.Casts; -import nl.basjes.parse.core.Parser; -import nl.basjes.parse.core.exceptions.DissectionFailure; -import nl.basjes.parse.core.exceptions.InvalidDissectorException; -import nl.basjes.parse.core.exceptions.MissingDissectorsException; -import nl.basjes.parse.httpdlog.HttpdLoglineParser; -import org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter; -import org.slf4j.Logger; -import org.slf4j.LoggerFactory; - -import java.util.EnumSet; -import java.util.HashMap; -import java.util.List; -import java.util.Map; - -public class HttpdParser { - - private static final Logger logger = LoggerFactory.getLogger(HttpdParser.class); - - public static final String PARSER_WILDCARD = ".*"; - public static final String SAFE_WILDCARD = "_$"; - public static final String SAFE_SEPARATOR = "_"; - public static final String REMAPPING_FLAG = "#"; - private final Parser<HttpdLogRecord> parser; - private final HttpdLogRecord record; - - public static final HashMap<String, String> LOGFIELDS = new HashMap<>(); - - static { - LOGFIELDS.put("connection.client.ip", "IP:connection.client.ip"); - LOGFIELDS.put("connection.client.ip.last", "IP:connection.client.ip.last"); - LOGFIELDS.put("connection.client.ip.original", "IP:connection.client.ip.original"); - LOGFIELDS.put("connection.client.ip.last", "IP:connection.client.ip.last"); - LOGFIELDS.put("connection.client.peerip", "IP:connection.client.peerip"); - LOGFIELDS.put("connection.client.peerip.last", "IP:connection.client.peerip.last"); - LOGFIELDS.put("connection.client.peerip.original", "IP:connection.client.peerip.original"); - LOGFIELDS.put("connection.client.peerip.last", "IP:connection.client.peerip.last"); - LOGFIELDS.put("connection.server.ip", "IP:connection.server.ip"); - LOGFIELDS.put("connection.server.ip.last", "IP:connection.server.ip.last"); - LOGFIELDS.put("connection.server.ip.original", "IP:connection.server.ip.original"); - LOGFIELDS.put("connection.server.ip.last", "IP:connection.server.ip.last"); - LOGFIELDS.put("response.body.bytes", "BYTES:response.body.bytes"); - LOGFIELDS.put("response.body.bytes.last", "BYTES:response.body.bytes.last"); - LOGFIELDS.put("response.body.bytes.original", "BYTES:response.body.bytes.original"); - LOGFIELDS.put("response.body.bytes.last", "BYTES:response.body.bytes.last"); - LOGFIELDS.put("response.body.bytesclf", "BYTES:response.body.bytesclf"); - LOGFIELDS.put("response.body.bytes", "BYTESCLF:response.body.bytes"); - LOGFIELDS.put("response.body.bytes.last", "BYTESCLF:response.body.bytes.last"); - LOGFIELDS.put("response.body.bytes.original", "BYTESCLF:response.body.bytes.original"); - LOGFIELDS.put("response.body.bytes.last", "BYTESCLF:response.body.bytes.last"); - LOGFIELDS.put("request.cookies.foobar", "HTTP.COOKIE:request.cookies.foobar"); - LOGFIELDS.put("server.environment.foobar", "VARIABLE:server.environment.foobar"); - LOGFIELDS.put("server.filename", "FILENAME:server.filename"); - LOGFIELDS.put("server.filename.last", "FILENAME:server.filename.last"); - LOGFIELDS.put("server.filename.original", "FILENAME:server.filename.original"); - LOGFIELDS.put("server.filename.last", "FILENAME:server.filename.last"); - LOGFIELDS.put("connection.client.host", "IP:connection.client.host"); - LOGFIELDS.put("connection.client.host.last", "IP:connection.client.host.last"); -
LOGFIELDS.put("connection.client.host.original", "IP:connection.client.host.original"); - LOGFIELDS.put("connection.client.host.last", "IP:connection.client.host.last"); - LOGFIELDS.put("request.protocol", "PROTOCOL:request.protocol"); - LOGFIELDS.put("request.protocol.last", "PROTOCOL:request.protocol.last"); - LOGFIELDS.put("request.protocol.original", "PROTOCOL:request.protocol.original"); - LOGFIELDS.put("request.protocol.last", "PROTOCOL:request.protocol.last"); - LOGFIELDS.put("request.header.foobar", "HTTP.HEADER:request.header.foobar"); - LOGFIELDS.put("request.trailer.foobar", "HTTP.TRAILER:request.trailer.foobar"); - LOGFIELDS.put("connection.keepalivecount", "NUMBER:connection.keepalivecount"); - LOGFIELDS.put("connection.keepalivecount.last", "NUMBER:connection.keepalivecount.last"); - LOGFIELDS.put("connection.keepalivecount.original", "NUMBER:connection.keepalivecount.original"); - LOGFIELDS.put("connection.keepalivecount.last", "NUMBER:connection.keepalivecount.last"); - LOGFIELDS.put("connection.client.logname", "NUMBER:connection.client.logname"); - LOGFIELDS.put("connection.client.logname.last", "NUMBER:connection.client.logname.last"); - LOGFIELDS.put("connection.client.logname.original", "NUMBER:connection.client.logname.original"); - LOGFIELDS.put("connection.client.logname.last", "NUMBER:connection.client.logname.last"); - LOGFIELDS.put("request.errorlogid", "STRING:request.errorlogid"); - LOGFIELDS.put("request.errorlogid.last", "STRING:request.errorlogid.last"); - LOGFIELDS.put("request.errorlogid.original", "STRING:request.errorlogid.original"); - LOGFIELDS.put("request.errorlogid.last", "STRING:request.errorlogid.last"); - LOGFIELDS.put("request.method", "HTTP.METHOD:request.method"); - LOGFIELDS.put("request.method.last", "HTTP.METHOD:request.method.last"); - LOGFIELDS.put("request.method.original", "HTTP.METHOD:request.method.original"); - LOGFIELDS.put("request.method.last", "HTTP.METHOD:request.method.last"); - LOGFIELDS.put("server.module_note.foobar", "STRING:server.module_note.foobar"); - LOGFIELDS.put("response.header.foobar", "HTTP.HEADER:response.header.foobar"); - LOGFIELDS.put("response.trailer.foobar", "HTTP.TRAILER:response.trailer.foobar"); - LOGFIELDS.put("request.server.port.canonical", "PORT:request.server.port.canonical"); - LOGFIELDS.put("request.server.port.canonical.last", "PORT:request.server.port.canonical.last"); - LOGFIELDS.put("request.server.port.canonical.original", "PORT:request.server.port.canonical.original"); - LOGFIELDS.put("request.server.port.canonical.last", "PORT:request.server.port.canonical.last"); - LOGFIELDS.put("connection.server.port.canonical", "PORT:connection.server.port.canonical"); - LOGFIELDS.put("connection.server.port.canonical.last", "PORT:connection.server.port.canonical.last"); - LOGFIELDS.put("connection.server.port.canonical.original", "PORT:connection.server.port.canonical.original"); - LOGFIELDS.put("connection.server.port.canonical.last", "PORT:connection.server.port.canonical.last"); - LOGFIELDS.put("connection.server.port", "PORT:connection.server.port"); - LOGFIELDS.put("connection.server.port.last", "PORT:connection.server.port.last"); - LOGFIELDS.put("connection.server.port.original", "PORT:connection.server.port.original"); - LOGFIELDS.put("connection.server.port.last", "PORT:connection.server.port.last"); - LOGFIELDS.put("connection.client.port", "PORT:connection.client.port"); - LOGFIELDS.put("connection.client.port.last", "PORT:connection.client.port.last"); - 
LOGFIELDS.put("connection.client.port.original", "PORT:connection.client.port.original"); - LOGFIELDS.put("connection.client.port.last", "PORT:connection.client.port.last"); - LOGFIELDS.put("connection.server.child.processid", "NUMBER:connection.server.child.processid"); - LOGFIELDS.put("connection.server.child.processid.last", "NUMBER:connection.server.child.processid.last"); - LOGFIELDS.put("connection.server.child.processid.original", "NUMBER:connection.server.child.processid.original"); - LOGFIELDS.put("connection.server.child.processid.last", "NUMBER:connection.server.child.processid.last"); - LOGFIELDS.put("connection.server.child.processid", "NUMBER:connection.server.child.processid"); - LOGFIELDS.put("connection.server.child.processid.last", "NUMBER:connection.server.child.processid.last"); - LOGFIELDS.put("connection.server.child.processid.original", "NUMBER:connection.server.child.processid.original"); - LOGFIELDS.put("connection.server.child.processid.last", "NUMBER:connection.server.child.processid.last"); - LOGFIELDS.put("connection.server.child.threadid", "NUMBER:connection.server.child.threadid"); - LOGFIELDS.put("connection.server.child.threadid.last", "NUMBER:connection.server.child.threadid.last"); - LOGFIELDS.put("connection.server.child.threadid.original", "NUMBER:connection.server.child.threadid.original"); - LOGFIELDS.put("connection.server.child.threadid.last", "NUMBER:connection.server.child.threadid.last"); - LOGFIELDS.put("connection.server.child.hexthreadid", "NUMBER:connection.server.child.hexthreadid"); - LOGFIELDS.put("connection.server.child.hexthreadid.last", "NUMBER:connection.server.child.hexthreadid.last"); - LOGFIELDS.put("connection.server.child.hexthreadid.original", "NUMBER:connection.server.child.hexthreadid.original"); - LOGFIELDS.put("connection.server.child.hexthreadid.last", "NUMBER:connection.server.child.hexthreadid.last"); - LOGFIELDS.put("request.querystring", "HTTP.QUERYSTRING:request.querystring"); - LOGFIELDS.put("request.querystring.last", "HTTP.QUERYSTRING:request.querystring.last"); - LOGFIELDS.put("request.querystring.original", "HTTP.QUERYSTRING:request.querystring.original"); - LOGFIELDS.put("request.querystring.last", "HTTP.QUERYSTRING:request.querystring.last"); - LOGFIELDS.put("request.firstline", "HTTP.FIRSTLINE:request.firstline"); - LOGFIELDS.put("request.firstline.original", "HTTP.FIRSTLINE:request.firstline.original"); - LOGFIELDS.put("request.firstline.original", "HTTP.FIRSTLINE:request.firstline.original"); - LOGFIELDS.put("request.firstline.last", "HTTP.FIRSTLINE:request.firstline.last"); - LOGFIELDS.put("request.handler", "STRING:request.handler"); - LOGFIELDS.put("request.handler.last", "STRING:request.handler.last"); - LOGFIELDS.put("request.handler.original", "STRING:request.handler.original"); - LOGFIELDS.put("request.handler.last", "STRING:request.handler.last"); - LOGFIELDS.put("request.status", "STRING:request.status"); - LOGFIELDS.put("request.status.original", "STRING:request.status.original"); - LOGFIELDS.put("request.status.original", "STRING:request.status.original"); - LOGFIELDS.put("request.status.last", "STRING:request.status.last"); - LOGFIELDS.put("request.receive.time", "TIME.STAMP:request.receive.time"); - LOGFIELDS.put("request.receive.time.last", "TIME.STAMP:request.receive.time.last"); - LOGFIELDS.put("request.receive.time.original", "TIME.STAMP:request.receive.time.original"); - LOGFIELDS.put("request.receive.time.last", "TIME.STAMP:request.receive.time.last"); - 
LOGFIELDS.put("request.receive.time.year", "TIME.YEAR:request.receive.time.year"); - LOGFIELDS.put("request.receive.time.begin.year", "TIME.YEAR:request.receive.time.begin.year"); - LOGFIELDS.put("request.receive.time.end.year", "TIME.YEAR:request.receive.time.end.year"); - LOGFIELDS.put("request.receive.time.sec", "TIME.SECONDS:request.receive.time.sec"); - LOGFIELDS.put("request.receive.time.sec", "TIME.SECONDS:request.receive.time.sec"); - LOGFIELDS.put("request.receive.time.sec.original", "TIME.SECONDS:request.receive.time.sec.original"); - LOGFIELDS.put("request.receive.time.sec.last", "TIME.SECONDS:request.receive.time.sec.last"); - LOGFIELDS.put("request.receive.time.begin.sec", "TIME.SECONDS:request.receive.time.begin.sec"); - LOGFIELDS.put("request.receive.time.begin.sec.last", "TIME.SECONDS:request.receive.time.begin.sec.last"); - LOGFIELDS.put("request.receive.time.begin.sec.original", "TIME.SECONDS:request.receive.time.begin.sec.original"); - LOGFIELDS.put("request.receive.time.begin.sec.last", "TIME.SECONDS:request.receive.time.begin.sec.last"); - LOGFIELDS.put("request.receive.time.end.sec", "TIME.SECONDS:request.receive.time.end.sec"); - LOGFIELDS.put("request.receive.time.end.sec.last", "TIME.SECONDS:request.receive.time.end.sec.last"); - LOGFIELDS.put("request.receive.time.end.sec.original", "TIME.SECONDS:request.receive.time.end.sec.original"); - LOGFIELDS.put("request.receive.time.end.sec.last", "TIME.SECONDS:request.receive.time.end.sec.last"); - LOGFIELDS.put("request.receive.time.begin.msec", "TIME.EPOCH:request.receive.time.begin.msec"); - LOGFIELDS.put("request.receive.time.msec", "TIME.EPOCH:request.receive.time.msec"); - LOGFIELDS.put("request.receive.time.msec.last", "TIME.EPOCH:request.receive.time.msec.last"); - LOGFIELDS.put("request.receive.time.msec.original", "TIME.EPOCH:request.receive.time.msec.original"); - LOGFIELDS.put("request.receive.time.msec.last", "TIME.EPOCH:request.receive.time.msec.last"); - LOGFIELDS.put("request.receive.time.begin.msec", "TIME.EPOCH:request.receive.time.begin.msec"); - LOGFIELDS.put("request.receive.time.begin.msec.last", "TIME.EPOCH:request.receive.time.begin.msec.last"); - LOGFIELDS.put("request.receive.time.begin.msec.original", "TIME.EPOCH:request.receive.time.begin.msec.original"); - LOGFIELDS.put("request.receive.time.begin.msec.last", "TIME.EPOCH:request.receive.time.begin.msec.last"); - LOGFIELDS.put("request.receive.time.end.msec", "TIME.EPOCH:request.receive.time.end.msec"); - LOGFIELDS.put("request.receive.time.end.msec.last", "TIME.EPOCH:request.receive.time.end.msec.last"); - LOGFIELDS.put("request.receive.time.end.msec.original", "TIME.EPOCH:request.receive.time.end.msec.original"); - LOGFIELDS.put("request.receive.time.end.msec.last", "TIME.EPOCH:request.receive.time.end.msec.last"); - LOGFIELDS.put("request.receive.time.begin.usec", "TIME.EPOCH.USEC:request.receive.time.begin.usec"); - LOGFIELDS.put("request.receive.time.usec", "TIME.EPOCH.USEC:request.receive.time.usec"); - LOGFIELDS.put("request.receive.time.usec.last", "TIME.EPOCH.USEC:request.receive.time.usec.last"); - LOGFIELDS.put("request.receive.time.usec.original", "TIME.EPOCH.USEC:request.receive.time.usec.original"); - LOGFIELDS.put("request.receive.time.usec.last", "TIME.EPOCH.USEC:request.receive.time.usec.last"); - LOGFIELDS.put("request.receive.time.begin.usec", "TIME.EPOCH.USEC:request.receive.time.begin.usec"); - LOGFIELDS.put("request.receive.time.begin.usec.last", "TIME.EPOCH.USEC:request.receive.time.begin.usec.last"); - 
LOGFIELDS.put("request.receive.time.begin.usec.original", "TIME.EPOCH.USEC:request.receive.time.begin.usec.original"); - LOGFIELDS.put("request.receive.time.begin.usec.last", "TIME.EPOCH.USEC:request.receive.time.begin.usec.last"); - LOGFIELDS.put("request.receive.time.end.usec", "TIME.EPOCH.USEC:request.receive.time.end.usec"); - LOGFIELDS.put("request.receive.time.end.usec.last", "TIME.EPOCH.USEC:request.receive.time.end.usec.last"); - LOGFIELDS.put("request.receive.time.end.usec.original", "TIME.EPOCH.USEC:request.receive.time.end.usec.original"); - LOGFIELDS.put("request.receive.time.end.usec.last", "TIME.EPOCH.USEC:request.receive.time.end.usec.last"); - LOGFIELDS.put("request.receive.time.begin.msec_frac", "TIME.EPOCH:request.receive.time.begin.msec_frac"); - LOGFIELDS.put("request.receive.time.msec_frac", "TIME.EPOCH:request.receive.time.msec_frac"); - LOGFIELDS.put("request.receive.time.msec_frac.last", "TIME.EPOCH:request.receive.time.msec_frac.last"); - LOGFIELDS.put("request.receive.time.msec_frac.original", "TIME.EPOCH:request.receive.time.msec_frac.original"); - LOGFIELDS.put("request.receive.time.msec_frac.last", "TIME.EPOCH:request.receive.time.msec_frac.last"); - LOGFIELDS.put("request.receive.time.begin.msec_frac", "TIME.EPOCH:request.receive.time.begin.msec_frac"); - LOGFIELDS.put("request.receive.time.begin.msec_frac.last", "TIME.EPOCH:request.receive.time.begin.msec_frac.last"); - LOGFIELDS.put("request.receive.time.begin.msec_frac.original", "TIME.EPOCH:request.receive.time.begin.msec_frac.original"); - LOGFIELDS.put("request.receive.time.begin.msec_frac.last", "TIME.EPOCH:request.receive.time.begin.msec_frac.last"); - LOGFIELDS.put("request.receive.time.end.msec_frac", "TIME.EPOCH:request.receive.time.end.msec_frac"); - LOGFIELDS.put("request.receive.time.end.msec_frac.last", "TIME.EPOCH:request.receive.time.end.msec_frac.last"); - LOGFIELDS.put("request.receive.time.end.msec_frac.original", "TIME.EPOCH:request.receive.time.end.msec_frac.original"); - LOGFIELDS.put("request.receive.time.end.msec_frac.last", "TIME.EPOCH:request.receive.time.end.msec_frac.last"); - LOGFIELDS.put("request.receive.time.begin.usec_frac", "FRAC:request.receive.time.begin.usec_frac"); - LOGFIELDS.put("request.receive.time.usec_frac", "FRAC:request.receive.time.usec_frac"); - LOGFIELDS.put("request.receive.time.usec_frac.last", "FRAC:request.receive.time.usec_frac.last"); - LOGFIELDS.put("request.receive.time.usec_frac.original", "FRAC:request.receive.time.usec_frac.original"); - LOGFIELDS.put("request.receive.time.usec_frac.last", "FRAC:request.receive.time.usec_frac.last"); - LOGFIELDS.put("request.receive.time.begin.usec_frac", "FRAC:request.receive.time.begin.usec_frac"); - LOGFIELDS.put("request.receive.time.begin.usec_frac.last", "FRAC:request.receive.time.begin.usec_frac.last"); - LOGFIELDS.put("request.receive.time.begin.usec_frac.original", "FRAC:request.receive.time.begin.usec_frac.original"); - LOGFIELDS.put("request.receive.time.begin.usec_frac.last", "FRAC:request.receive.time.begin.usec_frac.last"); - LOGFIELDS.put("request.receive.time.end.usec_frac", "FRAC:request.receive.time.end.usec_frac"); - LOGFIELDS.put("request.receive.time.end.usec_frac.last", "FRAC:request.receive.time.end.usec_frac.last"); - LOGFIELDS.put("request.receive.time.end.usec_frac.original", "FRAC:request.receive.time.end.usec_frac.original"); - LOGFIELDS.put("request.receive.time.end.usec_frac.last", "FRAC:request.receive.time.end.usec_frac.last"); - LOGFIELDS.put("response.server.processing.time", 
"SECONDS:response.server.processing.time"); - LOGFIELDS.put("response.server.processing.time.original", "SECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.original", "SECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.last", "SECONDS:response.server.processing.time.last"); - LOGFIELDS.put("server.process.time", "MICROSECONDS:server.process.time"); - LOGFIELDS.put("response.server.processing.time", "MICROSECONDS:response.server.processing.time"); - LOGFIELDS.put("response.server.processing.time.original", "MICROSECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.original", "MICROSECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.last", "MICROSECONDS:response.server.processing.time.last"); - LOGFIELDS.put("response.server.processing.time", "MICROSECONDS:response.server.processing.time"); - LOGFIELDS.put("response.server.processing.time.original", "MICROSECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.original", "MICROSECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.last", "MICROSECONDS:response.server.processing.time.last"); - LOGFIELDS.put("response.server.processing.time", "MILLISECONDS:response.server.processing.time"); - LOGFIELDS.put("response.server.processing.time.original", "MILLISECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.original", "MILLISECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.last", "MILLISECONDS:response.server.processing.time.last"); - LOGFIELDS.put("response.server.processing.time", "SECONDS:response.server.processing.time"); - LOGFIELDS.put("response.server.processing.time.original", "SECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.original", "SECONDS:response.server.processing.time.original"); - LOGFIELDS.put("response.server.processing.time.last", "SECONDS:response.server.processing.time.last"); - LOGFIELDS.put("connection.client.user", "STRING:connection.client.user"); - LOGFIELDS.put("connection.client.user.last", "STRING:connection.client.user.last"); - LOGFIELDS.put("connection.client.user.original", "STRING:connection.client.user.original"); - LOGFIELDS.put("connection.client.user.last", "STRING:connection.client.user.last"); - LOGFIELDS.put("request.urlpath", "URI:request.urlpath"); - LOGFIELDS.put("request.urlpath.original", "URI:request.urlpath.original"); - LOGFIELDS.put("request.urlpath.original", "URI:request.urlpath.original"); - LOGFIELDS.put("request.urlpath.last", "URI:request.urlpath.last"); - LOGFIELDS.put("connection.server.name.canonical", "STRING:connection.server.name.canonical"); - LOGFIELDS.put("connection.server.name.canonical.last", "STRING:connection.server.name.canonical.last"); - LOGFIELDS.put("connection.server.name.canonical.original", "STRING:connection.server.name.canonical.original"); - LOGFIELDS.put("connection.server.name.canonical.last", "STRING:connection.server.name.canonical.last"); - LOGFIELDS.put("connection.server.name", "STRING:connection.server.name"); - LOGFIELDS.put("connection.server.name.last", "STRING:connection.server.name.last"); - LOGFIELDS.put("connection.server.name.original", "STRING:connection.server.name.original"); - 
LOGFIELDS.put("connection.server.name.last", "STRING:connection.server.name.last"); - LOGFIELDS.put("response.connection.status", "HTTP.CONNECTSTATUS:response.connection.status"); - LOGFIELDS.put("response.connection.status.last", "HTTP.CONNECTSTATUS:response.connection.status.last"); - LOGFIELDS.put("response.connection.status.original", "HTTP.CONNECTSTATUS:response.connection.status.original"); - LOGFIELDS.put("response.connection.status.last", "HTTP.CONNECTSTATUS:response.connection.status.last"); - LOGFIELDS.put("request.bytes", "BYTES:request.bytes"); - LOGFIELDS.put("request.bytes.last", "BYTES:request.bytes.last"); - LOGFIELDS.put("request.bytes.original", "BYTES:request.bytes.original"); - LOGFIELDS.put("request.bytes.last", "BYTES:request.bytes.last"); - LOGFIELDS.put("response.bytes", "BYTES:response.bytes"); - LOGFIELDS.put("response.bytes.last", "BYTES:response.bytes.last"); - LOGFIELDS.put("response.bytes.original", "BYTES:response.bytes.original"); - LOGFIELDS.put("response.bytes.last", "BYTES:response.bytes.last"); - LOGFIELDS.put("total.bytes", "BYTES:total.bytes"); - LOGFIELDS.put("total.bytes.last", "BYTES:total.bytes.last"); - LOGFIELDS.put("total.bytes.original", "BYTES:total.bytes.original"); - LOGFIELDS.put("total.bytes.last", "BYTES:total.bytes.last"); - LOGFIELDS.put("request.cookies", "HTTP.COOKIES:request.cookies"); - LOGFIELDS.put("request.cookies.last", "HTTP.COOKIES:request.cookies.last"); - LOGFIELDS.put("request.cookies.original", "HTTP.COOKIES:request.cookies.original"); - LOGFIELDS.put("request.cookies.last", "HTTP.COOKIES:request.cookies.last"); - LOGFIELDS.put("response.cookies", "HTTP.SETCOOKIES:response.cookies"); - LOGFIELDS.put("response.cookies.last", "HTTP.SETCOOKIES:response.cookies.last"); - LOGFIELDS.put("response.cookies.original", "HTTP.SETCOOKIES:response.cookies.original"); - LOGFIELDS.put("response.cookies.last", "HTTP.SETCOOKIES:response.cookies.last"); - LOGFIELDS.put("request.user-agent", "HTTP.USERAGENT:request.user-agent"); - LOGFIELDS.put("request.user-agent.last", "HTTP.USERAGENT:request.user-agent.last"); - LOGFIELDS.put("request.user-agent.original", "HTTP.USERAGENT:request.user-agent.original"); - LOGFIELDS.put("request.user-agent.last", "HTTP.USERAGENT:request.user-agent.last"); - LOGFIELDS.put("request.referer", "HTTP.URI:request.referer"); - LOGFIELDS.put("request.referer.last", "HTTP.URI:request.referer.last"); - LOGFIELDS.put("request.referer.original", "HTTP.URI:request.referer.original"); - LOGFIELDS.put("request.referer.last", "HTTP.URI:request.referer.last"); - } - - public HttpdParser(final MapWriter mapWriter, final DrillBuf managedBuffer, final String logFormat, - final String timestampFormat, final Map fieldMapping) - throws NoSuchMethodException, MissingDissectorsException, InvalidDissectorException { - - Preconditions.checkArgument(logFormat != null && !logFormat.trim().isEmpty(), "logFormat cannot be null or empty"); - - this.record = new HttpdLogRecord(managedBuffer, timestampFormat); - this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, timestampFormat); - - setupParser(mapWriter, logFormat, fieldMapping); - - if (timestampFormat != null && !timestampFormat.trim().isEmpty()) { - logger.info("Custom timestamp format has been specified. 
This is an informational note only as custom timestamps are rather unusual."); - } - if (logFormat.contains("\n")) { - logger.info("Specified logFormat is a multiline log format: {}", logFormat); - } - } - - /** - * We do not expose the underlying parser or the record which is used to manage the writers. - * - * @param line log line to tear apart. - * @throws DissectionFailure - * @throws InvalidDissectorException - * @throws MissingDissectorsException - */ - public void parse(final String line) throws DissectionFailure, InvalidDissectorException, MissingDissectorsException { - parser.parse(record, line); - record.finishRecord(); - } - - /** - * In order to define a type remapping, the field configuration will look like:
- * HTTP.URI:request.firstline.uri.query.[parameter name]
- * - * @param parser Add type remapping to this parser instance. - * @param fieldName request.firstline.uri.query.[parameter_name] - * @param fieldType HTTP.URI, etc. - */ - private void addTypeRemapping(final Parser<HttpdLogRecord> parser, final String fieldName, final String fieldType) { - logger.debug("Adding type remapping - fieldName: {}, fieldType: {}", fieldName, fieldType); - parser.addTypeRemapping(fieldName, fieldType); - }
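For illustration, this is roughly what a remapping-flagged field mapping becomes by the time `addTypeRemapping()` runs. The mapping value below is made up; `HttpdLoglineParser` and `addTypeRemapping` are the same library calls used elsewhere in this class:

```java
import nl.basjes.parse.core.Parser;
import nl.basjes.parse.httpdlog.HttpdLoglineParser;

public class TypeRemapSketch {

  public static void main(String[] args) {
    Parser<Object> parser = new HttpdLoglineParser<>(Object.class, "%r");
    // A user-supplied mapping value carrying the REMAPPING_FLAG (illustrative).
    String value = "#HTTP.URI:request.firstline.uri.query.ref";
    String[] pieces = value.substring(1).split(":"); // drop the flag, split type from name
    // Ask the parser to dissect this query parameter further, as an HTTP.URI.
    parser.addTypeRemapping(pieces[1], pieces[0]);
  }
}
```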
- - /** - * The parser uses dots in its field names (request.referer) where Drill wants underscores (request_referer). For the - * sake of simplicity we convert the name back to the parser's dotted format. The resultant output field will look - * like: request.referer. - * Additionally, wildcards will get replaced with .* - * - * @param drillFieldName name to be cleansed. - * @return the parser-formatted field name. - */ - public static String parserFormattedFieldName(String drillFieldName) { - - //The Useragent fields contain a dash which causes potential problems if the field name is not escaped properly - //This restores the dash that the Drill-side name dropped - if (drillFieldName.contains("useragent")) { - drillFieldName = drillFieldName.replace("useragent", "user-agent"); - } - - final String tempFieldName = LOGFIELDS.get(drillFieldName); - return tempFieldName.replace(SAFE_WILDCARD, PARSER_WILDCARD).replaceAll(SAFE_SEPARATOR, ".").replaceAll("\\.\\.", "_"); - }
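As a worked example of the cleansing rules implemented by `drillFormattedFieldName()` just below, these string operations are how a wildcard query-string path becomes the `request_firstline_uri_query_$` column described in the README (a standalone sketch):

```java
public class FieldNameSketch {

  public static void main(String[] args) {
    // Parser path as emitted for a wildcard query-string field.
    String parserName = "HTTP.URI:request.firstline.uri.query.*";

    // Drop the type prefix, escape underscores, swap the wildcard for _$,
    // then turn the remaining dots into underscores.
    String drillName = parserName.split(":")[1]
        .replaceAll("_", "__")
        .replace(".*", "_$")
        .replaceAll("\\.", "_");

    // Prints: request_firstline_uri_query_$
    System.out.println(drillName);
  }
}
```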
- - /** - * Drill cannot deal with fields with dots in them, like request.referer. For the sake of simplicity we ensure the - * field name is cleansed. The resultant output field will look like: request_referer. - * Additionally, wildcards will get replaced with _$ - * - * @param parserFieldName name to be cleansed. - * @return the Drill-formatted field name. - */ - public static String drillFormattedFieldName(String parserFieldName) { - - //The Useragent fields contain a dash which causes potential problems if the field name is not escaped properly - //This removes the dash - if (parserFieldName.contains("user-agent")) { - parserFieldName = parserFieldName.replace("user-agent", "useragent"); - } - - if (parserFieldName.contains(":")) { - String[] fieldPart = parserFieldName.split(":"); - return fieldPart[1].replaceAll("_", "__").replace(PARSER_WILDCARD, SAFE_WILDCARD).replaceAll("\\.", SAFE_SEPARATOR); - } else { - return parserFieldName.replaceAll("_", "__").replace(PARSER_WILDCARD, SAFE_WILDCARD).replaceAll("\\.", SAFE_SEPARATOR); - } - } - - private void setupParser(final MapWriter mapWriter, final String logFormat, final Map<String, String> fieldMapping) - throws NoSuchMethodException, MissingDissectorsException, InvalidDissectorException { - - /* - * If the user has selected fields, then we will use them to configure the parser because this would be the most - * efficient way to parse the log. - */ - final Map<String, String> requestedPaths; - final List<String> allParserPaths = parser.getPossiblePaths(); - if (fieldMapping != null && !fieldMapping.isEmpty()) { - logger.debug("Using fields defined by user."); - requestedPaths = fieldMapping; - } else { - /* - * Use all possible paths that the parser has determined from the specified log format. - */ - logger.debug("No fields defined by user, defaulting to all possible fields."); - requestedPaths = Maps.newHashMap(); - for (final String parserPath : allParserPaths) { - requestedPaths.put(drillFormattedFieldName(parserPath), parserPath); - } - } - - /* - * By adding the parse target to the dummy instance we activate it for use, which lets us find out which paths cast - * to which native data types. Once we have figured this out, we throw the dummy away because it would be the - * slowest possible parsing path for the specified format. - */ - Parser<Object> dummy = new HttpdLoglineParser<>(Object.class, logFormat); - dummy.addParseTarget(String.class.getMethod("indexOf", String.class), allParserPaths); - for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) { - final EnumSet<Casts> casts; - - /* - * Check the field specified by the user to see if it is supposed to be remapped. - */ - if (entry.getValue().startsWith(REMAPPING_FLAG)) { - /* - * Because this field is being remapped we need to replace the field name that the parser uses. 
- */ - entry.setValue(entry.getValue().substring(REMAPPING_FLAG.length())); - - final String[] pieces = entry.getValue().split(":"); - addTypeRemapping(parser, pieces[1], pieces[0]); - casts = Casts.STRING_ONLY; - } else { - casts = dummy.getCasts(entry.getValue()); - } - - logger.debug("Setting up drill field: {}, parser field: {}, which casts as: {}", entry.getKey(), entry.getValue(), casts); - record.addField(parser, mapWriter, casts, entry.getValue(), entry.getKey()); - } - } -} diff --git a/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json b/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json index 4aa17540d1b..dd8a659dc40 100644 --- a/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json +++ b/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json @@ -31,11 +31,6 @@ "extensions" : [ "tsv" ], "fieldDelimiter" : "\t" }, - "httpd" : { - "type" : "httpd", - "logFormat" : "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"", - "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ" - }, "parquet" : { "type" : "parquet" }, diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/store/FormatPluginSerDeTest.java b/exec/java-exec/src/test/java/org/apache/drill/exec/store/FormatPluginSerDeTest.java index ebcb300ac87..600177308bc 100644 --- a/exec/java-exec/src/test/java/org/apache/drill/exec/store/FormatPluginSerDeTest.java +++ b/exec/java-exec/src/test/java/org/apache/drill/exec/store/FormatPluginSerDeTest.java @@ -91,19 +91,6 @@ public void testPcap() throws Exception { ); } - @Test - public void testHttpd() throws Exception { - String path = "store/httpd/dfs-test-bootstrap-test.httpd"; - dirTestWatcher.copyResourceToRoot(Paths.get(path)); - String logFormat = "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""; - String timeStampFormat = "dd/MMM/yyyy:HH:mm:ss ZZ"; - testPhysicalPlanSubmission( - String.format("select * from dfs.`%s`", path), - String.format("select * from table(dfs.`%s`(type=>'httpd', logFormat=>'%s'))", path, logFormat), - String.format("select * from table(dfs.`%s`(type=>'httpd', logFormat=>'%s', timestampFormat=>'%s'))", path, logFormat, timeStampFormat) - ); - } - @Test public void testJson() throws Exception { testPhysicalPlanSubmission( diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java b/exec/java-exec/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java deleted file mode 100644 index c86ee52112b..00000000000 --- a/exec/java-exec/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java +++ /dev/null @@ -1,218 +0,0 @@ -/* - * Licensed to the Apache Software Foundation (ASF) under one - * or more contributor license agreements. See the NOTICE file - * distributed with this work for additional information - * regarding copyright ownership. The ASF licenses this file - * to you under the Apache License, Version 2.0 (the - * "License"); you may not use this file except in compliance - * with the License. You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. 
- */ -package org.apache.drill.exec.store.httpd; - -import org.apache.drill.common.types.TypeProtos.MinorType; -import org.apache.drill.exec.record.metadata.SchemaBuilder; -import org.apache.drill.exec.record.metadata.TupleMetadata; -import org.apache.drill.exec.rpc.RpcException; -import org.apache.drill.test.BaseDirTestWatcher; -import org.apache.drill.test.ClusterFixture; -import org.apache.drill.test.ClusterTest; -import org.apache.drill.exec.physical.rowSet.RowSet; -import org.apache.drill.test.rowSet.RowSetUtilities; -import org.junit.BeforeClass; -import org.junit.ClassRule; -import org.junit.Test; - -import java.time.LocalDateTime; -import java.util.HashMap; - -import static org.junit.Assert.assertEquals; - -public class TestHTTPDLogReader extends ClusterTest { - - @ClassRule - public static final BaseDirTestWatcher dirTestWatcher = new BaseDirTestWatcher(); - - @BeforeClass - public static void setup() throws Exception { - ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher)); - - // Define a temporary format plugin for the "cp" storage plugin. - HttpdLogFormatConfig sampleConfig = new HttpdLogFormatConfig( - "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"", null); - cluster.defineFormat("cp", "sample", sampleConfig); - } - - @Test - public void testDateField() throws RpcException { - String sql = "SELECT `request_receive_time` FROM cp.`httpd/hackers-access-small.httpd` LIMIT 5"; - RowSet results = client.queryBuilder().sql(sql).rowSet(); - - TupleMetadata expectedSchema = new SchemaBuilder() - .addNullable("request_receive_time", MinorType.TIMESTAMP) - .buildSchema(); - RowSet expected = client.rowSetBuilder(expectedSchema) - .addRow(1445742685000L) - .addRow(1445742686000L) - .addRow(1445742687000L) - .addRow(1445743471000L) - .addRow(1445743472000L) - .build(); - - RowSetUtilities.verify(expected, results); - } - - @Test - public void testSelectColumns() throws Exception { - String sql = "SELECT request_referer_ref,\n" + - "request_receive_time_last_time,\n" + - "request_firstline_uri_protocol,\n" + - "request_receive_time_microsecond,\n" + - "request_receive_time_last_microsecond__utc,\n" + - "request_firstline_original_protocol,\n" + - "request_firstline_original_uri_host,\n" + - "request_referer_host,\n" + - "request_receive_time_month__utc,\n" + - "request_receive_time_last_minute,\n" + - "request_firstline_protocol_version,\n" + - "request_receive_time_time__utc,\n" + - "request_referer_last_ref,\n" + - "request_receive_time_last_timezone,\n" + - "request_receive_time_last_weekofweekyear,\n" + - "request_referer_last,\n" + - "request_receive_time_minute,\n" + - "connection_client_host_last,\n" + - "request_receive_time_last_millisecond__utc,\n" + - "request_firstline_original_uri,\n" + - "request_firstline,\n" + - "request_receive_time_nanosecond,\n" + - "request_receive_time_last_millisecond,\n" + - "request_receive_time_day,\n" + - "request_referer_port,\n" + - "request_firstline_original_uri_port,\n" + - "request_receive_time_year,\n" + - "request_receive_time_last_date,\n" + - "request_receive_time_last_time__utc,\n" + - "request_receive_time_last_hour__utc,\n" + - "request_firstline_original_protocol_version,\n" + - "request_firstline_original_method,\n" + - "request_receive_time_last_year__utc,\n" + - "request_firstline_uri,\n" + - "request_referer_last_host,\n" + - "request_receive_time_last_minute__utc,\n" + - "request_receive_time_weekofweekyear,\n" + - "request_firstline_uri_userinfo,\n" + - "request_receive_time_epoch,\n" + - 
"connection_client_logname,\n" + - "response_body_bytes,\n" + - "request_receive_time_nanosecond__utc,\n" + - "request_firstline_protocol,\n" + - "request_receive_time_microsecond__utc,\n" + - "request_receive_time_hour,\n" + - "request_firstline_uri_host,\n" + - "request_referer_last_port,\n" + - "request_receive_time_last_epoch,\n" + - "request_receive_time_last_weekyear__utc,\n" + - "request_useragent,\n" + - "request_receive_time_weekyear,\n" + - "request_receive_time_timezone,\n" + - "response_body_bytesclf,\n" + - "request_receive_time_last_date__utc,\n" + - "request_receive_time_millisecond__utc,\n" + - "request_referer_last_protocol,\n" + - "request_status_last,\n" + - "request_firstline_uri_query,\n" + - "request_receive_time_minute__utc,\n" + - "request_firstline_original_uri_protocol,\n" + - "request_referer_query,\n" + - "request_receive_time_date,\n" + - "request_firstline_uri_port,\n" + - "request_receive_time_last_second__utc,\n" + - "request_referer_last_userinfo,\n" + - "request_receive_time_last_second,\n" + - "request_receive_time_last_monthname__utc,\n" + - "request_firstline_method,\n" + - "request_receive_time_last_month__utc,\n" + - "request_receive_time_millisecond,\n" + - "request_receive_time_day__utc,\n" + - "request_receive_time_year__utc,\n" + - "request_receive_time_weekofweekyear__utc,\n" + - "request_receive_time_second,\n" + - "request_firstline_original_uri_ref,\n" + - "connection_client_logname_last,\n" + - "request_receive_time_last_year,\n" + - "request_firstline_original_uri_path,\n" + - "connection_client_host,\n" + - "request_firstline_original_uri_query,\n" + - "request_referer_userinfo,\n" + - "request_receive_time_last_monthname,\n" + - "request_referer_path,\n" + - "request_receive_time_monthname,\n" + - "request_receive_time_last_month,\n" + - "request_referer_last_query,\n" + - "request_firstline_uri_ref,\n" + - "request_receive_time_last_day,\n" + - "request_receive_time_time,\n" + - "request_receive_time_last_weekofweekyear__utc,\n" + - "request_useragent_last,\n" + - "request_receive_time_last_weekyear,\n" + - "request_receive_time_last_microsecond,\n" + - "request_firstline_original,\n" + - "request_referer_last_path,\n" + - "request_receive_time_month,\n" + - "request_receive_time_last_day__utc,\n" + - "request_referer,\n" + - "request_referer_protocol,\n" + - "request_receive_time_monthname__utc,\n" + - "response_body_bytes_last,\n" + - "request_receive_time,\n" + - "request_receive_time_last_nanosecond,\n" + - "request_firstline_uri_path,\n" + - "request_firstline_original_uri_userinfo,\n" + - "request_receive_time_date__utc,\n" + - "request_receive_time_last,\n" + - "request_receive_time_last_nanosecond__utc,\n" + - "request_receive_time_last_hour,\n" + - "request_receive_time_hour__utc,\n" + - "request_receive_time_second__utc,\n" + - "connection_client_user_last,\n" + - "request_receive_time_weekyear__utc,\n" + - "connection_client_user\n" + - "FROM cp.`httpd/hackers-access-small.httpd`\n" + - "LIMIT 1"; - - testBuilder() - .sqlQuery(sql) - .unOrdered() - .baselineColumns("request_referer_ref", "request_receive_time_last_time", "request_firstline_uri_protocol", "request_receive_time_microsecond", "request_receive_time_last_microsecond__utc", "request_firstline_original_protocol", "request_firstline_original_uri_host", "request_referer_host", "request_receive_time_month__utc", "request_receive_time_last_minute", "request_firstline_protocol_version", "request_receive_time_time__utc", "request_referer_last_ref", 
"request_receive_time_last_timezone", "request_receive_time_last_weekofweekyear", "request_referer_last", "request_receive_time_minute", "connection_client_host_last", "request_receive_time_last_millisecond__utc", "request_firstline_original_uri", "request_firstline", "request_receive_time_nanosecond", "request_receive_time_last_millisecond", "request_receive_time_day", "request_referer_port", "request_firstline_original_uri_port", "request_receive_time_year", "request_receive_time_last_date", "request_receive_time_last_time__utc", "request_receive_time_last_hour__utc", "request_firstline_original_protocol_version", "request_firstline_original_method", "request_receive_time_last_year__utc", "request_firstline_uri", "request_referer_last_host", "request_receive_time_last_minute__utc", "request_receive_time_weekofweekyear", "request_firstline_uri_userinfo", "request_receive_time_epoch", "connection_client_logname", "response_body_bytes", "request_receive_time_nanosecond__utc", "request_firstline_protocol", "request_receive_time_microsecond__utc", "request_receive_time_hour", "request_firstline_uri_host", "request_referer_last_port", "request_receive_time_last_epoch", "request_receive_time_last_weekyear__utc", "request_useragent", "request_receive_time_weekyear", "request_receive_time_timezone", "response_body_bytesclf", "request_receive_time_last_date__utc", "request_receive_time_millisecond__utc", "request_referer_last_protocol", "request_status_last", "request_firstline_uri_query", "request_receive_time_minute__utc", "request_firstline_original_uri_protocol", "request_referer_query", "request_receive_time_date", "request_firstline_uri_port", "request_receive_time_last_second__utc", "request_referer_last_userinfo", "request_receive_time_last_second", "request_receive_time_last_monthname__utc", "request_firstline_method", "request_receive_time_last_month__utc", "request_receive_time_millisecond", "request_receive_time_day__utc", "request_receive_time_year__utc", "request_receive_time_weekofweekyear__utc", "request_receive_time_second", "request_firstline_original_uri_ref", "connection_client_logname_last", "request_receive_time_last_year", "request_firstline_original_uri_path", "connection_client_host", "request_firstline_original_uri_query", "request_referer_userinfo", "request_receive_time_last_monthname", "request_referer_path", "request_receive_time_monthname", "request_receive_time_last_month", "request_referer_last_query", "request_firstline_uri_ref", "request_receive_time_last_day", "request_receive_time_time", "request_receive_time_last_weekofweekyear__utc", "request_useragent_last", "request_receive_time_last_weekyear", "request_receive_time_last_microsecond", "request_firstline_original", "request_referer_last_path", "request_receive_time_month", "request_receive_time_last_day__utc", "request_referer", "request_referer_protocol", "request_receive_time_monthname__utc", "response_body_bytes_last", "request_receive_time", "request_receive_time_last_nanosecond", "request_firstline_uri_path", "request_firstline_original_uri_userinfo", "request_receive_time_date__utc", "request_receive_time_last", "request_receive_time_last_nanosecond__utc", "request_receive_time_last_hour", "request_receive_time_hour__utc", "request_receive_time_second__utc", "connection_client_user_last", "request_receive_time_weekyear__utc", "connection_client_user") - .baselineValues(null, "04:11:25", null, 0L, 0L, "HTTP", null, "howto.basjes.nl", 10L, 11L, "1.1", "03:11:25", null, null, 43L, 
"http://howto.basjes.nl/", 11L, "195.154.46.135", 0L, "/linux/doing-pxe-without-dhcp-control", "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", 0L, 0L, 25L, null, null, 2015L, "2015-10-25", "03:11:25", 3L, "1.1", "GET", 2015L, "/linux/doing-pxe-without-dhcp-control", "howto.basjes.nl", 11L, 43L, null, 1445742685000L, null, 24323L, 0L, "HTTP", 0L, 4L, null, null, 1445742685000L, 2015L, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015L, null, 24323L, "2015-10-25", 0L, "http", "200", "", 11L, null, "", "2015-10-25", null, 25L, null, 25L, "October", "GET", 10L, 0L, 25L, 2015L, 43L, 25L, null, null, 2015L, "/linux/doing-pxe-without-dhcp-control", "195.154.46.135", "", null, "October", "/", "October", 10L, "", null, 25L, "04:11:25", 43L, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015L, 0L, "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", "/", 10L, 25L, "http://howto.basjes.nl/", "http", "October", 24323L, LocalDateTime.parse("2015-10-25T03:11:25"), 0L, "/linux/doing-pxe-without-dhcp-control", null, "2015-10-25", LocalDateTime.parse("2015-10-25T03:11:25"), 0L, 4L, 3L, 25L, null, 2015L, null) - .go(); - } - - - @Test - public void testCount() throws Exception { - String sql = "SELECT COUNT(*) FROM cp.`httpd/hackers-access-small.httpd`"; - long result = client.queryBuilder().sql(sql).singletonLong(); - assertEquals(10, result); - } - - @Test - public void testStar() throws Exception { - String sql = "SELECT * FROM cp.`httpd/hackers-access-small.httpd` LIMIT 1"; - - testBuilder() - .sqlQuery(sql) - .unOrdered() - .baselineColumns("request_referer_ref","request_receive_time_last_time","request_firstline_uri_protocol","request_receive_time_microsecond","request_receive_time_last_microsecond__utc","request_firstline_original_uri_query_$","request_firstline_original_protocol","request_firstline_original_uri_host","request_referer_host","request_receive_time_month__utc","request_receive_time_last_minute","request_firstline_protocol_version","request_receive_time_time__utc","request_referer_last_ref","request_receive_time_last_timezone","request_receive_time_last_weekofweekyear","request_referer_last","request_receive_time_minute","connection_client_host_last","request_receive_time_last_millisecond__utc","request_firstline_original_uri","request_firstline","request_receive_time_nanosecond","request_receive_time_last_millisecond","request_receive_time_day","request_referer_port","request_firstline_original_uri_port","request_receive_time_year","request_receive_time_last_date","request_referer_query_$","request_receive_time_last_time__utc","request_receive_time_last_hour__utc","request_firstline_original_protocol_version","request_firstline_original_method","request_receive_time_last_year__utc","request_firstline_uri","request_referer_last_host","request_receive_time_last_minute__utc","request_receive_time_weekofweekyear","request_firstline_uri_userinfo","request_receive_time_epoch","connection_client_logname","response_body_bytes","request_receive_time_nanosecond__utc","request_firstline_protocol","request_receive_time_microsecond__utc","request_receive_time_hour","request_firstline_uri_host","request_referer_last_port","request_receive_time_last_epoch","request_receive_time_last_weekyear__utc","request_receive_time_weekyear","request_receive_time_timezone","response_body_bytesclf","request_receive_time_last_date__utc","request_useragent_last","request_useragent","request_receive_time_millisecond__utc","request_referer_last_protocol","request_sta
tus_last","request_firstline_uri_query","request_receive_time_minute__utc","request_firstline_original_uri_protocol","request_referer_query","request_receive_time_date","request_firstline_uri_port","request_receive_time_last_second__utc","request_referer_last_userinfo","request_receive_time_last_second","request_receive_time_last_monthname__utc","request_firstline_method","request_receive_time_last_month__utc","request_receive_time_millisecond","request_receive_time_day__utc","request_receive_time_year__utc","request_receive_time_weekofweekyear__utc","request_receive_time_second","request_firstline_original_uri_ref","connection_client_logname_last","request_receive_time_last_year","request_firstline_original_uri_path","connection_client_host","request_referer_last_query_$","request_firstline_original_uri_query","request_referer_userinfo","request_receive_time_last_monthname","request_referer_path","request_receive_time_monthname","request_receive_time_last_month","request_referer_last_query","request_firstline_uri_ref","request_receive_time_last_day","request_receive_time_time","request_receive_time_last_weekofweekyear__utc","request_receive_time_last_weekyear","request_receive_time_last_microsecond","request_firstline_original","request_firstline_uri_query_$","request_referer_last_path","request_receive_time_month","request_receive_time_last_day__utc","request_referer","request_referer_protocol","request_receive_time_monthname__utc","response_body_bytes_last","request_receive_time","request_receive_time_last_nanosecond","request_firstline_uri_path","request_firstline_original_uri_userinfo","request_receive_time_date__utc","request_receive_time_last","request_receive_time_last_nanosecond__utc","request_receive_time_last_hour","request_receive_time_hour__utc","request_receive_time_second__utc","connection_client_user_last","request_receive_time_weekyear__utc","connection_client_user") - .baselineValues(null,"04:11:25",null,0L,0L,new HashMap<>(),"HTTP",null,"howto.basjes.nl",10L,11L,"1.1","03:11:25",null,null,43L,"http://howto.basjes.nl/",11L,"195.154.46.135",0L,"/linux/doing-pxe-without-dhcp-control","GET /linux/doing-pxe-without-dhcp-control HTTP/1.1",0L,0L,25L,null,null,2015L,"2015-10-25",new HashMap<>(),"03:11:25",3L,"1.1","GET",2015L,"/linux/doing-pxe-without-dhcp-control","howto.basjes.nl",11L,43L,null,1445742685000L,null,24323L,0L,"HTTP",0L,4L,null,null,1445742685000L,2015L,2015L,null,24323L,"2015-10-25","Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0","Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0",0L,"http","200","",11L,null,"","2015-10-25",null,25L,null,25L,"October","GET",10L,0L,25L,2015L,43L,25L,null,null,2015L,"/linux/doing-pxe-without-dhcp-control","195.154.46.135",new HashMap<>(),"",null,"October","/","October",10L,"",null,25L,"04:11:25",43L,2015L,0L,"GET /linux/doing-pxe-without-dhcp-control HTTP/1.1",new HashMap<>(),"/",10L,25L,"http://howto.basjes.nl/","http","October",24323L,LocalDateTime.parse("2015-10-25T03:11:25"),0L,"/linux/doing-pxe-without-dhcp-control",null,"2015-10-25",LocalDateTime.parse("2015-10-25T03:11:25"),0L,4L,3L,25L,null,2015L,null) - .go(); - } -} diff --git a/exec/java-exec/src/test/resources/plugins/mock-plugin-upgrade.json b/exec/java-exec/src/test/resources/plugins/mock-plugin-upgrade.json index ad39fa1e0d6..36b12d5e4bd 100644 --- a/exec/java-exec/src/test/resources/plugins/mock-plugin-upgrade.json +++ b/exec/java-exec/src/test/resources/plugins/mock-plugin-upgrade.json @@ -26,11 +26,6 @@ "extensions" : [ "tsv" ], 
"delimiter" : "\t" }, - "httpd" : { - "type" : "httpd", - "logFormat" : "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"", - "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ" - }, "parquet" : { "type" : "parquet" }, @@ -152,11 +147,6 @@ "extensions" : [ "tsv" ], "delimiter" : "\t" }, - "httpd" : { - "type" : "httpd", - "logFormat" : "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"", - "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ" - }, "parquet" : { "type" : "parquet" }, diff --git a/exec/java-exec/src/test/resources/store/httpd/dfs-test-bootstrap-test.httpd b/exec/java-exec/src/test/resources/store/httpd/dfs-test-bootstrap-test.httpd deleted file mode 100644 index d48fa12a4b8..00000000000 --- a/exec/java-exec/src/test/resources/store/httpd/dfs-test-bootstrap-test.httpd +++ /dev/null @@ -1,5 +0,0 @@ -195.154.46.135 - - [25/Oct/2015:04:11:25 +0100] "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1" 200 24323 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0" -23.95.237.180 - - [25/Oct/2015:04:11:26 +0100] "GET /join_form HTTP/1.0" 200 11114 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0" -23.95.237.180 - - [25/Oct/2015:04:11:27 +0100] "POST /join_form HTTP/1.1" 302 9093 "http://howto.basjes.nl/join_form" "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0" -158.222.5.157 - - [25/Oct/2015:04:24:31 +0100] "GET /join_form HTTP/1.0" 200 11114 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21" -158.222.5.157 - - [25/Oct/2015:04:24:32 +0100] "POST /join_form HTTP/1.1" 302 9093 "http://howto.basjes.nl/join_form" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21"