diff --git a/contrib/format-httpd/README.md b/contrib/format-httpd/README.md
new file mode 100644
index 00000000000..4d45c0ac390
--- /dev/null
+++ b/contrib/format-httpd/README.md
@@ -0,0 +1,75 @@
+# Web Server Log Format Plugin (HTTPD)
+This plugin enables Drill to read and query httpd (Apache Web Server) and nginx access logs natively. It is built on the [logparser library](https://github.com/nielsbasjes/logparser) by [Niels Basjes](https://github.com/nielsbasjes).
+
+## Configuration
+There are five fields which you can configure in order for Drill to read web server logs. In general the defaults should be fine; the fields are:
+* **`logFormat`**: The format string found in your web server configuration. If you have multiple log formats, you can add all of them in this single parameter, separated by a newline (`\n`). The parser will automatically select the first matching format.
+* **`timestampFormat`**: The format of timestamps in your log files. This setting is optional and is almost never needed.
+* **`extensions`**: The file extension of your web server logs. Defaults to `httpd`.
+* **`maxErrors`**: Sets the plugin's error tolerance. When set to any value less than `0`, Drill ignores all errors. If unspecified, `maxErrors` defaults to `0`, which causes the query to fail on the first error.
+* **`flattenWildcards`**: The parser extracts a few variables (such as URI query parameters) into Drill maps. When set to `true`, Drill flattens these maps into regular top-level columns. Defaults to `false`. A sample configuration is shown below:
+
+
+```json
+"httpd" : {
+ "type" : "httpd",
+ "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+ "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ",
+ "maxErrors": 0,
+ "flattenWildcards": false
+}
+```
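+
+If your server writes more than one format (for example, both the combined and common log formats), you can supply both in `logFormat`, separated by `\n`. An illustrative sketch of such a configuration:
+
+```json
+"httpd" : {
+  "type" : "httpd",
+  "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"\n%h %l %u %t \"%r\" %s %b",
+  "maxErrors": 0,
+  "flattenWildcards": false
+}
+```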
+
+## Data Model
+The fields which Drill returns from HTTPD access logs should be fairly self-explanatory, and all are mapped to the correct data types. For instance, timestamp fields are all Drill `TIMESTAMP`s, and so forth.
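+
+For example, the `request_receive_time` column arrives as a Drill `TIMESTAMP`, so it works directly with Drill's date/time functions. A minimal sketch (the file path is illustrative):
+
+```sql
+SELECT `request_receive_time`,
+       EXTRACT(HOUR FROM `request_receive_time`) AS request_hour
+FROM dfs.test.`logfile.httpd`
+```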
+
+### Nested Columns
+The HTTPD parser can produce a few columns of nested data. For instance, the various `query_string` columns are parsed into Drill maps so that if you want to look for a specific field, you can do so.
+
+Drill allows you to access map fields directly with the following format:
+```
+<table alias>.`<map column>`.`<field name>`
+```
+One note is that in order to access a map, you must assign an alias to your table, as shown below:
+```sql
+SELECT mylogs.`request_firstline_uri_query_$`.`username` AS username
+FROM dfs.test.`logfile.httpd` AS mylogs
+```
+In this example, we assign an alias of `mylogs` to the table; the column name is `request_firstline_uri_query_$`, and the individual field within that map is `username`. This particular example enables you to analyze items in query strings.
+
+### Flattening Maps
+In the event that you have a map field that you would like broken into columns rather than getting the nested fields, you can set the `flattenWildcards` option to `true` and Drill will create regular columns for these fields. For example, if you have a URI query parameter called `username` and you enable `flattenWildcards`, Drill will create a field called `request_firstline_uri_query_username`.
+
+**Note:** underscores in the field name are replaced with double underscores.
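+
+You can also enable this option for a single query with Drill's `table()` syntax rather than editing the storage configuration. A sketch, reusing the log format from the configuration example above (the file path is illustrative):
+
+```sql
+SELECT `request_firstline_uri_query_username`
+FROM table(dfs.test.`logfile.httpd` (type => 'httpd',
+  logFormat => '%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"',
+  flattenWildcards => true))
+```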
+
+## Useful Functions
+If you are using Drill to analyze web access logs, there are a few other useful functions which you should know about:
+
+* `parse_url()`: This function accepts a URL as an argument and returns a map of the URL's protocol, authority, host, and path.
+* `parse_query()`: This function accepts a query string and returns a key/value pairing of the variables submitted in the request.
+* `parse_user_agent()`: This function takes a user agent string as an argument and returns a map of the available fields. Note that not every field will be present in every user agent string. A two-argument form, `parse_user_agent(<user agent>, <desired field>)`, returns only the requested field; see the query sketch after this list.
+  [Complete Docs Here](https://github.com/apache/drill/tree/master/contrib/udfs#user-agent-functions)
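+
+As an example, the following sketch counts requests by browser name. The `request_user-agent` column name and file path are assumptions based on the parser's default naming; `AgentName` is one of the fields produced by the user agent parser:
+
+```sql
+SELECT browser, COUNT(*) AS hits
+FROM (
+  SELECT parse_user_agent(`request_user-agent`, 'AgentName') AS browser
+  FROM dfs.test.`logfile.httpd`
+) AS ua_data
+GROUP BY browser
+ORDER BY hits DESC
+```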
+
+
+## Implicit Columns
+Data queried by this plugin will return two implicit columns:
+
+* **`_raw`**: Returns the raw, unparsed log line.
+* **`_matched`**: Returns `true` or `false` depending on whether the line matched the config string.
+
+Thus, if you wanted to see which lines in your log file were not matching the config, you could use the following query:
+
+```sql
+SELECT _raw
+FROM dfs.test.`logfile.httpd`
+WHERE _matched = false
+```
\ No newline at end of file
diff --git a/contrib/format-httpd/pom.xml b/contrib/format-httpd/pom.xml
new file mode 100644
index 00000000000..50ae6185b7b
--- /dev/null
+++ b/contrib/format-httpd/pom.xml
@@ -0,0 +1,100 @@
+<?xml version="1.0"?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <artifactId>drill-contrib-parent</artifactId>
+    <groupId>org.apache.drill.contrib</groupId>
+    <version>1.19.0-SNAPSHOT</version>
+  </parent>
+  <artifactId>drill-format-httpd</artifactId>
+  <name>contrib/httpd-format-plugin</name>
+
+  <dependencies>
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <version>${project.version}</version>
+    </dependency>
+
+    <dependency>
+      <groupId>nl.basjes.parse.httpdlog</groupId>
+      <artifactId>httpdlog-parser</artifactId>
+      <version>5.6</version>
+      <exclusions>
+        <exclusion>
+          <groupId>commons-codec</groupId>
+          <artifactId>commons-codec</artifactId>
+        </exclusion>
+        <exclusion>
+          <groupId>commons-logging</groupId>
+          <artifactId>commons-logging</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+
+    <dependency>
+      <groupId>nl.basjes.parse.useragent</groupId>
+      <artifactId>yauaa-logparser</artifactId>
+      <version>5.19</version>
+    </dependency>
+
+    <!-- Test dependencies -->
+    <dependency>
+      <groupId>org.apache.drill.exec</groupId>
+      <artifactId>drill-java-exec</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>org.apache.drill</groupId>
+      <artifactId>drill-common</artifactId>
+      <classifier>tests</classifier>
+      <version>${project.version}</version>
+      <scope>test</scope>
+    </dependency>
+  </dependencies>
+
+  <build>
+    <plugins>
+      <plugin>
+        <artifactId>maven-resources-plugin</artifactId>
+        <executions>
+          <execution>
+            <id>copy-java-sources</id>
+            <phase>process-sources</phase>
+            <goals>
+              <goal>copy-resources</goal>
+            </goals>
+            <configuration>
+              <outputDirectory>${basedir}/target/classes/org/apache/drill/exec/store/httpd</outputDirectory>
+              <resources>
+                <resource>
+                  <directory>src/main/java/org/apache/drill/exec/store/httpd</directory>
+                  <filtering>true</filtering>
+                </resource>
+              </resources>
+            </configuration>
+          </execution>
+        </executions>
+      </plugin>
+    </plugins>
+  </build>
+</project>
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java
new file mode 100644
index 00000000000..07f14393856
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogBatchReader.java
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
+import org.apache.hadoop.mapred.FileSplit;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.BufferedReader;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+
+public class HttpdLogBatchReader implements ManagedReader<FileSchemaNegotiator> {
+
+ private static final Logger logger = LoggerFactory.getLogger(HttpdLogBatchReader.class);
+ public static final String RAW_LINE_COL_NAME = "_raw";
+ public static final String MATCHED_COL_NAME = "_matched";
+ private final HttpdLogFormatConfig formatConfig;
+ private final int maxRecords;
+ private final EasySubScan scan;
+ private HttpdParser parser;
+ private FileSplit split;
+ private InputStream fsStream;
+ private RowSetLoader rowWriter;
+ private BufferedReader reader;
+ private int lineNumber;
+ private CustomErrorContext errorContext;
+ private ScalarWriter rawLineWriter;
+ private ScalarWriter matchedWriter;
+ private int errorCount;
+
+
+ public HttpdLogBatchReader(HttpdLogFormatConfig formatConfig, int maxRecords, EasySubScan scan) {
+ this.formatConfig = formatConfig;
+ this.maxRecords = maxRecords;
+ this.scan = scan;
+ }
+
+ @Override
+ public boolean open(FileSchemaNegotiator negotiator) {
+ // Open the input stream to the log file
+ openFile(negotiator);
+ errorContext = negotiator.parentErrorContext();
+ try {
+ parser = new HttpdParser(formatConfig.getLogFormat(), formatConfig.getTimestampFormat(), formatConfig.getFlattenWildcards(), scan);
+ negotiator.tableSchema(parser.setupParser(), false);
+ } catch (Exception e) {
+ throw UserException.dataReadError(e)
+ .message("Error opening HTTPD file: " + e.getMessage())
+ .addContext(errorContext)
+ .build(logger);
+ }
+
+ ResultSetLoader loader = negotiator.build();
+ rowWriter = loader.writer();
+ parser.addFieldsToParser(rowWriter);
+ rawLineWriter = addImplicitColumn(RAW_LINE_COL_NAME, MinorType.VARCHAR);
+ matchedWriter = addImplicitColumn(MATCHED_COL_NAME, MinorType.BIT);
+ return true;
+ }
+
+ @Override
+ public boolean next() {
+ while (!rowWriter.isFull()) {
+ if (!nextLine(rowWriter)) {
+ return false;
+ }
+ }
+ return true;
+ }
+
+ private boolean nextLine(RowSetLoader rowWriter) {
+ String line;
+
+ // Check if the limit has been reached
+ if (rowWriter.limitReached(maxRecords)) {
+ return false;
+ }
+
+ try {
+ line = reader.readLine();
+ if (line == null) {
+ return false;
+ } else if (line.isEmpty()) {
+ return true;
+ }
+ } catch (Exception e) {
+ throw UserException.dataReadError(e)
+ .message("Error reading HTTPD file at line number %d", lineNumber)
+ .addContext(e.getMessage())
+ .addContext(errorContext)
+ .build(logger);
+ }
+ // Start the row
+ rowWriter.start();
+
+ try {
+ parser.parse(line);
+ matchedWriter.setBoolean(true);
+ } catch (Exception e) {
+ errorCount++;
+ // A negative maxErrors value means all parse errors are ignored.
+ if (formatConfig.getMaxErrors() >= 0 && errorCount >= formatConfig.getMaxErrors()) {
+ throw UserException.dataReadError()
+ .message("Error reading HTTPD file at line number %d", lineNumber)
+ .addContext(e.getMessage())
+ .addContext(errorContext)
+ .build(logger);
+ } else {
+ matchedWriter.setBoolean(false);
+ }
+ }
+
+ // Write raw line
+ rawLineWriter.setString(line);
+
+ // Finish the row
+ rowWriter.save();
+ lineNumber++;
+
+ return true;
+ }
+
+ @Override
+ public void close() {
+ if (fsStream == null) {
+ return;
+ }
+ try {
+ fsStream.close();
+ } catch (IOException e) {
+ logger.warn("Error when closing HTTPD file: {} {}", split.getPath().toString(), e.getMessage());
+ }
+ fsStream = null;
+ }
+
+ private void openFile(FileSchemaNegotiator negotiator) {
+ split = negotiator.split();
+ try {
+ fsStream = negotiator.fileSystem().openPossiblyCompressedStream(split.getPath());
+ } catch (Exception e) {
+ throw UserException
+ .dataReadError(e)
+ .message("Failed to open open input file: %s", split.getPath().toString())
+ .addContext(e.getMessage())
+ .build(logger);
+ }
+ reader = new BufferedReader(new InputStreamReader(fsStream, Charsets.UTF_8));
+ }
+
+ private ScalarWriter addImplicitColumn(String colName, MinorType type) {
+ ColumnMetadata colSchema = MetadataUtils.newScalar(colName, type, TypeProtos.DataMode.OPTIONAL);
+ colSchema.setBooleanProperty(ColumnMetadata.EXCLUDE_FROM_WILDCARD, true);
+ int index = rowWriter.addColumn(colSchema);
+
+ return rowWriter.scalar(index);
+ }
+}
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
similarity index 56%
rename from exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
rename to contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
index 0aa7ecefd81..a1f56177328 100644
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatConfig.java
@@ -17,33 +17,46 @@
*/
package org.apache.drill.exec.store.httpd;
-import java.util.Objects;
-
-import org.apache.drill.common.PlanStringBuilder;
-import org.apache.drill.common.logical.FormatPluginConfig;
-
import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonInclude;
import com.fasterxml.jackson.annotation.JsonProperty;
import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.common.PlanStringBuilder;
+import org.apache.drill.common.logical.FormatPluginConfig;
+import org.apache.drill.shaded.guava.com.google.common.collect.ImmutableList;
-@JsonTypeName("httpd")
+import java.util.Collections;
+import java.util.List;
+import java.util.Objects;
+
+@JsonTypeName(HttpdLogFormatPlugin.DEFAULT_NAME)
@JsonInclude(JsonInclude.Include.NON_DEFAULT)
public class HttpdLogFormatConfig implements FormatPluginConfig {
public static final String DEFAULT_TS_FORMAT = "dd/MMM/yyyy:HH:mm:ss ZZ";
+ public final String logFormat;
+ public final String timestampFormat;
+ public final List<String> extensions;
+ public final boolean flattenWildcards;
+ public final int maxErrors;
- // No extensions?
- private final String logFormat;
- private final String timestampFormat;
@JsonCreator
public HttpdLogFormatConfig(
+ @JsonProperty("extensions") List extensions,
@JsonProperty("logFormat") String logFormat,
- @JsonProperty("timestampFormat") String timestampFormat) {
+ @JsonProperty("timestampFormat") String timestampFormat,
+ @JsonProperty("maxErrors") int maxErrors,
+ @JsonProperty("flattenWildcards") boolean flattenWildcards
+ ) {
+
+ this.extensions = extensions == null
+ ? Collections.singletonList("httpd")
+ : ImmutableList.copyOf(extensions);
this.logFormat = logFormat;
- this.timestampFormat = timestampFormat == null
- ? DEFAULT_TS_FORMAT : timestampFormat;
+ this.timestampFormat = timestampFormat;
+ this.maxErrors = maxErrors;
+ this.flattenWildcards = flattenWildcards;
}
/**
@@ -61,23 +74,32 @@ public String getTimestampFormat() {
return timestampFormat;
}
+ public List<String> getExtensions() {
+ return extensions;
+ }
+
+ public int getMaxErrors() { return maxErrors; }
+
+ public boolean getFlattenWildcards() { return flattenWildcards; }
+
@Override
public int hashCode() {
- return Objects.hash(logFormat, timestampFormat);
+ return Objects.hash(logFormat, timestampFormat, maxErrors, flattenWildcards);
}
@Override
- public boolean equals(Object o) {
- if (this == o) {
+ public boolean equals(Object obj) {
+ if (this == obj) {
return true;
}
- if (o == null || getClass() != o.getClass()) {
+ if (obj == null || getClass() != obj.getClass()) {
return false;
}
-
- HttpdLogFormatConfig that = (HttpdLogFormatConfig) o;
- return Objects.equals(logFormat, that.logFormat) &&
- Objects.equals(timestampFormat, that.timestampFormat);
+ HttpdLogFormatConfig other = (HttpdLogFormatConfig) obj;
+ return Objects.equals(logFormat, other.logFormat)
+ && Objects.equals(timestampFormat, other.timestampFormat)
+ && Objects.equals(maxErrors, other.maxErrors)
+ && Objects.equals(flattenWildcards, other.flattenWildcards);
}
@Override
@@ -85,6 +107,8 @@ public String toString() {
return new PlanStringBuilder(this)
.field("log format", logFormat)
.field("timestamp format", timestampFormat)
+ .field("max errors", maxErrors)
+ .field("flattenWildcards", flattenWildcards)
.toString();
}
}
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
new file mode 100644
index 00000000000..674bfdb7cd6
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.common.logical.StoragePluginConfig;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.Types;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileReaderFactory;
+import org.apache.drill.exec.physical.impl.scan.file.FileScanFramework.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.impl.scan.framework.ManagedReader;
+import org.apache.drill.exec.proto.UserBitShared;
+import org.apache.drill.exec.server.DrillbitContext;
+import org.apache.drill.exec.server.options.OptionManager;
+import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.hadoop.conf.Configuration;
+
+public class HttpdLogFormatPlugin extends EasyFormatPlugin<HttpdLogFormatConfig> {
+
+ protected static final String DEFAULT_NAME = "httpd";
+
+ private static class HttpdLogReaderFactory extends FileReaderFactory {
+
+ private final HttpdLogFormatConfig config;
+ private final int maxRecords;
+ private final EasySubScan scan;
+
+ private HttpdLogReaderFactory(HttpdLogFormatConfig config, int maxRecords, EasySubScan scan) {
+ this.config = config;
+ this.maxRecords = maxRecords;
+ this.scan = scan;
+ }
+
+ @Override
+ public ManagedReader<? extends FileScanFramework.FileSchemaNegotiator> newReader() {
+ return new HttpdLogBatchReader(config, maxRecords, scan);
+ }
+ }
+
+ public HttpdLogFormatPlugin(final String name,
+ final DrillbitContext context,
+ final Configuration fsConf,
+ final StoragePluginConfig storageConfig,
+ final HttpdLogFormatConfig formatConfig) {
+
+ super(name, easyConfig(fsConf, formatConfig), context, storageConfig, formatConfig);
+ }
+
+ private static EasyFormatConfig easyConfig(Configuration fsConf, HttpdLogFormatConfig pluginConfig) {
+ EasyFormatConfig config = new EasyFormatConfig();
+ config.readable = true;
+ config.writable = false;
+ config.blockSplittable = false;
+ config.compressible = true;
+ config.supportsProjectPushdown = true;
+ config.extensions = pluginConfig.getExtensions();
+ config.fsConf = fsConf;
+ config.defaultName = DEFAULT_NAME;
+ config.readerOperatorType = UserBitShared.CoreOperatorType.HTPPD_LOG_SUB_SCAN_VALUE;
+ config.useEnhancedScan = true;
+ config.supportsLimitPushdown = true;
+ return config;
+ }
+
+ @Override
+ public ManagedReader<? extends FileSchemaNegotiator> newBatchReader(
+ EasySubScan scan, OptionManager options) {
+ return new HttpdLogBatchReader(formatConfig, scan.getMaxRecords(), scan);
+ }
+
+ @Override
+ protected FileScanFramework.FileScanBuilder frameworkBuilder(OptionManager options, EasySubScan scan) {
+ FileScanFramework.FileScanBuilder builder = new FileScanFramework.FileScanBuilder();
+ builder.setReaderFactory(new HttpdLogReaderFactory(formatConfig, scan.getMaxRecords(), scan));
+
+ initScanBuilder(builder, scan);
+ builder.nullType(Types.optional(TypeProtos.MinorType.VARCHAR));
+ return builder;
+ }
+}
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
new file mode 100644
index 00000000000..8f2c73acbb7
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
@@ -0,0 +1,482 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.ColumnMetadata;
+import org.apache.drill.exec.record.metadata.MetadataUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.Map;
+
+import nl.basjes.parse.core.Casts;
+import nl.basjes.parse.core.Parser;
+import org.joda.time.Instant;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.text.SimpleDateFormat;
+import java.util.Date;
+
+public class HttpdLogRecord {
+
+ private static final Logger logger = LoggerFactory.getLogger(HttpdLogRecord.class);
+
+ private final Map<String, ScalarWriter> strings = Maps.newHashMap();
+ private final Map<String, ScalarWriter> longs = Maps.newHashMap();
+ private final Map<String, ScalarWriter> doubles = Maps.newHashMap();
+ private final Map<String, ScalarWriter> dates = Maps.newHashMap();
+ private final Map<String, ScalarWriter> times = Maps.newHashMap();
+ private final Map<String, ScalarWriter> timestamps = new HashMap<>();
+ private final Map<String, TupleWriter> wildcards = Maps.newHashMap();
+ private final Map<String, String> cleanExtensions = Maps.newHashMap();
+ private final Map<String, TupleWriter> startedWildcards = Maps.newHashMap();
+ private final Map<String, TupleWriter> wildcardWriters = Maps.newHashMap();
+ private final SimpleDateFormat dateFormatter;
+ private RowSetLoader rootRowWriter;
+ private final boolean flattenWildcards;
+
+ public HttpdLogRecord(String timeFormat, boolean flattenWildcards) {
+ if (timeFormat == null) {
+ timeFormat = HttpdLogFormatConfig.DEFAULT_TS_FORMAT;
+ }
+ this.dateFormatter = new SimpleDateFormat(timeFormat);
+ this.flattenWildcards = flattenWildcards;
+ }
+
+ /**
+ * Call this method after a record has been parsed. It finishes the lifecycle of any maps that were written and
+ * clears all the entries so that the next record can be written.
+ */
+ public void finishRecord() {
+ wildcardWriters.clear();
+ startedWildcards.clear();
+ }
+
+ /**
+ * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+ * called when the value of a log field is a String data type.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void set(String field, String value) {
+ if (value != null) {
+ final ScalarWriter w = strings.get(field);
+ if (w != null) {
+ logger.debug("Parsed field: {}, as string: {}", field, value);
+ w.setString(value);
+ } else {
+ logger.warn("No 'string' writer found for field: {}", field);
+ }
+ }
+ }
+
+ /**
+ * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+ * called when the value of a log field is a Long data type.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void set(String field, Long value) {
+ if (value != null) {
+ final ScalarWriter w = longs.get(field);
+ if (w != null) {
+ logger.debug("Parsed field: {}, as long: {}", field, value);
+ w.setLong(value);
+ } else {
+ logger.warn("No 'long' writer found for field: {}", field);
+ }
+ }
+ }
+
+ /**
+ * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+ * called when the value of a log field is a Date data type.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void setDate(String field, String value) {
+ if (value != null) {
+ final ScalarWriter w = dates.get(field);
+ if (w != null) {
+ logger.debug("Parsed field: {}, as long: {}", field, value);
+ w.setDate(new LocalDate(value));
+ } else {
+ logger.warn("No 'date' writer found for field: {}", field);
+ }
+ }
+ }
+
+ /**
+ * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+ * called when the value of a log field is a Time data type.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void setTime(String field, String value) {
+ if (value != null) {
+ final ScalarWriter w = times.get(field);
+ if (w != null) {
+ logger.debug("Parsed field: {}, as long: {}", field, value);
+ w.setTime(new LocalTime(value));
+ } else {
+ logger.warn("No 'date' writer found for field: {}", field);
+ }
+ }
+ }
+
+ /**
+ * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+ * called when the value of a log field is a timestamp data type.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void setTimestampFromEpoch(String field, Long value) {
+ if (value != null) {
+ final ScalarWriter w = timestamps.get(field);
+ if (w != null) {
+ logger.debug("Parsed field: {}, as timestamp: {}", field, value);
+ w.setTimestamp(new Instant(value));
+ } else {
+ logger.warn("No 'timestamp' writer found for field: {}", field);
+ }
+ }
+ }
+
+
+ /**
+ * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+ * called when the value of a log field is a timestamp data type.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void setTimestamp(String field, String value) {
+ if (value != null) {
+ //Convert the date string into a long
+ long ts = 0;
+ try {
+ Date d = this.dateFormatter.parse(value);
+ ts = d.getTime();
+ } catch (Exception e) {
+ //If the date formatter does not successfully create a date, the timestamp will fall back to zero
+ //Do not throw exception
+ }
+ final ScalarWriter tw = timestamps.get(field);
+ if (tw != null) {
+ logger.debug("Parsed field: {}, as time: {}", field, value);
+ tw.setTimestamp(new Instant(ts));
+ } else {
+ logger.warn("No 'timestamp' writer found for field: {}", field);
+ }
+ }
+ }
+
+ /**
+ * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
+ * called when the value of a log field is a Double data type.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void set(String field, Double value) {
+ if (value != null) {
+ final ScalarWriter w = doubles.get(field);
+ if (w != null) {
+ logger.debug("Parsed field: {}, as double: {}", field, value);
+ w.setDouble(value);
+ } else {
+ logger.warn("No 'double' writer found for field: {}", field);
+ }
+ }
+ }
+
+ /**
+ * This method is referenced and called via reflection. When the parser processes a field like:
+ * HTTP.URI:request.firstline.uri.query.* where star is an arbitrary field that the parser found this method will be
+ * invoked.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void setWildcard(String field, String value) {
+ if (value != null) {
+ String cleanedField = HttpdUtils.getFieldNameFromMap(field);
+ if (flattenWildcards) {
+ String drillFieldName = HttpdUtils.drillFormattedFieldName(field);
+ ScalarWriter writer = getColWriter(rootRowWriter, drillFieldName, MinorType.VARCHAR);
+ writer.setString(value);
+ } else {
+ final TupleWriter mapWriter = getWildcardWriter(field);
+ logger.debug("Parsed wildcard field: {}, as String: {}", field, value);
+ writeStringColumn(mapWriter, cleanedField, value);
+ }
+ }
+ }
+
+ /**
+ * This method is referenced and called via reflection. When the parser processes a field like:
+ * HTTP.URI:request.firstline.uri.query.* where star is an arbitrary field that the parser found this method will be
+ * invoked.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void setWildcard(String field, Long value) {
+ if (value != null) {
+ String cleanedField = HttpdUtils.getFieldNameFromMap(field);
+
+ if (flattenWildcards) {
+ String drillFieldName = HttpdUtils.drillFormattedFieldName(field);
+ ScalarWriter writer = getColWriter(rootRowWriter, drillFieldName, MinorType.BIGINT);
+ writer.setLong(value);
+ } else {
+ final TupleWriter mapWriter = getWildcardWriter(field);
+ logger.debug("Parsed wildcard field: {}, as long: {}", field, value);
+ writeLongColumn(mapWriter, cleanedField, value);
+ }
+ }
+ }
+
+ /**
+ * This method is referenced and called via reflection. When the parser processes a field like:
+ * HTTP.URI:request.firstline.uri.query.* where star is an arbitrary field that the parser found this method will be
+ * invoked.
+ *
+ * @param field name of field
+ * @param value value of field
+ */
+ @SuppressWarnings("unused")
+ public void setWildcard(String field, Double value) {
+ if (value != null) {
+ String cleanedField = HttpdUtils.getFieldNameFromMap(field);
+
+ if (flattenWildcards) {
+ String drillFieldName = HttpdUtils.drillFormattedFieldName(field);
+ ScalarWriter writer = getColWriter(rootRowWriter, drillFieldName, MinorType.FLOAT8);
+ writer.setDouble(value);
+ } else {
+ final TupleWriter mapWriter = getWildcardWriter(field);
+ logger.debug("Parsed wildcard field: {}, as double: {}", field, value);
+ writeFloatColumn(mapWriter, cleanedField, value);
+ }
+ }
+ }
+
+ /**
+ * For a configuration like HTTP.URI:request.firstline.uri.query.*, a writer was created with the name
+ * HTTP.URI:request.firstline.uri.query. We traverse the list of wildcard writers to see which one is the root of the
+ * name of the field passed in, like HTTP.URI:request.firstline.uri.query.old. This is the writer entry that is needed.
+ *
+ * @param field like HTTP.URI:request.firstline.uri.query.old where 'old' is one of many different parameter names.
+ * @return the writer to be used for this field.
+ */
+ private TupleWriter getWildcardWriter(String field) {
+
+ TupleWriter writer = startedWildcards.get(field);
+ if (writer == null) {
+ for (Map.Entry<String, TupleWriter> entry : wildcards.entrySet()) {
+ String root = entry.getKey();
+ if (field.startsWith(root)) {
+ writer = entry.getValue();
+ /*
+ * In order to save some time, store the cleaned version of the field extension. It is possible it will have
+ * unsafe characters in it.
+ */
+ if (!cleanExtensions.containsKey(field)) {
+ String extension = field.substring(root.length() + 1);
+ String cleanExtension = HttpdUtils.drillFormattedFieldName(extension);
+ cleanExtensions.put(field, cleanExtension);
+ logger.debug("Added extension: field='{}' with cleanExtension='{}'", field, cleanExtension);
+ }
+
+ /*
+ * We already know we have the writer, but if we have put this writer in the started list, do NOT call start
+ * again.
+ */
+ if (!wildcardWriters.containsKey(root)) {
+ /*
+ * Start and store this root map writer for later retrieval.
+ */
+ logger.debug("Starting new wildcard field writer: {}", field);
+ startedWildcards.put(field, writer);
+ wildcardWriters.put(root, writer);
+ }
+ /*
+ * Break out of the for loop when we find a root writer that matches the field.
+ */
+ break;
+ }
+ }
+ }
+
+ return writer;
+ }
+
+ public Map<String, ScalarWriter> getStrings() {
+ return strings;
+ }
+
+ public Map<String, ScalarWriter> getLongs() {
+ return longs;
+ }
+
+ public Map<String, ScalarWriter> getDoubles() {
+ return doubles;
+ }
+
+ public Map<String, ScalarWriter> getTimestamps() {
+ return timestamps;
+ }
+
+ /**
+ * This record will be used with a single parser. For each field that is to be parsed, a setter will be called. It
+ * registers a setter method for each field being parsed. It also builds the data writers to hold the data being
+ * parsed.
+ *
+ * @param parser The initialized HttpdParser
+ * @param rowWriter An initialized RowSetLoader object
+ * @param type The Drill MinorType which sets the data type in the rowWriter
+ * @param parserFieldName The field name which is generated by the Httpd Parser. These are not "Drill safe"
+ * @param drillFieldName The Drill safe field name
+ * @param mappedColumns A list of columns mapped to their correct Drill data type
+ * @throws NoSuchMethodException Thrown in the event that the parser does not have a correct setter method
+ */
+ public void addField(final Parser<HttpdLogRecord> parser,
+ final RowSetLoader rowWriter,
+ final EnumSet<Casts> type,
+ final String parserFieldName,
+ final String drillFieldName,
+ Map<String, MinorType> mappedColumns) throws NoSuchMethodException {
+ final boolean hasWildcard = parserFieldName.endsWith(HttpdParser.PARSER_WILDCARD);
+
+ logger.debug("Field name: {}", parserFieldName);
+ rootRowWriter = rowWriter;
+ /*
+ * This is a dynamic way to map the setter for each specified field type.
+ * e.g. a TIME.EPOCH may map to a LONG while a referrer may map to a STRING
+ */
+ if (hasWildcard) {
+ final String cleanName = parserFieldName.substring(0, parserFieldName.length() - HttpdParser.PARSER_WILDCARD.length());
+ logger.debug("Adding WILDCARD parse target: {} as {}, with field name: {}", parserFieldName, cleanName, drillFieldName);
+ parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, String.class), parserFieldName);
+ parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, Double.class), parserFieldName);
+ parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, Long.class), parserFieldName);
+ wildcards.put(cleanName, getMapWriter(drillFieldName, rowWriter));
+ } else if (type.contains(Casts.DOUBLE) || mappedColumns.get(drillFieldName) == MinorType.FLOAT8) {
+ parser.addParseTarget(this.getClass().getMethod("set", String.class, Double.class), parserFieldName);
+ doubles.put(parserFieldName, rowWriter.scalar(drillFieldName));
+ } else if (type.contains(Casts.LONG) || mappedColumns.get(drillFieldName) == MinorType.BIGINT) {
+ parser.addParseTarget(this.getClass().getMethod("set", String.class, Long.class), parserFieldName);
+ longs.put(parserFieldName, rowWriter.scalar(drillFieldName));
+ } else {
+ if (parserFieldName.startsWith("TIME.STAMP:")) {
+ parser.addParseTarget(this.getClass().getMethod("setTimestamp", String.class, String.class), parserFieldName);
+ timestamps.put(parserFieldName, rowWriter.scalar(drillFieldName));
+ } else if (parserFieldName.startsWith("TIME.EPOCH:")) {
+ parser.addParseTarget(this.getClass().getMethod("setTimestampFromEpoch", String.class, Long.class), parserFieldName);
+ timestamps.put(parserFieldName, rowWriter.scalar(drillFieldName));
+ } else if (parserFieldName.startsWith("TIME.DATE")) {
+ parser.addParseTarget(this.getClass().getMethod("setDate", String.class, String.class), parserFieldName);
+ dates.put(parserFieldName, rowWriter.scalar(drillFieldName));
+ } else if (parserFieldName.startsWith("TIME.TIME")) {
+ parser.addParseTarget(this.getClass().getMethod("setTime", String.class, String.class), parserFieldName);
+ times.put(parserFieldName, rowWriter.scalar(drillFieldName));
+ } else {
+ parser.addParseTarget(this.getClass().getMethod("set", String.class, String.class), parserFieldName);
+ strings.put(parserFieldName, rowWriter.scalar(drillFieldName));
+ }
+ }
+ }
+
+ private TupleWriter getMapWriter(String mapName, RowSetLoader rowWriter) {
+ int index = rowWriter.tupleSchema().index(mapName);
+ if (index == -1) {
+ index = rowWriter.addColumn(SchemaBuilder.columnSchema(mapName, TypeProtos.MinorType.MAP, TypeProtos.DataMode.REQUIRED));
+ }
+ return rowWriter.tuple(index);
+ }
+
+ /**
+ * Helper function to write a 1D long column
+ *
+ * @param rowWriter The row to which the data will be written
+ * @param name The column name
+ * @param value The value to be written
+ */
+ private void writeLongColumn(TupleWriter rowWriter, String name, long value) {
+ ScalarWriter colWriter = getColWriter(rowWriter, name, MinorType.BIGINT);
+ colWriter.setLong(value);
+ }
+
+ /**
+ * Helper function to write a 1D String column
+ *
+ * @param rowWriter The row to which the data will be written
+ * @param name The column name
+ * @param value The value to be written
+ */
+ private void writeStringColumn(TupleWriter rowWriter, String name, String value) {
+ ScalarWriter colWriter = getColWriter(rowWriter, name, MinorType.VARCHAR);
+ colWriter.setString(value);
+ }
+
+ /**
+ * Helper function to write a 1D String column
+ *
+ * @param rowWriter The row to which the data will be written
+ * @param name The column name
+ * @param value The value to be written
+ */
+ private void writeFloatColumn(TupleWriter rowWriter, String name, double value) {
+ ScalarWriter colWriter = getColWriter(rowWriter, name, MinorType.FLOAT8);
+ colWriter.setDouble(value);
+ }
+
+ private ScalarWriter getColWriter(TupleWriter tupleWriter, String fieldName, TypeProtos.MinorType type) {
+ int index = tupleWriter.tupleSchema().index(fieldName);
+ if (index == -1) {
+ ColumnMetadata colSchema = MetadataUtils.newScalar(fieldName, type, TypeProtos.DataMode.OPTIONAL);
+ index = tupleWriter.addColumn(colSchema);
+ }
+ return tupleWriter.scalar(index);
+ }
+}
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
new file mode 100644
index 00000000000..36fe949e019
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
@@ -0,0 +1,258 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
+import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
+import nl.basjes.parse.core.Casts;
+import nl.basjes.parse.core.Parser;
+import nl.basjes.parse.core.exceptions.DissectionFailure;
+import nl.basjes.parse.core.exceptions.InvalidDissectorException;
+import nl.basjes.parse.core.exceptions.MissingDissectorsException;
+import nl.basjes.parse.httpdlog.HttpdLoglineParser;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.EnumSet;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class HttpdParser {
+
+ private static final Logger logger = LoggerFactory.getLogger(HttpdParser.class);
+
+ public static final String PARSER_WILDCARD = ".*";
+ public static final String REMAPPING_FLAG = "#";
+ private final Parser<HttpdLogRecord> parser;
+ private final List<SchemaPath> requestedColumns;
+ private final Map<String, MinorType> mappedColumns;
+ private final HttpdLogRecord record;
+ private final String logFormat;
+ private Map<String, String> requestedPaths;
+ private EnumSet<Casts> casts;
+
+
+ public HttpdParser(final String logFormat, final String timestampFormat, final boolean flattenWildcards, final EasySubScan scan) {
+
+ Preconditions.checkArgument(logFormat != null && !logFormat.trim().isEmpty(), "logFormat cannot be null or empty");
+
+ this.logFormat = logFormat;
+ this.record = new HttpdLogRecord(timestampFormat, flattenWildcards);
+
+ if (timestampFormat == null) {
+ this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat);
+ } else {
+ this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, timestampFormat);
+ }
+
+ /*
+ * The log parser has the possibility of parsing the user agent and extracting additional fields.
+ * Unfortunately, doing so negatively affects the speed of the parser. Uncommenting this line and another in
+ * HttpdLogRecord will enable these fields. We will add this functionality in a future PR.
+ * this.parser.addDissector(new UserAgentDissector());
+ */
+
+ this.requestedColumns = scan.getColumns();
+
+ if (timestampFormat != null && !timestampFormat.trim().isEmpty()) {
+ logger.info("Custom timestamp format has been specified. This is an informational note only as custom timestamps is rather unusual.");
+ }
+ if (logFormat.contains("\n")) {
+ logger.info("Specified logformat is a multiline log format: {}", logFormat);
+ }
+
+ mappedColumns = new HashMap<>();
+ }
+
+ /**
+ * We do not expose the underlying parser or the record which is used to manage the writers.
+ *
+ * @param line log line to tear apart.
+ * @throws DissectionFailure if there is a generic dissector failure
+ * @throws InvalidDissectorException if the dissector is not valid
+ * @throws MissingDissectorsException if the dissector is missing
+ */
+ public void parse(final String line) throws DissectionFailure, InvalidDissectorException, MissingDissectorsException {
+ parser.parse(record, line);
+ record.finishRecord();
+ }
+
+ public TupleMetadata setupParser()
+ throws NoSuchMethodException, MissingDissectorsException, InvalidDissectorException {
+
+ SchemaBuilder builder = new SchemaBuilder();
+
+ /*
+ * If the user has selected fields, then we will use them to configure the parser because this would be the most
+ * efficient way to parse the log.
+ */
+ List<String> allParserPaths = parser.getPossiblePaths();
+
+ /*
+ * Use all possible paths that the parser has determined from the specified log format.
+ */
+
+ requestedPaths = Maps.newConcurrentMap();
+
+ for (final String parserPath : allParserPaths) {
+ requestedPaths.put(HttpdUtils.drillFormattedFieldName(parserPath), parserPath);
+ }
+
+ /*
+ * By adding the parse target to the dummy instance we activate it for use. Which we can then use to find out which
+ * paths cast to which native data types. After we are done figuring this information out, we throw this away
+ * because this will be the slowest parsing path possible for the specified format.
+ */
+ Parser<Object> dummy = new HttpdLoglineParser<>(Object.class, logFormat);
+
+ /* This is the second line to uncomment to add the user agent parsing.
+ * dummy.addDissector(new UserAgentDissector());
+ */
+ dummy.addParseTarget(String.class.getMethod("indexOf", String.class), allParserPaths);
+
+ for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
+
+ /*
+ If the column is not requested explicitly, remove it from the requested path list.
+ */
+ if (!isRequested(entry.getKey()) &&
+ !isStarQuery() &&
+ !isMetadataQuery() &&
+ !isOnlyImplicitColumns()) {
+ requestedPaths.remove(entry.getKey());
+ continue;
+ }
+
+ /*
+ * Check the field specified by the user to see if it is supposed to be remapped.
+ */
+ if (entry.getValue().startsWith(REMAPPING_FLAG)) {
+ /*
+ * Because this field is being remapped we need to replace the field name that the parser uses.
+ */
+ entry.setValue(entry.getValue().substring(REMAPPING_FLAG.length()));
+
+ final String[] pieces = entry.getValue().split(":");
+ HttpdUtils.addTypeRemapping(parser, pieces[1], pieces[0]);
+ casts = Casts.STRING_ONLY;
+ } else {
+ casts = dummy.getCasts(entry.getValue());
+ }
+
+ Casts dataType = (Casts) casts.toArray()[casts.size() - 1];
+
+ switch (dataType) {
+ case STRING:
+ if (entry.getValue().startsWith("TIME.STAMP:")) {
+ builder.addNullable(entry.getKey(), MinorType.TIMESTAMP);
+ mappedColumns.put(entry.getKey(), MinorType.TIMESTAMP);
+ } else if (entry.getValue().startsWith("TIME.DATE:")) {
+ builder.addNullable(entry.getKey(), MinorType.DATE);
+ mappedColumns.put(entry.getKey(), MinorType.DATE);
+ } else if (entry.getValue().startsWith("TIME.TIME:")) {
+ builder.addNullable(entry.getKey(), MinorType.TIME);
+ mappedColumns.put(entry.getKey(), MinorType.TIME);
+ } else if (HttpdUtils.isWildcard(entry.getValue())) {
+ builder.addMap(entry.getValue());
+ mappedColumns.put(entry.getKey(), MinorType.MAP);
+ }
+ else {
+ builder.addNullable(entry.getKey(), TypeProtos.MinorType.VARCHAR);
+ mappedColumns.put(entry.getKey(), MinorType.VARCHAR);
+ }
+ break;
+ case LONG:
+ if (entry.getValue().startsWith("TIME.EPOCH:")) {
+ builder.addNullable(entry.getKey(), MinorType.TIMESTAMP);
+ mappedColumns.put(entry.getKey(), MinorType.TIMESTAMP);
+ } else {
+ builder.addNullable(entry.getKey(), TypeProtos.MinorType.BIGINT);
+ mappedColumns.put(entry.getKey(), MinorType.BIGINT);
+ }
+ break;
+ case DOUBLE:
+ builder.addNullable(entry.getKey(), TypeProtos.MinorType.FLOAT8);
+ mappedColumns.put(entry.getKey(), MinorType.FLOAT8);
+ break;
+ default:
+ logger.error("HTTPD Unsupported data type {} for field {}", dataType.toString(), entry.getKey());
+ break;
+ }
+ }
+ return builder.build();
+ }
+
+ public void addFieldsToParser(RowSetLoader rowWriter) {
+ for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
+ try {
+ record.addField(parser, rowWriter, casts, entry.getValue(), entry.getKey(), mappedColumns);
+ } catch (NoSuchMethodException e) {
+ logger.error("Error adding fields to parser.");
+ }
+ }
+ logger.debug("Added Fields to Parser");
+ }
+
+ public boolean isStarQuery() {
+ return requestedColumns.size() == 1 && requestedColumns.get(0).isDynamicStar();
+ }
+
+ public boolean isMetadataQuery() {
+ return requestedColumns.isEmpty();
+ }
+
+ public boolean isRequested(String colName) {
+ for (SchemaPath path : requestedColumns) {
+ if (path.isDynamicStar()) {
+ return true;
+ } else if (path.nameEquals(colName)) {
+ return true;
+ }
+ }
+ return false;
+ }
+
+ /*
+ This is for the edge case where a query only contains the implicit fields.
+ */
+ public boolean isOnlyImplicitColumns() {
+
+ // If there are more than two columns, this isn't an issue.
+ if (requestedColumns.size() > 2) {
+ return false;
+ }
+
+ if (requestedColumns.size() == 1) {
+ return requestedColumns.get(0).nameEquals(HttpdLogBatchReader.RAW_LINE_COL_NAME) ||
+ requestedColumns.get(0).nameEquals(HttpdLogBatchReader.MATCHED_COL_NAME);
+ } else {
+ return (requestedColumns.get(0).nameEquals(HttpdLogBatchReader.RAW_LINE_COL_NAME) ||
+ requestedColumns.get(0).nameEquals(HttpdLogBatchReader.MATCHED_COL_NAME)) &&
+ (requestedColumns.get(1).nameEquals(HttpdLogBatchReader.RAW_LINE_COL_NAME) ||
+ requestedColumns.get(1).nameEquals(HttpdLogBatchReader.MATCHED_COL_NAME));
+ }
+ }
+}
diff --git a/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java
new file mode 100644
index 00000000000..5a975b657b2
--- /dev/null
+++ b/contrib/format-httpd/src/main/java/org/apache/drill/exec/store/httpd/HttpdUtils.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.httpd;
+
+import nl.basjes.parse.core.Parser;
+
+public class HttpdUtils {
+
+ public static final String PARSER_WILDCARD = ".*";
+ public static final String SAFE_WILDCARD = "_$";
+ public static final String SAFE_SEPARATOR = "_";
+
+ /**
+ * Drill cannot deal with fields with dots in them, like request.referer. For the sake of simplicity we are going to
+ * ensure the field name is cleansed. The resultant output field will look like: request_referer.
+ * Additionally, wildcards get replaced with _$, and any existing underscores are doubled (see the README note).
+ *
+ * @param parserFieldName name to be cleansed.
+ * @return The field name formatted for Drill
+ */
+ public static String drillFormattedFieldName(String parserFieldName) {
+ if (parserFieldName.contains(":")) {
+ String[] fieldPart = parserFieldName.split(":");
+ return fieldPart[1].replaceAll("_", "__").replace(PARSER_WILDCARD, SAFE_WILDCARD).replaceAll("\\.", SAFE_SEPARATOR);
+ } else {
+ return parserFieldName.replaceAll("_", "__").replace(PARSER_WILDCARD, SAFE_WILDCARD).replaceAll("\\.", SAFE_SEPARATOR);
+ }
+ }
+
+ /**
+ * In order to define a type remapping the format of the field configuration will look like:
+ * HTTP.URI:request.firstline.uri.query.[parameter name]
+ *
+ * @param parser Add type remapping to this parser instance.
+ * @param fieldName request.firstline.uri.query.[parameter_name]
+ * @param fieldType HTTP.URI, etc..
+ */
+ public static void addTypeRemapping(final Parser<HttpdLogRecord> parser, final String fieldName, final String fieldType) {
+ parser.addTypeRemapping(fieldName, fieldType);
+ }
+
+ /**
+ * Returns true if the field is a wildcard AKA map field, false if not.
+ * @param fieldName The target field name
+ * @return True if the field is a wildcard, false if not
+ */
+ public static boolean isWildcard(String fieldName) {
+ return fieldName.endsWith(PARSER_WILDCARD);
+ }
+
+ /**
+ * The HTTPD parser names fields using the format HTTP.URI:request.firstline.uri.query.<parameter>.
+ * For maps, we only want the last part of this, so this function returns the final segment of the
+ * field name.
+ * @param mapField The unformatted field name
+ * @return The last part of the field name
+ */
+ public static String getFieldNameFromMap(String mapField) {
+ return mapField.substring(mapField.lastIndexOf('.') + 1);
+ }
+
+}
diff --git a/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json b/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json
new file mode 100644
index 00000000000..145e9474699
--- /dev/null
+++ b/contrib/format-httpd/src/main/resources/bootstrap-format-plugins.json
@@ -0,0 +1,37 @@
+{
+ "storage":{
+ "dfs": {
+ "type": "file",
+ "formats": {
+ "httpd" : {
+ "type" : "httpd",
+ "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+ "maxErrors": 0,
+ "flattenWildcards": false
+ }
+ }
+ },
+ "cp": {
+ "type": "file",
+ "formats": {
+ "httpd" : {
+ "type" : "httpd",
+ "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+ "maxErrors": 0,
+ "flattenWildcards": false
+ }
+ }
+ },
+ "s3": {
+ "type": "file",
+ "formats": {
+ "httpd" : {
+ "type" : "httpd",
+ "logFormat" : "%h %l %u %t \"%r\" %s %b \"%{Referer}i\" \"%{User-agent}i\"",
+ "maxErrors": 0,
+ "flattenWildcards": false
+ }
+ }
+ }
+ }
+}
diff --git a/contrib/format-httpd/src/main/resources/drill-module.conf b/contrib/format-httpd/src/main/resources/drill-module.conf
new file mode 100644
index 00000000000..6236c500159
--- /dev/null
+++ b/contrib/format-httpd/src/main/resources/drill-module.conf
@@ -0,0 +1,23 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# This file tells Drill to consider this module when class path scanning.
+# This file can also include any supplementary configuration information.
+# This file is in HOCON format, see https://github.com/typesafehub/config/blob/master/HOCON.md for more information.
+
+drill.classpath.scanning.packages += "org.apache.drill.exec.store.httpd"
diff --git a/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java b/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java
new file mode 100644
index 00000000000..2dd97fa3630
--- /dev/null
+++ b/contrib/format-httpd/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java
@@ -0,0 +1,583 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.httpd;
+
+import org.apache.drill.categories.RowSetTests;
+import org.apache.drill.common.exceptions.DrillRuntimeException;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.rpc.RpcException;
+import org.apache.drill.test.ClusterFixture;
+import org.apache.drill.test.ClusterTest;
+import org.apache.drill.test.rowSet.RowSetComparison;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.joda.time.LocalDate;
+import org.joda.time.LocalTime;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.experimental.categories.Category;
+import java.nio.file.Paths;
+import static org.apache.drill.test.QueryTestUtil.generateCompressedFile;
+import static org.junit.Assert.assertEquals;
+import static org.apache.drill.test.rowSet.RowSetUtilities.mapArray;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+
+@Category(RowSetTests.class)
+public class TestHTTPDLogReader extends ClusterTest {
+
+ @BeforeClass
+ public static void setup() throws Exception {
+ ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
+
+ // Needed for compressed file unit test
+ dirTestWatcher.copyResourceToRoot(Paths.get("httpd/"));
+ }
+
+ @Test
+ public void testDateField() throws RpcException {
+ String sql = "SELECT `request_receive_time` FROM cp.`httpd/hackers-access-small.httpd` LIMIT 5";
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("request_receive_time", MinorType.TIMESTAMP)
+ .build();
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+ .addRow(1445742685000L)
+ .addRow(1445742686000L)
+ .addRow(1445742687000L)
+ .addRow(1445743471000L)
+ .addRow(1445743472000L)
+ .build();
+
+ RowSetUtilities.verify(expected, results);
+ }
+
+ @Test
+ public void testDateEpochField() throws RpcException {
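+ // The *_epoch column should carry the same instant as the parsed request_receive_time.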
+ String sql = "SELECT `request_receive_time`, `request_receive_time_epoch` FROM cp.`httpd/hackers-access-small.httpd` LIMIT 5";
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("request_receive_time", MinorType.TIMESTAMP)
+ .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP)
+ .build();
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+ .addRow(1445742685000L, 1445742685000L)
+ .addRow(1445742686000L, 1445742686000L)
+ .addRow(1445742687000L, 1445742687000L)
+ .addRow(1445743471000L, 1445743471000L)
+ .addRow(1445743472000L, 1445743472000L)
+ .build();
+
+ RowSetUtilities.verify(expected, results);
+ }
+
+ @Test
+ public void testCount() throws Exception {
+ String sql = "SELECT COUNT(*) FROM cp.`httpd/hackers-access-small.httpd`";
+ long result = client.queryBuilder().sql(sql).singletonLong();
+ assertEquals(10L, result);
+ }
+
+ @Test
+ public void testSerDe() throws Exception {
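+ // Round-trip the physical plan through JSON to verify the format config serializes and deserializes cleanly.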
+ String sql = "SELECT COUNT(*) AS cnt FROM cp.`httpd/hackers-access-small.httpd`";
+ String plan = queryBuilder().sql(sql).explainJson();
+ long cnt = queryBuilder().physical(plan).singletonLong();
+ assertEquals("Counts should match",10L, cnt);
+ }
+
+ @Test
+ public void testFlattenMap() throws Exception {
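+ // Inline format config via a table function; with flattenWildcards => true, wildcard map entries become flat
+ // top-level columns (the query-string key came_from surfaces as ..._came__from).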
+ String sql = "SELECT request_firstline_original_uri_query_came__from " +
+ "FROM table(cp.`httpd/hackers-access-small.httpd` (type => 'httpd', logFormat => '%h %l %u %t \\\"%r\\\" %s %b \\\"%{Referer}i\\\" " +
+ "\\\"%{User-agent}i\\\"', " +
+ "flattenWildcards => true)) WHERE `request_firstline_original_uri_query_came__from` IS NOT NULL";
+
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("request_firstline_original_uri_query_came__from", MinorType.VARCHAR)
+ .build();
+
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+ .addRow("http://howto.basjes.nl/join_form")
+ .build();
+
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+ @Test
+ public void testLimitPushdown() throws Exception {
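+ // The LIMIT should be pushed into the scan, which appears as maxRecords=5 in the plan.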
+ String sql = "SELECT * FROM cp.`httpd/hackers-access-small.httpd` LIMIT 5";
+
+ queryBuilder()
+ .sql(sql)
+ .planMatcher()
+ .include("Limit", "maxRecords=5")
+ .match();
+ }
+
+ @Test
+ public void testMapField() throws Exception {
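+ // Individual query-string keys are read out of the map column; map access requires the table alias (data).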
+ String sql = "SELECT data.`request_firstline_original_uri_query_$`.aqb AS aqb, data.`request_firstline_original_uri_query_$`.t AS data_time " +
+ "FROM cp.`httpd/example1.httpd` AS data";
+
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("aqb", MinorType.VARCHAR)
+ .addNullable("data_time", MinorType.VARCHAR)
+ .build();
+
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+ .addRow("1", "19/5/2012 23:51:27 2 -120")
+ .build();
+
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+ @Test
+ public void testSingleExplicitColumn() throws Exception {
+ String sql = "SELECT request_referer FROM cp.`httpd/hackers-access-small.httpd`";
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("request_referer", MinorType.VARCHAR)
+ .build();
+
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+ .addRow("http://howto.basjes.nl/")
+ .addRow("http://howto.basjes.nl/")
+ .addRow("http://howto.basjes.nl/join_form")
+ .addRow("http://howto.basjes.nl/")
+ .addRow("http://howto.basjes.nl/join_form")
+ .addRow("http://howto.basjes.nl/join_form")
+ .addRow("http://howto.basjes.nl/")
+ .addRow("http://howto.basjes.nl/login_form")
+ .addRow("http://howto.basjes.nl/")
+ .addRow("http://howto.basjes.nl/")
+ .build();
+
+ assertEquals(10, results.rowCount());
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+ @Test
+ public void testImplicitColumn() throws Exception {
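+ // _raw is an implicit column holding the original, unparsed log line.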
+ String sql = "SELECT _raw FROM cp.`httpd/hackers-access-small.httpd`";
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("_raw", MinorType.VARCHAR)
+ .build();
+
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+ .addRow("195.154.46.135 - - [25/Oct/2015:04:11:25 +0100] \"GET /linux/doing-pxe-without-dhcp-control HTTP/1.1\" 200 24323 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0\"")
+ .addRow("23.95.237.180 - - [25/Oct/2015:04:11:26 +0100] \"GET /join_form HTTP/1.0\" 200 11114 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0\"")
+ .addRow("23.95.237.180 - - [25/Oct/2015:04:11:27 +0100] \"POST /join_form HTTP/1.1\" 302 9093 \"http://howto.basjes.nl/join_form\" \"Mozilla/5.0 (Windows NT 5.1; rv:35.0) " +
+ "Gecko/20100101 Firefox/35.0\"")
+ .addRow("158.222.5.157 - - [25/Oct/2015:04:24:31 +0100] \"GET /join_form HTTP/1.0\" 200 11114 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"")
+ .addRow("158.222.5.157 - - [25/Oct/2015:04:24:32 +0100] \"POST /join_form HTTP/1.1\" 302 9093 \"http://howto.basjes.nl/join_form\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"")
+ .addRow("158.222.5.157 - - [25/Oct/2015:04:24:37 +0100] \"GET /acl_users/credentials_cookie_auth/require_login?came_from=http%3A//howto.basjes.nl/join_form HTTP/1.1\" 200 10716 \"http://howto.basjes.nl/join_form\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"")
+ .addRow("158.222.5.157 - - [25/Oct/2015:04:24:39 +0100] \"GET /login_form HTTP/1.1\" 200 10543 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"")
+ .addRow("158.222.5.157 - - [25/Oct/2015:04:24:41 +0100] \"POST /login_form HTTP/1.1\" 200 16810 \"http://howto.basjes.nl/login_form\" \"Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21\"")
+ .addRow("5.39.5.5 - - [25/Oct/2015:04:32:22 +0100] \"GET /join_form HTTP/1.1\" 200 11114 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 5.1; rv:34.0) Gecko/20100101 Firefox/34.0\"")
+ .addRow("180.180.64.16 - - [25/Oct/2015:04:34:37 +0100] \"GET /linux/doing-pxe-without-dhcp-control HTTP/1.1\" 200 24323 \"http://howto.basjes.nl/\" \"Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0\"")
+ .build();
+
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+ @Test
+ public void testExplicitSomeQuery() throws Exception {
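+ // Columns the log format never populates come back as typed NULLs; only the receive time has data here.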
+ String sql = "SELECT request_referer_ref, request_receive_time_last_time, request_firstline_uri_protocol FROM cp.`httpd/hackers-access-small.httpd`";
+
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("request_referer_ref", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_time", MinorType.TIME)
+ .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR)
+ .buildSchema();
+
+ RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+ .addRow(null, new LocalTime("04:11:25"), null)
+ .addRow(null, new LocalTime("04:11:26"), null)
+ .addRow(null, new LocalTime("04:11:27"), null)
+ .addRow(null, new LocalTime("04:24:31"), null)
+ .addRow(null, new LocalTime("04:24:32"), null)
+ .addRow(null, new LocalTime("04:24:37"), null)
+ .addRow(null, new LocalTime("04:24:39"), null)
+ .addRow(null, new LocalTime("04:24:41"), null)
+ .addRow(null, new LocalTime("04:32:22"), null)
+ .addRow(null, new LocalTime("04:34:37"), null)
+ .build();
+
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+ @Test
+ public void testExplicitSomeQueryWithCompressedFile() throws Exception {
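+ // The zipped copy is written into the dfs test root (see setup), so this query goes through dfs rather than cp.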
+ generateCompressedFile("httpd/hackers-access-small.httpd", "zip", "httpd/hackers-access-small.httpd.zip");
+
+ String sql = "SELECT request_referer_ref, request_receive_time_last_time, request_firstline_uri_protocol FROM dfs.`httpd/hackers-access-small.httpd.zip`";
+
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("request_referer_ref", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_time", MinorType.TIME)
+ .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR)
+ .buildSchema();
+
+ RowSet expected = new RowSetBuilder(client.allocator(), expectedSchema)
+ .addRow(null, new LocalTime("04:11:25"), null)
+ .addRow(null, new LocalTime("04:11:26"), null)
+ .addRow(null, new LocalTime("04:11:27"), null)
+ .addRow(null, new LocalTime("04:24:31"), null)
+ .addRow(null, new LocalTime("04:24:32"), null)
+ .addRow(null, new LocalTime("04:24:37"), null)
+ .addRow(null, new LocalTime("04:24:39"), null)
+ .addRow(null, new LocalTime("04:24:41"), null)
+ .addRow(null, new LocalTime("04:32:22"), null)
+ .addRow(null, new LocalTime("04:34:37"), null)
+ .build();
+
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+ @Test
+ public void testStarRowSet() throws Exception {
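+ // A star query materializes every column the configured log format can produce, including the wildcard maps.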
+ String sql = "SELECT * FROM cp.`httpd/hackers-access-really-small.httpd`";
+
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("request_referer_ref", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_time", MinorType.TIME)
+ .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR)
+ .addNullable("request_receive_time_microsecond", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_microsecond__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_original_protocol", MinorType.VARCHAR)
+ .addNullable("request_firstline_original_uri_host", MinorType.VARCHAR)
+ .addNullable("request_referer_host", MinorType.VARCHAR)
+ .addNullable("request_receive_time_month__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_minute", MinorType.BIGINT)
+ .addNullable("request_firstline_protocol_version", MinorType.VARCHAR)
+ .addNullable("request_receive_time_time__utc", MinorType.TIME)
+ .addNullable("request_referer_last_ref", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_timezone", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_weekofweekyear", MinorType.BIGINT)
+ .addNullable("request_referer_last", MinorType.VARCHAR)
+ .addNullable("request_receive_time_minute", MinorType.BIGINT)
+ .addNullable("connection_client_host_last", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_millisecond__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri", MinorType.VARCHAR)
+ .addNullable("request_firstline", MinorType.VARCHAR)
+ .addNullable("request_receive_time_nanosecond", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_millisecond", MinorType.BIGINT)
+ .addNullable("request_receive_time_day", MinorType.BIGINT)
+ .addNullable("request_referer_port", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri_port", MinorType.BIGINT)
+ .addNullable("request_receive_time_year", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_date", MinorType.DATE)
+ .addNullable("request_receive_time_last_time__utc", MinorType.TIME)
+ .addNullable("request_receive_time_last_hour__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_original_protocol_version", MinorType.VARCHAR)
+ .addNullable("request_firstline_original_method", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_year__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_uri", MinorType.VARCHAR)
+ .addNullable("request_referer_last_host", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_minute__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_weekofweekyear", MinorType.BIGINT)
+ .addNullable("request_firstline_uri_userinfo", MinorType.VARCHAR)
+ .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP)
+ .addNullable("connection_client_logname", MinorType.BIGINT)
+ .addNullable("response_body_bytes", MinorType.BIGINT)
+ .addNullable("request_receive_time_nanosecond__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_protocol", MinorType.VARCHAR)
+ .addNullable("request_receive_time_microsecond__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_hour", MinorType.BIGINT)
+ .addNullable("request_firstline_uri_host", MinorType.VARCHAR)
+ .addNullable("request_referer_last_port", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_epoch", MinorType.TIMESTAMP)
+ .addNullable("request_receive_time_last_weekyear__utc", MinorType.BIGINT)
+ .addNullable("request_user-agent", MinorType.VARCHAR)
+ .addNullable("request_receive_time_weekyear", MinorType.BIGINT)
+ .addNullable("request_receive_time_timezone", MinorType.VARCHAR)
+ .addNullable("response_body_bytesclf", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_date__utc", MinorType.DATE)
+ .addNullable("request_receive_time_millisecond__utc", MinorType.BIGINT)
+ .addNullable("request_referer_last_protocol", MinorType.VARCHAR)
+ .addNullable("request_firstline_uri_query", MinorType.VARCHAR)
+ .addNullable("request_receive_time_minute__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri_protocol", MinorType.VARCHAR)
+ .addNullable("request_referer_query", MinorType.VARCHAR)
+ .addNullable("request_receive_time_date", MinorType.DATE)
+ .addNullable("request_firstline_uri_port", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_second__utc", MinorType.BIGINT)
+ .addNullable("request_referer_last_userinfo", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_second", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_monthname__utc", MinorType.VARCHAR)
+ .addNullable("request_firstline_method", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_month__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_millisecond", MinorType.BIGINT)
+ .addNullable("request_receive_time_day__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_year__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_weekofweekyear__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_second", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri_ref", MinorType.VARCHAR)
+ .addNullable("connection_client_logname_last", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_year", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri_path", MinorType.VARCHAR)
+ .addNullable("connection_client_host", MinorType.VARCHAR)
+ .addNullable("request_firstline_original_uri_query", MinorType.VARCHAR)
+ .addNullable("request_referer_userinfo", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_monthname", MinorType.VARCHAR)
+ .addNullable("request_referer_path", MinorType.VARCHAR)
+ .addNullable("request_receive_time_monthname", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_month", MinorType.BIGINT)
+ .addNullable("request_referer_last_query", MinorType.VARCHAR)
+ .addNullable("request_firstline_uri_ref", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_day", MinorType.BIGINT)
+ .addNullable("request_receive_time_time", MinorType.TIME)
+ .addNullable("request_status_original", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_weekofweekyear__utc", MinorType.BIGINT)
+ .addNullable("request_user-agent_last", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_weekyear", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_microsecond", MinorType.BIGINT)
+ .addNullable("request_firstline_original", MinorType.VARCHAR)
+ .addNullable("request_status", MinorType.VARCHAR)
+ .addNullable("request_referer_last_path", MinorType.VARCHAR)
+ .addNullable("request_receive_time_month", MinorType.BIGINT)
+ .addNullable("request_referer", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_day__utc", MinorType.BIGINT)
+ .addNullable("request_referer_protocol", MinorType.VARCHAR)
+ .addNullable("request_receive_time_monthname__utc", MinorType.VARCHAR)
+ .addNullable("response_body_bytes_last", MinorType.BIGINT)
+ .addNullable("request_receive_time", MinorType.TIMESTAMP)
+ .addNullable("request_receive_time_last_nanosecond", MinorType.BIGINT)
+ .addNullable("request_firstline_uri_path", MinorType.VARCHAR)
+ .addNullable("request_firstline_original_uri_userinfo", MinorType.VARCHAR)
+ .addNullable("request_receive_time_date__utc", MinorType.DATE)
+ .addNullable("request_receive_time_last", MinorType.TIMESTAMP)
+ .addNullable("request_receive_time_last_nanosecond__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_hour", MinorType.BIGINT)
+ .addNullable("request_receive_time_hour__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_second__utc", MinorType.BIGINT)
+ .addNullable("connection_client_user_last", MinorType.VARCHAR)
+ .addNullable("request_receive_time_weekyear__utc", MinorType.BIGINT)
+ .addNullable("connection_client_user", MinorType.VARCHAR)
+ .add("request_firstline_original_uri_query_$", MinorType.MAP)
+ .add("request_referer_query_$", MinorType.MAP)
+ .add("request_referer_last_query_$", MinorType.MAP)
+ .add("request_firstline_uri_query_$", MinorType.MAP)
+ .build();
+
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+ .addRow(null, new LocalTime("04:11:25"), null, 0, 0, "HTTP", null, "howto.basjes.nl", 10, 11, "1.1", new LocalTime("03:11:25"), null, "+01:00", 43, "http://howto.basjes.nl/",
+ 11, "195.154.46.135", 0,
+ "/linux/doing-pxe-without-dhcp-control", "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", 0, 0, 25, null, null, 2015, new LocalDate("2015-10-25"), new LocalTime("03:11:25"),
+ 3, "1.1", "GET",
+ 2015, "/linux/doing-pxe-without-dhcp-control", "howto.basjes.nl", 11, 43, null, 1445742685000L, null, 24323, 0, "HTTP", 0, 4, null, null, 1445742685000L, 2015,
+ "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, "+01:00", 24323, new LocalDate("2015-10-25"), 0, "http", null, 11, null, null, new LocalDate("2015-10-25"), null, 25,
+ null, 25,
+ "October", "GET", 10, 0, 25, 2015, 43, 25, null, null, 2015, "/linux/doing-pxe-without-dhcp-control", "195.154.46.135", null, null, "October", "/", "October", 10, null,
+ null, 25, new LocalTime("04:11:25"), "200", 43, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, 0, "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", "200", "/",
+ 10, "http://howto.basjes.nl/", 25, "http", "October", 24323, 1445742685000L, 0, "/linux/doing-pxe-without-dhcp-control", null, new LocalDate("2015-10-25"), 1445742685000L,
+ 0, 4, 3, 25, null, 2015, null, mapArray(), mapArray(), mapArray(), mapArray())
+ .build();
+
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+ @Test
+ public void testExplicitAllFields() throws Exception {
+ String sql = "SELECT `request_referer_ref`, `request_receive_time_last_time`, `request_firstline_uri_protocol`, `request_receive_time_microsecond`, `request_receive_time_last_microsecond__utc`, `request_firstline_original_protocol`, `request_firstline_original_uri_host`, `request_referer_host`, `request_receive_time_month__utc`, `request_receive_time_last_minute`, `request_firstline_protocol_version`, `request_receive_time_time__utc`, `request_referer_last_ref`, `request_receive_time_last_timezone`, `request_receive_time_last_weekofweekyear`, `request_referer_last`, `request_receive_time_minute`, `connection_client_host_last`, `request_receive_time_last_millisecond__utc`, `request_firstline_original_uri`, `request_firstline`, `request_receive_time_nanosecond`, `request_receive_time_last_millisecond`, `request_receive_time_day`, `request_referer_port`, `request_firstline_original_uri_port`, `request_receive_time_year`, `request_receive_time_last_date`, `request_receive_time_last_time__utc`, `request_receive_time_last_hour__utc`, `request_firstline_original_protocol_version`, `request_firstline_original_method`, `request_receive_time_last_year__utc`, `request_firstline_uri`, `request_referer_last_host`, `request_receive_time_last_minute__utc`, `request_receive_time_weekofweekyear`, `request_firstline_uri_userinfo`, `request_receive_time_epoch`, `connection_client_logname`, `response_body_bytes`, `request_receive_time_nanosecond__utc`, `request_firstline_protocol`, `request_receive_time_microsecond__utc`, `request_receive_time_hour`, `request_firstline_uri_host`, `request_referer_last_port`, `request_receive_time_last_epoch`, `request_receive_time_last_weekyear__utc`, `request_user-agent`, `request_receive_time_weekyear`, `request_receive_time_timezone`, `response_body_bytesclf`, `request_receive_time_last_date__utc`, `request_receive_time_millisecond__utc`, `request_referer_last_protocol`, `request_firstline_uri_query`, `request_receive_time_minute__utc`, `request_firstline_original_uri_protocol`, `request_referer_query`, `request_receive_time_date`, `request_firstline_uri_port`, `request_receive_time_last_second__utc`, `request_referer_last_userinfo`, `request_receive_time_last_second`, `request_receive_time_last_monthname__utc`, `request_firstline_method`, `request_receive_time_last_month__utc`, `request_receive_time_millisecond`, `request_receive_time_day__utc`, `request_receive_time_year__utc`, `request_receive_time_weekofweekyear__utc`, `request_receive_time_second`, `request_firstline_original_uri_ref`, `connection_client_logname_last`, `request_receive_time_last_year`, `request_firstline_original_uri_path`, `connection_client_host`, `request_firstline_original_uri_query`, `request_referer_userinfo`, `request_receive_time_last_monthname`, `request_referer_path`, `request_receive_time_monthname`, `request_receive_time_last_month`, `request_referer_last_query`, `request_firstline_uri_ref`, `request_receive_time_last_day`, `request_receive_time_time`, `request_status_original`, `request_receive_time_last_weekofweekyear__utc`, `request_user-agent_last`, `request_receive_time_last_weekyear`, `request_receive_time_last_microsecond`, `request_firstline_original`, `request_status`, `request_referer_last_path`, `request_receive_time_month`, `request_receive_time_last_day__utc`, `request_referer`, `request_referer_protocol`, `request_receive_time_monthname__utc`, `response_body_bytes_last`, `request_receive_time`, `request_receive_time_last_nanosecond`, `request_firstline_uri_path`, 
`request_firstline_original_uri_userinfo`, `request_receive_time_date__utc`, `request_receive_time_last`, `request_receive_time_last_nanosecond__utc`, `request_receive_time_last_hour`, `request_receive_time_hour__utc`, `request_receive_time_second__utc`, `connection_client_user_last`, `request_receive_time_weekyear__utc`, `connection_client_user`, `request_firstline_original_uri_query_$`, `request_referer_query_$`, `request_referer_last_query_$`, `request_firstline_uri_query_$` FROM cp.`httpd/hackers-access-really-small.httpd`";
+
+ RowSet results = client.queryBuilder().sql(sql).rowSet();
+
+ TupleMetadata expectedSchema = new SchemaBuilder()
+ .addNullable("request_referer_ref", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_time", MinorType.TIME)
+ .addNullable("request_firstline_uri_protocol", MinorType.VARCHAR)
+ .addNullable("request_receive_time_microsecond", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_microsecond__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_original_protocol", MinorType.VARCHAR)
+ .addNullable("request_firstline_original_uri_host", MinorType.VARCHAR)
+ .addNullable("request_referer_host", MinorType.VARCHAR)
+ .addNullable("request_receive_time_month__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_minute", MinorType.BIGINT)
+ .addNullable("request_firstline_protocol_version", MinorType.VARCHAR)
+ .addNullable("request_receive_time_time__utc", MinorType.TIME)
+ .addNullable("request_referer_last_ref", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_timezone", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_weekofweekyear", MinorType.BIGINT)
+ .addNullable("request_referer_last", MinorType.VARCHAR)
+ .addNullable("request_receive_time_minute", MinorType.BIGINT)
+ .addNullable("connection_client_host_last", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_millisecond__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri", MinorType.VARCHAR)
+ .addNullable("request_firstline", MinorType.VARCHAR)
+ .addNullable("request_receive_time_nanosecond", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_millisecond", MinorType.BIGINT)
+ .addNullable("request_receive_time_day", MinorType.BIGINT)
+ .addNullable("request_referer_port", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri_port", MinorType.BIGINT)
+ .addNullable("request_receive_time_year", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_date", MinorType.DATE)
+ .addNullable("request_receive_time_last_time__utc", MinorType.TIME)
+ .addNullable("request_receive_time_last_hour__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_original_protocol_version", MinorType.VARCHAR)
+ .addNullable("request_firstline_original_method", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_year__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_uri", MinorType.VARCHAR)
+ .addNullable("request_referer_last_host", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_minute__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_weekofweekyear", MinorType.BIGINT)
+ .addNullable("request_firstline_uri_userinfo", MinorType.VARCHAR)
+ .addNullable("request_receive_time_epoch", MinorType.TIMESTAMP)
+ .addNullable("connection_client_logname", MinorType.BIGINT)
+ .addNullable("response_body_bytes", MinorType.BIGINT)
+ .addNullable("request_receive_time_nanosecond__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_protocol", MinorType.VARCHAR)
+ .addNullable("request_receive_time_microsecond__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_hour", MinorType.BIGINT)
+ .addNullable("request_firstline_uri_host", MinorType.VARCHAR)
+ .addNullable("request_referer_last_port", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_epoch", MinorType.TIMESTAMP)
+ .addNullable("request_receive_time_last_weekyear__utc", MinorType.BIGINT)
+ .addNullable("request_user-agent", MinorType.VARCHAR)
+ .addNullable("request_receive_time_weekyear", MinorType.BIGINT)
+ .addNullable("request_receive_time_timezone", MinorType.VARCHAR)
+ .addNullable("response_body_bytesclf", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_date__utc", MinorType.DATE)
+ .addNullable("request_receive_time_millisecond__utc", MinorType.BIGINT)
+ .addNullable("request_referer_last_protocol", MinorType.VARCHAR)
+ .addNullable("request_firstline_uri_query", MinorType.VARCHAR)
+ .addNullable("request_receive_time_minute__utc", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri_protocol", MinorType.VARCHAR)
+ .addNullable("request_referer_query", MinorType.VARCHAR)
+ .addNullable("request_receive_time_date", MinorType.DATE)
+ .addNullable("request_firstline_uri_port", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_second__utc", MinorType.BIGINT)
+ .addNullable("request_referer_last_userinfo", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_second", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_monthname__utc", MinorType.VARCHAR)
+ .addNullable("request_firstline_method", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_month__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_millisecond", MinorType.BIGINT)
+ .addNullable("request_receive_time_day__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_year__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_weekofweekyear__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_second", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri_ref", MinorType.VARCHAR)
+ .addNullable("connection_client_logname_last", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_year", MinorType.BIGINT)
+ .addNullable("request_firstline_original_uri_path", MinorType.VARCHAR)
+ .addNullable("connection_client_host", MinorType.VARCHAR)
+ .addNullable("request_firstline_original_uri_query", MinorType.VARCHAR)
+ .addNullable("request_referer_userinfo", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_monthname", MinorType.VARCHAR)
+ .addNullable("request_referer_path", MinorType.VARCHAR)
+ .addNullable("request_receive_time_monthname", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_month", MinorType.BIGINT)
+ .addNullable("request_referer_last_query", MinorType.VARCHAR)
+ .addNullable("request_firstline_uri_ref", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_day", MinorType.BIGINT)
+ .addNullable("request_receive_time_time", MinorType.TIME)
+ .addNullable("request_status_original", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_weekofweekyear__utc", MinorType.BIGINT)
+ .addNullable("request_user-agent_last", MinorType.VARCHAR)
+ .addNullable("request_receive_time_last_weekyear", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_microsecond", MinorType.BIGINT)
+ .addNullable("request_firstline_original", MinorType.VARCHAR)
+ .addNullable("request_status", MinorType.VARCHAR)
+ .addNullable("request_referer_last_path", MinorType.VARCHAR)
+ .addNullable("request_receive_time_month", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_day__utc", MinorType.BIGINT)
+ .addNullable("request_referer", MinorType.VARCHAR)
+ .addNullable("request_referer_protocol", MinorType.VARCHAR)
+ .addNullable("request_receive_time_monthname__utc", MinorType.VARCHAR)
+ .addNullable("response_body_bytes_last", MinorType.BIGINT)
+ .addNullable("request_receive_time", MinorType.TIMESTAMP)
+ .addNullable("request_receive_time_last_nanosecond", MinorType.BIGINT)
+ .addNullable("request_firstline_uri_path", MinorType.VARCHAR)
+ .addNullable("request_firstline_original_uri_userinfo", MinorType.VARCHAR)
+ .addNullable("request_receive_time_date__utc", MinorType.DATE)
+ .addNullable("request_receive_time_last", MinorType.TIMESTAMP)
+ .addNullable("request_receive_time_last_nanosecond__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_last_hour", MinorType.BIGINT)
+ .addNullable("request_receive_time_hour__utc", MinorType.BIGINT)
+ .addNullable("request_receive_time_second__utc", MinorType.BIGINT)
+ .addNullable("connection_client_user_last", MinorType.VARCHAR)
+ .addNullable("request_receive_time_weekyear__utc", MinorType.BIGINT)
+ .addNullable("connection_client_user", MinorType.VARCHAR)
+ .add("request_firstline_original_uri_query_$", MinorType.MAP)
+ .add("request_referer_query_$", MinorType.MAP)
+ .add("request_referer_last_query_$", MinorType.MAP)
+ .add("request_firstline_uri_query_$", MinorType.MAP)
+ .build();
+
+ RowSet expected = client.rowSetBuilder(expectedSchema)
+ .addRow(null, new LocalTime("04:11:25"), null, 0, 0, "HTTP", null, "howto.basjes.nl", 10, 11, "1.1", new LocalTime("03:11:25"), null, "+01:00", 43, "http://howto.basjes.nl/",
+ 11, "195.154.46.135", 0,
+ "/linux/doing-pxe-without-dhcp-control", "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", 0, 0, 25, null, null, 2015, new LocalDate("2015-10-25"), new LocalTime("03:11:25"),
+ 3, "1.1", "GET",
+ 2015, "/linux/doing-pxe-without-dhcp-control", "howto.basjes.nl", 11, 43, null, 1445742685000L, null, 24323, 0, "HTTP", 0, 4, null, null, 1445742685000L, 2015,
+ "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, "+01:00", 24323, new LocalDate("2015-10-25"), 0, "http", null, 11, null, null, new LocalDate("2015-10-25"), null, 25, null, 25,
+ "October", "GET", 10, 0, 25, 2015, 43, 25, null, null, 2015, "/linux/doing-pxe-without-dhcp-control", "195.154.46.135", null, null, "October", "/", "October", 10, null,
+ null, 25, new LocalTime("04:11:25"), "200", 43, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015, 0, "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", "200", "/",
+ 10, 25, "http://howto.basjes.nl/", "http", "October", 24323, 1445742685000L, 0, "/linux/doing-pxe-without-dhcp-control", null, new LocalDate("2015-10-25"), 1445742685000L,
+ 0, 4, 3, 25, null, 2015, null, mapArray(), mapArray(), mapArray(), mapArray())
+ .build();
+
+ new RowSetComparison(expected).verifyAndClearAll(results);
+ }
+
+ @Test
+ public void testInvalidFormat() throws Exception {
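+ // dfs-bootstrap.httpd does not match the default log format, so the scan is expected to fail with a data read error.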
+ String sql = "SELECT * FROM cp.`httpd/dfs-bootstrap.httpd`";
+ try {
+ run(sql);
+ fail();
+ } catch (DrillRuntimeException e) {
+ assertTrue(e.getMessage().contains("Error reading HTTPD file "));
+ }
+ }
+}
diff --git a/exec/java-exec/src/test/resources/store/httpd/dfs-bootstrap.httpd b/contrib/format-httpd/src/test/resources/httpd/dfs-bootstrap.httpd
similarity index 100%
rename from exec/java-exec/src/test/resources/store/httpd/dfs-bootstrap.httpd
rename to contrib/format-httpd/src/test/resources/httpd/dfs-bootstrap.httpd
diff --git a/exec/java-exec/src/test/resources/store/httpd/example1.httpd b/contrib/format-httpd/src/test/resources/httpd/example1.httpd
similarity index 100%
rename from exec/java-exec/src/test/resources/store/httpd/example1.httpd
rename to contrib/format-httpd/src/test/resources/httpd/example1.httpd
diff --git a/contrib/format-httpd/src/test/resources/httpd/hackers-access-really-small.httpd b/contrib/format-httpd/src/test/resources/httpd/hackers-access-really-small.httpd
new file mode 100644
index 00000000000..decb3c2ee40
--- /dev/null
+++ b/contrib/format-httpd/src/test/resources/httpd/hackers-access-really-small.httpd
@@ -0,0 +1 @@
+195.154.46.135 - - [25/Oct/2015:04:11:25 +0100] "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1" 200 24323 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0"
diff --git a/exec/java-exec/src/test/resources/httpd/hackers-access-small.httpd b/contrib/format-httpd/src/test/resources/httpd/hackers-access-small.httpd
similarity index 100%
rename from exec/java-exec/src/test/resources/httpd/hackers-access-small.httpd
rename to contrib/format-httpd/src/test/resources/httpd/hackers-access-small.httpd
diff --git a/contrib/format-httpd/src/test/resources/logback-test.txt b/contrib/format-httpd/src/test/resources/logback-test.txt
new file mode 100644
index 00000000000..2adcf8105a2
--- /dev/null
+++ b/contrib/format-httpd/src/test/resources/logback-test.txt
@@ -0,0 +1,65 @@
+<?xml version="1.0" encoding="UTF-8" ?>
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+<configuration>
+  <appender name="SOCKET" class="de.huxhorn.lilith.logback.appender.ClassicMultiplexSocketAppender">
+    <Compressing>true</Compressing>
+    <ReconnectionDelay>10000</ReconnectionDelay>
+    <IncludeCallerData>true</IncludeCallerData>
+    <RemoteHosts>${LILITH_HOSTNAME:-localhost}</RemoteHosts>
+  </appender>
+
+  <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
+    <encoder>
+      <pattern>%d{HH:mm:ss.SSS} [%thread] %-5level %logger{36} - %msg%n</pattern>
+    </encoder>
+  </appender>
+
+  <logger name="org.apache.drill" additivity="false">
+    <level value="debug"/>
+    <appender-ref ref="SOCKET"/>
+  </logger>
+
+  <root>
+    <level value="error"/>
+    <appender-ref ref="STDOUT"/>
+  </root>
+</configuration>
\ No newline at end of file
diff --git a/contrib/pom.xml b/contrib/pom.xml
index 22393e0f4e2..f5f60eeb571 100644
--- a/contrib/pom.xml
+++ b/contrib/pom.xml
@@ -46,6 +46,7 @@
    <module>format-syslog</module>
    <module>format-ltsv</module>
    <module>format-excel</module>
+    <module>format-httpd</module>
    <module>format-esri</module>
    <module>format-hdf5</module>
    <module>format-spss</module>
diff --git a/contrib/udfs/pom.xml b/contrib/udfs/pom.xml
index f41d35bd77d..a22000544f8 100644
--- a/contrib/udfs/pom.xml
+++ b/contrib/udfs/pom.xml
@@ -66,7 +66,7 @@
      <groupId>nl.basjes.parse.useragent</groupId>
      <artifactId>yauaa</artifactId>
-      <version>5.16</version>
+      <version>5.19</version>
diff --git a/distribution/pom.xml b/distribution/pom.xml
index 6a1b29f14dd..c6ebecbe061 100644
--- a/distribution/pom.xml
+++ b/distribution/pom.xml
@@ -342,6 +342,11 @@
      <artifactId>drill-format-syslog</artifactId>
      <version>${project.version}</version>
    </dependency>
+    <dependency>
+      <groupId>org.apache.drill.contrib</groupId>
+      <artifactId>drill-format-httpd</artifactId>
+      <version>${project.version}</version>
+    </dependency>
    <dependency>
      <groupId>org.apache.drill.contrib</groupId>
      <artifactId>drill-format-hdf5</artifactId>
diff --git a/distribution/src/assemble/component.xml b/distribution/src/assemble/component.xml
index b9a2fce4912..2148fb8d588 100644
--- a/distribution/src/assemble/component.xml
+++ b/distribution/src/assemble/component.xml
@@ -46,6 +46,7 @@
        <include>org.apache.drill.contrib:drill-format-esri:jar</include>
        <include>org.apache.drill.contrib:drill-format-hdf5:jar</include>
        <include>org.apache.drill.contrib:drill-format-ltsv:jar</include>
+        <include>org.apache.drill.contrib:drill-format-httpd:jar</include>
        <include>org.apache.drill.contrib:drill-format-excel:jar</include>
        <include>org.apache.drill.contrib:drill-format-spss:jar</include>
        <include>org.apache.drill.contrib:drill-jdbc-storage:jar</include>
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
deleted file mode 100644
index 7bcb0a4d96a..00000000000
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogFormatPlugin.java
+++ /dev/null
@@ -1,252 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.drill.exec.store.httpd;
-
-import java.io.IOException;
-import java.util.HashMap;
-import java.util.List;
-
-import nl.basjes.parse.core.exceptions.DissectionFailure;
-import nl.basjes.parse.core.exceptions.InvalidDissectorException;
-import nl.basjes.parse.core.exceptions.MissingDissectorsException;
-
-import org.apache.drill.common.exceptions.ExecutionSetupException;
-import org.apache.drill.common.exceptions.UserException;
-import org.apache.drill.common.expression.SchemaPath;
-import org.apache.drill.common.logical.StoragePluginConfig;
-import org.apache.drill.exec.ExecConstants;
-import org.apache.drill.exec.ops.FragmentContext;
-import org.apache.drill.exec.ops.OperatorContext;
-import org.apache.drill.exec.physical.impl.OutputMutator;
-import org.apache.drill.exec.proto.UserBitShared.CoreOperatorType;
-import org.apache.drill.exec.planner.common.DrillStatsTable.TableStatistics;
-import org.apache.drill.exec.server.DrillbitContext;
-import org.apache.drill.exec.store.AbstractRecordReader;
-import org.apache.drill.exec.store.RecordWriter;
-import org.apache.drill.exec.store.dfs.DrillFileSystem;
-import org.apache.drill.exec.store.dfs.easy.EasyFormatPlugin;
-import org.apache.drill.exec.store.dfs.easy.EasyWriter;
-import org.apache.drill.exec.store.dfs.easy.FileWork;
-import org.apache.drill.exec.vector.complex.impl.VectorContainerWriter;
-import org.apache.drill.exec.vector.complex.writer.BaseWriter.ComplexWriter;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.fs.FileSystem;
-import org.apache.hadoop.fs.Path;
-import org.apache.hadoop.io.LongWritable;
-import org.apache.hadoop.io.Text;
-import org.apache.hadoop.mapred.FileSplit;
-import org.apache.hadoop.mapred.JobConf;
-import org.apache.hadoop.mapred.LineRecordReader;
-import org.apache.hadoop.mapred.Reporter;
-import org.apache.hadoop.mapred.TextInputFormat;
-import java.util.Collections;
-import java.util.Map;
-import org.apache.drill.exec.store.RecordReader;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-public class HttpdLogFormatPlugin extends EasyFormatPlugin<HttpdLogFormatConfig> {
- private static final Logger logger = LoggerFactory.getLogger(HttpdLogFormatPlugin.class);
-
- private static final String PLUGIN_EXTENSION = "httpd";
- private static final int VECTOR_MEMORY_ALLOCATION = 4095;
-
- public HttpdLogFormatPlugin(final String name, final DrillbitContext context, final Configuration fsConf,
- final StoragePluginConfig storageConfig, final HttpdLogFormatConfig formatConfig) {
-
- super(name, context, fsConf, storageConfig, formatConfig, true, false, true, true,
- Collections.singletonList(PLUGIN_EXTENSION), PLUGIN_EXTENSION);
- }
-
- @Override
- public boolean supportsStatistics() {
- return false;
- }
-
- @Override
- public TableStatistics readStatistics(FileSystem fs, Path statsTablePath) {
- throw new UnsupportedOperationException("unimplemented");
- }
-
- @Override
- public void writeStatistics(TableStatistics statistics, FileSystem fs, Path statsTablePath) {
- throw new UnsupportedOperationException("unimplemented");
- }
-
- /**
- * Reads httpd log lines terminated with a newline character.
- */
- private class HttpdLogRecordReader extends AbstractRecordReader {
-
- private final DrillFileSystem fs;
- private final FileWork work;
- private final FragmentContext fragmentContext;
- private ComplexWriter writer;
- private HttpdParser parser;
- private LineRecordReader lineReader;
- private LongWritable lineNumber;
-
- public HttpdLogRecordReader(final FragmentContext context, final DrillFileSystem fs, final FileWork work, final List<SchemaPath> columns) {
- this.fs = fs;
- this.work = work;
- this.fragmentContext = context;
- setColumns(columns);
- }
-
- /**
- * The query fields passed in are formatted in a way that Drill requires.
- * Those must be cleaned up to work with the parser.
- *
- * @return Map with Drill field names as keys and parser field names as
- * values
- */
- private Map<String, String> makeParserFields() {
- Map<String, String> fieldMapping = new HashMap<>();
- for (final SchemaPath sp : getColumns()) {
- String drillField = sp.getRootSegment().getPath();
- try {
- String parserField = HttpdParser.parserFormattedFieldName(drillField);
- fieldMapping.put(drillField, parserField);
- } catch (Exception e) {
- logger.info("Putting field: {} into map", drillField, e);
- }
- }
- return fieldMapping;
- }
-
- @Override
- public void setup(final OperatorContext context, final OutputMutator output) throws ExecutionSetupException {
- try {
- /*
- * Extract the list of field names for the parser to use if it is NOT a star query. If it is a star query just
- * pass through an empty map, because the parser is going to have to build all possibilities.
- */
- final Map<String, String> fieldMapping = !isStarQuery() ? makeParserFields() : null;
- writer = new VectorContainerWriter(output);
-
- parser = new HttpdParser(writer.rootAsMap(), context.getManagedBuffer(),
- HttpdLogFormatPlugin.this.getConfig().getLogFormat(),
- HttpdLogFormatPlugin.this.getConfig().getTimestampFormat(),
- fieldMapping);
-
- final Path path = fs.makeQualified(work.getPath());
- FileSplit split = new FileSplit(path, work.getStart(), work.getLength(), new String[]{""});
- TextInputFormat inputFormat = new TextInputFormat();
- JobConf job = new JobConf(fs.getConf());
- job.setInt("io.file.buffer.size", fragmentContext.getConfig().getInt(ExecConstants.TEXT_LINE_READER_BUFFER_SIZE));
- job.setInputFormat(inputFormat.getClass());
- lineReader = (LineRecordReader) inputFormat.getRecordReader(split, job, Reporter.NULL);
- lineNumber = lineReader.createKey();
- } catch (NoSuchMethodException | MissingDissectorsException | InvalidDissectorException e) {
- throw handleAndGenerate("Failure creating HttpdParser", e);
- } catch (IOException e) {
- throw handleAndGenerate("Failure creating HttpdRecordReader", e);
- }
- }
-
- private RuntimeException handleAndGenerate(final String s, final Exception e) {
- throw UserException.dataReadError(e)
- .message(s + "\n%s", e.getMessage())
- .addContext("Path", work.getPath())
- .addContext("Split Start", work.getStart())
- .addContext("Split Length", work.getLength())
- .addContext("Local Line Number", lineNumber.get())
- .build(logger);
- }
-
- /**
- * This record reader is given a batch of records (lines) to read. Each call to next() acts upon one batch of records.
- *
- * @return Number of records in this batch.
- */
- @Override
- public int next() {
- try {
- final Text line = lineReader.createValue();
-
- writer.allocate();
- writer.reset();
-
- int recordCount = 0;
- while (recordCount < VECTOR_MEMORY_ALLOCATION && lineReader.next(lineNumber, line)) {
- writer.setPosition(recordCount);
- parser.parse(line.toString());
- recordCount++;
- }
- writer.setValueCount(recordCount);
-
- return recordCount;
- } catch (DissectionFailure | InvalidDissectorException | MissingDissectorsException | IOException e) {
- throw handleAndGenerate("Failure while parsing log record.", e);
- }
- }
-
- @Override
- public void close() throws Exception {
- try {
- if (lineReader != null) {
- lineReader.close();
- }
- } catch (IOException e) {
- logger.warn("Failure while closing Httpd reader.", e);
- }
- }
-
- @Override
- public String toString() {
- return "HttpdLogRecordReader[Path=" + work.getPath()
- + ", Start=" + work.getStart()
- + ", Length=" + work.getLength()
- + ", Line=" + lineNumber.get()
- + "]";
- }
- }
-
- /**
- * This plugin supports projection push-down into the parser. Only fields
- * specifically asked for within the configuration will be parsed. If no
- * fields are asked for then all possible fields will be returned.
- *
- * @return true
- */
- @Override
- public boolean supportsPushDown() {
- return true;
- }
-
- @Override
- public RecordReader getRecordReader(final FragmentContext context, final DrillFileSystem dfs,
- final FileWork fileWork, final List<SchemaPath> columns, final String userName) {
- return new HttpdLogRecordReader(context, dfs, fileWork, columns);
- }
-
- @Override
- public RecordWriter getRecordWriter(final FragmentContext context, final EasyWriter writer) {
- throw new UnsupportedOperationException("Drill doesn't currently support writing HTTPd logs");
- }
-
- @Override
- public int getReaderOperatorType() {
- return CoreOperatorType.HTPPD_LOG_SUB_SCAN_VALUE;
- }
-
- @Override
- public int getWriterOperatorType() {
- throw new UnsupportedOperationException();
- }
-}
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
deleted file mode 100644
index 45c251de1fd..00000000000
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdLogRecord.java
+++ /dev/null
@@ -1,346 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.drill.exec.store.httpd;
-
-import org.apache.drill.shaded.guava.com.google.common.base.Charsets;
-import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
-import io.netty.buffer.DrillBuf;
-
-import java.util.EnumSet;
-import java.util.HashMap;
-import java.util.Map;
-
-import nl.basjes.parse.core.Casts;
-import nl.basjes.parse.core.Parser;
-import org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter;
-import org.apache.drill.exec.vector.complex.writer.BigIntWriter;
-import org.apache.drill.exec.vector.complex.writer.Float8Writer;
-import org.apache.drill.exec.vector.complex.writer.VarCharWriter;
-import org.apache.drill.exec.vector.complex.writer.TimeStampWriter;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.text.SimpleDateFormat;
-import java.util.Date;
-
-public class HttpdLogRecord {
-
- private static final Logger logger = LoggerFactory.getLogger(HttpdLogRecord.class);
-
- private final Map<String, VarCharWriter> strings = Maps.newHashMap();
- private final Map<String, BigIntWriter> longs = Maps.newHashMap();
- private final Map<String, Float8Writer> doubles = Maps.newHashMap();
- private final Map<String, TimeStampWriter> times = new HashMap<>();
- private final Map<String, MapWriter> wildcards = Maps.newHashMap();
- private final Map<String, String> cleanExtensions = Maps.newHashMap();
- private final Map<String, MapWriter> startedWildcards = Maps.newHashMap();
- private final Map<String, MapWriter> wildcardWriters = Maps.newHashMap();
- private final SimpleDateFormat dateFormatter;
- private DrillBuf managedBuffer;
- private String timeFormat;
-
- public HttpdLogRecord(final DrillBuf managedBuffer, final String timeFormat) {
- this.managedBuffer = managedBuffer;
- this.timeFormat = timeFormat;
- this.dateFormatter = new SimpleDateFormat(this.timeFormat);
- }
-
- /**
- * Call this method after a record has been parsed. This finishes the lifecycle of any maps that were written and
- * removes all their entries so that the next record can be processed.
- */
- public void finishRecord() {
- for (MapWriter writer : wildcardWriters.values()) {
- writer.end();
- }
- wildcardWriters.clear();
- startedWildcards.clear();
- }
-
- private DrillBuf buf(final int size) {
- if (managedBuffer.capacity() < size) {
- managedBuffer = managedBuffer.reallocIfNeeded(size);
- }
- return managedBuffer;
- }
-
- private void writeString(VarCharWriter writer, String value) {
- final byte[] stringBytes = value.getBytes(Charsets.UTF_8);
- final DrillBuf stringBuffer = buf(stringBytes.length);
- stringBuffer.clear();
- stringBuffer.writeBytes(stringBytes);
- writer.writeVarChar(0, stringBytes.length, stringBuffer);
- }
-
- /**
- * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
- * called when the value of a log field is a String data type.
- *
- * @param field name of field
- * @param value value of field
- */
- @SuppressWarnings("unused")
- public void set(String field, String value) {
- if (value != null) {
- final VarCharWriter w = strings.get(field);
- if (w != null) {
- logger.trace("Parsed field: {}, as string: {}", field, value);
- writeString(w, value);
- } else {
- logger.warn("No 'string' writer found for field: {}", field);
- }
- }
- }
-
- /**
- * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
- * called when the value of a log field is a Long data type.
- *
- * @param field name of field
- * @param value value of field
- */
- @SuppressWarnings("unused")
- public void set(String field, Long value) {
- if (value != null) {
- final BigIntWriter w = longs.get(field);
- if (w != null) {
- logger.trace("Parsed field: {}, as long: {}", field, value);
- w.writeBigInt(value);
- } else {
- logger.warn("No 'long' writer found for field: {}", field);
- }
- }
- }
-
- /**
- * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
- * called when the value of a log field is a timestamp data type.
- *
- * @param field name of field
- * @param value value of field
- */
- @SuppressWarnings("unused")
- public void setTimestamp(String field, String value) {
- if (value != null) {
- //Convert the date string into a long
- long ts = 0;
- try {
- Date d = this.dateFormatter.parse(value);
- ts = d.getTime();
- } catch (Exception e) {
- //If the date formatter does not successfully create a date, the timestamp will fall back to zero
- //Do not throw exception
- }
- final TimeStampWriter tw = times.get(field);
- if (tw != null) {
- logger.trace("Parsed field: {}, as time: {}", field, value);
- tw.writeTimeStamp(ts);
- } else {
- logger.warn("No 'timestamp' writer found for field: {}", field);
- }
- }
- }
-
- /**
- * This method is referenced and called via reflection. This is added as a parsing target for the parser. It will get
- * called when the value of a log field is a Double data type.
- *
- * @param field name of field
- * @param value value of field
- */
- @SuppressWarnings("unused")
- public void set(String field, Double value) {
- if (value != null) {
- final Float8Writer w = doubles.get(field);
- if (w != null) {
- logger.trace("Parsed field: {}, as double: {}", field, value);
- w.writeFloat8(value);
- } else {
- logger.warn("No 'double' writer found for field: {}", field);
- }
- }
- }
-
- /**
- * This method is referenced and called via reflection. When the parser processes a field like
- * HTTP.URI:request.firstline.uri.query.*, where the star is an arbitrary key found by the parser, this method
- * will be invoked.
- *
- * @param field name of field
- * @param value value of field
- */
- @SuppressWarnings("unused")
- public void setWildcard(String field, String value) {
- if (value != null) {
- final MapWriter mapWriter = getWildcardWriter(field);
- logger.trace("Parsed wildcard field: {}, as string: {}", field, value);
- final VarCharWriter w = mapWriter.varChar(cleanExtensions.get(field));
- writeString(w, value);
- }
- }
-
- /**
- * This method is referenced and called via reflection. When the parser processes a field like
- * HTTP.URI:request.firstline.uri.query.*, where the star is an arbitrary key found by the parser, this method
- * will be invoked.
- *
- * @param field name of field
- * @param value value of field
- */
- @SuppressWarnings("unused")
- public void setWildcard(String field, Long value) {
- if (value != null) {
- final MapWriter mapWriter = getWildcardWriter(field);
- logger.trace("Parsed wildcard field: {}, as long: {}", field, value);
- final BigIntWriter w = mapWriter.bigInt(cleanExtensions.get(field));
- w.writeBigInt(value);
- }
- }
-
- /**
- * This method is referenced and called via reflection. When the parser processes a field like
- * HTTP.URI:request.firstline.uri.query.*, where the star is an arbitrary key found by the parser, this method
- * will be invoked.
- *
- * @param field name of field
- * @param value value of field
- */
- @SuppressWarnings("unused")
- public void setWildcard(String field, Double value) {
- if (value != null) {
- final MapWriter mapWriter = getWildcardWriter(field);
- logger.trace("Parsed wildcard field: {}, as double: {}", field, value);
- final Float8Writer w = mapWriter.float8(cleanExtensions.get(field));
- w.writeFloat8(value);
- }
- }
-
- /**
- * For a configuration like HTTP.URI:request.firstline.uri.query.*, a writer was created with the name
- * HTTP.URI:request.firstline.uri.query. We traverse the list of wildcard writers to see which one is the root of the
- * name of the field passed in, like HTTP.URI:request.firstline.uri.query.old. This is the writer entry that is needed.
- *
- * @param field like HTTP.URI:request.firstline.uri.query.old where 'old' is one of many different parameter names.
- * @return the writer to be used for this field.
- */
- private MapWriter getWildcardWriter(String field) {
- MapWriter writer = startedWildcards.get(field);
- if (writer == null) {
- for (Map.Entry<String, MapWriter> entry : wildcards.entrySet()) {
- final String root = entry.getKey();
- if (field.startsWith(root)) {
- writer = entry.getValue();
-
- /**
- * In order to save some time, store the cleaned version of the field extension. It is possible it will have
- * unsafe characters in it.
- */
- if (!cleanExtensions.containsKey(field)) {
- final String extension = field.substring(root.length() + 1);
- final String cleanExtension = HttpdParser.drillFormattedFieldName(extension);
- cleanExtensions.put(field, cleanExtension);
- logger.debug("Added extension: field='{}' with cleanExtension='{}'", field, cleanExtension);
- }
-
- /**
- * We already know we have the writer, but if we have put this writer in the started list, do NOT call start
- * again.
- */
- if (!wildcardWriters.containsKey(root)) {
- /**
- * Start and store this root map writer for later retrieval.
- */
- logger.debug("Starting new wildcard field writer: {}", field);
- writer.start();
- startedWildcards.put(field, writer);
- wildcardWriters.put(root, writer);
- }
-
- /**
- * Break out of the for loop when we find a root writer that matches the field.
- */
- break;
- }
- }
- }
-
- return writer;
- }
-
- public Map<String, VarCharWriter> getStrings() {
- return strings;
- }
-
- public Map<String, BigIntWriter> getLongs() {
- return longs;
- }
-
- public Map<String, Float8Writer> getDoubles() {
- return doubles;
- }
-
- public Map<String, TimeStampWriter> getTimes() {
- return times;
- }
-
- /**
- * This record will be used with a single parser. For each field that is to be parsed, this method registers a
- * setter with the parser and builds the data writer that will hold the parsed data.
- *
- * @param parser parser to register the setter methods with
- * @param mapWriter writer used to build the per-field data writers
- * @param type the set of casts the parser can apply to this field
- * @param parserFieldName field name as the parser knows it
- * @param drillFieldName cleansed Drill field name
- * @throws NoSuchMethodException
- */
- public void addField(final Parser<HttpdLogRecord> parser, final MapWriter mapWriter, final EnumSet<Casts> type, final String parserFieldName, final String drillFieldName) throws NoSuchMethodException {
- final boolean hasWildcard = parserFieldName.endsWith(HttpdParser.PARSER_WILDCARD);
-
- /**
- * This is a dynamic way to map the setter for each specified field type.
- * e.g. a TIME.STAMP may map to a LONG while a referrer may map to a STRING
- */
- if (hasWildcard) {
- final String cleanName = parserFieldName.substring(0, parserFieldName.length() - HttpdParser.PARSER_WILDCARD.length());
- logger.debug("Adding WILDCARD parse target: {} as {}, with field name: {}", parserFieldName, cleanName, drillFieldName);
- parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, String.class), parserFieldName);
- parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, Double.class), parserFieldName);
- parser.addParseTarget(this.getClass().getMethod("setWildcard", String.class, Long.class), parserFieldName);
- wildcards.put(cleanName, mapWriter.map(drillFieldName));
- } else if (type.contains(Casts.DOUBLE)) {
- logger.debug("Adding DOUBLE parse target: {}, with field name: {}", parserFieldName, drillFieldName);
- parser.addParseTarget(this.getClass().getMethod("set", String.class, Double.class), parserFieldName);
- doubles.put(parserFieldName, mapWriter.float8(drillFieldName));
- } else if (type.contains(Casts.LONG)) {
- logger.debug("Adding LONG parse target: {}, with field name: {}", parserFieldName, drillFieldName);
- parser.addParseTarget(this.getClass().getMethod("set", String.class, Long.class), parserFieldName);
- longs.put(parserFieldName, mapWriter.bigInt(drillFieldName));
- } else {
- logger.debug("Adding STRING parse target: {}, with field name: {}", parserFieldName, drillFieldName);
- if (parserFieldName.startsWith("TIME.STAMP:")) {
- parser.addParseTarget(this.getClass().getMethod("setTimestamp", String.class, String.class), parserFieldName);
- times.put(parserFieldName, mapWriter.timeStamp(drillFieldName));
- } else {
- parser.addParseTarget(this.getClass().getMethod("set", String.class, String.class), parserFieldName);
- strings.put(parserFieldName, mapWriter.varChar(drillFieldName));
- }
- }
- }
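-
- /*
- * Dispatch examples (illustrative): a parser field whose casts include Casts.LONG, e.g. BYTES:response.body.bytes,
- * is wired to set(String, Long) with a bigInt writer; TIME.STAMP:request.receive.time takes the string branch and
- * is routed to setTimestamp(String, String) with a timeStamp writer; a wildcard such as
- * HTTP.URI:request.firstline.uri.query.* registers all three setWildcard variants plus a map writer.
- */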
-}
diff --git a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java b/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
deleted file mode 100644
index 7da7a95d1f5..00000000000
--- a/exec/java-exec/src/main/java/org/apache/drill/exec/store/httpd/HttpdParser.java
+++ /dev/null
@@ -1,441 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.drill.exec.store.httpd;
-
-import org.apache.drill.shaded.guava.com.google.common.base.Preconditions;
-import org.apache.drill.shaded.guava.com.google.common.collect.Maps;
-import io.netty.buffer.DrillBuf;
-import nl.basjes.parse.core.Casts;
-import nl.basjes.parse.core.Parser;
-import nl.basjes.parse.core.exceptions.DissectionFailure;
-import nl.basjes.parse.core.exceptions.InvalidDissectorException;
-import nl.basjes.parse.core.exceptions.MissingDissectorsException;
-import nl.basjes.parse.httpdlog.HttpdLoglineParser;
-import org.apache.drill.exec.vector.complex.writer.BaseWriter.MapWriter;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import java.util.EnumSet;
-import java.util.HashMap;
-import java.util.List;
-import java.util.Map;
-
-public class HttpdParser {
-
- private static final Logger logger = LoggerFactory.getLogger(HttpdParser.class);
-
- public static final String PARSER_WILDCARD = ".*";
- public static final String SAFE_WILDCARD = "_$";
- public static final String SAFE_SEPARATOR = "_";
- public static final String REMAPPING_FLAG = "#";
- private final Parser<HttpdLogRecord> parser;
- private final HttpdLogRecord record;
-
- public static final HashMap<String, String> LOGFIELDS = new HashMap<>();
-
- static {
- LOGFIELDS.put("connection.client.ip", "IP:connection.client.ip");
- LOGFIELDS.put("connection.client.ip.last", "IP:connection.client.ip.last");
- LOGFIELDS.put("connection.client.ip.original", "IP:connection.client.ip.original");
- LOGFIELDS.put("connection.client.ip.last", "IP:connection.client.ip.last");
- LOGFIELDS.put("connection.client.peerip", "IP:connection.client.peerip");
- LOGFIELDS.put("connection.client.peerip.last", "IP:connection.client.peerip.last");
- LOGFIELDS.put("connection.client.peerip.original", "IP:connection.client.peerip.original");
- LOGFIELDS.put("connection.client.peerip.last", "IP:connection.client.peerip.last");
- LOGFIELDS.put("connection.server.ip", "IP:connection.server.ip");
- LOGFIELDS.put("connection.server.ip.last", "IP:connection.server.ip.last");
- LOGFIELDS.put("connection.server.ip.original", "IP:connection.server.ip.original");
- LOGFIELDS.put("connection.server.ip.last", "IP:connection.server.ip.last");
- LOGFIELDS.put("response.body.bytes", "BYTES:response.body.bytes");
- LOGFIELDS.put("response.body.bytes.last", "BYTES:response.body.bytes.last");
- LOGFIELDS.put("response.body.bytes.original", "BYTES:response.body.bytes.original");
- LOGFIELDS.put("response.body.bytes.last", "BYTES:response.body.bytes.last");
- LOGFIELDS.put("response.body.bytesclf", "BYTES:response.body.bytesclf");
- LOGFIELDS.put("response.body.bytes", "BYTESCLF:response.body.bytes");
- LOGFIELDS.put("response.body.bytes.last", "BYTESCLF:response.body.bytes.last");
- LOGFIELDS.put("response.body.bytes.original", "BYTESCLF:response.body.bytes.original");
- LOGFIELDS.put("response.body.bytes.last", "BYTESCLF:response.body.bytes.last");
- LOGFIELDS.put("request.cookies.foobar", "HTTP.COOKIE:request.cookies.foobar");
- LOGFIELDS.put("server.environment.foobar", "VARIABLE:server.environment.foobar");
- LOGFIELDS.put("server.filename", "FILENAME:server.filename");
- LOGFIELDS.put("server.filename.last", "FILENAME:server.filename.last");
- LOGFIELDS.put("server.filename.original", "FILENAME:server.filename.original");
- LOGFIELDS.put("server.filename.last", "FILENAME:server.filename.last");
- LOGFIELDS.put("connection.client.host", "IP:connection.client.host");
- LOGFIELDS.put("connection.client.host.last", "IP:connection.client.host.last");
- LOGFIELDS.put("connection.client.host.original", "IP:connection.client.host.original");
- LOGFIELDS.put("connection.client.host.last", "IP:connection.client.host.last");
- LOGFIELDS.put("request.protocol", "PROTOCOL:request.protocol");
- LOGFIELDS.put("request.protocol.last", "PROTOCOL:request.protocol.last");
- LOGFIELDS.put("request.protocol.original", "PROTOCOL:request.protocol.original");
- LOGFIELDS.put("request.protocol.last", "PROTOCOL:request.protocol.last");
- LOGFIELDS.put("request.header.foobar", "HTTP.HEADER:request.header.foobar");
- LOGFIELDS.put("request.trailer.foobar", "HTTP.TRAILER:request.trailer.foobar");
- LOGFIELDS.put("connection.keepalivecount", "NUMBER:connection.keepalivecount");
- LOGFIELDS.put("connection.keepalivecount.last", "NUMBER:connection.keepalivecount.last");
- LOGFIELDS.put("connection.keepalivecount.original", "NUMBER:connection.keepalivecount.original");
- LOGFIELDS.put("connection.keepalivecount.last", "NUMBER:connection.keepalivecount.last");
- LOGFIELDS.put("connection.client.logname", "NUMBER:connection.client.logname");
- LOGFIELDS.put("connection.client.logname.last", "NUMBER:connection.client.logname.last");
- LOGFIELDS.put("connection.client.logname.original", "NUMBER:connection.client.logname.original");
- LOGFIELDS.put("connection.client.logname.last", "NUMBER:connection.client.logname.last");
- LOGFIELDS.put("request.errorlogid", "STRING:request.errorlogid");
- LOGFIELDS.put("request.errorlogid.last", "STRING:request.errorlogid.last");
- LOGFIELDS.put("request.errorlogid.original", "STRING:request.errorlogid.original");
- LOGFIELDS.put("request.errorlogid.last", "STRING:request.errorlogid.last");
- LOGFIELDS.put("request.method", "HTTP.METHOD:request.method");
- LOGFIELDS.put("request.method.last", "HTTP.METHOD:request.method.last");
- LOGFIELDS.put("request.method.original", "HTTP.METHOD:request.method.original");
- LOGFIELDS.put("request.method.last", "HTTP.METHOD:request.method.last");
- LOGFIELDS.put("server.module_note.foobar", "STRING:server.module_note.foobar");
- LOGFIELDS.put("response.header.foobar", "HTTP.HEADER:response.header.foobar");
- LOGFIELDS.put("response.trailer.foobar", "HTTP.TRAILER:response.trailer.foobar");
- LOGFIELDS.put("request.server.port.canonical", "PORT:request.server.port.canonical");
- LOGFIELDS.put("request.server.port.canonical.last", "PORT:request.server.port.canonical.last");
- LOGFIELDS.put("request.server.port.canonical.original", "PORT:request.server.port.canonical.original");
- LOGFIELDS.put("request.server.port.canonical.last", "PORT:request.server.port.canonical.last");
- LOGFIELDS.put("connection.server.port.canonical", "PORT:connection.server.port.canonical");
- LOGFIELDS.put("connection.server.port.canonical.last", "PORT:connection.server.port.canonical.last");
- LOGFIELDS.put("connection.server.port.canonical.original", "PORT:connection.server.port.canonical.original");
- LOGFIELDS.put("connection.server.port.canonical.last", "PORT:connection.server.port.canonical.last");
- LOGFIELDS.put("connection.server.port", "PORT:connection.server.port");
- LOGFIELDS.put("connection.server.port.last", "PORT:connection.server.port.last");
- LOGFIELDS.put("connection.server.port.original", "PORT:connection.server.port.original");
- LOGFIELDS.put("connection.server.port.last", "PORT:connection.server.port.last");
- LOGFIELDS.put("connection.client.port", "PORT:connection.client.port");
- LOGFIELDS.put("connection.client.port.last", "PORT:connection.client.port.last");
- LOGFIELDS.put("connection.client.port.original", "PORT:connection.client.port.original");
- LOGFIELDS.put("connection.client.port.last", "PORT:connection.client.port.last");
- LOGFIELDS.put("connection.server.child.processid", "NUMBER:connection.server.child.processid");
- LOGFIELDS.put("connection.server.child.processid.last", "NUMBER:connection.server.child.processid.last");
- LOGFIELDS.put("connection.server.child.processid.original", "NUMBER:connection.server.child.processid.original");
- LOGFIELDS.put("connection.server.child.processid.last", "NUMBER:connection.server.child.processid.last");
- LOGFIELDS.put("connection.server.child.processid", "NUMBER:connection.server.child.processid");
- LOGFIELDS.put("connection.server.child.processid.last", "NUMBER:connection.server.child.processid.last");
- LOGFIELDS.put("connection.server.child.processid.original", "NUMBER:connection.server.child.processid.original");
- LOGFIELDS.put("connection.server.child.processid.last", "NUMBER:connection.server.child.processid.last");
- LOGFIELDS.put("connection.server.child.threadid", "NUMBER:connection.server.child.threadid");
- LOGFIELDS.put("connection.server.child.threadid.last", "NUMBER:connection.server.child.threadid.last");
- LOGFIELDS.put("connection.server.child.threadid.original", "NUMBER:connection.server.child.threadid.original");
- LOGFIELDS.put("connection.server.child.threadid.last", "NUMBER:connection.server.child.threadid.last");
- LOGFIELDS.put("connection.server.child.hexthreadid", "NUMBER:connection.server.child.hexthreadid");
- LOGFIELDS.put("connection.server.child.hexthreadid.last", "NUMBER:connection.server.child.hexthreadid.last");
- LOGFIELDS.put("connection.server.child.hexthreadid.original", "NUMBER:connection.server.child.hexthreadid.original");
- LOGFIELDS.put("connection.server.child.hexthreadid.last", "NUMBER:connection.server.child.hexthreadid.last");
- LOGFIELDS.put("request.querystring", "HTTP.QUERYSTRING:request.querystring");
- LOGFIELDS.put("request.querystring.last", "HTTP.QUERYSTRING:request.querystring.last");
- LOGFIELDS.put("request.querystring.original", "HTTP.QUERYSTRING:request.querystring.original");
- LOGFIELDS.put("request.querystring.last", "HTTP.QUERYSTRING:request.querystring.last");
- LOGFIELDS.put("request.firstline", "HTTP.FIRSTLINE:request.firstline");
- LOGFIELDS.put("request.firstline.original", "HTTP.FIRSTLINE:request.firstline.original");
- LOGFIELDS.put("request.firstline.original", "HTTP.FIRSTLINE:request.firstline.original");
- LOGFIELDS.put("request.firstline.last", "HTTP.FIRSTLINE:request.firstline.last");
- LOGFIELDS.put("request.handler", "STRING:request.handler");
- LOGFIELDS.put("request.handler.last", "STRING:request.handler.last");
- LOGFIELDS.put("request.handler.original", "STRING:request.handler.original");
- LOGFIELDS.put("request.handler.last", "STRING:request.handler.last");
- LOGFIELDS.put("request.status", "STRING:request.status");
- LOGFIELDS.put("request.status.original", "STRING:request.status.original");
- LOGFIELDS.put("request.status.original", "STRING:request.status.original");
- LOGFIELDS.put("request.status.last", "STRING:request.status.last");
- LOGFIELDS.put("request.receive.time", "TIME.STAMP:request.receive.time");
- LOGFIELDS.put("request.receive.time.last", "TIME.STAMP:request.receive.time.last");
- LOGFIELDS.put("request.receive.time.original", "TIME.STAMP:request.receive.time.original");
- LOGFIELDS.put("request.receive.time.last", "TIME.STAMP:request.receive.time.last");
- LOGFIELDS.put("request.receive.time.year", "TIME.YEAR:request.receive.time.year");
- LOGFIELDS.put("request.receive.time.begin.year", "TIME.YEAR:request.receive.time.begin.year");
- LOGFIELDS.put("request.receive.time.end.year", "TIME.YEAR:request.receive.time.end.year");
- LOGFIELDS.put("request.receive.time.sec", "TIME.SECONDS:request.receive.time.sec");
- LOGFIELDS.put("request.receive.time.sec", "TIME.SECONDS:request.receive.time.sec");
- LOGFIELDS.put("request.receive.time.sec.original", "TIME.SECONDS:request.receive.time.sec.original");
- LOGFIELDS.put("request.receive.time.sec.last", "TIME.SECONDS:request.receive.time.sec.last");
- LOGFIELDS.put("request.receive.time.begin.sec", "TIME.SECONDS:request.receive.time.begin.sec");
- LOGFIELDS.put("request.receive.time.begin.sec.last", "TIME.SECONDS:request.receive.time.begin.sec.last");
- LOGFIELDS.put("request.receive.time.begin.sec.original", "TIME.SECONDS:request.receive.time.begin.sec.original");
- LOGFIELDS.put("request.receive.time.begin.sec.last", "TIME.SECONDS:request.receive.time.begin.sec.last");
- LOGFIELDS.put("request.receive.time.end.sec", "TIME.SECONDS:request.receive.time.end.sec");
- LOGFIELDS.put("request.receive.time.end.sec.last", "TIME.SECONDS:request.receive.time.end.sec.last");
- LOGFIELDS.put("request.receive.time.end.sec.original", "TIME.SECONDS:request.receive.time.end.sec.original");
- LOGFIELDS.put("request.receive.time.end.sec.last", "TIME.SECONDS:request.receive.time.end.sec.last");
- LOGFIELDS.put("request.receive.time.begin.msec", "TIME.EPOCH:request.receive.time.begin.msec");
- LOGFIELDS.put("request.receive.time.msec", "TIME.EPOCH:request.receive.time.msec");
- LOGFIELDS.put("request.receive.time.msec.last", "TIME.EPOCH:request.receive.time.msec.last");
- LOGFIELDS.put("request.receive.time.msec.original", "TIME.EPOCH:request.receive.time.msec.original");
- LOGFIELDS.put("request.receive.time.msec.last", "TIME.EPOCH:request.receive.time.msec.last");
- LOGFIELDS.put("request.receive.time.begin.msec", "TIME.EPOCH:request.receive.time.begin.msec");
- LOGFIELDS.put("request.receive.time.begin.msec.last", "TIME.EPOCH:request.receive.time.begin.msec.last");
- LOGFIELDS.put("request.receive.time.begin.msec.original", "TIME.EPOCH:request.receive.time.begin.msec.original");
- LOGFIELDS.put("request.receive.time.begin.msec.last", "TIME.EPOCH:request.receive.time.begin.msec.last");
- LOGFIELDS.put("request.receive.time.end.msec", "TIME.EPOCH:request.receive.time.end.msec");
- LOGFIELDS.put("request.receive.time.end.msec.last", "TIME.EPOCH:request.receive.time.end.msec.last");
- LOGFIELDS.put("request.receive.time.end.msec.original", "TIME.EPOCH:request.receive.time.end.msec.original");
- LOGFIELDS.put("request.receive.time.end.msec.last", "TIME.EPOCH:request.receive.time.end.msec.last");
- LOGFIELDS.put("request.receive.time.begin.usec", "TIME.EPOCH.USEC:request.receive.time.begin.usec");
- LOGFIELDS.put("request.receive.time.usec", "TIME.EPOCH.USEC:request.receive.time.usec");
- LOGFIELDS.put("request.receive.time.usec.last", "TIME.EPOCH.USEC:request.receive.time.usec.last");
- LOGFIELDS.put("request.receive.time.usec.original", "TIME.EPOCH.USEC:request.receive.time.usec.original");
- LOGFIELDS.put("request.receive.time.usec.last", "TIME.EPOCH.USEC:request.receive.time.usec.last");
- LOGFIELDS.put("request.receive.time.begin.usec", "TIME.EPOCH.USEC:request.receive.time.begin.usec");
- LOGFIELDS.put("request.receive.time.begin.usec.last", "TIME.EPOCH.USEC:request.receive.time.begin.usec.last");
- LOGFIELDS.put("request.receive.time.begin.usec.original", "TIME.EPOCH.USEC:request.receive.time.begin.usec.original");
- LOGFIELDS.put("request.receive.time.begin.usec.last", "TIME.EPOCH.USEC:request.receive.time.begin.usec.last");
- LOGFIELDS.put("request.receive.time.end.usec", "TIME.EPOCH.USEC:request.receive.time.end.usec");
- LOGFIELDS.put("request.receive.time.end.usec.last", "TIME.EPOCH.USEC:request.receive.time.end.usec.last");
- LOGFIELDS.put("request.receive.time.end.usec.original", "TIME.EPOCH.USEC:request.receive.time.end.usec.original");
- LOGFIELDS.put("request.receive.time.end.usec.last", "TIME.EPOCH.USEC:request.receive.time.end.usec.last");
- LOGFIELDS.put("request.receive.time.begin.msec_frac", "TIME.EPOCH:request.receive.time.begin.msec_frac");
- LOGFIELDS.put("request.receive.time.msec_frac", "TIME.EPOCH:request.receive.time.msec_frac");
- LOGFIELDS.put("request.receive.time.msec_frac.last", "TIME.EPOCH:request.receive.time.msec_frac.last");
- LOGFIELDS.put("request.receive.time.msec_frac.original", "TIME.EPOCH:request.receive.time.msec_frac.original");
- LOGFIELDS.put("request.receive.time.msec_frac.last", "TIME.EPOCH:request.receive.time.msec_frac.last");
- LOGFIELDS.put("request.receive.time.begin.msec_frac", "TIME.EPOCH:request.receive.time.begin.msec_frac");
- LOGFIELDS.put("request.receive.time.begin.msec_frac.last", "TIME.EPOCH:request.receive.time.begin.msec_frac.last");
- LOGFIELDS.put("request.receive.time.begin.msec_frac.original", "TIME.EPOCH:request.receive.time.begin.msec_frac.original");
- LOGFIELDS.put("request.receive.time.begin.msec_frac.last", "TIME.EPOCH:request.receive.time.begin.msec_frac.last");
- LOGFIELDS.put("request.receive.time.end.msec_frac", "TIME.EPOCH:request.receive.time.end.msec_frac");
- LOGFIELDS.put("request.receive.time.end.msec_frac.last", "TIME.EPOCH:request.receive.time.end.msec_frac.last");
- LOGFIELDS.put("request.receive.time.end.msec_frac.original", "TIME.EPOCH:request.receive.time.end.msec_frac.original");
- LOGFIELDS.put("request.receive.time.end.msec_frac.last", "TIME.EPOCH:request.receive.time.end.msec_frac.last");
- LOGFIELDS.put("request.receive.time.begin.usec_frac", "FRAC:request.receive.time.begin.usec_frac");
- LOGFIELDS.put("request.receive.time.usec_frac", "FRAC:request.receive.time.usec_frac");
- LOGFIELDS.put("request.receive.time.usec_frac.last", "FRAC:request.receive.time.usec_frac.last");
- LOGFIELDS.put("request.receive.time.usec_frac.original", "FRAC:request.receive.time.usec_frac.original");
- LOGFIELDS.put("request.receive.time.usec_frac.last", "FRAC:request.receive.time.usec_frac.last");
- LOGFIELDS.put("request.receive.time.begin.usec_frac", "FRAC:request.receive.time.begin.usec_frac");
- LOGFIELDS.put("request.receive.time.begin.usec_frac.last", "FRAC:request.receive.time.begin.usec_frac.last");
- LOGFIELDS.put("request.receive.time.begin.usec_frac.original", "FRAC:request.receive.time.begin.usec_frac.original");
- LOGFIELDS.put("request.receive.time.begin.usec_frac.last", "FRAC:request.receive.time.begin.usec_frac.last");
- LOGFIELDS.put("request.receive.time.end.usec_frac", "FRAC:request.receive.time.end.usec_frac");
- LOGFIELDS.put("request.receive.time.end.usec_frac.last", "FRAC:request.receive.time.end.usec_frac.last");
- LOGFIELDS.put("request.receive.time.end.usec_frac.original", "FRAC:request.receive.time.end.usec_frac.original");
- LOGFIELDS.put("request.receive.time.end.usec_frac.last", "FRAC:request.receive.time.end.usec_frac.last");
- LOGFIELDS.put("response.server.processing.time", "SECONDS:response.server.processing.time");
- LOGFIELDS.put("response.server.processing.time.original", "SECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.original", "SECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.last", "SECONDS:response.server.processing.time.last");
- LOGFIELDS.put("server.process.time", "MICROSECONDS:server.process.time");
- LOGFIELDS.put("response.server.processing.time", "MICROSECONDS:response.server.processing.time");
- LOGFIELDS.put("response.server.processing.time.original", "MICROSECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.original", "MICROSECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.last", "MICROSECONDS:response.server.processing.time.last");
- LOGFIELDS.put("response.server.processing.time", "MICROSECONDS:response.server.processing.time");
- LOGFIELDS.put("response.server.processing.time.original", "MICROSECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.original", "MICROSECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.last", "MICROSECONDS:response.server.processing.time.last");
- LOGFIELDS.put("response.server.processing.time", "MILLISECONDS:response.server.processing.time");
- LOGFIELDS.put("response.server.processing.time.original", "MILLISECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.original", "MILLISECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.last", "MILLISECONDS:response.server.processing.time.last");
- LOGFIELDS.put("response.server.processing.time", "SECONDS:response.server.processing.time");
- LOGFIELDS.put("response.server.processing.time.original", "SECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.original", "SECONDS:response.server.processing.time.original");
- LOGFIELDS.put("response.server.processing.time.last", "SECONDS:response.server.processing.time.last");
- LOGFIELDS.put("connection.client.user", "STRING:connection.client.user");
- LOGFIELDS.put("connection.client.user.last", "STRING:connection.client.user.last");
- LOGFIELDS.put("connection.client.user.original", "STRING:connection.client.user.original");
- LOGFIELDS.put("connection.client.user.last", "STRING:connection.client.user.last");
- LOGFIELDS.put("request.urlpath", "URI:request.urlpath");
- LOGFIELDS.put("request.urlpath.original", "URI:request.urlpath.original");
- LOGFIELDS.put("request.urlpath.original", "URI:request.urlpath.original");
- LOGFIELDS.put("request.urlpath.last", "URI:request.urlpath.last");
- LOGFIELDS.put("connection.server.name.canonical", "STRING:connection.server.name.canonical");
- LOGFIELDS.put("connection.server.name.canonical.last", "STRING:connection.server.name.canonical.last");
- LOGFIELDS.put("connection.server.name.canonical.original", "STRING:connection.server.name.canonical.original");
- LOGFIELDS.put("connection.server.name.canonical.last", "STRING:connection.server.name.canonical.last");
- LOGFIELDS.put("connection.server.name", "STRING:connection.server.name");
- LOGFIELDS.put("connection.server.name.last", "STRING:connection.server.name.last");
- LOGFIELDS.put("connection.server.name.original", "STRING:connection.server.name.original");
- LOGFIELDS.put("connection.server.name.last", "STRING:connection.server.name.last");
- LOGFIELDS.put("response.connection.status", "HTTP.CONNECTSTATUS:response.connection.status");
- LOGFIELDS.put("response.connection.status.last", "HTTP.CONNECTSTATUS:response.connection.status.last");
- LOGFIELDS.put("response.connection.status.original", "HTTP.CONNECTSTATUS:response.connection.status.original");
- LOGFIELDS.put("response.connection.status.last", "HTTP.CONNECTSTATUS:response.connection.status.last");
- LOGFIELDS.put("request.bytes", "BYTES:request.bytes");
- LOGFIELDS.put("request.bytes.last", "BYTES:request.bytes.last");
- LOGFIELDS.put("request.bytes.original", "BYTES:request.bytes.original");
- LOGFIELDS.put("request.bytes.last", "BYTES:request.bytes.last");
- LOGFIELDS.put("response.bytes", "BYTES:response.bytes");
- LOGFIELDS.put("response.bytes.last", "BYTES:response.bytes.last");
- LOGFIELDS.put("response.bytes.original", "BYTES:response.bytes.original");
- LOGFIELDS.put("response.bytes.last", "BYTES:response.bytes.last");
- LOGFIELDS.put("total.bytes", "BYTES:total.bytes");
- LOGFIELDS.put("total.bytes.last", "BYTES:total.bytes.last");
- LOGFIELDS.put("total.bytes.original", "BYTES:total.bytes.original");
- LOGFIELDS.put("total.bytes.last", "BYTES:total.bytes.last");
- LOGFIELDS.put("request.cookies", "HTTP.COOKIES:request.cookies");
- LOGFIELDS.put("request.cookies.last", "HTTP.COOKIES:request.cookies.last");
- LOGFIELDS.put("request.cookies.original", "HTTP.COOKIES:request.cookies.original");
- LOGFIELDS.put("request.cookies.last", "HTTP.COOKIES:request.cookies.last");
- LOGFIELDS.put("response.cookies", "HTTP.SETCOOKIES:response.cookies");
- LOGFIELDS.put("response.cookies.last", "HTTP.SETCOOKIES:response.cookies.last");
- LOGFIELDS.put("response.cookies.original", "HTTP.SETCOOKIES:response.cookies.original");
- LOGFIELDS.put("response.cookies.last", "HTTP.SETCOOKIES:response.cookies.last");
- LOGFIELDS.put("request.user-agent", "HTTP.USERAGENT:request.user-agent");
- LOGFIELDS.put("request.user-agent.last", "HTTP.USERAGENT:request.user-agent.last");
- LOGFIELDS.put("request.user-agent.original", "HTTP.USERAGENT:request.user-agent.original");
- LOGFIELDS.put("request.user-agent.last", "HTTP.USERAGENT:request.user-agent.last");
- LOGFIELDS.put("request.referer", "HTTP.URI:request.referer");
- LOGFIELDS.put("request.referer.last", "HTTP.URI:request.referer.last");
- LOGFIELDS.put("request.referer.original", "HTTP.URI:request.referer.original");
- LOGFIELDS.put("request.referer.last", "HTTP.URI:request.referer.last");
- }
-
- public HttpdParser(final MapWriter mapWriter, final DrillBuf managedBuffer, final String logFormat,
- final String timestampFormat, final Map<String, String> fieldMapping)
- throws NoSuchMethodException, MissingDissectorsException, InvalidDissectorException {
-
- Preconditions.checkArgument(logFormat != null && !logFormat.trim().isEmpty(), "logFormat cannot be null or empty");
-
- this.record = new HttpdLogRecord(managedBuffer, timestampFormat);
- this.parser = new HttpdLoglineParser<>(HttpdLogRecord.class, logFormat, timestampFormat);
-
- setupParser(mapWriter, logFormat, fieldMapping);
-
- if (timestampFormat != null && !timestampFormat.trim().isEmpty()) {
- logger.info("Custom timestamp format has been specified. This is an informational note only as custom timestamps is rather unusual.");
- }
- if (logFormat.contains("\n")) {
- logger.info("Specified logformat is a multiline log format: {}", logFormat);
- }
- }
-
- /**
- * We do not expose the underlying parser or the record which is used to manage the writers.
- *
- * @param line log line to tear apart.
- * @throws DissectionFailure
- * @throws InvalidDissectorException
- * @throws MissingDissectorsException
- */
- public void parse(final String line) throws DissectionFailure, InvalidDissectorException, MissingDissectorsException {
- parser.parse(record, line);
- record.finishRecord();
- }
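-
- /*
- * Minimal usage sketch (variable names assumed, not part of this class): the record reader constructs one
- * HttpdParser per file and feeds it lines.
- *
- *   HttpdParser p = new HttpdParser(mapWriter, managedBuffer, logFormat, timestampFormat, null);
- *   for (String line : logLines) {
- *     p.parse(line); // populates the vector writers for one record
- *   }
- */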
-
- /**
- * In order to define a type remapping, the field configuration will take the format:
- * HTTP.URI:request.firstline.uri.query.[parameter name]
- *
- * @param parser Add type remapping to this parser instance.
- * @param fieldName request.firstline.uri.query.[parameter_name]
- * @param fieldType HTTP.URI, etc..
- */
- private void addTypeRemapping(final Parser<HttpdLogRecord> parser, final String fieldName, final String fieldType) {
- logger.debug("Adding type remapping - fieldName: {}, fieldType: {}", fieldName, fieldType);
- parser.addTypeRemapping(fieldName, fieldType);
- }
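-
- /*
- * Illustrative example: a user field mapping value of "#HTTP.URI:request.firstline.uri.query.foo" (the leading
- * REMAPPING_FLAG marks it for remapping) is split in setupParser, which then calls
- * addTypeRemapping(parser, "request.firstline.uri.query.foo", "HTTP.URI").
- */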
-
- /**
- * The parser deals with field names containing dots, unlike Drill, which wants underscores (request_referer).
- * For the sake of simplicity we map the name back to the parser's dotted form. The resultant output field will
- * look like: request.referer. Additionally, wildcards will get replaced with .*
- *
- * @param drillFieldName name to be cleansed.
- * @return the parser-formatted field name.
- */
- public static String parserFormattedFieldName(String drillFieldName) {
-
- //The Useragent fields contain a dash which causes potential problems if the field name is not escaped properly.
- //This restores the dash that the Drill-safe name removed.
- if (drillFieldName.contains("useragent")) {
- drillFieldName = drillFieldName.replace("useragent", "user-agent");
- }
-
- String tempFieldName;
- tempFieldName = LOGFIELDS.get(drillFieldName);
- return tempFieldName.replace(SAFE_WILDCARD, PARSER_WILDCARD).replaceAll(SAFE_SEPARATOR, ".").replaceAll("\\.\\.", "_");
- }
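-
- /*
- * Illustrative example: parserFormattedFieldName("request.receive.time") resolves through LOGFIELDS to
- * "TIME.STAMP:request.receive.time". On the way back, a _$ suffix becomes the .* wildcard, single underscores
- * become dots, and doubled underscores collapse to a literal underscore.
- */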
-
- /**
- * Drill cannot deal with fields containing dots, like request.referer. For the sake of simplicity we are going to
- * ensure the field name is cleansed. The resultant output field will look like: request_referer.
- * Additionally, wildcards will get replaced with _$
- *
- * @param parserFieldName name to be cleansed.
- * @return the Drill-safe field name.
- */
- public static String drillFormattedFieldName(String parserFieldName) {
-
- //The Useragent fields contain a dash which causes potential problems if the field name is not escaped properly
- //This removes the dash
- if (parserFieldName.contains("user-agent")) {
- parserFieldName = parserFieldName.replace("user-agent", "useragent");
- }
-
- if (parserFieldName.contains(":")) {
- String[] fieldPart = parserFieldName.split(":");
- return fieldPart[1].replaceAll("_", "__").replace(PARSER_WILDCARD, SAFE_WILDCARD).replaceAll("\\.", SAFE_SEPARATOR);
- } else {
- return parserFieldName.replaceAll("_", "__").replace(PARSER_WILDCARD, SAFE_WILDCARD).replaceAll("\\.", SAFE_SEPARATOR);
- }
- }
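-
- /*
- * Worked example: drillFormattedFieldName("HTTP.URI:request.firstline.uri.query.*") drops the type prefix,
- * doubles any pre-existing underscores, rewrites the trailing .* wildcard to _$ and the dots to underscores,
- * yielding "request_firstline_uri_query_$".
- */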
-
- private void setupParser(final MapWriter mapWriter, final String logFormat, final Map fieldMapping)
- throws NoSuchMethodException, MissingDissectorsException, InvalidDissectorException {
-
- /**
- * If the user has selected fields, then we will use them to configure the parser because this would be the most
- * efficient way to parse the log.
- */
- final Map<String, String> requestedPaths;
- final List<String> allParserPaths = parser.getPossiblePaths();
- if (fieldMapping != null && !fieldMapping.isEmpty()) {
- logger.debug("Using fields defined by user.");
- requestedPaths = fieldMapping;
- } else {
- /**
- * Use all possible paths that the parser has determined from the specified log format.
- */
- logger.debug("No fields defined by user, defaulting to all possible fields.");
- requestedPaths = Maps.newHashMap();
- for (final String parserPath : allParserPaths) {
- requestedPaths.put(drillFormattedFieldName(parserPath), parserPath);
- }
- }
-
- /**
- * By adding the parse target to the dummy instance we activate it for use, which lets us find out which paths
- * cast to which native data types. After we are done figuring this information out, we throw the dummy away
- * because it would be the slowest possible parsing path for the specified format.
- */
- Parser<Object> dummy = new HttpdLoglineParser<>(Object.class, logFormat);
- dummy.addParseTarget(String.class.getMethod("indexOf", String.class), allParserPaths);
- for (final Map.Entry<String, String> entry : requestedPaths.entrySet()) {
- final EnumSet<Casts> casts;
-
- /**
- * Check the field specified by the user to see if it is supposed to be remapped.
- */
- if (entry.getValue().startsWith(REMAPPING_FLAG)) {
- /**
- * Because this field is being remapped we need to replace the field name that the parser uses.
- */
- entry.setValue(entry.getValue().substring(REMAPPING_FLAG.length()));
-
- final String[] pieces = entry.getValue().split(":");
- addTypeRemapping(parser, pieces[1], pieces[0]);
- casts = Casts.STRING_ONLY;
- } else {
- casts = dummy.getCasts(entry.getValue());
- }
-
- logger.debug("Setting up drill field: {}, parser field: {}, which casts as: {}", entry.getKey(), entry.getValue(), casts);
- record.addField(parser, mapWriter, casts, entry.getValue(), entry.getKey());
- }
- }
-}
diff --git a/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json b/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json
index 4aa17540d1b..dd8a659dc40 100644
--- a/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json
+++ b/exec/java-exec/src/main/resources/bootstrap-storage-plugins.json
@@ -31,11 +31,6 @@
"extensions" : [ "tsv" ],
"fieldDelimiter" : "\t"
},
- "httpd" : {
- "type" : "httpd",
- "logFormat" : "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"",
- "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ"
- },
"parquet" : {
"type" : "parquet"
},
diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/store/FormatPluginSerDeTest.java b/exec/java-exec/src/test/java/org/apache/drill/exec/store/FormatPluginSerDeTest.java
index ebcb300ac87..600177308bc 100644
--- a/exec/java-exec/src/test/java/org/apache/drill/exec/store/FormatPluginSerDeTest.java
+++ b/exec/java-exec/src/test/java/org/apache/drill/exec/store/FormatPluginSerDeTest.java
@@ -91,19 +91,6 @@ public void testPcap() throws Exception {
);
}
- @Test
- public void testHttpd() throws Exception {
- String path = "store/httpd/dfs-test-bootstrap-test.httpd";
- dirTestWatcher.copyResourceToRoot(Paths.get(path));
- String logFormat = "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"";
- String timeStampFormat = "dd/MMM/yyyy:HH:mm:ss ZZ";
- testPhysicalPlanSubmission(
- String.format("select * from dfs.`%s`", path),
- String.format("select * from table(dfs.`%s`(type=>'httpd', logFormat=>'%s'))", path, logFormat),
- String.format("select * from table(dfs.`%s`(type=>'httpd', logFormat=>'%s', timestampFormat=>'%s'))", path, logFormat, timeStampFormat)
- );
- }
-
@Test
public void testJson() throws Exception {
testPhysicalPlanSubmission(
diff --git a/exec/java-exec/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java b/exec/java-exec/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java
deleted file mode 100644
index c86ee52112b..00000000000
--- a/exec/java-exec/src/test/java/org/apache/drill/exec/store/httpd/TestHTTPDLogReader.java
+++ /dev/null
@@ -1,218 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements. See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership. The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License. You may obtain a copy of the License at
- *
- * http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.drill.exec.store.httpd;
-
-import org.apache.drill.common.types.TypeProtos.MinorType;
-import org.apache.drill.exec.record.metadata.SchemaBuilder;
-import org.apache.drill.exec.record.metadata.TupleMetadata;
-import org.apache.drill.exec.rpc.RpcException;
-import org.apache.drill.test.BaseDirTestWatcher;
-import org.apache.drill.test.ClusterFixture;
-import org.apache.drill.test.ClusterTest;
-import org.apache.drill.exec.physical.rowSet.RowSet;
-import org.apache.drill.test.rowSet.RowSetUtilities;
-import org.junit.BeforeClass;
-import org.junit.ClassRule;
-import org.junit.Test;
-
-import java.time.LocalDateTime;
-import java.util.HashMap;
-
-import static org.junit.Assert.assertEquals;
-
-public class TestHTTPDLogReader extends ClusterTest {
-
- @ClassRule
- public static final BaseDirTestWatcher dirTestWatcher = new BaseDirTestWatcher();
-
- @BeforeClass
- public static void setup() throws Exception {
- ClusterTest.startCluster(ClusterFixture.builder(dirTestWatcher));
-
- // Define a temporary format plugin for the "cp" storage plugin.
- HttpdLogFormatConfig sampleConfig = new HttpdLogFormatConfig(
- "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"", null);
- cluster.defineFormat("cp", "sample", sampleConfig);
- }
-
- @Test
- public void testDateField() throws RpcException {
- String sql = "SELECT `request_receive_time` FROM cp.`httpd/hackers-access-small.httpd` LIMIT 5";
- RowSet results = client.queryBuilder().sql(sql).rowSet();
-
- TupleMetadata expectedSchema = new SchemaBuilder()
- .addNullable("request_receive_time", MinorType.TIMESTAMP)
- .buildSchema();
- RowSet expected = client.rowSetBuilder(expectedSchema)
- .addRow(1445742685000L)
- .addRow(1445742686000L)
- .addRow(1445742687000L)
- .addRow(1445743471000L)
- .addRow(1445743472000L)
- .build();
-
- RowSetUtilities.verify(expected, results);
- }
-
- @Test
- public void testSelectColumns() throws Exception {
- String sql = "SELECT request_referer_ref,\n" +
- "request_receive_time_last_time,\n" +
- "request_firstline_uri_protocol,\n" +
- "request_receive_time_microsecond,\n" +
- "request_receive_time_last_microsecond__utc,\n" +
- "request_firstline_original_protocol,\n" +
- "request_firstline_original_uri_host,\n" +
- "request_referer_host,\n" +
- "request_receive_time_month__utc,\n" +
- "request_receive_time_last_minute,\n" +
- "request_firstline_protocol_version,\n" +
- "request_receive_time_time__utc,\n" +
- "request_referer_last_ref,\n" +
- "request_receive_time_last_timezone,\n" +
- "request_receive_time_last_weekofweekyear,\n" +
- "request_referer_last,\n" +
- "request_receive_time_minute,\n" +
- "connection_client_host_last,\n" +
- "request_receive_time_last_millisecond__utc,\n" +
- "request_firstline_original_uri,\n" +
- "request_firstline,\n" +
- "request_receive_time_nanosecond,\n" +
- "request_receive_time_last_millisecond,\n" +
- "request_receive_time_day,\n" +
- "request_referer_port,\n" +
- "request_firstline_original_uri_port,\n" +
- "request_receive_time_year,\n" +
- "request_receive_time_last_date,\n" +
- "request_receive_time_last_time__utc,\n" +
- "request_receive_time_last_hour__utc,\n" +
- "request_firstline_original_protocol_version,\n" +
- "request_firstline_original_method,\n" +
- "request_receive_time_last_year__utc,\n" +
- "request_firstline_uri,\n" +
- "request_referer_last_host,\n" +
- "request_receive_time_last_minute__utc,\n" +
- "request_receive_time_weekofweekyear,\n" +
- "request_firstline_uri_userinfo,\n" +
- "request_receive_time_epoch,\n" +
- "connection_client_logname,\n" +
- "response_body_bytes,\n" +
- "request_receive_time_nanosecond__utc,\n" +
- "request_firstline_protocol,\n" +
- "request_receive_time_microsecond__utc,\n" +
- "request_receive_time_hour,\n" +
- "request_firstline_uri_host,\n" +
- "request_referer_last_port,\n" +
- "request_receive_time_last_epoch,\n" +
- "request_receive_time_last_weekyear__utc,\n" +
- "request_useragent,\n" +
- "request_receive_time_weekyear,\n" +
- "request_receive_time_timezone,\n" +
- "response_body_bytesclf,\n" +
- "request_receive_time_last_date__utc,\n" +
- "request_receive_time_millisecond__utc,\n" +
- "request_referer_last_protocol,\n" +
- "request_status_last,\n" +
- "request_firstline_uri_query,\n" +
- "request_receive_time_minute__utc,\n" +
- "request_firstline_original_uri_protocol,\n" +
- "request_referer_query,\n" +
- "request_receive_time_date,\n" +
- "request_firstline_uri_port,\n" +
- "request_receive_time_last_second__utc,\n" +
- "request_referer_last_userinfo,\n" +
- "request_receive_time_last_second,\n" +
- "request_receive_time_last_monthname__utc,\n" +
- "request_firstline_method,\n" +
- "request_receive_time_last_month__utc,\n" +
- "request_receive_time_millisecond,\n" +
- "request_receive_time_day__utc,\n" +
- "request_receive_time_year__utc,\n" +
- "request_receive_time_weekofweekyear__utc,\n" +
- "request_receive_time_second,\n" +
- "request_firstline_original_uri_ref,\n" +
- "connection_client_logname_last,\n" +
- "request_receive_time_last_year,\n" +
- "request_firstline_original_uri_path,\n" +
- "connection_client_host,\n" +
- "request_firstline_original_uri_query,\n" +
- "request_referer_userinfo,\n" +
- "request_receive_time_last_monthname,\n" +
- "request_referer_path,\n" +
- "request_receive_time_monthname,\n" +
- "request_receive_time_last_month,\n" +
- "request_referer_last_query,\n" +
- "request_firstline_uri_ref,\n" +
- "request_receive_time_last_day,\n" +
- "request_receive_time_time,\n" +
- "request_receive_time_last_weekofweekyear__utc,\n" +
- "request_useragent_last,\n" +
- "request_receive_time_last_weekyear,\n" +
- "request_receive_time_last_microsecond,\n" +
- "request_firstline_original,\n" +
- "request_referer_last_path,\n" +
- "request_receive_time_month,\n" +
- "request_receive_time_last_day__utc,\n" +
- "request_referer,\n" +
- "request_referer_protocol,\n" +
- "request_receive_time_monthname__utc,\n" +
- "response_body_bytes_last,\n" +
- "request_receive_time,\n" +
- "request_receive_time_last_nanosecond,\n" +
- "request_firstline_uri_path,\n" +
- "request_firstline_original_uri_userinfo,\n" +
- "request_receive_time_date__utc,\n" +
- "request_receive_time_last,\n" +
- "request_receive_time_last_nanosecond__utc,\n" +
- "request_receive_time_last_hour,\n" +
- "request_receive_time_hour__utc,\n" +
- "request_receive_time_second__utc,\n" +
- "connection_client_user_last,\n" +
- "request_receive_time_weekyear__utc,\n" +
- "connection_client_user\n" +
- "FROM cp.`httpd/hackers-access-small.httpd`\n" +
- "LIMIT 1";
-
- testBuilder()
- .sqlQuery(sql)
- .unOrdered()
- .baselineColumns("request_referer_ref", "request_receive_time_last_time", "request_firstline_uri_protocol", "request_receive_time_microsecond", "request_receive_time_last_microsecond__utc", "request_firstline_original_protocol", "request_firstline_original_uri_host", "request_referer_host", "request_receive_time_month__utc", "request_receive_time_last_minute", "request_firstline_protocol_version", "request_receive_time_time__utc", "request_referer_last_ref", "request_receive_time_last_timezone", "request_receive_time_last_weekofweekyear", "request_referer_last", "request_receive_time_minute", "connection_client_host_last", "request_receive_time_last_millisecond__utc", "request_firstline_original_uri", "request_firstline", "request_receive_time_nanosecond", "request_receive_time_last_millisecond", "request_receive_time_day", "request_referer_port", "request_firstline_original_uri_port", "request_receive_time_year", "request_receive_time_last_date", "request_receive_time_last_time__utc", "request_receive_time_last_hour__utc", "request_firstline_original_protocol_version", "request_firstline_original_method", "request_receive_time_last_year__utc", "request_firstline_uri", "request_referer_last_host", "request_receive_time_last_minute__utc", "request_receive_time_weekofweekyear", "request_firstline_uri_userinfo", "request_receive_time_epoch", "connection_client_logname", "response_body_bytes", "request_receive_time_nanosecond__utc", "request_firstline_protocol", "request_receive_time_microsecond__utc", "request_receive_time_hour", "request_firstline_uri_host", "request_referer_last_port", "request_receive_time_last_epoch", "request_receive_time_last_weekyear__utc", "request_useragent", "request_receive_time_weekyear", "request_receive_time_timezone", "response_body_bytesclf", "request_receive_time_last_date__utc", "request_receive_time_millisecond__utc", "request_referer_last_protocol", "request_status_last", "request_firstline_uri_query", "request_receive_time_minute__utc", "request_firstline_original_uri_protocol", "request_referer_query", "request_receive_time_date", "request_firstline_uri_port", "request_receive_time_last_second__utc", "request_referer_last_userinfo", "request_receive_time_last_second", "request_receive_time_last_monthname__utc", "request_firstline_method", "request_receive_time_last_month__utc", "request_receive_time_millisecond", "request_receive_time_day__utc", "request_receive_time_year__utc", "request_receive_time_weekofweekyear__utc", "request_receive_time_second", "request_firstline_original_uri_ref", "connection_client_logname_last", "request_receive_time_last_year", "request_firstline_original_uri_path", "connection_client_host", "request_firstline_original_uri_query", "request_referer_userinfo", "request_receive_time_last_monthname", "request_referer_path", "request_receive_time_monthname", "request_receive_time_last_month", "request_referer_last_query", "request_firstline_uri_ref", "request_receive_time_last_day", "request_receive_time_time", "request_receive_time_last_weekofweekyear__utc", "request_useragent_last", "request_receive_time_last_weekyear", "request_receive_time_last_microsecond", "request_firstline_original", "request_referer_last_path", "request_receive_time_month", "request_receive_time_last_day__utc", "request_referer", "request_referer_protocol", "request_receive_time_monthname__utc", "response_body_bytes_last", "request_receive_time", "request_receive_time_last_nanosecond", "request_firstline_uri_path", 
"request_firstline_original_uri_userinfo", "request_receive_time_date__utc", "request_receive_time_last", "request_receive_time_last_nanosecond__utc", "request_receive_time_last_hour", "request_receive_time_hour__utc", "request_receive_time_second__utc", "connection_client_user_last", "request_receive_time_weekyear__utc", "connection_client_user")
- .baselineValues(null, "04:11:25", null, 0L, 0L, "HTTP", null, "howto.basjes.nl", 10L, 11L, "1.1", "03:11:25", null, null, 43L, "http://howto.basjes.nl/", 11L, "195.154.46.135", 0L, "/linux/doing-pxe-without-dhcp-control", "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", 0L, 0L, 25L, null, null, 2015L, "2015-10-25", "03:11:25", 3L, "1.1", "GET", 2015L, "/linux/doing-pxe-without-dhcp-control", "howto.basjes.nl", 11L, 43L, null, 1445742685000L, null, 24323L, 0L, "HTTP", 0L, 4L, null, null, 1445742685000L, 2015L, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015L, null, 24323L, "2015-10-25", 0L, "http", "200", "", 11L, null, "", "2015-10-25", null, 25L, null, 25L, "October", "GET", 10L, 0L, 25L, 2015L, 43L, 25L, null, null, 2015L, "/linux/doing-pxe-without-dhcp-control", "195.154.46.135", "", null, "October", "/", "October", 10L, "", null, 25L, "04:11:25", 43L, "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0", 2015L, 0L, "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1", "/", 10L, 25L, "http://howto.basjes.nl/", "http", "October", 24323L, LocalDateTime.parse("2015-10-25T03:11:25"), 0L, "/linux/doing-pxe-without-dhcp-control", null, "2015-10-25", LocalDateTime.parse("2015-10-25T03:11:25"), 0L, 4L, 3L, 25L, null, 2015L, null)
- .go();
- }
-
-
- @Test
- public void testCount() throws Exception {
- String sql = "SELECT COUNT(*) FROM cp.`httpd/hackers-access-small.httpd`";
- long result = client.queryBuilder().sql(sql).singletonLong();
- assertEquals(10, result);
- }
-
- @Test
- public void testStar() throws Exception {
- String sql = "SELECT * FROM cp.`httpd/hackers-access-small.httpd` LIMIT 1";
-
- testBuilder()
- .sqlQuery(sql)
- .unOrdered()
- .baselineColumns("request_referer_ref","request_receive_time_last_time","request_firstline_uri_protocol","request_receive_time_microsecond","request_receive_time_last_microsecond__utc","request_firstline_original_uri_query_$","request_firstline_original_protocol","request_firstline_original_uri_host","request_referer_host","request_receive_time_month__utc","request_receive_time_last_minute","request_firstline_protocol_version","request_receive_time_time__utc","request_referer_last_ref","request_receive_time_last_timezone","request_receive_time_last_weekofweekyear","request_referer_last","request_receive_time_minute","connection_client_host_last","request_receive_time_last_millisecond__utc","request_firstline_original_uri","request_firstline","request_receive_time_nanosecond","request_receive_time_last_millisecond","request_receive_time_day","request_referer_port","request_firstline_original_uri_port","request_receive_time_year","request_receive_time_last_date","request_referer_query_$","request_receive_time_last_time__utc","request_receive_time_last_hour__utc","request_firstline_original_protocol_version","request_firstline_original_method","request_receive_time_last_year__utc","request_firstline_uri","request_referer_last_host","request_receive_time_last_minute__utc","request_receive_time_weekofweekyear","request_firstline_uri_userinfo","request_receive_time_epoch","connection_client_logname","response_body_bytes","request_receive_time_nanosecond__utc","request_firstline_protocol","request_receive_time_microsecond__utc","request_receive_time_hour","request_firstline_uri_host","request_referer_last_port","request_receive_time_last_epoch","request_receive_time_last_weekyear__utc","request_receive_time_weekyear","request_receive_time_timezone","response_body_bytesclf","request_receive_time_last_date__utc","request_useragent_last","request_useragent","request_receive_time_millisecond__utc","request_referer_last_protocol","request_status_last","request_firstline_uri_query","request_receive_time_minute__utc","request_firstline_original_uri_protocol","request_referer_query","request_receive_time_date","request_firstline_uri_port","request_receive_time_last_second__utc","request_referer_last_userinfo","request_receive_time_last_second","request_receive_time_last_monthname__utc","request_firstline_method","request_receive_time_last_month__utc","request_receive_time_millisecond","request_receive_time_day__utc","request_receive_time_year__utc","request_receive_time_weekofweekyear__utc","request_receive_time_second","request_firstline_original_uri_ref","connection_client_logname_last","request_receive_time_last_year","request_firstline_original_uri_path","connection_client_host","request_referer_last_query_$","request_firstline_original_uri_query","request_referer_userinfo","request_receive_time_last_monthname","request_referer_path","request_receive_time_monthname","request_receive_time_last_month","request_referer_last_query","request_firstline_uri_ref","request_receive_time_last_day","request_receive_time_time","request_receive_time_last_weekofweekyear__utc","request_receive_time_last_weekyear","request_receive_time_last_microsecond","request_firstline_original","request_firstline_uri_query_$","request_referer_last_path","request_receive_time_month","request_receive_time_last_day__utc","request_referer","request_referer_protocol","request_receive_time_monthname__utc","response_body_bytes_last","request_receive_time","request_receive_time_last_nanosecond","request_firstline_uri_path","request_firs
tline_original_uri_userinfo","request_receive_time_date__utc","request_receive_time_last","request_receive_time_last_nanosecond__utc","request_receive_time_last_hour","request_receive_time_hour__utc","request_receive_time_second__utc","connection_client_user_last","request_receive_time_weekyear__utc","connection_client_user")
- .baselineValues(null,"04:11:25",null,0L,0L,new HashMap<>(),"HTTP",null,"howto.basjes.nl",10L,11L,"1.1","03:11:25",null,null,43L,"http://howto.basjes.nl/",11L,"195.154.46.135",0L,"/linux/doing-pxe-without-dhcp-control","GET /linux/doing-pxe-without-dhcp-control HTTP/1.1",0L,0L,25L,null,null,2015L,"2015-10-25",new HashMap<>(),"03:11:25",3L,"1.1","GET",2015L,"/linux/doing-pxe-without-dhcp-control","howto.basjes.nl",11L,43L,null,1445742685000L,null,24323L,0L,"HTTP",0L,4L,null,null,1445742685000L,2015L,2015L,null,24323L,"2015-10-25","Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0","Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0",0L,"http","200","",11L,null,"","2015-10-25",null,25L,null,25L,"October","GET",10L,0L,25L,2015L,43L,25L,null,null,2015L,"/linux/doing-pxe-without-dhcp-control","195.154.46.135",new HashMap<>(),"",null,"October","/","October",10L,"",null,25L,"04:11:25",43L,2015L,0L,"GET /linux/doing-pxe-without-dhcp-control HTTP/1.1",new HashMap<>(),"/",10L,25L,"http://howto.basjes.nl/","http","October",24323L,LocalDateTime.parse("2015-10-25T03:11:25"),0L,"/linux/doing-pxe-without-dhcp-control",null,"2015-10-25",LocalDateTime.parse("2015-10-25T03:11:25"),0L,4L,3L,25L,null,2015L,null)
- .go();
- }
-}
diff --git a/exec/java-exec/src/test/resources/plugins/mock-plugin-upgrade.json b/exec/java-exec/src/test/resources/plugins/mock-plugin-upgrade.json
index ad39fa1e0d6..36b12d5e4bd 100644
--- a/exec/java-exec/src/test/resources/plugins/mock-plugin-upgrade.json
+++ b/exec/java-exec/src/test/resources/plugins/mock-plugin-upgrade.json
@@ -26,11 +26,6 @@
"extensions" : [ "tsv" ],
"delimiter" : "\t"
},
- "httpd" : {
- "type" : "httpd",
- "logFormat" : "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"",
- "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ"
- },
"parquet" : {
"type" : "parquet"
},
@@ -152,11 +147,6 @@
"extensions" : [ "tsv" ],
"delimiter" : "\t"
},
- "httpd" : {
- "type" : "httpd",
- "logFormat" : "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\"",
- "timestampFormat" : "dd/MMM/yyyy:HH:mm:ss ZZ"
- },
"parquet" : {
"type" : "parquet"
},
diff --git a/exec/java-exec/src/test/resources/store/httpd/dfs-test-bootstrap-test.httpd b/exec/java-exec/src/test/resources/store/httpd/dfs-test-bootstrap-test.httpd
deleted file mode 100644
index d48fa12a4b8..00000000000
--- a/exec/java-exec/src/test/resources/store/httpd/dfs-test-bootstrap-test.httpd
+++ /dev/null
@@ -1,5 +0,0 @@
-195.154.46.135 - - [25/Oct/2015:04:11:25 +0100] "GET /linux/doing-pxe-without-dhcp-control HTTP/1.1" 200 24323 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0"
-23.95.237.180 - - [25/Oct/2015:04:11:26 +0100] "GET /join_form HTTP/1.0" 200 11114 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0"
-23.95.237.180 - - [25/Oct/2015:04:11:27 +0100] "POST /join_form HTTP/1.1" 302 9093 "http://howto.basjes.nl/join_form" "Mozilla/5.0 (Windows NT 5.1; rv:35.0) Gecko/20100101 Firefox/35.0"
-158.222.5.157 - - [25/Oct/2015:04:24:31 +0100] "GET /join_form HTTP/1.0" 200 11114 "http://howto.basjes.nl/" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21"
-158.222.5.157 - - [25/Oct/2015:04:24:32 +0100] "POST /join_form HTTP/1.1" 302 9093 "http://howto.basjes.nl/join_form" "Mozilla/5.0 (Windows NT 6.3; WOW64; rv:34.0) Gecko/20100101 Firefox/34.0 AlexaToolbar/alxf-2.21"