This repository was archived by the owner on Aug 20, 2025. It is now read-only.
Closed
39 commits
- 64a2fc6 save some work and notes (mmiklavc, Jan 25, 2017)
- a6a6ab6 Extraction done (mmiklavc, Jan 27, 2017)
- 47d814e Multithreading the SimpleEnrichmentFlatFileLoader (cestella, Jan 27, 2017)
- 918d4ce doc changes. (cestella, Jan 27, 2017)
- c6ca3a8 Updating docs. (cestella, Jan 27, 2017)
- 8c9a79c Investigating integration tests. (cestella, Jan 28, 2017)
- 315bd18 Update integration test to be a proper integration test. (cestella, Jan 28, 2017)
- 004c6f4 Adding spliterator unit test for completeness (cestella, Jan 28, 2017)
- f8dd48e Updating test to use a proper file (cestella, Jan 28, 2017)
- 9b04f97 Updating docs and renaming a few things. (cestella, Jan 28, 2017)
- eb5b82c Update one more test case. (cestella, Jan 28, 2017)
- 81c42af partial commit - adding additional filter and transform for indicator (mmiklavc, Jan 30, 2017)
- 310c98b Merge branch 'master' into unified_loader (cestella, Jan 30, 2017)
- 3f6e3ba Updating simple enrichment flat file loader to be complete. (cestella, Jan 31, 2017)
- 2bdaf41 Merge branch 'master' into top-domains (mmiklavc, Jan 31, 2017)
- 79cfdb4 Removing old threatintel_bulk_load.sh script and integrating into the… (cestella, Jan 31, 2017)
- bf7756b Forgot licenses. (cestella, Jan 31, 2017)
- e5729a2 Merge with master. Get indicator transforms and filter working (mmiklavc, Feb 1, 2017)
- a104f46 updating script. (cestella, Feb 1, 2017)
- b121e13 Merge branch 'master' into unified_loader (cestella, Feb 1, 2017)
- b5a9e5a Added gzip and zip to regular files (cestella, Feb 1, 2017)
- 323267d Fixed stupid zip issue. (cestella, Feb 1, 2017)
- bc26b5b Updating readme and making progress bar optional and better. (cestella, Feb 1, 2017)
- 6cdf35d updating tests to include gzip and zip (cestella, Feb 1, 2017)
- fd718bf Refactor (mmiklavc, Feb 1, 2017)
- d24f0c9 Get unit test for extractor decorator working (mmiklavc, Feb 2, 2017)
- d9bb54e Add negative test cases. Refactor options as enum in extractor decorator (mmiklavc, Feb 2, 2017)
- 43c09c8 Intermediate commit - need to fetch from PR432 (mmiklavc, Feb 3, 2017)
- eafc786 Get integration tests for flat file loader working with my branch. Fi… (mmiklavc, Feb 3, 2017)
- ad1aef7 Get integration tests working for Stellar transformations in the file… (mmiklavc, Feb 3, 2017)
- 799811c Reacted to @mmiklavcic (cestella, Feb 3, 2017)
- d25dbc5 Shaving off seconds for the integration tests. (cestella, Feb 6, 2017)
- c0b275b whoops, missed one. (cestella, Feb 6, 2017)
- 169e442 Merge remote-tracking branch 'cestella/unified_loader' into top-domai… (mmiklavc, Feb 6, 2017)
- c27b783 Add license headers to new files (mmiklavc, Feb 6, 2017)
- b73339f Add README info for loader Stellar transformations. Add integration t… (mmiklavc, Feb 7, 2017)
- aca8e63 Fix merge conflicts with master (mmiklavc, Feb 8, 2017)
- b5cc03d Make extractortest happy (mmiklavc, Feb 8, 2017)
- 03daced Fix some issues and suggestions (mmiklavc, Feb 8, 2017)
@@ -82,6 +82,7 @@ public void toTldTest_unknowntld() {

    @Test
    public void removeTldTest() {
        runWithArguments("DOMAIN_REMOVE_TLD", "google.com", "google");
        runWithArguments("DOMAIN_REMOVE_TLD", "www.google.co.uk", "www.google");
        runWithArguments("DOMAIN_REMOVE_TLD", "www.google.com", "www.google");
        runWithArguments("DOMAIN_REMOVE_TLD", "com", "");
@@ -22,11 +22,13 @@
import org.junit.Test;

public class ConversionUtilsTest {

    @Test
    public void testIntegerConversions() {
        Object o = 1;
        Assert.assertEquals(Integer.valueOf(1), ConversionUtils.convert(o, Integer.class));
        Assert.assertEquals(Integer.valueOf(1), ConversionUtils.convert("1", Integer.class));
        Assert.assertNull(ConversionUtils.convert("foo", Integer.class));
    }

}
81 changes: 80 additions & 1 deletion metron-platform/metron-data-management/README.md
@@ -89,7 +89,7 @@ for the value will be 'meta'. For instance, given an input string of `123.45.12
would be extracted:
* Indicator : `123.45.123.12`
* Type : `malicious_ip`
-* Value : `{ "source" : "the grapevine" }`
+* Value : `{ "ip" : "123.45.123.12", "source" : "the grapevine" }`

### STIX Extractor

@@ -127,6 +127,85 @@ addresses from the set of all possible addresses. Note that if no categories ar
Also, only address and domain types allow filtering via `stix_address_categories` and `stix_domain_categories` config
parameters.

### Common Extractor Properties

Users also have the ability to transform and filter enrichment and threat intel data using Stellar as it is loaded into HBase. This feature is available to all extractor types.

As an example, we will load a CSV list of top domains as an enrichment, using Stellar expressions to transform and filter both the value metadata and the indicator column.

````
{
  "config" : {
    "zk_quorum" : "node1:2181",
    "columns" : {
      "rank" : 0,
      "domain" : 1
    },
    "value_transform" : {
      "domain" : "DOMAIN_REMOVE_TLD(domain)"
    },
    "value_filter" : "LENGTH(domain) > 0",
    "indicator_column" : "domain",
    "indicator_transform" : {
      "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
    },
    "indicator_filter" : "LENGTH(indicator) > 0",
    "type" : "top_domains",
    "separator" : ","
  },
  "extractor" : "CSV"
}
````

Two of these properties are maps of full Stellar expressions (`value_transform` and `indicator_transform`), and two are Stellar predicates (`value_filter` and `indicator_filter`):

| Property | Description |
|---------------------|-------------|
| value_transform | Transform fields defined in the "columns" mapping with Stellar transformations. New keys introduced in the transform will be added to the value metadata. |
| value_filter | Allows additional filtering with Stellar predicates based on results from the value transformations. In this example, records whose domain property is empty after removing the TLD will be omitted. |
| indicator_transform | Transform the indicator column independently of the value transformations. You can refer to the original indicator value by using "indicator" as the variable name, as shown in the example above. Alternatively, to piggyback on the value transformations, you can refer to the variable "domain", so that your indicator transform inherits the transformations already applied to that field. |
| indicator_filter | Allows additional filtering with Stellar predicates based on results from the indicator transformations. In this example, records whose indicator value is empty after removing the TLD will be omitted. |
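The following is a minimal Java sketch of how these four stages fit together. It assumes the stages run in the order the table lists them: value_transform, value_filter, indicator_transform, then indicator_filter. Plain Java lambdas stand in for Stellar. `TransformFilterSketch`, `apply`, `topDomain`, and the naive `removeTld` helper are hypothetical names for illustration. They are not the PR's `TransformFilterExtractorDecorator`, and the real `DOMAIN_REMOVE_TLD` also handles multi-label suffixes such as `co.uk`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical model of the four stages; Java lambdas stand in for Stellar.
class TransformFilterSketch {

    // Naive stand-in for DOMAIN_REMOVE_TLD: drops the last dot-separated label.
    // Only correct for single-label suffixes such as ".com", not "co.uk".
    static String removeTld(String domain) {
        int dot = domain.lastIndexOf('.');
        return dot < 0 ? "" : domain.substring(0, dot);
    }

    // Returns {indicator=..., value=...}, or null if either filter rejects the record.
    static Map<String, Object> apply(
            Map<String, Object> columns,
            String indicatorColumn,
            Map<String, Function<Map<String, Object>, Object>> valueTransform,
            Predicate<Map<String, Object>> valueFilter,
            Function<Map<String, Object>, Object> indicatorTransform,
            Predicate<Object> indicatorFilter) {
        // value_transform (assumed to run in declaration order): results replace
        // existing fields or add new keys to the value metadata.
        Map<String, Object> value = new LinkedHashMap<>(columns);
        for (Map.Entry<String, Function<Map<String, Object>, Object>> t : valueTransform.entrySet()) {
            Object result = t.getValue().apply(value);
            value.put(t.getKey(), result);
        }
        // value_filter: keep the record only if the predicate holds.
        if (!valueFilter.test(value)) {
            return null;
        }
        // indicator_transform: sees the untransformed "indicator" plus the
        // transformed value fields, so it could piggyback on "domain" instead.
        Map<String, Object> vars = new LinkedHashMap<>(value);
        vars.put("indicator", columns.get(indicatorColumn));
        Object indicator = indicatorTransform.apply(vars);
        // indicator_filter: keep the record only if the predicate holds.
        if (!indicatorFilter.test(indicator)) {
            return null;
        }
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("indicator", indicator);
        out.put("value", value);
        return out;
    }

    // Wires up the top_domains example configuration for one CSV row.
    static Map<String, Object> topDomain(String rank, String domain) {
        Map<String, Object> columns = new LinkedHashMap<>();
        columns.put("rank", rank);
        columns.put("domain", domain);
        Map<String, Function<Map<String, Object>, Object>> valueTransform = new LinkedHashMap<>();
        valueTransform.put("domain", m -> removeTld((String) m.get("domain")));
        return apply(columns, "domain", valueTransform,
                m -> !((String) m.get("domain")).isEmpty(),      // LENGTH(domain) > 0
                m -> removeTld((String) m.get("indicator")),     // DOMAIN_REMOVE_TLD(indicator)
                ind -> !((String) ind).isEmpty());               // LENGTH(indicator) > 0
    }

    public static void main(String[] args) {
        System.out.println(topDomain("1", "google.com")); // {indicator=google, value={rank=1, domain=google}}
        System.out.println(topDomain("3", "com"));        // null
    }
}
```

A bare suffix such as `com` becomes an empty string after the TLD is removed. In the sketch, the value filter then rejects the record, matching the behavior the table describes for `value_filter`.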

top-list.csv
````
1,google.com
2,youtube.com
...
````

Running a file import with the above data and extractor configuration would produce the following extracted data records for the first 2 lines:

| Indicator | Type | Value |
|-----------|------|-------|
| google | top_domains | { "rank" : "1", "domain" : "google" } |
| youtube | top_domains | { "rank" : "2", "domain" : "youtube" } |
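Before any Stellar runs, the `columns` and `separator` settings turn each CSV line into a map of named fields. Below is a simplified sketch of that mapping. It assumes there are no quoted or escaped fields. `CsvColumnsSketch` and `parse` are hypothetical names, not the CSV extractor's actual code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Simplified sketch of the "columns" + "separator" mapping; ignores quoting and escaping.
class CsvColumnsSketch {

    // Maps each configured column name to the field at its zero-based position.
    static Map<String, String> parse(String line, String separator, Map<String, Integer> columns) {
        // Pattern.quote makes the separator literal (e.g. "|" is not treated as regex
        // alternation); the -1 limit keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(separator), -1);
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> column : columns.entrySet()) {
            int idx = column.getValue();
            // Every field stays a string here; a missing position maps to null.
            out.put(column.getKey(), idx < fields.length ? fields[idx] : null);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> columns = new LinkedHashMap<>();
        columns.put("rank", 0);
        columns.put("domain", 1);
        System.out.println(parse("1,google.com", ",", columns));  // {rank=1, domain=google.com}
        System.out.println(parse("2,youtube.com", ",", columns)); // {rank=2, domain=youtube.com}
    }
}
```

Because every field stays a string in this sketch, the output matches `"rank" : "1"` in the table above.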

As in the parser framework, providing a ZooKeeper quorum via the `zk_quorum` property enables Stellar to access properties that reside in the global config.
Expanding on our example above, if the global config looks as follows:
````
{
  "global_property" : "metron-ftw"
}
````

And we expand our value_transform:
````
...
"value_transform" : {
  "domain" : "DOMAIN_REMOVE_TLD(domain)",
  "a-new-prop" : "global_property"
},
...
````

The resulting value data would look like the following:

| Indicator | Type | Value |
|-----------|------|-------|
| google | top_domains | { "rank" : "1", "domain" : "google", "a-new-prop" : "metron-ftw" } |
| youtube | top_domains | { "rank" : "2", "domain" : "youtube", "a-new-prop" : "metron-ftw" } |
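One way to picture how `global_property` resolves to `metron-ftw` is a chained lookup: first the record's fields, then the global config. This is a sketch under assumptions. `ChainedResolverSketch` is hypothetical, and the order in which record fields and global properties are consulted is assumed, not taken from the Metron source.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical chained variable lookup: scopes are searched in order, first hit wins.
class ChainedResolverSketch {
    private final List<Map<String, Object>> scopes;

    @SafeVarargs
    ChainedResolverSketch(Map<String, Object>... scopes) {
        this.scopes = Arrays.asList(scopes);
    }

    Object resolve(String name) {
        for (Map<String, Object> scope : scopes) {
            if (scope.containsKey(name)) {
                return scope.get(name);
            }
        }
        return null; // not defined in any scope
    }

    public static void main(String[] args) {
        Map<String, Object> record = new LinkedHashMap<>();
        record.put("rank", "1");
        record.put("domain", "google");
        Map<String, Object> global = new LinkedHashMap<>();
        global.put("global_property", "metron-ftw");
        // Assumed order: record fields first, then the global config.
        ChainedResolverSketch resolver = new ChainedResolverSketch(record, global);
        // Equivalent of "a-new-prop" : "global_property" in value_transform.
        record.put("a-new-prop", resolver.resolve("global_property"));
        System.out.println(record); // {rank=1, domain=google, a-new-prop=metron-ftw}
    }
}
```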

## Enrichment Config

In order to automatically add new enrichment and threat intel types to existing, running enrichment topologies, you will
@@ -0,0 +1,42 @@
/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements. See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership. The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License. You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.metron.dataloads.extractor;

import org.apache.metron.enrichment.lookup.LookupKV;

import java.io.IOException;
import java.util.Map;

public class ExtractorDecorator implements Extractor {

    protected final Extractor decoratedExtractor;

    public ExtractorDecorator(Extractor decoratedExtractor) {
        this.decoratedExtractor = decoratedExtractor;
    }

    @Override
    public Iterable<LookupKV> extract(String line) throws IOException {
        return decoratedExtractor.extract(line);
    }

    @Override
    public void initialize(Map<String, Object> config) {
        decoratedExtractor.initialize(config);
    }
}
@@ -26,55 +26,77 @@
import java.io.InputStream;
import java.lang.reflect.InvocationTargetException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Map;

public class ExtractorHandler {
-    final static ObjectMapper _mapper = new ObjectMapper();
-    private Map<String, Object> config;
-    private Extractor extractor;
-    private InputFormatHandler inputFormat = Formats.BY_LINE;
+    final static ObjectMapper _mapper = new ObjectMapper();
+    private Map<String, Object> config;
+    private Extractor extractor;
+    private InputFormatHandler inputFormat = Formats.BY_LINE;

-    public Map<String, Object> getConfig() {
-        return config;
-    }
+    public Map<String, Object> getConfig() {
+        return config;
+    }

-    public void setConfig(Map<String, Object> config) {
-        this.config = config;
-    }
+    /**
+     * Set by jackson. Extractor configuration from JSON
+     */
+    public void setConfig(Map<String, Object> config) {
+        this.config = config;
+    }

-    public InputFormatHandler getInputFormat() {
-        return inputFormat;
-    }
+    public InputFormatHandler getInputFormat() {
+        return inputFormat;
+    }

-    public void setInputFormat(String handler) {
-        try {
-            this.inputFormat= Formats.create(handler);
-        } catch (ClassNotFoundException | InstantiationException | IllegalAccessException | NoSuchMethodException | InvocationTargetException e) {
-            throw new IllegalStateException("Unable to create an inputformathandler", e);
-        }
+    /**
+     * Set by jackson
+     */
+    public void setInputFormat(String handler) {
+        try {
+            this.inputFormat = Formats.create(handler);
+        } catch (ClassNotFoundException | InstantiationException | IllegalAccessException | NoSuchMethodException | InvocationTargetException e) {
+            throw new IllegalStateException("Unable to create an inputformathandler", e);
+        }
    }

-    public Extractor getExtractor() {
-        return extractor;
-    }
-    public void setExtractor(String extractor) {
-        try {
-            this.extractor = Extractors.create(extractor);
-        } catch (ClassNotFoundException | IllegalAccessException | InstantiationException | NoSuchMethodException | InvocationTargetException e) {
-            throw new IllegalStateException("Unable to create an extractor", e);
-        }
-    }
+    public Extractor getExtractor() {
+        return extractor;
+    }

-    public static synchronized ExtractorHandler load(InputStream is) throws IOException {
-        ExtractorHandler ret = _mapper.readValue(is, ExtractorHandler.class);
-        ret.getExtractor().initialize(ret.getConfig());
-        return ret;
-    }
-    public static synchronized ExtractorHandler load(String s, Charset c) throws IOException {
-        return load( new ByteArrayInputStream(s.getBytes(c)));
-    }
-    public static synchronized ExtractorHandler load(String s) throws IOException {
-        return load( s, Charset.defaultCharset());
+    /**
+     * Set by jackson.
+     *
+     * @param extractor Name of extractor to instantiate
+     */
+    public void setExtractor(String extractor) {
+        try {
+            this.extractor = Extractors.create(extractor);
+        } catch (ClassNotFoundException | IllegalAccessException | InstantiationException | NoSuchMethodException | InvocationTargetException e) {
+            throw new IllegalStateException("Unable to create an extractor", e);
+        }
+    }
+
+    /**
+     * Load json configuration
+     */
+    public static synchronized ExtractorHandler load(InputStream is) throws IOException {
+        ExtractorHandler ret = _mapper.readValue(is, ExtractorHandler.class);
+        ret.getExtractor().initialize(ret.getConfig());
+        return ret;
+    }
+
+    /**
+     * Load json configuration
+     */
+    public static synchronized ExtractorHandler load(String s, Charset c) throws IOException {
+        return load(new ByteArrayInputStream(s.getBytes(c)));
+    }
+
+    /**
+     * Load json configuration
+     */
+    public static synchronized ExtractorHandler load(String s) throws IOException {
+        return load(s, Charset.defaultCharset());
    }
}
@@ -21,7 +21,6 @@
import org.apache.metron.dataloads.extractor.stix.StixExtractor;

import java.lang.reflect.InvocationTargetException;
import java.util.Map;

public enum Extractors implements ExtractorCreator {
    CSV(new ExtractorCreator() {
@@ -49,11 +48,11 @@ public Extractor create() {
    public static Extractor create(String extractorName) throws ClassNotFoundException, IllegalAccessException, InstantiationException, NoSuchMethodException, InvocationTargetException {
        try {
            ExtractorCreator ec = Extractors.valueOf(extractorName);
-            return ec.create();
+            return new TransformFilterExtractorDecorator(ec.create());
        }
        catch(IllegalArgumentException iae) {
            Extractor ex = (Extractor) Class.forName(extractorName).getConstructor().newInstance();
-            return ex;
+            return new TransformFilterExtractorDecorator(ex);
        }
    }
}
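With this change, `Extractors.create` wraps every extractor in a `TransformFilterExtractorDecorator`, whether it is built in or loaded by class name. This builds on the pass-through `ExtractorDecorator` added above. The decorator pattern involved can be sketched in isolation. `LineExtractor` is a simplified, hypothetical stand-in for `Extractor`, and `FilteringLineExtractor` is illustrative only, not the PR's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Predicate;

class DecoratorSketch {

    // Simplified stand-in for Extractor; the real interface returns
    // Iterable<LookupKV> and also declares initialize(Map<String, Object>).
    interface LineExtractor {
        List<String> extract(String line);
    }

    // Pass-through base class, mirroring how ExtractorDecorator delegates every call.
    static class LineExtractorDecorator implements LineExtractor {
        protected final LineExtractor decorated;

        LineExtractorDecorator(LineExtractor decorated) {
            this.decorated = decorated;
        }

        @Override
        public List<String> extract(String line) {
            return decorated.extract(line);
        }
    }

    // Hypothetical decorator: drops any extracted item the predicate rejects.
    static class FilteringLineExtractor extends LineExtractorDecorator {
        private final Predicate<String> keep;

        FilteringLineExtractor(LineExtractor decorated, Predicate<String> keep) {
            super(decorated);
            this.keep = keep;
        }

        @Override
        public List<String> extract(String line) {
            List<String> kept = new ArrayList<>();
            for (String item : super.extract(line)) {
                if (keep.test(item)) {
                    kept.add(item);
                }
            }
            return kept;
        }
    }

    public static void main(String[] args) {
        // Base extractor: the second comma-separated field of each line.
        LineExtractor base = line -> Collections.singletonList(line.split(",", -1)[1]);
        // Wrap it, analogous to Extractors.create wrapping every extractor it builds.
        LineExtractor wrapped = new FilteringLineExtractor(base, s -> !s.isEmpty());
        System.out.println(wrapped.extract("1,google.com")); // [google.com]
        System.out.println(wrapped.extract("2,"));           // []
    }
}
```

Because the wrapper has the same interface as the thing it wraps, callers such as `ExtractorHandler` need no changes to pick up the transform and filter behavior.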