This repository was archived by the owner on Aug 20, 2025. It is now read-only.

cestella commented Dec 19, 2017

Contributor Comments

Detecting typosquatting requires generating the typosquatted variants of a domain. This PR therefore adds a Stellar function, DOMAIN_TYPOSQUAT, which replicates the functionality of dnstwist.

You can validate this in the REPL via:

{17:10}[system]~/Documents/workspace/metron/fork/incubator-metron:typosquat ✗ ➭ mvn exec:java -Dexec.mainClass="org.apache.metron.stellar.common.shell.StellarShell" -pl metron-platform/metron-common
[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building metron-common 0.4.2
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- exec-maven-plugin:1.5.0:java (default-cli) @ metron-common ---
log4j:WARN No appenders could be found for logger (org.apache.metron.stellar.dsl.functions.resolver.BaseFunctionResolver).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Stellar, Go!
Please note that functions are loading lazily in the background and will be unavailable until loaded fully.
[Stellar]>>> Functions loaded, you may refer to functions now...

[Stellar]>>>
[Stellar]>>> filter := REDUCE( DOMAIN_TYPOSQUAT( 'amazon' ), (s, d) -> BLOOM_ADD(s, d), BLOOM_INIT())
[Stellar]>>> BLOOM_EXISTS( filter, 'amazon')
true
[Stellar]>>> BLOOM_EXISTS( filter, 'google')
false
[Stellar]>>> BLOOM_EXISTS( filter, 'amazoon')
true
[Stellar]>>>
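
For readers who do not have a Stellar shell handy, here is a rough Java sketch of what the session above does: generate variants, fold them into a Bloom filter, then query it. Everything in it is an assumption made for illustration: `typos` is a hypothetical stand-in for DOMAIN_TYPOSQUAT that only produces character-repetition typos, and `Bloom` is a toy filter, not the implementation behind BLOOM_INIT, BLOOM_ADD, or BLOOM_EXISTS.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class BloomReplSketch {

    // Hypothetical stand-in for DOMAIN_TYPOSQUAT: emits the domain itself plus
    // character-repetition typos only ("amazon" -> "amazoon"). Including the
    // original domain is an assumption based on the REPL output above.
    static List<String> typos(String domain) {
        List<String> out = new ArrayList<>();
        out.add(domain);
        for (int i = 0; i < domain.length(); i++) {
            out.add(domain.substring(0, i + 1) + domain.charAt(i) + domain.substring(i + 1));
        }
        return out;
    }

    // Toy Bloom filter: k probe positions derived from String.hashCode().
    static final class Bloom {
        private final BitSet bits;
        private final int size;
        private final int hashes;

        Bloom(int size, int hashes) {
            this.bits = new BitSet(size);
            this.size = size;
            this.hashes = hashes;
        }

        private int index(String s, int i) {
            int h1 = s.hashCode();
            int h2 = (h1 >>> 16) | 1; // odd step so successive probes differ
            return Math.floorMod(h1 + i * h2, size);
        }

        void add(String s) {
            for (int i = 0; i < hashes; i++) {
                bits.set(index(s, i));
            }
        }

        // May return a false positive, never a false negative.
        boolean mightContain(String s) {
            for (int i = 0; i < hashes; i++) {
                if (!bits.get(index(s, i))) {
                    return false;
                }
            }
            return true;
        }
    }

    // Mirrors REDUCE(typos, (s, d) -> BLOOM_ADD(s, d), BLOOM_INIT()).
    static Bloom build(String domain) {
        Bloom filter = new Bloom(1 << 16, 3);
        for (String d : typos(domain)) {
            filter.add(d);
        }
        return filter;
    }

    public static void main(String[] args) {
        Bloom filter = build("amazon");
        System.out.println(filter.mightContain("amazon"));
        System.out.println(filter.mightContain("amazoon"));
        System.out.println(filter.mightContain("google"));
    }
}
```

Because a Bloom filter can report false positives, a hit should be read as "possibly typosquatted", while a miss is definitive.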

Note: On its own, this function is of some interest but is not a complete solution. As a follow-on, I suggest two JIRAs:

  1. A new mode for the flat-file loader that writes out serialized objects (e.g. a bloom filter containing all the typosquatted domains for a CSV of domains).
  2. The ability to load a serialized object from HDFS into memory and return it, e.g. OBJECT_GET(path), with a cache in front of it.

With these, in conjunction with the Stellar function from this PR, we should be able to detect typosquatted domains at scale during the enrichment phase:

  1. With the flat-file loader, generate a bloom filter containing the typosquatted domains from the set of known-good domains.
  2. Upload it to HDFS.
  3. As an enrichment:
     is_typosquatted := BLOOM_EXISTS(OBJECT_GET('/apps/metron/typosquat/alexa1m.ser'), domain)
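
To make the proposed flow concrete, here is a minimal Java sketch of the "serialize once, load through a cache, check at enrichment time" idea. It is only an illustration under several assumptions: a local temp file stands in for HDFS, a plain `HashSet` stands in for the bloom filter, and `writeObject`, `objectGet`, and `isTyposquatted` are hypothetical helpers, not the eventual flat-file loader mode or OBJECT_GET implementation.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class ObjectGetSketch {

    // Cache in front of the loader: each path is deserialized at most once.
    private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

    // Stand-in for the loader's proposed "write a serialized object" mode.
    static void writeObject(Path path, Serializable value) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(value);
        }
    }

    // Hypothetical OBJECT_GET(path): load from storage on first use, then serve from memory.
    static Object objectGet(String path) {
        return CACHE.computeIfAbsent(path, p -> {
            try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(Paths.get(p)))) {
                return in.readObject();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        });
    }

    // Equivalent of BLOOM_EXISTS(OBJECT_GET(path), domain), using a set instead of a filter.
    @SuppressWarnings("unchecked")
    static boolean isTyposquatted(String path, String domain) {
        Set<String> known = (Set<String>) objectGet(path);
        return known.contains(domain);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("typosquat", ".ser");
        writeObject(file, new HashSet<String>(Arrays.asList("amazoon", "amazn")));
        System.out.println(isTyposquatted(file.toString(), "amazoon"));
        System.out.println(isTyposquatted(file.toString(), "google"));
    }
}
```

The cache matters because enrichment runs per message; without it, every lookup would pay the cost of reading and deserializing the object again.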

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution, we ask you to follow these guidelines and to double-check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not, one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks has been executed in the root metron folder via:

    mvn -q clean integration-test install && build_utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and/or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that the format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not, then run the following commands and then verify the changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.


mmiklavc left a comment

I honestly don't have much to say. The strategy pattern seems to fit very nicely here, and the generate function looks simple enough. I might only request a small doc blurb for the obvious patterns and a more in-depth one for those that are more mysterious. I'm against documenting POJOs and the like just for the sake of documenting (it gets stale, redundant, etc.), but I think some brief comments on these strategies would prove useful.
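
For context, the strategy pattern mentioned here can be sketched roughly as below. This is an illustrative outline only, not the code in this PR: the `TypoStrategy` interface, the `SimpleStrategies` enum, and the two strategies shown (omission and transposition) are hypothetical names chosen for the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical strategy interface: one typo family per implementation.
interface TypoStrategy {
    Set<String> generate(String domain);
}

enum SimpleStrategies implements TypoStrategy {
    // Drop one character at a time: "amazon" -> "amazn".
    OMISSION {
        @Override
        public Set<String> generate(String domain) {
            Set<String> out = new LinkedHashSet<>();
            for (int i = 0; i < domain.length(); i++) {
                out.add(domain.substring(0, i) + domain.substring(i + 1));
            }
            out.remove(""); // a single-character domain has no useful omission
            return out;
        }
    },
    // Swap adjacent characters: "amazon" -> "amazno".
    TRANSPOSITION {
        @Override
        public Set<String> generate(String domain) {
            Set<String> out = new LinkedHashSet<>();
            for (int i = 0; i + 1 < domain.length(); i++) {
                char a = domain.charAt(i);
                char b = domain.charAt(i + 1);
                if (a != b) { // swapping equal characters would return the original
                    out.add(domain.substring(0, i) + b + a + domain.substring(i + 2));
                }
            }
            return out;
        }
    };

    // Union of every strategy's output.
    static Set<String> generateAll(String domain) {
        Set<String> out = new LinkedHashSet<>();
        for (TypoStrategy strategy : values()) {
            out.addAll(strategy.generate(domain));
        }
        return out;
    }
}

class StrategySketchDemo {
    public static void main(String[] args) {
        System.out.println(SimpleStrategies.generateAll("amazon"));
    }
}
```

Adding a new typo family then means adding one enum constant, which is also a natural place for the short doc blurb requested above.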

import java.util.HashMap;
import java.util.Map;

public enum Keyboards {

This looks like a mapping of common typing mistakes based on keys surrounding the letter that's the hashmap key. Can we add some docs to that effect?
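
To illustrate the idea being described, here is a small sketch of a keyboard-adjacency replacement strategy. It is an assumption-laden example rather than the `Keyboards` enum under review: the class name, the `QWERTY_NEIGHBORS` map, and its deliberately partial contents are all hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class KeyboardTypoSketch {

    // Partial QWERTY map: key -> keys physically next to it, i.e. the ones a
    // typist is likely to hit by mistake. Only a few letters are filled in here.
    static final Map<Character, String> QWERTY_NEIGHBORS = new HashMap<>();
    static {
        QWERTY_NEIGHBORS.put('a', "qwsz");
        QWERTY_NEIGHBORS.put('m', "njk");
        QWERTY_NEIGHBORS.put('z', "asx");
        QWERTY_NEIGHBORS.put('o', "iklp09");
        QWERTY_NEIGHBORS.put('n', "bhjm");
    }

    // Replace each character, one position at a time, with each of its neighbors.
    static Set<String> replacements(String domain) {
        Set<String> out = new LinkedHashSet<>();
        for (int i = 0; i < domain.length(); i++) {
            String neighbors = QWERTY_NEIGHBORS.getOrDefault(domain.charAt(i), "");
            for (char n : neighbors.toCharArray()) {
                out.add(domain.substring(0, i) + n + domain.substring(i + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(replacements("amazon"));
    }
}
```

Characters without an entry in the map are simply left alone, so a partial map yields fewer candidates rather than an error.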

cestella commented Jan 5, 2018

Ok, I added better comments around the various strategies. Let me know if you see anything else.

mmiklavc commented Jan 5, 2018

Nice work, +1

@asfgit asfgit closed this in 0996b73 Jan 8, 2018
iraghumitra pushed a commit to iraghumitra/incubator-metron that referenced this pull request Feb 17, 2018