
Conversation


@nickwallen nickwallen commented Jul 25, 2019

This change upgrades all Enrichment and Data Management related components to work with HBase 2.0.2.

  • This PR is for the feature/METRON-2088-support-HDP-3.1 feature branch.

This PR depends on the following PRs; the diff will show their changes here until they are merged.

Changes

  1. Changes the default build profile so that everything is built against HBase 2.0.2. Up until this PR, all changes on the feature/METRON-2088-support-HDP-3.1 feature branch were backwards compatible. I was not able to find a way to make many of these changes backwards compatible.

  2. Replaces all usages of the LegacyHBaseClient with the HBaseClient. The LegacyHBaseClient is not compatible with 2.0.2.

  3. Removes all of the legacy TableProvider and mock HBase-related classes. These cannot be upgraded to HBase 2.0.2.

  4. Creates the EnrichmentLookup interface so that different implementations can be swapped in for testing where needed. For example, a FakeEnrichmentLookup allows the Enrichment integration test to function where we are not able to run a live HBase instance. (See the interface sketch after this list.)

  5. Updates the TAXII loader to use an HBaseClient.

  6. Updates Streaming Enrichments to use an HBaseClient.

  7. Updates the Enrichment coprocessor to use an HBaseClient. The interfaces provided by HBase for a coprocessor also changed and required an update. (A sketch of the new contract follows this list.)

  8. Updates the Stellar functions ENRICHMENT_GET and ENRICHMENT_EXISTS for HBase 2.0.2.

  9. Updates the legacy HBase adapters for HBase 2.0.2.

  10. Removes the LeastRecentlyUsedPruner. This logic is exposed to the user in the script bin/threatintel_bulk_prune.sh. I had some difficulty getting the integration test working and I do not believe this is worth the effort to upgrade. I found almost no documentation around this functionality. I fully expect to initiate more discussion around this. If there is a need for this, I can work further on upgrading it.

  11. Replaces LookupKV<KEY_T, VALUE_T> with EnrichmentResult to simplify the class structure. The only KEY_T ever used is EnrichmentKey and the only VALUE_T ever used is EnrichmentValue.
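
For illustration, here is a minimal sketch of the EnrichmentLookup seam from change 4. The method names are assumptions for illustration, not the branch's actual signatures; EnrichmentKey and EnrichmentResult are the PR's own classes. Production code would use an HBase-backed implementation, while the integration tests swap in a FakeEnrichmentLookup.

    import java.io.IOException;

    // Hypothetical shape of the EnrichmentLookup seam; the actual interface
    // on the feature branch may differ.
    public interface EnrichmentLookup extends AutoCloseable {

      // Backs ENRICHMENT_EXISTS: is there an enrichment stored for this key?
      boolean exists(EnrichmentKey key) throws IOException;

      // Backs ENRICHMENT_GET: retrieve the enrichment stored for this key.
      EnrichmentResult get(EnrichmentKey key) throws IOException;
    }

And for change 7, a sketch of the coprocessor contract in HBase 2.x (this is the stock HBase 2 API, though the PR's EnrichmentCoprocessor does more): a coprocessor now implements RegionCoprocessor and hands back its observer via getRegionObserver(), rather than extending the old BaseRegionObserver, which no longer exists.

    import java.util.Optional;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessor;
    import org.apache.hadoop.hbase.coprocessor.RegionObserver;

    // Minimal HBase 2.x coprocessor skeleton: implement RegionCoprocessor
    // and expose the observer; both interfaces provide default methods, so
    // only the hooks you need must be overridden.
    public class ExampleCoprocessor implements RegionCoprocessor, RegionObserver {
      @Override
      public Optional<RegionObserver> getRegionObserver() {
        return Optional.of(this);
      }
    }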

Acceptance Testing

Basics

Verify data is flowing through the system, from parsing to indexing.

  1. Open Ambari and navigate to the Metron service http://node1:8080/#/main/services/METRON/summary

  2. Open the Alerts UI

  3. Verify alerts show up in the main UI - click the search icon (you may need to wait a moment for them to appear)

  4. Head back to Ambari and select the Kibana service http://node1:8080/#/main/services/KIBANA/summary

  5. Open the Kibana dashboard via the "Metron UI" option in the quick links

  6. Verify the dashboard is populating

Enrichment Coprocessor

  1. Run the following command from the CLI - you should see the coprocessor in the table attributes. Ambari should set this up as part of the MPack installation.

    $ echo "describe 'enrichment'" | hbase shell
    
    Table enrichment is ENABLED
    enrichment, {TABLE_ATTRIBUTES => {coprocessor$1 => 'hdfs://node1:8020/apps/metron/coprocessor/metron-hbase-server-0.7.2-uber.jar|org.apache.metron.hbase.
    coprocessor.EnrichmentCoprocessor||zookeeperUrl=node1:2181'}
    COLUMN FAMILIES DESCRIPTION
    {NAME => 't', VERSIONS => '1', EVICT_BLOCKS_ON_CLOSE => 'false', NEW_VERSION_BEHAVIOR => 'false', KEEP_DELETED_CELLS => 'FALSE', CACHE_DATA_ON_WRITE => '
    false', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', CACHE_INDEX_ON_WRITE => 'fa
    lse', IN_MEMORY => 'false', CACHE_BLOOMS_ON_WRITE => 'false', PREFETCH_BLOCKS_ON_OPEN => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE
    => '65536'}
    1 row(s)
    Took 5.9128 seconds
    
  2. Before we start adding enrichments, let's verify the enrichment_list table is empty.

  3. Go to Swagger

  4. Click the sensor-enrichment-config-controller option.

  5. Click the GET /api/v1/sensor/enrichment/config/list/available/enrichments option.

  6. Finally, click the "Try it out!" button. You should see an empty array returned in the response body.

Streaming Enrichments and Enrichment Stellar Functions in the REPL

  1. Create a Streaming Enrichment by following these instructions.

  2. Define the streaming enrichment and save it as a new source of telemetry.

    [Stellar]>>> conf := SHELL_EDIT(conf)
    {
      "parserClassName": "org.apache.metron.parsers.csv.CSVParser",
      "writerClassName": "org.apache.metron.writer.hbase.SimpleHbaseEnrichmentWriter",
      "sensorTopic": "user",
      "parserConfig": {
        "shew.table": "enrichment",
        "shew.cf": "t",
        "shew.keyColumns": "ip",
        "shew.enrichmentType": "user",
        "columns": {
          "user": 0,
          "ip": 1
        }
      }
    }
    [Stellar]>>>
    [Stellar]>>> CONFIG_PUT("PARSER", conf, "user")
    
  3. Go to the Management UI and start the new parser called 'user'.

  4. Create some test telemetry.

    [Stellar]>>> msgs := ["user1,192.168.1.1", "user2,192.168.1.2", "user3,192.168.1.3"]
    [user1,192.168.1.1, user2,192.168.1.2, user3,192.168.1.3]
    [Stellar]>>> KAFKA_PUT("user", msgs)
    3
    [Stellar]>>> KAFKA_PUT("user", msgs)
    3
    [Stellar]>>> KAFKA_PUT("user", msgs)
    3
    
  5. Ensure that the enrichments are persisted in HBase.

    [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.1', 'enrichment', 't')
    {original_string=user1,192.168.1.1, guid=a6caf3c1-2506-4eb7-b33e-7c05b77cd72c, user=user1, timestamp=1551813589399, source.type=user}
    
    [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.2', 'enrichment', 't')
    {original_string=user2,192.168.1.2, guid=49e4b8fa-c797-44f0-b041-cfb47983d54a, user=user2, timestamp=1551813589399, source.type=user}
    
    [Stellar]>>> ENRICHMENT_GET('user', '192.168.1.3', 'enrichment', 't')
    {original_string=user3,192.168.1.3, guid=324149fd-6c4c-42a3-b579-e218c032ea7f, user=user3, timestamp=1551813589402, source.type=user}
    

Load CSV enrichment data

  1. Now, let's perform an enrichment load. We'll do this as the metron user.

    su - metron
    source /etc/default/metron
    
  2. Download the Alexa 1m dataset:

    wget http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
    unzip top-1m.csv.zip
    head -10000 top-1m.csv > top-10k.csv
    
  3. Create an extractor.json for the CSV data with the following contents:

    {
      "config": {
        "columns": {
          "domain": 1,
          "rank": 0
        },
        "indicator_column": "domain",
        "separator": ",",
        "type": "alexa"
      },
      "extractor": "CSV"
    }
    
  4. Import the data.

    $METRON_HOME/bin/flatfile_loader.sh -i /tmp/top-10k.csv -t enrichment -c t -e ./extractor.json
    
  5. Validate that the data was loaded. Expect at least 10k records.

    echo "count 'enrichment'" | hbase shell
    

Enrichment Coprocessor

Confirm that the enrichments added in the previous steps were 'found' by the coprocessor.

  1. Go to Swagger

  2. Click the sensor-enrichment-config-controller option.

  3. Click the GET /api/v1/sensor/enrichment/config/list/available/enrichments option.

  4. Click the "Try it out!" button. You should see a array returned with the value of each enrichment type that you have loaded.
    [ "alexa", "user" ]

Enrichment Stellar Functions in Storm

  1. Follow instructions similar to these to load the user data.

  2. Create a simple file called user.csv with the following contents:

    jdoe,192.168.138.2

  3. Create a file called user-extractor.json.

    {
      "config": {
        "columns": {
          "user": 0,
          "ip": 1
        },
        "indicator_column": "ip",
        "separator": ",",
        "type": "user"
      },
      "extractor": "CSV"
    }
    
  4. Import the data.

    $METRON_HOME/bin/flatfile_loader.sh -i ./user.csv -t enrichment -c t -e ./user-extractor.json
    
  5. Enrich the Bro telemetry using the "user" data. Similar to here.

  6. Validate that the enrichment loaded successfully.

    [root@node1 0.7.2]# source /etc/default/metron
    [root@node1 0.7.2]# $METRON_HOME/bin/stellar -z $ZOOKEEPER
    
    [Stellar]>>> ip_dst_addr := "192.168.138.2"
    192.168.138.2
    
    [Stellar]>>> ENRICHMENT_GET('user', ip_dst_addr, 'enrichment', 't')
    {ip=192.168.138.2, user=jdoe}
    
  7. Use the User data to enrich the telemetry. Run the following commands in the REPL.

    [Stellar]>>> bro := SHELL_EDIT()
    {
     "enrichment" : {
       "fieldMap": {
         "geo": ["ip_dst_addr", "ip_src_addr"],
         "host": ["host"],
         "stellar" : {
           "config" : {
             "alexa" : "ENRICHMENT_GET('user', ip_dst_addr, 'enrichment', 't')"
           }
         }
       }
     },
     "threatIntel": {
       "fieldMap": {
         "hbaseThreatIntel": ["ip_src_addr", "ip_dst_addr"]
       },
       "fieldToTypeMap": {
         "ip_src_addr" : ["malicious_ip"],
         "ip_dst_addr" : ["malicious_ip"]
       }
     }
    }
    [Stellar]>>> CONFIG_PUT("ENRICHMENT", bro, "bro")
    
  8. Wait for the new configuration to be picked up by the running topology.

  9. Review the telemetry being indexed into Elasticsearch. Look for records where the ip_dst_addr is 192.168.138.2. Ensure that some of the messages have a field called alexa created from this enrichment.

    {
      "_index": "bro_index_2019.08.13.20",
      "_type": "bro_doc",
      "_id": "AWyMxSJFg1bv3MpSt284",
      ...
      "_source": {          
        "ip_dst_addr": "192.168.138.2",
        "ip_src_addr": "192.168.138.158",
        "timestamp": 1565729823979,
        "source:type": "bro",
        "guid": "6778beb4-569d-478f-b1c9-8faaf475ac2f"
        ...
        "alexa:user": "jdoe",
        "alexa:ip": "192.168.138.2",
        ...
      },
      ...
    }
    

Legacy Adapters in Storm

  1. A legacy HBase adapter is used in the default demo telemetry.

  2. Review the telemetry indexed into Elasticsearch. Ensure that the additional enrichment fields from the "malicious_ip" data are indexed into Elasticsearch.

Pull Request Checklist

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?
  • Have you included steps or a guide to how the change may be verified and tested manually?
  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
  • Have you written or updated unit tests and or integration tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

…m.google.thirdparty.publicsuffix.PublicSuffixPatterns not being relocated with the rest of Guava.
…ld not initialize class com.github.fge.jackson.JsonLoader
…mentConverterTest failures. Getting family, qualifier, value from Cell works differently in 2.0.2
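
For context on that last commit: in HBase 2.x the supported way to copy the family, qualifier, and value out of a Cell is through the CellUtil helpers rather than the old KeyValue-style accessors. A small illustrative example (not the PR's actual diff):

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.CellUtil;
    import org.apache.hadoop.hbase.util.Bytes;

    // Illustrative only: reading a Cell's parts in HBase 2.x.
    public class CellAccessExample {
      public static String render(Cell cell) {
        byte[] family = CellUtil.cloneFamily(cell);       // was cell.getFamily()
        byte[] qualifier = CellUtil.cloneQualifier(cell); // was cell.getQualifier()
        byte[] value = CellUtil.cloneValue(cell);         // was cell.getValue()
        return Bytes.toString(family) + ":" + Bytes.toString(qualifier)
            + " = " + Bytes.toString(value);
      }
    }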
@nickwallen nickwallen left a comment

Some commentary on these changes.

</relocation>
<relocation>
<pattern>com.google.thirdparty</pattern>
<shadedPattern>org.apache.metron.guava.thirdparty.${guava_version}</shadedPattern>
Contributor Author

This fixes an error with SimpleFlatFileSummarizerTest.testWholeFile caused by how we relocate Guava.
We relocate com.google.common, which does not include all packages within Guava; it specifically misses com.google.thirdparty.publicsuffix.PublicSuffixPatterns, which is the root cause of this test failure.

We may want to ensure that we relocate everything under com.google everywhere, but I didn't want to make such a drastic change just yet. And I know @merrimanr has a PR out that may shuffle what we need to do here.

<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<artifactId>hbase-mapreduce</artifactId>
Contributor Author

In the newer version of HBase, we can pull in hbase-mapreduce rather than all of hbase-server. The hbase-mapreduce package did not exist in older versions of HBase.

<!-- Test dependencies needed preferentially -->
<dependency>
<groupId>com.github.fge</groupId>
<artifactId>jackson-coreutils</artifactId>
Contributor Author

Fixed issue with metron-parsers unit tests: NoClassDefFoundError: Could not initialize class com.github.fge.jackson.JsonLoader

boolean initialized = false;
private static Cache<Table, EnrichmentLookup> enrichmentCollateralCache = CacheBuilder.newBuilder()
.build();
private static Cache<Table, EnrichmentLookup> lookupCache = createCache();
Contributor Author

Altering the HBaseClient so that it can support multiple tables with the same connection would make sense here, instead of creating a separate HBaseClient for each Table. This was originally discussed here. There is a similar scenario in the TaxiiHandler.

I'd like to do this work as a follow-on PR, rather than increasing the heft of this PR.
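
A rough sketch of what that follow-on could look like, using only the stock HBase 2.x client API (the class below is hypothetical and not part of this PR):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Table;

    // Hypothetical multi-table client: one heavyweight Connection is shared,
    // and lightweight Table handles are created per use.
    public class MultiTableClient implements AutoCloseable {
      private final Connection connection;

      public MultiTableClient(Configuration conf) throws IOException {
        this.connection = ConnectionFactory.createConnection(conf);
      }

      // Callers should close the returned Table when finished with it.
      public Table table(String name) throws IOException {
        return connection.getTable(TableName.valueOf(name));
      }

      @Override
      public void close() throws IOException {
        connection.close();
      }
    }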

…n.rest.service.impl.SensorEnrichmentConfigServiceImpl required a bean of type 'org.apache.metron.hbase.client.HBaseClient' that could not be found.
@mmiklavc
Contributor

I'm currently hunting the source of this issue down for this PR. It's clearly a classpath conflict of some sort, but it will take a bit of sleuthing to find out why the dep is a problem now and wasn't in HDP 2.x.

2019-08-12 22:19:15,038 ERROR [RS_OPEN_REGION-regionserver/node1:16020-3] coprocessor.CoprocessorHost: The coprocessor org.apache.metron.hbase.coprocessor.EnrichmentCoprocessor threw java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/hadoop/hbase/util/CoprocessorClassLoader) previously initiated loading for a different type with name "org/apache/hadoop/conf/Configuration"
java.lang.LinkageError: loader constraint violation: loader (instance of org/apache/hadoop/hbase/util/CoprocessorClassLoader) previously initiated loading for a different type with name "org/apache/hadoop/conf/Configuration"
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:763)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:467)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
        at org.apache.hadoop.hbase.util.CoprocessorClassLoader.loadClass(CoprocessorClassLoader.java:317)
        at org.apache.hadoop.hbase.util.CoprocessorClassLoader.loadClass(CoprocessorClassLoader.java:289)
        at org.apache.metron.hbase.coprocessor.EnrichmentCoprocessor.getZookeeperUrl(EnrichmentCoprocessor.java:151)
        at org.apache.metron.hbase.coprocessor.EnrichmentCoprocessor.createCacheWriter(EnrichmentCoprocessor.java:133)
        at org.apache.metron.hbase.coprocessor.EnrichmentCoprocessor.start(EnrichmentCoprocessor.java:122)
        at org.apache.hadoop.hbase.coprocessor.BaseEnvironment.startup(BaseEnvironment.java:72)
        at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.checkAndLoadInstance(CoprocessorHost.java:263)
        at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:226)
        at org.apache.hadoop.hbase.coprocessor.CoprocessorHost.load(CoprocessorHost.java:185)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.loadTableCoprocessors(RegionCoprocessorHost.java:378)
        at org.apache.hadoop.hbase.regionserver.RegionCoprocessorHost.<init>(RegionCoprocessorHost.java:274)
        at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:806)
        at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:706)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:6845)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7042)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:7015)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6973)
        at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6924)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:283)
        at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:104)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

chdir: "{{ metron_build_dir }}"
with_items:
- mvn package -DskipTests -T 2C -P HDP-2.5.0.0,mpack
- mvn package -DskipTests -T 2C -P HDP-3.1,mpack
Contributor

How exciting! 🎉

import java.text.SimpleDateFormat;
import java.util.Date;

public class LeastRecentlyUsedPruner {
Contributor

Removes the LeastRecentlyUsedPruner. This logic is exposed to the user in the script bin/threatintel_bulk_prune.sh. I had some difficulty getting the integration test working and I do not believe this is worth the effort to upgrade. I found almost no documentation around this functionality. I fully expect to initiate more discussion around this. If there is a need for this, I can work further on upgrading it.

I think deprecating an entire feature set probably requires a discussion before we accept work that completely removes it. Typically when we've done this in the past there's been at least a complementary, if not more robust, alternative to the existing functionality.

Member

I completely agree that deprecating a feature merits a community discussion. I'd start a discuss thread about this whole PR if there are any breaking changes.

For the record, I created the feature and would be in favor of deprecating it, but only after a discussion.

Contributor Author

Agreed. I can open that discuss thread.

Contributor

@cestella can you answer on the discuss thread what this actually does, and what the goal was?
Nice to see you ;)

Member

nice to see you too! I just responded.

@cestella
Member

This PR piqued my interest. :)

First off, I'm glad to see we're fixing the use of the deprecated HTables and HBase APIs, so that's fantastic. Thanks for the effort on this @nickwallen.

I will say, however, that I was surprised at the size of a single PR until I looked. This PR seems to be a mix of:

  • Dependency changes for HBase 2.0.2
  • Rearchitecture/code abstraction rewriting (e.g. your point 11)
  • Replacing the deprecated API calls
  • Deprecating features

I have some concerns about this approach. Conflating so many things inside of a PR which is already inherently risky seems to increase the risk multiplicatively. It is also difficult to review, I'd say.

I would suggest that these separate concerns be split across four separate PRs:

  • Replacing deprecated API calls against master
    • This is also a decent time to fix the mistake of passing around HTableInterface: create a key-value store abstraction that supports scan, get, and put, and pass that around instead (see the sketch after this list).
  • Dependency changes for HBase 2.0.2 against this feature branch
  • Deprecating features
    • As I said in a previous comment, only after a community discussion and I'd strongly suggest it be separate from the upgrade.
  • Rearchitecture/code abstraction rewriting against master after the HDP upgrade has landed.
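
A minimal sketch of the kind of abstraction that sub-point suggests (the names here are hypothetical, not from the Metron codebase):

    import java.io.IOException;
    import java.util.List;

    // Hypothetical key-value store seam: callers depend on this interface
    // instead of passing HTableInterface/Table around directly.
    public interface KeyValueStore<K, V> extends AutoCloseable {
      V get(K key) throws IOException;
      void put(K key, V value) throws IOException;
      List<V> scan() throws IOException; // full scan; real code would bound it
    }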

As I say, please don't take my feedback as an indication that I don't appreciate the work that went into this. There is much good here, but there is just... well... very much here. :)

@nickwallen
Contributor Author

I will say, however, that I was surprised at the size of a single PR until I looked... Conflating so many things inside of a PR which is already inherently risky seems to increase the risk multiplicatively. It is also difficult to review, I'd say.

I do agree with you @cestella that this PR is large and I very much prefer small, isolated PRs. The HDP-3.1 upgrade has been a huge effort so far and I've tried to break it down into small, reviewable PRs. Here are all the PRs that we've been able to land so far as part of the upgrade.

I tried to follow that similar pattern prior to opening this PR and ran into some problems. I attempted to open separate PRs for each functional area included here. Something along the lines of...

  • Enrichment Coprocessor
  • Legacy Adapters and Stellar Enrichment functions
  • Data management; TAXII, CSV loaders

But since (1) the changes in the areas listed above are not backwards compatible (unlike all of the preceding PRs) and (2) there are many interdependencies between these areas, I was not able to submit separate PRs that would actually compile.

To help cut the fat and reduce the size of this PR, there are some changes here that I could try to undo or extract into separate PRs. These come to mind immediately:

  1. EnrichmentKey had a publicly accessed field. This was changed to getters, which ended up impacting a lot of files.
  2. LookupKV<KEY_T, VALUE_T> -> EnrichmentResult. I actually tried to 'undo' this before submitting this PR and ran into a problem. I can try to tackle this again so I can at least describe why this might be needed, if not address it.
  3. The deprecation of "Least Recently Used Pruner" (assuming that is acceptable to the community post-discuss) would have to come before this PR.

Rearchitecture/code abstraction rewriting against..

Do the items I listed above (1) public field access and (2) LookupKV -> EnrichmentResult cover what you mean by rearchitecture? Are there other items that you are thinking of that fall under this heading?

@nickwallen
Contributor Author

Replacing deprecated API calls against master

Many of the changes here are not backwards compatible which prevents me from introducing them against master, unfortunately.

This is also a decent time to fix the mistake of passing around HTableInterface: create a key-value store abstraction that supports scan, get, and put, and pass that around instead.

For the work on this feature branch, the HBaseClient is that common abstraction. It was already in the code base and was used in a few different areas, so I leveraged that. As part of the upgrade, I've tried to port over as many functions as possible to use the HBaseClient. See #1456 for more context.

There are still a few remaining places that don't directly use an HBaseClient, but instead pass around HBase abstractions like Table as they always have. There is just a limit to how much time I can spend on this upgrade. The ones I have not ported to HBaseClient are those where the manner in which they are tested did not force my hand.

For example, we have to integration test "streaming enrichments" against HBase, Kafka and Storm. With our current IT approach, these all have to run in memory, but we cannot run HBase in-memory alongside Storm and Kafka, which is why the integration tests in master actually use a "mock" HBase instance. In these cases, since HBase cannot co-exist in memory with the others, these classes were ported to HBaseClient and so ultimately use a FakeHBaseClient in the IT.
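
To illustrate that testing approach, here is a hypothetical map-backed fake in the spirit of FakeHBaseClient (implementing the KeyValueStore sketch from the earlier comment; the branch's actual API may differ):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical in-memory fake: lets an integration test exercise
    // enrichment logic without a live HBase instance.
    public class InMemoryKeyValueStore<K, V> implements KeyValueStore<K, V> {
      private final Map<K, V> rows = new ConcurrentHashMap<>();

      @Override public V get(K key) { return rows.get(key); }
      @Override public void put(K key, V value) { rows.put(key, value); }
      @Override public List<V> scan() { return new ArrayList<>(rows.values()); }
      @Override public void close() { rows.clear(); }
    }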

For those that were not ported to HBaseClient, I'd like us to upgrade or deprecate them after this feature branch is complete.

@mmiklavc
Contributor

Still trying to wrap my head around this PR and the scope of changes.

Many of the changes here are not backwards compatible which prevents me from introducing them against master, unfortunately.

HBase introduced the new API in earlier versions, prior to deprecation. I don't doubt the problem exists, but I'm having a hard time understanding what we depended on that has 2 incompatible analogs in the newer versions of the HBase API. Can you provide some concrete examples?

@nickwallen
Contributor Author

The Coprocessor changes are definitely not backwards compatible; that's what I can remember off-hand. I'd have to try splitting this PR apart to see what else breaks before I can provide more detail.

@nickwallen
Contributor Author

It took some work, but I was at least able to extract the core Enrichment components, the HBase adapter, and the Enrichment functions into their own separate PR, #1482. That should help reviewers. Please take a look at #1482 and provide feedback. I will continue working on cutting up this PR into additional bite-sized chunks.

I am going to close out this PR, since it is no longer needed.

@nickwallen nickwallen closed this Aug 14, 2019
