Skip to content

Adding bitwise expressions#10230

Closed
kaplanmaxe wants to merge 1 commit intoapache:masterfrom
kaplanmaxe:feature-bitwise-expressions
Closed

Adding bitwise expressions#10230
kaplanmaxe wants to merge 1 commit intoapache:masterfrom
kaplanmaxe:feature-bitwise-expressions

Conversation

@kaplanmaxe
Copy link
Copy Markdown
Contributor

Related to #8560

This feature adds several bitwise expressions to be used on ingestion.

Use Case

Say you're doing CDC and your source table has a column of binary flags. For OLAP queries, it's extremely useful to have these flags extracted out into their own dimensions which can be done now on ingestion via the new expressions.

Example:

  • You might have a column in your source DB called flags that represents a series of binary flags. Bit 1 might mean an order was placed on mobile, bit 2 might mean it was purchased on web, bit 4 might mean it was the users first purchase.
  • Instead of ingesting the raw integer value of the flags column, you can break these columns out into something like mobile_ind, web_ind, first_purchase_ind where your expression can be bitwiseAnd(flags, 1), bitwiseAnd(flags, 2), bitwiseAnd(flags, 4).

Druid SQL currently does not support bitwise operations (#8560) which makes these even more valuable IMO.

Tested on a local cluster

Ingestion spec:

{
  "type": "index_parallel",
  "spec": {
    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "inline",
        "data": "\"x\",\"y\"\n4,2\n8,4\n16,8\n3,2\n5,2"
      },
      "inputFormat": {
        "type": "csv",
        "findColumnsFromHeader": true
      }
    },
    "tuningConfig": {
      "type": "index_parallel",
      "partitionsSpec": {
        "type": "dynamic"
      }
    },
    "dataSchema": {
      "dataSource": "bitwise_test",
      "granularitySpec": {
        "type": "uniform",
        "queryGranularity": "NONE",
        "rollup": false,
        "segmentGranularity": "YEAR"
      },
      "timestampSpec": {
        "column": "!!!_no_such_column_!!!",
        "missingValue": "2010-01-01T00:00:00Z"
      },
      "transformSpec": {
        "transforms": [
          {
            "type": "expression",
            "name": "zBitwiseAnd",
            "expression": "bitwiseAnd(CAST(x, 'LONG'), CAST(y, 'LONG'))"
          },
          {
            "type": "expression",
            "name": "zBitwiseOr",
            "expression": "bitwiseOr(CAST(x, 'LONG'), CAST(y, 'LONG'))"
          },
          {
            "type": "expression",
            "name": "zBitwiseComplement",
            "expression": "bitwiseComplement(CAST(x, 'LONG'))"
          },
          {
            "type": "expression",
            "expression": "bitwiseShiftLeft(CAST(x, 'LONG'), CAST(y, 'LONG'))",
            "name": "zBitwiseShiftLeft"
          },
          {
            "type": "expression",
            "name": "zBitwiseShiftRight",
            "expression": "bitwiseShiftRight(CAST(x, 'LONG'), CAST(y, 'LONG'))"
          },
          {
            "type": "expression",
            "name": "zBitwiseXor",
            "expression": "bitwiseXor(CAST(x, 'LONG'), CAST(y, 'LONG'))"
          }
        ]
      },
      "dimensionsSpec": {
        "dimensions": [
          {
            "type": "long",
            "name": "x"
          },
          {
            "type": "long",
            "name": "y"
          },
          {
            "type": "long",
            "name": "zBitwiseAnd"
          },
          {
            "type": "long",
            "name": "zBitwiseComplement"
          },
          {
            "type": "long",
            "name": "zBitwiseOr"
          },
          {
            "type": "long",
            "name": "zBitwiseShiftLeft"
          },
          {
            "type": "long",
            "name": "zBitwiseShiftRight"
          },
          {
            "type": "long",
            "name": "zBitwiseXor"
          }
        ]
      }
    }
  }
}

Screenshot from 2020-08-01 18-27-54


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • been tested in a test Druid cluster.

@kaplanmaxe kaplanmaxe changed the title Adding bitwise operation expressions Adding bitwise expressions Aug 1, 2020
Comment thread docs/misc/math-expr.md
|atan|atan(x) would return the arc tangent of x|
|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y|
|bitwiseComplement|bitwiseComplement(x) would return the result of ~x|
|bitwiseOr|bitwiseOr(x,y) would return the result of x [PIPE] y |
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added [PIPE] as I had trouble escaping | within the table. Can play with it though if necessary

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

| should work (it's done elsewhere in the file, for logical OR).

@kaplanmaxe
Copy link
Copy Markdown
Contributor Author

Looks like the CI failed on spellcheck of the expression names. Let me know if you want me to change the names

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Aug 7, 2020

Looks like the CI failed on spellcheck of the expression names. Let me know if you want me to change the names

Generally you'd just add them to the spell checker exclusions file: website/.spelling.

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Aug 7, 2020

@kaplanmaxe thank you for the contribution!

There's another patch with a similar feature: #10084. I prefer the approach you took in this patch — of defining named functions — because it will be easier to integrate into the SQL layer. It is pretty quick to write a SQL operator that is backed by a native expr function. You also have a few more functions added (like bit shifts), which is great.

Some comments:

  1. Please add some tests. You could start by copying the tests from add bitwise and, or, negate expressions #10084 and then adding new ones for the additional functions you have. That patch has a pretty good range of tests for a variety of cases. It's a bit more complicated because it also modified the grammar. Your tests would mostly be going in "FunctionTest".
  2. The functions should be named like other native expr functions, using snake_case rather than camelCase.
  3. If you're interested in adding SQL functions too, or as a follow up, check out IPv4AddressMatchOperatorConversion for an example of how to implement a SQL function that maps cleanly onto a native expr function. It is mostly just declaring the proper type and operand checking info for the SQL layer.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall 👍

I think you'll need a bit more to fully handle null valued arguments, looking at the code I think it would probably be best to modify BivariateMathFunction to return null if either argument isNumericNull to match UnivariateMathFunction, but knowing the tests there are likely places where this will break things, but maybe we will get lucky on this one.

Agree with @gianm that borrowing/adapting tests from #10084 would be nice, plus add more for the functions that were not implemented as operators in that PR.

@kaplanmaxe
Copy link
Copy Markdown
Contributor Author

Thank you @gianm and @clintropolis for your very detailed reviews and feedback. As a first time contributor and someone unfamiliar with the code base, this was extremely helpful.

I will update this over the weekend likely. Unfortunately, i am effected by a major power outage atm, but hope it comes back soon. Druid has been amazing for our use case, and I'm really excited this is conceptually approved as it's the biggest thing affecting us atm. Expect to see some revisions soon within the next few days!

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Aug 12, 2020

Looking forward to it, @kaplanmaxe.

@stale
Copy link
Copy Markdown

stale Bot commented Oct 11, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label Oct 11, 2020
@teyeheimans
Copy link
Copy Markdown

I would love to see this functionality in druid. Any chance this could be fixed/merged?

@stale
Copy link
Copy Markdown

stale Bot commented Oct 13, 2020

This issue is no longer marked as stale.

@stale stale Bot removed the stale label Oct 13, 2020
@kaplanmaxe
Copy link
Copy Markdown
Contributor Author

Hey guys. Sorry I haven't responded recently.

I'm just going to be honest, our team found a workaround for this one, and I likely don't have the capacity to finish this up. I know this feature is going to be really valuable to end users, and I don't mind if someone else picks up this PR and finishes it up. Really sorry about that, but just want to be honest with intentions so you guys aren't waiting on me to pick it up. In the end, just want what is best for Druid

@clintropolis
Copy link
Copy Markdown
Member

I'm just going to be honest, our team found a workaround for this one, and I likely don't have the capacity to finish this up. I know this feature is going to be really valuable to end users, and I don't mind if someone else picks up this PR and finishes it up. Really sorry about that, but just want to be honest with intentions so you guys aren't waiting on me to pick it up. In the end, just want what is best for Druid

@kaplanmaxe I should be able to pick this up sometime soon and get it finished up so we can get this feature in, thanks!

@stale
Copy link
Copy Markdown

stale Bot commented Dec 25, 2020

This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 4 weeks if no further activity occurs. If you think that's incorrect or this pull request should instead be reviewed, please simply write any comment. Even if closed, you can still revive the PR at any time or discuss it on the dev@druid.apache.org list. Thank you for your contributions.

@stale stale Bot added the stale label Dec 25, 2020
@gianm
Copy link
Copy Markdown
Contributor

gianm commented Jan 8, 2021

Let's keep this one open, stalebot.

@stale
Copy link
Copy Markdown

stale Bot commented Jan 8, 2021

This issue is no longer marked as stale.

@stale stale Bot removed the stale label Jan 8, 2021
@clintropolis
Copy link
Copy Markdown
Member

Ah, this one can be closed actually, i picked up the commit from here and added tests in #10605

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants