Skip to content
This repository was archived by the owner on Aug 20, 2025. It is now read-only.

Conversation

@ottobackwards
Copy link
Contributor

@ottobackwards ottobackwards commented Aug 6, 2017

This PR adds support for Assignment to the Stellar Language. It is dependent on METRON-711 before it can be merged.

I still need to do the doc work, and more testing.

Concept

  • Allow for assignment of values to variables only
  • Pre and Post increment and decrement for variables
  • Assignment both returns the assigned value AND updates the variable
  • VariableResolver implementation may or may not choose to support updates
  • The Shell handles all assignments ( foo := 'val' ) as ("foo = 'val'"). Exposing the whole operation to Stellar allows catching illegal values and grammar
  • The LambdaExpression does attempt to update variables, but this would have effect if the state was using a VariableResolver instance that supported updates AND the variable was not a 'lambda' variable.
 String expr  = "MAP([ foo, bar, baz ], (item) -> count *= 2  )";
      Map<String,Object> map = new HashMap<String,Object>(){{
        put("foo",1);
        put("bar",1);
        put("baz",1);
        put("count", 1);
      }};
      for(int i = 0 ; i < 5; i++){
        run(expr, map);
      }
      Assert.assertEquals(new DecimalFormat("#").format(Math.pow(2,15)), map.get("count").toString());

Support for:
= , +=, -=, *=, /=, ++var, --var, var++, var--

Side effects:

This is opt-in for the resolvers. The MapVariableResolver does NOT support update, so there is no change to Enrichment, Parsers, Profile, and Shell ( beyond what has been mentioned ).

Testing:

See BasicStellarTest:test*Assign() for some usages.

  • Existing profiles, enrichments, parser transforms would work
  • Run different statements in the shell ( including those that would have worked before like = := 6.
  • more to come

NOTE

this is like the rebound relationship, the PR that leads to the PR that actually lands
this issues around the variable resolvers and such will dictate a different approach

This PR is to get the concept as a proposal out for review and start the discussion. I will start a
discuss thread as well.


Note: This PR resolves METRON-693 : Stellar REPL lets reserved tokens be used as unusable variable names as well.

Pull Request Checklist

Thank you for submitting a contribution to Apache Metron.
Please refer to our Development Guidelines for the complete guide to follow for contributions.
Please refer also to our Build Verification Guidelines for complete smoke testing guides.

In order to streamline the review of the contribution we ask you follow these guidelines and ask you to double check the following:

For all changes:

  • Is there a JIRA ticket associated with this PR? If not one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?

and METRON-711

For code changes:

  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?

  • Have you included steps or a guide to how the change may be verified and tested manually?

  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:

    mvn -q clean integration-test install && build_utils/verify_licenses.sh 
    
  • Have you written or updated unit tests and or integration tests to verify your changes?

  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?

  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

For documentation related changes:

  • Have you ensured that format looks appropriate for the output in which it is rendered by building and verifying the site-book? If not then run the following commands and the verify changes via site-book/target/site/index.html:

    cd site-book
    mvn site
    

Note:

Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible.
It is also recommended that travis-ci is set up for your personal repository such that your branches are built there before submitting a pull request.

@ottobackwards
Copy link
Contributor Author

Added pre and post increment and decrement

@ottobackwards ottobackwards changed the title METRON-1090 [NO MERGE UNTIL METRON-711] Add Assignment to Stellar Language METRON-1090 Add Assignment to Stellar Language Aug 17, 2017
@JonZeolla
Copy link
Member

Is this ready for review? If so, can you deconflict?

@ottobackwards
Copy link
Contributor Author

should be set @JonZeolla

@ottobackwards
Copy link
Contributor Author

It is fair to say that this PR is more for conceptual review than anything else.

@ottobackwards
Copy link
Contributor Author

@cestella

@ottobackwards
Copy link
Contributor Author

Reviewer Note: Would a default method on the interface be better for update()?

@cestella
Copy link
Member

Hmm, so this is an interesting one. We sorta have a de facto assignment outside of stellar, but I'd be interested in having it moved into stellar like this. I'd also be sorta ok with it being =, but we chose := earlier, so that is an open question as to whether it should be consistent.

Also, it seems like processing a stellar statement should return 2 things now:

  1. the output value
  2. optionally a variable assigned

We should also refactor the StellarAssignment class to use this somehow. Other than that, this seems extremely promising!

@cestella
Copy link
Member

I would like to mention that I'm actually much more open minded to moving to = than my text may seem. I just want the conversation, of course :)

@ottobackwards
Copy link
Contributor Author

I just removed the conflicts and changed StellarAssignment to account for this

@ottobackwards
Copy link
Contributor Author

The question for me @cestella is the resolvers, and the consequences of having them support updating etc. I left it so that they don't by default because I didn't understand all the consequences and thought we could think it through

@cestella
Copy link
Member

Yeah, I'm ok with them not updating at the moment (though I think that's where we may want to go). I think a stop-gap is to just be able to pass back the variable assigned and the value from a stellar statement. We manage updating the resolvers out of band in the existing code.

@ottobackwards
Copy link
Contributor Author

ottobackwards commented Jan 18, 2018

@cestella also: := is in the shell not the language. The mechanics of what is LANG and what is MACHINE is part of the issue.

@cestella
Copy link
Member

well, := is in the shell and also in the enrichment configuration (see here. Seems like if we're promoting this to the language, we should make things consistent in one way or the other.

@ottobackwards
Copy link
Contributor Author

POV issue I think. Stellar has the LANGUAGE ( grammar ), the default implementation, and the 'host' execution mechanics.

:= is part of the host execution mechanics.

The issue may not be being the same as much as the necessity for := as a host construct at all if things are done inside stellar.

Does that make sense?

@cestella
Copy link
Member

Agreed, this is a POV issue. Though from the user perspective they aren't distinguishing between "this is in the REPL" vs "this is in the enrichment topology", so I think we do need to manage the experience if we're moving functionality from the MACHINE to the LANGUAGE to ensure it's consistent.

Beyond that, it's unclear whether we need an resolver update function as part of THIS PR, but I think we very much will need it if we decide to make stellar multi-expression. I suspect we should punt on it until that effort.

@ottobackwards
Copy link
Contributor Author

By punt you mean leave this implementation that doesn't implement update in the resolvers?

@ottobackwards
Copy link
Contributor Author

also, we aren't consistent, in some places ( transformations ) we use the configuration logical structure to imply assignment, in enrichment it is :=, but those are configuration mechanisms not stellar mechanisms

@nickwallen
Copy link
Contributor

I have not looked at the code yet, but wanted to comment on the assignment discussion. IMO focusing on user experience is the way to go. To that end...

  • We should use = because that is just more obvious.
  • We should remove := from the REPL environment (very easy, just delete AssignmentCommand).
  • All references in the docs to := need to be changed to =. (Ugh)

All of that can happen in this PR or as multiple PRs (however @ottobackwards sees fit), but that is the right end state.

Alternative Option: The only other reasonable option IMO is to support both = and := (for some undetermined period of time.) The only snafu is that both should have the same implementation. So support for := would have to be added to the 'language' and we should NOT rely on the AssignmentCommand implementation in the REPL.

I am mildly against the "Alternative Option", but wanted to offer it up for discussion.

@cestella
Copy link
Member

Yeah, I'm totally cool with going with =, but I will point out one snag. What about users with existing stellar enrichments using :=, would we provide a migration path beyond saying, "Hey, move your stuff over" in the Upgrading docs?

Migration, sadly sometimes, is also part of user experience. :)

@nickwallen
Copy link
Contributor

@cestella Migration, sadly sometimes, is also part of user experience. :)

Very true. That's why I offered up the "Alternative Option".

Only my own laziness makes me "mildly against" that :) . But based on user experience (like you mention), I think we should probably do it.

@cestella
Copy link
Member

hah, well @nickwallen, laziness is often a good trait in a developer. Honestly, I am ok with not migrating here in favor of more intuitive approaches. I chose := because I needed the functionality and worried about making the language more complex (there's something different about a language that has assignments versus an expression language. I wanted to keep it an expression language at the time.), so I was forced to choose a character sequence that didn't exist as a lexable token. Now, honestly, we're already halfway down that road, so probably the right thing to do is deprecate it and rip the bandaid fast. I honestly could be convinced either way.

@nickwallen
Copy link
Contributor

@cestella: What about users with existing stellar enrichments using := ...

Oh, and one clarification. No users will have production code (like an enrichment) that uses :=. That has only ever been available in the REPL. So that should not be a concern, right? The only concern would be documented uses of the REPL itself; like READMEs.

@ottobackwards
Copy link
Contributor Author

ottobackwards commented Jan 18, 2018

@cestella @nickwallen
I have changed stellar assignment to cover the := already. I think it will continue to work.
89f682f

@cestella
Copy link
Member

@nickwallen No, that's not correct. Stellar enrichments can contain :=. Consider the enrichments here in one of our use-cases. Enrichments can either take a map of enrichments or a list of strings. If you pass a list of strings (which you would need to do to create temporary variables that you then remove before passing through, like in the example), you'd use an assignment, which is currently :=.

@mattf-apache
Copy link
Member

As a professed language nut, when you mention the word "variable", I think "scope". And it seems to me that by placing the responsibility for variable assignment/update in the VariableResolver, you've actually tied into a very natural definition of scope given the way MapVariableResolver already works in association with a target dictionary or stack of dictionaries.

A. I think you should explicitly discuss scope in the documentation. There are several scopes already being used in practice, and I think users need to understand how it all works together, otherwise they will misunderstand and be bewildered. I am, no doubt, a good example, so dense that I have to ask the questions below :-)

B. I don't understand why you don't go ahead and enable update in MapVariableResolver, since IIRC that's the resolver for essentially everything interesting that uses variables in Stellar.

C. Do I understand correctly from the patch, that variable assignment IS enabled in lambda expressions? Presumably the scope will be the same instance dictionary where the input variables are stored, thus making them local, correct? Can new variables be created as well as input variables be updated?

D. The following scopes come to mind, with associated need for clarification:

  • Lambdas save their input variables in their ephemeral instance context, initialized by call syntax.
    • Comment: questions already asked above
  • The REPL used one global dict for scope of its variables, defined by ':='.
    • Comment: If I understand correctly, when in the REPL, '=' and ':=' assignments will go to the same global context dictionary, is that correct?
  • Profilers, transformations, and several other xml-defined Stellar-using features, have a persistent instance context, with variables initialized and updated via an xml syntax.
    • Comment: Apparently here we've specifically exclude assignment, even though this seems to be where it's needed most. Why is that? The instance context seems ready-made for local variables.

@mattf-apache
Copy link
Member

So a bunch more comments came in while I was writing the above. Nick's comment about enrichments using assignment already of course modifies my comment about Profilers, etc. Just making it uniform would give users a lot less to wonder about.

As for := vs =, it's not so terrible to have both as exact synonyms. This gives you both your preferred aesthetics and backward compatibility.

@nickwallen
Copy link
Contributor

nickwallen commented Jan 18, 2018

@cestella: No, that's not correct. Stellar enrichments can contain :=. Consider the enrichments here in one of our use-cases. Enrichments can either take a map of enrichments

Thanks for clarifying, but 'ugh'. I'll have to look at how that is implemented. I didn't realize we had a separate implementation of assignment. Does this PR address that? Shouldn't we have one implementation of the assignment mechanism?

Good commentary from @mattf-horton . I knew there was a reason why I didn't want to touch this. :)

@ottobackwards
Copy link
Contributor Author

OK,

The others can comment, but I think proper scope would be implemented within the execution of the call stack, not arbitrarily by the resolvers. Each resolver should be a 'scope' and new scopes should be created.

A. Because it is a resolver thing, where would you document it in a way that makes sense? I have been thinking that we should refactor to explicitly support scoping in the processor/evaluations and then have the resolvers used in scope as scope implementations. So the nesting of scopes would not be done in the resolver.

B. As noted in the PR description, I was hoping for discussion ( THIS discussion as a matter of fact ). Having had some issues with my other PRs and scope, I chose to explicitly NOT go that far until we talked it through. I did not feel all the way back then that I had enough of a grasp of the different resolvers to change their behavior and understand the consequences either way.

C. The LambaExpression actually delegates to the state variable resolver:

  VariableResolver variableResolver = new DefaultVariableResolver(
        variable -> lambdaVariables.getOrDefault(variable
            , state.variableResolver.resolve(variable)
        ), variable -> true,
        (variable, value) -> {
          if (state.variableResolver.exists(variable)) {
            state.variableResolver.update(variable, value);
          }
        });

D.

  • I think I answered above.
  • I think so, maybe @cestella or @nickwallen can answer that
  • Again, I stopped short until I got some kind of feedback.
  • I have changed StellarAssignment, I don't think we need both = and :=, and if there are := floating around they will work, unless we remove StellarAssignment

@mattf-apache
Copy link
Member

@ottobackwards And yet this discussion primarily relates to whether the assignment operation as defined in Enrichment should be unified with the assignment operation being added to core stellar, presumably due to user demand.

@cestella
Copy link
Member

So, I was wrong when I initially looked at this PR, this adds variable updates to the resolver. Should we perhaps separate parsing assignments (e.g. move StellarAssignment.from into the language like you've done here) from actually updating the variables. I think I'd like to see Enrichments and the REPL refactored to not hand-code the variable update logic when we get around to updating the resolver to allow updates. Thoughts?

@ottobackwards
Copy link
Contributor Author

@mattf-horton if by 'due to user demand' you mean to work around the fact that assignment was not in the language in the first place, ok.

@ottobackwards
Copy link
Contributor Author

Bump

@nickwallen
Copy link
Contributor

@cestella Should we perhaps separate parsing assignments (e.g. move StellarAssignment.from into the language like you've done here) from actually updating the variables.

@ottobackwards I didn't see you respond to this. I would agree with @cestella on this.

@ottobackwards
Copy link
Contributor Author

@nickwallen @cestella @mattf-horton , Sorry, I thought that question was more to @mattf-horton at the time.

@cestella I'm not sure I understand what you are saying. Are you saying that we should not do assignment until we overhaul the current mechanisms? That this is too much a 1/2 measure?

@ottobackwards
Copy link
Contributor Author

This is pretty old at this point, and there has been a lot of discussion all over the place about it. I'm not sure I can sum up where we are. @cestella can you sum up your current feeling on how we can move this forward?

@ottobackwards
Copy link
Contributor Author

Bump @cestella

@cestella
Copy link
Member

cestella commented May 9, 2018

Sorry, @ottobackwards this got lost in the shuffle. What I was saying was that I think we should not adjust the VariableResolver infrastructure to update variables as the result of assignments, but rather return back the variable and the updated object. For instance, in the BaseStellarProcessor class we have a parse method that returns the output of the expression. I'd suggest we have a parseExpression and parseAssignment where the first acts as it does now and the second returns the variable name and the result. As a follow-on, I think we can adjust the variableresolver infrastructure to support variable updates.

I guess ultimately, I suggest a tactical solution that leads us to the right direction rather than trying to fix it all at once.

Thoughts?

@ottobackwards
Copy link
Contributor Author

I don't quite get what you are saying. I can't wrap my head around it. If you look at the example from the PR description under concept, that doesn't work without the resolver being updated.

I have to be missing something

@ottobackwards ottobackwards changed the title METRON-1090 Add Assignment to Stellar Language METRON-1090 [DISCUSS] Add Assignment to Stellar Language May 9, 2018
@cestella
Copy link
Member

cestella commented May 9, 2018

So, I guess I'm suggesting that this is like..3/4 of the way to a complete solution, which is for stellar to handle variable assignments completely.

Specifically, I'd say that our dominantly used variable resolver (MapVariableResolver) doesn't support updates, so nothing actually happens when you update outside of the DefaultVariableResolver, so assignments don't actually function yet and an error isn't thrown.
So, if someone reads the updated docs and uses ++ in the profiler or enrichments, it won't actually work and it'll not be apparent why.

What I'm suggesting is one of these:

  1. refactoring the MapVariableResolver to support updates and refactoring the places where we use assignments to NOT handle variable assignment outside of stellar.
  2. Making a feature branch for stellar assignments and I'd +1 this into that branch and we can do the final PR in that branch
  3. pull back the scope of this PR and just have assignments supported in the parser without the unary operators. This is what I was getting here in my comment before:

For instance, in the BaseStellarProcessor class we have a parse method that returns the output of the expression. I'd suggest we have a parseExpression and parseAssignment where the first acts as it does now and the second returns the variable name and the result.

Ultimately, I LOVE this PR and I don't want you to think that I'm poo-pooing it. I think this is great work and I look forward to the feature being added. It just seemed ALMOST done, but not quite enough to stand by itself in master because the gap may leave people confused.

@ottobackwards
Copy link
Contributor Author

Ok, so we are on the same page. I purposely did not put the update in the map resolver because I knew it warranted more discussion etc, and this is it. I just wasn't sure I was understanding what you were saying exactly, and honestly, every time i go back to read from the start @mattf-horton 's comments make me mush minded ;)

Now we have to pick one of the options above to continue with.

@cestella
Copy link
Member

cestella commented May 9, 2018

I think I have a minor preference for 2, honestly. I like this work and don't want it to languish or grow too big. Entirely as an aside, we should probably have a set of microbenchmark tests for stellar to ensure no performance regressions given changes to the grammar.

@ottobackwards ottobackwards changed the title METRON-1090 [DISCUSS] Add Assignment to Stellar Language METRON-1563 [DISCUSS] Add Assignment to Stellar Language May 14, 2018
@ottobackwards
Copy link
Contributor Author

I have renamed in preparation for the Feature branch based on the original jira ( create a new subtask to land this on feature and change to that ID )

=, +=, -=, *=, /=

This are valid for assignment to variables.  All assignments return the value assigned,
as other scripting languages do.

VariableResolver extended with an update() method, though it may not be supported
by all resolvers, this gives the ability to update variables and save between calls.

The shell was updated for this, such that assignment calls

More testing need and doc but I wanted to get it out there.
This is dependent on PR 686
fixes for handling validate with null values
@ottobackwards
Copy link
Contributor Author

Ok everyone, this has moved to the new PR. Thanks!

@ottobackwards
Copy link
Contributor Author

btw, if you haven't checked the new fb and pr, the initial pr has landed, and has added support for := to go along with =

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants