This repository was archived by the owner on Aug 20, 2025. It is now read-only.

@nickwallen nickwallen commented Dec 25, 2017

Stellar in a Zeppelin Notebook

The goal of this effort was to have the shell-like experience of the command-line driven REPL, but inside a web-based Zeppelin Notebook. This makes the existing functionality provided by the REPL more accessible and approachable for the average user.

This also opens up interesting additional scenarios like "executable" use cases or plotting profile data. Definitely more work to do on those fronts, but this is a first step.

Changes

  1. This required a good amount of refactoring. Prior to this effort, different aspects of the REPL's functionality were tied together in the StellarShell class. To have the same shell-like experience in both the REPL and Zeppelin, I needed to factor this logic out so that it could be reused.

    This included the following bits.

    • Console logic tied to the AESH library
    • Stellar execution logic
    • Autocomplete logic
    • Variable assignment (only available in the shell)
    • Magics
    • Docstrings
    • Comments
    • quit
  2. We had very few automated tests around the shell/REPL, primarily because all these bits were tied together, which made testing difficult. Separating out this logic allowed me to add a large number of unit tests around each of these pieces of logic.

  3. This also adds a separate project, metron-stellar/stellar-zeppelin, that provides a Zeppelin interpreter that allows you to run Stellar in a notebook. The interpreter acts as the front-end that then re-uses all of the bits that I refactored as described above.

What does this do?

  1. This allows you to run Stellar expressions in a Zeppelin Notebook.

    [Screenshot: stellar in a zeppelin notebook]

  2. This provides auto-complete in the Zeppelin Notebook by pressing the CTRL + . keys.

    [Screenshot: stellar in a notebook - autocomplete]

What does this not do?

This does not automatically prepare the Zeppelin environment to run Stellar; that is, it does not install the Stellar interpreter in Zeppelin. You have to manually install the interpreter in Zeppelin by following the directions described in the metron-stellar/stellar-zeppelin README.

Testing

Run Stellar in Notebook

  1. Build Metron.
  2. Follow the instructions in metron-stellar/stellar-zeppelin/README.md to download/run Zeppelin locally.
  3. Follow the same instructions to execute Stellar expressions in a Zeppelin Notebook.
  4. Use CTRL + . to try out autocomplete.

Validate the CLI REPL

  1. Deploy Full Dev
  2. Run the existing CLI-driven REPL
  3. Validate that we have feature parity

Pull Request Checklist

  • Is there a JIRA ticket associated with this PR? If not, one needs to be created at Metron Jira.
  • Does your PR title start with METRON-XXXX where XXXX is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character.
  • Has your PR been rebased against the latest commit within the target branch (typically master)?
  • Have you included steps to reproduce the behavior or problem that is being changed or addressed?
  • Have you included steps or a guide to how the change may be verified and tested manually?
  • Have you ensured that the full suite of tests and checks have been executed in the root metron folder via:
  • Have you written or updated unit tests and/or integration tests to verify your changes?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • Have you verified the basic functionality of the build by building and running locally with Vagrant full-dev environment or the equivalent?

Contributor Author

@nickwallen nickwallen left a comment

A few notes to guide reviewers...

/**
* Provides auto-completion for Stellar.
*/
public class DefaultStellarAutoCompleter implements StellarAutoCompleter {
Contributor Author

@nickwallen nickwallen Dec 25, 2017

This class provides the core auto-completion logic for both the CLI-based REPL and Zeppelin.
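
As a rough illustration of what a shared completer has to do, here is a minimal sketch of prefix completion over a sorted index. The class `PrefixIndexSketch` and its methods are hypothetical and are not the code in `DefaultStellarAutoCompleter`; the function names are only sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical sketch: a sorted index of completion candidates.
class PrefixIndexSketch {
    private final TreeSet<String> index = new TreeSet<>();

    void add(String candidate) {
        index.add(candidate);
    }

    // Returns every candidate that starts with the given prefix, in sorted order.
    List<String> complete(String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet begins at the first entry >= prefix; entries sharing the
        // prefix are contiguous, so stop at the first one that does not match.
        for (String candidate : index.tailSet(prefix)) {
            if (!candidate.startsWith(prefix)) {
                break;
            }
            matches.add(candidate);
        }
        return matches;
    }

    public static void main(String[] args) {
        PrefixIndexSketch completer = new PrefixIndexSketch();
        completer.add("TO_UPPER");
        completer.add("TO_LOWER");
        completer.add("MAP");
        System.out.println(completer.complete("TO_"));
    }
}
```

Because the index knows nothing about a console or a notebook, the same object could sit behind either front-end.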

this.autocompleteIndex = initializeIndex();

// TODO is this needed? FunctionResolver functionResolver
// // asynchronously update the index with function names found from a classpath scan.
Contributor Author

Is this logic needed any longer? I don't feel it is, but need to dig on this some more.

* Default implementation of a StellarShellExecutor.
*/
public class DefaultStellarShellExecutor implements StellarShellExecutor {

Contributor Author

This class provides the core execution logic for both the REPL and Zeppelin. It handles core Stellar in addition to the "extensions" like assignment, magics, docstrings, etc.

for(SpecialCommand command : commandRegistry) {
if(command.getMatcher().apply(expression)) {
return command.execute(expression, this);
}
Contributor Author

Anything that is not "core" Stellar is defined separately as a SpecialCommand.
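
The dispatch loop in the snippet above can be sketched in isolation. This is a simplified illustration, not the PR's code: `SketchCommand`, `CommentSketch`, `QuitSketch`, and `DispatchSketch` are hypothetical names, the return type is a plain `String` rather than `StellarShellResult`, and the `#` comment prefix and the `"core:"` fallback are assumptions made only so the example can run.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-in for the SpecialCommand interface.
interface SketchCommand {
    Function<String, Boolean> getMatcher();
    String execute(String expression);
}

// Assumption for illustration: lines starting with '#' are comments.
class CommentSketch implements SketchCommand {
    public Function<String, Boolean> getMatcher() {
        return expr -> expr.trim().startsWith("#");
    }
    public String execute(String expression) {
        return "";
    }
}

// Handles 'quit', which is a shell action rather than a language feature.
class QuitSketch implements SketchCommand {
    public Function<String, Boolean> getMatcher() {
        return expr -> expr.trim().equals("quit");
    }
    public String execute(String expression) {
        return "terminate";
    }
}

class DispatchSketch {
    private static final List<SketchCommand> REGISTRY =
            Arrays.asList(new CommentSketch(), new QuitSketch());

    // The first special whose matcher accepts the input handles it;
    // anything else falls through to the core language (stubbed here).
    static String execute(String expression) {
        for (SketchCommand command : REGISTRY) {
            if (command.getMatcher().apply(expression)) {
                return command.execute(expression);
            }
        }
        return "core:" + expression;
    }

    public static void main(String[] args) {
        System.out.println(execute("quit"));
        System.out.println(execute("TO_UPPER('foo')"));
    }
}
```

With this shape, dropping a special is a matter of leaving it out of the registry; the dispatch loop itself does not change.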

new MagicDefineGlobal(),
new MagicUndefineGlobal(),
new MagicListGlobals()
);
Contributor Author

All the specials are made available here.

Contributor Author

If we ever do something like add assignment to the core Stellar language (like #687) then all we have to do is not use/remove the AssignmentCommand class.


public interface SpecialDefinedListener {
void whenSpecialDefined(SpecialCommand magic);
}
Contributor Author

Event listeners can be notified when variables, functions, or specials are defined. Useful in our case for notifying the autocompleter.
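
A minimal sketch of that listener wiring follows, assuming a much-reduced executor. All names here (`VariableDefinedSketch`, `NotifyingExecutorSketch`, `ListenerDemoSketch`) are hypothetical, and a plain list stands in for the autocompleter's index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener interface, modelled loosely on the ones in this PR.
interface VariableDefinedSketch {
    void whenVariableDefined(String name);
}

// Hypothetical executor that announces each assignment to its listeners.
class NotifyingExecutorSketch {
    private final List<VariableDefinedSketch> listeners = new ArrayList<>();

    void addVariableListener(VariableDefinedSketch listener) {
        listeners.add(listener);
    }

    void assign(String name, Object value) {
        // Storing the value is omitted; only the notification is shown.
        for (VariableDefinedSketch listener : listeners) {
            listener.whenVariableDefined(name);
        }
    }
}

class ListenerDemoSketch {
    // Registers a list of completion candidates as a listener and
    // returns what it collected after two assignments.
    static List<String> run() {
        List<String> candidates = new ArrayList<>();
        NotifyingExecutorSketch executor = new NotifyingExecutorSketch();
        executor.addVariableListener(candidates::add);
        executor.assign("x", 1);
        executor.assign("y", 2);
        return candidates;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The executor never references the completer directly, which is what lets the same execution logic serve both front-ends.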

*
* Useful for debugging Stellar expressions.
*/
public class StellarShell extends AeshConsoleCallback implements Completion {
Contributor Author

The StellarShell now only contains logic to drive the Aesh console.

*/
@Override
public void complete(CompleteOperation completeOperation) {
String buffer = completeOperation.getBuffer();
Contributor Author

Ties our generic autocompleter to the autocomplete interface required by Aesh.

* @param executor A stellar execution environment.
* @return The result of executing the magic command.
*/
StellarShellResult execute(String expression, StellarShellExecutor executor);
Contributor Author

The interface for all specials; assignment, comments, magics, docstrings, etc.


[Apache Zeppelin](https://zeppelin.apache.org/) is a web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. This project provides a means to run the Stellar REPL directly within a Zeppelin Notebook.

## Installation
Contributor Author

Instructions for installing/running in a notebook.

Contributor

Can you please add the license header to this? #884 is close to going in and enforcing this, so I'm hoping to avoid impact to master.

<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements.  See the NOTICE file
distributed with this work for additional information
regarding copyright ownership.  The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License.  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

Contributor Author

Sure, done.

Contributor Author

@nickwallen nickwallen left a comment

More notes to help guide reviewers through the code.

  data[i++] = new String[] { toWrappedString(kv.getKey().toString(), wordWrap)
  , toWrappedString(result.getResult(), wordWrap)
- , toWrappedString(result.getExpression(), wordWrap)
+ , toWrappedString(result.getExpression().get(), wordWrap)
Contributor Author

@nickwallen nickwallen Jan 3, 2018

The VariableResult.expression field is now optional. We do not always know the expression that resulted in a value. The ShellFunctions just needed updated to treat this as an Optional.
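
To illustrate what treating the expression as optional can look like for a caller, here is a small sketch. `VariableResultSketch` is a hypothetical stand-in, not the PR's `VariableResult`, and the `"(unknown)"` placeholder is an assumption; the snippet above calls `get()` directly, whereas this variant falls back when no expression is known.

```java
import java.util.Optional;

// Hypothetical stand-in for a result whose originating expression may be unknown.
class VariableResultSketch {
    private final Object result;
    private final Optional<String> expression;

    private VariableResultSketch(Object result, Optional<String> expression) {
        this.result = result;
        this.expression = expression;
    }

    static VariableResultSketch of(Object result, String expression) {
        return new VariableResultSketch(result, Optional.of(expression));
    }

    static VariableResultSketch ofValueOnly(Object result) {
        return new VariableResultSketch(result, Optional.empty());
    }

    Object getResult() {
        return result;
    }

    Optional<String> getExpression() {
        return expression;
    }

    // Falls back to a placeholder instead of calling get() on an empty Optional.
    String expressionForDisplay() {
        return expression.orElse("(unknown)");
    }

    public static void main(String[] args) {
        System.out.println(of(4, "2 + 2").expressionForDisplay());
        System.out.println(ofValueOnly(4).expressionForDisplay());
    }
}
```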

```
  $ mvn exec:java \
-   -Dexec.mainClass="org.apache.metron.stellar.common.shell.StellarShell" \
+   -Dexec.mainClass="org.apache.metron.stellar.common.shell.cli.StellarShell" \
```
Contributor Author

StellarShell, the main driver class for the CLI-based REPL, was moved to its own package since it is only used by the CLI-based REPL. This separates it from the other core classes that are used by both the Zeppelin and CLI-based REPLs.

*/
public interface FunctionDefinedListener {
void whenFunctionDefined(StellarFunctionInfo functionInfo);
}
Contributor Author

I used an event listener pattern so that external entities could get notified when things occur during the execution of Stellar. For example, the auto-completer needs to know when a variable is defined.

Contributor

Nothing seems to be ever removed, can that ever be a problem? That will typically cause retention... and 'leaks'.

Contributor Author

I don't see how this would cause a leak. Yes, in some cases listeners can cause troubles, but I don't see how it is a problem here. Can you be more specific? Otherwise, I don't know what needs a fixin' here.

Contributor

Sorry, battle scars.

* Add a listener that will be notified when a magic command is defined.
* @param listener The listener to notify.
*/
void addSpecialListener(StellarExecutionListeners.SpecialDefinedListener listener);
Contributor Author

Our DefaultStellarShellExecutor is a StellarExecutionNotifier as it is able to notify event listeners when variables, functions or specials are defined.

*
* For example `?TO_STRING` will output the docs for the function `TO_STRING`
*/
public class DocCommand implements SpecialCommand {
Contributor Author

Doc strings are not part of the core language.

*
* quit
*/
public class QuitCommand implements SpecialCommand {
Contributor Author

This is what allows a user to execute quit within the REPL. Again, not part of core Stellar.

* This is typically an action performed on the execution
* environment, not something that could be executed within Stellar.
*/
public interface SpecialCommand {
Contributor Author

The interface that I used for any logic that we had implemented in the REPL that is not part of the Stellar language.

/**
* A Zeppelin Interpreter for Stellar.
*/
public class StellarInterpreter extends Interpreter {
Contributor Author

This is what allows us to run Stellar in a Notebook.

}

@Override
public List<InterpreterCompletion> completion(String buf, int cursor) {
Contributor Author

This is where we connect our generic auto-completer that is used across both REPLs into the auto-complete extension point offered by Zeppelin.

@ottobackwards
Contributor

This is quite a bit to go over.... can I trade you a similarly large review task?

@merrimanr
Contributor

merrimanr commented Jan 3, 2018

I think this is an excellent start. So far I have only reviewed it from a user perspective and it's working well.

I've spun it up in full dev (not sure that's even necessary) and installed this via the instructions in the README. I was able to run most of the examples in the Stellar README.

I found that the functions available in Zeppelin are a subset of what's available in the Stellar shell. The missing functions include IS_EMAIL, ENRICHMENT*, GEO*, STATS* and many others. Is this expected?

Is there a way to pass in the zookeeper url in Zeppelin?

@merrimanr
Contributor

I also noticed that functions returning a list only display the first item in both the shell and Zeppelin output.

For example, the expression MAP([ 'foo', 'bar'], (x) -> TO_UPPER(x) ) returns FOO when I would expect it to return ['FOO', 'BAR'].

From what I can tell the problem is only in how the result is displayed. For example, the expression 'BAR' in MAP([ 'foo', 'bar'], (x) -> TO_UPPER(x) ) returns true as expected.

@nickwallen
Contributor Author

Thanks for digging into it @merrimanr . You uncovered some good stuff.

@merrimanr: I found that the functions available in Zeppelin are a subset of what's available in the Stellar shell. The missing functions include IS_EMAIL, ENRICHMENT*, GEO*, STATS* and many others. Is this expected?

Yes, since we only added stellar-common to the interpreter, only the functions defined in that library are available.

I just updated the README to clarify this point. I also added instructions for adding additional libraries to gain access to more Stellar functions.

@nickwallen
Contributor Author

@merrimanr: Is there a way to pass in the zookeeper url in Zeppelin?

No, I did not implement that in this PR. As a next step I was going to do whatever needs done to get the management functions working in Zeppelin. That would include adding a Zookeeper URL.

@nickwallen
Contributor Author

@merrimanr: From what I can tell the problem is only in how the result is displayed. For example, the expression 'BAR' in MAP([ 'foo', 'bar'], (x) -> TO_UPPER(x) ) returns true as expected.

BUG! Good find. The value getting returned is always the first. I'll fix that.

@nickwallen
Contributor Author

The problem is on the UI side of things; for both Zeppelin and the CLI. When I get the result back from Stellar, I was using ConversionUtils.convert(value, String.class) to get me a result that I can display. ConversionUtils just gives you the first item back if you ask it to convert a list to a String. Oops.
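
A minimal sketch of the display-side idea, assuming the result is an ordinary `java.util.List`: render the whole value instead of coercing it to a single element. `ResultFormatSketch.display` is a hypothetical helper and is not the fix that went into this PR.

```java
import java.util.Arrays;
import java.util.List;

class ResultFormatSketch {
    // Hypothetical helper: render any result for display by delegating to
    // its own string form, so a list shows every element.
    static String display(Object value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        List<String> result = Arrays.asList("FOO", "BAR");
        System.out.println(display(result)); // prints [FOO, BAR]
    }
}
```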

@nickwallen
Contributor Author

@merrimanr Fixed that bug and was able to add a good amount of unit tests around it. Also figured out a way to unit test the Aesh-driven StellarShell class.

@nickwallen
Contributor Author

Ok will do @ottobackwards

@nickwallen
Contributor Author

And done. Please take a look when you get a chance.

@ottobackwards
Contributor

Is there a 'supported' or tested version of zeppelin?
I have 0.7.3 installed locally.

I am getting exceptions starting the stellar interpreter

org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.TApplicationException: Internal error processing getFormType
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:450)
	at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getFormType(LazyOpenInterpreter.java:111)
	at org.apache.zeppelin.notebook.Paragraph.jobRun(Paragraph.java:387)
	at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
	at org.apache.zeppelin.scheduler.RemoteScheduler$JobRunner.run(RemoteScheduler.java:329)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.thrift.TApplicationException: Internal error processing getFormType
	at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
	at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
	at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getFormType(RemoteInterpreterService.java:337)
	at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getFormType(RemoteInterpreterService.java:323)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getFormType(RemoteInterpreter.java:446)
	... 11 more

Also:

In the zeppelin site file, there is already a zeppelin.interpreters property, which is a comma-separated list of class names. Is it right to create a new property vs. adding to the existing one?

@ottobackwards
Contributor

Another question:
Even though it is consistent with the other interpreters, I am not sure we should use just stellar as the name rather than Apache Stellar or Apache Metron Stellar. It does not seem right to not have Apache there.

@nickwallen
Contributor Author

@ottobackwards : Is there a 'supported' or tested version of zeppelin? I have 0.7.3 installed locally.

0.7.3 is what both the unit tests and my manual testing have used.

I added this to the README.

@nickwallen
Contributor Author

@ottobackwards : In the zeppelin site file, there is already a zeppelin.interpreters property, which is a comma-separated list of class names. Is it right to create a new property vs. adding to the existing one?

Both work, but adding it to the existing is probably the better approach. I updated the README to reflect this.

@nickwallen
Contributor Author

@ottobackwards : I am getting exceptions starting the stellar interpreter

I have never seen this. I need more context unfortunately.

  1. What do you mean by "starting the stellar interpreter"?

  2. Is the exception coming from the Zeppelin logs or are you seeing this in a notebook?

  3. When do you see the exception? After starting Zeppelin or after running Stellar in a notebook? Or sometime else?

  4. Are you successfully running Stellar in a notebook at all?

  5. I am assuming you followed the instructions in the README exactly? Is there anything in there that might be different from what you did?

@ottobackwards
Contributor

  1. When I type something in the notebook and hit run. At least I think that is when I get the error. It may happen when selecting the interpreter for the notebook.

  2. Yes

  3. Running TO_UPPER('foo')

  4. no, I get this for every statement I run.
    Also, I don't get %stellar for the prompt like your screenshots

  • I set the interpreter through the dropdown not by typing %stellar
  • I didn't have to 'create' the stellar interpreter after starting, there was already an entry, I think from the site config

@nickwallen
Contributor Author

@ottobackwards : I didn't have to 'create' the stellar interpreter after starting, there was already an entry, I think from the site config

That does not (should not) happen. It should only be there after you have created it in the UI.

Are you sure you haven't tried this a few times and either...
(1) There are left-over files in $ZEPPELIN_HOME between attempts
(2) Something is stored in browser cache, or even that...
(3) The Zeppelin instance was launched (and remains running) from one of your first attempts?

When I was playing with this a few weeks ago and was seeing weird behavior, I found that (3) was a problem for me. I either left the Zeppelin process running accidentally or the stop script in Zeppelin does not work all the time. I had to ps -ef | grep Zep and forcibly kill processes to make sure that the process was running with the configuration I expected.

@ottobackwards
Contributor

I think creating the /interpreters/stellar/x.json creates the entry?

I just tried again, ( new mvn install , new site file ) and I checked the interpreter log:

INFO [2018-01-09 16:14:22,873] ({Thread-0} RemoteInterpreterServer.java[run]:97) - Starting remote interpreter server on port 63321
ERROR [2018-01-09 16:14:23,545] ({pool-1-thread-3} RemoteInterpreterServer.java[createInterpreter]:204) - java.lang.ClassNotFoundException: org.apache.metron.stellar.zepplin.StellarInterpreter
java.lang.ClassNotFoundException: org.apache.metron.stellar.zepplin.StellarInterpreter
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.createInterpreter(RemoteInterpreterServer.java:178)
	at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$createInterpreter.getResult(RemoteInterpreterService.java:1662)
	at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$createInterpreter.getResult(RemoteInterpreterService.java:1647)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

@ottobackwards
Contributor

Can you run the check style against the interpreter class?
Also, do we want to use @Override?

@nickwallen
Contributor Author

@ottobackwards : Also, do we want to use @Override?

Done.

@ottobackwards
Contributor

@nickwallen I'm not sure what to do to get this going. The jar is in maven. I tried to path directly to it as well. I can attach my site and interpreter json files, and screenshot my settings....

@ottobackwards
Contributor

I'm going to try a clean zeppelin install I guess.

@ottobackwards
Contributor

ottobackwards commented Jan 10, 2018

ok, starting over

  1. First difference: on first start (new install), the stellar interpreter/group is already created in Zeppelin when I go to configure the stellar interpreter.

[Screenshot: screen shot 2018-01-10 at 09 32 02]

So I update and restart with new settings.

Interpreter is green.

  2. Run the first two tasks with the following results:

[Screenshot: screen shot 2018-01-10 at 09 39 20]

  3. I was able to run other Stellar statements fine, so it is better.

[Screenshot: screen shot 2018-01-10 at 09 43 54]

@nickwallen
Contributor Author

Ok, so it looks like it is working for you.

Zeppelin interprets the first line as the interpreter spec (or whatever they call it). That's why %functions failed because it thinks you want to use an interpreter called 'functions'. If you want to use a magic function like %functions either put that on a second line or make sure to specify %stellar explicitly like so...

%stellar

%functions

@ottobackwards
Contributor

Is the plan to deploy the jar file in the interpreter directory like the other interpreters?

@nickwallen
Contributor Author

@ottobackwards : Is the plan to deploy the jar file in the interpreter directory like the other interpreters?

I don't know what that is going to look like. We need to do more research to see how the setup can be automated by the Mpack. I don't want the user to have to do all this manual setup.

@ottobackwards
Contributor

Do we have jiras for the next steps on this yet? It would be good if we recorded our intent, and who knows, maybe someone will want to work on it.

@ottobackwards
Contributor

This is really good work @nickwallen. I am +1 on this, but would like to see jiras on the next steps entered if possible as a side note.

@asfgit asfgit closed this in 2dd01b1 Jan 12, 2018
iraghumitra pushed a commit to iraghumitra/incubator-metron that referenced this pull request Feb 17, 2018
@nickwallen nickwallen deleted the METRON-1382 branch September 17, 2018 19:28