
@Shooter23

When running `PYSPARK_DRIVER_PYTHON=ipython pyspark` in an xterm, the find-spark-home script calls `ipython /path/to/find_spark_home.py` and assigns the string printed by that script to SPARK_HOME. When run with IPython, that string starts with an escape sequence, bounded by control characters, that precedes the path determined by find_spark_home.py. Although this part of the string does not appear on echo, it causes pyspark to compose paths improperly wherever SPARK_HOME is used.

To see the sequence, run:

>>> import os
>>> p = os.popen('ipython somescript.py')
>>> p.read()
'\x1b[22;0t\x1b]0;IPython: {current directory}\x07the expected output\n'

The `cut` command removes everything up to and including the bell character, leaving "the expected output". Lines without a bell character (`\x07`), such as the output you get when running `python3 find_spark_home.py`, remain unchanged.
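A minimal sketch of that behaviour (the escape sequence and the /opt/spark path below are made up for illustration; only the `cut` invocation matches the actual change):

# Field 2 after the bell character is the clean path:
$ printf '\033[22;0t\033]0;IPython: /home/user\007/opt/spark\n' | cut -d $'\007' -f 2
/opt/spark
# A line with no bell character passes through whole, since cut prints undelimited lines as-is:
$ printf '/opt/spark\n' | cut -d $'\007' -f 2
/opt/spark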

What changes were proposed in this pull request?

Fix the assignment to SPARK_HOME in find-spark-home so that the control characters added when using ipython are removed.

Why are the changes needed?

On xterm, running `PYSPARK_DRIVER_PYTHON=ipython pyspark` causes pyspark to compose paths improperly: the current working directory gets prepended to the SPARK_HOME determined by find_spark_home.py, so pyspark cannot find the files it looks for.
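A minimal sketch of the failure mode, assuming the title escape written by IPython ends with the current working directory; all paths here are hypothetical:

# SPARK_HOME as captured before the fix: an invisible title escape precedes the real path.
SPARK_HOME=$'\x1b[22;0t\x1b]0;IPython: /home/user/work\x07/opt/spark'
echo "$SPARK_HOME"                       # a terminal shows only /opt/spark
ls "$SPARK_HOME/bin/load-spark-env.sh"   # fails: the value no longer starts with "/",
                                         # so the path resolves relative to the current directory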

Does this PR introduce any user-facing change?

Yes. Before the change I would get "No such file or directory" errors, because the current working directory was prepended to SPARK_HOME. After the change the pyspark interactive shell starts as expected with an ipython prompt.

How was this patch tested?

I ran pyspark with `PYSPARK_DRIVER_PYTHON` set to "python", "python3" and "ipython". All three variations gave the appropriate prompt with the expected session and context variables set. I also tested the pipe to the `cut` command with lines with and without bell characters, to ensure that the addition had no effect on the latter. I didn't modify the current testing scheme because I couldn't find an existing test for any of the relevant bash scripts.
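Concretely, the manual check amounted to runs like the following sketch (assuming pyspark is on the PATH; `spark` and `sc` are the session and context variables mentioned above):

# One run per driver; each should reach an interactive prompt.
PYSPARK_DRIVER_PYTHON=python  pyspark
PYSPARK_DRIVER_PYTHON=python3 pyspark
PYSPARK_DRIVER_PYTHON=ipython pyspark
# In each session, confirm that `spark` (SparkSession) and `sc` (SparkContext) are defined.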

The change in find-spark-home:

PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"python3"}"
fi
- export SPARK_HOME=$($PYSPARK_DRIVER_PYTHON "$FIND_SPARK_HOME_PYTHON_SCRIPT")
+ export SPARK_HOME=$($PYSPARK_DRIVER_PYTHON "$FIND_SPARK_HOME_PYTHON_SCRIPT" | cut -d $'\007' -f 2)
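One way to sanity-check the new assignment (the script path is the placeholder used in the description above; `printf %q` simply makes any stray control bytes visible):

PYSPARK_DRIVER_PYTHON=ipython
FIND_SPARK_HOME_PYTHON_SCRIPT=/path/to/find_spark_home.py
SPARK_HOME=$($PYSPARK_DRIVER_PYTHON "$FIND_SPARK_HOME_PYTHON_SCRIPT" | cut -d $'\007' -f 2)
printf '%q\n' "$SPARK_HOME"   # should print just the path, with no quoted control characters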
Member

How does it relate to #28256? Does it support jupyter too?

Author

It appears to be the same problem but with a different solution.

I hadn't tried it with jupyter before so I don't know what I should expect.

Running it with `PYSPARK_DRIVER_PYTHON=jupyter` made it attempt to run `jupyter-/path/to/find_spark_home.py`, which failed with errors about being unable to find /bin/load-spark-env.sh or /bin/spark-submit.

Running it with `PYSPARK_DRIVER_PYTHON='jupyter notebook'` started the kernel but initially opened a 403 Forbidden page in the browser, complaining of a missing referrer. I pasted the URL with the token that appeared in the jupyter log, and it took me to the bin directory under the virtual environment.

It doesn't appear to me that `$PYSPARK_DRIVER_PYTHON "$FIND_SPARK_HOME_PYTHON_SCRIPT"` should run with PYSPARK_DRIVER_PYTHON set to "jupyter".

@HyukjinKwon
Member

cc @holdenk FYI

@AmplabJenkins

Can one of the admins verify this patch?

@srowen
Member

srowen commented Jan 1, 2022

(Could you make a JIRA for this?)

@holdenk
Contributor

holdenk commented Jan 1, 2022

Running ipython as the driver seems weird; it's not really designed to be used this way. Can you elaborate on why you're doing this / what you're trying to accomplish?

@zero323
Member

zero323 commented Jan 2, 2022

> Running ipython as the driver seems weird; it's not really designed to be used this way. Can you elaborate on why you're doing this / what you're trying to accomplish?

Running interactive shell sessions with IPython is quite a common approach, in my experience.

@github-actions

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

github-actions bot added the Stale label Apr 13, 2022
github-actions bot closed this Apr 14, 2022