[SPARK-31483][PySpark] Use PYSPARK_PYTHON or 'python' to run find_spark_home.py #28256
Conversation
Use PYSPARK_PYTHON or 'python' to run find_spark_home.py
Can one of the admins verify this patch?
```diff
   PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"python"}"
 fi
-export SPARK_HOME=$($PYSPARK_DRIVER_PYTHON "$FIND_SPARK_HOME_PYTHON_SCRIPT")
+export SPARK_HOME=$(${PYSPARK_PYTHON:-"python"} "$FIND_SPARK_HOME_PYTHON_SCRIPT")
```
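In short, the change decouples SPARK_HOME resolution from the driver frontend: find-spark-home always runs the probe script with `${PYSPARK_PYTHON:-"python"}`, while `PYSPARK_DRIVER_PYTHON` keeps driving the interactive shell itself.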
Hmmm .. can we strip the non-printable characters instead?
Respecting PYSPARK_DRIVER_PYTHON falling back to PYSPARK_PYTHON is expected.
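A sketch of what that alternative could look like (hypothetical, not what this PR implements; it assumes the invisible header is a terminal-title escape sequence ending in a BEL character):

```sh
# Hypothetical sanitizing variant of the find-spark-home line: keep running
# the probe with PYSPARK_DRIVER_PYTHON, but strip everything up to the last
# BEL (\a), which would drop an OSC terminal-title prefix such as
# ESC ] 0 ; <title> BEL. Uses bash-specific parameter expansion.
RAW_SPARK_HOME=$($PYSPARK_DRIVER_PYTHON "$FIND_SPARK_HOME_PYTHON_SCRIPT")
export SPARK_HOME="${RAW_SPARK_HOME##*$'\a'}"
```

Note that simply deleting non-printable characters (e.g. with `tr -d '[:cntrl:]'`) would not be enough on its own, since the title text between the escape bytes is itself printable.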
At least I can come up with one way, although it's hacky, e.g.:
```sh
a=$(ipython -c "import sys; print('/User', file=sys.stderr)" 2>&1 >/dev/null)
ls $a
```
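Note how the hack works: redirections are processed left to right, so `2>&1` first duplicates stderr onto the captured stdout, and `>/dev/null` then discards the original stdout. `a` therefore receives only what the script prints to stderr, sidestepping whatever ipython prepends to stdout.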
That's a workaround for ipython, but not for jupyter, because jupyter doesn't support `jupyter find_spark_home.py`. I think `PYSPARK_DRIVER_PYTHON` is meant more for the "frontend". This fix enables `PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook pyspark` too.
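For illustration, this is roughly what happens, since the `jupyter` launcher dispatches its first argument as a subcommand rather than a script (error text assumed):

```sh
$ jupyter find_spark_home.py
Error executing Jupyter command 'find_spark_home.py': [Errno 2] No such file or directory
```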
Btw, PYSPARK_DRIVER_PYTHON's falling back to PYSPARK_PYTHON happens after find-spark-home too:
Line 45 in f1fde0c:

```sh
PYSPARK_DRIVER_PYTHON=$PYSPARK_PYTHON
```
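For context, a sketch of the fallback around that line (the guard condition is assumed; only the assignment itself is shown in the permalink above):

```sh
# In bin/pyspark, after find-spark-home has already resolved SPARK_HOME:
# default the driver python to the worker python when it is unset.
if [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
  PYSPARK_DRIVER_PYTHON=$PYSPARK_PYTHON
fi
```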
@mzhang-code, can we just add a bandaid fix like: if `PYSPARK_DRIVER_PYTHON` ends with `jupyter` or `ipython`, use `PYSPARK_PYTHON` or `python` for now, with some comments about why we're using `PYSPARK_PYTHON` instead of `PYSPARK_DRIVER_PYTHON`?
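A sketch of what that bandaid could look like in find-spark-home (hypothetical and never merged; `FIND_SPARK_HOME_PYTHON` is a made-up variable name):

```sh
# Bandaid: frontends like ipython/jupyter may write extra bytes (e.g. a
# terminal-title escape) to stdout, so use PYSPARK_PYTHON (or python) just
# for resolving SPARK_HOME and keep PYSPARK_DRIVER_PYTHON for the shell.
case "$PYSPARK_DRIVER_PYTHON" in
  *ipython|*jupyter)
    FIND_SPARK_HOME_PYTHON="${PYSPARK_PYTHON:-python}"
    ;;
  *)
    FIND_SPARK_HOME_PYTHON="${PYSPARK_DRIVER_PYTHON:-${PYSPARK_PYTHON:-python}}"
    ;;
esac
export SPARK_HOME=$("$FIND_SPARK_HOME_PYTHON" "$FIND_SPARK_HOME_PYTHON_SCRIPT")
```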
We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
Use PYSPARK_PYTHON or 'python' to run find_spark_home.py
What changes were proposed in this pull request?
Use PYSPARK_PYTHON or `python` to run find_spark_home.py instead of `PYSPARK_DRIVER_PYTHON`, because `PYSPARK_DRIVER_PYTHON` can be `ipython`, and `ipython` adds an invisible header to the Spark home path.
Why are the changes needed?
I'm trying to launch the pyspark shell with the IPython interface via `PYSPARK_DRIVER_PYTHON=ipython pyspark`. However, it hits this error:
```
.../pyspark/bin/load-spark-env.sh: No such file or directory
```
It is strange because the path /Users/mengyu/opt/anaconda2/envs/py3-spark/lib/python3.7/site-packages/pyspark/bin/load-spark-env.sh exists. Then I found it is because
the `ipython` interpreter adds an invisible header to its stdout output. Compare the output from `python` with the output from `ipython` (a sketch follows).
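The comparison screenshots from the original report are not reproduced here; a hedged sketch of the difference (paths and exact bytes are illustrative; `cat -v` makes the invisible prefix visible, with `^[` for ESC and `^G` for BEL):

```sh
$ python find_spark_home.py | cat -v
/Users/mengyu/.../site-packages/pyspark

$ ipython find_spark_home.py | cat -v
^[]0;IPython: pyspark^G/Users/mengyu/.../site-packages/pyspark
```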
A workaround is to use ipython's `--no-term-title` option. But I think not using `ipython` to run find_spark_home is better, because ipython is more of a frontend. Besides, with this fix, we can open a SparkSession-enabled jupyter notebook session via `PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook pyspark`.
Does this PR introduce any user-facing change?
How was this patch tested?
Tested by manual runs of `pyspark` and `PYSPARK_DRIVER_PYTHON=ipython pyspark`.
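For reference, those manual checks amount to (environment-specific):

```sh
# Plain shell: SPARK_HOME must still resolve correctly.
pyspark

# IPython driver: must no longer fail with the load-spark-env.sh error.
PYSPARK_DRIVER_PYTHON=ipython pyspark

# Jupyter notebook session, per the discussion above.
PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS=notebook pyspark
```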