-
Notifications
You must be signed in to change notification settings - Fork 29k
[SPARK-10001][Core] Interrupt tasks in repl with Ctrl+C #12557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Test build #56469 has finished for PR 12557 at commit
|
|
Test build #56471 has finished for PR 12557 at commit
|
|
Test build #56480 has finished for PR 12557 at commit
|
| * This makes it possible to interrupt a running shell job by pressing Ctrl+C. | ||
| */ | ||
| def cancelOnInterrupt(ctx: SparkContext): Unit = USignaling.register("INT") { | ||
| if (!ctx.statusTracker.getActiveJobIds().isEmpty) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if the user presses ctrl-c with no running jobs, exits the shell but if there are running jobs cancels the current running jobs?
Would pressing ctrl-c twice actually exit if the current jobs aren't all finished being canceled?
Also this behaviour seems a bit confusing maybe (not the double ctrl c to cancel, but ctrl c on no jobs to exit and ctrl c with job to cancel).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct. However I personally do not find it confusing at all. I would argue that killing blocking jobs fulfills the SIGINT description of interrupting. If you want to kill a shell externally with a single signal, you can always send a SIGTERM.
Moreover, as Ryan Blue commented in the related JIRA ticket, nodejs, python and ruby shells have the same behaviour (and sbt kind of does too if you enable the cancelable setting).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I don't see this behaviour in my Python shell or my regular bash shell:
holden@yogapanda:~/repos/spark$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.KeyboardInterrupt
KeyboardInterrupt
But if it matches other interactive shells than that seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't vouch for python but what do you mean with your "regular bash shell"? I would not expect my bash shell to exit when getting a sigint, rather it should kill the process that is using the current pty. In this analogy, an interrupt to spark shell should kill its "child processes" first.
|
Test build #56482 has finished for PR 12557 at commit
|
|
@jodersky Thanks for this addition, its going to be really useful. |
|
Test failure is from hive, seems unrelated. |
|
Test build #2843 has finished for PR 12557 at commit
|
|
Test build #2844 has finished for PR 12557 at commit
|
| def cancelOnInterrupt(ctx: SparkContext): Unit = USignaling.register("INT") { | ||
| if (!ctx.statusTracker.getActiveJobIds().isEmpty) { | ||
| logWarning("Cancelling all active jobs, this can take a while. " + | ||
| "Press Ctrl+C again to exit now.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can type Ctrl+D to quit the shell, should we always use Ctrl+C to cancel the running job and clear the current line input?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C-d is slightly different in that it just emits an end-of-transmission
(eot) character to the active terminal.
When the shell is running a job, it cannot listen to input and hence C-d
has no effect.
C-c on the other hand, sends an asynchronous signal by the host operating
system, which will be executed by the signal handler in a separate thread
and thus be able to stop running jobs.
On Apr 21, 2016 10:13 AM, "Davies Liu" notifications@github.com wrote:
In repl/src/main/scala/org/apache/spark/repl/Signaling.scala
#12557 (comment):
+import org.apache.spark.SparkContext
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{Signaling => USignaling}
+
+private[repl] object Signaling extends Logging {
+
- /**
- * Register a SIGINT handler, that terminates all active spark jobs or terminates
- * when no jobs are currently running.
- * This makes it possible to interrupt a running shell job by pressing Ctrl+C.
- */
- def cancelOnInterrupt(ctx: SparkContext): Unit = USignaling.register("INT") {
- if (!ctx.statusTracker.
getActiveJobIds().isEmpty) {logWarning("Cancelling all active jobs, this can take a while. " +"Press Ctrl+C again to exit now.")
We can type Ctrl+D to quit the shell, should we always use Ctrl+C to cancel
the running job and clear the current line input?
—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/apache/spark/pull/12557/files/4f9bf695344a5c4c54372eaa6bf54af0d2da1f74#r60619566
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, I think I missunderstood your comment, are you proposing of dropping
the in-job condition and always catch a C-c?
In that case I'm not sure that I agree, since the main purpose of a sigint
is to interrupt a process. Killing jobs first and catching the signal is a
compromise for convenience. Imagine a spark shell that is stuck in some
arbitrary user code (not running a spark job), in that case C-c should imo
still exit the process.
On Apr 21, 2016 10:51 AM, "Jakob Odersky" jakob@odersky.com wrote:
C-d is slightly different in that it just emits an end-of-transmission
(eot) character to the active terminal.
When the shell is running a job, it cannot listen to input and hence C-d
has no effect.C-c on the other hand, sends an asynchronous signal by the host operating
system, which will be executed by the signal handler in a separate thread
and thus be able to stop running jobs.
On Apr 21, 2016 10:13 AM, "Davies Liu" notifications@github.com wrote:In repl/src/main/scala/org/apache/spark/repl/Signaling.scala
#12557 (comment):
+import org.apache.spark.SparkContext
+import org.apache.spark.internal.Logging
+import org.apache.spark.util.{Signaling => USignaling}
+
+private[repl] object Signaling extends Logging {
+
- /**
- * Register a SIGINT handler, that terminates all active spark jobs or terminates
- * when no jobs are currently running.
- * This makes it possible to interrupt a running shell job by pressing Ctrl+C.
- */
- def cancelOnInterrupt(ctx: SparkContext): Unit = USignaling.register("INT") {
- if (!ctx.statusTracker.
getActiveJobIds().isEmpty) {logWarning("Cancelling all active jobs, this can take a while. " +"Press Ctrl+C again to exit now.")We can type Ctrl+D to quit the shell, should we always use Ctrl+C to
cancel the running job and clear the current line input?—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/apache/spark/pull/12557/files/4f9bf695344a5c4c54372eaa6bf54af0d2da1f74#r60619566
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That make sense, looks good to me.
|
LGTM as well (note: from chatting with @jodersky it doesn't depend on the conditional to check for empty if a second ctrl-c is issued, rather it only re-register the signal handler after its completed killing all of the jobs and if another signal is received before then uses the default handler) |
| */ | ||
| override def handle(sig: Signal): Unit = { | ||
| // register old handler, will receive incoming signals while this handler is running | ||
| Signal.handle(signal, prevHandler) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have mentioned this in the PR description. When a signal is caught, the first thing the handler does is de-register itself. That way it only ever accepts the first signal and subsequent signals are escalated.
|
Merging this into master, thanks! |
| } | ||
| log.info("Registered signal handlers for [" + signals.mkString(", ") + "]") | ||
| def register(log: Logger): Unit = Seq("TERM", "HUP", "INT").foreach{ sig => | ||
| Signaling.register(sig) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did u remove the double registration check?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
o.a.s.util.Signaling takes care of enforcing single instantiation of a handler per jvm. Do you mean the specific case of double registering logging?
## What changes were proposed in this pull request? This is a follow-up to #12557, with the following changes: 1. Fixes some of the style issues. 2. Merges Signaling and SignalLogger into a new class called SignalUtils. It was pretty confusing to have Signaling and Signal in one file, and it was also confusing to have two classes named Signaling and one called the other. 3. Made logging registration idempotent. ## How was this patch tested? N/A. Author: Reynold Xin <rxin@databricks.com> Closes #12605 from rxin/SPARK-10001.
What changes were proposed in this pull request?
Improve signal handling to allow interrupting running tasks from the REPL (with Ctrl+C).
If no tasks are running or Ctrl+C is pressed twice, the signal is forwarded to the default handler resulting in the usual termination of the application.
This PR is a rewrite of -- and therefore closes #8216 -- as per @piaozhexiu's request
How was this patch tested?
Signal handling is not easily testable therefore no unit tests were added. Nevertheless, the new functionality is implemented in a best-effort approach, soft-failing in case signals aren't available on a specific OS.