
Conversation

@jodersky
Member

What changes were proposed in this pull request?

Improve signal handling to allow interrupting running tasks from the REPL (with Ctrl+C).
If no tasks are running or Ctrl+C is pressed twice, the signal is forwarded to the default handler resulting in the usual termination of the application.

This PR is a rewrite of #8216 (and therefore closes it), as per @piaozhexiu's request.

How was this patch tested?

Signal handling is not easily testable, so no unit tests were added. Nevertheless, the new functionality is implemented in a best-effort manner, soft-failing in case signals aren't available on a particular OS.
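As a rough standalone sketch of the idea described above (illustrative only; `hasActiveJobs` and `cancelAllJobs` are hypothetical stand-ins for the SparkContext status-tracker check and job cancellation, not the PR's actual code):

```scala
import sun.misc.{Signal, SignalHandler}

// Sketch: on SIGINT, cancel active jobs if there are any; otherwise delegate
// to the previously installed handler (normally the default one, which
// terminates the application).
object InterruptSketch {
  def makeHandler(
      hasActiveJobs: () => Boolean, // hypothetical job-status check
      cancelAllJobs: () => Unit,    // hypothetical job cancellation
      prev: SignalHandler): SignalHandler =
    new SignalHandler {
      def handle(sig: Signal): Unit =
        if (hasActiveJobs()) cancelAllJobs()
        else prev.handle(sig) // nothing to cancel: terminate as usual
    }

  def install(hasActiveJobs: () => Boolean, cancelAllJobs: () => Unit): Unit = {
    val sigint = new Signal("INT")
    var prev: SignalHandler = null
    // Indirection so the handler can delegate to whatever was installed before.
    val fallthrough = new SignalHandler {
      def handle(sig: Signal): Unit = prev.handle(sig)
    }
    // Signal.handle returns the handler that was installed previously.
    prev = Signal.handle(sigint, makeHandler(hasActiveJobs, cancelAllJobs, fallthrough))
  }
}
```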

@jodersky jodersky changed the title Allow interrupting tasks in repl with Ctrl+C [SPARK-10001][Core] Allow interrupting tasks in repl with Ctrl+C Apr 21, 2016
@SparkQA

SparkQA commented Apr 21, 2016

Test build #56469 has finished for PR 12557 at commit 0142eb1.

  • This patch fails RAT tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 21, 2016

Test build #56471 has finished for PR 12557 at commit 0539450.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jodersky jodersky changed the title [SPARK-10001][Core] Allow interrupting tasks in repl with Ctrl+C [SPARK-10001][Core] Interrupt tasks in repl with Ctrl+C Apr 21, 2016
@SparkQA

SparkQA commented Apr 21, 2016

Test build #56480 has finished for PR 12557 at commit 94323b9.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

* This makes it possible to interrupt a running shell job by pressing Ctrl+C.
*/
def cancelOnInterrupt(ctx: SparkContext): Unit = USignaling.register("INT") {
if (!ctx.statusTracker.getActiveJobIds().isEmpty) {
Contributor


So if the user presses Ctrl+C with no running jobs, the shell exits, but if there are running jobs, it cancels the currently running jobs?

Would pressing Ctrl+C twice actually exit if the current jobs aren't all finished being cancelled?

Also, this behaviour maybe seems a bit confusing (not the double Ctrl+C to cancel, but Ctrl+C with no jobs to exit versus Ctrl+C with a job to cancel).

Member Author

@jodersky jodersky Apr 21, 2016


That's correct. However, I personally do not find it confusing at all. I would argue that killing blocking jobs fulfills the SIGINT description of interrupting. If you want to kill a shell externally with a single signal, you can always send a SIGTERM.
Moreover, as Ryan Blue commented in the related JIRA ticket, the Node.js, Python and Ruby shells have the same behaviour (and sbt kind of does too, if you enable the cancelable setting).

Contributor


So I don't see this behaviour in my Python shell or my regular bash shell:

holden@yogapanda:~/repos/spark$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

KeyboardInterrupt

KeyboardInterrupt

But if it matches other interactive shells than that seems fine.

Member Author


I can't vouch for Python, but what do you mean by your "regular bash shell"? I would not expect my bash shell to exit when getting a SIGINT; rather, it should kill the process that is using the current pty. In this analogy, an interrupt to the Spark shell should kill its "child processes" first.

@holdenk
Contributor

holdenk commented Apr 21, 2016

Thanks for continuing #8216 :) cc @davies to take a look maybe?

@SparkQA

SparkQA commented Apr 21, 2016

Test build #56482 has finished for PR 12557 at commit 4f9bf69.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@syepes

syepes commented Apr 21, 2016

@jodersky Thanks for this addition, it's going to be really useful.
I can't count how many times I have pressed Ctrl+C and had to restart all over again.

@jodersky
Member Author

Test failure is from hive, seems unrelated.
jenkins, retest this please

@SparkQA

SparkQA commented Apr 21, 2016

Test build #2843 has finished for PR 12557 at commit 4f9bf69.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Apr 21, 2016

Test build #2844 has finished for PR 12557 at commit 4f9bf69.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

def cancelOnInterrupt(ctx: SparkContext): Unit = USignaling.register("INT") {
if (!ctx.statusTracker.getActiveJobIds().isEmpty) {
logWarning("Cancelling all active jobs, this can take a while. " +
"Press Ctrl+C again to exit now.")
Contributor


We can type Ctrl+D to quit the shell, should we always use Ctrl+C to cancel the running job and clear the current line input?

Member Author


C-d is slightly different in that it just emits an end-of-transmission (EOT) character to the active terminal. When the shell is running a job, it cannot listen to input, and hence C-d has no effect.

C-c, on the other hand, causes the host operating system to send an asynchronous signal, which is handled by the signal handler in a separate thread and is thus able to stop running jobs.
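To illustrate that last point, here is a small self-contained sketch (not Spark code): it registers a handler for SIGINT, raises the signal against the current process, and records which thread ran the handler.

```scala
import java.util.concurrent.CountDownLatch
import java.util.concurrent.atomic.AtomicReference
import sun.misc.{Signal, SignalHandler}

// Demonstrates that sun.misc.Signal handlers run on their own JVM thread,
// which is why Ctrl+C can act even while the shell's main thread is busy.
object HandlerThreadDemo {
  def handlerThreadName(): String = {
    val done = new CountDownLatch(1)
    val name = new AtomicReference[String]
    Signal.handle(new Signal("INT"), new SignalHandler {
      def handle(sig: Signal): Unit = {
        name.set(Thread.currentThread().getName)
        done.countDown()
      }
    })
    Signal.raise(new Signal("INT")) // deliver SIGINT to this very process
    done.await()                    // wait for the asynchronous handler
    name.get
  }

  def main(args: Array[String]): Unit =
    println(s"handler ran on thread '${handlerThreadName()}', " +
      s"main thread is '${Thread.currentThread().getName}'")
}
```

The handler thread is spawned by the JVM's signal dispatcher, so it is distinct from the thread that was busy when the signal arrived.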

Member Author


Oh, I think I misunderstood your comment: are you proposing dropping the in-job condition and always catching C-c? In that case I'm not sure that I agree, since the main purpose of a SIGINT is to interrupt a process. Killing jobs first and catching the signal is a compromise for convenience. Imagine a Spark shell that is stuck in some arbitrary user code (not running a Spark job); in that case, C-c should in my opinion still exit the process.

Contributor


That makes sense, looks good to me.

@holdenk
Contributor

holdenk commented Apr 21, 2016

LGTM as well. (Note: from chatting with @jodersky, it doesn't depend on the conditional check for empty jobs if a second Ctrl+C is issued; rather, it only re-registers the signal handler after it has completed killing all of the jobs, and if another signal is received before then, it uses the default handler.)

*/
override def handle(sig: Signal): Unit = {
// register old handler, will receive incoming signals while this handler is running
Signal.handle(signal, prevHandler)
Member Author


I should have mentioned this in the PR description. When a signal is caught, the first thing the handler does is de-register itself. That way it only ever accepts the first signal and subsequent signals are escalated.
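A minimal standalone sketch of that de-register-first pattern (assumed helper names, not the exact code from this PR):

```scala
import sun.misc.{Signal, SignalHandler}

// Sketch: the handler's first action is to restore the previous handler, so
// a second signal arriving while `action` runs is escalated (e.g. to the
// default terminating handler). Once `action` completes, the handler
// re-registers itself.
object FirstSignalOnly {
  def install(signalName: String)(action: => Unit): Unit = {
    val signal = new Signal(signalName)
    var prevHandler: SignalHandler = null
    val handler: SignalHandler = new SignalHandler {
      def handle(sig: Signal): Unit = {
        Signal.handle(signal, prevHandler) // de-register first: escalate repeats
        try action
        finally Signal.handle(signal, this) // accept signals again when done
      }
    }
    prevHandler = Signal.handle(signal, handler)
  }
}
```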

@davies
Contributor

davies commented Apr 22, 2016

Merging this into master, thanks!

@asfgit asfgit closed this in 8012793 Apr 22, 2016
}
log.info("Registered signal handlers for [" + signals.mkString(", ") + "]")
def register(log: Logger): Unit = Seq("TERM", "HUP", "INT").foreach{ sig =>
Signaling.register(sig) {
Contributor


Did you remove the double registration check?

Member Author


o.a.s.util.Signaling takes care of enforcing single instantiation of a handler per JVM. Do you mean the specific case of double-registering the logging?
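For illustration, enforcing a single handler instantiation per signal per JVM can be sketched with an atomic map (hypothetical names; this is not o.a.s.util.Signaling itself):

```scala
import java.util.concurrent.ConcurrentHashMap
import sun.misc.{Signal, SignalHandler}

// Sketch: install at most one handler per signal name in this JVM.
object OncePerJvm {
  private val installed = new ConcurrentHashMap[String, SignalHandler]()

  /** Returns true if `action` was newly installed for `signalName`. */
  def registerOnce(signalName: String)(action: => Unit): Boolean = {
    val handler: SignalHandler = new SignalHandler {
      def handle(sig: Signal): Unit = action
    }
    // putIfAbsent is atomic, so concurrent callers cannot double-register.
    if (installed.putIfAbsent(signalName, handler) == null) {
      Signal.handle(new Signal(signalName), handler)
      true
    } else false
  }
}
```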

asfgit pushed a commit that referenced this pull request Apr 22, 2016
## What changes were proposed in this pull request?
This is a follow-up to #12557, with the following changes:

1. Fixes some of the style issues.
2. Merges Signaling and SignalLogger into a new class called SignalUtils. It was pretty confusing to have Signaling and Signal in one file, and it was also confusing to have two classes named Signaling and one called the other.
3. Made logging registration idempotent.

## How was this patch tested?
N/A.

Author: Reynold Xin <rxin@databricks.com>

Closes #12605 from rxin/SPARK-10001.