From 1d12e90c96c6d34fd98e390b0c7b395beb01f07e Mon Sep 17 00:00:00 2001 From: Nic Crane Date: Wed, 10 Nov 2021 10:40:27 +0000 Subject: [PATCH 1/5] Add debugger doc --- r/vignettes/developers/debugger.Rmd | 68 +++++++++++++++++++++++++++++ 1 file changed, 68 insertions(+) create mode 100644 r/vignettes/developers/debugger.Rmd diff --git a/r/vignettes/developers/debugger.Rmd b/r/vignettes/developers/debugger.Rmd new file mode 100644 index 00000000000..2038d37cfae --- /dev/null +++ b/r/vignettes/developers/debugger.Rmd @@ -0,0 +1,68 @@ +# Running R code with the C++ debugger + +As Arrow has C++ code at its core, debugging code can sometimes be tricky when +errors originate in the C++ rather than the R layer. If you are adding new code +which triggers a C++ bug (or find one in existing code), this can result in a +segfault. If you are working in RStudio, the session is aborted, and you may +not be able to retrieve the error messaging needed to diagnose and/or report +the bug. One way around this is to find the code that causes the error, and +run R with a C++ debugger. + +Firstly, load R with your debugger. The most common debuggers are `gdb` and `lldb` + +In my case it's `gdb`, but if you're using the `lldb` debugger, just swap in +that command here. + +```shell +R -d gdb +``` + +Next, run R. + +```shell +run +``` + +You should now be in an R session with the C++ debugger attached. This will +look similar to a normal R session, but with extra output. For example, here +is the output I get from running an R session in the debugger +and then loading arrow: + +``` +R version 4.1.1 (2021-08-10) -- "Kick Things" +Copyright (C) 2021 The R Foundation for Statistical Computing +Platform: x86_64-pc-linux-gnu (64-bit) + +R is free software and comes with ABSOLUTELY NO WARRANTY. +You are welcome to redistribute it under certain conditions. +Type 'license()' or 'licence()' for distribution details. 
+ + Natural language support but running in an English locale + +R is a collaborative project with many contributors. +Type 'contributors()' for more information and +'citation()' on how to cite R or R packages in publications. + +Type 'demo()' for some demos, 'help()' for on-line help, or +'help.start()' for an HTML browser interface to help. +Type 'q()' to quit R. + +[Detaching after vfork from child process 48943] +[Detaching after vfork from child process 48945] +> library(arrow) +[New Thread 0x7ffff326d700 (LWP 48953)] +[New Thread 0x7fffeb7ff700 (LWP 48958)] + +Attaching package: ‘arrow’ + +The following object is masked from ‘package:utils’: + + timestamp +``` + +Now, run your code - either directly in the session or by sourcing it from a +file. If the code results in a segfault, you will have extra output that you +can use to diagnose the problem or attach to an issue as extra information. + +For an excellent in-depth guide to using the C++ debugger in R, see this blog +post by David Vaughan: https://blog.davisvaughan.com/2019/04/05/debug-r-package-with-cpp/. 
From 51a0968f8b5b61430107dd718a7db55cff478603 Mon Sep 17 00:00:00 2001 From: Nic Crane Date: Wed, 10 Nov 2021 10:51:26 +0000 Subject: [PATCH 2/5] Update pkgdown --- r/_pkgdown.yml | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/r/_pkgdown.yml b/r/_pkgdown.yml index c6a19119ed3..cca043835f9 100644 --- a/r/_pkgdown.yml +++ b/r/_pkgdown.yml @@ -72,6 +72,10 @@ navbar: href: articles/flight.html - text: Arrow R Developer Guide href: articles/developing.html + - text: Developers + menu: + - text: Running R with the C++ debugger + href: articles/developers/debugger.html reference: - title: Multi-file datasets contents: From 419ec84b0cd682e74714de99f6d2f36ed6bcdad3 Mon Sep 17 00:00:00 2001 From: Nic Crane Date: Wed, 10 Nov 2021 16:00:32 +0000 Subject: [PATCH 3/5] Rename article and add stuff in about just R debugging --- r/_pkgdown.yml | 4 +- r/vignettes/developers/debugger.Rmd | 68 ------------------------ r/vignettes/developers/debugging.Rmd | 77 ++++++++++++++++++++++++++++ 3 files changed, 79 insertions(+), 70 deletions(-) delete mode 100644 r/vignettes/developers/debugger.Rmd create mode 100644 r/vignettes/developers/debugging.Rmd diff --git a/r/_pkgdown.yml b/r/_pkgdown.yml index cca043835f9..1457891fb34 100644 --- a/r/_pkgdown.yml +++ b/r/_pkgdown.yml @@ -74,8 +74,8 @@ navbar: href: articles/developing.html - text: Developers menu: - - text: Running R with the C++ debugger - href: articles/developers/debugger.html + - text: Debugging + href: articles/developers/debugging.html reference: - title: Multi-file datasets contents: diff --git a/r/vignettes/developers/debugger.Rmd b/r/vignettes/developers/debugger.Rmd deleted file mode 100644 index 2038d37cfae..00000000000 --- a/r/vignettes/developers/debugger.Rmd +++ /dev/null @@ -1,68 +0,0 @@ -# Running R code with the C++ debugger - -As Arrow has C++ code at its core, debugging code can sometimes be tricky when -errors originate in the C++ rather than the R layer. 
If you are adding new code -which triggers a C++ bug (or find one in existing code), this can result in a -segfault. If you are working in RStudio, the session is aborted, and you may -not be able to retrieve the error messaging needed to diagnose and/or report -the bug. One way around this is to find the code that causes the error, and -run R with a C++ debugger. - -Firstly, load R with your debugger. The most common debuggers are `gdb` and `lldb` - -In my case it's `gdb`, but if you're using the `lldb` debugger, just swap in -that command here. - -```shell -R -d gdb -``` - -Next, run R. - -```shell -run -``` - -You should now be in an R session with the C++ debugger attached. This will -look similar to a normal R session, but with extra output. For example, here -is the output I get from running an R session in the debugger -and then loading arrow: - -``` -R version 4.1.1 (2021-08-10) -- "Kick Things" -Copyright (C) 2021 The R Foundation for Statistical Computing -Platform: x86_64-pc-linux-gnu (64-bit) - -R is free software and comes with ABSOLUTELY NO WARRANTY. -You are welcome to redistribute it under certain conditions. -Type 'license()' or 'licence()' for distribution details. - - Natural language support but running in an English locale - -R is a collaborative project with many contributors. -Type 'contributors()' for more information and -'citation()' on how to cite R or R packages in publications. - -Type 'demo()' for some demos, 'help()' for on-line help, or -'help.start()' for an HTML browser interface to help. -Type 'q()' to quit R. - -[Detaching after vfork from child process 48943] -[Detaching after vfork from child process 48945] -> library(arrow) -[New Thread 0x7ffff326d700 (LWP 48953)] -[New Thread 0x7fffeb7ff700 (LWP 48958)] - -Attaching package: ‘arrow’ - -The following object is masked from ‘package:utils’: - - timestamp -``` - -Now, run your code - either directly in the session or by sourcing it from a -file. 
If the code results in a segfault, you will have extra output that you -can use to diagnose the problem or attach to an issue as extra information. - -For an excellent in-depth guide to using the C++ debugger in R, see this blog -post by David Vaughan: https://blog.davisvaughan.com/2019/04/05/debug-r-package-with-cpp/. diff --git a/r/vignettes/developers/debugging.Rmd b/r/vignettes/developers/debugging.Rmd new file mode 100644 index 00000000000..1de214b34c8 --- /dev/null +++ b/r/vignettes/developers/debugging.Rmd @@ -0,0 +1,77 @@ +# Debugging Arrow + +If you are a developer working with Arrow code, the package's use of tidy eval +and C++ necessitates a solid debugging strategy. In this article, we reccommend +a few approaches. + +## Debugging R code + +The following resources provide detailed guides to debugging R code: +* [Advanaced R's chapter on debugging](https://adv-r.hadley.nz/debugging.html) +* [The RStudio debugging documentation](https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio) + +In general, we have found that using interactive debugging (e.g. calls to +`browser()`), where you can inspect objects in a particular environment, is +more efficient than simpler techniques such as `print()` statements. + +## Getting more descriptive C++ error messages + +If you are working in the RStudio IDE, your R session will be aborted if there is +a segfault. If you re-run your code in a command-line R session, the session +isn't automatically aborted and so it will be possible to copy the error +message accompanying the segfault. 
+ +```shell +R +> cpp11::cpp_function("double this_will_crash() { double* some_array = nullptr; return some_array[INT_MAX]; }"); this_will_crash() +# *** caught segfault *** +# address 0x3fffffff8, cause 'memory not mapped' +# +# Traceback: +# 1: .Call("_code_12cc653bfe09b_this_will_crash", PACKAGE = "code_12cc653bfe09b") +# 2: this_will_crash() +``` + +### Running R code with the C++ debugger + +As Arrow has C++ code at its core, debugging code can sometimes be tricky when +errors originate in the C++ rather than the R layer. If you are adding new code +which triggers a C++ bug (or find one in existing code), this can result in a +segfault. If you are working in RStudio, the session is aborted, and you may +not be able to retrieve the error messaging needed to diagnose and/or report +the bug. One way around this is to find the code that causes the error, and +run R with a C++ debugger. + +Firstly, load R with your debugger. The most common debuggers are `gdb` and `lldb` + +In my case it's `gdb`, but if you're using the `lldb` debugger (for example, +if you're on a Mac), just swap in +that command here. + +```shell +R -d gdb +``` + +Next, run R. + +```shell +run +``` + +You should now be in an R session with the C++ debugger attached. This will +look similar to a normal R session, but with extra output. + +Now, run your code - either directly in the session or by sourcing it from a +file. If the code results in a segfault, you will have extra output that you +can use to diagnose the problem or attach to an issue as extra information. + +Here is an example of some R code that intentionally causes a segfault, as well +as the output produced. + +``` + +``` + + +For an excellent in-depth guide to using the C++ debugger in R, see this blog +post by David Vaughan: https://blog.davisvaughan.com/2019/04/05/debug-r-package-with-cpp/. 
From 4ee43a682bf9715d1cdc5e3d96458e8a02ea5390 Mon Sep 17 00:00:00 2001 From: Nic Crane Date: Thu, 11 Nov 2021 12:44:55 +0000 Subject: [PATCH 4/5] Add extra resources and context --- r/vignettes/developers/debugging.Rmd | 62 ++++++++++++++++++---------- 1 file changed, 40 insertions(+), 22 deletions(-) diff --git a/r/vignettes/developers/debugging.Rmd b/r/vignettes/developers/debugging.Rmd index 1de214b34c8..b6a9eafa83d 100644 --- a/r/vignettes/developers/debugging.Rmd +++ b/r/vignettes/developers/debugging.Rmd @@ -1,37 +1,39 @@ # Debugging Arrow If you are a developer working with Arrow code, the package's use of tidy eval -and C++ necessitates a solid debugging strategy. In this article, we reccommend +and C++ necessitates a solid debugging strategy. In this article, we recommend a few approaches. ## Debugging R code -The following resources provide detailed guides to debugging R code: -* [Advanaced R's chapter on debugging](https://adv-r.hadley.nz/debugging.html) -* [The RStudio debugging documentation](https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio) - In general, we have found that using interactive debugging (e.g. calls to `browser()`), where you can inspect objects in a particular environment, is more efficient than simpler techniques such as `print()` statements. -## Getting more descriptive C++ error messages +## Getting more descriptive C++ error messages after a segfault If you are working in the RStudio IDE, your R session will be aborted if there is a segfault. If you re-run your code in a command-line R session, the session isn't automatically aborted and so it will be possible to copy the error -message accompanying the segfault. +message accompanying the segfault. Here is an example from a bug which +existed at time of writing. 
```shell -R -> cpp11::cpp_function("double this_will_crash() { double* some_array = nullptr; return some_array[INT_MAX]; }"); this_will_crash() -# *** caught segfault *** -# address 0x3fffffff8, cause 'memory not mapped' -# -# Traceback: -# 1: .Call("_code_12cc653bfe09b_this_will_crash", PACKAGE = "code_12cc653bfe09b") -# 2: this_will_crash() +> S3FileSystem$create() + + *** caught segfault *** +address 0x1a0, cause 'memory not mapped' + +Traceback: + 1: (function (anonymous, access_key, secret_key, session_token, role_arn, session_name, external_id, load_frequency, region, endpoint_override, scheme, background_writes) { .Call(`_arrow_fs___S3FileSystem__create`, anonymous, access_key, secret_key, session_token, role_arn, session_name, external_id, load_frequency, region, endpoint_override, scheme, background_writes)})(access_key = "", secret_key = "", session_token = "", role_arn = "", session_name = "", external_id = "", load_frequency = 900L, region = "", endpoint_override = "", scheme = "", background_writes = TRUE, anonymous = FALSE) + 2: exec(fs___S3FileSystem__create, !!!args) + 3: S3FileSystem$create() ``` +This output provides the R traceback; however, it doesn't provide any +information about the exact line of C++ code from which the segfault originated. +For this, you will need to run R with the C++ debugger. + ### Running R code with the C++ debugger As Arrow has C++ code at its core, debugging code can sometimes be tricky when @@ -42,11 +44,12 @@ not be able to retrieve the error messaging needed to diagnose and/or report the bug. One way around this is to find the code that causes the error, and run R with a C++ debugger. -Firstly, load R with your debugger. The most common debuggers are `gdb` and `lldb` +Firstly, load R with your debugger. The most common debuggers are `gdb` +(typically found on Linux, sometimes on macOS, or Windows via MinGW or Cygwin) +and `lldb` (the default macOS debugger). 
In my case it's `gdb`, but if you're using the `lldb` debugger (for example, -if you're on a Mac), just swap in -that command here. +if you're on a Mac), just swap in that command here. ```shell R -d gdb @@ -65,13 +68,28 @@ Now, run your code - either directly in the session or by sourcing it from a file. If the code results in a segfault, you will have extra output that you can use to diagnose the problem or attach to an issue as extra information. -Here is an example of some R code that intentionally causes a segfault, as well -as the output produced. +Here is debugger output from the segfault shown in the previous example. You +can see here that the exact line which triggers the segfault is included in the +output. ``` +> S3FileSystem$create() +Thread 1 "R" received signal SIGSEGV, Segmentation fault. +0x00007ffff0128369 in std::__atomic_base::operator++ (this=0x178) at /usr/include/c++/9/bits/atomic_base.h:318 +318 operator++() noexcept ``` +## Resources + +The following resources provide detailed guides to debugging R code: + +* [The chapter on debugging in 'Advanced R' by Hadley Wickham](https://adv-r.hadley.nz/debugging.html) +* [The RStudio debugging documentation](https://support.rstudio.com/hc/en-us/articles/205612627-Debugging-with-RStudio) + +For an excellent in-depth guide to using the C++ debugger in R, see [this blog +post by Davis Vaughan.](https://blog.davisvaughan.com/2019/04/05/debug-r-package-with-cpp/) + +You can find a list of equivalent [gdb and lldb commands on the LLDB website.](https://lldb.llvm.org/use/map.html) + -For an excellent in-depth guide to using the C++ debugger in R, see this blog -post by David Vaughan: https://blog.davisvaughan.com/2019/04/05/debug-r-package-with-cpp/. 
From 5665ff02781b29b04227d2678f6bc43d3f73af73 Mon Sep 17 00:00:00 2001 From: Nic Crane Date: Thu, 11 Nov 2021 13:33:56 +0000 Subject: [PATCH 5/5] Add details about macOS issue --- r/vignettes/developers/debugging.Rmd | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/r/vignettes/developers/debugging.Rmd b/r/vignettes/developers/debugging.Rmd index b6a9eafa83d..a18178dfac1 100644 --- a/r/vignettes/developers/debugging.Rmd +++ b/r/vignettes/developers/debugging.Rmd @@ -32,9 +32,9 @@ Traceback: This output provides the R traceback; however, it doesn't provide any information about the exact line of C++ code from which the segfault originated. -For this, you will need to run R with the C++ debugger. +For this, you will need to run R with the C++ debugger attached. -### Running R code with the C++ debugger +### Running R code with the C++ debugger attached As Arrow has C++ code at its core, debugging code can sometimes be tricky when errors originate in the C++ rather than the R layer. If you are adding new code @@ -44,6 +44,9 @@ not be able to retrieve the error messaging needed to diagnose and/or report the bug. One way around this is to find the code that causes the error, and run R with a C++ debugger. +If you are using macOS and have installed R using the Apple installer, you will +not be able to run R with a debugger attached; please see [the instructions here for details on causes of this and workarounds.](https://mac.r-project.org/bin/macosx/RMacOSX-FAQ.html#I-cannot-attach-debugger-to-R) + Firstly, load R with your debugger. The most common debuggers are `gdb` (typically found on Linux, sometimes on macOS, or Windows via MinGW or Cygwin) and `lldb` (the default macOS debugger).
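As a rough illustration of the "load R with your debugger" step in the vignette, the sketch below picks whichever of `gdb` or `lldb` is on the `PATH` and prints the command that would launch R under it. The `pick_debugger` helper is hypothetical and is not part of this patch series or of Arrow; only the `R -d <debugger>` invocation comes from the vignette text.

```shell
# Hypothetical helper (not part of Arrow): report which debugger is available.
# Prefers gdb and falls back to lldb, matching the order used in the vignette.
pick_debugger() {
  if command -v gdb >/dev/null 2>&1; then
    echo gdb
  elif command -v lldb >/dev/null 2>&1; then
    echo lldb
  else
    echo "no debugger found on PATH" >&2
    return 1
  fi
}

# Print the launch command instead of running it, so the sketch is safe to paste.
if dbg=$(pick_debugger); then
  echo "R -d $dbg"
fi
```

Once R has been launched this way, typing `run` at the debugger prompt starts the R session, and after a segfault `bt` prints the C++ backtrace in both `gdb` and `lldb`.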