diff --git a/posts/2024-12-12-non-api-use/index.qmd b/posts/2024-12-12-non-api-use/index.qmd
new file mode 100644
index 00000000..a775abd0
--- /dev/null
+++ b/posts/2024-12-12-non-api-use/index.qmd
@@ -0,0 +1,1331 @@
+---
+title: "Use of non-API entry points in `data.table`"
+author: "Ivan Krylov"
+date: "2024-12-12"
+categories: [developer, guest post, performance]
+# image: "image.jpg"
+draft: true
+bibliography: refs.bib
+---
+
+```{r}
+#| echo: false
+library(data.table)
+library(tools) # format.check_details
+load('precomputed.rda')
+```
+
+In the late 1970s, people at Bell Laboratories designed the S
+programming language in order to facilitate interactive exploratory data
+analysis [@Chambers2016]. Instead of writing, compiling, scheduling, and
+interpreting the output of individual Fortran programs, the goal of S
+was to conduct all the necessary steps of the analysis on the fly. S
+achieved this not by replacing the extensive collection of Fortran
+subroutines, but by providing a special interface language [@Becker1985]
+through which S could communicate with compiled code.
+
+Fast forward more than four decades and an increase by more than three
+orders of magnitude in storage and processing capability of computers
+around us. The [dominant implementation of S is now R][is.R]. It is now
+feasible to implement algorithms solely in R, recouping the potential
+performance losses by reducing the programmer effort spent debugging and
+maintaining the code [@Nash2024]. Still, the capability of R to be
+extended by special-purpose compiled code is as important as ever. As of
+`r when`, `r round(sum(needscomp)/length(needscomp)*100)`% of CRAN
+packages use compiled code. Since the implementation language of R is C,
+not Fortran, the application programming interface (API) for R is mainly
+defined in terms of C.
+
+What's in an API?
+=================
+
+[Writing R Extensions][WRE] ("WRE") is the definitive guide for R
+package development. Together with the [CRAN policy][CRANpolicy] it
+forms the "rules as written" that the maintainers of CRAN packages must
+follow. A recent version of R exports `r nrow(symbols)` symbols,
+including `r symbols[,sum(type=='function')]` functions ("entry points",
+not counting C preprocessor macros) and
+`r symbols[,sum(type!='function')]` variables. Not all of them are
+intended to be used by packages. Even back in R-3.3.0, the oldest
+version currently supported by `data.table`, [WRE chapter 6, "The R
+API"][WRE33API] classified R's entry points into four categories:
+
+> * __API__
+> Entry points which are documented in this manual and declared in an
+> installed header file. These can be used in distributed packages and
+> will only be changed after deprecation.
+> * __public__
+> Entry points declared in an installed header file that are exported
+> on all R platforms but are not documented and subject to change
+> without notice.
+> * __private__
+> Entry points that are used when building R and exported on all R
+> platforms but are not declared in the installed header files. Do not
+> use these in distributed code.
+> * __hidden__
+> Entry points that are where possible (Windows and some modern
+> Unix-alike compilers/loaders when using R as a shared library) not
+> exported.
+
+Although nobody objected to the use of the _API_ entry points, and there
+was little point in trying to use the _hidden_ entry points in a package
+that would fail to link almost everywhere, the _public_ and the
+_private_ entry points ended up being a point of contention. Those
+deemed too internal to use but not feasible to make _hidden_ were (and
+still are) listed in the character vector `tools:::nonAPI`: ` R CMD
+check ` looks at the functions imported by the package and signals a
+`NOTE` if it finds any listed there.
+
+The remaining _public_ functions, neither documented as API nor
+explicitly forbidden by ` R CMD check `, remained available, tempting
+package developers to use them. For example, the [serialization
+interface][ltierney_serialize] is only [documented in WRE since
+R-4.5][WRE45serialize], but it has been powering part of the [digest]
+CRAN package since 2019 (and other packages before it) without any
+drastic changes. Some of the inclusions in `tools:::nonAPI` could have
+been historical mistakes: although WRE [already said in version
+3.3.0][WRE33wilcox] that `wilcox_free` should be called after a call to
+the (API) functions `dwilcox`, `pwilcox` or `qwilcox`, the function was
+only [declared in the public headers][wilcox_declared] and [removed from
+`tools:::nonAPI`][wilcox_api] in R-4.2.0. Still, between R-3.3.3 and
+R-4.4.2, the `#define USE_RINTERNALS` escape hatch finally closed,
+`tools:::nonAPI` grew from `r length(nonAPI.3_3)` to
+`r length(nonAPI.4_4)` entries, and the package maintainers had to adapt
+or face archival of their packages.
+
+A [recent question on R-devel][ALTREPnonAPI] (whether the [ALTREP]
+interface should be considered "API" for the purpose of CRAN package
+development) sparked a series of events and an extensive discussion
+containing the highest count of occurrences of the word "API" per month
+ever seen on R-devel (234), topping [October 2002][Rd200210] (package
+versioning and API breakage, 150), [October 2005][Rd200510] (API for
+graphical interfaces and console output, 124), and [May 2019][Rd201905]
+(discussions of the ALTREP interface and multi-threading, 121). As a
+result, Luke Tierney [started work][clarifyingAPI] on programmatically
+describing the functions and other symbols exported by R (including
+variables and preprocessor and enumeration constants), giving a
+stronger definition to the interface. His changes add the currently
+unexported function `tools:::funAPI()` that lists entry points and two
+more of their categories:
+
+> * __experimental__
+> Entry points declared in an installed header file that are part of
+> an experimental API, such as `R_ext/Altrep.h`. These are subject to
+> change, so package authors wishing to use these should be prepared
+> to adapt.
+> * __embedding__
+> Entry points intended primarily for embedding and creating new
+> front-ends. It is not clear that this needs to be a separate
+> category but it may be useful to keep it separate for now.
+
+Additionally, WRE now spells out that entry points not explicitly
+documented or at least listed in the output of `tools:::funAPI` (or
+something that will replace it) are now off-limits, even if not
+currently present in `tools:::nonAPI` (emphasis added):
+
+> * __public__
+> Entry points declared in an installed header file that are exported
+> on all R platforms but are not documented and subject to change
+> without notice. _Do not use these in distributed code. Their
+> declarations will eventually be moved out of installed header
+> files._
+
+Correspondingly, the number of `tools:::nonAPI` entry points in the
+current development version of R rose to `r length(nonAPI.trunk)`,
+prompting the blog post you are currently reading.
+
+Non-API entry points marked by ` R CMD check `
+==============================================
+
+The first version of the `data.table` package in the CRAN archive dates
+back to April 2006 (which corresponds to R version 2.3.0). It has been
+evolving together with R and its API and thus has accumulated a number
+of uses of R internals that are [now flagged by ` R CMD check ` as
+non-API][remove_non_API]:
+
+`r gsub(
+ '(?m)^', '> ', perl = TRUE,
+ format(subset(dtchecks, grepl('API', Output))[1,])
+)`
+
+ -- ` R CMD check --as-cran ` on a released version of `data.table`
+
+Operating on the S4 bit: `IS_S4_OBJECT`, `SET_S4_OBJECT`, `UNSET_S4_OBJECT`
+---------------------------------------------------------------------------
+
+In R's "S4" OOP system, objects can have a primitive base type (e.g.
+`setClass("PrimitiveBaseType", contains = "numeric")`) or no base type at
+all (e.g. `setClass("NoBaseType")`). In the former case, their
+`SEXPTYPE` code is that of their base class (e.g. `REALSXP`). In the
+latter case, their type code is `OBJSXP` (previously `S4SXP`, which is
+now an alias for `OBJSXP`). To make both cases work consistently, R uses
+a [special "S4" bit][RI_S4rep] in the header of the object.
+
+The `data.table` class is [registered][setOldClass] with the S4 OOP
+system, making it possible to create S4 classes containing `data.table`s
+as members (`setClass(slots = c(mytable = 'data.table'))`) or even
+inheriting from `data.table` (and, in turn, from `data.frame`:
+`setClass(contains = 'data.table')`). Additionally, `data.table`s may
+contain columns that are themselves S4 objects, and both of these cases
+require care from the C code.
+
+The undocumented functions `IS_S4_OBJECT`, `SET_S4_OBJECT`,
+`UNSET_S4_OBJECT` exist as bare interfaces to [the internal
+macros][IS_S4_OBJECT] of the same names and directly access the flag
+inside their argument. Writing R Extensions
+[documents][WRE_replacement_entrypoints] `Rf_isS4` and `Rf_asS4` as
+their replacements.
+
+The [`Rf_isS4`][isS4] function is a wrapper for `IS_S4_OBJECT` that
+follows the usual naming convention for remapped functions, has been
+part of the API for a long time, and could implement additional checks
+if they are needed by R. The [`Rf_asS4`][asS4] function (experimental
+API) is more involved: when asked to clear the S4 bit, it can also try
+to "deconstruct" the S4 object into an S3 object, if that is possible
+and requested. If the reference count of its argument is _above_ 1, it
+will operate upon and return a shallow duplicate instead.
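+
+As a rough sketch of how these two entry points fit together (this is
+not the actual `data.table` code; the helper name `match_S4_bit` is made
+up for illustration), C code can copy the S4 status of one object onto
+another without touching the header bits directly:
+
+```c
+#include <Rinternals.h>
+
+// Hypothetical helper: give 'to' the same S4 status as 'from'.
+// Rf_asS4() may return a shallow duplicate when 'to' is shared, so the
+// caller must use (and PROTECT) the returned value, not the argument.
+SEXP match_S4_bit(SEXP to, SEXP from) {
+  Rboolean want = Rf_isS4(from);  // API: reads the S4 bit
+  return Rf_asS4(to, want, 0);    // complete = 0: only flip the bit
+}
+```
+
+When the bit already has the requested value, `Rf_asS4` returns its
+argument unchanged.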
+
+`data.table` used to directly operate on the S4 bit in two places, the
+[`shallow` function in `src/assign.c`][datatable_assign_shallow_S4] and
+the [`keepattr` function in
+`src/dogroups.c`][datatable_dogroups_keepattr_S4]. In both cases, this
+was required after directly modifying the attribute list using the
+undocumented function `SET_ATTRIB`. For `shallow`, the solution was to
+replace the manual manipulation of attributes with
+[`SHALLOW_DUPLICATE_ATTRIB`][datatable_assign_SHALLOW_ATTRIB] (API,
+available since 3.3.0), which itself takes care of invariants like the
+object bit and the S4 bit.
+
+The `keepattr` function is only used in
+[`growVector`][datatable_dogroups_grow_keepattr] to transplant all
+attributes from a vector to its enlarged copy without duplicating them,
+for which no API exists. The solution is to
+[use `Rf_asS4` to control the S4 object bit][remove_set_s4_object],
+knowing that the new vector is freshly allocated and thus cannot be
+shared yet.
+
+**Status** in `data.table`: fixed in [#6183][remove_set_s4_object] and
+[#6264].
+
+Converting between calls and pairlists: `SET_TYPEOF`
+----------------------------------------------------
+
+In R, [function calls][call] are internally represented as Lisp-style
+pairlists where the first pair is of special type `LANGSXP` instead of
+`LISTSXP`. For example, the following diagram illustrates the data
+structure of the call `print(x = 42L)`:
+
+[Figure: cons-cell diagram of the call `print(x = 42L)`]
+
+Here, every list item is a separate R object, a "cons cell"; each cell
+contains the value in its `CAR` field and a reference to the rest of the
+list in its `CDR` field. Argument names, if provided, are stored in the
+third field, `TAG`. The list is terminated by `R_NilValue`, which is of
+type `NILSXP`. These structures must be constructed every time C code
+wants to evaluate a function call ([e.g.][datatable_rbindlist_eval]).
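+
+A minimal sketch of building and evaluating the call `print(x = 42L)`
+through the API (the function name `call_print_x` is hypothetical, and
+the choice of `R_BaseEnv` as the evaluation environment is an assumption
+made for brevity):
+
+```c
+#include <Rinternals.h>
+
+SEXP call_print_x(void) {
+  SEXP arg = PROTECT(Rf_ScalarInteger(42));  // the value 42L
+  // LANGSXP head with CAR = symbol `print`, followed by one LISTSXP cell
+  SEXP call = PROTECT(Rf_lang2(Rf_install("print"), arg));
+  SET_TAG(CDR(call), Rf_install("x"));       // name the argument: x = 42L
+  SEXP ret = Rf_eval(call, R_BaseEnv);
+  UNPROTECT(2);
+  return ret;
+}
+```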
+
+Previously, the R API contained a function to allocate `LISTSXP` pairlists
+of arbitrary length, `allocList()`, but not function calls, so it became
+a somewhat common idiom to first allocate the list and then use
+`SET_TYPEOF` to change the type of the head pair to `LANGSXP`. This
+did not previously lead to problems, since the two types have the same
+internal memory layout.
+
+The danger of `SET_TYPEOF` lies in the possibility of setting the type of
+an R value to one with an incompatible memory layout. (For example, vector
+types `REALSXP` and `INTSXP` are built very differently from cons cells
+`LISTSXP` and `LANGSXP`.) Starting with R-4.4.1, [R contains the
+`allocLang` function in addition to the `allocList` function][WRE_call]
+that directly allocates a function call object with a head pair of type
+`LANGSXP`. In order to stay compatible with previous R versions,
+packages may [allocate the `LISTSXP` tail first and then use `lcons()`
+to construct the `LANGSXP` head pair of the call][remove_set_typeof].
+
+
+Problem (the only instance in `data.table`):
+
+```c
+  SEXP s = PROTECT(allocList(2));
+  SET_TYPEOF(s, LANGSXP);
+//^^^^^^^^^^ unsafe operation, could be used to corrupt objects
+  SETCAR(s, install("format.POSIXct"));
+  SETCAR(CDR(s), column);
+```
+
+Solutions:
+
+```c
+// for fixed-size calls with contents known ahead of time
+SEXP s = lang2(install("format.POSIXct"), column);
+```
+or:
+```c
+// partially pre-populate
+SEXP s = lang2(install("format.POSIXct"), R_NilValue);
+// later, when 'column' is known:
+SETCAR(CDR(s), column);
+```
+or:
+```c
+// allocate a call with 'n' elements
+SEXP call = lcons(R_NilValue, allocList(n - 1));
+```
+or:
+```c
+// in R >= 4.4.1 only:
+SEXP call = allocLang(n);
+```
+
+Unfortunately, the `LCONS` macro didn't work with `#define R_NO_REMAP`
+prior to R-4.4, because it expanded to `lcons()` instead of
+`Rf_lcons()`.
+
+**Status** in `data.table`: fixed in [#6313][remove_set_typeof].
+
+Strings as C arrays of `CHARSXP` values: `STRING_PTR`
+-----------------------------------------------------
+
+From the point of view of R code, strings are very simple things, much
+like numbers: they live in atomic vectors and can be directly compared
+with other objects. It is only natural to want to work with them from C
+code as easily as with the other atomic types, where
+functions `REAL()`, `INTEGER()`, or `COMPLEX()` can be used to access
+the buffer containing the numbers.
+
+The underlying reality of strings is more complicated: since they
+internally manage memory buffers containing text in a given encoding,
+they must be subject to garbage collection. Like other managed objects
+in R, they are represented as `SEXP` values of special type `CHARSXP`.
+R's garbage collector is [generational and requires the use of a write
+barrier][RI17] ([1][Tierney_gengc], [2][Tierney_writebr]) any time a
+`SEXP` value (such as an `STRSXP` vector) references another `SEXP`
+value (such as a `CHARSXP` string). In a generational garbage collector,
+"younger" generations are marked and swept more frequently than
+"older" ones, because in a typical R session, most objects are temporary
+[@Jones2012, chapter 9]. If package C code manually writes a reference
+to a "young" `CHARSXP` object into an "old" `STRSXP` vector without
+taking generations into account, a following collection of the "young"
+pool of objects will miss the `CHARSXP` being referenced by the "old"
+`STRSXP` and remove the `CHARSXP` as "garbage". This makes the `SEXP *`
+pointers returned by `STRING_PTR` unsafe and requires the use of the
+`STRING_PTR_RO` function, which returns a read-only `const SEXP *`.
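+
+A short sketch of the resulting division of labour (the function
+`na_to_blank` is hypothetical): elements are read through the pointer
+from `STRING_PTR_RO` (available in R >= 3.5), while every write goes
+through `SET_STRING_ELT`, which applies the write barrier:
+
+```c
+#include <Rinternals.h>
+
+// Hypothetical example: return a copy of 'x' with NA strings replaced by ""
+SEXP na_to_blank(SEXP x) {
+  if (!Rf_isString(x)) Rf_error("'x' must be a character vector");
+  R_xlen_t n = XLENGTH(x);
+  SEXP ans = PROTECT(Rf_allocVector(STRSXP, n));
+  const SEXP *px = STRING_PTR_RO(x);  // read-only view of the elements
+  for (R_xlen_t i = 0; i < n; ++i) {
+    // SET_STRING_ELT records the new reference for the garbage collector;
+    // writing through a raw SEXP * pointer would bypass that bookkeeping
+    SET_STRING_ELT(ans, i, px[i] == NA_STRING ? R_BlankString : px[i]);
+  }
+  UNPROTECT(1);
+  return ans;
+}
+```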
+
+Thankfully, `data.table` has already been using read-only `const SEXP *`
+pointers when working with `STRSXP` vectors, so the required changes to
+the code were [not too substantial][remove_string_ptr], limited to
+the name of the function:
+
+Example of the problem:
+
+```c
+const SEXP *sourceD = STRING_PTR(source);
+//                    ^^^^^^^^^^
+// returns a writeable SEXP * pointer, therefore unsafe
+```
+
+Solution:
+
+```c
+#if R_VERSION < R_Version(3, 5, 0)
+// STRING_PTR_RO only appeared in R-3.5
+#define STRING_PTR_RO(x) (STRING_PTR(x))
+#endif
+
+// later:
+const SEXP *sourceD = STRING_PTR_RO(source);
+//                    ^^^^^^^^^^^^^
+// returns a const SEXP * pointer, which prevents accidental writes
+```
+
+**Status** in `data.table`: fixed in [#6312][remove_string_ptr].
+See also: [PR18775].
+
+Reading the reference counts: `NAMED` {#NAMED}
+-------------------------------------
+
+In plain R, all value types -- numbers, strings, lists -- have
+pass-by-value semantics. Without dark and disturbing things in play, such
+as non-standard evaluation or active bindings, R code can give a plain
+value (`x <- 1:10`) to a function (`f(x)`) or store it in a variable (`y
+<- x`), have the function modify its argument (`f <- \(x) { x[1] <- 0
+}`) or change the duplicate variable (`y[2] <- 3`), and still have the
+original value intact (`stopifnot(identical(x, 1:10))`). Only the
+inherently mutable types, such as environments, external pointers and
+weak references, will stay shared between all assignments and function
+arguments; the value types behave as if R copies them every time.
+
+And yet actually making these copies is wasteful when the code only
+reads the variable and does not alter it. (In fact, one of the original
+motivations of `data.table` was to reduce certain wasteful copying of
+data that happens during normal R computations.) Until version 4.0.0,
+`NAMED` was R's mechanism to save memory and CPU time instead of
+creating and storing these copies. A temporary object such as the value
+of `1:10` was not bound to a symbol and thus could be modified right
+away. Assigning it to a variable, as in `x <- 1:10`, gave it a
+`NAMED(x)` count of 1, for which R had an internal optimisation in
+replacement function calls like `foo(x) <- 3`. Assigning the same value
+to yet another symbol (by copying `y <- x` or calling a function
+`foo(x)`) increased the `NAMED()` count to 2 or more, for which there
+was no optimisation: in order to modify one of the symbols, R was
+required to duplicate `x` first. `NAMED()` was not necessarily decreased
+after the bindings disappeared, and decreasing it after having reached
+`NAMEDMAX` was impossible. During the lifetime of R-3.x, `NAMEDMAX` was
+increased from 2 to 3 and later to 7.
+
+Between R-3.1.0 and R-4.0.0, R [migrated from `NAMED` to reference
+counting][Tierney_refcnt]. Reference counts are easier to properly
+decrement than `NAMED`, thus preventing unneeded copies of objects that
+became unreferenced. R-3.5.0 [documented the symbols][Rnews_setnamed]
+`MAYBE_REFERENCED(.)` / `NO_REFERENCES(.)` for use instead of checking
+`NAMED(.) == 0`, `MAYBE_SHARED(.)` / `NOT_SHARED(.)` instead of checking
+`NAMED(.) > 1`, and `MARK_NOT_MUTABLE(.)` instead of setting `NAMED(.)`
+to `NAMEDMAX`, which later became part of the API instead of the
+`NAMED(.)` and `REFCNT(.)` functions. The hard rules are that a value is
+safe to modify in place if it has `NO_REFERENCES()` (reference count of
+0), definitely unsafe to modify in place (requiring a call to
+`duplicate` or `shallow_duplicate`) if it is `MAYBE_SHARED()` (reference
+count above 1), and almost certainly unsafe to modify in place if it is
+`MAYBE_REFERENCED()` but not `MAYBE_SHARED()` (reference count of
+exactly 1).
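+
+In code, the conservative reading of these rules could look like the
+following sketch (the function `zero_first` is hypothetical, not part of
+R or `data.table`):
+
+```c
+#include <Rinternals.h>
+
+// Set the first element of a numeric vector to 0, copying it first
+// unless it is certain that nothing else refers to it.
+SEXP zero_first(SEXP x) {
+  if (!Rf_isReal(x) || XLENGTH(x) < 1)
+    Rf_error("'x' must be a non-empty numeric vector");
+  if (MAYBE_REFERENCED(x))  // reference count >= 1: someone may see 'x'
+    x = Rf_duplicate(x);    // work on a private copy instead
+  PROTECT(x);
+  REAL(x)[0] = 0;
+  UNPROTECT(1);
+  return x;
+}
+```
+
+A vector passed to `.Call()` from an R variable is referenced by that
+variable, so this function will typically take the copying branch.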
+
+`data.table`'s only uses of `NAMED()` were in the [verbose output during
+assignment][remove_named]:
+
+```c
+if (verbose) {
+  Rprintf(_("RHS for item %d has been duplicated because NAMED==%d MAYBE_SHARED==%d, but then is being plonked. length(values)==%d; length(cols)==%d)\n"),
+          i+1, NAMED(thisvalue), MAYBE_SHARED(thisvalue), length(values), length(cols));
+  //           ^^^^^ non-API function
+}
+```
+
+Since the correctness of the modification operation hinges on the
+reference count being 0 (and it may be important whether it's exactly 1
+or above 1), the same amount of _useful_ information can be conveyed by
+printing `MAYBE_REFERENCED()` and `MAYBE_SHARED()` instead of `NAMED()`:
+
+```c
+if (verbose) {
+  Rprintf(_("RHS for item %d has been duplicated because MAYBE_REFERENCED==%d MAYBE_SHARED==%d, but then is being plonked. length(values)==%d; length(cols)==%d)\n"),
+          i+1, MAYBE_REFERENCED(thisvalue), MAYBE_SHARED(thisvalue), length(values), length(cols));
+  //           ^^^^^^^^^^^^^^^^ API function
+}
+```
+
+**Status** in `data.table`: fixed in [#6420][remove_named].
+
+Encoding bits: `LEVELS`
+-----------------------
+
+`LEVELS` is the name of the internal R [macro][LEVELS_macro] and an
+exported non-API [function][LEVELS_function] accessing a [16-bit field
+called `gp`][LEVELS_field] ([general-purpose][RI112]) that is present in
+the header of every `SEXP` value. Not every access to this field is
+done using the `LEVELS()` macro; there are bits of R code that access
+`(sexp)->sxpinfo.gp` directly. R uses this field for many purposes:
+
+ * matching given arguments against the formals of a function
+ ([1][gp_for_match1], [2][gp_for_match2], [3][gp_for_match3])
+ * remembering the previous [type][gp_for_gc] of a garbage-collected value
+ * [finalizing][gp_for_finalize] the reference-semantics objects before
+ garbage-collecting them
+ * [marking][gp_for_calling] condition handlers as "calling" (executing
+ on top of where the condition was signalled in the call stack), as
+ opposed to "non-calling" (executing at the site of the `tryCatch`
+ call)
+ * [marking][gp_for_assignment] objects in complex assignment calls
+ * storing the [S4 object bit][gp_for_s4]
+ * [marking][gp_for_jit] functions as (un)suitable for bytecode
+ compilation
+ * [marking][gp_for_growable] vectors as growable
+ * [marking][gp_for_missing] provided ("actual") function arguments as
+ [missing][gp_for_missing2]
+ * [marking][gp_for_ddval] the `..1`, `..2`, etc symbols as
+ corresponding to the [given element of the `...`
+ argument][Rhelp_dots]
+ * [marking][gp_for_env] environments as [locked][envflags_locked], or
+ for [caching][envflags_global] the global variable lookup, or for
+ looking up values in the base environment or the special functions
+ ([1][gp_for_basesym], [2][basesym2], [3][gp_for_special],
+ [4][specialsym2])
+ * [marking][gp_for_hashash] symbols naming environment contents for
+ [hash lookup][hashash2]
+ * [marking][gp_for_active] bindings inside environments as
+ [active][active_binding]
+ * [marking][gp_for_promsxp] promise objects as already evaluated
+ * [marking][gp_for_charsxp] `CHARSXP` values as present in the global
+ cache or being in a certain encoding
+
+Although the value of `gp` is directly stored in R's serialized data
+stream, neither the field nor the `LEVELS` accessor is part of the API.
+Out of all possible uses for this field, `data.table` is only interested
+in string encodings. From
+the viewpoints of [plain R][R_Encoding] and the [C API][WRE_encoding],
+an individual string (`CHARSXP` value) can be marked with the following
+encodings:
+
+R-level encoding name | C-level encoding constant | Meaning
+:----------------:|:----------------:|------------------------------
+`"latin1"` | `CE_LATIN1` | ISO/IEC 8859-1 or CP1252
+`"UTF-8"` | `CE_UTF8` | ISO/IEC 10646
+`"unknown"` | `CE_NATIVE` | Encoding of the current locale
+`"bytes"` | `CE_BYTES` | Not necessarily text; `translateChar` will fail
+
+Internally, R also [marks strings as encoded in ASCII][R_SET_ASCII]:
+since all three encodings are ASCII-compatible, an ASCII string will
+never need to be translated into a different encoding. Note that there
+is a subtle difference between a string _marked_ in a certain encoding
+and actually _being_ in a certain encoding: in an R session running with
+a UTF-8 locale (which includes most modern Unix-alikes and Windows ≥
+10, November 2019 update) a string marked as `CE_NATIVE` will also be in
+UTF-8. (Similarly, with an increasingly rare Latin-1 locale, a
+`CE_NATIVE` string will be in Latin-1.)
+
+The `data.table` code is interested in knowing whether a string is
+[marked as UTF-8, Latin-1, or ASCII][datatable_isencoded]. This is used
+to [convert strings to UTF-8 when needed][datatable_needUTF8] (also:
+[output to native encoding or UTF-8 in
+`fwrite`][datatable_ENCODED_CHAR], [automatic conversion in
+`forder`][datatable_anynotascii]). The `getCharCE` API function appeared
+in R-2.7.0 together with the encoding support, so switching the
+`IS_UTF8` and `IS_LATIN` macros from `LEVELS` to API calls [was
+relatively straightforward][datatable_levels1].
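+
+A sketch of what such API-based tests can look like (the macro names
+follow the text above; the helper `to_utf8` is hypothetical, and the
+actual `data.table` definitions may differ):
+
+```c
+#include <Rinternals.h>
+
+// Encoding *marks* of a CHARSXP, read through the API instead of LEVELS()
+#define IS_UTF8(x)  (Rf_getCharCE(x) == CE_UTF8)
+#define IS_LATIN(x) (Rf_getCharCE(x) == CE_LATIN1)
+
+// Return a CHARSXP with the same text as 's', marked as UTF-8
+SEXP to_utf8(SEXP s) {
+  if (s == NA_STRING || IS_UTF8(s)) return s;
+  // translateCharUTF8() converts from the marked (or native) encoding
+  return Rf_mkCharCE(Rf_translateCharUTF8(s), CE_UTF8);
+}
+```
+
+Strings marked as `"bytes"` cannot be translated, so a caller would need
+to handle `CE_BYTES` separately before using such a helper.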
+
+R-4.5.0 is expected to introduce the `charIsASCII` "experimental" API
+function that returns the value of the ASCII marker for a `CHARSXP`
+value, which [will replace the use of `LEVELS` in the `IS_ASCII`
+macro][remove_levels]. Curiously, while it looks like the code could
+benefit from switching from the `getCharCE()` tests (which only look at
+the value of the flags and so may needlessly translate strings from
+`CE_NATIVE`) to the new experimental `charIs(UTF8|Latin1)` functions
+that will also return `TRUE` for a matching native encoding, actually
+making the change breaks a number of unit tests.
+
+**Status** in `data.table`: partially fixed in
+[#6420][datatable_levels1], waiting for R-4.5.0 to be released with the
+new API in [#6422][remove_levels].
+
+`SETLENGTH`, `SET_GROWABLE_BIT`, `(SET_)TRUELENGTH`
+---------------------------------------------------
+
+### Growable vectors
+
+Since `data.frame`s and `data.table`s are lists, and lists in R are
+value types with pass-by-value semantics, adding a column to one or
+removing a column from it normally involves allocating a new list
+referencing the rest of the columns (performing a "shallow duplicate").
+By contrast, the [over-allocated lists][datatable_overallocation] can be
+resized in place by gradually increasing their `LENGTH` (remembering
+their allocated length in the `TRUELENGTH` field), obviating the need for
+shallow duplicates at the cost of making `data.table`s shared,
+by-reference values. The change was introduced in [v1.7.3, November
+2011][news173], together with the `:=` operator for changing the columns
+by reference (which has since become [the defining feature of
+data.table][datatable_logo]).
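+
+The idea of separating the visible length from the allocated length can
+be illustrated with a toy model in plain C. This is only a sketch: the
+`toy_table` type and its functions are invented for this example and do
+not correspond to R's vector header or to `data.table`'s code.
+
+```c
+#include <stdlib.h>
+
+/* Toy model only: not R's or data.table's actual data structure. */
+typedef struct {
+  size_t length;      /* slots currently in use (the visible length) */
+  size_t truelength;  /* slots actually allocated */
+  const char **cols;  /* stand-ins for the column pointers */
+} toy_table;
+
+/* Allocate a table with 'n' used slots and 'spare' extra ones. */
+toy_table *toy_alloc(size_t n, size_t spare) {
+  toy_table *t = malloc(sizeof *t);
+  if (!t) return NULL;
+  t->cols = calloc(n + spare, sizeof *t->cols);
+  if (!t->cols) { free(t); return NULL; }
+  t->length = n;
+  t->truelength = n + spare;
+  return t;
+}
+
+/* Add a column in place; 0 on success, -1 if the spare slots ran out. */
+int toy_add_column(toy_table *t, const char *col) {
+  if (t->length >= t->truelength) return -1;
+  t->cols[t->length++] = col;  /* no reallocation, same address */
+  return 0;
+}
+
+void toy_free(toy_table *t) {
+  if (t) { free(t->cols); free(t); }
+}
+```
+
+`toy_add_column` fails once the spare slots are used up; at that point a
+larger allocation, and therefore a copy, is needed.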
+
+R's own use of `TRUELENGTH` is [varied][RI113]. The field itself
+appeared in [R-0.63][R_truelength] together with the `VECSXP` lists (to
+replace the Lisp-style linked pairlists). R [uses this
+field][R_hashvalue] in `CHARSXP` strings to store the hash values
+[belonging to symbols][R_install_truelen]. R's many `VECSXP`-based hash
+tables use it to count the primary slots in use: hashes are used for
+reference tracking during (un)serialization ([1][R_serialize_hash],
+[2][R_saveload_hash]) and looking up environment contents
+([1][R_envir_hashpri], [2][R_envir_hashval]). R-3.3 (May 2016) saw the
+inclusion of [radix sort][R_radixsort] from `data.table` itself, which
+uses `TRUELENGTH` to sort strings. R-3.4
+(April 2017) [introduced][R_growable] over-allocation when growing
+vectors due to assignment outside their bounds. The [growable
+bit][gp_for_growable] was added to prevent the mismanagement of the
+allocated memory counter: without the bit set on the over-allocated
+vectors, the garbage collector only counted `LENGTH(x)` instead of
+`TRUELENGTH(x)` units as released when garbage-collecting the vector,
+inflating the counter over time. [ALTREP] objects introduced in R-3.5
+(April 2018) don't have a `TRUELENGTH`: it [cannot be
+set][R_altrep_set_truelen] and is [returned as 0][R_altrep_truelen]. In
+very old versions of R, `TRUELENGTH` wasn't initialised, but it is
+nowadays set to 0, which `data.table` [depends
+upon][datatable_init_testtl].
+
+Nowadays, `data.table` uses vectors whose length is different from their
+allocated size in many places:
+
+* `src/dogroups.c`
+ * reuses the same memory for the [`data.table` subset object
+ `.SD`][datatable_docols_SD] and for the [virtual row number column
+ `.I`][datatable_docols_I] by shortening the vectors to the size of
+ the current group
+ * later [restores their natural length][datatable_docols_restore]
+ * [extends the `data.table` for new columns][datatable_docols_extend]
+ as needed
+* `src/freadR.c`
+ * works with an over-estimated line count and so can [truncate the
+ columns][datatable_freadR_truncate] after the value is known
+ precisely
+ * the columns are [prepared to be truncated][datatable_freadR_settl]
+ * may also [drop columns by reference][datatable_freadR_drop]
+* `src/subset.c`
+ * the `subsetDT` function [prepares an over-allocated
+ `data.table`][datatable_subset_alloc] together with its names.
+* `src/assign.c`
+ * the `shallow` function [prepares][datatable_assign_shallow] an
+ over-allocated `data.table` referencing the columns of an existing
+ `data.table`
+ * `assign` [creates][datatable_assign_create] or
+ [removes][datatable_assign_remove] columns by reference
+ * `finalizer` causes an `INTSXP` vector [with the fake
+ length][datatable_assign_finalizer] to be (eventually)
+ garbage-collected to fix up a discrepancy in R's vector size
+ accounting caused by the existence of the over-allocated
+ `data.table`
+
+`SETLENGTH` presents many opportunities to create inconsistencies within
+R:
+
+* When copying shortened objects without the `GROWABLE_BIT` set, R
+ allocates and copies only `XLENGTH` elements and [duplicates the value
+ of `TRUELENGTH`][R_duplicate_truelength].
+ * For this and other reasons, `data.table`s have a special
+ [`.internal.selfref` attribute][datatable_assign_selfref] attached
+ containing an `EXTPTR` back to itself. A copy of a table can be
+ detected because it will have a different address.
+ * The `_selfrefok` function tries to [restore the correct
+ `TRUELENGTH`][datatable_assign_selfrefok] if it detects a copy.
+ * Setting the `GROWABLE_BIT` on the `data.table` would make R keep the
+ default `TRUELENGTH` (0) instead of copying it.
+* When deallocating shortened objects without the `GROWABLE_BIT` set, R
+ [accounts only for the `XLENGTH` elements][R_memory_getVecSize] being
+ released, over-counting the total amount of allocated memory.
+ * `data.table` compensates for this using the
+ [`finalizer`][datatable_assign_finalizer] on the `.internal.selfref`
+ attribute.
+ * Setting the `GROWABLE_BIT` on the `data.table` would make R account
+ for `TRUELENGTH` elements instead of `XLENGTH` elements.
+
+Unfortunately, `GROWABLE_BIT` is not part of the API and was only
+introduced in R-3.4, so it does not present a full solution to the
+problems. Moreover,
+
+* Setting `LENGTH` larger than the allocated length may cause R to
+ access undefined or even unmapped memory.
+* For vectors containing other `SEXP` values (of type `VECSXP`,
+ `EXPRSXP`, `STRSXP`): when reducing the `LENGTH`, having a
+ non-persistent value (something unlike the persistent values
+ `R_NilValue` or `R_BlankString` or `R_NaString` provided by R itself)
+ in the newly inaccessible cells will also make them unreachable from
+ the viewpoint of the garbage collector, potentially prompting it to
+ reuse or unmap the pointed-to memory. Increasing the `LENGTH` again
+ with invalid pointers in the newly accessible slots will make an
+ invalid vector that cannot be safely altered or discarded:
+
+  ```c
+  #include <R.h>
+  #include <Rinternals.h>
+  void foo(void) {
+    {
+      SEXP list = PROTECT(allocVector(VECSXP, 100)), elt;
+      SET_VECTOR_ELT(list, 99, elt = allocVector(REALSXP, 1000));
+
+      double * p = REAL(elt); // initialise the vector
+      for (R_xlen_t i = 0; i < xlength(elt); ++i) p[i] = i;
+
+      SETLENGTH(list, 1);   // elt is unreachable
+      R_gc();               // elt is collected
+      SETLENGTH(list, 100); // invalid elt is reachable again
+      R_gc();               // invalid elt is accessed
+      UNPROTECT(1);
+    }
+    R_gc(); // crash here if not above
+  }
+  ```
+
+[Starting with R-4.3][R_PR17620], R packages can implement their own
+`VECSXP`-like objects using the [ALTREP] framework; `STRSXP` objects
+have been supported since R-3.5. An `ALTREP` object is defined by its
+_class_ (a collection of methods) and two arbitrary R values, `data1`
+and `data2`. (Attributes are not a part of the ALTREP representation and
+exist the same way as on normal R objects.) A simple implementation of a
+shortened, expandable vector could hold a full-length vector in the
+`data1` slot and its pretend-length as a one-element `REALSXP` value in
+the `data2` slot. (Currently, `R_xlen_t` values are limited to
+$2^{52}$, so they can be represented exactly in an IEEE `double` value,
+which holds every integer up to $2^{53}$ precisely.) The over-allocated
+class will need to implement the following methods:
+
+* [Common ALTREP methods][Rapi_altrep_methods]:
+ * `Length()`, returning the pretend-length of the vector. Required.
+ * `Duplicate(deep)`. If not implemented, R will create a copy as an
+ ordinary object using the length and the data pointer provided by
+ the class.
+ * There is also `DuplicateEX(deep)`, which is responsible for
+ copying the value _and_ the attributes, but it may be hard to
+ implement within the API bounds (`ATTRIB` is not API), and R
+ provides a default implementation that calls `Duplicate` above.
+ * Shared mutable vectors [can cause problems][Tierney_mutable], so a
+ decision to let the `Duplicate()` return value share the original
+ vector will require a lot of thought and testing.
+ * `Serialized_state()`, `Unserialize(state)`. If not implemented, R
+ will serialize the value as an ordinary object, which is what
+ currently happens for `data.table`s. Once an R package implements an
+ ALTREP class with a `Serialized_state` method, the format is set in
+ stone; any changes will have to introduce a new class.
+ * Similarly, there is `UnserializeEX(state, attr,
+ objf, levs)` responsible for setting `LEVELS`, the object bit, and
+ the attributes; the default implementation should suffice.
+ * `Inspect(pre, deep, pvec, inspect_subtree)`. May `Rprintf` some
+ information from the ALTREP fields before returning `FALSE` to let R
+ continue `inspect`ing the object.
+* [Common `altvec` methods][Rapi_altvec_methods] required for most code
+ to work with the class:
+ * `Dataptr(writable)`, returning the pointer to the start of the array
+ backing the underlying vector. For `VECSXP` or `STRSXP` vectors,
+ `writable` should always be `FALSE`, so `DATAPTR_RO` can be used.
+ * `Dataptr_or_null()`. May delegate to `Dataptr(FALSE)` above.
+ * `Extract_subset(indx, call)`. Must allocate and return `x[indx]` for
+ 1-based numeric `indx` that may be outside the bounds of `x`.
+* Class-specific methods:
+ * [`altstring` methods][Rapi_altstring_methods]:
+ * `Elt(i)`. Must return `x[[i]]` for 0-based `i` or signal an error.
+ Required.
+ * `Set_elt(i, v)`. Must perform `x[[i]] <- v` for 0-based `i` or
+ signal an error. Required.
+ * `Is_sorted()`. If not implemented, will always return
+ `UNKNOWN_SORTEDNESS`.
+ * `No_NA()`. If not implemented, will always return 0 (unknown whether
+ contains missing values).
+ * [`altlist` methods][Rapi_altlist_methods]:
+ * `Elt(i)` and `Set_elt(i, v)` like above.
+ * The rest of the atomic vector types ([integer][Rapi_altinteger],
+ [logical][Rapi_altlogical], [numeric][Rapi_altreal],
+ [complex][Rapi_altcomplex], [raw][Rapi_altraw]) will each need a
+ subset of the following methods:
+ * `Elt(i)`, `Is_sorted()`, `No_NA()`, as above.
+ * `Get_region(i, n, buf)` to populate the buffer `buf[n]` of the
+ corresponding C type with elements at the given 0-based indices
+ `i`. The indices are not guaranteed to be within bounds; the
+ number of actually copied elements must be returned. If not
+ implemented, R will use the `Elt(i)` method, which is slower.
+ * `Sum(narm)`, `Min(narm)`, `Max(narm)` to compute a summary of the
+ vector, optionally ignoring the missing values. If not
+ implemented, R will use `Get_region` to compute the summaries.
+
+Additionally, `data.table` will need a function to [create new ALTREP
+tables][Rapi_new_altrep] and to resize the vector in place. The resize
+function will need to check whether the given value
+[`R_altrep_inherits`][Rapi_altrep_inherits] from the over-allocated class
+and then modify the ALTREP data slots as needed. The function may even
+reallocate the payload to enlarge the vector in place past the original
+allocation limit without requiring a `setDT` call from the user. Since a
+reallocation will invalidate the data pointer, it must only be performed
+from inside `data.table`, not from the ALTREP methods.
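+
+As a rough illustration of the bookkeeping such a resize function would
+do, here is a toy model in plain C. It is not ALTREP code:
+`toy_growable`, `toy_init` and `toy_resize` are hypothetical names, the
+`payload` field merely stands in for the vector kept in `data1` and
+`length` for the pretend-length kept in `data2`. The doubling growth
+policy is also an assumption.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdlib.h>
+
+/* Toy stand-in for an over-allocated vector (hypothetical, not R API) */
+typedef struct {
+    double *payload;  /* models the full-length vector in data1 */
+    size_t capacity;  /* allocated number of elements */
+    size_t length;    /* models the pretend-length in data2 */
+} toy_growable;
+
+/* Allocate a zero-filled payload; returns 0 on success, -1 on failure */
+static int toy_init(toy_growable *v, size_t length, size_t capacity) {
+    if (capacity < length) capacity = length;
+    v->payload = calloc(capacity ? capacity : 1, sizeof *v->payload);
+    if (!v->payload) return -1;
+    v->capacity = capacity;
+    v->length = length;
+    return 0;
+}
+
+/* Change the pretend-length. Within the capacity only the length field
+   changes; past it, the payload is reallocated and may move. */
+static int toy_resize(toy_growable *v, size_t newlen) {
+    assert(v->length <= v->capacity);
+    if (newlen > v->capacity) {
+        size_t newcap = v->capacity ? v->capacity : 1;
+        while (newcap < newlen) newcap *= 2;
+        double *p = realloc(v->payload, newcap * sizeof *p);
+        if (!p) return -1;  /* the old payload is still valid */
+        /* keep the newly allocated tail initialised */
+        for (size_t i = v->capacity; i < newcap; ++i) p[i] = 0;
+        v->payload = p;
+        v->capacity = newcap;
+    }
+    v->length = newlen;
+    return 0;
+}
+```
+
+Shrinking and re-growing within `capacity` never touches `payload`, but
+growing past it may move the buffer, so any pointer obtained earlier
+must be considered stale after such a call.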
+
+The original implementation that uses `SETLENGTH` can be kept behind
+`#if R_VERSION < R_Version(4, 3, 0)` for backwards compatibility.
+
+Replacing `TRUELENGTH`-based growable vectors with `ALTREP`-based ones
+will conform to the API, allow growing the vector in place, and avoid
+the various inconsistencies that happen when R duplicates or deallocates
+these vectors, but also has the following downsides:
+
+ * Every place in `data.table` that uses growable vectors will have to
+ be refactored to use the new abstraction layer (`SETLENGTH` in R <
+ 4.3, ALTREP in R ≥ 4.3).
+ * Both implementations will have to be maintained as long as
+ `data.table` supports R < 4.3.
+ * The current implementation in `data.table` re-creates ALTREP
+ objects as ordinary ones precisely because it's impossible to
+ `SET_TRUELENGTH` on ALTREP objects. This will also need to be
+ refactored.
+ * The data pointer access is slower for ALTREP vectors than for
+ ordinary vectors: having checked the ALTREP bit in the header, R will
+ have to access the method table and call the method instead of adding
+ a fixed offset to the original `SEXP` pointer. This shouldn't be
+ noticeable unless `data.table` puts data pointer access inside a
+ "hot" loop.
+ * For numeric ALTREP classes, ALTREP-aware operations that use
+ `*_GET_REGION` instead of the data pointer will become slower unless
+ the class implements a `Get_region` method.
+
+**Status** in `data.table`: not fixed yet.
+
+### Fast string matching {#TRUELENGTH-mark}
+
+`data.table`'s use of `TRUELENGTH` is not limited to growable buffers. A
+common idiom is to set the `TRUELENGTH`s of `CHARSXP` values from a
+vector to their negative 1-based indices in that vector, then look up
+other `CHARSXP`s in the original vector using `-TRUELENGTH(s)`. This
+technique relies on [R's `CHARSXP` cache][RI110]: for the given string
+contents and encoding, only one copy of a string created by
+`mkCharLenCE` (and related functions) will exist in the memory. As a
+result, if a string does exist in the original vector, it will be the
+_same_ `CHARSXP` object whose `TRUELENGTH` had been set to its negative
+index. R does not currently set negative `TRUELENGTH`s by itself, so any
+non-negative `TRUELENGTH`s can be safely discarded as non-matches.
+
+In the best case scenario, this lookup is very fast: for a table of size
+$n$ and $k$ strings to look up in it, it takes $\mathrm{O}(1)$ memory
+(the `TRUELENGTH` is already there, unused) and $\mathrm{O}(n)$ time for
+overhead plus $\mathrm{O}(k)$ time for the actual lookups.
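+
+The marking idea can be modelled outside of R. In the sketch below,
+`toy_charsxp` and `toy_chmatch` are hypothetical names: the struct
+stands in for a cached `CHARSXP` with a spare integer field. The sketch
+assumes that equal strings are already represented by the same object
+and that no spare field holds a positive value that would have to be
+saved first.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+
+/* Toy stand-in for a cached CHARSXP (hypothetical, not R's layout) */
+typedef struct {
+    const char *text;
+    long truelength;  /* spare field, assumed to start at zero */
+} toy_charsxp;
+
+/* For every x[j], store its 1-based position in table[], or 0 if absent */
+static void toy_chmatch(toy_charsxp **table, size_t n,
+                        toy_charsxp **x, size_t k, size_t *out) {
+    /* Mark: walk backwards so that the first duplicate wins */
+    for (size_t i = n; i-- > 0; ) {
+        assert(table[i]->truelength <= 0);  /* nothing to save in this toy */
+        table[i]->truelength = -(long)(i + 1);
+    }
+    /* Look up: a negative mark is a match, anything else is not */
+    for (size_t j = 0; j < k; ++j) {
+        long tl = x[j]->truelength;
+        out[j] = tl < 0 ? (size_t)(-tl) : 0;
+    }
+    /* Restore the spare fields before anyone else looks at them */
+    for (size_t i = 0; i < n; ++i) table[i]->truelength = 0;
+}
+```
+
+The two passes over `table` are the $\mathrm{O}(n)$ overhead, and each
+lookup is a single field read.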
+
+Care must be taken for the technique to work properly:
+
+* The strings must be converted to UTF-8. Two copies of the same text in
+ different encodings will be stored in different objects at different
+ addresses, preventing the technique from working:
+ ```r
+ packageVersion('data.table')
+ # [1] ‘1.16.99’
+ x <- data.table(factor(rep(enc2utf8('ø'), 3)))
+ # memrecycle() forgot to account for encodings
+ x[1,V1 := iconv('ø', to='latin1')]
+ as.numeric(x$V1)
+ # [1] 2 1 1
+ levels(x$V1) # duplicated levels!
+ # [1] "ø" "ø"
+ identical(levels(x$V1)[[1]], levels(x$V1)[[2]])
+ # [1] TRUE
+ levels(x$V1) <- levels(x$V1)
+ levels(x$V1) # R restores unique levels
+ # [1] "ø"
+ ```
+* Any non-zero `TRUELENGTH` values resulting from R-internal usage must
+ be [saved][datatable_assign_savetl] beforehand and restored
+ afterwards.
+* The `TRUELENGTH`s are used to look up variables in hashed
+ environments, so R code should not run while the values are disturbed.
+
+The encoding conversions take $\mathrm{O}(n+k)$ time and space; the `TRUELENGTH`
+bookkeeping takes $\mathrm{O}(n)$ space and time (thanks to the exponential
+`realloc` trick).
+
+The fast string lookup is used in the following places:
+
+* `src/assign.c`: [factor level merging in
+ `memrecycle`][datatable_assign_memrecycle], [`savetl`
+ helper][datatable_assign_savetl]
+* `src/rbindlist.c`: [matching column
+ names][datatable_rbindlist_matchcolumns], [matching factor
+ levels][datatable_rbindlist_matchfactors]
+* `src/forder.c`: (different purpose, same technique) [storing the
+ group numbers][datatable_forder_truelen], [looking them
+ up][datatable_forder_truelen], [restoring the original
+ values][datatable_forder_free_ustr]
+* `src/chmatch.c`: [saving the original
+ `TRUELENGTH`s][datatable_chmatch_savetl], [remembering the positions
+ of `CHARSXP`s in the table][datatable_chmatch_settl], [cleaning up on
+ error][datatable_chmatch_cleanup1], [looking up strings in the
+ table][datatable_chmatch_lookup], [cleaning up before
+ exit][datatable_chmatch_cleanup2]
+* `src/fmelt.c`: [combining factor levels by merging their `CHARSXP`s in
+ a common array with indices in `TRUELENGTH`][datatable_fmelt_truelen]
+
+Since there doesn't seem to be any intent to allow using the R API to place
+arbitrary integer values inside unused `SEXP` fields, `data.table` will
+have to look up the `CHARSXP` values using the externally available
+information. Performing $\mathrm{O}(nk)$ direct pointer comparisons would scale
+poorly, so for an $\mathrm{O}(1)$ individual lookup `data.table` could build a
+hash table of `SEXP` pointers. While pointer hashing [isn't strictly
+guaranteed by the C standard to work][Wellons_hashptr], it has been used
+[in R itself][R_unique_PTRHASH]. A hash table for $n$ `CHARSXP` pointers
+would need $\mathrm{O}(n)$ memory, $\mathrm{O}(n)$ time to initialise, and average $\mathrm{O}(k)$
+time for $k$ lookups [@Cormen2009, chapter 11].
+
+Taking the `savetl` bookkeeping into account, the _average asymptotic_
+performance of `TRUELENGTH` and hashing for string lookup is the same in
+both time and space, but the constants are most likely lower for
+`TRUELENGTH`. Transitioning to a hash will probably involve a
+performance hit.
+
+A truly lazy implementation could just use [`std::unordered_map`][cppreference_unordered_map] (at the cost of requiring C++11,
+which was supported but far from required in R-3.3, and having to shield
+R from the C++ exceptions) or the permissively-licensed [uthash]. Since
+the upper bound on the size of the table is known ahead of time, a
+custom-made open-addressing hash table [@Cormen2009, section 11.4] could
+be implemented with a fixed load factor, requiring only one allocation
+and no linked lists to walk.
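+
+A minimal sketch of such a table follows, written in plain C so that it
+does not depend on R. The names `ptrmap`, `ptrmap_init`, `ptrmap_put`
+and `ptrmap_get` are hypothetical, as are the load factor of at most
+1/2, linear probing, and the multiplicative hash of the address. This is
+one possible design, not what `data.table` does.
+
+```c
+#include <assert.h>
+#include <stddef.h>
+#include <stdint.h>
+#include <stdlib.h>
+
+typedef struct { const void *key; size_t val; } ptrslot;
+typedef struct { ptrslot *slots; size_t mask; } ptrmap;
+
+/* Multiplicative hash of the address; assumes that converting a
+   pointer to uintptr_t gives a usable integer on this platform */
+static size_t ptr_hash(const void *p) {
+    uint64_t h = (uint64_t)(uintptr_t)p;
+    h *= UINT64_C(0x9E3779B97F4A7C15);
+    return (size_t)(h >> 32);
+}
+
+/* One allocation sized for at most n keys; capacity is a power of two */
+static int ptrmap_init(ptrmap *m, size_t n) {
+    size_t cap = 2;
+    while (cap < 2 * n) cap *= 2;
+    m->slots = malloc(cap * sizeof *m->slots);
+    if (!m->slots) return -1;
+    for (size_t i = 0; i < cap; ++i) {
+        m->slots[i].key = NULL;  /* NULL marks an empty slot */
+        m->slots[i].val = 0;
+    }
+    m->mask = cap - 1;
+    return 0;
+}
+
+/* Store key -> val (val > 0) unless the key is already present */
+static void ptrmap_put(ptrmap *m, const void *key, size_t val) {
+    assert(key != NULL);
+    size_t i = ptr_hash(key) & m->mask;
+    while (m->slots[i].key != NULL) {
+        if (m->slots[i].key == key) return;  /* first occurrence wins */
+        i = (i + 1) & m->mask;               /* linear probing */
+    }
+    m->slots[i].key = key;
+    m->slots[i].val = val;
+}
+
+/* Return the stored value, or 0 if the key is absent */
+static size_t ptrmap_get(const ptrmap *m, const void *key) {
+    size_t i = ptr_hash(key) & m->mask;
+    while (m->slots[i].key != NULL) {
+        if (m->slots[i].key == key) return m->slots[i].val;
+        i = (i + 1) & m->mask;
+    }
+    return 0;
+}
+
+static void ptrmap_free(ptrmap *m) { free(m->slots); m->slots = NULL; }
+```
+
+With the 1-based position stored as the value, `ptrmap_get` would take
+the place of reading `-TRUELENGTH(s)`, returning 0 for a non-match. As
+long as no more than `n` distinct keys are inserted, at least half of
+the slots stay empty, so every probe sequence terminates.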
+
+**Status** in `data.table`: not fixed yet.
+
+### Marking columns for copying
+
+The use of `TRUELENGTH` in `data.table` to mark objects is not limited
+to `CHARSXP` strings. Individual columns are also marked in a similar
+manner for later copying:
+
+* In `src/dogroups.c`, the vectors allocated for the special symbols
+ `.BY`, `.I`, `.N`, `.GRP` must not be returned by the grouping
+ operations evaluated with `dt[..., ..., by=...]`, so they are [marked
+ with a `TRUELENGTH` of -1][datatable_dogroups_setlen-1], and the
+ [marked columns][datatable_dogroups_anyspecialstatic] are later
+ re-created.
+* In `src/utils.c`, columns that share memory or are ALTREP must be copied.
+ Memory sharing between columns may lead to confusing results when they
+ are altered by reference, and ALTREP columns cannot have `TRUELENGTH`
+ set. The code uses the same trick as with `CHARSXP` objects: if
+ `TRUELENGTH` is set on an object, accessing it through a
+ different pointer and seeing a non-zero value will prove that the
+ object had been previously visited. The code first [prepares zero
+ `TRUELENGTH`s][datatable_copyShared1], then [marks ALTREP, special,
+ and already marked columns for copying][datatable_copyShared2], then
+ [marks columns not previously marked with their column
+ number][datatable_copyShared3], then finally [restores the
+ `TRUELENGTH`s for the columns that won't be
+ overwritten][datatable_copyShared4].
+ * The `SET_TRUELENGTH` call in `copySharedColumns` would fail if it
+ ever got an ALTREP column, but the only use of `copySharedColumns`
+ in `reorder` guards against those.
+
+The same solution as above, a hash table keyed by `SEXP` pointers, can
+be used here. It has the same downsides: memory must be allocated for
+the table, and, as with any hash table, $k$ lookups may take
+$\mathrm{O}(kn)$ time in the worst case.
+
+**Status** in `data.table`: not fixed yet.
+
+But there's more
+================
+
+Using `tools:::funAPI` together with the lists of symbols exported from
+R and imported by `data.table`, we can find a number of non-API entry
+points which `R CMD check` doesn't complain about yet:
+`r paste(paste0('', sort(DTnonAPI_yet), ''), collapse = ', ')`.
+
+`(SET_)ATTRIB`, `SET_OBJECT` {#ATTRIB-all}
+----------------------------
+
+`data.table` performs some direct operations on the attribute pairlists.
+Accessing attributes directly requires manually maintaining the object
+bit.
+
+> Use `getAttrib` for individual attributes. To test whether there are
+> any attributes use `ANY_ATTRIB`, added in R 4.5.0. Use `setAttrib` for
+> individual attributes, `DUPLICATE_ATTRIB` or
+> `SHALLOW_DUPLICATE_ATTRIB` for copying attributes from one object to
+> another. Use `CLEAR_ATTRIB` for removing all attributes, added in R
+> 4.5.0.
+
+-- [WRE 6.21.1][WRE_replacement_entrypoints]
+
+### Testing for presence of attributes
+
+`src/nafill.c` [checks][datatable_nafill_ATTRIB] whether the source
+object has any attributes before trying to copy them using
+`copyMostAttrib`.
+
+Problem:
+
+```c
+if (!isNull(ATTRIB(VECTOR_ELT(x, i))))
+ // ^^^^^^ non-API entry point
+```
+
+Solution:
+
+```c
+#if R_VERSION < R_Version(4, 5, 0)
+#define ANY_ATTRIB(x) (!isNull(ATTRIB(x)))
+#endif
+
+if (ANY_ATTRIB(VECTOR_ELT(x, i)))
+ // ^^^^^^^^^^ introduced in R-4.5
+```
+
+**Status** in `data.table`: not fixed yet. Will need to wait for R-4.5.0
+to be released with the new interface.
+
+### Iterating over all attributes
+
+* The code in `src/assign.c` needs to [iterate over all the attributes of
+ `attr(dt, 'index')`][datatable_assign_ATTRIB] in order to find indices
+ that use the given column.
+* The code in `src/dogroups.c` needs to [iterate over all attributes of
+ a column][datatable_dogroups_ATTRIB] in case a reference to the value
+ of a special symbol has been stashed there and must be duplicated.
+
+Without `ATTRIB`, this will only be possible using an R-level call to
+`attributes()`. While the indices could be changed to use a different data
+structure (a named `VECSXP` list?), necessitating an update step for
+`data.table`s loaded from storage, the code in `src/dogroups.c` cannot
+avoid having to see all the attributes.
+
+**Status** in `data.table`: no idea how to fix yet.
+
+### Raw `c(NA, n)` row names
+
+The code in `src/dogroups.c` needs to [access the raw `rownames`
+attribute][datatable_dogroups_rownames] of a `data.table`, even if it's
+in the compact form as a 2-element integer vector starting with `NA`.
+The `getAttrib` function has a special case for the `R_RowNamesSymbol`,
+which returns an ALTREP representation of this attribute.
+
+`data.table` needs this access in order to [temporarily
+overwrite][datatable_dogroups_rownames2] the `rownames` attribute for
+the specially-prepared subset `data.table` named `.SD` (which has a
+different number of rows and therefore needs different `rownames`).
+Creating a full-sized `rownames` attribute instead of its compact form
+would take more time than desirable.
+
+**Status** in `data.table`: no idea how to fix yet.
+
+### Direct transplantation of attributes
+
+The code in `src/dogroups.c` needs to
+[transplant][datatable_dogroups_SETATTR] the attributes from one object
+to another without duplicating them, even shallowly.
+`SHALLOW_DUPLICATE_ATTRIB` could work as a replacement, but with worse
+performance because it would waste time copying attributes from an
+object that is about to be discarded.
+
+**Status** in `data.table`: no idea how to fix yet.
+
+`findVar`
+---------
+
+[Used in `dogroups`][datatable_dogroups_findVar] to look up the
+pre-created variables corresponding to the special symbols `.SDall`,
+`.SD`, `.N`, `.GRP`, `.iSD`, `.xSD` in their environment.
+
+> The functions `findVar` and `findVarInFrame` have been used in a
+> number of packages but are too low level to be part of the API. For
+> most uses the functions `R_getVar` and `R_getVarEx` added in R 4.5.0
+> will be sufficient. These are analogous to the R functions `get` and
+> `get0`.
+
+-- [WRE 6.21.7]
+
+The new function `R_getVar` is different in that it will never return a
+`PROMSXP` (promises are an internal implementation detail) or
+`R_UnboundValue`, but the current code doesn't handle either case
+anyway.
+
+Example of the problem:
+
+```c
+SEXP SD = PROTECT(findVar(install(".SD"), env));
+ // ^^^^^^^ non-API function
+```
+
+Solution:
+
+
+```c
+#if R_VERSION < R_Version(4, 5, 0)
+#define R_getVar(sym, rho, inherits) \
+ ((inherits) ? findVar((sym), (rho)) : findVarInFrame((rho), (sym)))
+#endif
+
+SEXP SD = PROTECT(R_getVar(install(".SD"), env, TRUE));
+ // ^^^^^^^^ introduced in R-4.5
+```
+
+**Status** in `data.table`: not fixed yet. Will need to wait for R-4.5.0
+to be released with the new interface.
+
+`GetOption`
+-----------
+
+Used in `src/rbindlist.c` to read the
+[`datatable.rbindlist.check`][datatable_rbindlist_getoption] option,
+`src/freadR.c` to read the
+[`datatable.old.fread.datetime.character`][datatable_freadR_getoption]
+option, `src/init.c` to read the
+[`datatable.verbose`][datatable_init_getoption] option, `src/forder.c`
+to get the [`datatable.use.index` and
+`datatable.forder.auto.index`][datatable_forder_getoption] options, and
+`src/subset.c` to read the
+[`datatable.alloccol`][datatable_subset_getoption] option.
+
+> Use `GetOption1`.
+
+-- [WRE 6.21.1][WRE_replacement_entrypoints]
+
+The difference is that `GetOption1` doesn't take a second argument
+`rho`, which `GetOption` has been ignoring anyway.
+
+Example of the problem:
+
+```c
+SEXP opt = GetOption(install("datatable.use.index"), R_NilValue);
+ // ^^^^^^^^^ non-API function
+```
+
+Solution:
+
+```c
+SEXP opt = GetOption1(install("datatable.use.index"));
+ // ^^^^^^^^^^ API function introduced in R-2.13
+```
+
+**Status** in `data.table`: not fixed yet.
+
+Testing for a `data.frame`: `isFrame`
+-------------------------------------
+
+Back in 2012, Matt Dowle needed to quickly test an object for being a
+`data.frame`, and the undocumented function `isFrame` seemed like it
+[did the job][datatable_isframe_added]. Since `isFrame` was not part of
+the documented interface, in 2024 Luke Tierney gave the function a
+better-fitting name, [`isDataFrame`][R_isdataframe_added], and made it
+an experimental entry point, while retaining the original function as a
+wrapper.
+
+Use of `isFrame` [doesn't give a `NOTE` yet][remove_isframe], but when
+R-4.5.0 is released together with the new name for the function,
+`data.table` will be able to use it, falling back to `isFrame` on older
+versions of R. `isDataFrame` is documented among other [replacement
+entry point names][WRE_replacement_entrypoints] in Writing R Extensions.
+
+Problem (the only instance in `data.table`):
+
+```c
+if (!isVector(thiscol) || isFrame(thiscol))
+ // ^^^^^^^ may disappear in a future R version
+```
+
+Solution:
+
+```c
+#if R_VERSION < R_Version(4, 5, 0)
+// R versions older than 4.5.0 use the old name of the function
+#define isDataFrame(x) (isFrame(x))
+#endif
+
+// later:
+if (!isVector(thiscol) || isDataFrame(thiscol))
+ // ^^^^^^^^^^^ introduced in R-4.5
+```
+
+**Status** in `data.table`: change reverted in [#6244][remove_isframe],
+waiting for R-4.5.0 to release with the new interface.
+
+`OBJECT`
+--------
+
+Used in `src/assign.c` to [test whether S3 dispatch is possible on an
+object][datatable_assign_OBJECT] before spending CPU time on
+constructing and evaluating an R-level call to `as.character` instead of
+`coerceVector`.
+
+> Use `isObject`.
+
+-- [WRE 6.21.1][WRE_replacement_entrypoints]
+
+Problem:
+```c
+if (OBJECT(source) && getAttrib(source, R_ClassSymbol)!=R_NilValue) {
+ // ^^^^^^ non-API entry point
+```
+
+Solution:
+```c
+if (isObject(source)) {
+ // ^^^^^^^^ API entry point
+```
+
+Most likely, the check for `getAttrib(source, R_ClassSymbol)` is
+superfluous, because when used correctly, the R API keeps the object bit
+set only when the `class` attribute is non-empty.
+
+**Status** in `data.table`: not fixed yet.
+
+Conclusion
+==========
+
+While `data.table` could get rid of most of its non-API use with
+relative ease, either using a different name for the function
+(`STRING_PTR_RO`, `GetOption1`) or adding a wrapper for R < 4.5
+(`ANY_ATTRIB`, `R_getVar`), two interfaces will require a significant
+amount of work.
+
+Replacing the use of `TRUELENGTH` and related functions will require
+implementing two features from scratch: a set of ALTREP classes for
+growable vectors (with the previous implementation hidden in `#ifdef`
+for R < 4.3) and pointer-keyed hash tables for string and column
+marking.
+
+It is not currently clear how to replace the use of `ATTRIB`.
+
+References
+==========
+
+[is.R]: https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2024/03/08#n2024-03-09
+[WRE]: https://cran.r-project.org/doc/manuals/R-exts.html
+[CRANpolicy]: https://cran.r-project.org/web/packages/policies.html
+[WRE33API]: https://web.archive.org/web/20160609093632/https://cran.r-project.org/doc/manuals/R-exts.html#The-R-API
+[ltierney_serialize]: https://homepage.divms.uiowa.edu/~luke/R/serialize/serialize.html
+[WRE45serialize]: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Custom-serialization-input-and-output
+[digest]: https://cran.r-project.org/package=digest
+[WRE33wilcox]: https://web.archive.org/web/20160609093632/https://cran.r-project.org/doc/manuals/R-exts.html#Distribution-functions
+[wilcox_declared]: https://github.com/r-devel/r-svn/commit/1638b0106279aa1944b17742054bc6882656596e
+[wilcox_api]: https://github.com/r-devel/r-svn/commit/32ea1f67f842e3247f782a91684023b0b5eec6c5
+[ALTREPnonAPI]: https://stat.ethz.ch/pipermail/r-devel/2024-April/083339.html
+[ALTREP]: https://svn.r-project.org/R/branches/ALTREP/ALTREP.html
+[Rd200210]: https://stat.ethz.ch/pipermail/r-devel/2002-October/thread.html
+[Rd200510]: https://stat.ethz.ch/pipermail/r-devel/2005-October/thread.html
+[Rd201905]: https://stat.ethz.ch/pipermail/r-devel/2019-May/thread.html
+[clarifyingAPI]: https://stat.ethz.ch/pipermail/r-devel/2024-June/083449.html
+[remove_non_API]: https://github.com/Rdatatable/data.table/issues/6180
+[setOldClass]: https://search.r-project.org/R/refmans/methods/html/setOldClass.html
+[RI_S4rep]: https://cran.r-project.org/doc/manuals/R-ints.html#Representation-of-S4-objects
+[IS_S4_OBJECT]: https://github.com/r-devel/r-svn/blob/c20ebd2d417d9ebb915e32bfb0bfdad768f9a80a/src/main/memory.c#L4033-L4035
+[isS4]: https://github.com/r-devel/r-svn/blob/c20ebd2d417d9ebb915e32bfb0bfdad768f9a80a/src/main/objects.c#L1838-L1841
+[asS4]: https://github.com/r-devel/r-svn/blob/c20ebd2d417d9ebb915e32bfb0bfdad768f9a80a/src/main/objects.c#L1843
+[datatable_assign_shallow_S4]: https://github.com/Rdatatable/data.table/blob/a2e20d6cab0bc3cd00f8e47d10603e8c04c89759/src/assign.c#L156
+[datatable_dogroups_keepattr_S4]: https://github.com/Rdatatable/data.table/blob/a2213177283f0f15823e1ff823c1fdf63746da3d/src/dogroups.c#L485
+[datatable_assign_SHALLOW_ATTRIB]: https://github.com/Rdatatable/data.table/commit/f952062030e6657bef83de2748c65120990031c1
+[datatable_dogroups_grow_keepattr]: https://github.com/Rdatatable/data.table/blob/a2213177283f0f15823e1ff823c1fdf63746da3d/src/dogroups.c#L522
+[remove_set_s4_object]: https://github.com/Rdatatable/data.table/pull/6183
+[#6264]: https://github.com/Rdatatable/data.table/pull/6264
+[call]: https://search.r-project.org/R/refmans/base/html/call.html
+[datatable_rbindlist_eval]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/rbindlist.c#L237
+[WRE_call]: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Creating-call-expressions
+[remove_set_typeof]: https://github.com/Rdatatable/data.table/pull/6313
+[RI17]: https://cran.r-project.org/doc/manuals/R-ints.html#The-write-barrier
+[Tierney_gengc]: https://homepage.stat.uiowa.edu/~luke/R/gengcnotes.html
+[Tierney_writebr]: https://homepage.stat.uiowa.edu/~luke/R/barrier.html
+[remove_string_ptr]: https://github.com/Rdatatable/data.table/pull/6312
+[PR18775]: https://bugs.r-project.org/show_bug.cgi?id=18775
+[Tierney_refcnt]: https://developer.r-project.org/Refcnt.html
+[Rnews_setnamed]: https://developer.r-project.org/blosxom.cgi/R-devel/NEWS/2017/09/02#n2017-09-03
+[remove_named]: https://github.com/Rdatatable/data.table/pull/6420/files#diff-22b103646a1efab9bbfc374791ccfc3fd1422eefc48918a3e126fc2f30d1f572L552
+[LEVELS_macro]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L228
+[LEVELS_function]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/memory.c#L3902
+[LEVELS_field]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L132
+[RI112]: https://cran.r-project.org/doc/manuals/R-ints.html#Rest-of-header
+[gp_for_match1]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/match.c#L175
+[gp_for_match2]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/match.c#L233-L236
+[gp_for_match3]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/unique.c#L53
+[gp_for_gc]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/memory.c#L151-L155
+[gp_for_finalize]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/memory.c#L1364-L1374
+[gp_for_calling]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/errors.c#L1660-L1665
+[gp_for_assignment]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L280-L324
+[gp_for_s4]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L359-L362
+[gp_for_jit]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L364-L371
+[gp_for_growable]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L373-L377
+[gp_for_missing]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L449-L456
+[gp_for_missing2]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/eval.c#L2260-L2281
+[gp_for_ddval]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L519-L523
+[Rhelp_dots]: https://search.r-project.org/R/refmans/base/html/dots.html
+[gp_for_env]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L529-L530
+[envflags_locked]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/envir.c#L106-L108
+[envflags_global]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/envir.c#L613-L655
+[gp_for_hashash]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L1182-L1186
+[hashash2]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/envir.c#L517-L520
+[gp_for_active]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L1205-L1210
+[active_binding]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/envir.c#L3466-L3483
+[gp_for_basesym]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L1225-L1228
+[basesym2]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/main/envir.c#L754-L768
+[gp_for_special]: https://github.com/r-devel/r-svn/blob/2753df314f7d8e154bc42b5abd99daaf6472dbe1/src/include/Defn.h#L1230-L1236
+[specialsym2]: https://github.com/r-devel/r-svn/blob/2753df314f7d8e154bc42b5abd99daaf6472dbe1/src/main/names.c#L1019-L1046
+[gp_for_promsxp]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L1165-L1166
+[gp_for_charsxp]: https://github.com/r-devel/r-svn/blob/c9437a83b9677074fe01310caac6a2a66cc7f680/src/include/Defn.h#L843-L853
+[R_Encoding]: https://search.r-project.org/R/refmans/base/html/Encoding.html
+[WRE_Encoding]: https://cran.r-project.org/doc/manuals/R-exts.html#Character-encoding-issues
+[R_SET_ASCII]: https://github.com/r-devel/r-svn/blob/2753df314f7d8e154bc42b5abd99daaf6472dbe1/src/main/envir.c#L4312-L4375
+[datatable_isencoded]: https://github.com/Rdatatable/data.table/blob/40ad2e6978202ecc626db9eaae3a18ed5e4df769/src/data.table.h#L36-L38
+[datatable_needUTF8]: https://github.com/Rdatatable/data.table/blob/40ad2e6978202ecc626db9eaae3a18ed5e4df769/src/data.table.h#L63-L73
+[datatable_ENCODED_CHAR]: https://github.com/Rdatatable/data.table/blob/40ad2e6978202ecc626db9eaae3a18ed5e4df769/src/fwriteR.c#L8-L12
+[datatable_anynotascii]: https://github.com/Rdatatable/data.table/blob/40ad2e6978202ecc626db9eaae3a18ed5e4df769/src/forder.c#L312-L331
+[datatable_levels1]: https://github.com/Rdatatable/data.table/pull/6420/commits/46dbfa93e72776c59dacb286de9831fa28c481b5#diff-3b83136e49e2df4f5df80b312d7d4199fed9e0d283401dbf7bd9159a5096bcaaL36
+[remove_levels]: https://github.com/Rdatatable/data.table/pull/6422/commits/72cbd170fd16844dd8094b8d049d2e56d0926d22
+[news173]: https://github.com/Rdatatable/data.table/blob/6a15f8617de121a406cee97b22e83e0c2c4bb034/NEWS.0.md#new-features-13
+[datatable_overallocation]: https://github.com/Rdatatable/data.table/commit/e09d91beccc862eebcd9497c27b422058320396b#diff-22b103646a1efab9bbfc374791ccfc3fd1422eefc48918a3e126fc2f30d1f572R262-R276
+[datatable_logo]: https://raw.githubusercontent.com/Rdatatable/data.table/master/.graphics/logo.png
+[datatable_stretch_column]: https://github.com/Rdatatable/data.table/commit/b4e023df736fed8c4dc536ac0061e895a565b375#diff-697a3094ef3d287d25b94aa344f7ed0262aa3fdb97af9b7e04e3b0ef585b05bcR30-R56
+[RI113]: https://cran.r-project.org/doc/manuals/R-ints.html#The-_0027data_0027
+[R_truelength]: https://github.com/r-devel/r-svn/commit/2d4ae2c4bd593bc2aa2273076997b6e63bbcb782
+[R_hashvalue]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/include/Defn.h#L1184-L1187
+[R_install_truelen]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/main/names.c#L1256-L1272
+[R_serialize_hash]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/main/serialize.c#L617-L634
+[R_saveload_hash]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/main/saveload.c#L807-L834
+[R_envir_hashpri]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/main/envir.c#L193-L207
+[R_envir_hashval]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/main/envir.c#L497-L520
+[R_radixsort]: https://github.com/r-devel/r-svn/commit/4907092c953bd0b9c059474f77e40990ecf312b1
+[R_growable]: https://github.com/r-devel/r-svn/commit/287b8316232aea7c619d0cadcb515507b1e3ebfa
+[R_altrep_set_truelen]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/include/Defn.h#L391
+[R_altrep_truelen]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/main/altrep.c#L345
+[datatable_init_testtl]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/init.c#L206
+[datatable_docols_SD]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L197
+[datatable_docols_I]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L230-L237
+[datatable_docols_restore]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L482-L485
+[datatable_docols_extend]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L318-L324
+[datatable_freadR_truncate]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/freadR.c#L536-L538
+[datatable_freadR_settl]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/freadR.c#L519
+[datatable_freadR_drop]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/freadR.c#L551-L552
+[datatable_subset_alloc]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/subset.c#L300-L334
+[datatable_assign_shallow]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L192-L196
+[datatable_assign_create]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L535-L536
+[datatable_assign_remove]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L733-L734
+[datatable_assign_finalizer]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L21
+[R_duplicate_truelength]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/main/duplicate.c#L43-L81
+[datatable_assign_selfref]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L27-L63
+[datatable_assign_selfrefok]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L99-L138
+[R_memory_getVecSize]: https://github.com/r-devel/r-svn/blob/04a3b015e7d20598f66954b88ae2d39068451494/src/main/memory.c#L1108-L1109
+[R_PR17620]: https://bugs.r-project.org/show_bug.cgi?id=17620
+[Rapi_altrep_methods]: https://aitap.codeberg.page/R-api/#R_005fset_005faltrep_005f_002e_002e_002e_005fmethod
+[Tierney_mutable]: https://github.com/ALTREP-examples/Rpkg-mutable/blob/master/vignettes/mutable.Rmd
+[Rapi_altvec_methods]: https://aitap.codeberg.page/R-api/#R_005fset_005faltvec_005f_002e_002e_002e_005fmethod
+[Rapi_altstring_methods]: https://aitap.codeberg.page/R-api/#R_005fmake_005faltstring_005fclass
+[Rapi_altlist_methods]: https://aitap.codeberg.page/R-api/#R_005fmake_005faltlist_005fclass
+[Rapi_altinteger]: https://aitap.codeberg.page/R-api/#R_005fmake_005faltinteger_005fclass
+[Rapi_altlogical]: https://aitap.codeberg.page/R-api/#R_005fmake_005faltlogical_005fclass
+[Rapi_altreal]: https://aitap.codeberg.page/R-api/#R_005fmake_005faltreal_005fclass
+[Rapi_altcomplex]: https://aitap.codeberg.page/R-api/#R_005fmake_005faltcomplex_005fclass
+[Rapi_altraw]: https://aitap.codeberg.page/R-api/#R_005fmake_005faltraw_005fclass
+[Rapi_new_altrep]: https://aitap.codeberg.page/R-api/#R_005fnew_005faltrep
+[Rapi_altrep_inherits]: https://aitap.codeberg.page/R-api/#index-R_005faltrep_005finheritsaltrep_005finherits
+[datatable_assign_savetl]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L1274-L1328
+[RI110]: https://cran.r-project.org/doc/manuals/R-ints.html#The-CHARSXP-cache
+[datatable_assign_memrecycle]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L833-L867
+[datatable_rbindlist_matchcolumns]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/rbindlist.c#L70-L179
+[datatable_rbindlist_matchfactors]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/rbindlist.c#L367-L516
+[datatable_forder_range_str]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/forder.c#L295-L383
+[datatable_forder_truelen]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/forder.c#L769
+[datatable_forder_free_ustr]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/forder.c#L75
+[datatable_chmatch_savetl]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/chmatch.c#L58-L64
+[datatable_chmatch_settl]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/chmatch.c#L78-L80
+[datatable_chmatch_cleanup1]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/chmatch.c#L103
+[datatable_chmatch_lookup]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/chmatch.c#L108-L130
+[datatable_chmatch_cleanup2]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/chmatch.c#L135-L136
+[datatable_fmelt_truelen]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/utils.c#L273
+[Wellons_hashptr]: https://nullprogram.com/blog/2016/05/30/
+[R_unique_PTRHASH]: https://github.com/r-devel/r-svn/blob/3713345283787c928e563cdcdf01cc4a9dc1c708/src/main/unique.c#L185-L208
+[cppreference_unordered_map]: https://en.cppreference.com/w/cpp/container/unordered_map
+[uthash]: https://troydhanson.github.io/uthash/
+[datatable_dogroups_setlen-1]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L105-L152
+[datatable_dogroups_anyspecialstatic]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L6-L64
+[datatable_copyShared1]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/utils.c#L260-L261
+[datatable_copyShared2]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/utils.c#L266-L267
+[datatable_copyShared3]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/utils.c#L273
+[datatable_copyShared4]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/utils.c#L273
+[datatable_nafill_ATTRIB]: https://github.com/Rdatatable/data.table/blob/546259ddaba0e8ab1506729113688f85ca2986fd/src/nafill.c#L216
+[datatable_assign_ATTRIB]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L618-L629
+[datatable_dogroups_ATTRIB]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L57-L58
+[datatable_dogroups_rownames]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L131-L134
+[datatable_dogroups_rownames2]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L195
+[datatable_dogroups_SETATTR]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L509-L515
+[datatable_dogroups_findVar]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/dogroups.c#L90-L118
+[WRE 6.21.7]: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Working-with-variable-bindings
+[datatable_rbindlist_getoption]: https://github.com/Rdatatable/data.table/blob/master/src/rbindlist.c#L231
+[datatable_freadR_getoption]: https://github.com/Rdatatable/data.table/blob/master/src/freadR.c#L132
+[datatable_init_getoption]: https://github.com/Rdatatable/data.table/blob/master/src/init.c#L331
+[datatable_forder_getoption]: https://github.com/Rdatatable/data.table/blob/master/src/forder.c#L1619-L1637
+[datatable_subset_getoption]: https://github.com/Rdatatable/data.table/blob/master/src/subset.c#L299
+[datatable_isframe_added]: https://github.com/Rdatatable/data.table/commit/87666e70ce1a69b28f0e92ec7504d80e3d53a824#diff-4fc47a9752ba4edfef0cabcc1958eda943545ad3859e48d498b0e3f87a9ae5aeR192
+[R_isdataframe_added]: https://github.com/r-devel/r-svn/commit/4ef83b9dc3c6874e774195d329cbb6c11a71c414
+[remove_isframe]: https://github.com/Rdatatable/data.table/issues/6244
+[WRE_replacement_entrypoints]: https://cran.r-project.org/doc/manuals/r-devel/R-exts.html#Some-API-replacements-for-non_002dAPI-entry-points
+[datatable_assign_OBJECT]: https://github.com/Rdatatable/data.table/blob/03c647f9a44710aad834c0718e0b34e8c5341bf1/src/assign.c#L1158
diff --git a/posts/2024-12-12-non-api-use/langsxp.pikchr b/posts/2024-12-12-non-api-use/langsxp.pikchr
new file mode 100644
index 00000000..14831a63
--- /dev/null
+++ b/posts/2024-12-12-non-api-use/langsxp.pikchr
@@ -0,0 +1,21 @@
+Head: ellipse "LANGSXP" fit
+ellipse "SYMSXP" "print" fit with .ne at Head.sw + (-.3in, -.3in)
+arrow <- from last ellipse.ne to Head.sw "CAR" above aligned
+
+ellipse "NILSXP" fit with .n at Head.s + (0,-.3in)
+arrow from Head.s to last ellipse.n "TAG" above aligned
+
+Arg1: ellipse "LISTSXP" fit with .nw at Head.se + (.3in, -.3in)
+arrow -> from Head.se to Arg1.nw "CDR" above aligned
+
+ellipse "INTSXP" "42" fit with .ne at Arg1.sw + (-.3in, -.3in)
+arrow <- from last ellipse.ne to Arg1.sw "CAR" above aligned
+
+ellipse "SYMSXP" "x" fit with .n at Arg1.s + (0,-.42in)
+arrow from Arg1.s to last ellipse.n "TAG" above aligned
+
+ellipse "NILSXP" fit with .nw at Arg1.se + (.3in, -.3in)
+arrow -> from Arg1.se to last ellipse.nw "CDR" above aligned
+
+"SEXP call" mono with .s at Head.n + (0,.2in)
+arrow <- from Head.n to last text.s
diff --git a/posts/2024-12-12-non-api-use/langsxp.svg b/posts/2024-12-12-non-api-use/langsxp.svg
new file mode 100644
index 00000000..292db794
--- /dev/null
+++ b/posts/2024-12-12-non-api-use/langsxp.svg
@@ -0,0 +1,41 @@
+
+
diff --git a/posts/2024-12-12-non-api-use/precomputed.R b/posts/2024-12-12-non-api-use/precomputed.R
new file mode 100644
index 00000000..fa1bcecb
--- /dev/null
+++ b/posts/2024-12-12-non-api-use/precomputed.R
@@ -0,0 +1,82 @@
+library(data.table)
+
+# The results are not reproducible because they depend on both the R-devel
+# version and the data.table-git version, hence the pre-computation.
+
+symbols <- fread(
+ # most likely implies R on GNU/Linux built with --enable-R-shlib
+ paste('nm -gDP', file.path(R.home('lib'), 'libR.so')),
+ fill = TRUE, col.names = c('name', 'type', 'value', 'size')
+)[
+ type %in% c('B', 'D', 'R', 'T') # don't care about [weak] imports
+][,
+ type := fcase(
+ type == 'B', 'variable',
+ type == 'D', 'data',
+ type == 'R', 'read-only data',
+ type == 'T', 'function'
+ )
+][]
+
+DTsymbols <- fread(
+ # again, only tested on GNU/Linux
+ paste('nm -gDP', system.file(
+ file.path('libs', 'data_table.so'), package = 'data.table'
+ )),
+ fill = TRUE, col.names = c('name', 'type', 'value', 'size')
+)[type %in% c('U', 'w')][,
+ type := fcase(
+ type == 'U', 'undefined',
+ type == 'w', 'weak'
+ )
+][,
+ name := sub('@.*', '', name)
+][]
+
+# this is entirely dependent on late-2024 tools:::{funAPI,nonAPI}
+setdiff(
+ # symbols exported by R and imported by data.table...
+ intersect(symbols$name, DTsymbols$name) |>
+ tools:::unmap(), # renamed according to how R API entry points are named
+ # except those listed among API entry points
+ tools:::funAPI()$name |> tools:::unmap()
+) |> setdiff(
+ # and also skip variables because they are omitted in funAPI
+ symbols[type == 'variable', name]
+) -> DTnonAPI
+# which ones does R CMD check _not_ complain about... yet?
+DTnonAPI_yet <- setdiff(DTnonAPI, tools:::nonAPI)
+
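+# History of tools:::nonAPI: parse sotools.R from the given release branch (or
+# an explicit URL) and return the values from the first top-level
+# `nonAPI <- ...` assignment found there, combined with c()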
+getNonAPI <- function(ver,
+ url = sprintf(
+ "https://svn.r-project.org/R/branches/R-%s-branch/src/library/tools/R/sotools.R",
+ ver
+ )
+) {
+ ee <- parse(text = readLines(url))
+ for (e in ee) {
+ if (
+ is.call(e) && length(e) == 3 &&
+ identical(e[[1]], quote(`<-`)) &&
+ identical(e[[2]], quote(`nonAPI`))
+ )
+ return(do.call(c, as.list(e[[3]])[-1]))
+ }
+}
+nonAPI.3_3 <- getNonAPI('3-3')
+nonAPI.4_4 <- getNonAPI('4-4')
+nonAPI.trunk <- getNonAPI(url = 'https://svn.r-project.org/R/trunk/src/library/tools/R/sotools.R')
+
+# CRAN package metadata and check results
+cpdb <- tools::CRAN_package_db()
+needscomp <- cpdb[,'NeedsCompilation'] == 'yes'
+checks <- tools::CRAN_check_details()
+dtchecks <- subset(checks, Package == 'data.table')
+
+when <- Sys.Date()
+save(
+ needscomp, dtchecks, symbols, nonAPI.3_3, nonAPI.4_4, nonAPI.trunk,
+ DTnonAPI, DTnonAPI_yet,
+ when, file = 'precomputed.rda', compress = 'xz'
+)
diff --git a/posts/2024-12-12-non-api-use/precomputed.rda b/posts/2024-12-12-non-api-use/precomputed.rda
new file mode 100644
index 00000000..9feb788e
Binary files /dev/null and b/posts/2024-12-12-non-api-use/precomputed.rda differ
diff --git a/posts/2024-12-12-non-api-use/refs.bib b/posts/2024-12-12-non-api-use/refs.bib
new file mode 100644
index 00000000..b51c71ee
--- /dev/null
+++ b/posts/2024-12-12-non-api-use/refs.bib
@@ -0,0 +1,55 @@
+@book{Becker1985,
+ address = {Monterey, Calif},
+ series = {The {Wadsworth} statistics/probability series},
+ title = {Extending the {S} system},
+ isbn = {978-0-534-05016-0},
+ language = {eng},
+ publisher = {Wadsworth},
+ author = {Becker, Richard A. and Chambers, John M.},
+ year = {1985},
+}
+@book{Chambers2016,
+ address = {Milton},
+ series = {Chapman \& {Hall} / {CRC} {The} {R} {Series}},
+ title = {Extending {R}},
+ isbn = {978-1-4987-7572-4 978-1-4987-7571-7},
+ language = {eng},
+ publisher = {CRC Press},
+ author = {Chambers, John M.},
+ year = {2016},
+}
+@article{Nash2024,
+ author = {Nash, John C. and Bhattacharjee, Arkajyoti},
+ title = {A Comparison of {R} Tools for Nonlinear Least Squares Modeling},
+ journal = {The R Journal},
+ year = {2024},
+ doi = {10.32614/RJ-2023-091},
+ volume = {15},
+ number = {4},
+ issn = {2073-4859},
+ pages = {198--215}
+}
+@book{Jones2012,
+ address = {Boca Raton, FL},
+ series = {Applied algorithms and data structures series},
+ title = {The garbage collection handbook: the art of automatic memory management},
+ isbn = {978-1-4200-8279-1},
+ shorttitle = {The garbage collection handbook},
+ language = {eng},
+ publisher = {CRC Press},
+ author = {Jones, Richard and Hosking, Antony and Moss, Eliot},
+ year = {2012},
+ keywords = {Memory management (Computer science)},
+}
+@book{Cormen2009,
+ address = {Cambridge, Massachusetts; London, England},
+ edition = {Third edition},
+ title = {Introduction to algorithms},
+ isbn = {978-0-262-03384-8 978-0-262-27083-0},
+ language = {eng},
+ publisher = {MIT Press},
+ author = {Cormen, Thomas H. and Leiserson, Charles Eric and Rivest, Ronald Linn and Stein, Clifford},
+ year = {2009},
+}