.gitignore: 1 change (1 addition, 0 deletions)
@@ -5,6 +5,7 @@
.Rproj.user
.vscode/
*.bkp
*.log
check/
docs/
FastRet.Rproj
DESCRIPTION: 2 changes (1 addition, 1 deletion)
@@ -1,6 +1,6 @@
Package: FastRet
Title: Retention Time Prediction in Liquid Chromatography
Version: 1.3.0
Version: 1.3.5
Authors@R: c(
person(given = "Tobias", family = "Schmidt", role = c("aut", "cre", "cph"), email = "tobias.schmidt331@gmail.com", comment = c(ORCID = "0000-0001-9681-9253")),
person(given = "Christian", family = "Amesoeder", role = c("aut", "cph"), email = "christian-amesoeder@web.de", comment = c(ORCID = "0000-0002-1668-8351")),
NEWS.md: 57 changes (57 additions, 0 deletions)
@@ -1,4 +1,61 @@

# FastRet 1.3.5 <!-- Commit Date: 2026-01-13 -->

Bugfix:

1. `predict.frm()` now imputes missing/non-finite values not only for base model
predictors, but also for adjustment model predictors. This prevents `NA`
predictions if the adjustment model depends on predictors that are
missing/non-finite in the new data.
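The fix can be illustrated with a minimal base-R sketch. `impute_nonfinite()` is a hypothetical helper, not FastRet's internal function, and filling gaps with the mean of the finite training values is an assumption made for illustration; the point is only that imputation has to cover the union of base-model and adjustment-model predictors.

```r
# Hypothetical helper (not FastRet's internal function): replace missing or
# non-finite values in the `predictors` columns of `new_data`. Using the mean
# of the finite training values as fill value is an assumption.
impute_nonfinite <- function(new_data, train_data, predictors) {
    for (p in predictors) {
        x <- new_data[[p]]
        bad <- !is.finite(x)  # TRUE for NA, NaN, Inf and -Inf
        if (any(bad)) {
            ref <- train_data[[p]]
            x[bad] <- mean(ref[is.finite(ref)])
            new_data[[p]] <- x
        }
    }
    new_data
}

# Made-up data: "a" plays the base-model predictor, "b" the adjustment predictor
train <- data.frame(a = c(1, 2, 3), b = c(10, 20, 30))
new <- data.frame(a = c(NA, 5, Inf), b = c(NaN, 1, 2))

out <- impute_nonfinite(new, train, union("a", "b"))
out$a  # c(2, 5, 2)
out$b  # c(20, 1, 2)
```

Imputing only `"a"` would leave the `NaN` in `b` untouched, which mirrors the pre-1.3.5 behaviour described in the bugfix entry.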

Internal Improvements:

1. Consolidated the imputation logic in `predict.frm()` into a shared internal
helper.

# FastRet 1.3.4 <!-- Commit Date: 2026-01-12 -->

Internal Improvements:

1. `start_gui()` and `start_gui_in_devmode()` now rely on
`with(future::plan(...), local = TRUE)` so that temporary future plans are
restored automatically without manual bookkeeping.

# FastRet 1.3.3 <!-- Commit Date: 2026-01-11 -->

API Improvements:

1. `selective_measuring()` now accepts `"max"` and `"inf"` as additional
values for its `rt_coef` parameter:
   - `"max"` is an alias for the existing `"max_ridge_coef"`.
   - `"inf"` sets all chemical descriptor features to zero before clustering,
     so that RT alone drives the distance metric (i.e., RT is "infinitely"
     more important than the chemical descriptors).
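Why `"inf"` leaves RT as the only driver can be checked with base R's `dist()`, since `cluster::pam()` uses Euclidean distance by default: zeroed descriptor columns add nothing to the distances, so they equal the pairwise RT differences. The same argument explains why dropping the descriptors entirely, as the `R/sm.R` change does with `data.frame(RT = dfz$RT)`, is equivalent. The matrix and RT values below are made-up illustration data.

```r
set.seed(1)
# Made-up z-scored, beta-scaled descriptors for 5 molecules (4 descriptors)
Xzb <- matrix(rnorm(20), nrow = 5)
rt <- c(1.2, 3.4, 0.5, 2.2, 4.1)  # made-up standardized RTs

d_inf <- dist(cbind(RT = rt, Xzb * 0))  # rt_coef = "inf": descriptors zeroed
d_rt <- dist(rt)                        # RT alone
d_full <- dist(cbind(RT = rt, Xzb))     # descriptors still contribute

all.equal(as.vector(d_inf), as.vector(d_rt))  # TRUE
```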

# FastRet 1.3.2 <!-- Commit Date: 2026-01-11 -->

Bugfix:

1. `adjust_frm()` automatically switched to "lm" adjustment if `predictors`
contained only one predictor, regardless of whether `add_cds` was TRUE or
FALSE. This has been fixed: "lasso", "ridge" and "gbtree" adjustment is now
possible whenever `add_cds` is TRUE or `NULL`, or `predictors` contains more
than one predictor.
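The corrected rule reduces to a single condition. The predicate below is a hypothetical restatement of this entry, not code taken from `adjust_frm()`: plain "lm" adjustment is forced only when `add_cds` is `FALSE` and just one predictor is given. The predictor names `"RT"` and `"RT^2"` are made up for the example.

```r
# Hypothetical predicate (not part of FastRet): TRUE if only "lm" adjustment
# is possible under the rule described for FastRet 1.3.2.
lm_only <- function(add_cds, predictors) {
    flexible <- is.null(add_cds) || isTRUE(add_cds) || length(predictors) > 1
    !flexible
}

lm_only(add_cds = FALSE, predictors = "RT")             # TRUE
lm_only(add_cds = TRUE, predictors = "RT")              # FALSE
lm_only(add_cds = NULL, predictors = "RT")              # FALSE
lm_only(add_cds = FALSE, predictors = c("RT", "RT^2"))  # FALSE
```

Checking `is.null()` first keeps the condition well defined for `add_cds = NULL`, and `isTRUE()` avoids errors for unexpected non-logical values.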

# FastRet 1.3.1 <!-- Commit Date: 2026-01-11 -->

API Improvements:

1. Added arguments `match_rts` and `match_keys` to `adjust_frm()`:
   - If `match_rts=TRUE` (default), RTs are obtained by matching rows in
     `new_data` to rows in `frm$df` based on `match_keys`.
   - If `match_rts=FALSE`, RTs are obtained by applying the base model to
     `new_data`.
   - `match_keys` can be any combination of "INCHIKEY", "SMILES" and "NAME".
     If left at the default `NULL`, SMILES+INCHIKEY is used if both columns
     are present in both the adjusted and the original training data;
     otherwise, SMILES+NAME is used.
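The matching rules can be sketched in base R. Both helpers are hypothetical and only restate the entry above: `default_match_keys()` picks SMILES+INCHIKEY when both columns exist in both tables and otherwise falls back to SMILES+NAME, and `match_train_rts()` looks up the training RT for each new row via a composite key, returning `NA` where no training row matches. How `adjust_frm()` itself treats unmatched rows is not shown in this diff, and the example molecules and InChIKey values are made up.

```r
# Hypothetical helpers (not FastRet's API) restating the match_keys rules
default_match_keys <- function(new_data, train_df) {
    in_both <- function(col) col %in% colnames(new_data) && col %in% colnames(train_df)
    if (in_both("SMILES") && in_both("INCHIKEY")) c("SMILES", "INCHIKEY") else c("SMILES", "NAME")
}

match_train_rts <- function(new_data, train_df,
                            keys = default_match_keys(new_data, train_df)) {
    # One composite key per row, built by pasting the key columns together
    key_new <- do.call(paste, c(unname(as.list(new_data[keys])), sep = "\t"))
    key_old <- do.call(paste, c(unname(as.list(train_df[keys])), sep = "\t"))
    train_df$RT[match(key_new, key_old)]  # NA where no training row matches
}

# Made-up example data (InChIKeys shortened to placeholders)
train <- data.frame(NAME = c("a", "b", "c"), SMILES = c("C", "CC", "CCC"),
                    INCHIKEY = c("K1", "K2", "K3"), RT = c(1.5, 2.5, 3.5))
new <- data.frame(NAME = c("c", "a", "x"), SMILES = c("CCC", "C", "CCCC"),
                  INCHIKEY = c("K3", "K1", "K9"), RT = c(3.9, 1.7, 5.0))

default_match_keys(new, train)  # c("SMILES", "INCHIKEY")
match_train_rts(new, train)     # c(3.5, 1.5, NA)
```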

# FastRet 1.3.0 <!-- Commit Date: 2025-11-12 -->

API Improvements:
R/app.R: 6 changes (2 additions, 4 deletions)
@@ -49,8 +49,7 @@ start_gui <- function(port = 8080,
    # Use [start_gui_in_devmode()] for development
    catf("Checking CDK version")
    check_cdk_version()
    oldplan <- future::plan("multisession", workers = nw)
    on.exit(future::plan(oldplan), add = TRUE)
    with(future::plan("multisession", workers = nw), local = TRUE)
    catf("Starting FastRet GUI")
    app <- fastret_app(port, host, reload, nsw)
    shiny::runApp(app)
@@ -140,8 +139,7 @@ start_gui_in_devmode <- function(strategy = "sequential",
    ))

    catf("Initializing cluster")
    oldplan <- future::plan(strategy)
    on.exit(future::plan(oldplan), add = TRUE, after = FALSE)
    with(future::plan(strategy), local = TRUE)

    catf("Starting FastRet GUI in development mode")
    pkg_root <- dirname(system.file("DESCRIPTION", package = "FastRet"))
R/sm.R: 67 changes (45 additions, 22 deletions)
@@ -13,7 +13,7 @@
#' re-measurement. The selection process includes:
#'
#' 1. Generating chemical descriptors from the SMILES strings of the molecules.
#' These are the features used by [train_frm()] and [adjust_frm()].
#' These are the features used by [train_frm()] and [adjust_frm()].
#' 2. Standardizing chemical descriptors to have zero mean and unit variance.
#' 3. Training a Ridge Regression model with the standardized chemical
#' descriptors as features and the retention times as the target variable.
@@ -40,15 +40,17 @@
#'
#' @param rt_coef
#' Which coefficient to use for scaling RT before clustering. Options are:
#' - max_ridge_coef: scale with the maximum absolute coefficient obtained in
#' ridge regression. I.e., RT will have approximately the same weight as the
#' most important chemical descriptor.
#' - 1: do not scale RT any further, i.e., use standardized RT. The effect of
#' - `0`: exclude RT from the clustering.
#' - 'max' / 'max_ridge_coef': scale with the maximum absolute coefficient
#' obtained in ridge regression. I.e., RT will have approximately the same
#' weight as the most important chemical descriptor.
#' - `1`: do not scale RT any further, i.e., use standardized RT. The effect of
#' leaving RT unscaled is hard to predict, as the ridge coefficients
#' depend on the dataset. If the maximum absolute coefficient is much smaller
#' than 1, RT will dominate the clustering. If it is much larger than 1, RT
#' will have little influence on the clustering.
#' - 0: exclude RT from the clustering.
#' - 'inf': set all chemical descriptor values to zero, i.e., RT is "infinitely"
#' more important than any chemical descriptor.
#'
#' @return
#' A list containing the following elements:
@@ -77,6 +79,15 @@ selective_measuring <- function(raw_data,
rt_coef = "max_ridge_coef"
) {

    stopifnot(
        is.data.frame(raw_data),
        all(c("NAME", "SMILES", "RT") %in% colnames(raw_data)),
        is.numeric(k_cluster), length(k_cluster) == 1, !is.na(k_cluster), k_cluster >= 2,
        is.logical(verbose) || is.numeric(verbose),
        is.null(seed) || is.numeric(seed),
        is.numeric(rt_coef) || is.character(rt_coef), length(rt_coef) == 1, !is.na(rt_coef)
    )
    rt_coef <- try_as_numeric(rt_coef, fallback = rt_coef)
    logf <- if (verbose >= 1) catf else null

    logf("Starting Selective Measuring")
@@ -91,25 +102,31 @@ selective_measuring <- function(raw_data,
    df <- df[, nonmeta, drop = FALSE]
    dfz <- as.data.frame(scale(df)) # z-score standardized (mean 0, sd 1)

    logf("Training Ridge Regression model")
    Xz <- as.matrix(dfz[, colnames(dfz) != "RT", drop = FALSE])
    y <- as.numeric(dfz[, "RT"])
    model <- fit_glmnet(Xz, y, method = "ridge", seed = seed)
    if (rt_coef %in% c(Inf, "inf")) {
        logf("Setting CDs to zero because rt_coef is Inf")
        coefs <- rep(0, ncol(dfz) - 1)
        names(coefs) <- colnames(dfz)[colnames(dfz) != "RT"]
        model <- NULL
        dfzb <- data.frame(RT = dfz$RT)
    } else {
        logf("Training Ridge Regression model")
        Xz <- as.matrix(dfz[, colnames(dfz) != "RT", drop = FALSE])
        y <- as.numeric(dfz[, "RT"])
        model <- fit_glmnet(Xz, y, method = "ridge", seed = seed)

    logf("Scaling features by coefficients of Ridge Regression model")
    coef_mat <- glmnet::coef.glmnet(model) # (m+1) x 1 matrix
    coefs <- as.numeric(coef_mat)[-1] # remove intercept
    coefs <- setNames(coefs, rownames(coef_mat)[-1])
    coefs <- coefs[colnames(Xz)] # ensure correct order
    Xzb <- sweep(Xz, 2, coefs, `*`) # z-score standardized and beta-scaled
        logf("Scaling features by coefficients of Ridge Regression model")
        coef_mat <- glmnet::coef.glmnet(model) # (m+1) x 1 matrix
        coefs <- as.numeric(coef_mat)[-1] # remove intercept
        coefs <- setNames(coefs, rownames(coef_mat)[-1])
        coefs <- coefs[colnames(Xz)] # ensure correct order
        Xzb <- sweep(Xz, 2, coefs, `*`) # z-score standardized and beta-scaled

    logf("Scaling RT by %s before clustering", rt_coef)
    rtc <- if (rt_coef == "max_ridge_coef") {
        max(abs(coefs), na.rm = TRUE)
    } else {
        as.numeric(rt_coef)
        logf("Scaling RT by %s before clustering", rt_coef)
        rtc <- if (grepl("max", rt_coef)) max(abs(coefs), na.rm = TRUE)
        else if (is.numeric(rt_coef)) as.numeric(rt_coef)
        else stop("Invalid value for rt_coef: ", rt_coef)
        dfzb <- data.frame(RT = dfz$RT * rtc, Xzb)
    }
    dfzb <- data.frame(RT = dfz$RT * rtc, Xzb)

    logf("Applying PAM clustering")
    clobj <- cluster::pam(dfzb, k = as.numeric(k_cluster))
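Stripped of logging, the pipeline in this hunk (z-scoring, ridge regression, weighting each descriptor by its coefficient, scaling RT, PAM clustering) can be sketched in a few lines. This is a simplified stand-in, not FastRet's implementation: `select_medoids_sketch()` is hypothetical, it swaps the `fit_glmnet()` call for closed-form ridge regression with a fixed `lambda` (glmnet's penalty scaling and tuning differ), it only handles the `"max"` and numeric `rt_coef` cases, and treating the PAM medoids as the molecules selected for re-measurement is an assumption. `cluster` is a recommended package that ships with R.

```r
# Hypothetical sketch of the selective-measuring pipeline (not FastRet code)
select_medoids_sketch <- function(X, rt, k, lambda = 1, rt_coef = "max") {
    Xz <- scale(X)               # z-score every descriptor (mean 0, sd 1)
    y <- as.numeric(scale(rt))   # z-score the retention times
    # Closed-form ridge regression; no intercept because y and Xz are centered
    beta <- as.numeric(solve(crossprod(Xz) + lambda * diag(ncol(Xz)),
                             crossprod(Xz, y)))
    Xzb <- sweep(Xz, 2, beta, `*`)  # weight each descriptor by its coefficient
    rtc <- if (identical(rt_coef, "max")) max(abs(beta)) else as.numeric(rt_coef)
    clobj <- cluster::pam(cbind(RT = y * rtc, Xzb), k = k)
    clobj$id.med  # row indices of the k medoids
}

set.seed(42)
X <- matrix(rnorm(30 * 4), nrow = 30)   # made-up descriptors
rt <- 2 * X[, 1] + rnorm(30, sd = 0.1)  # made-up RTs
selected <- select_medoids_sketch(X, rt, k = 3)
```

With `rt_coef = "max"`, RT gets the same weight as the descriptor with the largest absolute coefficient, matching the roxygen description of `max_ridge_coef`.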
@@ -124,3 +141,9 @@ selective_measuring <- function(raw_data,
    )
    ret
}

try_as_numeric <- function(x, fallback = x) {
    if (length(x) != 1) stop("scalar expected")
    y <- suppressWarnings(as.numeric(x))
    if (!is.na(y)) y else fallback
}
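`try_as_numeric()` is what lets a string such as `"inf"` reach the `rt_coef %in% c(Inf, "inf")` check as the number `Inf`: `as.numeric()` parses `"inf"` as infinity, while non-numeric strings such as `"max_ridge_coef"` fall back unchanged. The calls below exercise the helper as it is defined in the diff; it is copied so the snippet runs on its own.

```r
# Copy of the helper added in R/sm.R, so this snippet is self-contained
try_as_numeric <- function(x, fallback = x) {
    if (length(x) != 1) stop("scalar expected")
    y <- suppressWarnings(as.numeric(x))
    if (!is.na(y)) y else fallback
}

try_as_numeric("2")               # 2 (numeric)
try_as_numeric("inf")             # Inf
try_as_numeric("max_ridge_coef")  # "max_ridge_coef" (fallback)
try_as_numeric(0)                 # 0

# c() coerces Inf to the string "Inf", so %in% still matches numeric Inf
Inf %in% c(Inf, "inf")            # TRUE
```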