
Need to have a faster implementation of getting deviances for splicing and stability events/features #2

@Arshammik

Main

In the current pipeline, we fit a GLM with a binomial family, modeling m1 as successes and m2 as failures. Here is the original code we have been using.

# The glm function: fits an intercept-only binomial GLM for one event and
# returns its residual deviance. Relies on temp_X, m1_indexes and m2_indexes
# being defined in the enclosing environment.
.glm_fit <- function(nz_indx, nz_val) {

  # Rebuild the dense count vector from its sparse (index, value) form
  temp_m1_m2 <- rep(0, ncol(temp_X))
  temp_m1_m2[nz_indx] <- nz_val

  fit <- tryCatch({
    glm(cbind(temp_m1_m2[m1_indexes], temp_m1_m2[m2_indexes]) ~ 1, family = "binomial")
  }, error = function(e) {
    message("Error in glm: ", e$message, "; possibly due to non-convergence, using zero as the deviance for this case.")
    return(NULL)
  })
  if (is.null(fit)) {
    return(0)  # Return 0 if glm failed
  }
  return(fit$deviance)
}
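
Because the model in `.glm_fit` is intercept-only, its fitted probability and residual deviance can be written in closed form, which is likely all a faster implementation needs to compute. A sketch, writing $y_i$ for the m1 count and $n_i - y_i$ for the m2 count of observation $i$ (this notation is introduced here for illustration, it is not from the pipeline), with the convention $0 \log 0 = 0$:

```latex
\hat{p} = \frac{\sum_i y_i}{\sum_i n_i},
\qquad
D = 2 \sum_{i:\, n_i > 0} \left[
      y_i \log\frac{y_i}{n_i \hat{p}}
      + (n_i - y_i) \log\frac{n_i - y_i}{n_i (1 - \hat{p})}
    \right]
```

Observations with $n_i = 0$ contribute nothing to the sum. Agreement with `fit$deviance` should still be confirmed in the benchmark, especially for events where every m1 or every m2 count is zero.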

To do

  1. For now, since we only extract fit$deviance, we need to implement the deviance computation in a C++ script and benchmark it against the previous results.
  2. Then, create an R script and a matching C++ function using Rcpp to make getting the highly variable events easier. I suggest one function for splicing and one for stability.
  3. (Optional) If the implementation is successful, we can try to build our own native feature selection for gene expression instead of using Seurat's FindVariableFeatures with the vst method.
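
As a starting point for items 1 and 2, here is a minimal sketch of the deviance in plain C++. It assumes the closed form for an intercept-only binomial model: the fitted probability is sum(m1) / (sum(m1) + sum(m2)), and the deviance is twice the sum of the usual `x * log(x / mu)` terms, with `0 * log(0)` taken as 0. The function name `binomial_null_deviance` and the helper `xlog_ratio` are hypothetical, and the result has not been benchmarked against `fit$deviance`.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// x * log(x / mu), using the convention 0 * log(0) = 0.
static double xlog_ratio(double x, double mu) {
    return (x > 0.0) ? x * std::log(x / mu) : 0.0;
}

// Residual deviance of an intercept-only binomial model, with m1 as
// successes and m2 as failures (hypothetical helper, not pipeline code).
double binomial_null_deviance(const std::vector<double>& m1,
                              const std::vector<double>& m2) {
    assert(m1.size() == m2.size());

    double sum_m1 = 0.0, sum_m2 = 0.0;
    for (std::size_t i = 0; i < m1.size(); ++i) {
        sum_m1 += m1[i];
        sum_m2 += m2[i];
    }

    // Degenerate cases: with no counts, or with a fitted probability of
    // exactly 0 or 1, every term of the deviance is zero.
    if (sum_m1 <= 0.0 || sum_m2 <= 0.0) {
        return 0.0;
    }

    const double p_hat = sum_m1 / (sum_m1 + sum_m2);

    double dev = 0.0;
    for (std::size_t i = 0; i < m1.size(); ++i) {
        const double n_i = m1[i] + m2[i];
        if (n_i <= 0.0) continue;  // empty observations contribute nothing
        dev += xlog_ratio(m1[i], n_i * p_hat)
             + xlog_ratio(m2[i], n_i * (1.0 - p_hat));
    }
    return 2.0 * dev;
}
```

To call this from R, one option is to change the argument types to `Rcpp::NumericVector`, add `#include <Rcpp.h>` and a `// [[Rcpp::export]]` attribute above the function, and compile with `Rcpp::sourceCpp()`. The degenerate branch returns 0, which matches the fallback in `.glm_fit` when glm fails, but glm itself may return a tiny non-zero deviance there, so the benchmark should compare with a tolerance.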

Thanks!

Labels: enhancement
