Clustered data

I was trying to use simdata to generate data in which some of the variables are constant within clusters of fixed size. For my purposes the following function was sufficient, where `n_obs` is the number of observations, `cluster_size` is the number of observations per cluster, `cor_mat` is the correlation matrix, and `clustervar` is an integer such that `1:clustervar` are the indices of the variables that are constant within clusters:

```
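cmvtnorm <- function(n_obs=100, cluster_size=4, cor_mat, clustervar=0){
  if (n_obs %% cluster_size != 0) {
    stop("n_obs is not divisible by cluster_size")
  }
  # independent standard normal draws, one column per variable
  X <- matrix(rnorm(n_obs*ncol(cor_mat)), nrow=n_obs, ncol=ncol(cor_mat))
  if (clustervar>0) {
    # copy one draw per cluster to all rows of that cluster (consecutive rows form a cluster)
    X[, 1:clustervar] <- X[rep(1:(n_obs/cluster_size), each=cluster_size), 1:clustervar]
  }
  # chol() returns the upper triangular factor, so column j of the product depends
  # only on columns 1..j of X; the first clustervar columns stay constant within clusters
  chol_cor <- chol(cor_mat)
  X <- X %*% chol_cor
  return(X)
}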
  

cor_mat <- cor_from_upper(5,
                          rbind(c(1,2,0.5), c(1,3,0.5),
                                c(2,4,0.5), c(3,5,-0.3),
                                c(4,5,0.5) ))

test <- cmvtnorm(n_obs=100000, cluster_size=4, cor_mat=cor_mat, clustervar=3)

cor(test)
cor_mat

apply(test, 2, sd)
apply(test, 2, mean)
head(test, 20)
```
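The calls above check the correlations and marginal moments, but not the clustering itself. A minimal sketch of such a check follows, assuming (as in `cmvtnorm`) that each block of `cluster_size` consecutive rows forms one cluster. The helper `is_constant_within_clusters` is hypothetical and not part of simdata. To keep the block runnable without simdata, it repeats the generation steps inline with a small hand-written correlation matrix.

```r
# Hypothetical helper (not part of simdata): TRUE if every column in `cols`
# takes a single value within each block of `cluster_size` consecutive rows.
is_constant_within_clusters <- function(X, cluster_size, cols) {
  cluster_id <- rep(seq_len(nrow(X) / cluster_size), each = cluster_size)
  all(vapply(cols, function(j) {
    # the within-cluster range must have (numerically) zero width
    all(tapply(X[, j], cluster_id, function(v) diff(range(v))) < 1e-12)
  }, logical(1)))
}

set.seed(1)
# small positive definite correlation matrix, built in base R for illustration
cor_mat <- matrix(c(1.0, 0.5, 0.2,
                    0.5, 1.0, 0.3,
                    0.2, 0.3, 1.0), nrow = 3, byrow = TRUE)
n_obs <- 20
cluster_size <- 4
clustervar <- 2

# same steps as in cmvtnorm: draw, copy cluster-level draws, then correlate
Z <- matrix(rnorm(n_obs * ncol(cor_mat)), nrow = n_obs)
cluster_rows <- rep(seq_len(n_obs / cluster_size), each = cluster_size)
Z[, 1:clustervar] <- Z[cluster_rows, 1:clustervar]
X <- Z %*% chol(cor_mat)

is_constant_within_clusters(X, cluster_size, 1:clustervar)  # cluster-level columns
is_constant_within_clusters(X, cluster_size, 3)             # observation-level column
```

This only works because the cluster-level variables come first: with the upper triangular factor from `chol()`, any column placed after an observation-level variable would pick up within-cluster variation from it.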

Maybe this could be a nice extension?
