I was just trying to use simdata to generate data with some of the variables constant within clusters (of fixed size). For my purposes the following function was sufficient (where n_obs is the number of observations, cor_mat is the correlation matrix and clustervar is an integer such that 1: clustervar are the indices of the variables constant within clusters):
cmvtnorm <- function(n_obs=100, cluster_size=4, cor_mat, clustervar=0){
if (n_obs %% cluster_size != 0) {
stop("n_obs is not divisble by cluster_size")
}
X <- matrix(rnorm(n_obs*ncol(cor_mat)), nrow=n_obs, ncol=ncol(cor_mat))
if (clustervar>0) {
X[, 1:clustervar] <- X[rep(1:(n_obs/cluster_size), each=cluster_size), 1:clustervar]
}
chol_cor <- chol(cor_mat)
X <- X %*% chol_cor
return(X)
}
cor_mat <- cor_from_upper(5,
rbind(c(1,2,0.5), c(1,3,0.5),
c(2,4,0.5), c(3,5,-0.3),
c(4,5,0.5) ))
test <- cmvtnorm(n_obs=100000, cluster_size=4, cor_mat=cor_mat, clustervar=3)
cor(test)
cor_mat
apply(test, 2, sd)
apply(test, 2, mean)
head(test, 20)
I was just trying to use simdata to generate data with some of the variables constant within clusters (of fixed size). For my purposes the following function was sufficient (where n_obs is the number of observations, cor_mat is the correlation matrix and clustervar is an integer such that 1: clustervar are the indices of the variables constant within clusters):
Maybe this could be a nice extension?