This package offers a flexible and optimized framework for fitting parametric Gaussian process models to large datasets using the linear-cost Vecchia approximation. Several features set it apart from other libraries in the space:
- Because it is written in Julia, you can provide any covariance function you want (as opposed to being restricted to a list of pre-implemented models). See the sketch after this list for what that can look like.
- If autodiff works on your kernel function, it will work for the approximation (and have good parallel performance for gradients and Hessians!). See BesselK.jl for a ForwardDiff.jl-compatible Matern function, including the smoothness parameter.
- It is implemented in a way that allows chunking, where the fundamental unit in each small likelihood approximation can be a vector of measurements, not just a singleton.
- It offers fully arbitrary mean functions, not just linear ones.
- Because of how great the Julia optimization ecosystem is, you can hook into much more powerful optimizers than is possible in, for example, R.
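
To make the first point concrete, here is a minimal sketch of a hand-rolled kernel. The exponential covariance and the `(scale, range)` parameter layout are illustrative assumptions, not requirements of the package:

```julia
using StaticArrays, LinearAlgebra

# An exponential covariance with params = (scale, range). Leaving `params`
# untyped keeps the function generic, so ForwardDiff.jl dual numbers pass
# straight through when differentiating with respect to the parameters.
function expkernel(x::SVector{D,Float64}, y::SVector{D,Float64}, params) where{D}
  params[1]*exp(-norm(x-y)/params[2])
end
```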
See the example files for commented demonstrations.
If you have data `data` measured at locations `pts::Vector{SVector{D,Float64}}` and you want to fit the parameters of a covariance function `kernel(pt_1, pt_2, params)`, here is a very simple template for doing so. This code is optimized to scale very well with multiple threads, so please remember to start Julia with `julia -t $NTHREADS` among your other arguments.
```julia
using Vecchia

# See below, docstrings, and examples for details on specifying options.
# This object itself also can be used as a function to give the negative
# log-likelihood, so `appx(params)` (and autodiff of appx) is supported.
appx = VecchiaApproximation(pts, kernel, data)

# Load in some extensions for estimation. To be modular, this package doesn't
# hard-code this dependence. The syntax for using alternative solvers is
# demonstrated in the example files.
using ForwardDiff, NLPModels, UnoSolver

# Specify a solver and hand everything to vecchia_estimate. Optionally,
# you can substitute the expected Fisher information matrix in place of
# the true Hessian, which will be faster since it only requires first
# partial derivatives of the covariance function with respect to params.
solver = NLPModelsSolver(uno; preset="filtersqp")
mle = vecchia_estimate(appx, some_init, solver;
                       expected_fisher=true) # only for MeanZero() (for now!)

# Perhaps now you want to predict with your fitted model?
# (see the docstrings for options, but hopefully the defaults will serve you well!)
preds = predict(appx, prediction_pts, mle)

# With those predictions, you can get the conditional mean and conditional
# variances rapidly, and for small numbers of prediction points you can get
# the full dense conditional covariance matrix.
cond_mean  = conditional_mean(preds)
cond_vars  = conditional_variances(preds)
cond_covar = full_conditional_covariance(preds) # beware of runtime!
```

It's that easy! Enjoy your linear-cost (approximate) MLEs, predictions, conditional simulations, preconditioners, and more.
If you have a mean function, you have two options. You can write your functions as

```julia
covfun(x, y, params) = # uses params[1:whatever]
meanfun(x, params)   = # manually uses params[(whatever+1):end]
```

in which case you can just modify your approximation constructor with

```julia
appx = VecchiaApproximation(pts, kernel, data; meanfun=meanfun)
```

and hand everything to the optimizer and get a `Vector{Float64}` back as usual.
Alternatively, if you want to write your functions as

```julia
covfun(x, y, params) = # uses params[1:whatever_it_needs]
meanfun(x, params)   = # manually uses params[1:whatever_it_needs]
```

then you should additionally modify your estimation code with

```julia
fancy_init = Parameters(cov_init, mean_init)
mle = vecchia_estimate(appx, fancy_init, solver)
```

Now what comes back will be a `Vecchia.Parameters` object, which has separate fields `.cov_params` and `.mean_params`. You can also hand that object right to `predict` instead of a `Vector{Float64}` and everything will work as expected.
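
As a minimal concrete sketch of that second pattern (the exponential covariance, linear mean, and initial values here are illustrative assumptions, not package requirements):

```julia
using LinearAlgebra

# Covariance and mean parameters are indexed independently, since they
# arrive as separate fields of a Parameters object:
covfun(x, y, params) = params[1]*exp(-norm(x-y)/params[2])
meanfun(x, params)   = params[1] + params[2]*x[1]

appx = VecchiaApproximation(pts, covfun, data; meanfun=meanfun)
fancy_init = Parameters([1.0, 0.1], [0.0, 0.0])
mle = vecchia_estimate(appx, fancy_init, solver)
```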
While it is not necessary for evaluating the negative log-likelihood, in some settings it can be useful to use this approximation to produce a sparse approximation to the model-implied precision matrix. For this purpose, `rchol(appx, params)` gives a (permuted) sparse inverse Cholesky factor for your matrix. If `ordering=NoPermutation()` in your `VecchiaApproximation` constructor, then your precision matrix is approximated precisely by `U*U'`, where `U` is `UpperTriangular`. Build it like so:
```julia
using SparseArrays

# This object has some standard methods: mul!/*, ldiv!/\, logdet, etc.
U = rchol(appx, sample_p)
```
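
As a quick sketch of those standard methods in action (assuming `ordering=NoPermutation()`, so that the precision matrix is approximated by `U*U'`):

```julia
using LinearAlgebra

# logdet of the approximated precision matrix U*U' is twice the logdet of U:
precision_logdet = 2*logdet(U)
# The quadratic form data'*(U*U')*data, computed as ||U'*data||^2:
whitened  = U'*data
quad_form = dot(whitened, whitened)
```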
Additionally, this package offers the convenience constructor `rchol_preconditioner`, which gives a preconditioner object for your implied kernel matrix. Here is an example building the preconditioner and solving a linear system with Krylov.jl:
```julia
using Krylov

# Note that we don't need to pass the data in here when building the
# VecchiaApproximation, as we won't need it.
#
# For a preconditioner, I recommend cranking up the number of conditioning
# points a bit past the defaults, which are tuned for likelihoods.
appx = VecchiaApproximation(pts, matern; conditioning=KNNConditioning(25))
pre  = rchol_preconditioner(appx, [5.0, 0.1, 2.25])

# The dense exact covariance matrix for comparison that will be used for cg:
M = [matern(x, y, [5.0, 0.1, 2.25]) for x in pts, y in pts]
# A sample RHS:
v = collect(1.0:length(pts))

# A Vecchia preconditioner in action (try re-running this without the
# preconditioner and see how slowly it converges...):
sol2 = cg(Symmetric(M), v; M=pre, ldiv=false, verbose=1) # ~ 15 iterations
```

The full syntax for the constructor looks like this:
```julia
appx = VecchiaApproximation(pts, kernel, data;
                            ordering=default_ordering(pts),
                            predictionsets=default_predictionsets(),
                            conditioning=default_conditioning())
```

where:
- `ordering` refers to the way that the points are ordered, which can heavily impact approximation accuracy. The default is a canonical sorting in 1D and a random ordering in 2+ dimensions. Very heuristically, the "best" generic ordering is the most efficiently space-filling one. Random ordering is a pretty decent fast approximation to that.
- `predictionsets` is used to specify whether you are "chunking" your approximation. This package supports doing block Vecchia approximations, where the fundamental unit isn't a univariate conditional distribution. At the moment, due to the migration, I haven't re-implemented any blocked constructors yet, so the only implemented option is `SingletonPredictionSets()`, which is the default.
- `conditioning` refers to how conditioning sets are chosen given the ordering. The default is `KNNConditioning(k)`, where `k=5` in 1D and `k=10` in 2+D. You can also specify a `metric` from Distances.jl with `KNNConditioning(k, metric)`. If you're on a sphere and `pts` are polar coordinates, for example, you should use `KNNConditioning(k, Haversine(1.0))`, as in the sketch below.
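
Here is that spherical example spelled out (a sketch; note that `Haversine` in Distances.jl expects `(lon, lat)` coordinates in degrees, so the layout of `pts` is an assumption here):

```julia
using Distances

# k = 10 nearest neighbors under the great-circle distance on the unit sphere:
appx = VecchiaApproximation(pts, kernel, data;
                            conditioning=KNNConditioning(10, Haversine(1.0)))
```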
The ultimate goal of this design is to make adding new specialized constructors simple and straightforward, so that fancier and more specific configuration options can be offered via extensions or implemented on your own in custom code. With time, I will provide more options that also demonstrate how one might do that.
In the meantime, a few items remain on the to-do list:
- A careful investigation into memoization or similar approaches to speed up construction and negative log-likelihood evaluation.
- Some time for users to kick the tires on the new design.
If you use this software in your work, particularly if you actually use second-order optimization with the real Hessians, please cite the package itself:
```bibtex
@software{Geoga_Vecchia_jl,
  author    = {Geoga, Christopher J.},
  title     = {Vecchia.jl},
  url       = {https://github.com/cgeoga/Vecchia.jl},
  year      = {2021},
  publisher = {Github}
}
```
I would also be curious to see/hear about your application if you're willing to share it or take the time to tell me about it.