Actually a two-parter:
- With each mod, embed the profile and settings that were used to create it (including load order, installed/found mods), so that it can easily be recreated. This is "somewhat" useful in and of itself, for troubleshooting, but really is an enabler of:
- Allow updating an existing merge instead of creating a new one, based only on the changes made to the profile. Build times could be described as "decent" (under 5 minutes, usually), but shaving that 5 minutes down to 5 seconds would be extra awesome.
This will actually require the metadata to be fairly detailed if we want to be able to keep the merge clean. For example, if we've copied a bunch of head parts and related assets for a specific subset of NPC choices that have all been changed, then those records and files are all orphans and need to be deleted. The metadata therefore needs to track everything that was added, and the reason (i.e. originating record or file) why it was added, in a tree or more likely graph structure, since diamonds are clearly possible (mod X includes NPCs A and B who use headparts C and D which both reference the same texture). This graph then has to be traversed and updated in order to prune obsolete dependencies.
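To make the pruning idea concrete, here is a minimal sketch of one way the metadata graph could work. Everything in it is an assumption for illustration: the `MergeGraph` class, its method names, and the `npc:`/`headpart:`/`texture:` node labels are hypothetical, not part of the existing tool. It treats the user's NPC choices as roots and considers anything no longer reachable from a root to be an orphan, which handles the diamond case without double-deleting the shared texture.

```python
class MergeGraph:
    """Hypothetical record of what was added to a merge, and why."""

    def __init__(self):
        self.roots = set()     # user-level choices, e.g. selected NPCs
        self.nodes = set()     # every record/file currently in the merge
        self.children = {}     # node -> set of nodes it caused to be added

    def add_root(self, node):
        self.roots.add(node)
        self.nodes.add(node)

    def add_dependency(self, source, target):
        """Record that `target` was added because of `source`."""
        self.nodes.update((source, target))
        self.children.setdefault(source, set()).add(target)

    def reasons(self, node):
        """Return the nodes that directly caused `node` to be added."""
        return {src for src, deps in self.children.items() if node in deps}

    def remove_root(self, node):
        """Drop a root choice and return everything that became orphaned."""
        self.roots.discard(node)
        return self.prune()

    def prune(self):
        # Mark phase: walk everything still reachable from a remaining root.
        reachable = set()
        stack = list(self.roots)
        while stack:
            node = stack.pop()
            if node not in reachable:
                reachable.add(node)
                stack.extend(self.children.get(node, ()))
        # Sweep phase: whatever was not reached is obsolete.
        orphans = self.nodes - reachable
        self.nodes -= orphans
        for node in orphans:
            self.children.pop(node, None)
        return orphans


# The diamond from the text: two NPCs, two head parts, one shared texture.
graph = MergeGraph()
graph.add_root("npc:A")
graph.add_root("npc:B")
graph.add_dependency("npc:A", "headpart:C")
graph.add_dependency("npc:B", "headpart:D")
graph.add_dependency("headpart:C", "texture:shared.dds")
graph.add_dependency("headpart:D", "texture:shared.dds")

# Removing NPC A orphans only A and its head part; the texture survives
# because head part D still references it.
print(sorted(graph.remove_root("npc:A")))  # ['headpart:C', 'npc:A']
```

Reachability from roots is used here rather than per-node reference counts because it stays correct even if the graph ever ends up with cycles; for a graph of this size the full traversal should be cheap.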
What about file assets? These may have changed, e.g. a bugfix for a particular mod was installed which just updated a single mesh or texture. We might be able to use a simple MD5 or CRC32 checksum to detect changes, but that requires reading all the source files in their current state. Is this faster than just copying them all again? Uncertain - copying sure does take a long time, especially the facegen/facetint files, but checksumming them might take almost as long, and if a large number have changed, then the checksum time is added on top of the copy time. (Note - we don't have to recompute the hashes for the previous merge; we can just store those when it's created the first time.)
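A rough sketch of the checksum comparison, assuming MD5 via Python's standard `hashlib` and a stored manifest that maps relative paths to digests. The function names (`file_digest`, `build_manifest`, `diff_manifests`) are made up for this example, and whether this actually beats a straight copy would still have to be measured.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def build_manifest(root):
    """Map every file under `root` (relative POSIX path) to its digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def diff_manifests(old, new):
    """Return (added, changed, removed) sets of relative paths."""
    added = set(new) - set(old)
    removed = set(old) - set(new)
    changed = {path for path in set(old) & set(new) if old[path] != new[path]}
    return added, changed, removed
```

The `old` manifest would be the one saved alongside the previous merge, so only the current files need to be read. Only the `added` and `changed` sets need to be copied, and `removed` feeds into orphan cleanup.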
What about BSAs? BSAs are immutable... or are they? They're immutable with the way the API currently works, but in theory there is a mutation algorithm with low algorithmic cost when only a small number of files have changed:
- Compute new header (index) size
- Append new files to the end of the archive
- Move file blobs from the beginning of the files section to the end of the archive until the "hole" left behind is at least as large as the difference between the new and previous header sizes
- Either do nothing with the additional empty space (between end of new header and beginning of "moved" files section), if this proves to be stable in-game, or zero it out and add a "padding file" to the index. It's a generated patch, so it makes no difference if there's one garbage file in there.
- Rewrite the header entirely (probably fast enough) or update it with the new/changed offsets
- On next inc/diff build, fill in the padding area if it's large enough to hold the new/changed files, or expand it the same way as above if not.
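The "move blobs until the hole is big enough" step can be sketched as a pure planning function. This is a toy model only: it assumes blobs are laid out contiguously right after the header, ignores the real BSA index layout entirely, and all names (`Blob`, `plan_header_growth`, `append_at`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Blob:
    name: str
    offset: int  # byte offset of the blob within the archive
    size: int    # byte length of the blob


def plan_header_growth(blobs, old_header_size, new_header_size, append_at):
    """Plan which leading blobs to relocate so the header can grow in place.

    `append_at` is the archive's end offset after any new files were appended.
    Returns (moves, padding): `moves` is a list of (name, new_offset) pairs,
    and `padding` is the unused gap left between the end of the new header
    and the first blob that stays where it is.
    """
    needed = new_header_size - old_header_size
    moves = []
    freed = 0
    write_pos = append_at
    for blob in sorted(blobs, key=lambda b: b.offset):
        if freed >= needed:
            break
        moves.append((blob.name, write_pos))
        write_pos += blob.size
        freed += blob.size
    if freed < needed:
        # Even relocating every blob does not free enough room in this model.
        raise ValueError("header growth exceeds the size of the files section")
    return moves, freed - needed


blobs = [Blob("a.nif", 100, 50), Blob("b.dds", 150, 30), Blob("c.dds", 180, 500)]
moves, padding = plan_header_growth(
    blobs, old_header_size=100, new_header_size=160, append_at=680
)
print(moves)    # [('a.nif', 680), ('b.dds', 730)]
print(padding)  # 20
```

Here the header grows by 60 bytes, so the first two blobs (80 bytes total) are moved to the end, leaving a 20-byte gap that would either be left alone or covered by the "padding file".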
A few problems: (a) this crosses multiple libraries and is totally untested; (b) the pre-BA2 structure, which contains directory offsets, probably throws a wrench in the gears, although this isn't unsolvable with the right algorithms; and (c) this significantly raises the complexity of keeping archive sizes under 2 GB - a possible workaround is to only expand the last archive in the sequence, and use the same uncompressed-size rule to decide when it's too large.
The complexity of incremental/differential BSAing is extremely high. Even though it seems technically possible, and maybe technically interesting, it may not be worth the effort here. A lower-tech option that is likely to work for many people is to do the diffs/additions as loose files, and if the loose file list seems to be growing unreasonably large (say, over 1 GB), then give the option to "compact" i.e. rebuild the BSAs using loose files as overrides. This could even be set up as a user-defined build rule ("compact when loose files > X"), with a reasonable default, and many/most users would either not really notice or never even encounter it.
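The "compact when loose files > X" rule could be as simple as the following sketch. The 1 GB default comes from the figure floated above; the function names and the idea of skipping `.bsa`/`.ba2` files when totalling are assumptions for the example, not existing behavior.

```python
from pathlib import Path

# Default threshold: 1 GB, matching the "say, over 1 GB" figure above.
DEFAULT_COMPACT_THRESHOLD = 1024 ** 3

# Assumption: archives living in the output folder don't count as loose files.
ARCHIVE_SUFFIXES = {".bsa", ".ba2"}


def loose_files_size(output_dir):
    """Total size in bytes of all non-archive files under `output_dir`."""
    return sum(
        path.stat().st_size
        for path in Path(output_dir).rglob("*")
        if path.is_file() and path.suffix.lower() not in ARCHIVE_SUFFIXES
    )


def should_compact(output_dir, threshold=DEFAULT_COMPACT_THRESHOLD):
    """True when loose files have grown past the user-defined threshold."""
    return loose_files_size(output_dir) > threshold
```

A build would call `should_compact(output_dir, threshold=user_setting)` after writing the diffs and either prompt the user or rebuild the BSAs automatically, depending on how the rule is configured.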
Definitely leaning toward the low-tech option for now, with high-tech being left for some way-off-in-the-future release. The biggest risk with low-tech is someone changing the loose files and invalidating the hashes, but we can say that, metaphorically speaking, tampering with the output voids the warranty. And in any case, the issue can always be fixed by just generating a brand-new output mod.
Blah blah blah, tl;dr - this is doable in the short term, hopefully by beta or shortly after, but is liable to be a little bit on the ugly or "brute force" side.