Context
Inspired by the discussion here: #711 (comment)
Embedded files are currently read greedily and in full into memory when a binlog is opened, even though they might never be accessed.
Gotchas
Embedded files are a zip archive nested within the zip stream of a binlog. Accessing them later would require one of these options:
- either leaving the stream open (and hence not verifying that it is properly terminated)
- or rereading and decompressing the entire binlog archive again
- or copying the embedded zip archive into a separate temporary file
Each of these options has significant downsides. The best approach would need to be determined by testing.
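To make the third option concrete, here is a minimal sketch in Python (for illustration only; it is not the viewer's actual code). The class name `LazyEmbeddedFiles` and its parameters are hypothetical. It assumes the reader is positioned at the start of the embedded archive and knows its size; the raw bytes are copied to a temporary file while the binlog is read, and the zip archive is only parsed on first access.

```python
import tempfile
import zipfile


class LazyEmbeddedFiles:
    """Hypothetical helper: spools the embedded zip to disk, parses it on demand."""

    def __init__(self, source, size, chunk_size=64 * 1024):
        # Copy exactly `size` raw bytes from the current position of `source`,
        # so the binlog stream can be read to its end and closed afterwards.
        self._temp = tempfile.TemporaryFile()
        remaining = size
        while remaining > 0:
            chunk = source.read(min(chunk_size, remaining))
            if not chunk:
                raise EOFError("embedded archive is truncated")
            self._temp.write(chunk)
            remaining -= len(chunk)
        self._archive = None  # not parsed until a file is requested

    def read(self, name):
        # Open the archive only on first access.
        if self._archive is None:
            self._temp.seek(0)
            self._archive = zipfile.ZipFile(self._temp)
        return self._archive.read(name)

    def close(self):
        if self._archive is not None:
            self._archive.close()
        self._temp.close()
```

The trade-off is the one noted above: memory is saved, but every open pays for a disk copy of the archive even if no file is ever requested.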
Alternative
Redesigning the binlog format.
E.g.: the compressed events stream and the files zip archive would be two independent streams within a single file. The file would have a few empty bytes preallocated at the beginning, and those would be overwritten once the binlog has been written with:
- the size of the compressed events stream (so that it can be quickly skipped in the uncompressed FileStream and the next stream, the files zip archive, can be read)
- the size of the zip archive (as this cannot be reliably obtained from ZipArchive for possibly larger archives)
- an indication that the file was properly terminated (having this at the beginning of the file, instead of at the end, allows the completeness check on initial open, without the need to read the files ZipArchive).
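The layout above could be sketched roughly as follows. This is an illustration in Python, not a format proposal: the field widths, their order, and the names `HEADER`, `write_log`, `read_header` and `open_files` are all assumptions, and the output stream is assumed to be seekable and positioned at offset 0.

```python
import gzip
import io
import struct
import zipfile

# Assumed header: events size (u64), files archive size (u64), terminated flag (u8).
HEADER = struct.Struct("<QQB")


def write_log(out, events, files):
    """Reserve the header, write both streams, then patch the header in place."""
    out.write(bytes(HEADER.size))            # preallocated empty bytes
    compressed = gzip.compress(events)       # stream 1: compressed events
    out.write(compressed)
    buffer = io.BytesIO()                    # stream 2: independent zip archive
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    zip_bytes = buffer.getvalue()
    out.write(zip_bytes)
    out.seek(0)                              # overwrite the placeholder last
    out.write(HEADER.pack(len(compressed), len(zip_bytes), 1))
    out.seek(0, io.SEEK_END)


def read_header(src):
    src.seek(0)
    events_size, files_size, terminated = HEADER.unpack(src.read(HEADER.size))
    return events_size, files_size, bool(terminated)


def open_files(src):
    """Check completeness, skip the events stream, and open the files archive."""
    events_size, files_size, terminated = read_header(src)
    if not terminated:
        raise ValueError("log was not properly terminated")
    src.seek(HEADER.size + events_size)      # no decompression of events needed
    return zipfile.ZipFile(io.BytesIO(src.read(files_size)))
```

Because the header is patched only after both streams are written, a crash mid-write leaves the terminated flag at zero, which the reader can detect on open without touching either stream.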
Other possible alterations to the format could be made at the same time to optimize the compression ratio, speed up redaction workflows, etc. - e.g.:
- deduplicated strings are packed together and possibly compressed in a separate stream from the rest of the events. Again, the offset would be part of the initial 'table of contents'.
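A rough sketch of the separate string stream, again in Python and purely illustrative: the `StringTable` class, the JSON encoding and the use of zlib are assumptions chosen for brevity, not the existing binlog encoding. Events would store only integer indices, while the strings themselves are packed and compressed together.

```python
import json
import zlib


class StringTable:
    """Hypothetical table: each distinct string is stored once, events keep indices."""

    def __init__(self):
        self._index = {}
        self.strings = []

    def intern(self, value):
        # Return the existing index, or assign the next one for a new string.
        index = self._index.get(value)
        if index is None:
            index = len(self.strings)
            self._index[value] = index
            self.strings.append(value)
        return index

    def pack(self):
        # All strings are serialized together and compressed as one block.
        return zlib.compress(json.dumps(self.strings).encode("utf-8"))


def unpack_strings(blob):
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Keeping the strings in their own block would also let a redaction tool rewrite that block without re-encoding the events that reference it, provided the indices stay stable.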
FYI @rokonec - he originated some of these ideas