
@KaelynSom
Contributor

Adds parallel compression in multithreaded processes where possible.

Split the compression function into two components:

  1. compress - performs the actual compression of the given file
  2. cleancompressed - cleans up after the first function where required (deletes or moves the original and the newly written file)
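A minimal q sketch of the same two-phase split, under stated assumptions: the names `compressone`, `cleanone`, `tmpname`, `runall` and `stats` are hypothetical (not TorQ's), the `_tmpcmp` temp-file suffix is made up, and the demo files live under `/tmp/cmpdemo` on a Unix-like host (the swap shells out to `mv`). Phase 1 only writes a new file with `-19!`, so it can run under `peach`. Phase 2 verifies the copy, swaps it in and appends to a global table. q does not allow secondary threads to amend globals, which is one plausible reason the cleanup step stays on the main thread with `each`.

```q
/ Hypothetical sketch of a compress / cleanup split; not TorQ's implementation.
/ Assumes a Unix-like host (rm and mv are called via system).
system "rm -rf /tmp/cmpdemo";
files:` sv' `:/tmp/cmpdemo,/:`a`b`c;
files set' (til 100000;100000#1.5;100000#`x`y);

/ hypothetical temp-file name for the compressed copy of file x
tmpname:{`$string[x],"_tmpcmp"};

/ phase 1: only writes a new file, so it is safe to run under peach
compressone:{[f;algo;blk;lvl] -19!(f;tmpname f;blk;algo;lvl)};

/ per-file results; only ever amended on the main thread
stats:([] file:`symbol$(); clen:`long$(); ulen:`long$());

/ phase 2: verify the copy, record it, then swap it in place of the original
cleanone:{[f]
  t:tmpname f;
  if[not count -21!t; hdel t; :0b];        / copy is not compressed: discard it
  if[not (get t)~get f; hdel t; :0b];      / contents differ: keep the original
  s:-21!t;
  stats::stats,([] file:enlist f; clen:enlist s`compressedLength; ulen:enlist s`uncompressedLength);
  hdel f;
  system "mv ",(1_string t)," ",1_string f;
  1b}

/ parallel phase 1 when secondary threads exist, otherwise compress and clean file by file
runall:{[fs;algo;blk;lvl]
  $[0=system"s";
    {[a;b;l;f] compressone[f;a;b;l]; cleanone f}[algo;blk;lvl] each fs;
    [compressone[;algo;blk;lvl] peach fs; cleanone each fs]]}

res:runall[files;2;16;9];
```

Run without `-s`, the sketch compresses and cleans each file in turn. With `-s 8`, all temp copies are written in parallel first and then verified and swapped sequentially.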

Performance Statistics

Compression:

  • Single Threaded - 90.9 seconds
  • Multithreaded (8 threads) - 19.6 seconds

Decompression:

  • Single Threaded - 4.7 seconds
  • Multithreaded (8 threads) - 1.6 seconds
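Speedup here is the single-threaded time divided by the 8-thread time. Dividing that by the thread count gives a rough parallel efficiency. A quick q check using only the numbers above (the variable names are illustrative):

```q
/ wall-clock timings in seconds, copied from the statistics above
single:90.9 4.7;          / compression, decompression (no secondary threads)
multi:19.6 1.6;           / compression, decompression (-s 8)
speedup:single%multi;     / about 4.64 and 2.94
efficiency:speedup%8;     / share of an ideal 8x speedup: about 0.58 and 0.37
```

By this measure, decompression gains less from the extra threads than compression does.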

Verification of changes

Context on the data:

Table 1 - 2025.08.07 Partition - Quote - 2.5m rows - 9 columns
Table 2 - 2025.08.08 Partition - Quote - 2.5m rows - 9 columns
Table 3 - 2025.08.18 Partition - Trade - 0.6m rows - 8 columns

Single Threaded Process Verification

ksomerville@homer:~/repo/TorQ$ q torq.q -proctype test -procname mytest -debug

Verify the process is single threaded:
q)\s
0i

View the current compression stats (-21! returns nothing, so the file is not currently compressed):
q)-21!`:/home/ksomerville/testDB/hdb/2025.08.07/quote/bid
()

Modify the compression config file to use the compression settings (2;16;9), i.e. algorithm 2 (gzip), logical block size 2^16 bytes and compression level 9:
`:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv

q)read0 `:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv

"table,minage,column,calgo,cblocksize,clevel"
"default,1,default,2,16,9"
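Switching between compression and decompression is just an edit to this file. A sketch of writing it and reading it back from q, assuming a hypothetical `/tmp` path, hypothetical helper names `writecfg`/`readcfg`, and column types `SISIII` chosen for illustration (TorQ's own parser may use different types):

```q
/ hypothetical location; TorQ reads appconfig/compressionconfig.csv from its own config path
cfg:`:/tmp/compressionconfig.csv;

/ write a one-row config applying the same settings to every table and column
writecfg:{[f;algo;blk;lvl]
  f 0: ("table,minage,column,calgo,cblocksize,clevel";
        "default,1,default,",","sv string (algo;blk;lvl))};

/ read it back with assumed types: table S, minage I, column S, calgo/cblocksize/clevel I
readcfg:{[f] ("SISIII";enlist",")0:f};

writecfg[cfg;2;16;9];     / compress: gzip, 2^16-byte logical blocks, level 9
c1:readcfg cfg;
writecfg[cfg;0;16;9];     / algo 0: decompress
c2:readcfg cfg;
```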

Run the compression:

q).cmp.docompression[hdbpath:`:/home/ksomerville/testDB/hdb;csvpath:`:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv]

2025.08.20D10:35:16.425280344|homer.aquaq.co.uk|test|mytest|INF|compression|Opening :/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv
2025.08.20D10:35:16.425580325|homer.aquaq.co.uk|test|mytest|INF|compression|scanning hdb directory structure
2025.08.20D10:35:16.427341320|homer.aquaq.co.uk|test|mytest|INF|compression|getting current size of each file up to a maximum age of 0W
2025.08.20D10:35:16.428936670|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/asize with algo: 2, blocksize: 16, and level: 9.
2025.08.20D10:35:31.490386281|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed successfully; matches orginal. Deleting original.
2025.08.20D10:35:31.494224780|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/ask with algo: 2, blocksize: 16, and level: 9.
2025.08.20D10:35:35.110539713|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed successfully; matches orginal. Deleting original.
..... 
2025.08.20D10:36:47.698069037|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.18/trade/time with algo: 2, blocksize: 16, and level: 9.
2025.08.20D10:36:48.147347226|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed successfully; matches orginal. Deleting original.
2025.08.20D10:36:48.148971086|homer.aquaq.co.uk|test|mytest|INF|compression|Memory savings from compression: 277.75MB. Total compression ratio: 6.48.
2025.08.20D10:36:48.148993440|homer.aquaq.co.uk|test|mytest|INF|compression|Additional memory used from de-compression: 0.00MB. Total de-compression ratio: .
2025.08.20D10:36:48.149005764|homer.aquaq.co.uk|test|mytest|INF|compression|Check .cmp.statstab for info on each file.

Verify compression has been applied:
q)-21!`:/home/ksomerville/testDB/hdb/2025.08.07/quote/bid

compressedLength  | 5687327
uncompressedLength| 21137896
algorithm         | 2i
logicalBlockSize  | 16i
zipLevel          | 9i
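`-21!` returns a dictionary for a compressed file and nothing for an uncompressed one, so the compression ratio is `uncompressedLength % compressedLength`. For the `bid` column above that is 21137896 % 5687327, roughly 3.7. A sketch of a hypothetical helper `cmpstats` that reports this per column for one partition directory, demonstrated on a throwaway directory under `/tmp`:

```q
/ hypothetical helper: compression status and ratio of each column file in a partition dir
cmpstats:{[dir]
  cn:c where not (c:key dir) like ".*";      / skip hidden files such as .d
  s:{-21!x} each ` sv' dir,/:cn;             / empty for uncompressed files
  ([] column:cn;
      compressed:0<count each s;
      ratio:{$[0<count x;x[`uncompressedLength]%x`compressedLength;0n]} each s)}

/ throwaway partition: column a written plainly, column b gzip-compressed via set
system "rm -rf /tmp/cmpstatsdemo";
d:`:/tmp/cmpstatsdemo;
(` sv d,`a) set 100000#42;
(` sv d,`b;16;2;9) set 100000#42;
(` sv d,`.d) set `a`b;
r:cmpstats d;
```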

Run Decompression

Change the algo in the compression config to 0:
q)read0 `:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv

"table,minage,column,calgo,cblocksize,clevel"
"default,1,default,0,16,9"

Run decompression:
q).cmp.docompression[hdbpath:`:/home/ksomerville/testDB/hdb;csvpath:`:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv]

2025.08.20D10:39:10.729348995|homer.aquaq.co.uk|test|mytest|INF|compression|Opening :/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv
2025.08.20D10:39:10.729565762|homer.aquaq.co.uk|test|mytest|INF|compression|scanning hdb directory structure
2025.08.20D10:39:10.730656536|homer.aquaq.co.uk|test|mytest|INF|compression|getting current size of each file up to a maximum age of 0W
2025.08.20D10:39:10.732080105|homer.aquaq.co.uk|test|mytest|INF|compression|Single threaded process, compress applied sequentially
2025.08.20D10:39:10.732201025|homer.aquaq.co.uk|test|mytest|INF|compression|decompressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/asize with algo: 0, blocksize: 16, and level: 9.
2025.08.20D10:39:10.888851273|homer.aquaq.co.uk|test|mytest|INF|compression|File decompressed :/home/ksomerville/testDB/hdb/2025.08.07/quote/asize successfully; matches orginal. Deleting original.
2025.08.20D10:39:10.938715888|homer.aquaq.co.uk|test|mytest|INF|compression|decompressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/ask with algo: 0, blocksize: 16, and level: 9.
2025.08.20D10:39:11.125199968|homer.aquaq.co.uk|test|mytest|INF|compression|File decompressed :/home/ksomerville/testDB/hdb/2025.08.07/quote/ask successfully; matches orginal. Deleting original.
....
2025.08.20D10:39:13.744531666|homer.aquaq.co.uk|test|mytest|INF|compression|decompressing file :/home/ksomerville/testDB/hdb/2025.08.18/trade/time with algo: 0, blocksize: 16, and level: 9.
2025.08.20D10:39:13.784353728|homer.aquaq.co.uk|test|mytest|INF|compression|File decompressed :/home/ksomerville/testDB/hdb/2025.08.18/trade/time successfully; matches orginal. Deleting original.
2025.08.20D10:39:13.786344158|homer.aquaq.co.uk|test|mytest|INF|compression|Memory savings from compression: 0.00MB. Total compression ratio: .
2025.08.20D10:39:13.786368366|homer.aquaq.co.uk|test|mytest|INF|compression|Additional memory used from de-compression: 277.75MB. Total de-compression ratio: -0.15.
2025.08.20D10:39:13.786377981|homer.aquaq.co.uk|test|mytest|INF|compression|Check .cmp.statstab for info on each file.

Verify the file is no longer compressed:
q)-21!`:/home/ksomerville/testDB/hdb/2025.08.07/quote/bid
()

Multithreaded Process Verification

Start the test process with 8 secondary threads:
ksomerville@homer:~/repo/TorQ$ q torq.q -proctype test -procname mytest -debug -s 8

Verify the process is multithreaded:
q)\s
8i

Show that the files are still uncompressed (unchanged since the single-threaded run above):
q)-21!`:/home/ksomerville/testDB/hdb/2025.08.07/quote/bid
()

Set the compression settings back to (2;16;9):
q)read0 `:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv

"table,minage,column,calgo,cblocksize,clevel"
"default,1,default,2,16,9"

Run the compression:
q).cmp.docompression[hdbpath:`:/home/ksomerville/testDB/hdb;csvpath:`:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv]

2025.08.20D10:46:55.353539620|homer.aquaq.co.uk|test|mytest|INF|compression|Opening :/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv
2025.08.20D10:46:55.353745262|homer.aquaq.co.uk|test|mytest|INF|compression|scanning hdb directory structure
2025.08.20D10:46:55.360599162|homer.aquaq.co.uk|test|mytest|INF|compression|getting current size of each file up to a maximum age of 0W
2025.08.20D10:46:55.361543420|homer.aquaq.co.uk|test|mytest|INF|compression|Multithreaded process, compress applied in parallel
2025.08.20D10:46:55.362048704|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/mode with algo: 2, blocksize: 16, and level: 9.
2025.08.20D10:46:55.362448998|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/bid with algo: 2, blocksize: 16, and level: 9.
2025.08.20D10:46:55.362863217|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/ex with algo: 2, blocksize: 16, and level: 9.
2025.08.20D10:46:55.362864769|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/sym with algo: 2, blocksize: 16, and level: 9.
...
2025.08.20D10:47:01.966100639|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.08/trade/sym with algo: 2, blocksize: 16, and level: 9.
2025.08.20D10:47:02.115062680|homer.aquaq.co.uk|test|mytest|INF|compression|compressing file :/home/ksomerville/testDB/hdb/2025.08.18/trade/sym with algo: 2, blocksize: 16, and level: 9.
2025.08.20D10:47:14.090004899|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed :/home/ksomerville/testDB/hdb/2025.08.07/quote/asize successfully; matches orginal. Deleting original.
2025.08.20D10:47:15.275834022|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed :/home/ksomerville/testDB/hdb/2025.08.07/quote/ask successfully; matches orginal. Deleting original.
2025.08.20D10:47:18.003927302|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed :/home/ksomerville/testDB/hdb/2025.08.07/quote/bid successfully; matches orginal. Deleting original.
...
2025.08.20D10:47:25.931134231|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed :/home/ksomerville/testDB/hdb/2025.08.18/trade/stop successfully; matches orginal. Deleting original.
2025.08.20D10:47:25.950448280|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed :/home/ksomerville/testDB/hdb/2025.08.18/trade/sym successfully; matches orginal. Deleting original.
2025.08.20D10:47:26.003025705|homer.aquaq.co.uk|test|mytest|INF|compression|File compressed :/home/ksomerville/testDB/hdb/2025.08.18/trade/time successfully; matches orginal. Deleting original.
2025.08.20D10:47:26.014115653|homer.aquaq.co.uk|test|mytest|INF|compression|Memory savings from compression: 277.75MB. Total compression ratio: 6.48.
2025.08.20D10:47:26.014161522|homer.aquaq.co.uk|test|mytest|INF|compression|Additional memory used from de-compression: 0.00MB. Total de-compression ratio: .
2025.08.20D10:47:26.014188255|homer.aquaq.co.uk|test|mytest|INF|compression|Check .cmp.statstab for info on each file.

Verify compression applied:
q)-21!`:/home/ksomerville/testDB/hdb/2025.08.07/quote/bid

compressedLength  | 5687327
uncompressedLength| 21137896
algorithm         | 2i
logicalBlockSize  | 16i
zipLevel          | 9i

To decompress the data, change the compression algo to 0:
q)read0 `:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv

"table,minage,column,calgo,cblocksize,clevel"
"default,1,default,0,16,9"

Run the decompression:
q).cmp.docompression[hdbpath:`:/home/ksomerville/testDB/hdb;csvpath:`:/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv]

2025.08.20D10:49:52.308128840|homer.aquaq.co.uk|test|mytest|INF|compression|Opening :/home/ksomerville/deploy/TorQApp/latest/appconfig/compressionconfig.csv
2025.08.20D10:49:52.308254537|homer.aquaq.co.uk|test|mytest|INF|compression|scanning hdb directory structure
2025.08.20D10:49:52.340113480|homer.aquaq.co.uk|test|mytest|INF|compression|getting current size of each file up to a maximum age of 0W
2025.08.20D10:49:52.369159977|homer.aquaq.co.uk|test|mytest|INF|compression|Multithreaded process, compress applied in parallel
2025.08.20D10:49:52.369476564|homer.aquaq.co.uk|test|mytest|INF|compression|decompressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/ask with algo: 0, blocksize: 16, and level: 9.
2025.08.20D10:49:52.369518251|homer.aquaq.co.uk|test|mytest|INF|compression|decompressing file :/home/ksomerville/testDB/hdb/2025.08.07/quote/bsize with algo: 0, blocksize: 16, and level: 9.
..
2025.08.20D10:49:52.689600888|homer.aquaq.co.uk|test|mytest|INF|compression|decompressing file :/home/ksomerville/testDB/hdb/2025.08.18/trade/time with algo: 0, blocksize: 16, and level: 9.
2025.08.20D10:49:52.954294011|homer.aquaq.co.uk|test|mytest|INF|compression|File decompressed :/home/ksomerville/testDB/hdb/2025.08.07/quote/asize successfully; matches orginal. Deleting original.
2025.08.20D10:49:53.104145196|homer.aquaq.co.uk|test|mytest|INF|compression|File decompressed :/home/ksomerville/testDB/hdb/2025.08.07/quote/ask successfully; matches orginal. Deleting original.
...
2025.08.20D10:49:54.524779300|homer.aquaq.co.uk|test|mytest|INF|compression|File decompressed :/home/ksomerville/testDB/hdb/2025.08.18/trade/sym successfully; matches orginal. Deleting original.
2025.08.20D10:49:54.547016266|homer.aquaq.co.uk|test|mytest|INF|compression|File decompressed :/home/ksomerville/testDB/hdb/2025.08.18/trade/time successfully; matches orginal. Deleting original.
2025.08.20D10:49:54.548035643|homer.aquaq.co.uk|test|mytest|INF|compression|Memory savings from compression: 0.00MB. Total compression ratio: .
2025.08.20D10:49:54.548052858|homer.aquaq.co.uk|test|mytest|INF|compression|Additional memory used from de-compression: 277.75MB. Total de-compression ratio: -0.15.
2025.08.20D10:49:54.548059614|homer.aquaq.co.uk|test|mytest|INF|compression|Check .cmp.statstab for info on each file.

Verify the data is no longer compressed:
q)-21!`:/home/ksomerville/testDB/hdb/2025.08.07/quote/bid
()

Member

@jonathonmcmurray jonathonmcmurray left a comment


A few comments that are largely unrelated to your actual change, but while we're modifying this file we may as well clean it up a bit.

Comment on lines 190 to 191
/-log to the table if the algo wasn't 0
statstab,:$[not 0=algo;(filetoCompress;algo;(-21!filetoCompress)`compressedLength;sizeuncomp);(filetoCompress;algo;comprL;sizeuncomp)]];

I think we should do this logging in the compressmaxage function rather than in cleancompressed. We can then reduce the args to cleancompressed and skip calculating comprL.

statstab,:$[not 0=algo;(filetoCompress;algo;(-21!filetoCompress)`compressedLength;sizeuncomp);(filetoCompress;algo;comprL;sizeuncomp)]];
[$[not count -21!compressedFile;
[.lg.o[`compression; "Failed to compress file ",string[filetoCompress]];hdel compressedFile];
[.lg.o[`compression;cmp,"compressed ","file ",string[compressedFile]," doesn't match original. Deleting new file"];hdel compressedFile]]]]]}

The number of ] on this line is horrifying. I think we should be able to refactor this function a bit to avoid the [...] block syntax inside conditionals and make it easier to follow.

e.g. something like

    if[()~key compressedFile;
        .lg.o[`compression;"No compressed file present for the following file - ",string[filetoCompress]];
        :();
        ];
    if[not ((get compressedFile)~sf:get filetoCompress) & (count -21!compressedFile) or algo=0;
        .lg.o[`compression;cmp,"compressed ","file ",string[compressedFile]," doesn't match original. Deleting new file"];
        hdel compressedFile;
        :();
        ];
    // rest of code to rename file etc.
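Because q evaluates right to left, the second condition reads as: the copy matches the original, and (it is compressed, or `algo` is 0). The `algo=0` escape is what lets decompression pass, since a decompressed file has no `-21!` stats. A standalone sketch of that predicate, with hypothetical names (`okcopy` and the `/tmp/okdemo` files) and a plain `set` copy standing in for a decompressed file:

```q
/ hypothetical predicate mirroring the reviewer's check on a freshly written file nf
/ right to left: (algo=0) or compressed, then AND with "contents match the original"
okcopy:{[orig;nf;algo] ((get nf)~get orig) and (algo=0) or 0<count -21!nf};

system "rm -rf /tmp/okdemo";
o:`:/tmp/okdemo/orig set 100000#42;
gz:-19!(o;`:/tmp/okdemo/gz;16;2;9);          / gzip-compressed copy
pl:`:/tmp/okdemo/plain set 100000#42;        / plain copy, standing in for a decompressed file
bad:`:/tmp/okdemo/bad set 100000#7;          / copy whose contents differ
```

The reviewer's `if` negates this result, so the new file is deleted exactly when `okcopy` would return 0b.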

Comment on lines 133 to 138
$[0= system"s";
[.lg.o[`compression; "Single threaded process, compress applied sequentially"];
{compress[x `fullpath;x `calgo;x `cblocksize;x `clevel; x `currentsize];cleancompressed[x `fullpath;x `calgo;x `cblocksize;x `clevel; x `currentsize]} each table];
[.lg.o[`compression; "Multithreaded process, compress applied in parallel "];
{compress[x `fullpath;x `calgo;x `cblocksize;x `clevel; x `currentsize]} peach table;
{cleancompressed[x `fullpath;x `calgo;x `cblocksize;x `clevel; x `currentsize]} each table;]]}

I think it'd be cleaner if we made this into two functions that can be called depending on whether the proc is single threaded or not.

Suggested change, replacing lines 133 to 138 quoted above:

  $[0= system"s";singlethreadcompress;multithreadcompress]table;
}
singlethreadcompress:{[table]
  .lg.o[`compression; "Single threaded process, compress applied sequentially"];
  {compress[x `fullpath;x `calgo;x `cblocksize;x `clevel; x `currentsize];
    cleancompressed[x `fullpath;x `calgo;x `cblocksize;x `clevel; x `currentsize]} each table;
}
multithreadcompress:{[table]
  .lg.o[`compression; "Multithreaded process, compress applied in parallel "];
  {compress[x `fullpath;x `calgo;x `cblocksize;x `clevel; x `currentsize]} peach table;
  {cleancompressed[x `fullpath;x `calgo;x `cblocksize;x `clevel; x `currentsize]} each table;
}

Or something like this.

clean the code and make it more modular depending on threads count
@jonathonmcmurray jonathonmcmurray merged commit c2635d5 into master Sep 4, 2025
@jonathonmcmurray jonathonmcmurray deleted the parallel_compression branch September 4, 2025 13:11