Open
Description
This ticket should cover all numbers printed when building an index:
[2025-06-14 08:35:07.418] [info] [Batch 0] Processed documents [0, 100000)
[2025-06-14 08:35:15.251] [info] [Batch 1] Processed documents [100000, 200000)
[2025-06-14 08:35:21.422] [info] [Batch 3] Processed documents [300000, 369721)
[2025-06-14 08:35:21.697] [info] [Batch 2] Processed documents [200000, 300000)
[2025-06-14 08:35:21.777] [info] Merging titles
[2025-06-14 08:35:21.781] [info] Creating document lexicon
[2025-06-14 08:35:21.803] [info] Merging URLs
[2025-06-14 08:35:21.803] [info] Collecting terms
[2025-06-14 08:35:22.343] [info] Writing terms
[2025-06-14 08:35:22.386] [info] Mapping terms
[2025-06-14 08:35:22.602] [info] Remapping IDs
[2025-06-14 08:35:23.519] [info] Concatenating batches
[2025-06-14 08:35:23.894] [info] Success.
[2025-06-14 08:35:24.181] [info] Number of worker threads: 12
[2025-06-14 08:35:24.191] [info] Inverting [0, 100000)
[2025-06-14 08:35:27.442] [info] Inverting [100000, 200000)
[2025-06-14 08:35:30.468] [info] Inverting [200000, 300000)
[2025-06-14 08:35:33.652] [info] Inverting [300000, 369721)
[2025-06-14 08:35:36.502] [info] Number of terms: 687555
[2025-06-14 08:35:36.502] [info] Number of documents: 369721
[2025-06-14 08:35:36.502] [info] Number of postings: 42181908
[2025-06-14 08:35:36.783] [info] Fixed block size: 64
[2025-06-14 08:35:36.783] [info] Dropping 0 terms
[2025-06-14 08:35:36.783] [info] Reading sizes...
[2025-06-14 08:35:36.784] [info] Storing max weight for each list and for each block...
Storing terms statistics: 100% [0s]
Storing score upper bounds: 100% [0s]
[2025-06-14 08:35:37.615] [info] number of elements / number of blocks: 32.457684
[2025-06-14 08:35:37.660] [info] Processing 369721 documents
Create index: 100% [0s]/workdir/wikir/inv:block_simdbp
[2025-06-14 08:35:38.218] [info] Index compressed in 0.557833 seconds
{"type": block_simdbp, "worker_threads": 12, "construction_time": 0.557833}
<TOP>: 61874559
m_endpoints: 732992
m_bits: 732984
m_lists: 61141546
[2025-06-14 08:35:38.484] [info] Documents: 46030037 bytes, 8.729816014960727 bits per element
[2025-06-14 08:35:38.484] [info] Frequencies: 15111509 bytes, 2.865969742288566 bits per element
{"size": 61141546, "docs_size": 46030037, "freqs_size": 15111509, "bits_per_doc": 8.72982, "bits_per_freq": 2.86597}
[2025-06-14 08:35:38.484] [info] Checking the written data, just to be extra safe...
[2025-06-14 08:35:38.792] [info] Everything is OK!
This could be done with stream.imbue(std::locale("en_US.UTF-8")), but there might be a better way.
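One possible alternative, sketched here under assumptions: std::locale("en_US.UTF-8") throws std::runtime_error when that locale is not installed on the host, so a small custom std::numpunct facet may be more portable. The names thousands_grouping and with_separators below are hypothetical and not part of the project. The sketch only shows the standard-library mechanism and does not describe how the project's logger is wired up.

```cpp
#include <cassert>  // only needed for the check shown in the usage note
#include <cstdint>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet (not part of the project): groups digits in threes
// with a comma, independent of which locales are installed on the host.
struct thousands_grouping : std::numpunct<char> {
  protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: formats an unsigned integer with thousands separators.
inline std::string with_separators(std::uint64_t value) {
    std::ostringstream out;
    // The locale takes ownership of the facet and deletes it when the
    // last locale object referring to it is destroyed.
    out.imbue(std::locale(std::locale::classic(), new thousands_grouping));
    out << value;
    return out.str();
}
```

With this sketch, assert(with_separators(42181908) == "42,181,908") holds for the posting count from the log above. The same facet could be imbued once on a long-lived stream instead of building a new stream per call.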