Print commas in long decimals #612

@elshize

This ticket should cover all numbers printed when building an index:

[2025-06-14 08:35:07.418] [info] [Batch 0] Processed documents [0, 100000)
[2025-06-14 08:35:15.251] [info] [Batch 1] Processed documents [100000, 200000)
[2025-06-14 08:35:21.422] [info] [Batch 3] Processed documents [300000, 369721)
[2025-06-14 08:35:21.697] [info] [Batch 2] Processed documents [200000, 300000)
[2025-06-14 08:35:21.777] [info] Merging titles
[2025-06-14 08:35:21.781] [info] Creating document lexicon
[2025-06-14 08:35:21.803] [info] Merging URLs
[2025-06-14 08:35:21.803] [info] Collecting terms
[2025-06-14 08:35:22.343] [info] Writing terms
[2025-06-14 08:35:22.386] [info] Mapping terms
[2025-06-14 08:35:22.602] [info] Remapping IDs
[2025-06-14 08:35:23.519] [info] Concatenating batches
[2025-06-14 08:35:23.894] [info] Success.
[2025-06-14 08:35:24.181] [info] Number of worker threads: 12
[2025-06-14 08:35:24.191] [info] Inverting [0, 100000)
[2025-06-14 08:35:27.442] [info] Inverting [100000, 200000)
[2025-06-14 08:35:30.468] [info] Inverting [200000, 300000)
[2025-06-14 08:35:33.652] [info] Inverting [300000, 369721)
[2025-06-14 08:35:36.502] [info] Number of terms: 687555
[2025-06-14 08:35:36.502] [info] Number of documents: 369721
[2025-06-14 08:35:36.502] [info] Number of postings: 42181908
[2025-06-14 08:35:36.783] [info] Fixed block size: 64
[2025-06-14 08:35:36.783] [info] Dropping 0 terms
[2025-06-14 08:35:36.783] [info] Reading sizes...
[2025-06-14 08:35:36.784] [info] Storing max weight for each list and for each block...
Storing terms statistics: 100% [0s]
Storing score upper bounds: 100% [0s]
[2025-06-14 08:35:37.615] [info] number of elements / number of blocks: 32.457684
[2025-06-14 08:35:37.660] [info] Processing 369721 documents
Create index: 100% [0s]/workdir/wikir/inv:block_simdbp

[2025-06-14 08:35:38.218] [info] Index compressed in 0.557833 seconds
{"type": block_simdbp, "worker_threads": 12, "construction_time": 0.557833}
<TOP>: 61874559
    m_endpoints: 732992
        m_bits: 732984
    m_lists: 61141546
[2025-06-14 08:35:38.484] [info] Documents: 46030037 bytes, 8.729816014960727 bits per element
[2025-06-14 08:35:38.484] [info] Frequencies: 15111509 bytes, 2.865969742288566 bits per element
{"size": 61141546, "docs_size": 46030037, "freqs_size": 15111509, "bits_per_doc": 8.72982, "bits_per_freq": 2.86597}
[2025-06-14 08:35:38.484] [info] Checking the written data, just to be extra safe...
[2025-06-14 08:35:38.792] [info] Everything is OK!

This could be done with stream.imbue(std::locale("en_US.UTF-8")), but there might be a better way.
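
One possible alternative, sketched here as an assumption rather than a proposal for the actual logging code: std::locale("en_US.UTF-8") throws std::runtime_error when that locale is not installed on the machine, so a custom std::numpunct facet may be more portable. The names comma_grouping and with_commas below are hypothetical and do not exist in the codebase.

```cpp
#include <cstdint>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet: separates digits into groups of three with ','.
// It does not depend on any locale being installed on the system.
struct comma_grouping : std::numpunct<char> {
  protected:
    char do_thousands_sep() const override { return ','; }
    // "\3" means groups of three digits; the last group size repeats.
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical helper: formats an unsigned integer with thousands separators.
std::string with_commas(std::uint64_t value) {
    std::ostringstream out;
    // The locale object takes ownership of the facet and deletes it.
    out.imbue(std::locale(std::locale::classic(), new comma_grouping));
    out << value;
    return out.str();
}
```

With this sketch, with_commas(42181908) would give 42,181,908 for the "Number of postings" line. The same locale could be imbued directly into an output stream instead of going through a helper, but whether that fits the logger used here is not something this sketch addresses.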
