Open
20 commits
1e30e8e
Directory for tiny test data
michaelkleber Jan 15, 2026
131a4a7
Add files via upload
michaelkleber Jan 15, 2026
5f6b0b0
Add Digit Scanner (#1)
michaelkleber Jan 16, 2026
06717ba
Enable scan for up to d=13 digits
michaelkleber Jan 16, 2026
b1c66fb
Update DigitScanner.cpp
michaelkleber Jan 16, 2026
bf25b1a
Update DigitScanner.h
michaelkleber Jan 16, 2026
c8c7c1c
Create README.md explaining digit scanning
michaelkleber Jan 17, 2026
3866e9e
Make it build in MSVC
michaelkleber Jan 18, 2026
0b6dc05
Make it build in MSVC (try #2)
michaelkleber Jan 18, 2026
ad846a3
Revert "Make it build in MSVC (try #2)"
michaelkleber Jan 18, 2026
e9b3504
Reapply "Make it build in MSVC (try #2)"
michaelkleber Jan 18, 2026
fb228fb
Revert "Make it build in MSVC"
michaelkleber Jan 18, 2026
ac12e82
add new files to project or solution or whatever this thing is called
michaelkleber Jan 18, 2026
69a2b4a
Merge branch 'master' of https://github.com/michaelkleber/DigitViewer
michaelkleber Jan 18, 2026
b78ecbe
Efficiency improvement: test bit before atomic_or
michaelkleber Jan 20, 2026
146a42a
Speed up by tracking unconditionally some info that gets discarded.
michaelkleber Jan 20, 2026
54d0598
If almost all strings are found, report which are missing.
michaelkleber Feb 1, 2026
dbffaac
Inlcude any leading 0's in the last-seen digit string.
michaelkleber Feb 1, 2026
415bad5
Bitvector mapreduce
michaelkleber Apr 14, 2026
da9705a
Update README.md to include results for d=12,13
michaelkleber Apr 14, 2026
588 changes: 588 additions & 0 deletions Source/DigitViewer2/DigitScanner/DigitScanner.cpp

Large diffs are not rendered by default.

28 changes: 28 additions & 0 deletions Source/DigitViewer2/DigitScanner/DigitScanner.h
@@ -0,0 +1,28 @@
/* DigitScanner.h
*
* Author : Michael Kleber
* Date Created : 01/15/2026
* Last Modified : 01/15/2026
* Copyright 2026 Google LLC
*
*/

#pragma once
#include "PublicLibs/Types.h"

namespace DigitViewer2 {
using namespace ymp;

class BasicDigitReader;

class DigitScanner {
public:
    DigitScanner(BasicDigitReader& reader, upL_t d);
    void search();

private:
    BasicDigitReader& m_reader;
    upL_t m_d;
};

}
85 changes: 85 additions & 0 deletions Source/DigitViewer2/DigitScanner/README.md
@@ -0,0 +1,85 @@
Scanning for All Strings of Digits
========
by Michael Kleber

Code in this directory implements a way to scan through a large file of digits until _every_ sequence of $d$ digits has appeared.

Are you wondering "Does my 10-digit phone number appear in the digits of pi?"
Yes it does, somewhere in the first 241,641,121,048 digits.
What about your 16-digit credit card number?
I don't know — we haven't calculated enough digits of pi to see every 16-digit number.
(Yet.)

## Background

Pi, and many other numbers you can compute with y-cruncher, are believed to be [normal numbers](https://en.wikipedia.org/wiki/Normal_number).
That would mean every sequence of $d$ decimal digits appears in the expansion, starting at approximately $1/10^d$ of the possible positions.
(That's what you would expect if the digits were random... and we have every reason to believe that pi's digits behave like random ones _from this particular point of view_.)

That leads to asking the very natural question:
"Out of the $10^d$ sequences of $d$ digits, which one takes the longest to appear, and how many digits does it take?"

* For d=1, the digit 0 is the last one to show up in pi, all the way out at the 32nd place after the decimal point: 3.1415926535897932384626433832795**0**2...
* For d=2 you need to go out to 606 places before you finally see the two-digit sequence 68.
* When Fabrice Bellard calculated 2.7 trillion digits of pi, he scanned for all sequences up to d=11, reported [here](https://bellard.org/pi/pi2700e9/pidigits.html#:~:text=scan%20decimal%20expansion%20of%20pi) in 2010.
* The scan for d=12 used the code in this directory, running on the [100 trillion digits computed by Google](https://pi.delivery/).
* The scan for d=13 used the code in this directory, running on the [314 trillion digits computed by StorageReview](https://www.storagereview.com/review/storagereview-sets-new-pi-record-314-trillion-digits-on-a-dell-poweredge-r7725).

| d | digits needed | last d-digit seq |
|:-:|---------------------:|:----------------:|
| 1| 32 | `0` |
| 2| 606 | `68` |
| 3| 8,555 | `483` |
| 4| 99,849 | `6716` |
| 5| 1,369,564 | `33394` |
| 6| 14,118,312 | `569540` |
| 7| 166,100,506 | `1075656` |
| 8| 1,816,743,912 | `36432643` |
| 9| 22,445,207,406 | `172484538` |
| 10| 241,641,121,048 | `5918289042` |
| 11| 2,512,258,603,207 | `56377726040` |
| 12| 27,261,146,164,637 | `717542605965` |
| 13| 294,420,436,740,325 | `8683109988379` |

* The "digits needed" column is recorded in the [On-line Encyclopedia of Integer Sequences](https://oeis.org/) as [A036903](https://oeis.org/A036903), and the "last d-digit seq" column as [A032510](https://oeis.org/A032510).

For a 50-50 chance of seeing all sequences of 14 digits, you would need
[around 3.26 _quadrillion_](https://www.wolframalpha.com/input?i=N%5Bexp%28-n+exp%28-w%2Fn%29%29%5D+where+n+%3D+10%5E14+and+w+%3D+3.26+quadrillion)
random digits, so don't hold your breath.
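
The linked query evaluates $\exp(-n \exp(-w/n))$, an approximation for the chance that all $n = 10^d$ strings have shown up among $w$ random digits. Setting it to $1/2$ and solving for $w$ gives $w = n \ln(n / \ln 2)$. Here is a small sketch of that arithmetic; the function name `median_digits_needed` is made up for illustration and is not part of the code in this directory.

```cpp
#include <cmath>

// Number of random digits w at which exp(-n * exp(-w / n)) reaches 1/2,
// where n = 10^d is the number of d-digit strings.
// Solving for w gives w = n * ln(n / ln 2).
// Illustrative helper only; it assumes the digits behave like random ones.
double median_digits_needed(int d) {
    const double n = std::pow(10.0, d);
    return n * std::log(n / std::log(2.0));
}
```

For d = 14 this comes out near 3.26e15. For d = 13 it gives about 3.03e14, the same ballpark as the 294,420,436,740,325 digits that pi actually needed.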


## Algorithm

### Basic idea
To search for every string of $d$ digits:
* Make a bitvector of $10^d$ zeros.
* Look at strings of $d$ digits one at a time, each considered as a $d$-digit number $n$.
* If the $n$'th bit in the bitvector is a $0$, then you've found a new string!
  * Go you! Set that bit to $1$ and add one to the variable "how many strings I've found so far."
  * If that variable equals $10^d$, you've seen them all! Have a party.
* If the $n$'th bit in the bitvector is already a $1$, nothing to see here, move along.

If you have a lot of digits, a lot of memory, and a lot of time, this will do the job.
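
A minimal single-threaded sketch of the steps above, assuming the digits are already in memory as an ASCII string. The names `ScanResult` and `scan_all_strings` are hypothetical, and this is not the code in `DigitScanner.cpp` (which is not shown in this diff). To stay small it only handles $d \le 9$.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Result of a completed scan (hypothetical type for this sketch).
struct ScanResult {
    std::uint64_t digits_needed;  // digits read when the last new string completed
    std::string last_string;      // that string, with any leading zeros kept
};

// Scans `digits` (ASCII '0'..'9') until every d-digit string has appeared.
// Returns std::nullopt if the input runs out first or d is outside 1..9.
std::optional<ScanResult> scan_all_strings(const std::string& digits, unsigned d) {
    if (d == 0 || d > 9) return std::nullopt;
    std::uint64_t total = 1;  // 10^d
    for (unsigned i = 0; i < d; ++i) total *= 10;

    std::vector<bool> seen(total, false);  // the bitvector of 10^d zeros
    std::uint64_t found = 0;               // "how many strings I've found so far"
    std::uint64_t n = 0;                   // value of the current d-digit window

    for (std::size_t i = 0; i < digits.size(); ++i) {
        // Slide the window: drop the oldest digit, append the new one.
        n = (n * 10 + static_cast<std::uint64_t>(digits[i] - '0')) % total;
        if (i + 1 < d) continue;  // window not full yet
        if (!seen[n]) {
            seen[n] = true;
            if (++found == total) {
                std::string s = std::to_string(n);
                s = std::string(d - s.size(), '0') + s;  // restore leading zeros
                return ScanResult{static_cast<std::uint64_t>(i + 1), s};
            }
        }
    }
    return std::nullopt;
}
```

Fed the first 40 decimals of pi with $d = 1$, this reports 32 digits needed and `0` as the last string, matching the first row of the table.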

If you don't have $10^d$ bits of memory, then you could scan the digits more than once —
"Okay _this_ time I'm going to only pay attention to $d$-digit strings that start with a 7."
This multi-scan idea is not implemented here. Call a friend with more RAM.
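
For the curious, here is one way the multi-scan idea could look. This is purely a hypothetical sketch (the function name `digits_needed_multipass` is invented, and nothing like it exists in this directory): ten passes over the digits, one per leading digit, each needing only $10^{d-1}$ bits.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical multi-scan variant: pass k only tracks d-digit strings whose
// first digit is k, so each pass needs a bitvector of 10^(d-1) entries.
// Returns the number of digits needed to see every d-digit string, or
// std::nullopt if the input is too short or d is outside 1..9.
std::optional<std::uint64_t> digits_needed_multipass(const std::string& digits, unsigned d) {
    if (d == 0 || d > 9 || digits.size() < d) return std::nullopt;
    std::uint64_t sub = 1;  // 10^(d-1)
    for (unsigned i = 1; i < d; ++i) sub *= 10;
    const std::uint64_t total = sub * 10;  // 10^d

    std::uint64_t worst = 0;
    for (std::uint64_t lead = 0; lead < 10; ++lead) {
        std::vector<bool> seen(sub, false);
        std::uint64_t found = 0, n = 0, done_at = 0;
        for (std::size_t i = 0; i < digits.size() && done_at == 0; ++i) {
            n = (n * 10 + static_cast<std::uint64_t>(digits[i] - '0')) % total;
            if (i + 1 < d || n / sub != lead) continue;  // not this pass's job
            const std::uint64_t tail = n % sub;          // the remaining d-1 digits
            if (!seen[tail]) {
                seen[tail] = true;
                if (++found == sub) done_at = i + 1;
            }
        }
        if (done_at == 0) return std::nullopt;  // some string never appeared
        worst = std::max(worst, done_at);       // the slowest pass decides
    }
    return worst;
}
```

The trade is ten times less memory for ten times more reading, which is a bad deal when the digits live on disk.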

### Parallelization and efficiency
To run this search faster, we use many threads. We can't have all those threads writing to the same memory at once
(their changes might clobber each other), so we implement a little mapreduce-like arrangement: the mapper threads each own a
chunk of digits and convert them into $d$-digit values; the reducer threads each own a chunk of the bitvector and flip bits from 0 to 1
when a value is seen. The shuffle between $N$ mappers and $N$ reducers is implemented by storing the values in an $N \times N$ array
of vectors, where vector $(i,j)$ holds the values produced by mapper $i$ and consumed by reducer $j$.
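
A toy version of that arrangement, to make the data flow concrete. Everything here is an assumption for illustration (the name `count_distinct_strings`, one byte per string instead of one bit, in-memory digits, a single map round followed by a single reduce round); the real implementation is in `DigitScanner.cpp`, which this diff does not render. Some toolchains need a threading flag such as `-pthread` to build it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <thread>
#include <vector>

// Counts how many distinct d-digit strings occur in `digits`, using
// `workers` mapper threads followed by `workers` reducer threads.
std::uint64_t count_distinct_strings(const std::string& digits, unsigned d, unsigned workers) {
    if (d == 0 || d > 9 || workers == 0 || digits.size() < d) return 0;
    std::uint64_t total = 1;  // 10^d
    for (unsigned i = 0; i < d; ++i) total *= 10;
    const std::size_t windows = digits.size() - d + 1;
    const std::uint64_t slice = (total + workers - 1) / workers;  // values per reducer

    // buckets[i][j]: values produced by mapper i, to be consumed by reducer j.
    std::vector<std::vector<std::vector<std::uint32_t>>> buckets(
        workers, std::vector<std::vector<std::uint32_t>>(workers));

    std::vector<std::thread> mappers;
    for (unsigned i = 0; i < workers; ++i) {
        mappers.emplace_back([&, i] {
            // Mapper i owns a contiguous range of window start positions.
            const std::size_t begin = windows * i / workers;
            const std::size_t end = windows * (i + 1) / workers;
            for (std::size_t p = begin; p < end; ++p) {
                std::uint32_t v = 0;
                for (unsigned k = 0; k < d; ++k) v = v * 10 + (digits[p + k] - '0');
                buckets[i][v / slice].push_back(v);  // only mapper i writes row i
            }
        });
    }
    for (auto& t : mappers) t.join();

    // One byte per string, so reducers never touch the same memory location.
    std::vector<std::uint8_t> seen(total, 0);
    std::vector<std::uint64_t> found(workers, 0);
    std::vector<std::thread> reducers;
    for (unsigned j = 0; j < workers; ++j) {
        reducers.emplace_back([&, j] {
            // Reducer j owns values in [j * slice, (j + 1) * slice).
            for (unsigned i = 0; i < workers; ++i)
                for (std::uint32_t v : buckets[i][j])
                    if (!seen[v]) { seen[v] = 1; ++found[j]; }
        });
    }
    for (auto& t : reducers) t.join();

    std::uint64_t sum = 0;
    for (std::uint64_t f : found) sum += f;
    return sum;
}
```

No locks are needed because each vector $(i,j)$ has exactly one writer (mapper $i$) and, later, exactly one reader (reducer $j$), and each reducer only writes to its own slice of `seen`.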

We stop that approach when the bitvector is getting close to all 1's, and switch to a new phase where we track the arrival
of the last few thousand strings in a (mutex-guarded) hash map that remembers at what position those strings finally appear.
This lets us keep using many threads and still find out which string took the longest to first show up.
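
A sketch of what such a tracker could look like, assuming the still-missing strings have already been collected from the bitvector. The class name `MissingTracker` and its interface are invented for this example. Because threads work on different chunks of digits, a string may be reported at a later position before an earlier one, so the sketch keeps the minimum position per string and then asks for the maximum of those minima.

```cpp
#include <cstdint>
#include <limits>
#include <mutex>
#include <optional>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical tracker for the last few missing strings.
class MissingTracker {
public:
    explicit MissingTracker(const std::vector<std::uint64_t>& missing) {
        for (std::uint64_t v : missing) first_seen_[v] = kNotSeen;
    }

    // Safe to call from any thread. Values that were never missing are ignored.
    void record(std::uint64_t value, std::uint64_t position) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = first_seen_.find(value);
        if (it != first_seen_.end() && position < it->second) it->second = position;
    }

    // Returns {string value, first position} for the string that showed up
    // last, or std::nullopt while any tracked string is still missing
    // (or if nothing was tracked at all).
    std::optional<std::pair<std::uint64_t, std::uint64_t>> last_to_appear() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::optional<std::pair<std::uint64_t, std::uint64_t>> best;
        for (const auto& [value, pos] : first_seen_) {
            if (pos == kNotSeen) return std::nullopt;
            if (!best || pos > best->second) best = std::make_pair(value, pos);
        }
        return best;
    }

private:
    static constexpr std::uint64_t kNotSeen = std::numeric_limits<std::uint64_t>::max();
    mutable std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::uint64_t> first_seen_;
};
```

With only a few thousand entries the map stays tiny, so the mutex is the main cost.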

The bitvector phase of the search is sped up by issuing memory prefetch hints, because asking for randomly placed individual bits
across a very large span of memory is a latency-pessimal access pattern that would otherwise keep the CPU waiting most of the time.
The hash map phase uses a quick little Bloom filter to do less hashing.
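
The idea behind the Bloom filter is that almost every value arriving in the second phase is *not* one of the missing strings, and a couple of bit tests can reject it without hashing into the map or taking the lock. Here is a generic two-hash sketch; the class name `TinyBloom`, the sizing, and the hash constants are all assumptions and may differ from what the scanner actually does.

```cpp
#include <cstdint>
#include <vector>

// Minimal Bloom filter over 64-bit values with two multiplicative hashes.
// False positives are possible; false negatives are not.
class TinyBloom {
public:
    // log2_bits must be in [6, 32]; the filter holds 2^log2_bits bits.
    explicit TinyBloom(unsigned log2_bits)
        : shift_(64 - log2_bits),
          words_((std::uint64_t{1} << log2_bits) / 64, 0) {}

    void add(std::uint64_t v) {
        set(hash1(v));
        set(hash2(v));
    }

    // false means "definitely not added"; true means "go check the real map".
    bool maybe_contains(std::uint64_t v) const {
        return test(hash1(v)) && test(hash2(v));
    }

private:
    // Keep the top log2_bits bits of a 64-bit multiplicative hash.
    std::uint64_t hash1(std::uint64_t v) const { return (v * 0x9E3779B97F4A7C15ull) >> shift_; }
    std::uint64_t hash2(std::uint64_t v) const { return ((v + 1) * 0xC2B2AE3D27D4EB4Full) >> shift_; }

    void set(std::uint64_t bit) { words_[bit / 64] |= std::uint64_t{1} << (bit % 64); }
    bool test(std::uint64_t bit) const { return ((words_[bit / 64] >> (bit % 64)) & 1) != 0; }

    unsigned shift_;
    std::vector<std::uint64_t> words_;
};
```

A caller would `add` each missing string once up front, then guard every map lookup with `maybe_contains`.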

The cutover point between the two search phases, the memory prefetch hint details, and the number of threads to use
are definitely sensitive to what exact hardware you're running on. If you plan to run this code for large $d$
(say 10 or up), you may profit from tuning these to your setup.
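
To show what one of those tuning knobs might look like, here is a hedged sketch of prefetching ahead in a batch of values. The function name `mark_with_prefetch`, the `SKETCH_PREFETCH` macro, and the `lookahead` parameter are all hypothetical. `__builtin_prefetch` is a GCC/Clang builtin, so the macro falls back to a no-op on other compilers.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_PREFETCH(addr) __builtin_prefetch(addr)
#else
#define SKETCH_PREFETCH(addr) ((void)0)  // no hint available: do nothing
#endif

// Sets the bit for every entry of `values` in `bits` (64 bits per word) and
// returns how many bits were newly set. While handling entry i, it hints the
// cache about the word that entry i + lookahead will need.
// Every value must be less than 64 * bits.size().
std::uint64_t mark_with_prefetch(std::vector<std::uint64_t>& bits,
                                 const std::vector<std::uint64_t>& values,
                                 std::size_t lookahead) {
    std::uint64_t newly_set = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i + lookahead < values.size()) {
            SKETCH_PREFETCH(&bits[values[i + lookahead] / 64]);
        }
        std::uint64_t& word = bits[values[i] / 64];
        const std::uint64_t mask = std::uint64_t{1} << (values[i] % 64);
        if ((word & mask) == 0) {  // test first, write only when needed
            word |= mask;
            ++newly_set;
        }
    }
    return newly_set;
}
```

The right `lookahead` depends on memory latency and how much work each iteration does, which is exactly the kind of thing worth measuring on your own machine.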
14 changes: 14 additions & 0 deletions Source/DigitViewer2/DigitViewer/DigitViewerTasks.cpp
Expand Up @@ -30,6 +30,7 @@
#include "DigitViewer2/DigitWriters/BasicDigitWriter.h"
#include "DigitViewer2/DigitWriters/BasicTextWriter.h"
#include "DigitViewer2/DigitWriters/BasicYcdSetWriter.h"
#include "DigitViewer2/DigitScanner/DigitScanner.h"
#include "DigitViewerTasks.h"
namespace DigitViewer2{
////////////////////////////////////////////////////////////////////////////////
Expand Down Expand Up @@ -479,8 +480,21 @@ void to_ycd_file_partial(BasicDigitReader& reader){
    );
    process_write(reader, start_pos, end_pos - start_pos, writer, start_pos);
}
void find_last_d_string(BasicDigitReader& reader){
    Console::println("\n\nFind Last d-Digit String");
    Console::println();

    // Get d from the user.
    upL_t d = Console::scan_label_upL_range("Enter d (1-13): ", 1, 13);
    Console::println();

    DigitScanner scanner(reader, d);
    scanner.search();
}
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
}


2 changes: 2 additions & 0 deletions Source/DigitViewer2/DigitViewer/DigitViewerTasks.h
Expand Up @@ -25,9 +25,11 @@ void compute_stats(BasicDigitReader& reader);
void to_text_file(BasicDigitReader& reader);
void to_ycd_file_all(BasicDigitReader& reader);
void to_ycd_file_partial(BasicDigitReader& reader);
void find_last_d_string(BasicDigitReader& reader);
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
}
#endif

19 changes: 16 additions & 3 deletions Source/DigitViewer2/DigitViewer/DigitViewerUI2.cpp
Expand Up @@ -52,9 +52,11 @@ void Menu_TextFile(BasicTextReader& reader){
Console::println("Compress digits 1 - N into one or more .ycd files.", 'G');
Console::print(" 4 ", 'w');
Console::println("Compress a subset of digits into .ycd files.", 'G');
Console::print(" 5 ", 'w');
Console::println("Search for all d-digit strings.", 'G');

Console::println("\nEnter your choice:", 'w');
upL_t c = Console::scan_label_upL_range("option: ", 0, 4);
upL_t c = Console::scan_label_upL_range("option: ", 0, 5);
Console::println();

switch (c){
Expand All @@ -73,6 +75,9 @@ void Menu_TextFile(BasicTextReader& reader){
case 4:
to_ycd_file_partial(reader);
return;
case 5:
find_last_d_string(reader);
return;
default:;
}
}
Expand Down Expand Up @@ -115,14 +120,16 @@ void Menu_YcdFile(BasicYcdSetReader& reader){
Console::println("Compress digits 1 - N into one or more .ycd files.", 'G');
Console::print(" 4 ", 'w');
Console::println("Compress a subset of digits into .ycd files.", 'G');
Console::print(" 5 ", 'w');
Console::println("Search for all d-digit strings.", 'G');
Console::println();

Console::print(" 5 ", 'w');
Console::print(" 6 ", 'w');
Console::print("Add search directory.", 'G');
Console::println(" (if .ycd files are in multiple paths)", 'Y');

Console::println("\nEnter your choice:", 'w');
upL_t c = Console::scan_label_upL_range("option: ", 0, 5);
upL_t c = Console::scan_label_upL_range("option: ", 0, 6);
Console::println();

switch (c){
Expand All @@ -142,6 +149,10 @@ void Menu_YcdFile(BasicYcdSetReader& reader){
to_ycd_file_partial(reader);
return;
case 5:
find_last_d_string(reader);
return;

case 6:
Console::println("\nEnter directory:");
reader.add_search_path(Console::scan_utf8());
break;
Expand Down Expand Up @@ -200,3 +211,5 @@ void Menu_Main(){
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
}


3 changes: 3 additions & 0 deletions Source/DigitViewer2/Objects.mk
Expand Up @@ -23,9 +23,12 @@ CURRENT += DigitWriters/BasicTextWriter.cpp
CURRENT += DigitWriters/BasicYcdFileWriter.cpp
CURRENT += DigitWriters/BasicYcdSetWriter.cpp

CURRENT += DigitScanner/DigitScanner.cpp

CURRENT += DigitViewer/DigitViewerTasks.cpp
CURRENT += DigitViewer/DigitViewerUI2.cpp


SOURCES := $(SOURCES) $(addprefix $(CURRENT_DIR)/, $(CURRENT))
endif

2 changes: 2 additions & 0 deletions Source/DigitViewer2/SMC_DigitViewer2.cpp
Expand Up @@ -26,3 +26,5 @@

#include "DigitViewer/DigitViewerTasks.cpp"
#include "DigitViewer/DigitViewerUI2.cpp"

#include "DigitScanner/DigitScanner.cpp"
7 changes: 7 additions & 0 deletions Source/PublicLibs/BasicLibs/StringTools/ToString.cpp
Expand Up @@ -216,6 +216,13 @@ YM_NO_INLINE std::string tostrln(uiL_t x, NumberFormat format){
YM_NO_INLINE std::string tostrln(siL_t x, NumberFormat format){
    return tostr(x, format) += "\r\n";
}
YM_NO_INLINE std::string tostr_width(uiL_t x, int width){
    std::ostringstream out;
    out << std::setfill('0');
    out << std::setw(width);
    out << x;
    return out.str();
}
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
1 change: 1 addition & 0 deletions Source/PublicLibs/BasicLibs/StringTools/ToString.h
Expand Up @@ -40,6 +40,7 @@ YM_NO_INLINE std::string tostrln (uiL_t x, NumberFormat format = NORMAL);
YM_NO_INLINE std::string tostrln (siL_t x, NumberFormat format = NORMAL);
static std::string tostrln (u32_t x, NumberFormat format = NORMAL){ return tostrln((uiL_t)x, format); }
static std::string tostrln (s32_t x, NumberFormat format = NORMAL){ return tostrln((siL_t)x, format); }
YM_NO_INLINE std::string tostr_width (uiL_t x, int width);
////////////////////////////////////////////////////////////////////////////////
////////////////////////////////////////////////////////////////////////////////
// Float
1 change: 1 addition & 0 deletions TinyTestData/README.md
@@ -0,0 +1 @@
Minimal .ycd file of 1 million decimal digits, just to have for testing purposes.
Binary file added TinyTestData/pi1m - 0.ycd
Binary file not shown.
32 changes: 17 additions & 15 deletions VSS - DigitViewer2/DigitViewer2/DigitViewer2.vcxproj
Expand Up @@ -62,102 +62,102 @@
<VCProjectVersion>15.0</VCProjectVersion>
<ProjectGuid>{78460907-F11F-45DF-A8B3-BCF1D8E54EC5}</ProjectGuid>
<RootNamespace>DigitViewer2</RootNamespace>
<WindowsTargetPlatformVersion>10.0.17763.0</WindowsTargetPlatformVersion>
<WindowsTargetPlatformVersion>10.0</WindowsTargetPlatformVersion>
</PropertyGroup>
<Import Project="$(VCTargetsPath)\Microsoft.Cpp.Default.props" />
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='04-SSE3|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='07-Penryn|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='13-Haswell|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='17-Skylake|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='00-x86|Win32'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Debug|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>true</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='Release|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='04-SSE3|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='07-Penryn|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='13-Haswell|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='17-Skylake|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
<PropertyGroup Condition="'$(Configuration)|$(Platform)'=='00-x86|x64'" Label="Configuration">
<ConfigurationType>Application</ConfigurationType>
<UseDebugLibraries>false</UseDebugLibraries>
<PlatformToolset>v141</PlatformToolset>
<PlatformToolset>v145</PlatformToolset>
<WholeProgramOptimization>true</WholeProgramOptimization>
<CharacterSet>MultiByte</CharacterSet>
</PropertyGroup>
Expand Down Expand Up @@ -564,6 +564,7 @@
<ClCompile Include="..\..\Source\DigitViewer2\DigitReaders\BasicYcdSetReader.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitReaders\InconsistentMetadataException.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitReaders\ParsingTools.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitScanner\DigitScanner.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitViewer\DigitViewerTasks.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitViewer\DigitViewerUI2.cpp" />
<ClCompile Include="..\..\Source\DigitViewer2\DigitWriters\BasicTextWriter.cpp" />
Expand Down Expand Up @@ -699,6 +700,7 @@
<ClInclude Include="..\..\Source\DigitViewer2\DigitReaders\BasicYcdSetReader.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitReaders\InconsistentMetadataException.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitReaders\ParsingTools.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitScanner\DigitScanner.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitViewer\DigitViewerTasks.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitViewer\DigitViewerUI2.h" />
<ClInclude Include="..\..\Source\DigitViewer2\DigitWriters\BasicDigitWriter.h" />