Best performance to date on the VM: 4.4k records per second ingested from fetched URLs.
Best performance on laptop DK: at least a factor of 3 higher than that.
Question: what's the bottleneck?
Hypothesis: the VM is IO-bound when writing to disk (the laptop has an SSD); see the GCE performance chart here.
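One way to check the IO-bound hypothesis directly is to measure sustained sequential write throughput on both machines. This is a minimal Python sketch; the function name and default sizes are illustrative and not from these notes. It writes random data to a file and calls fsync so the page cache does not hide the disk speed, then reports MB/s. A dedicated benchmarking tool such as fio gives more rigorous numbers.

```python
import os
import time


def measure_write_throughput(path: str, total_mb: int = 256, chunk_kb: int = 1024) -> float:
    """Write roughly total_mb of data to path, fsync it, and return MB/s.

    Hypothetical helper for a rough sequential-write test. The fsync is inside
    the timed region, so the result reflects the disk rather than RAM caching.
    """
    chunk = os.urandom(chunk_kb * 1024)        # random bytes, reused for every write
    n_chunks = (total_mb * 1024) // chunk_kb   # number of chunk-sized writes
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(n_chunks):
            os.write(fd, chunk)                # regular-file writes normally complete fully
        os.fsync(fd)                           # force data to disk before stopping the clock
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    written_mb = n_chunks * chunk_kb / 1024
    return written_mb / elapsed
```

Run it with a path on the persistent disk's mount point on the VM and with a path on the laptop's SSD. If the VM's figure is far lower while its CPU is mostly idle during ingest, that would support the hypothesis.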
Idea to test: upgrade the 100 GB balanced persistent disk (PD) to SSD.
Base prices (100 GB, all in europe-west4, storage only, excluding compute):
- Zonal balanced PD: EUR 9.34 per month
- Zonal SSD PD: EUR 15.88 per month
- Regional SSD PD: EUR 31.76 per month
Compared to the VM compute cost (EUR 182 per month), this is negligible. The question is whether it will actually be faster: the write IO for zonal balanced PD is the same as for zonal SSD PD.
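To put "negligible" into numbers, this sketch computes each option's share of the total monthly bill (storage plus compute). It uses only the prices listed above and the EUR 182 compute cost, and it ignores any other line items that might also be billed.

```python
# Monthly storage prices from the notes (EUR, 100 GB, europe-west4, storage only).
STORAGE_EUR_PER_MONTH = {
    "zonal balanced PD": 9.34,
    "zonal SSD PD": 15.88,
    "regional SSD PD": 31.76,
}
COMPUTE_EUR_PER_MONTH = 182.0  # VM compute cost from the notes


def storage_share(storage_eur: float, compute_eur: float = COMPUTE_EUR_PER_MONTH) -> float:
    """Fraction of the combined monthly cost (storage + compute) spent on storage."""
    return storage_eur / (storage_eur + compute_eur)


for name, eur in STORAGE_EUR_PER_MONTH.items():
    print(f"{name}: {storage_share(eur):.1%} of total monthly cost")
```

This comes out at roughly 5%, 8% and 15% for the three options.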
On the other hand, I think the KPI to look at is sustained throughput, which is also influenced by disk size.
But I think it is worth a try. If jobs run faster, this will actually save money because compute is the most expensive part.
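The "faster jobs save money" argument can be made concrete with a break-even speedup. This sketch rests on an assumption the notes do not state: compute is paid only while jobs run (for example, the VM is stopped between jobs). Under that assumption, compute cost for a fixed workload scales with 1/speedup, while the disk is billed per month regardless. The faster disk pays off when new_storage + compute / s <= old_storage + compute, which gives s >= compute / (compute - extra_storage). If the VM runs 24/7 anyway, a faster disk adds capacity rather than saving money.

```python
def break_even_speedup(compute_eur: float, old_storage_eur: float, new_storage_eur: float) -> float:
    """Minimum speedup at which a pricier disk pays for itself.

    Assumes (hypothetically) that compute cost scales with job runtime, i.e.
    with 1 / speedup, while storage is a fixed monthly charge.
    """
    extra_storage = new_storage_eur - old_storage_eur
    if extra_storage >= compute_eur:
        return float("inf")  # no speedup can recover the extra storage cost
    # Solve new_storage + compute / s == old_storage + compute for s.
    return compute_eur / (compute_eur - extra_storage)
```

With the numbers in these notes, `break_even_speedup(182, 9.34, 40.0)` is about 1.20. Under this assumption, the 256 GB SSD PD would pay for itself if ingest got roughly 20% faster, which is well below the factor-3 gap to the laptop.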
Proposal: let's try a 256 GB zonal SSD PD, which is about EUR 40 per month (compared to EUR 15.88 for 100 GB). This should give a throughput of 122 MB/s (compared to 15 MB/s for standard PD; I can't find the throughput for balanced PD).
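The proposal relies on price and throughput growing with disk size. This sketch derives per-GB rates from the figures in these notes alone and assumes both scale linearly with provisioned size. Actual GCE throughput is also capped per VM depending on the machine type. The record size passed to `max_records_per_second` is a hypothetical parameter, since the notes do not give one, and 1 MB is treated as 10^6 bytes.

```python
SSD_PD_EUR_PER_GB_MONTH = 15.88 / 100  # from the 100 GB zonal SSD PD price above
SSD_PD_MBPS_PER_GB = 122 / 256         # implied by "122 MB/s at 256 GB" above


def ssd_pd_estimate(size_gb: float) -> tuple[float, float]:
    """Return (EUR per month, sustained MB/s) for a zonal SSD PD of size_gb.

    Assumes linear scaling with size; per-VM caps are not modelled.
    """
    return size_gb * SSD_PD_EUR_PER_GB_MONTH, size_gb * SSD_PD_MBPS_PER_GB


def max_records_per_second(throughput_mbps: float, bytes_per_record: float) -> float:
    """Upper bound on ingest rate if disk writes were the only bottleneck."""
    return throughput_mbps * 1_000_000 / bytes_per_record
```

`ssd_pd_estimate(256)` gives about EUR 40.65 per month and 122 MB/s, consistent with the proposal. Dividing by the (unknown) bytes written per record shows whether the disk alone could explain the 4.4k records per second ceiling.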