Hi. Following the TFX examples, I pass pipeline_options to generate_statistics_from_csv, with --direct_num_workers=16 set like this:
pipeline_options = PipelineOptions(['--direct_num_workers=16'])
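For reference, here is a minimal sketch of how the call and the timing comparison could be wired up. The helper names `build_beam_args` and `time_stats` and the `csv_path` argument are hypothetical and not taken from prep.py. The optional `running_mode` parameter maps to Beam's `--direct_running_mode` DirectRunner flag; whether changing it affects the timings here is untested.

```python
import time


def build_beam_args(num_workers, running_mode=None):
    """Build the DirectRunner flags as a list of strings (hypothetical helper)."""
    args = [f"--direct_num_workers={num_workers}"]
    if running_mode is not None:
        # Assumption: --direct_running_mode is left at Beam's default unless given.
        args.append(f"--direct_running_mode={running_mode}")
    return args


def time_stats(csv_path, num_workers, running_mode=None):
    """Run generate_statistics_from_csv once and return elapsed wall-clock seconds."""
    # Imported lazily so build_beam_args can be used without TFDV/Beam installed.
    import tensorflow_data_validation as tfdv
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(build_beam_args(num_workers, running_mode))
    start = time.perf_counter()
    tfdv.generate_statistics_from_csv(
        data_location=csv_path,
        pipeline_options=options,
    )
    return time.perf_counter() - start


# Example comparison (csv_path is a placeholder for the real input file):
#   for n in (1, 16):
#       print(n, time_stats("data.csv", n))
```

Timing inside the script keeps interpreter start-up and import time out of the comparison, which the shell-level `time` figures include.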
It seems that this option does not speed up this API. When I set direct_num_workers=1, the run takes about as long as with 16 workers:
# direct_num_workers=1
python prep.py 99.27s user 5.84s system 99% cpu 1:45.67 total
# direct_num_workers=16
python prep.py 101.92s user 5.22s system 98% cpu 1:48.44 total
Could someone help me?