
Druid quickstart: Update task memory computation #13563

Merged
kfaraz merged 4 commits into apache:master from findingrish:increase-task-memory on Dec 15, 2022

findingrish commented on Dec 14, 2022

This PR makes the following changes:

  • Use 80% of the specified memory for running services (versus 50% earlier).
  • Tasks now get 512m, 1024m, or 2048m (versus 512m or 2048m earlier).
  • Add direct memory for the router.
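To make the first two bullet points concrete, here is a minimal Python sketch of how a launcher could apply the 80% service share to the `-m` budget and pick one of the three per-task heap sizes. It is not the actual `bin/start-druid` code: the helper names (`usable_memory_mb`, `task_heap_mb`, `task_jvm_opts`) and the thresholds that choose between 512m, 1024m, and 2048m are hypothetical and only illustrate a three-tier choice.

```python
# Hypothetical sketch of the quickstart memory split described in this PR.
# The 80% share comes from the PR description; the per-task thresholds below
# are illustrative ASSUMPTIONS, not the values used by bin/start-druid.

SERVICE_MEMORY_FRACTION = 0.8  # was 0.5 before this PR, per the description


def usable_memory_mb(total_mb: int) -> int:
    """Memory (in MB) handed out to Druid services from the -m budget."""
    return int(total_mb * SERVICE_MEMORY_FRACTION)


def task_heap_mb(task_memory_mb: int) -> int:
    """Pick a per-task heap of 512m, 1024m, or 2048m.

    ASSUMPTION: the 4096 MB and 2048 MB thresholds are hypothetical.
    """
    if task_memory_mb >= 4096:
        return 2048
    if task_memory_mb >= 2048:
        return 1024
    return 512


def task_jvm_opts(task_memory_mb: int) -> str:
    """Build heap flags for a task JVM using the tier chosen above."""
    heap = task_heap_mb(task_memory_mb)
    return f"-Xms{heap}m -Xmx{heap}m"


if __name__ == "__main__":
    total = 16 * 1024  # e.g. ./bin/start-druid -m 16g
    print(usable_memory_mb(total))
    print(task_jvm_opts(3000))
```

The tiered choice keeps small machines on 512m tasks while letting larger budgets move to 1024m or 2048m, which matches the shape of the change even if the real cut-off points differ.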

findingrish marked this pull request as ready for review on December 14, 2022 12:10
kfaraz left a comment

Thanks for the changes, @findingrish. LGTM!

kfaraz added this to the 25.0 milestone on Dec 14, 2022

vogievetsky commented:

I think the same sort of adjustment needs to happen to the indexer:

I just built this branch, ran ./bin/start-druid -m 16g -s broker,router,historical,coordinator-overlord,indexer,zookeeper, and tried to ingest the 3-file trips dataset. The ingestion failed.


If I use MiddleManagers instead (./bin/start-druid -m 16g), it works.

Here is the query I ran (it is just the default generated one, but I am posting it here for completeness):

REPLACE INTO "trips_60m_2" OVERWRITE ALL
WITH "ext" AS (SELECT *
FROM TABLE(
  EXTERN(
    '{"type":"http","uris":["https://static.imply.io/example-data/trips/trips_xaa.csv.gz","https://static.imply.io/example-data/trips/trips_xab.csv.gz","https://static.imply.io/example-data/trips/trips_xac.csv.gz"]}',
    '{"type":"csv","findColumnsFromHeader":false,"columns":["trip_id","vendor_id","pickup_datetime","dropoff_datetime","store_and_fwd_flag","rate_code_id","pickup_longitude","pickup_latitude","dropoff_longitude","dropoff_latitude","passenger_count","trip_distance","fare_amount","extra","mta_tax","tip_amount","tolls_amount","ehail_fee","improvement_surcharge","total_amount","payment_type","trip_type","pickup","dropoff","cab_type","precipitation","snow_depth","snowfall","max_temperature","min_temperature","average_wind_speed","pickup_nyct2010_gid","pickup_ctlabel","pickup_borocode","pickup_boroname","pickup_ct2010","pickup_boroct2010","pickup_cdeligibil","pickup_ntacode","pickup_ntaname","pickup_puma","dropoff_nyct2010_gid","dropoff_ctlabel","dropoff_borocode","dropoff_boroname","dropoff_ct2010","dropoff_boroct2010","dropoff_cdeligibil","dropoff_ntacode","dropoff_ntaname","dropoff_puma"]}',
    '[{"name":"trip_id","type":"long"},{"name":"vendor_id","type":"long"},{"name":"pickup_datetime","type":"string"},{"name":"dropoff_datetime","type":"string"},{"name":"store_and_fwd_flag","type":"string"},{"name":"rate_code_id","type":"long"},{"name":"pickup_longitude","type":"double"},{"name":"pickup_latitude","type":"double"},{"name":"dropoff_longitude","type":"double"},{"name":"dropoff_latitude","type":"double"},{"name":"passenger_count","type":"long"},{"name":"trip_distance","type":"double"},{"name":"fare_amount","type":"double"},{"name":"extra","type":"double"},{"name":"mta_tax","type":"double"},{"name":"tip_amount","type":"double"},{"name":"tolls_amount","type":"double"},{"name":"ehail_fee","type":"string"},{"name":"improvement_surcharge","type":"string"},{"name":"total_amount","type":"double"},{"name":"payment_type","type":"long"},{"name":"trip_type","type":"string"},{"name":"pickup","type":"string"},{"name":"dropoff","type":"string"},{"name":"cab_type","type":"string"},{"name":"precipitation","type":"double"},{"name":"snow_depth","type":"long"},{"name":"snowfall","type":"long"},{"name":"max_temperature","type":"long"},{"name":"min_temperature","type":"long"},{"name":"average_wind_speed","type":"double"},{"name":"pickup_nyct2010_gid","type":"long"},{"name":"pickup_ctlabel","type":"long"},{"name":"pickup_borocode","type":"long"},{"name":"pickup_boroname","type":"string"},{"name":"pickup_ct2010","type":"long"},{"name":"pickup_boroct2010","type":"long"},{"name":"pickup_cdeligibil","type":"string"},{"name":"pickup_ntacode","type":"string"},{"name":"pickup_ntaname","type":"string"},{"name":"pickup_puma","type":"long"},{"name":"dropoff_nyct2010_gid","type":"long"},{"name":"dropoff_ctlabel","type":"long"},{"name":"dropoff_borocode","type":"long"},{"name":"dropoff_boroname","type":"string"},{"name":"dropoff_ct2010","type":"long"},{"name":"dropoff_boroct2010","type":"long"},{"name":"dropoff_cdeligibil","type":"string"},{"name":"dropoff_ntacode","type":"string"},{"name":"dropoff_ntaname","type":"string"},{"name":"dropoff_puma","type":"long"}]'
  )
))
SELECT
  TIME_PARSE("pickup_datetime") AS "__time",
  "trip_id",
  "vendor_id",
  "dropoff_datetime",
  "store_and_fwd_flag",
  "rate_code_id",
  "pickup_longitude",
  "pickup_latitude",
  "dropoff_longitude",
  "dropoff_latitude",
  "passenger_count",
  "trip_distance",
  "fare_amount",
  "extra",
  "mta_tax",
  "tip_amount",
  "tolls_amount",
  "ehail_fee",
  "improvement_surcharge",
  "total_amount",
  "payment_type",
  "trip_type",
  "pickup",
  "dropoff",
  "cab_type",
  "precipitation",
  "snow_depth",
  "snowfall",
  "max_temperature",
  "min_temperature",
  "average_wind_speed",
  "pickup_nyct2010_gid",
  "pickup_ctlabel",
  "pickup_borocode",
  "pickup_boroname",
  "pickup_ct2010",
  "pickup_boroct2010",
  "pickup_cdeligibil",
  "pickup_ntacode",
  "pickup_ntaname",
  "pickup_puma",
  "dropoff_nyct2010_gid",
  "dropoff_ctlabel",
  "dropoff_borocode",
  "dropoff_boroname",
  "dropoff_ct2010",
  "dropoff_boroct2010",
  "dropoff_cdeligibil",
  "dropoff_ntacode",
  "dropoff_ntaname",
  "dropoff_puma"
FROM "ext"
PARTITIONED BY MONTH

vogievetsky commented:

OK, so you might have read my comment above and wondered: wait, which indexer configs are you using? It is a fair question.

I copy a custom set of indexer configs into the micro-quickstart and then swap middleManager for indexer in the supervise script.

I did a visual diff of my working configs against the non-working auto configs, and mine had this section in them:

# Processing threads and buffers on Peons
druid.processing.numMergeBuffers=1
druid.processing.buffer.sizeBytes=100000000
druid.processing.numThreads=4

I copied the block above to conf/druid/auto/indexer/runtime.properties and restarted ./bin/start-druid -m 16g -s broker,router,historical,coordinator-overlord,indexer,zookeeper and the ingest worked.

I hope this gives a clue about how to fix it. I presume one of these configs (probably druid.processing.buffer.sizeBytes) needs to be hooked into the auto-tuning.
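One way to reason about hooking druid.processing.buffer.sizeBytes into auto-tuning: Druid's configuration documentation states that a process needs at least sizeBytes × (numMergeBuffers + numThreads + 1) of direct memory for its processing buffers. The minimal Python sketch below applies that rule to the values posted above and inverts it to derive a buffer size from a direct-memory budget. The helper names are hypothetical; this is not the fix tracked in #13569.

```python
# Sketch: relate the processing settings above to direct memory, using the
# documented minimum sizeBytes * (numMergeBuffers + numThreads + 1).
# Helper names are hypothetical and not part of bin/start-druid.


def required_direct_memory(buffer_size_bytes: int, num_threads: int,
                           num_merge_buffers: int) -> int:
    """Minimum direct memory needed by the processing buffers."""
    return buffer_size_bytes * (num_merge_buffers + num_threads + 1)


def buffer_size_for_budget(direct_memory_bytes: int, num_threads: int,
                           num_merge_buffers: int) -> int:
    """Largest per-buffer size that keeps all buffers within the budget."""
    return direct_memory_bytes // (num_merge_buffers + num_threads + 1)


if __name__ == "__main__":
    # Values from the working indexer config posted in this thread.
    print(required_direct_memory(100_000_000, num_threads=4, num_merge_buffers=1))
```

With the posted values (4 threads, 1 merge buffer, 100,000,000-byte buffers) this comes to 600,000,000 bytes, roughly 572 MiB, of direct memory. Inverting the formula is how an auto-tuner could size the buffer from whatever direct memory it grants the indexer.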

kfaraz commented on Dec 15, 2022

Merging this PR for now. The fix for the indexer can be done in a follow-up PR after Druid 25; the issue is tracked in #13569.

kfaraz commented on Dec 15, 2022

Merging as the failure is unrelated.

kfaraz merged commit 97bc022 into apache:master on Dec 15, 2022
findingrish added a commit to findingrish/druid that referenced this pull request Dec 15, 2022
kfaraz pushed a commit that referenced this pull request Dec 15, 2022