Reduce the calls to ink_get_hrtime in the event loop#12602
Reduce the calls to ink_get_hrtime in the event loop#12602cmcfarlen merged 3 commits intoapache:masterfrom
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR optimizes the event loop by reducing the frequency of ink_get_hrtime() calls. Instead of calling it multiple times per event loop iteration, the PR introduces a configurable mechanism to cache the time and update it probabilistically after processing each event.
Key changes:
- Introduces a new configuration parameter
proxy.config.exec_thread.event_time_update_rate(default: 10%) to control how often the event loop timestamp is refreshed - Modifies
process_event()andprocess_queue()to accept and return anevent_timeparameter, allowing time value reuse across the event loop - Reduces
ink_get_hrtime()calls from once-per-event to once-per-loop plus probabilistic updates
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/records/RecordsConfig.cc | Adds the new configuration parameter for controlling event time update rate |
| src/iocore/eventsystem/UnixEThread.cc | Implements the time caching logic in the event loop, modifying process_event() and process_queue() to pass and return cached time values |
| src/iocore/eventsystem/EventSystem.cc | Establishes the static configuration binding for the new event time update rate parameter |
| include/iocore/eventsystem/EThread.h | Updates method signatures to accept and return ink_hrtime for time caching |
| doc/admin-guide/files/records.yaml.en.rst | Documents the new configuration parameter with usage details |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
| tail_cb->waitForActivity(sleep_time); | ||
|
|
||
| // loop cleanup | ||
| loop_finish_time = ink_get_hrtime(); |
There was a problem hiding this comment.
note the time is updated after waiting here unconditionally.
| will create its own domain socket with a ``-<thread id>`` suffix added to the | ||
| end of the path. | ||
|
|
||
| .. ts:cv:: CONFIG proxy.config.exec_thread.event_time_update_rate INT 10 |
There was a problem hiding this comment.
Instead of spelling out that this is reloadable etc., you should add
:reloadable:
| RecEstablishStaticConfigInt32(thread_freelist_low_watermark, "proxy.config.allocator.thread_freelist_low_watermark"); | ||
|
|
||
| extern int event_time_update_rate; | ||
| RecEstablishStaticConfigInt32(event_time_update_rate, "proxy.config.exec_thread.event_time_update_rate"); |
There was a problem hiding this comment.
Not sure I like the name of this config :)
Feels like it should be clear that it's for event metrics, and not the event system itself ?
| .. ts:cv:: CONFIG proxy.config.exec_thread.loop_time_update_probability INT 10 | ||
| :reloadable: | ||
|
|
||
| This dynamically loadable setting controls the rate that exec thread loop timestamps are |
There was a problem hiding this comment.
Should this say "this dynamic reloadable" ? Or just "this reloadable" ?
| }; | ||
|
|
||
| int thread_max_heartbeat_mseconds = THREAD_MAX_HEARTBEAT_MSECONDS; | ||
| int loop_time_update_probability = 10; |
There was a problem hiding this comment.
Mostly a question: Would it make sense if we were to start making these types of config driven globals std::atomic ?

We have noticed that under high RPS loads ATS spends measurable amount of time just getting the current time. In #10163, after some benchmarks, we changed from using a thread-local variable to cache the time to just calling
ink_get_hrtimewhich fixed the time issue we had as well as increased RPS for some benchmarks. That PR replaced all references to the time cache variable with calls toink_get_hrtime. That meant, however, that each iteration through the event loop callsink_get_hrtimemultiple times, including one time for each event processed.This PR attempts to relax that a bit, but also provides a knob to control how often the event loop time is updated (as a percentage chance to be updated after an event is processed). More often can lead to more accurate timers, but also increases the number of calls to get the time.