
feat(bottlecap): generate file descriptor and threads enhanced metrics #453

Merged
duncanista merged 4 commits into jordan.gonzalez/bottlecap/universal-instrumentation from shreya.malpani/process-enhanced-metrics
Nov 15, 2024


@shreyamalpani
Contributor

What does this PR do?

This PR introduces four new enhanced Lambda metrics. They are emitted once per invocation and report file descriptor and thread data across all running processes.

The two file descriptor metrics are:

  • aws.lambda.enhanced.fd_max - maximum number of file descriptors that can be opened by the function
  • aws.lambda.enhanced.fd_use - maximum number of file descriptors open at once over the duration of the function

The two threads metrics are:

  • aws.lambda.enhanced.threads_max - maximum number of threads that can be used by the function
  • aws.lambda.enhanced.threads_use - maximum number of threads in use at once over the duration of the function

Describe how to test/QA changes

Additional Notes

  • fd_max and threads_max both default to 1024, and that value is sent even if we are not able to read this data, because 1024 is the AWS Lambda limit for both max open files and max processes
  • This implementation of fd_use and threads_use considers all running processes, whereas Lambda Insights only considers the process of the Lambda function itself, so these metrics will always be higher than the numbers reported by Lambda Insights
  • fd_use reflects a peak over the course of the request. Suppose 200 file descriptors are opened, then closed, then 300 are opened, then closed. The metric will report 300.
  • threads_use also reflects a peak over the course of the request. Suppose 20 threads are created, then closed, then 30 are created, then closed. The metric will report 30.
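
The peak semantics described in the last two notes can be sketched as a running total whose highest value is kept. This is only an illustration, not the code in this PR: `Change` and `peak_in_use` are hypothetical names, and the real implementation samples counts rather than observing individual open/close events.

```rust
/// A change in the number of open file descriptors (or live threads).
/// Hypothetical type used only for this illustration.
enum Change {
    Opened(u64),
    Closed(u64),
}

/// Returns the highest running total reached while applying `changes` in order.
fn peak_in_use(changes: &[Change]) -> u64 {
    let mut current = 0u64;
    let mut peak = 0u64;
    for change in changes {
        current = match change {
            Change::Opened(n) => current + n,
            // Saturate so an unmatched close cannot underflow the counter.
            Change::Closed(n) => current.saturating_sub(*n),
        };
        peak = peak.max(current);
    }
    peak
}
```

With the sequence from the notes (open 200, close 200, open 300, close 300) the function returns 300, because the two bursts never overlap. If the first 200 were still open when the next 300 were opened, the peak would be 500.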

@shreyamalpani shreyamalpani marked this pull request as ready for review November 14, 2024 00:19
@shreyamalpani shreyamalpani requested a review from a team as a code owner November 14, 2024 00:19
Comment thread bottlecap/src/proc/mod.rs Outdated
Comment thread bottlecap/tests/proc/13/.gitkeep
Comment thread bottlecap/src/proc/mod.rs Outdated
Contributor

@alexgallotta alexgallotta left a comment

Good work!
I left some comments on style and a few suggestions, but I think the only thing to double-check is the i32 bit width for pids.

Comment thread bottlecap/src/metrics/enhanced/lambda.rs
Comment thread bottlecap/src/proc/mod.rs Outdated
Comment thread bottlecap/src/proc/mod.rs
Comment thread bottlecap/src/proc/mod.rs
Comment thread bottlecap/src/proc/mod.rs Outdated
Comment thread bottlecap/src/proc/mod.rs Outdated
debug!("File descriptor max data not found in file {}", limits_path);
break;
};
fd_max = fd_max.min(fd_max_pid);
Contributor

Not sure I understand this: fd_max is the minimum of the current value and fd_max_pid? Shouldn't it be the max?

Contributor Author

My thought on this was to take the min of all the limits to find the most constraining one. I'll add a comment about that; I see how it's confusing.
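
A minimal sketch of that "most constraining limit" idea, assuming each process exposes its limits in the usual Linux `/proc/<pid>/limits` layout and that the soft limit is the value of interest. The function names, the `LAMBDA_FD_LIMIT` constant, and the choice to pass file contents as strings are assumptions for illustration, not the code under review.

```rust
/// Default used when no limit can be read (1024, per the PR description).
const LAMBDA_FD_LIMIT: f64 = 1024.0;

/// Extracts the soft "Max open files" limit from the text of a limits file.
/// Returns None if the row is missing or the value is not numeric
/// (for example "unlimited").
fn parse_soft_open_files_limit(limits: &str) -> Option<f64> {
    limits
        .lines()
        .find(|line| line.starts_with("Max open files"))
        // Tokens: "Max", "open", "files", <soft>, <hard>, <units>
        .and_then(|line| line.split_whitespace().nth(3))
        .and_then(|soft| soft.parse::<f64>().ok())
}

/// Folds the per-process limits with `min`, starting from the default,
/// so the result is the tightest limit seen across all processes.
fn most_constraining_fd_max(limits_per_process: &[&str]) -> f64 {
    limits_per_process
        .iter()
        .filter_map(|limits| parse_soft_open_files_limit(limits))
        .fold(LAMBDA_FD_LIMIT, f64::min)
}
```

Starting the fold at the default means the result can never exceed 1024, and unreadable or unlimited entries are skipped rather than treated as zero.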

Comment thread bottlecap/src/proc/mod.rs Outdated
Comment thread bottlecap/src/proc/mod.rs Outdated
Comment thread bottlecap/src/proc/mod.rs Outdated
Comment thread bottlecap/src/proc/mod.rs Outdated
let threads_max = proc::get_threads_max_data(&pids);
let mut threads_use = -1_f64;

let mut interval = interval(Duration::from_millis(1));
Contributor

Why are we doing this operation every millisecond? Isn't that too frequent? Could you explain why the interval is this short?

Contributor Author

It's to capture the fd count more accurately in case a large number of files are opened and closed very quickly. 2 ms also gets pretty close.
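
To illustrate the trade-off being discussed, here is a rough sketch of a polling loop that keeps the highest sampled count. It is not the PR's implementation: it swaps the async `interval` from the snippet for a blocking `std::thread::sleep`, and `poll_peak` and its closure parameter are hypothetical. The `-1` starting value mirrors the sentinel used for `threads_use` in the snippet.

```rust
use std::thread;
use std::time::Duration;

/// Calls `read_count` once per `period` until it returns None,
/// and returns the highest count observed (or -1 if nothing was sampled).
fn poll_peak<R>(mut read_count: R, period: Duration) -> f64
where
    R: FnMut() -> Option<f64>,
{
    let mut peak = -1_f64; // sentinel meaning "no sample yet"
    while let Some(count) = read_count() {
        peak = peak.max(count);
        thread::sleep(period);
    }
    peak
}
```

Anything that is opened and closed entirely between two polls is invisible to a loop like this, which is why a shorter period tracks short bursts more closely, at the cost of more frequent reads.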

Comment thread bottlecap/src/proc/mod.rs Outdated
@duncanista duncanista merged commit 72181b1 into jordan.gonzalez/bottlecap/universal-instrumentation Nov 15, 2024
@duncanista duncanista deleted the shreya.malpani/process-enhanced-metrics branch November 15, 2024 00:41
duncanista pushed a commit that referenced this pull request Nov 15, 2024
#453)

* add fd and threads enhanced metrics

* clippy fixes

* fixes

* rename var
duncanista pushed a commit that referenced this pull request Nov 19, 2024
#453)

* add fd and threads enhanced metrics

* clippy fixes

* fixes

* rename var