-
Notifications
You must be signed in to change notification settings - Fork 16.4k
WIP KPO log cleanup #36227
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP KPO log cleanup #36227
Conversation
|
@dstandish Let me know if we can help test this. We have a long-running KubernetesPodOperator running dbt and we had to turn off |
|
Hi @troyharvey , my secret master plan is to do a larger refactor of KPO logging. In short here's the plan:
it may sound complicated but i think it's ultimately simpler and more reliable than what we have now. and it should resolve your problem. however, in order to do this refactor, i want to try to not regress the various fixes people have contributed (such as #34412), so i need to understand what the heck it is doing or at least rely on the tests. but the tests pass even with the code removed. which is why i published this PR, so that the author of #34412 can help clarify the situation, and we can make this better. |
|
also @troyharvey interestingly we have received reports of this happening with DBT. it always seems to be DBT-related. we actually worked pretty hard to reproduce but could not. but that's part of the motivation for this effort. do you happen to have a reliable repro? |
|
hey @troyharvey i actually do have something for you to try.... there's actualy a read timeout parameter we can use for k8s log reads. you can try it out with this patch the tuple means can play with the numbers. seems to work though! let me know what you find |
|
Just to make sure it was intentional, you didn't revert all the changes, where you removed the statement that logs the error message: self.log.error(
"Error parsing timestamp (no timestamp in message %r). "
"Will continue execution but won't update timestamp",
line,
)For the rest, the main goal is aggregating the multiline log. For the following log: log = (
"2020-10-08T14:16:17.793417674Z message1 line1\n"
"message1 line2\n"
"message1 line3\n"
"2020-10-08T14:16:18.793417674Z message2 line1\n"
"message2 line2\n"
"2020-10-08T14:16:19.793417674Z message3 line1\n"
)Your PR logs: While my code logs: I agree that the test doesn't cover this check, but IMHO, we need to update the test and not the code to aggregate the log lines if they are really split for some reason. |
Like I tried to indicate, my goal was to not necessarily to revert your change. But I wanted to understand what your change was trying to do. And one way to do that is to revert it and use the test to see expected behavior vs actual. I agree it would be best if you can update the tests with the right "expected" cases that really reveal what it's supposed to do, both with respect to
It was nontrivial to apply the revert patch, maybe I missed that one. But if we are to keep the code and just update tests it's moot. |
|
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
All this does in effect is revert #34412. I created this PR simply do demonstrate that when we remove the code added in #34412, the tests added in that PR still pass. Which means that either the tests are wrong, or the code is unnecessary.
I am trying to clean up KPO logging logic a bit, and resolve some problems we've encountered with logs hanging. But I don't want to regress #34412. So I need either its tests to work properly, or to conclude that the logic doesn't do anything and we can remove it.