Suppress FileNotFoundError when reading status#1298
Conversation
|
See also #1281 |
blackboxsw
left a comment
There was a problem hiding this comment.
Thanks for this PR @sparkiegeek. It feels a bit like this approach is only fixing the symptom of the race and not the underlying cause.
What do you think about us fixing the underlying problem by avoiding the race in the first place?
diff --git a/cloudinit/temp_utils.py b/cloudinit/temp_utils.py
index e23b65997..005497429 100644
--- a/cloudinit/temp_utils.py
+++ b/cloudinit/temp_utils.py
@@ -107,4 +107,12 @@ def mkstemp(**kwargs):
return tempfile.mkstemp(**kwargs)
+def mktemp(**kwargs):
+ if "dir" not in kwargs:
+ kwargs["dir"] = _tempfile_dir_arg(
+ odir=None, needs_exe=kwargs.pop("needs_exe", False)
+ )
+ return tempfile.mktemp(**kwargs)
+
+
# vi: ts=4 expandtab
diff --git a/cloudinit/util.py b/cloudinit/util.py
index 1fadd1962..2bcbb88e8 100644
--- a/cloudinit/util.py
+++ b/cloudinit/util.py
@@ -1887,7 +1887,12 @@ def is_link(path):
def sym_link(source, link, force=False):
LOG.debug("Creating symbolic link from %r => %r", link, source)
if force and os.path.lexists(link):
- del_file(link)
+ # Provide atomic update of symlink to avoid races with status --wait
+ # LP: #1962150
+ tmp_link = temp_utils.mktemp(dir=os.path.dirname(link))
+ os.symlink(source, tmp_link)
+ os.replace(tmp_link, link)
+ return
os.symlink(source, link)
This would allow us to replace the existing symlink with a "new" symlink without the del_file of the original link.
Seems good. I don't see the need for adding |
@sparkiegeek thanks .The reason I had thought about using mktemp instead of mkstemp was only to get the temporary file with without the actual "work" or creating the file because we are doing that creation via os.symlink. But given that this function is long-deprecated, I'll take a better attempt at this today to make something that is sustainable and doesn't introduce a depency on something that will bite us when this function is dropped. |
mktemp was a poor choice :/. Since all I think we really want here is a random string for the symlink name, it's probably best to just use our own util.rand_str to get that name within the target directory. diff --git a/cloudinit/util.py b/cloudinit/util.py
index 1fadd1962..ae3311e3e 100644
--- a/cloudinit/util.py
+++ b/cloudinit/util.py
@@ -1887,7 +1887,12 @@ def is_link(path):
def sym_link(source, link, force=False):
LOG.debug("Creating symbolic link from %r => %r", link, source)
if force and os.path.lexists(link):
- del_file(link)
+ # Provide atomic update of symlink to avoid races with status --wait
+ # LP: #1962150
+ tmp_link = os.path.join(os.path.dirname(link), "tmp" + rand_str(8))
+ os.symlink(source, tmp_link)
+ os.replace(tmp_link, link)
+ return
os.symlink(source, link)
|
|
Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close. If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon. (If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.) |
|
@sparkiegeek , do you want help getting this one across the finish line? I think Chad's solution in his most recent comment looks good. Do you agree? If so, I can update the PR and add unit tests. |
The status file read here is a symlink which is not atomically updated, since `get_status_details` already copes with not being able to read the status file, we simply suppress the error reading it. LP:1962150
Suggested-by: Chad Smith <chad.smith@canonical.com>
d5c232d to
f6beb15
Compare
| status = UXAppStatus.RUNNING | ||
| status_v1 = load_json(load_file(status_file)).get("v1", {}) | ||
| with contextlib.suppress(FileNotFoundError): | ||
| status_v1 = load_json(load_file(status_file)).get("v1", {}) |
There was a problem hiding this comment.
I think we can remove the changes to this file. status_v1 is used a few lines later and will fail if this line had an exception.
Other than that, I think the PR would be good as-in. I have no plans on adding multi-threaded unit tests, so I'm not sure there's any unit testing that needs to be modified.
@blackboxsw , do you agree?
There was a problem hiding this comment.
@sparkiegeek @TheRealFalcon I think we can just extend the existing unit test to assert that the forced link changed what it points to. I'm good with that.
diff --git a/tests/unittests/test_util.py b/tests/unittests/test_util.py
index 3f3079b04..db6e2bc4b 100644
--- a/tests/unittests/test_util.py
+++ b/tests/unittests/test_util.py
@@ -376,13 +376,16 @@ class TestSymlink(CiTestCase):
tmpd = self.tmp_dir()
link = self.tmp_path("link", tmpd)
target = self.tmp_path("target", tmpd)
+ target2 = self.tmp_path("target2", tmpd)
util.write_file(target, "hello")
+ util.write_file(target2, "hello2")
util.sym_link(target, link)
self.assertTrue(os.path.exists(link))
- util.sym_link(target, link, force=True)
+ util.sym_link(target2, link, force=True)
self.assertTrue(os.path.exists(link))
+ self.assertEqual("hello2", util.load_file(link))
def test_sym_link_dangling_link(self):
tmpd = self.tmp_dir()12bd2b5 to
f9761f6
Compare
|
I took the liberty of pushing the suggested changes. Assuming CI passes, any objection to me merging this? |
| # Provide atomic update of symlink to avoid races with status --wait | ||
| # LP: #1962150 | ||
| tmp_link = os.path.join(os.path.dirname(link), "tmp" + rand_str(8)) | ||
| os.symlink(source, tmp_link) |
There was a problem hiding this comment.
what guarantees do you have that tmp_link doesn't exist on the filesystem?
There was a problem hiding this comment.
We have none, but the rand_str(8) makes it pretty unlikely. I suppose we could switch to a uuid if we want to be absolutely sure.
There was a problem hiding this comment.
am just thinking about the Bad Guy ™️ who pre-creates this and then gets you to use it
There was a problem hiding this comment.
Sorry, pre-creates what?
There was a problem hiding this comment.
Scenario where bad_guy brute force creates all tmp links before we get there results in a traceback in cloud-init which will stop cloud-init in noisily with a Traceback instead of silently allowing malicious data injection (even though our use-cases for symlink don't really lead to anything malicious at the moment).
python3 -c 'import os, pathlib; pathlib.Path("/badguyfile1").touch(); os.symlink("/badguyfile1", "badguytmpfile"); os.symlink("/run/cloud-initfile", "badguytmpfile")'
Traceback (most recent call last):
File "<string>", line 1, in <module>
FileExistsError: [Errno 17] File exists: '/run/cloud-initfile' -> 'badguytmpfile'There was a problem hiding this comment.
I suppose just the go the extra mile and avoid the traceback in the event that we happen to have a collision:
diff --git a/cloudinit/util.py b/cloudinit/util.py
index ae3311e3e..df841f210 100644
--- a/cloudinit/util.py
+++ b/cloudinit/util.py
@@ -1890,6 +1890,7 @@ def sym_link(source, link, force=False):
# Provide atomic update of symlink to avoid races with status --wait
# LP: #1962150
tmp_link = os.path.join(os.path.dirname(link), "tmp" + rand_str(8))
+ del_file(tmp_link)
os.symlink(source, tmp_link)
os.replace(tmp_link, link)
returnThat would guarantee we never see a Traceback because we clean up our tmp_link before we even attempt to create it,. del_file handles the case where theENOENT where tmp_link doesn't exist so we won't error in this case and we won't collide with "Bad Guy" or any previous invocations that somehow broke before renaming the symlink.
There was a problem hiding this comment.
If you're fine with the traceback (I don't mind, up to you folks) then think actually omitting the del_file is cleaner since you can't guarantee that Bad Guy:tm: won't create the symlink between us deleting it and then creating it (and also has a higher chance of discovering the destination because of the interaction with the FS).
In other words, let's land this! Thanks for helping to get it over the line
There was a problem hiding this comment.
+1 awesome thanks @sparkiegeek I agree and will add the del_file. and land this. Total mistread of your perfect English. Will leave out the del_file and we can land it, I don't expect we will hit this trace given the very limited scope and use of sym_link force throughout cloud-init, and the impact of a traceback in current use-cases is minimal to running cloud-init operations.
d0496fa to
f9761f6
Compare
The status file read here is a symlink which is not atomically
updated, since
get_status_detailsalready copes with not being ableto read the status file, we simply suppress the error reading it.
LP:1962150
Test Steps
TODO: Needs appropriate testing.
Passes unit tests locally
Checklist: