Skip to content

Suppress FileNotFoundError when reading status#1298

Merged
blackboxsw merged 3 commits into
canonical:mainfrom
sparkiegeek:status-wait-filenotfound
Apr 8, 2022
Merged

Suppress FileNotFoundError when reading status#1298
blackboxsw merged 3 commits into
canonical:mainfrom
sparkiegeek:status-wait-filenotfound

Conversation

@sparkiegeek
Copy link
Copy Markdown
Contributor

The status file read here is a symlink which is not atomically
updated, since get_status_details already copes with not being able
to read the status file, we simply suppress the error reading it.

LP:1962150

Test Steps

TODO: Needs appropriate testing.

Passes unit tests locally

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@sparkiegeek
Copy link
Copy Markdown
Contributor Author

See also #1281

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @sparkiegeek. It feels a bit like this approach is only fixing the symptom of the race and not the underlying cause.

What do you think about us fixing the underlying problem by avoiding the race in the first place?

diff --git a/cloudinit/temp_utils.py b/cloudinit/temp_utils.py
index e23b65997..005497429 100644
--- a/cloudinit/temp_utils.py
+++ b/cloudinit/temp_utils.py
@@ -107,4 +107,12 @@ def mkstemp(**kwargs):
     return tempfile.mkstemp(**kwargs)
 
 
+def mktemp(**kwargs):
+    if "dir" not in kwargs:
+        kwargs["dir"] = _tempfile_dir_arg(
+            odir=None, needs_exe=kwargs.pop("needs_exe", False)
+        )
+    return tempfile.mktemp(**kwargs)
+
+
 # vi: ts=4 expandtab
diff --git a/cloudinit/util.py b/cloudinit/util.py
index 1fadd1962..2bcbb88e8 100644
--- a/cloudinit/util.py
+++ b/cloudinit/util.py
@@ -1887,7 +1887,12 @@ def is_link(path):
 def sym_link(source, link, force=False):
     LOG.debug("Creating symbolic link from %r => %r", link, source)
     if force and os.path.lexists(link):
-        del_file(link)
+        # Provide atomic update of symlink to avoid races with status --wait
+        # LP: #1962150
+        tmp_link = temp_utils.mktemp(dir=os.path.dirname(link))
+        os.symlink(source, tmp_link)
+        os.replace(tmp_link, link)
+        return
     os.symlink(source, link)
 

This would allow us to replace the existing symlink with a "new" symlink without the del_file of the original link.

@sparkiegeek
Copy link
Copy Markdown
Contributor Author

Thanks for this PR @sparkiegeek. It feels a bit like this approach is only fixing the symptom of the race and not the underlying cause.

What do you think about us fixing the underlying problem by avoiding the race in the first place?

diff --git a/cloudinit/temp_utils.py b/cloudinit/temp_utils.py
index e23b65997..005497429 100644
--- a/cloudinit/temp_utils.py
+++ b/cloudinit/temp_utils.py
@@ -107,4 +107,12 @@ def mkstemp(**kwargs):
     return tempfile.mkstemp(**kwargs)
 
 
+def mktemp(**kwargs):
+    if "dir" not in kwargs:
+        kwargs["dir"] = _tempfile_dir_arg(
+            odir=None, needs_exe=kwargs.pop("needs_exe", False)
+        )
+    return tempfile.mktemp(**kwargs)
+
+
 # vi: ts=4 expandtab
diff --git a/cloudinit/util.py b/cloudinit/util.py
index 1fadd1962..2bcbb88e8 100644
--- a/cloudinit/util.py
+++ b/cloudinit/util.py
@@ -1887,7 +1887,12 @@ def is_link(path):
 def sym_link(source, link, force=False):
     LOG.debug("Creating symbolic link from %r => %r", link, source)
     if force and os.path.lexists(link):
-        del_file(link)
+        # Provide atomic update of symlink to avoid races with status --wait
+        # LP: #1962150
+        tmp_link = temp_utils.mktemp(dir=os.path.dirname(link))
+        os.symlink(source, tmp_link)
+        os.replace(tmp_link, link)
+        return
     os.symlink(source, link)
 

This would allow us to replace the existing symlink with a "new" symlink without the del_file of the original link.

Seems good.

I don't see the need for adding mktemp wrapper given you already have a wrapper for mkstemp (and mktemp comes with lots of warnings).

@blackboxsw
Copy link
Copy Markdown
Collaborator

blackboxsw commented Feb 28, 2022

emp comes with lots of warnings).

@sparkiegeek thanks .The reason I had thought about using mktemp instead of mkstemp was only to get the temporary file with without the actual "work" or creating the file because we are doing that creation via os.symlink. But given that this function is long-deprecated, I'll take a better attempt at this today to make something that is sustainable and doesn't introduce a depency on something that will bite us when this function is dropped.

@blackboxsw
Copy link
Copy Markdown
Collaborator

I don't see the need for adding mktemp wrapper given you already have a wrapper for mkstemp (and mktemp comes with lots of warnings).

mktemp was a poor choice :/.

Since all I think we really want here is a random string for the symlink name, it's probably best to just use our own util.rand_str to get that name within the target directory.

diff --git a/cloudinit/util.py b/cloudinit/util.py
index 1fadd1962..ae3311e3e 100644
--- a/cloudinit/util.py
+++ b/cloudinit/util.py
@@ -1887,7 +1887,12 @@ def is_link(path):
 def sym_link(source, link, force=False):
     LOG.debug("Creating symbolic link from %r => %r", link, source)
     if force and os.path.lexists(link):
-        del_file(link)
+        # Provide atomic update of symlink to avoid races with status --wait
+        # LP: #1962150
+        tmp_link = os.path.join(os.path.dirname(link), "tmp" + rand_str(8))
+        os.symlink(source, tmp_link)
+        os.replace(tmp_link, link)
+        return
     os.symlink(source, link)
 

@github-actions
Copy link
Copy Markdown

Hello! Thank you for this proposed change to cloud-init. This pull request is now marked as stale as it has not seen any activity in 14 days. If no activity occurs within the next 7 days, this pull request will automatically close.

If you are waiting for code review and you are seeing this message, apologies! Please reply, tagging TheRealFalcon, and he will ensure that someone takes a look soon.

(If the pull request is closed and you would like to continue working on it, please do tag TheRealFalcon to reopen it.)

@github-actions github-actions Bot added the stale-pr Pull request is stale; will be auto-closed soon label Mar 15, 2022
@TheRealFalcon TheRealFalcon removed the stale-pr Pull request is stale; will be auto-closed soon label Mar 15, 2022
@TheRealFalcon
Copy link
Copy Markdown
Contributor

@sparkiegeek , do you want help getting this one across the finish line? I think Chad's solution in his most recent comment looks good. Do you agree? If so, I can update the PR and add unit tests.

The status file read here is a symlink which is not atomically
updated, since `get_status_details` already copes with not being able
to read the status file, we simply suppress the error reading it.

LP:1962150
Suggested-by: Chad Smith <chad.smith@canonical.com>
@sparkiegeek sparkiegeek force-pushed the status-wait-filenotfound branch from d5c232d to f6beb15 Compare March 22, 2022 10:00
Comment thread cloudinit/cmd/status.py Outdated
status = UXAppStatus.RUNNING
status_v1 = load_json(load_file(status_file)).get("v1", {})
with contextlib.suppress(FileNotFoundError):
status_v1 = load_json(load_file(status_file)).get("v1", {})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the changes to this file. status_v1 is used a few lines later and will fail if this line had an exception.

Other than that, I think the PR would be good as-in. I have no plans on adding multi-threaded unit tests, so I'm not sure there's any unit testing that needs to be modified.

@blackboxsw , do you agree?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sparkiegeek @TheRealFalcon I think we can just extend the existing unit test to assert that the forced link changed what it points to. I'm good with that.

diff --git a/tests/unittests/test_util.py b/tests/unittests/test_util.py
index 3f3079b04..db6e2bc4b 100644
--- a/tests/unittests/test_util.py
+++ b/tests/unittests/test_util.py
@@ -376,13 +376,16 @@ class TestSymlink(CiTestCase):
         tmpd = self.tmp_dir()
         link = self.tmp_path("link", tmpd)
         target = self.tmp_path("target", tmpd)
+        target2 = self.tmp_path("target2", tmpd)
         util.write_file(target, "hello")
+        util.write_file(target2, "hello2")
 
         util.sym_link(target, link)
         self.assertTrue(os.path.exists(link))
 
-        util.sym_link(target, link, force=True)
+        util.sym_link(target2, link, force=True)
         self.assertTrue(os.path.exists(link))
+        self.assertEqual("hello2", util.load_file(link))
 
     def test_sym_link_dangling_link(self):
         tmpd = self.tmp_dir()

@TheRealFalcon TheRealFalcon force-pushed the status-wait-filenotfound branch from 12bd2b5 to f9761f6 Compare April 7, 2022 15:44
@TheRealFalcon TheRealFalcon marked this pull request as ready for review April 7, 2022 15:45
@TheRealFalcon
Copy link
Copy Markdown
Contributor

I took the liberty of pushing the suggested changes. Assuming CI passes, any objection to me merging this?

Comment thread cloudinit/util.py
# Provide atomic update of symlink to avoid races with status --wait
# LP: #1962150
tmp_link = os.path.join(os.path.dirname(link), "tmp" + rand_str(8))
os.symlink(source, tmp_link)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what guarantees do you have that tmp_link doesn't exist on the filesystem?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have none, but the rand_str(8) makes it pretty unlikely. I suppose we could switch to a uuid if we want to be absolutely sure.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

am just thinking about the Bad Guy ™️ who pre-creates this and then gets you to use it

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, pre-creates what?

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Scenario where bad_guy brute force creates all tmp links before we get there results in a traceback in cloud-init which will stop cloud-init in noisily with a Traceback instead of silently allowing malicious data injection (even though our use-cases for symlink don't really lead to anything malicious at the moment).

python3 -c 'import os, pathlib; pathlib.Path("/badguyfile1").touch(); os.symlink("/badguyfile1", "badguytmpfile"); os.symlink("/run/cloud-initfile", "badguytmpfile")'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
FileExistsError: [Errno 17] File exists: '/run/cloud-initfile' -> 'badguytmpfile'

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose just the go the extra mile and avoid the traceback in the event that we happen to have a collision:

diff --git a/cloudinit/util.py b/cloudinit/util.py
index ae3311e3e..df841f210 100644
--- a/cloudinit/util.py
+++ b/cloudinit/util.py
@@ -1890,6 +1890,7 @@ def sym_link(source, link, force=False):
         # Provide atomic update of symlink to avoid races with status --wait
         # LP: #1962150
         tmp_link = os.path.join(os.path.dirname(link), "tmp" + rand_str(8))
+        del_file(tmp_link)
         os.symlink(source, tmp_link)
         os.replace(tmp_link, link)
         return

That would guarantee we never see a Traceback because we clean up our tmp_link before we even attempt to create it,. del_file handles the case where theENOENT where tmp_link doesn't exist so we won't error in this case and we won't collide with "Bad Guy" or any previous invocations that somehow broke before renaming the symlink.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're fine with the traceback (I don't mind, up to you folks) then think actually omitting the del_file is cleaner since you can't guarantee that Bad Guy:tm: won't create the symlink between us deleting it and then creating it (and also has a higher chance of discovering the destination because of the interaction with the FS).

In other words, let's land this! Thanks for helping to get it over the line

Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw Apr 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 awesome thanks @sparkiegeek I agree and will add the del_file. and land this. Total mistread of your perfect English. Will leave out the del_file and we can land it, I don't expect we will hit this trace given the very limited scope and use of sym_link force throughout cloud-init, and the impact of a traceback in current use-cases is minimal to running cloud-init operations.

@blackboxsw blackboxsw force-pushed the status-wait-filenotfound branch from d0496fa to f9761f6 Compare April 8, 2022 17:34
Copy link
Copy Markdown
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants