
Conversation

@fanjiyun (Contributor) commented Feb 5, 2018

  • Description for the changelog
    Rolling back the port configs if failed to programIngress()

Signed-off-by: fanjiyun fan.jiyun@zte.com.cn

@codecov-io

Codecov Report

❗ No coverage uploaded for pull request base (master@20dd462).
The diff coverage is 0%.


@@           Coverage Diff            @@
##             master   #2069   +/-   ##
========================================
  Coverage          ?   40.5%           
========================================
  Files             ?     138           
  Lines             ?   22174           
  Branches          ?       0           
========================================
  Hits              ?    8982           
  Misses            ?   11875           
  Partials          ?    1317
Impacted Files Coverage Δ
service_linux.go 3.25% <0%> (ø)


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 20dd462...f51df90.

@fcrisciani fcrisciani requested a review from ctelfer March 8, 2018 23:24
service_linux.go Outdated
if sb.ingress {
filteredPorts = filterPortConfigs(ingressPorts, false)
if err := programIngress(gwIP, filteredPorts, false); err != nil {
filterPortConfigs(ingressPorts, true)

I believe that potentially this is not enough; it will depend on where the failure happened. If programIngress() fails in the middle, there can be some half-configured state left behind.

@ctelfer (Contributor)

I concur. The programIngress() function does not fail gracefully or leave the system in a consistent state if it fails, unfortunately. These changes are necessary for a successful rollback, but insufficient.

Reviewing programIngress() a bit I see there are several different stages of changes:

  1. first ingress programming after startup

  2. basic iptables setup required for per-port configuration

  3. per-ingress port rule add/deletion consisting of:
    a. DNAT rule programming
    b. SNAT rule programming
    c. filter bypass for the given port
    d. setting up a listening proxy (for reasons I don't quite get yet)

Failures in 1. never abort processing: they only flag the error, so no problem there. Failures in 2. do abort processing, but every time programIngress(...true) /* add */ is invoked, stage 2 checks each step to avoid clobbering existing work. So if a failure happens here, we are probably OK to just undo the filterPortConfigs().

Failures in part 3. are the problem in several ways. We could have gotten through some, but not all of the ports. In that case we should either undo the operation we just did on the port configurations that were successful, or return a list of ports that we were not successful in programming. Those could then be passed to filterPortConfigs() to undo their tracking. The latter solution seems simpler and cleaner to me.
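For illustration, that latter option might look roughly like the sketch below. This is only a sketch: programPortRules is a hypothetical stand-in for the per-port programming of stage 3, and the real programIngress() has a different signature.

// Sketch: report the ports that were not successfully programmed so the
// caller can undo only their filterPortConfigs() tracking. PortConfig and
// filterPortConfigs are the package's own; programPortRules is hypothetical.
func programIngressPorts(gwIP net.IP, ingressPorts []*PortConfig, isDelete bool) ([]*PortConfig, error) {
	for i, iPort := range ingressPorts {
		if err := programPortRules(gwIP, iPort, isDelete); err != nil {
			// Ports from index i onward were not (fully) programmed.
			return ingressPorts[i:], err
		}
	}
	return nil, nil
}

The caller could then pass the returned slice to filterPortConfigs(failedPorts, true) to undo the tracking for just those ports.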

Unfortunately, that's not quite sufficient either, because note that programming each port is a multi-stage process. If the programming fails in the middle of one of the stages and we want to make things clean, we have to undo the parts that did succeed. (e.g. if 3c fails, we need to undo parts 3a and 3b for that port to leave things in a consistent state.)

But even that isn't quite sufficient, because it depends on whether this was for add or delete and the reason for the failure. If the operation was an "add", then removing the partial add makes sense. But if the operation was a remove and it failed because for some reason the entry wasn't there, then adding it back would be an error. Finally, there is the question of what to do if for some bizarre reason rolling back the partial port configuration fails. Ugh. :(

I think the goal of this patch is definitely worthwhile, but making such a solution robust is going to be tricky to say the least.

@ctelfer (Contributor) left a comment

Unfortunately, I think that the patch is insufficient. The changes provided as-is will only work for a limited set of cases and will leave the system in an even more inconsistent state in others. This patch would need to include a bit of refactoring to programIngress() in order to be correct/effective.


@fanjiyun (Contributor, Author)

@fcrisciani @ctelfer, thank you both very much. Yes, it is tricky to solve all of the abnormal situations completely; this patch can only handle part of them. I need further analysis and thought to solve the problem completely.

@fanjiyun fanjiyun force-pushed the 1-Rolling-back-the-port-configs branch 2 times, most recently from b92bddd to 9a00354 on August 9, 2018 08:01
@fanjiyun (Contributor, Author) commented Aug 9, 2018

@ctelfer, @fcrisciani, recently I have further modified the function programIngress(), based on our previous discussion.

As described by @ctelfer, programIngress() contains three stages:

  1. first ingress programming after startup
  2. basic iptables setup required for per-port configuration
  3. per-ingress port rule add/deletion

Stage 1 does not need to be dealt with.
If there is a failure in stage 2, we should undo filterPortConfigs().
For stage 3, it depends on whether the operation is an add or a deletion. For a deletion there is no rollback, because the function does not abort. For an add, on the contrary, we roll back the changes to the port rules to ensure data consistency.

Could you review the PR? Thanks to both of you.

@fanjiyun (Contributor, Author)

@ctelfer, @fcrisciani, could you review this PR again? Thanks.

@ctelfer (Contributor) left a comment

First, thanks for continuing to carry this! I think this could benefit from two improvements:

  1. move the filterPortConfigs() back out of programIngress() to the caller as before, but for add only.
  2. create a slice of rollback rules for each successful iptables programming and issue them upon failure.

More details below. Thanks again!

service_linux.go Outdated
filterPortConfigs(ingressPorts, !isDelete)
}
}()

My reading of this is that if we get past the setup phase (the next block), then we remove the ingress ports that we were trying to add. In the first place, this seems wrong, since it does this whether the rest of the function succeeds or fails. But more importantly, I think that the filterPortConfigs() aspect of your original patch was the more correct way to handle this part of the rollback: i.e., if programIngress() returns an error during an add operation, then run filterPortConfigs(lb.service.ingressPorts, false) in the error block. The only thing wrong with that part of the original was that it was incomplete. The rest of your patch sees to undoing the iptables programming.
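For illustration, the caller-side error block being described might look like the sketch below, based on the hunk from the original patch at the top of this thread (the return and the exact arguments are assumptions, not the final code):

// Sketch only: track the ports, try to program them, and undo just the
// tracking if programIngress() fails during an add. Undoing any partially
// programmed iptables rules is left to programIngress() itself.
if sb.ingress {
	filteredPorts = filterPortConfigs(ingressPorts, false)
	if err := programIngress(gwIP, filteredPorts, false); err != nil {
		// Roll back the port-config tracking added just above.
		filterPortConfigs(ingressPorts, true)
		return err
	}
}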

service_linux.go Outdated
for _, iPort := range ingressPorts {

var portErr error

Suggest moving this outside the block for reasons below.

filterPortConfigs(rollbackPort, !isDelete)
}
}()

I don't think that this block was ever needed because you were already running filterPortConfigs(ingressPorts, !isDelete) above. But that's kind of moot if we move the filterPortConfigs() "undo" out to the caller.

service_linux.go Outdated
defer func() {
if portErr != nil && !isDelete {
rule := strings.Fields(fmt.Sprintf("-t nat %s %s -p %s --dport %d -j DNAT --to-destination %s:%d", rollbackAddDelOpt,
ingressChain, strings.ToLower(PortConfig_Protocol_name[int32(iPort.Protocol)]), iPort.PublishedPort, gwIP, iPort.PublishedPort))

Hrm... Rather than stacking up a defer function per programmed port, I'd consider having a list of "rollback rules". Then, there could just be one defer function defined before the loop that looks something like this:

var portErr error
rollbackRules := make([][]string, 0, len(ingressPorts)*3)
defer func() {
	if portErr != nil && !isDelete {
		for _, rule := range rollbackRules {
			if err := iptables.RawCombinedOutput(rule...); err != nil {
				logrus.Warnf("setting up rule failed while rolling back, %v: %v", rule, err)
			}
		}
	}
}()

Then at this point in the code, you don't need this to be in a defer block, you just do:

rollbackRule := strings.Fields(...)
rollbackRules = append(rollbackRules, rollbackRule)

Incidentally, moving portErr outside of the loop has the added benefit that it can help recover from proxy errors: see below.

service_linux.go Outdated
if portErr != nil && !isDelete {
rule = strings.Fields(fmt.Sprintf("%s %s -m state -p %s --sport %d --state ESTABLISHED,RELATED -j ACCEPT", rollbackAddDelOpt,
ingressChain, strings.ToLower(PortConfig_Protocol_name[int32(iPort.Protocol)]), iPort.PublishedPort))
if err := iptables.RawCombinedOutput(rule...); err != nil {

Same idea here re: adding a rollback rule to a slice of them.

errStr := fmt.Sprintf("setting up rule failed, %v: %v", rule, err)
if portErr = iptables.RawCombinedOutput(rule...); portErr != nil {
errStr := fmt.Sprintf("setting up rule failed, %v: %v", rule, portErr)
if !isDelete {

We need a rollback rule here as well because if there is more than one port, the first one could be successfully programmed, but a subsequent one could fail.

errStr := fmt.Sprintf("setting up rule failed, %v: %v", rule, portErr)
if !isDelete {
return fmt.Errorf("%s", errStr)
}

A few lines below here, if we catch the return of plumbProxy in portErr then we'll successfully be able to roll back from that error as well. This is contingent on moving the portErr definition outside the loop as mentioned above.
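For illustration, catching that return might look like the sketch below. This assumes plumbProxy is called per port with the port config and the isDelete flag and returns an error; the actual call site in service_linux.go may differ.

// Capture the proxy error in the loop-level portErr so the deferred rollback
// also undoes this port's iptables rules when proxy setup fails on an add.
if portErr = plumbProxy(iPort, isDelete); portErr != nil {
	errStr := fmt.Sprintf("failed to create proxy for port %d: %v", iPort.PublishedPort, portErr)
	if !isDelete {
		return fmt.Errorf("%s", errStr)
	}
	logrus.Warnf("%s", errStr)
}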

@fanjiyun fanjiyun force-pushed the 1-Rolling-back-the-port-configs branch from 9a00354 to 9220c35 on August 23, 2018 09:08
@fanjiyun (Contributor, Author)

@ctelfer, thank you very much for analyzing this patch so deeply. I think your suggestion to add a slice of rollback rules is better, and I have made the corresponding improvement.

However, I think that if programIngress() fails in stage 2 (basic iptables setup required for per-port configuration), we should undo filterPortConfigs() for both add and deletion, because the port rule add/deletion has not actually been executed.

In the patch, I add a parameter, handleIngressPorts, to indicate whether programIngress() has started to add/delete port rules; its initial value is false.
When programIngress() fails in stage 2 (basic iptables setup required for per-port configuration), !handleIngressPorts is true and filterPortConfigs() can be executed.
Conversely, if stage 2 succeeds and stage 3 (port rule add/deletion) begins to execute, handleIngressPorts is set to true; !handleIngressPorts is then false, and whether to execute filterPortConfigs() is determined by the other parameters (such as portErr and isDelete). A sketch of this guard follows this comment.

That is why I put filterPortConfigs() inside programIngress(): it depends on where the failure in programIngress() happened.

The above is my thinking. Thank you again for your time; I look forward to further discussion with you.
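A rough sketch of the guard described above (illustrative only: the names follow the comment, err is assumed to be programIngress()'s named error return, and this flag-based approach was later replaced by filtering the ports lazily):

handleIngressPorts := false // becomes true once stage 3 (port rule add/delete) starts
defer func() {
	if err != nil && !handleIngressPorts {
		// Failed in stage 1 or 2, before any port rule was touched: undo the
		// earlier port-config tracking for both add and delete.
		filterPortConfigs(ingressPorts, !isDelete)
	}
	// Once handleIngressPorts is true, the rollback decision is driven by
	// portErr and isDelete instead.
}()

// ... stage 2: basic iptables setup required for per-port configuration ...

handleIngressPorts = true // stage 3 begins: per-port rule add/deletion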

@fanjiyun fanjiyun force-pushed the 1-Rolling-back-the-port-configs branch 2 times, most recently from acbea82 to b8d6826 on August 29, 2018 09:23
@fanjiyun (Contributor, Author)

@ctelfer I have a new idea. Maybe we can move filterPortConfigs() into programIngress(), and not start filtering the ingress ports until stage 3 (port rule add/deletion) begins to execute. A rough sketch of that arrangement follows.
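For illustration, that arrangement inside programIngress() could look like the sketch below, assembled from the hunks quoted elsewhere in this thread (the guard condition and the surrounding code are assumptions; the merged patch may differ in detail):

// Track (filter) the ports only once stage 3 starts, and undo both the
// tracking and any already-programmed rules if an add fails partway through.
var portErr error
filteredPorts := filterPortConfigs(ingressPorts, isDelete)
rollbackRules := make([][]string, 0, len(filteredPorts)*3)
defer func() {
	if portErr != nil && !isDelete {
		filterPortConfigs(filteredPorts, !isDelete)
		for _, rule := range rollbackRules {
			if err := iptables.RawCombinedOutput(rule...); err != nil {
				logrus.Warnf("rolling back rule failed, %v: %v", rule, err)
			}
		}
	}
}()

// ... stage 3: per-port rule add/deletion, appending each inverse rule to
// rollbackRules after a successful step and setting portErr on failure ...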

@fanjiyun fanjiyun force-pushed the 1-Rolling-back-the-port-configs branch from b8d6826 to efcdf4a on August 29, 2018 09:56
@ctelfer (Contributor) commented Aug 29, 2018

I like it! That seems like an even cleaner solution. Just one small nit. You mentioned:

However, I think that if programIngress() fails in stage 2 (basic iptables setup required for per-port configuration), we should undo filterPortConfigs() for both add and deletion, because the port rule add/deletion has not actually been executed.

and I think you are right. However, I also noticed that the way your patch is currently arranged, it only undoes filterPortConfigs() if the operation is not delete. So I think that this:

		if portErr != nil && !isDelete {
			filterPortConfigs(filteredPorts, !isDelete)
			...

would probably need to change to something like this:

		if portErr != nil {
			filterPortConfigs(filteredPorts, !isDelete)
			if isDelete {
				return
			}
			...

(there are other possible approaches. I'm sure you get the idea.)

@fanjiyun (Contributor, Author)

In today's latest version, by not filtering the ports until stage 3 (port rule add/deletion) begins to execute, we avoid any effect of failures in stage 2 (basic iptables setup) on the port configuration.
And in stage 3 (port rule add/deletion), do you mean that if there is an error while deleting a port rule, we should also undo filterPortConfigs()?
In the code, if there is an error during deletion, it just prints a warning and does not interrupt the follow-up operations, as shown:

		if ..., portErr != nil {
			...
			if !isDelete {
				return fmt.Errorf("%s", errStr)
			}
			logrus.Warnf("%s", errStr)
		}

If we undo filterPortConfigs() but do not roll back the rules for all ports, is that appropriate? Maybe a warning is enough when an error occurs during deletion. Do you think so?

@ctelfer (Contributor) commented Aug 29, 2018

Ok, I clearly misunderstood your intent w/ the message I quoted. The question boils down to whether a port should still be tracked as a "filtered port" if dockerd encountered some issue when trying to delete the corresponding rules. I suppose it would depend on the nature of the failure. If it failed because the rule had already vanished for some stupid reason, then ignoring the error is fine. OTOH, if, say, there's some issue w/ firewalld and the rules programming didn't go through, and the rules are still left, then the port is going to seem open from docker's perspective but effectively be blocked. The fact that there are 3 such rules that could fail for any port is also troublesome here.

Ok. For purpose of this patch, it isn't making the situation any worse by only issuing a warning during a delete failure. So it's probably ok as is. Will give it one more pass and update my review. Thanks!

@ctelfer (Contributor) left a comment

LGTM

(one minor nit on wording below if you wouldn't mind changing... not a blocker)

service_linux.go Outdated
filterPortConfigs(filteredPorts, !isDelete)
for _, rule := range rollbackRules {
if err := iptables.RawCombinedOutput(rule...); err != nil {
logrus.Warnf("setting up rule failed while rolling back, %v: %v", rule, err)

minor nit: we know this is only going to happen when rolling back an "add" so instead of "setting up", the message should probably say "removing" or "rolling back" or some such.

@fanjiyun (Contributor, Author)

Yes, and I have changed it. Thanks.

@fanjiyun fanjiyun force-pushed the 1-Rolling-back-the-port-configs branch from efcdf4a to 78104ee on August 30, 2018 04:35
@fanjiyun fanjiyun changed the title from "Rolling back the port configs if failed to add/delete ingress" to "Rolling back the port configs if failed to programIngress()" on Aug 30, 2018
@fanjiyun (Contributor, Author)

Yes, I also think that keeping the status quo, printing only a warning on a delete failure, may be the better approach. Thank you very much for your time.

@fanjiyun (Contributor, Author) commented Sep 2, 2018

@ctelfer, excuse me, does this need someone else to review? Could I trouble you to help ping someone? Thanks very much.

@fanjiyun (Contributor, Author) commented Sep 8, 2018

@fcrisciani @thaJeztah, I don't know whether this PR needs more than one reviewer; can you take a look? @ctelfer, is there anything I need to do to get this PR merged? Thanks.

@ctelfer (Contributor) commented Sep 10, 2018

@fanjiyun Just need another review. Otherwise looks good to go from my standpoint.

Hopefully @fcrisciani can take a look, although I believe he is travelling today. @euanh or @selansen, could either of you take a look?

@selansen (Contributor) commented Sep 10, 2018 via email

service_linux.go Outdated
filterPortConfigs(filteredPorts, !isDelete)
for _, rule := range rollbackRules {
if err := iptables.RawCombinedOutput(rule...); err != nil {
logrus.Warnf("rolling back rule failed, %v: %v", rule, err)

What will happen if a few rules were successfully rolled back and a few of them failed? I think we would be in a weird state. Hope that never happens.

@selansen (Contributor)

One general nit: I see the words "rolling back" and "setting up" everywhere. It would be good to keep them as "roll back rule" and "set up rule".

@selansen (Contributor)

LGTM

@fanjiyun fanjiyun force-pushed the 1-Rolling-back-the-port-configs branch from 78104ee to 8c39369 on September 11, 2018 11:02
Signed-off-by: fanjiyun <fan.jiyun@zte.com.cn>
@fanjiyun fanjiyun force-pushed the 1-Rolling-back-the-port-configs branch from 8c39369 to db50782 on September 11, 2018 11:12
@fanjiyun (Contributor, Author)

I have updated these words. In addition, some other functions also contained "setting up", and I updated those together. Thank you all.

@selansen (Contributor)

Thanks. LGTM

@ctelfer ctelfer merged commit a5a6ca3 into moby:master Sep 11, 2018
@ctelfer (Contributor) commented Sep 11, 2018

Thanks for sticking with this and seeing it through, @fanjiyun. The change is merged.

@fanjiyun (Contributor, Author)

@ctelfer, @fcrisciani, @selansen, thanks for your time and review.

@selansen (Contributor)

Thank you for your contribution.

dani-docker pushed a commit to dani-docker/libnetwork that referenced this pull request May 22, 2019
Signed-off-by: Dani Louca <dani.louca@docker.com>
euanh added a commit that referenced this pull request Jun 24, 2019
euanh added a commit that referenced this pull request Jun 24, 2019
kolyshkin pushed a commit to kolyshkin/libnetwork that referenced this pull request Jul 25, 2019
Signed-off-by: Dani Louca <dani.louca@docker.com>
(cherry picked from commit 2ce4242)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
