[specific ci=Group23-VIC-Machine-Service] VCH creation logging flushes all data before exiting #6812
Conversation
```go
// wait for write end to close
// closing of read end is expected to block until data is drained from the pipe
time.Sleep(1 * time.Millisecond)
```
Both here and below should really be wait-until logic; for simplicity, the global test timeout can be used to avoid building something here (I did not see a handy testify "until" mechanism).
As it stands it will probably work, until it's run on a heavily loaded system where 1ms is not long enough for the other goroutine to be scheduled, for example when running unit tests in parallel.
Thanks for pointing that out. I think there's a better way to fix this: instead of waiting or timing out, the test could continuously write to the pipe until the write fails, while Close() runs in the background in a goroutine. Then, after the write has failed, it can read the left-over data until exhausted. I can keep a count of the number of bytes written before the failure and check that the read returns exactly that many bytes. Fixes coming up.
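The write-until-failure test strategy described above can be sketched as follows. This is a minimal, self-contained stand-in (`minimalPipe` and `drainAfterClose` are illustrative names, not the actual vchlog types), assuming the buffered pipe's contract is: Close fails all subsequent writes, but data already buffered stays readable until exhausted.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"sync"
)

// minimalPipe is a simplified stand-in for vchlog's buffered pipe:
// Close fails all subsequent writes, but buffered data stays readable.
type minimalPipe struct {
	mu     sync.Mutex
	buf    bytes.Buffer
	closed bool
}

func (p *minimalPipe) Write(b []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.closed {
		return 0, errors.New("write on closed pipe")
	}
	return p.buf.Write(b)
}

func (p *minimalPipe) Read(b []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.buf.Len() == 0 {
		if p.closed {
			return 0, io.EOF
		}
		return 0, nil // the real implementation would block here instead
	}
	return p.buf.Read(b)
}

func (p *minimalPipe) Close() error {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.closed = true
	return nil
}

// drainAfterClose writes until the write fails, then reads back the
// leftover data, returning the byte counts so they can be compared.
func drainAfterClose(p *minimalPipe) (written, read int) {
	for i := 0; ; i++ {
		n, err := p.Write([]byte("x"))
		written += n
		if err != nil {
			break
		}
		if i == 9 { // simulate Close happening mid-stream
			p.Close()
		}
	}
	buf := make([]byte, 4)
	for {
		n, err := p.Read(buf)
		read += n
		if err == io.EOF {
			break
		}
	}
	return written, read
}

func main() {
	w, r := drainAfterClose(&minimalPipe{})
	fmt.Printf("wrote %d bytes, read %d bytes\n", w, r) // counts should match
}
```

The test then asserts `written == read` with no sleeps, so scheduling delays on a loaded system can't cause a spurious failure.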
lib/install/vchlog/buffered_pipe.go (Outdated)
```go
	return 0, io.EOF
}
for bp.buffer.Len() == 0 && !bp.closed {
	bp.readerReady = true // the reader is ready to read
```
I'd update the comment to something like: `// We have a valid consumer so switch to flushing in Close`
lib/install/vchlog/buffered_pipe.go (Outdated)
```go
// closed: boolean indicating if the stream is closed
// readClosed: boolean indicating if read end of the pipe is closed
// writeClosed: boolean indicating if write end of the pipe is closed
// readerReady: boolean indicating if the reader is ready
```
Minor: normally these descriptive comments go above each field in the struct (but aren't necessary if the field isn't exported).
lib/install/vchlog/buffered_pipe.go (Outdated)
```go
closed:     false,
buffer:     bytes.NewBuffer(nil),
c:          c,
readClosed: false,
```
The zero value for booleans is false, so if you leave readClosed, writeClosed and readerReady out of this instantiation, they will init to false automatically. Feel free to leave it like this if you are doing it to be explicit/verbose!
Thanks! I just feel like it's more explicit this way, to let the code reader know that the pipe is open and the consumer is not ready by default.
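For reference, the zero-value point can be demonstrated with a tiny example (`pipeState` here is an illustrative struct, not the actual one): omitting boolean fields in a composite literal produces a value identical to one that sets them all to false explicitly.

```go
package main

import "fmt"

// pipeState is an illustrative struct with the same boolean fields
// discussed in the review; it is not the actual BufferedPipe type.
type pipeState struct {
	closed      bool
	readClosed  bool
	writeClosed bool
	readerReady bool
}

func main() {
	// Omitted fields take their zero value: false for bool.
	implicit := pipeState{}
	// Explicit initialization states the same thing verbosely.
	explicit := pipeState{closed: false, readClosed: false, writeClosed: false, readerReady: false}
	fmt.Println(implicit == explicit) // the two values are identical
}
```

Either form compiles to the same value, so the choice is purely about readability.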
lib/install/vchlog/buffered_pipe.go (Outdated)
```go
	return 0, io.EOF
}
for bp.buffer.Len() == 0 && !bp.closed {
	bp.readerReady = true // now we have a valid consumer to switch to flushing in Close
```
What happens if we had a reader at one point, but no longer have one?
This shouldn't happen with the way this code is currently used, but in theory I think something like this could happen:
```go
bp = NewBufferedPipe()
bp.Write([]byte("Hello World"))
b = make([]byte, 5)
bp.Read(b)  // Read "Hello", caused readerReady = true
bp.Close()  // Waits forever for someone to read " World"
```
Maybe this isn't a case we can reasonably handle? (Or maybe we handle this with some sort of timeout on the flush?)
In this particular problem scope, the read is from the HTTP connection established by govc with the datastore, so an early abort of the read only happens if the datastore connection is dropped while consuming the logs. In that case, datastore.Upload will return an error. However, I'm not sure how to treat this connection failure reasonably; it could be a problem with VC, or a problem on the client side.
A possible way to handle this is that, if datastore.Upload returns an error in vchlogger.go, we send a signal to the underlying pipe telling it to drop any leftover messages.
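That drop signal could look something like the sketch below. This is purely hypothetical: `droppablePipe`, `Drop`, and `Pending` are made-up names for illustration, and the real integration point would be the error path of datastore.Upload in vchlogger.go.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// droppablePipe sketches the proposed "drop" signal: if the consumer's
// upload fails, Drop discards buffered data so Close stops waiting on it.
type droppablePipe struct {
	mu  sync.Mutex
	buf bytes.Buffer
}

func (p *droppablePipe) Write(b []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.buf.Write(b)
}

// Drop is the hypothetical signal sent when datastore.Upload errors out.
func (p *droppablePipe) Drop() {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.buf.Reset() // discard any leftover messages
}

// Pending reports how many buffered bytes are still waiting for a reader.
func (p *droppablePipe) Pending() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.buf.Len()
}

func main() {
	p := &droppablePipe{}
	p.Write([]byte("leftover log data"))
	p.Drop() // consumer reported a connection failure
	fmt.Println("pending bytes after drop:", p.Pending())
}
```

With the buffer emptied, a flushing Close would have nothing left to wait for and could return immediately.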
In general usage, it's hard to tell if someone is still calling Read() continuously. If someone calls Read() at one point and then stops, the pipe won't know about it (the pipe doesn't keep track of the caller's status).
Adding a timeout would be a possible workaround; however, I'm not sure how long the timeout should be. The pipe can't tell if the consumer is just slow or completely gone.
Ironically adding a WriteTo method would probably allow us to resolve this, but in a manner that's not consistent with the pure Reader behaviour.
WriteTo shouldn't return at all until the copy has completely finished, and you would get an error from the writer that can be used to purge any remaining data.
jzt left a comment
LGTM, pending address of others' review comments.
…reate-logger-flush
lib/install/vchlog/buffered_pipe.go (Outdated)
```go
	done <- true
}()
select {
case <-time.After(time.Minute * 30): // timeout if consumer inactive for 30 minutes
```
I don't like timeouts; they just cause pain, but...
30 minutes seems like a long time given the amount of data we're trying to move and the expected duration of the operation. This blocks Close from returning; does that prevent the API request from completing?
I'm wondering if we should choose an equally horrific approach of creating the pipe with a context, saving that into the structure, and then waiting for that to time out or be cancelled. At least then we're exposed to cancellation by way of the client HTTP connection dropping.
Yes, the timeout does prevent the API request from returning. The VCH will still be created, but the Close hangs, so the API hangs as well. Also, I'm not sure what that would mean for the vic-machine service and other processes running on the service.
I'm not sure it makes sense for a pipe stream to hold its caller's context. Maybe it's the vchlogger that needs to have the context and keep track of context timeouts?
With lots of help from @hickeng, the new proposed solution is:
ping @zjs for feedback
cmd/vic-machine/create/create.go (Outdated)
```go
// wait on the logger to finish streaming
go func() {
	select {
	case <-time.After(time.Second * 3):
```
Just realized that we don't need the select statement here at all:
```go
go func() {
	// let the user know why they're waiting if it causes a noticeable delay
	time.Sleep(3 * time.Second)
	op.Infof("Waiting for log upload to complete")
}()
```
good point, thanks
| op.Info("Installer completed successfully") | ||
|
|
||
| // wait on the logger to finish streaming | ||
| go func() { |
What happens to this goroutine if we return before the time.After is reached?
Should this be taking op as an argument and have a `case <-op.Done():` so that this gets cleaned up if op is canceled (which I think happens just before we exit)?
Fixes #6732
Upon calling `bp.Close()` on the logger pipe, the write end closes first. All subsequent writes fail. If there's left-over data in the pipe, `bp.Close()` blocks and waits until all data is drained.
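That two-phase Close behavior can be sketched in miniature as follows. This is a simplified stand-in, not the actual BufferedPipe (`twoPhaseClose` and `drain` are illustrative names): phase one fails further writes immediately, phase two blocks until a consumer has taken the leftover data.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// twoPhaseClose sketches the described semantics: Close fails further
// writes immediately, then blocks until a reader drains leftover data.
type twoPhaseClose struct {
	mu          sync.Mutex
	data        []byte
	writeClosed bool
	drained     chan struct{}
}

func newTwoPhase() *twoPhaseClose {
	return &twoPhaseClose{drained: make(chan struct{})}
}

func (p *twoPhaseClose) Write(b []byte) (int, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.writeClosed {
		return 0, errors.New("write on closed pipe")
	}
	p.data = append(p.data, b...)
	return len(b), nil
}

// drain consumes all buffered data and signals Close that the pipe is empty.
func (p *twoPhaseClose) drain() []byte {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := p.data
	p.data = nil
	close(p.drained)
	return out
}

func (p *twoPhaseClose) Close() error {
	p.mu.Lock()
	p.writeClosed = true // phase 1: all subsequent writes fail
	p.mu.Unlock()
	<-p.drained // phase 2: block until leftover data is consumed
	return nil
}

func main() {
	p := newTwoPhase()
	p.Write([]byte("leftover"))
	done := make(chan []byte)
	go func() { done <- p.drain() }()
	p.Close() // returns only after drain has run
	fmt.Printf("drained %q\n", <-done)
	_, err := p.Write([]byte("late"))
	fmt.Println("write after close fails:", err != nil)
}
```

This is the property the PR's tests exercise: every byte written before Close is eventually readable, and no write after Close succeeds.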