
[specific ci=Group23-VIC-Machine-Service] VCH creation logging flushes all data before exiting#6812

Merged
AngieCris merged 30 commits into vmware:master from AngieCris:create-logger-flush
Nov 23, 2017

Conversation

AngieCris (Contributor) commented Nov 20, 2017:

Fixes #6732

Upon calling bp.Close() on the logger pipe, the write end closes first and all subsequent writes fail. If there is leftover data in the pipe, bp.Close() blocks and waits until all of it is drained.

@AngieCris AngieCris changed the title [WIP] VCH creation logging flushes all data before exiting [specific ci=Group23-VIC-Machine-Service] VCH creation logging flushes all data before exiting Nov 21, 2017

// wait for write end to close
// closing of read end is expected to block until data is drained from the pipe
time.Sleep(1 * time.Millisecond)
Contributor commented:

Both here and below should really be wait-until logic; for simplicity the global test timeout can be used to bound it (I did not see a handy testify "until" mechanism).
As it stands it will probably work, until it is run on a heavily loaded system where 1ms is not long enough for the other goroutine to be scheduled, for example when running unit tests in parallel.

AngieCris (author) replied:

Thanks for pointing that out. I think there's a better way to fix this: instead of waiting or timing out, continuously write to the pipe until the write fails, while Close() runs in a background goroutine. Then, after the write has failed, read the leftover data until it is exhausted. I can keep a count of the bytes written before the failure and check that the read returns exactly that many bytes. Fixes coming up.

return 0, io.EOF
}
for bp.buffer.Len() == 0 && !bp.closed {
bp.readerReady = true // the reader is ready to read
Contributor commented:

I'd update the comment to something like: // We have a valid consumer so switch to flushing in Close

// closed: boolean indicating if the stream is closed
// readClosed: boolean indicating if read end of the pipe is closed
// writeClosed: boolean indicating if write end of the pipe is closed
// readerReady: boolean indicating if the reader is ready
Contributor commented:

Minor: normally these descriptive comments go above each field in the struct (but aren't necessary if the field isn't exported).

closed: false,
buffer: bytes.NewBuffer(nil),
c: c,
readClosed: false,
Contributor commented:

The zero value for booleans is false, so if you leave readClosed, writeClosed and readerReady out of this instantiation, they will be initialized to false automatically. Feel free to leave it like this if you are doing it to be explicit/verbose!
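The zero-value point can be illustrated with a small stand-in struct (`pipeFlags` here is illustrative, not the real bufferedPipe type): fields omitted from a composite literal are initialized to their zero values, so the explicit `false` assignments are redundant.

```go
package main

import "fmt"

// pipeFlags is an illustrative stand-in for the pipe's boolean state.
// Fields omitted from a composite literal get their zero values.
type pipeFlags struct {
	closed      bool
	readClosed  bool
	writeClosed bool
	readerReady bool
}

func main() {
	explicit := pipeFlags{closed: false, readClosed: false, writeClosed: false, readerReady: false}
	implicit := pipeFlags{} // identical: booleans default to false
	fmt.Println(explicit == implicit)
}
```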

AngieCris (author) replied Nov 21, 2017:

Thanks! I just feel it's more explicit this way; it lets the reader of the code know that the pipe is open and the consumer is not ready by default.

return 0, io.EOF
}
for bp.buffer.Len() == 0 && !bp.closed {
bp.readerReady = true // now we have a valid consumer to switch to flushing in Close
Member commented:

What happens if we had a reader at one point, but no longer have one?

This shouldn't happen with the way this code is currently used, but in theory I think something like this could happen:

bp = NewBufferedPipe()
bp.Write([]byte("Hello World"))
b = make([]byte, 5)
bp.Read(b) // Read "Hello", caused readerReady = true
bp.Close() // Waits forever for someone to read " World"

Maybe this isn't a case we can reasonably handle? (Or maybe we handle it with some sort of timeout on the flush?)

AngieCris (author) replied Nov 21, 2017:

In this particular problem scope, the read is from the HTTP connection that govc establishes with the datastore, so an early abort of the read only happens if the datastore connection drops while consuming the logs. In that case, datastore.Upload will return an error. However, I'm not sure how to handle this connection failure reasonably; it could be a problem with VC, or a problem on the client side.
A possible way to handle it: if datastore.Upload returns an error in vchlogger.go, send a signal to the underlying pipe telling it to drop any leftover messages.

AngieCris (author) replied Nov 21, 2017:

In general usage, it's hard to tell whether someone is still calling Read() continuously. If someone calls Read() at one point and then stops, the pipe won't know (the pipe doesn't keep track of the caller's status).
Adding a timeout would be a possible workaround, but I'm not sure how long the timeout should be. The pipe can't tell whether the consumer is just slow or completely gone.

Contributor commented:

Ironically adding a WriteTo method would probably allow us to resolve this, but in a manner that's not consistent with the pure Reader behaviour.
WriteTo shouldn't return at all until the copy has completely finished, and you would get an error from the writer that can be used to purge any remaining data.

jzt (Contributor) left a comment:

LGTM, pending address of others' review comments.

mhagen-vmware (Contributor) left a comment:

lgtm

done <- true
}()
select {
case <-time.After(time.Minute * 30): // timeout if consumer inactive for 30 minutes
Contributor commented:

I don't like timeouts - they just cause pain, but....
30 min seems like a long time given the amount of data we're trying to move and the expected duration of the operation. This blocks Close from returning - does that prevent the API request from completing?

I'm wondering if we should choose an equally horrific approach of creating the pipe with a context, saving that into the structure and then waiting for it to time out or be cancelled. At least then we're exposed to cancellation by way of the client http connection dropping.

AngieCris (author) replied:

Yes, the timeout does prevent the API request from returning. The VCH will still be created, but the Close hangs, so the API hangs as well. I'm also not sure what that would mean for the vic-machine service and other processes running on the service.

I'm not sure it makes sense for a pipe stream to hold its caller's context. Maybe it's the vchlogger that needs to hold the context and keep track of context timeouts?

AngieCris (author) commented Nov 22, 2017:

With lots of help from @hickeng, the new proposed solution is:

  1. Instead of blocking in buffered_pipe.Close() waiting for a consumer of the remaining data, leverage the context timeout on the CLI. Have the vic-machine create CLI wait for log streaming to complete, or time out if the context deadline is exceeded.
  2. buffered_pipe.Read() only returns EOF once the buffer is empty and the write end is closed. On the API side, after VCH creation ends and the pipe is closed, log streaming still runs in the background, reading any remaining data from the pipe. This read will not block.

ping @zjs for feedback

// wait on the logger to finish streaming
go func() {
select {
case <-time.After(time.Second * 3):
Contributor commented:

Just realized that we don't need the select statement here at all:

go func() {
   // let the user know why they're waiting if it causes a noticeable delay
   time.Sleep(3 * time.Second)
   op.Infof("Waiting for log upload to complete")
}()

AngieCris (author) replied:

good point, thanks

op.Info("Installer completed successfully")

// wait on the logger to finish streaming
go func() {
zjs (Member) commented Nov 22, 2017:

What happens to this goroutine if we return before the time.After is reached?

Should this be taking op as an argument and have a case <-op.Done(): so that this gets cleaned up if op is canceled (which I think happens just before we exit)?

@AngieCris AngieCris merged commit e224e01 into vmware:master Nov 23, 2017


Successfully merging this pull request may close these issues.

Ensure VCH creation log streaming flushes all data before exiting

7 participants