
Datacomplete #165

Merged
alexcjohnson merged 15 commits into master from datacomplete
Jul 7, 2016

Conversation

@peendebak
Contributor

After starting a measurement I would like to make sure the measurement is complete. I start the measurement in the background because I want live plotting to work.
Is there already a method to do this? In this PR I added a method to the DataSet object.

A typical use case

    data = qc.Loop(sweepvalues, delay=delay).run(background=True)
    liveplotwindow.clear(); liveplotwindow.add(data.amplitude)
    # wait for measurement to complete
    data.complete()

@alexcjohnson
Contributor

Interesting... I like where this is going.

There is no such method yet, the only thing we have is blocking the next Loop until the previous one finishes (see #154 that changes how this works) but this doesn't presently allow for live updates to continue while you're waiting. It could though: the measurement QcodesProcess could remember what DataSet it's acquiring, and call data_set.complete() prior to .join on the process. That would be cool!

As to how data_set.complete() works - we can't assume any particular plotting engine, so I think what we should do is: when you make a plot, it registers itself with the DataSet, and whenever the DataSet syncs, it calls plot.update() on every registered plot. (Note that a plot can reference more than one DataSet, in which case it would register with all of them.)

Then we could move the update_widget business (and your Spyder equivalent) out of the plots and into the DataSet, which would have some nice side effects:

  • The DataSet gets synced even if there's no plot
  • Multiple plots of the same DataSet won't increase the sync frequency
  • We could even set it up so that if you have a plot open and manually change a DataSet, the plot gets updated (with some rate limiting or event batching to ensure you don't update on every point if you, for example, change a whole bunch of data points in a loop).
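The registration idea could look something like this. This is only a sketch of the proposal, not qcodes code; the names `subscribers`, `register_plot`, and the `Plot` class are made up for illustration.

```python
# Sketch of plots registering with a DataSet, so every sync
# notifies every registered plot (hypothetical names throughout).

class DataSet:
    def __init__(self):
        self.subscribers = []

    def register_plot(self, plot):
        # a plot referencing several DataSets would register with each
        self.subscribers.append(plot)

    def sync(self):
        # ... pull new data from the measurement process here ...
        for plot in self.subscribers:
            plot.update()

class Plot:
    def __init__(self):
        self.updates = 0

    def update(self):
        self.updates += 1  # stand-in for redrawing with the synced data

data = DataSet()
plot = Plot()
data.register_plot(plot)
data.sync()
data.sync()
print(plot.updates)  # 2
```

This is what makes the bullets above fall out naturally: the sync loop lives in the DataSet, so it runs with zero plots, and extra plots only add subscribers rather than extra sync calls.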

@peendebak
Contributor Author

The PR now has a more generic system for adding callbacks. It might not do everything listed above, but the system is simple and works for me.

Typical use case:

    import pyqtgraph
    qtapp = pyqtgraph.mkQApp()
    # register the callable itself, not the result of calling it
    qc.DataSet.background_functions['qt'] = qtapp.processEvents
    ...

    data = qc.Loop(sweepvalues, delay=delay).run(background=True)
    # set up plotting window etc.
    ...
    # wait for loop to complete
    data.complete()

location_provider = TimestampLocation()

# functions to be called when operating in background mode
background_functions=dict()
Contributor


@peendebak apologies for letting this get stale...

I like the dict - good call, so the key labels it and helps you remove or update the appropriate one. Any reason not to declare it with {} though?

I guess the reason to make this a class rather than an instance variable is that this callback won't change from run to run? If that works for your use case, let's keep this, but we should pay attention to how people use this and figure out whether we have reason to add instance callbacks too.

@MerlinSmiles
Contributor

MerlinSmiles commented Jun 7, 2016

Is this summing over nans really the way to go?
I would assume that involves a lot of unnecessary checking.

I was thinking just counting the passed datapoints would work:

import numpy as np
x = np.random.random(10)
y = np.random.random(5)
ntotal = len(x)*len(y)
npassed = 0
for i in x:
    for j in y:
        npassed += 1
        print(npassed/ntotal)

But that would break down for BreakIf() broken loops. (the checking for nans does so too btw.)

@alexcjohnson
Contributor

But that would break down for BreakIf() broken loops. (the checking for nans does so too btw.)

Good point @MerlinSmiles - Given the current structure of loops (which I don't see changing, but you never know...): within a given dimension you always start filling in data from the first point, but you can leave gaps if there is a BreakIf; then you increment the next dimension's index and start over. If we trust that, we could make a very efficient fraction_complete by first looking at the outer dimension, first index in any inner dimensions, finding the last point with data (wooo binary search!)... then repeating this for inner dimensions, using the index we found for the outer dimension.
Visually (. is NaN and * is data, X is the point we want to find):

*******...
*****.....
*********.
*****X....
..........
..........
..........

We first search the first column, and find that the 4th row is the last data point. Then we search the 4th row and find that the 6th column is the last data point, so completion is (3 + 6/10)/7 = 51.4%
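A sketch of that binary search, under the assumption stated above (each slice fills from index 0, so non-NaN data is always a prefix). This illustrates the idea for a 2-D array; it is not the code that ended up in the PR.

```python
import numpy as np

def last_filled(arr):
    """Binary search for the index of the last non-NaN entry in a
    1-D slice, assuming data fills in from index 0 with no gaps."""
    lo, hi = 0, len(arr)
    while lo < hi:
        mid = (lo + hi) // 2
        if np.isnan(arr[mid]):
            hi = mid        # arr[mid] is empty: last data is before mid
        else:
            lo = mid + 1    # arr[mid] has data: look after mid
    return lo - 1           # -1 if the slice is all NaN

def fraction_complete(data):
    """Fraction complete of a 2-D array filled row by row."""
    row = last_filled(data[:, 0])    # outer dimension: last started row
    if row < 0:
        return 0.0
    col = last_filled(data[row, :])  # inner dimension, within that row
    return (row + (col + 1) / data.shape[1]) / data.shape[0]

# The 7x10 example from the diagram above: rows 0-3 have
# 7, 5, 9 and 6 data points respectively, the rest is NaN.
data = np.full((7, 10), np.nan)
for r, n in enumerate([7, 5, 9, 6]):
    data[r, :n] = 1.0
print(fraction_complete(data))  # (3 + 6/10)/7 ~= 0.514
```

Each `last_filled` call is O(log n), so the whole estimate touches only a handful of points instead of scanning every NaN.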

if getattr(self, 'synced_index', None) is not None:
last_index = max(last_index, self.synced_index)

return (last_index + 1) / self.ndarray.size
Contributor


@peendebak @MerlinSmiles no need to search for the fraction complete, the DataArray already tracks this for sync and save. It's a little annoying to have to look at three different attributes, maybe later we can tighten this up, but I think it's OK for now.
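In the spirit of the snippet above, the "look at several attributes and take the furthest one" logic might be sketched like this. The attribute names (`last_saved_index`, `synced_index`) follow the diff above, but treat the surrounding class as illustrative, not the real DataArray.

```python
import numpy as np

class DataArray:
    """Toy stand-in: tracks how far the data has been saved/synced."""
    def __init__(self, size):
        self.ndarray = np.full(size, np.nan)
        self.last_saved_index = None  # advanced as data is written to disk
        self.synced_index = None      # advanced as data arrives from the process

    def fraction_complete(self):
        # take the furthest marker we have, instead of searching for NaNs
        last_index = -1
        if self.last_saved_index is not None:
            last_index = max(last_index, self.last_saved_index)
        if self.synced_index is not None:
            last_index = max(last_index, self.synced_index)
        return (last_index + 1) / self.ndarray.size

arr = DataArray(10)
arr.synced_index = 4            # five points have arrived so far
print(arr.fraction_complete())  # 0.5
```

The point is that progress is bookkeeping the array already does, so no per-point NaN checks are needed at all.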

- If you read in a DataSet (ie with load_data) then it is
  marked as saved up to whatever data you read in. The `Formatter`
  handles this.
- If you create a DataSet with preset data, then it is marked
  as 100% modified - ie it's entirely data that has not been saved.
  The DataArray constructor and nest method handle this.
Now it doesn't return anything, and only catches errors from the
background_functions, removing functions which fail twice in a row.
Any other errors (like from sync()) will not be caught, so they will
stop future code from running when it shouldn't.
self.sync_index = i + 1
return self.sync_index < self.syncing_array.size

def failing_func(self):

@giulioungaretti
Contributor

Codacy Here is an overview of what got changed by this pull request:

Issues
======
- Added 1


Complexity increasing per file
==============================
- qcodes/tests/test_data.py  1
- qcodes/data/data_array.py  10
- qcodes/data/gnuplot_format.py  1


Complexity decreasing per file
==============================
+ qcodes/data/data_set.py  -2


Clones removed
==============
+ qcodes/tests/test_format.py  -7

See the complete overview on Codacy

@alexcjohnson
Contributor

In order to make DataSet.complete robust, I went through and cleaned up our markers for modified/saved/synced data in each DataArray. Unfortunately there is a reasonably large set of possibilities that seem to need differing behavior: local data during acquisition before and after saving, synced data during acquisition, preset data, data reloaded from disk... Anyway, all of these cases are tested at least for their basic behavior.

I also changed how DataSet.complete works in line with my previous comments: The only exceptions we catch are in the background_functions, and if one fails repeatedly we remove it and keep going, rather than returning False. Now we don't return anything, we just exit normally or raise an error if something outside the background_functions fails.

@peendebak / @eendebakpt I haven't seen you around lately so I'm going to assume you're offline - but I don't expect this is too controversial. We can always adjust it later if you have concerns. (Anyway thanks for making this PR in the first place!)
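The new DataSet.complete behaviour described above could be sketched roughly like this. The function signature, the `is_done` hook, and the fake DataSet are all invented for the example; only the policy (catch background_functions errors, drop a function after two consecutive failures, let everything else raise, return nothing) comes from the discussion.

```python
import time

def complete(data_set, background_functions, delay=0.1, is_done=lambda: True):
    """Loop until is_done(), running background callbacks each pass."""
    failures = {key: 0 for key in background_functions}
    while True:
        for key in list(background_functions):
            try:
                background_functions[key]()
                failures[key] = 0          # a success resets the streak
            except Exception:
                failures[key] += 1
                if failures[key] >= 2:     # two failures in a row: drop it
                    del background_functions[key]
        data_set.sync()                    # errors here are NOT caught
        if is_done():
            return                         # no return value, per the change
        time.sleep(delay)

# demo: 'ok' keeps running, 'bad' is dropped after two straight failures
class FakeDataSet:
    def sync(self):
        pass  # errors raised here would propagate to the caller

calls = []
def bad():
    raise RuntimeError("boom")

funcs = {'ok': lambda: calls.append(1), 'bad': bad}
ticks = {'n': 0}
def done():
    ticks['n'] += 1
    return ticks['n'] >= 3   # pretend the loop finishes on the third pass

complete(FakeDataSet(), funcs, delay=0, is_done=done)
print('bad' in funcs, len(calls))  # False 3
```

Dropping a callback only after consecutive failures means a transient hiccup (say, a plot window briefly busy) doesn't permanently unregister it.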

@giulioungaretti any more comments from your human self, as opposed to your bot alter-ego?

@alexcjohnson alexcjohnson merged commit c3f5534 into master Jul 7, 2016
@alexcjohnson alexcjohnson deleted the datacomplete branch July 7, 2016 15:45
