diff --git a/0014-overhaul-qiskit-experiments.md b/0014-overhaul-qiskit-experiments.md
new file mode 100644
index 0000000..9732ede
--- /dev/null
+++ b/0014-overhaul-qiskit-experiments.md
@@ -0,0 +1,673 @@
+# Overhaul Qiskit Experiments
+
+| **Status**        | **Accepted**                             |
+|:------------------|:-----------------------------------------|
+| **RFC #**         | 0014                                     |
+| **Authors**       | Naoki Kanazawa (nkanazawa1989@gmail.com) |
+| **Deprecates**    | N/A                                      |
+| **Submitted**     | 2023-08-23                               |
+| **Updated**       | 2023-09-12                               |
+
+## Summary
+
+[Qiskit Experiments](https://github.com/Qiskit-Extensions/qiskit-experiments) (QE) is a Qiskit extension project that provides a basic framework
+for calibration and characterization experiments together with a built-in experiment library.
+The ultimate goal of this project is to enable a maintenance workflow for utility-scale quantum processors, which may consist of
+a few hundred or even thousands of qubits.
+This document identifies current performance bottlenecks and describes how to overcome them.
+
+As described in the [project documentation](https://qiskit.org/ecosystem/experiments/tutorials/intro.html#what-is-qiskit-experiments), QE consists of
+three major components, and a developer subclasses these components to define a workflow tailored to their quantum processor.
+
+- [Experiment](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/main/qiskit_experiments/framework/base_experiment.py) Class: Defines the quantum circuits and runs them on the processor.
+- [Analysis](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/main/qiskit_experiments/framework/base_analysis.py) Class: Consumes experiment results and generates analysis results.
+- [ExperimentData](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/main/qiskit_experiments/framework/experiment_data.py) Class: Stores experiment results and analysis results.
+
+A typical workflow to run a single experiment would look like:
+
+**Example1: T1 experiment on a single qubit**
+```python
+import numpy as np
+from qiskit_experiments.library import T1
+from qiskit_ibm_provider import IBMProvider
+
+# (step1) Load backend
+provider = IBMProvider()
+backend = provider.get_backend("ibm_xyz")
+
+# (step2) Define experiment and run
+exp = T1(physical_qubits=(0,), delays=np.linspace(0, 300e-6), backend=backend)
+exp_data = exp.run()
+
+# (step3) Wait until experiment completes
+exp_data.block_for_results()
+
+# (step4) Save results in remote storage (optional)
+exp_data.save()
+```
+
+These lines of code run the inversion recovery experiment on qubit 0 of some IBM device `ibm_xyz`.
+In a practical maintenance workflow, one must run this experiment for all qubits on `ibm_xyz`.
+This can be done by creating a `ParallelExperiment` instance.
+
+**Example2: T1 experiment on multiple qubits**
+```python
+from qiskit_experiments.framework import ParallelExperiment
+
+exp = ParallelExperiment(
+    [
+        T1(physical_qubits=(q,), delays=np.linspace(0, 300e-6))
+        for q in range(backend.num_qubits)
+    ],
+    backend=backend,
+)
+exp_data = exp.run()
+```
+
+A component experiment (`T1`) creates M circuits, and the parallel experiment is defined for N qubits.
+In principle the number of tasks for this experiment is M*N, but the parallel experiment digests the N circuits of the components and
+creates a single merged circuit to run. Thus the number of circuits in the actual payload is always M, regardless of the number of qubits.
+The QE framework therefore looks syntactically scalable; however, the data analysis overhead increases almost monotonically with N in the current implementation.
+
+Although the circuit generation cost and the communication overhead are also significant, we focus on the analysis task in this RFC document.
+This is because circuit generation heavily relies on the core [Qiskit](https://github.com/Qiskit/qiskit-terra) package,
+and the communication overhead depends on the payload format that the [Provider](https://github.com/Qiskit/qiskit-ibm-provider) defines (currently IBM uses the QPY binary format).
+In particular, we pay attention to the [curve analysis](https://github.com/Qiskit-Extensions/qiskit-experiments/tree/main/qiskit_experiments/curve_analysis) module,
+because almost all experiments in the maintenance workflow require a parameter scan and fitting to find an optimal control parameter or device parameter, as shown in the examples above.
+
+## Motivation
+
+The plot below shows the wall clock time (with and without plotting the Matplotlib figure) for running parallel T1 analysis against the number of qubits on my laptop: MBP Intel Core i7 @ 2.3 GHz (4 cores), 32 GB Mem.
+As shown in this result, the required time increases roughly quadratically with the number of qubits.
+To run a single experiment on a device with a few hundred qubits (such as the IBM Osprey processor), it would take tens of minutes to get analysis results.
+This is non-negligible downtime for a quantum computing system because the system calibration workflow often consists of [dependent tasks](https://arxiv.org/abs/1803.03226),
+and the following experiment run might be on hold until the predecessor results become available.
+
+
+
+(see [this gist](https://gist.github.com/nkanazawa1989/41428cd21b7307a70be720779b364ca3) for more details.)
+
+I also scrutinized the performance of the current framework with a statistical profiler.
+Since the percentage of samples attributed to each function varied significantly from run to run, I only show qualitative results here.
+I found that the following subroutines tend to show up frequently in the flame graph.
+
+#### Initialization of `ExperimentData` sub-container
+
+Because composite analysis needs to instantiate a new data container to run each component analysis,
+calling the constructor of this expensive class consumes significant time in heavily batched experiments.
+In particular, acquiring a service object and the Qiskit version information is the expensive part.
+Since such information is not necessary for an inner (child) data container, we can replace `ExperimentData`
+with a much more lightweight data container to alleviate this overhead. See the following discussion for details.
+
+#### Marginalization of the result count
+
+When we parallelize the experimental circuits, the measured count dictionary is keyed on the
+merged bit string of all measurements in the sub-experiments.
+The maximum length of the dictionary scales as O(2^n) with the number of qubits n, and the actual size
+may increase with the shot number. Since the marginalization function is already implemented in Rust,
+we need to reduce the number of calls to this function.
+
+#### Creation of curve-fit figures
+
+When we run a curve analysis, the analysis generates a Matplotlib figure for human experimentalists to inspect.
+This subroutine is also expensive. Fortunately we can set `plot=False` in the analysis options
+if the figure is not necessary.
+Experimentalists will be able to load raw data points from the experiment service at a later time (not supported as of QE 0.5).
+
+#### Curve fitting
+
+This is not significantly slow, but the overhead is visible. It may also depend on the analysis.
+Some analyses provide multiple initial guesses for the fit model parameters,
+and the fitter is called once for each set of guess values.
+
+
+## User Benefit
+
+A user or system agent who heavily relies on the parallel or batch execution feature will benefit dramatically from this overhaul.
+These users are typically responsible for system deployment, or may be researchers who benchmark a novel gate with larger application circuits.
+This kind of work requires users to run a chain of calibration and monitoring experiments at the scale of the quantum processor.
+If experiments are conducted in the dedicated queue mode or with a [runtime in a session](https://qiskit.org/ecosystem/ibm-runtime/how_to/run_session.html),
+the significant overhead in the local analysis task may also degrade the machine efficiency of the remote quantum computing system.
+
+With the proposed overhaul plan, we aim to achieve a T1 analysis wall time of < 5 sec at nq=100 with either a high-end laptop or a workstation (sorry we don't have a proper benchmark suite for QE!).
+I think this is a feasible and good target number to motivate experimentalists to employ QE for practical work.
+
+## Design Proposal
+
+### Performance bottleneck of current framework
+
+The performance bottleneck mainly comes from the design of the `ExperimentData` class.
+This class was originally designed as a bridge between the IBM provider and the experiment service: we submit jobs to the former and later save the analysis results in the latter.
+Therefore, the class implements multithreading to query the provider for the job results, and also an interface to the experiment service.
+
+Although this class is tightly coupled to the remote server, the local QE workflow also heavily relies on the experiment data.
+Namely, the experiment data receives job objects from the experiment class, waits for the job results, invokes the analysis callbacks on the results,
+collects outcomes from the callbacks and finally stores them in the experiment service.
+The internal flow of an experiment run is shown in simplified form in the diagram below.
+
+
+
+As you can see, `ExperimentData` is more than a data container; it is in fact an executor plus a data container, and is too complicated to maintain.
+If you are familiar with [concurrency](https://docs.python.org/3/library/concurrent.futures.html), you may know there are two different implementations:
+multithreading and multiprocessing.
+For IO-bound tasks like waiting for job results, multithreading is an effective approach.
+On the other hand, for CPU-bound tasks like running analysis, multiprocessing is more effective.
+In Python (CPython), we have the [global interpreter lock](https://peps.python.org/pep-0703/) (GIL) mechanism (in the future the [GIL could be removed](https://peps.python.org/pep-0703/)),
+and under the lock only a single thread can access Python objects at a time.
+Although the core of the curve fitting analysis is implemented with NumPy, which [can release the GIL](https://superfastpython.com/numpy-vs-gil/),
+data management in QE analysis still relies on custom Python data structures, and thus the gain from multithreading is small.
+However, due to the tight coupling between the data container and the executor in the experiment data class, it's not easy to simply use multiprocessing.
+In multiprocessing, objects under manipulation are duplicated in each subprocess through pickle, and thus analysis results collected by the experiment data instance
+in a subprocess are immediately discarded after the subprocess finishes.
+Instead of mutating the copied container, the analysis callback must return the resulting data to the main process.
+
+### Specific problems in composite analysis
+
+Composite analysis is a wrapper of a list of analyses that is used internally by the [ParallelExperiment](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/main/qiskit_experiments/framework/composite/parallel_experiment.py)
+and [BatchExperiment](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/main/qiskit_experiments/framework/composite/batch_experiment.py).
+Because these are the key elements to build a practical workflow for the maintenance of quantum processors,
+improvements in the composite framework will dramatically improve the target user's experience.
+As shown in Example2, the composite experiment is created with a list of experiment instances, and an experiment instance is defined per
+single set of qubits and single kind of experiment, e.g. T1 experiment on qubit 0. The composite experiment creates a composite analysis on the fly by consuming the
+analysis instance attached to each experiment instance.
+Once a composite job completes, the composite analysis spawns subset `ExperimentData` containers (i.e. child data) by splitting the result data into each container.
+This process is done by the [`marginal_distribution`](https://github.com/Qiskit/qiskit-terra/blob/7bc2bfd4f8dcc436aec5f85641e3ec07be15eb01/qiskit/result/utils.py#L199-L203) function in Qiskit.
+For example, if the job result is a count dictionary `{"00": 100, "01": 200, "10": 300, "11": 400}`, the result for the subset experiment 1 (the rightmost bit)
+is `{"0": 400, "1": 600}`, and the one for experiment 2 (the leftmost bit) is `{"0": 300, "1": 700}`.
+With these two sub-dictionaries, the composite analysis creates two experiment data containers and runs the component analysis on them.
+This mechanism allows us to reuse the rest of the analysis logic, written for a single set of qubits, for system-scale analysis.
+Execution of each analysis task called by the composite analysis is controlled by the outermost experiment data container fed into the composite analysis.
+The composite experiment can be nested into another composite experiment, and this flexibility allows us to build the complicated experiment patterns that
+an experimentalist or system administrator might be interested in.
+For example, if you want to [measure the pure dephasing rate](https://www.nature.com/articles/s41534-019-0168-5) of multiple qubits,
+you need to batch the T1 and T2 experiments, and broadcast the batched task on multiple qubits.
+This setup can be written as a parallel of a batch experiment, or vice versa.
+
+The idea of experiment parallelism is sophisticated, but there are several issues from both the capability and the performance viewpoints.
+These points can be summarized as the following three implementation defects.
+
+#### 1. Owner of the marginalization responsibility
+
+Currently, decomposition of a batched experiment result (i.e. the `marginal_distribution` function) is run by the composite analysis.
+This means a user cannot obtain the result data for a particular experiment until the composite analysis has run on the whole experiment results.
+This hurts usability.
+For example, an experimentalist may find a couple of unexpected analysis results out of a large number of batched results.
+Even if they want to rerun the analysis only on the failed instances, they need to handcraft another composite analysis instance with the entire list of analyses.
+They also need to identify the index of the analysis of interest if they want to try different fitting options.
+Instead, delegating the marginalization responsibility to `ExperimentData` will simplify the workflow as follows.
+
+**Example3: Rerunning a partial component analysis**
+```python
+exp_data = ExperimentData.load(exp_id, service=experiment_service)
+sub_data = exp_data.child_data(experiment="T1", qubits=(0,))  # Not supported.
+
+T1Analysis().run(sub_data, replace_results=True)
+```
+
+In addition, the `marginal_distribution` function is [repeatedly called](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/c66034c90dad73d705af25be7e9ed9617e7eb2ef/qiskit_experiments/framework/composite/composite_analysis.py#L222-L226) for each component register index in the current implementation,
+which induces heavy dictionary manipulation overhead in the composite analysis.
+
+#### 2. Callback running callbacks
+
+In QE analysis, an experiment developer needs to provide an implementation of the [`_run_analysis`](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/c66034c90dad73d705af25be7e9ed9617e7eb2ef/qiskit_experiments/framework/base_analysis.py#L228-L231) method,
+where the analysis digests the experiment results and returns a set of analysis results.
+This method is passed to the thread executor in the experiment data as a callback, and runs concurrently with other component analyses.
+The composite analysis also implements this method, in which the [composite analysis callback invokes multiple threads](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/c66034c90dad73d705af25be7e9ed9617e7eb2ef/qiskit_experiments/framework/composite/composite_analysis.py#L130-L133) for the underlying component analysis callbacks.
+Since composite analysis can be nested, this structure requires recursive thread/process pool management, which is likely unmanageable.
+
+#### 3. Initialization of the expensive data container
+
+Because the `_run_analysis` method takes an `ExperimentData` instance, the composite analysis needs to instantiate this expensive data container many times,
+although running analysis only requires experiment results (such as the count dictionary) and metadata describing the experiment settings.
+The interface to the experiment service is not necessary at all for analysis, and thus we can define a more lightweight container.
+
+
+### Improvements in unit analysis
+
+Since a 1-D parameter scan followed by curve fitting is the most popular pattern in our experiments, QE provides the [`CurveAnalysis`](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/main/qiskit_experiments/curve_analysis/curve_analysis.py) base class,
+so that an experiment developer can quickly write a new analysis only by defining a fitting model and initial guesses for the model parameters.
+Indeed, most calibration and maintenance experiments subclass the curve analysis base class.
+That is to say, even a small improvement accumulates into a significant performance gain in heavily batched experiments.
+
+The subroutines with significant execution time in the curve analysis are the curve fitter and the figure drawer.
+QE uses [LMFIT](https://lmfit.github.io/lmfit-py/) for curve fitting, which is basically a wrapper of [Scipy Optimize](https://docs.scipy.org/doc/scipy/tutorial/optimize.html).
+By default, we use the `least_squares` minimizer to minimize the residual of the fit curve.
+This optimizer is not particularly slow, but it still requires a numerically computed Jacobian matrix for the parameter update.
+[JAX](https://jax.readthedocs.io/en/latest/index.html) provides a framework for JIT compilation and automatic differentiation of a NumPy-compatible cost function.
+This allows us to choose a more sophisticated optimizer such as BFGS, along with much faster residual computation.
+We found that this approach brought [more than x10 performance gain](https://github.com/Qiskit-Extensions/qiskit-experiments/issues/1192#issuecomment-1582050036) in the curve fitting.
+
+In contrast to the fitting, there is no straightforward way to reduce the drawer overhead other than lazy figure creation.
+At the scale of a 1000-qubit experiment, a human experimentalist probably doesn't have a chance to visually confirm every experiment result.
+We could conditionally create a figure for an unexpected experiment result (e.g. very poor chi-squared value, too short T1 value for T1 analysis)
+to help a human experimentalist scrutinize the device issue. Otherwise, we could create the result figure on the fly when it is accessed.
+
+### Proposal of new workflow
+
+The following sequence diagram proposes a new workflow for a QE run.
+Note that now we have four local components: `Experiment`, `Analysis`, `Executor`, and `ExperimentData`.
+The experiment data becomes a data container which also implements the interface to the experiment service (this feature can be a mix-in).
+The executor has two sub-executors, `CircuitExecutor` and `AnalysisExecutor`.
+At run time, the experiment passes circuit payloads to the circuit executor and builds the analysis task dependency with the analysis executor.
+Once the circuits are set and all tasks are scheduled, the executor runs the subroutines with proper concurrency management.
+
+
+
+In contrast to the existing implementation, the circuit executor takes bare `QuantumCircuit` input and submits the job by itself.
+This design is reasonable in terms of future integration with Qiskit Runtime, which is going to be the standard
+execution path in IBM Quantum systems. The Runtime API doesn't take a conventional provider Job object.
+The circuit executor just receives circuits to run on a backend, and returns canonical result data.
+The details of job execution are hidden, and we can easily switch to a different communication model without breaking the rest of the machinery.
+
+Returned job results (these can be composite experiment results) are added to the experiment data.
+This result data contains sufficient metadata to bootstrap the marginalization subroutine, and the added data can be immediately decomposed into
+a set of component experiment data (child data).
+This component data can be a lightweight data container rather than a complete experiment data object, as discussed previously.
+
+Finally, analysis callbacks run on multiple subprocesses. A callback takes a component data, and returns a list of [`AnalysisResultData`](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/main/qiskit_experiments/framework/analysis_result_data.py) dataclasses,
+which is to be added to the experiment data in the main process.
+The order of task execution is resolved by the dependency manager attached to the analysis executor. Tasks might be represented by a DAG.
+Let's consider the following example.
+
+**Example4: Parallel-batched T1 and T2 experiment**
+```python
+exps = []
+for q in range(n_qubit):
+    exp = BatchExperiment([T1((q,), t1delays), T2Hahn((q,), t2delays)])
+    exps.append(exp)
+parallel_exp = ParallelExperiment(exps)
+```
+The analysis for this `parallel_exp` is equivalent to
+```python
+analysis = CompositeAnalysis(
+    [CompositeAnalysis([T1Analysis(), T2HahnAnalysis()]) for _ in range(n_qubit)]
+)
+```
+Although the composite analyses are nested, the outer (composite) analysis just recursively runs the inner analyses, and thus they can be flattened.
+In addition, the execution order of `T1Analysis` and `T2HahnAnalysis` is interchangeable.
+In other words, the component analyses are all independent, and we just need to throw all analysis callbacks into the process pool.
+
+
+
+This allows us to manage parallelism with a single `ProcessPoolExecutor.map`, which dramatically simplifies the implementation.
+In the next example, we slightly modify the above workflow.
+
+**Example5: Parallel Tphi experiment**
+```python
+exps = []
+for q in range(n_qubit):
+    exp = Tphi((q,), t1delays, t2delays)
+    exps.append(exp)
+parallel_exp = ParallelExperiment(exps)
+```
+The inner batch experiment is just replaced with the `Tphi` experiment, which consists of the same T1 and T2 Hahn experiments.
+The equivalent analysis also looks similar.
+```python
+analysis = CompositeAnalysis(
+    [TphiAnalysis() for _ in range(n_qubit)]
+)
+```
+However, `TphiAnalysis` computes a new `Tphi` value by consuming the `T1` and `T2` quantities created by the T1 and T2 Hahn analyses, respectively.
+Therefore, the Tphi analysis callback depends on the other two callbacks.
+The execution consists of two layers in this case: first run all T1 and T2 Hahn analysis callbacks in parallel, then run the Tphi analysis callbacks in parallel.
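
The two-layer execution described above can be sketched with the standard library alone. This is only a minimal illustration, not the proposed implementation: the function names `schedule_layers` and `run_task_graph`, the toy callbacks, and the way predecessor results are passed as a dictionary are all assumptions made for this example. A thread pool is used so that the snippet stays self-contained, whereas the RFC proposes a process pool for real analysis callbacks. The toy Tphi callback uses the standard relation 1/Tphi = 1/T2 - 1/(2*T1).

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter


def schedule_layers(dependencies):
    """Group task names into layers of mutually independent tasks.

    `dependencies` maps a task name to the set of task names it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError for a cyclic dependency
    layers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        layers.append(ready)
        sorter.done(*ready)
    return layers


def run_task_graph(callbacks, dependencies, executor):
    """Run callbacks layer by layer, feeding predecessor results to dependents."""
    results = {}
    for layer in schedule_layers(dependencies):
        futures = {
            name: executor.submit(
                callbacks[name],
                {dep: results[dep] for dep in dependencies.get(name, ())},
            )
            for name in layer
        }
        for name, future in futures.items():
            results[name] = future.result()
    return results


# Toy stand-ins (hypothetical) for the analysis callbacks of a single qubit.
def t1_analysis(_inputs):
    return 100e-6


def t2_analysis(_inputs):
    return 80e-6


def tphi_analysis(inputs):
    # Pure dephasing time from 1/Tphi = 1/T2 - 1/(2*T1)
    return 1 / (1 / inputs["t2"] - 1 / (2 * inputs["t1"]))


callbacks = {"t1": t1_analysis, "t2": t2_analysis, "tphi": tphi_analysis}
dependencies = {"t1": set(), "t2": set(), "tphi": {"t1", "t2"}}

with ThreadPoolExecutor() as pool:
    results = run_task_graph(callbacks, dependencies, pool)
```

In this sketch the first layer contains the T1 and T2 callbacks and the second layer contains only the Tphi callback, which mirrors the layering in the text. Each callback returns its result instead of mutating a shared container, so the same structure would carry over to a process pool as long as the callbacks and results are picklable.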
+
+
+The [MitigatedTomographyAnalysis](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/main/qiskit_experiments/library/tomography/mit_tomography_analysis.py) is
+another example of this pattern. This is the analysis for tomography experiments that takes the readout error into account.
+This analysis first runs a sub-analysis for the local readout error mitigation to obtain a mitigator object, and uses it to build a noisy POVM for tomographic reconstruction of a target channel or state.
+Calibration experiments are another pattern in which we need to consider the dependency.
+In the QE built-in library, we usually have a pair of experiment implementations for characterization and calibration, e.g. Rabi ↔ RoughAmplitudeCal.
+From the viewpoint of analysis, the calibration analysis is equivalent to the characterization analysis + calibration library update.
+The latter task is [considered as a dependent analysis callback](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/c66034c90dad73d705af25be7e9ed9617e7eb2ef/qiskit_experiments/calibration_management/base_calibration_experiment.py#L162-L186).
+
+The dependency manager provides an interface to build these graphs at run time.
+Once the task graph is built, it can provide the executor with an iterator over the parallel tasks.
+The new design allows us to subdivide the role of `ExperimentData` into multiple components.
+We can write more fine-grained unit tests for each component, and the maintainability of the codebase is also expected to improve.
+
+
+### Migration plan
+
+This overhaul introduces several breaking API changes, and we should communicate closely with the community developers who
+write their own experiment libraries.
+It is important to keep in mind that the base classes in the QE framework are expected to be subclassed by experimentalists (they are also end-users),
+and a protected method, e.g. `BaseAnalysis._run_analysis`, is not a non-public implementation that can be freely modified.
+
+I suggest the following refactoring steps to minimize the impact on the user codebase and give the community developers enough grace period for migration.
+Sequential PRs also make code review easier.
+
+1. Rework of the composite analysis.
+
+In this step, we update `ExperimentData.add_data` so that it can bootstrap the child data creation for component analysis.
+This means we can get rid of a [significant amount of code](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/c66034c90dad73d705af25be7e9ed9617e7eb2ef/qiskit_experiments/framework/composite/composite_analysis.py#L150-L336) from `CompositeAnalysis`.
+This helps experimentalists re-analyze a particular experiment result, as shown in Example3.
+We also start emitting a pending deprecation warning when a subclass implements `CompositeAnalysis._run_analysis`.
+The existing popular implementation pattern (see `TphiAnalysis` or `MitigatedTomographyAnalysis`) doesn't fit in with the machinery of the analysis task dependency graph.
+For example, `TphiAnalysis` would look as follows before and after the refactoring.
+
+(before)
+
+```python
+    def _run_analysis(
+        self, experiment_data: ExperimentData
+    ) -> Tuple[List[AnalysisResultData], List["matplotlib.figure.Figure"]]:
+
+        analysis_results, figures = super()._run_analysis(experiment_data)
+
+        t1_result = next(filter(lambda res: res.name == "T1", analysis_results))
+        t2_result = next(filter(lambda res: res.name in {"T2star", "T2"}, analysis_results))
+        ...
+```
+
+(after)
+```python
+    def _run_analysis(
+        self, experiment_data: ExperimentData
+    ) -> Tuple[List[AnalysisResultData], List["matplotlib.figure.Figure"]]:
+
+        t1_result = experiment_data.analysis_results("T1").value.n
+        t2_result = experiment_data.analysis_results("T2").value.n
+        ...
+```
+
+Component analysis execution will be managed by the `AnalysisExecutor` once it's implemented, and `super()._run_analysis` will become empty.
+Eventually the call to the superclass method becomes unnecessary (it will still work but just introduces redundant overhead).
+The (pending) deprecation warning notifies developers of the future update of this logic.
+
+2. Implement `CircuitExecutor`
+
+In this step, we replace [ExperimentData._job_executor](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/c66034c90dad73d705af25be7e9ed9617e7eb2ef/qiskit_experiments/framework/experiment_data.py#L227) with a `CircuitExecutor` class instance,
+and delegate job handling from the experiment data to this class. The run options and backend tied to `BaseExperiment` will also be moved to the circuit executor class.
+This means that an empty `ExperimentData` container must be initialized in the constructor of the experiment class (currently it's initialized at run time), but I don't think this change hurts existing user workflows.
+Note that once we introduce the `Executor` class in a follow-up PR, the experiment data is instantiated along with the executor inside the experiment class constructor anyway.
+
+3. Implement `AnalysisExecutor`
+
+In this step, we replace [ExperimentData._analysis_executor](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/c66034c90dad73d705af25be7e9ed9617e7eb2ef/qiskit_experiments/framework/experiment_data.py#L349) with an `AnalysisExecutor` class instance.
+We delegate `ExperimentData.add_analysis_callback` to the analysis executor, and remove the implementation of `BaseAnalysis.run` with proper deprecation handling.
+Namely, we need to inspect the class methods within the `__new__` method (or maybe in `__init_subclass__`) and raise a deprecation warning if the `run` method is implemented in the subclass.
+I think this is a really rare case, and indeed I've never encountered a situation where I needed to override the `run` method in an analysis subclass.
+Feel free to comment on this RFC or email me if you know of such a use case.
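
The subclass inspection mentioned above could be sketched as follows, assuming the `__init_subclass__` variant. The `BaseAnalysis` class below is a hypothetical stand-in rather than the QE class, and the warning text is illustrative only.

```python
import warnings


class BaseAnalysis:
    """Hypothetical stand-in for the QE base class, for illustration only."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # A method defined in the subclass body appears in the class namespace.
        if "run" in cls.__dict__:
            warnings.warn(
                f"{cls.__name__} overrides BaseAnalysis.run, which is going to be "
                "removed. Implement _run_analysis instead.",
                DeprecationWarning,
                stacklevel=2,
            )

    def run(self, experiment_data):
        return self._run_analysis(experiment_data)

    def _run_analysis(self, experiment_data):
        raise NotImplementedError


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")

    class GoodAnalysis(BaseAnalysis):
        # Only the protected hook is implemented, so no warning is raised.
        def _run_analysis(self, experiment_data):
            return [], []

    class LegacyAnalysis(BaseAnalysis):
        # Overriding `run` triggers the warning at class definition time.
        def run(self, experiment_data):
            return super().run(experiment_data)


flagged = [str(w.message) for w in caught if issubclass(w.category, DeprecationWarning)]
```

Because the check runs at class definition time, developers see the warning as soon as their module is imported, without having to run an experiment.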
+We also need to remove the `CompositeAnalysis.run` method, and update `TphiAnalysis` and `MitigatedTomographyAnalysis`.
+We switch the pending deprecation warning of the composite `_run_analysis` to a deprecation warning.
+It's probably worth writing a migration guide in the tutorial and posting it in the community Slack.
+We can also add the `__call__` dunder method to the base analysis class.
+This step is probably the toughest part of this journey.
+
+4. Implement `Executor` and clean up `ExperimentData`
+
+Finally, we can completely remove the thread management machinery from the experiment data. We implement `Executor` and move the circuit executor and analysis executor to this class.
+The base experiment is instantiated with the `Executor` class, and the experiment data becomes almost a pure data container.
+Remember that we should implement a lock mechanism in the experiment data so that we can continue to support the `.block_for_results` method.
+`ExperimentData` should be more maintainable after this cleanup.
+
+
+5. Implement new curve analysis base class
+
+Optionally we can provide a new curve analysis base class built on a JAX cost function, and rewrite the built-in curve analysis subclasses with the new implementation.
+We will gradually deprecate the existing `CurveAnalysis` base class.
+We can also implement a more sophisticated figure generation mechanism, e.g. conditioned on the experiment quality, or just lazy creation.
+
+
+## Detailed Design
+
+This doesn't syntactically influence the end-user's workflow.
+However, the machinery of the experiment run is now delegated to the `Executor` class behind the scenes, and thus an experiment must be
+instantiated with an executor instance.
+The role of the experiment class will become (1) creation of circuits, (2) transpilation of experiment circuits,
+and (3) building the task graph with the executor.
+
+```python
+exp = MyExperiment((0,), **exp_options)
+exp.set_run_options(shots=1000)
+# Directly update executor.circuit_executor options (delegate)
+exp.backend = some_backend
+# Directly update executor.circuit_executor (delegate)
+
+exp_data = exp.run()
+# Add transpiled circuits to executor
+# Add analysis to executor
+# Run executor and lock ExperimentData
+
+exp_data.block_for_results()
+# Wait until lock is released
+```
+
+This is a rough design for the `CircuitExecutor`, `AnalysisExecutor` and `Executor` classes.
+Implementation details can be changed.
+
+```python
+class CircuitExecutor:
+    """Job manager. This can be replaced in future with Runtime API.
+
+    Role:
+    * A black box that takes circuits and returns results.
+    * Manage backend run options.
+    """
+
+    def __init__(self):
+        self._options = self._default_options()
+        self._executor = ThreadPoolExecutor()
+        self.backend = None
+
+    @classmethod
+    def _default_options(cls) -> Options:
+        return Options(
+            max_circuits=None,
+        )
+
+    def set_options(self, **options):
+        ...
+
+    def options(self) -> Options:
+        return self._options
+
+    def add_circuits(
+        self,
+        circuits: List[QuantumCircuit],  # Order sensitive
+    ):
+        ...
+
+    def run(self) -> List[Dict]:
+        # Do job splitting based on max_circuits option
+        # Manage failed job: qiskit-experiments/#1216
+        # Return canonical result dictionary
+        ...
+
+
+class AnalysisExecutor:
+    """Analysis manager. This resolves task dependency.
+
+    Role:
+    * Run analysis callbacks in subprocess and collect result data.
+    """
+
+    def __init__(self):
+        self._executor = ProcessPoolExecutor()
+        self._dependency = Dependency()
+
+    def add_callback(
+        self,
+        callback: Callable,
+        predecessor: Optional[str] = None,  # task dependency
+    ) -> str:  # Returns assigned task UID
+        ...
+
+    def run(self) -> Tuple[List[AnalysisResultData], List[Figure]]:
+        # Run callbacks in subprocess
+        # Avoid multiprocessing when # of callbacks is less than threshold
+        ...
+
+
+class Executor:
+    """Executor interface attached to an Experiment instance.
+
+    Role:
+    * Control the entire execution.
+    * Bridge between the experiment data and the experiment class.
+    """
+
+    def __init__(self):
+        self.experiment_data = None
+        self.circuit_executor = CircuitExecutor()
+        self.analysis_executor = AnalysisExecutor()
+
+    def initialize_experiment_data(self):
+        ...
+
+    def run(self):
+        # 1. Lock self.experiment_data
+        # 2. Run the circuit executor and wait for results
+        # 3. Run the analysis executor and wait for results
+        # 4. Release the lock on self.experiment_data
+        ...
+```
+
+It is noteworthy that one can also run an experiment with already transpiled circuits by using the executor interface directly.
+See [Qiskit-Experiments/#1222](https://github.com/Qiskit-Extensions/qiskit-experiments/pull/1222) for background.
+
+```python
+exc = Executor()
+exc.circuit_executor.backend = some_backend
+exc.circuit_executor.add_circuits(transpiled_circuits)
+exc.analysis_executor.add_callback(T1Analysis())
+exc.run()
+
+exc.experiment_data.block_for_results()
+exc.experiment_data.save()
+```
+
+The analysis class may optionally provide a validation mechanism for circuit metadata, which might be called by `Executor.run`.
+
+```python
+class BaseAnalysis:
+
+    def validate_metadata(self, metadata: dict[str, Any]):
+        pass
+```
+
+For example, an experiment circuit for `T1Analysis` must have `xval` in its metadata, which represents the delay duration between state preparation and measurement.
+The circuit metadata of a composite experiment has a more complicated data structure, so the implementation of `CompositeAnalysis.validate_metadata` might become complicated.
+
+Note that [`BaseAnalysis.run`](https://github.com/Qiskit-Extensions/qiskit-experiments/blob/c66034c90dad73d705af25be7e9ed9617e7eb2ef/qiskit_experiments/framework/base_analysis.py#L118-L123) is not called in the new workflow.
+This method conventionally takes `ExperimentData` and mutates the input container through a callback.
+It internally wraps the `_run_analysis` method in a local function that formats the created analysis data and adds them to the experiment data.
+This local function is added to the input experiment data as a callback (very complicated!).
+By design, the analysis callback must NOT mutate the experiment data because the callback may run in a subprocess,
+and mutations made there do not propagate to the original container residing in the main process.
+Therefore, the `BaseAnalysis.run` method is going to be deprecated, and instead we can add
+
+```python
+class BaseAnalysis:
+    ...
+
+    def __call__(
+        self,
+        experiment_data: ExperimentData,
+    ) -> Tuple[List[AnalysisResultData], List[Figure]]:
+        return self._run_analysis(experiment_data)
+```
+
+to turn an analysis instance into a callable. This allows the executor to directly submit analysis instances.
+Eventually, we can also deprecate the `_run_analysis` method.
+Experiment developers will then directly override the `__call__` dunder method to provide the implementation of the analysis.
+
+## Alternative Approaches
+
+In the proposed design, the executor is hidden and the experiment class behaves as the sole user interface.
+Alternatively, we can expose the executor and remove the `BaseExperiment.run` method.
+
+```python
+with Executor() as exc:
+    exp = MyExperiment((0,), **options)
+    exp_data = exc.run(exp)
+```
+
+On the one hand, this syntax has a high affinity with the Runtime session, and we can completely delegate
+the role of circuit and analysis execution to the `Executor`. The experiment class becomes just a factory of experimental circuits.
+On the other hand, this introduces a breaking API change for end users as well, and precludes one-liner execution with method chaining.
+
+```python
+exp_data = MyExperiment(
+    (0,), backend=backend, **options
+).run().block_for_results()
+```
+
+If we have enough bandwidth to support end-user migration, the alternative implementation is worth considering.
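
The context-manager form of the alternative approach can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the proposed implementation: `SketchExecutor`, `SketchExperiment`, and their methods are hypothetical names, and plain dictionaries stand in for circuits and backend results.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


class SketchExperiment:
    """Hypothetical experiment that acts only as a circuit factory."""

    def __init__(self, delays: List[float]):
        self.delays = delays

    def circuits(self) -> List[dict]:
        # A real experiment would return QuantumCircuit objects; plain
        # dictionaries keep the sketch free of external dependencies.
        return [{"xval": d} for d in self.delays]

    def analysis(self) -> Callable[[List[dict]], dict]:
        # Toy analysis callback that only counts the returned results.
        return lambda results: {"num_points": len(results)}


class SketchExecutor:
    """Hypothetical executor that owns the worker pool for a `with` block."""

    def __enter__(self) -> "SketchExecutor":
        self._pool = ThreadPoolExecutor(max_workers=2)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        # Wait for pending work and release the workers, even on error.
        self._pool.shutdown(wait=True)

    def run(self, experiment: SketchExperiment) -> dict:
        # Stand-in for backend execution: copy the metadata of each circuit.
        results = list(self._pool.map(dict, experiment.circuits()))
        return experiment.analysis()(results)


with SketchExecutor() as exc:
    exp_data = exc.run(SketchExperiment(delays=[0.0, 1e-6, 2e-6]))
```

In this shape the executor controls the lifetime of its resources, which is the property that makes the syntax a natural fit for a session-like execution model; the cost is that the experiment no longer has a `run` method to chain from.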
+
+
+## Questions
+
+- How do we communicate with community developers?
+- How do we avoid breaking API changes?
+- What is a reasonable timeline for migration?
+
+Another important situation that we may need to cover is the interruption of an executor run.
+Job queueing on a quantum system may take a very long time (sometimes more than a day) if you don't reserve the device,
+and we often terminate the kernel in which the circuit executor is actively running.
+In this situation, we may want to restart from the analysis once all jobs have completed successfully.
+
+```python
+exp = MyExperiment((0,), **exp_options)
+exp_data = exp.run()
+exp_data.save()
+exp_data.block_for_results()
+
+...  # decide to terminate after a few hours of waiting
+```
+
+(in a fresh kernel)
+```python
+new_exc = Executor()
+new_exc.run(experiment_id = "...")
+```
+
+Something like this would probably be convenient. Since in principle we can save all experiment classes in the experiment service artifact, we can reconstruct the discarded executor from the experiment ID.
+Alternatively, we can save the `Executor` instance itself locally. Namely,
+
+```python
+exp.executor.withdraw()
+
+...
+
+new_exc = Executor()
+new_exc.rerun(0)
+```
+
+This mechanism doesn't require access authorization to the IBM experiment service.
+To recover the executor instance, we just need to store the job IDs in the circuit executor and the analysis callbacks in the analysis executor.
+Note that the analysis callbacks are now callable analysis instances, and they can provide class information and attached option values.
+We just need to store this information locally in a temporary or application folder.
+This is somewhat like `git stash save` and `git stash pop`.
+We should also discuss how we can address this problem with the new `Executor` class.
+
+
+## Future Extensions
+
+In the future, we should implement `RuntimeExecutor` which is a drop-in replacement of `CircuitExecutor`.
+This executor will support the Qiskit Runtime execution path, and we can enable the session feature for
+iterated experiments (see [Qiskit-Experiments/#626](https://github.com/Qiskit-Extensions/qiskit-experiments/pull/626)),
+which is convenient for the execution of error amplification experiments.
+
+We can also separately discuss the execution pattern of the experiment.
+At some point we may move away from the current pattern of `Experiment.run().block_for_results()` and switch to a more context-like execution for seamless integration of Runtime.
+This can be done by adding two dunder methods to `Executor`, namely
+
+```python
+class Executor:
+
+    def __enter__(self):
+        pass
+
+    def __exit__(self, exc_type, exc_val, exc_tb):
+        pass
+
+    ...
+```
+
+This means we don't need to immediately decide on a new syntax for the experiment run.
+This can be a future extension, and in this RFC we can focus on the internal rework of our core components: separation of execution from the data container (i.e. single responsibility principle),
+and encapsulation of circuit execution for future Runtime integration (i.e. dependency inversion principle), in addition to moving the responsibility of analysis execution from
+the individual analysis classes to a dedicated executor for better control of parallelism.
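
The dependency inversion mentioned above can be illustrated with a small sketch in which the top-level executor depends only on an abstract interface, so a Runtime-based circuit executor can replace the job-based one without touching the caller. All class names here are hypothetical, strings stand in for circuits, and neither implementation talks to a real backend.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class BaseCircuitExecutor(ABC):
    """Hypothetical interface: takes circuits, returns result dictionaries."""

    @abstractmethod
    def run(self, circuits: List[str]) -> List[Dict]:
        ...


class JobCircuitExecutor(BaseCircuitExecutor):
    """Stand-in for the job-based execution path."""

    def run(self, circuits: List[str]) -> List[Dict]:
        return [{"circuit": c, "path": "job"} for c in circuits]


class RuntimeCircuitExecutor(BaseCircuitExecutor):
    """Stand-in for a future Runtime-based execution path."""

    def run(self, circuits: List[str]) -> List[Dict]:
        return [{"circuit": c, "path": "runtime"} for c in circuits]


class SketchExecutor:
    """Depends only on the abstract interface, not on a concrete class."""

    def __init__(self, circuit_executor: BaseCircuitExecutor):
        self.circuit_executor = circuit_executor

    def run(self, circuits: List[str]) -> List[Dict]:
        return self.circuit_executor.run(circuits)


circuits = ["t1_delay_0", "t1_delay_1"]
job_results = SketchExecutor(JobCircuitExecutor()).run(circuits)
runtime_results = SketchExecutor(RuntimeCircuitExecutor()).run(circuits)
```

Because `SketchExecutor.run` is identical for both paths, swapping the execution backend becomes a constructor argument rather than a change to the experiment or analysis code.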
\ No newline at end of file
diff --git a/0014-overhaul-qiskit-experiments/ex3_dag.png b/0014-overhaul-qiskit-experiments/ex3_dag.png
new file mode 100644
index 0000000..ad78343
Binary files /dev/null and b/0014-overhaul-qiskit-experiments/ex3_dag.png differ
diff --git a/0014-overhaul-qiskit-experiments/ex4_dag.png b/0014-overhaul-qiskit-experiments/ex4_dag.png
new file mode 100644
index 0000000..96d480f
Binary files /dev/null and b/0014-overhaul-qiskit-experiments/ex4_dag.png differ
diff --git a/0014-overhaul-qiskit-experiments/scaling.png b/0014-overhaul-qiskit-experiments/scaling.png
new file mode 100644
index 0000000..fc7e030
Binary files /dev/null and b/0014-overhaul-qiskit-experiments/scaling.png differ
diff --git a/0014-overhaul-qiskit-experiments/seq_diagram_current.png b/0014-overhaul-qiskit-experiments/seq_diagram_current.png
new file mode 100644
index 0000000..8543dca
Binary files /dev/null and b/0014-overhaul-qiskit-experiments/seq_diagram_current.png differ
diff --git a/0014-overhaul-qiskit-experiments/seq_diagram_proposed.png b/0014-overhaul-qiskit-experiments/seq_diagram_proposed.png
new file mode 100644
index 0000000..75b7758
Binary files /dev/null and b/0014-overhaul-qiskit-experiments/seq_diagram_proposed.png differ
diff --git a/README.md b/README.md
index 7dcf0fa..227933f 100644
--- a/README.md
+++ b/README.md
@@ -20,6 +20,7 @@ facilitate that and collect feedback prior to implementation.
 | RFC | Status | References/Discussion |
 | --- | ------ | --------------------- |
+| `0014` [Overhaul Qiskit Experiments](0014-overhaul-qiskit-experiments.md) | Implementation in progress | [RFC PR](https://github.com/Qiskit/RFCs/pull/47) \| [Implementation](https://github.com/Qiskit-Extensions/qiskit-experiments/issues/1268) |
 | `0011` [Plan to rename `Qiskit/qiskit-terra` repo to `Qiskit/qiskit`](0011-repo-rename.md) | Implementation in progress | [RFC PR](https://github.com/Qiskit/RFCs/pull/31) \| [Implementation](https://github.com/Qiskit/RFCs/issues/41) |
 | `0010` [Preliminary representation of rvalue classical expression in Qiskit](0010-simple-classical-representations.md) | Implementation in progress | [RFC PR](https://github.com/Qiskit/RFCs/pull/30) \| [Implementation](https://github.com/Qiskit/qiskit-terra/issues/10239) |
 | `0009` [`Operation`: the interface for valid `QuantumCircuit` operations](0009-interface-for-circuit-operations.md) | Implemented | [RFC PR](https://github.com/Qiskit/RFCs/pull/25) \| [Implementation](https://github.com/Qiskit/qiskit-terra/pull/7087)|