Yuta Nakamura nakasan617

PROFILE

Hello, I am Yuta Nakamura. I am a Ph.D. candidate at DePaul University, nearly in my third year. I have enormous curiosity about the world, and I hope to build a business that changes the game in an industry.

RESEARCH

My research area is computer systems, which covers compilers, operating systems, databases, and related topics. I currently work on program analysis. Specifically, I am building software that detects the points where two executions diverge and where they reconverge. For now, we assume that these differences stem from the provenance of the two executions.
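As a rough illustration of what "divergence" and "reconvergence" mean here, the Python sketch below models each execution as a list of event names and uses the standard library's `difflib.SequenceMatcher` to report the first region where the two traces differ. It is a generic diff-based sketch under simplifying assumptions (events are plain strings, and the function name `divergence_points` is hypothetical), not the software described above.

```python
from difflib import SequenceMatcher


def divergence_points(trace_a, trace_b):
    """Locate the first region where two event traces differ.

    Returns (diverge_a, diverge_b, reconverge_a, reconverge_b): the diverge
    indices mark the first differing event in each trace, and the reconverge
    indices mark the first event after that region. Returns None if the
    traces are identical.
    """
    # autojunk=False stops difflib from discarding very frequent events in long traces.
    matcher = SequenceMatcher(a=trace_a, b=trace_b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            return i1, j1, i2, j2
    return None


if __name__ == "__main__":
    original = ["open", "read", "write", "close"]
    rerun = ["open", "read", "mmap", "write", "close"]
    print(divergence_points(original, rerun))  # (2, 2, 2, 3)
```

Here both runs diverge at index 2 (the extra `mmap`), and they reconverge at `write`, which sits at index 2 in the original and index 3 in the rerun.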

I love research because I get to decide what I work on and set my own schedule, though things can get hectic before deadlines.

PUBLISHED WORK

The following are my publications so far.

Efficient Differencing of System-level Provenance Graphs (2023)

Abstract:
Data provenance, when audited at the operating system level, generates a large volume of low-level events. Current provenance systems infer causal flow from these event traces, but do not infer application structure, such as loops and branches. The absence of these inferred structures decreases accuracy when comparing two event traces, leading to low-quality answers from a provenance system. In this paper, we infer nested natural and unnatural loop structures over a collection of provenance event traces. We describe an "unrolling method" that uses the inferred nested loop structure to systematically mark loop iterations. Our loop-based unrolling improves the accuracy of trace comparison by 20-70% over trace comparisons that do not rely on inferred structures.
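To make "marking loop iterations" concrete, here is a minimal Python sketch under strong simplifying assumptions: a single, non-nested loop whose header event is given rather than inferred, and no loop-exit detection (events after the loop stay in its last iteration). The names `mark_iterations` and `differing_iterations` are hypothetical. The paper infers nested natural and unnatural loops, which this sketch does not attempt.

```python
from collections import defaultdict


def mark_iterations(trace, loop_header):
    """Tag each event with the iteration of a single, known loop.

    Events before the first header get iteration 0; each occurrence of
    loop_header starts a new iteration. Nesting and loop exits are ignored.
    """
    iteration = 0
    marked = []
    for event in trace:
        if event == loop_header:
            iteration += 1
        marked.append((iteration, event))
    return marked


def differing_iterations(trace_a, trace_b, loop_header):
    """Return the iteration numbers whose events differ between two traces."""
    groups_a, groups_b = defaultdict(list), defaultdict(list)
    for iteration, event in mark_iterations(trace_a, loop_header):
        groups_a[iteration].append(event)
    for iteration, event in mark_iterations(trace_b, loop_header):
        groups_b[iteration].append(event)
    all_iterations = set(groups_a) | set(groups_b)
    # .get() returns None for an iteration present in only one trace.
    return sorted(i for i in all_iterations if groups_a.get(i) != groups_b.get(i))


if __name__ == "__main__":
    run_a = ["start", "read", "write", "read", "write", "end"]
    run_b = ["start", "read", "write", "read", "mmap", "write", "end"]
    print(differing_iterations(run_a, run_b, "read"))  # [2]
```

Comparing iteration by iteration localizes the difference to the second pass through the loop, rather than reporting it only as an offset in a flat event list.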

Provenance-based Workflow Diagnostics Using Program Specification (2022)

Abstract:
Workflow management systems (WMS) help automate and coordinate scientific modules and monitor their execution. WMSes are also used to repeat a workflow application with different inputs to test sensitivity and reproducibility of runs. However, when differences arise in outputs across runs, current WMSes do not audit sufficient provenance metadata to determine where the execution first differed. This increases diagnostic time and leads to poor quality diagnostic results. In this paper, we use program specification to precisely determine locations where workflow execution differs. We use existing provenance audited to isolate modules where execution differs. We show that using program specification comes at some increased storage overhead due to mapping of provenance data flows onto program specification, but leads to better quality diagnostics in terms of the number of differences found and their location relative to comparing provenance metadata audited within current WMSes.

Link to the paper

Published in HiPC 2022

Content Defined Merkle Tree (2020)

Abstract:
Containerization simplifies the sharing and deployment of applications when environments change in the software delivery chain. To deploy an application, container delivery methods push and pull container images. These methods operate on file and layer (set of files) granularity, and introduce redundant data within a container. Several container operations such as upgrading, installing, and maintaining become inefficient, because of copying and provisioning of redundant data. In this paper, we reestablish recent results that block-level deduplication reduces the size of individual containers, by verifying the result using content-defined chunking. Block-level deduplication, however, does not improve the efficiency of push/pull operations which must determine the specific blocks to transfer. We introduce a content-defined Merkle Tree (CDMT) over deduplicated storage in a container. CDMT indexes deduplicated blocks and determines changes to blocks in logarithmic time on the client. CDMT efficiently pushes and pulls container images from a registry, especially as containers are upgraded and (re-)provisioned on a client. We also describe how a registry can efficiently maintain the CDMT index as new image versions are pushed. We show the scalability of CDMT over Merkle Trees in terms of disk and network I/O savings using 15 container images and 233 image versions from Docker Hub.

Link to CDMT

Published in HiPC 2020
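The Python sketch below illustrates two ingredients the CDMT abstract builds on: content-defined chunking and Merkle hashing. It is not CDMT. It cuts boundaries with a gear-style rolling hash (the approach used by FastCDC-like chunkers), computes a plain binary Merkle root over chunk hashes, and uses a hash-set lookup to decide which chunks a client must fetch, so it does not find changes in logarithmic time. The parameters (`mask_bits`, `min_size`, `max_size`) and the SHA-256-derived gear table are illustrative assumptions.

```python
import hashlib
import random

# Pseudo-random 32-bit value per byte value. Assumption: derived from SHA-256
# so the table is deterministic; gear-based chunkers ship a fixed random table.
GEAR = [int.from_bytes(hashlib.sha256(bytes([b])).digest()[:4], "big") for b in range(256)]


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def cdc_chunks(data: bytes, mask_bits: int = 6, min_size: int = 32, max_size: int = 1024) -> list[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash."""
    # Test high bits: bit k of fp depends on the last k+1 bytes, so high bits
    # reflect a wider window of content than low bits.
    mask = ((1 << mask_bits) - 1) << (32 - mask_bits)
    chunks, start, fp = [], 0, 0
    for i, byte in enumerate(data):
        fp = ((fp << 1) + GEAR[byte]) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= min_size and (fp & mask) == 0) or length >= max_size:
            chunks.append(data[start:i + 1])
            start, fp = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def merkle_root(chunks: list[bytes]) -> bytes:
    """Binary Merkle root over chunk hashes; an odd last node is carried up unchanged."""
    level = [sha(c) for c in chunks] or [sha(b"")]
    while len(level) > 1:
        level = [sha(level[i] + level[i + 1]) if i + 1 < len(level) else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]


def chunks_to_transfer(client_chunks: list[bytes], new_chunks: list[bytes]) -> list[bytes]:
    """Distinct chunks of the new version whose hashes the client does not hold."""
    have = {sha(c) for c in client_chunks}
    missing = []
    for chunk in new_chunks:
        digest = sha(chunk)
        if digest not in have:
            missing.append(chunk)
            have.add(digest)  # send each distinct chunk only once
    return missing


if __name__ == "__main__":
    rng = random.Random(0)
    old = bytes(rng.randrange(256) for _ in range(20_000))
    new = old[:10_000] + b"patched!" + old[10_000:]
    old_chunks, new_chunks = cdc_chunks(old), cdc_chunks(new)
    print(len(new_chunks), len(chunks_to_transfer(old_chunks, new_chunks)))
    print(merkle_root(old_chunks) != merkle_root(new_chunks))
```

Because boundaries depend on content rather than fixed offsets, an insertion in the middle of the data typically changes only the few chunks around it, which is the property that deduplicated push/pull relies on.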

Efficient Provenance Alignment in Reproduced Executions (2020)

Abstract:
Reproducing experiments entails repeating experiments with changes. Changes, such as a change in input arguments, a change in the invoking environment, or a change due to non-determinism in the runtime may alter results. If results alter significantly, perusing them is not sufficient: users must analyze the impact of a change and determine if the experiment computed the same steps. Making fine-grained, stepwise comparisons can be both challenging and time-consuming. In this paper, we compare a reproduced execution with recorded system provenance of the original execution, and determine provenance alignment. The alignment is based on comparing the specific location in the program, the control flow of the execution, and data inputs. Experiments show that the alignment method has a low overhead to compute a match and realigns with a small look-ahead buffer.

Link to Provenance Alignment

Published in TaPP 2020
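A small greedy alignment with a bounded look-ahead gives a feel for the idea. In this Python sketch, the `Event` fields, `same_step`, `align`, and the default `lookahead=4` are hypothetical choices, not the paper's: two events count as the same step when their program location and control-flow context match, and data inputs are compared afterwards to flag matched steps whose inputs differ.

```python
from typing import NamedTuple


class Event(NamedTuple):
    location: str  # hypothetical: program location, e.g. a source line
    context: str   # hypothetical: control-flow context, e.g. the call stack
    data: str      # hypothetical: digest of the data inputs read at this step


def same_step(a: Event, b: Event) -> bool:
    """Two events are the same step if location and control-flow context match."""
    return a.location == b.location and a.context == b.context


def _find_ahead(target, events, start, lookahead):
    """Offset d in 1..lookahead with same_step(target, events[start + d]), else None."""
    for d in range(1, lookahead + 1):
        if start + d < len(events) and same_step(target, events[start + d]):
            return d
    return None


def align(recorded, reproduced, lookahead=4):
    """Greedily pair event indices; unmatched events are paired with None."""
    pairs, i, j = [], 0, 0
    while i < len(recorded) and j < len(reproduced):
        if same_step(recorded[i], reproduced[j]):
            pairs.append((i, j))
            i, j = i + 1, j + 1
            continue
        # Extra steps in the reproduced run: skip ahead there to realign.
        d = _find_ahead(recorded[i], reproduced, j, lookahead)
        if d is not None:
            pairs.extend((None, j + k) for k in range(d))
            j += d
            continue
        # Steps missing from the reproduced run: skip ahead in the recording.
        d = _find_ahead(reproduced[j], recorded, i, lookahead)
        if d is not None:
            pairs.extend((i + k, None) for k in range(d))
            i += d
            continue
        # No realignment within the buffer: mark both as unmatched and move on.
        pairs.extend([(i, None), (None, j)])
        i, j = i + 1, j + 1
    pairs.extend((k, None) for k in range(i, len(recorded)))
    pairs.extend((None, k) for k in range(j, len(reproduced)))
    return pairs


def data_differences(recorded, reproduced, pairs):
    """Matched steps whose data inputs differ."""
    return [(i, j) for i, j in pairs
            if i is not None and j is not None and recorded[i].data != reproduced[j].data]


if __name__ == "__main__":
    rec = [Event("L1", "main", "x"), Event("L2", "main", "y"), Event("L3", "main", "z")]
    rep = [Event("L1", "main", "x"), Event("L9", "main", "w"),
           Event("L2", "main", "y2"), Event("L3", "main", "z")]
    pairs = align(rec, rep)
    print(pairs)                              # [(0, 0), (None, 1), (1, 2), (2, 3)]
    print(data_differences(rec, rep, pairs))  # [(1, 2)]
```

Each step does at most two look-ahead scans, so the cost grows with trace length times the buffer width. A greedy scan like this can still misalign when the same step repeats within the window, which a real aligner would need to handle.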

HOBBIES

I love studying and playing chess and other strategy board games, so if you are interested, add me as a friend and play against me! My highest chess.com rating is 1650 in rapid (10-minute games).

Link to chess.com
