Add Dataset-related relational schemas and the file system service with Git version control #2369
Merged
bobbai00 merged 14 commits into master from jiadong-introduce-dataset-schema-and-version-control-fs-service on Feb 16, 2024
f06d2bf to
ab44757
Compare
Yicong-Huang (Contributor) requested changes on Feb 11, 2024
Thanks for the PR. It looks really good and clean. Some general comments:
- think about thread-safety issues, and see whether we need to make it thread-safe;
- consider support for different operating systems, and make the design general if possible;
- consider using a standard git library and parsers to reduce maintenance effort.
...ra/web/resource/dashboard/user/dataset/service/GitVersionControlLocalFileStorageService.java
d070e5c to
87108ee
Compare
1fbae6e to
7cd9a48
Compare
Yicong-Huang (Contributor) approved these changes on Feb 16, 2024
Left comments in code
...n/scala/edu/uci/ics/texera/web/resource/dashboard/user/dataset/utils/JGitVersionControl.java
b560c23 to
98c9b73
Compare
This PR introduces the dataset-related table schema design, as well as a class that provides version control using Git on the local file system.
File service with Git Version Control
I added a class called GitVersionControlLocalFileStorageService, which consists of several static methods:
1. File Write and Delete
The following methods handle file writing/deleting in the repository and directory deletion, using standard Java IO alongside Git for version tracking:
For writeFileToRepo and removeFileFromRepo, the changes are staged with git add and git rm through JGit.
2. Version Init and Creation
The following methods handle repository initialization and version creation:
This method runs git init using JGit.
This method runs git commit -m {versionName} to create a commit.
3. Read File/FileTree of a certain version
Since a repository can have multiple versions, reads of files from different versions can happen frequently. To let these reads run concurrently, we need to avoid checking out during reads.
To avoid checking out different commits when reading, I used git show and git ls-tree, passing the commit hash to these commands to read a file or file tree at a given commit without a checkout.
Utilizes git ls-tree to fetch the repository's file tree at a specific commit, parsed into a Set of FileNode objects representing the file hierarchy.
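The parsing step can be sketched roughly as follows. This is only an illustration, not the code in this PR: it assumes the input is the output of `git ls-tree -r --name-only <commit>` (one slash-separated file path per line), and the `FileNode` shape and the `LsTreeParser` name are hypothetical stand-ins. Paths that git would quote (unusual characters) are not handled.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class LsTreeParser {
    /** Hypothetical stand-in for FileNode; the real class is not shown in this description. */
    static final class FileNode {
        final String name; // last path segment, e.g. "a.csv"
        final Map<String, FileNode> children = new LinkedHashMap<>(); // empty for regular files

        FileNode(String name) {
            this.name = name;
        }

        boolean isDirectory() {
            return !children.isEmpty();
        }
    }

    /** Builds the top-level nodes of the file tree from lines such as "data/raw/b.csv". */
    static Set<FileNode> parse(List<String> lsTreeLines) {
        Map<String, FileNode> roots = new LinkedHashMap<>();
        for (String line : lsTreeLines) {
            if (line.isBlank()) {
                continue; // skip empty lines in the command output
            }
            Map<String, FileNode> level = roots;
            // Walk the path segment by segment, creating nodes that do not exist yet.
            for (String segment : line.split("/")) {
                FileNode node = level.computeIfAbsent(segment, FileNode::new);
                level = node.children;
            }
        }
        return new LinkedHashSet<>(roots.values());
    }

    public static void main(String[] args) {
        Set<FileNode> roots = parse(List.of("README.md", "data/a.csv", "data/raw/b.csv"));
        for (FileNode root : roots) {
            System.out.println(root.name + (root.isDirectory() ? "/" : ""));
        }
    }
}
```

Because the tree is built purely from the command output for a given commit hash, no checkout is involved and several versions can be listed at the same time.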
Leverages git show to output the content of a file at a specific commit directly to an OutputStream, facilitating version-specific file content retrieval without altering the working directory's state.
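As a rough sketch of this read path, the snippet below shells out to `git show <commit>:<path>` and copies its standard output into an OutputStream. It is an assumption-laden illustration rather than the PR's implementation: the class and method names (`GitShowReader`, `showCommand`, `readFileAtCommit`) are hypothetical, and it assumes a `git` executable is available on the PATH.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Path;
import java.util.List;

final class GitShowReader {
    /** Builds "git show <commitHash>:<path>", joining path segments with '/' on every OS. */
    static List<String> showCommand(String commitHash, Path relativePath) {
        StringBuilder gitPath = new StringBuilder();
        for (Path part : relativePath) {
            if (gitPath.length() > 0) {
                gitPath.append('/');
            }
            gitPath.append(part.toString());
        }
        return List.of("git", "show", commitHash + ":" + gitPath);
    }

    /** Streams the file content at the given commit into out; the working tree is not modified. */
    static void readFileAtCommit(Path repoDir, String commitHash, Path relativePath, OutputStream out)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(showCommand(commitHash, relativePath))
                .directory(repoDir.toFile())
                .redirectError(ProcessBuilder.Redirect.DISCARD) // avoid blocking on an unread stderr pipe
                .start();
        try (InputStream stdout = process.getInputStream()) {
            stdout.transferTo(out);
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("git show exited with code " + exitCode);
        }
    }

    public static void main(String[] args) {
        // Only prints the command; "abc123" is a placeholder, not a real commit.
        System.out.println(String.join(" ", showCommand("abc123", Path.of("data", "a.csv"))));
    }
}
```

Each call starts its own process and reads from the object database only, so concurrent reads of different versions do not interfere with each other.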
Dataset-related DB schema
Three tables are added:
I introduce a table dataset_version to store the version metadata, instead of relying on git commands to list all the versions. The reasons for this decision are:
- it reduces the number of system calls (executing git commands), because APIs such as listing the versions of a dataset will be called very frequently.
The relationship between dataset and dataset_version is 1 to N: one dataset can have multiple versions, but each dataset version belongs to exactly one dataset.
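To make the 1-to-N relationship concrete, here is a small in-memory sketch in Java. It is not the schema from this PR: the field names (`dvid`, `did`, `name`, `versionHash`) and the `DatasetVersionIndex` class are assumptions chosen for illustration. It shows how listing the versions of a dataset becomes a metadata lookup instead of a git call.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DatasetVersionIndex {
    /** One row of a hypothetical dataset_version table; the column names are assumptions. */
    static final class DatasetVersion {
        final int dvid;           // version id
        final int did;            // id of the single dataset this version belongs to
        final String name;        // version name
        final String versionHash; // commit hash of this version

        DatasetVersion(int dvid, int did, String name, String versionHash) {
            this.dvid = dvid;
            this.did = did;
            this.name = name;
            this.versionHash = versionHash;
        }
    }

    // dataset id -> its versions, in insertion order (the "N" side of the relationship)
    private final Map<Integer, List<DatasetVersion>> versionsByDataset = new LinkedHashMap<>();

    void add(DatasetVersion version) {
        versionsByDataset.computeIfAbsent(version.did, k -> new ArrayList<>()).add(version);
    }

    /** Lists the versions of one dataset from stored metadata, without running any git command. */
    List<DatasetVersion> versionsOf(int did) {
        return List.copyOf(versionsByDataset.getOrDefault(did, List.of()));
    }

    public static void main(String[] args) {
        DatasetVersionIndex index = new DatasetVersionIndex();
        index.add(new DatasetVersion(1, 10, "v1", "aaa111"));
        index.add(new DatasetVersion(2, 10, "v2", "bbb222"));
        for (DatasetVersion v : index.versionsOf(10)) {
            System.out.println(v.name + " -> " + v.versionHash);
        }
    }
}
```

The stored hash is what a version-specific read would pass to git, so git is only invoked when file content or a file tree is actually needed.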