Add Environment WebServer APIs #2434
Merged
ceceeca to
0047eb4
Compare
aglinxinyuan
approved these changes
Mar 1, 2024
Contributor
aglinxinyuan
left a comment
LGTM!
bobbai00
added a commit
that referenced
this pull request
Mar 6, 2024
This PR introduces the GUI of `environment` and some fixes to previous `dataset` features. For the backend of `environment`, see #2434.

### Features
- Environment tab on the left panel
- Autocomplete only from the files in datasets

### Implementation Details
1. Changes to `ScanSourceOperatorDesc`: Previously, the source file was located by its absolute path and scanned into the workflow. Now that all files live in datasets managed by JGit, the physical file may not be directly available. The current solution is to write the target file into a temporary file, which is identified by an absolute path generated by the JVM. The file is deleted when the JVM exits.
2. When a workflow execution request is submitted, the web server also persists the environment `eid` to the `WorkflowExecutions` table.
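The temporary-file approach described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: the object name `TempFileSketch`, the method `materialize`, and the `dataset-file-` prefix are hypothetical, and the input is assumed to arrive as a plain `InputStream` (for example, one obtained from the dataset storage layer).

```scala
import java.io.{File, InputStream}
import java.nio.file.{Files, StandardCopyOption}

// Hypothetical helper: names are illustrative and do not come from the PR.
object TempFileSketch {
  // Copies `in` into a fresh temporary file and returns the file's absolute path.
  def materialize(in: InputStream, suffix: String = ".tmp"): String = {
    // The JVM picks a unique name inside the default temp directory.
    val tmp: File = File.createTempFile("dataset-file-", suffix)
    // Request deletion when the JVM terminates normally.
    tmp.deleteOnExit()
    try Files.copy(in, tmp.toPath, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
    tmp.getAbsolutePath
  }
}
```

Note that `deleteOnExit` only takes effect on normal JVM termination, so a crashed process may leave temporary files behind.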
bobbai00
added a commit
that referenced
this pull request
Mar 28, 2024
This PR introduces the GUI of environment and some fixes to previous dataset features. For the backend of environment, see #2434.

After introducing the environment, the way to upload data and scan it in a workflow is presented in this [blog](https://github.com/Texera/texera/wiki/Create-Dataset,-upload-data-to-it-and-use-it-in-Workflow). For more specific information, there is a [demo video](https://www.youtube.com/watch?app=desktop&v=EJ269aWnHv4&ab_channel=TexeraProject).

## Features
- View the environment information in the workspace
- Add a dataset to the current environment
- Preview a data file in a dataset of the environment
- Scan files that are in the datasets

## Implementation Details

### The changes on the ScanSourceOperatorDesc

Previously, the source file was located by its absolute path and scanned into the workflow. Now that all files live in datasets managed by JGit, the physical file may not be directly available. Therefore, a couple of changes are made to the way the source operator scans the file.

1. In the source operator descriptor, ScanSourceOpDesc, a new member variable is added:

```scala
@JsonIgnore
var filePath: Option[String] = None

// new
@JsonIgnore
var datasetFileDesc: Option[DatasetFileDesc] = None
```

The class `DatasetFileDesc` contains the soft link to the file in the dataset and has utilities to read the file as a stream or as a temporary file.

`datasetFileDesc` is initialized when `setContext` is called:

```scala
if (getContext.userId.isDefined) {
  val environmentEid = WorkflowResource.getEnvironmentEidOfWorkflow(
    UInteger.valueOf(workflowContext.workflowId.id)
  )
  // if the user system is enabled, a datasetFileDesc is initialized;
  // it is the handle for reading the file from the dataset
  datasetFileDesc = Some(
    getEnvironmentDatasetFilePathAndVersion(getContext.userId.get, environmentEid, fileName.get)
  )
}
```

2. For each source operator executor, e.g. `CSVScanSourceOpExec`, a new parameter is added to the constructor:

```scala
class CSVScanSourceOpExec private[csv] (
    filePath: String,
    datasetFileDesc: DatasetFileDesc,
```

If `datasetFileDesc` is non-null (i.e. the user system is enabled), the input stream is created via `datasetFileDesc.fileInputStream` when the input stream reader is set up:

```scala
// this function creates the input stream accordingly:
// - if filePath is set, create the stream from the file
// - if fileDesc is set, create the stream via a JGit call
def createInputStream(filePath: String, fileDesc: DatasetFileDesc): InputStream = {
  if (filePath != null && fileDesc != null) {
    throw new RuntimeException(
      "File Path and Dataset File Descriptor cannot be present at the same time."
    )
  }
  if (filePath != null) {
    new FileInputStream(filePath)
  } else {
    // create the stream from the dataset file descriptor
    fileDesc.fileInputStream()
  }
}
```

---------

Co-authored-by: Xinyuan Lin <xinyual3@uci.edu>
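The descriptor stores both sources as `Option` values, while the executor shown above receives nullable parameters. As a rough sketch of how the same selection could be written over `Option`s, assuming a hypothetical `StreamHandle` trait standing in for `DatasetFileDesc` (neither `StreamHandle` nor `openSource` appears in the PR):

```scala
import java.io.{FileInputStream, InputStream}

// Hypothetical stand-in for DatasetFileDesc: anything that can open a stream.
trait StreamHandle {
  def fileInputStream(): InputStream
}

object SourceSelectionSketch {
  // Opens the stream from exactly one of the two sources.
  def openSource(filePath: Option[String], handle: Option[StreamHandle]): InputStream =
    (filePath, handle) match {
      case (Some(_), Some(_)) =>
        throw new IllegalArgumentException(
          "File path and dataset file descriptor cannot be present at the same time."
        )
      case (Some(path), None) => new FileInputStream(path) // plain local file
      case (None, Some(h))    => h.fileInputStream()       // file served from the dataset
      case (None, None) =>
        throw new IllegalArgumentException(
          "Either a file path or a dataset file descriptor is required."
        )
    }
}
```

Matching on the pair makes the "neither is set" case explicit, whereas the null-based version would fail with a `NullPointerException` in that case.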
This PR introduces the Environment APIs to the web server. It depends on the dataset feature, #2413 and #2391.
Designs
`workflow_execution` record.
DB Schema Changes
Three new tables are added:
- `environment` stores the environment info.
- `environment_of_workflow` records which environment a workflow belongs to. Currently, a workflow has a 1-to-1 correspondence with an environment.
- `dataset_of_environment` records which dataset(s) are visible to the workflow that is using this environment.
A new column is added to `workflow_executions`:
- `environment_eid` records which environment was used for that workflow execution.
New APIs
Several APIs related to environment are added.
Existing API Updates
I changed the implementation of `persistWorkflow` in `WorkflowResource`. Specifically, an `environment` will be created if the workflow has no corresponding environment when persisting. The code snippet is: