Contributor

@bobbai00 bobbai00 commented Mar 18, 2024

This PR introduces the GUI for environments and fixes several issues in the existing dataset features. For the environment backend, see #2434.

With environments in place, the new way to upload data and scan it in a workflow is described in this blog. For more details, see the demo video.

Features

  • View environment information in the workspace

  • Add a dataset to the current environment

  • Preview a data file in a dataset of the environment

  • Scan files that are in the datasets

Implementation Details

Changes to ScanSourceOpDesc

Previously, a source file was located by its absolute path and scanned into the workflow. Now that all files live in datasets managed by JGit, the physical file may not be directly available on disk. A couple of changes were therefore made to the way source operators scan files.

  1. In the source operator descriptor: ScanSourceOpDesc
    A new member variable is added:
  @JsonIgnore
  var filePath: Option[String] = None

// new
  @JsonIgnore
  var datasetFileDesc: Option[DatasetFileDesc] = None

The DatasetFileDesc class holds the soft link to the file in the dataset and provides utilities to read the file as a stream or as a temporary file.
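
To illustrate the stream/temporary-file idea, here is a minimal sketch of such a file handle. It is not the actual DatasetFileDesc: the names `FileHandle`, `tempFilePath`, and `InMemoryHandle` are hypothetical, and an in-memory string stands in for the JGit-managed content.

```scala
import java.io.{ByteArrayInputStream, InputStream}
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, StandardCopyOption}

object DatasetFileHandleSketch {
  // Hypothetical stand-in for DatasetFileDesc: readers only need a way
  // to open the file content as a stream.
  trait FileHandle {
    def fileInputStream(): InputStream

    // Copy the content into a temporary file for readers that need a real path.
    def tempFilePath(): Path = {
      val tmp = Files.createTempFile("dataset-file-", ".tmp")
      val in = fileInputStream()
      try Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING)
      finally in.close()
      tmp
    }
  }

  // In-memory handle used only for illustration (assumption: a real handle
  // would read from the versioned dataset store instead).
  final class InMemoryHandle(content: String) extends FileHandle {
    override def fileInputStream(): InputStream =
      new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8))
  }
}
```

With this shape, an executor that can consume a stream never touches the file system, while one that requires a path can fall back to `tempFilePath()`.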

datasetFileDesc is initialized when setContext is called:

    if (getContext.userId.isDefined) {
      val environmentEid = WorkflowResource.getEnvironmentEidOfWorkflow(
        UInteger.valueOf(workflowContext.workflowId.id)
      )
      // if user system is defined, a datasetFileDesc will be initialized, which is the handle of reading file from the dataset
      datasetFileDesc = Some(
        getEnvironmentDatasetFilePathAndVersion(getContext.userId.get, environmentEid, fileName.get)
      )
    }
  2. For each source operator executor, e.g. CSVScanSourceOpExec

A new parameter is added to the constructor:

class CSVScanSourceOpExec private[csv] (
    filePath: String,
    datasetFileDesc: DatasetFileDesc,

If datasetFileDesc is non-null (i.e. the user system is enabled), the input stream reader is created from datasetFileDesc.fileInputStream:

  // This function creates the input stream accordingly:
  // - if filePath is set, create the stream from the file
  // - if fileDesc is set, create the stream via a JGit call
  def createInputStream(filePath: String, fileDesc: DatasetFileDesc): InputStream = {
    if (filePath != null && fileDesc != null) {
      throw new RuntimeException(
        "File Path and Dataset File Descriptor cannot present at the same time."
      )
    }
    if (filePath != null) {
      new FileInputStream(filePath)
    } else {
      // create stream from dataset file desc
      fileDesc.fileInputStream()
    }
  }
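
The same either-or rule could also be expressed with `Option`, as in the sketch below. This is an illustration under stated assumptions, not the code in this PR: `FileHandle` and `openSource` are hypothetical names, and the sketch additionally rejects the case where neither source is set, which in the function above would reach `fileDesc.fileInputStream()` with a null `fileDesc`.

```scala
import java.io.{FileInputStream, InputStream}

object InputStreamSelectionSketch {
  // Hypothetical stand-in for DatasetFileDesc.
  trait FileHandle {
    def fileInputStream(): InputStream
  }

  // Exactly one of the two sources must be provided.
  def openSource(filePath: Option[String], handle: Option[FileHandle]): InputStream =
    (filePath, handle) match {
      case (Some(path), None) => new FileInputStream(path) // plain file on disk
      case (None, Some(h))    => h.fileInputStream()       // file read through the handle
      case (Some(_), Some(_)) =>
        throw new IllegalArgumentException(
          "Provide either a file path or a dataset file handle, not both."
        )
      case (None, None) =>
        throw new IllegalArgumentException(
          "Provide a file path or a dataset file handle."
        )
    }
}
```

Callers that still hold nullable values can wrap them with `Option(filePath)` and `Option(fileDesc)`, since `Option(null)` evaluates to `None`.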

@bobbai00 bobbai00 self-assigned this Mar 18, 2024
@bobbai00 bobbai00 requested a review from aglinxinyuan March 18, 2024 17:24
@bobbai00 bobbai00 force-pushed the jiadong-introduce-environment-feature branch 2 times, most recently from 93e4c91 to 43c84da Compare March 21, 2024 20:31
@bobbai00 bobbai00 marked this pull request as ready for review March 22, 2024 06:16
@bobbai00 bobbai00 changed the title from "[WIP]Add environment GUI and new mechanism to scan source files in workflow" to "Add environment GUI and new mechanism to scan source files in workflow" Mar 22, 2024
@bobbai00 bobbai00 force-pushed the jiadong-introduce-environment-feature branch 4 times, most recently from 9c9d5d5 to e9e8972 Compare March 25, 2024 17:50
@bobbai00 bobbai00 force-pushed the jiadong-introduce-environment-feature branch from e9e8972 to 8b59285 Compare March 25, 2024 18:34
@bobbai00 bobbai00 closed this Mar 27, 2024
@aglinxinyuan aglinxinyuan deleted the jiadong-introduce-environment-feature branch September 6, 2025 00:55