This repository was archived by the owner on Jun 14, 2024. It is now read-only.


@apoorvedave1 (Contributor) commented Sep 14, 2020

What changes were proposed in this pull request?

Add support for merging two com.microsoft.hyperspace.index.Directory objects. Prerequisite for #29 and #163.

Feature Description

We need the ability to merge two directory trees to support upcoming features such as Append and Optimize. When an index is created only on the previously unindexed portion of the data, the latest snapshot of the index must be logged as the combination of the previously created index files and the newly created ones. The proposed merge API on the index Directory provides this.
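To make the intended merge semantics concrete, here is a minimal sketch under stated assumptions. FileInfo and Directory below are simplified, hypothetical stand-ins whose fields mirror the JSON metadata in this description (name, size, modifiedTime, files, subDirs); they are not the actual com.microsoft.hyperspace.index classes. The merge rule (union the files, recurse into same-named subdirectories, carry over the rest) is likewise an assumption about how such a merge could work.

```scala
// Hypothetical, simplified stand-ins for Hyperspace's index metadata types.
// Field names mirror the JSON in this description; the real classes may differ.
case class FileInfo(name: String, size: Long, modifiedTime: Long)

case class Directory(
    name: String,
    files: Seq[FileInfo] = Nil,
    subDirs: Seq[Directory] = Nil) {

  // Assumed semantics: both roots must share a name; files are unioned with
  // exact duplicates kept once; same-named subdirectories are merged
  // recursively; subdirectories present on only one side are carried over.
  def merge(that: Directory): Directory = {
    require(name == that.name, s"Cannot merge '$name' with '${that.name}'")
    val thatSubDirs = that.subDirs.map(d => d.name -> d).toMap
    val thisNames = subDirs.map(_.name).toSet
    val mergedSubDirs =
      subDirs.map(d => thatSubDirs.get(d.name).fold(d)(other => d.merge(other))) ++
        that.subDirs.filterNot(d => thisNames.contains(d.name))
    Directory(name, (files ++ that.files).distinct, mergedSubDirs)
  }
}

// Two trees that share the "idx" directory; "f1" appears in both.
val left = Directory("root", subDirs = Seq(
  Directory("idx", files = Seq(FileInfo("f1", 10L, 1L)))))
val right = Directory("root", subDirs = Seq(
  Directory("idx", files = Seq(FileInfo("f1", 10L, 1L), FileInfo("f2", 20L, 2L)))))

val merged = left.merge(right)
println(merged.subDirs.map(_.name).mkString(", "))            // idx
println(merged.subDirs.head.files.map(_.name).mkString(", ")) // f1, f2
```

In this sketch, files with the same name but different size or modifiedTime are both kept, since `distinct` only removes exact duplicates; how the real API handles such conflicts is not stated in this PR.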

Example use case: merging the contents of partially created indexes on append:

  1. The user creates an index on the original data. The index is stored in an index directory, say index/v__=0, and recorded in the metadata as:
{
  "root" : {
    "name" : "file:/C:/",
    "files" : [ ],
    "subDirs" : [ {
      "name" : "Users",
      "files" : [ ],
      "subDirs" : [ {
        "name" : "apdave",
        "files" : [ ],
        "subDirs" : [ {
          "name" : "repo2",
          "files" : [ ],
          "subDirs" : [ {
            "name" : "testdata",
            "files" : [ ],
            "subDirs" : [ {
              "name" : "v__=0",
              "files" : [ {
                "name" : "part-00000-5782bdd3-4729-44e6-b54b-51b557f66792-c000.snappy.parquet",
                "size" : 1473,
                "modifiedTime" : 1585280921507
              }, {
                "name" : "part-00000-e8c8821f-a1b2-4b3b-be9e-687b6fa6d057-c000.snappy.parquet",
                "size" : 1878,
                "modifiedTime" : 1585280853559
              } ],
              "subDirs" : [ ]
            } ]
          } ]
        } ]
      } ]
    } ]
  }
}
  2. The user adds new data and calls refreshIndex(mode = "quick"). This creates another intermediate Content object similar to the one above, but with the leaf directory named "v__=1".

  3. We can now call content.root.merge(content2.root) to merge the two trees and deduplicate their shared directory structure. The merged result is:

{
  "root" : {
    "name" : "file:/C:/",
    "files" : [ ],
    "subDirs" : [ {
      "name" : "Users",
      "files" : [ ],
      "subDirs" : [ {
        "name" : "apdave",
        "files" : [ ],
        "subDirs" : [ {
          "name" : "repo2",
          "files" : [ ],
          "subDirs" : [ {
            "name" : "testdata",
            "files" : [ ],
            "subDirs" : [ {
              "name" : "v__=0",
              "files" : [ {
                "name" : "part-00000-5782bdd3-4729-44e6-b54b-51b557f66792-c000.snappy.parquet",
                "size" : 1473,
                "modifiedTime" : 1585280921507
              }, {
                "name" : "part-00000-e8c8821f-a1b2-4b3b-be9e-687b6fa6d057-c000.snappy.parquet",
                "size" : 1878,
                "modifiedTime" : 1585280853559
              } ],
              "subDirs" : [ ]
            }, {
              "name" : "v__=1",
              "files" : [ {
                "name" : "part-00000-abc2bdd3-4729-44e6-b54b-51b557f66456-c000.snappy.parquet",
                "size" : 1473,
                "modifiedTime" : 1585280921507
              }, {
                "name" : "part-00000-abc8821f-a1b2-4b3b-be9e-687b6fa6d123-c000.snappy.parquet",
                "size" : 1878,
                "modifiedTime" : 1585280853559
              } ],
              "subDirs" : [ ]
            } ]
          } ]
        } ]
      } ]
    } ]
  }
}
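The three steps above can be replayed end to end with a small, self-contained sketch. It uses simplified, hypothetical FileInfo/Directory types (not the real Hyperspace classes) with an assumed name-based recursive merge, plus two made-up helpers: chain, which wraps a set of files in a single-child directory chain, and descend, which walks down a path by directory name. File names, sizes, and timestamps are copied from the JSON above; root1 and root2 stand in for content.root and content2.root.

```scala
// Simplified, hypothetical stand-ins (not the real Hyperspace classes).
case class FileInfo(name: String, size: Long, modifiedTime: Long)

case class Directory(name: String, files: Seq[FileInfo] = Nil, subDirs: Seq[Directory] = Nil) {
  // Assumed name-based recursive merge: union files, merge same-named subdirectories.
  def merge(that: Directory): Directory = {
    require(name == that.name, s"Cannot merge '$name' with '${that.name}'")
    val thatSubDirs = that.subDirs.map(d => d.name -> d).toMap
    val thisNames = subDirs.map(_.name).toSet
    val mergedSubDirs =
      subDirs.map(d => thatSubDirs.get(d.name).fold(d)(other => d.merge(other))) ++
        that.subDirs.filterNot(d => thisNames.contains(d.name))
    Directory(name, (files ++ that.files).distinct, mergedSubDirs)
  }
}

// Hypothetical helper: builds root/segments(0)/.../segments(n-1) with `files` in the last directory.
def chain(root: String, segments: Seq[String], files: Seq[FileInfo]): Directory = {
  val leaf = Directory(segments.last, files = files)
  val nested = segments.init.foldRight(leaf)((dirName, child) => Directory(dirName, subDirs = Seq(child)))
  Directory(root, subDirs = Seq(nested))
}

// Hypothetical helper: follows `path` downward by directory name.
def descend(dir: Directory, path: Seq[String]): Option[Directory] =
  if (path.isEmpty) Some(dir)
  else dir.subDirs.find(_.name == path.head).flatMap(descend(_, path.tail))

val prefix = Seq("Users", "apdave", "repo2", "testdata")

// Step 1: the index built on the original data (v__=0).
val root1 = chain("file:/C:/", prefix :+ "v__=0", Seq(
  FileInfo("part-00000-5782bdd3-4729-44e6-b54b-51b557f66792-c000.snappy.parquet", 1473L, 1585280921507L),
  FileInfo("part-00000-e8c8821f-a1b2-4b3b-be9e-687b6fa6d057-c000.snappy.parquet", 1878L, 1585280853559L)))

// Step 2: the intermediate content produced by the quick refresh (v__=1).
val root2 = chain("file:/C:/", prefix :+ "v__=1", Seq(
  FileInfo("part-00000-abc2bdd3-4729-44e6-b54b-51b557f66456-c000.snappy.parquet", 1473L, 1585280921507L),
  FileInfo("part-00000-abc8821f-a1b2-4b3b-be9e-687b6fa6d123-c000.snappy.parquet", 1878L, 1585280853559L)))

// Step 3: merge; the shared prefix appears once and both version folders sit under testdata.
val merged = root1.merge(root2)
val testdata = descend(merged, prefix).get
println(testdata.subDirs.map(_.name).mkString(", ")) // v__=0, v__=1
```

Building each snapshot as a single-path chain keeps the inputs identical in shape to the metadata shown above, so the merged tree can be compared directly with the expected JSON.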

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit tests.

@sezruby (Collaborator) commented Sep 15, 2020

Could you add the merge result of "v__=0" and "v__=1" simply in description?

@rapoth (Contributor) commented Sep 15, 2020

Also, please update the PR description to indicate which uber-issue this is part of.

rapoth added this to the 0.4.0 milestone on Sep 15, 2020
@apoorvedave1 (Contributor Author), quoting @sezruby:

Could you add the merge result of "v__=0" and "v__=1" simply in description?

Thanks @sezruby, added.

apoorvedave1 requested a review from rapoth on September 15, 2020 at 17:57
@rapoth (Contributor) commented Sep 15, 2020, quoting the earlier request:

Also, please update the PR description to indicate which uber-issue this is part of.

Thanks! I saw you updated the PR description. Can you link to the uber-issue please?

@apoorvedave1 (Contributor Author), quoting @rapoth:

Also, please update the PR description to indicate which uber-issue this is part of.

Thanks @rapoth, updated the PR description.

// temp/a/f1
// temp/b/f2
// testDir/temp/a/f1
// testDir/temp/b/f2
@apoorvedave1 (Contributor Author):

This change resolves this comment. The same change is repeated across many other tests.

// testDir/a/f1
// testDir/b/c/f2
// testDir/temp/a/f1
// testDir/temp/b/c/f2
@apoorvedave1 (Contributor Author):

This change resolves this comment. The same change is repeated across many other tests.

// testDir/a/c/f3
// testDir/temp/a/f1
// testDir/temp/a/b/f2
// testDir/temp/a/c/f3
@apoorvedave1 (Contributor Author):

This change resolves this comment. The same change is repeated across many other tests.
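The snippets above appear to be comments in the test code listing the files each test places under its test directory. As a hedged sketch (the helper names createLayout, listLeafFiles, and deleteRecursively, and the use of a JVM temporary directory prefixed testDir, are assumptions rather than the PR's actual test code; Scala 2.13+ is assumed for scala.jdk.CollectionConverters), the temp/ part of the last layout can be materialized and read back with java.nio.file:

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._

// Hypothetical helper: creates an empty file at each relative path under `base`,
// creating intermediate directories as needed.
def createLayout(base: Path, relativePaths: Seq[String]): Unit =
  relativePaths.foreach { rel =>
    val file = base.resolve(rel)
    Files.createDirectories(file.getParent)
    Files.createFile(file)
  }

// Hypothetical helper: lists all regular files under `base` as sorted,
// '/'-separated paths relative to `base`.
def listLeafFiles(base: Path): Seq[String] = {
  val stream = Files.walk(base)
  try {
    stream.iterator().asScala
      .filter(p => Files.isRegularFile(p))
      .map(p => base.relativize(p).toString.replace('\\', '/'))
      .toList
      .sorted
  } finally {
    stream.close()
  }
}

// Hypothetical helper: deletes `base` and everything beneath it. Files.walk
// visits a directory before its entries, so the reversed order deletes
// children before their parents.
def deleteRecursively(base: Path): Unit = {
  val stream = Files.walk(base)
  val paths = try { stream.iterator().asScala.toList } finally { stream.close() }
  paths.reverse.foreach(p => Files.delete(p))
}

// The temp/ part of the last layout, rooted at a fresh temporary directory.
val testDir = Files.createTempDirectory("testDir")
createLayout(testDir, Seq("temp/a/f1", "temp/a/b/f2", "temp/a/c/f3"))
val leafFiles = listLeafFiles(testDir)
println(leafFiles.mkString(", ")) // temp/a/b/f2, temp/a/c/f3, temp/a/f1
deleteRecursively(testDir)
```

Rooting every test layout at its own temporary directory keeps tests from touching relative paths such as temp/a/f1 in the working directory and makes cleanup a single recursive delete.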

@imback82 (Contributor) left a comment

LGTM, thanks @apoorvedave1!

@imback82 (Contributor)

I rebased your branch, so make sure to run git pull.

@imback82 (Contributor)

Merging to master! @sezruby / @rapoth, feel free to review this and leave any comments if needed.

imback82 merged commit d439ef6 into microsoft:master on Sep 16, 2020
@rapoth (Contributor) commented Sep 16, 2020

LGTM. All my comments have been resolved. Thanks a lot @apoorvedave1!


Labels: enhancement (New feature or request)


4 participants