Skip to content

Sensitive data redacting PoC#711

Merged
KirillOsenkov merged 13 commits intoKirillOsenkov:mainfrom
JanKrivanek:proto/redacting-poc
Nov 8, 2023
Merged

Sensitive data redacting PoC#711
KirillOsenkov merged 13 commits intoKirillOsenkov:mainfrom
JanKrivanek:proto/redacting-poc

Conversation

@JanKrivanek
Copy link
Copy Markdown
Collaborator

@JanKrivanek JanKrivanek commented Oct 27, 2023

Context

Based on offline talks with @KirillOsenkov I'm spinning some experimentation with sensitive data redacting.
Currently the main effort was on:

  • bringing the Write and Replay functionality back to functional state
  • adding ability to replay embedded content
  • adding ability to hook embedded strings and embedded files

The actual redacting is a very PoC thing - just to demonstrate the idea.
In next iteration I'd ideally ref the https://www.nuget.org/packages/MSBuild.BinlogRedactor/ and just inject the replay functionality into it. The package would take care about the redaction of data - as it already has some auto-detection of sensitive data (and more patterns are going to be added soon)

If all the stream utilities in Postprocessing folder feels as too much code to drag - I can reference https://www.nuget.org/packages/DotUtils.StreamUtils/, but viewer seeems to be very minimalistic in package refs - so I rather just used the code.

UX

Updated:

image


image


image


image

@JanKrivanek
Copy link
Copy Markdown
Collaborator Author

The CI is failing due the new test getting

System.IO.FileNotFoundException : Could not find file 'C:\Users\appveyor\AppData\Local\Temp\1\47b8ec7b-5245-49cf-8842-ef31ceaa4965\47b8ec7b-5245-49cf-8842-ef31ceaa4965\assembly\dl3\39c84884\6d30ae17_e408da01\1.binlog'.

While for obtaining that test binlog (1.binlog) I use same GetTestFile("1.binlog") call: https://github.com/KirillOsenkov/MSBuildStructuredLog/pull/711/files#diff-1ab0219f9bea23d8aa114270d3b3010a73c59dbca476befba614f2e9cb6f89d3R132

as is already used in other tests - e.g.:

public void TestBinaryLoggerRoundtrip(bool useInMemoryProject)
{
var binLog = GetTestFile("1.binlog");

This works just fine locally.
@KirillOsenkov - any quick idea from top of your head, why 1.binlog might be missing?

@KirillOsenkov
Copy link
Copy Markdown
Owner

No quick idea, just noticing that in the path the GUID is mentioned twice:
47b8ec7b-5245-49cf-8842-ef31ceaa4965\47b8ec7b-5245-49cf-8842-ef31ceaa4965

probably eyeball GetTestFile() and see where it could go wrong?

@KirillOsenkov
Copy link
Copy Markdown
Owner

KirillOsenkov commented Oct 28, 2023

Some first notes as I read:

  1. better call it "secrets" everywhere, not "passwords"
  2. avoid abbreviations, so "password" instead of "pwd" for example
  3. I'd avoid using Path.GetTempFileName(), it can only hold 65,536 files, can overflow easily (and it's hard to debug), and people don't clean it up. Use the viewer directory (%localappdata%\Microsoft\MSBuildStructuredLog\Temp\), I think we already have code somewhere to write into it.
  4. I think there was a problem with making BuildEventArgsReader public, but I don't quite remember what it is. It could be that it would conflict with the one in MSBuild?
  5. when you update to latest main you'll notice Roman's recent change to support ExtendedCriticalBuildMessageEventArgs
  6. in terms of UI, you could probably show an empty editor in the right pane (there's an easy API for that), and then use each line from that text as a secret, and have a button to "commit". This way you could maybe specify an existing file with secrets?
  7. A design decision that needs to be made is whether to redact in-place or into a different file.
  8. Are you sure we need several new interfaces here, all fine-granular? It looks good, but just check that every interface is justified, or can we make do without?

Otherwise all seems reasonable from a quick read.

@JanKrivanek
Copy link
Copy Markdown
Collaborator Author

Some first notes as I read:

  1. better call it "secrets" everywhere, not "passwords"
  2. avoid abbreviations, so "password" instead of "pwd" for example
  3. I'd avoid using Path.GetTempFileName(), it can only hold 65,536 files, can overflow easily (and it's hard to debug), and people don't clean it up. Use the viewer directory (%localappdata%\Microsoft\MSBuildStructuredLog\Temp\), I think we already have code somewhere to write into it.
  4. I think there was a problem with making BuildEventArgsReader public, but I don't quite remember what it is. It could be that it would conflict with the one in MSBuild?
  5. when you update to latest main you'll notice Roman's recent change to support ExtendedCriticalBuildMessageEventArgs
  6. in terms of UI, you could probably show an empty editor in the right pane (there's an easy API for that), and then use each line from that text as a secret, and have a button to "commit". This way you could maybe specify an existing file with secrets?
  7. A design decision that needs to be made is whether to redact in-place or into a different file.
  8. Are you sure we need several new interfaces here, all fine-granular? It looks good, but just check that every interface is justified, or can we make do without?

Otherwise all seems reasonable from a quick read.

I'l work on these.

As far as UI - if you have anything specific in mind, a quick non-functional markup mockup would be super helpful as I'm bit rusty with UI. But I can definitely find a way otherwise. I'd probably add the 'Replace In Place' checkbox (or something similar that would flip behavior similar to Save / Save As) - that way we wouldn't need to decide whether it's a in-place redact or separate file - it'd be up to user, with sensible default.

Interfaces - I'd want to keep this as similar to MSBuild as possible - it'll make it easier to port changes both ways. But I'm definitely open to suggetions here

@JanKrivanek
Copy link
Copy Markdown
Collaborator Author

I hope I addressed all the preliminary comments + I created a proposal for the GUI.

So the current GUI proposal:

image
image
image
  • The choice to overwrite in place vs save as new is left upon the user
  • Redacting is using the BuildProgress control to prevent interaction during redacting
  • Few other choices are exposed - the 'common credentials' option is currently disabled (with 'not yet supported' tooltip). But I'll plan to pull and reference the 'real' redactor nuget in next revision and hence got the additional redactors functionality

@JanKrivanek
Copy link
Copy Markdown
Collaborator Author

Finalized v1:

  • Removed temporary code, replaced with package references (redactor and stream utils) - this as well brings the automated detection of some common secret types (more are going to be added to the package soon).
  • Surfaced the additional redacting options in UI (updated the GUI screens in the description of this PR)

It feels ready for review ;-)

Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsReader.cs Outdated
Comment thread src/StructuredLogger/BinaryLogger/Postprocessing/BinlogRedactor.cs
Comment thread src/StructuredLogger/StructuredLogger.csproj Outdated
<PackageReference Include="Microsoft.Build.Framework" Version="$(MSBuildPackageVersion)" />
<PackageReference Include="Microsoft.Build.Utilities.Core" Version="$(MSBuildPackageVersion)" />
<PackageReference Include="Microsoft.SourceLink.GitHub" Version="1.0.0" PrivateAssets="all" />
<PackageReference Include="MSBuild.BinlogRedactor.SensitiveDataDetector" Version="0.0.4-beta" />
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern with adding another dependency here. Do you think it'd be possible to only have the StructuredLogViewer and BinlogTool projects depend on this one? So that people who only depend on StructuredLogger.dll do not inherit two more dependencies they probably don't need? This would mean that StructuredLogger.dll doesn't have the built-in ability to redact.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new indirection project (StructuredLogger.Utils) to separate the functionality from the StructuredLogger.
I wanted to avoid duplicating or linking the redaction work in both of the ui layers. (The existing StructuredLogViewer.Core is not suitable for binlogtool and referencing binlogtool from the viewer felt too unnecesary)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Let's make sure all .dlls that need to ship with the viewer are mentioned here:
https://github.com/KirillOsenkov/MSBuildStructuredLog/blob/main/src/StructuredLogViewer/MSBuildStructuredLogViewer.nuspec

For binlogtool, I think the packing is all automatic, so nothing should be done hopefully.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So for binlogtool I installed it from local built package and verified it works as expected - good.
For the viewer I verified by building, extracting the lib folder from the created packages and verifying it works as expected - not sure if there is better way of testing.

Comment thread src/StructuredLogger/StructuredLogger.cs Outdated
Comment thread src/StructuredLogger/BinaryLogger/Postprocessing/BinlogRedactor.cs Outdated
Comment thread src/StructuredLogger/BinaryLogger/BinaryLogReplayEventSource.cs Outdated
Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsReader.cs
Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsReader.cs
Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsReader.cs
Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsReader.cs
Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsWriter.cs
Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsWriter.cs
Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsWriter.cs
Comment thread src/StructuredLogger/BinaryLogger/BuildEventArgsReader.cs Outdated
@KirillOsenkov
Copy link
Copy Markdown
Owner

Thank you for this gargantuan effort! Your hard work and patience are much appreciated! I think it looks really good already.

I really didn't want to cause any unnecessary stop energy, but I'm paranoid about breaking people, so just wanted to iron out a couple of things before this goes in.

Primarily:

  1. do you think it's doable to move the dependency on redaction .dll and the redactor logic from the StructuredLogger.dll and into the "UI" and BinlogTool layer? Technically I'd like to keep StructuredLogger minimal, just read/write binlogs, no UI, no extra stuff. People should be able to build on top of that to add extra functionality. Perhaps StructuredLogViewer.Core is a better place? It's "in between" the core and the UI layers. Or maybe add a stub empty static Redact() delegate in the core, and have the UI plug the real implementation.
  2. similarly, if we could avoid the dependency on the streams .dll... I know it's ugly to inline the source, but historically people have been requesting that it's possible to just download StructuredLogger.dll from the Release and be able to use it, even bypassing the NuGet package entirely. This is why I've been publishing StructuredLogger.dll as part of every release: https://github.com/KirillOsenkov/MSBuildStructuredLog/releases/tag/v2.1.858
  3. is the redaction workflow separate from the regular binlog reading logic? If so, I have no concerns. I'm just worried to not regress the mainline binlog loading scenarios in terms of performance.

The rest (including the UI) seems fine. Once the above is clarified, we could merge and continue iterating on it in separate PRs if needed.

For the binlogtool, we should also later think about supporting a ".rsp" file, so that someone has a file with secrets somewhere (one per line), and we could just pass it to the tool instead of mentioning the secrets in the command line. Also clarify the in-place vs. a new file for command-line.

Comment thread src/StructuredLogViewer/MainWindow.xaml.cs Outdated
Comment thread src/StructuredLogViewer/MainWindow.xaml.cs Outdated
Comment thread src/StructuredLogger.Tests/BinaryLoggerTests.cs Outdated
Comment thread src/StructuredLogger.Tests/StructuredLogger.Tests.csproj
@KirillOsenkov
Copy link
Copy Markdown
Owner

Hmm, I just thought, for Stream utils, do you think you could publish a source-only package? DotUtils.StreamUtils.Source? You could build it from the same source using for example a custom .nuspec file: https://github.com/KirillOsenkov/MSBuildTools/blob/d8afc187fa7be3771743e223a31765a68248f4e5/src/PackNuspec/PackNuspec.csproj#L9-L37

or coerce the built-in MSBuild properties to produce a source-only package in addition to the default one.

I think this would be ideal: doesn't add a runtime dependency, but also avoids forking the source. Wish we could do the same for Microsoft.NET.StringTools.dll, but that ship has sailed I'm afraid.

@JanKrivanek
Copy link
Copy Markdown
Collaborator Author

Thank you for this gargantuan effort! Your hard work and patience are much appreciated! I think it looks really good already.

I really didn't want to cause any unnecessary stop energy, but I'm paranoid about breaking people, so just wanted to iron out a couple of things before this goes in.

Primarily:

  1. do you think it's doable to move the dependency on redaction .dll and the redactor logic from the StructuredLogger.dll and into the "UI" and BinlogTool layer? Technically I'd like to keep StructuredLogger minimal, just read/write binlogs, no UI, no extra stuff. People should be able to build on top of that to add extra functionality. Perhaps StructuredLogViewer.Core is a better place? It's "in between" the core and the UI layers. Or maybe add a stub empty static Redact() delegate in the core, and have the UI plug the real implementation.
  2. similarly, if we could avoid the dependency on the streams .dll... I know it's ugly to inline the source, but historically people have been requesting that it's possible to just download StructuredLogger.dll from the Release and be able to use it, even bypassing the NuGet package entirely. This is why I've been publishing StructuredLogger.dll as part of every release: https://github.com/KirillOsenkov/MSBuildStructuredLog/releases/tag/v2.1.858
  3. is the redaction workflow separate from the regular binlog reading logic? If so, I have no concerns. I'm just worried to not regress the mainline binlog loading scenarios in terms of performance.

The rest (including the UI) seems fine. Once the above is clarified, we could merge and continue iterating on it in separate PRs if needed.

For the binlogtool, we should also later think about supporting a ".rsp" file, so that someone has a file with secrets somewhere (one per line), and we could just pass it to the tool instead of mentioning the secrets in the command line. Also clarify the in-place vs. a new file for command-line.

I'm grateful for the detailed look and all the feedback! I definitely perceive it as helpful - no worries about interrupting the flow. I'm happy to iterate over things in this PR and not leaving them "for the future" (with few small exceptions - as my plan is to start integrating the 'forward compatibility reading support' after this one is merged and it has some of your concerns addressed - e.g. lazy ArchiveFiles. I'll call those explicitly out and point to the MSBuild PR what's the plan).

As to the individual concerns:

  • Moving redactor dependency and swap StreamUtils dependency with sources - I'll work on that (plus bump BinlogRedactor from prerelase)
  • Microsoft.NET.StringTools.dll - Is that even used, or just a transitive dependency?
  • Redacting perf and impact on reading - yes it's separate. It's currently a slow dummy 'get things done' way. It'll get faster by intorducing the 'forward compatible format' - as there we'll just skip all structured events and focus on strings and blobs only. There is as well a space for improvement for the 'auto-detections' - those use naive regexes implementation now (but this is likely not for near future).
  • binlogtool concerns - .rsp support and explicit distinction between in-place vs new-file - great points - I'll work on those.

Rest of comments are clear - I'll eaither right away address them or comment inline if any needs more context.

Thank you!! 😊

@KirillOsenkov
Copy link
Copy Markdown
Owner

Sounds great!

StructuredLogger only references these:

image

And Microsoft.Build.Utilities.Core.dll is only used for the Logger type:
image

I only mentioned Microsoft.NET.StringTools.dll as an example of a dependency MSBuild didn't add (or added as source-only), because it caused a lot of grief for a lot of tools out there, that use MSBuild libraries, they now needed to ship an extra .dll, update installers, etc.

@JanKrivanek
Copy link
Copy Markdown
Collaborator Author

@KirillOsenkov - it now feels ready for next round of review.

Main changes:

  • Addressed all feedback (hopefully)
  • Manually e-2-e tested (not sure if there is recommended way. WhatI did: Sensitive data redacting PoC #711 (comment))
  • Added indirection StructuredLogger.Utils for separating redaction logic and dependency from the StructuredLogger
  • StreamUtils added as sources
  • Improved redaction via binlog tool (still not all options exposed, but already usable for scraping logs in CIs)

Looking forward for the next round of feedback :-)

Copy link
Copy Markdown
Owner

@KirillOsenkov KirillOsenkov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome job! Thanks so much!

@KirillOsenkov KirillOsenkov merged commit 2a13f88 into KirillOsenkov:main Nov 8, 2023
@JanKrivanek
Copy link
Copy Markdown
Collaborator Author

Oh - nice
Please ping me once you'll publish new version of the binlogtool - I'll then work on incorporating into arcade

@KirillOsenkov
Copy link
Copy Markdown
Owner

@KirillOsenkov
Copy link
Copy Markdown
Owner

image

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants