[feature/dataflow] Add dataflow analysis prototype by MichalStrehovsky · Pull Request #994 · dotnet/linker

MichalStrehovsky · 2020-03-12T12:28:26Z

This can do enough dataflow analysis so that this code "just works" without any annotations (real-world pattern in e.g. System.Text.Json):

Type t;
if (isValueType)
    t = typeof(ValueTypeAccessor<>).MakeGenericType(p);
else
    t = typeof(ReferenceTypeAccessor<>).MakeGenericType(p);

// Linker keeps the default constructor for both ValueTypeAccessor and
// ReferenceTypeAccessor.
Activator.CreateInstance(t);

On a very high level:

we keep track of interesting value on the evaluation stack (interesting being constant integers, allocated array sizes, System.Type instances, RuntimeTypeHandles, string constants, parameter loads, field loads, and the likes).
uninteresting values are simply tracked as Unknown.
we make a single pass over the IL, modeling the value on the stack.
in case of merge points, we model the stack slots while preserving all possible values (e.g. in the example above, there's a merge point after the else branch, at which point the value stored in t can either be ValueTypeAccessor or ReferenceTypeAccessor).
Loading a multi-value location on the evaluation stack produces a stack slot representing multiple values. One can access the values by doing a foreach over the UniqueValues property of the ValueNode.

It's easiest to start reviewing this by looking at the diff in MarkStep.cs, which really shows the consumption part of the analysis.

ValueNode.cs and MethodBodyScanner.cs are Cecil rewrites of same-named files in the closed-source .NET Native Dependency Reducer, with a lot of stuff removed. I also got rid of most CCI-sms. I kept the structure the same because we can raid the .NET Native Dependency Reducer source code for more patterns if we need to later (e.g. the .NET Native Dependency Reducer can handle dataflow across async state machines generated by the C# compiler for async methods - but I kind of hope we won't need those kinds of things, ever).

Unfortunately, diffing with the .NET Native codebase is impaired by non-standard code formatting rules in the Linker repo (that the new files adhere to for consistency reasons).

ValueNode.cs has barely any references to Cecil - the hope is that we would be able to reuse this file for e.g. a Roslyn analyzer, but I haven't investigated Roslyn's dataflow analysis facilities. I assume there are some for the nullable analysis, but I didn't look whether they're public or we need to roll our own.

This can do enough dataflow analysis so that this code "just works" without any annotations (real-world pattern in e.g. System.Text.Json): ```csharp Type t; if (isValueType) t = typeof(ValueTypeAccessor<>).MakeGenericType(p); else t = typeof(ReferenceTypeAccessor<>).MakeGenericType(p); // Linker keeps the default constructor for both ValueTypeAccessor and // ReferenceTypeAccessor. Activator.CreateInstance(t); ``` ValueNode.cs and MethodBodyScanner.cs are Cecil rewrites of same-named files in the closed-source .NET Native Dependency Reducer, with _a lot_ of stuff removed. I also got rid of most CCI-sms. I kept the structure the same because we can raid the .NET Native Dependency Reducer source code for more patterns if we need to later (e.g. the .NET Native Dependency Reducer can handle dataflow across `async` state machines generated by the C# compiler for async methods - but I kind of hope we won't need those kinds of things, ever). Unfortunately, diffing with the .NET Native codebase is impaired by non-standard code formatting rules in the Linker repo (that the new files adhere to for consistency reasons). ValueNode.cs has barely any references to Cecil - the hope is that we would be able to reuse this file for e.g. a Roslyn analyzer, but I haven't investigated Roslyn's dataflow analysis facilities. I assume there are some for the nullable analysis, but I didn't look whether they're public or we need to roll our own.

MichalStrehovsky · 2020-03-12T12:29:08Z

+	/// <summary>
+	/// Tracks information about the contents of a stack slot
+	/// </summary>
+	class StackSlot


Kept this as class for now because that's what it is in .NET Native, but we should be able to make this a struct, I think.

MichalStrehovsky · 2020-03-12T12:32:40Z

 			if (HasManuallyTrackedDependency (body))
 				return;

+			var scanner = new ReflectionMethodBodyScanner (this);


Eventually, this should only run when we detect stores into unsafe locations (unsafe method callsite/unsafe method return value/unsafe field) as part of the method body marking.

MichalStrehovsky · 2020-03-12T12:37:50Z

+				return true;
+			}
+
+			void MarkMethodsFromReflectionCall (ref ReflectionPatternContext reflectionContext, TypeDefinition declaringType, string name, BindingFlags? bindingFlags, int? parametersCount = null)


This is copypaste of the existing method in this file. I removed the generic arity parameter because generic methods don't have mangled names. I don't think that code path is ever reachable. I expect when we're done, the existing method will be deleted in favor of this scanner, so I didn't bother unifying.

vitek-karas · 2020-03-12T12:41:39Z

+
+								// Go over all types we've seen
+								foreach (var value in methodParams[0].UniqueValues ()) {
+									if (value is SystemTypeValue systemTypeValue) {


We'll probably have to change the reporting here
The context expects to get basically one call per analysis - either recognized or not. Since this is a foreach, it may get multiple. It probably just needs reviewing.

vitek-karas · 2020-03-12T12:42:00Z

+										MarkMethodsFromReflectionCall (ref reflectionContext, systemTypeValue.TypeRepresented, ".ctor", bindingFlags, ctorParameterCount);
+									} else if (value == NullValue.Instance) {
+										// Nothing to report. This is likely just a value on some unreachable branch.
+										reflectionContext.RecordHandledPattern ();


Why do we need a new method - why not use the existing RecordRecognizedPattern?

This handles:

Type t = null; if (condition) t = typeof(Foo); else t = typeof(Bar); CreateInstance(t);

We're going to see null, Foo, Bar for this. We could also report nothing for null, but then this is going to assert/crash on CreateInstance(null) (I'm sure there will be user code with this, however meaningless that pattern is).

RecordRecognizedPattern expects that there's going to be some accessed item. In this case there's no accessed item.

In that the correct reporting would be "match - Foo" and "match - Bar", ignoring the null completely. But if we see only null (alone) we should report it is "unrecognized". That's how the rest of the pattern matching behaves in cases where there are potentially multiple matches (like method overrides for GetMethod for example).

I tried to update the if cascade to make it like you suggest (report CreateInstance(null) as unrecognized, even though it's recognized), but things got awkward quickly. I think we'll want to make make adjustments to the reflection context anyway to make it more convenient to use it at merge points (where we have multiple values, some of which might be known).

I added that to the running list of issues to look into.

MichalStrehovsky · 2020-03-12T14:20:33Z

I'm trying to figure out what's the Linux failure. It's triggering the invalid IL detection that currently throws for diagnostic purposes (it will eventually be a nop):

https://github.com/mono/linker/pull/994/files#diff-83d9d9aa53da0b0737513a2561f82bc8R3373-R3375

The stack trace of the failure looks impossible (the source for the test has a call to a static method with no parameters, but at the time of dataflow analysis, Cecil thinks the called method takes some parameters that weren't supplied).

I'm setting up a Linux environment with mcs, Mono ILASM, etc. so that I can repro.

mrvoorhe · 2020-03-12T19:20:38Z

 			if (HasManuallyTrackedDependency (body))
 				return;

+			var scanner = new ReflectionMethodBodyScanner (this);


Would you mind giving us some way to turn this new code off without us having to customize our fork?

One way would be refactoring MarkReflectionLikeDependencies a bit so that we can override the new behavior while leaving the existing reflection detection running.

protected virtual void MarkReflectionLikeDependencies (MethodBody body) { if (HasManuallyTrackedDependency (body)) return; MarkReflectionWithScanner(body); MarkReflectionWithPatternDetector(body); } protected virtual void MarkReflectionWithScanner(MethodBody body) { var scanner = new ReflectionMethodBodyScanner (this); scanner.Scan (body); } protected virtual void MarkReflectionWithPatternDetector(MethodBody body) { var instructions = body.Instructions; ReflectionPatternDetector detector = new ReflectionPatternDetector (this, body.Method); // // Starting at 1 because all patterns require at least 1 instruction backward lookup // for (var i = 1; i < instructions.Count; i++) { var instruction = instructions [i]; if (instruction.OpCode != OpCodes.Call && instruction.OpCode != OpCodes.Callvirt) continue; if (ProcessReflectionDependency (body, instruction)) continue; if (!(instruction.Operand is MethodReference methodCalled)) continue; var methodCalledDefinition = methodCalled.Resolve (); if (methodCalledDefinition == null) continue; ReflectionPatternContext reflectionContext = new ReflectionPatternContext (_context, body.Method, methodCalledDefinition, i); try { detector.Process (ref reflectionContext); } finally { reflectionContext.Dispose (); } } }

Alternatively, some sort of bool option on LinkContext to disable usage of ReflectionMethodBodyScanner would work as well.

There is a lot of new code here and no tests. I'm sure it was tested else where, but I don't want this new code to get in the way of us staying in sync with upstream monolinker

I should have made it more clear - this is going into a branch so that we can shape it into something acceptable for master. Being able to turn this off, making sure it doesn't run unless absolutely necessary, etc. is going to be addressed before this would make it into master.

We're going to add tests too - there are existing targeted tests we can port and we're going to write new tests too.

O great that sounds good.

We have more complicated reflection detection tests in our UnityLinker test suite that are supported by stack analysis that we have. I can put these tests on branch for you to grab if you’d like.

We would be very grateful for any tests in this area. Thanks a lot!

@vitek-karas here are all of our reflection tests

Unity-Technologies@1c37396

You can cherry-pick this to a branch from master and everything should run. Of course many tests will fail.

MichalStrehovsky · 2020-03-13T11:13:21Z

I'm trying to figure out what's the Linux failure

Okay, this is a pre-existing problem. The input to the linker is invalid (look for the <WTF> marker - this method is definitely not an instance method):

.method public hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       7 (0x7)
  .maxstack  8
  IL_0000:  call       <WTF>instance</WTF> object ['library']Mono.Linker.Tests.Cases.UnreachableBlock.Dependencies.ClassA::Method_1()
  IL_0005:  pop
  IL_0006:  ret
} // end of method UninitializedLocals::Main

I'm going to assume this is a mcs bug because this test works fine on Windows. Would #996 help? I'll try to retrigger the CI.

MichalStrehovsky · 2020-03-13T11:17:45Z

I'm going to assume this is a mcs bug because this test works fine on Windows.

Nope, this is a test bug. #997 is the fix.

MichalStrehovsky · 2020-03-13T11:22:28Z

On a second though, it is also an mcs bug because it treats the method as static in the source file (doesn't complain about the signature), but then proceeds to generate the signature as instance. Do we care about mcs bugs, or not anymore?

marek-safar · 2020-03-13T11:33:47Z

+					case Code.Ldc_I4_6:
+					case Code.Ldc_I4_7:
+					case Code.Ldc_I4_8: {
+							int value = operation.OpCode.Code - Code.Ldc_I4_0;


I am not sure I like this. Cecil can change the values anytime

I don't think that's likely. The enum is directly mirroring the values in the CIL specification.

Cecil can change the values anytime

Changing the value of an enum member is a breaking source and binary change.

I could see Cecil being forced into a breaking change if we ever add new unprefixed IL instructions (Cecil currently assumes all IL instructions fit in a single byte, and folds the ones that have a twobyte encoding into a single byte crossing fingers there's enough space left). But I wouldn't expect these to be ever reordered - complex lookup tables from physical opcode encoding to Code would introduce unnecessary inefficiencies when parsing.

Cecil is not API stable so even source breaking changes can happen (so just be careful)

MichalStrehovsky · 2020-03-17T11:37:24Z

CI is not trigger so I'm going to try closing a reopening...

MichalStrehovsky · 2020-03-17T18:55:08Z

I had to get a local repro to get to the failure:

     1>ILLINK : error :  [C:\git\linker\artifacts\bin\ILLink.Tasks.IntegrationTests\Release\netcoreapp3.0\tests-temp\helloworld\helloworld\helloworld.csproj]
           An assembly specified in the application dependencies manifest (illink.deps.json) was not found:
             package: 'Microsoft.Bcl.HashCode', version: '1.1.0'
             path: 'lib/netcoreapp2.1/Microsoft.Bcl.HashCode.dll'

I'm tempted to just revert the Microsoft.Bcl.HashCode (commit 88b158e) thing to unblock work in the branch.

@vitek-karas Thoughts?

vitek-karas · 2020-03-17T19:00:46Z

Agreed - revert that and add it to the list (with the commit hash of the reverted change). This should relatively simple fix - the test is probably not deploying all the necessary dependencies correctly.

This reverts commit 88b158e.

This can do enough dataflow analysis so that this code "just works" without any annotations (real-world pattern in e.g. System.Text.Json): ```csharp Type t; if (isValueType) t = typeof(ValueTypeAccessor<>).MakeGenericType(p); else t = typeof(ReferenceTypeAccessor<>).MakeGenericType(p); // Linker keeps the default constructor for both ValueTypeAccessor and // ReferenceTypeAccessor. Activator.CreateInstance(t); ``` ValueNode.cs and MethodBodyScanner.cs are Cecil rewrites of same-named files in the closed-source .NET Native Dependency Reducer, with _a lot_ of stuff removed. I also got rid of most CCI-sms. I kept the structure the same because we can raid the .NET Native Dependency Reducer source code for more patterns if we need to later (e.g. the .NET Native Dependency Reducer can handle dataflow across `async` state machines generated by the C# compiler for async methods - but I kind of hope we won't need those kinds of things, ever). Unfortunately, diffing with the .NET Native codebase is impaired by non-standard code formatting rules in the Linker repo (that the new files adhere to for consistency reasons). Commit migrated from dotnet/linker@7b608ba

MichalStrehovsky requested a review from marek-safar as a code owner March 12, 2020 12:28

vitek-karas changed the title ~~Add dataflow analysis prototype~~ [feature/dataflow] Add dataflow analysis prototype Mar 12, 2020

MichalStrehovsky commented Mar 12, 2020

View reviewed changes

vitek-karas approved these changes Mar 12, 2020

View reviewed changes

mrvoorhe reviewed Mar 12, 2020

View reviewed changes

MichalStrehovsky closed this Mar 13, 2020

MichalStrehovsky reopened this Mar 13, 2020

marek-safar reviewed Mar 13, 2020

View reviewed changes

MichalStrehovsky added 3 commits March 13, 2020 15:28

Feedback 1

a7dfa43

Feedback 2

5c0558a

Feedback 3

88b158e

MichalStrehovsky closed this Mar 17, 2020

MichalStrehovsky reopened this Mar 17, 2020

MichalStrehovsky closed this Mar 17, 2020

MichalStrehovsky reopened this Mar 17, 2020

Merge branch 'feature/dataflow' into flow

0463faa

Revert "Feedback 3"

cdefa8f

This reverts commit 88b158e.

MichalStrehovsky mentioned this pull request Mar 17, 2020

[feature/dataflow] Branch exit criteria cleanups #999

Open

12 tasks

Allow unresolvable type references

9fd5102

MichalStrehovsky merged commit 7b608ba into dotnet:feature/dataflow Mar 18, 2020

MichalStrehovsky deleted the flow branch March 18, 2020 14:03

MichalStrehovsky mentioned this pull request Aug 4, 2020

Linker warns about compiler generated fields when using new reflection attributes #1403

Closed

Conversation

MichalStrehovsky commented Mar 12, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichalStrehovsky commented Mar 12, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

MichalStrehovsky commented Mar 13, 2020

Uh oh!

MichalStrehovsky commented Mar 13, 2020

Uh oh!

MichalStrehovsky commented Mar 13, 2020

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

MichalStrehovsky commented Mar 17, 2020

Uh oh!

MichalStrehovsky commented Mar 17, 2020

Uh oh!

vitek-karas commented Mar 17, 2020

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

MichalStrehovsky commented Mar 12, 2020 •

edited

Loading