[feature/dataflow] Add dataflow analysis prototype #994

Merged
MichalStrehovsky merged 7 commits into dotnet:feature/dataflow from MichalStrehovsky:flow on Mar 18, 2020

@MichalStrehovsky (Member) commented Mar 12, 2020

This can do enough dataflow analysis so that this code "just works" without any annotations (real-world pattern in e.g. System.Text.Json):

Type t;
if (isValueType)
    t = typeof(ValueTypeAccessor<>).MakeGenericType(p);
else
    t = typeof(ReferenceTypeAccessor<>).MakeGenericType(p);

// Linker keeps the default constructor for both ValueTypeAccessor and
// ReferenceTypeAccessor.
Activator.CreateInstance(t);

On a very high level:

  • we keep track of interesting values on the evaluation stack (interesting being constant integers, allocated array sizes, System.Type instances, RuntimeTypeHandles, string constants, parameter loads, field loads, and the like).
  • uninteresting values are simply tracked as Unknown.
  • we make a single pass over the IL, modeling the values on the stack.
  • in case of merge points, we model the stack slots while preserving all possible values (e.g. in the example above, there's a merge point after the else branch, at which point the value stored in t can either be ValueTypeAccessor or ReferenceTypeAccessor).
  • Loading a multi-value location on the evaluation stack produces a stack slot representing multiple values. One can access the values by doing a foreach over the UniqueValues property of the ValueNode.
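
As a rough illustration of the merge-point behavior in the list above, here is a minimal, self-contained sketch. It is not the code in this PR: `Value`, `KnownTypeValue`, `PossibleValues` and `MergeSketch` are hypothetical names, and types are modeled as plain strings rather than Cecil type definitions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified value kinds; the real ValueNode hierarchy is richer.
abstract class Value { }

sealed class UnknownValue : Value
{
    public static readonly UnknownValue Instance = new UnknownValue();
    UnknownValue() { }
}

sealed class KnownTypeValue : Value
{
    public string TypeName { get; }
    public KnownTypeValue(string typeName) { TypeName = typeName; }
    public override bool Equals(object obj) => obj is KnownTypeValue other && other.TypeName == TypeName;
    public override int GetHashCode() => TypeName.GetHashCode();
}

// A stack slot or local that may hold any one of several values.
sealed class PossibleValues
{
    readonly HashSet<Value> _values = new HashSet<Value>();

    public PossibleValues(params Value[] values)
    {
        foreach (var v in values)
            _values.Add(v);
    }

    public IEnumerable<Value> UniqueValues() => _values;

    // At a merge point, keep the union of what either predecessor could have produced.
    public static PossibleValues Merge(PossibleValues left, PossibleValues right)
        => new PossibleValues(left._values.Concat(right._values).ToArray());
}

static class MergeSketch
{
    // Models the value of `t` after the if/else in the example above.
    public static List<string> TypesReachingCreateInstance()
    {
        var thenBranch = new PossibleValues(new KnownTypeValue("ValueTypeAccessor<>"));
        var elseBranch = new PossibleValues(new KnownTypeValue("ReferenceTypeAccessor<>"));
        var merged = PossibleValues.Merge(thenBranch, elseBranch);

        return merged.UniqueValues()
            .OfType<KnownTypeValue>()
            .Select(v => v.TypeName)
            .OrderBy(n => n, StringComparer.Ordinal)
            .ToList();
    }
}
```

A consumer would then iterate over the merged values and keep the default constructor of every known type it sees, which is what the `foreach` over `UniqueValues` does in spirit.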

It's easiest to start reviewing this by looking at the diff in MarkStep.cs, which really shows the consumption part of the analysis.

ValueNode.cs and MethodBodyScanner.cs are Cecil rewrites of same-named files in the closed-source .NET Native Dependency Reducer, with a lot of stuff removed. I also got rid of most CCI-sms. I kept the structure the same because we can raid the .NET Native Dependency Reducer source code for more patterns if we need to later (e.g. the .NET Native Dependency Reducer can handle dataflow across async state machines generated by the C# compiler for async methods - but I kind of hope we won't need those kinds of things, ever).

Unfortunately, diffing with the .NET Native codebase is impaired by non-standard code formatting rules in the Linker repo (that the new files adhere to for consistency reasons).

ValueNode.cs has barely any references to Cecil - the hope is that we would be able to reuse this file for e.g. a Roslyn analyzer, but I haven't investigated Roslyn's dataflow analysis facilities. I assume there are some for the nullable analysis, but I didn't look whether they're public or we need to roll our own.

@vitek-karas changed the title from "Add dataflow analysis prototype" to "[feature/dataflow] Add dataflow analysis prototype" on Mar 12, 2020
/// <summary>
/// Tracks information about the contents of a stack slot
/// </summary>
class StackSlot
Member Author

Kept this as class for now because that's what it is in .NET Native, but we should be able to make this a struct, I think.

if (HasManuallyTrackedDependency (body))
return;

var scanner = new ReflectionMethodBodyScanner (this);
Member Author

Eventually, this should only run when we detect stores into unsafe locations (unsafe method callsite/unsafe method return value/unsafe field) as part of the method body marking.

return true;
}

void MarkMethodsFromReflectionCall (ref ReflectionPatternContext reflectionContext, TypeDefinition declaringType, string name, BindingFlags? bindingFlags, int? parametersCount = null)
Member Author

This is a copy-paste of the existing method in this file. I removed the generic arity parameter because generic methods don't have mangled names. I don't think that code path is ever reachable. I expect when we're done, the existing method will be deleted in favor of this scanner, so I didn't bother unifying.


// Go over all types we've seen
foreach (var value in methodParams[0].UniqueValues ()) {
if (value is SystemTypeValue systemTypeValue) {
Member

We'll probably have to change the reporting here.
The context expects to get basically one call per analysis: either recognized or not. Since this is a foreach, it may get multiple. It probably just needs reviewing.

MarkMethodsFromReflectionCall (ref reflectionContext, systemTypeValue.TypeRepresented, ".ctor", bindingFlags, ctorParameterCount);
} else if (value == NullValue.Instance) {
// Nothing to report. This is likely just a value on some unreachable branch.
reflectionContext.RecordHandledPattern ();
Member

Why do we need a new method - why not use the existing RecordRecognizedPattern?

Member Author

This handles:

Type t = null;
if (condition)
    t = typeof(Foo);
else
    t = typeof(Bar);
CreateInstance(t);

We're going to see null, Foo, Bar for this. We could also report nothing for null, but then this is going to assert/crash on CreateInstance(null) (I'm sure there will be user code with this, however meaningless that pattern is).

RecordRecognizedPattern expects that there's going to be some accessed item. In this case there's no accessed item.

Member

In that case the correct reporting would be "match - Foo" and "match - Bar", ignoring the null completely. But if we see only null (alone) we should report it as "unrecognized". That's how the rest of the pattern matching behaves in cases where there are potentially multiple matches (like method overrides for GetMethod, for example).
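
The reporting rule proposed in this comment could be sketched roughly as follows. This is only an illustration under assumptions: `MergePointReporting` is a hypothetical helper, possible values are modeled as type-name strings with `null` standing in for a tracked null, and the report strings are labels rather than calls on the real reflection context.

```csharp
using System;
using System.Collections.Generic;

static class MergePointReporting
{
    // Each entry is a type name, or null for a tracked null value (hypothetical model).
    public static List<string> Report(IEnumerable<string> possibleTypeNames)
    {
        var reports = new List<string>();

        foreach (var name in possibleTypeNames)
        {
            // A null next to known types is ignored: it contributes no accessed item.
            if (name != null)
                reports.Add("match - " + name);
        }

        // Only nulls (or nothing at all) were seen, so the pattern is not recognized.
        if (reports.Count == 0)
            reports.Add("unrecognized");

        return reports;
    }
}
```

With the `null`, `Foo`, `Bar` example from the previous comment this yields two matches, while `CreateInstance(null)` on its own is reported as unrecognized.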

Member Author

I tried to update the if cascade to make it like you suggest (report CreateInstance(null) as unrecognized, even though it's recognized), but things got awkward quickly. I think we'll want to make adjustments to the reflection context anyway to make it more convenient to use at merge points (where we have multiple values, some of which might be known).

I added that to the running list of issues to look into.

@MichalStrehovsky
Member Author

I'm trying to figure out what's the Linux failure. It's triggering the invalid IL detection that currently throws for diagnostic purposes (it will eventually be a nop):

https://github.com/mono/linker/pull/994/files#diff-83d9d9aa53da0b0737513a2561f82bc8R3373-R3375

The stack trace of the failure looks impossible (the source for the test has a call to a static method with no parameters, but at the time of dataflow analysis, Cecil thinks the called method takes some parameters that weren't supplied).

I'm setting up a Linux environment with mcs, Mono ILASM, etc. so that I can repro.

if (HasManuallyTrackedDependency (body))
return;

var scanner = new ReflectionMethodBodyScanner (this);
Contributor

Would you mind giving us some way to turn this new code off without us having to customize our fork?

One way would be refactoring MarkReflectionLikeDependencies a bit so that we can override the new behavior while leaving the existing reflection detection running.

		protected virtual void MarkReflectionLikeDependencies (MethodBody body)
		{
			if (HasManuallyTrackedDependency (body))
				return;

			MarkReflectionWithScanner (body);
			MarkReflectionWithPatternDetector (body);
		}

		protected virtual void MarkReflectionWithScanner (MethodBody body)
		{
			var scanner = new ReflectionMethodBodyScanner (this);
			scanner.Scan (body);
		}

		protected virtual void MarkReflectionWithPatternDetector (MethodBody body)
		{
			var instructions = body.Instructions;
			ReflectionPatternDetector detector = new ReflectionPatternDetector (this, body.Method);

			//
			// Starting at 1 because all patterns require at least 1 instruction backward lookup
			//
			for (var i = 1; i < instructions.Count; i++) {
				var instruction = instructions [i];

				if (instruction.OpCode != OpCodes.Call && instruction.OpCode != OpCodes.Callvirt)
					continue;

				if (ProcessReflectionDependency (body, instruction))
					continue;

				if (!(instruction.Operand is MethodReference methodCalled))
					continue;

				var methodCalledDefinition = methodCalled.Resolve ();
				if (methodCalledDefinition == null)
					continue;

				ReflectionPatternContext reflectionContext = new ReflectionPatternContext (_context, body.Method, methodCalledDefinition, i);
				try {
					detector.Process (ref reflectionContext);
				}
				finally {
					reflectionContext.Dispose ();
				}
			}
		}

Alternatively, some sort of bool option on LinkContext to disable usage of ReflectionMethodBodyScanner would work as well.
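
For the bool-option alternative, something along these lines might be enough. This is a sketch only: `LinkContextSketch`, `MarkStepSketch` and the `DataflowScannerEnabled` property are made-up names standing in for the real `LinkContext` and `MarkStep`, and the log entries stand in for the actual scanner and pattern-detector calls.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for LinkContext; the property name is an assumption.
sealed class LinkContextSketch
{
    public bool DataflowScannerEnabled { get; set; } = true;
}

// Hypothetical stand-in for MarkStep that only records which analyses ran.
sealed class MarkStepSketch
{
    readonly LinkContextSketch _context;

    public List<string> Log { get; } = new List<string>();

    public MarkStepSketch(LinkContextSketch context) { _context = context; }

    public void MarkReflectionLikeDependencies()
    {
        // The new dataflow scanner is opt-out through the context flag.
        if (_context.DataflowScannerEnabled)
            Log.Add("scanner");

        // The existing pattern detection keeps running either way.
        Log.Add("pattern-detector");
    }
}
```

A fork would then set the flag once on the context instead of overriding methods on the mark step.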

There is a lot of new code here and no tests. I'm sure it was tested elsewhere, but I don't want this new code to get in the way of us staying in sync with upstream monolinker.

Member Author

I should have made it more clear - this is going into a branch so that we can shape it into something acceptable for master. Being able to turn this off, making sure it doesn't run unless absolutely necessary, etc. is going to be addressed before this would make it into master.

We're going to add tests too - there are existing targeted tests we can port and we're going to write new tests too.

Contributor

Oh great, that sounds good.

We have more complicated reflection detection tests in our UnityLinker test suite that are supported by the stack analysis we have. I can put these tests on a branch for you to grab if you'd like.

Member

We would be very grateful for any tests in this area. Thanks a lot!

Contributor

@vitek-karas here are all of our reflection tests

Unity-Technologies@1c37396

You can cherry-pick this to a branch from master and everything should run. Of course many tests will fail.

@MichalStrehovsky
Member Author

> I'm trying to figure out what's the Linux failure

Okay, this is a pre-existing problem. The input to the linker is invalid (look for the <WTF> marker - this method is definitely not an instance method):

.method public hidebysig static void  Main() cil managed
{
  .entrypoint
  // Code size       7 (0x7)
  .maxstack  8
  IL_0000:  call       <WTF>instance</WTF> object ['library']Mono.Linker.Tests.Cases.UnreachableBlock.Dependencies.ClassA::Method_1()
  IL_0005:  pop
  IL_0006:  ret
} // end of method UninitializedLocals::Main

I'm going to assume this is a mcs bug because this test works fine on Windows. Would #996 help? I'll try to retrigger the CI.

@MichalStrehovsky
Member Author

> I'm going to assume this is a mcs bug because this test works fine on Windows.

Nope, this is a test bug. #997 is the fix.

@MichalStrehovsky
Member Author

On second thought, it is also an mcs bug because it treats the method as static in the source file (doesn't complain about the signature), but then proceeds to generate the signature as instance. Do we care about mcs bugs, or not anymore?

case Code.Ldc_I4_6:
case Code.Ldc_I4_7:
case Code.Ldc_I4_8: {
int value = operation.OpCode.Code - Code.Ldc_I4_0;
Contributor

I am not sure I like this. Cecil can change the values anytime.

Member

I don't think that's likely. The enum is directly mirroring the values in the CIL specification.

Member Author

> Cecil can change the values anytime

Changing the value of an enum member is a breaking source and binary change.

I could see Cecil being forced into a breaking change if we ever add new unprefixed IL instructions (Cecil currently assumes all IL instructions fit in a single byte, and folds the ones that have a two-byte encoding into a single byte, crossing fingers there's enough space left). But I wouldn't expect these to ever be reordered: complex lookup tables from physical opcode encoding to Code would introduce unnecessary inefficiencies when parsing.

Contributor

Cecil is not API stable so even source breaking changes can happen (so just be careful)
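
To make the trade-off in this thread concrete, here is a small self-contained comparison. It does not use Cecil: `OpSketch` is a hypothetical enum whose values are the ECMA-335 opcode bytes for `ldc.i4.0` through `ldc.i4.8`, which is an assumption for illustration and not a claim about the numeric values of Cecil's `Code` enum.

```csharp
using System;

// Hypothetical enum; values are the ECMA-335 opcode bytes for ldc.i4.0 .. ldc.i4.8.
enum OpSketch
{
    Ldc_I4_0 = 0x16,
    Ldc_I4_1 = 0x17,
    Ldc_I4_2 = 0x18,
    Ldc_I4_3 = 0x19,
    Ldc_I4_4 = 0x1A,
    Ldc_I4_5 = 0x1B,
    Ldc_I4_6 = 0x1C,
    Ldc_I4_7 = 0x1D,
    Ldc_I4_8 = 0x1E,
}

static class LdcDecoding
{
    // Compact, but only correct while the members stay contiguous and ordered.
    public static int BySubtraction(OpSketch code) => code - OpSketch.Ldc_I4_0;

    // More verbose, but independent of the enum's numeric values.
    public static int BySwitch(OpSketch code)
    {
        switch (code)
        {
            case OpSketch.Ldc_I4_0: return 0;
            case OpSketch.Ldc_I4_1: return 1;
            case OpSketch.Ldc_I4_2: return 2;
            case OpSketch.Ldc_I4_3: return 3;
            case OpSketch.Ldc_I4_4: return 4;
            case OpSketch.Ldc_I4_5: return 5;
            case OpSketch.Ldc_I4_6: return 6;
            case OpSketch.Ldc_I4_7: return 7;
            case OpSketch.Ldc_I4_8: return 8;
            default: throw new ArgumentOutOfRangeException(nameof(code));
        }
    }
}
```

Both forms agree as long as the enum keeps its current layout; the switch form would keep working if the members were ever renumbered, at the cost of more code.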

@MichalStrehovsky
Member Author

CI is not triggering, so I'm going to try closing and reopening...

@MichalStrehovsky
Member Author

I had to get a local repro to get to the failure:

     1>ILLINK : error :  [C:\git\linker\artifacts\bin\ILLink.Tasks.IntegrationTests\Release\netcoreapp3.0\tests-temp\helloworld\helloworld\helloworld.csproj]
           An assembly specified in the application dependencies manifest (illink.deps.json) was not found:
             package: 'Microsoft.Bcl.HashCode', version: '1.1.0'
             path: 'lib/netcoreapp2.1/Microsoft.Bcl.HashCode.dll'

I'm tempted to just revert the Microsoft.Bcl.HashCode (commit 88b158e) thing to unblock work in the branch.

@vitek-karas Thoughts?

@vitek-karas
Member

Agreed - revert that and add it to the list (with the commit hash of the reverted change). This should be a relatively simple fix - the test is probably not deploying all the necessary dependencies correctly.

This reverts commit 88b158e.
tkapin pushed a commit to tkapin/runtime that referenced this pull request Jan 31, 2023
Commit migrated from dotnet/linker@7b608ba