[feature/dataflow] Add dataflow analysis prototype#994
Conversation
This can do enough dataflow analysis so that this code "just works"
without any annotations (real-world pattern in e.g. System.Text.Json):
```csharp
Type t;
if (isValueType)
t = typeof(ValueTypeAccessor<>).MakeGenericType(p);
else
t = typeof(ReferenceTypeAccessor<>).MakeGenericType(p);
// Linker keeps the default constructor for both ValueTypeAccessor and
// ReferenceTypeAccessor.
Activator.CreateInstance(t);
```
ValueNode.cs and MethodBodyScanner.cs are Cecil rewrites of same-named
files in the closed-source .NET Native Dependency Reducer, with _a lot_
of stuff removed. I also got rid of most CCI-sms. I kept the structure the
same because we can raid the .NET Native Dependency Reducer source code
for more patterns if we need to later (e.g. the .NET Native Dependency
Reducer can handle dataflow across `async` state machines generated by
the C# compiler for async methods - but I kind of hope we won't need
those kinds of things, ever).
Unfortunately, diffing with the .NET Native codebase is impaired by
non-standard code formatting rules in the Linker repo (that the new files
adhere to for consistency reasons).
ValueNode.cs has barely any references to Cecil - the hope is that we would
be able to reuse this file for e.g. a Roslyn analyzer, but I haven't investigated
Roslyn's dataflow analysis facilities. I assume there are some for the nullable
analysis, but I didn't look whether they're public or we need to roll our own.
| /// <summary> | ||
| /// Tracks information about the contents of a stack slot | ||
| /// </summary> | ||
| class StackSlot |
There was a problem hiding this comment.
Kept this as class for now because that's what it is in .NET Native, but we should be able to make this a struct, I think.
| if (HasManuallyTrackedDependency (body)) | ||
| return; | ||
|
|
||
| var scanner = new ReflectionMethodBodyScanner (this); |
There was a problem hiding this comment.
Eventually, this should only run when we detect stores into unsafe locations (unsafe method callsite/unsafe method return value/unsafe field) as part of the method body marking.
| return true; | ||
| } | ||
|
|
||
| void MarkMethodsFromReflectionCall (ref ReflectionPatternContext reflectionContext, TypeDefinition declaringType, string name, BindingFlags? bindingFlags, int? parametersCount = null) |
There was a problem hiding this comment.
This is copypaste of the existing method in this file. I removed the generic arity parameter because generic methods don't have mangled names. I don't think that code path is ever reachable. I expect when we're done, the existing method will be deleted in favor of this scanner, so I didn't bother unifying.
|
|
||
| // Go over all types we've seen | ||
| foreach (var value in methodParams[0].UniqueValues ()) { | ||
| if (value is SystemTypeValue systemTypeValue) { |
There was a problem hiding this comment.
We'll probably have to change the reporting here
The context expects to get basically one call per analysis - either recognized or not. Since this is a foreach, it may get multiple. It probably just needs reviewing.
| MarkMethodsFromReflectionCall (ref reflectionContext, systemTypeValue.TypeRepresented, ".ctor", bindingFlags, ctorParameterCount); | ||
| } else if (value == NullValue.Instance) { | ||
| // Nothing to report. This is likely just a value on some unreachable branch. | ||
| reflectionContext.RecordHandledPattern (); |
There was a problem hiding this comment.
Why do we need a new method - why not use the existing RecordRecognizedPattern?
There was a problem hiding this comment.
This handles:
Type t = null;
if (condition)
t = typeof(Foo);
else
t = typeof(Bar);
CreateInstance(t);We're going to see null, Foo, Bar for this. We could also report nothing for null, but then this is going to assert/crash on CreateInstance(null) (I'm sure there will be user code with this, however meaningless that pattern is).
RecordRecognizedPattern expects that there's going to be some accessed item. In this case there's no accessed item.
There was a problem hiding this comment.
In that the correct reporting would be "match - Foo" and "match - Bar", ignoring the null completely. But if we see only null (alone) we should report it is "unrecognized". That's how the rest of the pattern matching behaves in cases where there are potentially multiple matches (like method overrides for GetMethod for example).
There was a problem hiding this comment.
I tried to update the if cascade to make it like you suggest (report CreateInstance(null) as unrecognized, even though it's recognized), but things got awkward quickly. I think we'll want to make make adjustments to the reflection context anyway to make it more convenient to use it at merge points (where we have multiple values, some of which might be known).
I added that to the running list of issues to look into.
|
I'm trying to figure out what's the Linux failure. It's triggering the invalid IL detection that currently throws for diagnostic purposes (it will eventually be a nop): The stack trace of the failure looks impossible (the source for the test has a call to a static method with no parameters, but at the time of dataflow analysis, Cecil thinks the called method takes some parameters that weren't supplied). I'm setting up a Linux environment with mcs, Mono ILASM, etc. so that I can repro. |
| if (HasManuallyTrackedDependency (body)) | ||
| return; | ||
|
|
||
| var scanner = new ReflectionMethodBodyScanner (this); |
There was a problem hiding this comment.
Would you mind giving us some way to turn this new code off without us having to customize our fork?
One way would be refactoring MarkReflectionLikeDependencies a bit so that we can override the new behavior while leaving the existing reflection detection running.
protected virtual void MarkReflectionLikeDependencies (MethodBody body)
{
if (HasManuallyTrackedDependency (body))
return;
MarkReflectionWithScanner(body);
MarkReflectionWithPatternDetector(body);
}
protected virtual void MarkReflectionWithScanner(MethodBody body)
{
var scanner = new ReflectionMethodBodyScanner (this);
scanner.Scan (body);
}
protected virtual void MarkReflectionWithPatternDetector(MethodBody body)
{
var instructions = body.Instructions;
ReflectionPatternDetector detector = new ReflectionPatternDetector (this, body.Method);
//
// Starting at 1 because all patterns require at least 1 instruction backward lookup
//
for (var i = 1; i < instructions.Count; i++) {
var instruction = instructions [i];
if (instruction.OpCode != OpCodes.Call && instruction.OpCode != OpCodes.Callvirt)
continue;
if (ProcessReflectionDependency (body, instruction))
continue;
if (!(instruction.Operand is MethodReference methodCalled))
continue;
var methodCalledDefinition = methodCalled.Resolve ();
if (methodCalledDefinition == null)
continue;
ReflectionPatternContext reflectionContext = new ReflectionPatternContext (_context, body.Method, methodCalledDefinition, i);
try {
detector.Process (ref reflectionContext);
}
finally {
reflectionContext.Dispose ();
}
}
}
Alternatively, some sort of bool option on LinkContext to disable usage of ReflectionMethodBodyScanner would work as well.
There is a lot of new code here and no tests. I'm sure it was tested else where, but I don't want this new code to get in the way of us staying in sync with upstream monolinker
There was a problem hiding this comment.
I should have made it more clear - this is going into a branch so that we can shape it into something acceptable for master. Being able to turn this off, making sure it doesn't run unless absolutely necessary, etc. is going to be addressed before this would make it into master.
We're going to add tests too - there are existing targeted tests we can port and we're going to write new tests too.
There was a problem hiding this comment.
O great that sounds good.
We have more complicated reflection detection tests in our UnityLinker test suite that are supported by stack analysis that we have. I can put these tests on branch for you to grab if you’d like.
There was a problem hiding this comment.
We would be very grateful for any tests in this area. Thanks a lot!
There was a problem hiding this comment.
@vitek-karas here are all of our reflection tests
You can cherry-pick this to a branch from master and everything should run. Of course many tests will fail.
Okay, this is a pre-existing problem. The input to the linker is invalid (look for the I'm going to assume this is a mcs bug because this test works fine on Windows. Would #996 help? I'll try to retrigger the CI. |
Nope, this is a test bug. #997 is the fix. |
|
On a second though, it is also an mcs bug because it treats the method as static in the source file (doesn't complain about the signature), but then proceeds to generate the signature as instance. Do we care about mcs bugs, or not anymore? |
| case Code.Ldc_I4_6: | ||
| case Code.Ldc_I4_7: | ||
| case Code.Ldc_I4_8: { | ||
| int value = operation.OpCode.Code - Code.Ldc_I4_0; |
There was a problem hiding this comment.
I am not sure I like this. Cecil can change the values anytime
There was a problem hiding this comment.
I don't think that's likely. The enum is directly mirroring the values in the CIL specification.
There was a problem hiding this comment.
Cecil can change the values anytime
Changing the value of an enum member is a breaking source and binary change.
I could see Cecil being forced into a breaking change if we ever add new unprefixed IL instructions (Cecil currently assumes all IL instructions fit in a single byte, and folds the ones that have a twobyte encoding into a single byte crossing fingers there's enough space left). But I wouldn't expect these to be ever reordered - complex lookup tables from physical opcode encoding to Code would introduce unnecessary inefficiencies when parsing.
There was a problem hiding this comment.
Cecil is not API stable so even source breaking changes can happen (so just be careful)
|
CI is not trigger so I'm going to try closing a reopening... |
|
I had to get a local repro to get to the failure: I'm tempted to just revert the Microsoft.Bcl.HashCode (commit 88b158e) thing to unblock work in the branch. @vitek-karas Thoughts? |
|
Agreed - revert that and add it to the list (with the commit hash of the reverted change). This should relatively simple fix - the test is probably not deploying all the necessary dependencies correctly. |
This reverts commit 88b158e.
This can do enough dataflow analysis so that this code "just works"
without any annotations (real-world pattern in e.g. System.Text.Json):
```csharp
Type t;
if (isValueType)
t = typeof(ValueTypeAccessor<>).MakeGenericType(p);
else
t = typeof(ReferenceTypeAccessor<>).MakeGenericType(p);
// Linker keeps the default constructor for both ValueTypeAccessor and
// ReferenceTypeAccessor.
Activator.CreateInstance(t);
```
ValueNode.cs and MethodBodyScanner.cs are Cecil rewrites of same-named
files in the closed-source .NET Native Dependency Reducer, with _a lot_
of stuff removed. I also got rid of most CCI-sms. I kept the structure the
same because we can raid the .NET Native Dependency Reducer source code
for more patterns if we need to later (e.g. the .NET Native Dependency
Reducer can handle dataflow across `async` state machines generated by
the C# compiler for async methods - but I kind of hope we won't need
those kinds of things, ever).
Unfortunately, diffing with the .NET Native codebase is impaired by
non-standard code formatting rules in the Linker repo (that the new files
adhere to for consistency reasons).
Commit migrated from dotnet/linker@7b608ba
This can do enough dataflow analysis so that this code "just works" without any annotations (real-world pattern in e.g. System.Text.Json):
On a very high level:
elsebranch, at which point the value stored intcan either be ValueTypeAccessor or ReferenceTypeAccessor).UniqueValuesproperty of theValueNode.It's easiest to start reviewing this by looking at the diff in MarkStep.cs, which really shows the consumption part of the analysis.
ValueNode.cs and MethodBodyScanner.cs are Cecil rewrites of same-named files in the closed-source .NET Native Dependency Reducer, with a lot of stuff removed. I also got rid of most CCI-sms. I kept the structure the same because we can raid the .NET Native Dependency Reducer source code for more patterns if we need to later (e.g. the .NET Native Dependency Reducer can handle dataflow across
asyncstate machines generated by the C# compiler for async methods - but I kind of hope we won't need those kinds of things, ever).Unfortunately, diffing with the .NET Native codebase is impaired by non-standard code formatting rules in the Linker repo (that the new files adhere to for consistency reasons).
ValueNode.cs has barely any references to Cecil - the hope is that we would be able to reuse this file for e.g. a Roslyn analyzer, but I haven't investigated Roslyn's dataflow analysis facilities. I assume there are some for the nullable analysis, but I didn't look whether they're public or we need to roll our own.