Skip to content

Add serializer for basic types#9409

Open
jacob-carlborg wants to merge 1 commit intodlang:masterfrom
jacob-carlborg:serializer-basic-types
Open

Add serializer for basic types#9409
jacob-carlborg wants to merge 1 commit intodlang:masterfrom
jacob-carlborg:serializer-basic-types

Conversation

@jacob-carlborg
Copy link
Copy Markdown
Contributor

@jacob-carlborg jacob-carlborg commented Mar 3, 2019

This is the firs step in adding a serializer to DMD which can output a textual representation of any type, manly the AST. This is useful for debugging and learning the AST.

When everything is implemented, it's going to look something like: https://github.com/jacob-carlborg/dmd/tree/serializer.

Depends on #9408.

@dlang-bot
Copy link
Copy Markdown
Contributor

Thanks for your pull request and interest in making D better, @jacob-carlborg! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + dmd#9409"

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

Any suggestion for the failing AppVeyor? The result is 2.36655e-312 instead of the expected 3.1.

@ghost
Copy link
Copy Markdown

ghost commented Mar 3, 2019

Internally it looks like OutBuffer uses _vsnprintf to do the printing. Windows doesn't support the 80-bit floating point number so it won't print the value correctly as it is expecting a double. C++'s long double is equivalent to double on Windows.

@WalterBright
Copy link
Copy Markdown
Member

Isn't that what the .json output is supposed to do? There's also printast.d

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

jacob-carlborg commented Mar 4, 2019

Isn't that what the .json output is supposed to do? There's also printast.d

They're not doing what I want. When the serializer is complete it will print all fields of all classes and structs. Here's an example of the output of passing in an instance of Module:

--- &1 !d/class:Module
ident: &2 !d/class:Identifier
  value: 120
  name: main
parent: null
csym: null
isym: null
comment: null
loc: !d/struct:Loc
  filename: null
  linnum: 0
  charnum: 0
_scope: !d/struct:Scope
  enclosing: !d/struct:Scope
    enclosing: null
    _module: *1
    scopesym: &3 !d/class:ScopeDsymbol
      ident: null
      parent: null
      csym: null
      isym: null
      comment: null
      loc: !d/struct:Loc
        filename: null
        linnum: 0
        charnum: 0

Here's an example of some struggled I had where this would have helped. UDAs are represented with the UserAttributeDeclaration AST node [1]. This class contains an array of expression. I would have thought that if a declaration has two UDAs attached, the first UDA would be the first element of the array and the second UDA would be the second element. But that's not the case. Instead the array only contains one element, which is a TupleExp [2]. TupleExp contains an array of expressions as well (exps), and a separate field (e0) containing a single expression. Turns out, in the code I looked at e0 was unused and exps contained two elements.

This would have been trivial to figure out if I could dump the whole AST, instead there where lots of trail and error.

I can also add that both json.d and astprinter.d selectively include what they want. The serializer will dump everything, always. Since it uses .tupleof to inspected the classes and structs it won't miss anything if a new field is added.

[1]

extern (C++) final class UserAttributeDeclaration : AttribDeclaration

[2]

extern (C++) final class TupleExp : Expression

@jacob-carlborg jacob-carlborg force-pushed the serializer-basic-types branch from a3b76ef to 1bac0d8 Compare March 4, 2019 19:51
@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

Internally it looks like OutBuffer uses _vsnprintf to do the printing. Windows doesn't support the 80-bit floating point number so it won't print the value correctly as it is expecting a double.

I found CTFloat.sprint that handles 80-bit floating point on Windows.

@jacob-carlborg jacob-carlborg force-pushed the serializer-basic-types branch from 1bac0d8 to ff41310 Compare March 4, 2019 20:33
{
const result = serialize(true);
assert(result == "--- true", result);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these unittest test just one function which doesn't need any state. What's the motivation of not using embedded unittests within DMD as they would automatically document the function whereas these tests might be a lot harder to find to a reader.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I'm pretty sure you said somewhere that those tests are not being run/are not passing on Windows
  • I think this unit test runner is better, it allows to run only a single unit test block
  • I think there will be too many unit tests in one place

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

What's up with AppVeyor, some kind of double quoting?

@rainers
Copy link
Copy Markdown
Member

rainers commented Mar 5, 2019

What's up with AppVeyor, some kind of double quoting?

Yes, that's caused by a cleanup in druntime: #9421

@WalterBright
Copy link
Copy Markdown
Member

Here's an example of the output of passing in an instance of Module:

That looks an awful lot like the json output.

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

That looks an awful lot like the json output.

@WalterBright it's not anything like the JSON output at all. There are 25 fields in the snippet above, three of those exists in the JSON output, name, linnum (which is called line) and charnum (which is called char). The JSON output has a different structure than the AST which makes it not particular useful for learning the AST structure.

@WalterBright
Copy link
Copy Markdown
Member

What I'm suggesting is improve the json output to add missing fields rather than invent a new format.

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

What I'm suggesting is improve the json output to add missing fields rather than invent a new format.

@WalterBright so you want me to remove json.d and rename serializer.d to json.d?

I don't know if the JSON format is supposed to be stable and is specified somewhere. But if I change the field names and the structure of the JSON format I'm sure that tools using that output will break. Like Ddox, for example.

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

Ping.

@thewilsonator
Copy link
Copy Markdown
Contributor

@WalterBright ping

@rainers
Copy link
Copy Markdown
Member

rainers commented Mar 27, 2019

I'm also not a huge fan of adding another output format. Does the serializer need to be part of the compiler? Isn't it better suited for the examples folder?

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

I'm also not a huge fan of adding another output format

The existing formats are not suitable for what this is doing or are intended for other things. The JSON format is explicitly indented to not expose the compiler internals. This format is intended to do exactly that. Changing the JSON format will break tools like Ddox. The JSON format doesn't handle arbitrary types.

Does the serializer need to be part of the compiler?

Yes. I need to be able to call it from anywhere within the compiler.

Isn't it better suited for the examples folder?

Absolutely not. I need to be able to call it from anywhere within the compiler.

Is it only me that see this as being useable? Does everyone else already know exactly all the internals of the compiler?

@rainers
Copy link
Copy Markdown
Member

rainers commented Mar 27, 2019

Sounds like you are trying to use this instead of a debugger?

@wilzbach
Copy link
Copy Markdown
Contributor

Is it only me that see this as being useable? Does everyone else already know exactly all the internals of the compiler?

I like it and am in favor. As it's used for debugging, there won't be any overhead for end-users. Though I think we should document all in-compiler debugging facilities for new users somewhere easy to find.

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

Sounds like you are trying to use this instead of a debugger?

Yes, but I haven't found anyone that is useable. I've tried now for half an hour to print the dynamic type of a C++ class. And as far as I know the D compiler doesn't output typeid/typeinfo (or whatever it's called) for C++ classes. If that's not available I don't know how it would be possible to inspect the dynamic type.

This is the firs step in adding a serializer to DMD which can output a
textual representation of any type, primarily the AST. This is useful
for debugging and learning the AST.
@jacob-carlborg jacob-carlborg force-pushed the serializer-basic-types branch from 800de64 to 067e956 Compare March 27, 2019 10:43
@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

Though I think we should document all in-compiler debugging facilities for new users somewhere easy to find.

I agree. Although this is not useful as is. I'll do that after the other PRs.

@WalterBright
Copy link
Copy Markdown
Member

The JSON format is explicitly indented to not expose the compiler internals.

It is? It's intent is to be useful. Just add new fields for what's missing.

@wilzbach
Copy link
Copy Markdown
Contributor

We're talking about two different things here. The json output is intended to be read by machines and tools after all semantics has been done.
This serializer (probably a bad name for it), is just intended to be used during debugging DMD and an additional DMD developer tool - a bit like the ASTPrinter.

@WalterBright
Copy link
Copy Markdown
Member

This serializer (probably a bad name for it), is just intended to be used during debugging DMD and an additional DMD developer tool - a bit like the ASTPrinter.

If it's like printast.d, why not extend that? We also already have textual representations for expressions and types (as toChars()). What's wrong with that?

private:

/// Evaluates to `true` if `T` is a basic type, otherwise `false`.
template isBasicType(T)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This duplicates isTypeBasic()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isTypeBasic is a member of the class Type. This is intended to pass in native D types, like int and char[]. It's not what the compiler views as a basic type. It's used by the serializer to determine if the value needs to be further decomposed or if it can be transformed into a textual representation as is.

* value = the value to serialize
* isTopLevel = indicates if this is the first value to be serialized
*/
void serializeBasicType(T)(T value, bool isTopLevel)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dmd already has TypeBasic.toChars()

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TypeBasic is not used here. Native D types are used.

@andralex
Copy link
Copy Markdown
Member

I'm at a loss why we are to spend so much code on this in the compiler. This is the job of a separate tool using the compiler as a library, which is available after everybody asked for it for years. What, do we now want to build every possible tool into the core compiler?

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

If it's like printast.d, why not extend that?

I can replace printast.d if you prefer that.

We also already have textual representations for expressions and types (as toChars()). What's wrong with that?

It returns the source code representation not the AST representation. For example, toChars of IntegerExp will print the integer value, i.e. 1. I want it to print:

!d/class:IntegerExp
op: int32Literal
size: 4
parens: 0
loc: !d/struct:Loc
  filename: test.d
  linnum: 1
  charnum: 1
value: 1

@jacob-carlborg
Copy link
Copy Markdown
Contributor Author

I'm at a loss why we are to spend so much code on this in the compiler. This is the job of a separate tool using the compiler as a library, which is available after everybody asked for it for years. What, do we now want to build every possible tool into the core compiler?

It's a debugging tool to be used inside the compiler, just as the hundreds of printf scattered all of the compiler. This is intended to be called from basically anywhere in the compiler source code. A separate tool would only be able to call this from very specific hard coded locations. A separate tool would be outdated as soon as the AST is changed. Even if the compiler is used as a library the tool would need to be recompiled every time the AST is changed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants