Skip to content

Adding support for __traits(documentation, ...)#3531

Closed
MasonMcGill wants to merge 1 commit intodlang:masterfrom
MasonMcGill:master
Closed

Adding support for __traits(documentation, ...)#3531
MasonMcGill wants to merge 1 commit intodlang:masterfrom
MasonMcGill:master

Conversation

@MasonMcGill
Copy link

__traits(comment, expr) now evaluates to the documentation comment on expr, if it exists and is available. Comments will only be available if DMD is invoked with the "-D" flag. If no comment is available for expr, __traits(comment, expr) evaluates to the empty string.

Unlike __traits(identifier, expr), __traits(comment, expr) will not report an error if expr is not a symbol or anonymous function. It will instead evaluate to the empty string. This is in the interest of ease of use.

This feature is intended to aide automatic wrapper generation, metaprogramming (e.g. forwarding documentation from a template argument), and simplifying documentation generators. It is not intended to be involved in providing application functionality.

@Hackerpilot
Copy link
Contributor

Looks useful.

@TurkeyMan
Copy link
Contributor

Should it be called 'comment'?
It looks like it's only intended to work with ddoc comment blocks (/** */ or ///), so possibly misleading to call the trait 'comment', since it's only a specific form of comment?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it curious that these strings have a '\n' at the end... why would that be desirable or expected?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newline is part of the comment as far as lexical analysis is concerned. If you need the stripped comment, just run it through std.string.strip.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Newbie question: It looks like StringExp works with C-style strings and doesn't take ownership of its data. Stripping a string may require making a copy of it. How can I do this without leaking memory?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DMD leaks memory all over. The idea is that a GC should be used in DMD, it just isn't done that way today. So in general, leaking memory is 'fine'...

Although I question the need to do any string manipulation in the compiler.

@jacob-carlborg
Copy link
Contributor

Please squash your commits when done.

@MasonMcGill
Copy link
Author

In DMD, internally, the "comment" field of a symbol ends in a newline. I suppose this has to do with how they are initially parsed. Generally, spaces newlines will be stripped before HTML or PDF documentation generation, so it doesn't seem like a major issue.

With respect to the term "comment", this is the field name in the DMD JSON output.

I think "notes" or "documentation" might also be good trait names, but I don't know if either is good enough to warrant breaking consistency with DMD JSON output and DMD internals. Thoughts?

@jacob-carlborg
Copy link
Contributor

Could you please make a pull request for updating the documentation as well.

@jacob-carlborg
Copy link
Contributor

I agree with @TurkeyMan. It might be confusing if it only works for ddoc comments. "documentation", "ddoc" or similar would be better.

@TurkeyMan
Copy link
Contributor

I think the newline should be stripped before returning it as a __traits(), otherwise user code will constantly have to strip it manually.

"documentation" or "ddoc" is probably what I'd choose.
The way I see it, most end-users aren't DMD contributors, so I don't think that consistency with the compiler source has any significant weight if the trade-off is language clarity.
I'd also imagine very few users would be particularly upset that a different term was used in the json output. Most users never use/look at that either, it's something that's more likely to be digested by tooling.

For my money, code clarity trumps.

@MasonMcGill
Copy link
Author

My impression (mostly based on experience with Python documentation) is that "ddoc" refers to the standard way to associate comments with symbols, and "documentation" may refer to any kind of documentation (web-based tutorials, examples, etc.)

I read __traits(comment, expr) as "get me the comment associated with expr", and since ddoc is the way to associate comments with symbols, it's unambiguous.

However, if more people prefer "documentation", I'm happy to change it. I'm honestly more interested in using this feature than designing it : )

@jacob-carlborg
Copy link
Contributor

I read __traits(comment, expr) as "get me the comment associated with expr", and since ddoc is the way to associate comments with symbols, it's unambiguous.

Yes, but does it work both for ddoc comments and regular comments?

@TurkeyMan
Copy link
Contributor

I guess the argument is that there's no rule to associate a regular comment with a particular symbol, so the concept is kinda meaningless. Doesn't mean it will always be that way in the future though...
I think I prefer 'ddoc', it's quite specific that it intends ddoc style associated comments.

@MasonMcGill
Copy link
Author

With regards to the newline and extra spaces, should only the first/last be stripped, or all of them? Python leaves them in, so formatting applications can choose to deal with them as they wish.

To me, formatting docstrings seems like a separate issue. It's handled by different code, and used for more than just __traits (like JSON and HTML output). I'm sure anyone interested in working on it could do so before or after the new trait is added.

@MasonMcGill
Copy link
Author

I don't know how many D users know that D's documentation comments are called "ddoc". Like Python's "docstrings", I wouldn't be surprised if many people have seen them but don't know they have a name.

"documentation" and "comment" seem more discoverable.

@MasonMcGill
Copy link
Author

It looks like documentation comment retrieval is performed with __doc__ in Python and documentation in Common Lisp:
http://en.wikipedia.org/wiki/Docstring

@yebblies
Copy link
Contributor

yebblies commented May 7, 2014

With regards to the newline and extra spaces, should only the first/last be stripped, or all of them? Python leaves them in, so formatting applications can choose to deal with them as they wish.

Leave internal, strip at beginning and end. Leading/trailing whitespace does not add information.

"documentation" and "comment" seem more discoverable.

"documentation" is fine but "comment" is imprecise. There is no need to match the internal compiler name unless the alternative is inventing new terminology.

@ghost
Copy link

ghost commented May 7, 2014

-D is not required for this feature to work.

Edit: Wait, the tests test for empty strings. I'll investigate.

Edit2: Well, they should not require -D, or the trait is useless for metaprogramming. I'll see if this can be worked around.

@jacob-carlborg
Copy link
Contributor

The documentation uses "Ddoc".

@TurkeyMan
Copy link
Contributor

WRT worrying about 'ddoc' being a word that's known by many users; I expect that this __traits will likely be wrapped up in a suite of phobos helper's, which may parse and return details like args, return values, etc... Users probably won't come in contact with the __traits directly once it's fleshed out.

@ghost
Copy link

ghost commented May 7, 2014

Apparently the lexer is set up so it completely discards comments when -D is not enabled. This can be worked around by changing this line in module.c:

Parser p(this, buf, buflen, docfile != NULL);

To:

// always lex comments, __traits(comment) might require it
bool doDocComment = true;
Parser p(this, buf, buflen, doDocComment);

The above just forwards this option to the Lexer.

The question is will this slow down compilation? I haven't tested the performance. Either way it's really awkward that you would need to enable -D in order to use a trait, you're essentially writing documentation files to disk even if all you're doing is metaprogramming with a trait.

CC'ing @WalterBright and @9rnsr: Would always lexing ddoc comments introduce a major slowdown in compilation speed? If not then I think we should always lex them, otherwise this trait becomes awkward to use.

@MasonMcGill
Copy link
Author

True "Ddoc" is in the documentation, but I still suspect more people have seen Ddoc comments than know what they are.

I'm in favor of "documentation", for the reason above, for its use in other languages, and because I'm not sure what the proper CamelCase capitalization of "Ddoc" is. Also, new users looking for the feature might be more likely to search dlang.org for "documentation" than "ddoc".

WRT worrying about 'ddoc' being a word that's known by many users; I expect that this __traits will likely be wrapped up in a suite of phobos helper's, which may parse and return details like args, return values, etc... ie, users probably won't come in contact with the __traits directly once it's fleshed out.

This is true. But the __traits name will probably inform the std.trait name, so we might as well sort it out now.

@MasonMcGill
Copy link
Author

Apparently the lexer is set up so it completely discards comments when -D is not enabled. This can be worked around by changing this line in module.c.

This is great! Hopefully extracting ddoc comments isn't too much overhead.

@alexrp
Copy link
Contributor

alexrp commented May 7, 2014

I disagree with making this trait silently fail for symbols that don't exist. If that's the behavior you want (and I sincerely believe this is the exception, not the rule) you can get it by checking with __traits(compiles, ...) first.

Failing loudly by default in that case also makes sense if we're going to make it so this feature doesn't depend on -D.

@MasonMcGill
Copy link
Author

I disagree with making this trait silently fail for symbols that don't exist. If that's the behavior you want (and I sincerely believe this is the exception, not the rule) you can get it by checking with __traits(compiles, ...) first.

This is the behavior I'm most unsure about, as well. Does anyone have a conflicting opinion?

@bearophile
Copy link

It is not intended to be involved in providing application functionality.<

People will use it in all ways possible.

__traits(documentation, expr) or __traits(ddoc, expr) are fine for me. The second is a little shorter, and a little more precise. I'd like this feature to work even when you don't use -D. I suggest to add to the unittest the retrieval of the module ddoc (the one at the top). Automatically stripping the leading newline seems good. I think it's acceptable to return an empty string when there is no ddoc available.

@TurkeyMan
Copy link
Contributor

I agree that it shouldn't silently succeed if the symbol doesn't exist.
If the symbol doesn't exist; error. If the symbol exists but isn't documented; empty string.

Also, strip whitespace from both ends?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file is the expected output for a JSON emission test. It was changed because DMD no longer ignores documentation comments. Previously, DMD would ignore Ddoc comments unless it was run with "-D".

@TurkeyMan
Copy link
Contributor

I often work with systems where you interact with editors and various other tooling, and that tooling benefits significantly from being able to present API information in the editor's GUI.
Compile time reflection is used to provide data via an editor API which the tooling can make use of. Code being able to tell you about itself is the definition of reflection, and I don't see why a line should be drawn at the human readable information when everything else is thoroughly reflected.

To separate this data from the binary is error-prone in my experience, especially when remote debugging (which is standard practise in gamedev) and the filesystem is not present. Without embedding such data in the binary, an exe is represented by a bundle of files including the associated metadata (docs in this case), and losing sync becomes very likely.

I definitely want the reflected documentation embedded in the exe. It simplifies everything.
Currently, I use UDA's to manually tag duplicates of the documentation which is also present in the ddoc header, and I find these duplicate strings often fall out of sync. DRY man ;)
/** blah blah */
@editor(x, y, "blah blah")
void func();

@MasonMcGill
Copy link
Author

I cannot see benefit for the processing of ddoc comment during compilation. I think that ddoc comment is meta-information which is higher than the code. I think processing high-level layer information (ddoc comment) in low level layer (CTFE interpretation) is bad information model.

I think you're right when it comes to application or systems programming, but when it comes to library programming, Ddoc notes have become more than comments on the source: they're essentially a user interface. D has already bent over backwards to support interfacing with lower level languages (extern (C), extern (C++), etc.), even though one could argue that ABI directives are "lower than the source". Consider this trait a small feature to improve interfacing with scripting languages, development tools, and human readers.

D currently has every facility necessary for generating foreign language bindings at compile-time, except this. In my domain (academic science), providing a library-based replacement for C++ and SWIG--with no need for Makefiles, configuration files, or manual memory management--could be huge.

I'm developing a system to that end, but currently I'm stuck with a workaround similar to Manu's: using

@("
  This function bakes beans.
")
void bake(Bean b) { /* ... */ }

instead of

/**
 * This function bakes beans.
 */
void bake(Bean b) { /* ... */ }

I opened this pull request because I didn't want to ask my users (who may want to wrap existing code) to do the same.

@MasonMcGill MasonMcGill changed the title Adding support for __traits(comment, ...) Adding support for __traits(documentation, ...) May 7, 2014
@MartinNowak
Copy link
Member

I'm kind of worried about documentation affecting the semantic meaning of a program.

@TurkeyMan
Copy link
Contributor

Don't be worried, it's amazing! :)

@SFrijters
Copy link

This functionality would be quite useful to me; is there any chance this will be ready for the 2.066 release?

@MasonMcGill
Copy link
Author

Should I add the compiler flag discussed with @WalterBright and @jacob-carlborg (see file comments)?

@ghost ghost added Enhancement labels Aug 4, 2014
@CyberShadow
Copy link
Member

This feature would be nice to have. The alternative would be that whenever documentation reflection is required, you are forced to use UDAs for documentation, which is rather ugly, unusable with -D, and doesn't work everywhere (e.g. enum members).

I don't see a problem with documentation comments affecting program behavior. I think this only emphasizes the reasons for which we have regular comments and documentation comments.

Also, I'm very skeptical about the memory usage argument. The ASCII text for all documentation is a drop in the bucket of the current memory consumption needed to compile a program. The entire Phobos source code is 8 MB in total, which is practically nothing compared to how much memory is needed to actually build it.

[Shameless plug] Here is my use case: http://blog.thecybershadow.net/2014/08/05/ae-utils-funopt/

@MartinNowak
Copy link
Member

I would certainly agree to this if __traits(documentation, symbol) were only available at runtime (as a string constant). Having it available during compilation means it can affect program execution which goes against normal intuition and complicates any refactoring/language tools.

/// a + b
int foo(int a, int b) { return mixin(__traits(documentation, __FUNCTION__)); }

Unless there are good reasons to allow this, we shouldn't accidentally introduce it.

@CyberShadow
Copy link
Member

How would that even work? You can't e.g. enumerate a struct's methods at runtime.

@MartinNowak
Copy link
Member

How would that even work? You can't e.g. enumerate a struct's methods at runtime.

Make __traits(documentation, __FUNCTION__) return a reference to a constant string symbol, so it can't be interpreted at compile time but read at runtime.

@mihails-strasuns
Copy link

Making it runtime string / struct sounds like a good compromise to me.

@TurkeyMan
Copy link
Contributor

It sounds pretty lame to me. There can be an awful lot of text in the docs, and I don't want my exe bigger by that amount of raw data. It needs to be processable, so I can pick and choose the bits that are relevant.

Like we've said before, there are many much better ways to produce a string literal that may be operated on in D. People aren't going to go around making /** ... */ strings 'because they can', and if they do, why is that your problem? I am confident you're not going to see it become some sort of 'trend' which will get out of control.

The reason I want this is so I can populate separate editor tools with information relating to the objects they are editing. The binary defines the objects, and the properties they have. If you update the binary, typically you have to update the editor. It's a classic problem for decades in my industry, and it's a lame state of affairs.
With this __traits, we could parse the documentation blocks and feed it through the existing editor API to populate the editor UI with useful and correct information that can't drift out of sync with the rest of the toolset.
The key advantage is that, since the state of editor metadata is embedded in the binary itself, it's possible for a single editor instance to edit multiple binaries (a daily routine when dealing with cross-platform code), which has traditionally been a huge pain in the butt.

Currently, we do what I have described by abusing UDA's. We stick editor documentation text in a UDA, and it is almost always an exact duplicate of the ddoc comment.
For things that are exposed to the editor, the documentation is repeated; once for ddoc, and once again pasted in a string in a UDA to be supplied via the editor API to the tools.
This is pretty error prone; nobody is familiar with it yet, highly vulnerable to human error, and if we agree that duplicating code sucks, duplicating comment data sucks even more, because you never get any warning when it goes out of sync...

@MartinNowak
Copy link
Member

Well accessing documentation at runtime can also change the program behavior so I withdraw my argument. In any case we should carefully evaluate this solution against alternatives, e.g. dmd's json output or customized ddoc macros.

@mleise
Copy link
Contributor

mleise commented Nov 5, 2014

In the way the way Ddoc affects a web browser's behavior I think it is a good thing. The program might itself want to add tool tips for any See_Also in the Ddoc. And being a systems language there is a lot of things people can do, but shouldn't, like going around bounds checks with arr.ptr[idx]. Add a comment to the trait's documentation: "Documentation is not code and it is not checked for semantic errors. So use this trait only to format and display documentation and the appropriate other traits to reflect on parameter types and similar in order to build code."

@bearophile
Copy link

@mleise >"Documentation is not code and it is not checked for semantic errors.<

Currently the D compilers thankfully verify the correctness of the arguments of functions inside the function ddoc.

@mleise
Copy link
Contributor

mleise commented Nov 5, 2014

@bearophile It checks that you didn't miss(spell) a name, yes. But we still don't have exception lists on functions that the compiler could compare with Ddoc "Throws:" for example.

@bearophile
Copy link

@mleise I think probably there's no interest in reintroducing checked exceptions from the window after they were thrown out of the door.

@andralex
Copy link
Member

andralex commented Feb 7, 2015

This has apparently reached an impasse. Should we close in a few days?

@DmitryOlshansky
Copy link
Member

Few days struck long ago, just saying...

@CyberShadow
Copy link
Member

Currently, we do what I have described by abusing UDA's. We stick editor documentation text in a UDA, and it is almost always an exact duplicate of the ddoc comment.

I think this is a good argument in favor of this change. Either that, or there should be a way to do the reverse (populate symbol documentation somehow from UDAs).

@Kapps
Copy link
Contributor

Kapps commented Jul 2, 2015

This is a very useful feature. The editor documentation is likely the most common use case, but it is a very prevalent one. C# has the [Description] attribute, and in practice I've found that it's pretty much always a copy and paste of the documentation, with the result being that I tend to not document the property in order to avoid the duplication.

Yes, it's possible to go out of your way to abuse it in a way that makes no sense, but that's possible with virtually every feature in D. You could inject things FILE or LINE in your code in much the same way to invoke different code depending on the module/line/function/package calling your function. You could even use something like "#line 99999" and invoke special code inside a static if to add members to a struct if LINE is 99999. Many features add potential for abuse like this, and while it is a good thing to avoid, at some point you just have to hope people are sane enough to not abuse them. That being said, I do like the idea of it not being possible to use at compile-time, but it would be nice to select at compile-time (by using the trait) which functions should actually have the documentation text included in the runtime executable.

@wilzbach
Copy link
Contributor

wilzbach commented Jun 8, 2017

@MasonMcGill and others: I revived this PR at #6872
Feedback on the revival is welcome ;-)

@wilzbach wilzbach closed this Jun 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.