Skip to content

Cleanup and refactor some filename-related code to use D slices#8585

Merged
Geod24 merged 8 commits intodlang:masterfrom
Geod24:break-down
Aug 27, 2018
Merged

Cleanup and refactor some filename-related code to use D slices#8585
Geod24 merged 8 commits intodlang:masterfrom
Geod24:break-down

Conversation

@Geod24
Copy link
Member

@Geod24 Geod24 commented Aug 20, 2018

As requested by @WalterBright .
Best reviewed commit-by-commit. Let me know if it should be broken down further.

@dlang-bot
Copy link
Contributor

Thanks for your pull request and interest in making D better, @Geod24! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please verify that your PR follows this checklist:

  • My PR is fully covered with tests (you can see the annotated coverage diff directly on GitHub with CodeCov's browser extension
  • My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
  • I have provided a detailed rationale explaining my changes
  • New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.


If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub fetch digger
dub run digger -- build "master + dmd#8585"

@Geod24 Geod24 changed the title leanup and refactor some filename-related code to use D slices Cleanup and refactor some filename-related code to use D slices Aug 20, 2018
/**
Combine a `path` and a file `name`

Returns:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should have Params if we want to be strict.

unittest
{
assert("Hello world".toCStringThen!((v) => v == "Hello world\0"));
assert("Hello world\0".toCStringThen!((v) => v == "Hello world\0\0"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why adding an additional null character if one is already present?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function does not care about the content of the slice. And slices should not contain the terminating null char, it should be out of bounds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But all static D strings already do contain a null afterwards, e.g. https://run.dlang.io/is/Abty1m so it looks like we could save quite a bit of copying here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

e.g. https://run.dlang.io/is/Abty1m

Your example is reading past the end of the slice. It's unsafe and cannot be relied on in this function.
Also that copy is unlikely to have any noticeable performance impact, since this function will be called primarily to wrap C function calls, and we'll be calling mostly I/O functions.

return cast(const(char)*)(&this + 1);
}

extern (D) inout(char)[] toString() inout
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should at least have an empty Ddoc comment.

@Geod24
Copy link
Member Author

Geod24 commented Aug 20, 2018

Updated:

diff --git a/src/dmd/root/stringtable.d b/src/dmd/root/stringtable.d
index aa73bb82c..2d9b1876c 100644
--- a/src/dmd/root/stringtable.d
+++ b/src/dmd/root/stringtable.d
@@ -58,6 +58,7 @@ pure:
         return cast(const(char)*)(&this + 1);
     }

+    /// Returns: The content of this entry as a D slice
     extern (D) inout(char)[] toString() inout
     {
         return (cast(inout(char)*)(&this + 1))[0 .. length];
diff --git a/src/dmd/root/filename.d b/src/dmd/root/filename.d
index ba34d6e44..e6afe9e7e 100644
--- a/src/dmd/root/filename.d
+++ b/src/dmd/root/filename.d
@@ -298,6 +298,10 @@ nothrow:
     /**
        Combine a `path` and a file `name`

+       Params:
+         path = Path to append to
+         name = name to append to path
+
        Returns:
          The `\0` terminated string which is the combination of `path` and `name`
          and a valid path.

extern (C++) static bool equals(const(char)* name1, const(char)* name2) pure
{
return compare(name1, name2) == 0;
return equals(name1.toDString, name2.toDString);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be more efficient to keep the C compare here for now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That difference is negligible.

And if we concern ourselves with such micro-optimization, there's very little I can do, as most of the work of converting involves moving the implementation to slices and providing strlen wrapper, which isn't strictly necessary sometimes (like here).

The other approach would be to convert all the code that relies on it at once, but that applies transitively, and you end up converting all the code of DMD to use slice, which is clearly not feasible.

src/dmd/utils.d Outdated
dg = Delegate to call afterwards

Returns:
The return value of `T`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: For new Phobos code, we recommend to not use any white space or stars as indentation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's absolutely hideous, but if it's the D style...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, it's hideous.

Returns:
The return value of `T`
*/
auto toCStringThen(alias dg)(const(char)[] src) nothrow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe include "temporary" in the name, s.t. it's always clear that keeping the C string around is dangerous?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used this name because it matches what's already there, namely toWStringzThen and toExtendedPathThen.
Marking those functions @safe should enforce that scope is checked for, but we cannot (yet) do that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should enforce that scope is checked for, but we cannot (yet) do that.

Well safety is still broken for now too, e.g. https://issues.dlang.org/show_bug.cgi?id=19175

*/
auto toCStringThen(alias dg)(const(char)[] src) nothrow
{
char[512] small = void;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about as a simple optimization checking whether the last character (or the one after) is already a \0 and thus skipping the copying all together?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would most likely be a bug to have to have the \0 as the last character (and it would mess with comparison). And we cannot check the character after the last, because it's unsafe.

unittest
{
assert("Hello world".toCStringThen!((v) => v == "Hello world\0"));
assert("Hello world\0".toCStringThen!((v) => v == "Hello world\0\0"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But all static D strings already do contain a null afterwards, e.g. https://run.dlang.io/is/Abty1m so it looks like we could save quite a bit of copying here.

Returns:
The return value of `T`
*/
auto toCStringThen(alias dg)(const(char)[] src) nothrow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should enforce that scope is checked for, but we cannot (yet) do that.

Well safety is still broken for now too, e.g. https://issues.dlang.org/show_bug.cgi?id=19175

version (Windows)
{
return stricmp(name1, name2);
return strnicmp(name1.ptr, name2.ptr, name1.length) == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These POSIX functions are deprecated. Use the ISO C++ conformant _strnicmp, _wcsnicmp, _mbsnicmp, _strnicmp_l, _wcsnicmp_l, _mbsnicmp_l instead.

https://msdn.microsoft.com/en-us/library/ms235324.aspx

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see strncmp has reappeared in another guise. Please don't use it. I pointed this out in a previous PR, and why.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previous review of strncmp:
#8575 (review)
"strncmp is the wrong answer here. I quote the C Standard: "The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to bys 2." I.e. embedded 0's are still a factor. Use memcmp() and memicmp() instead. I've never found a legitimate use for strncmp()."

else
{
return strcmp(name1, name2);
return memcmp(name1.ptr, name2.ptr, name1.length) == 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not simply name == name2?

@Geod24
Copy link
Member Author

Geod24 commented Aug 21, 2018

Updated:

diff --git a/src/dmd/root/filename.d b/src/dmd/root/filename.d
index e6afe9e7e..606cb3107 100644
--- a/src/dmd/root/filename.d
+++ b/src/dmd/root/filename.d
@@ -27,7 +27,13 @@ import dmd.utils;

 nothrow
 {
-version (Windows) extern (C) int strnicmp(const char*, const char*, size_t) pure;
+// Win32 prepend an `_`, so this will call the non-deprecated `_strnicmp`
+version (Win32) extern (C) int strnicmp(const char*, const char*, size_t) pure;
+else version (Windows)
+{
+    extern (C) int _strnicmp(const char*, const char*, size_t) pure;
+    alias strnicmp = _strnicmp;
+}
 version (Windows) extern (Windows) DWORD GetFullPathNameW(LPCWSTR, DWORD, LPWSTR, LPWSTR*) @nogc;
 version (Windows) extern (Windows) void SetLastError(DWORD) @nogc;
 version (Windows) extern (C) char* getcwd(char* buffer, size_t maxlen);
@@ -68,7 +74,7 @@ nothrow:
         }
         else
         {
-            return memcmp(name1.ptr, name2.ptr, name1.length) == 0;
+            return name1 == name2;
         }
     }

}
memcpy(f + pathlen, name, namelen + 1);
return f;
const len = path.length + (trailingSlash ? 1 : 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ?1:0 is redundant, because trailingSlash is a bool that is already 1 or 0.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't knew it was guaranteed by the language. D never cease to amaze me.

Copy link
Contributor

@JinShil JinShil Aug 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW I prefer the ?1:0. When reading code, if I see something like bool trailingSlash; len = path.length + trailingSlash, it's going to give me pause. It reminds me of the --> operator, exploiting some quirk in the language for some perceived benefit. IMO converting a bool to an int should require an explicit cast.

@Geod24 Geod24 force-pushed the break-down branch 2 times, most recently from 6681b13 to 68b5c95 Compare August 22, 2018 00:09
@Geod24
Copy link
Member Author

Geod24 commented Aug 22, 2018

Updated:

diff --git a/src/dmd/root/filename.d b/src/dmd/root/filename.d
index 606cb3107..b80e6236f 100644
--- a/src/dmd/root/filename.d
+++ b/src/dmd/root/filename.d
@@ -21,19 +21,13 @@ import core.sys.windows.windows;
 import dmd.root.array;
 import dmd.root.file;
 import dmd.root.outbuffer;
+import dmd.root.port;
 import dmd.root.rmem;
 import dmd.root.rootobject;
 import dmd.utils;

 nothrow
 {
-// Win32 prepend an `_`, so this will call the non-deprecated `_strnicmp`
-version (Win32) extern (C) int strnicmp(const char*, const char*, size_t) pure;
-else version (Windows)
-{
-    extern (C) int _strnicmp(const char*, const char*, size_t) pure;
-    alias strnicmp = _strnicmp;
-}
 version (Windows) extern (Windows) DWORD GetFullPathNameW(LPCWSTR, DWORD, LPWSTR, LPWSTR*) @nogc;
 version (Windows) extern (Windows) void SetLastError(DWORD) @nogc;
 version (Windows) extern (C) char* getcwd(char* buffer, size_t maxlen);
@@ -70,7 +64,7 @@ nothrow:

         version (Windows)
         {
-            return strnicmp(name1.ptr, name2.ptr, name1.length) == 0;
+            return Port.memicmp(name1.ptr, name2.ptr, name1.length) == 0;
         }
         else
         {
@@ -348,7 +342,7 @@ nothrow:
         {
             assert(0);
         }
-        const len = path.length + (trailingSlash ? 1 : 0);
+        const len = path.length + trailingSlash;
         memcpy(f + len, name.ptr, name.length);
         // Note: At the moment `const(char)*` are being transitioned to
         // `const(char)[]`. To avoid bugs crippling in, we `\0` terminate

Since I realized recently that we have dmd.root.port : memicmp, I used this, but had to add attributes to it as a result (455491c).

@Geod24
Copy link
Member Author

Geod24 commented Aug 22, 2018

Buildkite is being weird @wilzbach :


Building dmd | 0s
-- | --
  | ~> make -C dmd -f posix.mak AUTO_BOOTSTRAP=1 --jobs=4
  | make: Entering directory '/var/lib/buildkite-agent/builds/dime-1/dlang/dmd/build/dmd'
  | make: posix.mak: No such file or directory
  | make: *** No rule to make target 'posix.mak'.  Stop.
  | make: Leaving directory '/var/lib/buildkite-agent/builds/dime-1/dlang/dmd/build/dmd'
  | 🚨 Error: The command exited with status 2


@wilzbach
Copy link
Contributor

Buildkite is being weird @wilzbach

Sorry, my fault. Should be fixed now. I restarted the build.

src/dmd/utils.d Outdated
{
char[512] small = void;
scope ptr = (src.length < (small.length - 1))
? small[0 .. src.length + 1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src.length + 1 is excessively repeated. I think it can be stored in a constant variable and reused. It's asking for the next developer to introduce an off-by-1 error.

@Geod24
Copy link
Member Author

Geod24 commented Aug 24, 2018

Updated:

diff --git a/src/dmd/utils.d b/src/dmd/utils.d
index 55a9bb54d..5747a50a4 100644
--- a/src/dmd/utils.d
+++ b/src/dmd/utils.d
@@ -160,10 +160,11 @@ The return value of `T`
 */
 auto toCStringThen(alias dg)(const(char)[] src) nothrow
 {
+    const len = src.length + 1;
     char[512] small = void;
     scope ptr = (src.length < (small.length - 1))
-                    ? small[0 .. src.length + 1]
-                    : (cast(char*)mem.xmalloc(src.length + 1))[0 .. src.length + 1];
+                    ? small[0 .. len]
+                    : (cast(char*)mem.xmalloc(len))[0 .. len];
     scope (exit)
     {
         if (&ptr[0] != &small[0])


Params:
path = Path to append to
name = name to append to path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Capitalize "name" in the description.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@JinShil
Copy link
Contributor

JinShil commented Aug 24, 2018

I think this might cause a merge conflict with #8542. Let's try to get that in first.

@Geod24
Copy link
Member Author

Geod24 commented Aug 24, 2018

I think this might cause a merge conflict with #8542. Let's try to get that in first.

It definitely will, as this PR was what originally prompted me to do this.
I'll wait for the stable merge then.

@Geod24
Copy link
Member Author

Geod24 commented Aug 27, 2018

@JinShil JinShil added the auto-merge label 10 minutes ago

#8542 was to stable. While it was merged, stable hasn't been merged to stable yet, so this is still blocked. I'll remove the label until then :'(

Geod24 added 8 commits August 27, 2018 13:42
It is unused and quite surprising, as the length is what the string points to.
It's of no use to compare file names for anything else than equality,
as can be seen by all the usages of `compare` checking for actual equality.
Additionally, only the comparison of two strings was used, not the `RootObject`
method (as FileName does not inherit from RootObject).
Allows to convert a non-'\0' terminated slice back to a '\0' terminated one.
This is required from some C function, and can be additionally used to transition
 from C-style strings to D-style ones.
@ibuclaw
Copy link
Member

ibuclaw commented Aug 27, 2018

@Geod24 rebase!

@Geod24
Copy link
Member Author

Geod24 commented Aug 27, 2018

Already done and no conflict 🎉

@Geod24
Copy link
Member Author

Geod24 commented Aug 27, 2018

Re-adding auto-merge now that stable has been merged and it didn't lead to any code modification.

@Geod24 Geod24 merged commit 58fde67 into dlang:master Aug 27, 2018
@Geod24 Geod24 deleted the break-down branch August 27, 2018 08:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants