Cleanup and refactor some filename-related code to use D slices#8585
Cleanup and refactor some filename-related code to use D slices#8585Geod24 merged 8 commits intodlang:masterfrom
Conversation
|
Thanks for your pull request and interest in making D better, @Geod24! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.
Please see CONTRIBUTING.md for more information. If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment. Bugzilla referencesYour PR doesn't reference any Bugzilla issue. If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog. Testing this PR locallyIf you don't have a local development environment setup, you can use Digger to test this PR: dub fetch digger
dub run digger -- build "master + dmd#8585" |
| /** | ||
| Combine a `path` and a file `name` | ||
|
|
||
| Returns: |
There was a problem hiding this comment.
Should have Params if we want to be strict.
| unittest | ||
| { | ||
| assert("Hello world".toCStringThen!((v) => v == "Hello world\0")); | ||
| assert("Hello world\0".toCStringThen!((v) => v == "Hello world\0\0")); |
There was a problem hiding this comment.
Why adding an additional null character if one is already present?
There was a problem hiding this comment.
This function does not care about the content of the slice. And slices should not contain the terminating null char, it should be out of bounds.
There was a problem hiding this comment.
But all static D strings already do contain a null afterwards, e.g. https://run.dlang.io/is/Abty1m so it looks like we could save quite a bit of copying here.
There was a problem hiding this comment.
Your example is reading past the end of the slice. It's unsafe and cannot be relied on in this function.
Also that copy is unlikely to have any noticeable performance impact, since this function will be called primarily to wrap C function calls, and we'll be calling mostly I/O functions.
| return cast(const(char)*)(&this + 1); | ||
| } | ||
|
|
||
| extern (D) inout(char)[] toString() inout |
There was a problem hiding this comment.
Should at least have an empty Ddoc comment.
|
Updated: diff --git a/src/dmd/root/stringtable.d b/src/dmd/root/stringtable.d
index aa73bb82c..2d9b1876c 100644
--- a/src/dmd/root/stringtable.d
+++ b/src/dmd/root/stringtable.d
@@ -58,6 +58,7 @@ pure:
return cast(const(char)*)(&this + 1);
}
+ /// Returns: The content of this entry as a D slice
extern (D) inout(char)[] toString() inout
{
return (cast(inout(char)*)(&this + 1))[0 .. length];
diff --git a/src/dmd/root/filename.d b/src/dmd/root/filename.d
index ba34d6e44..e6afe9e7e 100644
--- a/src/dmd/root/filename.d
+++ b/src/dmd/root/filename.d
@@ -298,6 +298,10 @@ nothrow:
/**
Combine a `path` and a file `name`
+ Params:
+ path = Path to append to
+ name = name to append to path
+
Returns:
The `\0` terminated string which is the combination of `path` and `name`
and a valid path. |
| extern (C++) static bool equals(const(char)* name1, const(char)* name2) pure | ||
| { | ||
| return compare(name1, name2) == 0; | ||
| return equals(name1.toDString, name2.toDString); |
There was a problem hiding this comment.
Wouldn't it be more efficient to keep the C compare here for now?
There was a problem hiding this comment.
That difference is negligible.
And if we concern ourselves with such micro-optimization, there's very little I can do, as most of the work of converting involves moving the implementation to slices and providing strlen wrapper, which isn't strictly necessary sometimes (like here).
The other approach would be to convert all the code that relies on it at once, but that applies transitively, and you end up converting all the code of DMD to use slice, which is clearly not feasible.
src/dmd/utils.d
Outdated
| dg = Delegate to call afterwards | ||
|
|
||
| Returns: | ||
| The return value of `T` |
There was a problem hiding this comment.
Nit: For new Phobos code, we recommend to not use any white space or stars as indentation.
There was a problem hiding this comment.
That's absolutely hideous, but if it's the D style...
There was a problem hiding this comment.
I agree, it's hideous.
| Returns: | ||
| The return value of `T` | ||
| */ | ||
| auto toCStringThen(alias dg)(const(char)[] src) nothrow |
There was a problem hiding this comment.
Maybe include "temporary" in the name, s.t. it's always clear that keeping the C string around is dangerous?
There was a problem hiding this comment.
I used this name because it matches what's already there, namely toWStringzThen and toExtendedPathThen.
Marking those functions @safe should enforce that scope is checked for, but we cannot (yet) do that.
There was a problem hiding this comment.
should enforce that scope is checked for, but we cannot (yet) do that.
Well safety is still broken for now too, e.g. https://issues.dlang.org/show_bug.cgi?id=19175
| */ | ||
| auto toCStringThen(alias dg)(const(char)[] src) nothrow | ||
| { | ||
| char[512] small = void; |
There was a problem hiding this comment.
How about as a simple optimization checking whether the last character (or the one after) is already a \0 and thus skipping the copying all together?
There was a problem hiding this comment.
It would most likely be a bug to have to have the \0 as the last character (and it would mess with comparison). And we cannot check the character after the last, because it's unsafe.
| unittest | ||
| { | ||
| assert("Hello world".toCStringThen!((v) => v == "Hello world\0")); | ||
| assert("Hello world\0".toCStringThen!((v) => v == "Hello world\0\0")); |
There was a problem hiding this comment.
But all static D strings already do contain a null afterwards, e.g. https://run.dlang.io/is/Abty1m so it looks like we could save quite a bit of copying here.
| Returns: | ||
| The return value of `T` | ||
| */ | ||
| auto toCStringThen(alias dg)(const(char)[] src) nothrow |
There was a problem hiding this comment.
should enforce that scope is checked for, but we cannot (yet) do that.
Well safety is still broken for now too, e.g. https://issues.dlang.org/show_bug.cgi?id=19175
src/dmd/root/filename.d
Outdated
| version (Windows) | ||
| { | ||
| return stricmp(name1, name2); | ||
| return strnicmp(name1.ptr, name2.ptr, name1.length) == 0; |
There was a problem hiding this comment.
These POSIX functions are deprecated. Use the ISO C++ conformant _strnicmp, _wcsnicmp, _mbsnicmp, _strnicmp_l, _wcsnicmp_l, _mbsnicmp_l instead.
There was a problem hiding this comment.
I see strncmp has reappeared in another guise. Please don't use it. I pointed this out in a previous PR, and why.
There was a problem hiding this comment.
Previous review of strncmp:
#8575 (review)
"strncmp is the wrong answer here. I quote the C Standard: "The strncmp function compares not more than n characters (characters that follow a null character are not compared) from the array pointed to by s1 to the array pointed to bys 2." I.e. embedded 0's are still a factor. Use memcmp() and memicmp() instead. I've never found a legitimate use for strncmp()."
src/dmd/root/filename.d
Outdated
| else | ||
| { | ||
| return strcmp(name1, name2); | ||
| return memcmp(name1.ptr, name2.ptr, name1.length) == 0; |
There was a problem hiding this comment.
Why not simply name == name2?
|
Updated: diff --git a/src/dmd/root/filename.d b/src/dmd/root/filename.d
index e6afe9e7e..606cb3107 100644
--- a/src/dmd/root/filename.d
+++ b/src/dmd/root/filename.d
@@ -27,7 +27,13 @@ import dmd.utils;
nothrow
{
-version (Windows) extern (C) int strnicmp(const char*, const char*, size_t) pure;
+// Win32 prepend an `_`, so this will call the non-deprecated `_strnicmp`
+version (Win32) extern (C) int strnicmp(const char*, const char*, size_t) pure;
+else version (Windows)
+{
+ extern (C) int _strnicmp(const char*, const char*, size_t) pure;
+ alias strnicmp = _strnicmp;
+}
version (Windows) extern (Windows) DWORD GetFullPathNameW(LPCWSTR, DWORD, LPWSTR, LPWSTR*) @nogc;
version (Windows) extern (Windows) void SetLastError(DWORD) @nogc;
version (Windows) extern (C) char* getcwd(char* buffer, size_t maxlen);
@@ -68,7 +74,7 @@ nothrow:
}
else
{
- return memcmp(name1.ptr, name2.ptr, name1.length) == 0;
+ return name1 == name2;
}
} |
src/dmd/root/filename.d
Outdated
| } | ||
| memcpy(f + pathlen, name, namelen + 1); | ||
| return f; | ||
| const len = path.length + (trailingSlash ? 1 : 0); |
There was a problem hiding this comment.
The ?1:0 is redundant, because trailingSlash is a bool that is already 1 or 0.
There was a problem hiding this comment.
Didn't knew it was guaranteed by the language. D never cease to amaze me.
There was a problem hiding this comment.
FWIW I prefer the ?1:0. When reading code, if I see something like bool trailingSlash; len = path.length + trailingSlash, it's going to give me pause. It reminds me of the --> operator, exploiting some quirk in the language for some perceived benefit. IMO converting a bool to an int should require an explicit cast.
6681b13 to
68b5c95
Compare
|
Updated: diff --git a/src/dmd/root/filename.d b/src/dmd/root/filename.d
index 606cb3107..b80e6236f 100644
--- a/src/dmd/root/filename.d
+++ b/src/dmd/root/filename.d
@@ -21,19 +21,13 @@ import core.sys.windows.windows;
import dmd.root.array;
import dmd.root.file;
import dmd.root.outbuffer;
+import dmd.root.port;
import dmd.root.rmem;
import dmd.root.rootobject;
import dmd.utils;
nothrow
{
-// Win32 prepend an `_`, so this will call the non-deprecated `_strnicmp`
-version (Win32) extern (C) int strnicmp(const char*, const char*, size_t) pure;
-else version (Windows)
-{
- extern (C) int _strnicmp(const char*, const char*, size_t) pure;
- alias strnicmp = _strnicmp;
-}
version (Windows) extern (Windows) DWORD GetFullPathNameW(LPCWSTR, DWORD, LPWSTR, LPWSTR*) @nogc;
version (Windows) extern (Windows) void SetLastError(DWORD) @nogc;
version (Windows) extern (C) char* getcwd(char* buffer, size_t maxlen);
@@ -70,7 +64,7 @@ nothrow:
version (Windows)
{
- return strnicmp(name1.ptr, name2.ptr, name1.length) == 0;
+ return Port.memicmp(name1.ptr, name2.ptr, name1.length) == 0;
}
else
{
@@ -348,7 +342,7 @@ nothrow:
{
assert(0);
}
- const len = path.length + (trailingSlash ? 1 : 0);
+ const len = path.length + trailingSlash;
memcpy(f + len, name.ptr, name.length);
// Note: At the moment `const(char)*` are being transitioned to
// `const(char)[]`. To avoid bugs crippling in, we `\0` terminateSince I realized recently that we have |
|
Buildkite is being weird @wilzbach : |
Sorry, my fault. Should be fixed now. I restarted the build. |
src/dmd/utils.d
Outdated
| { | ||
| char[512] small = void; | ||
| scope ptr = (src.length < (small.length - 1)) | ||
| ? small[0 .. src.length + 1] |
There was a problem hiding this comment.
src.length + 1 is excessively repeated. I think it can be stored in a constant variable and reused. It's asking for the next developer to introduce an off-by-1 error.
|
Updated: diff --git a/src/dmd/utils.d b/src/dmd/utils.d
index 55a9bb54d..5747a50a4 100644
--- a/src/dmd/utils.d
+++ b/src/dmd/utils.d
@@ -160,10 +160,11 @@ The return value of `T`
*/
auto toCStringThen(alias dg)(const(char)[] src) nothrow
{
+ const len = src.length + 1;
char[512] small = void;
scope ptr = (src.length < (small.length - 1))
- ? small[0 .. src.length + 1]
- : (cast(char*)mem.xmalloc(src.length + 1))[0 .. src.length + 1];
+ ? small[0 .. len]
+ : (cast(char*)mem.xmalloc(len))[0 .. len];
scope (exit)
{
if (&ptr[0] != &small[0]) |
src/dmd/root/filename.d
Outdated
|
|
||
| Params: | ||
| path = Path to append to | ||
| name = name to append to path |
There was a problem hiding this comment.
Capitalize "name" in the description.
|
I think this might cause a merge conflict with #8542. Let's try to get that in first. |
It definitely will, as this PR was what originally prompted me to do this. |
It is unused and quite surprising, as the length is what the string points to.
It's of no use to compare file names for anything else than equality, as can be seen by all the usages of `compare` checking for actual equality. Additionally, only the comparison of two strings was used, not the `RootObject` method (as FileName does not inherit from RootObject).
Allows to convert a non-'\0' terminated slice back to a '\0' terminated one. This is required from some C function, and can be additionally used to transition from C-style strings to D-style ones.
|
@Geod24 rebase! |
|
Already done and no conflict 🎉 |
|
Re-adding auto-merge now that stable has been merged and it didn't lead to any code modification. |
As requested by @WalterBright .
Best reviewed commit-by-commit. Let me know if it should be broken down further.