Skip to content

Introduce libzigc for libc function implementations in Zig#23529

Merged
alexrp merged 3 commits intoziglang:masterfrom
alexrp:2879-groundwork
Apr 12, 2025
Merged

Introduce libzigc for libc function implementations in Zig#23529
alexrp merged 3 commits intoziglang:masterfrom
alexrp:2879-groundwork

Conversation

@alexrp
Copy link
Member

@alexrp alexrp commented Apr 11, 2025

This lays the groundwork for #2879. This library will be built and linked when a static libc is going to be linked into the compilation. Currently, that means musl, wasi-libc, and MinGW-w64. As a demonstration, this commit removes the musl C code for a few string functions and implements them in libzigc. This means that those libzigc functions are now load-bearing for musl and wasi-libc.

Note that if a function has an implementation in compiler-rt already, libzigc should not implement it. Instead, as we recently did for memcpy/memmove, we should delete the libc copy and rely on the compiler-rt implementation.

I repurposed the existing "universal libc" code to do this. That code hadn't seen development beyond basic string functions in years, and was only usable-ish on freestanding. I think that if we want to seriously pursue the idea of Zig providing a freestanding libc, we should do so only after defining clear goals (and non-goals) for it. See also #22240 for a similar case.

Release Notes

In this release, we've started the effort to share code between the statically-linked libcs that Zig provides -- currently musl, wasi-libc, and MinGW-w64 -- by reimplementing common functions in Zig code in the new libzigc library. This means that there is a single canonical implementation of each function, and we're able to improve the implementation without having to modify the vendored libc code from the aforementioned projects. The very long term aspiration here -- which will require a lot of work -- is to completely eliminate our dependency on the upstream C implementation code of those libcs, such that we ship only their headers.

This effort is very contributor-friendly, so if this sounds interesting to you, check out issue #2879 for details.

Copy link
Member

@andrewrk andrewrk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice to see progress on this sub-project.

Note that libcs have functions in individual files in order to make them be in separate compilation units so that they can be omitted if unused.

In zig's case, it's all one compilation unit, but we should be enabling -ffunction-sections for the libc compilation unit. One time I tried putting all the compiler_rt functions in separate files to see if that was easier on the linker, but it was remarkably slower than using function sections. That's why compiler_rt is organized the way it is. I don't think we should mindlessly put one function per file, copying libc, or copying compiler_rt, but I'm not saying don't do it either. Whatever makes the most sense.

@andrewrk
Copy link
Member

prior art

@alexrp
Copy link
Member Author

alexrp commented Apr 11, 2025

Note that libcs have functions in individual files in order to make them be in separate compilation units so that they can be omitted if unused.

In zig's case, it's all one compilation unit, but we should be enabling -ffunction-sections for the libc compilation unit. One time I tried putting all the compiler_rt functions in separate files to see if that was easier on the linker, but it was remarkably slower than using function sections. That's why compiler_rt is organized the way it is. I don't think we should mindlessly put one function per file, copying libc, or copying compiler_rt, but I'm not saying don't do it either. Whatever makes the most sense.

Makes sense. We do indeed enable -ffunction-sections for Zig-provided libraries. I'd definitely prefer to group functions into files based on category or whatever makes sense, so I'll just throw these functions into string.zig to mirror string.h where they're declared.

@rpkak
Copy link
Contributor

rpkak commented Apr 11, 2025

These zigc functions seem to be tested by zig tests. If the plan is to implement more libc functions into zigc, I think a libc test suite to stop potential diversion between zigc and libc might make sense. After a quick google, I found libc-test by musl, which states "the test system should run on all archs and libcs" in its readme.

@alexrp
Copy link
Member Author

alexrp commented Apr 11, 2025

We have indeed looked at libc-test for this, but there are still some open questions in regards to how it should be integrated, so I would consider that follow-up work.

This lays the groundwork for #2879. This library will be built and linked when a
static libc is going to be linked into the compilation. Currently, that means
musl, wasi-libc, and MinGW-w64. As a demonstration, this commit removes the musl
C code for a few string functions and implements them in libzigc. This means
that those libzigc functions are now load-bearing for musl and wasi-libc.

Note that if a function has an implementation in compiler-rt already, libzigc
should not implement it. Instead, as we recently did for memcpy/memmove, we
should delete the libc copy and rely on the compiler-rt implementation.

I repurposed the existing "universal libc" code to do this. That code hadn't
seen development beyond basic string functions in years, and was only usable-ish
on freestanding. I think that if we want to seriously pursue the idea of Zig
providing a freestanding libc, we should do so only after defining clear goals
(and non-goals) for it. See also #22240 for a similar case.
return switch (std.mem.orderZ(u8, @ptrCast(s1), @ptrCast(s2))) {
.lt => -1,
.eq => 0,
.gt => 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correct implementation but has different behavior from musl, which returns the difference of the final, differing bytes instead of just -1, 0, and 1. Is that okay when the user has specifically requested musl? What level of compatibility are we aiming for? If someone does happen to rely on a musl bug or quirk, should they just manually link a different copy of the musl sources?

Copy link
Member Author

@alexrp alexrp Apr 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem like musl's behavior is particularly deliberate, but rather an artifact of how it's implemented. It's not documented as being guaranteed either.

musl generally takes a strong stance on C/POSIX standards compliance, to the point of exploiting many requirements in those standards for efficiency in its implementation code, so I have a feeling that if you asked Rich if you should rely on this strcmp implementation detail, you'd get an emphatic "no".

So I think it's fine to just go by the C standard here.

If someone does happen to rely on a musl bug or quirk, should they just manually link a different copy of the musl sources?

We have to handle this on a case-by-case basis since long-standing bugs do become features sometimes, especially when the standards are unclear on the matter. But in this particular case, ideally, they should fix their code because it isn't compliant with the C standard's definition of strcmp and thus non-portable.

@marler8997
Copy link
Contributor

Good seeing someone work on this, feel free to reference/use anything from https://github.com/marler8997/ziglibc as well.

@export(&strncmp, .{ .name = "strncmp", .linkage = common.linkage, .visibility = common.visibility });
}

fn strcmp(s1: [*:0]const c_char, s2: [*:0]const c_char) callconv(.c) c_int {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this u8 instead of c_char since it's the same ABI (what's being passed is a pointer), and avoid all the ptr casts in this file.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't that become an impediment to #20654 because we'd lose the information that the type in the signature is actually meant to be C char, which can be u8 or i8 depending on target?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mm, good point. I did mention this though:

Check for strict equality? Should there be a notion of compatibility? Should there be compatibility-checking settings? I'm thinking that there is indeed a notion of compatibility. For example, if the ABI defines a usize but you want to have an enum on the other side, that is well-defined behavior according to the ABI. However, one might want to tweak what kinds of compatibility are or are not counted as mismatch.

still I think you're right, the ability to generate viable .h files, or at least partial, would be nice.

@alexrp alexrp merged commit 9352f37 into ziglang:master Apr 12, 2025
17 of 18 checks passed
@alexrp alexrp deleted the 2879-groundwork branch April 12, 2025 16:15
@alexrp alexrp added the release notes This PR should be mentioned in the release notes. label May 12, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release notes This PR should be mentioned in the release notes.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants