intermittent TlsConnectionTruncated #14573

@marler8997

Zig Version

0.11.0-dev.1507+6f13a725a

Steps to Reproduce and Observed Behavior

I found this issue while trying to use the new std TLS implementation for zigup, and I've created a small application that reproduces it. The failure is highly intermittent: once I had 30 successful runs before I saw it, but usually I see it within 10 runs, and sometimes within the first few. It seems to happen fairly often on CI, though (see https://github.com/marler8997/zigup/actions/runs/4095377532/jobs/7066880134), which is odd. It also seems to be sensitive to the read buffer size.

Here's the program:

const std = @import("std");

// play around with this size to see different kinds of errors
var buf: [8192]u8 = undefined;

pub fn main() !void {
    const host = "ziglang.org";
    const port = 443;
    const resource = "https://ziglang.org/download/0.7.0/zig-linux-x86_64-0.7.0.tar.xz";
    const content_len = 37154432; // hardcoded, always the same

    var arena = std.heap.ArenaAllocator.init(std.heap.page_allocator);
    
    const stream = try std.net.tcpConnectToHost(arena.allocator(), host, port);
    defer stream.close();
    
    var ca_bundle = std.crypto.Certificate.Bundle{ };
    defer ca_bundle.deinit(arena.allocator());

    try ca_bundle.rescan(arena.allocator());

    var client = try std.crypto.tls.Client.init(stream, ca_bundle, host);

    const request = try std.fmt.bufPrint(
        &buf,
           "GET {s} HTTP/1.1\r\n"
        ++ "Host: {s}\r\n"
        ++ "Connection: close\r\n"
        ++ "\r\n",
        .{ resource, host});
    std.log.info("sending request\n----\n{s}----\n", .{request});
    try client.writeAll(stream, request);

    var total_read: usize = 0;
    const end_of_headers = blk: {
        while (true) {
            const len = try client.read(stream, buf[total_read..]);
            std.debug.assert(len != 0);
            total_read += len;
            
            if (std.mem.indexOf(u8, buf[0 .. total_read], "\r\n\r\n")) |eoh| break :blk eoh;
        }
    };
    std.log.info("received headers\n----\n{s}\n----\n", .{buf[0 .. end_of_headers]});

    const total_expected = end_of_headers + 4 + content_len;
    while (true) {
        const len = try client.read(stream, &buf);
        if (len == 0)
            break;
        std.log.info("got {} bytes", .{len});
        total_read += len;
    }

    if (total_read != total_expected) {
        std.log.err("expected {} bytes but got {}", .{total_expected, total_read});
        std.os.exit(0xff);
    }
    std.log.info("Success!", .{});
}

This program sends a simple HTTP request to download the Zig 0.7.0 Linux release tarball. A failing run (it can take many runs to see one) looks like this:

$ zig build-exe tlsbug.zig
$ ./tlsbug
info: sending request
----
GET https://ziglang.org/download/0.7.0/zig-linux-x86_64-0.7.0.tar.xz HTTP/1.1
Host: ziglang.org
Connection: close

----

info: received headers
----
HTTP/1.1 200 OK
Content-Type: application/x-xz
Content-Length: 37154432
Connection: close
Date: Fri, 03 Feb 2023 23:12:59 GMT
x-amz-meta-s3cmd-attrs: atime:1604890484/ctime:1604890546/gid:100/gname:users/md5:b29ab9c96c8f7963b36e11511f75447a/mode:33188/mtime:1604890546/uid:1000/uname:andy
Cache-Control: public, max-age=31536000, immutable
Last-Modified: Mon, 09 Nov 2020 03:18:34 GMT
ETag: "97e705bb6119a7c60a3e589506224e19-3"
Server: AmazonS3
X-Cache: Hit from cloudfront
Via: 1.1 b2d3922a177f6cecf9222a78a0a1ad32.cloudfront.net (CloudFront)
X-Amz-Cf-Pop: SEA19-C3
X-Amz-Cf-Id: Hok7dxOfwRHyeB9jTuDGMpV01C1wn9T-29QGVKGvBajduqZa46XlNA==
Age: 162960
----

info: got 8192 bytes
[ ... snipped a bunch of duplicate lines ... ]
info: got 8192 bytes
error: TlsConnectionTruncated
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/crypto/tls/Client.zig:917:13: 0x3670f8 in readvAdvanced__anon_7479 (tlsbug)
            return error.TlsConnectionTruncated;
            ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/crypto/tls/Client.zig:831:19: 0x36645b in readvAtLeast__anon_7478 (tlsbug)
        var amt = try c.readvAdvanced(stream, iovecs[vec_i..]);
                  ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/crypto/tls/Client.zig:792:5: 0x36623f in readAtLeast__anon_7476 (tlsbug)
    return readvAtLeast(c, stream, &iovecs, len);
    ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/crypto/tls/Client.zig:797:5: 0x3661c1 in read__anon_7475 (tlsbug)
    return readAtLeast(c, stream, buffer, 1);
    ^
/home/marler8997/git/ziget/tlsbug.zig:47:21: 0x36fd12 in main (tlsbug)
        const len = try client.read(stream, &buf);
                    ^

If I change the buffer size to 8192 * 10, then I get a different error:

$ ./tlsbug
[ .. output snipped ... ]
info: got 81920 bytes
thread 3123151 panic: reached unreachable code
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/debug.zig:281:14: 0x308b9c in assert (tlsbug)
    if (!ok) unreachable; // assertion failure
             ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/mem.zig:203:30: 0x301f57 in copy__anon_4376 (tlsbug)
    assert(dest.len >= source.len);
                             ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/crypto/tls/Client.zig:993:30: 0x368964 in readvAdvanced__anon_7479 (tlsbug)
            mem.copy(u8, frag[0..in], first);
                             ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/crypto/tls/Client.zig:831:53: 0x36643a in readvAtLeast__anon_7478 (tlsbug)
        var amt = try c.readvAdvanced(stream, iovecs[vec_i..]);
                                                    ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/crypto/tls/Client.zig:792:24: 0x366212 in readAtLeast__anon_7476 (tlsbug)
    return readvAtLeast(c, stream, &iovecs, len);
                       ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/crypto/tls/Client.zig:797:23: 0x366194 in read__anon_7475 (tlsbug)
    return readAtLeast(c, stream, buffer, 1);
                      ^
/home/marler8997/git/ziget/tlsbug.zig:48:36: 0x36fcc8 in main (tlsbug)
        const len = try client.read(stream, &buf);
                                   ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/start.zig:616:37: 0x2ff1a9 in posixCallMainAndExit (tlsbug)
            const result = root.main() catch |err| {
                                    ^
/home/marler8997/zig/0.11.0-dev.1507+6f13a725a/files/lib/std/start.zig:376:5: 0x2fec11 in _start (tlsbug)
    @call(.never_inline, posixCallMainAndExit, .{});
    ^
Aborted (core dumped)

Another thing to note is that in a Wireshark trace, nothing coming from the server looks out of the ordinary as far as I can see. The server isn't trying to close the connection; it seems to just be sending data when the client suddenly closes the connection with RST packets.

Expected Behavior

Building and running tlsbug.zig with the commands above should end with output that looks like this:

[ ... output snipped ... ]
info: got 81920 bytes
info: got 61722 bytes
info: Success!

The program shouldn't intermittently fail.
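
Because the failure is intermittent, it can help to run the binary in a loop until it fails. The following is a minimal sketch in POSIX shell, assuming the binary was built as ./tlsbug with the command shown earlier. The function name run_until_fail and the run-N.log file names are made up for illustration and are not part of zigup or the Zig toolchain.

```shell
#!/bin/sh
# Run a command repeatedly until it fails or the run limit is reached.
# usage: run_until_fail MAX_RUNS CMD [ARGS...]
# (hypothetical helper, not part of any existing tool)
run_until_fail() {
    max_runs=$1
    shift
    i=1
    while [ "$i" -le "$max_runs" ]; do
        # Capture stdout and stderr so the failing run's trace is kept.
        if ! "$@" > "run-$i.log" 2>&1; then
            echo "run $i failed, see run-$i.log"
            return 1
        fi
        # Successful runs are not interesting; drop their logs.
        rm -f "run-$i.log"
        i=$((i + 1))
    done
    echo "no failure in $max_runs runs"
    return 0
}
```

For example, `run_until_fail 50 ./tlsbug` stops at the first failing run and leaves that run's output, including the error trace, in the corresponding log file. The limit of 50 is arbitrary; given the run counts described above, the failure would usually be expected to appear well before that.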

Labels

bug: Observed behavior contradicts documented or intended behavior
contributor friendly: This issue is limited in scope and/or knowledge of Zig internals.
standard library: This issue involves writing Zig code for the standard library.
