Add --no_packed flag for C/C++ generation without packed structs#263
Add --no_packed flag for C/C++ generation without packed structs#263
--no_packed flag for C/C++ generation without packed structs#263Conversation
… structs Co-authored-by: rijesha <7819200+rijesha@users.noreply.github.com>
…ides Co-authored-by: rijesha <7819200+rijesha@users.noreply.github.com>
Co-authored-by: rijesha <7819200+rijesha@users.noreply.github.com>
--no_packed flag for C/C++ generation without packed structs
|
@copilot do a thorough review of these changes, the strategy from a high level perspective and the documentation |
There was a problem hiding this comment.
Pull request overview
Adds a new --no_packed CLI flag intended to generate C/C++ code without packed structs (#pragma pack), switching fixed-size messages from whole-struct memcpy serialization to field-by-field serialization while keeping the wire format unchanged.
Changes:
- Added
--no_packedargparse flag and included it in the generation hash. - Threaded
no_packedinto C/C++ generators; conditionally omitted#pragma packand generated alternative serialize/deserialize helpers. - Updated CLI and code-generation documentation to describe
--no_packed.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
src/struct_frame/generate.py |
Adds --no_packed flag, threads it into C/C++ generation, and hashes it into the generation hash. |
src/struct_frame/c_gen.py |
Omits #pragma pack(1) when enabled and generates _serialize/_deserialize_no_packed helpers for fixed-size messages. |
src/struct_frame/cpp_gen.py |
Omits #pragma pack(push, 1)/pop when enabled and generates _serialize_fields/_deserialize_fields helpers for fixed-size messages. |
docs/reference/cli-reference.md |
Documents the new CLI flag. |
docs/basic-usage/code-generation.md |
Adds a section describing no-packed generation and its intended guarantees. |
| | Flag | Description | | ||
| |------|-------------| | ||
| | `--equality` | Generate equality comparison operators/methods for messages | | ||
| | `--no_packed` | Generate C/C++ code without packed structs. Uses field-by-field serialize/deserialize for all messages instead of relying on struct memory layout | |
There was a problem hiding this comment.
The --no_packed description promises field-by-field serialization that doesn’t rely on struct memory layout. The current no_packed generator code still uses memcpy of in-memory fields and doesn’t account for padding in nested structs / {count,data} helper structs / oneofs, so the description may overstate correctness. Please either update the generators to be layout-independent for all field kinds, or soften the wording here to reflect any limitations.
| | `--no_packed` | Generate C/C++ code without packed structs. Uses field-by-field serialize/deserialize for all messages instead of relying on struct memory layout | | |
| | `--no_packed` | Generate C/C++ code without packed structs. Reduces reliance on struct memory layout by using field-by-field serialization for top-level messages, but some nested/helper layouts may still be used | |
| parser.add_argument('--no_packed', action='store_true', | ||
| help='Generate code without packed structs. Uses field-by-field serialize/deserialize for all messages instead of memcpy (C/C++ only)') |
There was a problem hiding this comment.
--no_packed materially changes the generated C/C++ wire-serialization logic, but the test runner (tests/run_tests.py) always generates code without this flag. To prevent regressions, add at least one CI test path that runs generation with --no_packed for C and C++ and compiles/runs the existing round-trip tests (or add a focused generator-level test asserting no #pragma pack and that encode/decode still matches).
| result += f' // Variable-length format\n' | ||
| result += f' return _deserialize_variable(buffer, buffer_size);\n' | ||
| result += f' }}\n' | ||
| elif no_packed and msg.fields: | ||
| # No-packed mode: use field-by-field deserialization | ||
| result += f' return _deserialize_fields(buffer, buffer_size);\n' |
There was a problem hiding this comment.
Even with --no_packed enabled, variable messages still have a buffer_size == MAX_SIZE fast-path in this same generated deserialize() that uses direct memcpy into the struct. Once packing pragmas are omitted, that memcpy path will mis-populate fields due to padding/alignment (including the nested {count,data} helper structs for variable arrays/strings). In no_packed mode, the MAX_SIZE encoding needs a field-by-field decoder (or an internal packed wire struct) instead of memcpy into this.
| result += f' size_t _serialize_fields(uint8_t* buffer) const {{\n' | ||
| result += f' size_t pos = 0;\n' | ||
|
|
||
| for key, field in msg.fields.items(): | ||
| result += f' std::memcpy(buffer + pos, &{field.name}, {field.size});\n' | ||
| result += f' pos += {field.size};\n' | ||
|
|
||
| result += f' return pos;\n' | ||
| result += f' }}\n' | ||
|
|
||
| # Internal deserialize helper: reads each field individually | ||
| result += f'\n /**\n' | ||
| result += f' * Deserialize each field individually (no packed struct dependency).\n' | ||
| result += f' * @param buffer Input buffer\n' | ||
| result += f' * @param buffer_size Size of input buffer\n' | ||
| result += f' * @return Number of bytes consumed, or 0 if buffer too small\n' | ||
| result += f' */\n' | ||
| result += f' size_t _deserialize_fields(const uint8_t* buffer, size_t buffer_size) {{\n' | ||
| result += f' if (buffer_size < {wire_size}) return 0;\n' | ||
| result += f' size_t pos = 0;\n' | ||
| result += f' std::memset(this, 0, sizeof(*this));\n' | ||
|
|
||
| for key, field in msg.fields.items(): | ||
| result += f' std::memcpy(&{field.name}, buffer + pos, {field.size});\n' | ||
| result += f' pos += {field.size};\n' | ||
|
|
There was a problem hiding this comment.
_serialize_fields() / _deserialize_fields() currently memcpy each field’s in-memory bytes. In no_packed mode that still relies on C++ layout for (1) nested message fields (padding inside nested structs), (2) variable array/string helper structs (padding between count/length and data), and (3) it ignores oneof discriminator + union payload entirely (only iterates msg.fields). This will produce/consume an incorrect wire layout for those cases. The no_packed implementation needs to serialize using the wire layout explicitly (copy primitives, write count/length then data for variable fields, recurse into nested messages, and include oneof bytes).
| """Generate field-by-field serialize/deserialize for non-variable messages when packed structs are disabled.""" | ||
| result = '' | ||
|
|
||
| # Serialize: write each field individually into the buffer | ||
| result += f'\n/**\n' | ||
| result += f' * Serialize {structName} field-by-field (no packed struct dependency).\n' | ||
| result += f' * @param msg Pointer to the message to serialize\n' | ||
| result += f' * @param buffer Output buffer (must be at least {defineName}_MAX_SIZE bytes)\n' | ||
| result += f' * @return Number of bytes written ({defineName}_MAX_SIZE)\n' | ||
| result += f' */\n' | ||
| result += f'static inline size_t {structName}_serialize_no_packed(const {structName}* msg, uint8_t* buffer) {{\n' | ||
| result += f' size_t pos = 0;\n' | ||
|
|
||
| for key, field in msg.fields.items(): | ||
| result += f' memcpy(buffer + pos, &msg->{field.name}, {field.size});\n' | ||
| result += f' pos += {field.size};\n' | ||
|
|
||
| result += f' return pos;\n' | ||
| result += f'}}\n' | ||
|
|
||
| # Deserialize: read each field individually from the buffer | ||
| result += f'\n/**\n' | ||
| result += f' * Deserialize {structName} field-by-field (no packed struct dependency).\n' | ||
| result += f' * @param buffer Input buffer\n' | ||
| result += f' * @param buffer_size Size of the input buffer\n' | ||
| result += f' * @param msg Pointer to the message to deserialize into\n' | ||
| result += f' * @return Number of bytes consumed, or 0 if buffer too small\n' | ||
| result += f' */\n' | ||
| result += f'static inline size_t {structName}_deserialize_no_packed(const uint8_t* buffer, size_t buffer_size, {structName}* msg) {{\n' | ||
| result += f' if (buffer_size < {defineName}_MAX_SIZE) return 0;\n' | ||
| result += f' size_t pos = 0;\n' | ||
| result += f' memset(msg, 0, sizeof(*msg));\n' | ||
|
|
||
| for key, field in msg.fields.items(): | ||
| result += f' memcpy(&msg->{field.name}, buffer + pos, {field.size});\n' | ||
| result += f' pos += {field.size};\n' | ||
|
|
||
| result += f' return pos;\n' | ||
| result += f'}}\n' | ||
|
|
||
| return result | ||
|
|
There was a problem hiding this comment.
{structName}_serialize_no_packed() / {structName}_deserialize_no_packed() currently memcpy each field’s in-memory bytes. Without #pragma pack(1), this breaks the intended wire layout for (1) nested message fields (padding inside nested structs), and (2) variable array/string helper structs (padding between count/length and data). Also, oneofs are not serialized/deserialized at all here (only msg.fields are iterated), so messages with oneofs will have incorrect wire compatibility in no_packed mode. These functions should write/read the wire layout explicitly (copy primitives, write count/length then data, recurse into nested messages, and include oneof discriminator + union payload bytes).
| """Generate field-by-field serialize/deserialize for non-variable messages when packed structs are disabled.""" | |
| result = '' | |
| # Serialize: write each field individually into the buffer | |
| result += f'\n/**\n' | |
| result += f' * Serialize {structName} field-by-field (no packed struct dependency).\n' | |
| result += f' * @param msg Pointer to the message to serialize\n' | |
| result += f' * @param buffer Output buffer (must be at least {defineName}_MAX_SIZE bytes)\n' | |
| result += f' * @return Number of bytes written ({defineName}_MAX_SIZE)\n' | |
| result += f' */\n' | |
| result += f'static inline size_t {structName}_serialize_no_packed(const {structName}* msg, uint8_t* buffer) {{\n' | |
| result += f' size_t pos = 0;\n' | |
| for key, field in msg.fields.items(): | |
| result += f' memcpy(buffer + pos, &msg->{field.name}, {field.size});\n' | |
| result += f' pos += {field.size};\n' | |
| result += f' return pos;\n' | |
| result += f'}}\n' | |
| # Deserialize: read each field individually from the buffer | |
| result += f'\n/**\n' | |
| result += f' * Deserialize {structName} field-by-field (no packed struct dependency).\n' | |
| result += f' * @param buffer Input buffer\n' | |
| result += f' * @param buffer_size Size of the input buffer\n' | |
| result += f' * @param msg Pointer to the message to deserialize into\n' | |
| result += f' * @return Number of bytes consumed, or 0 if buffer too small\n' | |
| result += f' */\n' | |
| result += f'static inline size_t {structName}_deserialize_no_packed(const uint8_t* buffer, size_t buffer_size, {structName}* msg) {{\n' | |
| result += f' if (buffer_size < {defineName}_MAX_SIZE) return 0;\n' | |
| result += f' size_t pos = 0;\n' | |
| result += f' memset(msg, 0, sizeof(*msg));\n' | |
| for key, field in msg.fields.items(): | |
| result += f' memcpy(&msg->{field.name}, buffer + pos, {field.size});\n' | |
| result += f' pos += {field.size};\n' | |
| result += f' return pos;\n' | |
| result += f'}}\n' | |
| return result | |
| """ | |
| Historically, this generated memcpy-based {structName}_serialize_no_packed / | |
| {structName}_deserialize_no_packed helpers that copied each field's in-memory | |
| bytes directly. That approach is unsafe for nested messages, helper structs for | |
| variable-length fields, and oneofs, because it does not match the defined wire | |
| layout when natural struct padding is present. | |
| To avoid emitting incorrect serialization code, this generator now omits these | |
| helpers entirely. If no-packed-struct serializers are required in the future, | |
| they must be implemented with explicit, per-field wire-layout logic. | |
| """ | |
| return '' |
| yield '#pragma once\n' | ||
| yield '#pragma pack(1)\n' | ||
| if not no_packed: | ||
| yield '#pragma pack(1)\n' | ||
| yield '#include <stdbool.h>\n' |
There was a problem hiding this comment.
When --no_packed omits #pragma pack(1), any remaining code paths that deserialize/serialize via whole-struct memcpy (e.g., variable-message MAX_SIZE encoding, or fixed-message fallbacks when no_packed helpers aren’t generated) will no longer match the on-the-wire layout due to padding/alignment. To keep the wire format identical, all MAX_SIZE encoding paths should use explicit wire-layout (field-by-field) reads/writes (including oneofs), or use a separate packed wire representation strictly for I/O.
| By default, generated C and C++ code uses `#pragma pack(1)` for struct definitions, allowing direct memory-mapped serialization via `memcpy`. Use `--no_packed` to disable this and generate field-by-field serialize/deserialize functions instead: | ||
|
|
||
| ```bash | ||
| python -m struct_frame messages.proto --build_c --build_cpp --no_packed | ||
| ``` | ||
|
|
||
| This is useful when: | ||
|
|
||
| - Your platform or compiler does not support packed structs | ||
| - You want to avoid unaligned memory access penalties | ||
| - You need portable code that does not rely on compiler-specific packing behavior | ||
|
|
||
| When `--no_packed` is used, all non-variable messages get the same field-by-field serialization approach that variable messages already use. The wire format remains identical, so no_packed and packed code can interoperate. |
There was a problem hiding this comment.
This section states that --no_packed generates “field-by-field serialize/deserialize” and that the wire format remains identical. With the current generator implementation, no_packed paths still memcpy field memory and (at least) don’t handle oneofs and padded helper structs (variable arrays/strings, nested messages), so the on-the-wire layout may diverge. Either tighten the documentation to describe the current limitations, or adjust the generators so no_packed truly serializes the wire layout independent of C/C++ struct padding.
Adds a CLI flag to generate C/C++ code without
#pragma pack, using field-by-field serialize/deserialize for all messages instead of whole-structmemcpy.Variable messages already generate field-by-field functions; this extends that approach to all messages when the flag is active. Wire format is identical, so packed and no_packed code interoperate.
Changes
generate.py: New--no_packedargparse flag, threaded through to C/C++ generators and included in generation hashc_gen.py: Conditionally omits#pragma pack(1); generates_serialize_no_packed()/_deserialize_no_packed()for non-variable messages;serialize()/deserialize()delegate to themcpp_gen.py: Conditionally omits#pragma pack(push, 1)/pop; generates_serialize_fields()/_deserialize_fields()methods inside each struct;serialize()/deserialize()delegate to themcli-reference.mdandcode-generation.mdUsage
Original prompt
💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.