zig fmt makes spacing worse in array initializers with Unicode literals #15929

@djpohly

Zig Version

0.11.0-dev.3333+32e719e07

Steps to Reproduce and Observed Behavior

Feed the following nicely formatted example to zig fmt:

const zigg_zagg = [_]u21{
    'ℤ', '1', '2', 'Ẑ', '4', 'ž', 'Ẑ', '7',
    '8', 'Ẑ', 'ž', 'b', 'Ẑ', 'd', 'e', 'ℤ',
};

and it "improves" the spacing in an attempt to align columns:

const zigg_zagg = [_]u21{
    'ℤ', '1',   '2',  'Ẑ', '4',   'ž', 'Ẑ', '7',
    '8',   'Ẑ', 'ž', 'b',   'Ẑ', 'd',  'e',   'ℤ',
};
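
The padding in the output above matches what you would get if column widths were computed from UTF-8 byte lengths rather than from displayed columns. That is an inference from the output, not something checked against the formatter source. A minimal sketch of the mismatch follows; `codepointCount` is a hypothetical helper written for illustration, and it assumes valid UTF-8 input:

```zig
const std = @import("std");

/// Hypothetical helper: counts code points in a UTF-8 slice by counting
/// every byte that is not a continuation byte (0b10xxxxxx).
/// Assumes the input is valid UTF-8.
fn codepointCount(s: []const u8) usize {
    var n: usize = 0;
    for (s) |b| {
        if ((b & 0xC0) != 0x80) n += 1;
    }
    return n;
}

test "byte length differs from code point count for non-ASCII literals" {
    // The token 'ℤ' (U+2124): two quote bytes plus three UTF-8 bytes.
    const z_token = "'\u{2124}'";
    try std.testing.expect(z_token.len == 5);
    try std.testing.expect(codepointCount(z_token) == 3);

    // The token '1' is pure ASCII, so both measures agree.
    const one_token = "'1'";
    try std.testing.expect(one_token.len == 3);
    try std.testing.expect(codepointCount(one_token) == 3);
}
```

Both tokens occupy three columns in most editors, yet their byte lengths differ by two, which is exactly the extra padding added after `'1'` in the first row. Code point count is itself only an approximation of display width, as rejected idea 2 below points out.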

Expected Behavior

The result looks at least as good as the input?

Rejected ideas:

  1. Blame text editors and terminal emulators for displaying these obviously multi-byte sequences in a single column? 😉
  2. Abandon .len and make a pilgrimage to the peak of Mt. Yunikodo to spend seven years meditating on the meaning of "grapheme" and become attuned to the Way of NFKC. We still get complaints because someone's terminal/font renders ½ a different width from .
  3. Don't try to align array initializers, on the assumption that the people reading the characters have a better understanding of how they are rendered. This means trusting humans for certain aspects of code formatting, which for zig fmt is a little bit out of character (ha!).

Better idea (?):

  1. The tokenizer sets a flag for literals that contain non-ASCII characters (or uses a distinct tag such as .string_literal_non_ascii). The formatter then does not attempt to align anything when non-ASCII tokens are involved. This is simple, keeps the common case working as it does today, and avoids making the formatting worse in edge cases.
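
A rough sketch of that check, operating on token source text instead of a tokenizer flag. The names `hasNonAscii` and `shouldAlignColumns` are hypothetical and do not correspond to anything in the actual zig fmt implementation:

```zig
const std = @import("std");

/// Hypothetical helper: true if the token's source bytes contain any
/// byte outside the 7-bit ASCII range.
fn hasNonAscii(token: []const u8) bool {
    for (token) |b| {
        if (b >= 0x80) return true;
    }
    return false;
}

/// Hypothetical policy: only align columns when every element token in
/// the initializer is pure ASCII, so byte length equals column width.
fn shouldAlignColumns(tokens: []const []const u8) bool {
    for (tokens) |t| {
        if (hasNonAscii(t)) return false;
    }
    return true;
}

test "alignment is skipped when any token is non-ASCII" {
    const ascii_only = [_][]const u8{ "'8'", "'b'", "'d'", "'e'" };
    const mixed = [_][]const u8{ "'\u{2124}'", "'1'", "'2'" };
    try std.testing.expect(shouldAlignColumns(&ascii_only));
    try std.testing.expect(!shouldAlignColumns(&mixed));
}
```

In a real implementation the tokenizer could record this flag once per token, as proposed above, so the formatter would not need to rescan the bytes.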

Labels: bug (observed behavior contradicts documented or intended behavior), zig fmt
