Skip to content

Conversation

@chenyukang
Copy link
Member

@chenyukang chenyukang commented Aug 12, 2022

Fixes #100388

@rust-highfive
Copy link
Contributor

r? @petrochenkov

(rust-highfive has picked a reviewer for you, use r? to override)

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 12, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 12, 2022
@rust-log-analyzer

This comment has been minimized.

let token = unicode_chars::check_for_substitution(self, start, c, &mut err);
if c == '\x00' {
err.help("source files must contain UTF-8 encoded text, unexpected null bytes might occur when a different encoding is used");
} else if let Ok(v) = &err.suggestions && v.len() == 0 && !c.is_ascii() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a different heuristic for "is not visible when rendered", since this triggers on every non-ascii char.

Also, why are you checking err.suggestions here?

Copy link
Member Author

@chenyukang chenyukang Aug 12, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a different heuristic for "is not visible when rendered", since this triggers on every non-ascii char.

Also, why are you checking err.suggestions here?

Yes, I'm still thinking any better method for check a unicode...

I'm checking the err.suggestions here, because we already have a strategy to give suggestions for those possible substitution,

let token = unicode_chars::check_for_substitution(self, start, c, &mut err);
.

My idea is if we already have such kind of suggestion, we won't add more, since it's clear enough:

 --> /home/cat/code/rust/src/test/ui/parser/unicode-quote-chars.rs:2:14
  |
2 |     println!(“hello world”);
  |              ^
  |
= help: Unicode character \u{323} might not be visible when rendered
help: Unicode characters '“' (Left Double Quotation Mark) and '”' (Right Double Quotation Mark) look like '"' (Quotation Mark), but are not

I think = help: Unicode character \u{323} might not be visible when rendered here will be redundant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm checking the err.suggestions here, because we already have a strategy to give suggestions for those possible substitution,

I think if you change your PR to only note characters that have a rendered width of zero (i.e. combining characters, etc) then this check is redundant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't found a better way to figure out "not visible when rendered" chars.
We have a test case in

fn is_es_whitespace(self) -> bool {

The unicode_space_separator, unicode_space_separator and combining_spacing_mark categories seem contain the chars which are hard to spot, but those are also not totaly 'not visible`, and I think introduce those big tables into compiler code base maybe too heavy ..

@petrochenkov
Copy link
Contributor

petrochenkov commented Aug 12, 2022

@rustbot author
(to address compiler-errors' comments)

@rustbot
Copy link
Collaborator

rustbot commented Aug 12, 2022

Error: Parsing shortcut command in comment failed: ...'bot author' | error: expected end of command at >| ' (to addre'...

Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 12, 2022
@CAD97
Copy link
Contributor

CAD97 commented Aug 13, 2022

You'll need to use the unicode table generator to generate a table of the characters with General Category=Nonspacing Mark and/or Bidi_Class=Nonspacing_Mark for which to warn on. Adding the table to rustc is probably fine, adding it to std is nondesirable.

The UI test is unrelated.

@bors
Copy link
Collaborator

bors commented Sep 28, 2022

☔ The latest upstream changes (presumably #102302) made this pull request unmergeable. Please resolve the merge conflicts.

@JohnCSimon
Copy link
Member

ping from triage:
@chenyukang

Can you please address the merge conflicts - and post your status on this PR?

FYI: when a PR is ready for review, send a message containing
@rustbot ready to switch to S-waiting-on-review so the PR is in the reviewer's backlog.

@chenyukang chenyukang closed this Nov 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Confusing error message in the presence of unicode combining characters

9 participants