Skip to content

Uniq: fix -s and implement -f flags#131

Merged
bors[bot] merged 1 commit intoGrayJack:devfrom
marcospb19:implement-skip-fields-flag
Oct 30, 2020
Merged

Uniq: fix -s and implement -f flags#131
bors[bot] merged 1 commit intoGrayJack:devfrom
marcospb19:implement-skip-fields-flag

Conversation

@marcospb19
Copy link
Contributor

@marcospb19 marcospb19 commented Oct 20, 2020

There are some decisions I have to ask you first before completing and asking for a complete review:

  1. It seems that other implementations of uniq ignore the trailing '\n' at the end of the lines, but does not ignore '\r', what is the desired behavior for me to implement? (I know that this project is only meant to be used in Windows, but \r\n files are still possible).
  2. In Rust we usually deal with UTF-8 encoded text, it happens that the -s flag, commonly named --skip-chars, does not skip UTF-8 chars, but bytes instead (other implementations), see this example:
uniq -s 1
é verdade
ó verdade
a verdade

Uniq will display the 3 lines received.

The problem when comparing different slices of bytes is that:

let text = "não";
let slice = &text[2..];

Throws a runtime panic, because of the multi-byte char 'ã'.

Should I just work this around by using .as_bytes() instead of string slices?


About -s previous errors, the output wasn't showing the full line that first matched with the "pattern". I'll fix tests soon.

@GrayJack
Copy link
Owner

GrayJack commented Oct 20, 2020

  1. Nowadays almost all editors supports windows like line-ending, so I think if a file has it, and it is followed by a "\n" it should be have the same behavior as "\n"

  2. There is something specifying that in the -s should consider only ASCII in the specification? If there isn't I think it should handle UTF-8 gracefully, otherwise, it would be a cool extension a new flag that handles UTF-8

I didn't had the time to look a the code yet, but I'll check as soon as possible

@marcospb19 marcospb19 force-pushed the implement-skip-fields-flag branch from 7b8fd5e to 92c472f Compare October 29, 2020 02:12
@marcospb19 marcospb19 force-pushed the implement-skip-fields-flag branch from 92c472f to c3f5205 Compare October 30, 2020 01:41
@marcospb19 marcospb19 marked this pull request as ready for review October 30, 2020 01:46
@marcospb19
Copy link
Contributor Author

@GrayJack, I had some time to work in fixing -s and implementing -f similar the GNU implementation of uniq.

This means that "\n" is different than "\r\n", and -s|skip-chars=N skips bytes instead (a char from the C language).

I'm sorry I don't have the time to change this :/ so I suggest that you could create an issue for each one if you want to.

@GrayJack GrayJack added the hacktoberfest-accepted Accepted PR for hacktoberfest label Oct 30, 2020
@GrayJack
Copy link
Owner

bors r+

@bors bors bot merged commit ceb606b into GrayJack:dev Oct 30, 2020
@marcospb19 marcospb19 mentioned this pull request Nov 28, 2020
7 tasks
This was referenced Dec 29, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hacktoberfest-accepted Accepted PR for hacktoberfest

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants