
Adding Polish wordlist to BIP39#1037

Closed
KarolTrzeszczkowski wants to merge 4 commits into bitcoin:master from KarolTrzeszczkowski:master

Conversation

@KarolTrzeszczkowski

Words chosen using the following rules:

  1. Words are 4-8 letters long.
  2. Each word can be uniquely identified by typing its first 4 letters.
  3. Special Polish characters such as 'ą', 'ę', 'ć' are considered equal to 'a', 'e', 'c' for the purpose of identifying a word. There is therefore no need to use a Polish keyboard to enter the mnemonic words: an application with the Polish wordlist can identify each word once its first 4 characters have been typed, even if accented characters have been replaced with their unaccented equivalents.
  4. All words are in basic form.
  5. No personal names or geographical names.
  6. No very similar words: any two words differ by at least 2 letters.
  7. Words are sorted according to the English alphabet, ignoring diacritics.
  8. No words already used in other languages' mnemonic sets (English, Italian, French, Spanish, Czech).
  9. Built from the most popular Polish words, based on the Open frequency dictionary of lexemes.
  10. Words include negative and bad things, as those are easier to remember.
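
Rules 1 to 3 lend themselves to an automated check. The following is only a minimal sketch, not the script used for this PR: the names `fold` and `check_wordlist` are hypothetical, and the sample words are illustrative and not taken from the proposed list.

```python
import unicodedata

# "ł" has no Unicode decomposition, so it needs an explicit mapping (assumption:
# it should be treated as "l", as rule 3 suggests for other Polish letters).
_NO_DECOMPOSITION = str.maketrans({"ł": "l", "Ł": "L"})


def fold(word: str) -> str:
    """Replace Polish diacritics with their plain Latin equivalents."""
    decomposed = unicodedata.normalize("NFD", word.translate(_NO_DECOMPOSITION))
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def check_wordlist(words, min_len=4, max_len=8, prefix_len=4):
    """Return a list of human-readable problems for rules 1-3."""
    problems = []
    seen_prefixes = {}
    for word in words:
        folded = fold(word)
        # Rule 1: length is counted in letters, so measure the folded form.
        if not min_len <= len(folded) <= max_len:
            problems.append(f"length: {word}")
        # Rules 2 and 3: the first 4 letters, ignoring diacritics, must be unique.
        prefix = folded[:prefix_len]
        if prefix in seen_prefixes:
            problems.append(f"prefix: {word} vs {seen_prefixes[prefix]}")
        else:
            seen_prefixes[prefix] = word
    return problems


if __name__ == "__main__":
    print(check_wordlist(["bank", "łódka", "mąka", "makaron"]))
```

An empty result means the sample passes these three rules; the other rules (basic form, no proper names, memorability) still need a human reader.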

Unlike #753, this wordlist is based on a dictionary of popular words and does not include any words used in other languages' mnemonic sets. It also differs by using Polish characters. Please consider merging it.

@bitmover-studio
Contributor

Hello,
I have created a similar list #998 (still not approved).
I just used the same script to check your list.

Your list is very good. The Levenshtein distance is greater than 1 for every pair of words, and I found no violations of the other rules.

The only word that I would change is this one:

mama

It duplicates a word from the Spanish list, mamá.

Since most software won't be able to distinguish mama from mamá, I would change this one.
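
A collision of this kind can be looked for mechanically by comparing words after accents are removed. This is a rough sketch under stated assumptions: `strip_accents` and `cross_list_collisions` are hypothetical names, and the short lists stand in for the real wordlists.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Drop combining marks so that 'mamá' and 'mama' compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def cross_list_collisions(candidate, existing):
    """Return (candidate word, existing word) pairs that match once accents are ignored."""
    taken = {strip_accents(w): w for w in existing}
    return [
        (w, taken[strip_accents(w)])
        for w in candidate
        if strip_accents(w) in taken
    ]


if __name__ == "__main__":
    polish_sample = ["mama", "zamek"]    # illustrative sample only
    spanish_sample = ["mamá", "zorro"]   # illustrative sample only
    print(cross_list_collisions(polish_sample, spanish_sample))
```

Letters without a Unicode decomposition, such as the Polish ł, are not affected by this normalization and would need separate handling.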

Great work!

Words like mama conflict with Spanish mamá. This commit removes all such words.
@KarolTrzeszczkowski
Author

Thank you for the nice words and catching the collision!

I was able to identify more such word collisions and I removed them:
faraon
interes
ironia
legion
mama
tabu
teoria
tunel

@GitHub-pepe

3f34351

@p2w34

p2w34 commented Nov 29, 2020

Having been called to the blackboard by seeing my PR referenced, I feel obliged to share some thoughts. Here is how I see it:

  • Polish wordlist for BIP0039 #753 was created by manually selecting words from a well-respected dictionary (https://sjp.pl/slownik/odmiany/). I strongly believe that manual selection is far better than using any list sorted by frequency of usage; it is not true that the more frequently a word is used, the better it fits.
  • I have commented multiple times on the idea of avoiding words already used in other wordlists. To avoid repeating myself: it brings more cons than pros and should be dropped.
  • I do not think that creating yet another version of the Polish wordlist was necessary, especially without first commenting on the existing PR. What purpose does it serve?
  • Last but not least, the sad truth is that, like other PRs, this effort is wasted as well, and this PR will most likely never get merged. The time and energy spent on it could surely have been put to better use. To avoid repeating myself, see my comment in Add Portuguese wordlist to BIP39 #998.

@KarolTrzeszczkowski
Author

@luke-jr Could you please take a look at my pull request?

@michaelfolkson

As I understand it, there are two competing PRs to add a Polish wordlist currently open: this one and #753.

I don't speak Polish and afaik Luke and the BIP 39 authors don't either. Before we ask one of the BIP authors to ACK this (which is needed to merge it) we are going to need Polish speaker(s) who ideally understand BIP 39 to look over this and judge which PR should be merged (if any).

This PR looks high quality to me but I am neither a Polish speaker nor a BIP author.

@michaelfolkson

Also potentially relevant to this PR is #1047, from one of the BIP 39 authors.

@tkowalczyk

Please consider this PR. It looks promising and will definitely be valuable for the community.

The code in this PR is not complicated, so I believe it will not have a negative impact on the project or on its efficiency and security.

@KrzychuLSK

KrzychuLSK commented Dec 28, 2020

Great idea! It will be very valuable for the community!
I'm a native Polish speaker, so for me this wordlist will be perfect.

@cornl1

cornl1 commented Dec 28, 2020

Looks really good to me. It may have a positive impact on the Polish community, especially newcomers.

@Wojtekop

A Polish wordlist will be amazing. It will help every native Polish speaker like me.

@p2w34

p2w34 commented Dec 28, 2020

NACK from my side.

One does not have to spend more than a minute to find words that are considered offensive. I was also struck by the incorrect order of words at the end of the list. The chosen set of words looks strange to me. I am under the impression that the list was generated automatically, without really trying to polish it. Not to mention that the proper approach would be to select all the words manually. And what I really cannot understand is the comments above: are they just quick comments (like doing a favor?), made without putting in the effort to at least read the list?
Last but not least, I still do not understand the reason behind this list when another PR had already been created.

I am not impressed, it does not look good, hence the NACK.

@KarolTrzeszczkowski
Author

KarolTrzeszczkowski commented Dec 28, 2020

@p2w34 I explained in the description that I included offensive words as they are loaded with emotions and easy to remember. Seed words are private so there is no reason to avoid them. If it is required, I will remove them.

Could you point me directly to the incorrect ordering? Thank you.

The reason I created this list was that you refused to include feedback from other people, and I didn't like your choice of words at all. They are mostly weird and not memorable at all. Judging from your attitude and how proud you are of your work, I expect you'd refuse my feedback just as you refused to include other people's feedback.

@p2w34

p2w34 commented Dec 28, 2020

> The reason I created this list was that you refused to include feedback from other people, and I didn't like your choice of words at all. They are mostly weird and not memorable at all. Judging from your attitude and how proud you are of your work, I expect you'd refuse my feedback just as you refused to include other people's feedback.

The only reason I am spending my time on the various discussions here is that I am worried about the quality of the wordlists. And I cannot say that I am having a good time; on the contrary. I may make comments that sound harsh, but I do this only when absolutely necessary. All the comments made in the other PR, with the Polish list that I created, were addressed.
And yes, you got me right: I am proud of my work.

As my final comment, and repeating myself: I am of the opinion that BIP0039 should not be continued in its current form. The problem of wordlists should be approached separately, in a more holistic manner. This is, however, for the BIP0039 maintainers to decide. Alternatively, one may simply write a new proposal.

@KarolTrzeszczkowski
Author

@p2w34 could you point me to the ordering error you mentioned?

@KarolTrzeszczkowski
Author

KarolTrzeszczkowski commented Dec 28, 2020

I don't think the author of the competing PR should leave a NACK here and lie about an ordering error that would have been found in the initial algorithmic check performed by @bitmover-studio. It's clear that this is nothing but an ego battle that has nothing to do with the quality of the proposed wordlist.

@p2w34

p2w34 commented Dec 28, 2020

> I don't think the author of a competing PR should leave a NACK for a competing PR and lie about an ordering error

There are words starting with ł that are placed at the end of the list instead of together with the words starting with l. Ideally, the algorithmic checks you mention should be done by a native speaker.
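
One way such a misplacement can arise, offered only as a possible explanation and not confirmed anywhere in this thread: Unicode normalization splits letters like ą or ż into a base letter plus a combining mark, but ł has no such decomposition, so a sort key that only strips combining marks leaves ł in place and it sorts after z. In the sketch below, `fold_nfd_only` and `fold_polish` are hypothetical names and the words are illustrative.

```python
import unicodedata


def fold_nfd_only(word: str) -> str:
    """Strip combining marks only; 'ł' survives because it does not decompose."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_polish(word: str) -> str:
    """Map 'ł' to 'l' explicitly before stripping the remaining diacritics."""
    return fold_nfd_only(word.replace("ł", "l").replace("Ł", "L"))


words = ["lampa", "łapa", "mleko", "żaba"]  # illustrative sample only

naive_order = sorted(words, key=fold_nfd_only)
fixed_order = sorted(words, key=fold_polish)

if __name__ == "__main__":
    print(naive_order)  # 'łapa' ends up last
    print(fixed_order)  # 'łapa' sits next to 'lampa'
```

Checking that a list equals its own `fixed_order` is a quick test for the ordering rule given in the PR description.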

> It's clear that it's nothing but an ego battle having nothing to do with the quality of the proposed wordlist.

Again, it is not.

@KarolTrzeszczkowski
Author

> There are words starting with ł that are placed at the end of the list, instead of being together with l

You are right. I am sorry for the accusation. It's super weird that my algorithm missed it and bitmover's check didn't catch it. I'm sorry once again. I will fix it.

@ZenulAbidin
Contributor

ZenulAbidin commented Jan 4, 2021

> > There are words starting with ł that are placed at the end of the list, instead of being together with l
>
> You are right. I am sorry for the accusation. It's super weird that my algorithm missed it and bitmover's check didn't catch it. I'm sorry once again. I will fix it.

I checked the latest revision of your wordlist (which should be this one, correct me if I'm wrong) using my tool bip39validator, and my log output (https://paste.ubuntu.com/p/Jwc83KJ8ZB/) says that all words are <= 8 chars, have no accents, are unique within the first 4 letters, and that every pair of words has a Levenshtein distance of at least 2. Those are the default parameters it runs with.

Those are three of the four major checks that a BIP39 wordlist should be tested against, but you currently have to make the fourth check by hand: ensuring that no words in this list are similar to words in other (merged) languages' lists.
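
For readers unfamiliar with the Levenshtein check mentioned above, here is a small self-contained sketch. It does not reproduce bip39validator's implementation or interface; `levenshtein` and `too_similar` are hypothetical names, and the sample words are illustrative.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        previous = current
    return previous[-1]


def too_similar(words, min_distance=2):
    """Return every pair of words closer than min_distance."""
    return [
        (a, b)
        for a, b in combinations(words, 2)
        if levenshtein(a, b) < min_distance
    ]


if __name__ == "__main__":
    print(too_similar(["kot", "kos", "zamek"]))
```

Comparing all pairs is quadratic, which for a 2048-word list is roughly two million comparisons and still practical for a one-off check. The same function could be applied across two lists for the fourth check, though what counts as "similar" there is a judgment call.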

I should mention that I am not a Polish speaker either.

@luke-jr
Member

luke-jr commented Jul 2, 2021

@luke-jr closed this on Jul 2, 2021