Letter number #1298
More later...
Does this help?
| <dt>'LetterNumber'[$c$] | ||
| <dd>returns the position of the character $c$ in the English alphabet. | ||
| <dt>'LetterNumber["string"]' |
I would implement Alphabet[] at least for English (Latin?), along with the second parameter, in such a way that anything other than the default raises a message. In the end, the basic implementation of Alphabet is just a dictionary of the form "alphabetname": "abcd...", isn't it?
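A minimal sketch of that dictionary idea in Python. The names `ALPHABETS` and `letter_number` are hypothetical and not taken from the Mathics code base. Where Mathics would issue a message for an unsupported alphabet, this sketch raises a `ValueError` instead, and returning 0 for a character outside the alphabet is an assumption of the sketch.

```python
# Hypothetical table: alphabet name -> ordered string of lowercase letters.
ALPHABETS = {
    "English": "abcdefghijklmnopqrstuvwxyz",
}


def letter_number(char, alphabet="English"):
    """Return the 1-based position of char in the named alphabet, or 0 if absent."""
    if len(char) != 1:
        raise ValueError("expected a single character")
    try:
        letters = ALPHABETS[alphabet]
    except KeyError:
        # Stand-in for issuing a message about an unsupported alphabet.
        raise ValueError(f"alphabet {alphabet!r} is not supported") from None
    # str.find returns -1 when the character is missing, so adding 1 gives 0.
    return letters.find(char.lower()) + 1
```

With this table, `letter_number("e")` gives 5, and any alphabet name other than "English" is rejected.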
You have a clearer idea of what to do at this point than I do. So if you'd just do it, I'd appreciate it.
Otherwise I am happy to have this hang out as a draft for a while.
For larger context, I have been going over https://www.wolfram.com/language/elementary-introduction/2nd-ed/11-strings-and-text.html and more generally https://www.wolfram.com/language/elementary-introduction/2nd-ed/ which has the attribution Copyright 2021 (and nothing more). The 1st Edition, which I have in book form, has a "non-commercial share-alike" copyright and that is similar.
And the bigger context even here is that these give examples that can be used in worksheets. (See the dockerhub image or the sqlite file in mathics-omnibus and please also see Mathics3/mathics-django#32). But in addition this is a much more gentle way to guide us in filling out the code in a way where users can see immediate results.
The problem I have with FeynCalc, Rubi, KnotTheory and similar packages is that they are hundreds if not thousands of lines long and sometimes use sophisticated constructs in intertwined ways. I think if we have a more solid base to start out with, things will go easier there.
I tried pulling a 2006 version of KnotTheory and tried building it. In terms of the number of things we need to fix, there are far fewer. I counted maybe 3 or 4 things. Still, without those I am not able to get this to run. And at least for me this is kind of disappointing.
> At the end, the basic implementation of Alphabet is just a dictionary with the form "alphabetname": "abcd..." isn't it?
I think there exist Python bindings for the ICU project to create alphabets:
Unfortunately the alphabets don't seem to be exactly the same as in MMA.
I looked at the ICU project and it looks both awesome and bureaucratic.
Our needs here are extremely simple and basic: for each language, give me an ordered list of the alphabet with a way to convert from one case to the other. And we don't even need bidirectional conversion, so you can choose which case to start out with.
However, although I can see how to do case conversion using Unicode properties (and you'd think that a library would use that to provide a function equivalent to "lower" or "upper"), there isn't anything that says: give me the first letter in the alphabet and iterate to the last letter, as far as I can tell.
But I could be wrong here. The documentation, while extensive, isn't all that useful. Sigh.
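As a rough illustration of the case-conversion point: Python's built-in `str.upper` already applies the Unicode case mappings, so only one case needs to be stored per alphabet. The ordered letter list itself still has to be written out by hand, since nothing in the standard library enumerates a language's alphabet. `LOWERCASE` and `uppercase_alphabet` are hypothetical names used only for this sketch.

```python
import unicodedata

# Hypothetical hand-written table; only the lowercase form is stored.
LOWERCASE = {
    "English": "abcdefghijklmnopqrstuvwxyz",
    "Spanish": "abcdefghijklmn\u00f1opqrstuvwxyz",  # \u00f1 is "ñ"
}


def uppercase_alphabet(name):
    """Derive the uppercase alphabet from the stored lowercase one."""
    return "".join(ch.upper() for ch in LOWERCASE[name])


def is_lowercase_letter(ch):
    """Unicode can say a character is a lowercase letter ("Ll"),
    but not which language's alphabet it belongs to."""
    return unicodedata.category(ch) == "Ll"
```

One caveat: the mapping is not always one character to one character. For example, `"ß".upper()` is `"SS"` in Python, so an alphabet containing such a letter would need its uppercase form listed explicitly.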
I think that an ICU-based Alphabet is something to implement in an external module. Here, I would limit this to defining a basic set of alphabets (let's say the "English", "Spanish", "German", and "Greek" alphabets, which is what I could handle). A Pymathics module can then overload this basic implementation.
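One way the split between a small built-in set and an external override could look, sketched in plain Python. This is not how Pymathics modules actually hook into Mathics; `_ALPHABETS`, `register_alphabet`, and `alphabet` are hypothetical names, and the Greek entry stands in for whatever an external module would supply.

```python
# Hypothetical built-in table shipped with the core.
_ALPHABETS = {
    "English": "abcdefghijklmnopqrstuvwxyz",
}


def register_alphabet(name, letters):
    """Add or replace an alphabet; an external module would call this on load."""
    _ALPHABETS[name] = letters


def alphabet(name="English"):
    """Return the ordered list of letters, or raise KeyError if unknown."""
    return list(_ALPHABETS[name])


# Stand-in for an external module extending the basic set.
register_alphabet("Greek", "αβγδεζηθικλμνξοπρστυφχψω")
```

Because later registrations replace earlier ones, an ICU-backed module could overwrite the hand-written entries without the core needing to know about ICU.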
With ICU more things like IntegerName and Transliterate can be created:
Yes, I was thinking along the lines of @mmatera, where this would be an external Pymathics library.
There seems to be a lot of functionality somewhere in ICU, which extends beyond just Alphabet.
At some point I will post a query on StackOverflow. But if you look at past queries on this topic they generally are met with derision. I just upvoted https://stackoverflow.com/questions/32375797/what-unicode-ranges-are-considered-letters and it appears I am the only one to have done so just now after 5 1/2 years with no great answers.
Thanks @axkr for the link and suggestion. Do you mind if we port that code to Python when we get around to writing the PyMathics module?
No problem if you want to port it.
In that case, OK, but if you want to merge this, I think it does not hurt. I can improve this later.
I'll make a pass at some point to sync with the error messages. During the week, basically I have only small chunks of time to do things. Whatever can be done in this time, I do, but things that don't fit have to wait.
@mmatera I think this is ready for this level of detail. However one code path that we don't test is found in this example: and that's because I don't know if the above is what is expected.
In WMA, the output of that sentence is
Should be addressed in 71b755d
support for Alphabets
@mmatera I was thinking about this a little more. For a small subset of cases not requiring a But please, let us not extend this in core this way. Instead let us delegate this out to a Pymathics module which is based on something that purports to handle more general language support.
| "Uppercase": "ABCDEFGHIJKLMNOPQRSTUVWXYZ", | ||
| }, | ||
| "Spanish": { | ||
| "Lowercase": "abcdefghijklmnñopqrstuvwxyz", |
WL Alphabets omit accented characters, such as the accent over an "e" in Spanish, but keep the tilde on "n"? Similarly for the umlaut in German?
If this is the case, it is irregular and weird.
No! This is the "Eñe" and is a letter very important for Spanish speakers! https://en.m.wikipedia.org/wiki/%C3%91
:)
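A small Python check of why this cannot be settled from Unicode data alone: both "ñ" and "é" canonically decompose into a base letter plus a combining mark, so whether one of them counts as a separate letter is a per-language convention that has to be recorded in the alphabet table itself. `base_and_marks` is a hypothetical helper name.

```python
import unicodedata


def base_and_marks(ch):
    """Split a character into its base letter and the names of its combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return decomposed[0], [unicodedata.name(mark) for mark in decomposed[1:]]


print(base_and_marks("\u00f1"))  # "ñ" -> ('n', ['COMBINING TILDE'])
print(base_and_marks("\u00e9"))  # "é" -> ('e', ['COMBINING ACUTE ACCENT'])
```

Unicode treats the two characters the same way structurally; only the Spanish alphabet convention makes "ñ" a letter in its own right.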
Actually, my initial idea was to use
@mmatera Again not sure what error messages/tests we should have here. Please advise.
Also, handling Alphabets is probably a bigger project that we need to come back to.