Conversation

@rocky
Member

@rocky rocky commented Apr 22, 2021

@mmatera Again, I'm not sure what error messages or tests we should have here. Please advise.

Also, handling Alphabets is probably a bigger project that we will need to come back to.

@rocky rocky requested a review from mmatera April 22, 2021 08:36
@rocky rocky marked this pull request as draft April 22, 2021 08:36
@mmatera
Contributor

mmatera commented Apr 22, 2021

Does this help?

In[1]:= LetterNumber["ss2!"] 
Out[1]= {19, 19, 0, 0}

In[2]:= LetterNumber[4]

LetterNumber::nas: The argument 4 is not a string.
Out[2]= LetterNumber[4]

In[3]:= LetterNumber[Graphics[{}]]

LetterNumber::nas: The argument -Graphics- is not a string.
Out[3]= LetterNumber[Graphics[{}]]

In[4]:= LetterNumber["dd", "Mediano"]

Alphabet::noalpha: The alphabet Mediano is not known or not available.

LetterNumber::nalph: The alphabet Mediano is not known or not available.
 
Out[4]= Missing[NotAvailable]

<dt>'LetterNumber'[$c$]
<dd>returns the position of the character $c$ in the English alphabet.
<dt>'LetterNumber["string"]'
Contributor

I would implement Alphabet[] at least for English (Latin?), along with the second parameter, in such a way that anything other than the default raises a message. In the end, the basic implementation of Alphabet is just a dictionary of the form "alphabetname": "abcd...", isn't it?
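
A minimal Python sketch of the dictionary idea in this comment. The names `ALPHABETS` and `letter_number` are hypothetical, and returning `None` merely stands in for issuing a message such as `LetterNumber::nalph`; this is not the code merged in this pull request.

```python
# Hypothetical sketch: one table entry per alphabet name.
ALPHABETS = {
    "English": "abcdefghijklmnopqrstuvwxyz",
}


def letter_number(char, alphabet="English"):
    """Return the 1-based position of a single character, or 0 if absent.

    Returns None for an unknown alphabet; a real builtin would emit a
    message instead (assumption, see the lead-in).
    """
    letters = ALPHABETS.get(alphabet)
    if letters is None:
        return None
    if len(char) != 1:
        raise ValueError("expected a single character")
    # str.find returns -1 when the character is missing, so adding 1 gives 0.
    return letters.find(char.lower()) + 1
```

With this table, only "English" is known, so any other alphabet name takes the "not available" path.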

Member Author

@rocky rocky Apr 22, 2021

You have a clearer idea of what to do at this point than I do. So if you'd just do it, I'd appreciate it.

Otherwise I am happy to have this hang out as a draft for a while.

For larger context, I have been going over https://www.wolfram.com/language/elementary-introduction/2nd-ed/11-strings-and-text.html and more generally https://www.wolfram.com/language/elementary-introduction/2nd-ed/ which has the attribution Copyright 2021 (and nothing more). The 1st Edition, which I have in book form, has a "non-commercial share-alike" copyright and that is similar.

And the bigger context even here is that these give examples that can be used in worksheets. (See the dockerhub image or the sqlite file in mathics-omnibus and please also see Mathics3/mathics-django#32). But in addition this is a much more gentle way to guide us in filling out the code in a way where users can see immediate results.

The problem I have with FeynCalc, Rubi, KnotTheory, and similar packages is that they are hundreds if not thousands of lines long and sometimes use sophisticated constructs in intertwined ways. I think if we have a more solid base to start from, things will go more easily there.

I tried pulling a 2006 version of KnotTheory and building it. There are far fewer things we need to fix; I counted maybe 3 or 4. Still, without those fixes I am not able to get it to run, and at least for me this is kind of disappointing.

@axkr axkr Apr 22, 2021

> At the end, the basic implementation of Alphabet is just a dictionary with the form "alphabetname": "abcd..."
> isn't it?

I think there exist Python bindings for the ICU project that can create alphabets:

Unfortunately, the alphabets don't seem to be exactly the same as in MMA.

Member Author

@rocky rocky Apr 22, 2021

I looked at the ICU project and it looks both awesome and bureaucratic.

Our needs here are extremely simple and basic: for each language, give me an ordered list of the alphabet with a way to convert from one case to the other. And we don't even need bidirectional conversion, so you can choose which case to start out with.

However, although I can see how to do case conversion using Unicode properties (and you'd think that a library would use that to provide a function equivalent to "lower" or "upper"), there isn't anything that says "give me the first letter in the alphabet and iterate to the last letter", as far as I can tell.

But I could be wrong here. The documentation, while extensive, isn't all that useful. Sigh.
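
For illustration, here is roughly what per-character Unicode properties give you in Python's standard library. The helper names are hypothetical. As far as I know, the standard library has nothing that lists a language's alphabet in order, which matches the gap described in this comment.

```python
import unicodedata


def is_letter(char):
    # General categories beginning with "L" (Lu, Ll, Lt, Lm, Lo) are letters.
    return unicodedata.category(char).startswith("L")


def to_uppercase(letters):
    # str.upper applies Unicode case mappings, so it works beyond ASCII.
    # The result can change length: the German sharp s uppercases to "SS".
    return letters.upper()
```

So a character can be classified and case-converted, but the ordered list of letters for a given language still has to come from somewhere else.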

Contributor

I think that an ICU-based Alphabet is something to implement in an external module. Here, I would limit this to defining a basic set of alphabets (let's say the "English", "Spanish", "German", and "Greek" alphabets, which is what I could handle). A Pymathics module can then overload this basic implementation.

Member Author

@rocky rocky Apr 22, 2021

Yes, I was thinking along the lines of @mmatera, where this would be an external Pymathics library.

There seems to be a lot of functionality somewhere in ICU, which extends beyond just Alphabet.

At some point I will post a query on StackOverflow. But if you look at past queries on this topic they generally are met with derision. I just upvoted https://stackoverflow.com/questions/32375797/what-unicode-ranges-are-considered-letters and it appears I am the only one to have done so just now after 5 1/2 years with no great answers.

Member Author

@rocky rocky Apr 22, 2021

Thanks @axkr for the link and suggestion. Do you mind if we port that code to Python when we get around to writing the PyMathics module?

@axkr axkr Apr 22, 2021

No problem if you want to port it.

@mmatera
Contributor

mmatera commented Apr 22, 2021

In that case, OK, but if you want to merge this, I think it does not hurt. I can improve this later.

@rocky
Member Author

rocky commented Apr 22, 2021

I'll make a pass at some point to sync with the error messages.

During the week, basically I have only small chunks of time to do things. Whatever can be done in this time, I do, but things that don't fit have to wait.

@rocky rocky marked this pull request as ready for review April 23, 2021 10:49
@rocky
Member Author

rocky commented Apr 23, 2021

@mmatera I think this is ready for this level of detail. However one code path that we don't test is found in this example:

>> LetterNumber[{"P", "Pe", "P1", "eck"}]
 = {16, 16, 5, 16, 0, 5, 3, 11}

and that's because I don't know if the above is what is expected.

@mmatera
Contributor

mmatera commented Apr 23, 2021

> @mmatera I think this is ready for this level of detail. However one code path that we don't test is found in this example:
>
> >> LetterNumber[{"P", "Pe", "P1", "eck"}]
>  = {16, 16, 5, 16, 0, 5, 3, 11}
>
> and that's because I don't know if the above is what is expected.

In WMA, the output of that sentence is
{16, {16, 5}, {16, 0}, {5, 3, 11}}

@rocky
Member Author

rocky commented Apr 24, 2021

> > @mmatera I think this is ready for this level of detail. However one code path that we don't test is found in this example:
> >
> > >> LetterNumber[{"P", "Pe", "P1", "eck"}]
> >  = {16, 16, 5, 16, 0, 5, 3, 11}
> >
> > and that's because I don't know if the above is what is expected.
>
> In WMA, the output of that sentence is
> {16, {16, 5}, {16, 0}, {5, 3, 11}}

Should be addressed in 71b755d
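
A small Python sketch of the nested result shape reported for WMA: a list maps element by element, a one-character string gives a single number, and a longer string gives a list of numbers. The function name and the English-only table are hypothetical, and this is not necessarily how 71b755d does it.

```python
ENGLISH = "abcdefghijklmnopqrstuvwxyz"


def letter_number(item):
    """Map a string, or a (possibly nested) list of strings, to letter positions."""
    if isinstance(item, list):
        # Preserve the list structure instead of flattening it.
        return [letter_number(element) for element in item]
    # Characters outside the alphabet give 0 because str.find returns -1.
    numbers = [ENGLISH.find(ch.lower()) + 1 for ch in item]
    # A single character yields a bare number rather than a one-element list.
    return numbers[0] if len(numbers) == 1 else numbers
```

Applied to `["P", "Pe", "P1", "eck"]`, this returns `[16, [16, 5], [16, 0], [5, 3, 11]]`, the same shape as the WMA output quoted above.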

@rocky
Member Author

rocky commented Apr 24, 2021

@mmatera I was thinking about this a little more. For a small subset of cases not requiring a LoadModule["pymathicsICU"] I suppose the small set we have here is fine.

But please, let us not extend this in core this way. Instead, let us delegate this to a Pymathics module based on something that purports to handle language support more generally.

@rocky rocky merged commit e941cf7 into master Apr 24, 2021
"Uppercase": "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
},
"Spanish": {
"Lowercase": "abcdefghijklmnñopqrstuvwxyz",
Member Author

@rocky rocky Apr 24, 2021

Do WL Alphabets omit accented characters, such as the accent over an "e" in Spanish, but keep the tilde on the "n"? And similarly for the umlaut in German?

If that is the case, it is irregular and weird.

Contributor

No! This is the "Eñe" and is a letter very important for Spanish speakers! https://en.m.wikipedia.org/wiki/%C3%91
:)
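
To make the distinction concrete, here is a hedged Python sketch that treats "ñ" as its own letter of the Spanish alphabet string from the diff, while an accented vowel such as "é" is simply not in the table. The function name is hypothetical, and returning 0 for accented vowels describes this sketch only, not WL's behavior.

```python
import unicodedata

# NFC normalization makes sure the n-with-tilde is a single code point.
SPANISH_LOWERCASE = unicodedata.normalize("NFC", "abcdefghijklmnñopqrstuvwxyz")


def spanish_letter_number(char):
    """Return the 1-based position of char in the Spanish table, or 0."""
    char = unicodedata.normalize("NFC", char).lower()
    if len(char) != 1:
        return 0
    # str.find returns -1 for characters not in the table, giving 0.
    return SPANISH_LOWERCASE.find(char) + 1
```

In this table the alphabet has 27 letters, and "ñ" sits at position 15, between "n" and "o".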

rocky added a commit that referenced this pull request Apr 24, 2021
@mmatera
Contributor

mmatera commented Apr 24, 2021

Actually, my initial idea was to use Alphabet as something that allows hooking custom definitions into LetterNumber and other similar builtins. Then, the actual definition of Alphabet could be implemented as a Pymathics module, or as a .m WL module. The problem is how to implement the "lowercase" for generic alphabets.
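
One way the hook idea might look in Python is a small registry that other modules can add to. Everything here is a hypothetical sketch: the names `register_alphabet` and `alphabet` are invented for illustration, and deriving the other case with `str.upper` is only a default that an alphabet can override when Unicode case mapping is not what it needs.

```python
# Hypothetical registry; the "Lowercase"/"Uppercase" keys follow the table in the diff.
_ALPHABETS = {}


def register_alphabet(name, lowercase, uppercase=None):
    """Register an alphabet; derive the uppercase form unless one is given."""
    if uppercase is None:
        # Default: rely on Unicode case mapping.
        uppercase = lowercase.upper()
    _ALPHABETS[name] = {"Lowercase": lowercase, "Uppercase": uppercase}


def alphabet(name, case="Lowercase"):
    """Look up a registered alphabet; raises KeyError if it is unknown."""
    return _ALPHABETS[name][case]


# An external module could extend the table without touching the core code.
register_alphabet("Greek", "αβγδ")
```

A builtin such as LetterNumber would then only call `alphabet(name)` and never need to know where the definition came from.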

@rocky rocky deleted the LetterNumber branch June 7, 2021 23:10