

@pahjbo
Member

@pahjbo pahjbo commented May 20, 2024

updates the document for #18

@sonarqubecloud

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud


@pdowler
Member

pdowler commented Mar 19, 2025

Looking at this in the context of CAOM-2.5, I have noted that I have used vodml-id(s) and utype(s) in tap_schema inconsistently and they should be related: specifically, the canonical tap_schema should be derived from the model. Is that what you have in mind as well?

If I grok the proposed rules/BNF, then these two examples should illustrate:

vodml-id Observation.uri -> tap_schema utype caom2:Observation.uri
vodml-id Telescope.name -> tap_schema utype caom2:Observation.telescope.name

First the scheme prefix: utype style is technically a vodml-ref rather than vodml-id, which is probably conceptually correct.

Second, the rest: historically, utype style also tended to encode the path of composition names rather than the simpler type-name + attr name. The second example above shows this mismatch.

With VO-DML-1.0 I could choose the style, and hence fix either the model or the tap_schema. With this rule in place as-is, the vodml-id(s) would have to conform and the utype(s) would have to change to match. The utype style does tend to make utype strings rather longer than necessary, but it provides some context / readability... at least that's probably the intent/idea behind writing them that way. The vodml-id rules as written would make for simple, more or less finite-length vodml-id (and hence utype) values (finite meaning independent of the complexity/nesting/hierarchy of the model and just based on sane naming).

thoughts?
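
To make the two styles concrete, here is a minimal sketch of how each string could be assembled. The function names `vodml_ref` and `path_utype` are made up for illustration and are not part of any VO-DML tooling; the example values are the two from this comment.

```python
def vodml_ref(prefix: str, vodml_id: str) -> str:
    """Short style: model prefix plus vodml-id (type name + attribute name)."""
    return f"{prefix}:{vodml_id}"


def path_utype(prefix: str, root_type: str, path: list[str]) -> str:
    """Classic style: root type followed by the chain of composition/attribute names."""
    return f"{prefix}:{'.'.join([root_type, *path])}"


# Telescope.name: the two styles disagree.
short = vodml_ref("caom2", "Telescope.name")
classic = path_utype("caom2", "Observation", ["telescope", "name"])
print(short)    # caom2:Telescope.name
print(classic)  # caom2:Observation.telescope.name

# Observation.uri: an attribute directly on the root type gives the same string either way.
print(vodml_ref("caom2", "Observation.uri") == path_utype("caom2", "Observation", ["uri"]))
```

The short form has a length that depends only on the type and attribute names, while the classic form grows with the depth of the composition path.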

@pahjbo
Member Author

pahjbo commented Mar 20, 2025

> Looking at this in the context of CAOM-2.5, I have noted that I have used vodml-id(s) and utype(s) in tap_schema inconsistently and they should be related: specifically, the canonical tap_schema should be derived from the model. Is that what you have in mind as well?

Yes, the current version of the tooling does produce a TAP schema automatically from the VO-DML (with any possible variations in style tied down in the binding).

@pahjbo
Member Author

pahjbo commented Mar 20, 2025

> Second, the rest: historically, utype style also tended to encode the path of composition names rather than the simpler type-name + attr name. The second example above shows this mismatch.
>
> With VO-DML-1.0 I could choose the style, and hence fix either the model or the tap_schema. With this rule in place as-is, the vodml-id(s) would have to conform and the utype(s) would have to change to match. The utype style does tend to make utype strings rather longer than necessary, but it provides some context / readability... at least that's probably the intent/idea behind writing them that way. The vodml-id rules as written would make for simple, more or less finite-length vodml-id (and hence utype) values (finite meaning independent of the complexity/nesting/hierarchy of the model and just based on sane naming).

I thought that one of the motivations behind VO-DML was to bring some rigour and uniformity to UTypes as a result of https://www.ivoa.net/documents/Notes/UTypesUsage/index.html - but a consequence of this will mean that backwards compatibility will be impossible to guarantee for models that declared UTypes that did not conform to the new style.

@olaurino

olaurino commented Mar 20, 2025

> I thought that one of the motivations behind VO-DML was to bring some rigour and uniformity to UTypes as a result of https://www.ivoa.net/documents/Notes/UTypesUsage/index.html - but a consequence of this will mean that backwards compatibility will be impossible to guarantee for models that declared UTypes that did not conform to the new style.

It should now be possible to rigorously define utypes as "shortcut" references into a VODML-compliant model, at least for backwards compatibility. I am not sure how useful this is unless the mapping is standardized.

The main obstacle to using utypes was that the same attribute of the same type, say a sky position, when appearing in multiple parts of a model, let alone across models, would end up having hundreds of different utypes. This stemmed from a concrete problem when we were working on the Spectral/SED models and applications. Since the consensus was that utypes weren't to be parsed, a client capable of dealing with sky positions couldn't try and figure out that mod:x.y.z.ra and othermod:xx.yy.zz.ra were the same attribute of the same type without a knowledge of both sets of utypes. With VODML and the mapping document, one could parse a VOTABLE and find all instances of a sky position, its frame, etc. without any knowledge of the model it was embedded in. We provided a way of doing this with utypes and without changing the votable schema 10+ years ago, but the suggestion was shot down for being too complicated.

Now, a client using utypes would need to match a string like caom2:Observation.telescope.name exactly, and know that it's the same as say mymodel:MyObservatory.my-mission.my-telescope.name (not particularly useful in this specific case, but with stuff like positions and photometry filters one would see the problem). If one wanted to make sure that caom2:Observation.telescope.name was interpreted as a reference to a specific attribute in a specific type even by a client that doesn't know caom2, there could be a rigorous mapping defined somewhere.

Is this worth the effort? I am not really sure.
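
For illustration only, here is a minimal sketch of what such a mapping could look like from a client's side. The table `UTYPE_TO_ELEMENT` and the target `coords:SkyPosition.ra` are hypothetical and not taken from any real model or standard; the utype strings are the ones used above.

```python
# Hypothetical mapping: utype-style string -> vodml-ref of the model element it stands for.
# "coords:SkyPosition.ra" is an invented target, used only to show two utypes converging.
UTYPE_TO_ELEMENT = {
    "mod:x.y.z.ra": "coords:SkyPosition.ra",
    "othermod:xx.yy.zz.ra": "coords:SkyPosition.ra",
    "caom2:Observation.telescope.name": "caom2:Telescope.name",
}


def same_attribute(utype_a, utype_b, mapping=UTYPE_TO_ELEMENT):
    """True if both utypes are known and resolve to the same model element."""
    target_a = mapping.get(utype_a)
    target_b = mapping.get(utype_b)
    return target_a is not None and target_a == target_b


def find_columns(columns, target, mapping=UTYPE_TO_ELEMENT):
    """Return names of (name, utype) columns whose utype resolves to `target`."""
    return [name for name, utype in columns if mapping.get(utype) == target]


columns = [("ra_1", "mod:x.y.z.ra"), ("ra_2", "othermod:xx.yy.zz.ra"), ("tel", "caom2:Observation.telescope.name")]
print(same_attribute("mod:x.y.z.ra", "othermod:xx.yy.zz.ra"))
print(find_columns(columns, "coords:SkyPosition.ra"))
```

The client never parses the utype strings; it only looks them up, which is why the mapping itself would have to be published somewhere for this to work across models.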

@olaurino

olaurino commented Mar 20, 2025

I should add that the reason we left the IDs as opaque strings was also to allow the freedom that @pdowler was mentioning, so I sympathize with the objection to enforcing a specific grammar.

@pahjbo
Member Author

pahjbo commented Mar 20, 2025

As I said in #18, the tooling does not actually use the vodml-id directly, but it does rely on the <vodml-ref> having the regular "path-like" syntax to refer to a model element. So an alternative would be to redefine <vodml-ref> to break any connection with vodml-id, and then vodml-id can be used to make arbitrary UTypes.

I personally think arbitrary UTypes/vodml-ids eventually lead to confusion, and some sort of system for them would be good.

@pdowler
Member

pdowler commented Mar 20, 2025

I'm not really objecting to vodml-id being specifically defined, just commenting that the short specific definition (which would lead to a vodml-ref like caom2:Telescope.name) is different from the classic path-in-model utype style that many (including me) used, e.g. in tap_schema metadata: caom2:Observation.telescope.name.

They are both unique and specific. The vodml-ref is shorter (length independent of model complexity), which I think is a feature. The classic utype is longer (it can get quite long and cumbersome) but I guess it feels like it conveys some context info. That's probably why people wrote them like this by hand.

So at this point, I'd probably agree with the shorter one as expressed in the PR (but see my next point below).

@pdowler
Member

pdowler commented Mar 20, 2025

Here is the BUT :-) and I hope it makes sense

Then there is the way that I have used DataType in the caom2 model. Example:

ObjectType Energy has an attribute with vodml-id Energy.bounds and the datatype has vodml-ref caom2:types.Interval

types.Interval is a DataType with two attributes (types.Interval.lower and types.Interval.upper) that are real.

Let's say I have a TAP service with one column for the interval value (double[2] in PG). That column would have utype="caom2:Energy.bounds" (by the spec). Looks fine. That supposes that a dataType maps to a single column (that is my intent).

But what if another implementation of caom2 wants to store the interval in two separate columns (type double)? Then by the spec these columns would have utype="caom2:types.Interval.lower" and utype="caom2:types.Interval.upper" and would not be distinguishable from other uses of Interval in the model... the classic pathy style of utype="caom2:Plane.energy.bounds.lower" would be distinguishable. But that would mean utypes are not vodml-ref(s) as currently defined, because Plane.energy.bounds.lower is not a vodml-id that exists anywhere.

TL;DR - utype==vodml-ref + re-use of dataType(s) + implementation decision -> non-unique utypes

Which of those has to give? I do not want to try to define utype again because I thought we had :-)
Is it the attributes in dataType(s) having a specific vodml-id that is wrong? They do not seem useful for anything...
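
A small sketch of the collision, assuming the two-column implementation and a second use of Interval in the model. `Plane.time.bounds` is an assumed second use added only to show the clash; the helper names are made up.

```python
from collections import Counter

# Each entry: (path from the root type, vodml-id of the leaf attribute).
# The Plane.time.bounds rows are an assumed second use of types.Interval.
COLUMNS = [
    ("Plane.energy.bounds.lower", "types.Interval.lower"),
    ("Plane.energy.bounds.upper", "types.Interval.upper"),
    ("Plane.time.bounds.lower", "types.Interval.lower"),
    ("Plane.time.bounds.upper", "types.Interval.upper"),
]


def utypes_by_spec(columns, prefix="caom2"):
    """utype == vodml-ref: prefix plus the vodml-id of the leaf attribute."""
    return [f"{prefix}:{vodml_id}" for _, vodml_id in columns]


def utypes_by_path(columns, prefix="caom2"):
    """Classic pathy style: prefix plus the full composition path."""
    return [f"{prefix}:{path}" for path, _ in columns]


def duplicates(values):
    """Sorted list of values that occur more than once."""
    return sorted(v for v, n in Counter(values).items() if n > 1)


print(duplicates(utypes_by_spec(COLUMNS)))
print(duplicates(utypes_by_path(COLUMNS)))
```

With the vodml-ref rule both `lower` columns (and both `upper` columns) share a utype; with the pathy style all four are distinct, but none of them is a vodml-id that exists in the model.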

@pahjbo
Member Author

pahjbo commented Mar 21, 2025

So I think the above example basically means that if you want to convey a UType of caom2:Plane.energy.bounds.lower, which seems reasonable, then UType != vodml-id, but a UType can be built in a 'pathy' way from vodml-ids. I think that this is the big conceptual change that needs to be made.

As I said above, internally in VO-DML documents and between VO-DML documents the tooling just uses vodml-refs, which are computed (and it does not use vodml-ids to do that computation). So it could be that vodml-ids are used only for referencing into a model from outside, and can retain the ability to be arbitrary. I get that this is pretty much the opposite of what this PR says. It would still be useful, though, if most vodml-ids were built from vodml-refs.

Incidentally the VO-DML standard says almost nothing about UTypes and I am really not sure that they are that well defined anywhere in an IVOA standard.
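
As a rough sketch of what "built in a pathy way from vodml-ids" could mean, the following walks a pathy UType through a toy model and returns the chain of vodml-ids it passes through. The `ATTRIBUTE_TYPES` table is a hand-written fragment assumed for this example (including `Plane.energy`), not output of the tooling, and `resolve_path_utype` is a made-up name.

```python
# Toy model fragment: attribute vodml-id -> name of the type that attribute holds.
ATTRIBUTE_TYPES = {
    "Plane.energy": "Energy",
    "Energy.bounds": "types.Interval",
    "types.Interval.lower": "real",
    "types.Interval.upper": "real",
}


def resolve_path_utype(utype, attribute_types=ATTRIBUTE_TYPES):
    """Return the chain of vodml-ids that a pathy UType walks through.

    Assumes the first path segment is the root type name and contains no dots.
    """
    prefix, _, path = utype.partition(":")
    root, *names = path.split(".")
    current = root
    chain = []
    for name in names:
        vodml_id = f"{current}.{name}"
        if vodml_id not in attribute_types:
            raise KeyError(f"{vodml_id} is not a vodml-id in model {prefix}")
        chain.append(vodml_id)
        # Step into the type of this attribute for the next path segment.
        current = attribute_types[vodml_id]
    return chain


print(resolve_path_utype("caom2:Plane.energy.bounds.lower"))
# ['Plane.energy', 'Energy.bounds', 'types.Interval.lower']
```

Each step of the path is a real vodml-id even though the full path string is not, which is the sense in which the UType is derived from vodml-ids without being one.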

@msdemlei
Contributor

msdemlei commented Mar 21, 2025 via email

@pahjbo
Member Author

pahjbo commented Mar 21, 2025

Actually I have just noticed on p 20 of the 1.0 spec

> That stereotype defines a tag 'vodml-id'. When assigning the stereotype to a particular model element one can define an explicit value for the vodml-id of the element, rather than the default value that is the one generated from the VO-UML itself using the grammar described in Appendix C below. In spite of this possibility, modelers SHOULD NOT define custom vodml-id, as the grammar offers an explicit, human readable expression that gives some hints as to the location of the element in the model. The main reason to do so is to use values from old lists of utype-s for example.

perhaps this text should be moved up to the main vodml-id section too.

@olaurino

As I mentioned, without changing anything in the document it should be possible (but I haven't tried in the past 10 years) to define a mechanism to declare utype-style strings based on a vodml-compliant model. A well-defined model like caom2 would be a perfect benchmark.

This could be based on some kind of heuristics rather than a standard, for simplicity.

[Even for an old model that only declares utypes (and some hand-wavy set of boxes and arrows), one could probably reverse-engineer the utypes into a compliant model, so that the utypes remain the same. We probably did that in the Spectral/SED context in a previous lifetime.]

The point is that vodml-ids and vodml-refs are supposed to be opaque, and they could still be so, leaving people the ability to produce them in a more user-readable fashion, as I understand caom2 does. This preserves the opaque, "algorithmic" nature of the vodml ecosystem, while also producing more humanly appealing strings, if one wishes to do so.

These are still not utypes, because utypes are supposed to be a single string that conveys a lot of information. These utypes have issues in more complex cases [unless one starts to use a parsable grammar for e.g. arrays, like x.y[0].z]. In Naples 2011 somebody from Strasbourg presented the idea to use VOTABLE GROUPs to group together pieces of information that required the same utype to be present more than once and in a relationship with other elements with utypes that would also be repeated. These use cases were somewhat complex, and while they informed the attempt to standardize utypes, they are probably more complex than RegTAP.

For cases in which one doesn't need GROUPs of related data elements with the same utype, a simple heuristic that maps a human-readable utype string to a compliant model element could be enough. So wherever in the IVOA standards a utype is still required, these utypes could rigorously (or even just heuristically) point to an element of a compliant model.

To recap, the idea behind the utype was to have a single non-machine-parsable human-parsable string that would convey information about the type of a data element and its role in a complex structure. This was impossible to do, in a rigorous way, with just one string. VODML now provides a way for that string to be what it was supposed to be, a pointer to a model element, its type and its role. It's possible that the simplest way to do this is what caom2 does, and to produce vodml-ids and vodml-refs in such a way that the same string can work both as a utype and a vodml-ref, and that in order to do so one shouldn't make Appendix C normative.

@pahjbo pahjbo added this to the VO-DML 1.1 milestone Jul 4, 2025

@pahjbo pahjbo added the documentation Improvements or additions to documentation label Nov 11, 2025
