From f39a046c957bf2e6544186314501417c7d57cc3d Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Mon, 9 Sep 2024 18:03:55 +0000 Subject: [PATCH 01/14] FAIR appendix --- hapi-dev/HAPI-data-access-spec-dev.md | 79 +++++++++++++++++++++++++++ 1 file changed, 79 insertions(+) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 2f04771..d9dd100 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -2000,3 +2000,82 @@ is responsive and sets the `User-Agent` agent to ``` Note that the use of the [wiki page](https://github.com/hapi-server/data-specification/wiki/hapi-bots.md) to describe bots is encouraged. + +## 8.6 FAIR + +The following elements of FAIR have been copied from https://www.go-fair.org/fair-principles/. Comments relevant to HAPI are shown in _itallic_. + +### Findable + +The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process. + +1. (Meta)data are assigned a globally unique and persistent identifier + + _?? Is our `resourceID` used to describe data or metadata? Both are needed._ + + Let PID be associated with both (Zenodo does both). We can keep `resourceID` and say "If your prefix is doi, what others?" then FAIR by verifier. + +2. Data are described with rich metadata (defined by Reusable, item 1. below) + + _To be FAIR, data providers must meet this requirement using `description` or `additionalMetadata`_ + +3. Metadata clearly and explicitly include the identifier of the data they describe +_e.g., metadata mentions resourceID. Redundant but useful for confirmation._ + +4. (Meta)data are registered or indexed in a searchable resource + + _At present, this requirement has partially been met. 
All HAPI metadata is viewable at https://hapi-server.org/servers and one can search by keyword within a dataset. In addition, one can use https://heliophysicsdata.gsfc.nasa.gov/ to search for datasets provided by the CDAWeb HAPI server. However, there is a development in which HAPI metadata will be ingested by a general search interface, https://heliodata-staging.heliophysics.net/, in which case this requirement will be met._ + +### Accessible + +Once the user finds the required data, she/he/they need to know how they can be accessed, possibly including authentication and authorisation. + +1. (Meta)data are retrievable by their identifier using a standardised communications protocol + + _?? This is satisfied by a landing page. that uses resourceID that is a PID. However, I read this requirement differently ... research more the HAPI identifier is not a persistent identifier (do they mean persistent identifier?). Which identifier is meant here `/info?dataset=ID` and `/data?dataset=ID` do this_ + +2. The protocol is open, free, and universally implementable + + _HAPI meets these requirements_ + +3. The protocol allows for an authentication and authorisation procedure, where necessary + + _Not applicable_ + +4. Metadata are accessible, even when the data are no longer available + + _heliodata.net will do this. There is a project in which all HAPI metadata will be cached nightly. In this case, this requirement will be met. However, we need a way of communicating if metadata is from cache because server is down._ + +### Interoperable + +The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. + +1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. + + _If HAPI metadata maps to SOSO then someone else will do. Coordinate with Daniel about piping in HAPI to create landing page at heliodata.net. 
Formal: yes, accessible: yes, shared: ?, broadly applicable: ?_ + +2. (Meta)data use vocabularies that follow FAIR principles + + _HAPI does not use vocabularies. There is a plan to map HAPI metadata a metadata standard that does (e.g., SOSO)._ + +3. (Meta)data include qualified references to other (meta)data + + _We provide `additionalMetadata`_ + +### Reusable + +The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings. + +1. (Meta)data are richly described with a plurality of accurate and relevant attributes + +2. (Meta)data are released with a clear and accessible data usage license + + _We need to add a `licence` attribute; verifier should warn if missing_ + +3. (Meta)data are associated with detailed provenance + + _We need to add a `provenance` attribute maybe modify description to tell people to mention provenance; verifier should warn if missing; think about how to say "sameAs" or "relatedTo" other HAPI datasets._ + +3. (Meta)data meet domain-relevant community standards + + _HAPI is community standard and mappings exist to other community standard if you don't agree that it is a community standard_ From d67933f19125853eeacd6544a406491d3c4293cd Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Mon, 16 Sep 2024 16:57:36 +0000 Subject: [PATCH 02/14] notes from Rebecca meeting --- hapi-dev/HAPI-data-access-spec-dev.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index d9dd100..6f2ce61 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -2013,14 +2013,15 @@ The first step in (re)using data is to find them. Metadata and data should be ea _?? Is our `resourceID` used to describe data or metadata? Both are needed._ - Let PID be associated with both (Zenodo does both). 
We can keep `resourceID` and say "If your prefix is doi, what others?" then FAIR by verifier. + We tell people that if the want FAIR, do this. 2. Data are described with rich metadata (defined by Reusable, item 1. below) _To be FAIR, data providers must meet this requirement using `description` or `additionalMetadata`_ 3. Metadata clearly and explicitly include the identifier of the data they describe -_e.g., metadata mentions resourceID. Redundant but useful for confirmation._ + + _identifier means internal and in sense of 1. We have internal already_ 4. (Meta)data are registered or indexed in a searchable resource From b197406215b94b6597279cb7f62c3ee1bb658580 Mon Sep 17 00:00:00 2001 From: jvandegriff Date: Mon, 16 Sep 2024 14:04:08 -0400 Subject: [PATCH 03/14] Jon's comments after Sep 16 meeting --- hapi-dev/HAPI-data-access-spec-dev.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 6f2ce61..ab8861c 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -2005,15 +2005,19 @@ Note that the use of the [wiki page](https://github.com/hapi-server/data-specifi The following elements of FAIR have been copied from https://www.go-fair.org/fair-principles/. Comments relevant to HAPI are shown in _itallic_. +HAPI is designed to be able to fully represent data that is itself already FAIR. Some aspects of HAPI adress FAIR directly, such as Interoperability, but the aspects related to findabilty and persistent identifiers need to be solved outside of HAPI by the data provider. + ### Findable The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process. 1. 
(Meta)data are assigned a globally unique and persistent identifier - _?? Is our `resourceID` used to describe data or metadata? Both are needed._ +_?? Is our `resourceID` used to describe data or metadata? Both are needed._ - We tell people that if the want FAIR, do this. + In the `catalog` response, a HAPI server lists a unique dataset identifier for each dataset available at that server. These are not usable as a PID (https://becker.wustl.edu/news/introduction-to-pids-what-they-are-and-how-to-use-them/) since they are unique only to the server, and could be changed by the data provider. If there is a persistent, globally unique for a dataset that is machine resolvable, it can be provided in the `resourceID` field in the `info` response. + + We tell people that if they want FAIR, do this. 2. Data are described with rich metadata (defined by Reusable, item 1. below) @@ -2021,10 +2025,14 @@ The first step in (re)using data is to find them. Metadata and data should be ea 3. Metadata clearly and explicitly include the identifier of the data they describe + This refers to both the internal ID and the persistent ID, so this requirement is met. + _identifier means internal and in sense of 1. We have internal already_ -4. (Meta)data are registered or indexed in a searchable resource +5. (Meta)data are registered or indexed in a searchable resource + This is somewhat outside the realm of HAPI, which is focused on access and not discovery. + _At present, this requirement has partially been met. All HAPI metadata is viewable at https://hapi-server.org/servers and one can search by keyword within a dataset. In addition, one can use https://heliophysicsdata.gsfc.nasa.gov/ to search for datasets provided by the CDAWeb HAPI server. 
However, there is a development in which HAPI metadata will be ingested by a general search interface, https://heliodata-staging.heliophysics.net/, in which case this requirement will be met._ ### Accessible From ac425c3fe61756c65988811ac45ff5c5ee954765 Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Mon, 23 Sep 2024 16:34:55 +0000 Subject: [PATCH 04/14] 2024-09-23 updates --- hapi-dev/HAPI-data-access-spec-dev.md | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 6f2ce61..8b4c42b 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -2013,7 +2013,13 @@ The first step in (re)using data is to find them. Metadata and data should be ea _?? Is our `resourceID` used to describe data or metadata? Both are needed._ - We tell people that if the want FAIR, do this. + `resourceID` is same for metadata and data. + + We tell people that if the want FAIR, use a globally unique and persistent identifier in `resourceID`. If provider does not have one id per dataset, but has + * single DOI for one server, then put it in `/about` response. + * one DOI per file, serve it as a dataset. + + Need to modify `/about` to have `resourceID` and modify `citation` (which can include doi to paper) to be for when `resourceID` does not exist. 2. Data are described with rich metadata (defined by Reusable, item 1. below) @@ -2023,6 +2029,8 @@ The first step in (re)using data is to find them. Metadata and data should be ea _identifier means internal and in sense of 1. We have internal already_ + HAPI spec requires internal identifier in catalog?all=true response and so provider needs to satisfy 1. + 4. (Meta)data are registered or indexed in a searchable resource _At present, this requirement has partially been met. All HAPI metadata is viewable at https://hapi-server.org/servers and one can search by keyword within a dataset. 
In addition, one can use https://heliophysicsdata.gsfc.nasa.gov/ to search for datasets provided by the CDAWeb HAPI server. However, there is a development in which HAPI metadata will be ingested by a general search interface, https://heliodata-staging.heliophysics.net/, in which case this requirement will be met._ @@ -2033,7 +2041,7 @@ Once the user finds the required data, she/he/they need to know how they can be 1. (Meta)data are retrievable by their identifier using a standardised communications protocol - _?? This is satisfied by a landing page. that uses resourceID that is a PID. However, I read this requirement differently ... research more the HAPI identifier is not a persistent identifier (do they mean persistent identifier?). Which identifier is meant here `/info?dataset=ID` and `/data?dataset=ID` do this_ + Yes all requests must have `dataset`, which is the identifier. 2. The protocol is open, free, and universally implementable @@ -2041,7 +2049,7 @@ Once the user finds the required data, she/he/they need to know how they can be 3. The protocol allows for an authentication and authorisation procedure, where necessary - _Not applicable_ + _Not applicable, not in spec._ 4. Metadata are accessible, even when the data are no longer available From 2232ab9e855f678ebd3d5959709bd8c49fa25476 Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Mon, 23 Sep 2024 16:43:29 +0000 Subject: [PATCH 05/14] merge --- hapi-dev/HAPI-data-access-spec-dev.md | 7 ------- 1 file changed, 7 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 44ed00f..20978d6 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -2015,7 +2015,6 @@ The first step in (re)using data is to find them. Metadata and data should be ea _?? Is our `resourceID` used to describe data or metadata? Both are needed._ -<<<<<<< HEAD `resourceID` is same for metadata and data. 
We tell people that if the want FAIR, use a globally unique and persistent identifier in `resourceID`. If provider does not have one id per dataset, but has @@ -2023,11 +2022,9 @@ _?? Is our `resourceID` used to describe data or metadata? Both are needed._ * one DOI per file, serve it as a dataset. Need to modify `/about` to have `resourceID` and modify `citation` (which can include doi to paper) to be for when `resourceID` does not exist. -======= In the `catalog` response, a HAPI server lists a unique dataset identifier for each dataset available at that server. These are not usable as a PID (https://becker.wustl.edu/news/introduction-to-pids-what-they-are-and-how-to-use-them/) since they are unique only to the server, and could be changed by the data provider. If there is a persistent, globally unique for a dataset that is machine resolvable, it can be provided in the `resourceID` field in the `info` response. We tell people that if they want FAIR, do this. ->>>>>>> b197406215b94b6597279cb7f62c3ee1bb658580 2. Data are described with rich metadata (defined by Reusable, item 1. below) @@ -2039,13 +2036,9 @@ _?? Is our `resourceID` used to describe data or metadata? Both are needed._ _identifier means internal and in sense of 1. We have internal already_ -<<<<<<< HEAD HAPI spec requires internal identifier in catalog?all=true response and so provider needs to satisfy 1. 4. (Meta)data are registered or indexed in a searchable resource -======= -5. (Meta)data are registered or indexed in a searchable resource ->>>>>>> b197406215b94b6597279cb7f62c3ee1bb658580 This is somewhat outside the realm of HAPI, which is focused on access and not discovery. 
From 14c92a7d43322f0e9521c4d6d330bdf86c839777 Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Wed, 25 Sep 2024 14:42:34 +0000 Subject: [PATCH 06/14] prep for rebecca review --- hapi-dev/HAPI-data-access-spec-dev.md | 53 ++++++++++++--------------- 1 file changed, 23 insertions(+), 30 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 20978d6..fa364b7 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -2013,36 +2013,25 @@ The first step in (re)using data is to find them. Metadata and data should be ea 1. (Meta)data are assigned a globally unique and persistent identifier -_?? Is our `resourceID` used to describe data or metadata? Both are needed._ + The `resourceID` in HAPI metadata should be interpreted as referring to the dataset and its associated HAPI metadata. - `resourceID` is same for metadata and data. - - We tell people that if the want FAIR, use a globally unique and persistent identifier in `resourceID`. If provider does not have one id per dataset, but has - * single DOI for one server, then put it in `/about` response. - * one DOI per file, serve it as a dataset. - - Need to modify `/about` to have `resourceID` and modify `citation` (which can include doi to paper) to be for when `resourceID` does not exist. - In the `catalog` response, a HAPI server lists a unique dataset identifier for each dataset available at that server. These are not usable as a PID (https://becker.wustl.edu/news/introduction-to-pids-what-they-are-and-how-to-use-them/) since they are unique only to the server, and could be changed by the data provider. If there is a persistent, globally unique for a dataset that is machine resolvable, it can be provided in the `resourceID` field in the `info` response. - - We tell people that if they want FAIR, do this. + To be FAIR, use a globally unique and persistent identifier in `resourceID`. 
If HAPI data provider does not use one id per dataset, but has + * a single DOI (or equivalent) for the server, then put it in `/about` response as `resourceID` + * one DOI per file (or equivalent), create a dataset of DOIs and serve the dataset where the DOI column has a `stringType` of DOI (see example in the [`stringType` section](#3616-the-stringtype-object)). 2. Data are described with rich metadata (defined by Reusable, item 1. below) - _To be FAIR, data providers must meet this requirement using `description` or `additionalMetadata`_ + Reusable, item 1: _(Meta)data are richly described with a plurality of accurate and relevant attributes_ -3. Metadata clearly and explicitly include the identifier of the data they describe + HAPI metadata requires a plurality of accurate and relevant attributes, so if a HAPI server is schema valid this requirement for FAIR is satisfied. - This refers to both the internal ID and the persistent ID, so this requirement is met. - - _identifier means internal and in sense of 1. We have internal already_ +3. Metadata clearly and explicitly include the identifier of the data they describe - HAPI spec requires internal identifier in catalog?all=true response and so provider needs to satisfy 1. + The HAPI metadata specification requires an internal identifier (`dataset` in the URL and `id` in the `/catalog` response). Although the HAPI `/info` response does not contain the dataset identifier intentionally because we have avoided duplication of metadata in reponses from different endpoints. However, a request for `/catalog?include=all` will return all HAPI metadata, in which case this requirement is satisfied. 4. (Meta)data are registered or indexed in a searchable resource - This is somewhat outside the realm of HAPI, which is focused on access and not discovery. - - _At present, this requirement has partially been met. All HAPI metadata is viewable at https://hapi-server.org/servers and one can search by keyword within a dataset. 
In addition, one can use https://heliophysicsdata.gsfc.nasa.gov/ to search for datasets provided by the CDAWeb HAPI server. However, there is a development in which HAPI metadata will be ingested by a general search interface, https://heliodata-staging.heliophysics.net/, in which case this requirement will be met._ + This is outside the scope of the HAPI project, which is focused on access and not discovery. However, we are working with other projects that address registration, indexing, and searching. ### Accessible @@ -2050,19 +2039,19 @@ Once the user finds the required data, she/he/they need to know how they can be 1. (Meta)data are retrievable by their identifier using a standardised communications protocol - Yes all requests must have `dataset`, which is the identifier. + All requests must have `dataset`, which is the identifier for HAPI metadata. 2. The protocol is open, free, and universally implementable - _HAPI meets these requirements_ + HAPI meets these requirements. 3. The protocol allows for an authentication and authorisation procedure, where necessary - _Not applicable, not in spec._ + The HAPI specification supports only open data and so a HAPI compliant server cannot have authorization and authentication. 4. Metadata are accessible, even when the data are no longer available - _heliodata.net will do this. There is a project in which all HAPI metadata will be cached nightly. In this case, this requirement will be met. However, we need a way of communicating if metadata is from cache because server is down._ + This is outside the scope of the HAPI project, which is focused on access and not archiving. However, we are working with other projects that address this. ### Interoperable @@ -2070,15 +2059,17 @@ The data usually need to be integrated with other data. In addition, the data ne 1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. 
- _If HAPI metadata maps to SOSO then someone else will do. Coordinate with Daniel about piping in HAPI to create landing page at heliodata.net. Formal: yes, accessible: yes, shared: ?, broadly applicable: ?_ + HAPI metadata and data are formal and acessiable. + + _Rebecca: what is definition of 'shared' and 'broadly applicable'?_ 2. (Meta)data use vocabularies that follow FAIR principles - _HAPI does not use vocabularies. There is a plan to map HAPI metadata a metadata standard that does (e.g., SOSO)._ + _Rebecca: If we wanted to use vocabularies, what would it look like?_ 3. (Meta)data include qualified references to other (meta)data - _We provide `additionalMetadata`_ + Other metadata can be referenced using `additionalMetadata`. For `units`, an external schema can be referenced using `unitsSchema`. _Rebecca: what does "qualified" mean?_ ### Reusable @@ -2086,14 +2077,16 @@ The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, met 1. (Meta)data are richly described with a plurality of accurate and relevant attributes + This is satisfied by the HAPI specification. + 2. (Meta)data are released with a clear and accessible data usage license - _We need to add a `licence` attribute; verifier should warn if missing_ + This can be satisfied by using the `license` attribute. 3. (Meta)data are associated with detailed provenance - _We need to add a `provenance` attribute maybe modify description to tell people to mention provenance; verifier should warn if missing; think about how to say "sameAs" or "relatedTo" other HAPI datasets._ + This can be satisfied with the `provenance` attribute. 3. (Meta)data meet domain-relevant community standards - _HAPI is community standard and mappings exist to other community standard if you don't agree that it is a community standard_ + HAPI is a community standard.
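Reviewer sketch, not part of the patch: the removed notes above say a verifier "should warn if missing" for FAIR-related attributes. A minimal illustration of that idea in Python follows. The function name `fair_warnings` and the choice of attributes are assumptions made for illustration; `resourceID` and `license` are the attribute names used in this series, and the sample `/info` fragment is invented. This is not the HAPI verifier.

```python
from typing import Any

# Optional /info attributes that the FAIR notes in this series rely on.
# Which attributes a real verifier would check is an assumption.
FAIR_ATTRIBUTES = ("resourceID", "license")


def fair_warnings(info: dict[str, Any]) -> list[str]:
    """Return one warning per FAIR-related attribute that is absent or empty."""
    warnings = []
    for name in FAIR_ATTRIBUTES:
        value = info.get(name)
        if value is None or value == "" or value == []:
            warnings.append(f"/info response has no `{name}`")
    return warnings


if __name__ == "__main__":
    # Invented /info fragment, for illustration only.
    example = {"HAPI": "3.2", "resourceID": "doi:10.0000/example"}
    for line in fair_warnings(example):
        print(line)
```

A check like this only reports absence; whether an identifier is actually persistent and globally unique cannot be decided from the metadata alone.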
From ad9fbe534f9cc8174215bcc9773b51a2640d0aef Mon Sep 17 00:00:00 2001 From: jvandegriff Date: Mon, 30 Sep 2024 15:03:59 -0400 Subject: [PATCH 07/14] added TOC entry for 8.6 FAIR --- hapi-dev/HAPI-data-access-spec-dev.md | 1 + 1 file changed, 1 insertion(+) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index fa364b7..668ba83 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -63,6 +63,7 @@    [8.3 JSON Object of Status Codes](#83-json-object-of-status-codes)
   [8.4 Examples](#84-examples)
   [8.5 Robot clients](#85-robot-clients) +   [8.6 FAIR](#86-fair) Version 3.2.0-dev \| Heliophysics Data and Model Consortium (HDMC) \| From 669dfd27462bf7e33c3072e3821befb3de0ec4f0 Mon Sep 17 00:00:00 2001 From: jvandegriff Date: Mon, 30 Sep 2024 16:09:20 -0400 Subject: [PATCH 08/14] updated after getting FAIR questions answered --- hapi-dev/HAPI-data-access-spec-dev.md | 26 ++++++++++++-------------- 1 file changed, 12 insertions(+), 14 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 668ba83..97e8324 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -2004,9 +2004,9 @@ Note that the use of the [wiki page](https://github.com/hapi-server/data-specifi ## 8.6 FAIR -The following elements of FAIR have been copied from https://www.go-fair.org/fair-principles/. Comments relevant to HAPI are shown in _itallic_. +HAPI follows the FAIR principles that makes sense for a data service. For each of the elements of FAIR listed here (and copied from https://www.go-fair.org/fair-principles/) we describe the interaction of these principles with the HAPI specification. -HAPI is designed to be able to fully represent data that is itself already FAIR. Some aspects of HAPI adress FAIR directly, such as Interoperability, but the aspects related to findabilty and persistent identifiers need to be solved outside of HAPI by the data provider. +HAPI is designed to be able to fully represent data that is itself already FAIR. Some aspects of HAPI address FAIR directly, such as Interoperability, but aspects related to findability and persistent identifiers are outside the scope of an access service like HAPI and hence best addressed by the data provider. ### Findable @@ -2028,11 +2028,11 @@ The first step in (re)using data is to find them. Metadata and data should be ea 3.
Metadata clearly and explicitly include the identifier of the data they describe - The HAPI metadata specification requires an internal identifier (`dataset` in the URL and `id` in the `/catalog` response). Although the HAPI `/info` response does not contain the dataset identifier intentionally because we have avoided duplication of metadata in reponses from different endpoints. However, a request for `/catalog?include=all` will return all HAPI metadata, in which case this requirement is satisfied. + The HAPI metadata specification requires an internal identifier for every dataset. The list of all available dataset ids is present in the `catalog/` response and also appears as the value for the `dataset` request parameter in the URL for an `info/` or `data/` request. The HAPI `/info` response intentionally does not contain the dataset identifier because we have avoided duplication of metadata in responses from different endpoints. However, a request for `/catalog?include=all` will return all HAPI metadata alongside the ids for all the datasets at that server. -4. (Meta)data are registered or indexed in a searchable resource +5. (Meta)data are registered or indexed in a searchable resource - This is outside the scope of the HAPI project, which is focused on access and not discovery. However, we are working with other projects that address registration, indexing, and searching. + This is outside the scope of the HAPI project, which is focused on access and not discovery. There is currently a way to explore all known HAPI servers at https://hapi-server.org/servers/. We are also working with other projects that address registration, indexing, and searching. ### Accessible @@ -2040,15 +2040,15 @@ Once the user finds the required data, she/he/they need to know how they can be 1. (Meta)data are retrievable by their identifier using a standardised communications protocol - All requests must have `dataset`, which is the identifier for HAPI metadata.
+ All HAPI endpoints use the HTTP protocol, and HAPI metadata is in JSON. The `info/` endpoint takes the dataset id and retrieves the JSON metadata. The `data/` endpoint also takes the dataset id and returns the data in CSV, JSON or binary. 2. The protocol is open, free, and universally implementable - HAPI meets these requirements. + HAPI uses a RESTful approach and delivers JSON metadata and well-structured data over HTTP(S), all of which are free and open and implementable in many programming languages. 3. The protocol allows for an authentication and authorisation procedure, where necessary - The HAPI specification supports only open data and so a HAPI compliant server cannot have authorization and authentication. + This is out of scope for HAPI, which was designed to access open data. The HAPI specification explicitly does not allow authentication as part of the HAPI request / response protocols. Access restrictions can still be implemented for HAPI data using other HTTP(S) authentication mechanisms that operate outside or independent of HAPI. 4. Metadata are accessible, even when the data are no longer available @@ -2060,17 +2060,15 @@ The data usually need to be integrated with other data. In addition, the data ne 1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. - HAPI metadata and data are formal and acessiable. + HAPI metadata are in JSON, with JSON schemas also available for validating all complex HAPI JSON output. JSON and JSON Schemas are widely used. HAPI data is transmitted as JSON or as Comma Separated Values (CSV), both also widely used. HAPI servers may use a custom binary format, which uses IEEE standards for binary numbers and the layout of which mimics the CSV output. - _Rebecca: what is definition of 'shared' and 'broadly applicable'?_ - 2.
(Meta)data use vocabularies that follow FAIR principles - _Rebecca: If we wanted to use vocabularies, what would it look like?_ + HAPI references formal vocabularies when appropriate, mainly with metadata attributes that are more useful when constrained to lists or content curated elsewhere. Examples include units strings, coordinate systems, links to other metadata, and data licenses. In those cases, HAPI allows for the expression of the source schema for the attribute content. The mechanism by which HAPI does this does not offer the same level of precision as a formal vocabulary, but it is close. 3. (Meta)data include qualified references to other (meta)data - Other metadata can be referenced using `additionalMetadata`. For `units`, an external schema can be referenced using `unitsSchema`. _Rebecca: what does "qualified" mean?_ + Other metadata can be referenced using `additionalMetadata`, but this is just a simple reference to indicate that these resources are related, and the nature of the linkage is not qualified. ### Reusable @@ -2090,4 +2088,4 @@ The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, met 3. (Meta)data meet domain-relevant community standards - HAPI is a community standard. + HAPI is built using the widely accepted RESTful approach to web-accessible resources, which itself is built on top of HTTP(S). We use JSON in a way that is common in the community. The time standardization we use is a subset of the ISO 8601 standard for time strings. The design of the HAPI protocol for requesting and receiving data was built by analyzing multiple, international data centers, and HAPI offers a lowest-common-denominator protocol. This has been verified by the fact that many data centers implement HAPI not with any of our own software, but just by tweaking their existing code to also offer a HAPI-compliant set of endpoints.
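Reviewer sketch, not part of the patch: the Accessible text above says the `info/` endpoint takes the dataset id and returns JSON metadata. The snippet below only builds the request URLs, to show how the internal identifier travels as the `dataset` query parameter. The helper name `hapi_url`, the server URL, and the dataset id are hypothetical. A `data/` request takes the same `dataset` parameter plus further parameters that this patch does not show, so it is left out.

```python
from urllib.parse import urlencode


def hapi_url(server: str, endpoint: str, **params: str) -> str:
    """Join a server base URL, an endpoint name, and optional query parameters."""
    base = f"{server.rstrip('/')}/{endpoint}"
    query = urlencode(params)  # percent-encodes ids that contain reserved characters
    return f"{base}?{query}" if query else base


if __name__ == "__main__":
    server = "https://example.org/hapi"  # hypothetical server, not a real HAPI endpoint
    print(hapi_url(server, "catalog"))
    print(hapi_url(server, "info", dataset="EXAMPLE_ID"))
```

Because the dataset id is unique only within one server, a URL built this way identifies the dataset only in combination with the server address, which is why the text treats `resourceID` as the place for a persistent identifier.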
From 83699c309a2e6f443ed7302fe530836627547d83 Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Tue, 29 Oct 2024 21:27:33 +0000 Subject: [PATCH 09/14] minor edits --- hapi-dev/HAPI-data-access-spec-dev.md | 37 +++++++++++++++------------ 1 file changed, 21 insertions(+), 16 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 97e8324..2c63723 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -544,9 +544,10 @@ The response is in JSON format [[3](#6-references)] and provides metadata about | `resourceID` | string | **Optional** An identifier by which this data is known in another setting, for example, the SPASE ID. | | `creationDate` | string | **Optional** [Restricted ISO 8601](#376-representation-of-time) date/time of the dataset creation. | | `citation` | string | **Optional** How to cite the data set. An actionable DOI is preferred (e.g., https://doi.org/...). Note that there is a `citation` in an `/about` response that is focused on the server implementation, but this `citation` is focused on one dataset. | -| `modificationDate` | string | **Optional** [Restricted ISO 8601](#376-representation-of-time) date/time of the modification of any content in the dataset. | -| `contact` | string | **Optional** Relevant contact person name (and possibly contact information) for science questions about the dataset. | -| `contactID` | string | **Optional** The identifier in the discovery system for information about the contact. For example, the SPASE ID or ORCID of the person. | +| `license` | string or array | **Optional** A URL or array of URLs to a license landing page. If license is in the [spdx.org](https://spdx.org/) list, link to it. License can also be a string.| +| `modificationDate` | string | **Optional** [Restricted ISO 8601](#376-representation-of-time) date/time of the modification of any content in the dataset.
| +| `contact` | string | **Optional** Relevant contact person name (and possibly contact information) for science questions about the dataset. | +| `contactID` | string | **Optional** The identifier in the discovery system for information about the contact. For example, the SPASE ID or ORCID of the person. | | `additionalMetadata`| object | **Optional** A way to include a block of other (non-HAPI) metadata. See below for a description of the object, which can directly contain the metadata or point to it via a URL. | | `definitions` | object | **Optional** An object containing definitions that are referenced using a [JSON reference](#3613-json-references) | @@ -2004,33 +2005,37 @@ Note that the use of the [wiki page](https://github.com/hapi-server/data-specifi ## 8.6 FAIR -HAPI follows the FAIR principles that makes sense for a data service. For each of the elements of FAIR listed here (and copied from https://www.go-fair.org/fair-principles/) we describe the interaction of these principles with the HAPI specification. +HAPI metadata can be used to make data [FAIR](https://www.go-fair.org/fair-principles/). For each of the elements of FAIR listed here (and copied from https://www.go-fair.org/fair-principles/) we describe their relationship with the HAPI specification. HAPI is designed to be able to fully represent data that is itself already FAIR. Some aspects of HAPI adress FAIR directly, such as Interoperability, but aspects related to findabilty and persistent identifiers are outside the scope of an access service like HAPI and hence best addressed by the data provider. ### Findable -The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process. +_The first step in (re)using data is to find them. 
Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process._ -1. (Meta)data are assigned a globally unique and persistent identifier +1. _(Meta)data are assigned a globally unique and persistent identifier_ - The `resourceID` in HAPI metadata should be interpreted as referring to the dataset and its associated HAPI metadata. + Ideally, use a globally unique and persistent identifier in `resourceID` in the HAPI `/info` response for each dataset. + + Alternatively, + * If each HAPI dataset does not have a globally unique and persistent identifier but the server has one, then put it in the `/about` response as the `resourceID` (this is discouraged). + * If a dataset is associated with more than one identifier, create a dataset of DOIs and serve the a file listing dataset with a DOI column. - To be FAIR, use a globally unique and persistent identifier in `resourceID`. If HAPI data provider does not use one id per dataset, but has - * a single DOI (or equivalent) for the server, then put it in `/about` response as `resourceID` - * one DOI per file (or equivalent), create a dataset of DOIs and serve the dataset where the DOI column has a `stringType` of DOI (see example in the [`stringType` section](#3616-the-stringtype-object)). + If there are more than one identifier for any of the above, use the more broadly adopted (e.g., DOI instead of domain specific identifier) -2. Data are described with rich metadata (defined by Reusable, item 1. below) +2. _Data are described with rich metadata (defined by Reusable, item 1. below)_ Reusable, item 1: _(Meta)data are richly described with a plurality of accurate and relevant attributes_ - HAPI metadata requires a plurality of accurate and relevant attributes, so if a HAPI server is schema valid this requirement for FAIR is satisfied. 
+ The HAPI specification has accurate and relevant attributes; the data provider needs to ensure the attribute values accurately describe the data and includes information needed for interpretation. -3. Metadata clearly and explicitly include the identifier of the data they describe +3. _Metadata clearly and explicitly include the identifier of the data they describe_ - The HAPI metadata specification requires an internal identifier for every dataset. The list of all available dataset ids is present in the `catalog/` and then also as the value for the `dataset` request parameter in the URL for an `info/` or `data/` request. The HAPI `/info` response does not contain the dataset identifier intentionally because we have avoided duplication of metadata in reponses from different endpoints. However, a request for `/catalog?include=all` will return all HAPI metadata alongside the ids for all the dataets at that server. + The HAPI metadata specification requires an internal identifier for every dataset. The list of all available dataset ids is present in the `catalog/` and then also as the value for the `dataset` request parameter in the URL for an `info/` or `data/` request. The HAPI `/info` response does not contain the dataset identifier intentionally because we have avoided duplication of metadata in reponses from different endpoints. + + However, a request for `/catalog?include=all` will return all HAPI metadata alongside the ids for all the dataets at that server. In addition, we create landing pages with JSON-LD that satisfies this requirement. -5. (Meta)data are registered or indexed in a searchable resource +5. _(Meta)data are registered or indexed in a searchable resource_ This is outside the scope of the HAPI project, which is focused on access and not discovery. There is currently a way to explore all known HAPI servers at https://hapi-server.org/servers/. We are also working with other projects that address registration, indexing, and searching. 
@@ -2038,7 +2043,7 @@ The first step in (re)using data is to find them. Metadata and data should be ea Once the user finds the required data, she/he/they need to know how they can be accessed, possibly including authentication and authorisation. -1. (Meta)data are retrievable by their identifier using a standardised communications protocol +1. _(Meta)data are retrievable by their identifier using a standardised communications protocol_ All HAPI endpoints use the HTTP protocoal, and HAPI metadata is in JSON. The `info/` endpoint takes the dataset id and retrieves the JSON metadata. The `data/` endpoint also takes the dataset id and returns the data in CSV, JSON or binary. From 35fb3d0e393de7888513cabee58471aa5e731912 Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Wed, 20 Nov 2024 18:47:26 +0000 Subject: [PATCH 10/14] proveance --- hapi-dev/HAPI-data-access-spec-dev.md | 1 + 1 file changed, 1 insertion(+) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 2c63723..0de9279 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -545,6 +545,7 @@ The response is in JSON format [[3](#6-references)] and provides metadata about | `creationDate` | string | **Optional** [Restricted ISO 8601](#376-representation-of-time) date/time of the dataset creation. | | `citation` | string | **Optional** How to cite the data set. An actionable DOI is preferred (e.g., https://doi.org/...). Note that there is a `citation` in an `/about` response that is focused on the server implementation, but this `citation` is focused on one dataset. | | `license` | string or array | **Optional** A URL or array of URLs to a license landing page. If license is in the [spdx.org](https://spdx.org/) list, link to it. 
License can also be a string.| +| `provenance` | string | **Optional** A description of the provenance of this dataset.| | `modificationDate` | string | **Optional** [Restricted ISO 8601](#376-representation-of-time) date/time of the modification of the any content in the dataset. | | `contact` | string | **Optional** Relevant contact person name (and possibly contact information) for science questions about the dataset. | | `contactID` | string | **Optional** The identifier in the discovery system for information about the contact. For example, the SPASE ID or ORCID of the person. | From 75f17206651ada3178d962e08a737fea342a9bd0 Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Wed, 20 Nov 2024 19:18:11 +0000 Subject: [PATCH 11/14] typos, clean-up --- hapi-dev/HAPI-data-access-spec-dev.md | 75 +++++++++++++-------------- 1 file changed, 36 insertions(+), 39 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 0de9279..1a8971c 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -264,6 +264,7 @@ The server's response to this endpoint must be in JSON format [[3](#6-references | `contact` | string | **Required** Contact information or email address for server issues. HAPI clients should show this contact information when it is certain that an error is due to a problem with the server (as opposed to the client). Ideally, a HAPI client will recommend that the user check their connection and try again at least once before contacting the server contact. | | `description` | string | **Optional** A brief description of the type of data the server provides. | | `contactID` | string | **Optional** The identifier in the discovery system for information about the contact. For example, a SPASE ID of a person identified in the `contact` string. | +| `resourceID` | string | **Optional** An identifier associated with all datasets. 
| `citation` | string | **Optional** How to cite data server. An actionable DOI is preferred (e.g., https://doi.org/...). This `citation` differs from the `citation` in an `/info` response. Here the citation is for the entity that maintains the data server. | | `dataTest` | `DataTest` | **Optional** Information that a client can use to check that a server is operational. Data response should contain more than zero records. See below for the definition of this object. | @@ -541,7 +542,7 @@ The response is in JSON format [[3](#6-references)] and provides metadata about | `unitsSchema` | string | **Optional** The name of the units convention that describes how to parse all `units` strings in this dataset. Currently, the only allowed values are: `udunits2`, `astropy3`, and `cdf-cluster`. See [`unitsSchema` Details](#363-unitsschema-details) for additional information about these conventions. The list of allowed unit specifications is expected to grow to include other well-documented unit standards. | | `coordinateSystemSchema` | string | **Optional** The name of the schema or convention that contains a list of coordinate system names and definitions. If this keyword is provided, any `coordinateSystemName` keyword given in a [parameter](#366-parameter-object) definition should follow this schema. See [`coordinateSystemSchema` Details](#364-coordinatesystemschema-details) for additional information. | | `resourceURL` | string | **Optional** URL linking to more detailed information about this dataset. | -| `resourceID` | string | **Optional** An identifier by which this data is known in another setting, for example, the SPASE ID. | +| `resourceID` | string | **Optional** An identifier by which this data is known in another setting (e.g., DOI) | `creationDate` | string | **Optional** [Restricted ISO 8601](#376-representation-of-time) date/time of the dataset creation. | | `citation` | string | **Optional** How to cite the data set. 
An actionable DOI is preferred (e.g., https://doi.org/...). Note that there is a `citation` in an `/about` response that is focused on the server implementation, but this `citation` is focused on one dataset. | | `license` | string or array | **Optional** A URL or array of URLs to a license landing page. If license is in the [spdx.org](https://spdx.org/) list, link to it. License can also be a string.| @@ -2006,9 +2007,9 @@ Note that the use of the [wiki page](https://github.com/hapi-server/data-specifi ## 8.6 FAIR -HAPI metadata can be used to make data [FAIR](https://www.go-fair.org/fair-principles/). For each of the elements of FAIR listed here (and copied from https://www.go-fair.org/fair-principles/) we describe their relationship with the HAPI specification. +For each of the elements of FAIR listed below (copied from https://www.go-fair.org/fair-principles/), we describe their relationship with the HAPI specification. -HAPI is designed to be able to fully represent data that is itself already FAIR. Some aspects of HAPI adress FAIR directly, such as Interoperability, but aspects related to findabilty and persistent identifiers are outside the scope of an access service like HAPI and hence best addressed by the data provider. +HAPI is designed to be able to fully represent data that is itself already FAIR. Some aspects of HAPI address FAIR directly, such as Interoperability, but aspects related to findability and persistent identifiers are outside the scope of an access service like HAPI and hence best addressed by the data provider. ### Findable @@ -2016,82 +2017,78 @@ _The first step in (re)using data is to find them. Metadata and data should be e 1. _(Meta)data are assigned a globally unique and persistent identifier_ - Ideally, use a globally unique and persistent identifier in `resourceID` in the HAPI `/info` response for each dataset. +HAPI requires a globally unique and persistent identifier (`resourceID`) in its `/info` response for a dataset. 
- Alternatively, - * If each HAPI dataset does not have a globally unique and persistent identifier but the server has one, then put it in the `/about` response as the `resourceID` (this is discouraged). - * If a dataset is associated with more than one identifier, create a dataset of DOIs and serve the a file listing dataset with a DOI column. - - If there are more than one identifier for any of the above, use the more broadly adopted (e.g., DOI instead of domain specific identifier) +Alternatively, +* If each HAPI dataset does not have a globally unique and persistent identifier but the server has one, a data provider can use the `resourceID` in the `/about` response (this is discouraged). +* If a dataset is associated with more than one identifier, a data provider can create a dataset of DOIs and serve the DOIs as a time series. 2. _Data are described with rich metadata (defined by Reusable, item 1. below)_ - Reusable, item 1: _(Meta)data are richly described with a plurality of accurate and relevant attributes_ +Reusable, item 1: _(Meta)data are richly described with a plurality of accurate and relevant attributes_ - The HAPI specification has accurate and relevant attributes; the data provider needs to ensure the attribute values accurately describe the data and includes information needed for interpretation. +The HAPI specification has accurate and relevant attributes; the data provider needs to ensure the attribute values accurately describe the data and includes information required for interpretation. 3. _Metadata clearly and explicitly include the identifier of the data they describe_ - The HAPI metadata specification requires an internal identifier for every dataset. The list of all available dataset ids is present in the `catalog/` and then also as the value for the `dataset` request parameter in the URL for an `info/` or `data/` request. 
The HAPI `/info` response does not contain the dataset identifier intentionally because we have avoided duplication of metadata in reponses from different endpoints. - - However, a request for `/catalog?include=all` will return all HAPI metadata alongside the ids for all the dataets at that server. In addition, we create landing pages with JSON-LD that satisfies this requirement. +The HAPI metadata specification requires an internal identifier for every dataset. The list of all available dataset ids is returned in a `catalog/` request; the ids are also used in the `dataset` request parameter in the URL for an `info/` or `data/` request. (The HAPI `/info` response does not contain the dataset identifier because we have generaly avoided the duplication of metadata in responses from different endpoints.) 5. _(Meta)data are registered or indexed in a searchable resource_ - This is outside the scope of the HAPI project, which is focused on access and not discovery. There is currently a way to explore all known HAPI servers at https://hapi-server.org/servers/. We are also working with other projects that address registration, indexing, and searching. +This is outside the scope of the HAPI project, which primarily addresses access and not discovery. There is currently a way to explore all known HAPI servers at https://hapi-server.org/servers/. We also work with other projects that address registration, indexing, and searching. ### Accessible -Once the user finds the required data, she/he/they need to know how they can be accessed, possibly including authentication and authorisation. +Once the user finds the required data, she/he/they need to know how they can be accessed, possibly including authentication and authorization. -1. _(Meta)data are retrievable by their identifier using a standardised communications protocol_ +1. 
_(Meta)data are retrievable by their identifier using a standardized communications protocol_ - All HAPI endpoints use the HTTP protocoal, and HAPI metadata is in JSON. The `info/` endpoint takes the dataset id and retrieves the JSON metadata. The `data/` endpoint also takes the dataset id and returns the data in CSV, JSON or binary. + All HAPI endpoints use the HTTP protocol, and HAPI metadata is in JSON. The `info/` endpoint takes the dataset id and retrieves the JSON metadata. The `data/` endpoint also takes the dataset ID and returns the data in CSV, JSON, or binary. -2. The protocol is open, free, and universally implementable +2. _The protocol is open, free, and universally implementable_ - HAPI uses a RESTful approach and delivers JSON metadata and well-structured data over HTTP(S), all of which are free and open and impementable in many programming languages. +HAPI delivers JSON metadata and well-structured data over HTTP(S), all of which are free and open and implementable in many programming languages. -3. The protocol allows for an authentication and authorisation procedure, where necessary +3. _The protocol allows for an authentication and authorization procedure, where necessary_ - This is out of scope for HAPI, which was designed to access open data. The HAPI specification explicitly does not allow authentication as part of the HAPI request / response protocols. Access restrcitions can still be implemented for HAPI data using other HTTP(S) authentication mechanisms that operate outside or independent of HAPI. +This is out of scope for HAPI, which was designed to access open data. The HAPI specification explicitly does not include an option for authentication. Access restrictions can still be implemented for HAPI data using other HTTP(S) authentication mechanisms that operate outside or independent of HAPI. -4. Metadata are accessible, even when the data are no longer available +4. 
_Metadata are accessible, even when the data are no longer available_ - This is outside the scope of the HAPI project, which is focused on access and not archiving. However, we are working with other projects that address this. +This is outside the scope of the HAPI project, which is focused on access and not archiving. However, we are working with other projects that address this. ### Interoperable -The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing. +_The data usually needs to be integrated with other data. In addition, the data needs to interoperate with applications or workflows for analysis, storage, and processing._ -1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. +1. _(Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation._ - HAPI metadata are in JSON, with JSON schemas also available for validating all complex HAPI JSON output. JSON and JSON Schemas are widely used. HAPI data is transmitted as JSON or as Comma Spearated Values (CSV), both also widely used. HAPI servers may use a custom binary format, which uses IEEE standards for binary numbers and the layout of whicih mimics the CSV output. +HAPI metadata are in JSON, and JSON schemas are available to validate all complex HAPI JSON output. HAPI data is transmitted as JSON or Comma Separated Values (CSV), both widely used. (HAPI servers may use a simple binary format, which uses IEEE standards for binary numbers and the layout mimics the CSV output.) -2. (Meta)data use vocabularies that follow FAIR principles +2. _(Meta)data use vocabularies that follow FAIR principles_ - HAPI refernces formal vocabularies wehn appropriate, mainly with metadata attributes that are more useful when constrained to lists or content curated elsewhere. 
Exmaples include units strings, coordinate systems, links to other metadata, and data licenses. In those cases, HAPI allows for the expression of the source schema for the attribute content. The mechanism in which HAPI does this does not offer the same level of precision as a a formal vocabulary, but it is close. +HAPI metadata does not use vocabularies directly, but links can be made to external metadata that uses vocabularies (see next item). -3. (Meta)data include qualified references to other (meta)data +3. _(Meta)data include qualified references to other (meta)data_ - Other metadata can be referenced using `additionalMetadata`, but this is just a simple reference to indicate that these resources are related, and the nature of the linkage is not qualified. +Other metadata can be referenced using `additionalMetadata`, but this is just a simple reference to indicate that these resources are related, and the nature of the linkage is not qualified. ### Reusable -The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings. +_The ultimate goal of FAIR is to optimize the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings._ -1. (Meta)data are richly described with a plurality of accurate and relevant attributes +1. _(Meta)data are richly described with a plurality of accurate and relevant attributes_ - This is satisfied by the HAPI specification. + This is satisfied by the HAPI specification. -2. (Meta)data are released with a clear and accessible data usage license +2. _(Meta)data are released with a clear and accessible data usage license_ - This can be satisfied by using the `licence` attribute. +This can be satisfied by using the `licence` attribute. -3. (Meta)data are associated with detailed provenance +3. 
_(Meta)data are associated with detailed provenance_ - This can be satisfied with the `provenance` attribute. +This can be satisfied with the `provenance` attribute. 3. (Meta)data meet domain-relevant community standards - HAPI is built using the widely accepted RESTful approach to web-accessible resoures, which itself is built on top of HTTP(S). We use JSON in a way that is common in the community. The time stadardization we use is a subset of the ISO8601 standard for time strings. The design of the HAPI protocol for requesting and receiving data was built by analyzing multiple, international data centers, and HAPI offers a lowest-common-denominator protocol. This has been verified by the fact that many data centers implement HAPI not with any of our own software, but just by tweaking their existing code to also offer a HAPI-compliant set of endpoints. +HAPI is built using the widely accepted RESTful approach to web-accessible resources, which is built on top of HTTP(S). We use JSON in a way that is common in the community. The time standardization we use is a subset of the ISO8601 standard for time strings. The design of the HAPI protocol for requesting and receiving data followed from an analysis of the API of many time series data providers, and HAPI is a standard that provides a common set of features along with a standard for metadata and data transmission streaming format. 
From e95845d67bf577c0c9eef543ad5359e9da9ac48e Mon Sep 17 00:00:00 2001 From: Bob Weigel Date: Sun, 1 Dec 2024 01:31:54 +0000 Subject: [PATCH 12/14] Wording --- hapi-dev/HAPI-data-access-spec-dev.md | 64 +++++++++++++-------------- 1 file changed, 32 insertions(+), 32 deletions(-) diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md index 1a8971c..4586a5b 100644 --- a/hapi-dev/HAPI-data-access-spec-dev.md +++ b/hapi-dev/HAPI-data-access-spec-dev.md @@ -2009,86 +2009,86 @@ Note that the use of the [wiki page](https://github.com/hapi-server/data-specifi For each of the elements of FAIR listed below (copied from https://www.go-fair.org/fair-principles/), we describe their relationship with the HAPI specification. -HAPI is designed to be able to fully represent data that is itself already FAIR. Some aspects of HAPI address FAIR directly, such as Interoperability, but aspects related to findability and persistent identifiers are outside the scope of an access service like HAPI and hence best addressed by the data provider. +Some aspects of HAPI, which is an API and metadata standard, directly address FAIR; however, some FAIR principles must be addressed by an external service or the data provider. As such, HAPI supports FAIR principles to the extent that the principles are within its scope. ### Findable _The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process._ -1. _(Meta)data are assigned a globally unique and persistent identifier_ +1\. _(Meta)data are assigned a globally unique and persistent identifier_ -HAPI requires a globally unique and persistent identifier (`resourceID`) in its `/info` response for a dataset. +The `resourceID` attribute can be used for a globally unique and persistent identifier. 
Alternatively, -* If each HAPI dataset does not have a globally unique and persistent identifier but the server has one, a data provider can use the `resourceID` in the `/about` response (this is discouraged). -* If a dataset is associated with more than one identifier, a data provider can create a dataset of DOIs and serve the DOIs as a time series. +* If each HAPI dataset does not have a globally unique and persistent identifier, but the server does, a data provider can use the `resourceID` in the `/about` response (but data providers are encouraged to have dataset-level `resourceID`s). +* If a dataset is associated with more than one identifier, a data provider can create a dataset of identifiers and serve this dataset as a time series. -2. _Data are described with rich metadata (defined by Reusable, item 1. below)_ +2\. _Data are described with rich metadata (defined by Reusable, item 1. below)_ Reusable, item 1: _(Meta)data are richly described with a plurality of accurate and relevant attributes_ -The HAPI specification has accurate and relevant attributes; the data provider needs to ensure the attribute values accurately describe the data and includes information required for interpretation. +The HAPI metadata specification has accurate and relevant attributes; the data provider needs to ensure the attribute values accurately describe the data and include information required for interpretation. -3. _Metadata clearly and explicitly include the identifier of the data they describe_ +3\. _Metadata clearly and explicitly include the identifier of the data they describe_ -The HAPI metadata specification requires an internal identifier for every dataset. The list of all available dataset ids is returned in a `catalog/` request; the ids are also used in the `dataset` request parameter in the URL for an `info/` or `data/` request. 
(The HAPI `/info` response does not contain the dataset identifier because we have generaly avoided the duplication of metadata in responses from different endpoints.) +The HAPI metadata specification requires an identifier (`id`) for every dataset. The list of all `id`s is returned in a `catalog/` request. `id` is also used in the `dataset` request parameter in the URL for an `info/` or `data/` request. (The HAPI `/info` response does not contain the `id` because we have generally avoided the duplication of metadata in responses from different endpoints.) -5. _(Meta)data are registered or indexed in a searchable resource_ +4\. _(Meta)data are registered or indexed in a searchable resource_ -This is outside the scope of the HAPI project, which primarily addresses access and not discovery. There is currently a way to explore all known HAPI servers at https://hapi-server.org/servers/. We also work with other projects that address registration, indexing, and searching. +This is outside the scope of the HAPI specification. However, there is a way to explore all known HAPI servers at https://hapi-server.org/servers/. We also work with other projects that address registration, indexing, and searching. ### Accessible -Once the user finds the required data, she/he/they need to know how they can be accessed, possibly including authentication and authorization. +Once the user finds the required data, they need to know how they can be accessed, possibly including authentication and authorization. -1. _(Meta)data are retrievable by their identifier using a standardized communications protocol_ +1\. _(Meta)data are retrievable by their identifier using a standardized communications protocol_ - All HAPI endpoints use the HTTP protocol, and HAPI metadata is in JSON. The `info/` endpoint takes the dataset id and retrieves the JSON metadata. The `data/` endpoint also takes the dataset ID and returns the data in CSV, JSON, or binary. 
+All HAPI endpoints use the HTTP protocol; HAPI metadata is JSON. The `info/` endpoint takes the dataset `id` and returns JSON metadata. The `data/` endpoint also takes the `id` and returns data in CSV, JSON, or binary. -2. _The protocol is open, free, and universally implementable_ +2\. _The protocol is open, free, and universally implementable_ -HAPI delivers JSON metadata and well-structured data over HTTP(S), all of which are free and open and implementable in many programming languages. +HAPI delivers JSON metadata and well-structured data over HTTP(S). The schema for the data and metadata is free, open, and can be implemented in any programming language. -3. _The protocol allows for an authentication and authorization procedure, where necessary_ +3\. _The protocol allows for an authentication and authorization procedure, where necessary_ -This is out of scope for HAPI, which was designed to access open data. The HAPI specification explicitly does not include an option for authentication. Access restrictions can still be implemented for HAPI data using other HTTP(S) authentication mechanisms that operate outside or independent of HAPI. +This is out of scope for HAPI, which was designed to access open data. The HAPI specification explicitly does not include an option for authentication. Access restrictions can still be implemented using authentication mechanisms outside or independent of HAPI. -4. _Metadata are accessible, even when the data are no longer available_ +4\. _Metadata are accessible, even when the data are no longer available_ -This is outside the scope of the HAPI project, which is focused on access and not archiving. However, we are working with other projects that address this. +This is outside the scope of the HAPI specification. However, an [https://github.com/hapi-server/servers](affiliated HAPI project) caches metadata from all known HAPI servers nightly. ### Interoperable _The data usually needs to be integrated with other data. 
In addition, the data needs to interoperate with applications or workflows for analysis, storage, and processing._

-1. _(Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation._
+1\. _(Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation._

-HAPI metadata are in JSON, and JSON schemas are available to validate all complex HAPI JSON output. HAPI data is transmitted as JSON or Comma Separated Values (CSV), both widely used. (HAPI servers may use a simple binary format, which uses IEEE standards for binary numbers and the layout mimics the CSV output.)
+HAPI metadata are in JSON, and JSON schemas are available for validation. HAPI data is transmitted as JSON or Comma Separated Values (CSV), both widely used. (HAPI servers may use a simple binary format, which uses IEEE standards for binary numbers, and the layout mimics the CSV output.)

-2. _(Meta)data use vocabularies that follow FAIR principles_
+2\. _(Meta)data use vocabularies that follow FAIR principles_

 HAPI metadata does not use vocabularies directly, but links can be made to external metadata that uses vocabularies (see next item).

-3. _(Meta)data include qualified references to other (meta)data_
+3\. _(Meta)data include qualified references to other (meta)data_

-Other metadata can be referenced using `additionalMetadata`, but this is just a simple reference to indicate that these resources are related, and the nature of the linkage is not qualified.
+Other metadata can be referenced using `additionalMetadata`.

 ### Reusable

 _The ultimate goal of FAIR is to optimize the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings._

-1. _(Meta)data are richly described with a plurality of accurate and relevant attributes_
+1\.
_(Meta)data are richly described with a plurality of accurate and relevant attributes_

- This is satisfied by the HAPI specification.
+This is satisfied by the HAPI specification.

-2. _(Meta)data are released with a clear and accessible data usage license_
+2\. _(Meta)data are released with a clear and accessible data usage license_

 This can be satisfied by using the `licence` attribute.

-3. _(Meta)data are associated with detailed provenance_
+3\. _(Meta)data are associated with detailed provenance_

-This can be satisfied with the `provenance` attribute.
+This can be satisfied using the `provenance` attribute.

-3. (Meta)data meet domain-relevant community standards
+4\. (Meta)data meet domain-relevant community standards

-HAPI is built using the widely accepted RESTful approach to web-accessible resources, which is built on top of HTTP(S). We use JSON in a way that is common in the community. The time standardization we use is a subset of the ISO8601 standard for time strings. The design of the HAPI protocol for requesting and receiving data followed from an analysis of the API of many time series data providers, and HAPI is a standard that provides a common set of features along with a standard for metadata and data transmission streaming format.
+HAPI is built using the widely adopted RESTful approach to web-accessible resources. We use JSON in a way that is common in the community. The time standard is a subset of the ISO8601 standard for time strings. The design of the HAPI API for requesting and receiving data followed from an analysis of the API of many time series data providers, and HAPI is a standard that provides a common set of features.
From 955d2531cffcf038b82663682da9dce49d9e3add Mon Sep 17 00:00:00 2001
From: Bob Weigel
Date: Mon, 2 Dec 2024 15:48:34 +0000
Subject: [PATCH 13/14] Fix link and TOC

---
 hapi-dev/HAPI-data-access-spec-dev.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md
index 4586a5b..26da506 100644
--- a/hapi-dev/HAPI-data-access-spec-dev.md
+++ b/hapi-dev/HAPI-data-access-spec-dev.md
@@ -63,7 +63,7 @@
    [8.3 JSON Object of Status Codes](#83-json-object-of-status-codes)<br/>
   [8.4 Examples](#84-examples)
   [8.5 Robot clients](#85-robot-clients)
-   [8.6 FAIR](#85-fair)
+   [8.6 FAIR](#86-fair)

 Version 3.2.0-dev \| Heliophysics Data and Model Consortium (HDMC) \|
@@ -2055,7 +2055,7 @@ This is out of scope for HAPI, which was designed to access open data. The HAPI

 4\. _Metadata are accessible, even when the data are no longer available_

-This is outside the scope of the HAPI specification. However, an [https://github.com/hapi-server/servers](affiliated HAPI project) caches metadata from all known HAPI servers nightly.
+This is outside the scope of the HAPI specification. However, an [affiliated HAPI project](https://github.com/hapi-server/servers) caches metadata from all known HAPI servers nightly.

 ### Interoperable

From e3fb5ee5f90fd49e88da1c49f95bbc57f5201479 Mon Sep 17 00:00:00 2001
From: Bob Weigel
Date: Mon, 2 Dec 2024 17:07:35 +0000
Subject: [PATCH 14/14] newline

---
 hapi-dev/HAPI-data-access-spec-dev.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hapi-dev/HAPI-data-access-spec-dev.md b/hapi-dev/HAPI-data-access-spec-dev.md
index 26da506..aef4282 100644
--- a/hapi-dev/HAPI-data-access-spec-dev.md
+++ b/hapi-dev/HAPI-data-access-spec-dev.md
@@ -62,7 +62,7 @@
    [8.2 Allowed Characters in id, dataset, and parameter](#82-allowed-characters-in-id-dataset-and-parameter)<br/>
   [8.3 JSON Object of Status Codes](#83-json-object-of-status-codes)
   [8.4 Examples](#84-examples)
-   [8.5 Robot clients](#85-robot-clients)
+   [8.5 Robot clients](#85-robot-clients)<br/>
   [8.6 FAIR](#86-fair)