From 0c0d8fd0c49d87d86d8e2e6780f782a6f9c7d766 Mon Sep 17 00:00:00 2001
From: Gaurav Vaidya <gaurav@ggvaidya.com>
Date: Tue, 1 Jul 2025 03:03:36 -0400
Subject: [PATCH 1/2] Added API documentation.

---
 README.md            |   3 +-
 documentation/API.md | 231 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 233 insertions(+), 1 deletion(-)
 create mode 100644 documentation/API.md

diff --git a/README.md b/README.md
index bd124e7..397e26f 100644
--- a/README.md
+++ b/README.md
@@ -105,7 +105,8 @@ The web server can be started after successful completion of the load.
     uvicorn --host 0.0.0.0 --port 8000 --workers 1 node_normalizer.server:app
 ```
 
-Then navigate to http://localhost:8000/docs to run the application
+Then navigate to http://localhost:8000/docs to run the application.
+Documentation for the [NodeNorm API](./documentation/API.md) is available.
 
 ### Webserver Docker container creation and execution
 Much like the Redis Docker container noted above, a Docker container can also be created and executed to run the webserver.
diff --git a/documentation/API.md b/documentation/API.md
new file mode 100644
index 0000000..41f9798
--- /dev/null
+++ b/documentation/API.md
@@ -0,0 +1,231 @@
+# NodeNorm API
+
+The NodeNorm API includes many API endpoints that cover normalization of identifiers, TRAPI messages
+and identifier sets, as well as endpoints to retrieve allowed conflations, semantic types and CURIE
+prefixes. The [NodeNorm FastAPI Documentation](https://nodenormalization-sri.renci.org/docs) includes
+information about the parameters for calling each endpoint, but this document will describe the
+intended function of each endpoint, suggestions for use and descriptions of the JSON documents returned.
+
+## Identifier/Node Normalization
+
+### `/get_normalized_nodes`
+
+* Method: [GET](https://nodenormalization-sri.renci.org/docs#/default/get_normalized_node_handler_get_normalized_nodes_get)
+  * Parameters:
+    * `curie` (e.g. `curie=MESH:D014867&curie=NCIT:C34373`): The identifiers to normalize.
+    * `conflate` (e.g. `conflate=true`): Whether to apply GeneProtein conflation.
+    * `drug_chemical_conflate` (e.g. `drug_chemical_conflate=true`): Whether to apply DrugChemical conflation.
+    * `description` (e.g. `description=false`): Whether to include descriptions for nodes that have descriptions.
+    * `individual_types` (e.g. `individual_types=true`): When returning a conflated result, should Biolink types be
+      returned for each individual identifier.
+* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/get_normalized_node_handler_post_get_normalized_nodes_post)
+  * POST Body: A JSON object with the same parameters as the GET method, with a `curies` list instead of individual
+    `curie` entries.
+
+Example output:
+
+```json
+{
+  "MESH:D014867": {
+    "id": {
+      "identifier": "CHEBI:15377",
+      "label": "Water",
+      "description": "An oxygen hydride consisting of an oxygen atom that is covalently bonded to two hydrogen atoms"
+    },
+    "equivalent_identifiers": [
+      {
+        "identifier": "CHEBI:15377",
+        "label": "water",
+        "description": "An oxygen hydride consisting of an oxygen atom that is covalently bonded to two hydrogen atoms",
+        "type": "biolink:SmallMolecule"
+      },
+      {
+        "identifier": "UNII:059QF0KO0R",
+        "label": "WATER",
+        "type": "biolink:SmallMolecule"
+      },
+      {
+        "identifier": "PUBCHEM.COMPOUND:962",
+        "label": "Water",
+        "type": "biolink:SmallMolecule"
+      },
+      [...]
+    ],
+    "type": [
+      "biolink:SmallMolecule",
+      "biolink:MolecularEntity",
+      "biolink:ChemicalEntity",
+      "biolink:PhysicalEssence",
+      "biolink:ChemicalOrDrugOrTreatment",
+      "biolink:ChemicalEntityOrGeneOrGeneProduct",
+      "biolink:ChemicalEntityOrProteinOrPolypeptide",
+      "biolink:NamedThing",
+      "biolink:PhysicalEssenceOrOccurrent"
+    ],
+    "information_content": 47.7
+  }
+}
+```
+
+* Output values: the output is a dictionary with queried CURIEs as the keys and with JSON objects
+  as the values, containing the following keys:
+  * `id`: A JSON object that provides the preferred identifier and labels for this clique.
+    * `identifier`: The preferred CURIE for this clique. Every Biolink class includes a list of
+      preferred prefixes (e.g.
+      [valid ID prefixes for SmallMolecule](https://biolink.github.io/biolink-model/SmallMolecule/#valid-id-prefixes)),
+      and this is used to choose the preferred CURIE for this clique.
+    * `label`: The preferred label for this clique. Note that this is not necessarily the label associated with the
+      preferred CURIE: for some classes (such as chemicals), we choose the best label in a different prefix order than
+      the Biolink Model preferred prefix order, based on which sources tend to have the best labels.
+    * `description`: One of the descriptions for the identifiers within this clique.
+  * `equivalent_identifiers`: a list of identifiers that are part of this clique given the conflation options.
+    Each identifier includes an `identifier` (a CURIE), a `label` (which corresponds to the label of the CURIE as per
+    its authoritative source), a `description` (currently only taken from UberGraph), and (if `individual_types` is set)
+    the Biolink type of each identifier. This list is ordered in the Biolink Model's preferred prefix order for this class.
+  * `type`: The list of Biolink classes for this clique, starting with the most specific type (in this example,
+    `biolink:SmallMolecule`), and ending with any mixins that include this class.
+  * `information_content`: the information content value between 0 and 100. This is calculated by retrieving the
+    [normalized information content value](https://github.com/INCATools/ubergraph/?tab=readme-ov-file#graph-organization)
+    for each identifier that is present in UberGraph, and then calculating the lowest information content value of
+    any identifier in this clique for which UberGraph has an identifier value. According to UberGraph's documentation,
+    the normalized information content value is "Precomputed information content score for each ontology class, based
+    on the count of terms related to a given term via rdfs:subClassOf or any existential relation. The scores are
+    xsd:decimal values scaled from 0 to 100 (e.g., a very specific term with no subclasses)."
+  * Internally, conflation is represented as sets of cliques that should be combined when that conflation is turned on.
+    This means that a conflated clique will be represented by a single list of equivalent identifiers, starting with the
+    equivalent identifiers from the first clique, followed by the equivalent identifiers from the second clique, and so
+    on. There is currently no way to retrieve the clique leaders (although
+    [this is a requested feature](https://github.com/TranslatorSRI/NodeNormalization/issues/320)), but you can use the
+    `individual_types` parameter to get a Biolink type for each identifier.
+
+## TRAPI Normalization
+
+### `/query`
+
+* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/query_query_post)
+* TODO
+
+### `/asyncquery`
+
+* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/async_query_asyncquery_post)
+* TODO
+
+## Sets
+
+### `/get_setid`
+
+This endpoint is used to calculate a `set ID` for a set of CURIEs. CURIEs that can be normalized will
+be normalized (using the conflation settings provided), and those that can't be will be left as is.
+Duplicate normalized CURIEs will be removed, even if two distinct CURIEs were passed to this endpoint
+but were normalized to the same CURIE. CURIEs will then be sorted in alphabetical order and a hash
+generated as a set ID for that set of CURIEs. A set ID is therefore unique to a set of normalized CURIEs for the curies
+passed in.
+
+* Method: [GET](https://nodenormalization-sri.renci.org/docs#/default/get_setid_get_setid_get)
+  * Parameters:
+    * `curie` (example: `curie=MESH:D014867&curie=NCIT:C34373`): The CURIEs to normalize as a set.
+    * `conflation` (optional, example: `conflation=GeneProtein&conflation=DrugChemical`): The conflations to apply.
+* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/get_setid_get_setid_post)
+  * POST Body: a JSON string representing a list of sets, where each set consists of:
+    * `curies` (e.g. `"MESH:D014867", "NCIT:C34373": A list of CURIEs to normalize as a set.
+    * `conflations` (optional, e.g. `["GeneProtein", "DrugChemical"]): A list of conflations to apply.
+
+Example output: note that the GET method will return a single object, while the POST method will
+return a list that corresponds to the list of sets sent to this endpoint for normalization.
+
+```json
+[
+  {
+    "curies": [
+      "NCIT:C34373",
+      "MESH:D014867",
+      "UNII:63M8RYN44N",
+      "RUBBISH:1234"
+    ],
+    "conflations": [
+      "GeneProtein",
+      "DrugChemical"
+    ],
+    "error": null,
+    "normalized_curies": [
+      "CHEBI:15377",
+      "MONDO:0004976",
+      "RUBBISH:1234"
+    ],
+    "normalized_string": "CHEBI:15377||MONDO:0004976||RUBBISH:1234",
+    "setid": "uuid:771d3c09-9a8c-5c46-8b85-97f481a90d40"
+  }
+]
+```
+
+Output values:
+* `curies`: The list of CURIEs passed to this endpoint for normalization.
+* `conflations`: The list of conflations to apply as passed to this endpoint.
+* `error`: Any error that occurred when normalizing this string. Note that a CURIE that cannot be normalized
+  does not count as an error.
+* `normalized_curies`: The list of unique normalized queries used to construct the setid.
+* `setid`: The setid calculated for this set.
+
+## Status
+
+### [/status](https://nodenormalization-sri.renci.org/docs#/default/status_get_status_get)
+
+This endpoint can be used to find out about the NodeNorm service and the underlying Redis databases.
+It can be useful to confirm whether the databases are fully loaded and how much memory is being used.
+
+* Methods: GET only
+* Parameters: None
+
+Example output:
+```json
+{
+  "status": "running",
+  "babel_version": "2025mar31",
+  "babel_version_url": "https://github.com/TranslatorSRI/Babel/blob/master/releases/2025mar31.md",
+  "databases": {
+    "eq_id_to_id_db": {
+      "dbname": "id-id",
+      "count": 677731045,
+      "used_memory_rss_human": "68.83G",
+      "is_cluster": false
+    },
+    [...]
+  }
+}
+```
+
+Output values:
+
+* `status` (example: `running`): Whether or not the service is running.
+* `babel_version` (example: `2025mar31`): The version of [Babel](https://github.com/TranslatorSRI/Babel) used to generate
+  the cliques being presented. These are usually date-based versions indicating approximately when the Babel build was
+  completed.
+* `babel_version_url` (example: https://github.com/TranslatorSRI/Babel/blob/master/releases/2025mar31.md): A URL you
+  can use to learn more about this version of Babel, and how it differs from previous and future versions.
+* `databases`: A dictionary of Redis key-value databases used by this NodeNorm instance (currently: 7). Each database
+  uses the internal name of this database as its key, along with the following information:
+  * `dbname`: A second name used for this database.
+  * `count`: The number of keys in this database.
+  * `used_memory_rss_human`: the `used_memory_rss_human` value returned by this Redis database, described
+    [in the Redis documentation](https://redis.io/docs/latest/commands/info/) as "Human readable representation of
+    [Number of bytes that Redis allocated as seen by the operating system (a.k.a resident set size). This is the number reported by tools such as top(1) and
+    ps(1)]."
+  * `is_cluster`: Whether this database is being used as part of a cluster or as a single node database.
+
+## Informational endpoints
+
+### `/get_allowed_conflations`
+
+* Method: [GET](https://nodenormalization-sri.renci.org/docs#/default/get_conflations_get_allowed_conflations_get)
+
+### `/get_semantic_types`
+
+* Method: [GET](https://nodenormalization-sri.renci.org/docs#/default/get_semantic_types_handler_get_semantic_types_get)
+
+### `/get_curie_prefixes`
+
+* Method: [GET](https://nodenormalization-sri.renci.org/docs#/default/get_curie_prefixes_handler_get_curie_prefixes_get)
+  * TODO
+* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/get_curie_prefixes_handler_get_curie_prefixes_post)
+  * TODO
+* TODO

From 420c1d8f3566901259cd5bc78af404d201d85b70 Mon Sep 17 00:00:00 2001
From: Gaurav Vaidya <gaurav@ggvaidya.com>
Date: Tue, 1 Jul 2025 14:33:35 -0400
Subject: [PATCH 2/2] Documented additional methods.

---
 documentation/API.md | 119 +++++++++++++++++++++++++++++++++++++------
 1 file changed, 103 insertions(+), 16 deletions(-)

diff --git a/documentation/API.md b/documentation/API.md
index 41f9798..5231d69 100644
--- a/documentation/API.md
+++ b/documentation/API.md
@@ -98,18 +98,6 @@ Example output:
     [this is a requested feature](https://github.com/TranslatorSRI/NodeNormalization/issues/320)), but you can use the
     `individual_types` parameter to get a Biolink type for each identifier.
 
-## TRAPI Normalization
-
-### `/query`
-
-* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/query_query_post)
-* TODO
-
-### `/asyncquery`
-
-* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/async_query_asyncquery_post)
-* TODO
-
 ## Sets
 
 ### `/get_setid`
@@ -174,7 +162,7 @@ This endpoint can be used to find out about the NodeNorm service and the underly
 It can be useful to confirm whether the databases are fully loaded and how much memory is being used.
 
 * Methods: GET only
-* Parameters: None
+* No parameters.
 
 Example output:
 ```json
@@ -216,16 +204,115 @@ Output values:
 
 ### `/get_allowed_conflations`
 
+Returns a list of the supported conflations.
+
 * Method: [GET](https://nodenormalization-sri.renci.org/docs#/default/get_conflations_get_allowed_conflations_get)
+* No parameters.
+
+Example output:
+
+```json
+{
+  "conflations": [
+    "GeneProtein",
+    "DrugChemical"
+  ]
+}
+```
 
 ### `/get_semantic_types`
 
+Returns a list of all the Biolink types/classes that this instance of NodeNorm has at least one identifier for.
+
 * Method: [GET](https://nodenormalization-sri.renci.org/docs#/default/get_semantic_types_handler_get_semantic_types_get)
 
+Example output:
+
+```json
+{
+  "semantic_types": {
+    "types": [
+      "biolink:NucleicAcidEntity",
+      "biolink:ActivityAndBehavior",
+      "biolink:PhysicalEssence",
+      "biolink:StudyPopulation",
+      "biolink:PhysicalEssenceOrOccurrent",
+      "biolink:GenomicEntity",
+      "biolink:Protein",
+      "biolink:Event",
+      [...]
+    ]
+  }
+}
+```
+
 ### `/get_curie_prefixes`
 
+Returns a list of CURIE prefixes for zero or more Biolink types, as well as the number of identifiers for each prefix.
+
+These are generated when the Babel compendia are loaded into NodeNorm, and I haven't verified if they are
+accurate — I'm more confident about the Babel reports, but I haven't checked them against each other.
+
 * Method: [GET](https://nodenormalization-sri.renci.org/docs#/default/get_curie_prefixes_handler_get_curie_prefixes_get)
-  * TODO
+  * Parameters:
+    * `semantic_type` (optional, e.g. `semantic_type=biolink:ChemicalEntity&semantic_type=biolink:AnatomicalEntity`)
+    * Without a `semantic_type`, every semantic type is returned.
 * Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/get_curie_prefixes_handler_get_curie_prefixes_post)
-  * TODO
-* TODO
+  * POST Body: `{"semantic_types": ["biolink:ChemicalEntity", "biolink:AnatomicalEntity"]}`
+
+Example output:
+```json
+{
+  "biolink:ChemicalEntity": {
+    "curie_prefix": {
+      "PUBCHEM.COMPOUND": "119397095",
+      "INCHIKEY": "115661650",
+      "CHEMBL.COMPOUND": "2496527",
+      "CAS": "4029002",
+      "CHEBI": "200507",
+      "HMDB": "217920",
+      "MESH": "258506",
+      "UMLS": "668019",
+      "KEGG.COMPOUND": "16035",
+      "UNII": "134411",
+      "DRUGBANK": "16108",
+      "GTOPDB": "12953",
+      "DrugCentral": "4995",
+      "RXCUI": "124852"
+    }
+  },
+  "biolink:AnatomicalEntity": {
+    "curie_prefix": {
+      "UMLS": "159496",
+      "FMA": "98632",
+      "MESH": "1992",
+      "UBERON": "14513",
+      "NCIT": "10223",
+      "EMAPA": "968",
+      "ZFA": "607",
+      "FBbt": "117",
+      "WBbt": "18",
+      "CL": "2865",
+      "SNOMEDCT": "1421",
+      "GO": "4022"
+    }
+  }
+}
+```
+
+## TRAPI Normalization (deprecated)
+
+These methods 
+
+### `/query`
+
+Normalizes all the identifiers in a [TRAPI](https://github.com/NCATSTranslator/ReasonerAPI) message.
+
+* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/query_query_post)
+
+### `/asyncquery`
+
+Identical to `/query`, but returns a URL that the requester can use to poll for the response
+rather than waiting for the request to complete.
+
+* Method: [POST](https://nodenormalization-sri.renci.org/docs#/default/async_query_asyncquery_post)