Skip to content

Conversation

@jmarshall
Copy link
Member

@jmarshall jmarshall commented Feb 3, 2021

Apologies for the belated followup to PR #1170. This fixes some minor API bugs, moves some code around to a better place for it, and renames the hFILE plugin introspection functions:

  • Use unsigned int consistently, and leave gaps between the allocated bits in different categories (general/transport/compression/build) for future additions.

  • Users will find "htslib/hfile.h" to be an odd place for the general-purpose htslib_features() etc functions. This moves them to "htslib/hts.h" alongside the existing hts_version() functions — as hts.h is already a bit of a grab bag of everything, this is a good place for it. The hFILE introspection functions of course stay in hfile.h.

  • The hFILE plugin mechanism is the only current plugin endpoint in HTSlib, but the facility is general so that there could be other plugin endpoints added in future, e.g., for other compression methods. So rename these introspection functions to have a hfile_ prefix as they introspect only the hFILE plugins and URL scheme machinery.

@jmarshall
Copy link
Member Author

(Added another commit to resolve the conflict with PR #1211's fix of the same Makefile buglet. This commit should probably be omitted in the event this PR is rebased or merged.)

@jmarshall
Copy link
Member Author

You might also consider renaming the general-purpose introspection functions as hts_features(), hts_test_feature(), and hts_feature_string() — thus following the existing convention e.g. hts_version(). At present there are no public htslib_foo() function names.

@jkbonfield
Copy link
Contributor

jkbonfield commented Feb 3, 2021

Is there any guidance on naming or file structure? I confess I don't really understand what should be in what file as it doesn't feel particularly cleanly separated to me. I chose hfile because it felt like the best fit, but clearly you disagree (and you're the author of both that and hts.c).

The only documentation I could find is counter to your changes. Eg hts.h explicitly states:

Format-neutral I/O, indexing, and iterator API functions.

This is obviously at odds with introspection functions. As for that matter is hfile.h (Buffered low-level input/output streams). Both seem to contain other unrelated stuff. Maybe we need a different header completely (Edit: it's why I initially chose neither and they started off life in hts_os.h which was the general rag-tag of OS specific things, which felt like a better fit for introspection queries to me)? Either that or fix the documentation to match reality.

PS. Don't even get me started on sam.h, which appears to include not only the kitchen sink but the entire scullery.

@jmarshall
Copy link
Member Author

No real guidance that I'm aware of I'm afraid 😄 IMHO there's a division between relatively well-delineated stuff (most .h and .c files) and general dumping grounds (hts.h, sam.h).

So e.g. hfile.h is pretty well-delineated and focussed on hFILE and its “subclasses” and plugins. Hence the hFILE plugin introspection parts have a natural home there (and their implementations need to be in hfile.c to get at the local variables of interest).

Yes, hts.h says “Format-neutral I/O, indexing, and iterator API functions” as a summary, but that's mainly a polite way of saying “That format-neutral stuff and a random grab bag of other generic htsfoo stuff”. Similarly, whatever the comment at the top of sam.h says, what it means is “random grab bag of everything to do with SAM(/BAM/CRAM generically) sequence data files”.

It'll be convenient for API users to find the HTS_FEATURE_* functionality
in htslib/hts.h alongside hts_version() etc.

Fix the missing quote marks for -DHTS_CPPFLAGS by recoding to implement
this via a new generated config_vars.h header instead. Similarly to the
version.h and config.h rules, this avoids the fragile quoting needed to
add possibly-whitespace-containing string values via -D options.

Ignore config_vars.h and some recently-added test executables.
Leave gaps in the allocated bits between the different feature
categories. Leave a gap at the end for HTS_FEATURE_LIBS if it is
added in future. Don't mention htslib_plugin_path() in the public
header as it is an internal function.

Change htslib_test_feature() to use unsigned int, as per htslib_features().
The hFILE plugin mechanism is the only current plugin endpoint in HTSlib,
but the facility is general so that there could be other plugin endpoints
added in future, e.g., for other compression methods.

Rename these introspection functions as they pertain to hFILE plugins only.
@jkbonfield
Copy link
Contributor

Thanks John, I'll rebase and merge this.

It occurs to me that I forgot to do part 2 of this - a samtools version command to dump out more verbose information on version and configuration; akin to perl's perl -V output. I vaguely recall not doing it because I was unsure if this would ever get acceptance, which was further evidenced by it being ignored for a while.

Trivial to do though so I'll knock one up. :-)

@jmarshall
Copy link
Member Author

Excellent, thanks.

I'd draw your attention to the question in #1222 (comment) although TBH htslib_ is probably the better prefix for these whole-library queries and I somewhat wish I'd used it for hts_version(). (I think it was just lack of precedent that made that follow the hts_ that everything else at the time used.)

At the risk of opening a can of worms: Is there any version infrastructure for htscodecs? If you use the standard submodule commit as recorded in the htslib commit (rather than submodule update --remote) there is no issue, but e.g. if you use htslib with an external htscodecs…

@jkbonfield
Copy link
Contributor

I'm unsure what to do about the introspection stuff - I can see both hts and htslib having benefits. I think maybe for consistency they should be renamed though. Hmm.

Htscodecs versioning follows the strict major, minor, patchlevel rules dictated by GNU and autotools (which is compatible with the usual versioning numbers). It doesn't have any introspection functions for querying the version number. It's probably something we should add though.

I know Rob was doing something about submodule configurations and the ways it tracks, but tbh I'm still learning what works and what doesn't.

@jmarshall
Copy link
Member Author

I'll leave hts vs htslib in your and Rob's et al's capable hands 😄

I had in mind whether the associated htscodecs version was something that needed to be added to hts_version()'s output string (somehow!) or whether HTSlib needs an additional function to report it (hts_codecs_version()?) or …

This maintains consistency with the existing hts_version() function.

Note this also renames the internal htslib_plugin_path function too,
again simply for consistency.  Not it queries the HTS_PATH and not
HTSLIB_PATH environment variable, so the new name is better.
@jkbonfield
Copy link
Contributor

I think htscodecs is a sufficiently different issue that it can come as a new PR. It'll need to be done in sync with changes there too.

Meanwhile, I took the liberty of pushing an additional commit to this branch to do the renaming. Let me know what you think and if you're happy I'll hit the merge button.

@jmarshall
Copy link
Member Author

Looks good for the public functions.

For htslib_plugin_path(), as it's an internal function I didn't complicate this PR with it, but I have some thoughts about its naming: 😄

The existing path-handling functions that this introspection function is associated with are hts_path_itr_setup() and hts_path_itr_next(). So the name I would suggest would be hts_path_string() alongside those two.

And the flexible signature would parallel hts_path_itr_setup()'s:

const char *hts_path_string(const char *path, const char *builtin_path)
{
    if (! builtin_path) builtin_path = PLUGINPATH;
    if (! path) path = getenv("HTS_PATH");
    // …

Or expose the kstring_t to avoid the non-reentrancy and truncation:

const char *hts_path_expand(kstring_t *dest, const char *path, const char *builtin_path)
{
    ks_clear(dest);
    // set path & builtin_path as before
    // …

Anyway, as it's internal all that can always be made more flexible later if it's useful…

@jkbonfield jkbonfield merged commit d9890a9 into samtools:develop Feb 4, 2021
@jmarshall
Copy link
Member Author

👍 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants