Add vs00 timed variable string events#102

Open
geishm-ansto wants to merge 2 commits into ess-dmsc:master from geishm-ansto:string-event

Conversation

@geishm-ansto
Contributor

Add a new flatbuffer schema, useful at ANSTO, to support logging and variable-length strings in the nxs file writer.
The README was updated and there are no breaking changes.

Approval Criteria

This PR should not be merged until the ECDC Group Leader (acting or permanent) has given their explicit approval in the comments section.
SCIPP/DRAM should also be consulted on changes which may affect them.

@rerpha
Contributor

rerpha commented Oct 6, 2025

Hi @geishm-ansto, we were thinking of doing something similar at ISIS (nexusformat/definitions#1432): first adding a new NeXus base class that supports strings in the same way NXlog works, then adding a flatbuffers schema here for EPICS string PV value updates. I don't know if this fits your use case too?

@geishm-ansto
Contributor Author

Hi @rerpha, it's possible that we could use it, but I would need to see the details. At the moment we have implemented vs00 in a local ANSTO variant of the streaming data types and nxs writer. We didn't want to add a different Python package for reading the data type, so I raised the PR to see if there was any interest. We wanted to be able to capture system log messages during an experiment using a variable-length string format.

@ggoneiESS
Member

For some reason we didn't get a notification on this, so I'm adding a comment to make sure I get updates. But yes, the current issue is writing it out in a NeXus-y way.

@ggoneiESS
Member

I was reminded about this today.

Are either of you using this in the filewriter? I have worked with strings before in the modules there and it can be a bit difficult when using variable lengths.

If there were an additional entry in the flatbuffer to specify the size of the string, it would be an improvement.

It would also be good to know about use cases.

@geishm-ansto
Contributor Author

@ggoneiESS Hi, we have a local ANSTO branch of the filewriter, and within that I have added support for the 'vs00' flatbuffer. We use it primarily to record logging events. It required adding a VariableString class to the ExtensibleDataset component and a vs00 writer module.
I believe adding the string length is not necessary, as it is already managed at the lower level.

/// \brief An extensible one-dimensional dataset of variable-length strings.
class VariableString : public hdf5::node::ChunkedDataset {
public:
  VariableString() = default;
  /// \brief Create/open a variable-length string dataset.
  ///
  /// \param Parent The group/node where the dataset is to be located.
  /// \param Name The name of the dataset.
  /// \param CMode Should the dataset be opened or created.
  /// \param ChunkSize The number of strings in one chunk.
  VariableString(const hdf5::node::Group &Parent, std::string Name, Mode CMode,
                 size_t ChunkSize = 1024);

  /// \brief Append a new string to the dataset array.
  ///
  /// \param InString The string that is to be appended to the dataset.
  void appendStringElement(std::string const &InString);

private:
  hdf5::datatype::String StringType;
  size_t NrOfStrings{0};
};

VariableString::VariableString(const hdf5::node::Group &Parent,
                               std::string Name, Mode CMode, size_t ChunkSize)
    : hdf5::node::ChunkedDataset(),
      StringType(hdf5::datatype::String::variable()) {

  if (Mode::Create == CMode) {
    // Start empty and allow unlimited growth along the only dimension.
    hdf5::Dimensions ChunkDims{ChunkSize};
    hdf5::dataspace::Simple Space({0}, {hdf5::dataspace::Simple::unlimited});
    Dataset::operator=(hdf5::node::ChunkedDataset(
        Parent, Name, StringType, Space, ChunkDims));
  } else if (Mode::Open == CMode) {
    // Re-open an existing dataset and resume appending after its last element.
    Dataset::operator=(Parent.get_dataset(Name));
    NrOfStrings = static_cast<size_t>(dataspace().size());
  } else {
    throw std::runtime_error(
        "VariableString::VariableString(): Unknown mode.");
  }
}

void VariableString::appendStringElement(std::string const &InString) {
  Dataset::extent(0, 1); // Extend by 1 element along dimension 0
  hdf5::dataspace::Hyperslab Selection{{NrOfStrings}, {1}};
  write(InString, Selection);
  ++NrOfStrings;
}
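
The append pattern above (extend by one element, write at the old end, bump the counter) can be tried without HDF5 using a small stand-in. This is only a sketch: `InMemoryStringDataset` and `StringAppender` are hypothetical names, not part of h5cpp or kafka-to-nexus, and a `std::vector` replaces the real dataset. It illustrates why no separate length argument is needed: `std::string` already carries its own size.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for an extensible 1-D string dataset.
// It is NOT the h5cpp API; it only mirrors the bookkeeping of
// "extend by one, then write at the old end".
class InMemoryStringDataset {
public:
  // Grow by Count empty elements (plays the role of extent(0, Count)).
  void extend(std::size_t Count) { Storage.resize(Storage.size() + Count); }

  // Write one string at Offset; the element must already exist.
  void write(std::string const &Value, std::size_t Offset) {
    assert(Offset < Storage.size());
    Storage[Offset] = Value;
  }

  std::size_t size() const { return Storage.size(); }
  std::string const &at(std::size_t Index) const { return Storage.at(Index); }

private:
  std::vector<std::string> Storage;
};

// Hypothetical append helper. No length parameter is passed anywhere:
// each std::string knows its own size.
class StringAppender {
public:
  // Resume after the last existing element, as the "Open" branch does.
  explicit StringAppender(InMemoryStringDataset &Target)
      : Target(Target), NrOfStrings(Target.size()) {}

  void appendStringElement(std::string const &InString) {
    Target.extend(1);                   // make room for one more element
    Target.write(InString, NrOfStrings); // write at the old end
    ++NrOfStrings;
  }

  std::size_t count() const { return NrOfStrings; }

private:
  InMemoryStringDataset &Target;
  std::size_t NrOfStrings;
};
```

Appending strings of different lengths, including an empty one, goes through the same call; only the backing store would differ in the real writer.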

@rerpha
Contributor

rerpha commented Jan 12, 2026

We aren't using it currently, but may do in the future for generic string diagnostics, even if/when nexusformat/definitions#1590 is accepted and we make a new schema for SE strings.

@ggoneiESS
Member

I have done a bit of a refresher, but haven't done a deep-dive into the implementation in hdf5 2.0.0 (we aren't using that yet but we will this year).

I still worry a bit about the idea of variable-length strings. If this is used rarely (in comparison to e.g. detector data etc) it's not a big deal but:

  • variable-length datasets cannot be compressed
  • the data no longer exists contiguously (it necessarily becomes an array of pointers to strings, rather than just raw data)

And (academic but technical arguments)

  • heap storage requires more space than regular 'raw data' storage (i.e. how the HDF5 object exists in memory)
  • general reduction in I/O efficiency because it requires individual write operations for each data element rather than one write per dataset chunk (actually, chunking isn't allowed at all)
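
The space trade-off in the points above can be made concrete with a rough, back-of-the-envelope model. This is a sketch under stated assumptions, not a measurement of HDF5: `estimateStorage` is a hypothetical helper, fixed-length storage is modelled as every element padded to the longest string, and variable-length storage as one descriptor per element plus the characters themselves. The 16-byte descriptor default is an assumption (roughly a pointer plus a length on a 64-bit machine) and does not represent HDF5's actual heap overhead.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct StorageEstimate {
  std::size_t FixedBytes;    // contiguous block, padded to the longest string
  std::size_t VariableBytes; // per-element descriptors + actual characters
};

// Toy model only. DescriptorBytes is an assumed per-element cost for the
// variable-length case; it is not taken from the HDF5 file format.
StorageEstimate estimateStorage(std::vector<std::string> const &Strings,
                                std::size_t DescriptorBytes = 16) {
  assert(DescriptorBytes > 0);
  std::size_t Longest = 0;
  std::size_t TotalChars = 0;
  for (auto const &S : Strings) {
    Longest = std::max(Longest, S.size());
    TotalChars += S.size();
  }
  return {Strings.size() * Longest,
          Strings.size() * DescriptorBytes + TotalChars};
}
```

In this model, short strings of similar length are cheaper in fixed-length form because the descriptors dominate, while a single long outlier makes the padded fixed-length form the larger one. Real numbers would need a benchmark against the filewriter.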

Performance is definitely at a premium versus storage.

I found this via the HDF5 clinic - https://steven-varga.ca/blog/hdf5-fixed-vs-variable-benchmark/ and it provides a CPP file. It might be possible to incorporate into a filewriter test.

@ggoneiESS
Member

I haven't quite forgotten about this!

My objection about writing wasn't very well thought through: there's no reason the string size needs to be part of the schema (it can be handled by kafka-to-nexus or whatever software), and it shouldn't affect message brokers (although it's perhaps advisable to keep message size down, I'm not sure that needs to be recorded in the schema). I'm going to set up a proper workflow for this at ESS; I assume we have implicit approval from:

  • ISIS ✅
  • ANSTO ✅
  • PSI

@ggoneiESS
Member

@geishm-ansto two questions:

  • did you also consider just adding a new type to f144 and creating an f145? (I can see some value in keeping numeric data and text-like data separate, for reasons similar to those made by NeXus)
  • any objections or reasons why we shouldn't also allow an array of strings?

@geishm-ansto
Contributor Author

@ggoneiESS
Correct me if I'm wrong, but I believe f142 originally had a string option and it was deliberately removed. I know that the NeXus writer at the time didn't support strings. I'm not sure what the issue was, as Jonas had left by that time; maybe numeric and text values were considered fundamentally different. I needed variable-length string support in the short term and just implemented vs00 locally. I generally prefer simpler flatbuffer definitions, as the kafka-to-nexus writer modules and test cases are much cleaner.

@rerpha
Contributor

rerpha commented Mar 25, 2026

The NeXus standard itself doesn't actually support writing strings in an NXlog (see here - it's NX_NUMBER) - this is why we created nexusformat/definitions#1590

We also need it at ISIS, as we have string PVs that need to be written to a file. vs00 is probably fine as a short-term fix for us, but it'd be nice if the NeXus standard itself actually supported it too.

@ggoneiESS
Member

@geishm-ansto this was before my time, but let's then keep this separation - agreed about the modules and tests. They are indeed implemented 'under the hood' in a slightly different way, mostly because e.g. a uint16 is always a uint16, but a string is far more complicated.

@rerpha it was because I am trying to work on some tooling and NXtextlog was failing (since it isn't actually merged yet) that made me think about this MR again. But we can have this in place already.
