Skip to content

Fix stats print on ARM due to indeterminate state of "va_list" parameter after vasprintf (take 2)#935

Merged
n-eiling merged 7 commits into
masterfrom
fix-table-print-arm
Jul 16, 2025
Merged

Fix stats print on ARM due to indeterminate state of "va_list" parameter after vasprintf (take 2)#935
n-eiling merged 7 commits into
masterfrom
fix-table-print-arm

Conversation

@stv0g
Copy link
Copy Markdown
Contributor

@stv0g stv0g commented Jun 27, 2025

This is another take on #933 / #678.

I think, I found a nice approach by pre-building the format string for a whole table row ince during the initialization. That allows us to use a single vsprintf call to format the whole row.

What do you think?

I also tweaked the table rendering a bit to fix some miscalculated column width.

Closes #678

@n-eiling
Copy link
Copy Markdown
Member

If you are at it: Wouldn't it be nice to not use an allocating printf? We should be able to know the (max) column size and therefore should be able to use a fixed size buffer. Also, an option to buffer the output by pre-allocating a bunch of columns would be nice for real-time code. Or even pre-allocating all rows if we are running for a known time. With the current code, the stats hook is not usable in low time step real-time scenarios.

@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jun 27, 2025

@n-eiling Thanks for the rewiew. I rewrote it again to use fmtlib argument lists. This lets us ditch va_list completely. And we are using C++ templates for proper variadic functions.

Furthtermore, we should save a bunch of allocations as all of that is handled by fmtlib/spdlog.

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

  1. Did you / would you like me to test this on HW?
  2. Pipeline fails

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

IgnoreWarnings commented Jun 27, 2025

@n-eiling Thanks for the rewiew. I rewrote it again to use fmtlib argument lists. This lets us ditch va_list completely. And we are using C++ templates for proper variadic functions.

Furthtermore, we should save a bunch of allocations as all of that is handled by fmtlib/spdlog.

I think the same about the allocations (as I answered in #933)

@n-eiling
Copy link
Copy Markdown
Member

n-eiling commented Jun 27, 2025

This still allocates in the real-time critical path, or not? I think fmt is better for sure. If we ever need to output data in real-time we probably should create a separate hook similar to what I implemented in DPsim (https://github.com/sogno-platform/dpsim/blob/master/dpsim/src/RealTimeDataLogger.cpp)

@IgnoreWarnings yes please test as much as possible :)

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

Ping me when you reache a final state, the I will test the branch on arm hardware.

@stv0g stv0g force-pushed the fix-table-print-arm branch from e9c0464 to 604f80e Compare July 3, 2025 18:57
@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 3, 2025

Ping me when you reache a final state, the I will test the branch on arm hardware.

I fixed the broken pre-commit check. The CI should be green in a few minutes.

From my side, this is ready for test. I have already validated the fix on a ARM64 system (Macbook M4 Pro with a Linux VM).

@n-eiling
Copy link
Copy Markdown
Member

n-eiling commented Jul 4, 2025

Just a note: I think testing adding a raspberry pi VM to the CI could be a good idea.

@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 4, 2025

Just a note: I think testing adding a raspberry pi VM to the CI could be a good idea.

We have a GitLab CI runner on a Raspberry Pi 5 in the ACS lab:
https://git.rwth-aachen.de/acs/public/villas/node/-/jobs/6431303

Its currently only building the code / Docker image. We dont do any tests on it so far.

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

Ping me when you reache a final state, the I will test the branch on arm hardware.

I fixed the broken pre-commit check. The CI should be green in a few minutes.

From my side, this is ready for test. I have already validated the fix on a ARM64 system (Macbook M4 Pro with a Linux VM).

The pipeline is still not happy :(.

@stv0g stv0g force-pushed the fix-table-print-arm branch 2 times, most recently from bfdafa0 to bd505a3 Compare July 9, 2025 10:46
@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 9, 2025

The pipeline is still not happy :(.

Okay, I fixed the code formatting again. Lets see what it brings.

@stv0g stv0g changed the title Fix #678: Fix stats print on ARM due to indeterminate state of "va_list" parameter after vasprintf (take 2) Fix stats print on ARM due to indeterminate state of "va_list" parameter after vasprintf (take 2) Jul 9, 2025
@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 9, 2025

@IgnoreWarnings, the pipeline passed now :) Did you already had a chance to test it on real hardware?

n-eiling
n-eiling previously approved these changes Jul 9, 2025
Copy link
Copy Markdown
Member

@n-eiling n-eiling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please wait for @IgnoreWarnings tests :)

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

@IgnoreWarnings, the pipeline passed now :) Did you already had a chance to test it on real hardware?

I checked out your branch and it doesnt build due to some rtc error. It seems like it is not related to your changes but some other update on the master. My Pi just crashed, I will send the compiler output tomorrow.

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

@stv0g
2 months ago you pushed S13d62e29c on master. From this commit I get a compiler error:

[build] [ 54%] Building CXX object lib/nodes/CMakeFiles/nodes.dir/webrtc.cpp.o
[build] .../node/lib/nodes/webrtc.cpp: In member function ‘virtual const std::string& villas::node::WebRTCNode::getDetails()’:
[build] .../node/lib/nodes/webrtc.cpp:191:42: error: ‘struct rtc::Reliability’ has no member named ‘maxRetransmits’
[build]   191 |                         *dci.reliability.maxRetransmits,
[build]       |                                          ^~~~~~~~~~~~~~
[build] gmake[2]: *** [lib/nodes/CMakeFiles/nodes.dir/build.make:440: lib/nodes/CMakeFiles/nodes.dir/webrtc.cpp.o] Error 1
[build] gmake[1]: *** [CMakeFiles/Makefile2:492: lib/nodes/CMakeFiles/nodes.dir/all] Error 2
[build] gmake: *** [Makefile:156: all] Error 2

@n-eiling
Copy link
Copy Markdown
Member

I don't think this is related to this PR. The webrtc node type is notorious for not compiling on various platforms. I assume this is a version incompatibility? Maybe we should download and compile a specific version in cmake similar to libwebsockets...

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

I don't think this is related to this PR. The webrtc node type is notorious for not compiling on various platforms. I assume this is a version incompatibility? Maybe we should download and compile a specific version in cmake similar to libwebsockets...

It isn't caused by this PR but prevents me from testing. Also the master should work (again) on pi.

@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 14, 2025

@IgnoreWarnings Which version of libdatachannel are you using?

Our deps.sh scripts is installing v0.22.6:

git clone ${GIT_OPTS} --recursive --branch v0.22.6 https://github.com/paullouisageneau/libdatachannel.git

This version includes the missing struct field which raises the error you posted above.

However, I noticed that we are not enforcing a version in the CMakeLists.txt:

find_package(LibDataChannel)

Similarily, also the the deps.sh is not performing a version check.. But just checks for availability.. Thats something we should fix..

@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 14, 2025

@IgnoreWarnings Could you test it with cmake .. -DWITH_WEBRTC=OFF ?

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

@IgnoreWarnings Which version of libdatachannel are you using?

Our deps.sh scripts is installing v0.22.6:

git clone ${GIT_OPTS} --recursive --branch v0.22.6 https://github.com/paullouisageneau/libdatachannel.git

This version includes the missing struct field which raises the error you posted above.

However, I noticed that we are not enforcing a version in the CMakeLists.txt:

find_package(LibDataChannel)

Similarily, also the the deps.sh is not performing a version check.. But just checks for availability.. Thats something we should fix..

I fixed the compiler error by upgrading from 0.18.4 to 0.22.6 and pinning the version in cmake find package.
Adding the version check/update to deps.sh / cmake would be nice to prevent such errors.

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

@stv0g
commit bd505a3 (HEAD -> fix-table-print-arm, origin/fix-table-print-arm)

13:26:52 info stats             Node          │          Recv │          Sent │          Drop │          Skip │     OWD last │     OWD mean │    Rate last │    Rate mean │     Age mean │      Age Max │ Signals
13:26:52 info stats                           │          pkts │          pkts │          pkts │          pkts │         secs │         secs │      pkt/sec │      pkt/sec │         secs │          sec │ signals
13:26:52 info stats            ───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼─────────
[*** LOG ERROR #0001 ***] [2025-07-14 15:26:53] [stats] {unmatched '}' in format string}
[*** LOG ERROR #0003 ***] [2025-07-14 15:26:55] [stats] {unmatched '}' in format string}
[*** LOG ERROR #0005 ***] [2025-07-14 15:26:57] [stats] {unmatched '}' in format string}
[*** LOG ERROR #0006 ***] [2025-07-14 15:26:58] [stats] {unmatched '}' in format string}
^C13:26:59 info node             Received Interrupt signal. Terminating...

Comment thread common/lib/table.cpp Outdated
Signed-off-by: Steffen Vogel <steffen.vogel@opal-rt.com>
Signed-off-by: Steffen Vogel <steffen.vogel@opal-rt.com>
Signed-off-by: Steffen Vogel <steffen.vogel@opal-rt.com>
Signed-off-by: Steffen Vogel <steffen.vogel@opal-rt.com>
Signed-off-by: Steffen Vogel <steffen.vogel@opal-rt.com>
@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 15, 2025

Oops my bad

@stv0g stv0g force-pushed the fix-table-print-arm branch from bd505a3 to dbd4f5a Compare July 15, 2025 09:20
@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

@stv0g pipeline failed integration timing, can you restart it?

@n-eiling
Copy link
Copy Markdown
Member

n-eiling commented Jul 15, 2025

@IgnoreWarnings I restarted it. It's ok now.

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

@stv0g commit bd505a3 (HEAD -> fix-table-print-arm, origin/fix-table-print-arm)

13:26:52 info stats             Node          │          Recv │          Sent │          Drop │          Skip │     OWD last │     OWD mean │    Rate last │    Rate mean │     Age mean │      Age Max │ Signals
13:26:52 info stats                           │          pkts │          pkts │          pkts │          pkts │         secs │         secs │      pkt/sec │      pkt/sec │         secs │          sec │ signals
13:26:52 info stats            ───────────────┼───────────────┼───────────────┼───────────────┼───────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼─────────
[*** LOG ERROR #0001 ***] [2025-07-14 15:26:53] [stats] {unmatched '}' in format string}
[*** LOG ERROR #0003 ***] [2025-07-14 15:26:55] [stats] {unmatched '}' in format string}
[*** LOG ERROR #0005 ***] [2025-07-14 15:26:57] [stats] {unmatched '}' in format string}
[*** LOG ERROR #0006 ***] [2025-07-14 15:26:58] [stats] {unmatched '}' in format string}
^C13:26:59 info node             Received Interrupt signal. Terminating...

@stv0g I still get the same error.
commit dbd4f5a (HEAD -> fix-table-print-arm, origin/fix-table-print-arm)

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

I heared you want to roll out soon and just wanted to remind you (since this PR was already opened 3 weeks ago) that I have a confirmed working hotfix in #933.

Copy link
Copy Markdown
Contributor

@pjungkamp pjungkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know whether the fmt::runtime_format_string<> type could be used as the type of the Table::rowFormat class member variable. It might be worth a try if this saves time in the Table::row.

Comment thread common/include/villas/table.hpp Outdated
Comment thread common/include/villas/log.hpp Outdated
Comment thread common/lib/table.cpp Outdated
Comment thread common/lib/table.cpp
n-eiling and others added 2 commits July 16, 2025 08:57
Signed-off-by: Niklas Eiling <niklas@eil.ing>
…dards

Co-authored-by: Philipp Jungkamp <philipp.jungkamp@rwth-aachen.de>
Signed-off-by: Niklas Eiling <niklas@eil.ing>
@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 16, 2025

Thanks for you support 👍🏻

@stv0g stv0g requested review from n-eiling and pjungkamp July 16, 2025 07:10
@stv0g
Copy link
Copy Markdown
Contributor Author

stv0g commented Jul 16, 2025

Do you mind approving, so we can merge once the CI passes?

@n-eiling
Copy link
Copy Markdown
Member

Let's wait for @IgnoreWarnings to check if everything works.

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

Let's wait for @IgnoreWarnings to check if everything works.

Will test in couple of hours.

@IgnoreWarnings
Copy link
Copy Markdown
Collaborator

The numbers look right now on arm. The table header isnt aligned perfectly, but i think it wasnt aligned before.

10:28:26 info stats             Node          │          Recv │          Sent │          Drop │         Skip │     OWD last │     OWD mean │    Rate last │    Rate mean │     Age mean │      Age Max │ Signals
10:28:26 info stats                           │          pkts │          pkts │          pkts │         pkts │         secs │         secs │      pkt/sec │      pkt/sec │         secs │          sec │ signals
10:28:26 info stats            ───────────────┼───────────────┼───────────────┼───────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼──────────────┼─────────
10:28:27 info stats             signal_gen_0 │             0│             0│             0│            0│     0.000000│          nan│          inf│          nan│          nan│     0.000000│       1
10:28:28 info stats             signal_gen_0 │             1│             0│             0│            0│     0.999832│     0.999832│     1.000008│     1.000008│          nan│     0.000000│       1
10:28:29 info stats             signal_gen_0 │             2│             0│             0│            0│     0.999904│     0.999868│     1.000000│     1.000004│          nan│     0.000000│       1
10:28:30 info stats             signal_gen_0 │             3│             0│             0│            0│     0.999904│     0.999880│     1.000000│     1.000003│          nan│     0.000000│       1

@stv0g stv0g enabled auto-merge (rebase) July 16, 2025 11:05
@stv0g stv0g disabled auto-merge July 16, 2025 11:05
@n-eiling n-eiling merged commit 8bc146a into master Jul 16, 2025
3 checks passed
@n-eiling n-eiling deleted the fix-table-print-arm branch July 16, 2025 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Broken stats on aarch64 release build

5 participants