Skip to content

Conversation

@marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Jun 23, 2021

5 commits, per-commit review is recommended.

This is a re-submission/recycling of 85% of the earlier "restore early tracing" PR previously reviewed in #4334: everything except the actual and final restoration of early tracing which is currently failing on BDW and BYT which I have no time to debug right now. Maybe later.

@marc-hb marc-hb marked this pull request as ready for review June 23, 2021 03:46
@marc-hb marc-hb mentioned this pull request Jun 23, 2021
@marc-hb marc-hb requested review from kv2019i and paulstelian97 June 23, 2021 03:49
lyakh
lyakh previously requested changes Jun 23, 2021
@paulstelian97
Copy link
Collaborator

CC @iuliana-prodan The commit "timer.h: add platform_safe_get_time(timer) and use it in trace.c" is potentially interesting after the discussions yesterday.

@iuliana-prodan
Copy link
Contributor

CC @iuliana-prodan The commit "timer.h: add platform_safe_get_time(timer) and use it in trace.c" is potentially interesting after the discussions yesterday.

@paulstelian97 I was writing my review/questions while you were writing this :)

Copy link
Collaborator

@paulstelian97 paulstelian97 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other than existing comments this looks good. I'll wait for those to be resolved before approve though.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 25, 2021

If a SOF_MEM_ZONE_SYS_SHARED allocation fails you'll get a panic, no need to check

Thanks @lyakh , very useful. I changed the comment and... tried to use panics, which cost me 1+ day because of bug #1346 :-(

To avoid this happening again I'm adding some panic.c comments and a workaround suggestion in #4401

Copy link
Member

@lgirdwood lgirdwood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one question on whether we want a new timer API or just make the current one safe.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it not be better to make the default version safe ? (and return time 0 to indicate no timer running)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe, I don't know. You tell me. So far I've been spending at least 80% of my time trying to fix issues fairly distant from my actual objectives so this time I tried to minimize the changes but maybe I shouldn't have?

There is a variety of platform_timer_get(timer) implementations. Some implementations don't even look at their timer argument: so these platforms would then lose even more timestamps for no real reason.

At least one platform performs an assert(timer), maybe because it was considered valuable not to keep running with broken timers? Tracing is still useful without timestamps but I wouldn't be so sure about other areas: I can easily imagine some threads stuck forever.

Considering such "diversity" I chose not to open that can of worms and to change only the area in this PR. As of now the new "safe" function is meant to be used only in areas with tricky boot dependencies; like tracing.

(and return time 0 to indicate no timer running)

As explained in the comment, I'm not returning zero but on small value on purpose: because zero looks like something is wrong.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 looks perfectly fine to me. Look at dmesg soon after a system boot - the first messages are marked with time 0. Looks much more logical than any random "small value" to me.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This non-zero value is useful to make the difference with platform_timer_get(timer) returning zero (for good or bad reasons). When everyone and everything returns zero including buggy code (but not just), then there's no way to tell where it came from.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jun 27, 2021

I just pushed v3 5c9f2e8 that adds mtrace_printf() for Zephyr too, I had forgotten to extract that from my stack or other WIP.

EDIT: https://sof-ci.01.org/sofpr/PR4389/build9504/devicetest is suspiciously all green

Copy link
Collaborator

@lyakh lyakh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of minor suggestions, should be good after that

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 looks perfectly fine to me. Look at dmesg soon after a system boot - the first messages are marked with time 0. Looks much more logical than any random "small value" to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why (void *)? the first argument of put_header() is of type uint32_t *

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because packet is a char []. I'll make the first argument of put_header() a void *, that requires no other change anywhere, not even in the put_header() code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this needed? It's true, that memory cannot be freed, but the allocated buffer is still there and it's valid... Any specific need for this change? dma_trace_initialized() would return false without it too anyway?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dma_trace_initialized() would return false without it too anyway?

Yes but some part of the code not using dma_trace_initialized() would not and would crash. So this is safer and costs practically nothing.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's NULL here, why not just print a meaningful description like "not available" or something?

Copy link
Collaborator Author

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the very thorough review! I will apply all the other, unanswered suggestions.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This non-zero value is useful to make the difference with platform_timer_get(timer) returning zero (for good or bad reasons). When everyone and everything returns zero including buggy code (but not just), then there's no way to tell where it came from.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dma_trace_initialized() would return false without it too anyway?

Yes but some part of the code not using dma_trace_initialized() would not and would crash. So this is safer and costs practically nothing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because packet is a char []. I'll make the first argument of put_header() a void *, that requires no other change anywhere, not even in the put_header() code.

marc-hb added 5 commits June 28, 2021 23:42
Zephyr's printk does not work like Linuxs printk

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Found by chance while looking for something else.

This makes no difference right now because we checked the "length" in
the line immediately before but let's do the right thing anyway, at the
very least it's more readable.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Returns a small constant instead of crashing platform_timer_get() when
timer is NULL.

This is required to restore early tracing before sof->timer is
initialized.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Direct access to mbox shared memory logging when DMA tracing is
either not initialized yet or disabled or found broken for any
reason.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Replace some tr_err() too.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
... down from ERROR. It was never an error, this was a hack to be
duplicated to both trace and etrace. This will make it easier to search
for actual errors.

Also make the banners slightly different between etrace and trace thanks
to new SHM: and DMA: prefixes. This makes identification of log files
easier.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Copy link
Collaborator Author

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://sof-ci.01.org/sofpr/PR4389/build9518/devicetest and everything else for that commit is green.

@lgirdwood lgirdwood merged commit 02c21c2 into thesofproject:main Jun 29, 2021
@marc-hb marc-hb deleted the mtrace-printf branch June 29, 2021 23:19
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 3, 2021

The addition of mtrace_printf() broke !CONFIG_TRACE, fixed in #4447

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants