Skip to content

Conversation

@marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Jul 9, 2024

This all started as an effort to solve test failures caused by "MAX"
volume. See #931 and others linked from there for more context.

Another obvious issue was the duplication of expect code across two
tests (check-pause-resume.sh and multiple-pause-resume.sh)

After a bit of time actually "testing the tests" I realized the original
author did not really understand expect or the problem. So I rewrote
the entire expect part. The list of previous issues is a bit too long
not to forget any but here are some:

  • The script could get "out of sync" with aplay and report that it was
    paused when it was actually resumed! And vice-versa.

  • De-duplication; obviously.

  • Moving to a separate script also solves the following problems:

    • Can be invoked, tested and debugged separately outside any shell script
    • Simplifies quoting
    • Unlocks editor features like syntax checking
  • Proper understanding and handling of newlines.

  • The expect script does not "sleep" anymore, which stops backpressuring
    and blocking the console output from aplay - with unknown side effects!

  • Adding log_user 0 makes the test logs readable at last

  • Add decent logging for easier maintenance

  • Logging timestamps demonstrate that the entire aproach is too slow for
    pause/resume cycles shorter than ~200 ms. Default values won't be
    changed yet but at least the problem is now obvious.

  • Handle "MAX" volume! Not an error yet because it happens really across
    the board (MAX does usualy not happen long enough to timeout and
    fail across the board) but the code is ready to upgrade the "MAX"
    warning to an ERROR with a one-line change.

  • Report EOF and timeouts differently.

  • Probably others.

cc:

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 9, 2024

@marc-hb marc-hb changed the title [TEST] Apause Add new Expect script tools/apause.exp Jul 9, 2024
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 9, 2024

NOCODEC timeouts in https://sof-ci.01.org/softestpr/PR1218/build597/devicetest/index.html :-( HDA is OK. Test just taking too long? More than 1h each time!

Same in https://sof-ci.01.org/softestpr/PR1218/build599/devicetest/index.html: only NOCODEC+multiple is timing out after 1+h, everything else is OK

Combinatorial explosion... already discussed in

  • internal issue # 40, 581,...

declare -a pipeline_idx_lst=([0]="0" [1]="1" [2]="2" [3]="3" [4]="4" [5]="5" [6]="6" [7]="7" [8]="8" [9]="9" [10]="10")

declare -a pipeline_combine_lst=([0]="0 1" [1]="0 2" [2]="0 3" [3]="0 4" [4]="0 5" [5]="0 6" [6]="0 7" [7]="0 8" [8]="0 9" [9]="0 10" [10]="1 2" [11]="1 3" [12]="1 4" [13]="1 5" [14]="1 6" [15]="1 7" [16]="1 8" [17]="1 9" [18]="1 10" [19]="2 3" [20]="2 4" [21]="2 5" [22]="2 6" [23]="2 7" [24]="2 8" [25]="2 9" [26]="2 10" [27]="3 4" [28]="3 5" [29]="3 6" [30]="3 7" [31]="3 8" [32]="3 9" [33]="3 10" [34]="4 5" [35]="4 6" [36]="4 7" [37]="4 8" [38]="4 9" [39]="4 10" [40]="5 6" [41]="5 7" [42]="5 8" [43]="5 9" [44]="5 10" [45]="6 7" [46]="6 8" [47]="6 9" [48]="6 10" [49]="7 8" [50]="7 9" [51]="7 10" [52]="8 9" [53]="8 10" [54]="9 10")

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 10, 2024

With the loop count reduced to 1 by default (also submitted separately at #1219), everything now passes:

https://sof-ci.01.org/softestpr/PR1218/build601/devicetest/index.html and
https://sof-ci.01.org/softestpr/PR1218/build602/devicetest/index.html and
https://sof-ci.01.org/softestpr/PR1218/build603/devicetest/index.html are all 100% green!

https://sof-ci.01.org/softestpr/PR1218/build604/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-pause-resume-playback-100 failed but it looks like an actual bug, not an issue with the test.

LNLM_SDW_AIOC failed because thesofproject/linux#5098

@marc-hb marc-hb marked this pull request as ready for review July 10, 2024 03:21
@marc-hb marc-hb requested a review from a team as a code owner July 10, 2024 03:21
So we can easily find other tools.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 11, 2024

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 11, 2024

Good example of MAX saturation timeout that would be addressed by this:

https://sof-ci.01.org/softestpr/PR1220/build633/devicetest/index.html?model=LNLM_RVP_HDA&testcase=check-pause-resume-capture-100

Copy link
Contributor

@kv2019i kv2019i left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Can you fixup the one bogus signed-off-by line?

marc-hb added 4 commits July 15, 2024 09:48
So we can invoke other tools easily.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
This all started as an effort to solve test failures caused by "MAX"
volume. See thesofproject#931 and others linked from there for more context.

Another obvious issue was the duplication of `expect` code across two
tests (check-pause-resume.sh and multiple-pause-resume.sh)

After a bit of time actually "testing the tests" I realized the original
author did not really understand `expect` or the problem. So I rewrote
the entire `expect` part. The list of previous issues is a bit too long
not to forget any but here are some:

- The script could get "out of sync" with aplay and report that it was
  paused when it was actually resumed! And vice-versa.

- De-duplication; obviously.

- Moving to a separate script also solves the following problems:
  - Can be invoked, tested and debugged separately outside any shell script
  - Simplifies quoting
  - Unlocks editor features like syntax checking

- Proper understanding and handling of newlines.

- The expect script does not "sleep" anymore, which stops backpressuring
  and blocking the console output from aplay - with unknown side effects!

- Adding `log_user 0` makes the test logs readable at last

- Add decent logging for easier maintenance

- Logging timestamps demonstrate that the entire aproach is too slow for
  pause/resume cycles shorter than ~200 ms. Default values won't be
  changed yet but at least the problem is now obvious.

- Handle "MAX" volume! Not an error yet because it happens really across
  the board (MAX does usualy not happen long enough to timeout and
  _fail_ across the board) but the code is ready to upgrade the "MAX"
  warning to an ERROR with a one-line change.

- Report EOF and timeouts differently.

- Probably others.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
See long commit message in previous commit adding apause.exp

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
See long commit message in previous commit adding apause.exp

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants