Extract metadata from flake/failure test output for certain backstop tests #685

dgoodwin · 2022-11-15T18:36:53Z

Begin extracting a little metadata from test flakes and failures on a very specific set of backstop tests when we're importing job results. This will allow us to track trends in which alerts and pathological events specifically are causing problems over time.

Result is stored in a jsonb column:

postgres=# select rt.test_id, rt.status, md.metadata from prow_job_run_test_output_metadata md, prow_job_run_test_outputs o, prow_job_run_tests rt where md.prow_job_run_test_output_id = o.id and o.prow_job_run_test_id = rt.id limit 1;
 test_id | status |                                          metadata                                           
---------+--------+---------------------------------------------------------------------------------------------
   11497 |     12 | {"alert": "TargetDown", "state": "fired", "namespace": "openshift-machine-config-operator"}
(1 row)

It can be queried and grouped normally:

SELECT
    t.id as "test_id",
    md.metadata,
    count(md.metadata)
FROM
    prow_job_run_test_output_metadata md,
    prow_job_run_test_outputs o,
    prow_job_run_tests rt,
    prow_job_runs r,
    prow_jobs j,
    tests t
WHERE
    md.prow_job_run_test_output_id = o.id AND
    o.prow_job_run_test_id = rt.id AND
    rt.status = 12 AND
    rt.prow_job_run_id = r.id AND
    r.prow_job_id = j.id AND
    rt.test_id = t.id AND
    t.name LIKE '%pathological%'
GROUP BY
    t.id, md.metadata
ORDER BY
    count DESC;

neisw · 2022-11-15T19:01:27Z

pkg/prowloader/testoutputmetadata.go

+			if i == 0 {
+				continue
+			}
+			//if results[regex.SubexpNames()[i]] != "" {


Looks like the linter objects to this line...

neisw · 2022-11-15T19:03:06Z

pkg/prowloader/testoutputmetadata.go

+			}
+			//if results[regex.SubexpNames()[i]] != "" {
+			results[regex.SubexpNames()[i]] = name
+			//}


and likely this one too. I guess either add spaces or just pull em out if you don't need them.

neisw · 2022-11-15T21:13:35Z

pkg/prowloader/testoutputmetadata.go

+		for _, re := range regexes {
+			matchMaps := findAllNamedMatches(re, line)
+
+			fmt.Printf("%v\n", matchMaps)


left over debugging?

Yup, thanks!

stbenjam · 2022-11-15T22:59:22Z

pkg/db/models/prow.go

+
+type ProwJobRunTestOutputMetadata struct {
+	gorm.Model
+	ProwJobRunTestOutputID uint


What do you think about an index on this?

stbenjam · 2022-11-15T23:03:32Z

Were you able to do JSONB queries? Something like SELECT * FROM prow_job_run_test_output_metadata WHERE metadata @> '{"namespace": "openshift-machine-config-operator"}';

dgoodwin · 2022-11-16T11:45:53Z

Were you able to do JSONB queries? Something like SELECT * FROM prow_job_run_test_output_metadata WHERE metadata @> '{"namespace": "openshift-machine-config-operator"}';

I have not tried with gorm yet, just in psql.
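
One way the containment query might be driven from Go is sketched below. It only builds the JSON document for the right-hand side of @> using the standard library. The gorm call appears in a comment as an untested assumption, since no database is set up here, and the helper name containmentFilter is hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// containmentFilter is a hypothetical helper that builds the JSON document
// used on the right-hand side of the Postgres @> (contains) operator.
func containmentFilter(want map[string]string) (string, error) {
	b, err := json.Marshal(want)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	filter, err := containmentFilter(map[string]string{
		"namespace": "openshift-machine-config-operator",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Assumption, not verified against this schema: with a *gorm.DB named db
	// and a slice named rows, the filter could be bound as a parameter:
	//   db.Where("metadata @> ?::jsonb", filter).Find(&rows)
	fmt.Println(filter)
}
```

Binding the filter as a parameter rather than concatenating it into the SQL string keeps the query safe if the keys or values ever come from user input.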

dgoodwin · 2022-11-16T15:03:44Z

/hold

This needs more work.


dgoodwin · 2022-11-17T14:34:25Z

Updated. I took a new approach of regex + string tokens. I want a flexible "parse this tag if it's there" approach (tag meaning ns/foobar or ns=foobar), and regexes seem quite poor at this. So now a simple regex covers the things that should always match on a line we're interested in, and if it hits, we proceed to token parsing and pull out whatever tags we can find.

Some of this will pick up new values coming in openshift/origin#27559

I think this is ready for another review.

/hold cancel
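
The regex + tokens approach described in this comment could look roughly like the sketch below. The gate pattern, the knownTags set, and the parseLine name are all assumptions made for illustration. The actual implementation lives in pkg/prowloader/testoutputmetadata.go and may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// gate is an illustrative pattern for lines worth parsing (not the PR's regex).
var gate = regexp.MustCompile(`^alert (?P<alert>\S+) (?P<state>fired|pending)`)

// knownTags lists the optional tag keys this sketch looks for (hypothetical).
var knownTags = map[string]bool{"ns": true, "namespace": true, "result": true, "bug": true}

// parseLine returns nil unless the gate regex matches. On a match it collects
// the named groups, then scans whitespace-separated tokens for optional
// key=value or key/value tags and keeps the ones it recognizes.
func parseLine(line string) map[string]string {
	m := gate.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	out := map[string]string{}
	for i, name := range gate.SubexpNames() {
		if i != 0 && name != "" {
			out[name] = m[i]
		}
	}
	for _, tok := range strings.Fields(line) {
		// Split on the first '=' or '/', so bug=https://... keeps its URL intact.
		idx := strings.IndexAny(tok, "=/")
		if idx <= 0 || idx == len(tok)-1 {
			continue
		}
		key, value := tok[:idx], strings.Trim(tok[idx+1:], `",;`)
		if knownTags[key] && value != "" {
			out[key] = value
		}
	}
	return out
}

func main() {
	line := "alert TargetDown fired for 30 seconds ns/openshift-machine-config-operator result=allowed"
	fmt.Println(parseLine(line))
}
```

The required fields stay in one small regex, while optional tags can appear in any order, or not at all, without changing the pattern.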

stbenjam · 2022-11-21T13:23:39Z

pkg/prowloader/prow.go

+				for _, m := range extractedMetadata {
+					jsonb := pgtype.JSONB{}
+					if err := jsonb.Set(m); err != nil {
+						panic(err)


Can we log a warning instead? I don't know if we'll ever really panic on jsonb.Set, but the places where we panic during syncing risk data loss, since a panic stops the sync and any further tasks.
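
The log-and-continue shape suggested here might look like the following sketch. To stay within the standard library it uses json.Marshal as a stand-in for pgtype's JSONB.Set, so the function name encodeMetadata and the inputs are hypothetical and this is not the PR's code.

```go
package main

import (
	"encoding/json"
	"log"
)

// encodeMetadata is a hypothetical stand-in for the import loop. An entry
// that fails to encode is logged and skipped so the rest of the sync can
// continue, instead of panicking and halting everything.
func encodeMetadata(items []map[string]any) [][]byte {
	encoded := make([][]byte, 0, len(items))
	for _, m := range items {
		b, err := json.Marshal(m)
		if err != nil {
			log.Printf("warning: skipping test output metadata that could not be encoded: %v", err)
			continue
		}
		encoded = append(encoded, b)
	}
	return encoded
}

func main() {
	items := []map[string]any{
		{"alert": "TargetDown", "state": "fired"},
		{"bad": make(chan int)}, // channels cannot be marshaled, so this entry is skipped
	}
	for _, b := range encodeMetadata(items) {
		log.Println(string(b))
	}
}
```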

stbenjam · 2022-11-21T13:26:38Z

Looks good as best I can tell, the text processing is a little complex. I really have just one comment about the panic above.

I think I would prefer if openshift-tests just returned serializable failure output (YAML?) for tests we want to store metadata about, but we can tackle something like that later if we want to do it.

dgoodwin · 2022-11-21T13:38:05Z

Logging an error instead of panic added.

The json output occurred to me too but I couldn't see an easy way to do it. We'd need to either find a way to parse json out of test output, or correlate different output files with each junit file and figure out how to import them at different times. It would be a cleaner way to import for sure if we could figure out how best to do it. Maybe we could add an additional tag to the junit and parse it out somehow.

stbenjam · 2022-11-21T14:07:59Z

The json output occurred to me too but I couldn't see an easy way to do it. We'd need to either find a way to parse json out of test output, or correlate different output files with each junit file and figure out how to import them at different times. It would be a cleaner way to import for sure if we could figure out how best to do it. Maybe we could add an additional tag to the junit and parse it out somehow.

Ah I was just thinking sippy would know which tests it expects might have YAML output and try to unmarshal it (and fail quietly if it can't)

stbenjam · 2022-11-21T14:08:22Z

/lgtm

openshift-ci · 2022-11-21T14:10:55Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, stbenjam


Needs approval from an approver in each of these files:

~~OWNERS~~ [dgoodwin,stbenjam]


openshift-ci · 2022-11-21T14:43:58Z

@dgoodwin: all tests passed!


openshift-ci bot requested review from bparees and deads2k November 15, 2022 18:37

openshift-ci bot added the approved label Nov 15, 2022

dgoodwin changed the title ~~extract test failure tags~~ Extract metadata from flake/failure test output for certain backstop tests Nov 15, 2022

neisw reviewed Nov 15, 2022


stbenjam reviewed Nov 15, 2022


openshift-ci bot added the do-not-merge/hold label Nov 16, 2022

dgoodwin mentioned this pull request Nov 17, 2022

Flake and improve alert tests openshift/origin#27559

Merged

dgoodwin added 16 commits November 17, 2022 09:55

Extract alert names from upgrade test.

14de588

start extracting named groups from the regex

1cff4e5

switch to named params to generic maps

0ce9250

get upgrade alert regex working

61afca1

test no match for upgrade alerts regex

c3e3450

reuse alert regex for the conformance version

01c71ca

add test for same alert firing in two namespaces

949da73

imperfect attempt at matching pathological events

a29938c

fix pathological event parsing by going line by line and first regex to match

31ed6d8

Extract metadata and store in jsonb column on import

de36782

fix some tests getting skipped due to suite inclusion issues

67ec67f

lint fix

dc7c7cc

add index for the metadata output id

8e60b1b

refactor to a regex + tokens approach for flexibility

3b9279c

add support for extracting result=allowed|failure and bug=url for alert parsing

a8f0289

grab alert service, severity, reason while we're at it

3813fa7

openshift-ci bot removed the do-not-merge/hold label Nov 17, 2022

stbenjam reviewed Nov 21, 2022


dgoodwin added 2 commits November 21, 2022 09:39

lint fixes

aeee82d

Log error instead of panicing

4ae2c64

dgoodwin force-pushed the extract-test-failure-tags branch from cbc29b6 to 4ae2c64 Compare November 21, 2022 13:39

openshift-ci bot assigned stbenjam Nov 21, 2022

openshift-ci bot added the lgtm label Nov 21, 2022

openshift-merge-robot merged commit a04a372 into openshift:master Nov 21, 2022
