@dgoodwin
Contributor

@dgoodwin dgoodwin commented Nov 15, 2022

TRT-665

Begin extracting a little metadata from test flakes and failures on a very specific set of backstop tests when we're importing job results. This will allow us to track trends in which alerts and pathological events are causing problems over time.

Result is stored in a jsonb column:

postgres=# select rt.test_id, rt.status, md.metadata from prow_job_run_test_output_metadata md, prow_job_run_test_outputs o, prow_job_run_tests rt where md.prow_job_run_test_output_id = o.id and o.prow_job_run_test_id = rt.id limit 1;
 test_id | status |                                          metadata                                           
---------+--------+---------------------------------------------------------------------------------------------
   11497 |     12 | {"alert": "TargetDown", "state": "fired", "namespace": "openshift-machine-config-operator"}
(1 row)

It can be queried and grouped normally:

SELECT
    t.id AS "test_id",
    md.metadata,
    count(md.metadata)
FROM
    prow_job_run_test_output_metadata md,
    prow_job_run_test_outputs o,
    prow_job_run_tests rt,
    prow_job_runs r,
    prow_jobs j,
    tests t
WHERE
    md.prow_job_run_test_output_id = o.id AND
    o.prow_job_run_test_id = rt.id AND
    rt.status = 12 AND
    rt.prow_job_run_id = r.id AND
    r.prow_job_id = j.id AND
    rt.test_id = t.id AND
    t.name LIKE '%pathological%'
GROUP BY
    t.id, md.metadata
ORDER BY
    count DESC;

@openshift-ci openshift-ci bot requested review from bparees and deads2k November 15, 2022 18:37
@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 15, 2022
@dgoodwin dgoodwin changed the title extract test failure tags Extract metadata from flake/failure test output for certain backstop tests Nov 15, 2022
if i == 0 {
	continue
}
//if results[regex.SubexpNames()[i]] != "" {
Contributor

Looks like the linter objects to this line...

}
//if results[regex.SubexpNames()[i]] != "" {
results[regex.SubexpNames()[i]] = name
//}
Contributor

and likely this one too. I guess either add spaces or just pull em out if you don't need them.
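For context, the loop under review copies named capture groups into a map. A minimal sketch of that technique follows, with the commented-out guard dropped as suggested. The function name `findAllNamedMatches` comes from the diff, but its body here is a guess, and the pattern `alertRe` and the sample line are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// alertRe is a hypothetical pattern for illustration, not the one in the PR.
var alertRe = regexp.MustCompile(`alert (?P<alert>\S+) (?P<state>fired|pending)`)

// findAllNamedMatches returns one map per match of re in line, keyed by
// capture-group name. Index 0 (the whole match) and unnamed groups are skipped.
func findAllNamedMatches(re *regexp.Regexp, line string) []map[string]string {
	names := re.SubexpNames()
	var out []map[string]string
	for _, match := range re.FindAllStringSubmatch(line, -1) {
		results := map[string]string{}
		for i, value := range match {
			if i == 0 || names[i] == "" {
				continue
			}
			results[names[i]] = value
		}
		out = append(out, results)
	}
	return out
}

func main() {
	line := "alert TargetDown fired for 120 seconds"
	fmt.Printf("%v\n", findAllNamedMatches(alertRe, line))
}
```

Skipping index 0 matters because `SubexpNames()` always reserves the first slot for the whole match, whose name is the empty string.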

for _, re := range regexes {
	matchMaps := findAllNamedMatches(re, line)

	fmt.Printf("%v\n", matchMaps)
Contributor

left over debugging?

Contributor Author

Yup, thanks!


type ProwJobRunTestOutputMetadata struct {
	gorm.Model
	ProwJobRunTestOutputID uint
Member

What do you think about an index on this?

Contributor Author

Will do.

@stbenjam
Member

Were you able to do JSONB queries? Something like SELECT * FROM prow_job_run_test_output_metadata WHERE metadata @> '{"namespace": "openshift-machine-config-operator"}';

@dgoodwin
Contributor Author

> Were you able to do JSONB queries? Something like SELECT * FROM prow_job_run_test_output_metadata WHERE metadata @> '{"namespace": "openshift-machine-config-operator"}';

I have not tried with gorm yet, just in psql.

@dgoodwin
Contributor Author

/hold

This needs more work.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 16, 2022
@dgoodwin
Contributor Author

Updated. I took a new approach of regex plus string tokens. I want a flexible "parse this tag if it's there" approach (a tag meaning ns/foobar or ns=foobar), and regexes seem quite poor at this. So now, if the simple regex for things that should always match on a line we're interested in hits, we proceed to token parsing and pull out whatever tags we can find.

Some of this will pick up new values coming in openshift/origin#27559.
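A rough sketch of the regex-then-tokens idea described above, not the code in this PR. The gate pattern `gateRe`, the `knownTags` set, the function name `extractMetadata`, and the sample line are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// gateRe is a hypothetical "is this line interesting at all" pattern.
var gateRe = regexp.MustCompile(`alert (?P<alert>\S+) (?P<state>fired|pending)`)

// knownTags lists the token keys worth keeping; this set is an assumption.
var knownTags = map[string]bool{"ns": true, "pod": true, "node": true}

// extractMetadata returns nil unless the gate regex matches. When it does,
// it records the named groups and then scans whitespace-separated tokens
// for optional key/value or key=value tags.
func extractMetadata(line string) map[string]string {
	m := gateRe.FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	meta := map[string]string{}
	for i, name := range gateRe.SubexpNames() {
		if i != 0 && name != "" {
			meta[name] = m[i]
		}
	}
	for _, tok := range strings.Fields(line) {
		tok = strings.Trim(tok, `,;"'()[]{}`)
		idx := strings.IndexAny(tok, "/=")
		if idx <= 0 || idx == len(tok)-1 {
			continue // no separator, or an empty key or value
		}
		key, val := tok[:idx], tok[idx+1:]
		if knownTags[key] {
			meta[key] = val
		}
	}
	return meta
}

func main() {
	line := "alert TargetDown fired for 120 seconds ns/openshift-machine-config-operator pod=foo-1"
	fmt.Println(extractMetadata(line))
}
```

Keeping the regex small and letting the token pass collect optional tags means a new tag only needs an entry in the key set rather than another optional group in the pattern.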

I think this is ready for another review.

/hold cancel

@openshift-ci openshift-ci bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 17, 2022
for _, m := range extractedMetadata {
	jsonb := pgtype.JSONB{}
	if err := jsonb.Set(m); err != nil {
		panic(err)
Member

Can we log a warning instead? I don't know if we'll ever really panic on jsonb.Set, but the places where we panic during syncing risk data loss, since a panic stops the sync and any further tasks.
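A minimal sketch of the log-and-continue shape suggested above. To stay self-contained it uses `json.Marshal` from the standard library as a stand-in for `pgtype.JSONB.Set`; the function name `encodeAll` and the sample data are hypothetical.

```go
package main

import (
	"encoding/json"
	"log"
)

// encodeAll serializes each metadata map. An entry that fails to encode is
// logged and skipped, so one bad entry cannot abort the rest of the sync.
// json.Marshal is only a stand-in here for pgtype.JSONB.Set.
func encodeAll(extracted []map[string]interface{}) [][]byte {
	var rows [][]byte
	for _, m := range extracted {
		b, err := json.Marshal(m)
		if err != nil {
			log.Printf("warning: skipping test output metadata: %v", err)
			continue
		}
		rows = append(rows, b)
	}
	return rows
}

func main() {
	rows := encodeAll([]map[string]interface{}{
		{"alert": "TargetDown", "state": "fired"},
		{"bad": make(chan int)}, // channels cannot be marshalled to JSON
	})
	log.Printf("stored %d of 2 entries", len(rows))
}
```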

@stbenjam
Member

Looks good as best I can tell, though the text processing is a little complex. I really have just one comment, about the panic above.

I think I would prefer if openshift-tests just returned serializable failure output (YAML?) for tests we want to store metadata about, but we can tackle something like that later if we want to do it.

@dgoodwin
Contributor Author

Added logging an error instead of panicking.

The json output occurred to me too but I couldn't see an easy way to do it. We'd need to either find a way to parse json out of test output, or correlate different output files with each junit file and figure out how to import them at different times. It would be a cleaner way to import for sure if we could figure out how best to do it. Maybe we could add an additional tag to the junit and parse it out somehow.

@dgoodwin dgoodwin force-pushed the extract-test-failure-tags branch from cbc29b6 to 4ae2c64 Compare November 21, 2022 13:39
@stbenjam
Member

> The json output occurred to me too but I couldn't see an easy way to do it. We'd need to either find a way to parse json out of test output, or correlate different output files with each junit file and figure out how to import them at different times. It would be a cleaner way to import for sure if we could figure out how best to do it. Maybe we could add an additional tag to the junit and parse it out somehow.

Ah, I was just thinking sippy would know which tests it expects might have YAML output and try to unmarshal it (and fail quietly if it can't).
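A hedged sketch of that "try to unmarshal, fail quietly" idea. The Go standard library has no YAML decoder, so JSON stands in for YAML here; the allowlist `structuredTests`, the test name in it, and the function name `metadataFromOutput` are all hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// structuredTests is a hypothetical allowlist of tests whose failure output
// is expected to be structured.
var structuredTests = map[string]bool{"example backstop test": true}

// metadataFromOutput attempts to decode output only for allowlisted tests.
// Any decode failure is reported as "no metadata" rather than as an error.
func metadataFromOutput(testName, output string) (map[string]string, bool) {
	if !structuredTests[testName] {
		return nil, false
	}
	var meta map[string]string
	if err := json.Unmarshal([]byte(output), &meta); err != nil || meta == nil {
		return nil, false // fail quietly
	}
	return meta, true
}

func main() {
	meta, ok := metadataFromOutput("example backstop test", `{"alert":"TargetDown","state":"fired"}`)
	fmt.Println(meta, ok)

	_, ok = metadataFromOutput("example backstop test", "plain text failure message")
	fmt.Println(ok)
}
```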

@stbenjam
Member

/lgtm

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Nov 21, 2022
@openshift-ci
Contributor

openshift-ci bot commented Nov 21, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dgoodwin, stbenjam


@openshift-ci
Contributor

openshift-ci bot commented Nov 21, 2022

@dgoodwin: all tests passed!


@openshift-merge-robot openshift-merge-robot merged commit a04a372 into openshift:master Nov 21, 2022