Conversation

@wesm
Member

@wesm wesm commented Jun 22, 2020

This uses pandas to generate a sorted text table when using archery benchmark diff. Example:

#7506 (comment)

There are some other incidental changes:

  • pandas is required for archery benchmark diff. I don't think there's value in reimplementing the stuff that pandas can do in a few lines of code (read JSON, create a sorted table and print it nicely for us).
  • The default number of benchmark repetitions has been changed from 10 to 1 (see ARROW-9155 for context). IMHO more interactive benchmark results are more useful than higher precision. If you need higher precision, you can pass --repetitions=10 on the command line.
  • archery benchmark was building the unit tests unnecessarily. This also masked a bug (ARROW-9209), which is fixed here.
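
As a rough illustration of the "few lines of code" point above, here is a minimal sketch of loading benchmark records from JSON, computing the percent change, and printing a sorted table with pandas. The record layout (`benchmark`, `baseline`, `contender`) and the function name `sorted_diff_table` are assumptions made for this example; the real archery JSON schema and code may differ.

```python
import json

import pandas as pd

# Hypothetical records: throughput values in arbitrary but consistent units.
RAW = json.dumps([
    {"benchmark": "BenchA", "baseline": 100.0, "contender": 150.0},
    {"benchmark": "BenchB", "baseline": 200.0, "contender": 190.0},
    {"benchmark": "BenchC", "baseline": 80.0, "contender": 82.0},
])


def sorted_diff_table(raw_json: str) -> pd.DataFrame:
    """Load records, add a percent-change column, and sort largest gain first."""
    df = pd.DataFrame(json.loads(raw_json))
    df["change %"] = (df["contender"] - df["baseline"]) / df["baseline"] * 100.0
    return df.sort_values("change %", ascending=False)


table = sorted_diff_table(RAW)
print(table.to_string())
```

Sorting on the computed column is what makes the biggest speedups and regressions land at the top and bottom of the printed table.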

@wesm
Member Author

wesm commented Jun 22, 2020

@kszucs can you assist me with adapting ursabot for these changes? I think we can use pandas's DataFrame.to_html to create a colorized table for GitHub, too https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html

Changes that would be good to have in ursabot benchmark:

  • Pass through --cc and --cxx options
  • Pass through --repetitions

@pitrou
Member

pitrou commented Jun 22, 2020

Just a small question: why are m and b used for millions and billions, respectively? (I would probably expect M and G)

@kszucs
Member

kszucs commented Jun 22, 2020

> @kszucs can you assist me with adapting ursabot for these changes?

Sure.

> I think we can use pandas's DataFrame.to_html to create a colorized table for GitHub, too https://pandas.pydata.org/pandas-docs/stable/user_guide/style.html

I'm afraid this is not going to work because we can't embed any CSS in the comment; this is why we generate the ursabot responses as diffs.
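
To illustrate the diff approach mentioned above, here is a minimal sketch that renders benchmark changes inside a `diff`-fenced Markdown block, so GitHub colors lines starting with `+` or `-` without any CSS. This is not ursabot's actual implementation: the function name `render_as_diff`, the row layout, and the 5% threshold are all assumptions chosen for the example.

```python
FENCE = "`" * 3  # built at runtime so the example nests safely inside Markdown


def render_as_diff(rows, threshold=5.0):
    """Render (name, change_percent) pairs as a diff-fenced Markdown block.

    '+' marks an improvement, '-' a regression, and a space marks a change
    smaller than the (arbitrary, illustrative) threshold.
    """
    lines = [FENCE + "diff"]
    for name, change in rows:
        if change >= threshold:
            marker = "+"
        elif change <= -threshold:
            marker = "-"
        else:
            marker = " "
        lines.append(f"{marker} {name:<24} {change:>8.3f}%")
    lines.append(FENCE)
    return "\n".join(lines)


comment = render_as_diff([
    ("UniqueInt64/5", 184.782),
    ("UniqueString10bytes/0", -5.227),
    ("BuildStringDictionary", 0.211),
])
print(comment)
```

The trade-off is that only whole lines can be colored, which is less flexible than a styled HTML table.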

> Changes that would be good to have in ursabot benchmark:
>
>   • Pass through --cc and --cxx options
>   • Pass through --repetitions

@fsaintjacques
Contributor

ursabot uses tabulate, which I think is a smaller dependency.

@wesm
Member Author

wesm commented Jun 22, 2020

I’m sort of -1 on using anything but pandas for data munging and data presentation in our tooling. It’s not a very large dependency and has everything we need. FWIW, the current Ursabot output doesn't even sort the results, which is really needed to easily make sense of what got faster or slower at a glance.

@kszucs
Member

kszucs commented Jun 22, 2020

Using pandas is not a problem, but the results cannot be improved much beyond sorting the table.

@wesm
Member Author

wesm commented Jun 23, 2020

I improved the output to show the state.counters values:

                  benchmark         baseline        contender  change %                                    counters
40            UniqueInt64/5    6.442 GiB/sec   18.346 GiB/sec   184.782  {'iterations': 145, 'null_percent': 100.0}
0            UniqueInt64/11    6.500 GiB/sec   18.364 GiB/sec   182.522  {'iterations': 145, 'null_percent': 100.0}
11            UniqueUInt8/5  812.047 MiB/sec    1.755 GiB/sec   121.298  {'iterations': 142, 'null_percent': 100.0}
7             UniqueUInt8/1  683.943 MiB/sec    1.253 GiB/sec    87.593    {'iterations': 117, 'null_percent': 0.1}
38            UniqueUInt8/4  762.983 MiB/sec  950.521 MiB/sec    24.580   {'iterations': 133, 'null_percent': 99.0}
29            UniqueUInt8/2  659.082 MiB/sec  820.410 MiB/sec    24.478    {'iterations': 114, 'null_percent': 1.0}
5             UniqueInt64/1    2.656 GiB/sec    3.300 GiB/sec    24.223     {'iterations': 60, 'null_percent': 0.1}
32            UniqueInt64/4    5.627 GiB/sec    6.772 GiB/sec    20.349   {'iterations': 119, 'null_percent': 99.0}
25           UniqueInt64/10    5.234 GiB/sec    6.294 GiB/sec    20.254   {'iterations': 110, 'null_percent': 99.0}
39  UniqueString100bytes/11   26.815 GiB/sec   31.122 GiB/sec    16.061   {'iterations': 48, 'null_percent': 100.0}
23    UniqueString10bytes/5    2.691 GiB/sec    3.113 GiB/sec    15.667   {'iterations': 48, 'null_percent': 100.0}
34   UniqueString100bytes/5   26.944 GiB/sec   31.015 GiB/sec    15.108   {'iterations': 48, 'null_percent': 100.0}
6    UniqueString10bytes/11    2.699 GiB/sec    3.096 GiB/sec    14.721   {'iterations': 49, 'null_percent': 100.0}
21   UniqueString100bytes/7    1.947 GiB/sec    2.217 GiB/sec    13.866      {'iterations': 3, 'null_percent': 0.1}
28            UniqueInt64/2    2.622 GiB/sec    2.904 GiB/sec    10.770     {'iterations': 59, 'null_percent': 1.0}
13            UniqueInt64/3    2.157 GiB/sec    2.343 GiB/sec     8.644    {'iterations': 48, 'null_percent': 10.0}
33   UniqueString100bytes/4   24.286 GiB/sec   26.030 GiB/sec     7.181    {'iterations': 43, 'null_percent': 99.0}
22            UniqueInt64/7    2.542 GiB/sec    2.707 GiB/sec     6.497     {'iterations': 56, 'null_percent': 0.1}
20  UniqueString100bytes/10   22.536 GiB/sec   23.985 GiB/sec     6.432    {'iterations': 40, 'null_percent': 99.0}
35    UniqueString10bytes/1  788.817 MiB/sec  836.008 MiB/sec     5.983     {'iterations': 14, 'null_percent': 0.1}
17    UniqueString10bytes/7  592.671 MiB/sec  628.054 MiB/sec     5.970     {'iterations': 10, 'null_percent': 0.1}
3     UniqueString10bytes/4    2.515 GiB/sec    2.658 GiB/sec     5.687    {'iterations': 45, 'null_percent': 99.0}
19   UniqueString10bytes/10    2.402 GiB/sec    2.529 GiB/sec     5.269    {'iterations': 42, 'null_percent': 99.0}
9    UniqueString100bytes/1    3.929 GiB/sec    4.077 GiB/sec     3.762      {'iterations': 7, 'null_percent': 0.1}
30    UniqueString10bytes/8  593.560 MiB/sec  610.253 MiB/sec     2.812     {'iterations': 10, 'null_percent': 1.0}
12    UniqueString10bytes/2  788.505 MiB/sec  808.396 MiB/sec     2.523     {'iterations': 14, 'null_percent': 1.0}
37   UniqueString100bytes/8    1.965 GiB/sec    1.998 GiB/sec     1.697      {'iterations': 3, 'null_percent': 1.0}
1    UniqueString100bytes/2    3.984 GiB/sec    4.025 GiB/sec     1.028      {'iterations': 7, 'null_percent': 1.0}
36   UniqueString100bytes/3    4.262 GiB/sec    4.293 GiB/sec     0.725     {'iterations': 8, 'null_percent': 10.0}
8     BuildStringDictionary   85.507 MiB/sec   85.687 MiB/sec     0.211                         {'iterations': 198}
16   UniqueString100bytes/9    2.121 GiB/sec    2.111 GiB/sec    -0.469     {'iterations': 4, 'null_percent': 10.0}
4    UniqueString100bytes/6    2.056 GiB/sec    2.043 GiB/sec    -0.626      {'iterations': 4, 'null_percent': 0.0}
10            UniqueUInt8/3  453.281 MiB/sec  448.407 MiB/sec    -1.075    {'iterations': 79, 'null_percent': 10.0}
14   UniqueString100bytes/0    4.100 GiB/sec    4.055 GiB/sec    -1.089      {'iterations': 7, 'null_percent': 0.0}
24            UniqueInt64/8    2.473 GiB/sec    2.443 GiB/sec    -1.202     {'iterations': 55, 'null_percent': 1.0}
26    UniqueString10bytes/9  615.880 MiB/sec  608.453 MiB/sec    -1.206    {'iterations': 11, 'null_percent': 10.0}
42    UniqueString10bytes/6  651.430 MiB/sec  640.128 MiB/sec    -1.735     {'iterations': 11, 'null_percent': 0.0}
27            UniqueUInt8/0    1.775 GiB/sec    1.738 GiB/sec    -2.063    {'iterations': 318, 'null_percent': 0.0}
31            UniqueInt64/9    2.076 GiB/sec    2.033 GiB/sec    -2.067    {'iterations': 46, 'null_percent': 10.0}
15          BuildDictionary    1.535 GiB/sec    1.503 GiB/sec    -2.079                         {'iterations': 277}
41            UniqueInt64/0    3.915 GiB/sec    3.827 GiB/sec    -2.262     {'iterations': 87, 'null_percent': 0.0}
43    UniqueString10bytes/3  802.729 MiB/sec  784.279 MiB/sec    -2.298    {'iterations': 14, 'null_percent': 10.0}
18            UniqueInt64/6    3.284 GiB/sec    3.178 GiB/sec    -3.229     {'iterations': 72, 'null_percent': 0.0}
2     UniqueString10bytes/0  895.983 MiB/sec  849.150 MiB/sec    -5.227     {'iterations': 16, 'null_percent': 0.0}
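
The baseline and contender columns above use binary units (MiB/sec, GiB/sec) with three decimal places. A minimal sketch of such a formatter follows, assuming the raw measurement is a throughput in bytes per second; the function name `format_bytes_per_sec` is hypothetical and this is not archery's actual code.

```python
def format_bytes_per_sec(value: float) -> str:
    """Format a throughput in bytes/sec using binary (1024-based) units."""
    units = ["bytes/sec", "KiB/sec", "MiB/sec", "GiB/sec", "TiB/sec"]
    index = 0
    # Divide by 1024 until the value fits the unit or we run out of units.
    while value >= 1024.0 and index < len(units) - 1:
        value /= 1024.0
        index += 1
    return f"{value:.3f} {units[index]}"


print(format_bytes_per_sec(2 * 1024**3))
print(format_bytes_per_sec(1536 * 1024))
```

Spelling the unit out in full (for example GiB rather than a bare letter) avoids ambiguity between decimal and binary prefixes.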

@wesm
Member Author

wesm commented Jun 23, 2020

+1. The bot changes can't be done here, so I'm going to go ahead and merge this. That way I can use it without having to switch to this branch before running benchmarks.

@wesm wesm closed this in 0d1b2f8 Jun 23, 2020
@wesm wesm deleted the ARROW-9201 branch June 23, 2020 13:52
@kszucs
Member

kszucs commented Jun 23, 2020

I’m going to update the bot tomorrow.
