Skip to content

Parser that allows to load dumped model cards#2

Closed
BenjaminBossan wants to merge 36 commits intoalternative-model-card-implementationfrom
parse-model-cards-pandoc
Closed

Parser that allows to load dumped model cards#2
BenjaminBossan wants to merge 36 commits intoalternative-model-card-implementationfrom
parse-model-cards-pandoc

Conversation

@BenjaminBossan
Copy link
Copy Markdown
Owner

@BenjaminBossan BenjaminBossan commented Nov 30, 2022

As discussed here:

skops-dev#72 (comment)

@E-Aho @merveenoyan @adrinjalali I created this PR against my alternative-card-implementation PR since it build on it, thus pinging you here. Please let me know what you think of this approach and if we should continue with this feature.

Description

This feature adds a new function, skops.card.parse_modelcard. When passing it the path to a dumped model card, it parses it using pandoc and returns a Card object, which can be further modified by the user.

Example:

numpy as np
sklearn.linear_model import LinearRegression
skops.card import Card

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.dot(X, np.array([1, 2])) + 3
regr = LinearRegression().fit(X, y)
card = Card(regr)
card.save("README.md")

# later, load the card again
skops.card import parse_card

parsed_card = parse_modelcard("README.md")

# continue editing the card
parsed_card.add(**{"My new section": "My new content"})

# overwrite old card with new one
parsed_card.save("README.md")

There is a test that shows that cards with tables and plots are already possible to be parsed, reproducing the original content 1:1, with the exception of line breaks. The latter should generally be conserved, but there are a few cases where there is one too many, but IMO it doesn't matter.

Comment

In the end, this turned out easier than I initially thought it would. The main difficulty are the data structures returned by the pandoc parser, for which I couldn't find any documentation. I guess Haskell code is just so self-documenting it's not needed.

For this reason (and out of laziness), there are probably quite a few edge cases that I haven't covered yet. Just as an example, when parsing tables, pandoc tells us how the columns are aligned. This information is currently completely discarded (we let tabulate choose the alignment). If we want to preserve the table alignment, we would need to make some changes.

I think, however, that it's fine not to support everything out of the box. We can incrementally add support for the rest as we collect more experience.

Implementation

This feature requires the alternative card implementation from skops-dev#203

pandoc is used for the following reasons:

  • widely used and thus battle tested
  • can read many other formats, not just markdown, so in theory, we should be able to read, e.g., rst model cards without modifying any code

The disadvantage is that pandoc is not a Python package, so users need to install it separately. But it is available on all common platforms.

For calling pandoc, I chose to shell out using subprocess. I think this should be fine but LMK if there is a better way.

There is a Python package that binds pandoc (https://github.com/boisgera/pandoc) but I don't think it's worth it for us to add it, just to avoid shelling out. The package seems to have low adoption and contains a bunch of stuff we don't need.

I chose to implement this feature such that the parser that generates the Card object should not have to know anything about markdown. Everything related to markdown is moved to a separate class in _markup.py which could be extended in the future to support other markup languages.

In an ideal world, we would not have to know anything about markdown at all. Instead the Card object should have methods (similar to what we already have for add_table etc.) that handles all special cases in a markup-language agnostic way. But in practice, this is far from being true. E.g. if a user wants to add bold text, there is no special method for it, so they would need to add raw markdown. The Card class is thus a leaky abstraction, which requires the parsing to directly deal with markdown.

TODOs

This PR is not finished. Remaining TODOs that come to mind:

  1. We need to merge the alternative card implementation
  2. Documentation has to be updated in several places
  3. Tests need to be more complex, right now only one Card instance is tested
  4. CI needs to install pandoc so that the tests are actually run
  5. There are some specifics here that won't work with all Python versions, like the use of TypedDict
  6. Clean up a couple of TODO comments and asserts

BenjaminBossan and others added 9 commits November 10, 2022 14:21
Solves skops-dev#96

Add possibility to instantiate a model card with a model passed as str or Path.

The model will be loaded from disk if accessed. For now, the model is not cached,
so passing the loaded model is still recommended.
* FEAT Audit before loading a skops file

* WIP

* numpy loaders

* scipy and src issue

* sklearn

* make tests pass

* remove pickle.py

* fix a few issues

* add missing files

* add get_untrusted_types and docs

* minor fix

* add more tests

* add a smoke test, failing though

* implement safety for functions

* add missing Tree children for audit

* add missing SparseMatrixNode children for audit

* tests pass

* remove safety tree code

* minor test

* fix ids in test

* move type ignore

* add more tests and some docs

* more comments

* fix recursive dump and get_untrusted_set

* Ben's comments

* address comments: sentinel, contextmanager, sorted

* move all children to the children attribute

* add complex pipeline test

* apply Ben's suggestions

* Card object should pass trusted to load

* Update skops/io/_dispatch.py

Co-authored-by: Erin Aho <erinaho@outlook.com>
As discussed here:

skops-dev#72 (comment)

Description

This feature adds a new function, skops.card.parse_modelcard. When
passing it the path to a dumped model card, it parses it using pandoc
and returns a Card object, which can be further modified by the user.

In the end, this turned out easier than I initially thought it would.
The main difficulty are the data structures returned by the pandoc
parser, for which I couldn't find any documentation. I guess Haskell
code is just self-documenting.

For this reason, there are probably quite a few edge cases that I
haven't covered yet. Just as an example, when parsing tables, pandoc
tells us how the columns are aligned. This information is currently
completely discarded (we let tabulate choose the alignment). If we want
to preserve the table alignment, we would need to make some changes

Implementation

This feature requires the alternative card implementation from skops-dev#203

pandoc is used for the following reasons:

- widely used and thus battle tested
- can read many other formats, not just markdown, so in theory, we
  should be able to read, e.g., rst model cards without modifying any
  code

The disadvantage is that pandoc is not a Python package, so users need
to install it separately. But it is available on all common platforms.

For calling pandoc, I chose to shell out using subprocess. I think this
should be fine but LMK if there is a better way.

There is a Python package that binds
pandoc (https://github.com/boisgera/pandoc) but I don't think it's worth
it for us to add it, just to avoid shelling out. The package seems to
have low adoption and contains a bunch of stuff we don't need.

I chose to implement this such that the parser that generates the Card
object should not have to know anything about Markdown. Everything
related to Markdown is moved to a separate class in _markup.py.

In an ideal world, we would not have to know anything about markdown
either. Instead the Card object shoud have methods (similar to what we
already have for add_plot etc.) that handles all of that. But in
practice, this is far from being true. E.g. if a user wants to add bold
text, there is no special method for it, so they would need to add raw
Markdown. The Card class is thus a leaky abstraction.

TODOs

This PR is not finished. Remaining TODOs that come to mind:

1. We need to merge the alternative card implementation
2. Documentation has to be updated in several places
3. Tests need to be more complex, right now only one Card is tested
4. CI needs to install pandoc so that the tests are actually run
5. There are some specifics here that won't work with all Python
   versions, like the use of TypedDict.
@BenjaminBossan BenjaminBossan marked this pull request as draft November 30, 2022 16:54
BenjaminBossan and others added 7 commits December 1, 2022 11:10
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
…-dev#215)

Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
* CI: Update version of GH checkout action

Version 4 should get rid of the deprecation warning:

The `set-output` command is deprecated and will be disabled soon. Please
upgrade to using Environment Files. For more information see:
https://github.blog/changelog/2022-10-11-github-actions-deprecating-save-state-and-set-output-commands/

* Oops, changed the wrong action's version
Most model cards have a yaml section at the top. It is now detached
before parsing with pandoc, then re-added afterwards.

Add a test with the model card from bert-base-uncased. It still fails
with some minor issues at the moment (most notably table column
alignment).
Now supports:

- Space
- Strong
- Emph
- Strikeout
- Subscript
- Superscript
- Plain
- Str
- RawInline
- RawBlock
- SoftBreak
- LineBreak
- Para
- Header
- Image
- CodeBlock
- Code
- Table
- Div
- Link
- BulletList
- Quoted
@BenjaminBossan
Copy link
Copy Markdown
Owner Author

I added support for the meta info yaml section of the start of model cards, as well as a bunch more markdown syntax. We now have:

  • Space
  • Strong
  • Emph
  • Strikeout
  • Subscript
  • Superscript
  • Plain
  • Str
  • RawInline
  • RawBlock
  • SoftBreak
  • LineBreak
  • Para
  • Header
  • Image
  • CodeBlock
  • Code
  • Table
  • Div
  • Link
  • BulletList
  • Quoted

The tests are extended to include the model card of the most popular model, bert-base-uncased. Probably there should be tests for the model cards of the top 5 models or so.

The test almost passes, there is still a diff for table alignment, trailing whitespace, some line breaks and indentation, but all the important content is replicated. Here is the complete diff:

< | Model | #params | Language |
< |------------------------|--------------------------------|-------|
< | [`bert-base-uncased`](https://huggingface.co/bert-base-uncased) | 110M   | English |
< | [`bert-large-uncased`](https://huggingface.co/bert-large-uncased)              | 340M    | English | sub
< | [`bert-base-cased`](https://huggingface.co/bert-base-cased)        | 110M    | English |
< | [`bert-large-cased`](https://huggingface.co/bert-large-cased) | 340M    |  English |
< | [`bert-base-chinese`](https://huggingface.co/bert-base-chinese) | 110M    | Chinese |
< | [`bert-base-multilingual-cased`](https://huggingface.co/bert-base-multilingual-cased) | 110M | Multiple |
< | [`bert-large-uncased-whole-word-masking`](https://huggingface.co/bert-large-uncased-whole-word-masking) | 340M | English |
< | [`bert-large-cased-whole-word-masking`](https://huggingface.co/bert-large-cased-whole-word-masking) | 340M | English |
---
> | Model                                                                                                   | #params   | Language   |
> |---------------------------------------------------------------------------------------------------------|-----------|------------|
> | [`bert-base-uncased`](https://huggingface.co/bert-base-uncased)                                         | 110M      | English    |
> | [`bert-large-uncased`](https://huggingface.co/bert-large-uncased)                                       | 340M      | English    |
> | [`bert-base-cased`](https://huggingface.co/bert-base-cased)                                             | 110M      | English    |
> | [`bert-large-cased`](https://huggingface.co/bert-large-cased)                                           | 340M      | English    |
> | [`bert-base-chinese`](https://huggingface.co/bert-base-chinese)                                         | 110M      | Chinese    |
> | [`bert-base-multilingual-cased`](https://huggingface.co/bert-base-multilingual-cased)                   | 110M      | Multiple   |
> | [`bert-large-uncased-whole-word-masking`](https://huggingface.co/bert-large-uncased-whole-word-masking) | 340M      | English    |
> | [`bert-large-cased-whole-word-masking`](https://huggingface.co/bert-large-cased-whole-word-masking)     | 340M      | English    |
65c65
< You can use the raw model for either masked language modeling or next sentence prediction, but it's mostly intended to
---
> You can use the raw model for either masked language modeling or next sentence prediction, but it’s mostly intended to
197c197
< the other cases, it's another random sentence in the corpus. Note that what is considered a sentence here is a
---
> the other cases, it’s another random sentence in the corpus. Note that what is considered a sentence here is a
211c211
< used is Adam with a learning rate of 1e-4, \\(\beta_{1} = 0.9\\) and \\(\beta_{2} = 0.999\\), a weight decay of 0.01,
---
> used is Adam with a learning rate of 1e-4, \(\beta_{1} = 0.9\) and \(\beta_{2} = 0.999\), a weight decay of 0.01,
220,223c220,222
< | Task | MNLI-(m/mm) | QQP  | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE  | Average |
< |:----:|:-----------:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:|:-------:|
< |      | 84.6/83.4   | 71.2 | 90.5 | 93.5  | 52.1 | 85.8  | 88.9 | 66.4 | 79.6    |
< 
---
> | Task   | MNLI-(m/mm)   |   QQP |   QNLI |   SST-2 |   CoLA |   STS-B |   MRPC |   RTE |   Average |
> |--------|---------------|-------|--------|---------|--------|---------|--------|-------|-----------|
> |        | 84.6/83.4     |  71.2 |   90.5 |    93.5 |   52.1 |    85.8 |   88.9 |  66.4 |      79.6 |
248c247
< 	<img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">
---
> <img width="300px" src="https://cdn-media.huggingface.co/exbert/button.png">

The table alignment might not be possible to replicate exactly with tabulate.

IMO, this is already quite a good replication but it's hard to write a test that would allow this kind of diff to pass.

BenjaminBossan and others added 7 commits December 2, 2022 16:28
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
CI fix GH action deprecation coming from setup python step
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
I added 5 model cards from the top 10 most used models from the Hub (I
excluded cards that were too similar to one another). The tests were
rewritten so that they should now pass.

There are some limitations to the parser that results in the generated
cards not being 100% identical. Those limiations are now documented.
However, I don't believe that those limitations matter, as they make no
semantic difference but rather are stylistic or even invisible. The most
notable difference is the alignment of columns in tables.

The tests pass despite those differences because I rewrote them to
include a diff file for each model card. When the generated card is
compared to the original one, a diff is created and compared to the
checked in diff. This way, we have control over what diff we permit.

I had to exclude the folder containing the cards and diffs from the
pre-commit task "trailing-whitespace", as we need the trailing
whitespace in there.
Copy link
Copy Markdown

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks pretty descent already!

Comment thread skops/card/_markup.py
markdown figure.

From the caller side, only the ``__call__`` method should be used, the rest
should be considered internals.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then doesn't it make sense for them to be _md_image/etc instead?

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll change that, wasn't sure how necessary that would be since it's in a private module. I'll also remove the md_ prefix, since the class already indicates that it's markdown.

@adrinjalali
Copy link
Copy Markdown

IMO, this is already quite a good replication but it's hard to write a test that would allow this kind of diff to pass.

So blue is what we generate? It looks better to me anyway. I'm happy with the results.

@BenjaminBossan
Copy link
Copy Markdown
Owner Author

So blue is what we generate? It looks better to me anyway. I'm happy with the results.

Green is what we generate, red is the original. Probably you mean it looks better because the width of the columns is aligned, which is true. Of course once rendered as html it doesn't matter, but when looking at the raw markdown, it looks indeed better.

@adrinjalali
Copy link
Copy Markdown

I'm colorblind, so for me GH shows green as blue and red as yellow-ish (I think).

Cool, then we're good. Let me know when you need a better review of this PR.

@BenjaminBossan
Copy link
Copy Markdown
Owner Author

Cool, then we're good. Let me know when you need a better review of this PR.

I'll try to finish the alternative model card PR today, once that's done I'll make this PR ready.

EdAbati and others added 12 commits December 12, 2022 10:52
Fixes skops-dev#223

There are other sklearn objects which we're not trusting and users will get an error for them and have to trust them. But this is a good start.
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
…ev#242)

Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Benjamin Bossan <BenjaminBossan@users.noreply.github.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Resolves skops-dev#226

The issue is resolved by adding support for bytes and bytearray
Description

The proposed model card implementation would allow to dynamically add
sections or overwrite them.

This is not a complete implementation but already covers most of the
features we already have and then some.

On top of these features, it would be possible to add more features like
creating a default Card with placeholders, just like the exisint
template, or the possibility to delete existing sections or to retrieve
the result of a certain section.

Implementation

The underlying data structure consists of a dict and a Section
dataclass.

All data is stored in a _data attribute with the type dict[str,
Section]. The dataclass hold the section contents, i.e. the section
title, the section content, and subsections, which again have the same
type. It's thus recursive data structure. Section title and dict key are
identical, which is mostly for convenience.

With this refactor, there are no separate data containers anymore for
eval results, template sections, extra sections, etc. They are all
treated the same.

IMHO, this greatly simplifies the code overall. The only complex
function that's left is the one needed to traverse the tree holding the
data, and even that is just 14 LOC.

Demo

To see how the new class can be used, take a look at the main function.
The resulting Card can be seen here:

https://huggingface.co/skops-ci/hf_hub_example-fcc0d6fe-d072-4f94-8fdb-6bf3bb917bca

* [WIP] Further align new model card design

Added a test that shows that the new card produces the same output as
the old card (except for a few non-deterministic parts). This includes
most of the idiosyncrasies of the old card we might want to change in
the future (e.g. inconsistent capitalization, use of empty lines). Some
of the more problematic behaviors of the old card class were, however,
fixed (e.g. creating an empty metrics table when there are no metrics).

The other tests have been reworked to use the new card features to make
them more precise. Often, that means that instead of having a very weak
test like "assert 'foo' in card.render()", it is now possible to select
the exact section and check that it equals the expected output.

This work is still unfinished, specifically it still lacks tests for the
card repr and for the newly added features.

* Make tests pass

Some refactoring to clean up things, rework repr, make repr tests pass.

* Add tests for new functionalities and docstrings

* Adjust tests to work with older sklearn versions

* Adjust examples to new card, more docs

Also some better type annotations.

* Continue fixing tests

* Try fixing Windows error by specifying encoding

* Adjust doctest: confusion matrix not stored in cwd

* Increase test coverage

* Try fixing test failure on Windows

* Replace old by new Card implementation

* Address reviewer comments

- Remove noise from docstring example
- Add the comma after model repr
- Add docstrings to private methods

* Add TODO notes for when Python 3.7 is dropped

* Add Hub model card template, add template arg

Users can now choose to use no template, skops template, hub template,
or their own template. Using their own template disables a lot of
prefilling (say, putting the model plot in the card) because we wouldn't
know where to put it. Users will need to call card.add for the otherwise
prefilled sections.

* Make _add_single return the Section

This can be useful, because otherwise it takes a bit of effort to
retrieve the latest section.

* Allow tables without rows to be added

It's ugly, but there is no technical reason from prohibiting the
addition of tables without rows. (Note, columns are still required).

This allows us to use TableSection for formatting the metrics, instead
of calling tabulate there directly. This is better, since we don't have
2 separate ways of creating metrics.

* Error when calling add_metric w/ invalid template

* Some amendments required after merging

Most notably, dynamic model loading in Cards was broken.

* A couple more changes:

1. Don't use Hub template for now
2. Document how to add new default templates in code
3. Nicer error message when using invalid template

* Add entry to changes.rst

* Fix small bug in generated code for loading

* Address reviewer comments

- Remove Python 3.7 compatibility code (requires skops-dev#246)
- Removed empty section in docs
- Don't add empty metrics table by default
- Remove option to select with a list of str (e.g. card.select(['Foo',
  'Bar'])) is no longer possible)
- Add option to chain selections using select method (e.g.
  card.select('Foo').select('Bar'))

* Adjust getting started code in model card

* Adjust docstrings

* Allow adding default sections with custom template

So far, if users had a custom template (including no template at all),
they would lose the possibility to add some default sections:

- metrics
- model plot
- hyperparamters
- getting started code

Now we expose methods to the users to add those sections with a simple
call (no need to manually format the content and use card.add). For this
to work, users have to indicate the desired section, since we would
otherwise not know where to put the content.

During the work on this, I also cleaned up the Card tests, which became
messier and messier over time. Related tests are now all contained in a
class, which makes it a little bit easier to see if for a certain
method, all test cases have been covered.
There was a change in pandoc, this now works with the latest pandoc
version.
For some reason, the order of items in the metainfo is no longer stable.
Therefore, the tests comparing the parsed card vs original card failed.

Now metainfo is excluded when comparing the cards. The metainfo is now
checked separately, in a way that disregards the order.
@BenjaminBossan
Copy link
Copy Markdown
Owner Author

Closing in favor of skops-dev#257

BenjaminBossan added a commit that referenced this pull request Feb 28, 2023
BenjaminBossan added a commit that referenced this pull request Apr 20, 2023
Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
fix missing token #2
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants