WIP: review the overall structure #8
bookserver/schemas.py (Outdated)
```python
class Config:
    orm_mode = True
    # This tells Pydantic to try to read anything with attributes we give it as a model.
    # TODO: This doesn't make sense to me -- we're not using an ORM. What does this do?
```
I don't recall. It may be cruft from something I was experimenting with. I don't see anywhere in our code where we are using this class.
I'd suggest removing it -- shall I do so?
A little more research reminded me that this is how you configure Pydantic. See: https://pydantic-docs.helpmanual.io/usage/models/#orm-mode-aka-arbitrary-class-instances. I'm not sure we want to remove it. I think a little better understanding of how we are using Pydantic will inform the decision.
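For reference, a minimal sketch of what `orm_mode` enables; the `UserSchema` model and `RowLike` class here are hypothetical, purely for illustration:

```python
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: int
    username: str

    class Config:
        orm_mode = True  # enables UserSchema.from_orm(...) on arbitrary objects

class RowLike:
    """Stands in for any object with matching attributes, e.g. a database row."""
    id = 1
    username = "alice"

print(UserSchema.from_orm(RowLike()))  # id=1 username='alice'
```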
I saw that page, but don't see how it's relevant to what we're doing. My understanding of the approach:

1. Given a SQLAlchemy table, auto-generate a Pydantic schema for it.
2. From an endpoint, put data from Ajax into the validator and check it before writing to the db. The validator will also handle conversion from string to other data types if the data came as query params (see the sketch after this list).
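A quick illustration of point 2 -- this is a minimal sketch assuming a hypothetical `AssessmentRequest` schema, not our actual code:

```python
from pydantic import BaseModel, ValidationError

class AssessmentRequest(BaseModel):  # hypothetical schema, for illustration only
    course: str
    div_id: str
    sid: str
    deadline: int  # query params arrive as strings; Pydantic converts them

# Pydantic converts "42" -> 42 during validation.
req = AssessmentRequest(course="py101", div_id="q1", sid="alice", deadline="42")
assert req.deadline == 42

# Invalid data raises a ValidationError instead of silently passing through.
try:
    AssessmentRequest(course="py101", div_id="q1", sid="alice", deadline="soon")
except ValidationError as exc:
    print(exc)
```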
So, we are not doing 1 yet. But I think we agree on that as the direction.
For 2, I think the other point is that the schema can define what we get from the client as well as validate it. Right now in web2py you really have no way of knowing what data might be in request.vars; using one of these schema objects tells you immediately what you get. But I think that means we will end up creating additional schema objects that do not have a one-to-one mapping with the database.
I agree -- creating additional schemas for things that don't map directly to a db table makes sense. The lack of any kind of validation in web2py code produced a lot of bugs/problems.
I mainly wanted to avoid duplicating schemas when the SQLAlchemy Table structure already has that data (keeping the code DRY and avoiding mistakes where one place is updated but the other isn't).
I'd like to sketch out an overall documentation approach with you -- I'll create a page on the Wiki for now as a jointly-edited space. Let's plan how the docs should look there, then move that structure to the code once we agree on it.

I wrote a first pass at the outline at https://github.com/bnmnetp/BookServer/wiki/Overall-documentation-structure. Would you review and edit this?
```python
rslogger.debug(f"Going to fetch {user_id}")
user = await fetch_user(user_id)
if user:
    # TODO: I don't understand -- why do this here? Are we validating an existing user object?
```
The issue this example raises is whether we should be returning database objects or Pydantic objects. Either will work, of course, but I think we should decide what to work with and then be consistent.

Using Pydantic objects everywhere, except immediately after a query, provides some separation from the database representation. But that may be overkill.
I don't see what benefit we get from separation; the db data has already been validated and put in a known format, so that seems like what we need without additional steps.
One benefit we do get is that a Pydantic object is mypy-compatible and supports autocomplete in some editors. Database results remain opaque. But that may not be enough of a benefit.
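To make that concrete -- a tiny sketch with a hypothetical model; the point is that mypy can check attribute access on the Pydantic object, while a raw result row is effectively untyped:

```python
from pydantic import BaseModel

class UserSchema(BaseModel):
    id: int
    username: str

def display_name(user: UserSchema) -> str:
    # mypy verifies that .username exists and is a str, and editors can autocomplete it.
    # A typo such as user.usrname would be flagged before the code ever runs.
    return user.username.title()
```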
I need to read up more on how FastAPI integrates with sqlalchemy. I'd also like to see an example of validating HTTP parameters, then storing that in the db, to see how the code ends up looking. That will help shape the "right" way to do things.
In routers/rslogging.py, log_book_event does just that, as does get_assessment_results in routers/assessment.py.
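Roughly, the pattern looks like this -- a sketch with hypothetical names, not the actual log_book_event code:

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class LogItem(BaseModel):  # hypothetical request schema
    event: str
    div_id: str
    course: str

async def store_entry(entry: LogItem) -> None:
    ...  # stub standing in for the database insert in crud.py

@app.post("/logger/bookevent")
async def log_event(entry: LogItem):
    # By the time this body runs, FastAPI has validated the JSON body against
    # LogItem (returning a 422 to the client on failure), so the endpoint can
    # hand a known-good object straight to the crud layer.
    await store_entry(entry)
    return {"status": "OK"}
```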
Thanks, I'll take a look.
I like the way these schemas work -- FastAPI is really nice. You did a great job finding a really solid framework!
I spent some time digging through the SQLAlchemy docs last night, and I think it's best to stay with the Core instead of using the ORM.

Would you be willing to switch back to Core instead of using the ORM?
I updated https://github.com/bnmnetp/BookServer/wiki/Overall-documentation-structure to add user docs and developer docs sections. Feedback would be great!
Regarding Core vs ORM -- I didn't change the queries at all. I did change the inserts, but that was trivial. The only messing around I had to do was to get the async stuff right. My understanding -- which seems to have played out -- is that 1.4 / 2.0 are unifying the Core and ORM APIs. One thing I wanted to experiment with today was to see if the second ... Other than a day and a half of work, I'm not highly invested in either Core or ORM. But before we decide to switch back, I think it's worth looking at whether the automated mapping to Pydantic will even work with tables.
Sounds good. I'll write up the mapper and test on tables, then let you know.
The nested ...
bjones1 left a comment
Neat, thanks for checking. That's a good step forward toward simpler code!
I just pushed code with the auto Pydantic model generation from a Table, not a declarative base. (The code supports both styles, though.) If you're fine with it, let's move to Core instead of ORM / declarative.
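In case it helps the discussion, the idea looks roughly like this -- a sketch assuming Pydantic's `create_model` and simple column types, not the code I actually pushed:

```python
from typing import Optional
from pydantic import create_model
from sqlalchemy import Column, Integer, MetaData, String, Table

def table_to_pydantic(tbl: Table):
    # Build one field definition per column: required if NOT NULL, optional otherwise.
    fields = {}
    for col in tbl.columns:
        py_type = col.type.python_type
        if col.nullable:
            fields[col.name] = (Optional[py_type], None)
        else:
            fields[col.name] = (py_type, ...)
    return create_model(tbl.name.capitalize(), **fields)

metadata = MetaData()
useinfo = Table(  # simplified stand-in table
    "useinfo", metadata,
    Column("id", Integer, primary_key=True),
    Column("event", String, nullable=False),
)

UseinfoSchema = table_to_pydantic(useinfo)
print(UseinfoSchema(id=1, event="page"))
```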
I'm getting a flake8 failure that ...
Also, this PR is now pretty much what I want it to be -- a good first pass through the code. What other changes would you like to see in it? When you approve, I can squash this down to a nicer set of commits.
You can comment out line 57 for now. None of that code in the try block will work until I make further progress. I don't know why flake8 isn't complaining for me; it clearly should be. So, given that we do not need the double layer of ...

One thing that I actually like about the ORM is that it makes the creation of a new thing very easy. I want to look a bit more at the insert-or-update scenario, but that seems simpler to me also. I think we should merge this PR before switching back, if that is the final decision.
I now have mypy checks passing, plus fixed a fair number of actual problems that mypy found.
Doing a little more research on ORM vs Core... I want to make the best decision using the 1.4 / 2.0 future as our reference. I'm certainly not a SQLAlchemy expert.

The select interface is unified and is the preferred interface for 2.0 -- I think this is exactly what you want, as it is the same as it was in Core for previous versions. And again, the queries I have in this code, which were written thinking we were going with Core, did not need to be modified! I think the ORM way of doing queries was its biggest drawback, and now that is gone.

I think session.merge(Object) would save us a fair amount of work and be far less error-prone in the insert-or-update use case. You modify the object, call merge on it, and commit. If the object does not exist (i.e., no id), then a new object is automatically created.

I think many of the issues with the ORM come from overuse or misunderstanding of relationships, which I have always found to be quite ugly. But this seems to be an area where we can do our own thing and not use that feature if we don't want to. I don't see us running into the implicit I/O scenario, but I've just read about it and maybe I'm not seeing it clearly. Do you have a specific example from Runestone code where you see a risk of that?

Just a quick FYI -- we will need to be much more mindful of doing commits with SQLAlchemy, as we don't have the autocommit behavior of the tight integration between web2py and DAL, which implicitly did commits after a successful API call. I think this is another reason why I like isolating as much of the database functionality in crud.py as we can.
I am also not a SQLAlchemy expert -- I'm also reading about it. I've used it in the past, but much of that experience is no longer relevant. I'm also very excited about the new 1.4 / 2.0 approach where the Core and ORM are more unified. That's a big step forward. Here are some rambling thoughts on databases with Python.

Design goals:
So, my major design goal that's not clearly met with an ORM is to stay as close as possible to the underlying SQL. To me, the SQLAlchemy section on Data Manipulation with the ORM provides a great example of how these two are widely separated. For example, with autoflush, a select actually means an update (flushing pending changes) followed by a select.

I agree and am excited about the new SQLAlchemy features that use Python well -- for example, context managers which commit when closing are fantastic! I also agree that using relationships takes us farther from the underlying SQL. As you say, by not including these relationships in the tables, we can avoid this problem entirely. A plus of the SQLAlchemy ORM is that I prefer that style of defining the tables vs. the Core style. So, I think I'm still looking for a mythical third option. However, of existing implementations, either SQLAlchemy Core or DAL are probably our best alternatives.

It might also be helpful to look at examples of the common cases. Here's code I like:

```python
async def fetch_last_answer_table_entry(query_data: schemas.AssessmentRequest):
    assessment = EVENT2TABLE[query_data.event]
    tbl = answer_tables[assessment]
    query = (
        select(tbl)
        .where(
            and_(
                tbl.div_id == query_data.div_id,
                tbl.course_name == query_data.course,
                tbl.sid == query_data.sid,
            )
        )
        .order_by(tbl.timestamp.desc())
    )
    async with async_session() as session:
        res = await session.execute(query)
        return res.scalars().first()
```

Here's a great example of using programmatic queries to compose a query with an obvious SQL outcome, a clear select statement, etc. In contrast, I'm not sure what SQL this generates:

```python
async def create_answer_table_entry(log_entry: schemas.LogItem):
    values = {
        # omitted
    }
    tbl = answer_tables[EVENT2TABLE[log_entry.event]]
    new_entry = tbl(**values)
    async with async_session() as session:
        session.add(new_entry)
    return new_entry
```

I would prefer to write this as:

```python
async def create_answer_table_entry(log_entry: schemas.LogItem):
    values = {
        # omitted
    }
    tbl = answer_tables[EVENT2TABLE[log_entry.event]]
    async with async_session() as session:
        await session.execute(insert(tbl), [values])
```

An upsert would be a more interesting case to look at.
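Since the upsert case came up: with the PostgreSQL dialect, a Core-style upsert could look roughly like this (a sketch; the table and columns are simplified stand-ins):

```python
from sqlalchemy import Column, Integer, MetaData, String, Table
from sqlalchemy.dialects.postgresql import insert as pg_insert

metadata = MetaData()
useinfo = Table(  # simplified stand-in table
    "useinfo", metadata,
    Column("id", Integer, primary_key=True),
    Column("event", String),
)

stmt = pg_insert(useinfo).values(id=1, event="page")
stmt = stmt.on_conflict_do_update(
    index_elements=[useinfo.c.id],        # conflict target
    set_={"event": stmt.excluded.event},  # what to update on conflict
)
# Executing stmt emits:
#   INSERT INTO useinfo (id, event) VALUES (...)
#   ON CONFLICT (id) DO UPDATE SET event = excluded.event
```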
Thanks, I think our disagreement may come down to a missing design goal and how much priority we place on it. I want a tool like SQLAlchemy to make me productive and reduce the amount of repetitive code I (and other less experienced developers) need to write. So the example you liked is possible with ORM or Core and, unless I'm missing something, meets all four of your criteria as well as my additional one. I actually prefer the first version of `create_answer_table_entry`.

For the upsert use case I prefer something like `session.merge`. This means I don't have to write a lot of error-prone repetitive code that amounts to an insert-or-update by hand. If you really want to do an explicit insert, you can still do that with the ORM.

I'd like to resolve this so we can get moving again. Any suggestions for resolving this decision?
One more quick note. Just from an aesthetic point of view, I really like that I don't have to say `.c.` everywhere to refer to columns.
Hmm. In the end, we can't really know how this will all end up; we can just make our best educated guesses. Let's go with the ORM and see what happens.
I'd like to clean up this PR before merging it. What else would you like done here before the merge?
I think this is fine as it is. We will continue to evolve. I did think of one experiment I want to run that could tip the scales for me: performance. If inserting new rows into useinfo and the xxx_answers tables is significantly slower in ORM than in Core, then that is a big factor. Our top three API calls are:

I will check both ways and see how performance compares. I also know the average response time on academy.
Sounds good!
Well, performance testing was pretty uninteresting... I timed the insertion of 10,000 rows into the useinfo table. I started by testing if the extra ... I then reran the tests on a PostgreSQL database and was mildly surprised when they failed immediately on a ForeignKey violation... 🤣 yeah, SQLite does not enforce those. So PostgreSQL was actually slightly slower in this test than SQLite! I had kind of expected ORM to lose out because of the object creation, but was surprised that there wasn't really any difference. See notes below.
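For context, the shape of such a timing test -- a sketch, not my actual benchmark code:

```python
import time
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Useinfo(Base):  # simplified stand-in for the real useinfo table
    __tablename__ = "useinfo"
    id = Column(Integer, primary_key=True)
    sid = Column(String)
    event = Column(String)

engine = create_engine("sqlite://")  # swap in a postgresql:// URL to compare
Base.metadata.create_all(engine)

start = time.perf_counter()
with Session(engine) as session:
    for i in range(10_000):
        session.add(Useinfo(sid=f"user{i}", event="page"))
    session.commit()
print(f"ORM insert of 10,000 rows: {time.perf_counter() - start:.2f}s")
```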
Neat, good data. I'm glad to know there's not much performance difference.
Is this ready to merge, or were you still going to squash commits?
Let me do some squashing first...
OK, I squashed / cleaned the commits.
I created this PR as a way to provide a glimpse at my progress in reviewing the server. I plan on squashing a lot of these commits before it becomes a final PR.