Skip to content

Conversation

@patgarz
Copy link
Contributor

@patgarz patgarz commented Sep 3, 2023

CC: @vincbeck as owner of AIP-56

#32217 implemented the base and FAB user management interfaces. As a part of that PR, get_user_name was set to be FirstName LastName for FAB.

This is a poor choice. A username should be both invariant and unique; FirstName and LastName are self-configurable by all user roles by default (aside from Public) in the Airflow UI, and for users who configure a dirsync SSO to FAB and do not give permission to Edit My Profile, the user's display name is commonly self-configurable in those systems as well. Further, any sufficiently large system will eventually encounter a name clash between two concurrent users, essentially leaving a username (identifier) clash. As a result, in today's version of the code:

  • In the places where get_user_name is implemented (for example the Audit Log), the Audit Log cannot be used to establish an event chain if a user decides to change their name (which they can do by default without the aid of an administrator)
  • If two users share a First Name and Last Name, the Audit Log likewise cannot be used to identify the responsible user
  • In the worst case scenario, a malicious user could affect the ability for a security team to identify a chain of events by disguising themselves as a legitimate user, or by rapidly changing their name in between actions (again, a permission that does not require administrator intervention)

However, the user's display name is still an important and useful tool to provide a better UX. As a result, we fix this by performing the following changes:

  1. Add a new abstract method, auth_manager.get_user_display_name to augment the original API as proposed in AIP-56
  2. Change get_user_name to have fallback values available for backwards compatibility, but choose values that are sufficiently unique and invariant (email is not a perfect choice but it is better than display name)
  3. Implement get_user_display_name as the original FirstName LastName and migrate UI elements to use this new method instead of get_user_name.
  4. To fix the resultant regression of Add fullname to owner on logging #30185, a new field friendly_owner is added to the Log table which stores the user's Display Name for easier human-based audits of actions. (NOTE: This may be something to split out into a separate PR, which can be bundled with a minor version update. This is included in a separate commit to aid in this.)

This has a few implications:

  1. Audit logs now read the user's username (or email, or DB ID) instead of the user's full name to provide proper accountability. The owner field is also single-valued, which makes for easier queries and cross-analysis with other log sources.
  2. Though I don't see any place immediately this would happen, it's not unlikely that get_auth_manager.get_user_name will make its way into actual security code eventually. If it does, this type of clash leads into a simple and very dangerous privilege escalation.
  3. User display name is properly segmented out in the base class so that future developers of their own auth extensions can show the proper UI elements for a user's display name (instead of showing everything as the user identifier).

Additional ref: https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-56+Extensible+user+management


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues kind:documentation labels Sep 3, 2023
@patgarz patgarz force-pushed the fix-fab-username-accountability branch 2 times, most recently from 845fafb to 6010016 Compare September 3, 2023 15:13
Comment on lines 79 to 84
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work, I think

Suggested change
if user.username:
return user.username
elif user.email:
return user.email
else:
return self.get_user_id()
return user.username or user.email or self.get_user_id()

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that introducing 2 APIs instead of one make sense. They serve 2 different purpose. Thanks for doing it

@vincbeck vincbeck added the AIP-56 Extensible user management label Sep 5, 2023
@patgarz patgarz force-pushed the fix-fab-username-accountability branch from 6010016 to b181f97 Compare September 5, 2023 23:09
Copy link
Member

@uranusjr uranusjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pending CI

@vincbeck
Copy link
Contributor

vincbeck commented Sep 6, 2023

Some tests are failing

@patgarz patgarz force-pushed the fix-fab-username-accountability branch from b181f97 to 75973c2 Compare September 8, 2023 02:18
Copy link
Member

@uranusjr uranusjr Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it’s a good idea to introduce None (a different type) here; either an empty string or "Anonymous" would be easier for consuming processes to handle.

Copy link
Contributor Author

@patgarz patgarz Sep 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, good catch. get_user_display_name returns an empty string when inputs are empty so this should do the same.

@patgarz patgarz force-pushed the fix-fab-username-accountability branch from 75973c2 to db6fd8a Compare September 9, 2023 16:22
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this field ever NULL? Can we make the default an empty string instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, set this to be serverside "" and rebased.

@patgarz patgarz force-pushed the fix-fab-username-accountability branch from db6fd8a to 68efd6a Compare September 12, 2023 21:54
@patgarz
Copy link
Contributor Author

patgarz commented Sep 13, 2023

Looks like I messed up the rebase 😢 will fix

@patgarz patgarz force-pushed the fix-fab-username-accountability branch from 68efd6a to 81ae019 Compare September 13, 2023 04:30
@patgarz
Copy link
Contributor Author

patgarz commented Sep 13, 2023

Autogenerated migration elements should be fixed, but I'm a little clueless when it comes to debugging the specifics of why the MSSQL downgrade test is failing. Any advice @uranusjr ?

pyodbc.ProgrammingError: ('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Incorrect syntax near 'constraint'. (102) (SQLExecDirectW)")

@uranusjr
Copy link
Member

uranusjr commented Sep 13, 2023

MSSQL really doesn’t like server_default. If you can’t get this to work, try doing this instead:

a. Use default="" and nullable=False in the model
b. Use a three-step strategy in migration:
i. First add a column without default and with nullable=True
ii. Run an UPDATE to fill the column with an empty string
iii. Alter the column to set nullable=False and default=""
c. The downgrade should be simply a drop_column without any special parameters

@patgarz
Copy link
Contributor Author

patgarz commented Sep 15, 2023

@uranusjr After some review, testing, and thinking on it, I think it may be a bad idea to set a default value here on owner_display_name in isolation.

The Log table does not currently have any properties set aside from primary_key on the id column: https://github.com/apache/airflow/blob/main/airflow/models/log.py#L32-L40. Setting a default value for one column (when others are frequently NULL, and even owner itself has the possibility of being null even though it shouldn't ever be) is inconsistent and raises more questions than it solves, in my opinion.

Second, the Log table isn't responsible for any application logic; it's simply a friendlier-than-raw-text table to log user & system actions. Setting default or required values goes against that spirit as we want to log exactly the action that was taken with it associated data, and not munge that data as that makes troubleshooting (and auditing generally) harder. (Obviously, an empty string rather than NULL is fairly safe in owner_display_name, but I don't know if that holds for other columns, and it does make queries harder from a consistency standpoint.)

If we wanted to consider a more substantive rewrite of the Log model such that it did have default and nullable specifiers on applicable columns, I think this could be a worthwhile change. But I think that deserves a separate PR (that I'm happy to put together, though I'm not sure it solves more than it possibly breaks since the Log table isn't responsible for any application logic and is really just a friendlier mechanism when compared to a raw text log).

As such I think the current state of the PR is the desired state (assuming tests pass). Let me know your thoughts.

@vincbeck vincbeck merged commit ad79f52 into apache:main Sep 15, 2023
@ephraimbuddy ephraimbuddy added the changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) label Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

AIP-56 Extensible user management area:UI Related to UI/UX. For Frontend Developers. area:webserver Webserver related Issues changelog:skip Changes that should be skipped from the changelog (CI, tests, etc..) kind:documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants