Skip to content

Conversation

@regisb
Copy link
Contributor

@regisb regisb commented Aug 19, 2020

In this PR, I progressively document feature toggles one application at a time, starting at the top of a list of high-priority applications.

To detect feature toggles, I grep the application code as follows:

git grep --show-function -E "FEATURES|Waffle|Config|Switch" -- ./path/to/the/application

Although I generally have not too much trouble figuring out how feature flags are used, there are some annotation fields for which I do not quite understand how they should be used:

  1. .. toggle_use_cases: should be one or more of incremental_release, launch_date, monitored_rollout, graceful_degradation, beta_testing, vip, opt_out, opt_in, open_edx. I have no idea what these values mean, so I just blindly set this annotation to "open_edx".
  2. .. toggle_category: what should we use for this field? Should this be the name of the django app in which the feature toggle is defined (this is what I have done until now).

cc @robrap: no need for a thorough review right now. Maybe just check that I'm headed in the right direction?

I'll list here remaining questions I have about the feature toggles:

  • videos.enable_devstack_video_uploads: I did not quite understand the point of this feature toggle. My interpretation is that short-lived authentication tokens are better suited for the devstack, but I have trouble understanding why.
  • lms/djangoapps/grades/config/waffle.py: what are the use cases for these flags? (see this comment)

@openedx-webhooks openedx-webhooks added blended PR is managed through 2U's blended developmnt program needs triage labels Aug 19, 2020
@openedx-webhooks
Copy link

Thanks for the pull request, @regisb! I've created BLENDED-554 to keep track of it in Jira. More details are on the BD-21 project page.

When this pull request is ready, tag your edX technical lead.

@robrap
Copy link
Contributor

robrap commented Aug 19, 2020

@regisb: Some quick comments:

  1. Seems you like the """ style better than #. Can you comment here on your preference so we can determine a best practice.
  2. We still need to determine what the proper annotations are. I don't think we have it right quite yet.
  3. For use cases, see: https://open-edx-proposals.readthedocs.io/en/latest/oep-0017-bp-feature-toggles.html#use-cases This is meant to be multi-value (as it currently stands). I'm not convinced it is required, and couldn't just be part of the description. What I think we do want to know is something like "permanent" vs "temporary", which might be much simpler. But, @nasthagiri made the original recommendation, so she may want to weigh in on this.
  4. I would propose we drop category altogether. I think the path is good enough. As a future idea, we could potentially provide an additional index based on paths, or pull out the part of the path in the doc to make it stand out as a pseudo-category.

Other thoughts:

  • toggle_warnings should go away, and just be a part of description.
  • toggle_creation_date, toggle_expiration_date, toggle_tickets, and toggle_status (and some options of toggle_use_cases) are mostly to help us determine what toggles can be retired. Maybe we could propose something simpler?

@regisb
Copy link
Contributor Author

regisb commented Aug 21, 2020

Seems you like the """ style better than #. Can you comment here on your preference so we can determine a best practice.

""" annotations allow for multi-line comments, but single-line annotations (prefixed by #) don't. I prefer to use multi-line comments as an incentive to write longer descriptions.

EDIT: turns out that multi-line comments are not supported when added right in the middle of dict entries, so we'll have to revert to long, single-line comments.

For use cases, see: https://open-edx-proposals.readthedocs.io/en/latest/oep-0017-bp-feature-toggles.html#use-cases This is meant to be multi-value (as it currently stands). I'm not convinced it is required, and couldn't just be part of the description. What I think we do want to know is something like "permanent" vs "temporary", which might be much simpler. But, @nasthagiri made the original recommendation, so she may want to weigh in on this.

Right. When commenting existing feature toggles, I have no way of discovering for what use case they were designed, so I'll just set this field to "open edx".

I would propose we drop category altogether. I think the path is good enough. As a future idea, we could potentially provide an additional index based on paths, or pull out the part of the path in the doc to make it stand out as a pseudo-category.

Sounds reasonable.

toggle_warnings should go away, and just be a part of description.

I am working on a code-annotations pull request that uses the toggle_warnings field to add a "warning" field to the generated docs. It's a fairly simple change and the visual impact is quite good (IMHO):

docs-warnings

toggle_creation_date, toggle_expiration_date, toggle_tickets, and toggle_status (and some options of toggle_use_cases) are mostly to help us determine what toggles can be retired. Maybe we could propose something simpler?

Sure. @nasthagiri are you open to modifications of the current feature toggle annotation schema? If yes, could we schedule a conversation with the product owners?

@robrap
Copy link
Contributor

robrap commented Aug 21, 2020

@regisb:

  1. I know I did the review, but I am surprised to learn that # (or any style comments) don't support multi-line annotations. Can we fix that?
  2. We should somehow keep track of any open questions you have per toggle. Maybe even include it as text in the description. Like use_cases, etc.
  3. Let's drop category.
  4. Let's leave Warning if you find it useful.

@regisb
Copy link
Contributor Author

regisb commented Aug 21, 2020

@robrap

I know I did the review, but I am surprised to learn that # (or any style comments) don't support multi-line annotations. Can we fix that?

It's difficult. In code-annotations, single line annotations are parsed line-by-line: https://github.com/edx/code-annotations/blob/97cd3397c29a492929fb8902eec40400206253b7/code_annotations/extensions/base.py#L60
I can look into this, but right now I don't see an elegant way to solve this. Sorry for not being clear about this earlier in the pull request :-/

@robrap
Copy link
Contributor

robrap commented Aug 21, 2020

@regisb:

  1. Absolute minimum, we should update the docs to clarify this limitation.
  2. Top-of-my head thought, is there any way to do the parsing in two steps where we strip the comments before doing a more in-depth processing? I guess I'd have to look at the code to understand where it breaks down when the comments are still there, and how that could be fixed. I can't get to that until later.
    Just note that the most important part of your documenting might be the data gathering part, because I'd much prefer these got left as # comments.

@regisb regisb force-pushed the regisb/document-feature-toggles branch from b0fe52e to 9e2cf7d Compare August 21, 2020 22:36
@nasthagiri
Copy link
Contributor

@regisb and @robrap Yes, I am open to modifications to the current scheme definition. Please write up your proposal for the change, to give me an idea of what you are thinking. The proposal could be a PR to amend the OEP.

If you'd like set up a 15-30mn meeting to hash it out, you can do so at https://calendly.com/nasthagiri

Thanks!

@regisb
Copy link
Contributor Author

regisb commented Aug 25, 2020

Please write up your proposal for the change, to give me an idea of what you are thinking.

I'll repost what I just wrote on Slack: https://app.slack.com/client/T02SNA1T6/threads/thread/C015XNCQDDG-1597777765.002700
What I have in mind is that the toggle_use_cases should not be used anymore, as the use case depends on which platform we are considering: some platforms may enable a feature in monitored rollout while others may enable it at a specific launch date https://open-edx-proposals.readthedocs.io/en/latest/oep-0017-bp-feature-toggles.html#use-cases Thus, it does not make sense to set in stone the use cases of a feature toggle.
I think I will also mostly stop using the toggle_category entries: this field is already considered optional by OEP-17. I think I'll want to use it to group together feature toggles that obviously belong to the same category, such as ORA2 for instance.

Thus, my PR on OEP 17 would get rid of the toggle_use_cases field. Would that make sense @nasthagiri? If you disagree, then I am not going to insist very much on pushing that change, as it's mostly edX who has interest in this particular annotation. If you think this topic mandates a discussion then I'll schedule a call.

@nasthagiri
Copy link
Contributor

nasthagiri commented Aug 26, 2020

@regisb I spoke to @robrap about this today.

  1. I agree with the removal of the toggle_category field.
  2. Let's add a new field to specify whether the toggle is temporary or permanent. For forward compatibility, use a non-boolean field such as toggle_life_expectancy. For now, the values can be temporary and permanent.
  3. If toggle_life_expectancy is present, then expect to have a toggle_use_case value to explain why this permanent toggle needs to continue to be in the codebase (adding to our long-term code complexity).

#3 is important for us since we need to hold our teams accountable for removing old toggles. Allowing them to just mark a toggle as permanent (if we only do #2) can be too easy and won't meet our objectives. Having them reference OEP-17 to look up (and update if needed) the permanent type of use cases helps them be intentional in their design process.

Also, to give you an idea of the "expected timelines" for the different use cases, here's what I had sent to @elhoyos several months ago:

Use Case 1: Incremental Release (expected timeline: 3 months)
Use Case 2: Launch Date (expected timeline: 3 months)
Use Case 3: Ops - Monitored Rollout (expected timeline: 3 months)
Use Case 4: Ops - Graceful Degradation (expected timeline: 5 years)
Use Case 5: Beta Testing (expected timeline: 6 months)
Use Case 6: VIP / White Label (expected timeline: 5 years)
Use Case 7: Opt-out (expected timeline: 2 years)
Use Case 8: Open edX option (expected timeline: 3 years)

@regisb regisb force-pushed the regisb/document-feature-toggles branch 6 times, most recently from 41a744d to f7fd5d7 Compare August 27, 2020 16:02
@regisb
Copy link
Contributor Author

regisb commented Aug 31, 2020

@nasthagiri:

I agree with the removal of the toggle_category field.
Let's add a new field to specify whether the toggle is temporary or permanent. For forward compatibility, use a non-boolean field such as toggle_life_expectancy. For now, the values can be temporary and permanent.

👌
Should I write an ADR for this?

If toggle_life_expectancy is present, then expect to have a toggle_use_case value to explain why this permanent toggle needs to continue to be in the codebase (adding to our long-term code complexity).

Should toggle_use_case be added also when toggle_life_expectancy=temporary, or only when toggle_life_expectancy=permanent?

Unfortunately there is currently no mechanism to enforce the presence of a field conditionally on another: https://code-annotations.readthedocs.io/en/latest/configuration.html
However, what I can do is patch the feature toggles Sphinx extension that generates the human-readable docs: that way, when someone adds a feature toggle with toggle_life_expectancy=permanent but without toggle_use_case then a warning will be added to the docs, and possibly in the console as well.

@nasthagiri
Copy link
Contributor

Should I write an ADR for this?

Yes. Either an ADR or update the OEP if the OEP itself is now invalid.

Should toggle_use_case be added also when toggle_life_expectancy=temporary, or only when toggle_life_expectancy=permanent?

I'd like to require toggle_use_case when toggle_life_expectancy=permanent. Otherwise, it can be optional.

warning will be added to the docs, and possibly in the console as well

When would this issue be detected? Do we generate the doc within the CI process of the PR that introduced the toggle? Or would this be in a separate CI process?

I ask since I'm wondering how tight the feedback loop would be for the developers to correct their toggle annotations.

@regisb
Copy link
Contributor Author

regisb commented Aug 31, 2020

When would this issue be detected? Do we generate the doc within the CI process of the PR that introduced the toggle? Or would this be in a separate CI process?

If we want to have a very short feedback loop, I can write an ad-hoc linting script that will check the generated yaml feature toggle report. This script could even be moved to a dedicated unit test. But running this script/test would have to be done separately for each repository, assuming we add feature toggles to different repos. The alternative is that issues would only be detected once a human looks at the generated docs.

@robrap
Copy link
Contributor

robrap commented Aug 31, 2020

@regisb @nasthagiri: Some additional thoughts:

First, let me summarize what I think is temporary vs permanent based on the expected timeline and the OEP.

Temporary:

  • Use Case 1: Incremental Release (expected timeline: 3 months)
  • Use Case 2: Launch Date (expected timeline: 3 months)
  • Use Case 3: Ops - Monitored Rollout (expected timeline: 3 months)
  • Use Case 5: Beta Testing (expected timeline: 6 months)

(Semi-)Permanent:

  • Use Case 4: Ops - Graceful Degradation (expected timeline: 5 years)
  • Use Case 6: VIP / White Label (expected timeline: 5 years)
  • Use Case 7: Opt-out (expected timeline: 2 years)
  • Use Case 8: Open edX option (expected timeline: 3 years)

Also, based on reading the docs, I'd like to propose the following:

Proposed Renames:

  • "Ops - Graceful Degradation" => "Ops - Circuit Breaker": I think this name better matches the use case. I think of "graceful degradation" as something that is built in and probably wouldn't require a toggle, and feels more misleading.
  • "Opt-out" => "Opt-out/Opt-in": Just want to add "Opt-in" to the name to better match the options provided in the annotations.

I don't think code-annotations supports blanks for annotations with choices. Some possible options we have:

  1. Add an allow_blank option to annotation configuration to allow people to skip toggle_use_cases for temporary use cases, and keep life expectancy as a new toggle.
  2. Add temporary as an additional toggle_use_cases choice that someone can use in place of any temporary use-cases. The new list of choices would be temporary, incremental_release, launch_date, monitored_rollout, circuit_breaker, beta_testing, vip, opt_out, opt_in, open_edx.
  3. Replace all temporary use_case choices with a single temporary choice. The new list of choices would be temporary, circuit_breaker, vip, opt_out, opt_in, open_edx

Note: the second and third options don't require separate linting to ensure permanent use cases are added, and don't require a code-annotations enhancement to support blanks with choices fields.

I'd vote for option 2, as long as our how_to documentation helps keep this simple. Otherwise, option 3 keeps the options simplest.

Also, for all options above, we could consider adding a prefix to the use cases, like temp_incremental_release, temp_launch_date, ..., permanent_circuit_breaker, permanent_vip, etc. The names are longer, but it helps more quickly categorize the use cases.

Additionally, we should make the following documentation updates:

  1. The OEP could use the terms temporary and permanent or semi-permanent in the descriptions. The "Proposed Renames" above should also happen. Would it be helpful to add expected lifespans to the OEP?
  2. We need a how_to for documenting toggles that provides example annotations, describes how to document temporary vs permanent toggles, and points to the OEP as needed for additional details of the use cases. This doc either needs to live in edx-toggles, or edx-toggles should point to this doc.
  3. The feature_toggle_annotations.yaml should have a comment pointing to the how_to for usage.

@nasthagiri
Copy link
Contributor

+1 to all of the above remarks.

@robrap
Copy link
Contributor

robrap commented Aug 31, 2020

@regisb: Do you feel comfortable starting a how_to PR in edx-toggles for annotating? We can hash out any remaining issues in the documentation, and then follow-up with changes to the yml and OEP (in parallel or serially as you see fit).

@regisb
Copy link
Contributor Author

regisb commented Sep 1, 2020

@robrap

Do you feel comfortable starting a how_to PR in edx-toggles for annotating? We can hash out any remaining issues in the documentation, and then follow-up with changes to the yml and OEP (in parallel or serially as you see fit).

Sure. I'll open the following PRs:

  • On edx-toggles to create a howto
  • On code-annotations to update the yaml configuration
  • On open-edx-proposals to amend OEP 17

@regisb
Copy link
Contributor Author

regisb commented Sep 1, 2020

@robrap I like your option 3:

Replace all temporary use_case choices with a single temporary choice. The new list of choices would be temporary, circuit_breaker, vip, opt_out, opt_in, open_edx

The reason why I like this approach is that:

  1. developers who document a feature toggle might not have any idea how the toggle is going to be enabled/disabled by the ops team.
  2. Also, the way it is deployed should not have an impact on the code.
  3. Finally, non-edX contributors don't have a clue either how the toggle is going to be deployed in other enviironments.

If we go with this approach, I'd suggest getting rid of the toggle_life_expectancy entirely: after all, if toggle_use_case is not temporary, then it means that the life expectancy is permanent, right? It would spare us from adding allow_blank properties to the choice fields. @nasthagiri what do you think?

Sorry if my explanations are confusing -- maybe we can schedule a call to clarify things?

@nasthagiri
Copy link
Contributor

@regisb Yes, completely agree. @robrap's option eliminates the need to have a separate toggle_life_expectancy.

@regisb regisb force-pushed the regisb/document-feature-toggles branch from c8f77ca to a4ba4ae Compare September 16, 2020 13:20
@regisb regisb changed the title [BD-21] Document feature toggles (WIP) [BD-21] Document feature toggles Sep 16, 2020
@regisb
Copy link
Contributor Author

regisb commented Sep 16, 2020

@feanil I rebased and removed the "WIP" label -- as far as I'm concerned, this is ready to merge.

@edx-status-bot
Copy link

Your PR has finished running tests. There were no failures.

@feanil
Copy link
Contributor

feanil commented Sep 17, 2020

PR for the flaky python test is here: https://github.com/edx/edx-platform/pull/25009

@feanil feanil merged commit 90bc2ff into openedx:master Sep 17, 2020
@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the staging environment in preparation for a release to production.

@edx-pipeline-bot
Copy link
Contributor

EdX Release Notice: This PR has been deployed to the production environment.

@regisb regisb deleted the regisb/document-feature-toggles branch October 5, 2020 07:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

blended PR is managed through 2U's blended developmnt program merged

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants