@richard-jones
Contributor

@richard-jones richard-jones commented Apr 8, 2025


Introduces Current and Free tiers of data services

Introduces current-data and free-data access tiers for:

  • OAI PMH
  • Public Data Dump
  • Journal CSV

This PR...

  • has scripts to run
  • has migrations to run
  • adds new infrastructure
  • changes the CI pipeline
  • affects the public site
  • affects the editorial area
  • affects the publisher area
  • affects the monitoring

Developer Checklist

Developers should review and confirm each of these items before requesting review

  • Code meets acceptance criteria from issue
  • Unit tests are written and all pass
  • User Test Scripts (if required) are written and have been run through
  • Project's coding standards are met
    • No deprecated methods are used
    • No magic strings/numbers - all strings are in constants or messages files
    • ES queries are wrapped in a Query object rather than inlined in the code
    • Where possible our common library functions have been used (e.g. dates manipulated via dates)
    • Cleaned up commented out code, etc
    • Urls are constructed with url_for not hard-coded
  • Code documentation and related non-code documentation has all been updated
  • Migration has been created and tested
  • There is a recent merge from develop

Reviewer Checklist

Reviewers should review and confirm each of these items before approval
If there are multiple reviewers, this section should be duplicated for each reviewer

  • Code meets acceptance criteria from issue
  • Unit tests are written and all pass
  • User Test Scripts (if required) are written and have been run through
  • Project's coding standards are met
    • No deprecated methods are used
    • No magic strings/numbers - all strings are in constants or messages files
    • ES queries are wrapped in a Query object rather than inlined in the code
    • Where possible our common library functions have been used (e.g. dates manipulated via dates)
    • Cleaned up commented out code, etc
    • Urls are constructed with url_for not hard-coded
  • Code documentation and related non-code documentation has all been updated
  • Migration has been created and tested
  • There is a recent merge from develop

Testing

User testing

Functional test scripts for premium service
https://doaj.github.io/doaj-docs/feature/4008_premium/testbook/index.html#premium_subscription/premium_access

No particular regression testing is required

Operational testing

I think it would be a good idea to deploy this to a test server and simulate the deployment and phase-in process. We could shorten the delay in the config to speed up testing:

# Premium phase in ON
PREMIUM_PHASE_IN = True

# enter the deployment date as the phase in start
PREMIUM_PHASE_IN_START = datetime(2025, 5, 16) 

# Set the delay to 7 days
NON_PREMIUM_DELAY_SECONDS = 7 * _DAY

This would allow us to watch premium mode phase in and then start operating normally over the course of a week.

To do this, we have developed the following plan:

  1. Deploy the code with the appropriate PREMIUM_PHASE_IN_START and NON_PREMIUM_DELAY_SECONDS set to 7 days.
  2. Import only 1000 articles and journals to enable rapid data dump generation and easy review of data records for conformance to the phase in process
  3. Create a set of test user accounts, as detailed here (in addition to the usual test user accounts): https://docs.google.com/spreadsheets/d/1WAMNf0kco7j9Csi45Hw73O2nU7thCI2eWFhmmLVHVuk/edit?gid=2095824013#gid=2095824013
  4. Run a cron script every 3 - 6 hours which will inject new content into the system, to simulate ongoing addition of journals and articles. This script is in portality/scripts/add_sample_journals_and_articles.py
  5. @Steven-Eardley @RK206 and @richard-jones to independently run daily tests to confirm that the phase in is behaving as expected, following the testing plan here: https://docs.google.com/spreadsheets/d/1WAMNf0kco7j9Csi45Hw73O2nU7thCI2eWFhmmLVHVuk/edit?gid=0#gid=0
    • In order to see the latest journals and articles in the PDD, this script will extract the 100 newest records from the tar: portality/scripts/top_100_records_from_pdd.py. This works over both Journal and Article PDD files. Use as follows:
python portality/scripts/top_100_records_from_pdd.py [path to tar] [path to output file]
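
For reviewers who want a feel for what "newest records from the tar" involves, here is a minimal sketch. It is not the code in portality/scripts/top_100_records_from_pdd.py. It assumes the tar contains JSON files that each hold a list of records, and that each record carries an ISO 8601 `last_updated` string; the function name `newest_records` and both of those layout details are assumptions made for illustration.

```python
import heapq
import json
import tarfile


def newest_records(tar_path, n=100, date_field="last_updated"):
    """Return the n records with the latest date_field across all JSON members."""
    records = []
    # "r:*" lets tarfile detect the compression (gzip, bz2, none) automatically
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar:
            # Skip directories and anything that is not a JSON batch file
            if not member.isfile() or not member.name.endswith(".json"):
                continue
            with tar.extractfile(member) as fh:
                records.extend(json.load(fh))
    # ISO 8601 timestamps in one format and timezone sort correctly as strings
    return heapq.nlargest(n, records, key=lambda r: r.get(date_field, ""))
```

This version loads every record into memory before selecting the newest ones, which is fine for the 1000-record test import but may not suit a full production dump.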

Deployment

Configuration changes

The following fields are added and may need new production configuration values:

# aws container name for journal csvs
STORE_JOURNAL_CSV_CONTAINER = "doaj-journal-csv-placeholder"

# how long should the temporary URL for journal csvs last
JOURNAL_CSV_URL_TIMEOUT = 3600

# Should the system enforce premium membership mode
PREMIUM_MODE = True

# should the system respect phase-in mode, accommodating the phase-in start as the
# oldest date for non-premium content
PREMIUM_PHASE_IN = False
PREMIUM_PHASE_IN_START = datetime(2025, 5, 16)

# What is the delay non-premium users have to data access
NON_PREMIUM_DELAY_SECONDS = 30 * _DAY

The following fields are changed, and will need new production configuration values

STORE_S3_SCOPES = {
    ...
    constants.STORE__SCOPE__JOURNAL_CSV: {
        "aws_access_key_id": "put this in your dev/test/production.cfg",
        "aws_secret_access_key": "put this in your dev/test/production.cfg"
    }
}

# Note that data dump is now produced daily
HUEY_SCHEDULE = {
    ...
    "public_data_dump": {"month": "*", "day": "*", "day_of_week": "*", "hour": "10", "minute": "0"},
    ...
}
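
To make the "produced daily" note concrete: with month, day and day_of_week all set to "*" and a fixed hour and minute, the entry fires once per day at 10:00 in whatever timezone the scheduler runs in. The matcher below is a simplified sketch for checking that reading. It is not huey's crontab parser, it only understands "*" or a single integer per field, and the `matches_schedule` name and the Sunday = 0 weekday convention are assumptions.

```python
from datetime import datetime


def matches_schedule(spec, when):
    """Check a datetime against a simplified crontab-style spec ('*' or one integer per field)."""
    actual = {
        "month": when.month,
        "day": when.day,
        # assumption: 0 = Sunday, as in classic cron
        "day_of_week": when.isoweekday() % 7,
        "hour": when.hour,
        "minute": when.minute,
    }
    return all(
        spec.get(field, "*") == "*" or int(spec[field]) == value
        for field, value in actual.items()
    )


PDD = {"month": "*", "day": "*", "day_of_week": "*", "hour": "10", "minute": "0"}
```

For example, `matches_schedule(PDD, datetime(2025, 5, 22, 10, 0))` is true for any date, while any time other than 10:00 does not match.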

The following fields have been removed

EXTRA_JOURNALCSV_LOGGING = False

Initial deployment configuration

On the day of release, the following configuration values from the list above must be overridden in the production configuration, because they differ from the defaults in settings.py:

STORE_JOURNAL_CSV_CONTAINER = "doaj-journal-csv"
PREMIUM_PHASE_IN = True
PREMIUM_PHASE_IN_START = datetime(2025, 5, 16) # set to release date
STORE_S3_SCOPES = {
    ...
    constants.STORE__SCOPE__JOURNAL_CSV: {
        "aws_access_key_id": "put this in your dev/test/production.cfg",
        "aws_secret_access_key": "put this in your dev/test/production.cfg"
    }
}

Migrations

The following migration will:

  1. remove the journal CSV cache record
  2. migrate the public data dump cache record to the new format

python portality/migrate/20250521_4008_premium/migrate.py

It does not migrate the Journal CSV, so as soon as possible the following script should also be run in production:

python portality/scripts/journalcsv.py

We should also check back a week after deployment, as we will probably need to manually delete the old journal CSV and public data dump records left over from the previous version.

Once these scripts have been run, we may also want to upgrade all the users who will be given premium from the start. This can be done with:

python portality/scripts/apply_user_roles.py [csv of users and roles]

The [csv of users and roles] should be provided by @dommitchell and is of the form (do not include a header row):

"username1","role1, role2"
"username2","role1, role3"

Monitoring

We should probably monitor to ensure the PDD is successfully generated every day
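
A freshness check along these lines might be enough. This is a sketch under assumptions: `pdd_is_fresh` is a hypothetical helper, the 26-hour threshold (one day plus a two-hour grace period) is an arbitrary choice, and how the last generation time is obtained (cache record, file timestamp in the store, etc.) is left open.

```python
from datetime import datetime, timedelta


def pdd_is_fresh(last_generated, now, max_age=timedelta(hours=26)):
    """Return True if the last dump finished within max_age (hypothetical check)."""
    return now - last_generated <= max_age


ok = pdd_is_fresh(datetime(2025, 5, 22, 10, 5), datetime(2025, 5, 23, 9, 0))
stale = pdd_is_fresh(datetime(2025, 5, 20, 10, 5), datetime(2025, 5, 23, 9, 0))
```

Here `ok` is true (the dump is under a day old) and `stale` is false (nearly three days old), so an alert would fire only in the second case.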

@richard-jones richard-jones changed the title add premium option to pmh endpoint (testing in progress) premium service Apr 8, 2025
@richard-jones richard-jones marked this pull request as ready for review May 22, 2025 09:06
Contributor

@RK206 RK206 left a comment

Looks good

@amdomanska
Contributor

@richard-jones

A few quick tweaks:

  • Raised the breakpoint a bit to fit the new label.
  • Moved the "Premium" label into the main nav template—fixed a few bugs stopping non-dropdown items from showing.
  • Added the new nav item data to nav.yml.
  • Changed $gold from #D6AF36 → #ffd700 to match the star icon; barely noticeable, slightly brighter, not used elsewhere.

My changes

@Steven-Eardley Steven-Eardley added the CONFLICT WITH BASE Merge base branch (usually develop or master) in to resolve label Dec 11, 2025