bmkramer/ukri_oa_baseline

UKRI OA baseline values - exploration using open data sources

The code documented here is used to generate the dataset accompanying the 2024 report
"Monitoring and evaluation of UKRI's Open Access Policy: Exploring the use of open data sources to inform baseline values".

Report: https://doi.org/10.5281/zenodo.12804855
Dataset: https://doi.org/10.5281/zenodo.12801805

General description

The repository contains SQL scripts used to collect bibliographic metadata of UKRI-funded and UK-affiliated research output (journal articles only) published between 2012 and 2022, as well as data on open access availability, publisher, national and international collaborations, citations, views and downloads, altmetrics and subjects (fields).

This project uses Curtin Open Knowledge Initiative (COKI) infrastructure, which is documented on GitHub: https://github.com/The-Academic-Observatory. In this infrastructure, a number of open data sources (including Crossref, OpenAlex and Unpaywall) are ingested into a Google BigQuery environment, where they can be queried via SQL. Additional data sources can be ingested manually and queried in the same way.

Data sources

The scripts use data sources included in the COKI Google BigQuery environment, among them Crossref, OpenAlex and Unpaywall.

In addition, a number of supplementary open data sources were manually added to the Google BigQuery environment for this project. These are included in this repository in the folder supplementary_sources.

Workflow description

The SQL scripts in this repository, when run in the COKI Google BigQuery environment as described above, each generate an intermediate table in Google BigQuery. Each table holds the results of that particular query (bibliographic metadata, open access classification, etc.) for each record in the dataset. The final SQL script combines all intermediate tables by matching on DOI. The resulting final dataset, containing all variables, can then be exported from Google BigQuery as a CSV file.

All scripts are annotated to explain the different parts of the code.
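
The pattern described in this section, one intermediate table per query and a final join on DOI followed by a CSV export, can be sketched locally as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module as a stand-in for the BigQuery environment, and the table names, column names, sample DOIs and the choice of LEFT JOIN are all assumptions made for the example, not taken from the repository's scripts.

```python
import csv
import io
import sqlite3

# In-memory SQLite database as a local stand-in for the BigQuery environment.
conn = sqlite3.connect(":memory:")

# Hypothetical intermediate tables, one per query step, each keyed on DOI.
# All names and values below are invented sample data.
conn.executescript("""
CREATE TABLE corpus (doi TEXT PRIMARY KEY, title TEXT, pub_year INTEGER);
CREATE TABLE oa_classification (doi TEXT PRIMARY KEY, oa_status TEXT);
CREATE TABLE citations (doi TEXT PRIMARY KEY, cited_by_count INTEGER);

INSERT INTO corpus VALUES
  ('10.1234/a', 'Article A', 2015),
  ('10.1234/b', 'Article B', 2020);
INSERT INTO oa_classification VALUES
  ('10.1234/a', 'gold'),
  ('10.1234/b', 'closed');
INSERT INTO citations VALUES
  ('10.1234/a', 12);
""")

# Final step: start from the corpus and LEFT JOIN each intermediate table
# on DOI, so records missing from one source are kept with NULL values.
columns = ["doi", "title", "pub_year", "oa_status", "cited_by_count"]
combined = conn.execute("""
SELECT c.doi, c.title, c.pub_year, oa.oa_status, cit.cited_by_count
FROM corpus AS c
LEFT JOIN oa_classification AS oa ON oa.doi = c.doi
LEFT JOIN citations AS cit ON cit.doi = c.doi
ORDER BY c.doi
""").fetchall()

# Export the combined rows as CSV text (NULL values become empty fields).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(columns)
writer.writerows(combined)
csv_text = buffer.getvalue()
print(csv_text)
```

Starting from the corpus table and joining outward keeps every record in the final dataset even when a source has no data for it, which is one plausible way to combine per-source results; the annotated SQL scripts show what the project actually does.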

Step 1

ukri_oa_baseline_query_1_corpus.sql - collect bibliographic metadata for UKRI-funded and UK-affiliated journal articles from Gateway to Research, Crossref and OpenAlex (limited to publications with a Crossref DOI)

Step 2

ukri_oa_baseline_query_2_oa_classification.sql - for each record, collect open access information from Unpaywall

Step 3

ukri_oa_baseline_query_3_publishers.sql - for each record, collect publisher information from Crossref

Step 4

ukri_oa_baseline_query_4_collaborations.sql - for each record, collect information on national and international collaborations from OpenAlex

Step 5

ukri_oa_baseline_query_5_citations.sql - for each record, collect citation information from OpenAlex

Step 6

ukri_oa_baseline_query_6_views_downloads.sql - for each record, collect usage information (views and downloads) from IRUS-UK

Step 7

ukri_oa_baseline_query_7_event_data.sql - for each record, collect altmetrics information (Twitter, newsfeeds, Reddit links, Wikipedia) from Crossref Event Data

Step 8

ukri_oa_baseline_query_8_fields.sql - for each record, collect subject classification from OpenAlex

Step 9

ukri_oa_baseline_query_9_combine_data.sql - combine all intermediate tables by matching on DOI
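
Matching on DOI across sources generally works only if DOIs are written the same way everywhere. DOIs are case-insensitive, and some sources store them as full resolver URLs rather than bare identifiers. The sketch below shows one way to bring DOIs into a common form before matching. The function name normalize_doi and the list of prefixes are hypothetical, and whether or how the repository's SQL scripts normalize DOIs is not stated here.

```python
def normalize_doi(raw: str) -> str:
    """Return a lower-cased bare DOI without resolver prefix or whitespace."""
    doi = raw.strip().lower()
    # Hypothetical list of common prefixes; extend as needed for other sources.
    prefixes = (
        "https://doi.org/",
        "http://doi.org/",
        "https://dx.doi.org/",
        "http://dx.doi.org/",
        "doi:",
    )
    for prefix in prefixes:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi


# The same DOI written three different ways maps to a single key.
variants = [
    "https://doi.org/10.5281/ZENODO.12804855",
    "10.5281/zenodo.12804855",
    " doi:10.5281/Zenodo.12804855 ",
]
normalized = {normalize_doi(v) for v in variants}
print(normalized)
```

Normalizing once, when each intermediate table is built, keeps the final join a plain equality match on a single DOI column.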

About

Code for establishing baseline values for the monitoring and evaluation of UKRI’s Open Access Policy
