This Python library and scripts help with data handling in and out of OSIRIS.
Use OsirisIO to operate on the MongoDB of your OSIRIS instance. This includes importing new activities with automated validation.
ATTENTION: This library is under active development and should be used with caution. If you plan to change the data of your OSIRIS instance directly in MongoDB, we recommend that you do NOT work on your live system. Work on a copy instead and dump/restore your changes back into the live system afterwards.
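The copy-first workflow recommended above can be sketched with MongoDB's mongodump and mongorestore command-line tools. This is a minimal sketch and not part of this library: the helper name build_copy_commands, the archive file name and the database name osiris_copy are assumptions made for illustration. The helper only builds the commands so that you can review them before running anything.

```python
import shlex


def build_copy_commands(uri, live_db, copy_db, archive="osiris.archive"):
    """Return mongodump/mongorestore argument lists that clone live_db into copy_db.

    Hypothetical helper: nothing is executed here, the commands are only built.
    """
    # Dump the live database into a single archive file.
    dump = ["mongodump", f"--uri={uri}", f"--db={live_db}", f"--archive={archive}"]
    # Restore the archive under a different database name.
    restore = [
        "mongorestore",
        f"--uri={uri}",
        f"--archive={archive}",
        f"--nsFrom={live_db}.*",
        f"--nsTo={copy_db}.*",
    ]
    return dump, restore


dump_cmd, restore_cmd = build_copy_commands(
    "mongodb://localhost:27017/", "osiris", "osiris_copy"
)
print(shlex.join(dump_cmd))
print(shlex.join(restore_cmd))
```

Once you are satisfied with the printed commands, you can run them in a shell or pass the lists to subprocess.run. Point OsirisIO at the copy (here osiris_copy) while you experiment.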
This library needs Python to be installed. Please look up how to install the latest Python version on your OS. You will also need pip to install packages.
Once you have Python installed, you can create a virtual environment (see here for more information on Python virtual environments).
This command will create a folder named 'venv' containing all the files needed for the virtual environment:
python3 -m venv ./venv
The next step is to activate the virtual environment:
source ./venv/bin/activate
Now you can install the Python library osirisdata and all its dependencies by running:
make install
This script generates JSON schemas for all the activity types in your OSIRIS instance.
Make sure that you have a config.yaml file in the top directory of this project with your OSIRIS information in it.
An example for the config file is config.template.yaml.
Execute the script:
python scripts/generate_json_schema.py
If there are no errors, you can find the JSON schemas of all your activity types in the schemas directory.
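To check what was generated, you can load the files from the schemas directory with the standard library. The following is a minimal sketch, assuming the schemas are written as files ending in .json; the helper name load_schemas is hypothetical and not part of this library.

```python
import json
from pathlib import Path


def load_schemas(schema_dir="schemas"):
    """Map each schema file name (without extension) to its parsed JSON content."""
    schemas = {}
    # Assumption: one .json file per activity type in the schemas directory.
    for path in sorted(Path(schema_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            schemas[path.stem] = json.load(handle)
    return schemas


if __name__ == "__main__":
    for name, schema in load_schemas().items():
        # "properties" is the standard JSON Schema keyword for an object's fields.
        fields = sorted(schema.get("properties", {}))
        print(f"{name}: {len(fields)} fields")
```

Run it from the top directory of the project so that the relative path schemas resolves correctly.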
This is a short example of how to use OsirisIO:
First you have to initialize an instance of OsirisIO with your MongoDB connection information. OsirisIO will automatically build pydantic models for all activity types defined in your OSIRIS instance. This can be turned off by setting validation to False. The option validate_extra defines how the pydantic models behave when they encounter field names that are not defined in the activity types of your OSIRIS instance. You can choose between allow, ignore or forbid; for more information see here.
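The three validate_extra settings mirror pydantic's handling of extra fields: allow keeps unknown fields, ignore silently drops them, and forbid raises an error. The plain-Python sketch below only illustrates these semantics; it is not how OsirisIO or pydantic implement them, and the function name apply_extra_policy is hypothetical.

```python
def apply_extra_policy(data, known_fields, mode="ignore"):
    """Illustrate the allow / ignore / forbid behaviour for unknown field names."""
    unknown = sorted(set(data) - set(known_fields))
    if mode == "allow":
        # Unknown fields are kept as they are.
        return dict(data)
    if mode == "ignore":
        # Unknown fields are dropped without an error.
        return {key: value for key, value in data.items() if key in known_fields}
    if mode == "forbid":
        # Unknown fields cause the validation to fail.
        if unknown:
            raise ValueError(f"unexpected fields: {unknown}")
        return dict(data)
    raise ValueError(f"unknown mode: {mode!r}")


record = {"title": "OSIRIS data", "nickname": "not in the activity type"}
known = {"title"}

print(apply_extra_policy(record, known, "allow"))
print(apply_extra_policy(record, known, "ignore"))
try:
    apply_extra_policy(record, known, "forbid")
except ValueError as error:
    print(f"rejected: {error}")
```

With ignore, the default, an import succeeds even if the source data carries fields your activity types do not define; forbid is the stricter choice when you want such fields to surface as errors.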
Import and initialize:
from osirisdata.osiris_io import OsirisIO
OSIRIS = OsirisIO(
    connection="MONGODB CONNECTION STRING",
    database="NAME OF YOUR OSIRIS DATABASE IN MONGODB",
    validation=True,  # optional, default = True
    validate_extra='ignore'  # optional, default = ignore
)
Here are some examples of the functions offered by OsirisIO.
Delete a whole collection like this:
OSIRIS.delete_collection('activities')
This will delete all entries in the activities collection.
To add a new activity to OSIRIS you can build up a Python dict with all needed information and simply add the activity with add_activity():
activity = {
    "title": "OSIRIS data",
    "date": {
        "year": 2026,
        "month": 5,
        "day": 7
    }, ...
}
OSIRIS.add_activity(activity)
NOTE: The add_activity function will automatically validate the activity dictionary with the pydantic model for the activity type. You can turn off this functionality in the initialization step of OsirisIO.
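When the date of an activity comes from a Python date object, the nested date dictionary shown in the example can be built with a small helper. This is a minimal sketch: to_date_dict is a hypothetical helper and not part of this library, and only the year, month and day keys are taken from the example above.

```python
from datetime import date


def to_date_dict(value: date) -> dict:
    """Convert a datetime.date into the year/month/day dict used in the example."""
    return {"year": value.year, "month": value.month, "day": value.day}


activity = {
    "title": "OSIRIS data",
    "date": to_date_dict(date(2026, 5, 7)),
    # Add the remaining fields required by your activity type here.
}
print(activity)
```

The resulting dictionary can then be passed to add_activity() in the same way as a hand-written one.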
To set up OpenAlexParser, you first have to find out your institute ID from OpenAlex.
OSIRIS = OsirisIO(connection="mongodb://localhost:27017/", database="osiris")
OPENALEX = OpenAlexParser(OSIRIS, YourInstituteId, YourEmail)