OSIRIS-Solutions/osiris-data

OSIRIS data

This Python library and its scripts help move data into and out of OSIRIS.

Use OsirisIO to operate on the MongoDB of your OSIRIS instance. This includes importing new activities with automated validation.

ATTENTION: This library is under active development and should be used with caution. If you plan to change the data of your OSIRIS instance directly in MongoDB, we recommend that you do NOT work on your live system. Work on a copy instead, then dump and restore your changes back into the live system afterwards.
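
The copy-first workflow recommended above could look roughly like the following sketch. It only builds the command lines for MongoDB's mongodump and mongorestore tools; the helper names, the database name `osiris`, the copy name `osiris_copy` and the `./dump` directory are assumptions for illustration and not part of osirisdata.

```python
from typing import List


def mongodump_command(uri: str, database: str, out_dir: str) -> List[str]:
    """Build a mongodump call that writes `database` into `out_dir`."""
    return ["mongodump", f"--uri={uri}", f"--db={database}", f"--out={out_dir}"]


def mongorestore_copy_command(
    uri: str, database: str, copy_name: str, dump_dir: str
) -> List[str]:
    """Build a mongorestore call that restores the dump under a new database name."""
    return [
        "mongorestore",
        f"--uri={uri}",
        f"--nsFrom={database}.*",  # namespaces as they appear in the dump
        f"--nsTo={copy_name}.*",   # namespaces to create in the target server
        dump_dir,
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with your own connection details.
    uri = "mongodb://localhost:27017/"
    print(" ".join(mongodump_command(uri, "osiris", "./dump")))
    print(" ".join(mongorestore_copy_command(uri, "osiris", "osiris_copy", "./dump")))
```

Each returned list can be passed to `subprocess.run(command, check=True)` once you have verified the values. Point OsirisIO at the copy while you experiment.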

Setup

Environment

This library requires Python. Please look up how to install the latest Python version on your OS. You will also need pip to install packages.

Once Python is installed, you can create a virtual environment (see the Python documentation for more information on virtual environments).

This command creates a folder named 'venv' containing all files needed for the virtual environment:

python3 -m venv ./venv

The next step is to activate the virtual environment:

source ./venv/bin/activate
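
If you are unsure whether the virtual environment is active, a quick check from Python can help. This is a minimal sketch using only the standard library; `in_virtualenv` is a hypothetical helper name, not something osirisdata provides.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment.

    Inside a venv, sys.prefix points at the environment folder while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```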

Install osirisdata

Now you can install the Python library osirisdata and all of its dependencies by running:

make install

Use Scripts

generate_json_schema.py

This script generates a JSON schema for each activity type in your OSIRIS instance.

Make sure that you have a config.yaml file in the top-level directory of this project containing the information for your OSIRIS instance. See config.template.yaml for an example.
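
One way to make sure the config file exists is to copy the template when config.yaml is missing. The sketch below assumes both files live in the project's top-level directory, as described above; `ensure_config` is a hypothetical helper, and you still have to edit the copied file by hand.

```python
import shutil
from pathlib import Path


def ensure_config(project_root: Path) -> Path:
    """Return the path to config.yaml, creating it from the template if needed."""
    config = project_root / "config.yaml"
    template = project_root / "config.template.yaml"
    if not config.exists():
        if not template.exists():
            raise FileNotFoundError(f"Neither {config} nor {template} exists")
        # Start from the template; the values inside must still be adjusted.
        shutil.copyfile(template, config)
    return config


if __name__ == "__main__":
    try:
        print("Using config file:", ensure_config(Path.cwd()))
    except FileNotFoundError as error:
        print(error)
```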

Execute the script:

python scripts/generate_json_schema.py 

If the script finishes without errors, you will find the JSON schemas of all your activity types in the schemas directory.
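
To inspect the generated schemas from Python, something like the following sketch may be useful. It assumes that the schemas directory contains one `.json` file per activity type; the file naming is an assumption, and `load_schemas` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_schemas(directory: Path) -> Dict[str, Any]:
    """Load every *.json file in `directory`, keyed by file name without extension."""
    schemas: Dict[str, Any] = {}
    for path in sorted(directory.glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            schemas[path.stem] = json.load(handle)
    return schemas


if __name__ == "__main__":
    # Prints nothing if the directory does not exist or holds no .json files.
    for name, schema in load_schemas(Path("schemas")).items():
        print(name, "->", sorted(schema.get("properties", {})))
```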

Use Library

OsirisIO

This is a short example of how to use the OsirisIO class:

First, initialize an instance of OsirisIO with your MongoDB connection information. OsirisIO will automatically build pydantic models for all activity types defined in your OSIRIS instance. You can turn this off by setting validation to False. The validate_extra option defines how the pydantic models behave when they encounter field names that are not defined in the activity types of your OSIRIS instance. You can choose between allow, ignore or forbid; for more information see here.

Import and initialize:

from osirisdata.osiris_io import OsirisIO

OSIRIS = OsirisIO(
    connection="MONGODB CONNECTION STRING",
    database="NAME OF YOUR OSIRIS DATABASE IN MONGODB",
    validation=True,  # optional, default = True
    validate_extra='ignore'  # optional, default = 'ignore'
)
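
The three validate_extra modes can be illustrated with a small plain-Python emulation. This sketch is not how OsirisIO or pydantic implement the option; it only mirrors the observable behaviour (allow keeps unknown fields, ignore drops them, forbid rejects them). `apply_extra_policy` and the field names are hypothetical.

```python
from typing import Any, Dict, Set


def apply_extra_policy(
    data: Dict[str, Any], known_fields: Set[str], policy: str = "ignore"
) -> Dict[str, Any]:
    """Emulate the allow / ignore / forbid handling of unknown field names."""
    if policy not in {"allow", "ignore", "forbid"}:
        raise ValueError(f"Unknown policy: {policy!r}")
    extras = sorted(set(data) - known_fields)
    if policy == "allow" or not extras:
        return dict(data)  # keep everything, including unknown fields
    if policy == "ignore":
        return {key: value for key, value in data.items() if key in known_fields}
    # policy == "forbid": unknown fields are an error
    raise ValueError(f"Unexpected fields: {', '.join(extras)}")


if __name__ == "__main__":
    record = {"title": "OSIRIS data", "internal_note": "draft"}
    print(apply_extra_policy(record, {"title"}, "allow"))
    print(apply_extra_policy(record, {"title"}, "ignore"))
```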

Here are some examples of the functions offered by OsirisIO.

Delete a whole collection like this:

OSIRIS.delete_collection('activities')

This will delete all entries in the activities collection.

To add a new activity to OSIRIS you can build up a Python dict with all needed information and simply add the activity with add_activity():

activity = {
    "title": "OSIRIS data",
    "date": {
        "year": 2026,
        "month": 5,
        "day": 7
    }, ...
}

OSIRIS.add_activity(activity)

NOTE: The add_activity function automatically validates the activity dictionary against the pydantic model for its activity type. You can turn this off when initializing OsirisIO by setting validation to False.
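
When building activity dictionaries from existing data, the nested date shown above can be derived from a Python date object. A minimal sketch, assuming only the year/month/day layout from the example; `date_to_dict` and `dict_to_date` are hypothetical helpers and not part of osirisdata.

```python
from datetime import date
from typing import Dict


def date_to_dict(value: date) -> Dict[str, int]:
    """Convert a date into the year/month/day mapping used in the example above."""
    return {"year": value.year, "month": value.month, "day": value.day}


def dict_to_date(parts: Dict[str, int]) -> date:
    """Convert the mapping back; raises ValueError for impossible dates."""
    return date(parts["year"], parts["month"], parts["day"])


if __name__ == "__main__":
    print(date_to_dict(date(2026, 5, 7)))
```

Round-tripping through `dict_to_date` is a cheap way to catch impossible dates such as month 13 before the activity is handed to add_activity().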

OpenAlexParser

To set up OpenAlexParser, you first need to look up your institute's ID on OpenAlex.

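from osirisdata.osiris_io import OsirisIO
# Also import OpenAlexParser from osirisdata (its module path is not shown here).
OSIRIS = OsirisIO(connection="mongodb://localhost:27017/", database="osiris")
OPENALEX = OpenAlexParser(OSIRIS, YourInstituteId, YourEmail)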
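
One possible way to find your institute ID is the public OpenAlex institutions endpoint. The sketch below uses only the standard library; the helper names are hypothetical, and which ID format OpenAlexParser expects (the full URL or only the short ID at its end) is not stated here, so check that before passing the value on.

```python
import json
from typing import Dict, List
from urllib.parse import urlencode
from urllib.request import urlopen

OPENALEX_INSTITUTIONS = "https://api.openalex.org/institutions"


def institution_search_url(name: str, email: str) -> str:
    """Build a search URL; `mailto` identifies you to the OpenAlex API."""
    query = urlencode({"search": name, "mailto": email})
    return f"{OPENALEX_INSTITUTIONS}?{query}"


def find_institutions(name: str, email: str) -> List[Dict[str, str]]:
    """Query OpenAlex and return the ID and display name of each match."""
    with urlopen(institution_search_url(name, email), timeout=30) as response:
        payload = json.load(response)
    return [
        {"id": item["id"], "display_name": item["display_name"]}
        for item in payload.get("results", [])
    ]


if __name__ == "__main__":
    # Placeholder values; printing the URL does not require network access.
    print(institution_search_url("Example University", "me@example.org"))
```

Calling `find_institutions(...)` performs a network request and returns the candidates, from which you can pick the entry that matches your institute.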

About

Python library for working with OSIRIS
