feat: ✨ convert redcap data dict to resource properties #35

Open
martonvago wants to merge 4 commits into main from feat/convert-to-resources

Conversation

@martonvago martonvago commented Mar 20, 2026

Description

This PR adds the ability to convert a saved REDCap data dictionary to resource properties.
It follows the plan in #23 (comment).
It does not include a resource for events/visits. I'm not sure whether we want one.

Closes #25, closes #26

This PR needs an in-depth review.

Checklist

  • Formatted Markdown
  • Ran just run-all

Comment on lines +14 to +24
def _map(x: Iterable[In], fn: Callable[[In], Out]) -> list[Out]:
    return list(map(fn, x))


def _filter(x: Iterable[In], fn: Callable[[In], bool]) -> list[In]:
    return list(filter(fn, x))


def _flat_map(items: Iterable[In], fn: Callable[[In], Iterable[Out]]) -> list[Out]:
    """Maps and flattens the items by one level."""
    return list(chain.from_iterable(map(fn, items)))

These could come from soil.
Or we could decide #34 and put them in internals.py
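
For reference, a minimal sketch of how these three helpers behave on REDCap-style field dicts. The `TypeVar` definitions and the sample field dicts are assumptions made so the snippet runs on its own; the excerpt does not show how `In` and `Out` are declared in the PR.

```python
from collections.abc import Callable, Iterable
from itertools import chain
from typing import TypeVar

# Assumed declarations; the diff excerpt does not show them.
In = TypeVar("In")
Out = TypeVar("Out")


def _map(x: Iterable[In], fn: Callable[[In], Out]) -> list[Out]:
    return list(map(fn, x))


def _filter(x: Iterable[In], fn: Callable[[In], bool]) -> list[In]:
    return list(filter(fn, x))


def _flat_map(items: Iterable[In], fn: Callable[[In], Iterable[Out]]) -> list[Out]:
    """Maps and flattens the items by one level."""
    return list(chain.from_iterable(map(fn, items)))


# Made-up sample rows, shaped like data dictionary entries.
fields = [
    {"field_name": "age", "field_type": "text"},
    {"field_name": "intro", "field_type": "descriptive"},
    {"field_name": "symptoms", "field_type": "checkbox"},
]

names = _map(fields, lambda f: f["field_name"])
kept = _filter(fields, lambda f: f["field_type"] != "descriptive")
expanded = _flat_map(["a", "b"], lambda n: [f"{n}___1", f"{n}___2"])
```

Each helper returns a list eagerly, so the results can be reused or indexed without wrapping every call in `list(...)`.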

    form_name: str, fields: list[dict[str, str]]
) -> sp.ResourceProperties:
    visit_field = sp.FieldProperties(
        name="visit",

@martonvago martonvago Mar 20, 2026

Or event?

Could contain the unique_event_name or event_id of a REDCap event

),
)

# Discard fields displayed for information only

@martonvago martonvago Mar 20, 2026

As far as I understand, descriptive fields only give information or instructions to people filling in the form.

checkbox fields are treated separately. See #32 (comment) for details.
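
A rough sketch of the split described here, assuming each data dictionary row is a plain dict with a `field_type` key. The function name `split_fields` and the sample rows are hypothetical and are not taken from the PR.

```python
def split_fields(
    fields: list[dict[str, str]],
) -> tuple[list[dict[str, str]], list[dict[str, str]]]:
    """Drops descriptive fields and separates checkbox fields from the rest."""
    regular = [
        f for f in fields if f["field_type"] not in ("descriptive", "checkbox")
    ]
    checkboxes = [f for f in fields if f["field_type"] == "checkbox"]
    return regular, checkboxes


# Made-up sample rows for illustration.
sample = [
    {"field_name": "intro", "field_type": "descriptive"},
    {"field_name": "age", "field_type": "text"},
    {"field_name": "symptoms", "field_type": "checkbox"},
]
regular, checkboxes = split_fields(sample)
```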

form_redcap_fields,
lambda field: sp.FieldProperties(
    name=field["field_name"],
    title=field["field_name"],

I couldn't find a better equivalent for title. There's field_label but that's often a long question or explanation in Danish.

Comment on lines +70 to +73
constraints=sp.ConstraintsProperties(
    required=_get_required(field),
    enum=_get_categories(field),
),

Don't know if we want constraints.
The enum constraint is the same as categories above.

Comment on lines +85 to +86
title=form_name,
description=form_name,

The title could be the instrument_label that we can get by making another API call (it's not in the data dict).

I wasn't able to find an equivalent for description.

"""Parses the choices into the choice number and choice value.

E.g.:
Input: "1, first choice|2, second choice|3, third choice"

@martonvago martonvago Mar 20, 2026

My understanding is that choices usually look like this but that basically anything can be entered in the UI. This function throws an error if an unexpected format is given because I thought that was better than potentially making a silent mistake somewhere.

Does anyone know if we expect other formats? E.g. A, first choice|B, second choice|C, third choice?
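
To make the trade-off concrete, here is a minimal sketch of a strict parser that fails loudly on anything other than `number, label` pairs separated by `|`. It is not the PR's `_get_choices` (whose body is not shown here); the name `parse_choices` and the digits-only rule are assumptions for illustration.

```python
def parse_choices(raw: str) -> list[tuple[str, str]]:
    """Parses 'number, label|number, label|...' into (number, label) pairs.

    Assumption: choice codes consist of digits only. Anything else raises
    a ValueError instead of being passed through silently.
    """
    choices = []
    for part in raw.split("|"):
        # Split on the first comma only, so labels may contain commas.
        number, separator, label = part.partition(",")
        number, label = number.strip(), label.strip()
        if not separator or not number.isdigit() or not label:
            raise ValueError(f"Unexpected choice format: {part!r}")
        choices.append((number, label))
    return choices


parsed = parse_choices("1, first choice|2, second choice|3, third choice")
```

Under this rule, an input such as `A, first choice|B, second choice` raises a `ValueError`. If letter codes turn out to be valid, the `isdigit()` check is the only part that would need relaxing.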

return _map(
    _get_choices(checkbox_field),
    lambda choice: sp.FieldProperties(
        name=f"{checkbox_field['field_name']}___{choice[0]}",

@martonvago martonvago Mar 20, 2026

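This is how REDCap structures the data: each checkbox choice gets its own column, named with the field name, three underscores, and the choice number.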
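
As a small illustration of that naming pattern, the sketch below builds the column names for one checkbox field. The function name `checkbox_column_names` and the sample field are hypothetical; only the `___` naming pattern comes from the excerpt above.

```python
def checkbox_column_names(field_name: str, choice_numbers: list[str]) -> list[str]:
    """Builds one column name per checkbox choice: <field_name>___<choice number>."""
    return [f"{field_name}___{number}" for number in choice_numbers]


# Made-up field with three choices.
columns = checkbox_column_names("symptoms", ["1", "2", "3"])
```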



def _get_description(redcap_field: dict[str, str]) -> str:
    description = redcap_field["field_annotation"]

This seemed to be the property that was almost always filled in, and in English.
There is also field_note, which is not always filled in and is in Danish.

title=field["field_name"],
type=_get_type(field),
description=_get_description(field),
categories=_get_categories(field),

@martonvago martonvago Mar 20, 2026

This is a list of strings.
We can also have a list of objects like {"value": 1, "label": "apple"} to keep track of the REDCap number of the choices as well.

case "slider":
    return "number"
case _:
    raise NotImplementedError(_get_error_message(redcap_field, "field_type"))

I only included the field types that were in the test data. There are some more.

case "text" | "calc" | "radio" | "notes" | "file":
    return "string"
case "slider":
    return "number"

@martonvago martonvago Mar 20, 2026

I think slider can be integer or float; both are covered by number.

@martonvago martonvago moved this from Todo to In Review in Data development Mar 20, 2026
@martonvago martonvago marked this pull request as ready for review March 20, 2026 11:56

Projects

Status: In Review

Development

Successfully merging this pull request may close these issues.

Convert data dictionary items within each resource into properties
Reproducibly split data dictionary into resources

1 participant