Closed
28 commits
3b9b559 lots of stuff (egeakman, May 24, 2024)
a3db579 port funcs to 2024 (egeakman, May 25, 2024)
0c50a62 update (egeakman, May 25, 2024)
1d106e0 oops + more readable + tell what event are we transforming (egeakman, May 25, 2024)
96111ab better slug dupe check + optimize (egeakman, May 25, 2024)
08bcbde add documentation (egeakman, May 29, 2024)
39a96e3 Update README.md (egeakman, May 29, 2024)
ecb1cc3 Update README.md (egeakman, May 29, 2024)
4276fa5 add configuration to readme (egeakman, May 29, 2024)
aba49d6 Use model_dump_json to be able to serialize datetime (egeakman, May 29, 2024)
4a0d477 Merge branch 'main' into port-to-2024 (egeakman, May 31, 2024)
4e433ec .env + documentation + extract more socials (egeakman, May 31, 2024)
fcceb66 exist_ok (egeakman, Jun 1, 2024)
b666971 url extraction functions (egeakman, Jun 1, 2024)
5798b4b Tried to put timings under a different model (egeakman, Jun 2, 2024)
7818471 correct typing at some places (egeakman, Jun 2, 2024)
84d3387 better overall structure (egeakman, Jun 2, 2024)
339ba50 typing (egeakman, Jun 2, 2024)
df0ad5f Add resources to the schema (egeakman, Jun 2, 2024)
f5e635f Update README.md (egeakman, Jun 2, 2024)
66fa79f oops missed this one (egeakman, Jun 2, 2024)
ee3f018 change gitx_url to gitx (egeakman, Jun 2, 2024)
96eb614 Add tests for mastodon and linkedin url extraction (NMertsch, Jun 3, 2024)
1dec5c8 [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], Jun 3, 2024)
ce1de63 better code structure (egeakman, Jun 4, 2024)
de3f67d Separate files (egeakman, Jun 4, 2024)
d875052 naming (egeakman, Jun 4, 2024)
42aba10 speaker website_url (egeakman, Jun 4, 2024)
5 changes: 4 additions & 1 deletion Makefile
@@ -14,8 +14,11 @@ download:
	python -m src.download

transform:
+ifeq ($(ALLOW_DUPES), true)
+	python -m src.transform --allow-dupes
+else
	python -m src.transform
-
+endif

all: download transform
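The `ALLOW_DUPES` variable only controls whether `--allow-dupes` is passed on. You enable it with `make transform ALLOW_DUPES=true`. `src/transform.py` is not part of this hunk. The sketch below is a minimal illustration under two assumptions: the module parses the flag with `argparse`, and, going by the "better slug dupe check" commit, the flag relaxes a duplicate-slug check. `check_duplicate_slugs` is a hypothetical helper, not the project's code.

```python
import argparse
from collections import Counter


def parse_args(argv=None):
    # Hypothetical CLI for `python -m src.transform`; only the flag name comes from the Makefile.
    parser = argparse.ArgumentParser(description="Transform raw Pretalx JSON")
    parser.add_argument(
        "--allow-dupes",
        action="store_true",
        help="do not fail when two items end up with the same slug",
    )
    return parser.parse_args(argv)


def check_duplicate_slugs(slugs, allow_dupes=False):
    """Return the duplicated slugs (sorted); raise unless duplicates are allowed."""
    dupes = sorted(slug for slug, count in Counter(slugs).items() if count > 1)
    if dupes and not allow_dupes:
        raise ValueError(f"Duplicate slugs found: {dupes}")
    return dupes
```

With this shape, `python -m src.transform --allow-dupes` sets `args.allow_dupes` to `True`, because `argparse` turns the dash in the option name into an underscore.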

37 changes: 36 additions & 1 deletion README.md
@@ -1,2 +1,37 @@
# programapi
Program API

This project downloads, processes, saves, and serves the static JSON files containing details of accepted speakers and submissions via an API.

What this project does step-by-step:

1. Downloads the Pretalx speaker and submission data, and saves it as JSON files.
2. Transforms the JSON files into a format that is easier to work with and safe to serve publicly. This includes removing unnecessary or private fields and adding new fields.
3. Serves the JSON files via an API.
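Step 2 can be pictured as a whitelist applied to each raw record, plus a few derived fields. This is a minimal sketch that assumes a raw Pretalx submission is a plain dict whose keys already match the output names. `PUBLIC_SESSION_FIELDS` is an illustrative subset of the documented output fields, and `slugify` is a naive hypothetical helper. The project's actual transform, which builds its own models, may differ.

```python
import re

# Illustrative subset of public fields; the full list is documented in data/examples/README.md.
PUBLIC_SESSION_FIELDS = ("code", "title", "speakers", "abstract", "duration")


def slugify(text):
    """Naive slug: lowercase, collapse runs of non-alphanumerics into '-', trim dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def transform_session(raw):
    # Copy only whitelisted keys, so private fields never reach the public output.
    public = {key: raw.get(key) for key in PUBLIC_SESSION_FIELDS}
    # Add a derived field that the raw record does not carry directly.
    public["slug"] = slugify(raw["title"])
    return public
```

A whitelist fails safe. A new private field added upstream stays out of the output until someone explicitly adds it to the list.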

## Installation

1. Clone the repository.
2. Install the dependency management tool: ``make deps/pre``
Collaborator

An idea for the future: what about making programapi an installable Python package, with the following commands?

  • programapi download
  • programapi transform

By the way, do you think we can avoid saving unnecessary/private fields when downloading?

Member Author

I thought about it, but didn't want to do it while there was a bunch of other stuff to implement first. I support it, though; I think we can do it after this PR.

About the private/unused fields:

  • IINM, some fields, like `email` and `do_not_record`, cannot be excluded when downloading.
  • Answers can probably be excluded, but we request `?questions=all` to get all the answers, so we can decide ad hoc in the model what to include or exclude.

3. Install the dependencies: ``make deps/install``
4. Set up ``pre-commit``: ``make pre-commit``
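The review thread suggests installing programapi as a package that exposes `programapi download` and `programapi transform`. Here is a minimal sketch of that command layout using `argparse` subcommands. `run_download` and `run_transform` are hypothetical stand-ins for `src.download` and `src.transform`. Exposing `main` as a console script would also need an entry in the package metadata, which is not shown here.

```python
import argparse


def run_download(args):
    # Hypothetical stand-in for the logic in src/download.py.
    return "download"


def run_transform(args):
    # Hypothetical stand-in for the logic in src/transform.py.
    return f"transform (allow_dupes={args.allow_dupes})"


def build_parser():
    parser = argparse.ArgumentParser(prog="programapi")
    subcommands = parser.add_subparsers(dest="command", required=True)

    download = subcommands.add_parser("download", help="fetch raw data from Pretalx")
    download.set_defaults(handler=run_download)

    transform = subcommands.add_parser("transform", help="build the public JSON files")
    transform.add_argument("--allow-dupes", action="store_true")
    transform.set_defaults(handler=run_transform)
    return parser


def main(argv=None):
    # Dispatch to the handler attached to the chosen subcommand.
    args = build_parser().parse_args(argv)
    return args.handler(args)
```

Because each subcommand carries its own options, `--allow-dupes` is only accepted after `transform`.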

## Configuration

You can change the event in the [``config.py``](src/config.py) file. It is set to ``europython-2024`` right now.
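`src/config.py` is not shown in this diff. The sketch below shows what such a settings class might hold. Only `Config.raw_path` (used in `src/download.py`) and the `europython-2024` slug come from this PR. The directory layout, `public_path`, and `api_url` are hypothetical, and the Pretalx URL pattern is an assumption.

```python
from pathlib import Path


class Config:
    # Pretalx event slug; the derived values below follow from it.
    event = "europython-2024"
    # Assumed Pretalx REST base URL for the event (hypothetical attribute).
    api_url = f"https://pretalx.com/api/events/{event}/"
    # Hypothetical directory layout, relative to the repository root.
    raw_path = Path("data") / "raw" / event
    public_path = Path("data") / "public" / event
```

Deriving the paths from `event` means that switching to another event only requires changing a single line.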

## Usage

- Run the whole process: ``make all``
- Run only the download process: ``make download``
- Run only the transformation process: ``make transform``

**Note:** Don't forget to set the ``PRETALX_TOKEN`` environment variable before running the download process. Also, please don't make too many requests to the Pretalx API; it might get angry 🤪
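This is a minimal sketch of an authenticated, rate-limited download. It assumes Pretalx's token scheme (an `Authorization: Token <token>` header) and uses an arbitrary one-second pause between calls. `build_request` and `polite_get_json` are hypothetical helpers, not the project's code, and Pretalx's result pagination is left out.

```python
import json
import os
import time
import urllib.request


def build_request(url, token=None):
    # Fall back to the PRETALX_TOKEN environment variable when no token is passed.
    token = token or os.environ.get("PRETALX_TOKEN")
    if not token:
        raise RuntimeError("PRETALX_TOKEN is not set")
    return urllib.request.Request(url, headers={"Authorization": f"Token {token}"})


def polite_get_json(urls, token=None, delay_seconds=1.0):
    """Fetch each URL in turn, sleeping between requests to go easy on the API."""
    results = []
    for i, url in enumerate(urls):
        if i:
            time.sleep(delay_seconds)
        with urllib.request.urlopen(build_request(url, token)) as response:
            results.append(json.load(response))
    return results
```

Failing early on a missing token avoids sending a burst of unauthenticated requests that would all be rejected.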

## API

The API is served at ``programapi24.europython.eu/2024``. It has two endpoints (for now):

- ``/speakers.json``: Returns the list of confirmed speakers.
- ``/sessions.json``: Returns the list of confirmed sessions.

**Note:** See [this page](data/examples/README.md) for an explanation of the fields in the returned JSON files.
123 changes: 123 additions & 0 deletions data/examples/README.md
@@ -0,0 +1,123 @@
# Explaining the output data

**Note:** Some of the fields may be `null` or empty (`""`).

## `sessions.json`

<details>
<summary>Example session data JSON</summary>

```json
{
  "A1B2C3": {
    "code": "A1B2C3",
    "title": "Example talk",
    "speakers": [
      "B4D5E6",
      ...
    ],
    "submission_type": "Talk",
    "slug": "example-talk",
    "track": "Some Track",
    "state": "confirmed",
    "abstract": "This is an example talk. It is a great talk.",
    "tweet": "This is an example talk.",
    "duration": "60",
    "level": "intermediate",
    "delivery": "in-person",
    "room": "South Hall 2A",
    "start": "2024-07-10T14:00:00+02:00",
    "end": "2024-07-10T15:00:00+02:00",
    "talks_in_parallel": [
      "F7G8H9",
      ...
    ],
    "talks_after": [
      "I0J1K2",
      ...
    ],
    "talks_before": [
      "L3M4N5",
      ...
    ],
    "next_talk": "O6P7Q8",
    "prev_talk": "R9S0T1",
    "website_url": "https://ep2024.europython.eu/session/example-talk/"
  },
  ...
}
```
</details>

&nbsp;

The fields are as follows:

| Key | Type | Notes |
|--------------------|-----------------------------------|---------------------------------------------------------------|
| `code` | `string` | Unique identifier for the session |
| `title` | `string` | Title of the session |
| `speakers` | `list[string]` | List of codes of the speakers |
| `submission_type` | `string` | Type of the session (e.g. Talk, Workshop, Poster, etc.) |
| `slug` | `string` | URL-friendly version of the title |
| `track` | `string` \| `null` | Track of the session (e.g. PyData, Web, etc.) |
| `state` | `string` | State of the session (e.g. confirmed, canceled, etc.) |
| `abstract` | `string` | Abstract of the session |
| `tweet` | `string` | Tweet-length description of the session |
| `duration` | `string` | Duration of the session in minutes |
| `level` | `string` | Level of the session (e.g. beginner, intermediate, advanced) |
| `delivery` | `string` | Delivery mode of the session (e.g. in-person, remote) |
| `room` | `string` \| `null` | Room where the session will be held |
| `start` | `datetime (ISO format)` \| `null` | Start time of the session |
| `end` | `datetime (ISO format)` \| `null` | End time of the session |
| `talks_in_parallel`| `list[string]` \| `null` | List of codes of sessions happening in parallel |
| `talks_after` | `list[string]` \| `null` | List of codes of sessions happening after this session |
| `talks_before` | `list[string]` \| `null` | List of codes of sessions happening before this session |
| `next_talk` | `string` \| `null` | Code of the next session in the same room |
| `prev_talk` | `string` \| `null` | Code of the previous session in the same room |
| `website_url` | `string` | URL of the session on the conference website |
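`start` and `end` are ISO 8601 strings with a UTC offset, so Python's standard library can parse them directly. This minimal sketch assumes a session dict shaped like the example above. It derives the scheduled length in minutes and returns `None` for unscheduled sessions, where both fields are `null`. `scheduled_minutes` is a hypothetical helper.

```python
from datetime import datetime


def scheduled_minutes(session):
    """Length of a scheduled session in minutes, or None if it has no slot yet."""
    if session.get("start") is None or session.get("end") is None:
        return None
    start = datetime.fromisoformat(session["start"])
    end = datetime.fromisoformat(session["end"])
    return int((end - start).total_seconds() // 60)


example = {
    "duration": "60",
    "start": "2024-07-10T14:00:00+02:00",
    "end": "2024-07-10T15:00:00+02:00",
}
print(scheduled_minutes(example))  # 60
```

Since `duration` is a string, convert it with `int(...)` before comparing it to the computed value.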

&nbsp;

## `speakers.json`

<details>
<summary>Example speaker data JSON</summary>

```json
{
  "B4D5E6": {
    "code": "B4D5E6",
    "name": "A Speaker",
    "biography": "Some bio",
    "avatar": "https://pretalx.com/media/avatars/picture.jpg",
    "slug": "a-speaker",
    "submissions": [
      "A1B2C3",
      ...
    ],
    "affiliation": "A Company",
    "homepage": "https://example.com",
    "twitter": "example",
    "mastodon": "example"
  },
  ...
}
```
</details>

&nbsp;

The fields are as follows:

| Key | Type | Notes |
|----------------|--------------------|-----------------------------------------------------------------------|
| `code` | `string` | Unique identifier for the speaker |
| `name` | `string` | Name of the speaker |
| `biography` | `string` \| `null` | Biography of the speaker |
| `avatar` | `string` \| `null` | URL of the speaker's avatar |
| `slug` | `string` | URL-friendly version of the name |
| `submissions` | `list[string]` | List of codes of the sessions the speaker is speaking at |
| `affiliation` | `string` \| `null` | Affiliation of the speaker |
| `homepage` | `string` \| `null` | URL of the speaker's homepage |
| `twitter` | `string` \| `null` | Twitter handle of the speaker |
| `mastodon` | `string` \| `null` | Mastodon handle of the speaker |
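Sessions and speakers reference each other only by code, so a consumer has to join the two files. This minimal sketch assumes both files are already loaded into dicts keyed by code, as in the examples above. Codes missing from the other file are skipped rather than raising an error. `speaker_names` and `session_titles` are hypothetical helpers.

```python
def speaker_names(session, speakers):
    """Map a session's `speakers` codes to names from speakers.json."""
    return [speakers[code]["name"] for code in session.get("speakers", []) if code in speakers]


def session_titles(speaker, sessions):
    """Map a speaker's `submissions` codes to titles from sessions.json."""
    return [sessions[code]["title"] for code in speaker.get("submissions", []) if code in sessions]


sessions = {"A1B2C3": {"code": "A1B2C3", "title": "Example talk", "speakers": ["B4D5E6"]}}
speakers = {"B4D5E6": {"code": "B4D5E6", "name": "A Speaker", "submissions": ["A1B2C3"]}}

print(speaker_names(sessions["A1B2C3"], speakers))  # ['A Speaker']
print(session_titles(speakers["B4D5E6"], sessions))  # ['Example talk']
```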
10 changes: 6 additions & 4 deletions data/examples/output/sessions.json
@@ -19,8 +19,9 @@
"end": null,
"talks_in_parallel": null,
"talks_after": null,
-"next_talk_code": null,
-"prev_talk_code": null,
+"talks_before": null,
+"next_talk": null,
+"prev_talk": null,
"website_url": "https://ep2024.europython.eu/session/this-is-a-test-talk-from-a-test-speaker-about-a-test-topic"
},
"B8CD4F": {
@@ -43,8 +44,9 @@
"end": null,
"talks_in_parallel": null,
"talks_after": null,
-"next_talk_code": null,
-"prev_talk_code": null,
+"talks_before": null,
+"next_talk": null,
+"prev_talk": null,
"website_url": "https://ep2024.europython.eu/session/a-talk-with-shorter-title"
}
}
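This file's hunks rename `next_talk_code` and `prev_talk_code` to `next_talk` and `prev_talk`, and add `talks_before`. A consumer written against the old output could bridge the rename with a small shim. This is a minimal sketch. `LEGACY_KEYS` and `upgrade_session` are hypothetical names, and defaulting `talks_before` to `null` mirrors the unscheduled sessions in this file.

```python
LEGACY_KEYS = {"next_talk_code": "next_talk", "prev_talk_code": "prev_talk"}


def upgrade_session(session):
    """Return a copy of a session dict that uses the new key names."""
    upgraded = {LEGACY_KEYS.get(key, key): value for key, value in session.items()}
    # Older files have no `talks_before`; default it to None (JSON null).
    upgraded.setdefault("talks_before", None)
    return upgraded
```

The shim returns a new dict instead of mutating its input, so the original data stays untouched.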
File renamed without changes.
File renamed without changes.
3 changes: 3 additions & 0 deletions src/download.py
@@ -19,6 +19,9 @@
    "speakers?questions=all",
]

+if not Config.raw_path.exists():
+    Config.raw_path.mkdir(parents=True)
+
for resource in resources:
    url = base_url + f"{resource}"
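Checking `exists()` before calling `mkdir()` leaves a small window in which another process could create the directory first. `pathlib` can do both steps in one call with `exist_ok=True`, which the later "exist_ok" commit in this PR appears to adopt. This minimal sketch uses a temporary directory as a stand-in for `Config.raw_path`. `ensure_dir` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    # parents=True creates missing intermediate directories;
    # exist_ok=True turns "already exists" into a no-op instead of FileExistsError.
    path.mkdir(parents=True, exist_ok=True)
    return path


with tempfile.TemporaryDirectory() as tmp:
    raw_path = Path(tmp) / "data" / "raw"  # stand-in for Config.raw_path
    ensure_dir(raw_path)
    ensure_dir(raw_path)  # second call succeeds without raising
    print(raw_path.is_dir())  # True
```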
