-
Notifications
You must be signed in to change notification settings - Fork 15
[BD-27] Upload transcripts downloaded by the video_download script #61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BD-27] Upload transcripts downloaded by the video_download script #61
Conversation
|
Thanks for the pull request, @janimo! I've created BLENDED-828 to keep track of it in Jira. More details are on the BD-27 project page. When this pull request is ready, tag your edX technical lead. |
181bf45 to
b9a68ea
Compare
|
@janimo this looks good - but a few questions.
|
|
|
@alangsto , I was attempting to test on my devstack first, but underestimated the complexity of setting up the video pipeline. Though, I get a 404 on edge, and staging for the API admin: https://edge.edx.org/api-admin/ |
|
@kaizoku, no worries about testing on edge or stage then, we can take that on as part of our testing of this tool. @janimo one last request, could you please update the documentation for the video + transcript tooling here in this file (https://github.com/edx/cc2olx/blob/master/src/cc2olx/tools/README.rst). Thanks! |
|
@alangsto I've updated the README to mention transcripts. One thing I am not sure about is what other (if any) changes are needed in the cc2olx tool to fully support transcripts. The documentation I used at https://edx.readthedocs.io/projects/edx-open-learning-xml/en/latest/components/video-components.html doesn't explain much about transcripts in video blocks, and I did not find online examples of video blocks with transcript support enabled. While uploading a transcript will associate the video with it via de edx id tag, I am still not sure whether the video xblock should explicitly list the transcripts itself. |
|
@janimo have you tried adding a fake transcript to a video component in studio, and looking at the resulting olx from the course export? That would be my first suggestion to understand how xblocks and transcripts work together. I myself am not familiar, but had thought this would be included in the work for this ticket. |
|
@janimo, yes, video blocks do need to specify the transcripts associated to the video. We'll need to add OLX generation to link the transcripts to the associated video here. |
This add the new column that contains dash separated, alphabetically ordered language codes to the output CSV. This is used by cc2olx to add transcript tags in the generated video block.
074b2a0 to
87f65ee
Compare
|
I've added a related pull which addresses the cc2olx part, once the video upload script generates a CSV with language codes. |
|
@janimo, I setup the video pipeline in my devstack and it looks like the transcript upload endpoint doesn't accept the API credentials, as it uses the basic django It looks like we'll need to find another way to upload transcripts programmatically. Did you find any other endpoints while working on this? |
|
@kaizoku I did not find another endpoint, but I did not look either after I found this one TBH. |
| ----------------- | ||
|
|
||
| The video upload tool uploads video files to edX's video encoding pipeline. | ||
| The video upload tool uploads video files and associated transcripts to edX's video encoding pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaizoku can we also include documentation for the video download tool? It can be added to this file, but maybe under another section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I added a section below describing the options to the video download tool, its expected input and output, and how it can be used with the video upload tool. Does this extra section look good?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks great, thanks for adding!
| Arguments: | ||
| * filename: the transcript filename | ||
| * edx_video_id: the video ID of the video this transcript is for | ||
| * language_code: the language of the transcript |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add access token to docstring args
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that. It's added now.
tests/test_tools.py
Outdated
| mocker.patch("cc2olx.tools.video_upload.open") | ||
| mocker.patch("cc2olx.tools.video_upload.requests.post", side_effect=upload_transcript_side_effect) | ||
|
|
||
| response = upload_transcript("filename", "edxid", "en") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing access_token arg
| The ``course-id`` argument is the ID of the course as it appears in Studio. For example, ``course-v1:edX+111222+111222``. | ||
|
|
||
| The ``directory`` argument is a directory containing video files that will be uploaded to edX's video encoding pipeline. | ||
| The ``directory`` argument is a directory (such as the one created by the video download tool) containing video files and transcripts, that will be uploaded to edX's video encoding pipeline. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How should this directory be structured? Will all of the video files and transcripts be within the same directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some clarification to it's file searching behavior in this section. Does that help on this point?
|
@kaizoku this looks good, I can merge once you fix the formatting |
|
Ah sorry for the formatting @alangsto. I just noticed I've fixed the formatting issue here. |
This will look for .srt files downloaded by youtube-dl along with the videos and will upload them after each video is uploaded.
I have only tested this with a devstack, checking that the transcript_uploads/ endpoint is called successfully.
I've also added tests for argument parsing to increase CI coverage and have the checks pass.