Proposal
Overview
Recently, I have been working on improving our user experience from a non-technical perspective, e.g. installation and documentation. I have looked into the documentation-build ecosystem in the Python community and found that there is no perfect solution so far. Therefore, this post discusses how we should build a documentation system for our project that is modern, beautiful, robust, and easy to test and maintain.
Current tools
Two documentation-building tools are widely used in the open-source community: Sphinx and Docusaurus. Sphinx is a Python-oriented tool that generates HTML pages from docstrings, while Docusaurus is a general-purpose documentation framework for any language. The two are compared below.
| Criterion | Sphinx | Docusaurus |
| --- | --- | --- |
| Extracts documentation from docstrings | Yes | No |
| Multi-language support | Complicated | Easy |
| Versioning support | Requires Read the Docs (RTD) | Easy |
| Branch version support | Requires RTD | Requires extra effort |
| Documentation format | rst; Markdown is supported | Markdown; MDX is supported |
| Theme | A bit old-fashioned | More modern and elegant |
Requirements and Challenges
We want the community to be able to contribute to the documentation as well, so the whole process should be as simple as possible. Writing rst is rather difficult, and even though Sphinx accepts Markdown files, customizing its theme is still troublesome. For simplicity, we prefer Docusaurus as the foundation of our documentation system.
However, the key problem is that we sometimes need to generate API documentation from our docstrings, which Docusaurus does not support natively. Therefore, we have to implement the docstring -> MDX conversion on our own.
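To make the docstring -> MDX idea concrete, here is a minimal sketch that uses only Python's standard `inspect` module. The function name `docstring_to_mdx` and the sample function `scale` are hypothetical and exist only for illustration; this is not the converter the proposal will ship. A real converter would also need to escape characters such as `{` and `<`, which MDX treats as JSX syntax.

```python
import inspect


def docstring_to_mdx(obj) -> str:
    """Render a function or class as a small Markdown/MDX fragment (illustrative only)."""
    name = getattr(obj, "__qualname__", repr(obj))
    try:
        signature = str(inspect.signature(obj))
    except (TypeError, ValueError):
        # Some builtins expose no introspectable signature.
        signature = "(...)"
    # inspect.getdoc() dedents the docstring and returns None if it is missing.
    doc = inspect.getdoc(obj) or "*No docstring provided.*"
    fence = "`" * 3  # built at runtime so this example nests cleanly in Markdown
    return f"### {name}\n\n{fence}python\n{name}{signature}\n{fence}\n\n{doc}\n"


def scale(x: float, factor: float = 2.0) -> float:
    """Multiply ``x`` by ``factor``.

    Args:
        x: The input value.
        factor: The multiplier.
    """
    return x * factor


if __name__ == "__main__":
    # Print the generated fragment for the hypothetical sample function.
    print(docstring_to_mdx(scale))
```

The output is a heading, a fenced signature, and the dedented docstring, which Docusaurus can render like any other Markdown page.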
Architecture

Workflow
Colossal-AI repository
All documentation Markdown files are stored in the /docs directory. This directory contains two folders for internationalization.
- docs
  - en
  - zh
  - README.md
  - version.js
  - sidebar.js
sidebar.js defines the table of contents shown in the Docusaurus sidebar, and version.js tells Docusaurus which versions to include on the website.
When a release PR is created, the Colossal-AI repository will trigger the Build Doc CI in the Colossal-AI Documentation repository. This part will be discussed below.
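The proposal does not say how the cross-repository trigger is implemented. One common option, assumed here purely for illustration, is GitHub's `repository_dispatch` REST endpoint. The sketch below only builds the request and does not send it. The owner, repository name, event type, and payload are all placeholders, and the receiving workflow would need to listen for `repository_dispatch` events.

```python
import json
import urllib.request


def build_dispatch_request(
    owner: str, repo: str, token: str, event_type: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request for the docs repository."""
    url = f"https://api.github.com/repos/{owner}/{repo}/dispatches"
    body = json.dumps({"event_type": event_type, "client_payload": payload}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # All values below are placeholders, not the project's real configuration.
    req = build_dispatch_request(
        owner="example-org",
        repo="example-docs-site",
        token="<token-from-ci-secrets>",
        event_type="build-doc",
        payload={"version": "vX.Y.Z"},
    )
    print(req.get_method(), req.full_url)
    # Sending is left out on purpose; in CI it would be urllib.request.urlopen(req).
```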
Colossal-AI Documentation repository
We used to keep all the Markdown files in this repository. However, this made them difficult to maintain, as not all Python developers are familiar with web development. Therefore, we decouple the documentation from the website and keep only the web-related code in this repository.
Since Docusaurus does not extract documentation from docstrings, we cannot use something like automodule in Sphinx. However, this can be implemented on our own. In fact, Hugging Face has provided some code for autodoc. The problem is that it uses Svelte, which is not compatible with Docusaurus, or is at least troublesome to make compatible. Therefore, we have to adapt this autodoc function to emit Tailwind-based HTML elements.
Afterwards, we can embed the generated HTML in the documentation Markdown file with this plugin.
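As a rough illustration of what Tailwind-based output from an autodoc step could look like, the sketch below renders a function's name, summary line, and parameters as plain HTML with Tailwind utility classes. It is not the Hugging Face autodoc code or its adaptation. `render_signature_html` and `scale` are hypothetical names, and the chosen classes are only one possible styling.

```python
import html
import inspect


def render_signature_html(obj) -> str:
    """Render an object's name, summary line, and parameters as Tailwind-styled HTML."""
    name = html.escape(obj.__qualname__)
    # First line of the dedented docstring, or an empty string if there is none.
    summary = html.escape((inspect.getdoc(obj) or "").split("\n")[0])
    # str(Parameter) yields e.g. "factor: float = 2.0"; escape it before embedding.
    items = "".join(
        f'<li class="font-mono text-sm">{html.escape(str(param))}</li>'
        for param in inspect.signature(obj).parameters.values()
    )
    return (
        '<div class="rounded border p-4 my-4">'
        f'<h3 class="font-mono font-bold">{name}</h3>'
        f'<p class="text-gray-600">{summary}</p>'
        f'<ul class="list-disc pl-6">{items}</ul>'
        "</div>"
    )


def scale(x: float, factor: float = 2.0) -> float:
    """Multiply x by factor."""
    return x * factor


if __name__ == "__main__":
    print(render_signature_html(scale))
```

Escaping every piece of docstring-derived text with `html.escape` keeps stray `<` or `&` characters from breaking the generated markup.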
Make documentation testable
We need to test our documentation regularly so that we are notified when a code change breaks an example. One way to do this is to write our documentation as Jupyter notebooks. We can then convert each notebook to a Python file for testing and to a Markdown file for web rendering.
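The two conversions can be sketched with the standard library alone, since a notebook file is JSON with a list of cells. This is a simplified illustration, not the planned pipeline: the helper names and the inline sample notebook are hypothetical, and in practice `jupyter nbconvert` (with `--to script` and `--to markdown`) is the maintained route. Code cells containing IPython magics would also need special handling.

```python
import json


def _source(cell: dict) -> str:
    """Return a cell's source as one string (the format allows a list of lines or a string)."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def notebook_to_script(nb: dict) -> str:
    """Concatenate all code cells into a Python script that a test job can run."""
    chunks = [_source(c) for c in nb["cells"] if c["cell_type"] == "code"]
    return "\n\n".join(chunks) + "\n"


def notebook_to_markdown(nb: dict) -> str:
    """Keep markdown cells as-is and wrap code cells in fenced Python blocks."""
    fence = "`" * 3  # built at runtime so this example nests cleanly in Markdown
    parts = []
    for cell in nb["cells"]:
        text = _source(cell)
        if cell["cell_type"] == "code":
            parts.append(f"{fence}python\n{text}\n{fence}")
        elif cell["cell_type"] == "markdown":
            parts.append(text)
    return "\n\n".join(parts) + "\n"


# A tiny hypothetical notebook, inlined so the example is self-contained.
NOTEBOOK_JSON = """
{
  "nbformat": 4,
  "nbformat_minor": 5,
  "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": ["# Quick start\\n", "Add two numbers."]},
    {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": null,
     "source": ["total = 1 + 2\\n", "assert total == 3"]}
  ]
}
"""

if __name__ == "__main__":
    nb = json.loads(NOTEBOOK_JSON)
    script = notebook_to_script(nb)
    # "Testing" the documentation here simply means the extracted script runs without error.
    exec(compile(script, "<notebook>", "exec"), {})
    print(notebook_to_markdown(nb))
```

With this split, a broken example surfaces as a failing script in CI, while the website is built from the Markdown output of the same notebook.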
Plan
Self-service