-
Notifications
You must be signed in to change notification settings - Fork 1
Add support for book covers, copy chapter button, and automatic books directory #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…d automatic books directory
WalkthroughThis PR transforms a single-file EPUB reader into a batch-processing library system. It introduces cover image detection and serving, updates data models with metadata fields (cover, processed_at, version), redesigns the library UI with cover display cards, and adds clipboard functionality to the reader interface. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Browser as Client<br/>(Browser)
participant Server as FastAPI<br/>Server
participant FileSystem as File System<br/>(books/ dir)
User->>Browser: Navigate to library
Browser->>Server: GET /library
Server->>FileSystem: Scan books/ for *_data folders
FileSystem-->>Server: Return EPUB metadata objects
Server->>Server: Extract cover from metadata
Server-->>Browser: Return library HTML + book list with cover fields
Browser->>Browser: Render book cards with cover images
User->>Browser: View book cover image
Browser->>Server: GET /cover/{book_id}/{image_name}
Server->>FileSystem: Read cover image file
FileSystem-->>Server: Image bytes
Server-->>Browser: Return image with proper MIME type
Browser->>Browser: Display cover on card
User->>Browser: Click "Read Book"
Browser->>Server: GET /read/{book_id}
Server-->>Browser: Return reader HTML + chapter content
Browser->>Browser: Render reader with chapter text
User->>Browser: Click "Copy Chapter"
Browser->>Browser: Copy chapterText to clipboard
Browser->>Browser: Show success feedback
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Areas requiring extra attention:
Poem
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
server.py (1)
72-99: Bug: redirect_to_first_chapter doesn’t passrequestThe
/read/{book_id}route currently callsread_chapterwithout the requiredrequestargument, which will raise aTypeErrorat runtime if that endpoint is hit. Either wirerequestthrough or remove the helper. Minimal fix:-@app.get("/read/{book_id}", response_class=HTMLResponse) -async def redirect_to_first_chapter(book_id: str): - """Helper to just go to chapter 0.""" - return await read_chapter(book_id=book_id, chapter_index=0) +@app.get("/read/{book_id}", response_class=HTMLResponse) +async def redirect_to_first_chapter(request: Request, book_id: str): + """Helper to just go to chapter 0.""" + return await read_chapter(request=request, book_id=book_id, chapter_index=0)
🧹 Nitpick comments (7)
templates/reader.html (1)
120-167: Copy handler is correct; consider a tiny robustness tweakLogic for
chapterTextserialization andcopyChapter()is clean and safe, with proper error handling and user feedback. If you ever want to harden this a bit more, an optional tweak is to early‑return when!navigator.clipboardand show a more explicit “clipboard not supported” message, instead of a generic failure.reader3.py (3)
79-99: HTML cleaning and plain-text extraction are appropriateFiltering out
script/style/iframe/video/nav/form/button/inputplus comments is a good balance for safety vs. preserving reading content, and theget_text+ whitespace collapse is ideal for LLM/search text. If you later see noisy text from<head>elements, you could switchextract_plain_textto operate onbody or soup, but it’s not required now.
102-139: TOC parsing and fallback are reasonable; consider a small guardThe TOC parsing and fallback TOC generation are straightforward and consistent with how
reader.htmlresolves chapters via filenames. One thing to consider: if anysection.hreforitem.hrefcan beNoneor empty, the.split("#")calls would raise; a defensiveif not section.href: continue(and same foritem.href) would make this more robust against odd EPUBs.Also applies to: 141-158
190-271: Image extraction and cover heuristics look good; a few minor notesThe two‑pass image handling (cover selection, then file extraction + mapping) is well thought out:
- Heuristics (OPF cover id, exact
cover.*, “cover” substring, ISBN pattern) match common EPUB layouts.- The
image_mapkeyed by both full internal path and basename nicely supports messy HTMLsrcvalues.- Sanitizing filenames before writing and back‑patching
cover_filenametosafe_fnamekeeps disk and metadata in sync.If you ever want to squeeze a bit more perf/clarity, you could hoist the
isbn_patterncompile to module scope and potentially merge the twoget_items()loops, but behavior is correct as is.README.md (1)
11-21: Usage flow matches the new batch mode; clarify output location (optional)The README now correctly reflects the
uv run reader3.pybatch mode and the server flow. If you want to avoid minor confusion, you might explicitly say thatdracula_datais created inside thebooks/directory (i.e.books/dracula_data), since both the CLI and server expect that layout.templates/library.html (1)
9-21: Library card layout and cover handling look good; consider lazy-loading coversThe grid/card refactor, conditional cover vs. placeholder, and wiring to
book.id/book.coverare all consistent with the server data and give a nice UI upgrade. If you end up with many books, addingloading="lazy"to the<img>tag would be a simple perf win:<img src="/cover/{{ book.id }}/{{ book.cover }}" alt="{{ book.title }}" class="book-cover" loading="lazy">Also applies to: 31-46
server.py (1)
135-139: Server entrypoint is fine for local useThe
__main__block with a fixed host/port and a simple startup print is appropriate for this toy server. If you later want to expose this viauv run server.pyonly, you could drop the__main__guard entirely and let uvicorn be configured externally, but that’s strictly optional.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
README.md(1 hunks)reader3.py(12 hunks)server.py(5 hunks)templates/library.html(2 hunks)templates/reader.html(3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server.py (1)
reader3.py (1)
Book(62-73)
🔇 Additional comments (6)
templates/reader.html (1)
39-59: Copy button styling looks solidThe fixed-position button styling and hover/click states are consistent with the rest of the UI and have sensible z-index and hit area. No issues from a layout or UX perspective.
reader3.py (3)
19-73: Data model extensions align with usageThe additions to
ChapterContent,TOCEntry,BookMetadata.cover, andBook(toc,images,processed_at,version) are coherent and line up with howserver.pyand the templates consume them. Usingfile_href/anchoronTOCEntryandhrefonChapterContentgives a clean bridge to the JSspineMap.
161-184: Robust metadata extraction with sensible defaults
get_list/get_onewrappers plus defaults for title (“Untitled”) and language (“en”) are good choices. Initializingcover=Nonehere and deferring to the image pipeline keeps metadata concerns cleanly separated.
293-331: Chapter HTML processing is consistent with the readerDecoding with
errors="ignore", fixing<img>src viaimage_map, stripping non‑body chrome, and usingextract_plain_text(soup)for thetextfield all line up with the intended usage in the reader and for copy‑to‑clipboard. Usinghref=item.get_name()is key to matching the TOCfile_hrefin the JSspineMap.server.py (2)
16-36: BOOKS_DIR + cached loading are consistent with reader3.py outputPointing
BOOKS_DIRat"books"and usingload_book_cachedwith a 10‑entry LRU cache fits the new batch layout fromreader3.py(books/<name>_data/book.pkl). The library view’s per‑book dict matches the template fields (id,title,author,chapters,cover) and avoids disk hits on subsequent requests thanks to caching.Also applies to: 43-60
102-119: Image and cover serving endpoints are safe and match the file layoutBoth
serve_imageandserve_covercorrectly sanitizebook_idandimage_nameusingos.path.basenameand only serve fromBOOKS_DIR/<book_id>/images/, returning 404 when missing. This lines up with howreader3.pywrites images and how the library/reader templates build image URLs.Also applies to: 121-133
| processed_at=datetime.now().isoformat(), | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Batch CLI works; add a check for missing books/ directory
The batch loop over .epub files in books/ and the summary printout are straightforward and match the README. One minor robustness gap: os.listdir("books/") will raise if the directory doesn’t exist. A small guard would make the script fail more gracefully:
if __name__ == "__main__":
- # all books file in /books/
- for epub_file in [
- "books/" + f for f in os.listdir("books/") if f.endswith(".epub")
- ]:
+ # all books files in /books/
+ books_dir = "books"
+ if not os.path.isdir(books_dir):
+ print("No 'books' directory found. Create it and drop .epub files into it first.")
+ raise SystemExit(1)
+
+ epub_files = [
+ os.path.join(books_dir, f) for f in os.listdir(books_dir) if f.endswith(".epub")
+ ]
+ if not epub_files:
+ print("No .epub files found in 'books/'. Nothing to do.")
+ raise SystemExit(0)
+
+ for epub_file in epub_files:
assert os.path.exists(epub_file), "File not found."
out_dir = os.path.splitext(epub_file)[0] + "_data"Also applies to: 348-350, 356-371
🤖 Prompt for AI Agents
In reader3.py around lines 341-342 (also affecting 348-350 and 356-371), the
code calls os.listdir("books/") which will raise if the books/ directory is
missing; add a guard that checks for the existence of the directory
(os.path.isdir or try/except OSError) before listing, and handle the missing-dir
case by either logging a clear error and exiting gracefully or creating the
directory, then ensure the rest of the batch loop and summary printing treat an
empty file list properly so the script doesn't crash when books/ is absent.
Summary by CodeRabbit
New Features
Documentation
Improvements
✏️ Tip: You can customize this high-level summary in your review settings.