Owner

@MartinEBravo MartinEBravo commented Nov 29, 2025

Summary by CodeRabbit

  • New Features

    • Batch EPUB processing from books/ directory
    • Web server with library view displaying book covers
    • Copy chapter text to clipboard functionality in reader
  • Documentation

    • Updated README with server setup and batch processing instructions
  • Improvements

    • Enhanced cover image detection and display
    • Improved library grid layout with responsive design and hover effects
    • Better metadata tracking with version and timestamp information

coderabbitai bot commented Nov 29, 2025

Walkthrough

This PR transforms a single-file EPUB reader into a batch-processing library system. It introduces cover image detection and serving, updates data models with metadata fields (cover, processed_at, version), redesigns the library UI with cover display cards, and adds clipboard functionality to the reader interface.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Core EPUB Processing & Metadata: reader3.py | Extended Book, BookMetadata, and TOCEntry dataclasses with cover image filename, processed_at timestamp, and version fields. Implemented cover detection heuristics, image extraction to dedicated directory with filename sanitization, and image path mapping. Refactored CLI entrypoint from single-file processing to batch processing all EPUBs in books/ directory, producing per-EPUB output directories. |
| Server & Library Management: server.py | Updated BOOKS_DIR from current directory to "books". Added cover metadata field to library view responses. Implemented new /cover/{book_id}/{image_name} endpoint for serving cover images with path sanitization and error handling. Adjusted directory scanning and response formatting. |
| Frontend UI & UX: templates/library.html, templates/reader.html | Redesigned library view with refined grid layout (minmax(200px, 1fr)), increased max-width to 1000px, added conditional cover image display with letter-based placeholder fallback, and hover effects on book cards. Added copy-chapter functionality to reader with button, clipboard integration, and transient success feedback. |
| Documentation: README.md | Updated usage examples to reflect new batch-processing workflow with books/ directory structure. Added server invocation guidance and localhost:8123 access instructions. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Browser as Client<br/>(Browser)
    participant Server as FastAPI<br/>Server
    participant FileSystem as File System<br/>(books/ dir)
    
    User->>Browser: Navigate to library
    Browser->>Server: GET /library
    Server->>FileSystem: Scan books/ for *_data folders
    FileSystem-->>Server: Return EPUB metadata objects
    Server->>Server: Extract cover from metadata
    Server-->>Browser: Return library HTML + book list with cover fields
    Browser->>Browser: Render book cards with cover images
    
    User->>Browser: View book cover image
    Browser->>Server: GET /cover/{book_id}/{image_name}
    Server->>FileSystem: Read cover image file
    FileSystem-->>Server: Image bytes
    Server-->>Browser: Return image with proper MIME type
    Browser->>Browser: Display cover on card
    
    User->>Browser: Click "Read Book"
    Browser->>Server: GET /read/{book_id}
    Server-->>Browser: Return reader HTML + chapter content
    Browser->>Browser: Render reader with chapter text
    
    User->>Browser: Click "Copy Chapter"
    Browser->>Browser: Copy chapterText to clipboard
    Browser->>Browser: Show success feedback

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Areas requiring extra attention:

  • reader3.py: Cover detection heuristics and image path mapping logic; batch processing refactoring of CLI entrypoint
  • server.py: New /cover/ endpoint path sanitization and error handling; BOOKS_DIR path traversal safety
  • templates/library.html: Conditional cover rendering and responsive grid layout behavior; placeholder fallback logic
  • Integration points: Ensure cover filename consistency between reader3.py metadata extraction and server.py cover endpoint path resolution

Poem

🐰 A burrow of books, now neatly displayed,
With covers that shine in a grid well-made,
We batch-process tales from the books/ cascade,
Then copy their wisdom—no effort, no trade!
A hop, skip, and scroll through our digital glade! 📚✨

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the three main features added: book cover support, a copy chapter button, and automatic books directory processing. |
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server.py (1)

72-99: Bug: redirect_to_first_chapter doesn’t pass request

The /read/{book_id} route currently calls read_chapter without the required request argument, which will raise a TypeError at runtime if that endpoint is hit. Either wire request through or remove the helper. Minimal fix:

-@app.get("/read/{book_id}", response_class=HTMLResponse)
-async def redirect_to_first_chapter(book_id: str):
-    """Helper to just go to chapter 0."""
-    return await read_chapter(book_id=book_id, chapter_index=0)
+@app.get("/read/{book_id}", response_class=HTMLResponse)
+async def redirect_to_first_chapter(request: Request, book_id: str):
+    """Helper to just go to chapter 0."""
+    return await read_chapter(request=request, book_id=book_id, chapter_index=0)
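
The failure mode can be reproduced without FastAPI. The sketch below uses plain coroutines as stand-ins: `read_chapter` keeps the name from the diff, while `broken_redirect`, `fixed_redirect`, `call_raises_type_error`, and the function bodies are hypothetical and exist only to show why the missing argument fails at call time.

```python
import asyncio


async def read_chapter(request, book_id: str, chapter_index: int) -> str:
    # Stand-in for the real handler, which needs `request` to render a template.
    return f"{book_id}:{chapter_index}"


async def broken_redirect(book_id: str) -> str:
    # Mirrors the original helper: `request` is never forwarded.
    return await read_chapter(book_id=book_id, chapter_index=0)


async def fixed_redirect(request, book_id: str) -> str:
    # Mirrors the suggested fix: accept `request` and pass it through.
    return await read_chapter(request=request, book_id=book_id, chapter_index=0)


def call_raises_type_error() -> bool:
    # Python rejects the call as soon as a required argument is missing.
    try:
        asyncio.run(broken_redirect("dracula_data"))
    except TypeError:
        return True
    return False
```

Because the error is raised when `read_chapter` is called, the route would fail on every request rather than only for particular books.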
🧹 Nitpick comments (7)
templates/reader.html (1)

120-167: Copy handler is correct; consider a tiny robustness tweak

Logic for chapterText serialization and copyChapter() is clean and safe, with proper error handling and user feedback. If you ever want to harden this a bit more, an optional tweak is to early‑return when !navigator.clipboard and show a more explicit “clipboard not supported” message, instead of a generic failure.

reader3.py (3)

79-99: HTML cleaning and plain-text extraction are appropriate

Filtering out script/style/iframe/video/nav/form/button/input plus comments is a good balance for safety vs. preserving reading content, and the get_text + whitespace collapse is ideal for LLM/search text. If you later see noisy text from <head> elements, you could switch extract_plain_text to operate on body or soup, but it’s not required now.


102-139: TOC parsing and fallback are reasonable; consider a small guard

The TOC parsing and fallback TOC generation are straightforward and consistent with how reader.html resolves chapters via filenames. One thing to consider: if any section.href or item.href can be None, the .split("#") calls would raise an AttributeError (an empty string would not raise, but would produce an empty file part); a defensive if not section.href: continue (and same for item.href) would make this more robust against odd EPUBs.

Also applies to: 141-158
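
A minimal sketch of that guard, assuming TOC nodes expose `title` and `href` attributes. `FakeTocItem` and `split_hrefs` are hypothetical names used for illustration and are not taken from reader3.py.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class FakeTocItem:
    # Hypothetical stand-in for an EPUB TOC node; only title and href matter here.
    title: str
    href: Optional[str]


def split_hrefs(items: Iterable[FakeTocItem]) -> List[Tuple[str, str, str]]:
    """Return (title, file_href, anchor) triples, skipping entries without an href."""
    result = []
    for item in items:
        if not item.href:
            # None has no string methods, and an empty href points nowhere.
            continue
        file_href, _, anchor = item.href.partition("#")
        result.append((item.title, file_href, anchor))
    return result
```

`str.partition` is used here instead of `split("#")` so that an href without a fragment still unpacks into three parts, with an empty anchor.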


190-271: Image extraction and cover heuristics look good; a few minor notes

The two‑pass image handling (cover selection, then file extraction + mapping) is well thought out:

  • Heuristics (OPF cover id, exact cover.*, “cover” substring, ISBN pattern) match common EPUB layouts.
  • The image_map keyed by both full internal path and basename nicely supports messy HTML src values.
  • Sanitizing filenames before writing and back‑patching cover_filename to safe_fname keeps disk and metadata in sync.

If you ever want to squeeze a bit more perf/clarity, you could hoist the isbn_pattern compile to module scope and potentially merge the two get_items() loops, but behavior is correct as is.
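
The heuristics and the module-scope regex suggestion can be sketched as follows. This is an illustration under stated assumptions, not the code in reader3.py: the function names, the ISBN pattern (a bare 10- or 13-digit filename stem), the allowed filename characters, and the `images/` prefix on mapped values are all hypothetical.

```python
import os
import re
from typing import Dict, List, Optional

# Compiled once at import time instead of on every call.
# Assumption: a cover named after an ISBN has a bare 10- or 13-digit stem.
ISBN_PATTERN = re.compile(r"^(?:\d{13}|\d{10})$")


def pick_cover(image_paths: List[str], opf_cover_path: Optional[str] = None) -> Optional[str]:
    """Return the most likely cover path, trying the strongest signal first."""
    if opf_cover_path in image_paths:
        return opf_cover_path
    stems = [(p, os.path.splitext(os.path.basename(p))[0].lower()) for p in image_paths]
    checks = (
        lambda stem: stem == "cover",                       # exact cover.*
        lambda stem: "cover" in stem,                       # "cover" substring
        lambda stem: ISBN_PATTERN.match(stem) is not None,  # ISBN-like name
    )
    for check in checks:
        for path, stem in stems:
            if check(stem):
                return path
    return None


def sanitize_filename(name: str) -> str:
    # Keep letters, digits, dot, dash and underscore; replace anything else.
    return re.sub(r"[^A-Za-z0-9._-]", "_", os.path.basename(name))


def build_image_map(image_paths: List[str]) -> Dict[str, str]:
    """Map both the full internal path and the basename to the sanitized path."""
    image_map = {}
    for path in image_paths:
        served = "images/" + sanitize_filename(path)
        image_map[path] = served
        image_map[os.path.basename(path)] = served
    return image_map
```

Keying the map by both forms means an `<img src>` that uses either the full internal path or just the filename resolves to the same sanitized file.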

README.md (1)

11-21: Usage flow matches the new batch mode; clarify output location (optional)

The README now correctly reflects the uv run reader3.py batch mode and the server flow. If you want to avoid minor confusion, you might explicitly say that dracula_data is created inside the books/ directory (i.e. books/dracula_data), since both the CLI and server expect that layout.
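
The output location follows from the expression the CLI loop uses to derive `out_dir`. The helper name `output_dir_for` below is hypothetical; the expression inside it is the one shown in the batch loop.

```python
import os


def output_dir_for(epub_path: str) -> str:
    # Same expression as the CLI loop: strip ".epub", then append "_data".
    return os.path.splitext(epub_path)[0] + "_data"


print(output_dir_for("books/dracula.epub"))  # books/dracula_data
```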

templates/library.html (1)

9-21: Library card layout and cover handling look good; consider lazy-loading covers

The grid/card refactor, conditional cover vs. placeholder, and wiring to book.id/book.cover are all consistent with the server data and give a nice UI upgrade. If you end up with many books, adding loading="lazy" to the <img> tag would be a simple perf win:

<img src="/cover/{{ book.id }}/{{ book.cover }}" alt="{{ book.title }}" class="book-cover" loading="lazy">

Also applies to: 31-46

server.py (1)

135-139: Server entrypoint is fine for local use

The __main__ block with a fixed host/port and a simple startup print is appropriate for this toy server. If you later want to expose this via uv run server.py only, you could drop the __main__ guard entirely and let uvicorn be configured externally, but that’s strictly optional.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 64960f9 and 9970481.

📒 Files selected for processing (5)
  • README.md (1 hunks)
  • reader3.py (12 hunks)
  • server.py (5 hunks)
  • templates/library.html (2 hunks)
  • templates/reader.html (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
server.py (1)
reader3.py (1)
  • Book (62-73)
🔇 Additional comments (6)
templates/reader.html (1)

39-59: Copy button styling looks solid

The fixed-position button styling and hover/click states are consistent with the rest of the UI and have sensible z-index and hit area. No issues from a layout or UX perspective.

reader3.py (3)

19-73: Data model extensions align with usage

The additions to ChapterContent, TOCEntry, BookMetadata.cover, and Book (toc, images, processed_at, version) are coherent and line up with how server.py and the templates consume them. Using file_href/anchor on TOCEntry and href on ChapterContent gives a clean bridge to the JS spineMap.


161-184: Robust metadata extraction with sensible defaults

get_list/get_one wrappers plus defaults for title (“Untitled”) and language (“en”) are good choices. Initializing cover=None here and deferring to the image pipeline keeps metadata concerns cleanly separated.
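
The wrapper-plus-default pattern might look roughly like this. The sketch assumes metadata has already been flattened into a plain dict of string lists (`RawMetadata` is a hypothetical type), so the signatures here will differ from the actual helpers in reader3.py.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for parsed EPUB metadata: field name -> list of values.
RawMetadata = Dict[str, List[str]]


def get_list(raw: RawMetadata, key: str) -> List[str]:
    """Return every non-empty value for `key`, or an empty list."""
    return [v.strip() for v in raw.get(key, []) if v and v.strip()]


def get_one(raw: RawMetadata, key: str, default: Optional[str] = None) -> Optional[str]:
    """Return the first value for `key`, falling back to `default`."""
    values = get_list(raw, key)
    return values[0] if values else default


def extract_metadata(raw: RawMetadata) -> dict:
    return {
        "title": get_one(raw, "title", "Untitled"),
        "language": get_one(raw, "language", "en"),
        "authors": get_list(raw, "creator"),
        "cover": None,  # filled in later by the image pipeline
    }
```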


293-331: Chapter HTML processing is consistent with the reader

Decoding with errors="ignore", fixing <img> src via image_map, stripping non‑body chrome, and using extract_plain_text(soup) for the text field all line up with the intended usage in the reader and for copy‑to‑clipboard. Using href=item.get_name() is key to matching the TOC file_href in the JS spineMap.

server.py (2)

16-36: BOOKS_DIR + cached loading are consistent with reader3.py output

Pointing BOOKS_DIR at "books" and using load_book_cached with a 10‑entry LRU cache fits the new batch layout from reader3.py (books/<name>_data/book.pkl). The library view’s per‑book dict matches the template fields (id, title, author, chapters, cover) and avoids disk hits on subsequent requests thanks to caching.

Also applies to: 43-60
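
For reference, a 10-entry LRU cache over pickle loading can be sketched with `functools.lru_cache`. The function name and file layout follow the review text; the extra `books_dir` parameter and the `None` return for a missing file are assumptions made to keep the example self-contained.

```python
import os
import pickle
from functools import lru_cache

BOOKS_DIR = "books"


@lru_cache(maxsize=10)
def load_book_cached(folder_name: str, books_dir: str = BOOKS_DIR):
    """Load <books_dir>/<folder_name>/book.pkl, caching the 10 most recent results."""
    path = os.path.join(books_dir, folder_name, "book.pkl")
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return pickle.load(f)
```

One consequence of this shape: a `None` result for a missing book is cached too, so a book processed after the first lookup would stay invisible until `load_book_cached.cache_clear()` is called or the server restarts.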


102-119: Image and cover serving endpoints are safe and match the file layout

Both serve_image and serve_cover correctly sanitize book_id and image_name using os.path.basename and only serve from BOOKS_DIR/<book_id>/images/, returning 404 when missing. This lines up with how reader3.py writes images and how the library/reader templates build image URLs.

Also applies to: 121-133
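
The sanitization described above can be sketched as a path resolver that a route handler would call before returning a file or a 404. `resolve_image_path` is a hypothetical helper, not the code in server.py. The explicit rejection of `.` and `..` is an extra hardening step beyond what the review describes, added because `os.path.basename("..")` returns `".."` unchanged.

```python
import os
from typing import Optional

BOOKS_DIR = "books"


def resolve_image_path(book_id: str, image_name: str, books_dir: str = BOOKS_DIR) -> Optional[str]:
    """Return the on-disk path for an image, or None if the request should 404."""
    safe_book_id = os.path.basename(book_id)
    safe_image_name = os.path.basename(image_name)
    # basename() strips directory parts but leaves "." and ".." intact.
    if safe_book_id in ("", ".", "..") or safe_image_name in ("", ".", ".."):
        return None
    path = os.path.join(books_dir, safe_book_id, "images", safe_image_name)
    return path if os.path.isfile(path) else None
```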

Comment on lines +341 to 342
processed_at=datetime.now().isoformat(),
)
⚠️ Potential issue | 🟡 Minor

Batch CLI works; add a check for missing books/ directory

The batch loop over .epub files in books/ and the summary printout are straightforward and match the README. One minor robustness gap: os.listdir("books/") will raise if the directory doesn’t exist. A small guard would make the script fail more gracefully:

 if __name__ == "__main__":
-    # all books file in /books/
-    for epub_file in [
-        "books/" + f for f in os.listdir("books/") if f.endswith(".epub")
-    ]:
+    # all books files in /books/
+    books_dir = "books"
+    if not os.path.isdir(books_dir):
+        print("No 'books' directory found. Create it and drop .epub files into it first.")
+        raise SystemExit(1)
+
+    epub_files = [
+        os.path.join(books_dir, f) for f in os.listdir(books_dir) if f.endswith(".epub")
+    ]
+    if not epub_files:
+        print("No .epub files found in 'books/'. Nothing to do.")
+        raise SystemExit(0)
+
+    for epub_file in epub_files:
         assert os.path.exists(epub_file), "File not found."
         out_dir = os.path.splitext(epub_file)[0] + "_data"

Also applies to: 348-350, 356-371

🤖 Prompt for AI Agents
In reader3.py around lines 341-342 (also affecting 348-350 and 356-371), the
code calls os.listdir("books/") which will raise if the books/ directory is
missing; add a guard that checks for the existence of the directory
(os.path.isdir or try/except OSError) before listing, and handle the missing-dir
case by either logging a clear error and exiting gracefully or creating the
directory, then ensure the rest of the batch loop and summary printing treat an
empty file list properly so the script doesn't crash when books/ is absent.
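
One way to package the guard is a small helper that the `__main__` block can call, which also makes the behavior testable. `find_epubs` is a hypothetical name; sorting the results and matching the extension case-insensitively are additions that the suggested diff does not make.

```python
import os
from typing import List


def find_epubs(books_dir: str = "books") -> List[str]:
    """Return sorted .epub paths in books_dir; raise a clear error if it is missing."""
    if not os.path.isdir(books_dir):
        raise FileNotFoundError(
            f"No '{books_dir}' directory found. Create it and add .epub files first."
        )
    return sorted(
        os.path.join(books_dir, name)
        for name in os.listdir(books_dir)
        if name.lower().endswith(".epub")
    )
```

The caller can then catch `FileNotFoundError` to print the message and exit with status 1, and treat an empty list as "nothing to do".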

@MartinEBravo MartinEBravo merged commit 338d752 into master Nov 29, 2025
1 check passed