diff --git a/README.md b/README.md
index 7809583..8e53624 100644
--- a/README.md
+++ b/README.md
@@ -6,8 +6,17 @@ A Chrome browser extension that monitors GitHub Copilot Tasks pages and speaks t
 
 - Automatically monitors `https://github.com/copilot/tasks/*` pages
 - Speaks markdown content from Copilot's responses as they appear
+- **Granular content breakdown**: Each paragraph, header, and list item becomes a separate speech item for easy navigation
+- **Intelligent text filtering** for better speech quality:
+  - **HTML structure awareness**: Adds natural pauses after block elements (paragraphs, headers, list items)
+  - **Markdown filtering**: Removes separator lines (`===...`, `---...`) to avoid repetitive speech
+  - Adds natural pauses after headers (`# Title`, `## Subtitle`)
+  - Announces list item numbers ("Item 1", "Item 2")
+  - Announces bullet points ("Bullet point")
+  - Cleans up excessive punctuation (`!!!!` → `!`)
+  - Works with both markdown text and HTML-rendered content
 - Visual highlighting of the element currently being spoken
-- Navigation controls: Previous, Pause/Play, Next
+- Navigation controls: Previous, Pause/Play, Next (navigate between individual sections)
 - Progress slider to jump to any item in the conversation
 - Test Speak button to verify speech functionality
 - **Speech verbosity control** with three levels:
@@ -40,9 +49,18 @@ The **New Only** checkbox (enabled by default) controls whether to skip pre-exis
 - When unchecked: Speaks all content found on the page, including what was already there
 
 When new text content is detected, it is queued for speaking. After the first user interaction (click or keypress), items are spoken automatically using the Web Speech API with:
+- **Granular section breakdown**: Content is split into individual paragraphs, headers, and list items for better navigation
 - A 2-second delay between items for better pacing
 - Visual highlighting (yellow background) on the element currently being spoken
 - Configurable speech rate and pitch settings saved across sessions
+- **Intelligent text filtering** to improve speech quality:
+  - **HTML structure awareness**: Detects block elements (paragraphs, headers, list items) and creates separate speech items for each
+  - **Markdown filtering**: Headers (`# Title`, `## Subtitle`, etc.) are converted to natural sentences with pauses
+  - Separator lines (`===...`, `---...`, etc.) are removed to avoid repetitive speech
+  - Numbered lists (`1.`, `2.`, etc.) are announced as "Item 1", "Item 2" with pauses
+  - Bullet lists (`*`, `-`, `+`) are announced as "Bullet point" with pauses
+  - Excessive punctuation (`!!!!`, `????`) is cleaned up for cleaner speech
+  - Works seamlessly with both markdown text and HTML-rendered content from Copilot
 
 ## Installation
 
@@ -109,9 +127,18 @@ The extension consists of:
 - **Manifest** (`manifest.json`): Extension configuration with proper permissions and content script injection
 
 ### Key Features
+- **Granular Navigation**: Content is split into individual sections (paragraphs, headers, list items) for precise navigation
 - **Speech Queue**: Items are queued and spoken sequentially with 2-second delays
 - **Visual Feedback**: Yellow highlighting indicates which element is currently being spoken
 - **User Interaction Requirement**: Complies with browser autoplay policies by requiring initial user interaction
+- **Intelligent Text Filtering**: Automatically processes both markdown and HTML to improve speech quality
+  - Section breakdown: Each paragraph, header, and list item becomes a separate navigable speech item
+  - HTML structure awareness: Adds natural pauses after block-level elements
+  - Removes separator lines (e.g., `============`)
+  - Adds natural pauses after headers
+  - Announces list item numbers and bullet points
+  - Cleans up excessive punctuation
+  - Works with both plain markdown text and HTML-rendered content
 - **Persistent Settings**: Speech rate and pitch preferences are saved using chrome.storage.sync
 - **Smart Content Filtering**: Only speaks Copilot responses and status messages, excludes tool execution logs
 
diff --git a/content.js b/content.js
index e19f7c0..c622c39 100644
--- a/content.js
+++ b/content.js
@@ -231,11 +231,189 @@ function queueSpeech(text) {
   }
 }
 
+// Helper function to format bullet point content for speech
+function formatBulletContent(content) {
+  if (content.trim().length === 0) {
+    return `Bullet point.`;
+  }
+  return `Bullet point. ${content}.`;
+}
+
+// Filter text for better speech synthesis
+// Handles markdown complications: headers, separators, lists, etc.
+function filterTextForSpeech(text) {
+  if (!text || text.trim().length === 0) {
+    return text;
+  }
+
+  let filtered = text;
+
+  // 1. Handle separator lines (===..., ---..., etc.)
+  // Remove lines with 4+ consecutive repeated characters
+  filtered = filtered.replace(/^[=_\*-]{4,}$/gm, '');
+
+  // 2. Handle headers with # symbols
+  // Add pauses after headers by converting them to sentences with periods
+  filtered = filtered.replace(/^(#{1,6})[ \t]+(.+)$/gm, (match, hashes, title) => {
+    // Return the title with a period to create a natural pause
+    return title + '.';
+  });
+
+  // 3. Handle numbered lists (1., 2., 3., etc.)
+  // Announce the item number and add pauses between items
+  filtered = filtered.replace(/^(\d+)\.[ \t]+([^\n]*)$/gm, (match, number, content) => {
+    // Handle empty list items gracefully
+    if (content.trim().length === 0) {
+      return `Item ${number}.`;
+    }
+    return `Item ${number}. ${content}.`;
+  });
+
+  // 4. Handle bullet lists (*, -, +)
+  // Announce "bullet" and add pauses between items
+  // Handle dash bullets first (to process before star/plus for clarity)
+  filtered = filtered.replace(/^-[ \t]+([^\n]*)$/gm, (match, content) => formatBulletContent(content));
+  // Handle star and plus bullets
+  filtered = filtered.replace(/^[\*+][ \t]+([^\n]*)$/gm, (match, content) => formatBulletContent(content));
+
+  // 5. Clean up excessive repeated punctuation (e.g., "!!!!" -> "!", but not periods)
+  filtered = filtered.replace(/([!?]){4,}/g, '$1');
+
+  // 6. Remove any multiple consecutive line breaks that may have been created
+  filtered = filtered.replace(/\n{3,}/g, '\n\n');
+
+  // 7. Clean up any leading/trailing whitespace
+  filtered = filtered.trim();
+
+  return filtered;
+}
+
+// Helper function to extract text sections from HTML with structure awareness
+// Returns an array of text sections from block-level elements for more granular speech control
+function extractTextSectionsFromHTML(element) {
+  // Block-level elements that should be treated as separate speech sections
+  const sectionElements = new Set([
+    'P', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6',
+    'LI', 'BLOCKQUOTE', 'PRE'
+  ]);
+
+  // Container elements that we traverse but don't create sections for
+  const containerElements = new Set([
+    'DIV', 'UL', 'OL', 'TABLE', 'TR', 'TD', 'TH',
+    'SECTION', 'ARTICLE', 'HEADER', 'FOOTER', 'NAV', 'ASIDE'
+  ]);
+
+  const sections = [];
+
+  // Extract text from a single section element
+  function extractSectionText(node) {
+    let text = '';
+
+    function walkNodes(n) {
+      if (n.nodeType === Node.TEXT_NODE) {
+        const content = n.textContent.trim();
+        if (content) {
+          text += content + ' ';
+        }
+      } else if (n.nodeType === Node.ELEMENT_NODE) {
+        // For inline elements, just continue walking
+        for (let child of n.childNodes) {
+          walkNodes(child);
+        }
+      }
+    }
+
+    walkNodes(node);
+    return text.trim();
+  }
+
+  // Walk through nodes and identify sections
+  function findSections(node) {
+    if (node.nodeType === Node.ELEMENT_NODE) {
+      const tagName = node.tagName;
+
+      // If this is a section element, extract its text as a separate item
+      if (sectionElements.has(tagName)) {
+        const text = extractSectionText(node);
+        console.log(`${TAG}: [extractTextSectionsFromHTML] Found ${tagName} element, text length: ${text.length}, text: "${text.substring(0, 50)}${text.length > 50 ? '...' : ''}"`);
+        if (text) {
+          sections.push({ text, element: node });
+        } else {
+          console.log(`${TAG}: [extractTextSectionsFromHTML] ${tagName} has NO TEXT, skipping`);
+        }
+      } else if (containerElements.has(tagName)) {
+        // For containers, process children to find sections
+        for (let child of node.childNodes) {
+          findSections(child);
+        }
+      } else {
+        // For other elements, process children
+        for (let child of node.childNodes) {
+          findSections(child);
+        }
+      }
+    }
+  }
+
+  findSections(element);
+
+  console.log(`${TAG}: [extractTextSectionsFromHTML] Total sections found: ${sections.length}`);
+  return sections;
+}
+
+// Helper function to extract text from HTML with structure awareness (legacy)
+// Adds pauses after block-level elements for more natural speech
+function extractTextFromHTML(element) {
+  // Block-level elements that should have pauses after them
+  const blockElements = new Set([
+    'P', 'DIV', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6',
+    'LI', 'UL', 'OL', 'BLOCKQUOTE', 'PRE',
+    'TABLE', 'TR', 'TD', 'TH', 'SECTION', 'ARTICLE',
+    'HEADER', 'FOOTER', 'NAV', 'ASIDE'
+  ]);
+
+  let text = '';
+
+  // Walk through all child nodes
+  function walkNodes(node) {
+    if (node.nodeType === Node.TEXT_NODE) {
+      // Add text content
+      const content = node.textContent.trim();
+      if (content) {
+        text += content + ' ';
+      }
+    } else if (node.nodeType === Node.ELEMENT_NODE) {
+      const tagName = node.tagName;
+
+      // Process children first
+      for (let child of node.childNodes) {
+        walkNodes(child);
+      }
+
+      // Add pause after block elements
+      if (blockElements.has(tagName)) {
+        // Add period for natural pause if text doesn't already end with punctuation
+        if (text.length > 0 && !/[.!?]\s*$/.test(text)) {
+          text = text.trim() + '. ';
+        }
+      }
+    }
+  }
+
+  walkNodes(element);
+
+  return text.trim();
+}
+
 // Extract text from a markdown paragraph element
 function extractTextFromElement(element) {
-  // Get text content and clean it up
-  const text = element.textContent.trim();
-  return text;
+  // Use HTML-aware extraction to preserve structure and add natural pauses
+  const text = extractTextFromHTML(element);
+
+  // Apply speech filter to handle markdown complications
+  const filteredText = filterTextForSpeech(text);
+
+  return filteredText;
 }
 
 // Helper function to check if an element has a parent with a specific class
@@ -294,17 +472,41 @@ function addSpokenItem(text, element) {
   return false;
 }
 
-// Process a markdown container and extract all inner text
+// Process a markdown container and extract text sections
 function processMarkdownContainer(container, sessionContainer) {
   // Check if this container should be spoken based on verbosity
   if (!shouldSpeakElement(container, sessionContainer)) {
+    console.log(`${TAG}: Skipping container due to verbosity filter`);
     return;
   }
 
-  // Extract all text content from the markdown container (not just <p> blocks)
-  const text = extractTextFromElement(container);
-  if (text) {
-    addSpokenItem(text, container);
+  console.log(`${TAG}: Processing markdown container for sections...`);
+
+  // Try to extract text as separate sections for better granularity
+  const sections = extractTextSectionsFromHTML(container);
+
+  console.log(`${TAG}: Found ${sections.length} sections in container`);
+
+  if (sections.length > 0) {
+    // Process each section separately
+    sections.forEach((section, index) => {
+      console.log(`${TAG}: Section ${index + 1} [${section.element.tagName}]: "${section.text.substring(0, 80)}..."`);
+      const filteredText = filterTextForSpeech(section.text);
+      if (filteredText) {
+        console.log(`${TAG}: Filtered text: "${filteredText.substring(0, 80)}..."`);
+        const added = addSpokenItem(filteredText, section.element);
+        console.log(`${TAG}: Section ${index + 1} ${added ? 'ADDED' : 'SKIPPED (duplicate or filtered)'}`);
+      } else {
+        console.log(`${TAG}: Section ${index + 1} SKIPPED (empty after filtering)`);
+      }
+    });
+  } else {
+    console.log(`${TAG}: No sections found, using fallback extraction`);
+    // Fallback to extracting all text as one item (for elements with no block structure)
+    const text = extractTextFromElement(container);
+    if (text) {
+      addSpokenItem(text, container);
+    }
   }
 }
 
@@ -416,46 +618,74 @@ function processSessionContainer(sessionContainer) {
   console.log(`${TAG}: Set up content observer for session container`);
 }
 
-// Observe a markdown container for new paragraphs
+// Observe a markdown container for new content sections
 function observeMarkdownContainer(container, sessionContainer) {
+  // Section elements we want to detect when dynamically added
+  const sectionElements = new Set([
+    'P', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6',
+    'LI', 'BLOCKQUOTE', 'PRE'
+  ]);
+
   const observer = new MutationObserver((mutations) => {
     mutations.forEach((mutation) => {
       mutation.addedNodes.forEach((node) => {
         if (node.nodeType === Node.ELEMENT_NODE) {
-          if (node.tagName === 'P') {
-            //console.log(`${TAG}: Found new <p> element`);
-            // Check if this paragraph should be spoken based on verbosity
+          // Check if the added node itself is a section element
+          if (sectionElements.has(node.tagName)) {
             if (shouldSpeakElement(node, sessionContainer)) {
-              const text = extractTextFromElement(node);
-              if (addSpokenItem(text, node)) {
-                //console.log(`${TAG}: New paragraph detected`);
+              const text = extractSectionText(node);
+              const filteredText = filterTextForSpeech(text);
+              if (filteredText) {
+                addSpokenItem(filteredText, node);
               }
             }
           }
-          // Check for nested paragraphs
-          const nestedPs = node.querySelectorAll('p');
-          if (nestedPs.length > 0) {
-            //console.log(`${TAG}: Found ${nestedPs.length} nested <p> element(s)`);
-          }
-          nestedPs.forEach(p => {
-            if (shouldSpeakElement(p, sessionContainer)) {
-              const text = extractTextFromElement(p);
-              if (addSpokenItem(text, p)) {
-                //console.log(`${TAG}: New nested paragraph detected`);
+
+          // Also check for nested section elements
+          sectionElements.forEach(tagName => {
+            const nestedElements = node.querySelectorAll(tagName.toLowerCase());
+            nestedElements.forEach(elem => {
+              if (shouldSpeakElement(elem, sessionContainer)) {
+                const text = extractSectionText(elem);
+                const filteredText = filterTextForSpeech(text);
+                if (filteredText) {
+                  addSpokenItem(filteredText, elem);
+                }
               }
-            }
+            });
           });
         }
       });
     });
   });
+
+  // Helper to extract text from a single section (reused from extractTextSectionsFromHTML)
+  function extractSectionText(node) {
+    let text = '';
+
+    function walkNodes(n) {
+      if (n.nodeType === Node.TEXT_NODE) {
+        const content = n.textContent.trim();
+        if (content) {
+          text += content + ' ';
+        }
+      } else if (n.nodeType === Node.ELEMENT_NODE) {
+        for (let child of n.childNodes) {
+          walkNodes(child);
+        }
+      }
+    }
+
+    walkNodes(node);
+    return text.trim();
+  }
 
   observer.observe(container, {
     childList: true,
     subtree: true
   });
 
-  console.log(`${TAG}: Observing markdown container for new paragraphs`);
+  console.log(`${TAG}: Observing markdown container for new content sections`);
 }
 
 // Find and monitor the main TaskChat container
diff --git a/seeds/copilot_runonsentence.html b/seeds/copilot_runonsentence.html
new file mode 100644
index 0000000..ba72433
--- /dev/null
+++ b/seeds/copilot_runonsentence.html
@@ -0,0 +1,80 @@
+
Perfect! Now I understand the codebase structure. Let me analyze the problem:
+The issue is that markdown formatting complications cause problems in text-to-speech:
+The fix needs to be in the extractTextFromElement function in content.js, where we need to preprocess markdown text before speaking it.
+
Let me create a plan and report progress:
+Perfect! Let me create a final summary of all the work completed:
+I successfully addressed the new comment from @paulpv (PR comment #3697897503) which identified that the
+ issue wasn't just with markdown, but also with HTML-rendered content from GitHub Copilot.
+The original implementation used element.textContent which flattens the
+ HTML structure, causing all text from nested elements (paragraphs, headers, list items) to run together as
+ one long sentence without natural pauses.
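The run-on effect described above can be reproduced outside the browser. The following is a minimal sketch, assuming plain objects as stand-ins for DOM nodes; `textNode`, `elementNode`, `flatText`, and `textWithPauses` are hypothetical names for illustration, not code from content.js.

```javascript
// Hypothetical mock nodes so the sketch runs in Node.js without a DOM.
const TEXT_NODE = 3;
const ELEMENT_NODE = 1;

const textNode = (value) => ({ nodeType: TEXT_NODE, textContent: value, childNodes: [] });
const elementNode = (tagName, ...childNodes) => ({ nodeType: ELEMENT_NODE, tagName, childNodes });

// Rough equivalent of element.textContent: concatenate all descendant text.
function flatText(node) {
  if (node.nodeType === TEXT_NODE) return node.textContent;
  return node.childNodes.map(flatText).join('');
}

// Assumed subset of block-level tags that should end with a pause.
const BLOCK_TAGS = new Set(['P', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'LI']);

// Structure-aware walk: after each block element, append a period unless
// the collected text already ends in sentence punctuation.
function textWithPauses(root) {
  let text = '';
  function walk(node) {
    if (node.nodeType === TEXT_NODE) {
      const content = node.textContent.trim();
      if (content) text += content + ' ';
      return;
    }
    node.childNodes.forEach(walk);
    if (BLOCK_TAGS.has(node.tagName) && text.length > 0 && !/[.!?]\s*$/.test(text)) {
      text = text.trim() + '. ';
    }
  }
  walk(root);
  return text.trim();
}

const sample = elementNode('DIV',
  elementNode('H2', textNode('Summary')),
  elementNode('P', textNode('All tests pass')),
  elementNode('UL',
    elementNode('LI', textNode('Added a filter')),
    elementNode('LI', textNode('Updated the docs'))));

console.log(flatText(sample));       // SummaryAll tests passAdded a filterUpdated the docs
console.log(textWithPauses(sample)); // Summary. All tests pass. Added a filter. Updated the docs.
```

The flat version gives a speech engine nothing to pause on, while the structure-aware walk yields one sentence per block element.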
+1. Added extractTextFromHTML() function (commit fa7feb2):
+   - <p>, <h1>-<h6>, <li>, <div>, <section>, etc.
+2. Updated extractTextFromElement() function:
+   - extractTextFromHTML() instead of direct textContent
+   - filterTextForSpeech() markdown filter on top
+3. Updated documentation (commit 7022beb):
+The extension now handles both:
+Example from the PR comment:
+The solution maintains backward compatibility while adding intelligent HTML structure awareness for more
+ natural speech output.
+
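The markdown rules applied by `filterTextForSpeech` can also be tried in isolation. The following is a condensed sketch of those rules rather than the function from the diff verbatim; the names `sentence` and `filterForSpeech` and the sample input are assumptions made for illustration.

```javascript
// Condensed sketch of the markdown-to-speech rules (hypothetical names).
function sentence(prefix, content) {
  // Empty list items are announced by their prefix alone.
  return content.trim().length === 0 ? prefix : `${prefix} ${content}.`;
}

function filterForSpeech(text) {
  return text
    .replace(/^[=_*-]{4,}$/gm, '')            // drop separator lines
    .replace(/^#{1,6}[ \t]+(.+)$/gm, '$1.')   // header becomes a sentence
    .replace(/^(\d+)\.[ \t]+([^\n]*)$/gm, (match, number, content) => sentence(`Item ${number}.`, content))
    .replace(/^[-*+][ \t]+([^\n]*)$/gm, (match, content) => sentence('Bullet point.', content))
    .replace(/([!?]){4,}/g, '$1')             // "!!!!" -> "!"
    .replace(/\n{3,}/g, '\n\n')               // collapse runs of blank lines
    .trim();
}

// Made-up sample input covering each rule once.
const markdown = '## Results\n=========\n1. Run the build\n- Check the logs\nDone!!!!';
console.log(filterForSpeech(markdown));
```

Because the period is appended unconditionally, a line that already ends with one gains a second (`# Done.` becomes `Done..`); checking for trailing punctuation first, as `extractTextFromHTML` does, would avoid that.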