Skip to content

Conversation

@Meldiron
Copy link
Contributor

@Meldiron Meldiron commented Oct 22, 2025

Similar to listRepositoryContents(), but this is meant to get actual content of a specific file. Useful for package.json, for example, for framework detection

Summary by CodeRabbit

New Features

  • Added ability to retrieve individual file contents from repositories, complementing existing directory listing functionality.
  • Supports fetching files from specific branches or references with metadata (SHA, size).
  • Includes proper error handling for missing files.

@coderabbitai
Copy link

coderabbitai bot commented Oct 22, 2025

Walkthrough

A new getRepositoryContent() method is added to the VCS Adapter interface to retrieve individual file contents from a repository. The GitHub adapter implements this method by calling the GitHub contents API, handling base64-encoded responses, and throwing a FileNotFound exception on retrieval failures. A new exception class is created for this error case, and comprehensive tests validate the implementation across multiple scenarios including different paths, branches, and error conditions.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

The diff spans four files with mixed complexity: the abstract method addition is trivial, the GitHub implementation includes API logic with base64 decoding and error handling (moderate complexity), the new exception class is trivial, and the test suite covers multiple scenarios (moderate complexity). While the changes follow a consistent pattern and the scope is focused, the heterogeneous nature of the edits—particularly the API integration logic and comprehensive test coverage—requires careful reasoning for each component.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 57.14% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "feat?: getRepositoryContent()" directly and specifically identifies the main change in the changeset. It references the exact method name being added across the VCS adapter interface and its GitHub implementation. The title is concise, clear, and specific enough that a teammate reviewing the commit history would understand that this PR introduces a new feature for fetching the content of a specific file from a repository. This aligns with the PR objectives, which explicitly state the purpose of adding this method to fetch actual file content.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat-file-content

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
tests/VCS/Adapter/GitHubTest.php (1)

184-232: Stronger exception assertions and minor test ergonomics

  • Replace the try/catch + $threw pattern with PHPUnit’s expectException in separate tests or a data provider for the “non-existing branch” and “missing file” cases. This reads cleaner and avoids duplicated scaffolding.
  • Consider asserting array keys before usage to surface contract regressions early:
    • $this->assertArrayHasKey('sha', $response); same for 'size' and 'content'.
src/VCS/Exception/FileNotFound.php (1)

1-7: Add minimal PHPDoc and consider making the exception final

Helps static analyzers and communicates intent. Example:

 <?php
 
 namespace Utopia\VCS\Exception;
 
+/**
+ * Thrown when a requested repository file/path cannot be found or retrieved.
+ */
-class FileNotFound extends \Exception
+final class FileNotFound extends \Exception
 {
 }
src/VCS/Adapter.php (1)

235-245: Tighten return typing and document the throw contract

Declare the exact shape to aid Psalm/PhpStan users and advertise the exception surfaced by implementations.

     /**
      * Get contents of the specified file.
      *
      * @param  string  $owner Owner name
      * @param  string  $repositoryName Name of the repository
      * @param  string  $path Path to the file
      * @param  string  $ref The name of the commit/branch/tag
-     * @return array<string, mixed> File details
+     * @return array{sha:string, size:int, content:string} File details
+     * @throws \Utopia\VCS\Exception\FileNotFound
      */
     abstract public function getRepositoryContent(string $owner, string $repositoryName, string $path, string $ref = ''): array;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e665ea and 4403b72.

⛔ Files ignored due to path filters (1)
  • composer.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • src/VCS/Adapter.php (1 hunks)
  • src/VCS/Adapter/Git/GitHub.php (2 hunks)
  • src/VCS/Exception/FileNotFound.php (1 hunks)
  • tests/VCS/Adapter/GitHubTest.php (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
tests/VCS/Adapter/GitHubTest.php (3)
src/VCS/Exception/FileNotFound.php (1)
  • FileNotFound (5-7)
src/VCS/Adapter.php (1)
  • getRepositoryContent (244-244)
src/VCS/Adapter/Git/GitHub.php (1)
  • getRepositoryContent (209-238)
src/VCS/Adapter.php (1)
src/VCS/Adapter/Git/GitHub.php (1)
  • getRepositoryContent (209-238)
src/VCS/Adapter/Git/GitHub.php (2)
src/VCS/Exception/FileNotFound.php (1)
  • FileNotFound (5-7)
src/VCS/Adapter.php (2)
  • getRepositoryContent (244-244)
  • call (280-389)

Comment on lines +209 to +238
public function getRepositoryContent(string $owner, string $repositoryName, string $path, string $ref = ''): array
{
$url = "/repos/$owner/$repositoryName/contents/" . $path;
if (!empty($ref)) {
$url .= "?ref=$ref";
}

$response = $this->call(self::METHOD_GET, $url, ['Authorization' => "Bearer $this->accessToken"]);

if ($response['headers']['status-code'] !== 200) {
throw new FileNotFound();
}

$encoding = $response['body']['encoding'];

$content = '';
if ($encoding === 'base64') {
$content = base64_decode($response['body']['content']);
} else {
throw new FileNotFound();
}

$output = [
'sha' => $response['body']['sha'],
'size' => $response['body']['size'],
'content' => $content
];

return $output;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Harden path handling, key guards, and base64 decoding; add raw fallback

  • URL-encode path segments and strip leading “/” to avoid double slashes and special-char breaks.
  • Guard against missing keys (encoding/content/sha/size) to prevent notices.
  • Normalize and strictly decode base64 (GitHub inserts newlines).
  • Optional: fallback to raw response via Accept header when encoding/content is absent (e.g., large files).
-    public function getRepositoryContent(string $owner, string $repositoryName, string $path, string $ref = ''): array
+    public function getRepositoryContent(string $owner, string $repositoryName, string $path, string $ref = ''): array
     {
-        $url = "/repos/$owner/$repositoryName/contents/" . $path;
+        // Normalize and encode path segments
+        $safePath = implode('/', array_map('rawurlencode', explode('/', ltrim($path, '/'))));
+        $url = "/repos/$owner/$repositoryName/contents/{$safePath}";
         if (!empty($ref)) {
             $url .= "?ref=$ref";
         }
 
-        $response = $this->call(self::METHOD_GET, $url, ['Authorization' => "Bearer $this->accessToken"]);
+        $headers = ['Authorization' => "Bearer $this->accessToken", 'Accept' => 'application/vnd.github+json'];
+        $response = $this->call(self::METHOD_GET, $url, $headers);
 
         if ($response['headers']['status-code'] !== 200) {
             throw new FileNotFound();
         }
 
-        $encoding = $response['body']['encoding'];
+        $body = $response['body'] ?? [];
+        // Ensure we are dealing with a file-like payload
+        if (!is_array($body) || ($body['type'] ?? 'file') !== 'file') {
+            throw new FileNotFound();
+        }
+
+        $encoding = $body['encoding'] ?? '';
 
         $content = '';
-        if ($encoding === 'base64') {
-            $content = base64_decode($response['body']['content']);
-        } else {
-            throw new FileNotFound();
-        }
+        if ($encoding === 'base64' && isset($body['content'])) {
+            // GitHub inserts newlines into base64 content
+            $b64 = str_replace(["\n", "\r"], '', (string) $body['content']);
+            $decoded = base64_decode($b64, true);
+            if ($decoded === false) {
+                throw new FileNotFound();
+            }
+            $content = $decoded;
+        } else {
+            // Fallback: request raw content for large/unencoded files
+            $raw = $this->call(self::METHOD_GET, $url, ['Authorization' => "Bearer $this->accessToken", 'Accept' => 'application/vnd.github.raw']);
+            if (($raw['headers']['status-code'] ?? 0) !== 200 || !is_string($raw['body'] ?? null)) {
+                throw new FileNotFound();
+            }
+            $content = (string) $raw['body'];
+        }
 
         $output = [
-           'sha' => $response['body']['sha'],
-           'size' => $response['body']['size'],
+           'sha' => (string) ($body['sha'] ?? ''),
+           'size' => (int) ($body['size'] ?? strlen($content)),
            'content' => $content
         ];
 
         return $output;
     }
🤖 Prompt for AI Agents
In src/VCS/Adapter/Git/GitHub.php around lines 209 to 238, harden
getRepositoryContent by (1) normalizing the path: trim any leading "/" and
URL-encode each path segment before joining so special characters and
double-slashes are avoided; (2) guard all array accesses
(headers['status-code'], body['encoding'], body['content'], body['sha'],
body['size']) with isset/array_key_exists and throw FileNotFound (or a clear
error) if required fields are missing; (3) when decoding base64, first normalize
the content by removing whitespace/newlines then call base64_decode with strict
mode and check for false, throwing FileNotFound on failure; and (4) optionally
implement a fallback: if encoding/content are absent, retry the request with an
Accept: application/vnd.github.v3.raw header (or similar) and treat the raw
response body as content while still validating sha/size if present.

@Meldiron Meldiron merged commit 28457cf into main Oct 22, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants