Skip to content

Conversation

@catturtle123
Copy link
Contributor

@catturtle123 catturtle123 commented Aug 22, 2025

📌 Overview

indexing

🔍 Related Issues

✨ Changes

-indexing
🐛 Fix: Autowired 해결
✨ feature: indexing

📸 Screenshots / Test Results (Optional)

Attach images or videos if necessary.

✅ Checklist

  • I have followed the code style guidelines.
  • I have removed unnecessary comments or console logs.
  • I have written/updated related tests.
  • I have verified that all features work correctly in my local environment.

🗒️ Additional Notes

Add any other context or information here.

Summary by CodeRabbit

  • New Features

    • Index creation now requires at least one selected file, ingests file content, chunks text, and generates embeddings via OpenAI to produce a fully populated index.
    • Added HTTP content loader and text chunking utilities.
  • Refactor

    • S3 URL handling standardized to a fixed public base and utilities reorganized.
  • Chores

    • Added OpenAI embedding and vector-store configuration/dependencies (requires OPENAI_KEY).
  • Tests

    • Improved test DI and container setup for more reliable test runs.

@catturtle123 catturtle123 self-assigned this Aug 22, 2025
@catturtle123 catturtle123 added the feature New feature or request label Aug 22, 2025
@catturtle123 catturtle123 linked an issue Aug 22, 2025 that may be closed by this pull request
1 task
@coderabbitai
Copy link

coderabbitai bot commented Aug 22, 2025

Walkthrough

Adds Spring AI/OpenAI and pgvector starter dependencies, introduces embedding interfaces and OpenAI embedder, implements content loading and text chunking, changes ChunkEmbedding to store FloatArray with vector(1536), implements full createIndex flow producing embeddings, reorganizes utility packages, adjusts S3 URL behavior, and updates tests and local config.

Changes

Cohort / File(s) Summary
Build and dependencies
build.gradle
Enables org.springframework.ai:spring-ai-starter-model-openai, adds org.springframework.ai:spring-ai-starter-vector-store-pgvector, removes legacy com.pgvector:pgvector:0.1.6; minor quoting change.
Index embedding feature
src/main/kotlin/.../domain/index/embed/Embedder.kt, src/main/kotlin/.../domain/index/embed/OpenAIEmbbeder.kt, src/main/kotlin/.../domain/index/entity/ChunkEmbedding.kt, src/main/kotlin/.../domain/index/service/IndexService.kt, src/main/kotlin/.../domain/index/dto/IndexRequestDTO.kt, src/main/kotlin/.../domain/index/dto/IndexResponseDTO.kt, src/main/kotlin/.../global/util/TextChunker.kt, src/main/kotlin/.../global/util/loader/ContentLoader.kt, src/main/kotlin/.../global/util/loader/HttpContentLoader.kt
Adds Embedder interface and OpenAIEmbedder component (dim=1536); changes ChunkEmbedding.embedding to FloatArray and column to vector(1536); implements IndexService.createIndex flow: validation, load DataFiles, content loading, chunking (TextChunker), embedding per chunk, attach embeddings to index; adds dataFileId (non-empty) to create DTO and small DTO formatting tweak.
Utility package reorg
src/main/kotlin/.../global/util/converter/FileConvertUtil.kt, src/main/kotlin/.../global/util/converter/MultipartJackson2HttpMessageConverter.kt, src/main/kotlin/.../global/storage/FakeS3Util.kt, src/main/kotlin/.../domain/document/service/DataFileService.kt, src/main/kotlin/.../global/util/s3/S3Type.kt, src/main/kotlin/.../global/util/s3/S3Util.kt
Moves utility files into converter and s3 subpackages and updates imports; most files unchanged aside from package paths.
S3 behavior change
src/main/kotlin/.../global/util/s3/S3UtilImpl.kt
Replaces dynamic S3 URL construction with a hard-coded base URL concatenated with the key in urlFromKey.
Configuration
src/main/resources/application-local.yml
Adds spring.ai.openai.api-key (from ${OPENAI_KEY}) and spring.ai.model.embedding.model: text-embedding-3-small with empty options mapping.
Tests
src/test/kotlin/.../domain/document/service/DataFileServiceTest.kt, src/test/kotlin/.../domain/index/service/IndexServiceTest.kt
Switches tests to constructor-based Spring injection with @Autowired on parameters; replaces Testcontainers @Container usage with @ServiceConnection; updates imports for moved utilities.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant IS as IndexService
  participant DFR as DataFileRepository
  participant CLR as ContentLoader
  participant TC as TextChunker
  participant EMB as Embedder
  participant IR as IndexRepository
  participant DB as Database

  Client->>IS: createIndex(req)
  IS->>IS: validateOverlap(req)
  IS->>DFR: findAllById(req.dataFileId)
  DFR-->>IS: DataFiles
  IS->>IS: require(embedder.dim == req.embeddingModel.dim)
  IS->>IR: save(Index.from(req))
  IR-->>IS: persisted Index
  loop each DataFile
    IS->>CLR: load(file.url)
    CLR-->>IS: content
    alt content not blank
      IS->>TC: chunkByCharsSeq(content, size, overlap)
      TC-->>IS: chunks
      loop each chunk
        IS->>EMB: embed(chunk)
        EMB-->>IS: FloatArray(1536)
        IS->>DB: persist ChunkEmbedding(embedding, content, index)
      end
    end
  end
  IS-->>Client: IndexPreviewResponse
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Assessment against linked issues

Objective Addressed Explanation
Match embedding size during chunking/embedding and store embeddings [#10]

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Hard-coded S3 URL in urlFromKey (src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt, urlFromKey) Changing S3 URL construction to a fixed base URL modifies storage URL resolution unrelated to embedding size validation or storing embeddings.

Possibly related PRs

Poem

I nibbled the text into tidy little bites,
Overlap counted in soft, warm lights,
Vectors tucked in burrows deep,
Indices hum while rabbits sleep.
OpenAI winds and chunk-by-chunk delights. 🐇✨

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feature/#10-indexing

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (11)
src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (1)

18-38: Broaden content-type detection or document security expectations.

Relying on file extension is OK as a fallback, but it’s spoofable. If this mapping is used for security-sensitive branching (e.g., parsing), consider content sniffing (Apache Tika) or magic-byte checks. As a minimal improvement, cover common types encountered in RAG pipelines.

Apply this small extension of mappings:

@@
     return when (ext) {
         "png" -> "image/png"
         "jpg", "jpeg" -> "image/jpeg"
+        "webp" -> "image/webp"
+        "svg" -> "image/svg+xml"
+        "heic" -> "image/heic"
         "pdf" -> "application/pdf"
         "txt" -> "text/plain"
         "csv" -> "text/csv"
         "md" -> "text/markdown"
         "json" -> "application/json"
+        "rtf" -> "application/rtf"
         "zip" -> "application/zip"
+        "7z" -> "application/x-7z-compressed"
         "doc" -> "application/msword"
         "docx" -> "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
         "xls" -> "application/vnd.ms-excel"
         "xlsx" -> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
         "ppt" -> "application/vnd.ms-powerpoint"
         "pptx" -> "application/vnd.openxmlformats-officedocument.presentationml.presentation"
         else -> "application/octet-stream"
     }

If you want, I can wire in Tika with a graceful fallback to this map.

src/main/kotlin/simplerag/ragback/global/util/converter/MultipartJackson2HttpMessageConverter.kt (1)

9-25: Disable canRead as well; current converter may intercept application/octet-stream reads unexpectedly.

You’ve disabled writes, but reads remain enabled by default for application/octet-stream via AbstractJackson2HttpMessageConverter. This can hijack request body resolution and regress multipart/byte[] handling.

Apply:

@@
 @Component
 class MultipartJackson2HttpMessageConverter
     (objectMapper: ObjectMapper) :
     AbstractJackson2HttpMessageConverter(objectMapper, MediaType.APPLICATION_OCTET_STREAM) {
 
+    override fun canRead(clazz: Class<*>, mediaType: MediaType?): Boolean = false
+    override fun canRead(type: Type?, contextClass: Class<*>?, mediaType: MediaType?): Boolean = false
+    override fun canRead(mediaType: MediaType?): Boolean = false
+
     override fun canWrite(clazz: Class<*>, mediaType: MediaType?): Boolean {
         return false
     }
 
     override fun canWrite(type: Type?, clazz: Class<*>, mediaType: MediaType?): Boolean {
         return false
     }
 
     override fun canWrite(mediaType: MediaType?): Boolean {
         return false
     }
 }

Optionally also annotate with @order(Ordered.LOWEST_PRECEDENCE) for belt-and-braces.

src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt (2)

25-27: Logger initialized with the wrong class

This logs under GlobalExceptionHandler instead of S3UtilImpl, making logs harder to filter.

Apply:

-    private val log = LoggerFactory.getLogger(GlobalExceptionHandler::class.java)
+    private val log = LoggerFactory.getLogger(S3UtilImpl::class.java)

Also remove the now-unused GlobalExceptionHandler import.


100-107: keyFromUrl should decode percent-encoded paths

URLs returned by clients/CDN may be encoded; deleting with an encoded key can fail. Decode safely.

Apply:

-    override fun keyFromUrl(url: String): String? {
-        val path = try {
-            URI(url).path // e.g. "/market/menu/uuid_name.jpg"
-        } catch (_: Exception) {
-            return null
-        }
-        return path.removePrefix("/").ifBlank { null }
-    }
+    override fun keyFromUrl(url: String): String? {
+        val rawPath = try {
+            URI(url).rawPath // keep encoding, decode ourselves
+        } catch (_: Exception) {
+            return null
+        }
+        if (rawPath.isNullOrBlank()) return null
+        return java.net.URLDecoder.decode(rawPath.removePrefix("/"), java.nio.charset.StandardCharsets.UTF_8)
+            .ifBlank { null }
+    }
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)

7-9: Add an index on the FK column to speed lookups by Index.

Read-paths will commonly fetch chunk embeddings per index. Add a DB index for indexes_id.

Apply:

-@Table(name = "chunk_embeddings")
+@Table(
+    name = "chunk_embeddings",
+    indexes = [jakarta.persistence.Index(name = "idx_chunk_embeddings_indexes_id", columnList = "indexes_id")]
+)
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (3)

55-69: Consider asserting that chunk embeddings are created when content exists.

Currently, createIndexTest only checks IDs/names. Adding an assertion on chunkEmbeddings count (or a repository count) would verify the end-to-end indexing pipeline.

I can add a fixture DataFile with a short text and assert that embeddings were persisted for it.


59-69: Embedding dimension mismatch in tests will cause failures

The IndexService.createIndex method explicitly checks that the embedder’s dim (1536) matches the requested model’s dim and throws INVALID_INPUT on mismatch. Since OpenAIEmbedder.dim = 1536 but the tests use EmbeddingModel.TEXT_EMBEDDING_3_LARGE (3072), any test invoking createIndex with the large model will now fail.

• File: src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
Line ranges: 59–69, 76–83, 96–100, 126–133, 156–163, 178–185, 216–220, 241–245, 268–270, 291–295

Please update all occurrences of EmbeddingModel.TEXT_EMBEDDING_3_LARGE in these tests to EmbeddingModel.TEXT_EMBEDDING_3_SMALL (1536) to align with the current embedder configuration. If support for the 3072-dim model is intended, instead update OpenAIEmbedder.dim (and related wiring) to 3072.

Suggested diff for one occurrence (apply similarly across all):

-    IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+    IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_SMALL, true)

72-84: Align overlap validation and tests with the “allow equal overlap” rule

The service’s validateOverlap currently throws on overlapSize >= chunkingSize, disallowing equal values, but the team has agreed that overlapSize may be equal to chunkingSize (i.e. only overlap > chunking should error). Please update both the implementation and the tests accordingly:

• In IndexService.kt (around line 97), change the guard to throw only when overlapSize > chunkingSize:

-    private fun validateOverlap(overlapSize: Int, chunkingSize: Int) {
-        if (overlapSize >= chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
-    }
+    private fun validateOverlap(overlapSize: Int, chunkingSize: Int) {
+        if (overlapSize >  chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
+    }

• In IndexServiceTest.kt (create‐index overlap test at lines 72–84), trigger the error only when overlap > chunking. For example:

-    val indexCreateRequest =
-        IndexCreateRequest("test", 1, 1, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+    val indexCreateRequest =
+        IndexCreateRequest("test", 1, 2, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)

(The error message "overlap 크기는 chunking 크기를 넘을 수 없습니다." remains appropriate, since it forbids only exceeding.)

• Apply the same update to the index‐update overlap test (lines 233–256): use an overlapSize one greater than chunkingSize to verify the exception.

src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (2)

215-219: Use existing sha256Hex() util to compute the 12-char key prefix.

Reduces duplication and potential inconsistencies.

-        val hash12 = MessageDigest.getInstance("SHA-256")
-            .digest(bytes).joinToString("") { "%02x".format(it) }
-            .take(12)
+        val hash12 = sha256Hex(bytes).take(12)

282-284: Prefer assertFalse over equals(false) for boolean assertions.

Improves readability and assertion messages.

-        assertEquals(dataFiles.hasNext, false)
+        assertFalse(dataFiles.hasNext)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (1)

97-99: Adjust overlap validation to allow equality

The project’s learning indicates that overlapSize should be permitted to equal chunkingSize (i.e. enforce overlapSize ≤ chunkingSize) and that this check will live in the service layer as part of the chunking‐embedding save flow. The current implementation throws on >= and tests have been written to expect equality to fail, which conflicts with that decision.

Please apply the following mandatory updates:

  • In IndexService.validateOverlap, change the guard from >= to > so that equality is allowed.
  • Update any affected unit tests that currently assert an exception when overlapSize == chunkingSize to expect success instead.
  • Optionally, clarify the exception message to reflect “overlap 크기는 chunking 크기보다 작거나 같아야 합니다.” or similar.
  • Add a note in the upcoming chunking‐embedding save PR that this validation was introduced here.
--- a/src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt
+++ b/src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt
@@ -97,7 +97,7 @@ class IndexService {
     private fun validateOverlap(overlapSize: Int, chunkingSize: Int) {
-        if (overlapSize >= chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
+        if (overlapSize >  chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
     }
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 39da5d8 and 4d7b409.

📒 Files selected for processing (20)
  • build.gradle (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/util/converter/MultipartJackson2HttpMessageConverter.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/util/loader/HttpContentLoader.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/util/s3/S3Type.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt (2 hunks)
  • src/main/resources/application-local.yml (1 hunks)
  • src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1 hunks)
  • src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt:16-22
Timestamp: 2025-08-17T13:12:02.200Z
Learning: catturtle123이 Index 엔티티의 overlapSize ≤ chunkingSize 검증을 엔티티 레벨이 아닌 서비스 레이어에서 구현하기로 결정했으며, chunking embedding 저장 관련 PR에서 이 검증 로직을 추가할 예정이므로 해당 PR에서 언급해야 함 (Simple-RAG/BACK 프로젝트)
📚 Learning: 2025-08-17T07:14:26.887Z
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt:6-22
Timestamp: 2025-08-17T07:14:26.887Z
Learning: ChunkEmbedding 엔티티에서 embedding 필드는 벡터 임베딩 저장용이고, content 필드는 원본 파일/텍스트 청크 저장용으로 설계됨 (catturtle123 프로젝트)

Applied to files:

  • src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
🧬 Code graph analysis (11)
src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt (2)
src/main/kotlin/simplerag/ragback/global/util/S3UtilImpl.kt (3)
  • s3 (18-110)
  • { "file" } (89-89)
  • { null } (108-108)
src/main/kotlin/simplerag/ragback/global/util/S3Util.kt (1)
  • upload (5-11)
src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt (1)
src/main/kotlin/simplerag/ragback/global/util/S3UtilImpl.kt (1)
  • s3 (18-110)
src/main/kotlin/simplerag/ragback/global/util/s3/S3Type.kt (3)
src/main/kotlin/simplerag/ragback/global/util/S3Type.kt (1)
  • label (4-8)
src/main/kotlin/simplerag/ragback/global/util/S3UtilImpl.kt (3)
  • s3 (18-110)
  • { null } (108-108)
  • { "file" } (89-89)
src/main/kotlin/simplerag/ragback/global/util/S3Util.kt (1)
  • upload (5-11)
src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1)
  • dim (3-49)
src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (4)
  • { "Embedding must not be empty; expected fixed dimension (e.g., 1536)" } (10-10)
  • { "Embedding (FloatArray) must not be null" } (9-9)
  • { (26-28)
  • { (11-13)
src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (1)
src/main/kotlin/simplerag/ragback/global/util/FileConvertUtil.kt (1)
  • sha256 (8-11)
src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt (1)
src/main/kotlin/simplerag/ragback/global/util/S3Util.kt (2)
  • upload (5-11)
  • deleteByUrl (8-8)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/converter/IndexConverter.kt (1)
  • toIndexPreviewResponse (29-34)
src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)
  • name (10-57)
src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1)
  • dim (3-49)
src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)
  • name (10-57)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)
  • name (10-57)
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileRequestDTO.kt (2)
  • min (7-11)
  • message (13-20)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/controller/IndexController.kt (1)
  • indexService (11-59)
src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (2)
  • name (10-57)
  • update (48-55)
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/repository/ChunkEmbeddingRepository.kt (1)
  • interface ChunkEmbeddingRepository : JpaRepository<ChunkEmbedding, Long> (6-6)
src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (4)
  • { "Embedding (FloatArray) must not be null" } (9-9)
  • { "Embedding must not be empty; expected fixed dimension (e.g., 1536)" } (10-10)
  • { (26-28)
  • autoApply (6-34)
🔇 Additional comments (14)
src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (2)

40-57: Streaming digest implementation is solid.

Buffered + DigestInputStream with an 8KB buffer, proper use blocks, and byte counting look correct and memory-safe for large files.


1-1: Confirm package relocation and import updates

All previous imports of the moved functions have been updated to the new simplerag.ragback.global.util.converter package, and no residual references to the old package remain. Specifically:

  • src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt now imports
    • computeMetricsStreaming
    • resolveContentType
  • src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt now imports sha256Hex
  • src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt now imports sha256Hex

The FileMetrics type is declared and consumed entirely within FileConvertUtil.kt—there are no external references requiring its import elsewhere.

Package move is fully verified; approving changes.

src/main/kotlin/simplerag/ragback/global/util/s3/S3Type.kt (1)

1-8: All dependent imports updated and S3Type usage verified

  • Verified that every external reference to S3Type imports from the new package simplerag.ragback.global.util.s3.S3Type (e.g. in FakeS3Util.kt, DataFileService.kt, DataFileServiceTest.kt).
  • S3Util.kt and S3UtilImpl.kt reside in the same package, so they correctly reference S3Type without an import.
  • No lingering imports from the old package path were found.
  • Enum contents remain unchanged; the leading/trailing slashes in ORIGINAL_FILE("/ORIGINAL/") are handled correctly by the existing buildKey trimming logic.
src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt (1)

1-11: Import verification complete – no drift detected
All callers (services, fakes, and tests) now correctly reference the moved interface at simplerag.ragback.global.util.s3.S3Util:

  • src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt
  • src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt
  • src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (uses FakeS3Util)

No further action required.

src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt (1)

1-6: Overlap validation is implemented in IndexService

The validateOverlap guard is present and invoked in both index creation and update operations:

  • In src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt:
    • Line 29–30: validateOverlap(req.overlapSize, req.chunkingSize) on createIndex
    • Line 83–84: validateOverlap(indexUpdateRequest.overlapSize, indexUpdateRequest.chunkingSize) on update
    • Lines 97–99:
    private fun validateOverlap(overlapSize: Int, chunkingSize: Int) {
        if (overlapSize >= chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
    }
  • Additionally, TextChunker.chunkByCharsSeq enforces require(overlap in 0 until size) to guarantee overlapSize < chunkingSize before chunking.

No further changes are needed here.

src/main/resources/application-local.yml (1)

11-17: Mis-scoped Spring AI properties – adjust embedding model nesting

The model setting must live under spring.ai.openai.embedding.options, not at spring.ai.model.embedding, so that Spring AI’s auto-configuration picks it up (docs.spring.io). For example, update application-local.yml as follows:

 spring:
   ai:
     openai:
       api-key: ${OPENAI_KEY}
-    model:
-      embedding:
-        options:
-        model: text-embedding-3-small
+      embedding:
+        options:
+          model: text-embedding-3-small

• (Optional) You can still explicitly enable the OpenAI embedding auto-config via

spring:
  ai:
    model:
      embedding: openai

but it defaults to openai if unset (docs.spring.io).

Runtime check:
• Verify at startup logs that the active embedder reports text-embedding-3-small.
• Ensure your vector store expects 1 536-dimensional embeddings, matching text-embedding-3-small’s default output (airbyte.com, docs.spring.io).

build.gradle (1)

29-31: Please manually verify Spring AI BOM compatibility

We weren’t able to resolve the effective spring-ai and Spring Boot starter versions due to Gradle daemon connection issues in this environment. To ensure you won’t run into dependency convergence problems at runtime:

• Confirm which Spring Boot plugin version you’re using (e.g. 3.5.x) in your build.gradle.
• Run a dependency-insight report locally to see the resolved spring-ai BOM version:

./gradlew dependencyInsight \
  --dependency org.springframework.experimental:spring-ai \
  --configuration runtimeClasspath

• Verify that the BOM version (currently 1.0.1) is supported by your Spring Boot version. If it isn’t, bump or align springAiVersion to the matching spring-ai BOM release for Spring Boot 3.5.x.

src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2)

14-18: Dimension consistency: fixed column vs. per-record embeddingDim.

You persist embeddingDim alongside a fixed vector(1536) column. If you plan to support models with different dimensions (the issue mentions “adjust embedding size during chunking embedding”), a fixed column dimension will block this. If you intend to standardize on 1536 only, consider removing embeddingDim to avoid drift; otherwise, make the column flexible (schema per model, per-table, or a generic storage form) and validate embedding.size == embeddingDim.

Would you like me to propose a migration strategy for multi-dimension support (separate tables per model, or a generic storage with check constraints)?


11-12: Switch to text for content is appropriate.

Mapping content to text is a good fit for large chunks and avoids driver-specific @Lob handling quirks.

src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt (2)

12-19: Chunking loop logic is correct and handles the tail chunk cleanly.

step = size - overlap, guarded to be ≥1; end is clamped; final chunk exits via break. Looks good.


5-6: Service-layer overlap validation aligns with chunker requirement

The existing validateOverlap implementation in IndexService.kt already throws an IndexException when overlapSize >= chunkingSize (lines 97–99), thereby rejecting overlap == chunkingSize to match the strict < size rule enforced by TextChunker.chunkByCharsSeq (lines 5–6). No further changes are needed.

src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt (1)

7-9: Import path refactor looks consistent with the package move.

Aligns with global.util.s3 and global.util.converter. No behavior changes.

src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)

20-23: Imports updated to new packages; service logic unchanged.

The refactor maintains behavior and compiles against the reorganized utility packages.

src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (1)

22-22: Trailing comma addition is fine and future-proof for diffs.

No semantic changes; keeps diffs cleaner when adding fields.

Comment on lines +50 to 57
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'

// test
testImplementation "org.springframework.boot:spring-boot-testcontainers"
testImplementation "org.testcontainers:postgresql"
testImplementation "org.testcontainers:junit-jupiter"
testImplementation "com.pgvector:pgvector:0.1.6"
testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Unnecessary duplicate dependency in tests

spring-ai-starter-vector-store-pgvector is already on the main classpath; duplicating in tests can mask classpath issues and slow resolution.

Apply:

-    testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
🤖 Prompt for AI Agents
In build.gradle around lines 50 to 57, the test classpath redundantly includes
'org.springframework.ai:spring-ai-starter-vector-store-pgvector' which is
already declared on the main implementation classpath; remove the duplicate
testImplementation entry to avoid masking classpath issues and speed dependency
resolution, leaving only the testcontainers and junit testImplementation lines
in that block.

Comment on lines +13 to +15
@field:NotEmpty
val dataFileId: List<Long>,

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Enforce overlapSize ≤ chunkingSize in the service (per team decision)

Per retrieved learnings, the validation belongs in the service layer, not the entity. Ensure IndexService.createIndex guards this invariant and returns a 400 with a clear message when violated.

I can add a guard + unit test (valid/invalid pairs) if you want me to push a patch.

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt around
lines 13 to 15, the DTO currently only declares dataFileId and the
overlap/chunking invariant must be enforced in the service layer: update
IndexService.createIndex to validate that request.overlapSize <=
request.chunkingSize, and if the check fails return an HTTP 400 response with a
clear error message (e.g., "overlapSize must be less than or equal to
chunkingSize"); add unit tests for both valid and invalid pairs to assert
success and the 400/error message on violation.

🧹 Nitpick (assertive)

Prefer @SiZe(min=1) for lists and align naming to plural ("dataFileIds")

  • Project already uses @SiZe(min=1) for lists (see DataFileBulkCreateRequest); use the same for consistency.
  • Name should reflect plurality.

Apply:

-import jakarta.validation.constraints.NotEmpty
+import jakarta.validation.constraints.Size
@@
-    @field:NotEmpty
-    val dataFileId: List<Long>,
+    @field:Size(min = 1, message = "최소 하나 이상의 파일 ID가 필요합니다")
+    val dataFileIds: List<Long>,
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@field:NotEmpty
val dataFileId: List<Long>,
// Replace the import
import jakarta.validation.constraints.Size
// Within IndexRequestDTO.kt
@field:Size(min = 1, message = "최소 하나 이상의 파일 ID가 필요합니다")
val dataFileIds: List<Long>,
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt around
lines 13 to 15, replace the @field:NotEmpty annotation on the List property with
@field:Size(min = 1) and rename the property from dataFileId to dataFileIds for
plural consistency; update the import to javax.validation.constraints.Size
(remove NotEmpty import), and update all usages/constructors/JSON mappings/tests
that reference dataFileId to use dataFileIds so compilation and serialization
remain correct.

Comment on lines +3 to +6
interface Embedder {
val dim: Int
fun embed(text: String): FloatArray
} No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Verify implementor naming (“OpenAIEmbbeder” vs “OpenAIEmbedder”)

Typo in class/file names can confuse DI and readers.


🏁 Script executed:

#!/bin/bash
# Find embedder implementations and spot typos
rg -nP --type=kotlin -C2 '(class|interface)\s+OpenAIEm[bB]ed{1,2}er|Embedder' | sed -n '1,120p'

Length of output: 2492


Rename file to match class name
The file name OpenAIEmbbeder.kt contains a typo that doesn’t align with the class it declares (OpenAIEmbedder). This mismatch can confuse readers and tooling.

• src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt → rename to → OpenAIEmbedder.kt

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt around
lines 3 to 6 the file name contains a typo and does not match the declared class
OpenAIEmbedder; rename the file to OpenAIEmbedder.kt, update any
imports/usages/references across the project to the new filename, and ensure
package declarations and CI/build files (if referencing the path) still resolve
correctly so the class name and filename are consistent.

🧹 Nitpick (assertive)

Add batch embedding to improve throughput; keep single-call default

Batching reduces network overhead and cost. Provide a default method to avoid breaking implementors.

Apply:

 interface Embedder {
     val dim: Int
     fun embed(text: String): FloatArray
+    fun embedAll(texts: List<String>): List<FloatArray> = texts.map(::embed)
 }

Optionally consider making these suspend and handling upstream backpressure later.

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt around lines
3 to 6, add a default batch embedding method to the interface while keeping the
existing single-item embed to avoid breaking implementors: declare a new
function (e.g., fun embedBatch(texts: List<String>): List<FloatArray>) with a
default implementation that iterates over texts and calls the existing
embed(text: String) for each entry and returns the list of embeddings; keep the
existing val dim and fun embed as-is; optionally mark both signatures as suspend
later if you intend to introduce asynchronous/backpressure handling upstream.

Comment on lines +1 to +13
package simplerag.ragback.domain.index.embed

import org.springframework.ai.openai.OpenAiEmbeddingModel
import org.springframework.stereotype.Component

@Component
class OpenAIEmbedder(
private val openAiEmbeddingModel: OpenAiEmbeddingModel
) : Embedder {
override val dim: Int = 1536
override fun embed(text: String): FloatArray =
openAiEmbeddingModel.embed(text)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Hard-coded dim=1536 will reject 3072-dim models; derive dimension from the configured model. Also, filename typo.

  • If the configured OpenAI model is text-embedding-3-large (3072), current dim=1536 forces INVALID_INPUT.
  • Consider inferring dim from the configured modelId (or a bean property) using your EmbeddingModel enum.
  • Minor: file name has a typo (“Embbeder”). Consider renaming to OpenAiEmbedder.kt for consistency.

Apply this diff to make dim consistent with configuration (adjust imports if package paths differ):

 package simplerag.ragback.domain.index.embed

 import org.springframework.ai.openai.OpenAiEmbeddingModel
+import org.springframework.beans.factory.annotation.Value
 import org.springframework.stereotype.Component
+import simplerag.ragback.domain.index.entity.enums.EmbeddingModel

 @Component
-class OpenAIEmbedder(
-    private val openAiEmbeddingModel: OpenAiEmbeddingModel
+class OpenAIEmbedder(
+    private val openAiEmbeddingModel: OpenAiEmbeddingModel,
+    @Value("\${spring.ai.openai.embedding.options.model:text-embedding-3-small}")
+    private val modelId: String
 ) : Embedder {
-    override val dim: Int = 1536
+    override val dim: Int =
+        EmbeddingModel.findByModelId(modelId)?.dim
+            ?: error("Unsupported embedding modelId: $modelId")
     override fun embed(text: String): FloatArray =
         openAiEmbeddingModel.embed(text)
 }

If you will support multiple models per index concurrently, consider a factory to return an Embedder bound to req.embeddingModel instead of a single global bean.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
package simplerag.ragback.domain.index.embed
import org.springframework.ai.openai.OpenAiEmbeddingModel
import org.springframework.stereotype.Component
@Component
class OpenAIEmbedder(
private val openAiEmbeddingModel: OpenAiEmbeddingModel
) : Embedder {
override val dim: Int = 1536
override fun embed(text: String): FloatArray =
openAiEmbeddingModel.embed(text)
}
package simplerag.ragback.domain.index.embed
import org.springframework.ai.openai.OpenAiEmbeddingModel
import org.springframework.beans.factory.annotation.Value
import org.springframework.stereotype.Component
import simplerag.ragback.domain.index.entity.enums.EmbeddingModel
@Component
class OpenAIEmbedder(
private val openAiEmbeddingModel: OpenAiEmbeddingModel,
@Value("\${spring.ai.openai.embedding.options.model:text-embedding-3-small}")
private val modelId: String
) : Embedder {
override val dim: Int =
EmbeddingModel.findByModelId(modelId)?.dim
?: error("Unsupported embedding modelId: $modelId")
override fun embed(text: String): FloatArray =
openAiEmbeddingModel.embed(text)
}
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt lines
1-13, the embedder hardcodes dim=1536 which will break for 3072-dim models and
the filename/class name contains a typo; change the implementation to derive the
embedding dimension from the configured model (inject the configured modelId or
an EmbeddingModel/enum bean), map modelId to its dimension (e.g., 1536 for
text-embedding-3-small, 3072 for text-embedding-3-large) and set override val
dim accordingly with a sensible fallback and error if unknown, replace the
misspelled filename/class name (OpenAIEmbbeder -> OpenAiEmbedder) to match
project conventions and update imports/usages; optionally consider providing a
factory if multiple model types will be supported concurrently.

Comment on lines 60 to +61
override fun urlFromKey(key: String): String =
s3.utilities()
.getUrl { it.bucket(bucket).key(key) }
.toExternalForm()
"https://mukit-s3.s3.ap-northeast-2.amazonaws.com/" + key
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Hard-coded S3 URL will break across environments and mishandles URL encoding

Using a fixed bucket/region string prevents deploy-time changes and can yield invalid URLs for keys with spaces or special chars. Let the AWS SDK construct the URL (handles region and encoding).

Apply:

-    override fun urlFromKey(key: String): String =
-        "https://mukit-s3.s3.ap-northeast-2.amazonaws.com/" + key
+    override fun urlFromKey(key: String): String =
+        s3.utilities().getUrl { it.bucket(bucket).key(key) }.toExternalForm()
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
override fun urlFromKey(key: String): String =
s3.utilities()
.getUrl { it.bucket(bucket).key(key) }
.toExternalForm()
"https://mukit-s3.s3.ap-northeast-2.amazonaws.com/" + key
override fun urlFromKey(key: String): String =
s3.utilities().getUrl { it.bucket(bucket).key(key) }.toExternalForm()
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt around lines
60-61, the code returns a hard-coded S3 URL which embeds bucket/region and does
not handle URL encoding; replace this with a call into the AWS SDK so the SDK
builds a correct, encoded URL and you can avoid hard-coded bucket/region. Inject
or read the configured bucket/region values from your app config, then call the
S3 client utilities to build the URL (e.g., use S3Utilities.getUrl/GetUrlRequest
or S3Client.utilities().getUrl(...) / S3Presigner for presigned URLs) with the
bucket and key; return the SDK-produced URL string instead of concatenating
strings so encoding and region resolution are handled correctly.

Comment on lines +3 to +4
object TextChunker {
fun chunkByCharsSeq(raw: String, size: Int, overlap: Int): Sequence<String> = sequence {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider an optional word-boundary mode to reduce mid-word splits.

Character windows are fine for a first pass. If quality matters, add a whitespace-aware variant (e.g., try to extend to the next whitespace within a small slack).

I can add chunkByWordsSeq(...) with a small slack range if desired.

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt around lines
3-4, add a whitespace-aware variant (either an optional mode on chunkByCharsSeq
or a new chunkByWordsSeq) that avoids splitting words: when computing each
window end, allow a small "slack" (configurable Int) to look forward up to slack
characters for the next whitespace and extend the end to that whitespace if
found; ensure the produced chunk size does not exceed size+slack, preserve the
requested overlap by computing the next window start = previousStart + size -
overlap (or adjust start based on the extended end while maintaining overlap
semantics), handle edge cases (no whitespace found within slack, end of string)
by falling back to the original character-boundary end, and keep this as a
Sequence<String> generator to match existing API.

Comment on lines +22 to +26
private fun normalize(s: String): String =
s.replace("\r\n", "\n").replace("\r", "\n")
.replace(Regex("[ \t]+"), " ")
.replace(Regex("\\n{3,}"), "\n\n")
.trim()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Avoid recompiling regexes on every call; precompile once.

Creating Regex instances per call adds overhead for large inputs. Hoist them to private val fields.

-object TextChunker {
+object TextChunker {
+    private val SPACES = Regex("[ \\t]+")
+    private val MANY_NL = Regex("\\n{3,}")
@@
-    private fun normalize(s: String): String =
-        s.replace("\r\n", "\n").replace("\r", "\n")
-            .replace(Regex("[ \t]+"), " ")
-            .replace(Regex("\\n{3,}"), "\n\n")
-            .trim()
+    private fun normalize(s: String): String =
+        s.replace("\r\n", "\n").replace("\r", "\n")
+            .replace(SPACES, " ")
+            .replace(MANY_NL, "\n\n")
+            .trim()
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt around lines 22
to 26, the normalize function currently constructs Regex("[ \t]+") and
Regex("\\n{3,}") on every call which is wasteful; hoist these regexes to
precompiled private vals (either as private val fields in the class or inside a
companion object/top-level private vals) like private val MULTI_WHITESPACE =
Regex("[ \t]+") and private val MULTIPLE_BLANK_LINES = Regex("\\n{3,}") and then
replace the inline Regex(...) usages in normalize with those precompiled vals so
Regex objects are reused across calls.

Comment on lines 31 to 39
@SpringBootTest
@ActiveProfiles("test")
class DataFileServiceTest(
private val dataFileService: DataFileService,
private val dataFileRepository: DataFileRepository,
private val tagRepository: TagRepository,
private val dataFileTagRepository: DataFileTagRepository,
private val s3Util: FakeS3Util
@Autowired val dataFileService: DataFileService,
@Autowired val dataFileRepository: DataFileRepository,
@Autowired val tagRepository: TagRepository,
@Autowired val dataFileTagRepository: DataFileTagRepository,
@Autowired val s3Util: FakeS3Util
) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Constructor injection in tests: add @Testconstructor(autowireMode = ALL) for clarity and portability.

Relying on @Autowired on constructor params alone can be fragile across Boot versions. Align with IndexServiceTest by enabling constructor autowiring explicitly.

 @SpringBootTest
 @ActiveProfiles("test")
+@org.springframework.test.context.TestConstructor(autowireMode = org.springframework.test.context.TestConstructor.AutowireMode.ALL)
 class DataFileServiceTest(
-    @Autowired val dataFileService: DataFileService,
-    @Autowired val dataFileRepository: DataFileRepository,
-    @Autowired val tagRepository: TagRepository,
-    @Autowired val dataFileTagRepository: DataFileTagRepository,
-    @Autowired val s3Util: FakeS3Util
+    @Autowired val dataFileService: DataFileService,
+    @Autowired val dataFileRepository: DataFileRepository,
+    @Autowired val tagRepository: TagRepository,
+    @Autowired val dataFileTagRepository: DataFileTagRepository,
+    @Autowired val s3Util: FakeS3Util
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@SpringBootTest
@ActiveProfiles("test")
class DataFileServiceTest(
private val dataFileService: DataFileService,
private val dataFileRepository: DataFileRepository,
private val tagRepository: TagRepository,
private val dataFileTagRepository: DataFileTagRepository,
private val s3Util: FakeS3Util
@Autowired val dataFileService: DataFileService,
@Autowired val dataFileRepository: DataFileRepository,
@Autowired val tagRepository: TagRepository,
@Autowired val dataFileTagRepository: DataFileTagRepository,
@Autowired val s3Util: FakeS3Util
) {
@SpringBootTest
@ActiveProfiles("test")
@org.springframework.test.context.TestConstructor(
autowireMode = org.springframework.test.context.TestConstructor.AutowireMode.ALL
)
class DataFileServiceTest(
@Autowired val dataFileService: DataFileService,
@Autowired val dataFileRepository: DataFileRepository,
@Autowired val tagRepository: TagRepository,
@Autowired val dataFileTagRepository: DataFileTagRepository,
@Autowired val s3Util: FakeS3Util
) {
//
}
🤖 Prompt for AI Agents
In
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt
around lines 31 to 39, the test class relies on constructor @Autowired
parameters but lacks the explicit Spring Boot test constructor autowiring
annotation; add @TestConstructor(autowireMode =
TestConstructor.AutowireMode.ALL) above the class declaration (and import
org.springframework.test.context.TestConstructor) so constructor injection is
explicit and consistent with IndexServiceTest.

Comment on lines +30 to +31
@Autowired val indexService: IndexService,
@Autowired val indexRepository: IndexRepository,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Drop redundant @Autowired on constructor params (you already enabled constructor autowiring).

Since the class is annotated with @Testconstructor(autowireMode = ALL), the @Autowired annotations on the primary constructor parameters are unnecessary noise.

Apply this diff:

-class IndexServiceTest(
-    @Autowired val indexService: IndexService,
-    @Autowired val indexRepository: IndexRepository,
+class IndexServiceTest(
+    val indexService: IndexService,
+    val indexRepository: IndexRepository,
 )
🤖 Prompt for AI Agents
In src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
around lines 30 to 31, the primary constructor parameters are redundantly
annotated with @Autowired despite @TestConstructor(autowireMode = ALL) enabling
constructor autowiring; remove the @Autowired annotations from the constructor
parameters (keep the parameter types and names intact) so the test uses
constructor injection provided by the class-level configuration and avoid
unnecessary annotations.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
build.gradle (1)

51-55: Remove duplicate vector-store dependency from test classpath

spring-ai-starter-vector-store-pgvector is already on the main classpath (Line 49). Duplicating it in tests can mask classpath issues and slow resolution.

Apply:

-    testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 4d7b409 and c930fa7.

📒 Files selected for processing (1)
  • build.gradle (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt:16-22
Timestamp: 2025-08-17T13:12:02.200Z
Learning: catturtle123이 Index 엔티티의 overlapSize ≤ chunkingSize 검증을 엔티티 레벨이 아닌 서비스 레이어에서 구현하기로 결정했으며, chunking embedding 저장 관련 PR에서 이 검증 로직을 추가할 예정이므로 해당 PR에서 언급해야 함 (Simple-RAG/BACK 프로젝트)
🔇 Additional comments (1)
build.gradle (1)

39-49: Sanity-check Spring AI BOM alignment

I see you’re importing the Spring AI BOM in build.gradle (lines 65–69) via

dependencyManagement {
    imports {
        mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}"
    }
}

and you declare the two starters without explicit versions on lines 39 and 49 (plus a testImport at line 55):

  • implementation 'org.springframework.ai:spring-ai-starter-model-openai'
  • implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
  • testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'

Please verify the following:

• Ensure springAiVersion is defined (for example, in gradle.properties).
• Set your JAVA_HOME to a valid JDK installation, then run:

./gradlew dependencies --configuration runtimeClasspath \
  | grep 'org.springframework.ai:spring-ai-starter'

• Confirm both starters resolve to version ${springAiVersion} (i.e. you see lines like
org.springframework.ai:spring-ai-starter-model-openai : ${springAiVersion}
and
org.springframework.ai:spring-ai-starter-vector-store-pgvector : ${springAiVersion}).

If you encounter any “unmanaged version” warnings or failures, bump springAiVersion to a release that includes both starters. Happy to help draft a compatibility matrix if needed.

implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
implementation 'org.jetbrains.kotlin:kotlin-reflect'
// implementation 'org.springframework.ai:spring-ai-starter-model-openai'
implementation 'org.springframework.ai:spring-ai-starter-model-openai'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

OpenAI starter added — ensure tests don’t require a real API key or make network calls

Adding spring-ai-starter-model-openai will create OpenAI beans at context startup. Make sure your test profile stubs or disables outbound calls so unit tests are hermetic.

Run this to verify there’s a test profile and it isolates OpenAI config:

If you want, I can provide a minimal application-test.yml that sets a dummy key and a @TestConfiguration that supplies a no-op EmbeddingModel bean for tests.


🏁 Script executed:

#!/bin/bash
# 1) Check for test profile configs
fd -a 'application-*.yml' -uu | sed -n 's#^#- #p'

# 2) Look for OpenAI properties in test profile(s)
rg -n --glob 'src/**/resources/**' -C2 'spring\.ai\.openai\.(api-key|base-url|embedding|chat)'

# 3) Grep for any tests using real OpenAI clients
rg -nP --type=kt --type=java -C2 '\bOpenAI|OpenAi|openai|EmbeddingModel|ChatClient'

Length of output: 317


🏁 Script executed:

#!/bin/bash
# 1) Search for tests that bootstrap the Spring context (which would load OpenAI beans)
rg -n '@SpringBootTest' -C2 --glob '**/*.java' --glob '**/*.kt'

# 2) Search for any @TestConfiguration classes that might override the OpenAI beans
rg -n '@TestConfiguration' -C2 --glob '**/*.java' --glob '**/*.kt'

# 3) Confirm whether a test-specific application-test.yml exists
fd -a 'application-test*.yml' -uu | sed -n 's#^#- #p'

Length of output: 1699


🏁 Script executed:

#!/bin/bash
# Display contents of the test profile configuration
sed -n '1,200p' src/test/resources/application-test.yml

Length of output: 235


Configure test profile to stub or disable OpenAI beans

Your application-test.yml (src/test/resources/application-test.yml) only configures the datasource and JPA—there are no overrides for spring.ai.openai.*, and no @TestConfiguration to stub out the OpenAI clients. Because your tests use @SpringBootTest with the “test” profile, they will attempt to instantiate the real OpenAI beans at startup, leading to missing-API-key errors or unwanted network calls.

• application-test.yml (lines 1–200) shows only:

spring:
  datasource:
    driver-class-name: org.postgresql.Driver
  jpa:
    database-platform: org.hibernate.dialect.PostgreSQLDialect
    hibernate:
      ddl-auto: create-drop

(no spring.ai.openai.* props)

• No @TestConfiguration or @MockBean was found to override EmbeddingModel, ChatCompletionModel, or the OpenAI client.
• Tests in src/test/kotlin/** annotated with @SpringBootTest + @ActiveProfiles("test") will load the real starter beans and fail (or hit the network).

To fix, you must:

  1. Add dummy OpenAI props to your test profile:

    spring:
      ai:
        openai:
          api-key: dummy
          base-url: http://localhost:8080
    # …existing datasource/jpa…
  2. Stub out the OpenAI beans, for example via a @TestConfiguration class in src/test/java or src/test/kotlin:

    @TestConfiguration
    public class OpenAIMockConfig {
      @Bean
      EmbeddingModel embeddingModel() {
        return input -> Collections.emptyList();
      }
    
      @Bean
      ChatCompletionModel chatCompletionModel() {
        return request -> ChatCompletionResponse.builder().choices(...).build();
      }
    }
  3. (Alternate) Use @MockBean in your test classes:

    @SpringBootTest
    @ActiveProfiles("test")
    class YourServiceTest {
      @MockBean EmbeddingModel embeddingModel;
      @MockBean ChatCompletionModel chatCompletionModel;
      // …
    }

Let me know if you’d like a complete minimal example of application-test.yml plus test‐stub config.

🤖 Prompt for AI Agents
In build.gradle around line 39, the test profile doesn't override or stub OpenAI
beans so @SpringBootTest with profile "test" will instantiate real OpenAI
clients; add test-time configuration to prevent real API usage by (1) adding
dummy OpenAI properties to src/test/resources/application-test.yml
(spring.ai.openai.api-key and base-url) and (2) providing test-only beans that
stub EmbeddingModel and ChatCompletionModel via a @TestConfiguration class in
src/test/java or src/test/kotlin (or alternatively annotate tests with @MockBean
for those interfaces) so tests run without network calls or missing-key errors.

Comment on lines +47 to 50
// postgresql
implementation 'org.postgresql:postgresql'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Prefer runtimeOnly for the JDBC driver to keep the compile classpath lean

The driver isn’t needed at compile time. Switching to runtimeOnly follows Spring Boot’s guidance and reduces classpath noise.

Apply:

-    implementation 'org.postgresql:postgresql'
+    runtimeOnly 'org.postgresql:postgresql'
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// postgresql
implementation 'org.postgresql:postgresql'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
// postgresql
runtimeOnly 'org.postgresql:postgresql'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
🤖 Prompt for AI Agents
In build.gradle around lines 47 to 50, the PostgreSQL JDBC driver is declared as
an implementation dependency even though it is only required at runtime; change
the org.postgresql:postgresql dependency to runtimeOnly
'org.postgresql:postgresql' (leave the spring-ai-starter-vector-store-pgvector
dependency as implementation) so the driver is not on the compile classpath and
follows Spring Boot guidance.

@catturtle123 catturtle123 merged commit d991822 into develop Aug 22, 2025
1 check passed
@catturtle123 catturtle123 deleted the feature/#10-indexing branch August 22, 2025 05:58
@coderabbitai coderabbitai bot mentioned this pull request Aug 22, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

feature New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

✨ [Feature] 인덱스 임베딩 저장

2 participants