✨ Feature: indexing #18
Walkthrough
Adds Spring AI/OpenAI and pgvector starter dependencies, introduces embedding interfaces and an OpenAI embedder, implements content loading and text chunking, changes ChunkEmbedding to store a FloatArray as vector(1536), implements the full createIndex flow that produces embeddings, reorganizes utility packages, adjusts S3 URL behavior, and updates tests and local config.
Sequence Diagram(s)
sequenceDiagram
autonumber
actor Client
participant IS as IndexService
participant DFR as DataFileRepository
participant CLR as ContentLoader
participant TC as TextChunker
participant EMB as Embedder
participant IR as IndexRepository
participant DB as Database
Client->>IS: createIndex(req)
IS->>IS: validateOverlap(req)
IS->>DFR: findAllById(req.dataFileId)
DFR-->>IS: DataFiles
IS->>IS: require(embedder.dim == req.embeddingModel.dim)
IS->>IR: save(Index.from(req))
IR-->>IS: persisted Index
loop each DataFile
IS->>CLR: load(file.url)
CLR-->>IS: content
alt content not blank
IS->>TC: chunkByCharsSeq(content, size, overlap)
TC-->>IS: chunks
loop each chunk
IS->>EMB: embed(chunk)
EMB-->>IS: FloatArray(1536)
IS->>DB: persist ChunkEmbedding(embedding, content, index)
end
end
end
IS-->>Client: IndexPreviewResponse
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (11)
src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (1)
18-38: Broaden content-type detection or document security expectations.
Relying on file extension is OK as a fallback, but it’s spoofable. If this mapping is used for security-sensitive branching (e.g., parsing), consider content sniffing (Apache Tika) or magic-byte checks. As a minimal improvement, cover common types encountered in RAG pipelines.
Apply this small extension of mappings:
@@
 return when (ext) {
     "png" -> "image/png"
     "jpg", "jpeg" -> "image/jpeg"
+    "webp" -> "image/webp"
+    "svg" -> "image/svg+xml"
+    "heic" -> "image/heic"
     "pdf" -> "application/pdf"
     "txt" -> "text/plain"
     "csv" -> "text/csv"
     "md" -> "text/markdown"
     "json" -> "application/json"
+    "rtf" -> "application/rtf"
     "zip" -> "application/zip"
+    "7z" -> "application/x-7z-compressed"
     "doc" -> "application/msword"
     "docx" -> "application/vnd.openxmlformats-officedocument.wordprocessingml.document"
     "xls" -> "application/vnd.ms-excel"
     "xlsx" -> "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
     "ppt" -> "application/vnd.ms-powerpoint"
     "pptx" -> "application/vnd.openxmlformats-officedocument.presentationml.presentation"
     else -> "application/octet-stream"
 }
If you want, I can wire in Tika with a graceful fallback to this map.
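To make the magic-byte option concrete, here is a minimal sketch. It assumes two hypothetical helpers, sniffContentType and detectContentType, and a small signature table; none of these exist in the PR, and the extension-based mapping would remain the fallback. Only the PNG, JPEG, and PDF signatures are checked. ZIP-based formats (docx, xlsx, pptx) share one container signature and would need deeper inspection, for example via Tika.

```kotlin
// Hypothetical signature table (not part of the PR): leading bytes -> MIME type.
private val SIGNATURES: List<Pair<ByteArray, String>> = listOf(
    byteArrayOf(0x89.toByte(), 0x50, 0x4E, 0x47) to "image/png",              // \x89 P N G
    byteArrayOf(0xFF.toByte(), 0xD8.toByte(), 0xFF.toByte()) to "image/jpeg", // JPEG start-of-image
    "%PDF-".toByteArray(Charsets.US_ASCII) to "application/pdf",
)

// Returns the MIME type whose signature prefixes `head`, or null if none matches.
fun sniffContentType(head: ByteArray): String? =
    SIGNATURES.firstOrNull { (magic, _) ->
        head.size >= magic.size && magic.indices.all { head[it] == magic[it] }
    }?.second

// Prefers the sniffed type; falls back to whatever the extension map guessed.
fun detectContentType(head: ByteArray, extensionGuess: String): String =
    sniffContentType(head) ?: extensionGuess
```

A caller would read the first few bytes of the upload, pass them as head, and pass the result of the existing extension lookup as extensionGuess, so a file named report.txt that starts with %PDF- is treated as a PDF.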
src/main/kotlin/simplerag/ragback/global/util/converter/MultipartJackson2HttpMessageConverter.kt (1)
9-25: Disable canRead as well; current converter may intercept application/octet-stream reads unexpectedly.
You’ve disabled writes, but reads remain enabled by default for application/octet-stream via AbstractJackson2HttpMessageConverter. This can hijack request body resolution and regress multipart/byte[] handling.
Apply:
@@
 @Component
 class MultipartJackson2HttpMessageConverter (objectMapper: ObjectMapper) :
     AbstractJackson2HttpMessageConverter(objectMapper, MediaType.APPLICATION_OCTET_STREAM) {
+    override fun canRead(clazz: Class<*>, mediaType: MediaType?): Boolean = false
+    override fun canRead(type: Type?, contextClass: Class<*>?, mediaType: MediaType?): Boolean = false
+    override fun canRead(mediaType: MediaType?): Boolean = false
+
     override fun canWrite(clazz: Class<*>, mediaType: MediaType?): Boolean {
         return false
     }
     override fun canWrite(type: Type?, clazz: Class<*>, mediaType: MediaType?): Boolean {
         return false
     }
     override fun canWrite(mediaType: MediaType?): Boolean {
         return false
     }
 }
Optionally also annotate with @Order(Ordered.LOWEST_PRECEDENCE) for belt-and-braces.
src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt (2)
25-27: Logger initialized with the wrong class
This logs under GlobalExceptionHandler instead of S3UtilImpl, making logs harder to filter.
Apply:
-    private val log = LoggerFactory.getLogger(GlobalExceptionHandler::class.java)
+    private val log = LoggerFactory.getLogger(S3UtilImpl::class.java)
Also remove the now-unused GlobalExceptionHandler import.
100-107: keyFromUrl should decode percent-encoded paths
URLs returned by clients/CDN may be encoded; deleting with an encoded key can fail. Decode safely.
Apply:
-    override fun keyFromUrl(url: String): String? {
-        val path = try {
-            URI(url).path // e.g. "/market/menu/uuid_name.jpg"
-        } catch (_: Exception) {
-            return null
-        }
-        return path.removePrefix("/").ifBlank { null }
-    }
+    override fun keyFromUrl(url: String): String? {
+        val rawPath = try {
+            URI(url).rawPath // keep encoding, decode ourselves
+        } catch (_: Exception) {
+            return null
+        }
+        if (rawPath.isNullOrBlank()) return null
+        return java.net.URLDecoder.decode(rawPath.removePrefix("/"), java.nio.charset.StandardCharsets.UTF_8)
+            .ifBlank { null }
+    }
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)
7-9: Add an index on the FK column to speed lookups by Index.
Read-paths will commonly fetch chunk embeddings per index. Add a DB index for indexes_id.
Apply:
-@Table(name = "chunk_embeddings")
+@Table(
+    name = "chunk_embeddings",
+    indexes = [jakarta.persistence.Index(name = "idx_chunk_embeddings_indexes_id", columnList = "indexes_id")]
+)
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (3)
55-69: Consider asserting that chunk embeddings are created when content exists.
Currently, createIndexTest only checks IDs/names. Adding an assertion on chunkEmbeddings count (or a repository count) would verify the end-to-end indexing pipeline.
I can add a fixture DataFile with a short text and assert that embeddings were persisted for it.
59-69: Embedding dimension mismatch in tests will cause failures
The IndexService.createIndex method explicitly checks that the embedder’s dim (1536) matches the requested model’s dim and throws INVALID_INPUT on mismatch. Since OpenAIEmbedder.dim = 1536 but the tests use EmbeddingModel.TEXT_EMBEDDING_3_LARGE (3072), any test invoking createIndex with the large model will now fail.
• File: src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
Line ranges: 59–69, 76–83, 96–100, 126–133, 156–163, 178–185, 216–220, 241–245, 268–270, 291–295
Please update all occurrences of EmbeddingModel.TEXT_EMBEDDING_3_LARGE in these tests to EmbeddingModel.TEXT_EMBEDDING_3_SMALL (1536) to align with the current embedder configuration. If support for the 3072-dim model is intended, instead update OpenAIEmbedder.dim (and related wiring) to 3072.
Suggested diff for one occurrence (apply similarly across all):
- IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+ IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_SMALL, true)
72-84: Align overlap validation and tests with the “allow equal overlap” rule
The service’s validateOverlap currently throws on overlapSize >= chunkingSize, disallowing equal values, but the team has agreed that overlapSize may be equal to chunkingSize (i.e. only overlap > chunking should error). Please update both the implementation and the tests accordingly:
• In IndexService.kt (around line 97), change the guard to throw only when overlapSize > chunkingSize:
-    private fun validateOverlap(overlapSize: Int, chunkingSize: Int) {
-        if (overlapSize >= chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
-    }
+    private fun validateOverlap(overlapSize: Int, chunkingSize: Int) {
+        if (overlapSize > chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
+    }
• In IndexServiceTest.kt (create-index overlap test at lines 72–84), trigger the error only when overlap > chunking. For example:
-        val indexCreateRequest =
-            IndexCreateRequest("test", 1, 1, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+        val indexCreateRequest =
+            IndexCreateRequest("test", 1, 2, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
(The error message "overlap 크기는 chunking 크기를 넘을 수 없습니다.", i.e. "The overlap size cannot exceed the chunking size.", remains appropriate, since it forbids only exceeding.)
• Apply the same update to the index-update overlap test (lines 233–256): use an overlapSize one greater than chunkingSize to verify the exception.
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (2)
215-219: Use existing sha256Hex() util to compute the 12-char key prefix.
Reduces duplication and potential inconsistencies.
-        val hash12 = MessageDigest.getInstance("SHA-256")
-            .digest(bytes).joinToString("") { "%02x".format(it) }
-            .take(12)
+        val hash12 = sha256Hex(bytes).take(12)
282-284: Prefer assertFalse over equals(false) for boolean assertions.
Improves readability and assertion messages.
-        assertEquals(dataFiles.hasNext, false)
+        assertFalse(dataFiles.hasNext)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (1)
97-99: Adjust overlap validation to allow equality
The project’s learning indicates that overlapSize should be permitted to equal chunkingSize (i.e. enforce overlapSize ≤ chunkingSize) and that this check will live in the service layer as part of the chunking-embedding save flow. The current implementation throws on >= and tests have been written to expect equality to fail, which conflicts with that decision.
Please apply the following mandatory updates:
- In IndexService.validateOverlap, change the guard from >= to > so that equality is allowed.
- Update any affected unit tests that currently assert an exception when overlapSize == chunkingSize to expect success instead.
- Optionally, clarify the exception message to reflect “overlap 크기는 chunking 크기보다 작거나 같아야 합니다.” ("The overlap size must be less than or equal to the chunking size.") or similar.
- Add a note in the upcoming chunking-embedding save PR that this validation was introduced here.
--- a/src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt
+++ b/src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt
@@ -97,7 +97,7 @@ class IndexService {
     private fun validateOverlap(overlapSize: Int, chunkingSize: Int) {
-        if (overlapSize >= chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
+        if (overlapSize > chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW)
     }
 }
📜 Review details
📒 Files selected for processing (20)
build.gradle (1 hunks)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1 hunks)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt (1 hunks)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (1 hunks)
src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt (1 hunks)
src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt (1 hunks)
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1 hunks)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/converter/MultipartJackson2HttpMessageConverter.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/loader/HttpContentLoader.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/s3/S3Type.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt (2 hunks)
src/main/resources/application-local.yml (1 hunks)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1 hunks)
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt:16-22
Timestamp: 2025-08-17T13:12:02.200Z
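Learning: catturtle123 decided to implement the overlapSize ≤ chunkingSize validation for the Index entity in the service layer rather than at the entity level, and plans to add this validation logic in the PR that covers saving chunking embeddings, so it should be mentioned in that PR (Simple-RAG/BACK project)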
📚 Learning: 2025-08-17T07:14:26.887Z
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt:6-22
Timestamp: 2025-08-17T07:14:26.887Z
Learning: In the ChunkEmbedding entity, the embedding field is designed to store the vector embedding, and the content field is designed to store the original file/text chunk (catturtle123 project)
Applied to files:
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
🧬 Code graph analysis (11)
src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt (2)
  src/main/kotlin/simplerag/ragback/global/util/S3UtilImpl.kt (3)
    s3 (18-110)
    { "file" } (89-89)
    { null } (108-108)
  src/main/kotlin/simplerag/ragback/global/util/S3Util.kt (1)
    upload (5-11)
src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt (1)
  src/main/kotlin/simplerag/ragback/global/util/S3UtilImpl.kt (1)
    s3 (18-110)
src/main/kotlin/simplerag/ragback/global/util/s3/S3Type.kt (3)
  src/main/kotlin/simplerag/ragback/global/util/S3Type.kt (1)
    label (4-8)
  src/main/kotlin/simplerag/ragback/global/util/S3UtilImpl.kt (3)
    s3 (18-110)
    { null } (108-108)
    { "file" } (89-89)
  src/main/kotlin/simplerag/ragback/global/util/S3Util.kt (1)
    upload (5-11)
src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt (2)
  src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1)
    dim (3-49)
  src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (4)
    { "Embedding must not be empty; expected fixed dimension (e.g., 1536)" } (10-10)
    { "Embedding (FloatArray) must not be null" } (9-9)
    { (26-28)
    { (11-13)
src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (1)
  src/main/kotlin/simplerag/ragback/global/util/FileConvertUtil.kt (1)
    sha256 (8-11)
src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt (1)
  src/main/kotlin/simplerag/ragback/global/util/S3Util.kt (2)
    upload (5-11)
    deleteByUrl (8-8)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (2)
  src/main/kotlin/simplerag/ragback/domain/index/converter/IndexConverter.kt (1)
    toIndexPreviewResponse (29-34)
  src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)
    name (10-57)
src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt (2)
  src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1)
    dim (3-49)
  src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)
    name (10-57)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt (2)
  src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)
    name (10-57)
  src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileRequestDTO.kt (2)
    min (7-11)
    message (13-20)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (2)
  src/main/kotlin/simplerag/ragback/domain/index/controller/IndexController.kt (1)
    indexService (11-59)
  src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (2)
    name (10-57)
    update (48-55)
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2)
  src/main/kotlin/simplerag/ragback/domain/index/repository/ChunkEmbeddingRepository.kt (1)
    interface ChunkEmbeddingRepository : JpaRepository<ChunkEmbedding, Long> (6-6)
  src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (4)
    { "Embedding (FloatArray) must not be null" } (9-9)
    { "Embedding must not be empty; expected fixed dimension (e.g., 1536)" } (10-10)
    { (26-28)
    autoApply (6-34)
🔇 Additional comments (14)
src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (2)
40-57: Streaming digest implementation is solid.
Buffered + DigestInputStream with an 8KB buffer, proper use blocks, and byte counting look correct and memory-safe for large files.
1-1: Confirm package relocation and import updates
All previous imports of the moved functions have been updated to the new simplerag.ragback.global.util.converter package, and no residual references to the old package remain. Specifically:
- src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt now imports
• computeMetricsStreaming
• resolveContentType
- src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt now imports sha256Hex
- src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt now imports sha256Hex
The FileMetrics type is declared and consumed entirely within FileConvertUtil.kt; there are no external references requiring its import elsewhere.
Package move is fully verified; approving changes.
src/main/kotlin/simplerag/ragback/global/util/s3/S3Type.kt (1)
1-8: All dependent imports updated and S3Type usage verified
- Verified that every external reference to S3Type imports from the new package simplerag.ragback.global.util.s3.S3Type (e.g. in FakeS3Util.kt, DataFileService.kt, DataFileServiceTest.kt).
- S3Util.kt and S3UtilImpl.kt reside in the same package, so they correctly reference S3Type without an import.
- No lingering imports from the old package path were found.
- Enum contents remain unchanged; the leading/trailing slashes in ORIGINAL_FILE("/ORIGINAL/") are handled correctly by the existing buildKey trimming logic.
src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt (1)
1-11: Import verification complete – no drift detected
All callers (services, fakes, and tests) now correctly reference the moved interface at simplerag.ragback.global.util.s3.S3Util:
- src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt
- src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt
- src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (uses FakeS3Util)
No further action required.
src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt (1)
1-6: Overlap validation is implemented in IndexService
The validateOverlap guard is present and invoked in both index creation and update operations:
- In src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt:
• Line 29–30: validateOverlap(req.overlapSize, req.chunkingSize) on createIndex
• Line 83–84: validateOverlap(indexUpdateRequest.overlapSize, indexUpdateRequest.chunkingSize) on update
• Lines 97–99: private fun validateOverlap(overlapSize: Int, chunkingSize: Int) { if (overlapSize >= chunkingSize) throw IndexException(ErrorCode.OVERLAP_OVERFLOW) }
- Additionally, TextChunker.chunkByCharsSeq enforces require(overlap in 0 until size) to guarantee overlapSize < chunkingSize before chunking.
No further changes are needed here.
src/main/resources/application-local.yml (1)
11-17: Mis-scoped Spring AI properties – adjust embedding model nesting
The model setting must live under spring.ai.openai.embedding.options, not at spring.ai.model.embedding, so that Spring AI’s auto-configuration picks it up (docs.spring.io). For example, update application-local.yml as follows:
 spring:
   ai:
     openai:
       api-key: ${OPENAI_KEY}
-    model:
-      embedding:
-        options:
-          model: text-embedding-3-small
+      embedding:
+        options:
+          model: text-embedding-3-small
• (Optional) You can still explicitly enable the OpenAI embedding auto-config via
spring:
  ai:
    model:
      embedding: openai
but it defaults to openai if unset (docs.spring.io).
Runtime check:
• Verify at startup logs that the active embedder reports text-embedding-3-small.
• Ensure your vector store expects 1536-dimensional embeddings, matching text-embedding-3-small’s default output (airbyte.com, docs.spring.io).
build.gradle (1)
29-31: Please manually verify Spring AI BOM compatibility
We weren’t able to resolve the effective spring-ai and Spring Boot starter versions due to Gradle daemon connection issues in this environment. To ensure you won’t run into dependency convergence problems at runtime:
• Confirm which Spring Boot plugin version you’re using (e.g. 3.5.x) in your build.gradle.
• Run a dependency-insight report locally to see the resolved spring-ai BOM version:
./gradlew dependencyInsight \
  --dependency org.springframework.ai:spring-ai \
  --configuration runtimeClasspath
• Verify that the BOM version (currently 1.0.1) is supported by your Spring Boot version. If it isn’t, bump or align springAiVersion to the matching spring-ai BOM release for Spring Boot 3.5.x.
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2)
14-18: Dimension consistency: fixed column vs. per-record embeddingDim.
You persist embeddingDim alongside a fixed vector(1536) column. If you plan to support models with different dimensions (the issue mentions “adjust embedding size during chunking embedding”), a fixed column dimension will block this. If you intend to standardize on 1536 only, consider removing embeddingDim to avoid drift; otherwise, make the column flexible (schema per model, per-table, or a generic storage form) and validate embedding.size == embeddingDim.
Would you like me to propose a migration strategy for multi-dimension support (separate tables per model, or a generic storage with check constraints)?
11-12: Switch to text for content is appropriate.
Mapping content to text is a good fit for large chunks and avoids driver-specific @Lob handling quirks.
src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt (2)
12-19: Chunking loop logic is correct and handles the tail chunk cleanly.
step = size - overlap, guarded to be ≥1; end is clamped; final chunk exits via break. Looks good.
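For readers without the diff open, the loop described here can be sketched as follows. This is a reconstruction from the review's description (step = size - overlap, clamped end, break on the tail chunk, require(overlap in 0 until size)), not a copy of TextChunker.kt; the parameter names and the eager validation are assumptions.

```kotlin
// Sketch only: character-based chunking with overlap, reconstructed from the review text.
fun chunkByCharsSeq(text: String, size: Int, overlap: Int): Sequence<String> {
    require(size > 0) { "size must be positive" }
    require(overlap in 0 until size) { "overlap must satisfy 0 <= overlap < size" }
    val step = size - overlap // always >= 1 because overlap < size
    return sequence {
        var start = 0
        while (start < text.length) {
            val end = minOf(start + size, text.length) // clamp the tail chunk
            yield(text.substring(start, end))
            if (end == text.length) break // tail emitted; stop rather than re-emit the overlap
            start += step
        }
    }
}
```

With text = "abcdefghij", size = 4, and overlap = 1, this yields "abcd", "defg", "ghij". Note that overlap == size would give step = 0 and the loop would never advance, which is why the chunker requires overlap strictly below size regardless of what the service-layer check allows.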
5-6: Service-layer overlap validation aligns with chunker requirement
The existing validateOverlap implementation in IndexService.kt already throws an IndexException when overlapSize >= chunkingSize (lines 97–99), thereby rejecting overlap == chunkingSize to match the strict < size rule enforced by TextChunker.chunkByCharsSeq (lines 5–6). No further changes are needed.
src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt (1)
7-9: Import path refactor looks consistent with the package move.
Aligns with global.util.s3 and global.util.converter. No behavior changes.
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)
20-23: Imports updated to new packages; service logic unchanged.
The refactor maintains behavior and compiles against the reorganized utility packages.
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (1)
22-22: Trailing comma addition is fine and future-proof for diffs.
No semantic changes; keeps diffs cleaner when adding fields.
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'

// test
testImplementation "org.springframework.boot:spring-boot-testcontainers"
testImplementation "org.testcontainers:postgresql"
testImplementation "org.testcontainers:junit-jupiter"
testImplementation "com.pgvector:pgvector:0.1.6"
testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
🧹 Nitpick (assertive)
Unnecessary duplicate dependency in tests
spring-ai-starter-vector-store-pgvector is already on the main classpath; duplicating in tests can mask classpath issues and slow resolution.
Apply:
- testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
🤖 Prompt for AI Agents
In build.gradle around lines 50 to 57, the test classpath redundantly includes
'org.springframework.ai:spring-ai-starter-vector-store-pgvector' which is
already declared on the main implementation classpath; remove the duplicate
testImplementation entry to avoid masking classpath issues and speed dependency
resolution, leaving only the testcontainers and junit testImplementation lines
in that block.
@field:NotEmpty
val dataFileId: List<Long>,
🧹 Nitpick (assertive)
Enforce overlapSize ≤ chunkingSize in the service (per team decision)
Per retrieved learnings, the validation belongs in the service layer, not the entity. Ensure IndexService.createIndex guards this invariant and returns a 400 with a clear message when violated.
I can add a guard + unit test (valid/invalid pairs) if you want me to push a patch.
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt around
lines 13 to 15, the DTO currently only declares dataFileId and the
overlap/chunking invariant must be enforced in the service layer: update
IndexService.createIndex to validate that request.overlapSize <=
request.chunkingSize, and if the check fails return an HTTP 400 response with a
clear error message (e.g., "overlapSize must be less than or equal to
chunkingSize"); add unit tests for both valid and invalid pairs to assert
success and the 400/error message on violation.
🧹 Nitpick (assertive)
Prefer @Size(min=1) for lists and align naming to plural ("dataFileIds")
- Project already uses @Size(min=1) for lists (see DataFileBulkCreateRequest); use the same for consistency.
- Name should reflect plurality.
Apply:
-import jakarta.validation.constraints.NotEmpty
+import jakarta.validation.constraints.Size
@@
- @field:NotEmpty
- val dataFileId: List<Long>,
+ @field:Size(min = 1, message = "최소 하나 이상의 파일 ID가 필요합니다")
+ val dataFileIds: List<Long>,
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
@field:NotEmpty
val dataFileId: List<Long>,

// Replace the import
import jakarta.validation.constraints.Size
// Within IndexRequestDTO.kt
@field:Size(min = 1, message = "최소 하나 이상의 파일 ID가 필요합니다")
val dataFileIds: List<Long>,
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt around
lines 13 to 15, replace the @field:NotEmpty annotation on the List property with
@field:Size(min = 1) and rename the property from dataFileId to dataFileIds for
plural consistency; update the import to jakarta.validation.constraints.Size
(remove NotEmpty import), and update all usages/constructors/JSON mappings/tests
that reference dataFileId to use dataFileIds so compilation and serialization
remain correct.
interface Embedder {
    val dim: Int
    fun embed(text: String): FloatArray
}
\ No newline at end of file
💡 Verification agent
🧩 Analysis chain
Verify implementor naming (“OpenAIEmbbeder” vs “OpenAIEmbedder”)
Typo in class/file names can confuse DI and readers.
🏁 Script executed:
#!/bin/bash
# Find embedder implementations and spot typos
rg -nP --type=kotlin -C2 '(class|interface)\s+OpenAIEm[bB]ed{1,2}er|Embedder' | sed -n '1,120p'
Length of output: 2492
Rename file to match class name
The file name OpenAIEmbbeder.kt contains a typo that doesn’t align with the class it declares (OpenAIEmbedder). This mismatch can confuse readers and tooling.
• src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt → rename to → OpenAIEmbedder.kt
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt around
lines 3 to 6 the file name contains a typo and does not match the declared class
OpenAIEmbedder; rename the file to OpenAIEmbedder.kt, update any
imports/usages/references across the project to the new filename, and ensure
package declarations and CI/build files (if referencing the path) still resolve
correctly so the class name and filename are consistent.
🧹 Nitpick (assertive)
Add batch embedding to improve throughput; keep single-call default
Batching reduces network overhead and cost. Provide a default method to avoid breaking implementors.
Apply:
interface Embedder {
val dim: Int
fun embed(text: String): FloatArray
+ fun embedAll(texts: List<String>): List<FloatArray> = texts.map(::embed)
}
Optionally consider making these suspend and handling upstream backpressure later.
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt around lines
3 to 6, add a default batch embedding method to the interface while keeping the
existing single-item embed to avoid breaking implementors: declare a new
function (e.g., fun embedBatch(texts: List<String>): List<FloatArray>) with a
default implementation that iterates over texts and calls the existing
embed(text: String) for each entry and returns the list of embeddings; keep the
existing val dim and fun embed as-is; optionally mark both signatures as suspend
later if you intend to introduce asynchronous/backpressure handling upstream.
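As an illustration of how the default method keeps existing implementors working while letting callers batch, here is a minimal sketch. The interface mirrors the suggestion above (using the embedAll name from the diff); FakeEmbedder and embedInBatches are hypothetical names introduced only for this example and are not part of the PR.

```kotlin
interface Embedder {
    val dim: Int
    fun embed(text: String): FloatArray

    // Default falls back to one call per text, so existing implementors need no changes.
    fun embedAll(texts: List<String>): List<FloatArray> = texts.map { embed(it) }
}

// Hypothetical in-memory embedder: deterministic vectors, counts single-text calls.
class FakeEmbedder(override val dim: Int = 4) : Embedder {
    var calls: Int = 0
        private set

    override fun embed(text: String): FloatArray {
        calls++
        return FloatArray(dim) { i -> (text.length + i).toFloat() }
    }
}

// Hypothetical caller-side helper: send chunks in fixed-size batches, preserving order.
fun embedInBatches(embedder: Embedder, chunks: List<String>, batchSize: Int): List<FloatArray> {
    require(batchSize > 0) { "batchSize must be positive" }
    return chunks.chunked(batchSize).flatMap { embedder.embedAll(it) }
}
```

An implementation backed by a remote API could override embedAll to send each batch in a single request; the caller-side helper would not need to change.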
package simplerag.ragback.domain.index.embed

import org.springframework.ai.openai.OpenAiEmbeddingModel
import org.springframework.stereotype.Component

@Component
class OpenAIEmbedder(
    private val openAiEmbeddingModel: OpenAiEmbeddingModel
) : Embedder {
    override val dim: Int = 1536
    override fun embed(text: String): FloatArray =
        openAiEmbeddingModel.embed(text)
}
🛠️ Refactor suggestion
Hard-coded dim=1536 will reject 3072-dim models; derive dimension from the configured model. Also, filename typo.
- If the configured OpenAI model is text-embedding-3-large (3072), current dim=1536 forces INVALID_INPUT.
- Consider inferring dim from the configured modelId (or a bean property) using your EmbeddingModel enum.
- Minor: file name has a typo (“Embbeder”). Consider renaming to OpenAIEmbedder.kt so the file name matches the class.
Apply this diff to make dim consistent with configuration (adjust imports if package paths differ):
package simplerag.ragback.domain.index.embed
import org.springframework.ai.openai.OpenAiEmbeddingModel
+import org.springframework.beans.factory.annotation.Value
import org.springframework.stereotype.Component
+import simplerag.ragback.domain.index.entity.enums.EmbeddingModel
@Component
-class OpenAIEmbedder(
- private val openAiEmbeddingModel: OpenAiEmbeddingModel
+class OpenAIEmbedder(
+ private val openAiEmbeddingModel: OpenAiEmbeddingModel,
+ @Value("\${spring.ai.openai.embedding.options.model:text-embedding-3-small}")
+ private val modelId: String
) : Embedder {
- override val dim: Int = 1536
+ override val dim: Int =
+ EmbeddingModel.findByModelId(modelId)?.dim
+ ?: error("Unsupported embedding modelId: $modelId")
override fun embed(text: String): FloatArray =
openAiEmbeddingModel.embed(text)
}
If you will support multiple models per index concurrently, consider a factory that returns an Embedder bound to req.embeddingModel instead of a single global bean.
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt lines
1-13, the embedder hardcodes dim=1536 which will break for 3072-dim models and
the filename/class name contains a typo; change the implementation to derive the
embedding dimension from the configured model (inject the configured modelId or
an EmbeddingModel/enum bean), map modelId to its dimension (e.g., 1536 for
text-embedding-3-small, 3072 for text-embedding-3-large) and set override val
dim accordingly with a sensible fallback and error if unknown, replace the
misspelled filename/class name (OpenAIEmbbeder -> OpenAiEmbedder) to match
project conventions and update imports/usages; optionally consider providing a
factory if multiple model types will be supported concurrently.
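The model-to-dimension mapping described above might be sketched as follows. `EmbeddingModelSpec` and `resolveDim` are hypothetical names for illustration; the project's own `EmbeddingModel` enum may be shaped differently. The two dimensions are the ones quoted in this comment.

```kotlin
// Hypothetical enum; the real project enum may differ in name and fields.
enum class EmbeddingModelSpec(val modelId: String, val dim: Int) {
    TEXT_EMBEDDING_3_SMALL("text-embedding-3-small", 1536),
    TEXT_EMBEDDING_3_LARGE("text-embedding-3-large", 3072);

    companion object {
        // Returns null for an unknown id so the caller decides how to fail.
        fun findByModelId(modelId: String): EmbeddingModelSpec? =
            values().firstOrNull { it.modelId == modelId }
    }
}

// Resolves the vector dimension for a configured model id and fails fast
// on an unsupported id rather than silently assuming 1536.
fun resolveDim(modelId: String): Int =
    EmbeddingModelSpec.findByModelId(modelId)?.dim
        ?: error("Unsupported embedding modelId: $modelId")
```

Failing at startup on an unknown model id surfaces a misconfiguration before any index is created, which is easier to diagnose than a dimension mismatch at insert time.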
| override fun urlFromKey(key: String): String = | ||
| s3.utilities() | ||
| .getUrl { it.bucket(bucket).key(key) } | ||
| .toExternalForm() | ||
| "https://mukit-s3.s3.ap-northeast-2.amazonaws.com/" + key |
Hard-coded S3 URL will break across environments and mishandles URL encoding
Using a fixed bucket/region string prevents deploy-time changes and can yield invalid URLs for keys with spaces or special chars. Let the AWS SDK construct the URL (handles region and encoding).
Apply:
- override fun urlFromKey(key: String): String =
- "https://mukit-s3.s3.ap-northeast-2.amazonaws.com/" + key
+ override fun urlFromKey(key: String): String =
+ s3.utilities().getUrl { it.bucket(bucket).key(key) }.toExternalForm()
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt around lines
60-61, the code returns a hard-coded S3 URL which embeds bucket/region and does
not handle URL encoding; replace this with a call into the AWS SDK so the SDK
builds a correct, encoded URL and you can avoid hard-coded bucket/region. Inject
or read the configured bucket/region values from your app config, then call the
S3 client utilities to build the URL (e.g., use S3Utilities.getUrl/GetUrlRequest
or S3Client.utilities().getUrl(...) / S3Presigner for presigned URLs) with the
bucket and key; return the SDK-produced URL string instead of concatenating
strings so encoding and region resolution are handled correctly.
| object TextChunker { | ||
| fun chunkByCharsSeq(raw: String, size: Int, overlap: Int): Sequence<String> = sequence { |
🧹 Nitpick (assertive)
Consider an optional word-boundary mode to reduce mid-word splits.
Character windows are fine for a first pass. If quality matters, add a whitespace-aware variant (e.g., try to extend to the next whitespace within a small slack).
I can add chunkByWordsSeq(...) with a small slack range if desired.
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt around lines
3-4, add a whitespace-aware variant (either an optional mode on chunkByCharsSeq
or a new chunkByWordsSeq) that avoids splitting words: when computing each
window end, allow a small "slack" (configurable Int) to look forward up to slack
characters for the next whitespace and extend the end to that whitespace if
found; ensure the produced chunk size does not exceed size+slack, preserve the
requested overlap by computing the next window start = previousStart + size -
overlap (or adjust start based on the extended end while maintaining overlap
semantics), handle edge cases (no whitespace found within slack, end of string)
by falling back to the original character-boundary end, and keep this as a
Sequence<String> generator to match existing API.
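One possible shape for the whitespace-aware variant is sketched below. The function name `chunkByWordsSeq`, the `slack` parameter, and its default value are assumptions taken from the suggestion above, not existing project code. Unlike the prompt's fallback rule, this sketch also treats the end of the input as a valid boundary when it lies within the slack.

```kotlin
// Sketch of a whitespace-aware chunker. Each window starts at a fixed stride
// of (size - overlap); only the window end is allowed to move forward.
fun chunkByWordsSeq(raw: String, size: Int, overlap: Int, slack: Int = 16): Sequence<String> {
    require(size > 0) { "size must be positive" }
    require(overlap in 0 until size) { "overlap must be in [0, size)" }
    require(slack >= 0) { "slack must be non-negative" }

    return sequence {
        var start = 0
        while (start < raw.length) {
            val hardEnd = minOf(start + size, raw.length)
            val limit = minOf(hardEnd + slack, raw.length)
            // Look forward at most `slack` characters for whitespace or the end
            // of the input; if neither is found, keep the character boundary.
            val boundary = (hardEnd..limit).firstOrNull {
                it == raw.length || raw[it].isWhitespace()
            }
            val end = boundary ?: hardEnd
            yield(raw.substring(start, end))
            if (end >= raw.length) break
            start += size - overlap
        }
    }
}
```

For example, `chunkByWordsSeq("hello world foo", size = 8, overlap = 2, slack = 5).toList()` yields `["hello world", "world foo"]`, while input with no whitespace falls back to plain character windows. Because only the end moves, a chunk never exceeds `size + slack` characters and the stride between window starts stays constant.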
| private fun normalize(s: String): String = | ||
| s.replace("\r\n", "\n").replace("\r", "\n") | ||
| .replace(Regex("[ \t]+"), " ") | ||
| .replace(Regex("\\n{3,}"), "\n\n") | ||
| .trim() |
🧹 Nitpick (assertive)
Avoid recompiling regexes on every call; precompile once.
Creating Regex instances per call adds overhead for large inputs. Hoist them to private val fields.
-object TextChunker {
+object TextChunker {
+ private val SPACES = Regex("[ \\t]+")
+ private val MANY_NL = Regex("\\n{3,}")
@@
- private fun normalize(s: String): String =
- s.replace("\r\n", "\n").replace("\r", "\n")
- .replace(Regex("[ \t]+"), " ")
- .replace(Regex("\\n{3,}"), "\n\n")
- .trim()
+ private fun normalize(s: String): String =
+ s.replace("\r\n", "\n").replace("\r", "\n")
+ .replace(SPACES, " ")
+ .replace(MANY_NL, "\n\n")
+ .trim()
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt around lines 22
to 26, the normalize function currently constructs Regex("[ \t]+") and
Regex("\\n{3,}") on every call which is wasteful; hoist these regexes to
precompiled private vals (either as private val fields in the class or inside a
companion object/top-level private vals) like private val MULTI_WHITESPACE =
Regex("[ \t]+") and private val MULTIPLE_BLANK_LINES = Regex("\\n{3,}") and then
replace the inline Regex(...) usages in normalize with those precompiled vals so
Regex objects are reused across calls.
| @SpringBootTest | ||
| @ActiveProfiles("test") | ||
| class DataFileServiceTest( | ||
| private val dataFileService: DataFileService, | ||
| private val dataFileRepository: DataFileRepository, | ||
| private val tagRepository: TagRepository, | ||
| private val dataFileTagRepository: DataFileTagRepository, | ||
| private val s3Util: FakeS3Util | ||
| @Autowired val dataFileService: DataFileService, | ||
| @Autowired val dataFileRepository: DataFileRepository, | ||
| @Autowired val tagRepository: TagRepository, | ||
| @Autowired val dataFileTagRepository: DataFileTagRepository, | ||
| @Autowired val s3Util: FakeS3Util | ||
| ) { |
🧹 Nitpick (assertive)
Constructor injection in tests: add @TestConstructor(autowireMode = ALL) for clarity and portability.
Relying on @Autowired on constructor params alone can be fragile across Boot versions. Align with IndexServiceTest by enabling constructor autowiring explicitly.
@SpringBootTest
@ActiveProfiles("test")
+@org.springframework.test.context.TestConstructor(autowireMode = org.springframework.test.context.TestConstructor.AutowireMode.ALL)
class DataFileServiceTest(
- @Autowired val dataFileService: DataFileService,
- @Autowired val dataFileRepository: DataFileRepository,
- @Autowired val tagRepository: TagRepository,
- @Autowired val dataFileTagRepository: DataFileTagRepository,
- @Autowired val s3Util: FakeS3Util
+ @Autowired val dataFileService: DataFileService,
+ @Autowired val dataFileRepository: DataFileRepository,
+ @Autowired val tagRepository: TagRepository,
+ @Autowired val dataFileTagRepository: DataFileTagRepository,
+ @Autowired val s3Util: FakeS3Util
)
🤖 Prompt for AI Agents
In
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt
around lines 31 to 39, the test class relies on constructor @Autowired
parameters but lacks the explicit Spring Boot test constructor autowiring
annotation; add @TestConstructor(autowireMode =
TestConstructor.AutowireMode.ALL) above the class declaration (and import
org.springframework.test.context.TestConstructor) so constructor injection is
explicit and consistent with IndexServiceTest.
| @Autowired val indexService: IndexService, | ||
| @Autowired val indexRepository: IndexRepository, |
🧹 Nitpick (assertive)
Drop redundant @Autowired on constructor params (you already enabled constructor autowiring).
Since the class is annotated with @TestConstructor(autowireMode = ALL), the @Autowired annotations on the primary constructor parameters are unnecessary noise.
Apply this diff:
-class IndexServiceTest(
- @Autowired val indexService: IndexService,
- @Autowired val indexRepository: IndexRepository,
+class IndexServiceTest(
+ val indexService: IndexService,
+ val indexRepository: IndexRepository,
)
🤖 Prompt for AI Agents
In src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
around lines 30 to 31, the primary constructor parameters are redundantly
annotated with @Autowired despite @TestConstructor(autowireMode = ALL) enabling
constructor autowiring; remove the @Autowired annotations from the constructor
parameters (keep the parameter types and names intact) so the test uses
constructor injection provided by the class-level configuration and avoid
unnecessary annotations.
Actionable comments posted: 2
♻️ Duplicate comments (1)
build.gradle (1)
51-55: Remove duplicate vector-store dependency from test classpath
spring-ai-starter-vector-store-pgvector is already on the main classpath (Line 49). Duplicating it in tests can mask classpath issues and slow resolution.
Apply:
- testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
build.gradle (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt:16-22
Timestamp: 2025-08-17T13:12:02.200Z
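Learning: catturtle123 decided to implement the Index entity's overlapSize ≤ chunkingSize validation in the service layer rather than at the entity level, and plans to add that validation logic in the PR that stores chunking embeddings, so it should be mentioned in that PR (Simple-RAG/BACK project)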
🔇 Additional comments (1)
build.gradle (1)
39-49: Sanity-check Spring AI BOM alignment
I see you're importing the Spring AI BOM in build.gradle (lines 65–69) via
dependencyManagement { imports { mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}" } }
and you declare the two starters without explicit versions on lines 39 and 49 (plus a testImplementation at line 55):
implementation 'org.springframework.ai:spring-ai-starter-model-openai'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
Please verify the following:
• Ensure springAiVersion is defined (for example, in gradle.properties).
• Set your JAVA_HOME to a valid JDK installation, then run:
./gradlew dependencies --configuration runtimeClasspath \
  | grep 'org.springframework.ai:spring-ai-starter'
• Confirm both starters resolve to version ${springAiVersion} (i.e. you see lines like
org.springframework.ai:spring-ai-starter-model-openai : ${springAiVersion}
and
org.springframework.ai:spring-ai-starter-vector-store-pgvector : ${springAiVersion}).
If you encounter any "unmanaged version" warnings or failures, bump springAiVersion to a release that includes both starters. Happy to help draft a compatibility matrix if needed.
| implementation 'com.fasterxml.jackson.module:jackson-module-kotlin' | ||
| implementation 'org.jetbrains.kotlin:kotlin-reflect' | ||
| // implementation 'org.springframework.ai:spring-ai-starter-model-openai' | ||
| implementation 'org.springframework.ai:spring-ai-starter-model-openai' |
💡 Verification agent
🧩 Analysis chain
OpenAI starter added — ensure tests don’t require a real API key or make network calls
Adding spring-ai-starter-model-openai will create OpenAI beans at context startup. Make sure your test profile stubs or disables outbound calls so unit tests are hermetic.
Run this to verify there’s a test profile and it isolates OpenAI config:
If you want, I can provide a minimal application-test.yml that sets a dummy key and a @TestConfiguration that supplies a no-op EmbeddingModel bean for tests.
🏁 Script executed:
#!/bin/bash
# 1) Check for test profile configs
fd -a 'application-*.yml' -uu | sed -n 's#^#- #p'
# 2) Look for OpenAI properties in test profile(s)
rg -n --glob 'src/**/resources/**' -C2 'spring\.ai\.openai\.(api-key|base-url|embedding|chat)'
# 3) Grep for any tests using real OpenAI clients
rg -nP --type=kt --type=java -C2 '\bOpenAI|OpenAi|openai|EmbeddingModel|ChatClient'
Length of output: 317
🏁 Script executed:
#!/bin/bash
# 1) Search for tests that bootstrap the Spring context (which would load OpenAI beans)
rg -n '@SpringBootTest' -C2 --glob '**/*.java' --glob '**/*.kt'
# 2) Search for any @TestConfiguration classes that might override the OpenAI beans
rg -n '@TestConfiguration' -C2 --glob '**/*.java' --glob '**/*.kt'
# 3) Confirm whether a test-specific application-test.yml exists
fd -a 'application-test*.yml' -uu | sed -n 's#^#- #p'
Length of output: 1699
🏁 Script executed:
#!/bin/bash
# Display contents of the test profile configuration
sed -n '1,200p' src/test/resources/application-test.yml
Length of output: 235
Configure test profile to stub or disable OpenAI beans
Your application-test.yml (src/test/resources/application-test.yml) only configures the datasource and JPA—there are no overrides for spring.ai.openai.*, and no @TestConfiguration to stub out the OpenAI clients. Because your tests use @SpringBootTest with the “test” profile, they will attempt to instantiate the real OpenAI beans at startup, leading to missing-API-key errors or unwanted network calls.
• application-test.yml (lines 1–200) shows only:
spring:
  datasource:
    driver-class-name: org.postgresql.Driver
  jpa:
    database-platform: org.hibernate.dialect.PostgreSQLDialect
    hibernate:
      ddl-auto: create-drop
(no spring.ai.openai.* props)
• No @TestConfiguration or @MockBean was found to override EmbeddingModel, ChatModel, or the OpenAI client.
• Tests in src/test/kotlin/** annotated with @SpringBootTest + @ActiveProfiles("test") will load the real starter beans and fail (or hit the network).
To fix, you can:
- Add dummy OpenAI props to your test profile:
spring:
  ai:
    openai:
      api-key: dummy
      base-url: http://localhost:8080
  # …existing datasource/jpa…
- Stub out the OpenAI beans, for example via a @TestConfiguration class in src/test/java or src/test/kotlin (a sketch using Mockito mocks; adjust the bean types to the ones your code actually injects):
@TestConfiguration
public class OpenAIMockConfig {
    @Bean
    EmbeddingModel embeddingModel() { return Mockito.mock(EmbeddingModel.class); }
    @Bean
    ChatModel chatModel() { return Mockito.mock(ChatModel.class); }
}
- (Alternate) Use @MockBean in your test classes:
@SpringBootTest
@ActiveProfiles("test")
class YourServiceTest {
    @MockBean EmbeddingModel embeddingModel;
    @MockBean ChatModel chatModel;
    // …
}
Let me know if you'd like a complete minimal example of application-test.yml plus test-stub config.
🤖 Prompt for AI Agents
In build.gradle around line 39, the test profile doesn't override or stub OpenAI
beans so @SpringBootTest with profile "test" will instantiate real OpenAI
clients; add test-time configuration to prevent real API usage by (1) adding
dummy OpenAI properties to src/test/resources/application-test.yml
(spring.ai.openai.api-key and base-url) and (2) providing test-only beans that
stub EmbeddingModel and ChatModel via a @TestConfiguration class in
src/test/java or src/test/kotlin (or alternatively annotate tests with @MockBean
for those interfaces) so tests run without network calls or missing-key errors.
| // postgresql | ||
| implementation 'org.postgresql:postgresql' | ||
| implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector' | ||
|
|
🧹 Nitpick (assertive)
Prefer runtimeOnly for the JDBC driver to keep the compile classpath lean
The driver isn’t needed at compile time. Switching to runtimeOnly follows Spring Boot’s guidance and reduces classpath noise.
Apply:
- implementation 'org.postgresql:postgresql'
+ runtimeOnly 'org.postgresql:postgresql'
🤖 Prompt for AI Agents
In build.gradle around lines 47 to 50, the PostgreSQL JDBC driver is declared as
an implementation dependency even though it is only required at runtime; change
the org.postgresql:postgresql dependency to runtimeOnly
'org.postgresql:postgresql' (leave the spring-ai-starter-vector-store-pgvector
dependency as implementation) so the driver is not on the compile classpath and
follows Spring Boot guidance.
📌 Overview
indexing
🔍 Related Issues
✨ Changes
-indexing
🐛 Fix: resolve Autowired issue
✨ feature: indexing
📸 Screenshots / Test Results (Optional)
Attach images or videos if necessary.
✅ Checklist
🗒️ Additional Notes
Add any other context or information here.
Summary by CodeRabbit
New Features
Refactor
Chores
Tests