✨ Feature: add content extractor #20
Walkthrough
Replaces S3-based file handling with in-app content extraction and storage. Adds PDF/DOCX/TXT extractors and a resolver. Changes entity ids to non-null, changes DataFile to store content (LOB) instead of a URL, updates services and DTOs accordingly, removes the S3 and URL loader utilities and config, and adjusts tests and dependencies.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant DataFileService
participant ExtractorResolver
participant Extractor
participant DataFileRepo
participant TagRepo
participant DataFileTagRepo
Client->>DataFileService: upload(files, items)
loop for each file
DataFileService->>ExtractorResolver: extractContent(file, type)
ExtractorResolver->>Extractor: supports(type)?
alt supported
ExtractorResolver->>Extractor: extract(file)
Extractor-->>ExtractorResolver: content (String)
else unsupported
ExtractorResolver-->>DataFileService: throw INVALID_FILE_TYPE
end
DataFileService->>DataFileRepo: save(DataFile.with(content))
end
DataFileService->>TagRepo: find/create tags
DataFileService->>DataFileTagRepo: saveAll(mappings)
DataFileService-->>Client: upload response
sequenceDiagram
participant Client
participant IndexService
participant DataFileRepo
participant Embedder
participant IndexRepo
participant ChunkEmbeddingRepo
Client->>IndexService: createIndex(request{dataFileIds,...})
IndexService->>DataFileRepo: findAllById(dataFileIds)
loop for each DataFile
IndexService->>IndexService: chunk(DataFile.content)
IndexService->>Embedder: embed(chunk)
Embedder-->>IndexService: vector
IndexService->>ChunkEmbeddingRepo: save(embedding)
end
IndexService->>IndexRepo: save(Index)
IndexService-->>Client: IndexDetailResponse
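The supports/extract handshake in the upload diagram can be sketched in plain Kotlin. This is only an illustration under assumptions: the interface shape, the ByteArray input, the TxtExtractor class, and UnsupportedFileTypeException are hypothetical stand-ins, not the PR's actual ContentExtractor or CustomException code.

```kotlin
// Hypothetical shapes; the PR's real ContentExtractor API may differ.
interface ContentExtractor {
    fun supports(contentType: String): Boolean
    fun extract(bytes: ByteArray): String
}

// Stand-in for the project's CustomException(ErrorCode.INVALID_FILE_TYPE).
class UnsupportedFileTypeException(contentType: String) :
    RuntimeException("INVALID_FILE_TYPE: $contentType")

// Plain-text extractor: decodes the upload as UTF-8.
class TxtExtractor : ContentExtractor {
    override fun supports(contentType: String) = contentType == "text/plain"
    override fun extract(bytes: ByteArray) = bytes.toString(Charsets.UTF_8)
}

// Asks each extractor whether it supports the type and delegates to the first match.
class ExtractorResolver(private val extractors: List<ContentExtractor>) {
    fun extractContent(bytes: ByteArray, contentType: String): String {
        val extractor = extractors.firstOrNull { it.supports(contentType) }
            ?: throw UnsupportedFileTypeException(contentType)
        return extractor.extract(bytes)
    }
}

fun main() {
    val resolver = ExtractorResolver(listOf(TxtExtractor()))
    println(resolver.extractContent("hello".toByteArray(), "text/plain"))
}
```

Registering extractors as a list keeps the resolver unchanged when a new format is added; only a new ContentExtractor implementation is needed.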
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~75 minutes
Actionable comments posted: 40
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (10)
src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (2)
34-51: Streaming hash implementation looks good; minor reuse and clarity
Reuse the existing sha256Hex helper to avoid duplicate hex logic.
- val sha256 = digest.digest().joinToString("") { "%02x".format(it) }
- return FileMetrics(sha256, totalBytes)
+ return FileMetrics(sha256Hex(digest.digest()), totalBytes)
Efficient streaming with DigestInputStream and an 8KB buffer is appropriate.
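For readers without the file open, a minimal sketch of the streaming approach praised above follows. It assumes a FileMetrics data class and a sha256Hex helper with the shapes shown here; the project's real signatures are not visible in this review, and computeMetrics is a hypothetical name.

```kotlin
import java.io.ByteArrayInputStream
import java.io.InputStream
import java.security.DigestInputStream
import java.security.MessageDigest

// Assumed shape; the project's FileMetrics may carry different field names.
data class FileMetrics(val sha256: String, val sizeBytes: Long)

// Assumed helper: lower-case hex encoding of a digest.
fun sha256Hex(bytes: ByteArray): String = bytes.joinToString("") { "%02x".format(it) }

// Hashes and counts bytes in one pass, holding only an 8 KB buffer in memory.
fun computeMetrics(input: InputStream): FileMetrics {
    val digest = MessageDigest.getInstance("SHA-256")
    var total = 0L
    DigestInputStream(input, digest).use { stream ->
        val buffer = ByteArray(8 * 1024)
        while (true) {
            val read = stream.read(buffer) // reading updates the digest as a side effect
            if (read == -1) break
            total += read
        }
    }
    return FileMetrics(sha256Hex(digest.digest()), total)
}

fun main() {
    println(computeMetrics(ByteArrayInputStream("abc".toByteArray())))
}
```

Because the digest is updated while the stream is read, the upload never has to be fully buffered just to compute its hash.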
20-31: Enforce extension-first allowlist to match available extractors
I verified that the codebase only provides TxtContentExtractor, PdfContentExtractor, and DocxContentExtractor; there are no extractors or tests for CSV, Markdown, or JSON formats. Therefore the FileConvertUtil.resolveContentType() implementation must be restricted to exactly those three types, and must not trust a spoofed contentType.
Please apply the following mandatory refactor in src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt:
-private val supportedByExt = mapOf(
-    "pdf" to "application/pdf",
-    "txt" to "text/plain",
-    "csv" to "text/csv",
-    "md" to "text/markdown",
-    "json" to "application/json",
-    "docx" to "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
-)
+private val supportedByExt = mapOf(
+    "pdf" to "application/pdf",
+    "txt" to "text/plain",
+    "docx" to "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
+)
 private val supportedContentTypes = supportedByExt.values.toSet()

 fun MultipartFile.resolveContentType(): String {
     // 1) Derive from filename extension only if supported
     val ext = originalFilename
-        ?.substringAfterLast('.', "")
+        ?.substringAfterLast('.', "")
         ?.lowercase()
     supportedByExt[ext]?.let { return it }

     // 2) Fall back to client-provided contentType only if it’s in the allowlist
     val ct = contentType?.lowercase()
     if (ct != null && ct in supportedContentTypes) {
         return ct
     }
     throw CustomException(ErrorCode.INVALID_FILE_TYPE)
 }
- If CSV, MD or JSON support is truly required, please add corresponding ContentExtractor implementations and unit tests before re-adding them here.
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2)
14-18: Enforce embedding size invariants close to the data to prevent runtime index errors early
You’re storing FloatArray in a vector(1536) column while also persisting embeddingDim. If embedding.size doesn’t match embeddingDim (or 1536), you’ll get DB errors or silent data skew.
- If you plan to support multiple models with different dimensions, avoid hardcoding vector(1536) at the column level (use vector and enforce dimension elsewhere), or segregate by table per dimension.
- Add an entity-level check to fail fast.
Apply this guard:
 class ChunkEmbedding(
@@
     var embedding: FloatArray,
@@
     val embeddingDim: Int,
@@
     val index: Index,
-) : BaseEntity() {
+) : BaseEntity() {
+    init {
+        require(embedding.isNotEmpty()) { "embedding must not be empty" }
+        require(embedding.size == embeddingDim) {
+            "embedding size (${embedding.size}) must equal embeddingDim ($embeddingDim)"
+        }
+        // If the column is vector(1536), also enforce it here to avoid DB-time failures.
+        // require(embedding.size == 1536) { "embedding must be 1536-dim to fit vector(1536) column" }
+    }
14-16: Ensure proper Hibernate mapping for the vector(1536) column
It looks like the project currently depends on the JDBC library com.pgvector:pgvector:0.1.6, but I did not find any Hibernate vector module, JPA AttributeConverter, or @JdbcTypeCode/@Array annotations in the Kotlin entity classes. Without one of these, Hibernate will treat FloatArray as a standard SQL array (float4[]) and fail to read/write the Postgres vector(1536) type. (mvnrepository.com)
Please address this in one of the following ways:
Use Hibernate 6.4+ built-in vector support
- Add the Maven/Gradle dependency org.hibernate.orm:hibernate-vector:6.4.x
- In src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (lines 14–16), update the embedding property:
import org.hibernate.annotations.Array
import org.hibernate.annotations.JdbcTypeCode
import org.hibernate.type.SqlTypes

@Column(name = "embedding", columnDefinition = "vector(1536)", nullable = false)
@JdbcTypeCode(SqlTypes.VECTOR)
@Array(length = 1536)
var embedding: FloatArray
See the official pgvector-Java README for Hibernate usage. (github.com)
Or use the PGvector Java type directly
- Change the property to var embedding: PGvector
- Implement or add an AttributeConverter<PGvector, PGvector> (or use a community pgvector-Hibernate integration) to marshal between PGvector and the database.
File to update: src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (lines 14–16)
src/main/kotlin/simplerag/ragback/domain/prompt/entity/FewShot.kt (1)
13-19: Nit: Consider columnDefinition for large text fields for portability/clarity
You’re using @Lob on answer and evidence. Depending on the dialect, specifying columnDefinition = "text" (for PostgreSQL) can reduce ambiguity in schema generation. Non-blocking.
Example:
- @Column(name = "answer", nullable = false)
+ @Column(name = "answer", nullable = false, columnDefinition = "text")
@@
- @Column(name = "evidence", nullable = false)
+ @Column(name = "evidence", nullable = false, columnDefinition = "text")
src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (2)
49-59: Guard domain invariants early (chunking vs overlap, trimming)
- Ensure overlapSize is less than chunkingSize to avoid infinite or degenerate chunking.
- You already trim snapshotName; good.
Apply:
 fun toIndex(createRequest: IndexCreateRequest): Index {
+    require(createRequest.chunkingSize > 0) { "chunkingSize must be > 0" }
+    require(createRequest.overlapSize >= 0) { "overlapSize must be >= 0" }
+    require(createRequest.overlapSize < createRequest.chunkingSize) {
+        "overlapSize (${createRequest.overlapSize}) must be less than chunkingSize (${createRequest.chunkingSize})"
+    }
     return Index(
         snapshotName = createRequest.snapshotName.trim(),
         overlapSize = createRequest.overlapSize,
         chunkingSize = createRequest.chunkingSize,
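To see why overlapSize must stay below chunkingSize, consider a character-based sliding-window chunker. The sketch below is a hypothetical stand-in (chunkByChars is not the project's TextChunker): each window advances by chunkSize - overlap, so a non-positive step would never make progress.

```kotlin
// Hypothetical chunker, not the project's TextChunker: emits windows of
// chunkSize characters, each sharing `overlap` characters with the previous one.
fun chunkByChars(text: String, chunkSize: Int, overlap: Int): Sequence<String> {
    require(chunkSize > 0) { "chunkSize must be > 0" }
    require(overlap >= 0) { "overlap must be >= 0" }
    require(overlap < chunkSize) { "overlap must be less than chunkSize" }
    val step = chunkSize - overlap // strictly positive because of the checks above
    return sequence {
        var start = 0
        while (start < text.length) {
            val end = minOf(start + chunkSize, text.length)
            yield(text.substring(start, end))
            if (end == text.length) break // last window reached the end of the text
            start += step
        }
    }
}

fun main() {
    println(chunkByChars("abcdefghij", chunkSize = 4, overlap = 2).toList())
}
```

With overlap equal to chunkSize the step would be zero and the loop would emit the same window forever, which is the degenerate case the require calls rule out before any chunking starts.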
62-69: Also validate on updates to avoid drifting into invalid state
Mirror the same requirements in update() to keep invariants consistent.
Apply:
 fun update(req: IndexUpdateRequest) {
-    snapshotName = req.snapshotName.trim()
-    chunkingSize = req.chunkingSize
-    overlapSize = req.overlapSize
+    require(req.chunkingSize > 0) { "chunkingSize must be > 0" }
+    require(req.overlapSize >= 0) { "overlapSize must be >= 0" }
+    require(req.overlapSize < req.chunkingSize) {
+        "overlapSize (${req.overlapSize}) must be less than chunkingSize (${req.chunkingSize})"
+    }
+    snapshotName = req.snapshotName.trim()
+    chunkingSize = req.chunkingSize
+    overlapSize = req.overlapSize
     similarityMetric = req.similarityMetric
     topK = req.topK
     reranker = req.reranker
 }
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (1)
33-41: Fix non-nullable id lookup in tag mapping
The DataFile.id property is declared as a non-nullable Long (val id: Long = 0), so the safe call (?.let { … }) is redundant; Kotlin flags it as an unnecessary safe call. Update the tag lookup to use the non-nullable id directly:
• File: src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (within the from function)
- val tags = file.id?.let { tagsByFileId[it] } ?: emptyList()
+ val tags = tagsByFileId[file.id] ?: emptyList()
This change simplifies the code and aligns with the non-nullable declaration of DataFile.id.
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (2)
58-89: Add happy-path extractor coverage for PDF/DOCX.
You’re validating TXT extraction; consider adding similar tests that upload small PDF and DOCX files and assert non-blank content and correct tag normalization. I can draft minimal in-memory fixtures if helpful.
174-187: Fix PDF MIME type in tests
The text/pdf value isn’t a valid PDF MIME type; it should be application/pdf. Updating this in your fixtures and assertions will keep the tests accurate and reduce confusion for future readers.
Affected locations:
- src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt:183
- src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt:206
Proposed diff:
--- a/src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt
@@ -181,7 +181,7 @@
 DataFile(
     title = "exists2",
-    type = "text/pdf",
+    type = "application/pdf",
     sizeBytes = 0,
     sha256 = sha2,
     content = "fake://original/exists.txt",
@@ -204,7 +204,7 @@
 val dataFileDetailResponse2 = dataFiles.dataFileDetailResponseList[1]
 assertEquals(dataFileDetailResponse2.title, "exists2")
-assertEquals(dataFileDetailResponse2.type, "text/pdf")
+assertEquals(dataFileDetailResponse2.type, "application/pdf")
 assertEquals(dataFileDetailResponse2.sizeMB, 0.0)
 assertEquals(dataFileDetailResponse2.sha256, sha2)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (34)
- build.gradle (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileRequestDTO.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (2 hunks)
- src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/document/entity/Tag.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (5 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (2 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/embed/FakeEmbder.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/entity/DataFileIndex.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (4 hunks)
- src/main/kotlin/simplerag/ragback/domain/prompt/entity/FewShot.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/prompt/entity/Prompt.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/config/S3Config.kt (0 hunks)
- src/main/kotlin/simplerag/ragback/global/error/ErrorCode.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt (0 hunks)
- src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt (2 hunks)
- src/main/kotlin/simplerag/ragback/global/util/extractor/ContentExtractor.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/util/extractor/ContentExtractorResolver.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/util/extractor/DocxContentExtractor.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/util/extractor/PdfContentExtractor.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/util/extractor/TxtContentExtractor.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt (0 hunks)
- src/main/kotlin/simplerag/ragback/global/util/loader/HttpContentLoader.kt (0 hunks)
- src/main/kotlin/simplerag/ragback/global/util/s3/S3Type.kt (0 hunks)
- src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt (0 hunks)
- src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt (0 hunks)
- src/main/resources/application-local.yml (2 hunks)
- src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (5 hunks)
- src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (5 hunks)
- src/test/resources/application-test.yml (1 hunks)
💤 Files with no reviewable changes (7)
- src/main/kotlin/simplerag/ragback/global/config/S3Config.kt
- src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt
- src/main/kotlin/simplerag/ragback/global/storage/FakeS3Util.kt
- src/main/kotlin/simplerag/ragback/global/util/loader/HttpContentLoader.kt
- src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt
- src/main/kotlin/simplerag/ragback/global/util/s3/S3Type.kt
- src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt
🧰 Additional context used
🧬 Code graph analysis (17)
src/main/kotlin/simplerag/ragback/global/error/ErrorCode.kt (2)
src/main/kotlin/simplerag/ragback/global/error/CustomException.kt (4)
errorCode(19-23), errorCode(14-17), errorCode(9-12), errorCode(3-7)
src/main/kotlin/simplerag/ragback/global/error/GlobalExceptionHandler.kt (2)
handleFileException(72-78), handleMissingPart(37-43)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt (2)
dataFileId(11-34), max(36-53)
src/main/kotlin/simplerag/ragback/domain/index/controller/IndexController.kt (1)
updateIndexes(41-48)
src/main/kotlin/simplerag/ragback/global/util/extractor/ContentExtractor.kt (2)
src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt (2)
load(4-6), load(5-5)
src/main/kotlin/simplerag/ragback/global/util/s3/S3Util.kt (1)
upload(5-11)
src/main/kotlin/simplerag/ragback/domain/index/entity/DataFileIndex.kt (2)
src/main/kotlin/simplerag/ragback/global/entity/BaseEntity.kt (1)
name(11-21)
src/main/kotlin/simplerag/ragback/domain/chat/entity/Model.kt (1)
name(8-29)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (1)
src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt (1)
load(4-6)
src/main/kotlin/simplerag/ragback/domain/index/embed/FakeEmbder.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt (2)
openAiEmbeddingModel(6-13), embed(11-12)
src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt (2)
dim(3-6), embed(5-5)
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)
src/main/kotlin/simplerag/ragback/domain/index/repository/ChunkEmbeddingRepository.kt (1)
interface ChunkEmbeddingRepository : JpaRepository<ChunkEmbedding, Long> (6-6)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)
src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)
dataFileService(20-64)
src/main/kotlin/simplerag/ragback/domain/prompt/entity/FewShot.kt (2)
src/main/kotlin/simplerag/ragback/global/entity/BaseEntity.kt (1)
name(11-21)
src/main/kotlin/simplerag/ragback/domain/chat/entity/Model.kt (1)
name(8-29)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1)
src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)
dataFileService(20-64)
src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt (1)
dataFileId(11-34)
src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt (1)
src/main/kotlin/simplerag/ragback/domain/document/repository/DataFileTagRepository.kt (1)
existsByDataFileIdAndTagId(8-15)
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt (1)
dataFileId(11-34)
src/main/kotlin/simplerag/ragback/domain/prompt/entity/Prompt.kt (2)
src/main/kotlin/simplerag/ragback/domain/chat/entity/Model.kt (1)
name(8-29)
src/main/kotlin/simplerag/ragback/global/entity/BaseEntity.kt (1)
name(11-21)
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt (1)
openAiEmbeddingModel(6-13)
src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt (2)
dim(3-6), embed(5-5)
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileRequestDTO.kt (1)
src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)
dataFileService(20-64)
src/main/kotlin/simplerag/ragback/domain/document/entity/Tag.kt (1)
src/main/kotlin/simplerag/ragback/global/entity/BaseEntity.kt (1)
name(11-21)
🪛 YAMLlint (1.37.1)
src/test/resources/application-test.yml
[error] 14-14: no new line character at the end of file
(new-line-at-end-of-file)
src/main/resources/application-local.yml
[error] 23-23: no new line character at the end of file
(new-line-at-end-of-file)
🪛 detekt (1.23.8)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt
[warning] 56-56: The caught exception is swallowed. The original exception could be lost.
(detekt.exceptions.SwallowedException)
🔇 Additional comments (23)
src/main/kotlin/simplerag/ragback/global/error/ErrorCode.kt (1)
16-16: ✅ File type validation consistently surfaces INVALID_FILE_TYPE exception
I’ve confirmed that all content-type resolution now goes through MultipartFile.resolveContentType(), which throws CustomException(ErrorCode.INVALID_FILE_TYPE) for any unsupported extension. A search for manual extension checks or default content-type fallbacks found no other occurrences outside of:
- src/main/kotlin/simplerag/ragback/global/util/converter/FileConvertUtil.kt
- src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt
No additional call sites need updates; this change fully covers unsupported file-type handling.
src/main/kotlin/simplerag/ragback/domain/index/entity/DataFileIndex.kt (1)
19-24: Ensure JPA-generated IDs are nullable and mutable
I’ve verified there are no other Kotlin entities defining an @Id field with a default numeric value (sentinel 0). Only DataFileIndex.kt needs updating:
• File: src/main/kotlin/simplerag/ragback/domain/index/entity/DataFileIndex.kt
Apply:
 ) : BaseEntity() {
     @Id
     @GeneratedValue(strategy = GenerationType.IDENTITY)
     @Column(name = "data_files_indexes_id")
-    val id: Long = 0
+    var id: Long? = null
 }
With this change, JPA will correctly recognize an uninitialized (null) identifier before persistence.
src/main/resources/application-local.yml (2)
15-17: Suspicious empty key under spring.ai.model.embedding
options: is empty and then model: text-embedding-3-small sits at the same indentation level. If the binder expects spring.ai.model.embedding.model, the empty options: line is unnecessary and could be misleading.
Confirm your binder paths and consider removing the empty options: or nesting under it if intentional.
9-9: Verify create-drop is confined to the local profile
Search across src/main/resources shows only application-local.yml defines ddl-auto: create-drop.
No other profiles or the default configuration specify spring.jpa.hibernate.ddl-auto with a destructive setting.
Please ensure that:
- The local profile is strictly used in disposable, developer-only environments.
- There’s no path in CI/CD or shared servers that activates the local profile.
- You have safeguards (documentation, deployment scripts, environment checks) to prevent accidental use of this profile against any persistent/shared database.
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (1)
35-35: Non-null indexId is safe to keep
All uses of toIndexPreviewResponse and toIndexDetailResponse happen after the Index entity has been saved or loaded from the repository, so index.id is guaranteed to be non-null. No controller returns these DTOs before persistence, and all service methods map only fully initialized entities.
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileRequestDTO.kt (2)
7-11: Nice: Request-level @Valid on items ensures nested validation is applied
Good use of container element validation with a clear minimum-size message.
11-20: Nested type references updated; no further action required
- Verified with rg -nP -g '*.kt' -C2 '\bDataFileCreateItem\b' that the only occurrences of DataFileCreateItem are its declaration inside DataFileBulkCreateRequest and the test imports/uses in DataFileServiceTest.kt.
- Test code correctly imports the nested type via import simplerag.ragback.domain.document.dto.DataFileBulkCreateRequest.DataFileCreateItem.
- No remaining controllers, clients, or other modules reference a top-level DataFileCreateItem, so all downstream code has been updated for the new FQCN.
build.gradle (2)
49-49: The scripts above will reveal whether a Hibernate@Typeannotation, a customTypeContributor, or anycom.pgvector‐specific configuration is present, as well as inspect database and JPA settings. Once we have these results, we can confirm if the PGvector mapping is fully wired up or if additional AttributeConverter/column mapping work is required.
60-62: Verify dependency footprint and pin extractor library versions
I wasn’t able to inspect the runtimeClasspath here due to Gradle daemon connectivity errors. Please verify locally that pulling in PDFBox and POI-OOXML doesn’t drag in unexpectedly large transitive dependencies, and pin their versions centrally for easier future CVE upgrades:
• In build.gradle (around lines 60–62), replace the direct implementation calls with centrally managed constraints:
 dependencyManagement {
     imports {
         mavenBom "org.springframework.ai:spring-ai-bom:${springAiVersion}"
     }
+    dependencies {
+        dependency "org.apache.pdfbox:pdfbox:2.0.30"
+        dependency "org.apache.poi:poi-ooxml:5.2.5"
+    }
 }
• Run locally:
./gradlew --no-daemon dependencies --configuration runtimeClasspath
# or per-dependency:
./gradlew dependencyInsight \
  --configuration runtimeClasspath \
  --dependency org.apache.pdfbox:pdfbox
./gradlew dependencyInsight \
  --configuration runtimeClasspath \
  --dependency org.apache.poi:poi-ooxml
• Ensure your extraction code only processes text so you don’t inadvertently load images or large binary streams into memory.
Let me know if any heavy transitive pulls appear so we can decide on exclusions or shading.
src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt (1)
25-28: Ensure lazy loading for content and verify bytecode enhancement
- File: src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt (lines 25–28)
Apply the following diff to mark content as a lazy LOB and use an explicit Postgres text column:
- @Column(nullable = false)
- @Lob
- val content: String,
+ @Lob
+ @Basic(fetch = FetchType.LAZY)
+ @Column(nullable = false, columnDefinition = "text")
+ val content: String,
Next steps:
• I did not locate any Hibernate bytecode-enhancement settings in your configuration files or build scripts. Please verify that build-time enhancement is enabled (for example, via the org.hibernate.orm Gradle plugin or the corresponding Maven plugin) so that @Basic(fetch = LAZY) on a non-@OneToMany/@ManyToOne field takes effect.
• If you cannot enable bytecode enhancement, consider isolating this large content payload: either move it into a separate entity or load it via a dedicated repository method when needed.
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (2)
44-64: Size conversion and timestamp mapping verified
Both sizeMB conversion and lastModified mapping have been confirmed as correct; no changes required.
- sizeMB uses toMegaBytes(2) to round to two decimal places as intended.
- BaseEntity.updatedAt is declared as a non-nullable LocalDateTime and annotated with @LastModifiedDate alongside @EntityListeners(AuditingEntityListener::class), ensuring it’s automatically set on persist and update.
69-73: Mapping Safety Confirmed: Tag.id is Non-nullable
Tag entity defines val id: Long = 0, ensuring a non-nullable identifier. The from(tag: Tag): TagDTO mapper in DataFileResponseDTO.kt is therefore safe and will not encounter null-related issues.
• Checked in src/main/kotlin/simplerag/ragback/domain/document/entity/Tag.kt at line 18:
val id: Long = 0
• Mapping occurs in src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt lines 69–73:
fun from(tag: Tag): TagDTO = TagDTO(tag.id, tag.name)
Approving these code changes.
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1)
23-25: Comma in enum looks good; please confirm the E5 embedder outputs 768-dimensional vectors
I didn’t find any Kotlin class or function in the repo that implements “E5_BASE” embedding logic, so we need to be sure wherever E5_BASE is used, the embedding pipeline actually emits 768-length vectors.
• Verify your E5 embedding implementation (wherever you call "intfloat/e5-base-v2") produces 768-dimensional output.
• Add or update tests that assert EmbeddingModel.E5_BASE.dim == 768 and that the returned vector’s size matches.
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (4)
67-71: Consistent not-found handling.
Switching to IndexException(ErrorCode.NOT_FOUND) is consistent with the rest of the service’s exception semantics.
78-86: Update path reads clean and safe.
Null-guard + update + mapping back to preview is standard; no issues spotted.
90-94: Delete path is correct.
Null-guard and delete are straightforward.
39-44: Approved: Sequence return type confirmed and DTO property naming consistent
LGTM; direct use of persisted content is correct.
- TextChunker.chunkByCharsSeq returns a Sequence<String>, so you’re already using the most memory-friendly iteration for large files.
- IndexRequestDTO defines val dataFileId: List<Long> and there are no occurrences of dataFileIds, so the DTO property name is consistent.
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1)
82-83: Good assertion on persisted content.
Verifies we’re storing extracted content, not a URL. This guards the new model well.
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1)
35-36: LGTM: injecting repositories for test setup/teardown
Wiring DataFileRepository and ChunkEmbeddingRepository directly improves test determinism and speeds up setup.
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (4)
24-24: LGTM: class-level read-only transaction default
Setting @Transactional(readOnly = true) at class level with selective write overrides on mutating methods is a solid default.
78-80: LGTM: grouping by non-null id simplifies pagination assembly
Using it.dataFile.id directly is consistent with the non-null id migration.
88-93: LGTM: explicit 404 mapping for deletion path
Resolving by id first and throwing NOT_FOUND keeps controller semantics clear.
52-53: Do not reject empty content during extraction; validate post-extraction if needed
The extractor’s responsibility is to return whatever it finds, even if that’s an empty result, reserving outright file rejection for truly unreadable or corrupted inputs. If your business logic requires that files contain non-blank content, enforce that requirement after calling extractContent, not within the extractor itself.
• Location: src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt, around line 52
• Change: remove or ignore blank-content rejection here; add an after-extraction validation step in the service or pipeline if needed.
testImplementation "org.testcontainers:postgresql"
testImplementation "org.testcontainers:junit-jupiter"
testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
testImplementation 'com.pgvector:pgvector:0.1.6'
🧹 Nitpick (assertive)
Avoid version drift; centralize pgvector version and deduplicate test dependency.
Declare the version once and reuse it for both main and test scopes. Also, you typically don’t need testImplementation if implementation already pulls it transitively for tests.
Apply:
- implementation 'com.pgvector:pgvector:0.1.6'
+ def pgvectorVersion = '0.1.6'
+ implementation "com.pgvector:pgvector:${pgvectorVersion}"
...
- testImplementation 'com.pgvector:pgvector:0.1.6'
+ // If tests require explicit access, keep this; otherwise remove.
+ testImplementation "com.pgvector:pgvector:${pgvectorVersion}"
🤖 Prompt for AI Agents
In build.gradle around line 55, the pgvector version is hardcoded for
testImplementation which risks version drift and duplicates the dependency;
declare a single version variable (e.g., ext or extra property like
pgvectorVersion = '0.1.6') and use it for the implementation dependency
(implementation "com.pgvector:pgvector:${pgvectorVersion}"), then remove the
testImplementation line unless tests need a separate scope—if tests require it,
reference the same variable (testImplementation
"com.pgvector:pgvector:${pgvectorVersion}") so the version is centralized and
not duplicated.
data class DataFileCreateItem(
    @field:NotBlank(message = "title은 비어있을 수 없습니다")
    @field:Size(max = 100)
    val title: String,

    @field:Size(max = 10, message = "태그는 최대 10개까지 가능합니다")
    val tags: List<String> = emptyList()
)
🧹 Nitpick (assertive)
Optional: Strengthen validation on tags to prevent blank/oversized tag values
Currently only the list size is validated. Consider enforcing per-tag constraints.
Apply:
data class DataFileCreateItem(
@field:NotBlank(message = "title은 비어있을 수 없습니다")
@field:Size(max = 100)
val title: String,
- @field:Size(max = 10, message = "태그는 최대 10개까지 가능합니다")
- val tags: List<String> = emptyList()
+ @field:Size(max = 10, message = "태그는 최대 10개까지 가능합니다")
+ val tags: List<@NotBlank(message = "빈 태그는 허용되지 않습니다") @Size(max = 30, message = "태그는 최대 30자까지 가능합니다") String> = emptyList()
)
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileRequestDTO.kt
around lines 12 to 19, strengthen validation on the tags list by adding
per-element constraints so individual tags cannot be blank or too long; change
the tags declaration to use type-use annotations (e.g. List<@NotBlank
@Size(max = 30) String>; the @field: use-site target is not valid on a type
argument), then update the validation messages and imports accordingly so each
tag is validated for non-blank content and a maximum length (choose an
appropriate max like 30).
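The per-tag rules proposed above can be sketched in plain Kotlin to make the intended behavior concrete. This is a hand-rolled illustration, not Bean Validation itself: `validateTags`, `MAX_TAGS`, and `MAX_TAG_LENGTH` are hypothetical names, and the limits (10 tags, 30 characters) are the ones suggested in the review.

```kotlin
// Hypothetical limits taken from the review suggestion.
const val MAX_TAGS = 10
const val MAX_TAG_LENGTH = 30

// Returns a list of human-readable violations; an empty list means the tags are valid.
// The real DTO would rely on @Size / @NotBlank annotations instead of this function.
fun validateTags(tags: List<String>): List<String> {
    val errors = mutableListOf<String>()
    if (tags.size > MAX_TAGS) {
        errors += "at most $MAX_TAGS tags are allowed"
    }
    tags.forEachIndexed { index, tag ->
        if (tag.isBlank()) {
            // Mirrors @NotBlank: empty or whitespace-only values are rejected.
            errors += "tags[$index] must not be blank"
        } else if (tag.length > MAX_TAG_LENGTH) {
            // Mirrors @Size(max = 30) on each element.
            errors += "tags[$index] must be at most $MAX_TAG_LENGTH characters"
        }
    }
    return errors
}
```

Calling `validateTags(listOf("ok", " "))` would report the blank second element, which the list-level `@Size(max = 10)` alone does not catch.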
    id = file.id,
    sha256 = file.sha256,
)
Map id defensively; entity ids can be null before persistence.
After switching entity id to nullable (recommended), avoid propagating a null or “0” id to API responses.
Apply:
- id = file.id,
+ id = requireNotNull(file.id) { "DataFile.id must be assigned" },
sha256 = file.sha256,
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt
around lines 21 to 23, the code maps file.id directly but entity ids can be null
before persistence; change the mapping to handle nulls defensively: either make
DataFileResponseDTO.id nullable (Long?) and assign id = file.id, or if the API
requires a non-null id, fail fast by checking file.id and throwing an
IllegalStateException with a clear message when it is null. Update the DTO type
and callers accordingly so no null or sentinel “0” value is returned in
responses.
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "data_files_id")
val id: Long? = null,
) : BaseEntity() {
val id: Long = 0
Generated ID cannot reliably be a non-null val initialized to 0; use nullable or mutable id.
Hibernate needs to assign the identifier after insert. val id: Long = 0 risks persistence or hydration issues. The common pattern is var id: Long? = null. This also aligns with DTOs that can assert non-null post-persist.
Apply:
- @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
- @Column(name = "data_files_id")
- val id: Long = 0
+ @Id
+ @GeneratedValue(strategy = GenerationType.IDENTITY)
+ @Column(name = "data_files_id")
+ var id: Long? = null
Follow-up: update DTO mappers to require a non-null id at mapping time (see comments in DataFileResponseDTO.kt).
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt around
lines 29-32, the entity declares a non-null val id initialized to 0 which
prevents Hibernate from assigning the generated identifier correctly; change the
declaration to a nullable mutable property (e.g., var id: Long? = null) so the
persistence provider can set the ID after insert, and then update DTO mappers to
treat the entity ID as nullable at mapping time and assert non-null only after
persist (per comment in DataFileResponseDTO.kt).
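The combination of a nullable entity id and a fail-fast mapper can be sketched without JPA as follows. `DataFileSketch`, `DataFileResponseSketch`, and `toResponse` are hypothetical stand-ins for the project's entity, DTO, and mapper; the annotations are omitted so the snippet stays self-contained.

```kotlin
// Simplified stand-in for the JPA entity (no @Entity/@Id annotations here).
class DataFileSketch(val title: String, val sha256: String) {
    // Stays null until the persistence provider assigns the generated key.
    var id: Long? = null
}

// Simplified stand-in for the response DTO, which exposes a non-null id.
data class DataFileResponseSketch(val id: Long, val sha256: String)

// Fails fast with IllegalArgumentException if the entity was never persisted.
fun toResponse(file: DataFileSketch): DataFileResponseSketch =
    DataFileResponseSketch(
        id = requireNotNull(file.id) { "DataFile.id must be assigned" },
        sha256 = file.sha256,
    )
```

With this shape, an unsaved entity can never leak a `null` or sentinel `0` id into an API response; the mapper throws instead.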
fun from(title: String, type: String, sizeBytes : Long, sha256 : String, content: String): DataFile {
    return DataFile(title, type, sizeBytes, sha256, content)
}
🧹 Nitpick (assertive)
Factory method aligns with new model; consider basic invariants.
Add lightweight checks to guard against invalid inputs, e.g., blank title/type, negative sizes.
Example:
- fun from(title: String, type: String, sizeBytes : Long, sha256 : String, content: String): DataFile {
- return DataFile(title, type, sizeBytes, sha256, content)
- }
+ fun from(title: String, type: String, sizeBytes: Long, sha256: String, content: String): DataFile {
+ require(title.isNotBlank()) { "title must not be blank" }
+ require(type.isNotBlank()) { "type must not be blank" }
+ require(sizeBytes >= 0) { "sizeBytes must be >= 0" }
+ require(sha256.length == 64) { "sha256 must be 64-char hex" }
+ return DataFile(title, type, sizeBytes, sha256, content)
+ }
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt around
lines 34 to 36, the factory method does not validate inputs; add lightweight
invariants to guard against invalid data by checking: title and type are not
blank (trim before check), sizeBytes is >= 0, sha256 is not blank (optionally
validate length/hex if desired), and content is non-null (allow empty if
acceptable); enforce these using Kotlin require(...) or throw
IllegalArgumentException with clear messages, and return the DataFile only after
all checks pass.
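A slightly stricter variant of the suggested invariants is sketched below. It assumes the digest is stored as 64 hexadecimal characters, so it checks the character set as well as the length; `DataFileDraft` and `SHA256_HEX` are hypothetical names, and the class is a plain stand-in for the JPA entity.

```kotlin
// Assumption: a SHA-256 digest is stored as exactly 64 hexadecimal characters.
val SHA256_HEX = Regex("[0-9a-fA-F]{64}")

// Plain stand-in for the entity; the private constructor forces callers through from().
class DataFileDraft private constructor(
    val title: String,
    val type: String,
    val sizeBytes: Long,
    val sha256: String,
    val content: String,
) {
    companion object {
        fun from(title: String, type: String, sizeBytes: Long, sha256: String, content: String): DataFileDraft {
            require(title.isNotBlank()) { "title must not be blank" }
            require(type.isNotBlank()) { "type must not be blank" }
            require(sizeBytes >= 0) { "sizeBytes must be >= 0" }
            // Regex.matches requires the whole input to match, so length and charset are both checked.
            require(SHA256_HEX.matches(sha256)) { "sha256 must be 64 hex characters" }
            return DataFileDraft(title, type, sizeBytes, sha256, content)
        }
    }
}
```

Under this check a placeholder digest such as `"sdf"` is rejected, so test fixtures would need a realistic 64-character value.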
+ val datafile = dataFileRepository.save(DataFile("a", "txt", 1000000L, "sdf", "sfd"))
+
  val indexCreateRequest =
-     IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+     IndexCreateRequest(listOf(datafile.id), "test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_SMALL, true)
🧹 Nitpick (assertive)
Prefer factory constructor to honor domain invariants
Use DataFile.from(...) for clarity and to avoid bypassing domain invariants. Also, give the SHA-256 a realistic shape to prevent accidental collisions in future tests.
- val datafile = dataFileRepository.save(DataFile("a", "txt", 1000000L, "sdf", "sfd"))
+ val datafile = dataFileRepository.save(
+ DataFile.from(
+ title = "a",
+ type = "txt",
+ sizeBytes = 1_000_000L,
+ sha256 = "00000000000000000000000000000000000000000000000000000000000000aa",
+ content = "sfd"
+ )
+ )
🤖 Prompt for AI Agents
In src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
around lines 62 to 66, the test constructs a DataFile via the primary
constructor which can bypass domain invariants and uses a short dummy SHA-256;
change the test to create the DataFile using DataFile.from(...) (the factory
constructor) and pass a realistic 64-character hex SHA-256 string so the factory
enforces invariants and reduces accidental collisions in future tests.
IndexCreateRequest(listOf(datafile.id), "test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_SMALL, true)
🧹 Nitpick (assertive)
Nit: use named arguments for readability on long parameter lists
IndexCreateRequest(...) has many positional parameters. Named args reduce mistakes when signatures evolve.
Example:
val indexCreateRequest = IndexCreateRequest(
dataFileId = listOf(datafile.id),
snapshotName = "test",
chunkingSize = 1,
overlapSize = 0,
similarityMetric = SimilarityMetric.COSINE,
topK = 1,
embeddingModel = EmbeddingModel.TEXT_EMBEDDING_3_SMALL,
reranker = true
)
🤖 Prompt for AI Agents
In src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
around lines 65 to 66, the IndexCreateRequest invocation uses many positional
parameters which harms readability and future-proofing; change it to use named
arguments for each parameter (e.g., dataFileId = ..., snapshotName = ...,
chunkingSize = ..., overlapSize = ..., similarityMetric = ..., topK = ...,
embeddingModel = ..., reranker = ...) so the call is self-documenting and
resilient to signature changes.
  val indexCreateRequest =
-     IndexCreateRequest("test", 1, 1, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+     IndexCreateRequest(listOf(1),"test", 1, 1, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
Fragile test: constructing request with a non-existent DataFile id
IndexCreateRequest(listOf(1), ...) assumes a DataFile with id = 1 exists. Test order is not guaranteed; this can intermittently fail if cleanup changes or the first test doesn’t run. Persist a local DataFile and use its id.
// given
- val indexCreateRequest =
- IndexCreateRequest(listOf(1),"test", 1, 1, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+ val df = dataFileRepository.save(
+ DataFile.from(
+ title = "overlap",
+ type = "txt",
+ sizeBytes = 10L,
+ sha256 = "00000000000000000000000000000000000000000000000000000000000000bb",
+ content = "x"
+ )
+ )
+ val indexCreateRequest =
+ IndexCreateRequest(listOf(df.id), "test", 1, 1, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
🤖 Prompt for AI Agents
In src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
around lines 81 to 83, the test constructs IndexCreateRequest with a hardcoded
DataFile id (listOf(1)) which is fragile because tests run in any order and that
id may not exist; instead create and persist a local DataFile within the test
(use the test repository or factory to save a DataFile) and use the returned
entity's id when building IndexCreateRequest, and ensure any necessary cleanup
or transactional rollback is in place so the test is isolated.
ddl-auto: create-drop
ai:
openai:
api-key: dummy
model:
embedding:
options:
model: text-embedding-3-small
\ No newline at end of file
💡 Verification agent
🧩 Analysis chain
YAML indentation/keys are off for embedding model; property path should be under openai.embedding.options.model.
The current structure:
- has options: and model: at the same indentation
- uses spring.ai.model.embedding... instead of spring.ai.openai.embedding...
This will not bind to OpenAiEmbeddingModel.
Apply:
  spring:
    datasource:
      driver-class-name: org.postgresql.Driver
    jpa:
      database-platform: org.hibernate.dialect.PostgreSQLDialect
      hibernate:
        ddl-auto: create-drop
    ai:
      openai:
        api-key: dummy
-     model:
-       embedding:
-         options:
-           model: text-embedding-3-small
+       embedding:
+         options:
+           model: text-embedding-3-small
Validate property binding:
🏁 Script executed:
#!/bin/bash
# Expect this property path to exist in tests after the fix.
rg -n 'spring\.ai\.openai\.embedding\.options\.model' src/test/resources --type=yaml
Length of output: 86
🏁 Script executed:
#!/bin/bash
# Display application-test.yml with line numbers
nl -ba src/test/resources/application-test.yml || echo "application-test.yml not found"
# Show context around ai, openai, embedding, options, and model keys
rg -n -C2 '^ *ai:' src/test/resources/application-test.yml || true
rg -n -C2 '^ *openai:' src/test/resources/application-test.yml || true
rg -n -C2 '^ *embedding:' src/test/resources/application-test.yml || true
rg -n -C2 '^ *options:' src/test/resources/application-test.yml || true
rg -n -C2 '^ *model:' src/test/resources/application-test.yml || true
Length of output: 1066
Fix YAML nesting for embedding model in application-test.yml
The embedding block is currently defined under spring.ai.model instead of under spring.ai.openai, so Spring won’t bind the OpenAiEmbeddingModel. Please update src/test/resources/application-test.yml to nest embedding (and its options.model) under openai:
  spring:
    datasource:
      driver-class-name: org.postgresql.Driver
    jpa:
      database-platform: org.hibernate.dialect.PostgreSQLDialect
      hibernate:
        ddl-auto: create-drop
    ai:
      openai:
        api-key: dummy
-     model:
-       embedding:
-         options:
-           model: text-embedding-3-small
+       embedding:
+         options:
+           model: text-embedding-3-small
• File to update: src/test/resources/application-test.yml (around lines 11–14)
• After applying, verify the binding by running:
rg -n 'spring\.ai\.openai\.embedding\.options\.model' src/test/resources --type=yaml
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 14-14: no new line character at the end of file
(new-line-at-end-of-file)
🤖 Prompt for AI Agents
In src/test/resources/application-test.yml around lines 7 to 14, the embedding
configuration is nested under spring.ai.model instead of spring.ai.openai; move
the entire embedding block (including options and model) so it is a child of
openai (spring.ai.openai.embedding.options.model) so Spring can bind
OpenAiEmbeddingModel; update indentation accordingly and save the file, then
verify with the provided ripgrep command to ensure
spring.ai.openai.embedding.options.model is present.
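One caveat on the verification step: a nested YAML file never contains the dotted path as literal text, so a text search for `spring.ai.openai.embedding.options.model` will not match a correctly nested file. The dotted name only appears once the tree is flattened, which is the form used for property binding. The sketch below illustrates that flattening in plain Kotlin; `flattenProperties` and `correctedConfig` are hypothetical names, and the nested map is hand-written here rather than parsed from the actual file.

```kotlin
// Flattens a nested map (the shape a YAML parser would produce) into dotted property paths.
@Suppress("UNCHECKED_CAST")
fun flattenProperties(node: Map<String, Any?>, prefix: String = ""): Map<String, Any?> {
    val flat = linkedMapOf<String, Any?>()
    for ((key, value) in node) {
        val path = if (prefix.isEmpty()) key else "$prefix.$key"
        if (value is Map<*, *>) {
            // Recurse into nested sections, carrying the accumulated path.
            flat.putAll(flattenProperties(value as Map<String, Any?>, path))
        } else {
            flat[path] = value
        }
    }
    return flat
}

// The corrected structure from the review, written by hand as nested maps.
val correctedConfig: Map<String, Any?> = mapOf(
    "spring" to mapOf(
        "ai" to mapOf(
            "openai" to mapOf(
                "api-key" to "dummy",
                "embedding" to mapOf(
                    "options" to mapOf("model" to "text-embedding-3-small"),
                ),
            ),
        ),
    ),
)
```

Flattening `correctedConfig` yields the key `spring.ai.openai.embedding.options.model`, whereas the original nesting would have produced a key under `spring.ai.model` instead.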
model:
embedding:
options:
model: text-embedding-3-small
\ No newline at end of file
🧹 Nitpick (assertive)
Add a trailing newline.
Satisfy linters and POSIX text file conventions.
- model: text-embedding-3-small
+ model: text-embedding-3-small
+
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 14-14: no new line character at the end of file
(new-line-at-end-of-file)
🤖 Prompt for AI Agents
In src/test/resources/application-test.yml at line 14, the file is missing a
trailing newline; add a newline character at the end of the file so the last
line ("model: text-embedding-3-small") is terminated by a newline to satisfy
linters and POSIX conventions.
📌 Overview
add content extractor
🔍 Related Issues
✨ Changes
♻️ Refactor: val immutability
♻️ Refactor: internalize DTOs
🐛 Fix: fix DTO error
✨ Feature: add ContentExtractor
♻️ Refactor: change loader to extractor
✅ Test: add index test
📸 Screenshots / Test Results (Optional)
✅ Checklist
🗒️ Additional Notes
Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Chores