Skip to content

Conversation

@catturtle123
Copy link
Contributor

@catturtle123 catturtle123 commented Aug 21, 2025

📌 PR 개요

index, dataFile converting refactoring

🔍 관련 이슈

✨ 변경 사항

📸 스크린샷 / 동작 확인 결과 (선택)

필요하다면 이미지나 동영상 첨부.

✅ 체크리스트

  • 코드 스타일을 준수했습니다.
  • 불필요한 주석이나 콘솔 로그를 제거했습니다.
  • 관련 테스트를 작성/수정했습니다.
  • 로컬 환경에서 모든 기능이 정상 작동함을 확인했습니다.

🗒️ 추가 메모

기타 참고 사항을 적어주세요.

Summary by CodeRabbit

  • New Features
    • 파일 상세 응답에 태그, 최종 수정 시각(lastModified) 및 크기(MB)가 추가되었고, 업로드에 새로운 미리보기 응답 형식이 제공됩니다.
  • Changes
    • 데이터 파일 목록 조회에 기본 페이징 값(cursor=0, take=10)과 cursor 최소값 검증이 적용됩니다.
    • 임베딩 모델에 대형 모델 옵션이 추가되었습니다.
  • Performance
    • 파일 목록의 태그 조회가 일괄 처리로 최적화되어 호출 횟수가 감소합니다.
  • Refactor
    • 도메인↔DTO 변환이 정적 팩토리/컴패니언으로 통합되고 일부 변환 유틸이 제거되었습니다.
  • Tests
    • 테스트환경이 Testcontainers 기반 PostgreSQL로 전환되고 테스트 의존성 및 초기화 스크립트가 추가/업데이트되었습니다.

@catturtle123 catturtle123 self-assigned this Aug 21, 2025
@catturtle123 catturtle123 added the chore other work label Aug 21, 2025
@catturtle123 catturtle123 linked an issue Aug 21, 2025 that may be closed by this pull request
1 task
@coderabbitai
Copy link

coderabbitai bot commented Aug 21, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Upload now returns DataFilePreviewResponseList; listing endpoint adds default paging params (cursor=0, take=10) with cursor validation. DataFile DTOs and service use factory/companion converters and bulk tag fetching. Index converters moved into companion objects and test infra switched to Testcontainers Postgres with pgvector.

Changes

Cohort / File(s) Summary
DataFile Controller
src/.../document/controller/DataFileController.kt
upload return type changed to ApiResponse<DataFilePreviewResponseList>; getDataFiles adds @RequestParam(defaultValue="0") cursor, @RequestParam(defaultValue="10") take, and @Min(0) on cursor.
DataFile DTOs
src/.../document/dto/DataFileResponseDTO.kt
Renamed DataFileResponseListDataFilePreviewResponseList; added DataFilePreviewResponse.from(...); updated DataFileDetailResponse fields (updatedAtlastModified, sizeBytessizeMB); added TagDTO companion factories and MB conversion util.
DataFile Entity
src/.../document/entity/DataFile.kt
Added companion object factory from(title, type, sizeBytes, sha256, fileUrl).
DataFileTag Repository
src/.../document/repository/DataFileTagRepository.kt
Removed single-entity JPQL method; added @EntityGraph(attributePaths = ["tag"]) fun findAllByDataFileIn(dataFiles: Collection<DataFile>): List<DataFileTag> for bulk tag fetch.
DataFile Service
src/.../document/service/DataFileService.kt
upload uses DataFile.from(...) and returns DataFilePreviewResponseList; getDataFiles bulk-fetches tags via findAllByDataFileIn, builds tagsByFileId map, uses DataFileDetailResponseList.from(...), computes nextCursor from last item.
Index conversions & DTOs
src/.../index/converter/IndexConverter.kt, src/.../index/dto/IndexResponseDTO.kt, src/.../index/entity/Index.kt, src/.../index/service/IndexService.kt
Deleted standalone converter file; added companion-object conversion helpers on Index and Index DTOs; IndexService updated to use these companions.
Build & Tests / Test DB
build.gradle, src/test/.../service/*Test.kt, src/test/resources/application-test.yml, src/test/resources/db/init.sql
Removed H2 test runtime; added Testcontainers Postgres, pgvector deps; tests wired to PostgreSQLContainer via @ServiceConnection and @TestConstructor; test config switched to Postgres dialect; db/init.sql adds CREATE EXTENSION IF NOT EXISTS vector;.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Controller
  participant Service
  participant Storage
  participant Repo as DataFileRepo
  participant TagRepo as DataFileTagRepo

  Client->>Controller: POST /data-files (files, req)
  Controller->>Service: upload(files, req)
  loop per file
    Service->>Storage: store file -> (fileUrl,size,sha256)
    Service->>Service: DataFile.from(...)
    Service->>Repo: save(dataFile)
    Repo-->>Service: saved dataFile (id)
    Service->>Service: DataFilePreviewResponse.from(dataFile)
  end
  Service-->>Controller: DataFilePreviewResponseList
  Controller-->>Client: 200 ApiResponse<DataFilePreviewResponseList>
Loading
sequenceDiagram
  autonumber
  participant Client
  participant Controller
  participant Service
  participant Repo as DataFileRepo
  participant TagRepo as DataFileTagRepo

  Client->>Controller: GET /data-files?cursor&take (defaults 0,10)
  Controller->>Service: getDataFiles(cursor, take)
  Service->>Repo: findPageAfter(cursor, take)
  Repo-->>Service: Page<DataFile>(content, hasNext)
  Service->>TagRepo: findAllByDataFileIn(content)
  TagRepo-->>Service: List<DataFileTag> (eager tag)
  Service->>Service: Map tags by fileId (TagDTO.from)
  Service->>Service: DataFileDetailResponseList.from(content, tagsByFileId, nextCursor, hasNext)
  Service-->>Controller: DataFileDetailResponseList
  Controller-->>Client: 200 ApiResponse<DataFileDetailResponseList>
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against linked issues

Objective Addressed Explanation
dataFile 로직을 converter로 수정하고 엔티티 수정 로직 변경 (#12) Conversions moved to companion/factory methods and DTO companion mappers instead of a standalone "converter" module—unclear if this matches the issue's requirement to use a dedicated converter.

Assessment against linked issues: Out-of-scope changes

Code Change Explanation
Deletion of index converter and companion conversions added to Index/DTO (src/main/kotlin/.../index/converter/IndexConverter.kt removed; src/main/kotlin/.../index/entity/Index.kt and src/main/kotlin/.../index/dto/IndexResponseDTO.kt changed) Linked issue #12 targets DataFile converter refactor; moving index converters is unrelated to the stated objective.
Test infra / DB config changes (build.gradle deps; src/test/resources/application-test.yml; Testcontainers in tests: src/test/.../*Test.kt) Linked issue focuses on dataFile converter refactor; switching tests from H2 → Testcontainers/Postgres and adding pgvector are infrastructure changes not described in the issue.

Possibly related PRs

"나는 토끼, 코드 위를 깡충깡충,
프리뷰 한 줄씩 반짝반짝 🐇
태그는 모아 한 번에, 팩토리로 빚어,
테스트는 포스트그레스와 함께 춤추네.
깡충—리팩토링 축하해!"

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6ecf023 and 18243ea.

📒 Files selected for processing (2)
  • src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (2 hunks)
  • src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (2 hunks)
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor/#12-datafile-converter

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@catturtle123 catturtle123 changed the title 🚀 [Chore] index, dataFile converting refactoring ♻️ [Refactor] index, dataFile converting refactoring Aug 21, 2025
@catturtle123 catturtle123 force-pushed the refactor/#12-datafile-converter branch from a2110ed to d5bc901 Compare August 21, 2025 07:34
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)

63-70: update() 경로에서도 동일한 불변식 강제 필요

생성과 동일한 제약을 업데이트에도 적용하세요. 현재는 서비스에서만 검증하며, 엔티티 레벨 방어가 없습니다.

적용 제안(diff):

     fun update(req: IndexUpdateRequest) {
-        snapshotName = req.snapshotName.trim()
-        chunkingSize = req.chunkingSize
-        overlapSize = req.overlapSize
+        val name = req.snapshotName.trim()
+        require(name.isNotEmpty()) { "snapshotName must not be blank" }
+        require(req.chunkingSize >= 1) { "chunkingSize must be >= 1" }
+        require(req.overlapSize >= 0) { "overlapSize must be >= 0" }
+        require(req.overlapSize < req.chunkingSize) { "overlapSize must be < chunkingSize" }
+        snapshotName = name
+        chunkingSize = req.chunkingSize
+        overlapSize = req.overlapSize
         similarityMetric = req.similarityMetric
         topK = req.topK
         reranker = req.reranker
     }
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (1)

34-58: DTO 상세 응답도 동일하게 불변성 강화 및 매핑 가독성 개선

  • IndexDetailResponse.indexIdval이 적절합니다.
  • 매핑 정적 메서드 대신 확장 함수를 고려하면 호출부 가독성이 좋아집니다. 예: index.toDetailResponse().

적용 제안(diff 1 — 불변성):

 data class IndexDetailResponse(
-    var indexId: Long?,
+    val indexId: Long?,
     val snapshotName: String,
     val chunkingSize: Int,
     val overlapSize: Int,
     val similarityMetric: SimilarityMetric,
     val topK: Int,
     val embeddingModel: EmbeddingModel,
     val reranker: Boolean,
 ) {

적용 제안(대안 2 — 확장 함수, 별도 파일에 배치 권장):

// 예시: IndexMappers.kt
package simplerag.ragback.domain.index.dto

import simplerag.ragback.domain.index.entity.Index

fun Index.toPreviewResponse() = IndexPreviewResponse(
    indexId = id,
    snapshotName = snapshotName
)

fun List<Index>.toPreviewResponseList() =
    IndexPreviewResponseList(map { it.toPreviewResponse() })

fun Index.toDetailResponse() = IndexDetailResponse(
    indexId = id,
    snapshotName = snapshotName,
    chunkingSize = chunkingSize,
    overlapSize = overlapSize,
    similarityMetric = similarityMetric,
    topK = topK,
    embeddingModel = embeddingModel,
    reranker = reranker
)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)

151-160: 태그 연결 시 매 요청 O(n) 존재 여부 조회 — 1~2쿼리로 축소 가능

existsByDataFileIdAndTagId를 태그 개수만큼 호출하고 있어 RTT가 커집니다. 한 번의 조회로 기존 연결 태그 id를 얻은 뒤, 차집합만 벌크 저장하면 성능이 개선됩니다.

서비스 변경(예시):

-    private fun attachTagsIfMissing(dataFile: DataFile, tags: List<Tag>) {
-        val fileId = dataFile.id ?: return
-        tags.forEach { tag ->
-            val tagId = tag.id ?: return@forEach
-            val exists = dataFileTagRepository.existsByDataFileIdAndTagId(fileId, tagId)
-            if (!exists) {
-                dataFileTagRepository.save(DataFileTag(tag = tag, dataFile = dataFile))
-            }
-        }
-    }
+    private fun attachTagsIfMissing(dataFile: DataFile, tags: List<Tag>) {
+        val fileId = dataFile.id ?: return
+        val tagIds = tags.mapNotNull { it.id }.toSet()
+        if (tagIds.isEmpty()) return
+        val existingTagIds = dataFileTagRepository.findExistingTagIdsByDataFileIdAndTagIds(fileId, tagIds)
+        val toLink = tags
+            .filter { it.id != null && it.id !in existingTagIds }
+            .map { DataFileTag(tag = it, dataFile = dataFile) }
+        if (toLink.isNotEmpty()) dataFileTagRepository.saveAll(toLink)
+    }

Repository에 보조 메서드 추가(예시):

@Query("select dft.tag.id from DataFileTag dft where dft.dataFile.id = :dataFileId and dft.tag.id in :tagIds")
fun findExistingTagIdsByDataFileIdAndTagIds(
    @Param("dataFileId") dataFileId: Long,
    @Param("tagIds") tagIds: Set<Long>
): Set<Long>
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 55f2b92 and d5bc901.

⛔ Files ignored due to path filters (2)
  • .DS_Store is excluded by !**/.DS_Store
  • src/.DS_Store is excluded by !**/.DS_Store
📒 Files selected for processing (9)
  • src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/repository/DataFileTagRepository.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/converter/IndexConverter.kt (0 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (3 hunks)
💤 Files with no reviewable changes (1)
  • src/main/kotlin/simplerag/ragback/domain/index/converter/IndexConverter.kt
🧰 Additional context used
🧬 Code graph analysis (5)
src/main/kotlin/simplerag/ragback/domain/document/repository/DataFileTagRepository.kt (1)
src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt (1)
  • name (6-24)
src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt (3)
src/main/kotlin/simplerag/ragback/domain/index/entity/DataFileIndex.kt (1)
  • name (7-22)
src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt (1)
  • name (6-24)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1)
  • file (214-220)
src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (4)
  • res (162-165)
  • uploadOk (43-74)
  • dataFileService (28-221)
  • dataFileService (84-84)
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (2)
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileRequestDTO.kt (2)
  • message (13-20)
  • min (7-11)
src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt (1)
  • name (6-24)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (4)
  • dataFileService (28-221)
  • uploadOk (43-74)
  • dataFileService (110-110)
  • dataFileService (84-84)
🔇 Additional comments (8)
src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt (1)

49-61: 도메인 불변식(require) 누락 — 생성 시 유효성 보강 필요

서비스 레벨에서만 overlap < chunkingSize를 검증하면, 다른 호출 경로에서 엔티티가 불완전 상태로 만들어질 수 있습니다. 엔티티 팩토리에서도 최소 불변식을 강제하세요. 또한 공백 이름 방어가 필요합니다.

적용 제안(diff):

     companion object {
-        fun toIndex(createRequest: IndexCreateRequest): Index {
-            return Index(
-                snapshotName = createRequest.snapshotName.trim(),
-                overlapSize = createRequest.overlapSize,
-                chunkingSize = createRequest.chunkingSize,
-                similarityMetric = createRequest.similarityMetric,
-                topK = createRequest.topK,
-                embeddingModel = createRequest.embeddingModel,
-                reranker = createRequest.reranker
-            )
-        }
+        fun toIndex(createRequest: IndexCreateRequest): Index {
+            val name = createRequest.snapshotName.trim()
+            require(name.isNotEmpty()) { "snapshotName must not be blank" }
+            require(createRequest.chunkingSize >= 1) { "chunkingSize must be >= 1" }
+            require(createRequest.overlapSize >= 0) { "overlapSize must be >= 0" }
+            require(createRequest.overlapSize < createRequest.chunkingSize) { "overlapSize must be < chunkingSize" }
+            return Index(
+                snapshotName = name,
+                overlapSize = createRequest.overlapSize,
+                chunkingSize = createRequest.chunkingSize,
+                similarityMetric = createRequest.similarityMetric,
+                topK = createRequest.topK,
+                embeddingModel = createRequest.embeddingModel,
+                reranker = createRequest.reranker
+            )
+        }
     }
⛔ Skipped due to learnings
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt:16-22
Timestamp: 2025-08-17T13:12:02.200Z
Learning: catturtle123이 Index 엔티티의 overlapSize ≤ chunkingSize 검증을 엔티티 레벨이 아닌 서비스 레이어에서 구현하기로 결정했으며, chunking embedding 저장 관련 PR에서 이 검증 로직을 추가할 예정이므로 해당 PR에서 언급해야 함 (Simple-RAG/BACK 프로젝트)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (3)

29-29: LGTM — 리스트 변환 책임을 DTO 계층으로 위임한 점 적절

서비스가 도메인 로딩과 검증만 담당하고, 표현 변환은 DTO 동반 객체로 위임되어 역할 분리가 명확합니다.


36-36: LGTM — 단건 조회 변환 경로 일관성 유지

상세 조회에서도 동일한 DTO 변환 경로를 유지했습니다.


50-50: LGTM — 업데이트 후 프리뷰 반환 정책 일관

업데이트 결과를 프리뷰로 통일한 점은 API 일관성 측면에서 무난합니다. 추후 요구가 생기면 상세 DTO 반환으로 확장 가능합니다.

src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)

16-16: 업로드 응답 DTO 전환 일관성 양호

DataFilePreviewResponseList로의 전환이 서비스/테스트와 일치합니다. API 표면 변경도 적절합니다.

Also applies to: 42-45

src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (1)

9-24: 업로드 프리뷰 DTO 분리 좋습니다

  • 업로드 응답을 프리뷰 전용으로 분리한 점(필수 정보만 노출) 합리적입니다.
  • requireNotNull(file.id)로 일관된 검증을 추가한 것도 👍
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (2)

60-62: 팩토리 사용 및 프리뷰 DTO 생성 방식 정돈: OK

  • DataFile.from(...)로 생성 경로를 통일한 점 좋습니다.
  • DataFilePreviewResponse.from(dataFile)로 DTO 생성 책임을 DTO에 위임한 것도 👍

Also applies to: 69-73


39-45: 입력 크기/정합성 검증 로직 적절

files.isEmpty()files.size != req.items.size 검증으로 조기 실패 처리한 점 적절합니다. 추가로 컨트롤러에서 @Size(min=1)도 있어 방어가 이중화되어 안정적입니다.

Comment on lines +54 to 63
fun from(file: DataFile, tags: List<TagDTO>): DataFileDetailResponse =
DataFileDetailResponse(
id = requireNotNull(file.id) { "DataFile.id is null" },
title = file.title,
type = file.type,
lastModified = file.updatedAt,
sizeMB = file.sizeBytes.toMegaBytes(2),
tags = tags,
sha256 = file.sha256,
)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

DTO nullability/가변성 정합성: id는 불변·비nullable로 두는 것이 안전합니다

DataFileDetailResponseid가 현재 var Long?인데, from()에서 requireNotNull로 보장하고 있습니다. DTO 사용성/타입 안정성을 위해 val id: Long 으로 선언을 바꾸는 것이 좋습니다(프리뷰 DTO와도 일관).

적용 예:

-data class DataFileDetailResponse(
-    var id: Long?,
+data class DataFileDetailResponse(
+    val id: Long,
     val title: String,
     ...
 )

from() 구현은 그대로 동작합니다.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fun from(file: DataFile, tags: List<TagDTO>): DataFileDetailResponse =
DataFileDetailResponse(
id = requireNotNull(file.id) { "DataFile.id is null" },
title = file.title,
type = file.type,
lastModified = file.updatedAt,
sizeMB = file.sizeBytes.toMegaBytes(2),
tags = tags,
sha256 = file.sha256,
)
data class DataFileDetailResponse(
val id: Long,
val title: String,
...
)
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt
around lines 54 to 63, the DataFileDetailResponse DTO currently declares id as a
mutable nullable (var Long?) while from() guarantees non-null via
requireNotNull; change the DTO declaration to an immutable non-nullable id (val
id: Long) to reflect the contract and ensure consistency with preview DTOs,
update any primary constructor/assignments and call sites that expect a nullable
or mutable id accordingly (no change needed to the from() factory).

Comment on lines +31 to +36
) : BaseEntity() {
companion object {
fun from(title: String, type: String, sizeBytes : Long, sha256 : String, fileUrl: String): DataFile {
return DataFile(title, type, sizeBytes, sha256, fileUrl)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

companion factory 도입은 적절합니다. 콜론 앞 공백 제거(+ 선택적으로 @JvmStatic) 제안

  • 코틀린 컨벤션에 맞춰 sizeBytes : Long, sha256 : String의 공백을 제거하면 일관성이 좋아집니다.
  • 자바에서 정적 호출이 필요하다면 @JvmStatic 추가를 고려해도 좋습니다(선택).

다음과 같이 정리할 수 있습니다:

-    companion object {
-        fun from(title: String, type: String, sizeBytes : Long, sha256 : String, fileUrl: String): DataFile {
-            return DataFile(title, type, sizeBytes, sha256, fileUrl)
-        }
-    }
+    companion object {
+        @JvmStatic
+        fun from(title: String, type: String, sizeBytes: Long, sha256: String, fileUrl: String): DataFile {
+            return DataFile(title, type, sizeBytes, sha256, fileUrl)
+        }
+    }

(필요 시 파일 상단에 import kotlin.jvm.JvmStatic 추가)

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt around
lines 31 to 36, the companion factory method has inconsistent spacing before
colons in parameters and could expose a static JVM-friendly API; remove the
extra spaces so parameters read `sizeBytes: Long, sha256: String` (and any other
params) to follow Kotlin conventions, and optionally annotate the factory with
@JvmStatic and add `import kotlin.jvm.JvmStatic` at the top if you need
Java-style static access.

Comment on lines 7 to 18
data class IndexPreviewResponseList(
val indexDetailResponse: List<IndexPreviewResponse>,
)
) {
companion object {
fun toIndexPreviewResponseList(
indexes: List<Index>
): IndexPreviewResponseList {
val indexList = indexes.map { IndexPreviewResponse.toIndexPreviewResponse(it) }
return IndexPreviewResponseList(indexList)
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

속성명 의미 불일치/오타: indexDetailResponse → 미리보기 컬렉션에 맞는 이름으로 교정 필요

IndexPreviewResponseList는 “프리뷰 리스트”인데 필드명이 indexDetailResponse(디테일)가 되어 있습니다. 직렬화 키까지 잘못 노출될 수 있어 API 혼동/호환성 문제가 큽니다.

적용 제안(diff):

 data class IndexPreviewResponseList(
-
-    val indexDetailResponse: List<IndexPreviewResponse>,
+    val items: List<IndexPreviewResponse>,
 ) {
     companion object {
         fun toIndexPreviewResponseList(
             indexes: List<Index>
         ): IndexPreviewResponseList {
-            val indexList = indexes.map { IndexPreviewResponse.toIndexPreviewResponse(it) }
-            return IndexPreviewResponseList(indexList)
+            val items = indexes.map { IndexPreviewResponse.toIndexPreviewResponse(it) }
+            return IndexPreviewResponseList(items)
         }
     }
 }

아래 스크립트로 기존 코드베이스/문서/테스트에서 indexDetailResponse 사용 흔적을 점검해 주세요. 응답 JSON 키 변경이 의도인지 확인이 필요합니다.


🏁 Script executed:

#!/bin/bash
# repo 루트에서 실행
rg -n --json -C2 '"indexDetailResponse"|indexDetailResponse\b'

Length of output: 5420


IndexPreviewResponseList 속성명 수정 필요
IndexPreviewResponseList는 “프리뷰 리스트”인데 필드명이 indexDetailResponse(디테일)로 잘못 지정되어 있어 의미 불일치 및 직렬화 키 혼동을 유발합니다. 아래 항목을 반영해 필드명을 items(또는 도메인에 적합한 명칭)로 변경하고 연관 코드·테스트를 함께 수정해주세요.

  • src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt
    • val indexDetailResponse: List<IndexPreviewResponse>val items: List<IndexPreviewResponse>
    • companion object의 매핑 함수 내 변수명 및 생성자 호출부(IndexPreviewResponseList(indexList)) → items로 변경
  • src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
    • indexes.indexDetailResponseindexes.items로 수정
  • (선택) JSON 직렬화 호환성 유지가 필요할 경우 Jackson/Kotlinx Serialization 어노테이션(@JsonProperty("indexDetailResponse") 또는 @SerialName("indexDetailResponse")) 추가 검토

위 변경은 API 응답 스펙의 Breaking Change이므로, 클라이언트 영향 범위 확인 및 릴리즈 노트/버전 관리 방침에 따라 사전 공지·조치해야 합니다.

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt around
lines 7-18, the data class field is incorrectly named indexDetailResponse for a
preview list; rename the property to items: List<IndexPreviewResponse>, update
the companion mapping variable (indexList -> items or itemsList) and the
constructor call to use the new items property, and then update all usages
(notably
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
changing indexes.indexDetailResponse to indexes.items). If you need JSON
compatibility, annotate the new property with
@JsonProperty("indexDetailResponse") or @SerialName("indexDetailResponse") as
appropriate and run tests.

Comment on lines 20 to +32
data class IndexPreviewResponse(
var indexId: Long?,
val snapshotName: String
)
) {
companion object {
fun toIndexPreviewResponse(index: Index): IndexPreviewResponse {
return IndexPreviewResponse(
indexId = index.id,
snapshotName = index.snapshotName,
)
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

불필요한 가변성 제거: indexId는 val로 충분

IndexPreviewResponse.indexId는 매핑 시 한 번 설정되면 변경되지 않습니다. varval로 불변성을 높여 주세요.

적용 제안(diff):

 data class IndexPreviewResponse(
-    var indexId: Long?,
+    val indexId: Long?,
     val snapshotName: String
 ) {
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
data class IndexPreviewResponse(
var indexId: Long?,
val snapshotName: String
)
) {
companion object {
fun toIndexPreviewResponse(index: Index): IndexPreviewResponse {
return IndexPreviewResponse(
indexId = index.id,
snapshotName = index.snapshotName,
)
}
}
}
data class IndexPreviewResponse(
val indexId: Long?,
val snapshotName: String
) {
companion object {
fun toIndexPreviewResponse(index: Index): IndexPreviewResponse {
return IndexPreviewResponse(
indexId = index.id,
snapshotName = index.snapshotName,
)
}
}
}
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt around
lines 20 to 32, the IndexPreviewResponse.indexId property is declared as mutable
(var) but only set once during construction; change it to an immutable property
(val) so indexId is final, updating the primary constructor and any usages
accordingly to preserve behavior and compilation.


import jakarta.persistence.*
import jakarta.validation.constraints.Min
import simplerag.ragback.domain.index.dto.IndexCreateRequest
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

🛠️ Refactor suggestion

엔티티가 DTO(IndexCreateRequest)에 의존 — 계층 침식(layering) 우려

엔티티가 DTO를 import/참조하면 도메인 계층이 표현/전달 계층에 결합됩니다. 변환은 별도 Mapper 또는 DTO 확장 함수로 이동하는 것을 권장합니다. 서비스에서는 엔티티를 알지만, 엔티티는 DTO를 몰라야 합니다.

권장 대안(예시): DTO 확장 함수로 이동

// 예: src/main/kotlin/simplerag/ragback/domain/index/dto/IndexCreateRequestMapper.kt
package simplerag.ragback.domain.index.dto

import simplerag.ragback.domain.index.entity.Index

fun IndexCreateRequest.toEntity(): Index =
    Index(
        snapshotName = this.snapshotName.trim(),
        chunkingSize = this.chunkingSize,
        overlapSize = this.overlapSize,
        similarityMetric = this.similarityMetric,
        topK = this.topK,
        embeddingModel = this.embeddingModel,
        reranker = this.reranker
    )

그 후 서비스에서는 indexRepository.save(indexCreateRequest.toEntity()) 형태로 사용합니다.

Also applies to: 49-61

Comment on lines +22 to +23
val createdIndex = indexRepository.save(Index.toIndex(indexCreateRequest))
return IndexPreviewResponse.toIndexPreviewResponse(createdIndex)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

엔티티-DTO 결합도 감소: 생성 시 매퍼(확장 함수) 사용 권장

Index.toIndex(request)는 엔티티가 DTO를 알아야 하는 구조입니다. DTO→엔티티 변환은 전용 매퍼/확장 함수로 이동하여 서비스에서 사용하세요.

적용 제안(diff):

-        val createdIndex = indexRepository.save(Index.toIndex(indexCreateRequest))
+        val createdIndex = indexRepository.save(indexCreateRequest.toEntity())

위 변경은 다음 전제와 함께 동작합니다: IndexCreateRequest.toEntity() 확장 함수(매퍼)가 DTO 패키지에 추가되어 있어야 합니다. (이전에 남긴 코멘트 참고)

매퍼로 이관 시, 기존 정적 팩토리 사용 흔적을 정리하세요.


🏁 Script executed:

#!/bin/bash
# repo 루트에서 실행
rg -n -C2 'Index\.toIndex\s*\('

Length of output: 655


엔티티–DTO 결합도 감소를 위한 매퍼 분리 적용 요청

아래와 같이 서비스 계층에서 DTO→엔티티 변환을 직접 호출하는 대신, DTO 패키지에 전용 확장 함수로 매핑 로직을 분리해주세요.

• 변경 대상

  • src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (22번째 줄)

• 제안 diff

-        val createdIndex = indexRepository.save(Index.toIndex(indexCreateRequest))
+        val createdIndex = indexRepository.save(indexCreateRequest.toEntity())

• 후속 작업

  1. DTO 패키지에 IndexCreateRequest.toEntity(): Index 확장 함수를 구현
    // 예시: src/main/kotlin/simplerag/ragback/domain/index/dto/IndexCreateRequest.kt
    package simplerag.ragback.domain.index.dto
    
    import simplerag.ragback.domain.index.entity.Index
    
    fun IndexCreateRequest.toEntity(): Index = Index(
        name        = this.name,
        overlapSize = this.overlapSize,
        chunkingSize= this.chunkingSize,
        // 나머지 프로퍼티 매핑…
    )
  2. 엔티티 Index 클래스에서 기존 정적 팩토리 메서드 toIndex(request: IndexCreateRequest) 제거

위 작업을 통해 엔티티와 DTO 간 결합도를 낮추고, 매핑 로직을 한곳에서 관리할 수 있습니다.

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt around
lines 22-23, the service currently calls the entity factory
Index.toIndex(indexCreateRequest); replace this with a DTO-side mapper by
implementing an extension fun IndexCreateRequest.toEntity(): Index in
src/main/kotlin/simplerag/ragback/domain/index/dto/, update IndexService to call
indexRepository.save(indexCreateRequest.toEntity()), and then remove the static
factory method Index.toIndex(request: IndexCreateRequest) from the Index entity
class so mapping logic lives in the DTO package.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (1)

34-56: Ensure IndexDetailResponse enforces a non-null, immutable ID

To keep the preview and detail DTOs in sync—and to guarantee that every detail response always carries a valid indexId—please update IndexDetailResponse as follows:

• File: src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt

  • Change the property from a mutable, nullable var indexId: Long? to an immutable, non-nullable val indexId: Long.
  • In the mapper, wrap index.id with requireNotNull(...) so you fail fast if it ever is unexpectedly null.

Suggested diff:

 data class IndexDetailResponse(
-    var indexId: Long?,
+    val indexId: Long,
     val snapshotName: String,
     val chunkingSize: Int,
     val overlapSize: Int,
     val similarityMetric: SimilarityMetric,
     val topK: Int,
     val embeddingModel: EmbeddingModel,
     val reranker: Boolean,
 ) {
     companion object {
         fun toIndexDetailResponse(index: Index): IndexDetailResponse {
             return IndexDetailResponse(
-                indexId = index.id,
+                indexId = requireNotNull(index.id) { "Index.id must not be null" },
                 snapshotName = index.snapshotName,
                 chunkingSize = index.chunkingSize,
                 overlapSize = index.overlapSize,
                 similarityMetric = index.similarityMetric,
                 topK = index.topK,
                 embeddingModel = index.embeddingModel,
                 reranker = index.reranker,
             )
         }
     }
 }

• This mirrors the (presumed) val indexId: Long in IndexPreviewResponse, preventing drift between your two response shapes.
• The service and controller layers already work with a non-nullable Long (fun getIndex(indexId: Long): …), so this change aligns the DTO with its callers without breaking the contract.

src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (2)

106-116: Don’t silently swallow S3 cleanup failures — at least log them

Currently failures in afterCompletion cleanup are ignored, which matches the static-analysis warning. Log at WARN so we can observe potential storage leaks.

-                        uploadedUrls.forEach { runCatching { s3Util.deleteByUrl(it) } }
+                        uploadedUrls.forEach { url ->
+                            runCatching { s3Util.deleteByUrl(url) }
+                                .onFailure { ex -> log.warn("Failed S3 rollback delete for url={}", url, ex) }
+                        }

Add a logger once at class scope:

// at top-level of the class
private val log = org.slf4j.LoggerFactory.getLogger(DataFileService::class.java)

148-157: Reduce N queries and harden against races when attaching tags

existsBy... per tag can cause N round-trips and still races against concurrent inserts (unique constraint on data_files_id,tags_id). Fetch existing links once and bulk-insert the rest; ignore duplicates on concurrent races.

-    private fun attachTagsIfMissing(dataFile: DataFile, tags: List<Tag>) {
-        val fileId = dataFile.id ?: return
-        tags.forEach { tag ->
-            val tagId = tag.id ?: return@forEach
-            val exists = dataFileTagRepository.existsByDataFileIdAndTagId(fileId, tagId)
-            if (!exists) {
-                dataFileTagRepository.save(DataFileTag(tag = tag, dataFile = dataFile))
-            }
-        }
-    }
+    private fun attachTagsIfMissing(dataFile: DataFile, tags: List<Tag>) {
+        val fileId = dataFile.id ?: return
+        val tagIds = tags.mapNotNull { it.id }.toSet()
+        if (tagIds.isEmpty()) return
+
+        // Fetch existing links once
+        val existingIds: Set<Long> =
+            dataFileTagRepository.findAllByDataFileIn(listOf(dataFile))
+                .mapNotNull { it.tag.id }
+                .toSet()
+
+        val toAttach = tags
+            .filter { it.id != null && it.id !in existingIds }
+            .map { DataFileTag(tag = it, dataFile = dataFile) }
+
+        if (toAttach.isNotEmpty()) {
+            try {
+                dataFileTagRepository.saveAll(toAttach)
+            } catch (ex: DataIntegrityViolationException) {
+                // Concurrent attach: safe to ignore conflicts due to unique constraint
+                log.debug("Duplicate tag-links detected on concurrent attach for dataFileId={}", fileId, ex)
+            }
+        }
+    }
♻️ Duplicate comments (6)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (2)

7-16: Use neutral, stable field name for the list + preserve JSON contract via annotation

The property name indexPreviewResponseList is verbose and redundant inside IndexPreviewResponseList. Prefer items internally while keeping the serialized key unchanged to avoid a breaking API change. Also align the local variable name accordingly.

Apply:

@@
-package simplerag.ragback.domain.index.dto
+package simplerag.ragback.domain.index.dto
@@
-import simplerag.ragback.domain.index.entity.Index
+import simplerag.ragback.domain.index.entity.Index
+import com.fasterxml.jackson.annotation.JsonProperty
@@
 data class IndexPreviewResponseList(
-    val indexPreviewResponseList: List<IndexPreviewResponse>,
+    @JsonProperty("indexPreviewResponseList")
+    val items: List<IndexPreviewResponse>,
 ) {
     companion object {
         fun toIndexPreviewResponseList(
             indexes: List<Index>
         ): IndexPreviewResponseList {
-            val indexList = indexes.map { IndexPreviewResponse.toIndexPreviewResponse(it) }
-            return IndexPreviewResponseList(indexList)
+            val items = indexes.map { IndexPreviewResponse.toIndexPreviewResponse(it) }
+            return IndexPreviewResponseList(items)
         }
     }
 }

Verification:

  • Confirm no external clients rely on the previous key indexDetailResponse. If they do, consider a versioned change or a transitional header/flag to switch keys.
  • Update tests/assertions that reference the Kotlin property name (internal), not the JSON key (unchanged).

20-32: Make ID non-null and immutable in responses

Response DTOs should carry a stable, non-null identifier. Make indexId a val Long and assert non-null at mapping time.

 data class IndexPreviewResponse(
-    var indexId: Long?,
+    val indexId: Long,
     val snapshotName: String
 ) {
     companion object {
         fun toIndexPreviewResponse(index: Index): IndexPreviewResponse {
             return IndexPreviewResponse(
-                indexId = index.id,
+                indexId = requireNotNull(index.id),
                 snapshotName = index.snapshotName,
             )
         }
     }
 }

Notes:

  • This encodes the contract “responses always have an ID.” If there is a legitimate case where ID can be absent, keep Long? but still switch to val.
  • Add/adjust tests to fail fast if an unsaved entity slips into the mapper.
src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)

49-50: Input validation on cursor/take looks right

Adding @min(0) for cursor and bounds for take closes the validation gap flagged earlier. Good tighten-up.

src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (2)

45-66: Make id immutable and non-nullable in detail DTO to match factory contract

DataFileDetailResponse.from guarantees a non-null id, but the DTO declares var id: Long?. Make it val id: Long for type safety and consistency with the preview DTO.

 data class DataFileDetailResponse(
-    var id: Long?,
+    val id: Long,
     val title: String,
     val type: String,
     val lastModified: LocalDateTime,
     val sizeMB: Double,
     val tags: List<TagDTO>,
     val sha256: String,
 )

80-85: Consistency win: kotlin.math.pow + round

Switching to Kotlin’s math extensions keeps things consistent and readable.

src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)

88-91: nextCursor calc simplified — nice and safe

Using lastOrNull with hasNext removes index math and edge cases as previously suggested.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d5bc901 and f36a9ca.

📒 Files selected for processing (5)
  • src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/repository/DataFileTagRepository.kt (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (5)
src/main/kotlin/simplerag/ragback/domain/document/repository/DataFileTagRepository.kt (1)
src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt (1)
  • name (6-24)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/converter/IndexConverter.kt (5)
  • toIndexPreviewResponse (29-34)
  • toIndexDetailResponse (36-47)
  • toIndexPreviewResponseList (22-27)
  • toIndexPreviewResponse (25-25)
  • toIndex (10-20)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (1)
  • indexRepository (15-67)
src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (5)
  • dataFileService (28-280)
  • dataFileService (118-118)
  • res (170-173)
  • dataFileService (92-92)
  • dataFileService (153-153)
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (1)
src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt (1)
  • from (33-35)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (4)
  • dataFileService (28-280)
  • dataFileService (118-118)
  • dataFileService (92-92)
  • res (170-173)
🪛 detekt (1.23.8)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt

[warning] 62-62: The caught exception is swallowed. The original exception could be lost.

(detekt.exceptions.SwallowedException)

🔇 Additional comments (7)
src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt (1)

24-31: LGTM: Companion mappers with named args are clear and safe

The mapper reads well, uses named parameters, and isolates conversion. Once ID immutability is applied, this block is solid.

src/main/kotlin/simplerag/ragback/domain/document/repository/DataFileTagRepository.kt (1)

11-12: Good fix: eager-fetch Tag to kill N+1 in list path

Switching to @entitygraph(attributePaths = ["tag"]) on the bulk finder is the right call and aligns with the service-side grouping logic. This should eliminate the tag-loader N+1 raised previously.

src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)

42-45: All old DTO references removed—please verify API docs & client integrations
Search results show DataFileResponseList is no longer referenced in the codebase; only DataFilePreviewResponseList remains. Before merging, please:

  • Confirm OpenAPI/Swagger definitions reflect the new DataFilePreviewResponseList contract
  • Verify any frontend or SDK code consuming this endpoint has been updated
  • Ensure there are no lingering mentions of DataFileResponseList in external documentation or client libraries
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (1)

10-25: Preview DTO factory enforces non-null id — LGTM

Companion factory with requireNotNull on DataFile.id is clean and consistent with test expectations.

src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (3)

38-73: Upload return type refactor is coherent end-to-end

Returning DataFilePreviewResponseList and creating items via DataFilePreviewResponse.from keeps the upload flow minimal and aligns with tests.


82-87: Bulk tag fetch + group-by is the right shape

Using findAllByDataFileIn and grouping by dataFile.id is efficient and leverages the repository’s EntityGraph to avoid N+1.


95-104: Consider deleting the S3 object when deleting a DataFile

Service-level delete removes DB rows but leaves the blob in storage. Unless a lifecycle job handles orphan cleanup, delete it here (and ignore failures).

     fun deleteFile(dataFilesId: Long) {
         val dataFile = dataFileRepository.findDataFileById(dataFilesId) ?: throw FileException(
             ErrorCode.NOT_FOUND,
             dataFilesId.toString()
         )
 
         dataFileTagRepository.deleteAllByDataFile(dataFile)
 
-        dataFileRepository.delete(dataFile)
+        runCatching { dataFile.fileUrl?.let(s3Util::deleteByUrl) }
+            .onFailure { ex -> log.warn("Failed to delete S3 object for dataFileId={} url={}", dataFilesId, dataFile.fileUrl, ex) }
+        dataFileRepository.delete(dataFile)
     }

Confirm whether S3 cleanup is expected here or handled elsewhere (e.g., async job). If handled elsewhere, consider documenting that contract in the service javadoc.

Comment on lines 27 to +43
data class DataFileDetailResponseList(
val dataFileDetailResponseList: List<DataFileDetailResponse>,
val cursor: Long?,
val hasNext: Boolean,
)
) {
companion object {
fun from(files: List<DataFile>, tagsByFileId: Map<Long, List<TagDTO>>, cursor: Long?, hasNext: Boolean): DataFileDetailResponseList =
DataFileDetailResponseList(
dataFileDetailResponseList = files.map { file ->
val tags = file.id?.let { tagsByFileId[it] } ?: emptyList()
DataFileDetailResponse.from(file, tags)
},
cursor = cursor,
hasNext = hasNext,
)
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Naming clarity: consider nextCursor instead of cursor (non-blocking)

The field cursor holds the next pagination token (last visible id when hasNext). Renaming to nextCursor would reduce ambiguity in clients. Defer if API stability outweighs the gain.

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt
around lines 27 to 43, rename the ambiguous field cursor to nextCursor: change
the data class property name, the companion.from() parameter and its assignment,
and update all call sites that construct or read this DTO; if you must preserve
external API compatibility, add a JSON annotation (e.g.,
@JsonProperty("cursor")) on nextCursor so serialized payloads still expose
"cursor" while internal code uses nextCursor.

Comment on lines +75 to +77
fun from(dataFileTags: List<DataFileTag>): List<TagDTO> =
dataFileTags.map { from(it.tag) }
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Deterministic tag list (optional): ensure stable order and dedup

If UI/tests depend on a stable tag order, sort and deduplicate when converting from link rows.

-        fun from(dataFileTags: List<DataFileTag>): List<TagDTO> =
-            dataFileTags.map { from(it.tag) }
+        fun from(dataFileTags: List<DataFileTag>): List<TagDTO> =
+            dataFileTags
+                .map { it.tag }
+                .distinctBy { it.id ?: it.name.uppercase() }
+                .sortedBy { it.name }
+                .map { from(it) }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fun from(dataFileTags: List<DataFileTag>): List<TagDTO> =
dataFileTags.map { from(it.tag) }
}
fun from(dataFileTags: List<DataFileTag>): List<TagDTO> =
dataFileTags
.map { it.tag }
.distinctBy { it.id ?: it.name.uppercase() }
.sortedBy { it.name }
.map { from(it) }
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt
around lines 75 to 77, the conversion from DataFileTag to TagDTO can produce a
non-deterministic, duplicate list; deduplicate by tag identity (e.g., id or
name) and then sort to ensure a stable order for UI/tests — e.g., transform
dataFileTags to unique tags using distinctBy { it.tag.id } (or distinctBy {
it.tag.name } if ids aren't available), map to TagDTO, and then sortedBy {
it.name } before returning.

Comment on lines +11 to +12
@EntityGraph(attributePaths = ["tag"])
fun findAllByDataFileIn(dataFiles: Collection<DataFile>): List<DataFileTag>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Optional: add id-based variant to avoid entity IN quirks and enable reuse

Spring Data handles IN with entities, but using ids can be clearer and sometimes more efficient across providers. Consider exposing an id-based overload (can coexist with the current method):

 interface DataFileTagRepository : JpaRepository<DataFileTag, Long> {
   fun existsByDataFileIdAndTagId(dataFileId: Long, tagId: Long): Boolean

   @EntityGraph(attributePaths = ["tag"])
   fun findAllByDataFileIn(dataFiles: Collection<DataFile>): List<DataFileTag>

+  @EntityGraph(attributePaths = ["tag"])
+  fun findAllByDataFileIdIn(dataFileIds: Collection<Long>): List<DataFileTag>
 
   fun deleteAllByDataFile(dataFile: DataFile)
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@EntityGraph(attributePaths = ["tag"])
fun findAllByDataFileIn(dataFiles: Collection<DataFile>): List<DataFileTag>
interface DataFileTagRepository : JpaRepository<DataFileTag, Long> {
fun existsByDataFileIdAndTagId(dataFileId: Long, tagId: Long): Boolean
@EntityGraph(attributePaths = ["tag"])
fun findAllByDataFileIn(dataFiles: Collection<DataFile>): List<DataFileTag>
@EntityGraph(attributePaths = ["tag"])
fun findAllByDataFileIdIn(dataFileIds: Collection<Long>): List<DataFileTag>
fun deleteAllByDataFile(dataFile: DataFile)
}
🤖 Prompt for AI Agents
In
src/main/kotlin/simplerag/ragback/domain/document/repository/DataFileTagRepository.kt
around lines 11-12, add an id-based overload to avoid entity-IN quirks by
providing a method that accepts a collection of DataFile IDs (e.g. Long or UUID
depending on your DataFile.id type) and returns List<DataFileTag; keep the
existing findAllByDataFileIn method. Apply the same @EntityGraph(attributePaths
= ["tag"]) to the new method to preserve fetch behavior and ensure the method
name follows Spring Data conventions (findAllByDataFileIdIn) so the framework
can derive the query. Ensure the repository imports and generic types match your
DataFile id type.

Comment on lines 60 to 64
val dataFile = try {
dataFileRepository.save(DataFile(meta.title, type, sizeBytes, sha256, fileUrl))
dataFileRepository.save(DataFile.from(meta.title, type, sizeBytes, sha256, fileUrl))
} catch (ex: DataIntegrityViolationException) {
throw FileException(ErrorCode.ALREADY_FILE, sha256)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Preserve original constraint-violation cause when rethrowing

Rethrowing as FileException loses the DB stacktrace. Preserve the cause to aid prod debugging.

-            } catch (ex: DataIntegrityViolationException) {
-                throw FileException(ErrorCode.ALREADY_FILE, sha256)
-            }
+            } catch (ex: DataIntegrityViolationException) {
+                // Prefer constructor with cause; otherwise use initCause fallback.
+                throw FileException(ErrorCode.ALREADY_FILE, sha256, ex)
+                // or:
+                // throw FileException(ErrorCode.ALREADY_FILE, sha256).also { it.initCause(ex) }
+            }

Run to confirm FileException supports a cause parameter:


🏁 Script executed:

#!/bin/bash
rg -nP --glob '!**/build/**' 'class\s+FileException|\bFileException\(' -C2

Length of output: 2394


🏁 Script executed:

#!/bin/bash
# Display the FileException class definition and its constructor signature
sed -n '1,200p' src/main/kotlin/simplerag/ragback/global/error/CustomException.kt

Length of output: 824


Preserve the original constraint‐violation exception as the cause

The FileException constructor accepts an optional cause: Throwable? parameter, so you can retain the full DB stack trace by passing in the caught exception:

             } catch (ex: DataIntegrityViolationException) {
-                throw FileException(ErrorCode.ALREADY_FILE, sha256)
+                throw FileException(ErrorCode.ALREADY_FILE, sha256, ex)
             }

Alternatively, you could write:

throw FileException(ErrorCode.ALREADY_FILE, sha256)
    .also { it.initCause(ex) }

This will make debugging in production much easier by preserving the original cause.

🧰 Tools
🪛 detekt (1.23.8)

[warning] 62-62: The caught exception is swallowed. The original exception could be lost.

(detekt.exceptions.SwallowedException)

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt
around lines 60 to 64, the catch block throws a new FileException without
preserving the original DataIntegrityViolationException; change the throw to
include the caught exception as the cause (either by passing ex into the
FileException constructor's cause parameter or by calling initCause on the new
exception) so the original DB stack trace is retained for debugging.

@@ -1,16 +1,35 @@
package simplerag.ragback.domain.index.dto

import simplerag.ragback.domain.index.entity.Index
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Layering: DTOs importing domain entities couples API and domain layers

Placing mapping logic in DTO companions introduces a dependency from DTO -> domain. If you aim for a clean layered architecture (API ↔ Mapper ↔ Domain), prefer extension mappers or a dedicated mapper object to decouple layers and ease testing.

Example (extension mappers):

// src/main/kotlin/simplerag/ragback/domain/index/mapper/IndexMappers.kt
package simplerag.ragback.domain.index.mapper

import simplerag.ragback.domain.index.entity.Index
import simplerag.ragback.domain.index.dto.*

fun Index.toPreviewResponse() = IndexPreviewResponse(
    indexId = requireNotNull(id),
    snapshotName = snapshotName,
)

fun Index.toDetailResponse() = IndexDetailResponse(
    indexId = requireNotNull(id),
    snapshotName = snapshotName,
    chunkingSize = chunkingSize,
    overlapSize = overlapSize,
    similarityMetric = similarityMetric,
    topK = topK,
    embeddingModel = embeddingModel,
    reranker = reranker,
)

fun List<Index>.toPreviewResponseList() =
    IndexPreviewResponseList(this.map { it.toPreviewResponse() })

Service usage stays terse: indexes.toPreviewResponseList().

Comment on lines 7 to +16
data class IndexPreviewResponseList(
val indexDetailResponse: List<IndexPreviewResponse>,
)
val indexPreviewResponseList: List<IndexPreviewResponse>,
) {
companion object {
fun toIndexPreviewResponseList(
indexes: List<Index>
): IndexPreviewResponseList {
val indexList = indexes.map { IndexPreviewResponse.toIndexPreviewResponse(it) }
return IndexPreviewResponseList(indexList)
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Sanity-check usages and JSON keys across the repo

Given several recent renames (indexDetailResponseindexPreviewResponseList → proposed items), please verify no stale references remain in code, tests, or docs.


🏁 Script executed:

#!/bin/bash
# From repo root: find potential stale/recent keys/usages.
set -euo pipefail

rg -n -C2 -S '\bindexDetailResponse\b|\"indexDetailResponse\"'
rg -n -C2 -S '\bindexPreviewResponseList\b|\"indexPreviewResponseList\"'
rg -n -C2 -S '\bitems\b|\"items\"'

# Ensure DTOs are only used for serialization (responses), not deserialization.
rg -n -C2 -S 'IndexPreviewResponseList|IndexPreviewResponse|IndexDetailResponse' src/ | sed -n '1,200p'

Length of output: 222


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Search for stale JSON keys and references
rg -n -C2 -e '\bindexDetailResponse\b' -e '"indexDetailResponse"' .
rg -n -C2 -e '\bindexPreviewResponseList\b' -e '"indexPreviewResponseList"' .
rg -n -C2 -e '\bitems\b' -e '"items"' .

# Ensure DTOs are only used for serialization (responses), not deserialization
rg -n -C2 -e 'IndexPreviewResponseList' -e 'IndexPreviewResponse' -e 'IndexDetailResponse' src/

Length of output: 15243


Stale references after recent renames: tests & DTO JSON key

A quick grep confirms that although the core DTO and service layers have been updated, there are still lingering references to the old names and JSON keys that need to be cleaned up:

• In src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (around line 95), the assertion

assertThat(indexes.indexDetailResponse.size)…  

should be updated to

assertThat(indexes.indexPreviewResponseList.size)…  

• The property in IndexPreviewResponseList is still named and serialized as

val indexPreviewResponseList: List<IndexPreviewResponse>

which by default emits the JSON key "indexPreviewResponseList". If your target schema uses "items" instead, you’ll need to either:
– Rename the property to items in IndexPreviewResponseList, or
– Annotate it with @JsonProperty("items") (and update any OpenAPI/docs accordingly).

• Double-check any API-documentation (OpenAPI spec, README, etc.) for stale occurrences of indexDetailResponse, indexPreviewResponseList, or JSON examples still showing the old key names.

Please address these test failures and DTO renames before merging.

Comment on lines 40 to 43
val embeddingModel: EmbeddingModel,
val reranker: Boolean,
) No newline at end of file
) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Boolean name reranker is ambiguous; consider intent-revealing name

reranker: Boolean reads like a noun, not a flag. Prefer isRerankerEnabled (or useReranker) for clarity. Preserve the JSON key with @JsonProperty("reranker") if wire compatibility is required.

Example:

-    val reranker: Boolean,
+    @JsonProperty("reranker")
+    val isRerankerEnabled: Boolean,
@@
-                reranker = index.reranker,
+                isRerankerEnabled = index.reranker,

(Requires import com.fasterxml.jackson.annotation.JsonProperty.)

Also applies to: 45-56

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt around
lines 41 to 43 (and similarly for the other occurrences at 45-56), the Boolean
property name reranker is ambiguous; rename it to an intent-revealing flag such
as isRerankerEnabled (or useReranker) and add @JsonProperty("reranker") to
preserve the existing JSON key; update the import to include
com.fasterxml.jackson.annotation.JsonProperty, change the property name in the
primary constructor and throughout the class/consumers, and run/adjust any
serialization tests or callers to use the new property name while keeping wire
compatibility via the annotation.

@catturtle123 catturtle123 changed the title ♻️ [Refactor] index, dataFile converting refactoring ♻️ Refactor: index, dataFile converting refactoring Aug 21, 2025
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
build.gradle (1)

34-39: Remove duplicate Jackson Kotlin module dependency

jackson-module-kotlin is declared twice (Lines 34 and 38). Keep a single declaration.

     implementation("com.fasterxml.jackson.module:jackson-module-kotlin")
-    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
+    implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
     implementation 'org.springframework.boot:spring-boot-starter-validation'
     implementation 'org.springframework.boot:spring-boot-starter-web'
-    implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
+    // (duplicate removed)
     implementation 'org.jetbrains.kotlin:kotlin-reflect'
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)

147-156: Reduce N queries on link creation by batching existence checks

Current loop issues one exists query per tag. Prefetch existing links once and insert missing in bulk, guarding for races.

     private fun attachTagsIfMissing(dataFile: DataFile, tags: List<Tag>) {
-        val fileId = dataFile.id ?: return
-        tags.forEach { tag ->
-            val tagId = tag.id ?: return@forEach
-            val exists = dataFileTagRepository.existsByDataFileIdAndTagId(fileId, tagId)
-            if (!exists) {
-                dataFileTagRepository.save(DataFileTag(tag = tag, dataFile = dataFile))
-            }
-        }
+        val fileId = dataFile.id ?: return
+        val tagIds = tags.mapNotNull { it.id }
+        if (tagIds.isEmpty()) return
+        // 1 query to fetch existing links
+        val existingIds = dataFileTagRepository
+            .findAllByDataFileIdAndTagIdIn(fileId, tagIds)
+            .mapNotNull { it.tag.id }
+            .toSet()
+        // prepare only missing links
+        val toSave = tags.filter { it.id != null && it.id !in existingIds }
+            .map { DataFileTag(tag = it, dataFile = dataFile) }
+        if (toSave.isNotEmpty()) {
+            // tolerate rare races
+            runCatching { dataFileTagRepository.saveAll(toSave) }
+        }
     }

Note: introduce findAllByDataFileIdAndTagIdIn(fileId: Long, tagIds: Collection<Long>): List<DataFileTag> in DataFileTagRepository.

I can provide the repository method signature and JPQL if you want.

♻️ Duplicate comments (1)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (1)

60-64: Preserve original DB exception as cause when rethrowing

Re-throwing without the cause loses diagnostics. Pass the caught DataIntegrityViolationException into FileException.

             val dataFile = try {
                 dataFileRepository.save(DataFile.from(meta.title, type, sizeBytes, sha256, fileUrl))
             } catch (ex: DataIntegrityViolationException) {
-                throw FileException(ErrorCode.ALREADY_FILE, sha256)
+                throw FileException(ErrorCode.ALREADY_FILE, sha256, ex)
             }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f36a9ca and 0729fe9.

📒 Files selected for processing (5)
  • build.gradle (1 hunks)
  • src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (2 hunks)
  • src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (2 hunks)
  • src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (3 hunks)
  • src/test/resources/application-test.yml (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (1)
  • indexRepository (15-67)
src/main/kotlin/simplerag/ragback/domain/index/controller/IndexController.kt (1)
  • indexService (11-59)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1)
src/test/kotlin/simplerag/ragback/RagBackApplicationTests.kt (1)
  • contextLoads (6-13)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (2)
src/main/kotlin/simplerag/ragback/domain/document/controller/DataFileController.kt (1)
  • dataFileService (20-64)
src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt (1)
  • dataFilePreviewResponseList (6-8)
🪛 YAMLlint (1.37.1)
src/test/resources/application-test.yml

[error] 7-7: no new line character at the end of file

(new-line-at-end-of-file)

🪛 detekt (1.23.8)
src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt

[warning] 62-62: The caught exception is swallowed. The original exception could be lost.

(detekt.exceptions.SwallowedException)

🔇 Additional comments (5)
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1)

105-105: Assertion update matches new DTO contract

Switch to indexPreviewResponseList.size is correct given the service now returns preview lists.

build.gradle (1)

49-51: Good: Testcontainers dependencies added for Boot-managed containers

These lines align the test environment with the new PostgreSQL Testcontainers setup. Spring Boot’s dependency management will pin compatible versions.

src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1)

53-59: Cleanup strategy LGTM

Post-test cleanup of join table, tag table, files, and S3 stub keeps tests isolated and deterministic.

src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt (2)

88-91: Cursor calculation is concise and correct

Using lastOrNull()?.id with files.hasNext() matches keyset pagination semantics.


82-87: No action needed: tag is already eagerly fetched

The findAllByDataFileIn method in DataFileTagRepository is annotated with @EntityGraph(attributePaths = ["tag"]), which instructs Spring Data JPA to fetch the tag association eagerly and prevents N+1 queries. You can leave the repository implementation as-is without additional changes.

Comment on lines 42 to 46
companion object {
@Container
@ServiceConnection
val postgres = PostgreSQLContainer("postgres:15.3")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Rely on Boot-managed container; add Kotlin generic

Same as the index test: @ServiceConnection is enough; you can drop @Container. Add <Nothing> generic for Kotlin to avoid raw-type usage.

     companion object {
-        @Container
         @ServiceConnection
-        val postgres = PostgreSQLContainer("postgres:15.3")
+        val postgres = PostgreSQLContainer<Nothing>("postgres:15.3")
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
companion object {
@Container
@ServiceConnection
val postgres = PostgreSQLContainer("postgres:15.3")
}
companion object {
@ServiceConnection
val postgres = PostgreSQLContainer<Nothing>("postgres:15.3")
}
🤖 Prompt for AI Agents
In
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt
around lines 42 to 46, remove the explicit @Container annotation and rely on
Spring Boot's @ServiceConnection to manage the testcontainer, and change the
raw-type PostgreSQLContainer to a Kotlin generic by declaring it as
PostgreSQLContainer<Nothing> so it no longer uses a raw type; keep the container
instantiation string unchanged and ensure imports still resolve.

Comment on lines +3 to +5
driver-class-name: org.postgresql.Driver
jpa:
database-platform: org.hibernate.dialect.H2Dialect
database-platform: org.hibernate.dialect.PostgreSQLDialect
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Prefer relying on Spring Boot Testcontainers auto-config instead of hard-coding driver/dialect in tests

With a static container annotated by @Serviceconnection, Spring Boot wires the DataSource from the container. Keeping explicit driver-class-name and database-platform here is redundant and can mask misconfig if the container setup changes. Consider removing them in the test profile to avoid drift.

 spring:
   datasource:
-    driver-class-name: org.postgresql.Driver
   jpa:
-    database-platform: org.hibernate.dialect.PostgreSQLDialect
     hibernate:
       ddl-auto: create-drop
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
driver-class-name: org.postgresql.Driver
jpa:
database-platform: org.hibernate.dialect.H2Dialect
database-platform: org.hibernate.dialect.PostgreSQLDialect
spring:
datasource:
jpa:
hibernate:
ddl-auto: create-drop
🤖 Prompt for AI Agents
In src/test/resources/application-test.yml around lines 3 to 5, the test profile
explicitly sets driver-class-name and jpa.database-platform which is redundant
and can conflict with Spring Boot Testcontainers auto-configuration; remove the
driver-class-name and jpa.database-platform entries from this test YAML so the
DataSource and Hibernate dialect are derived from the container via
@ServiceConnection. After removal, ensure the test container is
annotated/configured with @ServiceConnection and that no other test properties
reintroduce these keys.

database-platform: org.hibernate.dialect.PostgreSQLDialect
hibernate:
ddl-auto: create-drop
ddl-auto: create-drop No newline at end of file
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Fix YAML lint: add trailing newline

File lacks a newline at EOF. Add it to satisfy linters and common tooling. No functional impact.

 spring:
   datasource:
     driver-class-name: org.postgresql.Driver
   jpa:
     database-platform: org.hibernate.dialect.PostgreSQLDialect
     hibernate:
       ddl-auto: create-drop
+
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
ddl-auto: create-drop
spring:
datasource:
driver-class-name: org.postgresql.Driver
jpa:
database-platform: org.hibernate.dialect.PostgreSQLDialect
hibernate:
ddl-auto: create-drop
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 7-7: no new line character at the end of file

(new-line-at-end-of-file)

🤖 Prompt for AI Agents
In src/test/resources/application-test.yml around line 7, the file is missing a
trailing newline at EOF; add a single newline character after the final line
("ddl-auto: create-drop") so the file ends with a newline to satisfy YAML
linters and tooling.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (2)

26-33: Avoid linear scans on every lookup; precompute a map for O(1) findByModelId.

This lookup may sit on a hot path in request validation. A tiny optimization keeps things simple and faster.

Apply this diff:

 companion object {
-        fun findByModelId(modelId: String): EmbeddingModel? {
-            return entries.find { it.modelId == modelId }
-        }
-
-        fun getAllModelIds(): List<String> {
-            return entries.map { it.modelId }
-        }
+        private val byModelId: Map<String, EmbeddingModel> = entries.associateBy { it.modelId }
+
+        @JvmStatic
+        fun findByModelId(modelId: String): EmbeddingModel? = byModelId[modelId]
+
+        @JvmStatic
+        fun getAllModelIds(): List<String> = byModelId.keys.toList()
 }

3-6: Model the dimension as “max/default dimension” and support an optional override.

OpenAI’s TE3 models can return reduced-dimension embeddings via a “dimensions” parameter. Encoding a single fixed dim in the enum can become a footgun if you introduce overrides later.

Two light options (either is fine now, keeps future changes small):

  • Rename dim to maxDim and add a small helper:
    • fun effectiveDim(override: Int?): Int = override ?: maxDim
  • Or keep dim, but document it as the default dimension and validate any override against it when creating an Index.

If helpful, I can follow up with a minimal change set in IndexCreateRequest/Index to carry an optional dimensionsOverride and validate it against the chosen model.

When using larger vectors (3072), consider revisiting pgvector index parameters (e.g., IVF lists or HNSW M/efConstruction) as memory and build time scale with dimension.

build.gradle (1)

34-39: Deduplicate jackson-module-kotlin dependency.

It appears twice; keep a single entry to avoid confusion.

Apply this diff (remove the second occurrence):

     implementation("com.fasterxml.jackson.module:jackson-module-kotlin")
@@
-    implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (2)

223-235: Reduce test brittleness by avoiding assumptions about S3 key format.

The assertion depends on the current key-naming convention. If S3 key generation changes, this test will break even if behavior (compensating delete on rollback) remains correct. Prefer asserting the fake S3 is empty or that no objects exist for the computed sha without depending on the exact key.

Example alternative (if FakeS3Util exposes listing):

assertTrue(s3Util.list().isEmpty(), "롤백 시 S3에 어떠한 객체도 남아있지 않아야 합니다")

33-41: Action Required: Enable constructor injection in DataFileServiceTest

DataFileServiceTest currently relies on parameter-level @Autowired without instructing Spring to use the primary constructor. Unlike IndexServiceTest—which is annotated with @TestConstructor(autowireMode = ALL)—this class won’t be instantiated properly by SpringExtension unless you explicitly opt in. Please update as follows in src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt:

Option A: Mirror IndexServiceTest with @TestConstructor

 @SpringBootTest
 @ActiveProfiles("test")
+@TestConstructor(autowireMode = TestConstructor.AutowireMode.ALL)
-class DataFileServiceTest(
-    @Autowired private val dataFileService: DataFileService,
-    @Autowired private val dataFileRepository: DataFileRepository,
-    @Autowired private val tagRepository: TagRepository,
-    @Autowired private val dataFileTagRepository: DataFileTagRepository,
-    @Autowired private val s3Util: FakeS3Util
+class DataFileServiceTest(
+    private val dataFileService: DataFileService,
+    private val dataFileRepository: DataFileRepository,
+    private val tagRepository: TagRepository,
+    private val dataFileTagRepository: DataFileTagRepository,
+    private val s3Util: FakeS3Util
 ) {

Add import:

import org.springframework.test.context.TestConstructor

Option B: Annotate the constructor directly

-class DataFileServiceTest(
+class DataFileServiceTest @Autowired constructor(
     private val dataFileService: DataFileService,
     private val dataFileRepository: DataFileRepository,
     private val tagRepository: TagRepository,
     private val dataFileTagRepository: DataFileTagRepository,
     private val s3Util: FakeS3Util
 ) {

Either approach ensures Spring will wire the constructor parameters. Verify that the test runner can instantiate this class via SpringExtension.

src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1)

83-84: Avoid asserting on localized exception messages; assert on ErrorCode instead.

Message text is brittle and i18n-dependent. Prefer asserting on the exception’s error code.

Apply these diffs:

CreateIndex overlap:

-        val message = assertThrows<IndexException> {
-            indexService.createIndex(indexCreateRequest)
-        }.message
-
-        assertEquals("overlap 크기는 chunking 크기를 넘을 수 없습니다.", message)
+        val ex = assertThrows<IndexException> {
+            indexService.createIndex(indexCreateRequest)
+        }
+        assertEquals(ErrorCode.OVERLAP_OVERFLOW, ex.errorCode)

GetIndex not found:

-        val message = assertThrows<IndexException> { indexService.getIndex(savedIndex.id!! + 1L) }.message
-
-        assertEquals("리소스를 찾을 수 없습니다.", message)
+        val ex = assertThrows<IndexException> { indexService.getIndex(savedIndex.id!! + 1L) }
+        assertEquals(ErrorCode.NOT_FOUND, ex.errorCode)

UpdateIndex not found:

-        val message =
-            assertThrows<IndexException> { indexService.updateIndex(savedIndex.id!! + 1L, indexUpdateRequest) }.message
-
-        assertEquals("리소스를 찾을 수 없습니다.", message)
+        val ex = assertThrows<IndexException> { indexService.updateIndex(savedIndex.id!! + 1L, indexUpdateRequest) }
+        assertEquals(ErrorCode.NOT_FOUND, ex.errorCode)

UpdateIndex overlap:

-        val message = assertThrows<IndexException> {
-            indexService.updateIndex(savedIndex.id!!, indexUpdateRequest)
-        }.message
-
-        assertEquals("overlap 크기는 chunking 크기를 넘을 수 없습니다.", message)
+        val ex = assertThrows<IndexException> {
+            indexService.updateIndex(savedIndex.id!!, indexUpdateRequest)
+        }
+        assertEquals(ErrorCode.OVERLAP_OVERFLOW, ex.errorCode)

Add the import if missing:

import simplerag.ragback.global.error.ErrorCode

Also applies to: 169-170, 229-230, 255-255

♻️ Duplicate comments (2)
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1)

44-55: Prefer PostgreSQLContainer for type-safety; keep relying on @Serviceconnection.

Good use of Boot-managed container + pgvector image. Minor nit: avoid star-projection by using Kotlin’s Nothing generic.

Apply this diff:

-        @ServiceConnection
-        val postgres: PostgreSQLContainer<*> =
-            PostgreSQLContainer(pgvectorImage).apply {
+        @ServiceConnection
+        val postgres: PostgreSQLContainer<Nothing> =
+            PostgreSQLContainer<Nothing>(pgvectorImage).apply {
                 withInitScript("db/init.sql")
             }
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1)

36-47: Prefer PostgreSQLContainer; consider a shared base to reuse the container.

  • Switch from star-projection to for clarity.
  • Optional: Extract this container into a shared abstract base test so DataFile/Index tests reuse a single Boot-managed container, cutting startup time.

Apply this diff:

-        @ServiceConnection
-        val postgres: PostgreSQLContainer<*> =
-            PostgreSQLContainer(pgvectorImage).apply {
+        @ServiceConnection
+        val postgres: PostgreSQLContainer<Nothing> =
+            PostgreSQLContainer<Nothing>(pgvectorImage).apply {
                 withInitScript("db/init.sql")
             }

Shared base sketch:

@SpringBootTest
@ActiveProfiles("test")
abstract class PostgresContainerBase {
    companion object {
        @ServiceConnection
        val postgres = PostgreSQLContainer<Nothing>(
            DockerImageName.parse("pgvector/pgvector:pg16").asCompatibleSubstituteFor("postgres")
        ).apply { withInitScript("db/init.sql") }
    }
}

Then have tests extend PostgresContainerBase.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 0729fe9 and 6ecf023.

📒 Files selected for processing (5)
  • build.gradle (2 hunks)
  • src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1 hunks)
  • src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (2 hunks)
  • src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (2 hunks)
  • src/test/resources/db/init.sql (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (2)
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (3)
  • name (8-42)
  • newVec (36-38)
  • updateEmbedding (35-40)
src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (1)
  • { "Embedding must not be empty; expected fixed dimension (e.g., 1536)" } (10-10)
src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1)
src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt (1)
  • indexRepository (15-67)
🔇 Additional comments (6)
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1)

9-9: The next step is to inspect your Spring configuration to confirm how Hibernate schema generation is handled (e.g., spring.jpa.hibernate.ddl-auto) and whether any vector dimension settings are defined in application.yml or application-local.yml. This will let us determine if the new 3072-dim embedding column is created correctly at runtime or via migrations. Please review the upcoming script output.

src/test/resources/db/init.sql (1)

1-1: Vector extension init is correct and idempotent.

This will make pgvector available in tests without failing when rerun. Ensure production environments that require vector also create the extension via migrations (Flyway/Liquibase) rather than relying only on the test init script.

build.gradle (2)

49-52: Testcontainers setup is appropriate.

spring-boot-testcontainers + JUnit Jupiter + PostgreSQL modules are the right trio for the new Postgres-backed tests.


61-63: Confirm single source of pgvector truth.

You already declare pgvector here for the app. After removing the testImplementation duplicate (see above), this becomes the single managed location.

If the project uses pgvector only in tests, move this to testImplementation instead; otherwise leave as is for runtime use.

src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt (1)

62-68: Cleanup moved to @AfterEach is correct for isolation.

Running repository and S3 cleanups after each test prevents cross-test contamination and keeps intent clear.

src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt (1)

117-117: Assertion updated to preview list size — LGTM.

Matches the refactor to IndexPreviewResponseList.

Comment on lines +47 to +52
implementation 'org.postgresql:postgresql'

testImplementation "org.springframework.boot:spring-boot-testcontainers"
testImplementation "org.testcontainers:postgresql"
testImplementation "org.testcontainers:junit-jupiter"
testImplementation "com.pgvector:pgvector:0.1.6"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Use runtimeOnly for JDBC driver and drop duplicate pgvector test dep.

  • JDBC drivers should be runtimeOnly for JPA; no need on the compile classpath.
  • You declare pgvector in both implementation and testImplementation; keep a single declaration to avoid version drift.

Apply this diff:

-    implementation 'org.postgresql:postgresql'
+    runtimeOnly 'org.postgresql:postgresql'

-    testImplementation "org.springframework.boot:spring-boot-testcontainers"
-    testImplementation "org.testcontainers:postgresql"
-    testImplementation "org.testcontainers:junit-jupiter"
-    testImplementation "com.pgvector:pgvector:0.1.6"
+    testImplementation "org.springframework.boot:spring-boot-testcontainers"
+    testImplementation "org.testcontainers:postgresql"
+    testImplementation "org.testcontainers:junit-jupiter"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
implementation 'org.postgresql:postgresql'
testImplementation "org.springframework.boot:spring-boot-testcontainers"
testImplementation "org.testcontainers:postgresql"
testImplementation "org.testcontainers:junit-jupiter"
testImplementation "com.pgvector:pgvector:0.1.6"
// Use runtimeOnly for the JDBC driver instead of compile‐time implementation
runtimeOnly 'org.postgresql:postgresql'
// Keep only CI/test containers deps here—pgvector comes transitively from the main implementation
testImplementation "org.springframework.boot:spring-boot-testcontainers"
testImplementation "org.testcontainers:postgresql"
testImplementation "org.testcontainers:junit-jupiter"
🤖 Prompt for AI Agents
In build.gradle around lines 47 to 52, change the PostgreSQL dependency from
implementation 'org.postgresql:postgresql' to runtimeOnly
'org.postgresql:postgresql' and remove the duplicate pgvector testImplementation
declaration (keep only one pgvector dependency declaration in the appropriate
configuration—either a single implementation entry if it's needed at compile
time or a single testImplementation if it's only used in tests) so the JDBC
driver is runtime-only and pgvector is declared once to avoid version drift.

@catturtle123 catturtle123 merged commit 39da5d8 into develop Aug 21, 2025
1 check was pending
@catturtle123 catturtle123 deleted the refactor/#12-datafile-converter branch August 21, 2025 12:56
@coderabbitai coderabbitai bot mentioned this pull request Aug 22, 2025
4 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

chore other work

Projects

None yet

Development

Successfully merging this pull request may close these issues.

♻️ [Refactor] dataFile 로직 converter로 수정

2 participants