Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -46,20 +46,20 @@ dependencies {

// postgresql
implementation 'org.postgresql:postgresql'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
implementation 'com.pgvector:pgvector:0.1.6'

// test
testImplementation "org.springframework.boot:spring-boot-testcontainers"
testImplementation "org.testcontainers:postgresql"
testImplementation "org.testcontainers:junit-jupiter"
testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
testImplementation 'com.pgvector:pgvector:0.1.6'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿงน Nitpick (assertive)

Avoid version drift; centralize pgvector version and deduplicate test dependency.

Declare the version once and reuse it for both main and test scopes. Also, you typically donโ€™t need testImplementation if implementation already pulls it transitively for tests.

Apply:

-    implementation 'com.pgvector:pgvector:0.1.6'
+    def pgvectorVersion = '0.1.6'
+    implementation "com.pgvector:pgvector:${pgvectorVersion}"
...
-    testImplementation 'com.pgvector:pgvector:0.1.6'
+    // If tests require explicit access, keep this; otherwise remove.
+    testImplementation "com.pgvector:pgvector:${pgvectorVersion}"
๐Ÿค– Prompt for AI Agents
In build.gradle around line 55, the pgvector version is hardcoded for
testImplementation which risks version drift and duplicates the dependency;
declare a single version variable (e.g., ext or extra property like
pgvectorVersion = '0.1.6') and use it for the implementation dependency
(implementation "com.pgvector:pgvector:${pgvectorVersion}"), then remove the
testImplementation line unless tests need a separate scopeโ€”if tests require it,
reference the same variable (testImplementation
"com.pgvector:pgvector:${pgvectorVersion}") so the version is centralized and
not duplicated.


// swagger
implementation 'org.springdoc:springdoc-openapi-starter-webmvc-ui:2.7.0'

// s3
implementation(platform("software.amazon.awssdk:bom:2.25.70"))
implementation("software.amazon.awssdk:s3")
// extractor
implementation 'org.apache.pdfbox:pdfbox:2.0.30'
implementation 'org.apache.poi:poi-ooxml:5.2.5'
}

dependencyManagement {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -8,13 +8,13 @@ data class DataFileBulkCreateRequest(
@field:Size(min = 1, message = "์ตœ์†Œ ํ•˜๋‚˜ ์ด์ƒ ์—…๋กœ๋“œํ•ด์•ผ ํ•ฉ๋‹ˆ๋‹ค")
@Valid
val items: List<DataFileCreateItem>
)
) {
data class DataFileCreateItem(
@field:NotBlank(message = "title์€ ๋น„์–ด์žˆ์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค")
@field:Size(max = 100)
val title: String,

data class DataFileCreateItem(
@field:NotBlank(message = "title์€ ๋น„์–ด์žˆ์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค")
@field:Size(max = 100)
val title: String,

@field:Size(max = 10, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 10๊ฐœ๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค")
val tags: List<String> = emptyList()
)
@field:Size(max = 10, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 10๊ฐœ๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค")
val tags: List<String> = emptyList()
)
Comment on lines +12 to +19
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿงน Nitpick (assertive)

Optional: Strengthen validation on tags to prevent blank/oversized tag values

Currently only the list size is validated. Consider enforcing per-tag constraints.

Apply:

     data class DataFileCreateItem(
         @field:NotBlank(message = "title์€ ๋น„์–ด์žˆ์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค")
         @field:Size(max = 100)
         val title: String,
 
-        @field:Size(max = 10, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 10๊ฐœ๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค")
-        val tags: List<String> = emptyList()
+        @field:Size(max = 10, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 10๊ฐœ๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค")
+        val tags: List<@NotBlank(message = "๋นˆ ํƒœ๊ทธ๋Š” ํ—ˆ์šฉ๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค") @Size(max = 30, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 30์ž๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค") String> = emptyList()
     )
๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
data class DataFileCreateItem(
@field:NotBlank(message = "title์€ ๋น„์–ด์žˆ์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค")
@field:Size(max = 100)
val title: String,
data class DataFileCreateItem(
@field:NotBlank(message = "title์€ ๋น„์–ด์žˆ์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค")
@field:Size(max = 100)
val title: String,
@field:Size(max = 10, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 10๊ฐœ๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค")
val tags: List<String> = emptyList()
)
@field:Size(max = 10, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 10๊ฐœ๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค")
val tags: List<String> = emptyList()
)
data class DataFileCreateItem(
@field:NotBlank(message = "title์€ ๋น„์–ด์žˆ์„ ์ˆ˜ ์—†์Šต๋‹ˆ๋‹ค")
@field:Size(max = 100)
val title: String,
@field:Size(max = 10, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 10๊ฐœ๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค")
val tags: List<@NotBlank(message = "๋นˆ ํƒœ๊ทธ๋Š” ํ—ˆ์šฉ๋˜์ง€ ์•Š์Šต๋‹ˆ๋‹ค") @Size(max = 30, message = "ํƒœ๊ทธ๋Š” ์ตœ๋Œ€ 30์ž๊นŒ์ง€ ๊ฐ€๋Šฅํ•ฉ๋‹ˆ๋‹ค") String> = emptyList()
)
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileRequestDTO.kt
around lines 12 to 19, strengthen validation on the tags list by adding
per-element constraints so individual tags cannot be blank or too long; change
the tags declaration to use type-use annotations (e.g. List<@field:NotBlank
@field:Size(max = 30) String> or preferably List<@NotBlank @Size(max = 30)
String>) and add @field:Valid if needed, update/adjust validation messages and
imports accordingly so each tag is validated for non-blank content and a maximum
length (choose an appropriate max like 30).

}
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ data class DataFilePreviewResponse(
companion object {
fun from(file: DataFile): DataFilePreviewResponse =
DataFilePreviewResponse(
id = requireNotNull(file.id) { "DataFile.id is null" },
id = file.id,
sha256 = file.sha256,
)
Comment on lines +21 to 23
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โš ๏ธ Potential issue

Map id defensively; entity ids can be null before persistence.

After switching entity id to nullable (recommended), avoid propagating a null or โ€œ0โ€ id to API responses.

Apply:

-                id = file.id,
+                id = requireNotNull(file.id) { "DataFile.id must be assigned" },
                 sha256 = file.sha256,
๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
id = file.id,
sha256 = file.sha256,
)
id = requireNotNull(file.id) { "DataFile.id must be assigned" },
sha256 = file.sha256,
)
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/dto/DataFileResponseDTO.kt
around lines 21 to 23, the code maps file.id directly but entity ids can be null
before persistence; change the mapping to handle nulls defensively: either make
DataFileResponseDTO.id nullable (Long?) and assign id = file.id ?: null, or if
the API requires a non-null id, fail fast by checking file.id and throw an
IllegalStateException with a clear message when it's nullโ€”update the DTO type
and callers accordingly so no null or sentinel โ€œ0โ€ value is returned in
responses.

}
Expand All @@ -40,33 +40,33 @@ data class DataFileDetailResponseList(
hasNext = hasNext,
)
}
}

data class DataFileDetailResponse(
var id: Long?,
val title: String,
val type: String,
val lastModified: LocalDateTime,
val sizeMB: Double,
val tags: List<TagDTO>,
val sha256: String,
) {
companion object {
fun from(file: DataFile, tags: List<TagDTO>): DataFileDetailResponse =
DataFileDetailResponse(
id = requireNotNull(file.id) { "DataFile.id is null" },
title = file.title,
type = file.type,
lastModified = file.updatedAt,
sizeMB = file.sizeBytes.toMegaBytes(2),
tags = tags,
sha256 = file.sha256,
)
data class DataFileDetailResponse(
val id: Long,
val title: String,
val type: String,
val lastModified: LocalDateTime,
val sizeMB: Double,
val tags: List<TagDTO>,
val sha256: String,
) {
companion object {
fun from(file: DataFile, tags: List<TagDTO>): DataFileDetailResponse =
DataFileDetailResponse(
id = file.id,
title = file.title,
type = file.type,
lastModified = file.updatedAt,
sizeMB = file.sizeBytes.toMegaBytes(2),
tags = tags,
sha256 = file.sha256,
)
}
}
}

data class TagDTO(
val id: Long?,
val id: Long,
val name: String,
) {
companion object {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -22,16 +22,17 @@ class DataFile(
@Column(nullable = false, length = 64)
val sha256: String,

@Column(nullable = false, length = 2048, name = "file_url")
val fileUrl: String,

@Column(nullable = false)
@Lob
val content: String,
) : BaseEntity() {
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "data_files_id")
val id: Long? = null,
) : BaseEntity() {
val id: Long = 0

Comment on lines 29 to +32
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โš ๏ธ Potential issue

Generated ID cannot reliably be a non-null val initialized to 0; use nullable or mutable id.

Hibernate needs to assign the identifier after insert. val id: Long = 0 risks persistence or hydration issues. The common pattern is var id: Long? = null. This also aligns with DTOs that can assert non-null post-persist.

Apply:

-    @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
-    @Column(name = "data_files_id")
-    val id: Long = 0
+    @Id
+    @GeneratedValue(strategy = GenerationType.IDENTITY)
+    @Column(name = "data_files_id")
+    var id: Long? = null

Follow-up: update DTO mappers to require a non-null id at mapping time (see comments in DataFileResponseDTO.kt).

๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "data_files_id")
val id: Long? = null,
) : BaseEntity() {
val id: Long = 0
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "data_files_id")
var id: Long? = null
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt around
lines 29-32, the entity declares a non-null val id initialized to 0 which
prevents Hibernate from assigning the generated identifier correctly; change the
declaration to a nullable mutable property (e.g., var id: Long? = null) so the
persistence provider can set the ID after insert, and then update DTO mappers to
treat the entity ID as nullable at mapping time and assert non-null only after
persist (per comment in DataFileResponseDTO.kt).

companion object {
fun from(title: String, type: String, sizeBytes : Long, sha256 : String, fileUrl: String): DataFile {
return DataFile(title, type, sizeBytes, sha256, fileUrl)
fun from(title: String, type: String, sizeBytes : Long, sha256 : String, content: String): DataFile {
return DataFile(title, type, sizeBytes, sha256, content)
}
Comment on lines +34 to 36
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿงน Nitpick (assertive)

Factory method aligns with new model; consider basic invariants.

Add lightweight checks to guard against invalid inputs, e.g., blank title/type, negative sizes.

Example:

-        fun from(title: String, type: String, sizeBytes : Long, sha256 : String, content: String): DataFile {
-            return DataFile(title, type, sizeBytes, sha256, content)
-        }
+        fun from(title: String, type: String, sizeBytes: Long, sha256: String, content: String): DataFile {
+            require(title.isNotBlank()) { "title must not be blank" }
+            require(type.isNotBlank()) { "type must not be blank" }
+            require(sizeBytes >= 0) { "sizeBytes must be >= 0" }
+            require(sha256.length == 64) { "sha256 must be 64-char hex" }
+            return DataFile(title, type, sizeBytes, sha256, content)
+        }
๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
fun from(title: String, type: String, sizeBytes : Long, sha256 : String, content: String): DataFile {
return DataFile(title, type, sizeBytes, sha256, content)
}
fun from(title: String, type: String, sizeBytes: Long, sha256: String, content: String): DataFile {
require(title.isNotBlank()) { "title must not be blank" }
require(type.isNotBlank()) { "type must not be blank" }
require(sizeBytes >= 0) { "sizeBytes must be >= 0" }
require(sha256.length == 64) { "sha256 must be 64-char hex" }
return DataFile(title, type, sizeBytes, sha256, content)
}
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/entity/DataFile.kt around
lines 34 to 36, the factory method does not validate inputs; add lightweight
invariants to guard against invalid data by checking: title and type are not
blank (trim before check), sizeBytes is >= 0, sha256 is not blank (optionally
validate length/hex if desired), and content is non-null (allow empty if
acceptable); enforce these using Kotlin require(...) or throw
IllegalArgumentException with clear messages, and return the DataFile only after
all checks pass.

}
}
Original file line number Diff line number Diff line change
Expand Up @@ -9,16 +9,15 @@ import simplerag.ragback.global.entity.BaseEntity
uniqueConstraints = [UniqueConstraint(columnNames = ["data_files_id", "tags_id"])]
)
class DataFileTag(

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "tags_id", nullable = false)
@JoinColumn(name = "tags_id")
var tag: Tag,

Comment on lines +13 to 15
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โš ๏ธ Potential issue

FK nullability removed โ€” make the Tag relation non-nullable.

This link table should not allow rows without a Tag. Removing nullable = false (and leaving optional=true by default) risks inserting rows with tags_id = NULL. In Postgres, NULLs do not participate in unique constraints, so multiple โ€œ(data_files_id, NULL)โ€ rows would be allowed, defeating the unique pair constraint.

Apply this to keep relational integrity:

-    @ManyToOne(fetch = FetchType.LAZY)
-    @JoinColumn(name = "tags_id")
+    @ManyToOne(fetch = FetchType.LAZY, optional = false)
+    @JoinColumn(name = "tags_id", nullable = false)
     var tag: Tag,
๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@JoinColumn(name = "tags_id")
var tag: Tag,
@ManyToOne(fetch = FetchType.LAZY, optional = false)
@JoinColumn(name = "tags_id", nullable = false)
var tag: Tag,
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt
around lines 13-15, the Tag relation currently allows a null foreign key which
can permit rows with tags_id = NULL and break the unique constraint; make the
relation non-nullable by adding nullable = false to the @JoinColumn and setting
optional = false on the @ManyToOne (or equivalent) annotation so the JPA mapping
enforces NOT NULL for tags_id and prevents inserting rows without a Tag.

@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "data_files_id", nullable = false)
var dataFile: DataFile,

@JoinColumn(name = "data_files_id")
var dataFile: DataFile
) : BaseEntity() {
Comment on lines +17 to +19
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โš ๏ธ Potential issue

FK nullability removed โ€” make the DataFile relation non-nullable.

Same rationale as for Tag. The join row must always reference a DataFile.

-    @ManyToOne(fetch = FetchType.LAZY)
-    @JoinColumn(name = "data_files_id")
-    var dataFile: DataFile
+    @ManyToOne(fetch = FetchType.LAZY, optional = false)
+    @JoinColumn(name = "data_files_id", nullable = false)
+    var dataFile: DataFile

Committable suggestion skipped: line range outside the PR's diff.

๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt
around lines 17-19, the foreign-key column must be non-nullable and the relation
should be enforced as non-optional; add a @ManyToOne(optional = false)
annotation on the dataFile property, change the @JoinColumn to include nullable
= false (e.g. @JoinColumn(name = "data_files_id", nullable = false)), and ensure
the Kotlin type remains non-nullable (DataFile) so JPA and the DB require a
DataFile for every DataFileTag.

@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "data_files_tags_id")
val id: Long? = null,
) : BaseEntity()
val id: Long = 0
}
Comment on lines +22 to +23
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿ› ๏ธ Refactor suggestion

โš ๏ธ Potential issue

Primary key should be nullable for JPA; avoid sentinel 0.

Using val id: Long = 0 makes new, unsaved entities indistinguishable from persisted ones in equality/hash semantics and can confuse Hibernateโ€™s dirty checking. Kotlin + JPA best practice is var id: Long? = null.

-    val id: Long = 0
+    var id: Long? = null

If other entities in this PR shifted to non-null id = 0, consider reverting those too for consistency with JPA identity semantics.

๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val id: Long = 0
}
var id: Long? = null
}
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/entity/DataFileTag.kt
around lines 22 to 23, the primary key is declared as a non-null sentinel value
`val id: Long = 0`; change it to a nullable, mutable id to follow JPA
conventions by replacing with `var id: Long? = null` so new unsaved entities are
distinguishable from persisted ones and Hibernate identity/dirty checking
behaves correctly; if other entities in this PR used `id = 0`, update them
similarly for consistency.

Original file line number Diff line number Diff line change
Expand Up @@ -12,8 +12,8 @@ class Tag(

@Column(nullable = false, length = 60)
val name: String,

) : BaseEntity() {
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "tags_id")
val id: Long? = null,
) : BaseEntity()
val id: Long = 0
}
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,6 @@ import org.springframework.dao.DataIntegrityViolationException
import org.springframework.data.domain.PageRequest
import org.springframework.stereotype.Service
import org.springframework.transaction.annotation.Transactional
import org.springframework.transaction.support.TransactionSynchronization
import org.springframework.transaction.support.TransactionSynchronizationManager
import org.springframework.web.multipart.MultipartFile
import simplerag.ragback.domain.document.dto.*
import simplerag.ragback.domain.document.entity.DataFile
Expand All @@ -17,18 +15,18 @@ import simplerag.ragback.domain.document.repository.TagRepository
import simplerag.ragback.global.error.CustomException
import simplerag.ragback.global.error.ErrorCode
import simplerag.ragback.global.error.FileException
import simplerag.ragback.global.util.s3.S3Type
import simplerag.ragback.global.util.s3.S3Util
import simplerag.ragback.global.util.converter.computeMetricsStreaming
import simplerag.ragback.global.util.converter.resolveContentType
import simplerag.ragback.global.util.extractor.ContentExtractorResolver
import java.util.*

@Service
@Transactional(readOnly = true)
class DataFileService(
private val dataFileRepository: DataFileRepository,
private val tagRepository: TagRepository,
private val dataFileTagRepository: DataFileTagRepository,
private val s3Util: S3Util,
private val contentExtractorResolver: ContentExtractorResolver
) {

@Transactional
Expand All @@ -40,9 +38,6 @@ class DataFileService(
throw CustomException(ErrorCode.INVALID_INPUT)
}

val uploadedUrls = mutableListOf<String>()
registerRollbackCleanup(uploadedUrls)

val responses = files.mapIndexed { idx, file ->
val meta = req.items[idx]
val metrics = file.computeMetricsStreaming()
Expand All @@ -54,25 +49,23 @@ class DataFileService(
throw FileException(ErrorCode.ALREADY_FILE, sha256)
}

val fileUrl = s3Util.upload(file, S3Type.ORIGINAL_FILE)
uploadedUrls += fileUrl
val content = contentExtractorResolver.extractContent(file, type)

val dataFile = try {
dataFileRepository.save(DataFile.from(meta.title, type, sizeBytes, sha256, fileUrl))
dataFileRepository.save(DataFile.from(meta.title, type, sizeBytes, sha256, content))
} catch (ex: DataIntegrityViolationException) {
Comment on lines +52 to 56
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿ› ๏ธ Refactor suggestion

Preserve root cause when mapping integrity violation to domain error

You correctly guard for races using unique index + catch. However, the original exception is lost, hindering diagnostics and tripping static analysis (swallowed exception).

-            val dataFile = try {
-                dataFileRepository.save(DataFile.from(meta.title, type, sizeBytes, sha256, content))
-            } catch (ex: DataIntegrityViolationException) {
-                throw FileException(ErrorCode.ALREADY_FILE, sha256)
-            }
+            val dataFile = try {
+                dataFileRepository.save(DataFile.from(meta.title, type, sizeBytes, sha256, content))
+            } catch (ex: DataIntegrityViolationException) {
+                // keep cause for observability; preserves stack and satisfies static analysis
+                throw FileException(ErrorCode.ALREADY_FILE, sha256).also { it.initCause(ex) }
+            }

If FileException supports a cause constructor, prefer:

throw FileException(ErrorCode.ALREADY_FILE, sha256, ex)
๐Ÿงฐ Tools
๐Ÿช› detekt (1.23.8)

[warning] 56-56: The caught exception is swallowed. The original exception could be lost.

(detekt.exceptions.SwallowedException)

๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt
around lines 52 to 56, the catch block for DataIntegrityViolationException
currently throws a FileException without attaching the original exception,
losing the root cause; update the throw to pass the caught exception as the
cause (e.g., use FileException(ErrorCode.ALREADY_FILE, sha256, ex)) so the
original DataIntegrityViolationException is preserved for diagnostics and static
analysis; if FileException lacks a cause-taking constructor, add one that
accepts a Throwable cause and use it here.

throw FileException(ErrorCode.ALREADY_FILE, sha256)
}

val tags = getOrCreateTags(meta.tags)
attachTagsIfMissing(dataFile, tags)

DataFilePreviewResponse.from(dataFile)
return@mapIndexed DataFilePreviewResponse.from(dataFile)
}

return DataFilePreviewResponseList(responses)
}

@Transactional(readOnly = true)
fun getDataFiles(
cursor: Long,
take: Int
Expand All @@ -82,7 +75,7 @@ class DataFileService(
val allLinks = dataFileTagRepository.findAllByDataFileIn(files.content)
val tagsByFileId: Map<Long, List<TagDTO>> =
allLinks.groupBy(
keySelector = { requireNotNull(it.dataFile.id) { "DataFile.id is null" } }
{ it.dataFile.id }
).mapValues { (_, links) -> TagDTO.from(links) }

val nextCursor = files.content.lastOrNull()?.id
Expand All @@ -92,28 +85,17 @@ class DataFileService(

@Transactional
fun deleteFile(dataFilesId: Long) {
val dataFile = dataFileRepository.findDataFileById(dataFilesId) ?: throw FileException(
ErrorCode.NOT_FOUND,
dataFilesId.toString()
)
val dataFile = dataFileRepository.findDataFileById(dataFilesId)
?: throw FileException(
ErrorCode.NOT_FOUND,
dataFilesId.toString()
)

dataFileTagRepository.deleteAllByDataFile(dataFile)

dataFileRepository.delete(dataFile)
}

private fun registerRollbackCleanup(uploadedUrls: MutableList<String>) {
if (TransactionSynchronizationManager.isSynchronizationActive()) {
TransactionSynchronizationManager.registerSynchronization(object : TransactionSynchronization {
override fun afterCompletion(status: Int) {
if (status == TransactionSynchronization.STATUS_ROLLED_BACK) {
uploadedUrls.forEach { runCatching { s3Util.deleteByUrl(it) } }
}
}
})
}
}


private fun getOrCreateTags(names: List<String>): List<Tag> {
val normalized = names
Expand Down Expand Up @@ -145,14 +127,14 @@ class DataFileService(


private fun attachTagsIfMissing(dataFile: DataFile, tags: List<Tag>) {
val fileId = dataFile.id ?: return
tags.forEach { tag ->
val tagId = tag.id ?: return@forEach
val exists = dataFileTagRepository.existsByDataFileIdAndTagId(fileId, tagId)
if (!exists) {
dataFileTagRepository.save(DataFileTag(tag = tag, dataFile = dataFile))
}
val temp = tags.mapNotNull { tag ->
val exists = dataFileTagRepository.existsByDataFileIdAndTagId(dataFile.id, tag.id)

if (exists) return@mapNotNull null

return@mapNotNull DataFileTag(tag = tag, dataFile = dataFile)
}
}

dataFileTagRepository.saveAll(temp)
}
Comment on lines +130 to +139
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿงน Nitpick (assertive)

Reduce N+1 existence checks when attaching tags

Current code performs one existsBy... query per tag. Batch-fetch existing links once and compute the delta to minimize round-trips.

-    private fun attachTagsIfMissing(dataFile: DataFile, tags: List<Tag>) {
-        val temp = tags.mapNotNull { tag ->
-            val exists = dataFileTagRepository.existsByDataFileIdAndTagId(dataFile.id, tag.id)
-
-            if (exists) return@mapNotNull null
-
-            return@mapNotNull DataFileTag(tag = tag, dataFile = dataFile)
-        }
-
-        dataFileTagRepository.saveAll(temp)
-    }
+    private fun attachTagsIfMissing(dataFile: DataFile, tags: List<Tag>) {
+        if (tags.isEmpty()) return
+        // Fetch once
+        val existingLinks = dataFileTagRepository.findAllByDataFileIdAndTagIdIn(
+            dataFile.id,
+            tags.map { it.id }
+        )
+        val existingTagIds = existingLinks.map { it.tag.id }.toHashSet()
+        val toCreate = tags
+            .asSequence()
+            .filter { it.id !in existingTagIds }
+            .map { DataFileTag(tag = it, dataFile = dataFile) }
+            .toList()
+        if (toCreate.isNotEmpty()) dataFileTagRepository.saveAll(toCreate)
+    }

Add repository method (outside this hunk) if it doesnโ€™t exist:

interface DataFileTagRepository {
    fun findAllByDataFileIdAndTagIdIn(dataFileId: Long, tagIds: List<Long>): List<DataFileTag>
}
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/document/service/DataFileService.kt
around lines 130 to 139, the code does an existsByDataFileIdAndTagId query per
tag causing N+1 queries; instead, collect tagIds, call
dataFileTagRepository.findAllByDataFileIdAndTagIdIn(dataFile.id, tagIds) to
fetch existing links in one query, build a set of existing tagIds, filter the
incoming tags to only those whose ids are not in that set, map those to
DataFileTag(dataFile = dataFile, tag = tag) and then call
dataFileTagRepository.saveAll(filteredList); if the repository method does not
exist add the suggested findAllByDataFileIdAndTagIdIn signature to the
DataFileTagRepository.

}
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ data class IndexPreviewResponseList(
}

data class IndexPreviewResponse(
var indexId: Long?,
var indexId: Long,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿงน Nitpick (assertive)

Make indexId immutable for consistency

IndexPreviewResponse.indexId is a var while IndexDetailResponse.indexId is a val. Prefer immutability in DTOs.

-data class IndexPreviewResponse(
-    var indexId: Long,
+data class IndexPreviewResponse(
+    val indexId: Long,
๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var indexId: Long,
data class IndexPreviewResponse(
val indexId: Long,
// โ€ฆother properties unchangedโ€ฆ
)
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexResponseDTO.kt around
line 21, the DTO property indexId is declared as a mutable var while equivalent
DTOs use val; change indexId from var to val to make it immutable, update any
constructor usages or callers if they rely on mutation (adjust tests/usages
accordingly) and run compilation to ensure no remaining mutable assignments.

val snapshotName: String,
) {
companion object {
Expand All @@ -32,7 +32,7 @@ data class IndexPreviewResponse(
}

data class IndexDetailResponse(
var indexId: Long?,
val indexId: Long,
val snapshotName: String,
val chunkingSize: Int,
val overlapSize: Int,
Expand Down
15 changes: 15 additions & 0 deletions src/main/kotlin/simplerag/ragback/domain/index/embed/FakeEmbder.kt
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
package simplerag.ragback.domain.index.embed

import org.springframework.context.annotation.Primary
import org.springframework.context.annotation.Profile
import org.springframework.stereotype.Component

@Component
@Primary
@Profile("test")
class FakeEmbder: Embedder {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿ’ก Verification agent

๐Ÿงฉ Analysis chain

Fix class name typo and follow Kotlin spacing: FakeEmbder โ†’ FakeEmbedder.

The misspelling will confuse readers and may affect bean naming. Also add a space before : per Kotlin style. Rename the file to FakeEmbedder.kt to match the class.

-@Profile("test")
-class FakeEmbder: Embedder {
+@Profile("test")
+class FakeEmbedder : Embedder {

๐Ÿ Script executed:

#!/bin/bash
# Find all occurrences to update references after rename.
rg -nP '\bFakeEmbder\b|FakeEmbedder' -C2

Length of output: 529


Action: Rename class and file to fix typo and update formatting

To align with Kotlin naming conventions and prevent confusion (including Spring bean naming), please update the class and its file as follows:

โ€ข File to update:

  • src/main/kotlin/simplerag/ragback/domain/index/embed/FakeEmbder.kt โ†’ src/main/kotlin/simplerag/ragback/domain/index/embed/FakeEmbedder.kt
    โ€ข In the file, adjust the class declaration:
  • Rename FakeEmbder โ†’ FakeEmbedder
  • Add a space before the colon per Kotlin style (class X : Y)

Proposed diff:

@@ -9,7 +9,7 @@
 @Profile("test")
-class FakeEmbder: Embedder {
+class FakeEmbedder : Embedder {
     override val dim: Int = 1536
     override fun embed(text: String): FloatArray {
         // ...
๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
class FakeEmbder: Embedder {
@Profile("test")
class FakeEmbedder : Embedder {
override val dim: Int = 1536
override fun embed(text: String): FloatArray {
// ...
}
}
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/FakeEmbder.kt around
line 10, the class and filename contain a typo and formatting issue: rename the
file to FakeEmbedder.kt, rename the class from FakeEmbder to FakeEmbedder, and
adjust the declaration spacing to "class FakeEmbedder : Embedder"; update any
references/imports/usages to the new class name and file path accordingly.

override val dim: Int = 1536
override fun embed(text: String): FloatArray {
return FloatArray(1536) { 0.0f }
}
Comment on lines +12 to +14
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿงน Nitpick (assertive)

Avoid magic numbers; use the declared dimension.

Tie the zero vector length to dim to prevent drift if the dimension changes.

     override val dim: Int = 1536
     override fun embed(text: String): FloatArray {
-        return FloatArray(1536) { 0.0f }
+        return FloatArray(dim) { 0.0f }
     }
๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
override fun embed(text: String): FloatArray {
return FloatArray(1536) { 0.0f }
}
override val dim: Int = 1536
override fun embed(text: String): FloatArray {
return FloatArray(dim) { 0.0f }
}
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/FakeEmbder.kt around
lines 12 to 14, the embed method returns a zero vector with a hard-coded length
of 1536; change it to use the declared dimension variable (dim) so the zero
vector always matches the configured embedding size (e.g., return
FloatArray(dim) { 0.0f }) to avoid drift if dim changes.

}
Original file line number Diff line number Diff line change
Expand Up @@ -20,8 +20,9 @@ class ChunkEmbedding(
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "indexes_id", nullable = false)
val index: Index,

) : BaseEntity() {
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "chunk_embeddings_id")
val id: Long? = null,
) : BaseEntity()
val id: Long = 0

}
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,9 @@ class DataFileIndex(
@JoinColumn(name = "indexes_id", nullable = false)
val index: Index,

) : BaseEntity() {
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "data_files_indexes_id")
val id: Long? = null,
) : BaseEntity()
val id: Long = 0

}
Original file line number Diff line number Diff line change
Expand Up @@ -40,11 +40,10 @@ class Index(

@OneToMany(cascade = [CascadeType.ALL], orphanRemoval = true, mappedBy = "index")
val chunkEmbeddings: MutableList<ChunkEmbedding> = mutableListOf(),

) : BaseEntity() {
Comment on lines 41 to +43
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

๐Ÿงน Nitpick (assertive)

Optional: Helper method to manage the bidirectional association

Add a convenience method to keep ChunkEmbedding.index and Index.chunkEmbeddings in sync.

Example (outside the shown range):

fun addChunkEmbedding(e: ChunkEmbedding) {
    chunkEmbeddings.add(e)
    if (e.index !== this) {
        // if ChunkEmbedding.index is mutable, set it here; otherwise add a factory to create with this index
    }
}
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt around lines
41-43, add a small helper to keep the bidirectional relation between
Index.chunkEmbeddings and ChunkEmbedding.index consistent: implement an
addChunkEmbedding(e: ChunkEmbedding) that adds e to chunkEmbeddings and sets
e.index = this if the index property on ChunkEmbedding is mutable, and implement
a removeChunkEmbedding(e: ChunkEmbedding) that removes e from chunkEmbeddings
and clears e.index (or nulls it) to maintain orphanRemoval behavior; ensure
null-safety and avoid duplicate adds by checking if e is already present before
modifying.

@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "indexes_id")
var id: Long? = null,
) : BaseEntity() {
val id: Long = 0
Comment on lines +43 to +46
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

โš ๏ธ Potential issue

Blocking: Non-null id=0 under @GeneratedValue โ€” will cause repository.save() to merge

Same issue as other entities. Revert to nullable id or implement Persistable.

Apply this diff:

 ) : BaseEntity() {
     @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
     @Column(name = "indexes_id")
-    val id: Long = 0
+    var id: Long? = null
๐Ÿ“ Committable suggestion

โ€ผ๏ธ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
) : BaseEntity() {
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "indexes_id")
var id: Long? = null,
) : BaseEntity() {
val id: Long = 0
) : BaseEntity() {
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "indexes_id")
var id: Long? = null
๐Ÿค– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt around lines
43 to 46, the non-null id initialized to 0 with @GeneratedValue will make Spring
Data treat entities as existing and call merge/update on save; change the id to
a nullable Long? initialized to null (val id: Long? = null) OR implement
org.springframework.data.domain.Persistable<T> (and override isNew() to return
id == null) so newly created entities are persisted instead of merged. Ensure
annotations remain (@Id, @GeneratedValue, @Column) and update any
equality/hashCode logic if it depends on a non-null id.


companion object {
fun toIndex(createRequest: IndexCreateRequest): Index {
Expand Down
Loading