Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 6 additions & 6 deletions build.gradle
Original file line number Diff line number Diff line change
Expand Up @@ -31,35 +31,35 @@ ext {
}

dependencies {
implementation("com.fasterxml.jackson.module:jackson-module-kotlin")
implementation 'org.springframework.boot:spring-boot-starter-data-jpa'
implementation 'org.springframework.boot:spring-boot-starter-validation'
implementation 'org.springframework.boot:spring-boot-starter-web'
implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
implementation 'org.jetbrains.kotlin:kotlin-reflect'
// implementation 'org.springframework.ai:spring-ai-starter-model-openai'
implementation 'org.springframework.ai:spring-ai-starter-model-openai'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Verification agent

🧩 Analysis chain

OpenAI starter added β€” ensure tests don’t require a real API key or make network calls

Adding spring-ai-starter-model-openai will create OpenAI beans at context startup. Make sure your test profile stubs or disables outbound calls so unit tests are hermetic.

Run this to verify there’s a test profile and it isolates OpenAI config:

If you want, I can provide a minimal application-test.yml that sets a dummy key and a @TestConfiguration that supplies a no-op EmbeddingModel bean for tests.


🏁 Script executed:

#!/bin/bash
# 1) Check for test profile configs
fd -a 'application-*.yml' -uu | sed -n 's#^#- #p'

# 2) Look for OpenAI properties in test profile(s)
rg -n --glob 'src/**/resources/**' -C2 'spring\.ai\.openai\.(api-key|base-url|embedding|chat)'

# 3) Grep for any tests using real OpenAI clients
rg -nP --type=kt --type=java -C2 '\bOpenAI|OpenAi|openai|EmbeddingModel|ChatClient'

Length of output: 317


🏁 Script executed:

#!/bin/bash
# 1) Search for tests that bootstrap the Spring context (which would load OpenAI beans)
rg -n '@SpringBootTest' -C2 --glob '**/*.java' --glob '**/*.kt'

# 2) Search for any @TestConfiguration classes that might override the OpenAI beans
rg -n '@TestConfiguration' -C2 --glob '**/*.java' --glob '**/*.kt'

# 3) Confirm whether a test-specific application-test.yml exists
fd -a 'application-test*.yml' -uu | sed -n 's#^#- #p'

Length of output: 1699


🏁 Script executed:

#!/bin/bash
# Display contents of the test profile configuration
sed -n '1,200p' src/test/resources/application-test.yml

Length of output: 235


Configure test profile to stub or disable OpenAI beans

Your application-test.yml (src/test/resources/application-test.yml) only configures the datasource and JPAβ€”there are no overrides for spring.ai.openai.*, and no @TestConfiguration to stub out the OpenAI clients. Because your tests use @SpringBootTest with the β€œtest” profile, they will attempt to instantiate the real OpenAI beans at startup, leading to missing-API-key errors or unwanted network calls.

β€’ application-test.yml (lines 1–200) shows only:

spring:
  datasource:
    driver-class-name: org.postgresql.Driver
  jpa:
    database-platform: org.hibernate.dialect.PostgreSQLDialect
    hibernate:
      ddl-auto: create-drop

(no spring.ai.openai.* props)

β€’ No @TestConfiguration or @MockBean was found to override EmbeddingModel, ChatCompletionModel, or the OpenAI client.
β€’ Tests in src/test/kotlin/** annotated with @SpringBootTest + @ActiveProfiles("test") will load the real starter beans and fail (or hit the network).

To fix, you must:

  1. Add dummy OpenAI props to your test profile:

    spring:
      ai:
        openai:
          api-key: dummy
          base-url: http://localhost:8080
    # …existing datasource/jpa…
  2. Stub out the OpenAI beans, for example via a @TestConfiguration class in src/test/java or src/test/kotlin:

    @TestConfiguration
    public class OpenAIMockConfig {
      @Bean
      EmbeddingModel embeddingModel() {
        return input -> Collections.emptyList();
      }
    
      @Bean
      ChatCompletionModel chatCompletionModel() {
        return request -> ChatCompletionResponse.builder().choices(...).build();
      }
    }
  3. (Alternate) Use @MockBean in your test classes:

    @SpringBootTest
    @ActiveProfiles("test")
    class YourServiceTest {
      @MockBean EmbeddingModel embeddingModel;
      @MockBean ChatCompletionModel chatCompletionModel;
      // …
    }

Let me know if you’d like a complete minimal example of application-test.yml plus test‐stub config.

πŸ€– Prompt for AI Agents
In build.gradle around line 39, the test profile doesn't override or stub OpenAI
beans so @SpringBootTest with profile "test" will instantiate real OpenAI
clients; add test-time configuration to prevent real API usage by (1) adding
dummy OpenAI properties to src/test/resources/application-test.yml
(spring.ai.openai.api-key and base-url) and (2) providing test-only beans that
stub EmbeddingModel and ChatCompletionModel via a @TestConfiguration class in
src/test/java or src/test/kotlin (or alternatively annotate tests with @MockBean
for those interfaces) so tests run without network calls or missing-key errors.

compileOnly 'org.projectlombok:lombok'
annotationProcessor 'org.projectlombok:lombok'
testImplementation 'org.springframework.boot:spring-boot-starter-test'
testImplementation 'org.jetbrains.kotlin:kotlin-test-junit5'
testImplementation("org.mockito.kotlin:mockito-kotlin:5.3.1")
testRuntimeOnly 'org.junit.platform:junit-platform-launcher'

// postgresql
implementation 'org.postgresql:postgresql'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'

Comment on lines +47 to 50
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Prefer runtimeOnly for the JDBC driver to keep the compile classpath lean

The driver isn’t needed at compile time. Switching to runtimeOnly follows Spring Boot’s guidance and reduces classpath noise.

Apply:

-    implementation 'org.postgresql:postgresql'
+    runtimeOnly 'org.postgresql:postgresql'
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// postgresql
implementation 'org.postgresql:postgresql'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
// postgresql
runtimeOnly 'org.postgresql:postgresql'
implementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
πŸ€– Prompt for AI Agents
In build.gradle around lines 47 to 50, the PostgreSQL JDBC driver is declared as
an implementation dependency even though it is only required at runtime; change
the org.postgresql:postgresql dependency to runtimeOnly
'org.postgresql:postgresql' (leave the spring-ai-starter-vector-store-pgvector
dependency as implementation) so the driver is not on the compile classpath and
follows Spring Boot guidance.

// test
testImplementation "org.springframework.boot:spring-boot-testcontainers"
testImplementation "org.testcontainers:postgresql"
testImplementation "org.testcontainers:junit-jupiter"
testImplementation "com.pgvector:pgvector:0.1.6"
testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'

Comment on lines +49 to 56
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Unnecessary duplicate dependency in tests

spring-ai-starter-vector-store-pgvector is already on the main classpath; duplicating in tests can mask classpath issues and slow resolution.

Apply:

-    testImplementation 'org.springframework.ai:spring-ai-starter-vector-store-pgvector'
πŸ€– Prompt for AI Agents
In build.gradle around lines 50 to 57, the test classpath redundantly includes
'org.springframework.ai:spring-ai-starter-vector-store-pgvector' which is
already declared on the main implementation classpath; remove the duplicate
testImplementation entry to avoid masking classpath issues and speed dependency
resolution, leaving only the testcontainers and junit testImplementation lines
in that block.

// swagger
implementation 'org.springdoc:springdoc-openapi-starter-webmvc-ui:2.7.0'

// s3
implementation(platform("software.amazon.awssdk:bom:2.25.70"))
implementation("software.amazon.awssdk:s3")

// pgvector
implementation("com.pgvector:pgvector:0.1.6")
}

dependencyManagement {
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -17,10 +17,10 @@ import simplerag.ragback.domain.document.repository.TagRepository
import simplerag.ragback.global.error.CustomException
import simplerag.ragback.global.error.ErrorCode
import simplerag.ragback.global.error.FileException
import simplerag.ragback.global.util.S3Type
import simplerag.ragback.global.util.S3Util
import simplerag.ragback.global.util.computeMetricsStreaming
import simplerag.ragback.global.util.resolveContentType
import simplerag.ragback.global.util.s3.S3Type
import simplerag.ragback.global.util.s3.S3Util
import simplerag.ragback.global.util.converter.computeMetricsStreaming
import simplerag.ragback.global.util.converter.resolveContentType
import java.util.*

@Service
Expand Down
Original file line number Diff line number Diff line change
@@ -1,13 +1,18 @@
package simplerag.ragback.domain.index.dto

import jakarta.validation.constraints.NotBlank
import jakarta.validation.constraints.NotEmpty
import jakarta.validation.constraints.Positive
import jakarta.validation.constraints.PositiveOrZero
import org.hibernate.validator.constraints.Length
import simplerag.ragback.domain.index.entity.enums.EmbeddingModel
import simplerag.ragback.domain.index.entity.enums.SimilarityMetric

data class IndexCreateRequest(

@field:NotEmpty
val dataFileId: List<Long>,

Comment on lines +13 to +15
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Enforce overlapSize ≀ chunkingSize in the service (per team decision)

Per retrieved learnings, the validation belongs in the service layer, not the entity. Ensure IndexService.createIndex guards this invariant and returns a 400 with a clear message when violated.

I can add a guard + unit test (valid/invalid pairs) if you want me to push a patch.

πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt around
lines 13 to 15, the DTO currently only declares dataFileId and the
overlap/chunking invariant must be enforced in the service layer: update
IndexService.createIndex to validate that request.overlapSize <=
request.chunkingSize, and if the check fails return an HTTP 400 response with a
clear error message (e.g., "overlapSize must be less than or equal to
chunkingSize"); add unit tests for both valid and invalid pairs to assert
success and the 400/error message on violation.

🧹 Nitpick (assertive)

Prefer @SiZe(min=1) for lists and align naming to plural ("dataFileIds")

  • Project already uses @SiZe(min=1) for lists (see DataFileBulkCreateRequest); use the same for consistency.
  • Name should reflect plurality.

Apply:

-import jakarta.validation.constraints.NotEmpty
+import jakarta.validation.constraints.Size
@@
-    @field:NotEmpty
-    val dataFileId: List<Long>,
+    @field:Size(min = 1, message = "μ΅œμ†Œ ν•˜λ‚˜ μ΄μƒμ˜ 파일 IDκ°€ ν•„μš”ν•©λ‹ˆλ‹€")
+    val dataFileIds: List<Long>,
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@field:NotEmpty
val dataFileId: List<Long>,
// Replace the import
import jakarta.validation.constraints.Size
// Within IndexRequestDTO.kt
@field:Size(min = 1, message = "μ΅œμ†Œ ν•˜λ‚˜ μ΄μƒμ˜ 파일 IDκ°€ ν•„μš”ν•©λ‹ˆλ‹€")
val dataFileIds: List<Long>,
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/dto/IndexRequestDTO.kt around
lines 13 to 15, replace the @field:NotEmpty annotation on the List property with
@field:Size(min = 1) and rename the property from dataFileId to dataFileIds for
plural consistency; update the import to javax.validation.constraints.Size
(remove NotEmpty import), and update all usages/constructors/JSON mappings/tests
that reference dataFileId to use dataFileIds so compilation and serialization
remain correct.

@field:Length(max = 255)
@field:NotBlank
val snapshotName: String,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@ data class IndexPreviewResponseList(

data class IndexPreviewResponse(
var indexId: Long?,
val snapshotName: String
val snapshotName: String,
) {
companion object {
fun toIndexPreviewResponse(index: Index): IndexPreviewResponse {
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
package simplerag.ragback.domain.index.embed

interface Embedder {
val dim: Int
fun embed(text: String): FloatArray
}
Comment on lines +3 to +6
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Verification agent

🧩 Analysis chain

Verify implementor naming (β€œOpenAIEmbbeder” vs β€œOpenAIEmbedder”)

Typo in class/file names can confuse DI and readers.


🏁 Script executed:

#!/bin/bash
# Find embedder implementations and spot typos
rg -nP --type=kotlin -C2 '(class|interface)\s+OpenAIEm[bB]ed{1,2}er|Embedder' | sed -n '1,120p'

Length of output: 2492


Rename file to match class name
The file name OpenAIEmbbeder.kt contains a typo that doesn’t align with the class it declares (OpenAIEmbedder). This mismatch can confuse readers and tooling.

β€’ src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt β†’ rename to β†’ OpenAIEmbedder.kt

πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt around
lines 3 to 6 the file name contains a typo and does not match the declared class
OpenAIEmbedder; rename the file to OpenAIEmbedder.kt, update any
imports/usages/references across the project to the new filename, and ensure
package declarations and CI/build files (if referencing the path) still resolve
correctly so the class name and filename are consistent.

🧹 Nitpick (assertive)

Add batch embedding to improve throughput; keep single-call default

Batching reduces network overhead and cost. Provide a default method to avoid breaking implementors.

Apply:

 interface Embedder {
     val dim: Int
     fun embed(text: String): FloatArray
+    fun embedAll(texts: List<String>): List<FloatArray> = texts.map(::embed)
 }

Optionally consider making these suspend and handling upstream backpressure later.

πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/Embedder.kt around lines
3 to 6, add a default batch embedding method to the interface while keeping the
existing single-item embed to avoid breaking implementors: declare a new
function (e.g., fun embedBatch(texts: List<String>): List<FloatArray>) with a
default implementation that iterates over texts and calls the existing
embed(text: String) for each entry and returns the list of embeddings; keep the
existing val dim and fun embed as-is; optionally mark both signatures as suspend
later if you intend to introduce asynchronous/backpressure handling upstream.

Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
package simplerag.ragback.domain.index.embed

import org.springframework.ai.openai.OpenAiEmbeddingModel
import org.springframework.stereotype.Component

@Component
class OpenAIEmbedder(
private val openAiEmbeddingModel: OpenAiEmbeddingModel
) : Embedder {
override val dim: Int = 1536
override fun embed(text: String): FloatArray =
openAiEmbeddingModel.embed(text)
}
Comment on lines +1 to +13
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ› οΈ Refactor suggestion

Hard-coded dim=1536 will reject 3072-dim models; derive dimension from the configured model. Also, filename typo.

  • If the configured OpenAI model is text-embedding-3-large (3072), current dim=1536 forces INVALID_INPUT.
  • Consider inferring dim from the configured modelId (or a bean property) using your EmbeddingModel enum.
  • Minor: file name has a typo (β€œEmbbeder”). Consider renaming to OpenAiEmbedder.kt for consistency.

Apply this diff to make dim consistent with configuration (adjust imports if package paths differ):

 package simplerag.ragback.domain.index.embed

 import org.springframework.ai.openai.OpenAiEmbeddingModel
+import org.springframework.beans.factory.annotation.Value
 import org.springframework.stereotype.Component
+import simplerag.ragback.domain.index.entity.enums.EmbeddingModel

 @Component
-class OpenAIEmbedder(
-    private val openAiEmbeddingModel: OpenAiEmbeddingModel
+class OpenAIEmbedder(
+    private val openAiEmbeddingModel: OpenAiEmbeddingModel,
+    @Value("\${spring.ai.openai.embedding.options.model:text-embedding-3-small}")
+    private val modelId: String
 ) : Embedder {
-    override val dim: Int = 1536
+    override val dim: Int =
+        EmbeddingModel.findByModelId(modelId)?.dim
+            ?: error("Unsupported embedding modelId: $modelId")
     override fun embed(text: String): FloatArray =
         openAiEmbeddingModel.embed(text)
 }

If you will support multiple models per index concurrently, consider a factory to return an Embedder bound to req.embeddingModel instead of a single global bean.

πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
package simplerag.ragback.domain.index.embed
import org.springframework.ai.openai.OpenAiEmbeddingModel
import org.springframework.stereotype.Component
@Component
class OpenAIEmbedder(
private val openAiEmbeddingModel: OpenAiEmbeddingModel
) : Embedder {
override val dim: Int = 1536
override fun embed(text: String): FloatArray =
openAiEmbeddingModel.embed(text)
}
package simplerag.ragback.domain.index.embed
import org.springframework.ai.openai.OpenAiEmbeddingModel
import org.springframework.beans.factory.annotation.Value
import org.springframework.stereotype.Component
import simplerag.ragback.domain.index.entity.enums.EmbeddingModel
@Component
class OpenAIEmbedder(
private val openAiEmbeddingModel: OpenAiEmbeddingModel,
@Value("\${spring.ai.openai.embedding.options.model:text-embedding-3-small}")
private val modelId: String
) : Embedder {
override val dim: Int =
EmbeddingModel.findByModelId(modelId)?.dim
?: error("Unsupported embedding modelId: $modelId")
override fun embed(text: String): FloatArray =
openAiEmbeddingModel.embed(text)
}
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt lines
1-13, the embedder hardcodes dim=1536 which will break for 3072-dim models and
the filename/class name contains a typo; change the implementation to derive the
embedding dimension from the configured model (inject the configured modelId or
an EmbeddingModel/enum bean), map modelId to its dimension (e.g., 1536 for
text-embedding-3-small, 3072 for text-embedding-3-large) and set override val
dim accordingly with a sensible fallback and error if unknown, replace the
misspelled filename/class name (OpenAIEmbbeder -> OpenAiEmbedder) to match
project conventions and update imports/usages; optionally consider providing a
factory if multiple model types will be supported concurrently.

Original file line number Diff line number Diff line change
@@ -1,6 +1,5 @@
package simplerag.ragback.domain.index.entity

import com.pgvector.PGvector
import jakarta.persistence.*
import simplerag.ragback.global.entity.BaseEntity

Expand All @@ -9,12 +8,11 @@ import simplerag.ragback.global.entity.BaseEntity
@Table(name = "chunk_embeddings")
class ChunkEmbedding(

@Column(name = "content", nullable = false)
@Lob
@Column(name = "content", nullable = false, columnDefinition = "text")
val content: String,

@Column(name = "embedding", columnDefinition = "vector")
var embedding: PGvector,
@Column(name = "embedding", columnDefinition = "vector(1536)", nullable = false)
var embedding: FloatArray,

Comment on lines +14 to 16
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Missing JPA converter on FloatArray β†’ pgvector mapping (will persist as binary/bytea or fail).

FloatArray is not a JPA-native type. You introduced FloatArrayToPgVectorStringConverter, but the field lacks @Convert(...) and the converter has autoApply = false, so it will never be invoked.

Apply the converter and import it:

@@
-    @Column(name = "embedding", columnDefinition = "vector(1536)", nullable = false)
-    var embedding: FloatArray,
+    @Column(name = "embedding", columnDefinition = "vector(1536)", nullable = false)
+    @Convert(converter = FloatArrayToPgVectorStringConverter::class)
+    var embedding: FloatArray,

Add the missing import at the top of the file:

import simplerag.ragback.global.util.FloatArrayToPgVectorStringConverter
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
around lines 14 to 16, the FloatArray field 'embedding' lacks the required JPA
converter annotation so the FloatArrayToPgVectorStringConverter (which is
declared with autoApply = false) will never be used and the value will persist
incorrectly; fix this by adding the import
simplerag.ragback.global.util.FloatArrayToPgVectorStringConverter at the top of
the file and annotate the embedding property with @Convert(converter =
FloatArrayToPgVectorStringConverter::class) so JPA uses the converter for this
field.

@Column(name = "embedding_dim", nullable = false)
val embeddingDim: Int,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -3,24 +3,61 @@ package simplerag.ragback.domain.index.service
import org.springframework.data.repository.findByIdOrNull
import org.springframework.stereotype.Service
import org.springframework.transaction.annotation.Transactional
import simplerag.ragback.domain.document.entity.DataFile
import simplerag.ragback.domain.document.repository.DataFileRepository
import simplerag.ragback.domain.index.dto.*
import simplerag.ragback.domain.index.embed.Embedder
import simplerag.ragback.domain.index.entity.ChunkEmbedding
import simplerag.ragback.domain.index.entity.Index
import simplerag.ragback.domain.index.repository.IndexRepository
import simplerag.ragback.global.error.CustomException
import simplerag.ragback.global.error.ErrorCode
import simplerag.ragback.global.error.IndexException
import simplerag.ragback.global.util.loader.ContentLoader
import simplerag.ragback.global.util.TextChunker

@Service
class IndexService(
private val indexRepository: IndexRepository,
private val embedder: Embedder,
private val dataFileRepository: DataFileRepository,
private val contentLoader: ContentLoader,
) {

@Transactional
fun createIndex(indexCreateRequest: IndexCreateRequest): IndexPreviewResponse {
fun createIndex(req: IndexCreateRequest): IndexPreviewResponse {
validateOverlap(req.overlapSize, req.chunkingSize)

val files: List<DataFile> = dataFileRepository.findAllById(req.dataFileId)
if (files.size != req.dataFileId.size) {
throw CustomException(ErrorCode.NOT_FOUND, "Some dataFileIds not found")
}

Comment on lines +31 to +35
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Error message consistency and clarity for missing dataFileIds.

Thrown message mixes English while other messages are localized. Also, returning which IDs are missing helps clients debug.

-        if (files.size != req.dataFileId.size) {
-            throw CustomException(ErrorCode.NOT_FOUND, "Some dataFileIds not found")
-        }
+        if (files.size != req.dataFileId.size) {
+            val foundIds = files.mapNotNull { it.id }.toSet()
+            val missing = req.dataFileId.filterNot { it in foundIds }
+            throw CustomException(
+                ErrorCode.NOT_FOUND,
+                "μ‘΄μž¬ν•˜μ§€ μ•ŠλŠ” dataFileIds: $missing"
+            )
+        }
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
val files: List<DataFile> = dataFileRepository.findAllById(req.dataFileId)
if (files.size != req.dataFileId.size) {
throw CustomException(ErrorCode.NOT_FOUND, "Some dataFileIds not found")
}
val files: List<DataFile> = dataFileRepository.findAllById(req.dataFileId)
if (files.size != req.dataFileId.size) {
val foundIds = files.mapNotNull { it.id }.toSet()
val missing = req.dataFileId.filterNot { it in foundIds }
throw CustomException(
ErrorCode.NOT_FOUND,
"μ‘΄μž¬ν•˜μ§€ μ•ŠλŠ” dataFileIds: $missing"
)
}
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt around
lines 31 to 35, the thrown CustomException uses a hardcoded English message and
doesn't report which IDs are missing; change it to use the localized message
pattern (or ErrorCode's message lookup) and include the list of missing
dataFileIds in the exception details β€” compute missingIds = req.dataFileId -
files.map { it.id } (or equivalent), and throw
CustomException(ErrorCode.NOT_FOUND, mapOf("message" to localizedMessage,
"missingIds" to missingIds)) or use the projectΚΌs standard error payload shape
so clients receive a localized message plus the missing IDs.

if (embedder.dim != req.embeddingModel.dim) {
throw CustomException(ErrorCode.INVALID_INPUT, "Embedding dim mismatch: model=${req.embeddingModel.dim}, embedder=${embedder.dim}")
}
val index = indexRepository.save(Index.toIndex(req))

Comment on lines +36 to +40
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ’‘ Verification agent

🧩 Analysis chain

Dim check is good, but it will always fail if OpenAIEmbedder stays at 1536 with a 3072-dim request.

Please either (a) make the embedder’s dim reflect the configured OpenAI model (see my OpenAIEmbedder refactor), or (b) restrict allowed embedding models at request validation time to the single supported dimension.

I recommend (a) for flexibility; otherwise enforce EmbeddingModel.TEXT_EMBEDDING_3_SMALL at the controller validation layer.


🏁 Script executed:

#!/bin/bash
# Search for OpenAIEmbedder class definition and inspect its dimension handling
rg -n 'class OpenAIEmbedder' -g '*.kt'
rg -n -A5 -B5 'class OpenAIEmbedder' -g '*.kt'

# Find any declarations or assignments of a 'dim' property in the codebase
rg -n 'val dim' -g '*.kt'

# Locate the EmbeddingModel enum to see supported dimensions
rg -n 'enum class EmbeddingModel' -g '*.kt'
rg -n -A5 -B5 'enum class EmbeddingModel' -g '*.kt'

# Inspect the IndexService snippet around the embedder.dim check for context
rg -n -A5 -B5 'if (embedder\\.dim' src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt

Length of output: 2875


Ensure Embedder.dim aligns with the configured OpenAI model
Currently, OpenAIEmbedder hard-codes override val dim: Int = 1536, while the application’s req.embeddingModel.dim can vary (e.g. 3072 for larger models). This check will always fail if you ever support a model with a non-1536 dimension.

Key locations:

  • src/main/kotlin/simplerag/ragback/domain/index/embed/OpenAIEmbbeder.kt
    β€’ Line 10: override val dim: Int = 1536
  • src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt
    β€’ Enum values carry their own dim (e.g., TEXT_EMBEDDING_3_SMALL(1536, "text-embedding-3-small"))

Actionable options:

  1. Dynamically set Embedder.dim based on the injected OpenAiEmbeddingModel (or a lookup from your EmbeddingModel enum), so it always matches the actual backend model dimension.
  2. Restrict allowed EmbeddingModel values at the controller/validation layer to only those your embedder currently supports (e.g. TEXT_EMBEDDING_3_SMALL) and reject others up-front.

I recommend option (1) for future flexibility; if you choose (2), enforce it in your request DTO validation.

πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt around
lines 36-40, the equality check between embedder.dim and req.embeddingModel.dim
will fail when the embedder hard-codes its dimension (OpenAIEmbedder uses
override val dim = 1536); fix by making the embedder dimension come from the
actual configured model or by validating allowed models earlier: update
OpenAIEmbedder to accept the selected OpenAiEmbeddingModel (or lookup the dim
from EmbeddingModel enum) and expose that value via dim (inject or construct the
embedder with the model), then wire DI so IndexService receives an embedder
whose dim matches req.embeddingModel.dim; alternatively, if you prefer
restriction, add validation in the controller/DTO to only accept supported
models and return a validation error for unsupported dims.

for (file in files) {
val url = file.fileUrl
val content = contentLoader.load(url)
println(content)
if (content.isBlank()) continue

val chunks = TextChunker.chunkByCharsSeq(content, req.chunkingSize, req.overlapSize)
for (chunk in chunks) {
val vec = embedder.embed(chunk)
val entity = ChunkEmbedding(
content = chunk,
embedding = vec,
embeddingDim = embedder.dim,
index = index
)
index.chunkEmbeddings.add(entity)
}
Comment on lines +47 to +57
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Batch embeddings for throughput; avoid per-chunk remote calls.

Current loop does one remote call per chunk. If using Spring AI, the underlying model typically supports batch inputs.

Proposed interface and usage (outside this file):

// in Embedder.kt
interface Embedder {
    val dim: Int
    fun embed(text: String): FloatArray
    fun embedAll(texts: List<String>): List<FloatArray> = texts.map { embed(it) } // default
}

Then update here to batch:

-            val chunks = TextChunker.chunkByCharsSeq(content, req.chunkingSize, req.overlapSize)
-            for (chunk in chunks) {
-                val vec  = embedder.embed(chunk)
+            val chunks = TextChunker.chunkByCharsSeq(content, req.chunkingSize, req.overlapSize)
+            val vectors = embedder.embedAll(chunks)
+            for ((i, chunk) in chunks.withIndex()) {
+                val vec = vectors[i]
                 val entity = ChunkEmbedding(
                     content = chunk,
                     embedding = vec,
                     embeddingDim = embedder.dim,
                     index = index
                 )
                 index.chunkEmbeddings.add(entity)
             }
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt around
lines 47 to 57, the current loop calls embedder.embed per chunk causing one
remote call per chunk; change it to call a batch method like
embedder.embedAll(chunks) once, then iterate the returned List<FloatArray> in
order to create ChunkEmbedding entities and add them to index.chunkEmbeddings;
if embedAll is not available on the Embedder interface, add a default embedAll
implementation that maps to embed so existing embedders still work, and ensure
you preserve chunk order and use embedder.dim for embeddingDim when constructing
each entity.

}
Comment on lines +41 to +58
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

πŸ› οΈ Refactor suggestion

⚠️ Potential issue

Avoid println of raw content, check null/blank URLs, and handle loader failures without keeping a DB transaction open.

  • println(content) risks PII leakage and log noise. Use a logger at DEBUG, or remove.
  • fileUrl could be null/blank; add guard.
  • Remote I/O (HTTP fetch, embeddings) inside a DB transaction can hold locks for a long time; consider splitting the workflow or deferring long-running steps to an async job.

Minimal safety and logging fix:

-            val url = file.fileUrl
-            val content = contentLoader.load(url)
-            println(content)
-            if (content.isBlank()) continue
+            val url = file.fileUrl ?: continue
+            val content = try {
+                contentLoader.load(url)
+            } catch (ex: Exception) {
+                // TODO: map to domain exception or skip with WARN
+                ""
+            }
+            if (content.isBlank()) continue // consider WARN log

Longer-term architecture:

  • Persist the Index row first (as you do), then enqueue a background job to load/chunk/embed outside the DB txn (outbox pattern). Update status/progress for UX.
  • Batch embeddings: extend Embedder to support embedAll(List) to reduce API latency and rate-limit issues.
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
for (file in files) {
val url = file.fileUrl
val content = contentLoader.load(url)
println(content)
if (content.isBlank()) continue
val chunks = TextChunker.chunkByCharsSeq(content, req.chunkingSize, req.overlapSize)
for (chunk in chunks) {
val vec = embedder.embed(chunk)
val entity = ChunkEmbedding(
content = chunk,
embedding = vec,
embeddingDim = embedder.dim,
index = index
)
index.chunkEmbeddings.add(entity)
}
}
for (file in files) {
val url = file.fileUrl ?: continue
val content = try {
contentLoader.load(url)
} catch (ex: Exception) {
// TODO: map to domain exception or skip with WARN
""
}
if (content.isBlank()) continue // consider WARN log
val chunks = TextChunker.chunkByCharsSeq(content, req.chunkingSize, req.overlapSize)
for (chunk in chunks) {
val vec = embedder.embed(chunk)
val entity = ChunkEmbedding(
content = chunk,
embedding = vec,
embeddingDim = embedder.dim,
index = index
)
index.chunkEmbeddings.add(entity)
}
}
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/service/IndexService.kt around
lines 41-58, remove the println(content) and either delete it or replace with a
logger.debug call to avoid PII leakage; add a guard to skip files where
file.fileUrl is null or blank before calling contentLoader.load(url); wrap the
contentLoader.load and embedder.embed calls in try/catch to handle
loader/embedding failures gracefully and avoid throwing inside the DB
transaction; and move the long-running remote I/O and embedding work out of the
DB transaction (persist the Index row first and enqueue a background job or job
queue to perform load/chunk/embed and update the index status) so locks aren’t
held during network/API calls.


validateOverlap(indexCreateRequest.overlapSize, indexCreateRequest.chunkingSize)

val createdIndex = indexRepository.save(Index.toIndex(indexCreateRequest))
return IndexPreviewResponse.toIndexPreviewResponse(createdIndex)
return IndexPreviewResponse.toIndexPreviewResponse(index)
}

@Transactional(readOnly = true)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -4,9 +4,9 @@ import org.springframework.context.annotation.Primary
import org.springframework.context.annotation.Profile
import org.springframework.stereotype.Component
import org.springframework.web.multipart.MultipartFile
import simplerag.ragback.global.util.S3Type
import simplerag.ragback.global.util.S3Util
import simplerag.ragback.global.util.sha256Hex
import simplerag.ragback.global.util.s3.S3Type
import simplerag.ragback.global.util.s3.S3Util
import simplerag.ragback.global.util.converter.sha256Hex
import java.util.concurrent.ConcurrentHashMap

@Component
Expand Down
27 changes: 27 additions & 0 deletions src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
package simplerag.ragback.global.util

object TextChunker {
fun chunkByCharsSeq(raw: String, size: Int, overlap: Int): Sequence<String> = sequence {
Comment on lines +3 to +4
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Consider an optional word-boundary mode to reduce mid-word splits.

Character windows are fine for a first pass. If quality matters, add a whitespace-aware variant (e.g., try to extend to the next whitespace within a small slack).

I can add chunkByWordsSeq(...) with a small slack range if desired.

πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt around lines
3-4, add a whitespace-aware variant (either an optional mode on chunkByCharsSeq
or a new chunkByWordsSeq) that avoids splitting words: when computing each
window end, allow a small "slack" (configurable Int) to look forward up to slack
characters for the next whitespace and extend the end to that whitespace if
found; ensure the produced chunk size does not exceed size+slack, preserve the
requested overlap by computing the next window start = previousStart + size -
overlap (or adjust start based on the extended end while maintaining overlap
semantics), handle edge cases (no whitespace found within slack, end of string)
by falling back to the original character-boundary end, and keep this as a
Sequence<String> generator to match existing API.

require(size >= 1) { "chunk size must be >= 1" }
require(overlap in 0 until size) { "overlap must be 0..size-1" }

val text = normalize(raw)
if (text.isBlank()) return@sequence
if (text.length <= size) { yield(text); return@sequence }

val step = size - overlap
var start = 0
while (start < text.length) {
val end = (start + size).coerceAtMost(text.length)
yield(text.substring(start, end))
if (end == text.length) break
start += step
}
}

private fun normalize(s: String): String =
s.replace("\r\n", "\n").replace("\r", "\n")
.replace(Regex("[ \t]+"), " ")
.replace(Regex("\\n{3,}"), "\n\n")
.trim()
Comment on lines +22 to +26
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Avoid recompiling regexes on every call; precompile once.

Creating Regex instances per call adds overhead for large inputs. Hoist them to private val fields.

-object TextChunker {
+object TextChunker {
+    private val SPACES = Regex("[ \\t]+")
+    private val MANY_NL = Regex("\\n{3,}")
@@
-    private fun normalize(s: String): String =
-        s.replace("\r\n", "\n").replace("\r", "\n")
-            .replace(Regex("[ \t]+"), " ")
-            .replace(Regex("\\n{3,}"), "\n\n")
-            .trim()
+    private fun normalize(s: String): String =
+        s.replace("\r\n", "\n").replace("\r", "\n")
+            .replace(SPACES, " ")
+            .replace(MANY_NL, "\n\n")
+            .trim()
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/TextChunker.kt around lines 22
to 26, the normalize function currently constructs Regex("[ \t]+") and
Regex("\\n{3,}") on every call which is wasteful; hoist these regexes to
precompiled private vals (either as private val fields in the class or inside a
companion object/top-level private vals) like private val MULTI_WHITESPACE =
Regex("[ \t]+") and private val MULTIPLE_BLANK_LINES = Regex("\\n{3,}") and then
replace the inline Regex(...) usages in normalize with those precompiled vals so
Regex objects are reused across calls.

}
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package simplerag.ragback.global.util
package simplerag.ragback.global.util.converter

import org.springframework.web.multipart.MultipartFile
import java.io.BufferedInputStream
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package simplerag.ragback.global.util
package simplerag.ragback.global.util.converter

import com.fasterxml.jackson.databind.ObjectMapper
import org.springframework.http.MediaType
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
package simplerag.ragback.global.util.loader


interface ContentLoader {
fun load(url: String): String
}
Comment on lines +4 to +6
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Clarify error semantics; avoid empty-string sentinels.

Interface returns String, and the HttpContentLoader impl reportedly returns "" on null body. Prefer explicit failure signaling to avoid treating errors as β€œblank content”.

Minimal change: add KDoc and commit to throwing a typed exception on failure (or switch to Result if you can ripple changes).

@@
-interface ContentLoader {
-    fun load(url: String): String
-}
+/**
+ * Loads textual content from a location (e.g., http/https URL).
+ *
+ * Contract:
+ * - Returns the fetched body as-is (UTF-8 normalized by the implementation).
+ * - Throws a domain exception (e.g., ContentLoadException) on failure instead of returning an empty string.
+ */
+interface ContentLoader {
+    fun load(url: String): String
+}

I can also switch this to Result<String> or suspend fun load(URI): Result<String> if you’d like to make it non-blocking and URI-typed.

πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
interface ContentLoader {
fun load(url: String): String
}
/**
* Loads textual content from a location (e.g., HTTP/HTTPS URL).
*
* Contract:
* - Returns the fetched body as-is (UTF-8 normalized by the implementation).
* - Throws a domain-specific exception (e.g. ContentLoadException) on failure instead of returning an empty string.
*/
interface ContentLoader {
fun load(url: String): String
}
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/loader/ContentLoader.kt around
lines 4 to 6, the interface currently returns a raw String and implementations
use empty-string sentinels for failure; add KDoc explaining that implementations
must throw a specific typed exception (e.g., ContentLoadException) on failure
instead of returning "" and update the signature contract to document this
behavior so callers rely on exceptions for error semantics (optionally note in
KDoc you may later migrate to Result<String> or suspend+URI); ensure
implementations (like HttpContentLoader) are changed to throw the typed
exception when body is null or on other load errors.

Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
package simplerag.ragback.global.util.loader

import org.springframework.http.converter.StringHttpMessageConverter
import org.springframework.stereotype.Component
import org.springframework.web.client.RestTemplate
import java.nio.charset.StandardCharsets

@Component
class HttpContentLoader : ContentLoader {

private val restTemplate: RestTemplate = RestTemplate().apply {
// κΈ°μ‘΄ String 컨버터 제거 ν›„ UTF-8 컨버터λ₯Ό 맨 μ•žμ— μΆ”κ°€
val replaced = messageConverters.filterNot { it is StringHttpMessageConverter }.toMutableList()
replaced.add(0, StringHttpMessageConverter(StandardCharsets.UTF_8))
messageConverters = replaced
}
Comment on lines +11 to +16
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ› οΈ Refactor suggestion

Prefer RestTemplateBuilder with explicit timeouts and UTF-8, avoid mutating converters list manually.

Manually filtering and replacing converters is brittle. Also, no connect/read timeouts are set, so indexing can hang on slow endpoints.

Apply this refactor:

-    private val restTemplate: RestTemplate = RestTemplate().apply {
-        // κΈ°μ‘΄ String 컨버터 제거 ν›„ UTF-8 컨버터λ₯Ό 맨 μ•žμ— μΆ”κ°€
-        val replaced = messageConverters.filterNot { it is StringHttpMessageConverter }.toMutableList()
-        replaced.add(0, StringHttpMessageConverter(StandardCharsets.UTF_8))
-        messageConverters = replaced
-    }
+    private val restTemplate: RestTemplate = org.springframework.boot.web.client.RestTemplateBuilder()
+        .setConnectTimeout(java.time.Duration.ofSeconds(5))
+        .setReadTimeout(java.time.Duration.ofSeconds(15))
+        .additionalMessageConverters(StringHttpMessageConverter(StandardCharsets.UTF_8))
+        .build()

If you have a central RestTemplateBuilder bean, inject it via constructor for consistency and testability.

πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private val restTemplate: RestTemplate = RestTemplate().apply {
// κΈ°μ‘΄ String 컨버터 제거 ν›„ UTF-8 컨버터λ₯Ό 맨 μ•žμ— μΆ”κ°€
val replaced = messageConverters.filterNot { it is StringHttpMessageConverter }.toMutableList()
replaced.add(0, StringHttpMessageConverter(StandardCharsets.UTF_8))
messageConverters = replaced
}
private val restTemplate: RestTemplate = org.springframework.boot.web.client.RestTemplateBuilder()
.setConnectTimeout(java.time.Duration.ofSeconds(5))
.setReadTimeout(java.time.Duration.ofSeconds(15))
.additionalMessageConverters(StringHttpMessageConverter(StandardCharsets.UTF_8))
.build()
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/loader/HttpContentLoader.kt
around lines 11 to 16, replace the manual RestTemplate creation and converter
mutation with a RestTemplate built from a RestTemplateBuilder injected via the
constructor; configure explicit connect and read timeouts on the builder (e.g.
reasonable Duration values), register or add a StringHttpMessageConverter with
StandardCharsets.UTF_8 through the builder’s message-converter APIs (instead of
filtering the existing list), and use the builder.build() result as the
restTemplate; ensure the class accepts a RestTemplateBuilder parameter for
consistency and testability rather than creating RestTemplate directly.


override fun load(url: String): String {
return restTemplate.getForObject(url, String::class.java) ?: ""
}
Comment on lines +18 to +20
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

πŸ› οΈ Refactor suggestion

⚠️ Potential issue

Handle HTTP exceptions and validate URL scheme to avoid SSRF and brittle failures.

Load can throw RestClientException; currently it bubbles up. Also, validating allowed schemes (e.g., http/https/s3) reduces SSRF risk when URLs are user-controlled.

-    override fun load(url: String): String {
-        return restTemplate.getForObject(url, String::class.java) ?: ""
-    }
+    override fun load(url: String): String {
+        require(url.startsWith("http://") || url.startsWith("https://")) {
+            "Unsupported URL scheme"
+        }
+        return try {
+            restTemplate.getForObject(url, String::class.java) ?: ""
+        } catch (ex: org.springframework.web.client.RestClientException) {
+            // TODO: map to domain exception if desired
+            ""
+        }
+    }

If URLs can be s3:// or file:// from your DataFile pipeline, whitelist those explicitly and route to specialized loaders.

πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
override fun load(url: String): String {
return restTemplate.getForObject(url, String::class.java) ?: ""
}
override fun load(url: String): String {
require(url.startsWith("http://") || url.startsWith("https://")) {
"Unsupported URL scheme"
}
return try {
restTemplate.getForObject(url, String::class.java) ?: ""
} catch (ex: org.springframework.web.client.RestClientException) {
// TODO: map to domain exception if desired
""
}
}
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/loader/HttpContentLoader.kt
around lines 18 to 20, the loader currently calls restTemplate.getForObject(url,
String::class.java) directly which can throw RestClientException and allows any
URL scheme (SSRF risk). Wrap the call in a try/catch to handle
RestClientException (and other runtime IO errors), log the error and return a
safe default or rethrow a well-typed exception; before fetching, parse and
validate the URL scheme against an explicit whitelist (e.g., http, https and any
allowed non-HTTP schemes such as s3 or file), and if the scheme is s3:// or
file:// delegate to the appropriate specialized loader; reject or fail fast on
disallowed schemes with a clear error.

}
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package simplerag.ragback.global.util
package simplerag.ragback.global.util.s3


enum class S3Type(
Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package simplerag.ragback.global.util
package simplerag.ragback.global.util.s3

import org.springframework.web.multipart.MultipartFile

Expand Down
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
package simplerag.ragback.global.util
package simplerag.ragback.global.util.s3

import org.slf4j.LoggerFactory
import org.springframework.context.annotation.Profile
Expand Down Expand Up @@ -58,9 +58,7 @@ class S3UtilImpl(
}

override fun urlFromKey(key: String): String =
s3.utilities()
.getUrl { it.bucket(bucket).key(key) }
.toExternalForm()
"https://mukit-s3.s3.ap-northeast-2.amazonaws.com/" + key
Comment on lines 60 to +61
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Hard-coded S3 URL will break across environments and mishandles URL encoding

Using a fixed bucket/region string prevents deploy-time changes and can yield invalid URLs for keys with spaces or special chars. Let the AWS SDK construct the URL (handles region and encoding).

Apply:

-    override fun urlFromKey(key: String): String =
-        "https://mukit-s3.s3.ap-northeast-2.amazonaws.com/" + key
+    override fun urlFromKey(key: String): String =
+        s3.utilities().getUrl { it.bucket(bucket).key(key) }.toExternalForm()
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
override fun urlFromKey(key: String): String =
s3.utilities()
.getUrl { it.bucket(bucket).key(key) }
.toExternalForm()
"https://mukit-s3.s3.ap-northeast-2.amazonaws.com/" + key
override fun urlFromKey(key: String): String =
s3.utilities().getUrl { it.bucket(bucket).key(key) }.toExternalForm()
πŸ€– Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/global/util/s3/S3UtilImpl.kt around lines
60-61, the code returns a hard-coded S3 URL which embeds bucket/region and does
not handle URL encoding; replace this with a call into the AWS SDK so the SDK
builds a correct, encoded URL and you can avoid hard-coded bucket/region. Inject
or read the configured bucket/region values from your app config, then call the
S3 client utilities to build the URL (e.g., use S3Utilities.getUrl/GetUrlRequest
or S3Client.utilities().getUrl(...) / S3Presigner for presigned URLs) with the
bucket and key; return the SDK-produced URL string instead of concatenating
strings so encoding and region resolution are handled correctly.


override fun deleteByUrl(url: String) {
val key = keyFromUrl(url) ?: throw S3Exception(ErrorCode.S3_INVALID_URL)
Expand Down
7 changes: 7 additions & 0 deletions src/main/resources/application-local.yml
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,13 @@ spring:
hibernate:
ddl-auto: update
show-sql: true
ai:
openai:
api-key: ${OPENAI_KEY}
model:
embedding:
options:
model: text-embedding-3-small

logging:
level:
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,6 @@ package simplerag.ragback.domain.document.service

import org.junit.jupiter.api.AfterEach
import org.junit.jupiter.api.Assertions.*
import org.junit.jupiter.api.BeforeEach
import org.junit.jupiter.api.DisplayName
import org.junit.jupiter.api.Test
import org.springframework.beans.factory.annotation.Autowired
Expand All @@ -14,7 +13,6 @@ import org.springframework.transaction.annotation.Transactional
import org.springframework.transaction.support.TransactionTemplate
import org.springframework.web.multipart.MultipartFile
import org.testcontainers.containers.PostgreSQLContainer
import org.testcontainers.junit.jupiter.Container
import org.testcontainers.utility.DockerImageName
import simplerag.ragback.domain.document.dto.DataFileBulkCreateRequest
import simplerag.ragback.domain.document.dto.DataFileCreateItem
Expand All @@ -26,18 +24,18 @@ import simplerag.ragback.global.error.CustomException
import simplerag.ragback.global.error.ErrorCode
import simplerag.ragback.global.error.FileException
import simplerag.ragback.global.storage.FakeS3Util
import simplerag.ragback.global.util.S3Type
import simplerag.ragback.global.util.sha256Hex
import simplerag.ragback.global.util.s3.S3Type
import simplerag.ragback.global.util.converter.sha256Hex
import java.security.MessageDigest

@SpringBootTest
@ActiveProfiles("test")
class DataFileServiceTest(
private val dataFileService: DataFileService,
private val dataFileRepository: DataFileRepository,
private val tagRepository: TagRepository,
private val dataFileTagRepository: DataFileTagRepository,
private val s3Util: FakeS3Util
@Autowired val dataFileService: DataFileService,
@Autowired val dataFileRepository: DataFileRepository,
@Autowired val tagRepository: TagRepository,
@Autowired val dataFileTagRepository: DataFileTagRepository,
@Autowired val s3Util: FakeS3Util
) {
Comment on lines 31 to 39
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Constructor injection in tests: add @Testconstructor(autowireMode = ALL) for clarity and portability.

Relying on @Autowired on constructor params alone can be fragile across Boot versions. Align with IndexServiceTest by enabling constructor autowiring explicitly.

 @SpringBootTest
 @ActiveProfiles("test")
+@org.springframework.test.context.TestConstructor(autowireMode = org.springframework.test.context.TestConstructor.AutowireMode.ALL)
 class DataFileServiceTest(
-    @Autowired val dataFileService: DataFileService,
-    @Autowired val dataFileRepository: DataFileRepository,
-    @Autowired val tagRepository: TagRepository,
-    @Autowired val dataFileTagRepository: DataFileTagRepository,
-    @Autowired val s3Util: FakeS3Util
+    @Autowired val dataFileService: DataFileService,
+    @Autowired val dataFileRepository: DataFileRepository,
+    @Autowired val tagRepository: TagRepository,
+    @Autowired val dataFileTagRepository: DataFileTagRepository,
+    @Autowired val s3Util: FakeS3Util
 )
πŸ“ Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@SpringBootTest
@ActiveProfiles("test")
class DataFileServiceTest(
private val dataFileService: DataFileService,
private val dataFileRepository: DataFileRepository,
private val tagRepository: TagRepository,
private val dataFileTagRepository: DataFileTagRepository,
private val s3Util: FakeS3Util
@Autowired val dataFileService: DataFileService,
@Autowired val dataFileRepository: DataFileRepository,
@Autowired val tagRepository: TagRepository,
@Autowired val dataFileTagRepository: DataFileTagRepository,
@Autowired val s3Util: FakeS3Util
) {
@SpringBootTest
@ActiveProfiles("test")
@org.springframework.test.context.TestConstructor(
autowireMode = org.springframework.test.context.TestConstructor.AutowireMode.ALL
)
class DataFileServiceTest(
@Autowired val dataFileService: DataFileService,
@Autowired val dataFileRepository: DataFileRepository,
@Autowired val tagRepository: TagRepository,
@Autowired val dataFileTagRepository: DataFileTagRepository,
@Autowired val s3Util: FakeS3Util
) {
// …
}
πŸ€– Prompt for AI Agents
In
src/test/kotlin/simplerag/ragback/domain/document/service/DataFileServiceTest.kt
around lines 31 to 39, the test class relies on constructor @Autowired
parameters but lacks the explicit Spring Boot test constructor autowiring
annotation; add @TestConstructor(autowireMode =
TestConstructor.AutowireMode.ALL) above the class declaration (and import
org.springframework.test.context.TestConstructor) so constructor injection is
explicit and consistent with IndexServiceTest.



Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -27,8 +27,8 @@ import org.testcontainers.utility.DockerImageName
@ActiveProfiles("test")
@TestConstructor(autowireMode = TestConstructor.AutowireMode.ALL)
class IndexServiceTest(
val indexService: IndexService,
val indexRepository: IndexRepository,
@Autowired val indexService: IndexService,
@Autowired val indexRepository: IndexRepository,
Comment on lines +30 to +31
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick (assertive)

Drop redundant @Autowired on constructor params (you already enabled constructor autowiring).

Since the class is annotated with @Testconstructor(autowireMode = ALL), the @Autowired annotations on the primary constructor parameters are unnecessary noise.

Apply this diff:

-class IndexServiceTest(
-    @Autowired val indexService: IndexService,
-    @Autowired val indexRepository: IndexRepository,
+class IndexServiceTest(
+    val indexService: IndexService,
+    val indexRepository: IndexRepository,
 )
πŸ€– Prompt for AI Agents
In src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
around lines 30 to 31, the primary constructor parameters are redundantly
annotated with @Autowired despite @TestConstructor(autowireMode = ALL) enabling
constructor autowiring; remove the @Autowired annotations from the constructor
parameters (keep the parameter types and names intact) so the test uses
constructor injection provided by the class-level configuration and avoid
unnecessary annotations.

) {


Expand Down