Merged
5 changes: 4 additions & 1 deletion build.gradle
@@ -44,7 +44,7 @@ dependencies {
testImplementation 'org.jetbrains.kotlin:kotlin-test-junit5'
testImplementation("org.mockito.kotlin:mockito-kotlin:5.3.1")
testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
- runtimeOnly 'org.postgresql:postgresql'
+ implementation 'org.postgresql:postgresql' // ← changed
testRuntimeOnly("com.h2database:h2")

Comment on lines +47 to 49

💡 Verification agent

🧩 Analysis chain

H2 in tests will break with the pgvector type; switch to PostgreSQL Testcontainers.

The JPA columnDefinition "vector" won’t exist in H2, so schema generation and queries involving the embedding column will fail under the current test profile. Replace H2 with Testcontainers-based PostgreSQL for tests and ensure the extension is enabled.

Apply this diff to dependencies:

-    testRuntimeOnly("com.h2database:h2")
+    // Prefer PostgreSQL Testcontainers for pgvector
+    testImplementation 'org.testcontainers:postgresql'
+    testImplementation 'org.testcontainers:junit-jupiter'

And add a Flyway migration (outside this file) to enable the extension during tests and dev:

-- src/main/resources/db/migration/V1__enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;

Run this to find any H2-specific configs that need removal/migration in test resources:


🏁 Script executed:

#!/bin/bash
rg -nC2 -i 'h2|jdbc:h2|spring\.datasource\..*h2|driver.*h2' --glob 'src/**/resources/**'

Length of output: 794


Replace H2 with PostgreSQL Testcontainers for Tests

Your test setup still relies on H2—even though you’ve switched your main runtime to PostgreSQL—so any “vector” columns (pgvector) won’t exist in H2 and will cause schema-generation or query failures. I found H2 config in your test resources:

  • src/test/resources/application-test.yml
    url: jdbc:h2:mem:simplerag;MODE=PostgreSQL;DB_CLOSE_DELAY=-1;…
    driver-class-name: org.h2.Driver
    jpa.database-platform: org.hibernate.dialect.H2Dialect
    hibernate.ddl-auto: create-drop

Please update as follows:

• In build.gradle, replace the H2 test runtime with Testcontainers dependencies:

-    testRuntimeOnly("com.h2database:h2")
+    // Use PostgreSQL Testcontainers for pgvector support
+    testImplementation "org.testcontainers:postgresql"
+    testImplementation "org.testcontainers:junit-jupiter"

• In src/test/resources/application-test.yml, remove all H2 settings and configure a Testcontainers-backed PostgreSQL datasource. For example:

-  datasource:
-    url: jdbc:h2:mem:simplerag;MODE=PostgreSQL;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
-    driver-class-name: org.h2.Driver
-    username: sa
-    password:
-  jpa:
-    database-platform: org.hibernate.dialect.H2Dialect
-    hibernate:
-      ddl-auto: create-drop
+  datasource:
+    # PostgreSQL Testcontainer will supply URL, driver, and credentials
+    driver-class-name: org.postgresql.Driver
+  jpa:
+    database-platform: org.hibernate.dialect.PostgreSQLDialect
+    hibernate:
+      ddl-auto: none

• Add a Flyway migration to enable pgvector in your test (and dev) databases:

-- src/main/resources/db/migration/V1__enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;

With these changes, your tests will spin up a real PostgreSQL instance (with pgvector support) instead of H2, ensuring your schema and queries remain compatible.
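
As a rough illustration of how a Testcontainers-backed datasource could be wired into Spring tests, the sketch below starts one shared container and publishes its connection details through `@DynamicPropertySource`. The class name `PostgresVectorTestBase` and the image tag `pgvector/pgvector:pg16` are assumptions made for this example, not part of the PR, and the generic `PostgreSQLContainer<Nothing>` form assumes Testcontainers 1.x.

```kotlin
import org.springframework.test.context.DynamicPropertyRegistry
import org.springframework.test.context.DynamicPropertySource
import org.testcontainers.containers.PostgreSQLContainer
import org.testcontainers.utility.DockerImageName

// Hypothetical base class: test classes that extend it share one PostgreSQL container.
abstract class PostgresVectorTestBase {
    companion object {
        // Assumption: an image that ships the pgvector extension. It must be declared
        // as a substitute for "postgres" so that PostgreSQLContainer accepts it.
        private val postgres = PostgreSQLContainer<Nothing>(
            DockerImageName.parse("pgvector/pgvector:pg16")
                .asCompatibleSubstituteFor("postgres")
        ).apply { start() }

        // Publishes the container's JDBC settings before the Spring context starts,
        // so application-test.yml no longer needs a hard-coded URL or credentials.
        @JvmStatic
        @DynamicPropertySource
        fun datasourceProperties(registry: DynamicPropertyRegistry) {
            registry.add("spring.datasource.url") { postgres.jdbcUrl }
            registry.add("spring.datasource.username") { postgres.username }
            registry.add("spring.datasource.password") { postgres.password }
        }
    }
}
```

A test class annotated with `@SpringBootTest` would extend this base class. The container is started once per JVM and is cleaned up by Testcontainers when the test process exits.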

🤖 Prompt for AI Agents
In build.gradle around lines 47-49, the project still declares H2 as the test
runtime which breaks pgvector-dependent schema and queries; replace the H2
testRuntimeOnly entry with the Testcontainers PostgreSQL and Testcontainers
JUnit/Jupiter dependencies (and the Testcontainers BOM if used) so tests run
against a real PostgreSQL container. Then update
src/test/resources/application-test.yml to remove the H2
URL/driver/dialect/ddl-auto settings and instead configure datasource values
that will be populated from Testcontainers (e.g., property placeholders or
system/env properties set by your test harness) or use Spring’s Testcontainers
support to supply the JDBC URL, username and password at test runtime. Finally
add a Flyway migration file under src/main/resources/db/migration (e.g.,
V1__enable_pgvector.sql) that runs CREATE EXTENSION IF NOT EXISTS vector; so the
pgvector extension exists in both dev and test databases.

// swagger
@@ -53,6 +53,9 @@ dependencies {
// s3
implementation(platform("software.amazon.awssdk:bom:2.25.70"))
implementation("software.amazon.awssdk:s3")
+
+ // pgvector
+ implementation("com.pgvector:pgvector:0.1.6")
Comment on lines +57 to +58

🧹 Nitpick (assertive)

pgvector dependency added: ensure Hibernate type mapping and DB extension are in place.

Adding com.pgvector:pgvector is necessary, but by itself Hibernate won’t know how to bind PGvector unless you map it as OTHER (or provide a custom type/AttributeConverter). See my entity comment for @JdbcTypeCode(SqlTypes.OTHER). Also make sure a Flyway/Liquibase migration creates the vector extension.

I can add a minimal custom type or converter if you prefer that approach over @JdbcTypeCode.
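
For reference, the converter route mostly comes down to translating between a `FloatArray` and pgvector's text form (`[v1,v2,...]`). The sketch below shows only that translation with the Kotlin standard library. The function names are hypothetical, and wiring them into a JPA `AttributeConverter` is left out.

```kotlin
// Hypothetical helper: renders a FloatArray in pgvector's text format, e.g. "[1.0,0.5]".
fun toVectorLiteral(values: FloatArray): String =
    values.joinToString(separator = ",", prefix = "[", postfix = "]")

// Hypothetical helper: parses pgvector's text format back into a FloatArray.
fun parseVectorLiteral(literal: String): FloatArray {
    val trimmed = literal.trim()
    require(trimmed.length >= 2 && trimmed.startsWith("[") && trimmed.endsWith("]")) {
        "Not a vector literal: $literal"
    }
    val body = trimmed.substring(1, trimmed.length - 1)
    if (body.isBlank()) return FloatArray(0)
    return body.split(",").map { it.trim().toFloat() }.toFloatArray()
}
```

A converter built on these helpers would still need the column value to be cast to `vector` on the database side, which is one reason the `PGvector` type from the added dependency may be the simpler choice.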

🤖 Prompt for AI Agents
In build.gradle around lines 57-58, you added the pgvector dependency but
Hibernate won’t bind PGvector automatically; update the entity mapping to use
@JdbcTypeCode(SqlTypes.OTHER) or implement and register a custom Hibernate Type
/ JPA AttributeConverter for pgvector to ensure correct JDBC binding, and add a
Flyway or Liquibase migration that runs CREATE EXTENSION IF NOT EXISTS vector;
to ensure the DB has the pgvector extension before migrations/usage.

}

dependencyManagement {
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
@@ -1,8 +1,8 @@
package simplerag.ragback.domain.index.entity

+ import com.pgvector.PGvector

💡 Verification agent

🧩 Analysis chain

Ensure the pgvector extension exists in all environments.

Add a migration to run CREATE EXTENSION IF NOT EXISTS vector and verify permissions in non-dev DBs. Without it, application startup will fail on schema validation.

I can draft the Flyway migration and an index creation script tailored to your chosen dimension.


🏁 Script executed:

#!/bin/bash
# Confirm presence of extension-enabling migration
rg -n 'CREATE EXTENSION IF NOT EXISTS vector' src/main/resources/db/migration || echo "Missing migration for pgvector extension"

Length of output: 242


Add a Flyway migration to enable the pgvector extension

We’ve confirmed that there is no existing Flyway migration for CREATE EXTENSION IF NOT EXISTS vector (the db/migration folder is missing entirely), so application startup will fail schema validation in environments where the vector extension isn’t already present.

• Create a new SQL migration under src/main/resources/db/migration/, for example
V2__enable_vector_extension.sql

-- enable PostgreSQL vector extension for embedding support
CREATE EXTENSION IF NOT EXISTS vector;

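• (Optional) Follow it with an index on the embedding column. Note that pgvector can only index a column declared with a fixed dimension (for example vector(768)), so the bare columnDefinition = "vector" used in this PR would need a dimension first, and the table name below is a placeholder, e.g.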

-- index on embedding vector column for fast similarity searches
CREATE INDEX IF NOT EXISTS idx_chunk_embedding_vector
  ON chunks USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

• Ensure non-dev databases have sufficient privileges to run CREATE EXTENSION. Update your deployment docs or grant scripts accordingly.

Once the migration is in place, Flyway will apply it automatically before your application’s schema validation check.
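
To make a missing extension fail fast with a clear message, a small check against the `pg_extension` catalog can run at startup or in a smoke test. The sketch below is a minimal illustration that uses plain JDBC. The function name is hypothetical, and how the `Connection` is obtained (for example from the application's `DataSource`) is left to the caller.

```kotlin
import java.sql.Connection

// Hypothetical helper: returns true when the pgvector extension ("vector")
// is installed in the database behind this connection.
fun isVectorExtensionInstalled(connection: Connection): Boolean =
    connection.prepareStatement("SELECT 1 FROM pg_extension WHERE extname = ?").use { stmt ->
        stmt.setString(1, "vector")
        // A single returned row means the extension exists.
        stmt.executeQuery().use { rs -> rs.next() }
    }
```

Calling it as `check(isVectorExtensionInstalled(conn)) { "pgvector extension is missing" }` would turn an unclear schema-validation error into an explicit one.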

🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
around line 3, there is an import for PGvector but no Flyway migration to enable
the PostgreSQL pgvector extension which causes startup schema validation
failures in environments missing the extension; add a SQL migration file under
src/main/resources/db/migration/, e.g. V2__enable_vector_extension.sql,
containing a CREATE EXTENSION IF NOT EXISTS vector; statement (optionally
followed by a CREATE INDEX IF NOT EXISTS ... for the embedding column using
ivfflat and vector_l2_ops), and update deployment/DB provisioning docs or grant
scripts so non-dev DBs have the privileges to run CREATE EXTENSION.

import jakarta.persistence.*
import simplerag.ragback.global.entity.BaseEntity
- import simplerag.ragback.global.util.FloatArrayToPgVectorStringConverter

// The embedding size must be validated in the service layer
@Entity
@@ -13,9 +13,8 @@ class ChunkEmbedding(
@Lob
val content: String,

- @Convert(converter = FloatArrayToPgVectorStringConverter::class)
- @Column(name = "embedding", nullable = false)
- private var _embedding: FloatArray,
+ @Column(name = "embedding", columnDefinition = "vector")
+ var embedding: PGvector,

@Column(name = "embedding_dim", nullable = false)
val embeddingDim: Int,
@@ -27,16 +26,4 @@ class ChunkEmbedding(
@Id @GeneratedValue(strategy = GenerationType.IDENTITY)
@Column(name = "chunk_embeddings_id")
val id: Long? = null,
- ) : BaseEntity() {
-
- @get:Transient
- val embedding: FloatArray get() = _embedding.copyOf()
-
- fun updateEmbedding(newVec: FloatArray) {
- require(newVec.size == embeddingDim) {
- "Embedding dimension mismatch: expected=$embeddingDim, got=${newVec.size}"
- }
- _embedding = newVec.copyOf()
- }
-
- }
+ ) : BaseEntity()
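
With updateEmbedding removed, the entity no longer checks the vector length itself, and the comment above the class says the service layer must do it. A minimal sketch of such a check, reusing the error message of the removed method, might look like the following. The function name is hypothetical and nothing like it is shown in this PR.

```kotlin
// Hypothetical service-layer guard: rejects vectors whose length differs from the
// dimension the index expects, and returns a defensive copy for further use.
fun requireEmbeddingDimension(vector: FloatArray, expectedDim: Int): FloatArray {
    require(vector.size == expectedDim) {
        "Embedding dimension mismatch: expected=$expectedDim, got=${vector.size}"
    }
    return vector.copyOf()
}
```

The validated array could then be wrapped as `PGvector(validated)` before it is assigned to the entity, with `expectedDim` taken from the chosen EmbeddingModel.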
@@ -6,36 +6,21 @@ enum class EmbeddingModel(
) {
// OpenAI
TEXT_EMBEDDING_3_SMALL(1536, "text-embedding-3-small"),
- TEXT_EMBEDDING_3_LARGE(3072, "text-embedding-3-large"),

// SBERT / HuggingFace
ALL_MINILM_L6_V2(384, "sentence-transformers/all-MiniLM-L6-v2"),
ALL_MP_NET_BASE_V2(768, "sentence-transformers/all-mpnet-base-v2"),
- MULTI_QA_MP_NET_BASE_DOT_V1(768, "sentence-transformers/multi-qa-mpnet-base-dot-v1"),
DISTILUSE_BASE_MULTILINGUAL_CASED_V2(512, "sentence-transformers/distiluse-base-multilingual-cased-v2"),
- PARAPHRASE_MULTILINGUAL_MINILM_L12_V2(384, "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"),
- KO_SBERT_V1(768, "jhgan/ko-sbert-v1"),
- KOR_SROBERTA(768, "jhgan/ko-sroberta-medium-nli"),

- // Korean specific
- BM_KO_SMALL(512, "bespin-global/klue-sroberta-base-continue-learning-by-mnr"),
-
- // Instructor / Mistral
- INSTRUCTOR_BASE(768, "hkunlp/instructor-base"),
- INSTRUCTOR_XL(1024, "hkunlp/instructor-xl"),
- MISTRAL_EMBED(1024, "mistral-embed"),
+ // Korean
+ KO_SBERT_V1(768, "jhgan/ko-sbert-v1"),

- // BGE / E5 etc
- BGE_SMALL_EN(384, "BAAI/bge-small-en-v1.5"),
+ // BGE
BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
- BGE_LARGE_EN(1024, "BAAI/bge-large-en-v1.5"),
BGE_M3(1024, "BAAI/bge-m3"),
- E5_SMALL(384, "intfloat/e5-small-v2"),
- E5_BASE(768, "intfloat/e5-base-v2"),
- E5_LARGE(1024, "intfloat/e5-large-v2"),

- // Old word vectors
- FASTTEXT_KO(300, "fasttext-ko-300d");
+ // E5
+ E5_BASE(768, "intfloat/e5-base-v2");
Comment on lines +18 to +23

💡 Verification agent

🧩 Analysis chain

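Check for dangling references and compile errors after trimming the model list

Several constants were removed. If legacy references remain, compilation can fail or a runtime branch can be silently skipped. The script below checks for them automatically.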


🏁 Script executed:

#!/bin/bash
set -euo pipefail

REMOVED=(
  TEXT_EMBEDDING_3_LARGE
  MULTI_QA_MP_NET_BASE_DOT_V1
  PARAPHRASE_MULTILINGUAL_MINILM_L12_V2
  KOR_SROBERTA
  BM_KO_SMALL
  INSTRUCTOR_BASE
  INSTRUCTOR_XL
  MISTRAL_EMBED
  BGE_SMALL_EN
  BGE_LARGE_EN
  E5_SMALL
  E5_LARGE
  FASTTEXT_KO
)

echo "▶ Searching for references to removed constants"
for sym in "${REMOVED[@]}"; do
  rg -n -C2 -P "\bEmbeddingModel\.$sym\b" || true
done

echo
echo "▶ Also checking for direct use of model ID strings (optional)"
for sym in "${REMOVED[@]}"; do
  rg -n -C2 -F "$sym" || true
done

Length of output: 14631


Test code: replace references to the removed model constant (TEXT_EMBEDDING_3_LARGE)

EmbeddingModel.TEXT_EMBEDDING_3_LARGE was deleted, so the tests below fail to compile. Replace it with a constant that still exists (for example EmbeddingModel.E5_BASE or BGE_BASE_EN) and verify that the test logic still behaves as intended.

  • Files to update:

    • src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
      References to IndexCreateRequest(..., EmbeddingModel.TEXT_EMBEDDING_3_LARGE, ...) (mainly lines 37, 54, 76, 85, 109, 139, 161, 196, 221, 247, 271, among others)
  • Example change:

    - IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
    + IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.E5_BASE, true)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// BGE
BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
BGE_LARGE_EN(1024, "BAAI/bge-large-en-v1.5"),
BGE_M3(1024, "BAAI/bge-m3"),
E5_SMALL(384, "intfloat/e5-small-v2"),
E5_BASE(768, "intfloat/e5-base-v2"),
E5_LARGE(1024, "intfloat/e5-large-v2"),
// Old word vectors
FASTTEXT_KO(300, "fasttext-ko-300d");
// E5
E5_BASE(768, "intfloat/e5-base-v2");
+++ b/src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
@@ -37,7 +37,7 @@ class IndexServiceTest {
//
IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.E5_BASE, true)
//
}


companion object {
fun findByModelId(modelId: String): EmbeddingModel? {

This file was deleted.