🚀 Chore: Update entity and reduce the number of models #14

catturtle123 · 2025-08-19T12:09:09Z

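📌 PR Overview

Update entity and reduce the number of models

🔍 Related Issue

Closes 🚀 [Chore] Fix pgvector entity #13

✨ Changes

Update entity and reduce the number of models
🚀 chore: Update entity and reduce the number of models

📸 Screenshots / Verification Results (optional)

Attach images or videos if needed.

✅ Checklist

I have followed the code style.
I have removed unnecessary comments and console logs.
I have written/updated the relevant tests.
I have confirmed that all features work correctly in the local environment.

🗒️ Additional Notes

Add any other notes here.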

Summary by CodeRabbit

New Features
- Store embeddings using the pgvector extension's `vector` column type for better performance and compatibility.
- Added KO_SBERT_V1 embedding model option.
Refactor
- Replaced custom embedding conversion with direct vector storage.
- Streamlined the embedding model list by removing several legacy options.
Chores
- Updated dependencies to enable vector support and adjusted PostgreSQL driver scope.

coderabbitai · 2025-08-19T12:09:15Z

Walkthrough

This pull request modifies the project's PostgreSQL and pgvector integration by updating dependency configurations, refactoring the ChunkEmbedding entity to use PGvector directly, and pruning the EmbeddingModel enum.

Changes

- Gradle Dependencies (`build.gradle`)
  - Changed PostgreSQL JDBC driver from `runtimeOnly` to `implementation`
  - Added pgvector dependency `com.pgvector:pgvector:0.1.6`
- Entity Refactoring (`src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt`)
  - Replaced FloatArray embedding with direct PGvector type
  - Removed converter-based mapping
  - Added @Table annotation
  - Simplified embedding field declaration
- Enum Modification (`src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt`)
  - Removed multiple embedding model entries
  - Introduced KO_SBERT_V1
  - Reduced overall model set
- Converter Removal (`src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt`)
  - Completely removed the FloatArray to PGvector string converter

Sequence Diagram

sequenceDiagram
    participant Entity as ChunkEmbedding
    participant Database as PostgreSQL
    participant PGVector as PGvector

    Entity->>PGVector: Direct embedding storage
    PGVector->>Database: Store vector type
    Database-->>PGVector: Native vector support

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Assessment against Linked Issues

- Update Dependencies: ✅ Addressed
- Gradle Configuration Changes: ✅ Addressed
- Entity Modifications: ✅ Addressed
- No Runtime Behavior Changes: ✅ Addressed

Out-of-Scope Changes

There are no out-of-scope changes detected in this pull request.

Possibly Related PRs

✨ Feature: Update ERD #6: Directly related PR modifying the ChunkEmbedding entity and the pgvector converter

Poem

🐰 In the realm of vectors bright,
PostgreSQL dances with pure delight,
Embeddings now sleek and refined,
A rabbit's code, precisely designed!
Hop, hop, hooray for clean design! 🌟

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)
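35-43: Add dimension validation on load to catch data mismatches early

Because JPA injects values directly into the field, the dimension check in updateEmbedding only runs on the setter path. If a row with _embedding.size != embeddingDim is loaded from the DB into the entity, the mismatch is discovered late. A @PostLoad hook is recommended for early validation.

Example (add anywhere in the file):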
@PostLoad
private fun validateEmbeddingOnLoad() {
    require(_embedding.size == embeddingDim) {
        "Embedding dimension mismatch on load: expected=$embeddingDim, got=${_embedding.size}"
    }
}
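The @PostLoad check above reduces to a plain length comparison. A minimal sketch that isolates this logic in framework-free helpers, so it can be unit-tested without JPA or a database. `requireEmbeddingDim` and `isEmbeddingDimValid` are hypothetical names, and the FloatArray parameter mirrors the pre-PGvector field this comment refers to:

```kotlin
/**
 * Hypothetical helper mirroring the @PostLoad check: fails fast when a loaded
 * embedding's length differs from the declared embeddingDim.
 */
fun requireEmbeddingDim(embedding: FloatArray, embeddingDim: Int) {
    // require() throws IllegalArgumentException with the lazily built message
    require(embedding.size == embeddingDim) {
        "Embedding dimension mismatch on load: expected=$embeddingDim, got=${embedding.size}"
    }
}

/** Non-throwing variant, e.g. for auditing many rows in one pass (hypothetical). */
fun isEmbeddingDimValid(embedding: FloatArray, embeddingDim: Int): Boolean =
    runCatching { requireEmbeddingDim(embedding, embeddingDim) }.isSuccess
```

Inside the entity, the @PostLoad method could then simply delegate with `requireEmbeddingDim(_embedding, embeddingDim)`, keeping the message format identical to the suggestion.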

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 55f2b92 and f0fd20e.

📒 Files selected for processing (5)

build.gradle (1 hunks)
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2 hunks)
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorConverter.kt (1 hunks)
src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (0 hunks)

💤 Files with no reviewable changes (1)

src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt

🧰 Additional context used

🧬 Code Graph Analysis (2)

src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorConverter.kt (1)
- src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (9): autoApply (6-34), convertToEntityAttribute (17-33), convertToDatabaseColumn (8-15), { "Embedding (FloatArray) must not be null" } (9-9), { "Embedding must not be empty; expected fixed dimension (e.g., 1536)" } (10-10), blocks at (11-13) and (26-28), arr (25-29), it (14-14)

src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)
- src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (2): autoApply (6-34), { "Embedding (FloatArray) must not be null" } (9-9)

🔇 Additional comments (5)

build.gradle (1)

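47-47: Using PGobject requires a compile-time dependency, so the scope change is justified

FloatArrayToPgVectorConverter references org.postgresql.util.PGobject directly, so promoting the postgresql driver to implementation is a reasonable decision. The Spring Boot BOM manages its version, so no separate version pin is needed.

src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorConverter.kt (2)

9-13: Input validation is appropriate

The logic rejecting null/empty input and NaN/Infinity is clear, and the runtime exception messages are sufficiently descriptive.

14-21: PGobject mapping approach is appropriate

Passing a PGobject with type="vector" to the pgvector column works reliably in Hibernate together with SqlTypes.OTHER. The string building also keeps unnecessary object allocations to a minimum.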
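The validation and string-building steps discussed for FloatArrayToPgVectorConverter (lines 9-21) can be sketched as a pure function. This is an illustration under assumptions, not the PR's converter: `toPgVectorLiteral` is a hypothetical name, and it only produces the text form (for example `[1.0,0.5,-2.0]`) that would then be set as the value of a PGobject whose type is "vector".

```kotlin
/**
 * Hypothetical sketch of the entity → DB direction: validates a FloatArray and
 * renders it as a pgvector text literal such as "[1.0,2.0,3.0]".
 */
fun toPgVectorLiteral(embedding: FloatArray?): String {
    requireNotNull(embedding) { "Embedding (FloatArray) must not be null" }
    require(embedding.isNotEmpty()) { "Embedding must not be empty" }
    // pgvector rejects NaN and infinite values, so fail before hitting the DB
    embedding.forEachIndexed { i, v ->
        require(v.isFinite()) { "NaN/Infinity not allowed at index $i: $v" }
    }
    // One pre-sized StringBuilder instead of joinToString's intermediate strings
    val sb = StringBuilder(embedding.size * 8 + 2)
    sb.append('[')
    embedding.forEachIndexed { i, v ->
        if (i > 0) sb.append(',')
        sb.append(v)
    }
    sb.append(']')
    return sb.toString()
}
```

`joinToString(",", "[", "]")` would be shorter; the explicit builder just makes the "minimal allocation" point from the review concrete.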
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)

18-21: Leave the vector dimension unspecified (plain vector) and switch the test environment to Postgres (required)

To support EmbeddingModels with different dimensions (768/1024/1536, etc.), the definition currently fixed at vector(3072) must be changed to plain vector. The H2-based test environment will also fail on this DDL, so it needs to move to PostgreSQL Testcontainers.

Locations to change:
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt – line 20
- @Column(name = "embedding", nullable = false, columnDefinition = "vector(3072)")
+ @Column(name = "embedding", nullable = false, columnDefinition = "vector")
build.gradle

Remove testRuntimeOnly("com.h2database:h2")

Add Testcontainers dependencies such as org.testcontainers:postgresql

src/test/resources/application-test.yml

Remove the H2 settings and replace them with a PostgreSQL Testcontainers URI or properties
⛔ Skipped due to learnings
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt:16-22
Timestamp: 2025-08-17T13:12:02.200Z
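Learning: catturtle123 decided to implement the Index entity's overlapSize ≤ chunkingSize validation in the service layer rather than at the entity level, and plans to add this validation logic in the PR that stores chunking embeddings, so it should be raised in that PR (Simple-RAG/BACK project)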
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1)

15-17: Adding KO_SBERT_V1 is appropriate

It keeps the Korean model set minimal while securing a 768-dimension option. It does not appear to affect usage elsewhere in the code.

coderabbitai · 2025-08-19T12:17:24Z

src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt

+    // BGE
    BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
-    BGE_LARGE_EN(1024, "BAAI/bge-large-en-v1.5"),
    BGE_M3(1024, "BAAI/bge-m3"),
-    E5_SMALL(384, "intfloat/e5-small-v2"),
-    E5_BASE(768, "intfloat/e5-base-v2"),
-    E5_LARGE(1024, "intfloat/e5-large-v2"),

-    // Old word vectors
-    FASTTEXT_KO(300, "fasttext-ko-300d");
+    // E5
+    E5_BASE(768, "intfloat/e5-base-v2");


💡 Verification agent

🧩 Analysis chain

Check for missing references and compile errors caused by the model reduction

Several constants were removed. Any remaining legacy references can cause compile failures or missing runtime branches. An automated check script is proposed.

🏁 Script executed:

#!/bin/bash
set -euo pipefail

REMOVED=(
  TEXT_EMBEDDING_3_LARGE
  MULTI_QA_MP_NET_BASE_DOT_V1
  PARAPHRASE_MULTILINGUAL_MINILM_L12_V2
  KOR_SROBERTA
  BM_KO_SMALL
  INSTRUCTOR_BASE
  INSTRUCTOR_XL
  MISTRAL_EMBED
  BGE_SMALL_EN
  BGE_LARGE_EN
  E5_SMALL
  E5_LARGE
  FASTTEXT_KO
)

echo "▶ Searching for references to removed constants"
for sym in "${REMOVED[@]}"; do
  rg -n -C2 -P "\bEmbeddingModel\.$sym\b" || true
done

echo
echo "▶ Also checking for direct use of model ID strings (optional)"
for sym in "${REMOVED[@]}"; do
  rg -n -C2 -F "$sym" || true
done

Length of output: 14631

Test code: references to the removed model constant (TEXT_EMBEDDING_3_LARGE) must be replaced

EmbeddingModel.TEXT_EMBEDDING_3_LARGE has been deleted, so the following tests fail to compile. Replace it with an appropriate new constant (e.g., EmbeddingModel.E5_BASE or BGE_BASE_EN) and verify that the test logic still behaves as intended.

Files to fix:

src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
– references to IndexCreateRequest(..., EmbeddingModel.TEXT_EMBEDDING_3_LARGE, ...) (around lines 37, 54, 76, 85, 109, 139, 161, 196, 221, 247, and 271, among others)

Example change:

- IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+ IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.E5_BASE, true)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

+    // BGE
     BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
-    BGE_LARGE_EN(1024, "BAAI/bge-large-en-v1.5"),
     BGE_M3(1024, "BAAI/bge-m3"),
-    E5_SMALL(384, "intfloat/e5-small-v2"),
-    E5_BASE(768, "intfloat/e5-base-v2"),
-    E5_LARGE(1024, "intfloat/e5-large-v2"),

-    // Old word vectors
-    FASTTEXT_KO(300, "fasttext-ko-300d");
+    // E5
+    E5_BASE(768, "intfloat/e5-base-v2");

+++ b/src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
@@ -37,7 +37,7 @@ class IndexServiceTest {
     // …
     IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.E5_BASE, true)
     // …
 }
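Beyond compile errors, removed constants can also fail at runtime if a constant's name is stored somewhere, for example in a column mapped with @Enumerated(EnumType.STRING) or in an API payload; whether this project persists EmbeddingModel by name is an assumption. A minimal sketch of a tolerant lookup, using only the three entries visible in the EmbeddingModel diff in this comment (the real enum has more, such as KO_SBERT_V1, whose model id is not shown). `fromNameOrNull` and `fromModelIdOrNull` are hypothetical helpers:

```kotlin
// Partial, illustrative copy of EmbeddingModel: only entries visible in the diff.
enum class EmbeddingModel(val dim: Int, val modelId: String) {
    BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
    BGE_M3(1024, "BAAI/bge-m3"),
    E5_BASE(768, "intfloat/e5-base-v2");

    companion object {
        // Unlike valueOf(), returns null for removed names such as
        // "TEXT_EMBEDDING_3_LARGE" instead of throwing IllegalArgumentException.
        fun fromNameOrNull(name: String): EmbeddingModel? =
            EmbeddingModel.entries.firstOrNull { it.name == name }

        // Lookup by Hugging Face model id, e.g. "BAAI/bge-m3".
        fun fromModelIdOrNull(modelId: String): EmbeddingModel? =
            EmbeddingModel.entries.firstOrNull { it.modelId == modelId }
    }
}
```

Callers can then decide explicitly how to handle a legacy value (reject the request, or migrate it to a replacement such as E5_BASE) rather than failing deep inside entity loading.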

coderabbitai · 2025-08-19T12:17:25Z

src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorConverter.kt

+    override fun convertToEntityAttribute(dbData: PGobject?): FloatArray {
+        requireNotNull(dbData) { "db vector is null" }
+        val body = dbData.value?.trim()?.removePrefix("[")?.removeSuffix("]") ?: error("empty vector")
+        return body.split(',').map { it.trim().toFloat() }.toFloatArray()
+    }


🛠️ Refactor suggestion

Harden handling of empty vectors and parse errors (including re-validating NaN/Infinity)

With the current implementation:

- an empty vector string such as [] makes toFloat() throw a NumberFormatException after split(',');
- the exception message on a parse failure is vague, and NaN/Infinity are not re-validated after parsing.

Please add defensive parsing and explicit exceptions as follows:

-    override fun convertToEntityAttribute(dbData: PGobject?): FloatArray {
-        requireNotNull(dbData) { "db vector is null" }
-        val body = dbData.value?.trim()?.removePrefix("[")?.removeSuffix("]") ?: error("empty vector")
-        return body.split(',').map { it.trim().toFloat() }.toFloatArray()
-    }
+    override fun convertToEntityAttribute(dbData: PGobject?): FloatArray {
+        requireNotNull(dbData) { "db vector is null" }
+        val raw = dbData.value ?: throw IllegalArgumentException("db vector has null value")
+        val body = raw.trim().removePrefix("[").removeSuffix("]").trim()
+        if (body.isEmpty()) {
+            // Defensive: avoid NumberFormatException on "[]"
+            return floatArrayOf()
+        }
+        val arr = try {
+            body.split(',')
+                .map { it.trim().toFloat() }
+                .toFloatArray()
+        } catch (e: NumberFormatException) {
+            throw IllegalArgumentException("Invalid vector literal for pgvector: '$raw'", e)
+        }
+        require(arr.all { it.isFinite() }) { "NaN/Infinity not allowed (db → entity)" }
+        return arr
+    }

🤖 Prompt for AI Agents

In src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorConverter.kt around lines 24 to 28, replace the current naive parsing with defensive parsing: treat "[]" or an empty trimmed body as an empty FloatArray; split only when non-empty and skip any extra empty tokens; convert each token to Float in a try/catch that throws a clear IllegalArgumentException indicating the offending token and its index on parse failure; after conversion, validate each float to ensure it is finite (not NaN or Infinity) and throw an IllegalArgumentException indicating the index and invalid value if found; preserve existing null checks on dbData and provide explicit messages for null/empty cases.
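The AI prompt above asks for slightly stricter behavior than the committable diff: skipping extra empty tokens and naming the offending token and index. A framework-free sketch of that parsing logic, with the PGobject unwrapping left out; `parsePgVectorLiteral` is a hypothetical name, and the input is assumed to be the raw text value of the column, such as `[1,2.5,-3]`:

```kotlin
/**
 * Hypothetical sketch of the DB → entity direction: parses a pgvector text
 * literal such as "[1,2.5,-3]" into a FloatArray with explicit error messages.
 */
fun parsePgVectorLiteral(raw: String?): FloatArray {
    requireNotNull(raw) { "db vector has null value" }
    val body = raw.trim().removePrefix("[").removeSuffix("]").trim()
    // "[]" (or a blank value) yields an empty array instead of NumberFormatException
    if (body.isEmpty()) return floatArrayOf()

    // Extra empty tokens (e.g. "1,,2") are skipped; indexes refer to the kept tokens
    val tokens = body.split(',').map { it.trim() }.filter { it.isNotEmpty() }
    val result = FloatArray(tokens.size)
    tokens.forEachIndexed { i, token ->
        val value = token.toFloatOrNull()
            ?: throw IllegalArgumentException("Invalid vector element at index $i: '$token' in '$raw'")
        // toFloatOrNull() accepts "NaN" and "Infinity", so re-check after parsing
        require(value.isFinite()) { "NaN/Infinity not allowed at index $i: $token" }
        result[i] = value
    }
    return result
}
```

Using `toFloatOrNull()` instead of try/catch around `toFloat()` keeps the failing token in scope, which is what lets the message report both the index and the value.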

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)

build.gradle (2)
34-39: Duplicate dependency: jackson-module-kotlin is declared twice.

Remove one of the duplicates to avoid noise and potential resolution ambiguity.
-    implementation("com.fasterxml.jackson.module:jackson-module-kotlin")
...
-    implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
+    implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
73-77: Redundant allOpen block when using kotlin-jpa plugin.

The kotlin-jpa plugin already configures all-open for JPA. Keeping both is harmless but redundant. Consider removing the manual block for simplicity.
-allOpen {
-    annotation 'jakarta.persistence.Entity'
-    annotation 'jakarta.persistence.MappedSuperclass'
-    annotation 'jakarta.persistence.Embeddable'
-}
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2)
12-15: Consider lazy-loading LOB content.

Large content fields often don’t need to be fetched with every embedding read. Mark as LAZY to reduce I/O if your access patterns allow.
-    @Lob
+    @Lob
+    @Basic(fetch = FetchType.LAZY)
     val content: String,
Note: Validate that your JPA provider and usage patterns truly defer LOB loading; some providers may still fetch eagerly depending on dialect/settings.

19-21: Action Required: Enforce embeddingDim Invariant at Entity Boundary

A quick grep (rg -nP 'new\s+ChunkEmbedding\(|ChunkEmbedding\(' -g 'src/**') found no direct constructor calls outside of the class declaration itself. This means instantiation (and persistence) happens solely through JPA, so service-layer checks are your only current safeguard—but they’re easy to bypass in the future. To prevent bad data ingestion, add a self-defensive lifecycle validator on the entity:

Locations to update:

File: src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt

Add the following methods in the class body (outside existing property/constructor declarations):
 class ChunkEmbedding(
     @Column(name = "embedding", columnDefinition = "vector", nullable = false)
     val embedding: PGvector,
 
     @Column(name = "embedding_dim", nullable = false)
     val embeddingDim: Int,
 
     // … other fields …
 ) {
+    @PrePersist
+    @PreUpdate
+    fun validateEmbedding() {
+        val vecSize = embedding.toArray()?.size ?: -1
+        require(vecSize == embeddingDim) {
+            "Invalid embeddingDim: expected $embeddingDim, but got PGvector of length $vecSize"
+        }
+    }
 }
This ensures that every persist or update operation will immediately fail if the embeddingDim doesn’t match the actual vector length, guarding your database against mismatched data.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f0fd20e and a307fcd.

📒 Files selected for processing (2)

build.gradle (2 hunks)
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (3 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)
- src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (6): { "Embedding (FloatArray) must not be null" } (9-9), autoApply (6-34), { "Embedding must not be empty; expected fixed dimension (e.g., 1536)" } (10-10), block at (26-28), convertToEntityAttribute (17-33), block at (11-13)

🔇 Additional comments (5)

build.gradle (2)

47-47: Switching the PostgreSQL driver to implementation is appropriate here.

Because the codebase now references PGobject-derived types (via PGvector), the driver must be on the compile classpath. Keeping it as runtimeOnly would risk compilation/linkage issues.

2-6: Kotlin plugin versions align with Spring Boot BOM

Verified that Spring Boot 3.5.4’s dependency-management BOM declares <kotlin.version>1.9.25</kotlin.version>, matching all Kotlin plugin versions in build.gradle.

No other Kotlin versions (1.x or 2.x) were found elsewhere in the codebase.

Future-proofing: if you upgrade to Kotlin 2.x, remember to bump every Kotlin plugin declaration in lockstep.
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (3)
16-18: Annotate PGvector field with JdbcTypeCode and enforce NOT NULL

We ran a search for any existing converters or @Convert annotations targeting PGvector and found none. It’s safe to add the explicit JDBC type mapping and non-null constraint.

• File: src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
• Lines: 16–18

Suggested change:
 import com.pgvector.PGvector
+import org.hibernate.annotations.JdbcTypeCode
+import org.hibernate.type.SqlTypes

-    @Column(name = "embedding", columnDefinition = "vector")
-    var embedding: PGvector,
+    @JdbcTypeCode(SqlTypes.OTHER)
+    @Column(name = "embedding", columnDefinition = "vector", nullable = false)
+    var embedding: PGvector,
This ensures Hibernate uses PreparedStatement#setObject for the vector type and enforces a NOT NULL constraint at the database level.

9-10: Action required: confirm pgvector dimension and index strategy

The ChunkEmbedding entity in src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt declares @Column(name = "embedding", columnDefinition = "vector"), which omits a fixed dimension.

pgvector’s docs require a fixed size (e.g., vector(768) or vector(1536)) to support IVFFlat/HNSW indexes.

No migrations were found under src/main/resources/db/migration (or elsewhere) that:
- define a fixed-dimension vector column,
- create an IVFFlat or HNSW index, or
- install the vector extension explicitly.

Please verify your database schema or migration scripts to ensure you:
- set a fixed vector dimension per table (for example, columnDefinition = "vector(768)");
- create the appropriate similarity index (IVFFlat/HNSW) once the dimension is fixed.

If you plan to run large-scale similarity searches, standardizing on a single embedding size (or splitting tables by model/dimension) is crucial for efficient indexing.
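If the project takes the "split tables by model/dimension" route, the per-dimension DDL is mechanical enough to generate. A sketch under assumptions: the table naming scheme, the `Metric` enum, and `vectorTableDdl` are hypothetical. The operator classes (`vector_cosine_ops`, `vector_l2_ops`, `vector_ip_ops`) and the `hnsw` index method are pgvector's own, and the 2,000-dimension ceiling is pgvector's documented limit for indexing the `vector` type, which would rule out indexing a `vector(3072)` column like the one mentioned earlier in this thread.

```kotlin
// Hypothetical metric enum; the project's own SimilarityMetric may differ.
enum class Metric(val opsClass: String) {
    COSINE("vector_cosine_ops"),
    L2("vector_l2_ops"),
    INNER_PRODUCT("vector_ip_ops")
}

object PgVectorLimits {
    // HNSW/IVFFlat indexes on the `vector` type support up to 2,000 dimensions
    const val MAX_INDEXED_DIMS = 2000
}

/**
 * Hypothetical generator for a "one table per dimension" layout: a fixed-size
 * vector column plus an HNSW index whose operator class matches the metric.
 */
fun vectorTableDdl(dim: Int, metric: Metric, baseName: String = "chunk_embeddings"): List<String> {
    require(dim in 1..PgVectorLimits.MAX_INDEXED_DIMS) {
        "vector($dim) cannot be indexed with HNSW/IVFFlat (max ${PgVectorLimits.MAX_INDEXED_DIMS})"
    }
    val table = "${baseName}_$dim"
    return listOf(
        "CREATE EXTENSION IF NOT EXISTS vector;",
        "CREATE TABLE IF NOT EXISTS $table (id BIGSERIAL PRIMARY KEY, embedding vector($dim) NOT NULL);",
        "CREATE INDEX IF NOT EXISTS idx_${table}_hnsw ON $table USING hnsw (embedding ${metric.opsClass});"
    )
}
```

The index operator class must match the distance operator used in queries (`<=>` for cosine, `<->` for L2, `<#>` for inner product), otherwise PostgreSQL will not use the index.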

22-24: Naming consistency confirmed – no change needed

The Index entity is mapped to the table named "indexes", and its primary key column is explicitly annotated as @Column(name = "indexes_id"). All related entities (ChunkEmbedding, DataFileIndex, and Model) correctly use @JoinColumn(name = "indexes_id"), matching the table and column naming. No adjustments are required here.

coderabbitai · 2025-08-21T08:02:08Z

build.gradle

+    implementation 'org.postgresql:postgresql'   // ← changed
    testRuntimeOnly("com.h2database:h2")



💡 Verification agent

🧩 Analysis chain

H2 in tests will break with the pgvector type; switch to PostgreSQL Testcontainers.

The JPA columnDefinition "vector" won’t exist in H2, so schema generation and queries involving the embedding column will fail under the current test profile. Replace H2 with Testcontainers-based PostgreSQL for tests and ensure the extension is enabled.

Apply this diff to dependencies:

- testRuntimeOnly("com.h2database:h2")
+ // Prefer PostgreSQL Testcontainers for pgvector
+ testImplementation 'org.testcontainers:postgresql'
+ testImplementation 'org.testcontainers:junit-jupiter'

And add a Flyway migration (outside this file) to enable the extension during tests and dev:

-- src/main/resources/db/migration/V1__enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;

Run this to find any H2-specific configs that need removal/migration in test resources:

🏁 Script executed:

#!/bin/bash
rg -nC2 -i 'h2|jdbc:h2|spring\.datasource\..*h2|driver.*h2' --glob 'src/**/resources/**'

Length of output: 794

Replace H2 with PostgreSQL Testcontainers for Tests

Your test setup still relies on H2—even though you’ve switched your main runtime to PostgreSQL—so any “vector” columns (pgvector) won’t exist in H2 and will cause schema-generation or query failures. I found H2 config in your test resources:

src/test/resources/application-test.yml
• url: jdbc:h2:mem:simplerag;MODE=PostgreSQL;DB_CLOSE_DELAY=-1;…
• driver-class-name: org.h2.Driver
• jpa.database-platform: org.hibernate.dialect.H2Dialect
• hibernate.ddl-auto: create-drop

Please update as follows:

• In build.gradle, replace the H2 test runtime with Testcontainers dependencies:

- testRuntimeOnly("com.h2database:h2")
+ // Use PostgreSQL Testcontainers for pgvector support
+ testImplementation "org.testcontainers:postgresql"
+ testImplementation "org.testcontainers:junit-jupiter"

• In src/test/resources/application-test.yml, remove all H2 settings and configure a Testcontainers-backed PostgreSQL datasource. For example:

-  datasource:
-    url: jdbc:h2:mem:simplerag;MODE=PostgreSQL;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
-    driver-class-name: org.h2.Driver
-    username: sa
-    password:
-  jpa:
-    database-platform: org.hibernate.dialect.H2Dialect
-    hibernate:
-      ddl-auto: create-drop
+  datasource:
+    # PostgreSQL Testcontainer will supply URL, driver, and credentials
+    driver-class-name: org.postgresql.Driver
+  jpa:
+    database-platform: org.hibernate.dialect.PostgreSQLDialect
+    hibernate:
+      ddl-auto: none

• Add a Flyway migration to enable pgvector in your test (and dev) databases:

-- src/main/resources/db/migration/V1__enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;

With these changes, your tests will spin up a real PostgreSQL instance instead of H2, ensuring your schema and queries remain compatible. Note that the stock postgres image does not include the pgvector extension, so the container needs an image that ships it (for example, pgvector/pgvector); otherwise CREATE EXTENSION vector will fail.

🤖 Prompt for AI Agents

In build.gradle around lines 47-49, the project still declares H2 as the test runtime which breaks pgvector-dependent schema and queries; replace the H2 testRuntimeOnly entry with the Testcontainers PostgreSQL and Testcontainers JUnit/Jupiter dependencies (and the Testcontainers BOM if used) so tests run against a real PostgreSQL container. Then update src/test/resources/application-test.yml to remove the H2 URL/driver/dialect/ddl-auto settings and instead configure datasource values that will be populated from Testcontainers (e.g., property placeholders or system/env properties set by your test harness) or use Spring’s Testcontainers support to supply the JDBC URL, username and password at test runtime. Finally add a Flyway migration file under src/main/resources/db/migration (e.g., V1__enable_pgvector.sql) that runs CREATE EXTENSION IF NOT EXISTS vector; so the pgvector extension exists in both dev and test databases.

coderabbitai · 2025-08-21T08:02:08Z

build.gradle

+    // pgvector
+    implementation("com.pgvector:pgvector:0.1.6")


🧹 Nitpick (assertive)

pgvector dependency added: ensure Hibernate type mapping and DB extension are in place.

Adding com.pgvector:pgvector is necessary, but by itself Hibernate won’t know how to bind PGvector unless you map it as OTHER (or provide a custom type/AttributeConverter). See my entity comment for @JdbcTypeCode(SqlTypes.OTHER). Also make sure a Flyway/Liquibase migration creates the vector extension.

I can add a minimal custom type or converter if you prefer that approach over @JdbcTypeCode.

🤖 Prompt for AI Agents

In build.gradle around lines 57-58, you added the pgvector dependency but Hibernate won’t bind PGvector automatically; update the entity mapping to use @JdbcTypeCode(SqlTypes.OTHER) or implement and register a custom Hibernate Type / JPA AttributeConverter for pgvector to ensure correct JDBC binding, and add a Flyway or Liquibase migration that runs CREATE EXTENSION IF NOT EXISTS vector; to ensure the DB has the pgvector extension before migrations/usage.

coderabbitai · 2025-08-21T08:02:08Z

src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt

@@ -1,8 +1,8 @@
 package simplerag.ragback.domain.index.entity

+import com.pgvector.PGvector


💡 Verification agent

🧩 Analysis chain

Ensure the pgvector extension exists in all environments.

Add a migration to run CREATE EXTENSION IF NOT EXISTS vector and verify permissions in non-dev DBs. Without it, application startup will fail on schema validation.

I can draft the Flyway migration and an index creation script tailored to your chosen dimension.

🏁 Script executed:

#!/bin/bash
# Confirm presence of extension-enabling migration
rg -n 'CREATE EXTENSION IF NOT EXISTS vector' src/main/resources/db/migration || echo "Missing migration for pgvector extension"

Length of output: 242

Add a Flyway migration to enable the pgvector extension

We’ve confirmed that there is no existing Flyway migration for CREATE EXTENSION IF NOT EXISTS vector (the db/migration folder is missing entirely), so application startup will fail schema validation in environments where the vector extension isn’t already present.

• Create a new SQL migration under src/main/resources/db/migration/, for example V2__enable_vector_extension.sql:

-- enable PostgreSQL vector extension for embedding support
CREATE EXTENSION IF NOT EXISTS vector;

• (Optional) Once the embedding column has a fixed dimension (IVFFlat cannot index a dimensionless vector column), immediately follow with an index creation, e.g.

-- index on embedding vector column for fast similarity searches
CREATE INDEX IF NOT EXISTS idx_chunk_embedding_vector
    ON chunks USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

• Ensure non-dev databases have sufficient privileges to run CREATE EXTENSION. Update your deployment docs or grant scripts accordingly.

Once the migration is in place, Flyway will apply it automatically before your application’s schema validation check.
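The optional index above hard-codes `lists = 100`. pgvector's README suggests starting from rows / 1000 for up to 1M rows and sqrt(rows) beyond that, setting `ivfflat.probes` to roughly sqrt(lists) at query time, and creating IVFFlat indexes only after the table already has data. A small sketch of those starting points; the function names are hypothetical, the table and index names are copied from the suggestion above, and the outputs are tuning starting points rather than guarantees:

```kotlin
import kotlin.math.ceil
import kotlin.math.max
import kotlin.math.sqrt

/** Starting value for IVFFlat `lists`: rows / 1000 up to 1M rows, sqrt(rows) above. */
fun suggestedIvfflatLists(rowCount: Long): Int {
    require(rowCount >= 0) { "rowCount must be non-negative" }
    val lists = if (rowCount <= 1_000_000L) rowCount / 1000 else sqrt(rowCount.toDouble()).toLong()
    return max(1L, lists).toInt() // an index needs at least one list
}

/** Starting value for `SET ivfflat.probes`: roughly sqrt(lists), at least 1. */
fun suggestedIvfflatProbes(lists: Int): Int = max(1, ceil(sqrt(lists.toDouble())).toInt())

/** Hypothetical DDL builder reusing the index/opclass names from the suggestion. */
fun ivfflatIndexDdl(table: String, rowCount: Long): String =
    "CREATE INDEX IF NOT EXISTS idx_chunk_embedding_vector ON $table " +
        "USING ivfflat (embedding vector_l2_ops) WITH (lists = ${suggestedIvfflatLists(rowCount)});"
```

Because `lists` depends on the row count, this index fits better in a post-load maintenance step than in the first Flyway migration, which runs against an empty table.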

🤖 Prompt for AI Agents

In src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt around line 3, there is an import for PGvector but no Flyway migration to enable the PostgreSQL pgvector extension which causes startup schema validation failures in environments missing the extension; add a SQL migration file under src/main/resources/db/migration/, e.g. V2__enable_vector_extension.sql, containing a CREATE EXTENSION IF NOT EXISTS vector; statement (optionally followed by a CREATE INDEX IF NOT EXISTS ... for the embedding column using ivfflat and vector_l2_ops), and update deployment/DB provisioning docs or grant scripts so non-dev DBs have the privileges to run CREATE EXTENSION.

🚀 chore: Update entity and reduce the number of models

f0fd20e

catturtle123 self-assigned this Aug 19, 2025

catturtle123 added the chore other work label Aug 19, 2025

catturtle123 linked an issue Aug 19, 2025 that may be closed by this pull request

🚀 [Chore] Fix pgvector entity #13

Closed

10 tasks

coderabbitai bot reviewed Aug 19, 2025


🚀 Chore: pgvector library apply

a307fcd

coderabbitai bot reviewed Aug 21, 2025


catturtle123 merged commit 8148871 into develop Aug 21, 2025
1 check passed

catturtle123 deleted the chore/#13-pgvector-fix branch August 21, 2025 08:25

coderabbitai bot mentioned this pull request Aug 22, 2025

✨ Feature: indexing #18

Merged

4 tasks
