🚀 Chore: Modify entity and reduce the number of models #14
Walkthrough
This pull request modifies the project's PostgreSQL and pgvector integration by updating dependency configurations, refactoring the ChunkEmbedding entity to use PGvector directly, and pruning the EmbeddingModel enum.
Sequence Diagram
sequenceDiagram
participant Entity as ChunkEmbedding
participant Database as PostgreSQL
participant PGVector as PGvector
Entity->>PGVector: Direct embedding storage
PGVector->>Database: Store vector type
Database-->>PGVector: Native vector support
Estimated Code Review Effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
Out-of-Scope Changes
There are no out-of-scope changes detected in this pull request.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)
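35-43: Add dimension validation on load to detect data inconsistencies early
Because JPA injects values directly into fields, the dimension check in updateEmbedding only runs on the setter path. If a row with _embedding.size != embeddingDim is loaded from the DB into the entity, the mismatch is discovered late. Early validation with a @PostLoad hook is recommended.
Example (add anywhere in the file):
@PostLoad
private fun validateEmbeddingOnLoad() {
    require(_embedding.size == embeddingDim) {
        "Embedding dimension mismatch on load: expected=$embeddingDim, got=${_embedding.size}"
    }
}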
📒 Files selected for processing (5)
- build.gradle (1 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorConverter.kt (1 hunks)
- src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt (0 hunks)
💤 Files with no reviewable changes (1)
- src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorStringConverter.kt
🔇 Additional comments (5)
build.gradle (1)
47-47: Using PGobject requires a compile-time dependency; the scope change is justified
FloatArrayToPgVectorConverter references org.postgresql.util.PGobject directly, so promoting the postgresql driver to implementation is a reasonable decision. Because the Spring Boot BOM manages the version, no separate version pin is needed.
src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorConverter.kt (2)
9-13: Input validation is appropriate
The logic that blocks null/empty input and NaN/Infinity values is clear. The runtime exception messages are also sufficiently descriptive.
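The entity-to-database direction praised above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual converter: the function name toVectorLiteral is hypothetical, and the real code wraps the resulting string in a PGobject with type "vector", which is omitted here so the sketch has no external dependencies.

```kotlin
// Hypothetical helper (not from the PR): validates an embedding and renders it
// as a pgvector text literal such as "[0.5,-1.0,2.0]".
fun toVectorLiteral(embedding: FloatArray?): String {
    // Reject null and empty input before building the literal.
    requireNotNull(embedding) { "Embedding (FloatArray) must not be null" }
    require(embedding.isNotEmpty()) { "Embedding must not be empty" }

    // Reject NaN and +/-Infinity, reporting the offending position.
    embedding.forEachIndexed { i, v ->
        require(v.isFinite()) { "Embedding contains NaN/Infinity at index $i" }
    }

    // Single pass over the array; no intermediate list is created.
    return embedding.joinToString(separator = ",", prefix = "[", postfix = "]")
}
```

Calling toVectorLiteral(floatArrayOf(0.5f, -1.0f, 2.0f)) returns "[0.5,-1.0,2.0]", while an array containing Float.NaN fails with an IllegalArgumentException.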
14-21: The PGobject mapping approach is appropriate
Passing a PGobject with type="vector" to the pgvector column works reliably in Hibernate together with SqlTypes.OTHER. The string building also keeps unnecessary object creation to a minimum.
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (1)
18-21: Leaving the vector dimension unspecified (vector) and switching the test environment to Postgres are required
To support EmbeddingModels with different dimensions (768/1024/1536, etc.), the definition currently fixed at vector(3072) must be changed to vector. Because the H2-based test environment fails with DDL errors, it also needs to switch to PostgreSQL Testcontainers.
Locations to modify:
- src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt – line 20
  - @Column(name = "embedding", nullable = false, columnDefinition = "vector(3072)")
  + @Column(name = "embedding", nullable = false, columnDefinition = "vector")
- build.gradle
  - Remove testRuntimeOnly("com.h2database:h2")
  - Add Testcontainers dependencies such as org.testcontainers:postgresql
- src/test/resources/application-test.yml
  - Remove the H2 settings and replace them with a PostgreSQL Testcontainers URI or properties
⛔ Skipped due to learnings
Learnt from: catturtle123
PR: Simple-RAG/BACK#6
File: src/main/kotlin/simplerag/ragback/domain/index/entity/Index.kt:16-22
Timestamp: 2025-08-17T13:12:02.200Z
Learning: catturtle123 decided to implement the overlapSize ≤ chunkingSize validation for the Index entity in the service layer rather than at the entity level, and plans to add this validation logic in the PR that covers storing chunking embeddings, so it should be raised in that PR (Simple-RAG/BACK project)
src/main/kotlin/simplerag/ragback/domain/index/entity/enums/EmbeddingModel.kt (1)
15-17: Adding KO_SBERT_V1 is appropriate
It keeps the Korean models to a minimal set while securing a 768-dimension lineup. It does not appear to affect usage elsewhere in the code.
// BGE
BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
BGE_LARGE_EN(1024, "BAAI/bge-large-en-v1.5"),
BGE_M3(1024, "BAAI/bge-m3"),
E5_SMALL(384, "intfloat/e5-small-v2"),
E5_BASE(768, "intfloat/e5-base-v2"),
E5_LARGE(1024, "intfloat/e5-large-v2"),

// Old word vectors
FASTTEXT_KO(300, "fasttext-ko-300d");
// E5
E5_BASE(768, "intfloat/e5-base-v2");
💡 Verification agent
🧩 Analysis chain
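Check for missing references and compile errors caused by the model reduction
Several constants were removed. If legacy references remain, they can cause compilation failures or missing runtime branches. An automated check script is proposed below.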
🏁 Script executed:
#!/bin/bash
set -euo pipefail
REMOVED=(
TEXT_EMBEDDING_3_LARGE
MULTI_QA_MP_NET_BASE_DOT_V1
PARAPHRASE_MULTILINGUAL_MINILM_L12_V2
KOR_SROBERTA
BM_KO_SMALL
INSTRUCTOR_BASE
INSTRUCTOR_XL
MISTRAL_EMBED
BGE_SMALL_EN
BGE_LARGE_EN
E5_SMALL
E5_LARGE
FASTTEXT_KO
)
echo "▶ Searching for references to removed constants"
for sym in "${REMOVED[@]}"; do
  rg -n -C2 -P "\bEmbeddingModel\.$sym\b" || true
done
echo
echo "▶ Also checking for direct use of the model ID strings (optional)"
for sym in "${REMOVED[@]}"; do
  rg -n -C2 -F "$sym" || true
done
Length of output: 14631
Test code: references to the removed model constant (TEXT_EMBEDDING_3_LARGE) must be replaced
EmbeddingModel.TEXT_EMBEDDING_3_LARGE was deleted, so the tests below fail to compile. Replace it with an appropriate remaining constant (e.g., EmbeddingModel.E5_BASE or BGE_BASE_EN) and verify that the test logic still behaves as intended.
Files to modify:
- src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
  – IndexCreateRequest(..., EmbeddingModel.TEXT_EMBEDDING_3_LARGE, ...) references (around lines 37, 54, 76, 85, 109, 139, 161, 196, 221, 247, 271, etc.)
Example change:
- IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.TEXT_EMBEDDING_3_LARGE, true)
+ IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.E5_BASE, true)
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// BGE
BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
BGE_LARGE_EN(1024, "BAAI/bge-large-en-v1.5"),
BGE_M3(1024, "BAAI/bge-m3"),
E5_SMALL(384, "intfloat/e5-small-v2"),
E5_BASE(768, "intfloat/e5-base-v2"),
E5_LARGE(1024, "intfloat/e5-large-v2"),
// Old word vectors
FASTTEXT_KO(300, "fasttext-ko-300d");
// E5
E5_BASE(768, "intfloat/e5-base-v2");
++ b/src/test/kotlin/simplerag/ragback/domain/index/service/IndexServiceTest.kt
@@ -37,7 +37,7 @@ class IndexServiceTest {
    // …
    IndexCreateRequest("test", 1, 0, SimilarityMetric.COSINE, 1, EmbeddingModel.E5_BASE, true)
    // …
}
override fun convertToEntityAttribute(dbData: PGobject?): FloatArray {
    requireNotNull(dbData) { "db vector is null" }
    val body = dbData.value?.trim()?.removePrefix("[")?.removeSuffix("]") ?: error("empty vector")
    return body.split(',').map { it.trim().toFloat() }.toFloatArray()
}
🛠️ Refactor suggestion
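Strengthen handling of empty vectors and parse errors (including re-validation of NaN/Infinity)
The current implementation has two problems:
- When it encounters an empty vector string such as [], toFloat() throws a NumberFormatException after split(',').
- On parse failure the exception message is vague, and NaN/Infinity is not re-validated after parsing.
Please add defensive parsing and clear exceptions as shown below.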
- override fun convertToEntityAttribute(dbData: PGobject?): FloatArray {
-     requireNotNull(dbData) { "db vector is null" }
-     val body = dbData.value?.trim()?.removePrefix("[")?.removeSuffix("]") ?: error("empty vector")
-     return body.split(',').map { it.trim().toFloat() }.toFloatArray()
- }
+ override fun convertToEntityAttribute(dbData: PGobject?): FloatArray {
+     requireNotNull(dbData) { "db vector is null" }
+     val raw = dbData.value ?: throw IllegalArgumentException("db vector has null value")
+     val body = raw.trim().removePrefix("[").removeSuffix("]").trim()
+     if (body.isEmpty()) {
+         // Defensive: avoid NumberFormatException on "[]"
+         return floatArrayOf()
+     }
+     val arr = try {
+         body.split(',')
+             .map { it.trim().toFloat() }
+             .toFloatArray()
+     } catch (e: NumberFormatException) {
+         throw IllegalArgumentException("Invalid vector literal for pgvector: '$raw'", e)
+     }
+     require(arr.all { it.isFinite() }) { "NaN/Infinity not allowed (db → entity)" }
+     return arr
+ }
🤖 Prompt for AI Agents
In
src/main/kotlin/simplerag/ragback/global/util/FloatArrayToPgVectorConverter.kt
around lines 24 to 28, replace the current naive parsing with defensive parsing:
treat "[]" or an empty trimmed body as an empty FloatArray; split only when
non-empty and skip any extra empty tokens; convert each token to Float in a
try/catch that throws a clear IllegalArgumentException indicating the offending
token and its index on parse failure; after conversion, validate each float to
ensure it is finite (not NaN or Infinity) and throw an IllegalArgumentException
indicating the index and invalid value if found; preserve existing null checks
on dbData and provide explicit messages for null/empty cases.
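The prompt above asks for errors that name the offending token and its index, which the suggested diff does not do. A minimal sketch of that variant follows, assuming the literal has already been extracted as a String; the function name parseVectorLiteral is hypothetical and the PGobject unwrapping is left out. Unlike the prompt, this sketch treats an empty token (for example from a trailing comma) as an error instead of skipping it, so that reported indices stay aligned with the input.

```kotlin
// Hypothetical helper (not from the PR): parses a pgvector text literal such as
// "[1.0, 2.5,3]" into a FloatArray, reporting the index of any bad element.
fun parseVectorLiteral(raw: String?): FloatArray {
    requireNotNull(raw) { "db vector has null value" }

    // Strip the surrounding brackets; "[]" and blank input yield an empty array.
    val body = raw.trim().removePrefix("[").removeSuffix("]").trim()
    if (body.isEmpty()) return floatArrayOf()

    val tokens = body.split(',')
    return FloatArray(tokens.size) { i ->
        val token = tokens[i].trim()
        // toFloatOrNull avoids a bare NumberFormatException with no context.
        val value = token.toFloatOrNull()
            ?: throw IllegalArgumentException("Invalid vector element '$token' at index $i")
        // Re-validate after parsing: NaN and Infinity are rejected on the way in.
        require(value.isFinite()) { "NaN/Infinity not allowed at index $i: '$token'" }
        value
    }
}
```

With this shape, parseVectorLiteral("[1,x]") fails with an IllegalArgumentException that points at index 1, and parseVectorLiteral("[]") returns an empty array.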
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (4)
build.gradle (2)
34-39: Duplicate dependency: jackson-module-kotlin is declared twice.
Remove one of the duplicates to avoid noise and potential resolution ambiguity.
- implementation("com.fasterxml.jackson.module:jackson-module-kotlin")
  ...
- implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
+ implementation 'com.fasterxml.jackson.module:jackson-module-kotlin'
73-77: Check whether the allOpen block is still needed alongside the kotlin-jpa plugin.
The kotlin-jpa plugin wraps the no-arg compiler plugin and only generates no-argument constructors; it does not open classes. The manual block below is therefore what makes these JPA classes open (for example, for lazy-loading proxies), so remove it only if the same behavior is configured elsewhere.
allOpen {
    annotation 'jakarta.persistence.Entity'
    annotation 'jakarta.persistence.MappedSuperclass'
    annotation 'jakarta.persistence.Embeddable'
}
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (2)
12-15: Consider lazy-loading LOB content.
Large content fields often don’t need to be fetched with every embedding read. Mark as LAZY to reduce I/O if your access patterns allow.
- @Lob
+ @Lob
+ @Basic(fetch = FetchType.LAZY)
  val content: String,
Note: Validate that your JPA provider and usage patterns truly defer LOB loading; some providers may still fetch eagerly depending on dialect/settings.
19-21: Action required: enforce the embeddingDim invariant at the entity boundary
A quick grep (rg -nP 'new\s+ChunkEmbedding\(|ChunkEmbedding\(' -g 'src/**') found no direct constructor calls outside of the class declaration itself. This means instantiation (and persistence) happens solely through JPA, so service-layer checks are your only current safeguard, and they are easy to bypass in the future. To prevent bad data ingestion, add a self-defensive lifecycle validator on the entity.
Locations to update:
- File: src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
- Add the following method in the class body (outside existing property/constructor declarations):
  class ChunkEmbedding(
      @Column(name = "embedding", columnDefinition = "vector", nullable = false)
      val embedding: PGvector,
      @Column(name = "embedding_dim", nullable = false)
      val embeddingDim: Int,
      // … other fields …
  ) {
+     @PrePersist
+     @PreUpdate
+     fun validateEmbedding() {
+         // PGvector exposes its values through toArray()
+         val vecSize = embedding.toArray()?.size ?: -1
+         require(vecSize == embeddingDim) {
+             "Invalid embeddingDim: expected $embeddingDim, but got PGvector of length $vecSize"
+         }
+     }
  }
This ensures that every persist or update operation fails immediately if embeddingDim doesn’t match the actual vector length, guarding your database against mismatched data.
📒 Files selected for processing (2)
- build.gradle (2 hunks)
- src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (3 hunks)
🔇 Additional comments (5)
build.gradle (2)
47-47: Switching the PostgreSQL driver to implementation is appropriate here.
Because the codebase now references PGobject-derived types (via PGvector), the driver must be on the compile classpath. Keeping it as runtimeOnly would risk compilation/linkage issues.
2-6: Kotlin plugin versions align with Spring Boot BOM
- Verified that Spring Boot 3.5.4’s dependency-management BOM declares <kotlin.version>1.9.25</kotlin.version>, matching all Kotlin plugin versions in build.gradle.
- No other Kotlin versions (1.x or 2.x) were found elsewhere in the codebase.
- Future-proofing: if you upgrade to Kotlin 2.x, remember to bump every Kotlin plugin declaration in lockstep.
src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt (3)
16-18: Annotate PGvector field with JdbcTypeCode and enforce NOT NULL
We ran a search for any existing converters or @Convert annotations targeting PGvector and found none. It’s safe to add the explicit JDBC type mapping and non-null constraint.
• File: src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
• Lines: 16–18
Suggested change:
  import com.pgvector.PGvector
+ import org.hibernate.annotations.JdbcTypeCode
+ import org.hibernate.type.SqlTypes

- @Column(name = "embedding", columnDefinition = "vector")
- var embedding: PGvector,
+ @JdbcTypeCode(SqlTypes.OTHER)
+ @Column(name = "embedding", columnDefinition = "vector", nullable = false)
+ var embedding: PGvector,
This ensures Hibernate uses PreparedStatement#setObject for the vector type and enforces a NOT NULL constraint at the database level.
9-10: Action required: confirm pgvector dimension and index strategy
- The ChunkEmbedding entity at src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt declares @Column(name = "embedding", columnDefinition = "vector"), which omits a fixed dimension.
- pgvector’s docs require a fixed size (e.g., vector(768) or vector(1536)) to support IVFFlat/HNSW indexes.
- No migrations were found under src/main/resources/db/migration (or elsewhere) that:
  - Define a fixed-dimension vector column
  - Create an IVFFlat or HNSW index
  - Install the vector extension explicitly
Please verify your database schema or migration scripts to ensure you:
- Set a fixed vector dimension per table (for example, columnDefinition = "vector(768)")
- Create the appropriate similarity index (IVFFlat/HNSW) once the dimension is fixed
If you plan to run large-scale similarity searches, standardizing on a single embedding size (or splitting tables by model/dimension) is crucial for efficient indexing.
22-24: Naming consistency confirmed – no change needed
The Index entity is mapped to the table named "indexes", and its primary key column is explicitly annotated as @Column(name = "indexes_id"). All related entities (ChunkEmbedding, DataFileIndex, and Model) correctly use @JoinColumn(name = "indexes_id"), matching the table and column naming. No adjustments are required here.
implementation 'org.postgresql:postgresql' // ← changed
testRuntimeOnly("com.h2database:h2")
💡 Verification agent
🧩 Analysis chain
H2 in tests will break with the pgvector type; switch to PostgreSQL Testcontainers.
The JPA columnDefinition "vector" won’t exist in H2, so schema generation and queries involving the embedding column will fail under the current test profile. Replace H2 with Testcontainers-based PostgreSQL for tests and ensure the extension is enabled.
Apply this diff to dependencies:
- testRuntimeOnly("com.h2database:h2")
+ // Prefer PostgreSQL Testcontainers for pgvector
+ testImplementation 'org.testcontainers:postgresql'
+ testImplementation 'org.testcontainers:junit-jupiter'
And add a Flyway migration (outside this file) to enable the extension during tests and dev:
-- src/main/resources/db/migration/V1__enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;
Run this to find any H2-specific configs that need removal/migration in test resources:
🏁 Script executed:
#!/bin/bash
rg -nC2 -i 'h2|jdbc:h2|spring\.datasource\..*h2|driver.*h2' --glob 'src/**/resources/**'
Length of output: 794
Replace H2 with PostgreSQL Testcontainers for Tests
Your test setup still relies on H2—even though you’ve switched your main runtime to PostgreSQL—so any “vector” columns (pgvector) won’t exist in H2 and will cause schema-generation or query failures. I found H2 config in your test resources:
- src/test/resources/application-test.yml
  • url: jdbc:h2:mem:simplerag;MODE=PostgreSQL;DB_CLOSE_DELAY=-1;…
  • driver-class-name: org.h2.Driver
  • jpa.database-platform: org.hibernate.dialect.H2Dialect
  • hibernate.ddl-auto: create-drop
Please update as follows:
• In build.gradle, replace the H2 test runtime with Testcontainers dependencies:
- testRuntimeOnly("com.h2database:h2")
+ // Use PostgreSQL Testcontainers for pgvector support
+ testImplementation "org.testcontainers:postgresql"
+ testImplementation "org.testcontainers:junit-jupiter"
• In src/test/resources/application-test.yml, remove all H2 settings and configure a Testcontainers-backed PostgreSQL datasource. For example:
- datasource:
-   url: jdbc:h2:mem:simplerag;MODE=PostgreSQL;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
-   driver-class-name: org.h2.Driver
-   username: sa
-   password:
- jpa:
-   database-platform: org.hibernate.dialect.H2Dialect
-   hibernate:
-     ddl-auto: create-drop
+ datasource:
+   # PostgreSQL Testcontainer will supply URL, driver, and credentials
+   driver-class-name: org.postgresql.Driver
+ jpa:
+   database-platform: org.hibernate.dialect.PostgreSQLDialect
+   hibernate:
+     ddl-auto: none
• Add a Flyway migration to enable pgvector in your test (and dev) databases:
-- src/main/resources/db/migration/V1__enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;
With these changes, your tests will spin up a real PostgreSQL instance (with pgvector support) instead of H2, ensuring your schema and queries remain compatible.
🤖 Prompt for AI Agents
In build.gradle around lines 47-49, the project still declares H2 as the test
runtime which breaks pgvector-dependent schema and queries; replace the H2
testRuntimeOnly entry with the Testcontainers PostgreSQL and Testcontainers
JUnit/Jupiter dependencies (and the Testcontainers BOM if used) so tests run
against a real PostgreSQL container. Then update
src/test/resources/application-test.yml to remove the H2
URL/driver/dialect/ddl-auto settings and instead configure datasource values
that will be populated from Testcontainers (e.g., property placeholders or
system/env properties set by your test harness) or use Spring’s Testcontainers
support to supply the JDBC URL, username and password at test runtime. Finally
add a Flyway migration file under src/main/resources/db/migration (e.g.,
V1__enable_pgvector.sql) that runs CREATE EXTENSION IF NOT EXISTS vector; so the
pgvector extension exists in both dev and test databases.
// pgvector
implementation("com.pgvector:pgvector:0.1.6")
🧹 Nitpick (assertive)
pgvector dependency added: ensure Hibernate type mapping and DB extension are in place.
Adding com.pgvector:pgvector is necessary, but by itself Hibernate won’t know how to bind PGvector unless you map it as OTHER (or provide a custom type/AttributeConverter). See my entity comment for @JdbcTypeCode(SqlTypes.OTHER). Also make sure a Flyway/Liquibase migration creates the vector extension.
I can add a minimal custom type or converter if you prefer that approach over @JdbcTypeCode.
🤖 Prompt for AI Agents
In build.gradle around lines 57-58, you added the pgvector dependency but
Hibernate won’t bind PGvector automatically; update the entity mapping to use
@JdbcTypeCode(SqlTypes.OTHER) or implement and register a custom Hibernate Type
/ JPA AttributeConverter for pgvector to ensure correct JDBC binding, and add a
Flyway or Liquibase migration that runs CREATE EXTENSION IF NOT EXISTS vector;
to ensure the DB has the pgvector extension before migrations/usage.
@@ -1,8 +1,8 @@
package simplerag.ragback.domain.index.entity

import com.pgvector.PGvector
💡 Verification agent
🧩 Analysis chain
Ensure the pgvector extension exists in all environments.
Add a migration to run CREATE EXTENSION IF NOT EXISTS vector and verify permissions in non-dev DBs. Without it, application startup will fail on schema validation.
I can draft the Flyway migration and an index creation script tailored to your chosen dimension.
🏁 Script executed:
#!/bin/bash
# Confirm presence of extension-enabling migration
rg -n 'CREATE EXTENSION IF NOT EXISTS vector' src/main/resources/db/migration || echo "Missing migration for pgvector extension"
Length of output: 242
Add a Flyway migration to enable the pgvector extension
We’ve confirmed that there is no existing Flyway migration for CREATE EXTENSION IF NOT EXISTS vector (the db/migration folder is missing entirely), so application startup will fail schema validation in environments where the vector extension isn’t already present.
• Create a new SQL migration under src/main/resources/db/migration/, for example V2__enable_vector_extension.sql:
-- enable PostgreSQL vector extension for embedding support
CREATE EXTENSION IF NOT EXISTS vector;
• (Optional) Immediately follow with an index creation, e.g.
-- index on embedding vector column for fast similarity searches
CREATE INDEX IF NOT EXISTS idx_chunk_embedding_vector
ON chunks USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
• Ensure non-dev databases have sufficient privileges to run CREATE EXTENSION. Update your deployment docs or grant scripts accordingly.
Once the migration is in place, Flyway will apply it automatically before your application’s schema validation check.
🤖 Prompt for AI Agents
In src/main/kotlin/simplerag/ragback/domain/index/entity/ChunkEmbedding.kt
around line 3, there is an import for PGvector but no Flyway migration to enable
the PostgreSQL pgvector extension which causes startup schema validation
failures in environments missing the extension; add a SQL migration file under
src/main/resources/db/migration/, e.g. V2__enable_vector_extension.sql,
containing a CREATE EXTENSION IF NOT EXISTS vector; statement (optionally
followed by a CREATE INDEX IF NOT EXISTS ... for the embedding column using
ivfflat and vector_l2_ops), and update deployment/DB provisioning docs or grant
scripts so non-dev DBs have the privileges to run CREATE EXTENSION.
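📌 PR Overview
Modify entity and reduce the number of models
🔍 Related Issues
✨ Changes
🚀 chore: modify entity and reduce the number of models
📸 Screenshots / Verification Results (optional)
Attach images or videos if needed.
✅ Checklist
🗒️ Additional Notes
Write any other notes here.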
Summary by CodeRabbit
New Features
Refactor
Chores