🚀 Chore: Modify entity and reduce the number of models #14
```diff
@@ -44,7 +44,7 @@ dependencies {
     testImplementation 'org.jetbrains.kotlin:kotlin-test-junit5'
     testImplementation("org.mockito.kotlin:mockito-kotlin:5.3.1")
     testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
-    runtimeOnly 'org.postgresql:postgresql'
+    implementation 'org.postgresql:postgresql' // ← changed
     testRuntimeOnly("com.h2database:h2")
 
     // swagger
@@ -53,6 +53,9 @@ dependencies {
     // s3
     implementation(platform("software.amazon.awssdk:bom:2.25.70"))
     implementation("software.amazon.awssdk:s3")
+
+    // pgvector
+    implementation("com.pgvector:pgvector:0.1.6")
```
Comment on lines +57 to +58

🧹 Nitpick (assertive)

pgvector dependency added: ensure Hibernate type mapping and DB extension are in place.

Adding `com.pgvector:pgvector` is necessary, but by itself Hibernate won't know how to bind `PGvector` unless you map it as `OTHER` (or provide a custom type/`AttributeConverter`). See my entity comment for `@JdbcTypeCode(SqlTypes.OTHER)`. Also make sure a Flyway/Liquibase migration creates the `vector` extension.

I can add a minimal custom type or converter if you prefer that approach over `@JdbcTypeCode`.
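For context on what either mapping ultimately sends to the database: pgvector's text representation of a vector is a bracketed, comma-separated list such as `[1,2,3]`, and `com.pgvector.PGvector` (a `PGobject` subclass) exposes its value to the JDBC driver in that shape. The sketch below shows, with the standard library only, the round trip a custom `AttributeConverter` would have to perform. The function names are hypothetical, and this is not the PR's removed `FloatArrayToPgVectorStringConverter`.

```kotlin
// Hypothetical helpers illustrating pgvector's text format, e.g. "[1.0,2.0,3.0]".
// A real AttributeConverter<FloatArray, String> could delegate to functions like these.

/** Serializes a float array into the bracketed literal pgvector accepts as input. */
fun toVectorLiteral(values: FloatArray): String {
    require(values.isNotEmpty()) { "pgvector requires at least one dimension" }
    return values.joinToString(separator = ",", prefix = "[", postfix = "]")
}

/** Parses a literal such as "[1.0,2.0,3.0]" (whitespace tolerated) back into a FloatArray. */
fun parseVectorLiteral(literal: String): FloatArray {
    val trimmed = literal.trim()
    require(trimmed.startsWith("[") && trimmed.endsWith("]")) {
        "Not a vector literal: $literal"
    }
    val body = trimmed.substring(1, trimmed.length - 1)
    require(body.isNotBlank()) { "pgvector requires at least one dimension" }
    // Split on commas and parse each component as a Float.
    return body.split(",").map { it.trim().toFloat() }.toFloatArray()
}
```

Either way, the `@JdbcTypeCode(SqlTypes.OTHER)` route avoids hand-written parsing because `PGvector` already does this conversion itself.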
```diff
 }
 
 dependencyManagement {
```

```diff
@@ -1,8 +1,8 @@
 package simplerag.ragback.domain.index.entity
 
+import com.pgvector.PGvector
```
💡 Verification agent

🧩 Analysis chain

Ensure the pgvector extension exists in all environments. Add a migration to run `CREATE EXTENSION IF NOT EXISTS vector` and verify permissions in non-dev DBs. Without it, application startup will fail on schema validation. I can draft the Flyway migration and an index creation script tailored to your chosen dimension.

🏁 Script executed:

```bash
#!/bin/bash
# Confirm presence of extension-enabling migration
rg -n 'CREATE EXTENSION IF NOT EXISTS vector' src/main/resources/db/migration || echo "Missing migration for pgvector extension"
```

Length of output: 242

Add a Flyway migration to enable the pgvector extension

We've confirmed that there is no existing Flyway migration for …

• Create a new SQL migration under …:

```sql
-- enable PostgreSQL vector extension for embedding support
CREATE EXTENSION IF NOT EXISTS vector;
```

• (Optional) Immediately follow with an index creation, e.g.:

```sql
-- index on embedding vector column for fast similarity searches
CREATE INDEX IF NOT EXISTS idx_chunk_embedding_vector
  ON chunks USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
```

• Ensure non-dev databases have sufficient privileges to run …

Once the migration is in place, Flyway will apply it automatically before your application's schema validation check.
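The optional index above uses `ivfflat` with `vector_l2_ops`, so it approximates nearest-neighbour search under Euclidean (L2) distance, the same ordering as pgvector's `<->` operator. The pgvector README suggests starting with `lists = rows / 1000` for up to 1M rows and `sqrt(rows)` beyond that, which makes `lists = 100` a fit for roughly 100k rows. A minimal sketch of both ideas follows; the function names are hypothetical. One caveat worth checking: pgvector can only build an index on a column declared with a fixed dimension (e.g. `vector(768)`), while the entity maps `columnDefinition = "vector"` without one.

```kotlin
import kotlin.math.max
import kotlin.math.roundToInt
import kotlin.math.sqrt

/**
 * Starting value for ivfflat's `lists` parameter, following the pgvector README
 * heuristic: rows / 1000 up to one million rows, sqrt(rows) above that.
 * Clamped to at least 1 so tiny tables still get a valid value.
 */
fun suggestedIvfflatLists(rowCount: Long): Int {
    require(rowCount >= 0) { "rowCount must be non-negative" }
    val lists = if (rowCount <= 1_000_000L) {
        (rowCount / 1000L).toInt()
    } else {
        sqrt(rowCount.toDouble()).roundToInt()
    }
    return max(lists, 1)
}

/** Euclidean (L2) distance, the metric that vector_l2_ops and the `<->` operator order by. */
fun l2Distance(a: FloatArray, b: FloatArray): Double {
    require(a.size == b.size) { "Dimension mismatch: ${a.size} vs ${b.size}" }
    var sum = 0.0
    for (i in a.indices) {
        val d = (a[i] - b[i]).toDouble()
        sum += d * d
    }
    return sqrt(sum)
}
```

The heuristic is only a starting point; recall versus speed is then tuned at query time with `ivfflat.probes`.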
```diff
 import jakarta.persistence.*
 import simplerag.ragback.global.entity.BaseEntity
-import simplerag.ragback.global.util.FloatArrayToPgVectorStringConverter
 
 // The embedding size must be validated in the service layer
 @Entity
@@ -13,9 +13,8 @@ class ChunkEmbedding(
     @Lob
     val content: String,
 
-    @Convert(converter = FloatArrayToPgVectorStringConverter::class)
-    @Column(name = "embedding", nullable = false)
-    private var _embedding: FloatArray,
+    @Column(name = "embedding", columnDefinition = "vector")
+    var embedding: PGvector,
 
     @Column(name = "embedding_dim", nullable = false)
     val embeddingDim: Int,
@@ -27,16 +26,4 @@ class ChunkEmbedding(
     @Id @GeneratedValue(strategy = GenerationType.IDENTITY)
     @Column(name = "chunk_embeddings_id")
     val id: Long? = null,
-) : BaseEntity() {
-
-    @get:Transient
-    val embedding: FloatArray get() = _embedding.copyOf()
-
-    fun updateEmbedding(newVec: FloatArray) {
-        require(newVec.size == embeddingDim) {
-            "Embedding dimension mismatch: expected=$embeddingDim, got=${newVec.size}"
-        }
-        _embedding = newVec.copyOf()
-    }
-
-}
+) : BaseEntity()
```
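With `updateEmbedding` and its `require` check removed, nothing in the entity enforces that the vector's length matches `embeddingDim`; the comment above `@Entity` assigns that check to the service layer. A minimal sketch of such a guard follows, assuming the service still holds the raw `FloatArray` before wrapping it in `PGvector`. `EmbeddingDimensionGuard` is a hypothetical name, and the error message simply reuses the one from the removed entity method.

```kotlin
/** Hypothetical service-layer guard replacing the entity's removed require() check. */
object EmbeddingDimensionGuard {
    /**
     * Returns a defensive copy of [vector] if its length equals [expectedDim];
     * otherwise throws IllegalArgumentException with the entity's former message.
     */
    fun validated(vector: FloatArray, expectedDim: Int): FloatArray {
        require(expectedDim > 0) { "expectedDim must be positive, got $expectedDim" }
        require(vector.size == expectedDim) {
            "Embedding dimension mismatch: expected=$expectedDim, got=${vector.size}"
        }
        // Copy so later mutation of the caller's array cannot change the persisted value.
        return vector.copyOf()
    }
}
```

In a service this might be used as `entity.embedding = PGvector(EmbeddingDimensionGuard.validated(vec, entity.embeddingDim))`, where `PGvector(FloatArray)` is pgvector-java's array constructor and `entity`/`vec` are placeholders.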
```diff
@@ -6,36 +6,21 @@ enum class EmbeddingModel(
 ) {
     // OpenAI
     TEXT_EMBEDDING_3_SMALL(1536, "text-embedding-3-small"),
-    TEXT_EMBEDDING_3_LARGE(3072, "text-embedding-3-large"),
 
     // SBERT / HuggingFace
     ALL_MINILM_L6_V2(384, "sentence-transformers/all-MiniLM-L6-v2"),
     ALL_MP_NET_BASE_V2(768, "sentence-transformers/all-mpnet-base-v2"),
-    MULTI_QA_MP_NET_BASE_DOT_V1(768, "sentence-transformers/multi-qa-mpnet-base-dot-v1"),
     DISTILUSE_BASE_MULTILINGUAL_CASED_V2(512, "sentence-transformers/distiluse-base-multilingual-cased-v2"),
-    PARAPHRASE_MULTILINGUAL_MINILM_L12_V2(384, "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"),
-    KO_SBERT_V1(768, "jhgan/ko-sbert-v1"),
-    KOR_SROBERTA(768, "jhgan/ko-sroberta-medium-nli"),
 
-    // Korean specific
-    BM_KO_SMALL(512, "bespin-global/klue-sroberta-base-continue-learning-by-mnr"),
-
-    // Instructor / Mistral
-    INSTRUCTOR_BASE(768, "hkunlp/instructor-base"),
-    INSTRUCTOR_XL(1024, "hkunlp/instructor-xl"),
-    MISTRAL_EMBED(1024, "mistral-embed"),
+    // Korean
+    KO_SBERT_V1(768, "jhgan/ko-sbert-v1"),
 
-    // BGE / E5 etc
-    BGE_SMALL_EN(384, "BAAI/bge-small-en-v1.5"),
+    // BGE
     BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
-    BGE_LARGE_EN(1024, "BAAI/bge-large-en-v1.5"),
     BGE_M3(1024, "BAAI/bge-m3"),
-    E5_SMALL(384, "intfloat/e5-small-v2"),
-    E5_BASE(768, "intfloat/e5-base-v2"),
-    E5_LARGE(1024, "intfloat/e5-large-v2"),
 
-    // Old word vectors
-    FASTTEXT_KO(300, "fasttext-ko-300d");
+    // E5
+    E5_BASE(768, "intfloat/e5-base-v2");
```
Comment on lines +18 to +23

💡 Verification agent

🧩 Analysis chain

Check for dangling references and compile errors caused by the model reduction

Several constants were removed. If legacy references remain, they can cause compilation failures or missed branches at runtime. Here is a script that checks for them automatically.

🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
REMOVED=(
  TEXT_EMBEDDING_3_LARGE
  MULTI_QA_MP_NET_BASE_DOT_V1
  PARAPHRASE_MULTILINGUAL_MINILM_L12_V2
  KOR_SROBERTA
  BM_KO_SMALL
  INSTRUCTOR_BASE
  INSTRUCTOR_XL
  MISTRAL_EMBED
  BGE_SMALL_EN
  BGE_LARGE_EN
  E5_SMALL
  E5_LARGE
  FASTTEXT_KO
)
echo "▶ Searching for references to removed constants"
for sym in "${REMOVED[@]}"; do
  rg -n -C2 -P "\bEmbeddingModel\.$sym\b" || true
done
echo
echo "▶ Also checking for direct use of model ID strings (optional)"
for sym in "${REMOVED[@]}"; do
  rg -n -C2 -F "$sym" || true
done
```

Length of output: 14631

Test code: references to the removed model constant (TEXT_EMBEDDING_3_LARGE) need to be replaced.
```diff
 
     companion object {
         fun findByModelId(modelId: String): EmbeddingModel? {
```
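The hunk cuts off inside `findByModelId`, so its body is not visible here. To sanity-check what the reduced enum still accepts, here is a self-contained sketch of the eight remaining models with a map-based lookup. The property names `dim` and `modelId` are assumptions (the real constructor parameters sit above the hunk), and the lookup strategy is ours, not necessarily the PR's.

```kotlin
/**
 * Self-contained mirror of the reduced EmbeddingModel enum from this PR.
 * Property names `dim` and `modelId` are assumed; the real ones are not shown in the diff.
 */
enum class EmbeddingModel(val dim: Int, val modelId: String) {
    // OpenAI
    TEXT_EMBEDDING_3_SMALL(1536, "text-embedding-3-small"),

    // SBERT / HuggingFace
    ALL_MINILM_L6_V2(384, "sentence-transformers/all-MiniLM-L6-v2"),
    ALL_MP_NET_BASE_V2(768, "sentence-transformers/all-mpnet-base-v2"),
    DISTILUSE_BASE_MULTILINGUAL_CASED_V2(512, "sentence-transformers/distiluse-base-multilingual-cased-v2"),

    // Korean
    KO_SBERT_V1(768, "jhgan/ko-sbert-v1"),

    // BGE
    BGE_BASE_EN(768, "BAAI/bge-base-en-v1.5"),
    BGE_M3(1024, "BAAI/bge-m3"),

    // E5
    E5_BASE(768, "intfloat/e5-base-v2");

    companion object {
        // Built once from all entries; model IDs are unique, so associateBy loses nothing.
        private val byModelId: Map<String, EmbeddingModel> = values().associateBy { it.modelId }

        /** Returns the model for [modelId], or null for unknown or removed models. */
        fun findByModelId(modelId: String): EmbeddingModel? = byModelId[modelId]
    }
}
```

With this shape, an ID for a removed model such as `text-embedding-3-large` resolves to `null`, so any caller that still expects it (like the test code flagged above) has to handle the missing case explicitly.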
This file was deleted.
💡 Verification agent
🧩 Analysis chain
H2 in tests will break with the pgvector type; switch to PostgreSQL Testcontainers.
The JPA columnDefinition "vector" won’t exist in H2, so schema generation and queries involving the embedding column will fail under the current test profile. Replace H2 with Testcontainers-based PostgreSQL for tests and ensure the extension is enabled.
Apply this diff to dependencies:
And add a Flyway migration (outside this file) to enable the extension during tests and dev:

```sql
-- src/main/resources/db/migration/V1__enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;
```

Run this to find any H2-specific configs that need removal or migration in test resources:
🏁 Script executed:
Length of output: 794
Replace H2 with PostgreSQL Testcontainers for Tests

Your test setup still relies on H2 even though you've switched your main runtime to PostgreSQL, so any `vector` (pgvector) columns won't exist in H2 and will cause schema-generation or query failures. I found H2 config in your test resources:

• `url: jdbc:h2:mem:simplerag;MODE=PostgreSQL;DB_CLOSE_DELAY=-1;…`
• `driver-class-name: org.h2.Driver`
• `jpa.database-platform: org.hibernate.dialect.H2Dialect`
• `hibernate.ddl-auto: create-drop`

Please update as follows:
• In build.gradle, replace the H2 test runtime with Testcontainers dependencies:
• In src/test/resources/application-test.yml, remove all H2 settings and configure a Testcontainers-backed PostgreSQL datasource. For example:
• Add a Flyway migration to enable pgvector in your test (and dev) databases:

```sql
-- src/main/resources/db/migration/V1__enable_pgvector.sql
CREATE EXTENSION IF NOT EXISTS vector;
```

With these changes, your tests will spin up a real PostgreSQL instance instead of H2, ensuring your schema and queries remain compatible. Note that the container image must ship the pgvector extension (for example, a `pgvector/pgvector` image); the stock `postgres` image does not include it, so `CREATE EXTENSION vector` would fail there.