As per Google's definition:
Semantic search is a data searching technique that focuses on understanding the contextual meaning and intent behind a user's search query, rather than only matching keywords.
This application was designed as a simple tool for testing the capabilities of embeddings when used as part of semantic search use cases.
The software follows these steps to generate a semantic search database:
- Using a system prompt and list of topics, a local LLM generates a set of archival data documents.
- A chunking strategy is applied to break the archival data into more focussed documents.
- Contextualisation is applied to each chunk in order to retain the wider document's context.
- A local text embedding model generates embeddings for each chunk; the embeddings are normalised.
- The normalised embeddings are stored within a vector database.
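As a rough illustration of the steps above, the following Python sketch runs a similar pipeline end to end. Everything here is a stand-in: `fake_embed` is a deterministic placeholder for the local text embedding model, a plain list plays the role of the vector database, and the chunking and contextualisation helpers are simplified assumptions rather than this application's actual implementation.

```python
import hashlib
import math

def chunk_markdown(document: str) -> list[str]:
    """Split a markdown document into chunks at second-level headings."""
    chunks, current = [], []
    for line in document.splitlines():
        if line.startswith("## ") and current:
            chunks.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks

def contextualise(chunk: str, context: str) -> str:
    """Prepend the document-level context so each chunk stands alone."""
    return f"{context}\n\n{chunk}"

def fake_embed(text: str, dims: int = 8) -> list[float]:
    """Deterministic placeholder for a real embedding model."""
    digest = hashlib.sha256(text.encode()).digest()
    return [b / 255.0 for b in digest[:dims]]

def normalise(vector: list[float]) -> list[float]:
    """Scale a vector to unit length (L2 normalisation)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]

def build_database(document: str, context: str) -> list[dict]:
    """Chunk, contextualise, embed, normalise, and store."""
    database = []
    for chunk in chunk_markdown(document):
        text = contextualise(chunk, context)
        database.append({"text": text, "embedding": normalise(fake_embed(text))})
    return database

doc = "# Pac-Man\n\nOverview.\n\n## Gameplay\n\nMaze chase.\n\n## Legacy\n\nIconic."
db = build_database(doc, "Context: history of the arcade game Pac-Man.")
print(len(db))  # 3 chunks: intro, Gameplay, Legacy
```

Each stored entry carries the shared context plus its own chunk, and every embedding has unit length, so similarity comparisons later reduce to dot products.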
Once the database exists, the user can submit search queries which are converted into embeddings and then compared against the data within the vector database.
This concept can form part of a Retrieval Augmented Generation (RAG) solution, where the resulting relevant documents can be passed into an LLM to support the generation of a response to the user's original query.
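Because the stored embeddings are normalised, cosine similarity between a query and a stored chunk reduces to a plain dot product, which is what makes the comparison cheap. A minimal sketch (the 2-D vectors and document names are made up for illustration):

```python
def dot(a: list[float], b: list[float]) -> float:
    """For unit-length vectors, the dot product equals cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

# Tiny illustrative unit vectors standing in for real embeddings.
stored = {
    "doc-about-mazes": [1.0, 0.0],
    "doc-about-ghosts": [0.6, 0.8],
}
query = [0.8, 0.6]

# Rank stored documents by similarity to the query, highest first.
ranked = sorted(stored.items(), key=lambda item: dot(query, item[1]), reverse=True)
for name, vector in ranked:
    print(name, round(dot(query, vector), 2))
```

In the real application the vector database performs this comparison; the principle is the same, just at scale and with much higher-dimensional vectors.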
- Create simple console application.
- Integrate local language model.
- Integrate local vector database.
- Provide search function.
- Apply normalisation to embeddings.
- Implement basic chunking, use markdown format for input.
- Implement contextualisation, use alternative local LLM.
- Test with alternative embeddings model.
- Test with alternative chat completion model.
- Explore capabilities of the Qdrant vector database, understand how search queries can be adjusted to affect results.
- Consider adding a lexical search option for comparing results.
No known defects.
GitHub Copilot was used to assist in the development of this software.
Note
Other operating systems and versions will work; where versions are specified, treat them as minimums.
A system capable of running LM Studio is required.
Details of my personal system are below.
Note
The hardware in use on my PC includes an Accelerated Processing Unit (APU), which combines CPU and GPU on a single chip. Recommendations for alternative hardware can be found here; performance will depend upon the models you choose to run (and other operational factors).
Configure LM Studio as per the documentation.
Download:
- an appropriate text embedding model,
- an appropriate LLM for chat completion.
Note
You can use community leaderboards to help select appropriate models.
Use the Developer tab to run your chosen models using the API server.
You can use Postman to test access to the endpoints.
If your text embedding model is running with the default options, you can test the local server by configuring a POST request with the following parameters:
URL:
http://127.0.0.1:1234/v1/embeddings
Headers:
Content-Type: application/json
Body (raw):
{
"input": "Hello world!"
}
You should see a response which includes the embedding values:
{
"object": "list",
"data": [
{
"object": "embedding",
"embedding": [
0.03805531933903694,
0.032784245908260345,
...
-0.006903552915900946,
-0.02046305313706398
],
"index": 0
}
],
"model": "text-embedding-embeddinggemma-300m",
"usage": {
"prompt_tokens": 0,
"total_tokens": 0
}
}
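The same request can be issued from code. Below is a minimal sketch using only the Python standard library, assuming the default LM Studio address shown above (`build_request` and `fetch_embedding` are illustrative helper names, not part of this application):

```python
import json
import urllib.request

ENDPOINT = "http://127.0.0.1:1234/v1/embeddings"  # default LM Studio address

def build_request(text: str) -> urllib.request.Request:
    """Build the POST request shown above without sending it."""
    body = json.dumps({"input": text}).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def fetch_embedding(text: str) -> list[float]:
    """Send the request and return the first embedding from the response."""
    with urllib.request.urlopen(build_request(text)) as response:
        payload = json.load(response)
    return payload["data"][0]["embedding"]
```

Calling `fetch_embedding("Hello world!")` against a running server returns the embedding as a list of floats; its length is the dimensionality of your chosen model.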
The appsettings.json file manages the application settings.
Review the file and ensure that the settings are appropriate for your local environment.
E.g. update the model names as required:
{
  "EmbeddingApi": {
    "Model": "text-embedding-embeddinggemma-300m"
  },
  "ChatCompletionApi": {
    "Model": "openai/gpt-oss-120b"
  }
}

The data under test can be configured. The software is designed to use your chosen LLM to create archival data to index.
Note
The quality of your archival data will depend on the model you choose. Consider trying multiple models to generate archival data.
The system is configured to use the game-historian.md system prompt when generating archival data. You may choose to write an alternative system prompt for generating archival data. If you do, update the configuration with the new prompt location:
{
  "SystemPromptPath": "path/to/system-prompt.md"
}

The ContextSectionChunkTitles setting specifies which sections from your archival data markdown capture a useful summary of the document's content; these sections will be added to all document chunks to maintain context.
The ArchivalTopics setting specifies topics for the generation of archival data. These topics will be passed into your chosen LLM along with the system prompt to generate archival data.
You may choose to adjust either of these settings if you author your own system prompt or change the topics to be searched.
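One plausible way to picture the ContextSectionChunkTitles setting: pull the bodies of the named sections out of a generated markdown document and carry them along with every chunk. This sketch is an assumption about the mechanism, not the application's actual code (`extract_context` and the sample document are invented for illustration):

```python
def extract_context(document: str, section_titles: list[str]) -> str:
    """Collect the bodies of the configured summary sections."""
    sections, keep, current = {}, None, []
    for line in document.splitlines():
        if line.startswith("## "):
            if keep:
                sections[keep] = "\n".join(current).strip()
            title = line[3:].strip()
            keep, current = (title if title in section_titles else None), []
        elif keep is not None:
            current.append(line)
    if keep:
        sections[keep] = "\n".join(current).strip()
    return "\n\n".join(sections[t] for t in section_titles if t in sections)

doc = (
    "## Summary\n\nPac-Man is a 1980 maze game.\n\n"
    "## Gameplay\n\nEat dots, avoid ghosts.\n"
)
context = extract_context(doc, ["Summary"])
print(context)  # Pac-Man is a 1980 maze game.
```

The extracted text would then be prepended to each chunk before embedding, so that chunks remain meaningful in isolation.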
Related settings:
{
  "Contextualisation": {
    "ContextSectionChunkTitles": [
      "A title defined by your system prompt's template"
    ]
  },
  "ArchivalTopics": [
    "A related topic which can be processed by the system prompt"
  ]
}

Clone the repository.
Open in Visual Studio Code.
Build the projects.
- Creation of archival data using a local LLM.
- Generation of normalised embeddings of archival data using local text embedding model.
- Storage of embeddings in local vector database.
- Submission of search queries, comparison against stored embeddings.
Start the Qdrant vector database Docker container, the configuration for which is located in the docker directory.
Start LM Studio and ensure that both your text embedding model and LLM are running:
Note
If you are unable to run both models simultaneously due to lack of resources, consider running the LLM only while generating archival data. You can then eject the model and load your text embedding model for indexing and searching.
Hit F5 in VS Code to begin debugging.
The application is configured to load within the integrated terminal; you should be presented with multiple options:
Create your archival data files, if they do not yet exist:
1. Create archival data files

Note
This operation can take a long time to complete. Consider adjusting the system prompt and related archival data settings to simplify the operation. Note that simplifying or reducing the archival data will affect the semantic meaning and search capabilities.
When your archival data files have been created, create the vector database:
1. Create vector database from data files

You can view the content of your vector database using the following URL: http://localhost:6333/dashboard
Once you have data within your vector database, you can perform a search:
4. Enter search text

You will then see results which display a relevancy score:
This repository was created primarily for my own exploration of the technologies involved.
I have selected an appropriate license using this tool.
This software is licensed under the MIT license.
More detailed information can be found in the documentation: