- This should already be available in your subscription with an embeddings model deployed. The default in the labs is text-embedding-ada-002.
- In your Azure subscription, create an ADLS Gen 2 storage account.
- Create a container called "stackexchange"
- From this repository, choose one file for a site of your interest from the /data/ folder
- Upload it to the "stackexchange" container in ADLS Gen2.
(The zip file contains other data that may be useful to you after this lab; uploading more than one file can take two hours or more to index.)
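The file you upload should contain a JSON array, because the parsing mode chosen later in the wizard treats each element of the top-level array as one search document. A minimal stdlib sketch of that behavior (the field names below mirror a typical StackExchange posts file but are illustrative, not the exact schema of your chosen file):

```python
import json

# Hypothetical sample shaped like a StackExchange posts export: with the
# "JSON array" parsing mode, each element of the top-level array becomes
# one search document.
sample = """[
  {"Id": "1", "Title": "How do I ...?", "Body": "I am trying to ..."},
  {"Id": "2", "Title": "Why does ...?", "Body": "When I run ..."}
]"""

docs = json.loads(sample)
print(f"{len(docs)} documents will be indexed")
for doc in docs:
    # Body is the field chosen later as the column to vectorize.
    print(doc["Id"], doc["Body"][:20])
```

This is also why a single file is enough for the lab: every post in the array becomes its own indexed, vectorized document.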
- Create an Azure AI Search resource, if you don't have one already
- On the Overview blade of Azure AI Search, choose the option to "Import and vectorize your data".
- Choose Azure Data Lake Storage Gen2
- Select your Subscription, Storage account, and Blob container where the JSON files are located.
- Leave the Blob Folder option empty as you uploaded directly to the root folder
- Parsing mode - choose "JSON array", then leave the Document root option blank
- Click Next, which will initiate schema validation
- For the column to vectorize, select Body, which contains the text from the posts
- Leave Kind and Subscription with the defaults, which should be Azure OpenAI and your Subscription
- Select your Azure OpenAI Service
- Select the embeddings model which was already deployed, text-embedding-ada-002
- Leave the option for API Key selected, and check the box for acknowledgement of costs associated with using Azure OpenAI
- Leave the defaults for these settings
- Review the settings, then click Create, which will deploy the Index, Indexer, and configuration profiles to Azure AI Search. It will also initiate the indexing process.
- Back in Azure AI Search, observe the indexer's progress. When it completes, continue in Azure AI Foundry.
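Indexer progress can also be checked programmatically through the Azure AI Search REST API instead of the portal. A hedged, stdlib-only sketch: the service name, indexer name, and admin key below are placeholders, and the api-version is one GA version that exposes the status endpoint.

```python
import json
import urllib.request


def indexer_status_url(service: str, indexer: str,
                       api_version: str = "2023-11-01") -> str:
    """Build the REST URL for an indexer's status endpoint."""
    return (f"https://{service}.search.windows.net"
            f"/indexers/{indexer}/status?api-version={api_version}")


def get_indexer_status(service: str, indexer: str, api_key: str) -> dict:
    """Fetch the status document for the given indexer."""
    req = urllib.request.Request(
        indexer_status_url(service, indexer),
        headers={"api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder values -- substitute your own resource names and key.
    status = get_indexer_status("my-search-service",
                                "stackexchange-indexer",
                                "<admin-api-key>")
    print(status.get("lastResult", {}).get("status"))
```

A `lastResult.status` of "success" corresponds to the green check you see on the Indexers blade in the portal.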
- Inside the Azure AI Foundry portal, click on the Chat option under Playground
- Enter the instructions from grounding.txt in "Give the model instructions and context", then click Apply Changes
- Expand the "Add your data" section and click "Add a data source"
- From the drop down, select Azure AI Search
- The subscription should already be populated, then select the existing Azure AI Search instance and the Index you just created
- Check the box for "Add vector search to this search resource" and select the already deployed embeddings model used earlier.
- Leave the Use custom field mapping unchecked
- Leave the Search type as the default option
- Select the existing vector search configuration from your Azure AI Search resource
- Select the API Key option
- After validation, click Save and Close
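Once the data source is wired up, the playground issues hybrid (keyword + vector) queries against your index on every turn. The same kind of query can be sent directly with the REST API. This is a hedged sketch: the vector field name "text_vector" is an assumption about what the wizard generated (check your index's field list), the resource names and key are placeholders, and the api-version shown is one that supports text-based vector queries, where the service vectorizes the query with the index's configured vectorizer.

```python
import json
import urllib.request


def build_hybrid_query(query: str, vector_field: str = "text_vector",
                       k: int = 5) -> dict:
    """Request body for a hybrid (keyword + vector) search.

    "kind": "text" asks the service to embed the query server-side using
    the vectorizer configured on the index (the embeddings deployment
    chosen earlier), so no client-side embedding call is needed.
    """
    return {
        "search": query,                  # keyword part of the hybrid query
        "vectorQueries": [{
            "kind": "text",
            "text": query,                # vectorized by the service
            "fields": vector_field,       # assumed wizard-generated field name
            "k": k,
        }],
        "top": k,
    }


def search(service: str, index: str, api_key: str, query: str) -> dict:
    url = (f"https://{service}.search.windows.net/indexes/{index}"
           f"/docs/search?api-version=2024-07-01")
    req = urllib.request.Request(
        url,
        data=json.dumps(build_hybrid_query(query)).encode(),
        headers={"api-key": api_key, "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Placeholder resource names and key.
    results = search("my-search-service", "stackexchange-index",
                     "<query-api-key>", "how do I tune my guitar")
    for doc in results.get("value", []):
        print(doc.get("@search.score"))
```

Trying a query like this directly is a quick way to confirm the index returns sensible posts before judging the playground's grounded answers.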
StackExchange Data Dump Licensing
https://meta.stackexchange.com/questions/401324/announcing-a-change-to-the-data-dump-process
This data is prepared by individuals for use with an LLM, not to train an LLM. Think of it this way: training an LLM is like teaching it new skills and knowledge, while accessing your own data with an LLM is like asking a trained expert to interpret or respond based on specific information you provide.
Training an LLM:
- Purpose: The data is used to improve or refine the model's ability to understand and generate human-like text.
- Method: The data is fed into the model during its training phase. This phase involves updating the model's parameters based on the patterns and information found in the data.
- Scale: Typically involves large volumes of diverse data to ensure the model learns a wide range of language patterns.
Accessing Your Own Data with an LLM:
- Purpose: The model is used to process, analyze, or generate responses based on your own specific data.
- Method: The data is input to the model at runtime (i.e., when the model is already trained), and the model uses its existing knowledge to interpret or generate responses related to that data.
- Scale: Usually involves a specific subset of data relevant to your needs, rather than the vast and diverse data used for training.
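The runtime pattern above can be sketched in a few lines: the retrieved documents never alter the model's weights; they simply ride along in the prompt for a single call. This is a simplified illustration of the idea, not the exact payload the playground sends.

```python
def build_grounded_prompt(instructions: str, retrieved_docs: list[str],
                          question: str) -> list[dict]:
    """Assemble a chat request grounded in retrieved documents.

    The model itself is unchanged; the data is supplied as context for
    this one request only -- the "accessing your own data" pattern.
    """
    context = "\n\n".join(f"[doc {i + 1}] {d}"
                          for i, d in enumerate(retrieved_docs))
    return [
        {"role": "system",
         "content": f"{instructions}\n\nSources:\n{context}"},
        {"role": "user", "content": question},
    ]


messages = build_grounded_prompt(
    "Answer only from the sources provided.",
    ["Posts about woodworking joints ...", "Posts about wood glue ..."],
    "What joint is strongest for a table leg?",
)
print(messages[0]["content"][:60])
```

Contrast this with training, where the same documents would be fed through the model to update its parameters rather than appearing in a prompt.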