I came from Data Engineering stuff before jumping into LLM stuff, and I am surprised that many people in this space have never heard of Elastic/OpenSearch

Reddit r/LocalLLaMA / 3/23/2026

💬 Opinion | Developer Stack & Infrastructure | Tools & Practical Usage

Key Points

Elastic/OpenSearch and Lucene are presented as strong retrieval options for LLM-backed pipelines, comparable to vector stores and traditional search, with scale being the main differentiator.
A small BERT model (around 100 MB FP32) can run on CPU inside Elastic/OpenSearch to generate embeddings, enabling embedding-based retrieval within existing infrastructure.
For small document sets (roughly under 10,000) with good variance, a compact embedding model can suffice, and in some cases embeddings can be skipped in favor of simpler methods like TF-IDF or BM25.
The overall takeaway is that Elastic/OpenSearch can be a practical, scalable choice for RAG workflows, especially when you want to leverage familiar tooling and avoid introducing new stack complexity.

Jokes aside, on a technical level, Google/Brave search and vector stores work in a very similar way. The main difference is scale. From an LLM's point of view, both fall under RAG. You can even skip embedding models entirely and just use TF-IDF or BM25.
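To make the "just use BM25" point concrete, here is a minimal pure-Python sketch of Okapi BM25 scoring (the variant with a non-negative IDF term). The tokenizer, the sample documents, and the function names are illustrative assumptions, and `k1=1.2`, `b=0.75` are just commonly used defaults. A real engine adds an inverted index and proper text analysis on top of this.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (assumption); real analyzers
    # also handle punctuation, stemming, and stopwords.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document in `docs` for `query`."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates with k1; b controls length normalization.
            length_norm = 1 - b + b * len(toks) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


# Hypothetical mini corpus, for illustration only.
docs = [
    "opensearch is a search engine built on lucene",
    "a vector store keeps embeddings for similarity search",
    "bananas are rich in potassium",
]
scores = bm25_scores("lucene search engine", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, [round(s, 3) for s in scores])
```

The top-scoring documents are what you would paste into the LLM prompt as context, exactly as you would with vector search results.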

Elastic and OpenSearch (and technically Lucene) are powerhouses when it comes to this kind of retrieval. You can also run a small BERT model as the embedding model, around 100 MB (FP32), on CPU, within either Elastic or OpenSearch.
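As a rough sketch of what the two retrieval styles look like side by side, the snippet below builds OpenSearch-style request bodies as plain Python dicts: an index definition with a `text` field and a `knn_vector` field, a BM25 `match` query, and a `knn` query. The field names, the 384-dimension size, and the helper function names are assumptions for illustration. Nothing is sent to a cluster, the query vector is assumed to come from whatever embedding model you deploy, and Elasticsearch uses different field and query names, so this does not carry over verbatim.

```python
import json

# Assumed embedding size; use whatever your small BERT model outputs.
EMBEDDING_DIM = 384


def index_body(dim=EMBEDDING_DIM):
    # Index with a full-text field (scored with BM25) and a k-NN vector field.
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {"type": "knn_vector", "dimension": dim},
            }
        },
    }


def lexical_query(question, size=5):
    # Plain keyword retrieval: no embedding model involved.
    return {"size": size, "query": {"match": {"text": question}}}


def vector_query(query_vector, size=5):
    # Nearest-neighbour retrieval over the stored embeddings.
    return {
        "size": size,
        "query": {"knn": {"embedding": {"vector": list(query_vector), "k": size}}},
    }


print(json.dumps(lexical_query("how do I rotate API keys?"), indent=2))
```

Either body would go to the index's `_search` endpoint, and the hits feed the prompt the same way in both cases.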

If your document set is relatively small (under ~10K documents) and has good variance, a small BERT model can handle the task well, or you can even skip embeddings entirely. For deeper semantic similarity or closely related documents, more powerful embedding models are usually the go-to.

submitted by /u/Altruistic_Heat_9531
