A practical guide to importing an ONNX embedding model, generating embeddings, and running semantic search in Oracle AI Database
Companion notebook: ONNX In-Database Embeddings with Oracle AI Database 26ai
Key Takeaways
- Oracle AI Database can load and register an augmented ONNX embedding model with DBMS_VECTOR.LOAD_ONNX_MODEL().
- VECTOR_EMBEDDING() lets SQL generate embeddings directly inside Oracle AI Database.
- Embeddings can be stored natively in VECTOR columns.
- VECTOR_DISTANCE() enables semantic search directly in SQL.
- LangChain can build on the same Oracle-native workflow without moving embeddings or retrieval outside the database (LangChain Oracle vector store integration).
Why This Matters
In many embedding pipelines, source data resides in a relational database, the model runs somewhere else as an external service, and the vectors are stored in a separate vector database. While this architecture can work well, it introduces additional data movement, infrastructure, and operational complexity.
Oracle AI Database supports a more consolidated approach. You can load an ONNX embedding model directly into the database, invoke it, store the generated embeddings in native VECTOR columns, and perform semantic search in the same database.
This article walks through that end-to-end workflow using an ONNX model: loading it into Oracle AI Database, validating that it is registered correctly, generating embeddings with SQL, storing them in a native vector column, and querying them using semantic similarity. It also demonstrates how the same architecture can be used with LangChain, without changing where embedding and retrieval occur.
What You'll Learn
- How to load an augmented ONNX model with Oracle AI Database.
- How to generate embeddings directly in SQL with VECTOR_EMBEDDING().
- How to run semantic search with VECTOR_DISTANCE() in Oracle AI Database and through LangChain.
Architecture Overview
This workflow keeps model execution, vector storage, and semantic retrieval inside Oracle AI Database. An augmented ONNX model is exposed through an Oracle directory object, loaded with DBMS_VECTOR.LOAD_ONNX_MODEL(), invoked with VECTOR_EMBEDDING(), and queried with VECTOR_DISTANCE(). The model artifact can come either from a local or container-mounted path or directly from Oracle Cloud Object Storage using DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD(). LangChain can build on the same Oracle-native execution path through OracleEmbeddings and OracleVS.
Prerequisites
- Python 3.10+
- Oracle AI Database 26ai running in a container
- Dependencies such as oracledb, python-dotenv, pandas, numpy, langchain, langchain-community, and langchain-oracledb
- For cloud loading: an Oracle Cloud Object Storage bucket and model URI, or a PAR URL
- If not using a PAR URL, an Object Storage credential created with DBMS_CLOUD.CREATE_CREDENTIAL
In the notebook, those packages are installed up front:
import subprocess
import sys
result = subprocess.run(
[sys.executable, "-m", "pip", "install", "-q",
"oracledb", "python-dotenv", "pandas", "numpy",
"langchain", "langchain-core", "langchain-community", "langchain-oracledb"],
capture_output=True, text=True
)
print("Packages installed." if result.returncode == 0 else f"Install failed: {result.stderr}")
The example also assumes Oracle AI Database 26ai is running in a container, with a mounted directory for ONNX model files. That mounted directory becomes important later, because Oracle accesses the model through a database directory object rather than through ad hoc file access.
Step-by-Step Guide
Step 1: Understand why Oracle requires an augmented ONNX model
One of the most important details in this workflow is that Oracle needs an augmented ONNX model, not just a standard transformer export.
For VECTOR_EMBEDDING() to accept raw text directly, tokenization and related preprocessing need to be included inside the ONNX graph itself. That is what allows Oracle to take a normal text string and produce an embedding without relying on external preprocessing in Python.
In the notebook, the model used is an augmented version of all-MiniLM-L12-v2:
MODEL_NAME = "all_MiniLM_L12_v2"
ONNX_FILE = "all_MiniLM_L12_v2.onnx"
Without that augmented packaging, the flow would no longer be fully Oracle-native, because preprocessing would have to happen outside the database first.
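If you need to produce the augmented artifact yourself instead of downloading Oracle's pre-built file, Oracle's OML4Py utilities can export a Hugging Face model with tokenization baked into the ONNX graph. The sketch below is hedged: the `oml.utils` package and the `EmbeddingModelConfig.from_template` / `export2file` names follow Oracle's AI Vector Search documentation, so verify them against your installed version.

```python
# Hypothetical sketch: exporting an augmented ONNX model with Oracle's OML4Py
# utilities. Class and method names follow Oracle's docs; the try/except keeps
# the sketch importable where the utilities are not installed.
export_args = {
    "source_model": "sentence-transformers/all-MiniLM-L12-v2",
    "output_file": "all_MiniLM_L12_v2",
}

try:
    from oml.utils import EmbeddingModel, EmbeddingModelConfig

    # The "text" template preconfigures tokenization and pooling for text embedding.
    config = EmbeddingModelConfig.from_template("text", max_seq_length=256)
    em = EmbeddingModel(model_name=export_args["source_model"], config=config)
    em.export2file(export_args["output_file"], output_dir=".")
except ImportError:
    # OML utilities absent: use Oracle's pre-built augmented model instead.
    pass
```

Either route produces the same kind of artifact: an ONNX file that accepts raw text, which is what VECTOR_EMBEDDING() requires.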
Step 2: Prepare an ONNX model for Oracle AI Database
Before the model can be used in SQL, Oracle needs controlled access to the ONNX file through a database directory object. This is a database-managed reference to a filesystem location, which means access to the model artifact is handled through Oracle privileges rather than through direct filesystem assumptions.
The notebook includes a one-time admin setup that creates the user, grants privileges, and registers the ONNX model directory. At runtime, the important pieces are:
- a database user with the required privileges
- permission to load mining models
- a registered Oracle directory such as ONNX_DIR
- access to the ONNX file from inside the container
A simplified version of the directory setup looks like this:
CREATE OR REPLACE DIRECTORY ONNX_DIR AS '/opt/oracle/onnx_models';
GRANT READ, WRITE ON DIRECTORY ONNX_DIR TO my_user;
This matters because the model import is not treated as an ad hoc file operation. The file is exposed to Oracle through a controlled database object, which is much more aligned with enterprise governance expectations.
Figure 1. An augmented ONNX model is exposed through an Oracle directory object, loaded with DBMS_VECTOR.LOAD_ONNX_MODEL(), registered in Oracle, and invoked from SQL.
Step 2b: Cloud option - load ONNX from Oracle Object Storage
Oracle also supports loading ONNX models from Oracle Cloud Object Storage with DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD(). This is a documented alternative to the local directory workflow used in the companion notebook.
Per Oracle documentation, use a credential for standard Object Storage URIs, and pass credential => NULL for pre-authenticated request (PAR) URLs.
-- Option A: regular Object Storage URI (credential required)
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD(
    model_name => 'ALL_MINILM_L12_V2',
    credential => 'OBJ_STORE_CRED',
    uri => 'https://objectstorage.<region>.oraclecloud.com/n/<namespace>/b/<bucket>/o/all_MiniLM_L12_v2.onnx',
    metadata => JSON('{
      "function":"embedding",
      "embeddingOutput":"embedding",
      "input":{"input":["DATA"]}
    }')
  );
END;
/

-- Option B: PAR URL (credential must be NULL)
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD(
    model_name => 'ALL_MINILM_L12_V2',
    credential => NULL,
    uri => 'https://objectstorage.<region>.oraclecloud.com/p/<par-token>/n/<namespace>/b/<bucket>/o/all_MiniLM_L12_v2.onnx'
  );
END;
/
Note: According to Oracle documentation, metadata is optional for models prepared with Oracle's Python utility defaults, model names must follow Oracle naming rules, and the ONNX file size limit for cloud loading is 2 GB.
Step 2c: Multi-cloud note (AWS/GCP/Google Drive)
DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD() is documented for Oracle Cloud Object Storage. If your model artifact is hosted in AWS S3, Google Cloud Storage, or Google Drive, use a portable two-step pattern: download the ONNX file to a database-accessible local path, then load it with DBMS_VECTOR.LOAD_ONNX_MODEL().
This keeps embedding generation and semantic retrieval Oracle-native while allowing model artifact hosting outside OCI.
import os
import requests
model_url = os.environ["MODEL_SIGNED_URL"] # S3 pre-signed URL / GCS signed URL / Drive direct URL
target_path = "/opt/oracle/onnx_models/all_MiniLM_L12_v2.onnx"
resp = requests.get(model_url, stream=True, timeout=180)
resp.raise_for_status()
with open(target_path, "wb") as f:
for chunk in resp.iter_content(chunk_size=1024 * 1024):
if chunk:
f.write(chunk)
print(f"Model downloaded to {target_path}")
BEGIN
DBMS_VECTOR.LOAD_ONNX_MODEL(
directory => 'ONNX_DIR',
file_name => 'all_MiniLM_L12_v2.onnx',
model_name => 'ALL_MINILM_L12_V2'
);
END;
/
Step 3: Connect to Oracle AI Database from Python
The notebook connects to Oracle AI Database using python-oracledb in Thin mode, so no Oracle Client libraries are required:
import oracledb
conn = oracledb.connect(...)
print("Connected to Oracle AI Database")
That same connection is then reused across the SQL examples and the LangChain integration later in the notebook.
To keep the notebook readable, it defines a small helper function for executing SQL and optionally returning results as a pandas DataFrame:
import pandas as pd
def run_sql(sql, params=None, fetch=False, many=False, data=None):
"""Execute SQL against Oracle Database."""
with conn.cursor() as cur:
if many and data:
cur.executemany(sql, data)
elif params:
cur.execute(sql, params)
else:
cur.execute(sql)
if fetch:
cols = [c[0] for c in cur.description]
return pd.DataFrame(cur.fetchall(), columns=cols)
conn.commit()
Step 4: Load an ONNX embedding model into Oracle AI Database
The notebook does not assume the ONNX model is already present. If the file is missing, it downloads the official pre-built augmented model and places it in the model directory used by Oracle.
Once the model file is available, either through an Oracle directory object or a cloud URI, it can be imported with DBMS_VECTOR.LOAD_ONNX_MODEL() or DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD().
A simplified version of the local directory call looks like this:
BEGIN
DBMS_VECTOR.LOAD_ONNX_MODEL(
directory => 'ONNX_DIR',
file_name => 'all_MiniLM_L12_v2.onnx',
model_name => 'ALL_MINILM_L12_V2',
metadata => JSON('{
"function":"embedding",
"embeddingOutput":"embedding",
"input":{"input":["DATA"]}
}')
);
END;
/
This is the point where the model becomes more than a file. Oracle registers it, stores the associated metadata, and exposes it as a named object that SQL can invoke directly.
The metadata is especially important. It defines how Oracle maps the SQL input text into the model graph and identifies which output node should be used as the embedding vector. In the notebook, the workflow also checks whether the model already exists before reloading it. This makes reruns safer and ensures the workflow remains idempotent.
model_check = run_sql(
"SELECT COUNT(*) AS cnt FROM USER_MINING_MODELS WHERE MODEL_NAME = UPPER(:model_name)",
params={"model_name": MODEL_NAME},
fetch=True
)
Expected output: the model check confirms whether the ONNX model is already registered, so reruns stay idempotent.
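That existence check can be folded into a small guard around the load call so the whole step is safe to rerun. The sketch below reuses the run_sql helper defined earlier; the function name and bind names are illustrative, not from the notebook.

```python
# Sketch: register the ONNX model only if USER_MINING_MODELS does not already
# list it, keeping notebook reruns idempotent. `run_sql` is the helper defined
# earlier; names here are illustrative.
LOAD_MODEL_PLSQL = """
BEGIN
  DBMS_VECTOR.LOAD_ONNX_MODEL(
    directory  => 'ONNX_DIR',
    file_name  => :file_name,
    model_name => :model_name
  );
END;"""

def ensure_model_loaded(run_sql, model_name, file_name):
    """Load the ONNX model unless Oracle already has it registered."""
    df = run_sql(
        "SELECT COUNT(*) AS cnt FROM USER_MINING_MODELS "
        "WHERE MODEL_NAME = UPPER(:m)",
        params={"m": model_name},
        fetch=True,
    )
    if int(df.iloc[0]["CNT"]) == 0:
        run_sql(LOAD_MODEL_PLSQL,
                params={"file_name": file_name, "model_name": model_name})

# ensure_model_loaded(run_sql, MODEL_NAME, ONNX_FILE)  # requires the live connection
```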
Step 5: Verify that Oracle registered the model correctly
After the import, the next step is to validate that Oracle recognizes the model.
The notebook queries the model catalog to verify that the ONNX model has been loaded successfully:
SELECT model_name, mining_function, algorithm
FROM user_mining_models
WHERE model_name = 'ALL_MINILM_L12_V2';
This is a small but important part of the workflow. It confirms that the model is visible to Oracle as a registered object and is ready to be used by the vector functions that come next.
Expected output: the query returns the registered ONNX model from USER_MINING_MODELS.
Step 6: Generate embeddings in SQL with VECTOR_EMBEDDING()
Once the model is registered, Oracle can use it directly through VECTOR_EMBEDDING().
The notebook first tests this with a simple text input to confirm that the model works and that the returned vector has the expected size.
SELECT VECTOR_EMBEDDING(
ALL_MINILM_L12_V2
USING 'Oracle Database supports vector search.' AS DATA
) AS embedding
FROM dual;
This is the pivotal step in the workflow. Embedding generation is no longer a separate service call. It becomes a SQL operation:
- the application does not need to call an external embedding API
- the database can generate embeddings internally
- the semantic representation stays close to the data it describes
Expected output: Oracle returns a 384-dimensional embedding for the supplied text.
Step 7: Store embeddings in a native VECTOR column
After validating embedding generation, the notebook creates a table where the source text and its embedding live together.
CREATE TABLE onnx_docs (
id NUMBER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
category VARCHAR2(100),
doc_text CLOB,
embedding VECTOR(384, FLOAT32)
);
This is an important design choice. The vector is not stored as an opaque blob or external payload. It is stored in Oracle's native VECTOR type, which means it becomes part of the same database model as the relational data.
- vectors stay linked to the exact rows they describe
- access control applies consistently
- backups and retention policies stay unified
- the application does not need to coordinate data across multiple storage systems
The notebook inserts demo content and generates the embedding directly in the same SQL statement:
INSERT INTO onnx_docs (category, doc_text, embedding)
VALUES (
'database',
'Oracle AI Database supports in-database vector search and semantic retrieval.',
VECTOR_EMBEDDING(
ALL_MINILM_L12_V2
USING 'Oracle AI Database supports in-database vector search and semantic retrieval.' AS DATA
)
);
The semantic representation is created at the same time as the row is written, inside the same transactional boundary.
Figure 2. Embedding generation happens at insert time inside Oracle AI Database, where document text is embedded with VECTOR_EMBEDDING() and stored together with the row in a VECTOR column.
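The same insert-time pattern extends to many rows through the run_sql helper's executemany path. The sketch below uses illustrative sample texts and bind names (the database call itself is commented out because it needs the live connection):

```python
# Sketch: batch insert where each row's embedding is generated inside the
# INSERT itself. Sample texts and the :category/:txt bind names are illustrative.
rows = [
    {"category": "database", "txt": "Oracle AI Database stores vectors natively."},
    {"category": "models",   "txt": "An ONNX embedding model can run inside Oracle."},
    {"category": "search",   "txt": "Cosine distance ranks semantic matches in SQL."},
]

insert_sql = """
INSERT INTO onnx_docs (category, doc_text, embedding)
VALUES (
  :category,
  :txt,
  VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING :txt AS DATA)
)"""

# run_sql(insert_sql, many=True, data=rows)  # requires the live connection
```

Because the embedding is computed inside the INSERT, every batched row still gets its vector within the same transaction as its text.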
Before moving into retrieval, the notebook inspects the inserted rows:
SELECT id, category, DBMS_LOB.SUBSTR(doc_text, 120, 1) AS preview
FROM onnx_docs
ORDER BY id;
Step 8: Run semantic search in SQL and LangChain
Once embeddings are stored, semantic retrieval is handled entirely inside Oracle. The notebook uses VECTOR_DISTANCE() together with VECTOR_EMBEDDING() so that the query text is embedded on the fly and compared against the stored vectors:
SELECT
id,
category,
DBMS_LOB.SUBSTR(doc_text, 200, 1) AS doc_preview,
VECTOR_DISTANCE(
embedding,
VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING 'How does Oracle support semantic search?' AS DATA),
COSINE
) AS distance
FROM onnx_docs
ORDER BY distance
FETCH FIRST 3 ROWS ONLY;
The user query is embedded directly within Oracle, where it is compared against stored document vectors. The results are then ranked by similarity, and the closest semantic matches are returned through SQL.
The notebook explicitly explains how to interpret the output: the smaller the cosine distance, the more semantically similar the document is to the query.
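For intuition, cosine distance is 1 minus cosine similarity: identical directions score 0, orthogonal directions score 1. A small stdlib-only illustration (not part of the notebook, since Oracle computes this server-side inside VECTOR_DISTANCE()):

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity, mirroring VECTOR_DISTANCE(..., COSINE)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction -> 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal -> 1.0
```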
The notebook also runs several queries to validate that semantic ranking remains meaningful across different phrasings:
test_queries = [
"Which Oracle feature helps semantic retrieval?",
"Can I store embeddings in the database?",
"How does LangChain work with Oracle vectors?",
"Why are ONNX models useful here?"
]
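Those queries can be driven through the same semantic-search SQL with a bind variable. In this sketch, the :q bind name and the helper function are assumptions, and the actual call is commented out because it needs the live connection:

```python
# Sketch: run each test query through the semantic-search SQL shown above,
# passing the query text as a bind variable (:q is an illustrative name).
semantic_sql = """
SELECT id,
       DBMS_LOB.SUBSTR(doc_text, 200, 1) AS doc_preview,
       VECTOR_DISTANCE(
         embedding,
         VECTOR_EMBEDDING(ALL_MINILM_L12_V2 USING :q AS DATA),
         COSINE
       ) AS distance
FROM onnx_docs
ORDER BY distance
FETCH FIRST 3 ROWS ONLY"""

def rank_for_queries(run_sql, queries):
    """Return {query: closest document preview} using the SQL above."""
    return {
        q: run_sql(semantic_sql, params={"q": q}, fetch=True).iloc[0]["DOC_PREVIEW"]
        for q in queries
    }

# rank_for_queries(run_sql, test_queries)  # requires the live connection
```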
Figure 3. At query time, Oracle embeds the input text, compares it with stored vectors using VECTOR_DISTANCE(), and returns the nearest semantic matches directly through SQL.
The notebook then adds an optional framework layer using LangChain:
from langchain_oracledb.embeddings import OracleEmbeddings
from langchain_oracledb.vectorstores.oraclevs import OracleVS
With OracleEmbeddings, the application can use Oracle's registered in-database embedding model:
oracle_embedder = OracleEmbeddings(
conn=conn,
params={"provider": "database", "model": MODEL_NAME}
)
The notebook also validates that the LangChain embedding call returns a vector of the expected size:
lc_embedding = oracle_embedder.embed_query(
"Oracle AI Database performs semantic search using vectors."
)
print(f"Embedding dimension: {len(lc_embedding)}")
print(f"First 5 values: {lc_embedding[:5]}")
The notebook then uses OracleVS, a LangChain-compatible vector store backed by Oracle AI Vector Search.
from langchain_core.documents import Document
from langchain_oracledb.vectorstores.oraclevs import OracleVS
from langchain_community.vectorstores.utils import DistanceStrategy
langchain_docs = [
Document(page_content="Oracle AI Database supports vector storage and semantic search."),
Document(page_content="An ONNX embedding model can be loaded directly into Oracle."),
Document(page_content="LangChain can use OracleVS to query Oracle AI Vector Search."),
Document(page_content="Using in-database embeddings can reduce architectural complexity."),
]
vector_store = OracleVS.from_documents(
documents=langchain_docs,
embedding=oracle_embedder,
client=conn,
table_name="LC_ONNX_DEMO",
distance_strategy=DistanceStrategy.COSINE
)
The notebook also runs a similarity query through the LangChain abstraction:
results = vector_store.similarity_search(
"How can Oracle Database help with semantic retrieval?",
k=3
)
for i, doc in enumerate(results, start=1):
print(f"{i}. {doc.page_content}")
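LangChain vector stores also expose similarity_search_with_score(), which returns (document, score) pairs; with the COSINE strategy used here, smaller scores mean closer matches. A sketch (the call is commented out because it needs the live vector_store):

```python
# Sketch: retrieving scored results through LangChain. With COSINE distance,
# smaller scores indicate closer semantic matches.
query = "How can Oracle Database help with semantic retrieval?"

# scored = vector_store.similarity_search_with_score(query, k=3)
# for doc, score in scored:
#     print(f"{score:.4f}  {doc.page_content}")
```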
Validation & Troubleshooting
- Validate that the model appears in USER_MINING_MODELS after DBMS_VECTOR.LOAD_ONNX_MODEL() or DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD().
- Confirm that VECTOR_EMBEDDING() returns a 384-dimensional embedding for the loaded model.
- If semantic ranking looks off, verify that the same model is used for both stored document embeddings and query embeddings.
- If using cloud loading, verify URI or PAR validity, bucket path, region, and credential privileges.
- When rerunning the notebook, check whether the model and demo tables already exist to avoid duplicate object errors.
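The first two checks above can be scripted with the run_sql helper. The function name, the 'ping' probe text, and the assumption that python-oracledb returns the VECTOR as a Python sequence are all illustrative:

```python
# Sketch: scripting the model-registration and embedding-dimension checks.
# Assumes `run_sql` from the walkthrough and a driver that fetches VECTOR
# values as a Python sequence (len() works on it).
def validate_setup(run_sql, model_name="ALL_MINILM_L12_V2"):
    """Return pass/fail results for the model and embedding checks."""
    results = {}

    df = run_sql(
        "SELECT COUNT(*) AS cnt FROM USER_MINING_MODELS WHERE MODEL_NAME = :m",
        params={"m": model_name},
        fetch=True,
    )
    results["model_registered"] = int(df.iloc[0]["CNT"]) == 1

    emb = run_sql(
        f"SELECT VECTOR_EMBEDDING({model_name} USING 'ping' AS DATA) AS emb FROM dual",
        fetch=True,
    )
    results["dimension_384"] = len(emb.iloc[0]["EMB"]) == 384
    return results

# print(validate_setup(run_sql))  # requires the live connection
```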
Frequently Asked Questions
Why load the model into Oracle instead of calling an external API?
Because Oracle can generate embeddings directly in SQL, which reduces external dependencies and keeps data and inference inside the same system boundary.
Why does the model need to be augmented?
Because Oracle must be able to accept raw text input directly. That requires tokenization and preprocessing logic to already be included in the ONNX graph.
What does VECTOR_EMBEDDING() do?
It invokes the registered model inside Oracle and returns the embedding vector for the input text.
What does the VECTOR column store?
It stores the numeric embedding representation produced by the model. In this example, the vectors are 384-dimensional FLOAT32 values.
How is semantic similarity computed?
This workflow uses VECTOR_DISTANCE() with cosine distance to compare the stored document vectors with the embedded query.
Can the model be reused by multiple applications?
Yes. Once registered and granted appropriately, the model can be invoked by any application that has access to the Oracle environment.
Can I load the model from cloud storage instead of a local mounted directory?
Yes. Oracle AI Database supports DBMS_VECTOR.LOAD_ONNX_MODEL_CLOUD() for models in Oracle Cloud Object Storage, with either a credential or a PAR URL.
Does LangChain move embeddings outside Oracle?
No. LangChain provides a higher-level interface, but the model execution and vector search still run in Oracle.
Does this replace a separate vector database?
For many use cases, yes. Oracle provides native vector storage and vector search directly in the database.
Related Documentation and Further Reading
- Companion notebook on GitHub
- Oracle Database 26ai documentation
- Oracle AI Vector Search User's Guide
- LOAD_ONNX_MODEL_CLOUD
- DBMS_VECTOR package reference
- VECTOR_EMBEDDING SQL reference
- VECTOR_DISTANCE SQL reference
- DBMS_CLOUD package reference
- Oracle JSON Developer's Guide
- Oracle Text Application Developer's Guide
- Oracle Spatial and Graph documentation
- Oracle Database Security Guide
- LangChain Oracle vector store integration