What We Think Is Cool About Amgix

We built Amgix to solve the headaches we experienced deploying hybrid search in production. Below are the architectural choices, features, and capabilities that we think are really cool and that make building with Amgix a great experience.


Architecture & Scalability

Deployment options: single container or full stack

For simple setups, Amgix One runs the API, encoder, RabbitMQ, and Qdrant in a single container with data persisted under /data — ideal for quick trials, edge devices, or small deployments. For larger environments, the same components run as separate services (e.g. via Docker Compose or Kubernetes), so you can scale API, encoder, queue, and storage independently without changing the API.

Independently scalable components

Amgix is built as a set of distinct services that can be scaled separately:

  • API nodes — handle all HTTP API requests (search, ingestion, management)
  • Encoder nodes — run the embedding workloads and can be deployed with different roles:

    • index for ingestion-only document embedding and indexing
    • query for online embedding used by search and other embedding consumers
  • Storage backends — scale independently based on the chosen engine (Qdrant, PostgreSQL, MariaDB), decoupled from API and encoder scaling

You can scale API, encoder, and storage layers independently: add API nodes for more concurrent requests, add index / query nodes for heavier ingestion or online embedding, and scale the storage backend according to its own clustering or replication model. This avoids the tight coupling where query capacity and storage capacity must scale together.

Message-queue-based embedding pipeline

All embedding work runs over RabbitMQ, not HTTP request chains. Encoder nodes can fail and recover independently without affecting the API. Bulk ingestion doesn't block the API or stack up HTTP timeouts. Backpressure is handled at the queue level. Other systems embed synchronously on the write path or require you to set up a separate inference service and wire it together yourself.
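The decoupling described above can be sketched in a few lines. This is a conceptual stand-in only: a bounded in-process `queue.Queue` plays the role of RabbitMQ, and the worker thread plays the role of an encoder node. A full queue blocks the producer (backpressure at the queue level) instead of stacking up HTTP timeouts on the write path.

```python
import queue
import threading

# Conceptual sketch: a bounded in-process queue stands in for RabbitMQ.
# The API side enqueues embedding jobs; an encoder worker consumes them.
jobs = queue.Queue(maxsize=100)   # bounded: backpressure at the queue level
results = []

def encoder_worker():
    while True:
        doc = jobs.get()
        if doc is None:           # sentinel: shut the worker down
            break
        results.append(f"embedded:{doc}")
        jobs.task_done()

worker = threading.Thread(target=encoder_worker)
worker.start()

for doc in ["doc-1", "doc-2", "doc-3"]:
    jobs.put(doc)                 # API side: enqueue and return immediately

jobs.put(None)
worker.join()
print(results)                    # → ['embedded:doc-1', 'embedded:doc-2', 'embedded:doc-3']
```

Because the API only enqueues, an encoder crash between `get` and `task_done` leaves the job unacknowledged in a real broker, so it is redelivered rather than lost.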

Adaptive model orchestration — no manual assignment

Amgix automatically decides which encoder nodes should load which models based on demand and available resources. You don't assign models to machines manually; the system rebalances the cluster as traffic patterns change, optimizing latency in real time without operator intervention.

This means you can run a heterogeneous cluster of encoder nodes with different hardware, and the system figures out where models should live and where to send each request. There is no manual assignment of models to nodes, no per-node configuration to keep in sync.

Encoder node specialization

Individual encoder nodes can be run in different service modes (index, query, or all), so you can dedicate some nodes to ingestion-only workloads and others to online search and model validation, without any changes on the client side.


Embedding & Vectors

Asymmetric model support (separate document/query models)

Each vector in a collection can use a different model for document embedding and query embedding. This supports asymmetric encoders — models like E5, BGE, and others that are specifically trained to embed documents and queries differently for better retrieval quality. You configure it once per collection; Amgix automatically applies the right model at the right time.
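A per-collection asymmetric setup might look like the sketch below. The field names (`document_model`, `query_model`) and the model names are illustrative assumptions, not Amgix's actual schema — the point is only that one named vector carries two models, selected by role.

```python
# Hypothetical collection config -- key names and model names are
# illustrative assumptions, not Amgix's actual schema. One named
# vector, two models: documents and queries are embedded by
# different halves of an asymmetric encoder pair.
collection = {
    "name": "articles",
    "vectors": {
        "dense": {
            "field": "content",
            "document_model": "my-doc-encoder",   # used during ingestion
            "query_model": "my-query-encoder",    # used during search
        }
    },
}

def model_for(cfg: dict, vector: str, role: str) -> str:
    """Pick the model for a vector and a role ('document' or 'query')."""
    return cfg["vectors"][vector][f"{role}_model"]

print(model_for(collection, "dense", "query"))  # → my-query-encoder
```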

Multiple named vector types per collection, per field

You define multiple named vectors in a single collection — for example:

  • wmtr on the name field for keyword/identifier search
  • dense on the content field for semantic search
  • splade on the content field for sparse semantic search

Each is an independent index. Any combination of dense, sparse, and tokenization-based vectors can coexist in the same collection. At query time you choose which vectors to use and at what weight.
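The example list above could be expressed as a schema roughly like the following. The keys and layout are assumptions for illustration, not Amgix's real schema format; what matters is that each named vector binds a strategy to a field and is indexed independently.

```python
# Sketch of a multi-vector collection mirroring the list above.
# Keys, layout, and model names are assumptions, not the real format.
collection = {
    "name": "parts-catalog",
    "vectors": {
        "wmtr":   {"field": "name",    "type": "sparse", "strategy": "wmtr"},
        "dense":  {"field": "content", "type": "dense",  "model": "some-dense-model"},
        "splade": {"field": "content", "type": "sparse", "model": "some-splade-model"},
    },
}

# Each named vector is an independent index over its field; several
# vectors may index the same field with different strategies.
fields_indexed = {v["field"] for v in collection["vectors"].values()}
print(sorted(fields_indexed))  # → ['content', 'name']
```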

Per-vector, per-field search weights at query time

At search time you specify which vector × field combinations to use and at what weight through a single vector_weights structure. The API takes care of running the right models and fusing scores on the server, so you can switch between keyword-only, semantic-only, and hybrid modes without changing the collection schema or hand-writing fusion logic.
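As a sketch, search payloads for the three modes might look like this. The exact request shape is an assumption; only the `vector_weights` idea comes from the description above.

```python
# Hypothetical search payloads -- the surrounding request shape is an
# assumption; only the vector_weights structure comes from the docs.
hybrid = {
    "collection": "parts-catalog",
    "query": "M8 hex bolt",
    "vector_weights": {      # vector name -> weight; the server fuses scores
        "wmtr": 0.5,         # keyword/identifier signal on the name field
        "dense": 0.3,        # dense semantic signal on the content field
        "splade": 0.2,       # sparse semantic signal on the content field
    },
}

# Switching modes means changing weights, not the collection schema:
keyword_only  = {**hybrid, "vector_weights": {"wmtr": 1.0}}
semantic_only = {**hybrid, "vector_weights": {"dense": 1.0}}
```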

Bring your own vectors (BYOV)

At document upload time, you can include pre-computed dense or sparse vectors alongside your document content; at query time, you can also supply pre-computed query vectors. Amgix uses them as-is and runs them through the same storage, fusion, and ranking pipeline as internally generated embeddings, so you can mix external embedding APIs or proprietary models with Amgix-managed models without changing how you search.
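A BYOV upload might carry vectors like the sketch below. The field names are illustrative assumptions, not the real upload format; the point is that externally computed dense and sparse vectors travel alongside the document content.

```python
# BYOV sketch: pre-computed vectors supplied with the document.
# Field names are illustrative assumptions, not the real upload format.
doc = {
    "id": "part-42",
    "content": "Hex bolt, stainless, M8",
    "vectors": {
        "dense": [0.12, -0.03, 0.88],          # from an external embedding API
        "splade": {"hex": 1.4, "bolt": 2.1},   # sparse: token -> weight
    },
}
# These are stored as-is and flow through the same fusion and
# ranking pipeline as internally generated embeddings.
```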

WMTR — purpose-built for identifier-heavy data

Weighted Multilevel Token Representation (WMTR) is Amgix's own tokenizer, designed specifically for data that mixes natural language with dense identifiers and symbols. It balances language-aware signals with structure- and character-level signals so that both human-readable text and “ugly” identifiers stay searchable:

  • Part numbers and product codes (e.g. 12LP'-x03/5-XL)
  • SKUs, serial numbers, and catalog identifiers
  • Technical, scientific, and legal text with many abbreviations and symbols
  • Mixed-language data

Both standard BM25 and pure semantic retrieval tend to perform poorly on this kind of data, either over-focusing on words or ignoring the fine-grained structure of identifiers. WMTR was built to close that gap.
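To make the "multilevel" idea concrete, here is a toy tokenizer in the spirit of WMTR. The real WMTR algorithm and its weighting are not documented here; this sketch only shows why identifier-heavy strings benefit from word-, piece-, and character-level signals at once.

```python
import re

# Illustrative only: a toy multilevel tokenizer, NOT the real WMTR.
def multilevel_tokens(text: str) -> dict[str, list[str]]:
    words = text.lower().split()
    # structural pieces: split identifiers on punctuation boundaries
    pieces = [p for w in words for p in re.split(r"[^0-9a-z]+", w) if p]
    # character trigrams keep "ugly" identifiers matchable on partial input
    trigrams = [p[i:i + 3] for p in pieces for i in range(len(p) - 2)]
    return {"word": words, "piece": pieces, "trigram": trigrams}

tokens = multilevel_tokens("bolt 12LP'-x03/5-XL")
print(tokens["piece"])    # → ['bolt', '12lp', 'x03', '5', 'xl']
```

A pure word-level index would treat `12LP'-x03/5-XL` as one opaque token; the piece and trigram levels keep its internal structure searchable.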

Multiple sparse vector strategies in one collection

You can run full-text, trigrams, whitespace, WMTR, and SPLADE-style sparse models across different fields in the same collection. Each uses a different tokenization or modeling strategy appropriate to its field.


Storage & Data Model

True storage backend agnosticism

Qdrant, PostgreSQL, and MariaDB are supported behind the same API. The storage backend is selected from a connection URL at startup. Collection configuration, documents, vectors, queue, and locking all work identically across backends.

This means you can choose Qdrant, PostgreSQL, or MariaDB based on operational, performance, or regulatory needs while keeping the API and behavior the same — Amgix is not tightly coupled to a single storage engine.

Built-in, backend-native ingestion queue

The asynchronous processing queue is stored in the same database backend as your documents. You don't need to run a separate service (like Redis or Kafka) just to manage ingestion. This keeps the stack simpler, guarantees durability across encoder restarts, and avoids silently dropped documents without adding operational overhead.
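The idea of a backend-native queue can be sketched with SQLite standing in for PostgreSQL or MariaDB. This is not Amgix's real schema or claim logic — a real multi-node backend would claim jobs atomically (e.g. with `SELECT ... FOR UPDATE` or an atomic `UPDATE`) — but it shows how pending work survives restarts because it lives in the same durable database as the documents.

```python
import sqlite3

# Sketch: an ingestion queue kept in the same SQL database as the
# documents (SQLite stands in for PostgreSQL/MariaDB). Hypothetical
# schema; a real backend would claim jobs atomically.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, doc TEXT, status TEXT)")
db.execute("INSERT INTO queue (doc, status) VALUES ('doc-1', 'pending')")

def claim_next():
    """Claim the next pending job, or return None if the queue is empty."""
    row = db.execute(
        "SELECT id, doc FROM queue WHERE status = 'pending' LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE queue SET status = 'processing' WHERE id = ?", (row[0],))
    return row[1]

print(claim_next())  # → doc-1
```

Because the job row is ordinary durable data, an encoder restart never loses it; the job is simply still there to be claimed or retried.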


Ingestion & Reliability

Per-document status tracking

After uploading a document asynchronously, you can query its processing status at any point. A synchronous upload mode is also available when you need immediate consistency.

Smarter retry logic

The ingestion pipeline distinguishes between transient infrastructure issues and fundamentally bad data. It retries temporary problems to prevent data loss, but fails fast on invalid inputs so you don't end up with retry storms overloading your system.
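The policy above can be sketched as follows. The exception classes are hypothetical stand-ins; Amgix's real error types are not shown here. The shape is what matters: transient failures get bounded retries, invalid input fails on the first attempt.

```python
# Sketch of the retry policy described above; error classes are
# hypothetical stand-ins for Amgix's real exception types.

class TransientError(Exception):
    """Temporary infrastructure failure, e.g. storage briefly unreachable."""

class InvalidDocument(Exception):
    """Fundamentally bad data that will never succeed."""

def process_with_retry(process, doc, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return process(doc)
        except TransientError:
            if attempt == max_attempts:
                raise              # retries exhausted: surface the failure
        except InvalidDocument:
            raise                  # fail fast: retrying can never help
```

Failing fast on `InvalidDocument` is what prevents retry storms: a bad document consumes one attempt, not `max_attempts` per worker per re-run.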

Distributed locking for multi-node correctness

Built-in distributed locking ensures each document is written exactly once. This avoids race conditions and duplicate writes in multi-node deployments.

Timestamp-based deduplication

Each document carries a UTC timestamp. If a newer version of the document is already indexed, an incoming upsert is skipped. You can re-index your entire dataset safely — concurrent workers and re-runs don't overwrite newer data with older versions.
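The skip rule can be sketched in a few lines; here a plain dict stands in for the storage backend, and the comparison logic mirrors the description above.

```python
from datetime import datetime, timezone

# Sketch of timestamp-based deduplication: an upsert is skipped when
# the stored copy already carries a newer (or equal) UTC timestamp.
# A dict stands in for the storage backend.
index: dict[str, dict] = {}

def upsert(doc: dict) -> bool:
    """Return True if the document was written, False if skipped."""
    stored = index.get(doc["id"])
    if stored and stored["timestamp"] >= doc["timestamp"]:
        return False               # older or duplicate version: skip
    index[doc["id"]] = doc
    return True

newer = {"id": "a", "timestamp": datetime(2024, 5, 2, tzinfo=timezone.utc)}
older = {"id": "a", "timestamp": datetime(2024, 5, 1, tzinfo=timezone.utc)}

upsert(newer)
print(upsert(older))   # → False: re-runs never overwrite newer data
```

This is why full re-indexing is safe regardless of worker ordering: whichever worker arrives second with an older timestamp is a no-op.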


Server-side weighted fusion

Fusion runs on the server: the client sends a single search request (optionally with per-vector weights) and gets back one fused, ranked result list, without having to hand-roll fusion logic or combine multiple calls.
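A minimal sketch of weighted fusion, assuming a simple weighted sum: per-vector result lists are combined into one ranked list. Amgix's actual fusion may normalize or combine scores differently; this only illustrates the server-side idea the client is spared from implementing.

```python
# Sketch of server-side weighted fusion (assumed weighted sum).
def fuse(results_by_vector, weights):
    """results_by_vector: {vector: {doc_id: score}} -> ranked [(doc_id, score)]."""
    fused = {}
    for vector, scores in results_by_vector.items():
        w = weights.get(vector, 0.0)
        for doc_id, score in scores.items():
            fused[doc_id] = fused.get(doc_id, 0.0) + w * score
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)

ranked = fuse(
    {"wmtr":  {"d1": 0.9, "d2": 0.2},
     "dense": {"d2": 0.8, "d3": 0.6}},
    {"wmtr": 0.5, "dense": 0.5},
)
print(ranked)  # → [('d2', 0.5), ('d1', 0.45), ('d3', 0.3)]
```

Note how `d2` wins only because it scores on both vectors — exactly the cross-signal behavior hybrid search is after, computed once on the server instead of in every client.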