System Info
Hello,
Hugging Face Text Embeddings Inference version: 1.8.3
Image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.3
Model: janni-t/qwen3-embedding-0.6b-tei-onnx
The container stops in the "Warming up model" phase and never reaches the ready state.
My docker-compose configuration:
```yaml
services:
  embedding:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.3
    ports:
      - "8080:80"
    volumes:
      - embedding_cache:/data
    command: ["--model-id", "janni-t/qwen3-embedding-0.6b-tei-onnx", "--pooling", "mean", "--auto-truncate"]
```
Logs:
```
2025-11-16 14:48:19.055 | 2025-11-16T11:48:19.053494Z INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "jan**-*/*****-*********-*.**-***-*nnx", revision: None, tokenization_workers: None, dtype: None, pooling: Some(Mean), max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: true, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "af8152803796", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-11-16 14:48:19.135 | 2025-11-16T11:48:19.135066Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2025-11-16 14:48:19.135 | 2025-11-16T11:48:19.135097Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2025-11-16 14:48:19.588 | 2025-11-16T11:48:19.588526Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2025-11-16 14:48:19.747 | 2025-11-16T11:48:19.747345Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2025-11-16 14:48:19.904 | 2025-11-16T11:48:19.903960Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2025-11-16 14:48:20.057 | 2025-11-16T11:48:20.056866Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2025-11-16 14:48:20.218 | 2025-11-16T11:48:20.218312Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2025-11-16 14:48:20.372 | 2025-11-16T11:48:20.372777Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2025-11-16 14:48:20.525 | 2025-11-16T11:48:20.525037Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2025-11-16 14:48:20.525 | 2025-11-16T11:48:20.525299Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2025-11-16 14:48:20.525 | 2025-11-16T11:48:20.525332Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
2025-11-16 14:48:20.525 | 2025-11-16T11:48:20.525505Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:72: Model artifacts downloaded in 1.390442692s
2025-11-16 14:48:20.802 | 2025-11-16T11:48:20.801853Z WARN text_embeddings_router: router/src/lib.rs:191: Could not find a Sentence Transformers config
2025-11-16 14:48:20.802 | 2025-11-16T11:48:20.802231Z WARN text_embeddings_router: router/src/lib.rs:205: The input sequences will be truncated to 16384 tokens even if the model `max_input_length` is greater than the provided `--max-batch-tokens` (32768 > 16384), as `--auto-truncate` is enabled.
2025-11-16 14:48:20.802 | 2025-11-16T11:48:20.802420Z INFO text_embeddings_router: router/src/lib.rs:216: Maximum number of tokens per request: 16384
2025-11-16 14:48:20.802 | 2025-11-16T11:48:20.802747Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 20 tokenization workers
2025-11-16 14:48:21.436 | 2025-11-16T11:48:21.436206Z INFO text_embeddings_router: router/src/lib.rs:264: Starting model backend
2025-11-16 14:48:21.438 | 2025-11-16T11:48:21.437916Z INFO text_embeddings_backend: backends/src/lib.rs:627: Downloading `model.onnx`
2025-11-16 14:48:21.438 | 2025-11-16T11:48:21.438898Z INFO text_embeddings_backend: backends/src/lib.rs:641: Downloading `model.onnx_data`
2025-11-16 14:48:21.439 | 2025-11-16T11:48:21.439075Z INFO text_embeddings_backend: backends/src/lib.rs:381: Model ONNX weights downloaded in 1.188241ms
2025-11-16 14:48:21.439 | 2025-11-16T11:48:21.439090Z INFO text_embeddings_backend: backends/src/lib.rs:389: Downloading `tokenizer_config.json`
2025-11-16 14:48:27.205 | 2025-11-16T11:48:27.205815Z INFO text_embeddings_router: router/src/lib.rs:282: Warming up model
```
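Side note: the truncation WARN above is presumably unrelated to the hang; it appears because the model's `max_input_length` (32768) exceeds the default `--max-batch-tokens` (16384), so with `--auto-truncate` inputs are cut to 16384 tokens. If full-length inputs matter, the budget could be raised in the service command, e.g.:

```yaml
    command: ["--model-id", "janni-t/qwen3-embedding-0.6b-tei-onnx", "--pooling", "mean", "--auto-truncate", "--max-batch-tokens", "32768"]
```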
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
Create a docker-compose.yml like this:
```yaml
services:
  embedding:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.3
    ports:
      - "8080:80"
    volumes:
      - embedding_cache:/data
    command: ["--model-id", "janni-t/qwen3-embedding-0.6b-tei-onnx", "--pooling", "mean", "--auto-truncate"]

volumes:
  embedding_cache:
    driver: local
```
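To tell "ready" apart from "stuck", I poll TEI's `/health` endpoint after `docker compose up`; once the router logs `Ready`, it answers 200. A minimal sketch (the `wait_for_ready` helper is mine; port 8080 is the host port mapped in the compose file above):

```python
import time
import urllib.error
import urllib.request

def wait_for_ready(url: str, timeout: float = 300.0, interval: float = 5.0) -> bool:
    """Poll the TEI /health endpoint until it answers 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not accepting connections yet; keep polling
        time.sleep(interval)
    return False

# Example: wait_for_ready("http://localhost:8080/health") returns True once
# the router logs "Ready"; with the CPU/ONNX setup above it never does.
```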
Expected behavior
This is what happens when I use the regular Qwen/Qwen3-Embedding-0.6B model with the CUDA image: warm-up completes and the server reaches "Ready".
```yaml
services:
  embedding:
    image: ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.3
    ports:
      - "8080:80"
    volumes:
      - embedding_cache:/data
    command: ["--model-id", "Qwen/Qwen3-Embedding-0.6B", "--dtype", "float16", "--max-client-batch-size", "512", "--max-batch-tokens", "32768"]
    environment:
      - USE_FLASH_ATTENTION=True
    gpus: all
```
```
2025-11-13 15:06:25 | 2025-11-13T12:06:25.650647Z INFO text_embeddings_router: router/src/main.rs:203: Args { model_id: "Qwe*/*****-*********-0.6B", revision: None, tokenization_workers: None, dtype: Some(Float16), pooling: None, max_concurrent_requests: 512, max_batch_tokens: 32768, max_batch_requests: None, max_client_batch_size: 512, auto_truncate: false, default_prompt_name: None, default_prompt: None, dense_path: None, hf_api_token: None, hf_token: None, hostname: "76f54a599a7b", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, disable_spans: false, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", prometheus_port: 9000, cors_allow_origin: None }
2025-11-13 15:06:25 | 2025-11-13T12:06:25.730681Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:42: Starting download
2025-11-13 15:06:25 | 2025-11-13T12:06:25.730710Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `1_Pooling/config.json`
2025-11-13 15:06:25 | 2025-11-13T12:06:25.731065Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_bert_config.json`
2025-11-13 15:06:33 | 2025-11-13T12:06:33.737414Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_roberta_config.json`
2025-11-13 15:06:39 | 2025-11-13T12:06:39.970555Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_distilbert_config.json`
2025-11-13 15:06:47 | 2025-11-13T12:06:47.979089Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_camembert_config.json`
2025-11-13 15:06:55 | 2025-11-13T12:06:55.981490Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_albert_config.json`
2025-11-13 15:07:03 | 2025-11-13T12:07:03.991526Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlm-roberta_config.json`
2025-11-13 15:07:10 | 2025-11-13T12:07:10.376962Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `sentence_xlnet_config.json`
2025-11-13 15:07:18 | 2025-11-13T12:07:18.379599Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config_sentence_transformers.json`
2025-11-13 15:07:18 | 2025-11-13T12:07:18.379646Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `config.json`
2025-11-13 15:07:18 | 2025-11-13T12:07:18.379654Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:18: Downloading `tokenizer.json`
2025-11-13 15:07:18 | 2025-11-13T12:07:18.380190Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:72: Model artifacts downloaded in 56.042275076s
2025-11-13 15:07:18 | 2025-11-13T12:07:18.661817Z WARN text_embeddings_router: router/src/lib.rs:191: Could not find a Sentence Transformers config
2025-11-13 15:07:18 | 2025-11-13T12:07:18.661844Z INFO text_embeddings_router: router/src/lib.rs:216: Maximum number of tokens per request: 32768
2025-11-13 15:07:18 | 2025-11-13T12:07:18.662172Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:38: Starting 20 tokenization workers
2025-11-13 15:07:19 | 2025-11-13T12:07:19.258831Z INFO text_embeddings_router: router/src/lib.rs:264: Starting model backend
2025-11-13 15:07:19 | 2025-11-13T12:07:19.260767Z INFO text_embeddings_backend: backends/src/lib.rs:586: Downloading `model.safetensors`
2025-11-13 15:07:19 | 2025-11-13T12:07:19.261868Z INFO text_embeddings_backend: backends/src/lib.rs:421: Model weights downloaded in 1.104616ms
2025-11-13 15:07:19 | 2025-11-13T12:07:19.261889Z INFO download_dense_modules: text_embeddings_backend: backends/src/lib.rs:685: Downloading `modules.json`
2025-11-13 15:07:19 | 2025-11-13T12:07:19.262114Z INFO text_embeddings_backend: backends/src/lib.rs:433: Dense modules downloaded in 229.358µs
2025-11-13 15:07:19 | 2025-11-13T12:07:19.904902Z INFO text_embeddings_backend_candle: backends/candle/src/lib.rs:506: Starting FlashQwen3 model on Cuda(CudaDevice(DeviceId(1)))
2025-11-13 15:10:10 | 2025-11-13T12:10:10.676754Z INFO text_embeddings_router: router/src/lib.rs:282: Warming up model
2025-11-13 15:10:13 | 2025-11-13T12:10:13.483120Z WARN text_embeddings_router: router/src/lib.rs:341: Invalid hostname, defaulting to 0.0.0.0
2025-11-13 15:10:13 | 2025-11-13T12:10:13.485628Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1852: Starting HTTP server: 0.0.0.0:80
2025-11-13 15:10:13 | 2025-11-13T12:10:13.485647Z INFO text_embeddings_router::http::server: router/src/http/server.rs:1853: Ready
```