MTEB (Massive Text Embedding Benchmark) is a comprehensive evaluation framework designed to test how well text embedding models perform across a wide range of natural language processing tasks.
## Best Sentence-to-Vector-to-Sentence (S-V-S) Pairings by Dimension
### 384D: Efficiency-Focused

**Best Pairing:** all-MiniLM-L6-v2 + vec2text-base

```python
# Sentence → Vector (384D)
encoder = "sentence-transformers/all-MiniLM-L6-v2"
```

- MTEB Avg: 63.05
- Speed: 14,200 sentences/sec on CPU
- Size: 80MB

**Vector → Sentence**

```python
decoder = "jxm/vec2text-base-embeddings-384d"
```

- ROUGE-L: ~0.42
- Cosine recovery: ~0.85
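"Cosine recovery" here means the cosine similarity between the original embedding and the embedding of the reconstructed sentence. A minimal sketch of that metric with NumPy, using toy vectors in place of real encoder output:

```python
import numpy as np

def cosine_recovery(original_emb: np.ndarray, reconstructed_emb: np.ndarray) -> float:
    """Cosine similarity between the original embedding and the embedding
    of the reconstructed sentence; 1.0 means perfect recovery."""
    a = original_emb / np.linalg.norm(original_emb)
    b = reconstructed_emb / np.linalg.norm(reconstructed_emb)
    return float(a @ b)

# Toy 384-d vectors standing in for encoder output
rng = np.random.default_rng(0)
orig = rng.normal(size=384)
recon = orig + 0.1 * rng.normal(size=384)  # a slightly perturbed "reconstruction"
print(round(cosine_recovery(orig, recon), 3))
```

In practice you would encode both the source sentence and the decoder's output with the same model (e.g. all-MiniLM-L6-v2) and compare those two vectors.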
### 768D: Balanced Performance (Your Current Target)

**Best Pairing:** GTR-T5-base + vec2text-gtr-base

```python
# Sentence → Vector (768D)
encoder = "sentence-transformers/gtr-t5-base"
```

- MTEB Avg: 66.13
- Particularly strong on retrieval tasks
- Size: 218MB

**Vector → Sentence**

```python
decoder = "jxm/vec2text-gtr-base"
```

- ROUGE-L: ~0.68
- Cosine recovery: ~0.91
- BERTScore: ~0.89
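ROUGE-L scores a reconstruction by the longest common subsequence (LCS) it shares with the reference. A self-contained sketch of the token-level F-measure variant, with no external libraries:

```python
def lcs_len(a: list, b: list) -> int:
    """Length of the longest common subsequence, via dynamic programming."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[-1][-1]

def rouge_l(reference: str, candidate: str) -> float:
    """ROUGE-L F1 between whitespace-tokenized reference and candidate."""
    ref, cand = reference.split(), candidate.split()
    lcs = lcs_len(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)

print(rouge_l("the cat sat on the mat", "the cat sat on a mat"))
```

Published numbers usually come from the `rouge-score` package with stemming, so expect small differences from this bare-bones version.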
### 1024D: High Performance

**Best Pairing:** E5-large-v2 + custom vec2text

```python
# Sentence → Vector (1024D)
encoder = "intfloat/e5-large-v2"
```

- MTEB Avg: 69.78
- Excellent multilingual support
- Size: 1.34GB

**Vector → Sentence**

Note: there is no official vec2text model for 1024D yet. Two options:

1. Project to 768D and use vec2text-gtr-base
2. Train a custom vec2text model

```python
# Pseudocode: `learn_projection` is a placeholder, not a library call
projection_matrix = learn_projection(1024, 768)
```
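One way to realize the `learn_projection` placeholder is ordinary least squares on paired embeddings: embed the same texts with both the 1024-d encoder and a 768-d encoder, then solve for the matrix mapping one space onto the other. A sketch with synthetic stand-ins for the paired embeddings (the helper name and shapes are assumptions, not an existing API):

```python
import numpy as np

def learn_projection(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Least-squares linear map W such that src @ W ≈ tgt.
    src: (n, 1024) source embeddings; tgt: (n, 768) target embeddings."""
    W, *_ = np.linalg.lstsq(src, tgt, rcond=None)
    return W  # shape (1024, 768)

# Synthetic stand-ins for paired embeddings of the same sentences
rng = np.random.default_rng(42)
src = rng.normal(size=(2000, 1024))
true_map = rng.normal(size=(1024, 768)) / 32
tgt = src @ true_map + 0.01 * rng.normal(size=(2000, 768))

W = learn_projection(src, tgt)
projected = src @ W  # 1024-d vectors projected into the 768-d space
print(projected.shape)  # (2000, 768)
```

How well vec2text-gtr-base decodes such projected vectors depends on how closely the two embedding spaces align; expect lower fidelity than a natively trained decoder.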
### 1536D: Premium Quality

**Best Pairing:** an open-source alternative to OpenAI ada-002

Option 1: GTR-T5-XL

```python
encoder = "sentence-transformers/gtr-t5-xl"
```

- MTEB Avg: 68.42
- Size: 1.24GB

Option 2: Cohere Embed v3 (if an API is acceptable)

- MTEB Avg: 64.47

**Vector → Sentence**

There is currently no vec2text model for 1536D. Recommendation: train a custom decoder using your approach.
### 2048D: Maximum Expressiveness

**Best Pairing:** custom ensemble approach

No standard models exist at 2048D; the recommendation is to concatenate several encoders and truncate:

```python
import numpy as np
from sentence_transformers import SentenceTransformer

class Ensemble2048D:
    def __init__(self):
        # 768 + 1024 + 1024 = 2816 dims before truncation to 2048
        self.models = [
            SentenceTransformer("sentence-transformers/gtr-t5-xl"),  # 768D
            SentenceTransformer("intfloat/e5-large-v2"),             # 1024D
            SentenceTransformer("BAAI/bge-large-en"),                # 1024D
        ]

    def encode(self, text):
        embeddings = [model.encode(text) for model in self.models]
        return np.concatenate(embeddings)[:2048]
```
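The concatenate-and-truncate step is easy to sanity-check with stub encoders (random vectors standing in for real models), which keeps the dimension arithmetic visible: 768 + 1024 + 1024 = 2816 dims, truncated to 2048:

```python
import numpy as np

rng = np.random.default_rng(7)

# Stub encoders returning fixed-dimension random vectors, standing in for
# gtr-t5-xl (768D), e5-large-v2 (1024D), and bge-large-en (1024D)
def stub_encoder(dim):
    return lambda text: rng.normal(size=dim)

models = [stub_encoder(768), stub_encoder(1024), stub_encoder(1024)]

def encode_2048(text: str) -> np.ndarray:
    embeddings = [encode(text) for encode in models]
    combined = np.concatenate(embeddings)  # 2816 dims
    return combined[:2048]                 # truncation drops 768 dims of the last model

vec = encode_2048("hello world")
print(vec.shape)  # (2048,)
```

Note that truncation silently discards most of the last model's embedding; a learned 2816→2048 projection would preserve more information at the cost of a training step.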
## Comprehensive Benchmark Comparison

_(Values estimated based on model architecture.)_
## Key Findings for Your VMM Architecture
## Implementation for Multi-Dimensional Support

```python
class MultiDimS2V2S:
    """Support multiple embedding dimensions, with a projection fallback
    for dimensions that lack a native decoder."""

    def __init__(self):
        self.encoders = {
            384: "all-MiniLM-L6-v2",
            768: "gtr-t5-base",
            1024: "e5-large-v2",
            1536: "gtr-t5-xl",
        }
        self.decoders = {
            384: "vec2text-base-384d",
            768: "vec2text-gtr-base",
            # Fallback for higher dims: project down, then decode at 768D
            1024: self.project_and_decode,
            1536: self.project_and_decode,
        }
        # Learned projection matrices, keyed like "1024_to_768"
        self.projections = {}

    def project_and_decode(self, embedding, target_dim=768):
        """Project the embedding to 768D, then decode with the 768D decoder."""
        projection = self.projections[f"{len(embedding)}_to_{target_dim}"]
        projected = projection @ embedding
        # The named 768D decoder then inverts the projected vector
        return self.decoders[768], projected
```
## Recommendation for Your Project
Given your focus on 768D and need for high-quality reconstruction:
The GTR-T5 family provides the best balance of reconstruction quality (ROUGE-L ≈ 0.68, cosine recovery ≈ 0.91), benchmark performance (66.13 MTEB avg, especially strong on retrieval), and model size (218MB).
## 🧪 What MTEB Measures

MTEB evaluates models on 40+ tasks (in English v2) across categories such as retrieval, classification, clustering, reranking, pair classification, semantic textual similarity (STS), and summarization.

Each model gets a composite score (MTEB avg) based on its performance across these tasks. This score helps compare generalist embedders like NV-Embed-v2 or STELLA across use cases.
## 🏆 Why It Matters
You can explore the official MTEB leaderboard to see how models stack up.
† MPS notes (Apple GPU on Mac): PyTorch’s MPS backend accelerates Transformers on Apple silicon using Metal. Most HF/SBERT models “just work” on mps, but a few ops may fall back to CPU (set PYTORCH_ENABLE_MPS_FALLBACK=1). Conversion to Core ML is also possible (via Optimum exporters) if you want ANE/GPU deployment.
## Extra notes you’ll care about
Want this as a CSV (with fp32/fp16/int8 memory columns) or filtered to 384 / 768 / 1024 only? I can generate it directly.
### High-confidence pairs (evaluated in papers)

- Vec2Text (EMNLP’23) shows very strong reconstructions (e.g., ~92% exact match on 32-token snippets with sequence-beam search); GTR-T5 models output 768-d vectors. (ACL Anthology; Hugging Face)
- The same paper reports solid exact-match rates on ada-002; the legacy ada-002 uses 1536 dims. (ACL Anthology; OpenAI Community)
- ZSInvert evaluates on Contriever and recovers semantically faithful text (F1 > 50, cosine > 0.90) without per-encoder training; Contriever exports 768-d vectors. (ar5iv; Hugging Face)
- The same ZSInvert study includes GTR: zero-shot, high cosine similarity. (ar5iv)
- GTE is one of the evaluated encoders; large-en-v1.5 outputs 1024-d vectors. (ar5iv; Hugging Face)
- Also explicitly listed among ZSInvert’s encoders; the model card notes 1536-d embeddings. (ar5iv; Hugging Face)
- ALGEN trains a local decoder (a FLAN-T5 decoder) and aligns it to victim embedders; with ~1k alignment samples it gets strong ROUGE/cosine on T5 embeddings. Sentence-T5 uses 768-d vectors. (ACL Anthology; Hugging Face)
- Reported ROUGE-L ≈ 38 and cosine ≈ 0.89 with 1k leaked samples. (ACL Anthology)
- ROUGE-L ≈ 43, cosine ≈ 0.94 at 1k samples. (ACL Anthology)
- ROUGE-L ≈ 40, cosine ≈ 0.92 at 1k samples. (ACL Anthology)
- ROUGE-L ≈ 41 and cosine ≈ 0.93 at 1k samples. (ACL Anthology)
- ROUGE-L ≈ 41 and cosine ≈ 0.91 at 1k samples; text-embedding-3-large uses 3072 dims. (ACL Anthology; OpenAI Platform)
- GEIA reconstructs ordered sequences across the SBERT family (the paper evaluates SBERT/SimCSE/ST5/MPNet). (ACL Anthology)
- The same GEIA paper shows good lexical overlap (ROUGE-1 ≈ 0.59–0.72; BLEU-1 ≈ 0.35–0.46 across victims). SimCSE-RoBERTa exports 768-d embeddings. (ACL Anthology; Hugging Face)
- MPNet is one of GEIA’s evaluated victims; all-mpnet-base-v2 is 768-d. (ACL Anthology; Hugging Face)
- Also explicitly evaluated as a victim model in GEIA. (ACL Anthology)
### Solid “works in practice” pairs (generalizable decoders, widely used encoders)

- ZSInvert is universal and tested across BERT/T5-style encoders; E5-large-v2 is a popular 1024-d BERT-style encoder, making it a good match in the same family. (ar5iv; Hugging Face)
- ALGEN’s method is encoder-agnostic once you align a small leaked set; encoders like Contriever are typical BERT-style and align well under ALGEN’s linear map (ALGEN framework + Contriever dim). (ACL Anthology; Hugging Face)
- ZSInvert is embedding-agnostic; STELLA 400M v5 offers multiple output dims (e.g., 768/1024), so you can pick a dimension that matches your store. (ar5iv; Hugging Face)
- GEIA generalizes across SBERT/SimCSE-style encoders; all-MiniLM-L6-v2 (384-d) is a compact SBERT model often used in the wild. (ACL Anthology; Hugging Face)
### Quick pairing tips
- Vec2Text → best when you can afford per-encoder training; it yields _exact_ matches on some encoders/datasets. ACL Anthology
- ZSInvert → best when you want _zero-shot_ support across many encoders with high semantic fidelity. ar5iv
- ALGEN → best when you can get ~1–1000 leaked (text,embedding) pairs for the victim and want strong results fast via linear alignment + a single local decoder. ACL Anthology
- GEIA → good general generative baseline across classic SBERT/SimCSE/T5/MPNet families. ACL Anthology
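The alignment step in the ALGEN tip above can be sketched in a few lines. This is a simplified orthogonal-Procrustes variant with synthetic vectors, not ALGEN's actual linear map or its FLAN-T5 decoder: given leaked (text, embedding) pairs, embed the same texts with your local encoder, then solve for the rotation carrying victim embeddings into the local decoder's space.

```python
import numpy as np

def align_procrustes(victim: np.ndarray, local: np.ndarray) -> np.ndarray:
    """Orthogonal Procrustes: rotation R minimizing ||victim @ R - local||_F.
    Assumes both spaces share the same dimensionality (e.g., 768)."""
    U, _, Vt = np.linalg.svd(victim.T @ local)
    return U @ Vt

rng = np.random.default_rng(1)
local = rng.normal(size=(1000, 768))         # local embeddings of ~1k leaked texts
true_rot = np.linalg.qr(rng.normal(size=(768, 768)))[0]
victim = local @ true_rot.T                  # victim embeddings: a rotated copy

R = align_procrustes(victim, local)
aligned = victim @ R                         # now in the local decoder's space
print(np.allclose(aligned, local, atol=1e-6))
```

Real embedders differ by more than a rotation, so a general least-squares map (as ALGEN uses) typically fits better than the orthogonal constraint shown here.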