From the ground up. Connected to what you're building.
You give it text — a word, sentence, paragraph, whatever. It returns a list of numbers. Your model (Qwen3-Embedding-0.6B) returns exactly 1,024 numbers. That's it. That list of 1,024 floats IS the embedding.
"Thompson Sampling" → [0.0234, -0.1891, 0.0012, ..., 0.0847] (1,024 values)
"Banana split" → [0.1102, 0.0034, -0.2201, ..., -0.0391] (1,024 values)
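To make the interface concrete, here's a minimal sketch. The `sentence_transformers` lines in the comment are an assumption about how you'd load this particular model; the runnable stub below only reproduces the contract (text in, 1,024 floats out), not real semantics.

```python
# With the real model, loading would look something like this (assumed API):
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")
#   vec = model.encode("Thompson Sampling")
# The stub below only mimics the interface, so the sketch is self-contained.
import math
import random
import zlib

def embed_stub(text: str, dim: int = 1024) -> list[float]:
    """Deterministic pseudo-embedding keyed on the text. NOT a real model:
    the numbers carry no meaning, only the shape contract is right."""
    rng = random.Random(zlib.crc32(text.encode()))
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]  # unit-normalized, as embedding models usually do

vec = embed_stub("Thompson Sampling")
print(len(vec))  # 1024: the embedding is just this list of floats
```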
Each of those 1,024 positions is a dimension: not a "chunk" of the text, but a dimension of meaning. During training the model might have learned, say, that dimension 47 correlates with "technical vs casual," dimension 312 with "concrete vs abstract," dimension 891 with "positive vs negative." But you don't get to pick what each dimension means; the model discovered them on its own, and most don't map to anything a human would name.
The text goes through the transformer layers — attention heads, feedforward networks, the whole stack. The final hidden state (the last layer's output) gets pooled (usually averaged across all token positions) into a single 1,024-dim vector. That vector is the embedding.
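The pooling step is simple enough to show directly. A toy mean-pool over the last layer's token vectors (3 tokens, 4 dimensions here; the real model uses 1,024):

```python
def mean_pool(hidden_states: list[list[float]]) -> list[float]:
    """Average the last layer's hidden states across token positions.
    hidden_states: T token vectors, each of length H."""
    T, H = len(hidden_states), len(hidden_states[0])
    return [sum(tok[d] for tok in hidden_states) / T for d in range(H)]

# Toy last-layer output: 3 tokens, 4-dim hidden states.
tokens = [[1.0, 0.0, 2.0, 0.0],
          [3.0, 0.0, 0.0, 0.0],
          [2.0, 0.0, 1.0, 0.0]]
print(mean_pool(tokens))  # [2.0, 0.0, 1.0, 0.0]
```

However many tokens go in, one fixed-length vector comes out. That's why a word and a whole paragraph both produce exactly 1,024 numbers.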
The model was trained with a contrastive objective: "these two texts are similar, push their vectors close together. These two texts are different, push them apart." After millions of such comparisons, the 1,024 dimensions organize themselves into a space where cosine similarity between vectors ≈ semantic similarity between texts.
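The push-close/push-apart objective can be illustrated with a toy softmax-based contrastive loss. This is an InfoNCE-style sketch, not the model's actual training code; the temperature value is an arbitrary choice for illustration.

```python
import math

def contrastive_loss(sim_pos: float, sims_neg: list[float],
                     temperature: float = 0.05) -> float:
    """Toy contrastive loss: low when the positive pair's similarity is high
    relative to the negatives (softmax over similarity scores)."""
    logits = [sim_pos / temperature] + [s / temperature for s in sims_neg]
    m = max(logits)  # subtract max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

# Pulling the positive pair closer (higher similarity) lowers the loss:
close = contrastive_loss(0.9, [0.1, 0.2])
far = contrastive_loss(0.3, [0.1, 0.2])
print(close < far)  # True
```

Gradient descent on a loss like this is what pushes similar texts' vectors together and dissimilar texts' vectors apart.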
similarity = dot(A, B) / (norm(A) * norm(B))
This is the cosine of the angle between two 1,024-dimensional arrows. Close to 1.0 = pointing the same direction = similar meaning. Close to 0 = perpendicular = unrelated. Close to -1.0 = opposite.
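That formula is a few lines of code. Tiny 2-D vectors make the geometry obvious; the same function works unchanged on 1,024-dim embeddings:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine([1.0, 0.0], [1.0, 0.0]))   # 1.0  (same direction)
print(cosine([1.0, 0.0], [0.0, 1.0]))   # 0.0  (perpendicular)
print(cosine([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```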
This is the "defactorization" question. You choose what text to feed in:
| Input | What you get | Trade-off |
|---|---|---|
| Single word: "retrieval" | Word-level meaning | No context, ambiguous |
| Sentence: "Thompson Sampling converges in 50 tasks" | Sentence-level meaning | Specific claim, loses surrounding context |
| Paragraph: full method description | Paragraph-level meaning | Rich context, but diluted — one 1,024-dim vector for 200 words means each word gets ~5 dimensions of influence |
| Whole document | Document-level meaning | Very diluted — a 5-page paper compressed to 1,024 numbers loses enormous detail |
Each of your 205 skills gets embedded as one vector (the full skill description, maybe 50-100 words → 1,024 dims). Each task gets embedded similarly. You compare them with cosine similarity. That's your relevance dimension.
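The whole pipeline, embed every skill description once, embed the incoming task, rank by cosine, fits in a short sketch. The skill names and descriptions here are made-up stand-ins, and `embed_stub` below is a deterministic fake, so the ranking it produces is arbitrary; with the real model, the semantically relevant skill would score highest.

```python
import math
import random
import zlib

def embed_stub(text: str, dim: int = 1024) -> list[float]:
    """Stand-in for the real embedding call. Deterministic per text, unit-norm,
    but semantically meaningless."""
    rng = random.Random(zlib.crc32(text.encode()))
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are already unit-norm, so the dot product IS the cosine.
    return sum(x * y for x, y in zip(a, b))

# Hypothetical skill descriptions (you'd have 205 of these, embedded once).
skills = {
    "bandits": "Thompson Sampling for multi-armed bandit exploration",
    "desserts": "Recipes for banana splits and sundaes",
}
skill_vecs = {name: embed_stub(desc) for name, desc in skills.items()}

# Embed the task, then rank skills by cosine similarity.
task_vec = embed_stub("choose an exploration strategy for a bandit problem")
ranked = sorted(skill_vecs, key=lambda s: cosine(skill_vecs[s], task_vec),
                reverse=True)
print(ranked)
```

The top-ranked skill's similarity score is the relevance signal: one number per skill-task pair.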