AI/ML Embeddings
What Are Embeddings?
Embeddings are numerical representations (vectors of real numbers) that encode meaning, relationships, and context. Think of an embedding for "cat" as something like [0.2, -0.4, 0.7, …], and for "dog" as [0.3, -0.5, 0.6, …]. Because cats and dogs are semantically related, their embeddings sit close together in this high-dimensional space.
These vectors provide a compact, dense representation of words (or sentences, images, etc.), capturing semantic and syntactic information far better than older methods like one-hot encoding, which treat words as isolated tokens.
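To make this concrete, here is a minimal sketch with made-up three-dimensional vectors (real embeddings typically have hundreds of dimensions). Cosine similarity is a standard way to measure how closely two embeddings point in the same direction.

```python
# Toy vectors for illustration only; the values and the third word "car" are
# invented, not taken from any real model.
import numpy as np

cat = np.array([0.2, -0.4, 0.7])
dog = np.array([0.3, -0.5, 0.6])
car = np.array([-0.6, 0.8, 0.1])

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(cat, dog))  # high: related meanings sit close together
print(cosine_similarity(cat, car))  # low: unrelated meanings sit farther apart
```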
Why Use Embeddings?
- Semantic Geometry: The numerical relationships between embeddings reflect real-world meaning. A famous example is:
  embedding("king") − embedding("man") + embedding("woman") ≈ embedding("queen")
  This illustrates how embeddings encode relational meaning through geometry (see the sketch after this list).
- Context Awareness: Modern embeddings differentiate word meanings based on context. In BERT, "running" gets a different vector representation in different sentences.
- Efficiency & Versatility: Dense vectors are far more memory-efficient than sparse one-hot vectors and generalize well. They support tasks like search, sentiment analysis, translation, and structured retrieval.
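The analogy above can be checked with off-the-shelf vectors. The sketch below assumes the gensim library and its downloadable "glove-wiki-gigaword-50" word vectors; any pretrained word-embedding set would work similarly.

```python
# Sketch of the king/queen analogy with pretrained GloVe vectors via gensim.
# Assumes gensim is installed and can fetch "glove-wiki-gigaword-50"
# (a one-time download).
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")  # returns a KeyedVectors object

# vector("king") - vector("man") + vector("woman") should land near "queen"
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # typically [('queen', <similarity score>)]
```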
How Are Embeddings Trained?
Embeddings learn from context, based on the principle that "you shall know a word by the company it keeps":
- Prediction-based models
  - Word2Vec (Google, 2013): Trained with CBOW (predict the target word from its context) or Skip-gram (predict the context from the target word); a minimal training sketch follows this list.
  - GloVe (Stanford): Uses global co-occurrence statistics to learn representations.
- Contextual embeddings
  - ELMo (2018): Applies a bidirectional LSTM to create context-aware vectors.
  - BERT (Google, 2018): Uses transformers and masked-token prediction to produce deep contextual embeddings.
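As a rough illustration of the prediction-based approach, here is a minimal Word2Vec training sketch using gensim. The toy corpus and hyperparameters are made up for demonstration, so the learned vectors will not be meaningful at this scale; the sg flag is what switches between Skip-gram and CBOW.

```python
# Minimal Word2Vec training sketch with gensim on an invented toy corpus.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "common", "pets"],
]

model = Word2Vec(
    sentences,
    vector_size=50,  # dimensionality of the learned embeddings
    window=2,        # how many neighboring words count as "context"
    min_count=1,     # keep every word, since the corpus is tiny
    sg=1,            # 1 = Skip-gram, 0 = CBOW
    epochs=50,
)

print(model.wv["cat"][:5])                # first few dimensions of one vector
print(model.wv.similarity("cat", "dog"))  # cosine similarity between two words
```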
Word Prediction = Numbers Prediction
When models perform word-prediction tasks like masked language modeling, they first produce an embedding vector rather than a discrete token. That predicted vector is then scored against the vocabulary and mapped back to the closest matching word. So yes: word prediction is really just predicting meaningful numbers.
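A small sketch of this in practice, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint: the fill-mask pipeline returns the tokens whose vocabulary embeddings best match the vector the model produced for the masked position.

```python
# Masked-word prediction with BERT via the transformers fill-mask pipeline.
# Under the hood, the model outputs a vector at the [MASK] position, scores it
# against every vocabulary embedding, and returns the top-scoring tokens.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The cat sat on the [MASK].", top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
# Expect plausible completions such as "floor" or "bed", each with a probability.
```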
Why This Matters
- Analogical reasoning: Using vector math to discover relationships.
- Contextual understanding: Captures nuance and meaning shifts.
- Broad applicability: Powers search, translation, NER, summarization, QA, and more (a small search sketch follows below).
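As one concrete example of the search use case, here is a small sketch assuming the sentence-transformers library and the all-MiniLM-L6-v2 model: documents and a query are embedded, then ranked by cosine similarity.

```python
# Embedding-based semantic search sketch; documents and query are invented.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "How to reset a forgotten password",
    "Best hiking trails near Seattle",
    "Troubleshooting login failures",
]
query = "I can't sign in to my account"

doc_embeddings = model.encode(documents, convert_to_tensor=True)
query_embedding = model.encode(query, convert_to_tensor=True)

# Rank documents by cosine similarity to the query embedding.
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
for doc, score in sorted(zip(documents, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {doc}")
```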
In Summary
- Embeddings = numeric vectors that capture word meaning.
- Word prediction models = predicting those meaningful vectors.
- What they enable = semantic geometry, context-awareness, and versatile NLP applications.
Embeddings might just look like lists of numbers, but they're the secret structure of language in AI.