There are a number of ways to get an embedding, including a state-of-the-art algorithm created at Google.
Standard Dimensionality Reduction Techniques
There are many existing mathematical techniques for capturing the important structure of a high-dimensional space in a low-dimensional space. In theory, any of these techniques could be used to create an embedding for a machine learning system.
For example, principal component analysis (PCA) has been used to create word embeddings. Given a set of instances like bag of words vectors, PCA tries to find highly correlated dimensions that can be collapsed into a single dimension.
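The idea can be sketched with plain numpy: PCA via the SVD of mean-centered data projects instances onto their top principal components. The toy bag-of-words matrix below is a made-up example, not data from the text.

```python
import numpy as np

# Hypothetical toy bag-of-words matrix: 4 "documents" x 5 vocabulary words.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 1, 0],
    [0, 0, 2, 1, 1],
    [0, 1, 1, 2, 0],
], dtype=float)

# PCA via SVD of the mean-centered data: correlated dimensions
# collapse onto the top principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

d = 2                        # target embedding dimensionality
embedding = Xc @ Vt[:d].T    # each row: a 2-d embedding of one instance
```

Each row of `embedding` is a low-dimensional representation of one instance; nearby rows correspond to instances with correlated word counts.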
Word2vec
Word2vec is an algorithm invented at Google for training word embeddings. Word2vec relies on the distributional hypothesis to map semantically similar words to geometrically close embedding vectors.
The distributional hypothesis states that words which often have the same neighboring words tend to be semantically similar. Both "dog" and "cat" frequently appear close to the word "veterinarian", and this fact reflects their semantic similarity. As the linguist John Firth put it in 1957, "You shall know a word by the company it keeps".
Word2vec exploits contextual information like this by training a neural net to distinguish actually co-occurring groups of words from randomly grouped words. The input layer takes a sparse representation of a target word together with one or more context words. This input connects to a single, smaller hidden layer.
In one version of the algorithm, the system makes a negative example by substituting a random noise word for the target word. Given the positive example "the plane flies", the system might swap in "jogging" to create the contrasting negative example "the jogging flies".
The other version of the algorithm creates negative examples by pairing the true target word with randomly chosen context words. So it might take the positive examples (the, plane), (flies, plane) and the negative examples (compiled, plane), (who, plane) and learn to identify which pairs actually appeared together in text.
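The second version's training data can be sketched as follows. This is a simplified illustration of generating (context, target, label) triples with random noise words, not the full word2vec pipeline; `training_pairs` and its parameters are hypothetical names.

```python
import random

def training_pairs(sentence, noise_vocab, window=1, num_neg=2, seed=0):
    """Build (context, target, label) triples: label 1 for pairs that
    actually co-occur within the window, 0 for random noise pairings."""
    rng = random.Random(seed)
    pairs = []
    for i, target in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            pairs.append((sentence[j], target, 1))       # true context pair
            for _ in range(num_neg):                      # random noise pairs
                pairs.append((rng.choice(noise_vocab), target, 0))
    return pairs

pairs = training_pairs(["the", "plane", "flies"],
                       noise_vocab=["compiled", "who", "jogging", "cat"])
```

A binary classifier trained on such triples learns to tell genuine co-occurrences from noise.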
The classifier is not the real goal for either version of the system, however. After the model has been trained, you have an embedding. You can use the weights connecting the input layer with the hidden layer to map sparse representations of words to smaller vectors. This embedding can be reused in other classifiers.
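The extraction step is simple in code: because the input is one-hot (sparse), multiplying it by the input-to-hidden weight matrix just selects a row, and that row is the word's embedding. The weights below are random stand-ins for trained weights.

```python
import numpy as np

vocab = ["the", "plane", "flies", "jogging"]
d = 3                                    # hidden (embedding) dimensionality
rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), d))     # input-to-hidden weights (trained in practice)

def embed(word):
    # A one-hot input times W is just a row lookup: that row is the embedding.
    one_hot = np.zeros(len(vocab))
    one_hot[vocab.index(word)] = 1.0
    return one_hot @ W
```

In practice, libraries skip the matrix multiply and index the row directly, which is why an embedding layer is often described as a lookup table.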
For more information about word2vec, see the tutorial on tensorflow.org.
Training an Embedding as Part of a Larger Model
You can also learn an embedding as part of the neural network for your target task. This approach gets you an embedding well customized for your particular system, but may take longer than training the embedding separately.
In general, when you have sparse data (or dense data that you'd like to embed), you can create an embedding unit that is just a special type of hidden unit of size d. This embedding layer can be combined with any other features and hidden layers. As in any DNN, the final layer will be the loss that is being optimized. For example, let's say we're performing collaborative filtering, where the goal is to predict a user's interests from the interests of other users. We can model this as a supervised learning problem by randomly setting aside (or holding out) a small number of the movies that the user has watched as the positive labels, and then optimize a softmax loss.
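A minimal forward pass for this setup can be sketched in numpy: a sparse multi-hot input of watched movies flows through a d-dimensional embedding unit into a softmax over all movies. The sizes, weights, and `forward` helper are illustrative assumptions, not the architecture from Figure 5.

```python
import numpy as np

n_movies, d = 6, 2
rng = np.random.default_rng(1)
E = rng.normal(scale=0.1, size=(n_movies, d))      # embedding layer weights
W_out = rng.normal(scale=0.1, size=(d, n_movies))  # softmax output weights

def forward(watched):
    # Sparse multi-hot input: average the embeddings of the watched
    # movies, then score every movie with a softmax output layer.
    h = E[watched].mean(axis=0)        # d-dimensional embedding unit
    logits = h @ W_out
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()

p = forward([0, 2, 5])                 # probabilities over all movies
```

Training would backpropagate the softmax loss on the held-out movies through `W_out` and `E`, so the embedding weights are learned jointly with the rest of the network.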
Figure 5. A sample DNN architecture for learning movie embeddings from collaborativefiltering data.
As another example, if you want to create an embedding layer for the words in a real-estate ad as part of a DNN to predict housing prices, then you'd optimize an L2 loss using the known sale price of homes in your training data as the label.
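The same pattern with a regression head might look like this minimal sketch: ad words are embedded and averaged, a linear layer predicts the price, and the squared error against the known sale price is the L2 loss. All sizes and the `l2_loss` helper are hypothetical.

```python
import numpy as np

vocab_size, d = 8, 2
rng = np.random.default_rng(2)
E = rng.normal(scale=0.1, size=(vocab_size, d))  # word embedding weights
w, b = rng.normal(size=d), 0.0                   # linear regression head

def l2_loss(ad_word_ids, price):
    h = E[ad_word_ids].mean(axis=0)   # embedding of the ad's words
    pred = h @ w + b                  # predicted sale price
    return (pred - price) ** 2        # the L2 loss being optimized

loss = l2_loss([1, 3, 5], price=300000.0)
```

Only the loss changes between the two examples; the embedding layer itself is identical.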
When learning a d-dimensional embedding, each item is mapped to a point in a d-dimensional space so that similar items are nearby in this space. Figure 6 helps to illustrate the relationship between the weights learned in the embedding layer and the geometric view. The edge weights between an input node and the nodes in the d-dimensional embedding layer correspond to the coordinate values for each of the d axes.
Figure 6. A geometric view of the embedding layer weights.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-08-17 UTC.