There are a number of ways to get an embedding, including a state-of-the-art algorithm created at Google.
Standard Dimensionality Reduction Techniques
There are many existing mathematical techniques for capturing the important structure of a high-dimensional space in a low-dimensional space. In theory, any of these techniques could be used to create an embedding for a machine learning system.
For example, principal component analysis (PCA) has been used to create word embeddings. Given a set of instances like bag of words vectors, PCA tries to find highly correlated dimensions that can be collapsed into a single dimension.
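The idea can be sketched with plain numpy: PCA via the SVD of mean-centered data projects instances onto their top principal components. The toy bag-of-words matrix below is a made-up example, not data from the text.

```python
import numpy as np

# Hypothetical toy bag-of-words matrix: 4 "documents" x 5 vocabulary words.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 1, 0],
    [0, 0, 2, 1, 1],
    [0, 1, 1, 2, 0],
], dtype=float)

# PCA via SVD of the mean-centered data: correlated dimensions
# collapse onto the top principal components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

d = 2                        # target embedding dimensionality
embedding = Xc @ Vt[:d].T    # each row: a 2-d embedding of one instance
```

Each row of `embedding` is a low-dimensional representation of one instance; nearby rows correspond to instances with correlated word counts.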
Word2vec
Word2vec is an algorithm invented at Google for training word embeddings. Word2vec relies on the distributional hypothesis to map semantically similar words to geometrically close embedding vectors.
The distributional hypothesis states that words which often have the same neighboring words tend to be semantically similar. Both "dog" and "cat" frequently appear close to the word "veterinarian", and this fact reflects their semantic similarity. As the linguist John Firth put it in 1957, "You shall know a word by the company it keeps".
Word2vec exploits contextual information like this by training a neural net to distinguish actually co-occurring groups of words from randomly grouped words. The input layer takes a sparse representation of a target word together with one or more context words. This input connects to a single, smaller hidden layer.
In one version of the algorithm, the system makes a negative example by substituting a random noise word for the target word. Given the positive example "the plane flies", the system might swap in "jogging" to create the contrasting negative example "the jogging flies".
The other version of the algorithm creates negative examples by pairing the true target word with randomly chosen context words. So it might take the positive examples (the, plane), (flies, plane) and the negative examples (compiled, plane), (who, plane) and learn to identify which pairs actually appeared together in text.
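The second version's training data can be sketched as follows. This is a simplified illustration of generating (context, target, label) triples with random noise words, not the full word2vec pipeline; `training_pairs` and its parameters are hypothetical names.

```python
import random

def training_pairs(sentence, noise_vocab, window=1, num_neg=2, seed=0):
    """Build (context, target, label) triples: label 1 for pairs that
    actually co-occur within the window, 0 for random noise pairings."""
    rng = random.Random(seed)
    pairs = []
    for i, target in enumerate(sentence):
        lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            if j == i:
                continue
            pairs.append((sentence[j], target, 1))       # true context pair
            for _ in range(num_neg):                      # random noise pairs
                pairs.append((rng.choice(noise_vocab), target, 0))
    return pairs

pairs = training_pairs(["the", "plane", "flies"],
                       noise_vocab=["compiled", "who", "jogging", "cat"])
```

A binary classifier trained on such triples learns to tell genuine co-occurrences from noise.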
The classifier is not the real goal for either version of the system, however. After the model has been trained, you have an embedding. You can use the weights connecting the input layer with the hidden layer to map sparse representations of words to smaller vectors. This embedding can be reused in other classifiers.
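The extraction step is simple in code: because the input is one-hot (sparse), multiplying it by the input-to-hidden weight matrix just selects a row, and that row is the word's embedding. The weights below are random stand-ins for trained weights.

```python
import numpy as np

vocab = ["the", "plane", "flies", "jogging"]
d = 3                                    # hidden (embedding) dimensionality
rng = np.random.default_rng(0)
W = rng.normal(size=(len(vocab), d))     # input-to-hidden weights (trained in practice)

def embed(word):
    # A one-hot input times W is just a row lookup: that row is the embedding.
    one_hot = np.zeros(len(vocab))
    one_hot[vocab.index(word)] = 1.0
    return one_hot @ W
```

In practice, libraries skip the matrix multiply and index the row directly, which is why an embedding layer is often described as a lookup table.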
For more information about word2vec, see the tutorial on tensorflow.org.
Training an Embedding as Part of a Larger Model
You can also learn an embedding as part of the neural network for your target task. This approach gets you an embedding well customized for your particular system, but may take longer than training the embedding separately.
In general, when you have sparse data (or dense data that you'd like to embed), you can create an embedding unit that is just a special type of hidden unit of size d. This embedding layer can be combined with any other features and hidden layers. As in any DNN, the final layer will be the loss that is being optimized. For example, let's say we're performing collaborative filtering, where the goal is to predict a user's interests from the interests of other users. We can model this as a supervised learning problem by randomly setting aside (or holding out) a small number of the movies that the user has watched as the positive labels, and then optimize a softmax loss.
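A minimal forward pass for this setup can be sketched in numpy: a sparse multi-hot input of watched movies flows through a d-dimensional embedding unit into a softmax over all movies. The sizes, weights, and `forward` helper are illustrative assumptions, not the architecture from Figure 5.

```python
import numpy as np

n_movies, d = 6, 2
rng = np.random.default_rng(1)
E = rng.normal(scale=0.1, size=(n_movies, d))      # embedding layer weights
W_out = rng.normal(scale=0.1, size=(d, n_movies))  # softmax output weights

def forward(watched):
    # Sparse multi-hot input: average the embeddings of the watched
    # movies, then score every movie with a softmax output layer.
    h = E[watched].mean(axis=0)        # d-dimensional embedding unit
    logits = h @ W_out
    p = np.exp(logits - logits.max())  # numerically stable softmax
    return p / p.sum()

p = forward([0, 2, 5])                 # probabilities over all movies
```

Training would backpropagate the softmax loss on the held-out movies through `W_out` and `E`, so the embedding weights are learned jointly with the rest of the network.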
Figure 5. A sample DNN architecture for learning movie embeddings from collaborativefiltering data.
As another example, if you want to create an embedding layer for the words in a real-estate ad as part of a DNN to predict housing prices, then you'd optimize an L2 loss using the known sale price of homes in your training data as the label.
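The same pattern with a regression head might look like this minimal sketch: ad words are embedded and averaged, a linear layer predicts the price, and the squared error against the known sale price is the L2 loss. All sizes and the `l2_loss` helper are hypothetical.

```python
import numpy as np

vocab_size, d = 8, 2
rng = np.random.default_rng(2)
E = rng.normal(scale=0.1, size=(vocab_size, d))  # word embedding weights
w, b = rng.normal(size=d), 0.0                   # linear regression head

def l2_loss(ad_word_ids, price):
    h = E[ad_word_ids].mean(axis=0)   # embedding of the ad's words
    pred = h @ w + b                  # predicted sale price
    return (pred - price) ** 2        # the L2 loss being optimized

loss = l2_loss([1, 3, 5], price=300000.0)
```

Only the loss changes between the two examples; the embedding layer itself is identical.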
When learning a d-dimensional embedding, each item is mapped to a point in a d-dimensional space so that similar items are nearby in this space. Figure 6 helps to illustrate the relationship between the weights learned in the embedding layer and the geometric view. The edge weights between an input node and the nodes in the d-dimensional embedding layer correspond to the coordinate values for each of the d axes.
Figure 6. A geometric view of the embedding layer weights.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2022-08-17 UTC.