Embeddings: Obtaining Embeddings  |  Machine Learning  |  Google for Developers


There are a number of ways to get an embedding, including a state-of-the-art algorithm created at Google.

Standard Dimensionality Reduction Techniques

There are many existing mathematical techniques for capturing the important structure of a high-dimensional space in a low-dimensional space. In theory, any of these techniques could be used to create an embedding for a machine learning system.

For example, principal component analysis (PCA) has been used to create word embeddings. Given a set of instances like bag of words vectors, PCA tries to find highly correlated dimensions that can be collapsed into a single dimension.
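As a minimal sketch of this idea (the tiny corpus and the choice of two output dimensions are purely illustrative), you can build bag of words vectors, transpose the matrix so each row describes a word, and let PCA collapse the correlated dimensions:

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import CountVectorizer

corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the veterinarian examined the dog",
    "the veterinarian examined the cat",
]

# Bag-of-words matrix: one row per document, one column per vocabulary word.
vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(corpus).toarray()

# Transpose so each row describes a word, then collapse the correlated
# dimensions down to two components per word.
pca = PCA(n_components=2)
word_vectors = pca.fit_transform(bow.T)

for word, column in vectorizer.vocabulary_.items():
    print(word, word_vectors[column])
```

Each printed row is a two-dimensional vector for one vocabulary word; on a real corpus you'd use far more documents and components.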

Word2vec

Word2vec is an algorithm invented at Google for training word embeddings. Word2vec relies on the distributional hypothesis to map semantically similar words to geometrically close embedding vectors.

The distributional hypothesis states that words which often have the same neighboring words tend to be semantically similar. Both "dog" and "cat" frequently appear close to the word "veterinarian", and this fact reflects their semantic similarity. As the linguist John Firth put it in 1957, "You shall know a word by the company it keeps".

Word2vec exploits contextual information like this by training a neural net to distinguish actually co-occurring groups of words from randomly grouped words. The input layer takes a sparse representation of a target word together with one or more context words. This input connects to a single, smaller hidden layer.

In one version of the algorithm, the system makes a negative example by substituting a random noise word for the target word. Given the positive example "the plane flies", the system might swap in "jogging" to create the contrasting negative example "the jogging flies".

The other version of the algorithm creates negative examples by pairing the true target word with randomly chosen context words. So it might take the positive examples (the, plane), (flies, plane) and the negative examples (compiled, plane), (who, plane) and learn to identify which pairs actually appeared together in text.

The classifier is not the real goal for either version of the system, however. After the model has been trained, you have an embedding. You can use the weights connecting the input layer with the hidden layer to map sparse representations of words to smaller vectors. This embedding can be reused in other classifiers.
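The following is a minimal numpy sketch of the second variant, not the actual word2vec implementation: a logistic classifier scores (target, context) pairs, a few random negative context words are drawn for each positive pair, and the input-side weight matrix is what you keep as the embedding. The toy corpus, window size, embedding size, and learning rate are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
corpus = "the plane flies over the city and the bird flies over the field".split()
vocab = sorted(set(corpus))
word_to_id = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8                        # vocabulary size, embedding size

W_in = rng.normal(scale=0.1, size=(V, D))   # target-word embeddings (the ones you keep)
W_out = rng.normal(scale=0.1, size=(V, D))  # context-word embeddings

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

window, num_neg, lr = 2, 3, 0.05
for epoch in range(200):
    for pos, word in enumerate(corpus):
        target = word_to_id[word]
        for offset in range(-window, window + 1):
            ctx_pos = pos + offset
            if offset == 0 or ctx_pos < 0 or ctx_pos >= len(corpus):
                continue
            context = word_to_id[corpus[ctx_pos]]
            # One positive pair (label 1) plus a few random negative pairs (label 0).
            samples = [(context, 1.0)] + [(int(rng.integers(V)), 0.0) for _ in range(num_neg)]
            for ctx_id, label in samples:
                score = sigmoid(W_in[target] @ W_out[ctx_id])
                grad = score - label        # gradient of the logistic loss w.r.t. the score
                v_out = W_out[ctx_id].copy()
                W_out[ctx_id] -= lr * grad * W_in[target]
                W_in[target] -= lr * grad * v_out

# After training, each row of W_in is the embedding for one word.
print(W_in[word_to_id["plane"]])
```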

For more information about word2vec, see the tutorial on tensorflow.org.

Training an Embedding as Part of a Larger Model

You can also learn an embedding as part of the neural network for your target task. This approach gets you an embedding well customized for your particular system, but may take longer than training the embedding separately.

In general, when you have sparse data (or dense data that you'd like to embed), you can create an embedding unit that is just a special type of hidden unit of size d. This embedding layer can be combined with any other features and hidden layers. As in any DNN, the final layer will be the loss that is being optimized. For example, let's say we're performing collaborative filtering, where the goal is to predict a user's interests from the interests of other users. We can model this as a supervised learning problem by randomly setting aside (or holding out) a small number of the movies that the user has watched as the positive labels, and then optimize a softmax loss.


Figure 5. A sample DNN architecture for learning movie embeddings from collaborative filtering data.
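As a minimal Keras sketch of this kind of setup (not the exact architecture in Figure 5), the model below takes a multi-hot vector of the movies a user has watched, passes it through an embedding layer of size d = 32, and predicts a held-out movie ID with a softmax loss. The movie count, layer sizes, and the layer name movie_embedding are illustrative.

```python
import tensorflow as tf

num_movies = 1_000
embedding_dim = 32

model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_movies,)),                   # multi-hot watched-movie vector
    tf.keras.layers.Dense(embedding_dim, use_bias=False,   # the embedding layer of size d
                          name="movie_embedding"),
    tf.keras.layers.Dense(256, activation="relu"),         # additional hidden layer
    tf.keras.layers.Dense(num_movies),                     # logits over all movies
])

# Softmax loss over the held-out movie IDs.
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)

# The learned embedding matrix: one row of weights per input movie.
embedding_matrix = model.get_layer("movie_embedding").get_weights()[0]
print(embedding_matrix.shape)  # (1000, 32)
```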

As another example, if you want to create an embedding layer for the words in a real-estate ad as part of a DNN to predict housing prices, then you'd optimize an L2 loss using the known sale price of homes in your training data as the label.
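A comparable sketch for the real-estate example, assuming the ad text has already been converted to padded integer word IDs (the vocabulary size, sequence length, and layer sizes are again illustrative):

```python
import tensorflow as tf

vocab_size = 10_000
max_words_per_ad = 100

model = tf.keras.Sequential([
    tf.keras.Input(shape=(max_words_per_ad,), dtype="int32"),
    tf.keras.layers.Embedding(vocab_size, 16),      # word-embedding layer
    tf.keras.layers.GlobalAveragePooling1D(),       # average the word vectors per ad
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),                       # predicted sale price
])

# L2 loss (mean squared error) against the known sale prices.
model.compile(optimizer="adam", loss="mse")
```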

When learning a d-dimensional embedding, each item is mapped to a point in a d-dimensional space so that similar items are nearby in this space. Figure 6 helps to illustrate the relationship between the weights learned in the embedding layer and the geometric view. The edge weights between an input node and the nodes in the d-dimensional embedding layer correspond to the coordinate values for each of the d axes.


Figure 6. A geometric view of the embedding layer weights.
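A small numpy sketch of this correspondence, using randomly generated weights as a stand-in for a trained embedding matrix (the item count, dimensionality, and item index are illustrative):

```python
import numpy as np

num_items, d = 1_000, 3
# In a trained model these would be the learned embedding-layer weights.
embedding_weights = np.random.default_rng(0).normal(size=(num_items, d))

item_id = 42                          # any input node (hypothetical index)
point = embedding_weights[item_id]    # that item's coordinates on the d axes

# In a trained embedding, nearby points correspond to similar items,
# for example by Euclidean distance:
distances = np.linalg.norm(embedding_weights - point, axis=1)
nearest = np.argsort(distances)[1:6]  # the five closest other items
print(point, nearest)
```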


