Written by Will Koehrsen, Oct 2, 2018

How deep learning can represent War and Peace as a vector

Applications of neural networks have expanded significantly in recent years from image segmentation to natural language processing to time-series forecasting. One notably successful use of deep learning is embedding, a method used to represent discrete variables as continuous vectors. This technique has found practical applications with word embeddings for machine translation and entity embeddings for categorical variables.

In this article, I’ll explain what neural network embeddings are, why we want to use them, and how they are learned. We’ll go through these concepts in the context of a real problem I’m working on: representing all the books on Wikipedia as vectors to create a book recommendation system.


Embeddings

An embedding is a mapping of a discrete — categorical — variable to a vector of continuous numbers. In the context of neural networks, embeddings are low-dimensional, learned continuous vector representations of discrete variables. Neural network embeddings are useful because they can reduce the dimensionality of categorical variables and meaningfully represent categories in the transformed space.

Neural network embeddings have 3 primary purposes:

  1. Finding nearest neighbors in the embedding space. These can be used to make recommendations based on user interests or cluster categories.
  2. As input to a machine learning model for a supervised task.
  3. For visualization of concepts and relations between categories.

In terms of the book project, this means we can use neural network embeddings to represent each of the 37,000 book articles on Wikipedia with a vector of only 50 numbers. Moreover, because embeddings are learned, books that are more similar in the context of our learning problem are closer to one another in the embedding space.

Neural network embeddings overcome the two limitations of a common method for representing categorical variables: one-hot encoding.

Limitations of One Hot Encoding

The operation of one-hot encoding categorical variables is actually a simple embedding where each category is mapped to a different vector. This process takes discrete entities and maps each observation to a vector of 0s and a single 1 signaling the specific category.

The one-hot encoding technique has two main drawbacks:

  1. For high-cardinality variables — those with many unique categories — the dimensionality of the transformed vector becomes unmanageable.
  2. The mapping is completely uninformed: “similar” categories are not placed closer to each other in embedding space.

The first problem is well-understood: for each additional category — referred to as an entity — we have to add another number to the one-hot encoded vector. If we have 37,000 books on Wikipedia, then representing these requires a 37,000-dimensional vector for each book, which makes training any machine learning model on this representation infeasible.

The second problem is equally limiting: one-hot encoding does not place similar entities closer to one another in vector space. If we measure similarity between vectors using the cosine similarity, then after one-hot encoding, the similarity is 0 for every pair of distinct entities.

This means that entities such as War and Peace and Anna Karenina (both classic books by Leo Tolstoy) are no closer to one another than War and Peace is to The Hitchhiker’s Guide to the Galaxy if we use one-hot encoding.

# One Hot Encoding Categoricals
books = ["War and Peace", "Anna Karenina",
         "The Hitchhiker's Guide to the Galaxy"]

books_encoded = [[1, 0, 0],
                 [0, 1, 0],
                 [0, 0, 1]]

Similarity (dot product) between First and Second  = 0
Similarity (dot product) between Second and Third  = 0
Similarity (dot product) between First and Third   = 0

Considering these two problems, the ideal solution for representing categorical variables would require fewer numbers than the number of unique categories and would place similar categories closer to one another.

# Idealized Representation of Embedding
books = ["War and Peace", "Anna Karenina",
         "The Hitchhiker's Guide to the Galaxy"]

books_encoded_ideal = [[ 0.53,  0.85],
                       [ 0.60,  0.80],
                       [-0.78, -0.62]]

Similarity (dot product) between First and Second  =  0.99
Similarity (dot product) between Second and Third  = -0.94
Similarity (dot product) between First and Third   = -0.97
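To make the contrast concrete, here is a small numpy check of the two representations above (just a sketch reusing the books_encoded and books_encoded_ideal lists):

import numpy as np

one_hot = np.array(books_encoded)
ideal = np.array(books_encoded_ideal)

# Any two distinct one-hot vectors are orthogonal, so their dot product is 0
print(np.dot(one_hot[0], one_hot[1]))    # 0

# The idealized vectors keep the two Tolstoy novels close (dot product near +1)
# and push The Hitchhiker's Guide away from both (dot products near -1)
print(np.dot(ideal[0], ideal[1]))        # close to +1
print(np.dot(ideal[0], ideal[2]))        # close to -1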

To construct a better representation of categorical entities, we can use an embedding neural network and a supervised task to learn embeddings.


Learning Embeddings

The main issue with one-hot encoding is that the transformation does not rely on any supervision. We can greatly improve embeddings by learning them using a neural network on a supervised task. The embeddings form the parameters — weights — of the network which are adjusted to minimize loss on the task. The resulting embedded vectors are representations of categories where similar categories — relative to the task — are closer to one another.

For example, if we have a vocabulary of 50,000 words used in a collection of movie reviews, we could learn 100-dimensional embeddings for each word using an embedding neural network trained to predict the sentiment of the reviews. (For exactly this application, see this Google Colab Notebook.) Words in the vocabulary that are associated with positive reviews such as “brilliant” or “excellent” will come out closer in the embedding space because the network has learned these are both associated with positive reviews.
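A minimal Keras sketch of that idea might look like the following (the review length and the average-pooling classifier here are illustrative assumptions, not the notebook's exact architecture):

from keras.models import Sequential
from keras.layers import Embedding, GlobalAveragePooling1D, Dense

vocab_size = 50000      # unique words in the review vocabulary
embedding_size = 100    # dimensions learned per word

sentiment_model = Sequential([
    # Maps each integer-encoded word to a 100-dimensional vector
    Embedding(input_dim=vocab_size, output_dim=embedding_size, input_length=200),
    # Averages the word vectors into a single review vector
    GlobalAveragePooling1D(),
    # Predicts positive (1) or negative (0) sentiment
    Dense(1, activation='sigmoid')
])
sentiment_model.compile(optimizer='adam', loss='binary_crossentropy',
                        metrics=['accuracy'])

# After training, the learned word vectors are simply the Embedding layer's weights:
# word_vectors = sentiment_model.layers[0].get_weights()[0]   # shape (50000, 100)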

In the book example given above, our supervised task could be “identify whether or not a book was written by Leo Tolstoy” and the resulting embeddings would place books written by Tolstoy closer to each other. Figuring out how to create the supervised task to produce relevant representations is the toughest part of making embeddings.

Implementation

In the Wikipedia book project (complete notebook here), the supervised learning task is set as predicting whether a given link to a Wikipedia page appears in the article for a book. We feed in pairs of (book title, link) training examples with a mix of positive — true — and negative — false — pairs. This set-up is based on the assumption that books which link to similar Wikipedia pages are similar to one another. The resulting embeddings should therefore place alike books closer together in vector space.
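A rough sketch of that pair generation is below (the helper names and the exact sampling scheme are my assumptions; the notebook's implementation differs in details). Here pairs is a list of true (book_id, link_id) tuples:

import random
import numpy as np

def generate_batch(pairs, n_books, n_links, batch_size=64, negative_ratio=1.0):
    """Yield batches of true (book, link) pairs mixed with random negative pairs."""
    pairs_set = set(pairs)
    n_positive = int(batch_size / (1 + negative_ratio))
    batch = np.zeros((batch_size, 3))
    while True:
        # Positive examples: (book, link) pairs that really occur, labeled 1
        for i, (book_id, link_id) in enumerate(random.sample(pairs, n_positive)):
            batch[i] = (book_id, link_id, 1)
        # Negative examples: random pairs that never occur, labeled 0
        i = n_positive
        while i < batch_size:
            book_id, link_id = random.randrange(n_books), random.randrange(n_links)
            if (book_id, link_id) not in pairs_set:
                batch[i] = (book_id, link_id, 0)
                i += 1
        np.random.shuffle(batch)
        yield {'book': batch[:, 0], 'link': batch[:, 1]}, batch[:, 2]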

The network I used has two parallel embedding layers that map the book and wikilink to separate 50-dimensional vectors and a dot product layer that combines the embeddings into a single number for a prediction. The embeddings are the parameters, or weights, of the network that are adjusted during training to minimize the loss on the supervised task.

In Keras code, this looks like the following (don’t worry if you don’t completely understand the code, just skip to the images):

from keras.layers import Input, Embedding, Dot, Reshape, Dense
from keras.models import Model

# Both inputs are 1-dimensional
book = Input(name = 'book', shape = [1])
link = Input(name = 'link', shape = [1])

# Embedding the book (shape will be (None, 1, 50))
book_embedding = Embedding(name = 'book_embedding',
                           input_dim = len(book_index),
                           output_dim = embedding_size)(book)

# Embedding the link (shape will be (None, 1, 50))
link_embedding = Embedding(name = 'link_embedding',
                           input_dim = len(link_index),
                           output_dim = embedding_size)(link)

# Merge the layers with a dot product along the second axis
# (shape will be (None, 1, 1))
merged = Dot(name = 'dot_product', normalize = True, axes = 2)([book_embedding, link_embedding])

# Reshape to be a single number (shape will be (None, 1))
merged = Reshape(target_shape = [1])(merged)

# Output neuron
out = Dense(1, activation = 'sigmoid')(merged)
model = Model(inputs = [book, link], outputs = out)

# Minimize binary cross entropy
model.compile(optimizer = 'Adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
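Training could then look something like this, feeding the model from a pair generator like the one sketched earlier (the batch size and epoch count here are arbitrary choices, not the notebook's settings):

# Generate batches of positive and negative (book, link) pairs
gen = generate_batch(pairs, n_books=len(book_index), n_links=len(link_index),
                     batch_size=1024, negative_ratio=1.0)

# One epoch covers roughly every positive pair once
model.fit_generator(gen, epochs=15, steps_per_epoch=len(pairs) // 1024, verbose=2)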

Although in a supervised machine learning task the goal is usually to train a model to make predictions on new data, in this embedding model, the predictions can be just a means to an end. What we want is the embedding weights, the representation of the books and links as continuous vectors.

The embeddings by themselves are not that interesting: they are simply vectors of numbers:
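For instance, the vector for a single book can be pulled straight out of the trained model (a quick sketch; I'm assuming book_index maps titles to the integer ids used above):

# The book embeddings are just the weights of the 'book_embedding' layer
book_weights = model.get_layer('book_embedding').get_weights()[0]
print(book_weights.shape)                           # (n_books, 50)
print(book_weights[book_index['War and Peace']])    # 50 plain floating-point numbers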

However, the embeddings can be used for the 3 purposes listed previously, and for this project, we are primarily interested in recommending books based on the nearest neighbors. To compute similarity, we take a query book and find the dot product between its vector and those of all the other books. (If our embeddings are normalized, this dot product is the cosine similarity between vectors, which ranges from -1, most dissimilar, to +1, most similar. We could also use the Euclidean distance to measure similarity.)
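A sketch of that lookup (assuming book_index maps titles to row numbers and index_book is the reverse mapping):

import numpy as np

# Normalize the vectors so the dot product equals the cosine similarity
book_weights = book_weights / np.linalg.norm(book_weights, axis=1, keepdims=True)

def similar_books(title, n=5):
    dists = np.dot(book_weights, book_weights[book_index[title]])
    closest = np.argsort(dists)[::-1][:n]    # highest similarity first
    for idx in closest:
        print(f'Book: {index_book[idx]:40} Similarity: {dists[idx]:.2f}')

similar_books('War and Peace')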

This is the output of the book embedding model I built:

Books closest to War and Peace:

Book: War and Peace                  Similarity: 1.0
Book: Anna Karenina                  Similarity: 0.79
Book: The Master and Margarita       Similarity: 0.77
Book: Doctor Zhivago (novel)         Similarity: 0.76
Book: Dead Souls                     Similarity: 0.75

(The cosine similarity between a vector and itself is 1.0.) After some dimensionality reduction (see below), we can plot the books in two dimensions.

We can clearly see the value of learning embeddings! We now have a 50-number representation of every single book on Wikipedia, with similar books closer to one another.


Embedding Visualizations

One of the coolest parts about embeddings is that they can be used to visualize concepts such as novels or non-fiction relative to one another. This requires a further dimensionality reduction technique to get down to 2 or 3 dimensions. The most popular reduction technique is itself an embedding method: t-Distributed Stochastic Neighbor Embedding (TSNE).

We can take the original 37,000 dimensions of all the books on Wikipedia, map them to 50 dimensions using neural network embeddings, and then map them to 2 dimensions using TSNE. The result is below:

(TSNE is a manifold learning technique, which means it tries to map high-dimensional data to a lower-dimensional manifold, creating an embedding that attempts to maintain local structure within the data. It's almost exclusively used for visualization because the output is stochastic and it does not support transforming new data. An up-and-coming alternative is Uniform Manifold Approximation and Projection (UMAP), which is much faster and does support transforming new data into the embedding space.)
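A minimal sketch of that reduction with scikit-learn (the exact TSNE settings in the notebook may differ):

from sklearn.manifold import TSNE

# 50-dimensional book vectors in, 2-dimensional coordinates out (for plotting only)
book_embedded = TSNE(n_components=2, metric='cosine').fit_transform(book_weights)
print(book_embedded.shape)    # (n_books, 2)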

By itself this isn’t very useful, but it can be insightful once we start coloring it based on different book characteristics.

We can clearly see groupings of books belonging to the same genre. It’s not perfect, but it’s still impressive that we can represent all books on Wikipedia using just 2 numbers that still capture the variability between genres.

The book example (full article coming soon) shows the value of neural network embeddings: we have a vector representation of categorical objects that is both low-dimensional and places similar entities closer to one another in the embedded space.

Bonus: Interactive Visualizations

The problem with static graphs is that we can’t really explore the data and investigate groupings or relationships between variables. To solve this problem, TensorFlow developed the Embedding Projector, an online application that lets us visualize and interact with embeddings. I’ll release an article on how to use this tool shortly; for now, the book vectors can be exported and loaded into it directly.
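As a sketch, the learned vectors can be written out as tab-separated files and loaded into the Projector (projector.tensorflow.org) through its “Load” button; the file names here are arbitrary:

import numpy as np

# One row of 50 tab-separated numbers per book
np.savetxt('book_vectors.tsv', book_weights, delimiter='\t')

# Matching labels, one book title per line, in the same order as the vectors
with open('book_metadata.tsv', 'w', encoding='utf-8') as f:
    for idx in range(len(book_weights)):
        f.write(index_book[idx] + '\n')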

Conclusions

Neural network embeddings are learned low-dimensional representations of discrete data as continuous vectors. These embeddings overcome the limitations of traditional encoding methods and can be used for purposes such as finding nearest neighbors, serving as input to another model, and visualization.

Although many deep learning concepts are talked about in academic terms, neural network embeddings are both intuitive and relatively simple to implement. I firmly believe that anyone can learn deep learning and use libraries such as Keras to build deep learning solutions. Embeddings are an effective tool for handling discrete variables and present a useful application of deep learning.

Collected at:  https://towardsdatascience.com/neural-network-embeddings-explained-4d028e6f0526 
