Embedding Toolkit

Embedding Toolkit Source

go-ctr's Embedding is an implementation of Item2vec. It learns from sequences of item IDs produced by user behavior. Training is not triggered automatically: you call the training function separately. By default, results are ranked by vector distance using cosine similarity.
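
To make the default ranking metric concrete, here is a minimal sketch of cosine similarity over two float64 slices. The function name `cosineSimilarity` and its handling of mismatched or zero vectors are assumptions made for illustration, not go-ctr's own code.

```go
package main

import (
	"fmt"
	"math"
)

// cosineSimilarity is a hypothetical helper: it returns the cosine of the
// angle between a and b. It returns 0 when the lengths differ, when the
// vectors are empty, or when either vector has zero norm.
func cosineSimilarity(a, b []float64) float64 {
	if len(a) != len(b) || len(a) == 0 {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		dot += a[i] * b[i]
		normA += a[i] * a[i]
		normB += b[i] * b[i]
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

func main() {
	a := []float64{1, 2, 3}
	b := []float64{2, 4, 6}  // same direction as a
	c := []float64{-3, 0, 1} // orthogonal to a
	// prints: 1.000 0.000
	fmt.Printf("%.3f %.3f\n", cosineSimilarity(a, b), cosineSimilarity(a, c))
}
```

Values close to 1 mean the two vectors point in nearly the same direction, which is why the most similar items appear at the top of a ranking.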

ELI5

You only need to provide a sequence of item IDs, for example the result of a SQL query like this:

SELECT movieId FROM ratings_train r WHERE r.rating > 3.5 ORDER BY userId, timestamp

Embedding Toolkit then generates a vector encoding (a Tensor, i.e. a slice of float64) for each `movieId`, and the distance between two encoded vectors represents how similar the movies are. The following are the ten movies most similar to "Kung Fu Panda":
read 9520886 words 12.169282375s
trained 9519544 words 17.155356791s
Search Embedding of:
59784 "Kung Fu Panda (2008)" Action|Animation|Children|Comedy
RANK | WORD | SIMILARITY | TITLE & GENRES
-------+-------+-------------+-------------
1 | 60072 | 0.974392 | Wanted (2008) Action|Thriller
2 | 60040 | 0.974080 | Incredible Hulk, The (2008) Action|Fantasy|Sci-Fi
3 | 60069 | 0.973728 | WALL·E (2008) Adventure|Animation|Children|Comedy|Romance|Sci-Fi
4 | 60074 | 0.970396 | Hancock (2008) Action|Comedy|Drama|Fantasy
5 | 63859 | 0.969845 | Bolt (2008) Action|Adventure|Animation|Children|Comedy
6 | 57640 | 0.969305 | Hellboy II: The Golden Army (2008) Action|Adventure|Comedy|Fantasy|Sci-Fi
7 | 58299 | 0.967733 | Horton Hears a Who! (2008) Adventure|Animation|Children|Comedy
8 | 59037 | 0.966410 | Speed Racer (2008) Action|Adventure|Children
9 | 59315 | 0.964556 | Iron Man (2008) Action|Adventure|Sci-Fi
10 | 58105 | 0.963332 | Spiderwick Chronicles, The (2008) Adventure|Children|Drama|Fantasy

For the complete code that produces the output above, see the Full Test Example: feature/embedding/wordemb_test.go
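
As a rough illustration of how a ranked table like the one above can be derived from trained vectors, the sketch below normalizes each vector and sorts candidates by dot product, which equals cosine similarity for unit vectors. The `neighbor` type, the `mostSimilar` function, and the toy two-dimensional vectors are all hypothetical; the real search code lives in the test file referenced above.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// neighbor pairs an item ID with its similarity to the query item.
type neighbor struct {
	ID         string
	Similarity float64
}

// normalize returns a unit-length copy of v, or all zeros if v has zero norm.
func normalize(v []float64) []float64 {
	var sum float64
	for _, x := range v {
		sum += x * x
	}
	out := make([]float64, len(v))
	if sum == 0 {
		return out
	}
	n := math.Sqrt(sum)
	for i, x := range v {
		out[i] = x / n
	}
	return out
}

// mostSimilar ranks every item other than query by cosine similarity to the
// query's vector and returns at most k results. It returns nil when the
// query ID has no embedding.
func mostSimilar(emb map[string][]float64, query string, k int) []neighbor {
	qv, ok := emb[query]
	if !ok {
		return nil
	}
	if k < 0 {
		k = 0
	}
	q := normalize(qv)
	result := make([]neighbor, 0, len(emb))
	for id, v := range emb {
		if id == query || len(v) != len(q) {
			continue
		}
		u := normalize(v)
		var dot float64
		for i := range q {
			dot += q[i] * u[i]
		}
		result = append(result, neighbor{ID: id, Similarity: dot})
	}
	sort.Slice(result, func(i, j int) bool {
		if result[i].Similarity != result[j].Similarity {
			return result[i].Similarity > result[j].Similarity
		}
		return result[i].ID < result[j].ID // deterministic tie-break
	})
	if k < len(result) {
		result = result[:k]
	}
	return result
}

func main() {
	// Toy vectors for illustration; real embeddings come from training.
	emb := map[string][]float64{
		"a": {1, 0},
		"b": {0.9, 0.1},
		"c": {0, 1},
		"d": {-1, 0},
	}
	for rank, n := range mostSimilar(emb, "a", 2) {
		fmt.Printf("%d | %s | %.6f\n", rank+1, n.ID, n.Similarity)
	}
}
```

This brute-force scan compares the query against every item, which is fine for a sketch but may be too slow for very large catalogs.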

Interfaces

User behavior is supplied through the following interface:

type ItemEmbedding

ItemEmbedding is the interface used to generate item embeddings with the item2vec model. All you provide is a behavior-based item sequence, for example the items a user liked, bought, or viewed.

type ItemEmbedding interface {
	// ItemSeqGenerator returns a channel that streams the behavior-based item ID sequence.
	ItemSeqGenerator(context.Context) (<-chan string, error)
}
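
One way to satisfy this interface is sketched below with an in-memory slice standing in for the SQL result set. The `sliceSeq` type and the `collect` helper are hypothetical, the interface is redeclared locally so the snippet compiles on its own, and the sketch assumes that closing the channel is how a generator signals the end of the sequence.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ItemEmbedding mirrors the interface shown above so this sketch compiles
// without importing go-ctr.
type ItemEmbedding interface {
	ItemSeqGenerator(context.Context) (<-chan string, error)
}

// sliceSeq is a hypothetical implementation backed by a slice of item IDs,
// already ordered by user and timestamp.
type sliceSeq struct {
	ids []string
}

// ItemSeqGenerator streams the IDs on a channel and closes it when done
// (assumed end-of-sequence signal). It stops early if ctx is cancelled.
func (s sliceSeq) ItemSeqGenerator(ctx context.Context) (<-chan string, error) {
	if len(s.ids) == 0 {
		return nil, errors.New("item sequence is empty")
	}
	ch := make(chan string, 64)
	go func() {
		defer close(ch)
		for _, id := range s.ids {
			select {
			case ch <- id:
			case <-ctx.Done():
				return
			}
		}
	}()
	return ch, nil
}

// Compile-time check that sliceSeq satisfies the interface.
var _ ItemEmbedding = sliceSeq{}

// collect drains a generator into a slice; a real caller would pass the
// channel to the training function instead.
func collect(src ItemEmbedding) ([]string, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	ch, err := src.ItemSeqGenerator(ctx)
	if err != nil {
		return nil, err
	}
	var out []string
	for id := range ch {
		out = append(out, id)
	}
	return out, nil
}

func main() {
	ids, err := collect(sliceSeq{ids: []string{"59784", "60072", "60069"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ids)
}
```

A database-backed implementation would follow the same shape, sending each scanned row to the channel instead of iterating over a slice.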

Example

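// Obtain the item ID stream from an ItemEmbedding implementation.
itemSeq, err := iSeq.ItemSeqGenerator(ctx)
if err != nil {
	return
}
// Train the item2vec model on the stream.
mod, err = embedding.TrainEmbedding(itemSeq, ItemEmbWindow, ItemEmbDim, 1)
return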

Example in Training Framework: recommend/rcmd.go
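
Item2vec is commonly described as skip-gram training applied to item sequences. Assuming `ItemEmbWindow` in the example above plays the role of the skip-gram context window, the sketch below shows how a window turns a sequence into (center, context) training pairs. The `pair` type and `windowPairs` function are hypothetical, and real implementations often add refinements such as randomized window sizes and negative sampling that are omitted here.

```go
package main

import "fmt"

// pair is one (center, context) training example.
type pair struct {
	Center  string
	Context string
}

// windowPairs emits, for each position i in seq, one pair for every other
// item that lies within `window` positions of i.
func windowPairs(seq []string, window int) []pair {
	var pairs []pair
	for i, center := range seq {
		lo := i - window
		if lo < 0 {
			lo = 0
		}
		hi := i + window
		if hi > len(seq)-1 {
			hi = len(seq) - 1
		}
		for j := lo; j <= hi; j++ {
			if j != i {
				pairs = append(pairs, pair{Center: center, Context: seq[j]})
			}
		}
	}
	return pairs
}

func main() {
	seq := []string{"A", "B", "C", "D"}
	for _, p := range windowPairs(seq, 1) {
		fmt.Printf("%s -> %s\n", p.Center, p.Context)
	}
}
```

Items that frequently appear near each other in behavior sequences generate many shared pairs, which is what pulls their vectors together during training.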