Community Content-sharing Without Idols Nor Masters
Tell-and-Show has two parts. The first is the data-to-model component, which we call the training part; it is currently ongoing.
The second, the production component, is planned and designed, and is waiting for the training part to deliver base models.
We also have a technical write-up.
Most recommendation systems use Collaborative Filtering (CF), a technique that records large datasets of the preferences of different users over a set of items (a preference matrix). CF algorithms use the preference matrix to suggest new items based, for example, on the similarity between two users (e.g., if you liked items similar to those User 712 liked, CF will suggest other items, possibly unknown to you, that User 712 really enjoyed).
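To make the CF idea concrete, here is a minimal sketch of user-based collaborative filtering over a toy preference matrix. The users, items, ratings and the choice of cosine similarity are all assumptions made for illustration; real CF systems use far larger matrices and more refined scoring.

```python
from math import sqrt

# Toy preference matrix: user -> {item: rating}. All names and ratings are made up.
PREFS = {
    "you":      {"item_a": 5, "item_b": 4, "item_c": 1},
    "user_712": {"item_a": 5, "item_b": 5, "item_c": 1, "item_d": 5},
    "user_9":   {"item_a": 1, "item_b": 2, "item_c": 5, "item_e": 5},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend(user, prefs):
    """Suggest items the user has not rated, scored by similarity-weighted ratings."""
    scores = {}
    for other, ratings in prefs.items():
        if other == user:
            continue
        sim = cosine_similarity(prefs[user], ratings)
        for item, rating in ratings.items():
            if item not in prefs[user]:
                scores[item] = scores.get(item, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)
```

Calling `recommend("you", PREFS)` puts `item_d` ahead of `item_e`, because "you" rated the shared items much more like `user_712` than like `user_9`.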
In Tell-and-Show we take a different approach, based on textual descriptions of items and a trained preference text metric. In general, it is possible to tell whether two short pieces of text are similar in meaning (for example, using word embeddings). Tell-and-Show goes one step further: our model assesses whether two item descriptions will both be liked or both be disliked, given their differences in meaning. For example, if a user has a preference for items described as:
blue Japanese car
and a new item is described as
blue American car
and another described as
green Japanese car
a text-based distance will say both texts are equally similar to the user's preference, but the Tell-and-Show preference metric will say they are not, because color is less important than make (in the case of cars). (See here for technical details.) This preference model is trained from preference data submitted by volunteers like yourself.
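The car example can be sketched in a few lines. This is only a toy: the word-overlap distance stands in for a generic text distance, and the per-word weights in `ASSUMED_WEIGHTS` are hypothetical values chosen by hand to mimic what a trained preference metric might learn. It is not the Tell-and-Show model.

```python
def word_distance(a, b):
    """Generic text distance: count of words that appear in only one description."""
    return len(set(a.lower().split()) ^ set(b.lower().split()))

# Hypothetical weights: a difference in make matters more than a difference in color.
ASSUMED_WEIGHTS = {"japanese": 3.0, "american": 3.0, "blue": 1.0, "green": 1.0}

def preference_distance(a, b, weights=ASSUMED_WEIGHTS):
    """Same word difference, but each differing word counts by its assumed weight."""
    differing = set(a.lower().split()) ^ set(b.lower().split())
    return sum(weights.get(word, 1.0) for word in differing)

liked = "blue Japanese car"
candidates = ["blue American car", "green Japanese car"]

# The plain distance cannot separate the two candidates (both differ by two words).
plain = [word_distance(liked, c) for c in candidates]            # [2, 2]
# The weighted distance places the green Japanese car closer to the liked item.
weighted = [preference_distance(liked, c) for c in candidates]   # [6.0, 2.0]
```

In the actual project the weighting is not a hand-written table; it comes from the preference model trained on volunteer data.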
Given a trained function that produces a preference vector for text descriptions, it is possible to map a large set of item descriptions (e.g., Wikimedia Commons) and find the items that are central to the map. A person's preferences over these items constitute their preference profile. These are not just preferences over random items, but over items that are examples of large classes and can be used to assess new items later on.
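One way to picture "central items" is as cluster medoids in the space of preference vectors. The sketch below assumes the vectors are already computed and uses a simple k-medoids loop with farthest-first seeding; the function name `pick_central_items` and the clustering method are our illustration, not necessarily what the project uses.

```python
from math import dist

def pick_central_items(vectors, k, iterations=10):
    """Return k item names, each the most central member (medoid) of its cluster.

    vectors: dict mapping item name -> preference vector (tuple of floats).
    """
    names = list(vectors)
    # Farthest-first seeding: deterministic, and spreads the initial medoids out.
    medoids = [names[0]]
    while len(medoids) < k:
        medoids.append(max(
            names,
            key=lambda n: min(dist(vectors[n], vectors[m]) for m in medoids),
        ))
    for _ in range(iterations):
        # Assign every item to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for n in names:
            nearest = min(medoids, key=lambda m: dist(vectors[n], vectors[m]))
            clusters[nearest].append(n)
        # Re-pick each medoid as the member with the smallest total distance
        # to the rest of its cluster (keep the old medoid if a cluster is empty).
        new_medoids = [
            min(members or [m],
                key=lambda c: sum(dist(vectors[c], vectors[o]) for o in members))
            for m, members in clusters.items()
        ]
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids
```

With two well-separated groups of items and `k=2`, the function returns the middle item of each group; asking a user about those two items says something about every item in both groups.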
This preference profile is very private, and we believe it should not be shared with third parties; it should be kept on the user's device at all times.
When the time comes to use the preference profile, a server sends a random set of items (what we call the BucketAPI), which the web browser ranks client-side using the preference profile. The ranking works as follows: items more similar to profile items that the user liked should rank higher, and items more similar to profile items the user did not like should rank lower. Other rankings are possible, allowing the user to switch between discovery and just-on-target recommendations.
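The ranking step described above can be sketched as follows. We assume each bucket item and each profile item already has a preference vector, and we turn Euclidean distance into a similarity with `1 / (1 + distance)`; both the data layout and the scoring formula are illustrative assumptions, and in production the equivalent logic would run in the browser rather than in Python.

```python
from math import dist

def similarity(u, v):
    """Map Euclidean distance to a similarity in (0, 1]; 1 means identical vectors."""
    return 1.0 / (1.0 + dist(u, v))

def rank_bucket(bucket, profile):
    """Order bucket item names from most to least promising.

    bucket:  dict mapping item name -> preference vector (as sent by the server).
    profile: list of (preference vector, liked) pairs kept on the user's device.
    """
    def score(vec):
        # Liked profile items pull the score up, disliked ones push it down.
        return sum(
            (1 if liked else -1) * similarity(vec, profile_vec)
            for profile_vec, liked in profile
        )
    return sorted(bucket, key=lambda name: score(bucket[name]), reverse=True)

# Hypothetical data: one liked and one disliked profile item, three bucket items.
profile = [((0.0, 0.0), True), ((10.0, 0.0), False)]
bucket = {
    "near_disliked": (9.0, 0.0),
    "middle": (5.0, 0.0),
    "near_liked": (1.0, 0.0),
}
ranking = rank_bucket(bucket, profile)
```

Here `ranking` lists `near_liked` first and `near_disliked` last. Because only the bucket travels over the network and the scoring happens locally, the profile never has to leave the device. A discovery-oriented ranking could reuse the same function with a different `score`.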