Building a Taste Graph: The Basic Principles

17th Feb ’15, 04:30 PM in Analytics

I Vijaya Kumar, Contributor

This is the second part of Vijaya Kumar Ivaturi’s earlier post Big Data – Not just a matter of scale: 5 new trends and twists.

In my previous post, we discussed five new trends, twists and some key differences in the conceptual model of big data analytics. We also saw some recent advances in recommender systems and how the emergence of behavioural systems theory added a new dimension to the rise of choice engines. In this post, let’s talk about the core concept of a taste graph and how it is built on the cross-category affinities of a class of users.

Social curation is emerging as a significant input to decision making in the online world, and it is driving the buying behaviour of consumers in the physical world as well. The rise of decision science as one of the major disciplines in computer science in recent years is a result of this trend.

The core concept of socially assisted buying is to offer the buyer choices that are both relevant and novel, in a real-time context. It is important to note that relevance is driven more by the user’s context, while novelty is derived from the user’s taste. Most recommender systems focus on a single category of choice, such as books, movies or food, and the choice is a list driven by the user’s recent or past transaction history in that category. As a result, it is hard to derive choices for a category in which the user has no transaction history.
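To make the single-category limitation concrete, here is a deliberately naive sketch of a history-driven recommender. It assumes a hypothetical purchase log of (user, category, item) tuples and a simple user-overlap rule within one category; neither the data nor the function name comes from any real system. Because the seed is the user’s own history in that category, an empty history yields an empty list.

```python
from collections import defaultdict

# Hypothetical purchase log: (user, category, item).
LOG = [
    ("u1", "books", "dune"),
    ("u1", "books", "foundation"),
    ("u2", "books", "dune"),
    ("u2", "books", "hyperion"),
    ("u2", "movies", "blade_runner"),
]


def single_category_recs(log, user, category):
    """Recommend items in one category from users whose history there overlaps ours."""
    baskets = defaultdict(set)
    for u, c, item in log:
        if c == category:
            baskets[u].add(item)
    seed = baskets.get(user, set())  # the user's own history in this category only
    recs = set()
    for other, items in baskets.items():
        if other != user and items & seed:
            recs |= items - seed
    return recs


print(single_category_recs(LOG, "u1", "books"))   # some history: a suggestion appears
print(single_category_recs(LOG, "u1", "movies"))  # no history: nothing to build on
```

The second call shows the cold-start problem the paragraph describes: u1 has never bought a movie, so a within-category recommender has no seed at all.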

The concept of a taste graph is based on cross-category affinity for a class of users. In other words, item-to-item affinity across different categories is a fairly good approximation of the behaviour to expect from a consumer in a specific context. In behavioural science, it is often argued that taste, being a cross-category linkage, is category-invariant at a higher level of abstraction.
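One simple way to estimate first-order, cross-category item-to-item affinity is to count co-purchases across baskets and keep only pairs whose items sit in different categories. The sketch below assumes hypothetical baskets and uses Jaccard similarity (co-purchases divided by purchases of either item) as the affinity measure; the post does not prescribe a specific measure, so treat this choice as illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical item categories and baskets (each basket = one user's purchases).
CATEGORY = {
    "organic_vegetables": "grocery",
    "organic_tea": "grocery",
    "organic_cotton_shirt": "apparel",
    "polyester_shirt": "apparel",
    "instant_noodles": "grocery",
}

BASKETS = [
    {"organic_vegetables", "organic_tea", "organic_cotton_shirt"},
    {"organic_vegetables", "organic_cotton_shirt"},
    {"organic_tea", "organic_cotton_shirt"},
    {"instant_noodles", "polyester_shirt"},
    {"instant_noodles", "organic_tea"},
]


def cross_category_affinity(baskets, category):
    """Jaccard affinity for item pairs that belong to different categories."""
    item_count = Counter()
    pair_count = Counter()
    for basket in baskets:
        item_count.update(basket)
        for a, b in combinations(sorted(basket), 2):
            pair_count[(a, b)] += 1
    affinity = {}
    for (a, b), both in pair_count.items():
        if category[a] == category[b]:
            continue  # keep only cross-category links
        either = item_count[a] + item_count[b] - both
        affinity[(a, b)] = both / either
    return affinity


scores = cross_category_affinity(BASKETS, CATEGORY)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 2))
```

Within-category pairs (such as organic tea with organic vegetables) are skipped on purpose, since the taste graph is about links that cross category boundaries.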

For example, a buyer who shops for organic vegetables is likely to shop for organic tea as well; this is a relatively easy case to guess, since both are food. If, in a given region, the affinity for organic food is very close to that for a different category such as organic clothing, it can be argued that a buyer of organic vegetables and organic tea will most likely prefer an organic cotton shirt if given the option. In other words, the buyer has a taste for “organic” in general. Since an organic consumer is considered to be sensitive to nature in their actions and practices, it can be extrapolated with reasonable confidence that such a buyer will prefer to stay at an eco-friendly resort.
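The organic example can be sketched as a taste profile that ignores category altogether. This assumes hypothetical taste tags on each item and a hand-written, discounted extrapolation rule from “organic” to “eco_friendly” (the 0.8 weight is purely illustrative, not derived from data).

```python
from collections import Counter

# Hypothetical taste tags per item; category is deliberately not used.
TASTE_TAGS = {
    "organic_vegetables": {"organic"},
    "organic_tea": {"organic"},
    "white_bread": set(),
    "organic_cotton_shirt": {"organic"},
    "eco_resort_stay": {"eco_friendly"},
    "city_hotel_stay": set(),
}

# Assumed extrapolation rule: an "organic" taste carries over to
# "eco_friendly" offerings at a discount (0.8 is an illustrative weight).
IMPLIED = {"organic": {"eco_friendly": 0.8}}


def taste_profile(purchases):
    """Fraction of a user's purchases that carry each taste tag."""
    counts = Counter(tag for item in purchases for tag in TASTE_TAGS[item])
    return {tag: n / len(purchases) for tag, n in counts.items()}


def score(profile, item):
    """Best direct or implied taste match for an item in any category."""
    best = 0.0
    for tag in TASTE_TAGS[item]:
        best = max(best, profile.get(tag, 0.0))
        for base, implied in IMPLIED.items():
            if tag in implied:
                best = max(best, profile.get(base, 0.0) * implied[tag])
    return best


profile = taste_profile(["organic_vegetables", "organic_tea", "white_bread"])
for item in ("organic_cotton_shirt", "eco_resort_stay", "city_hotel_stay"):
    print(item, round(score(profile, item), 2))
```

The shirt scores directly on the shared “organic” tag, the resort scores only through the assumed extrapolation, and the city hotel, which matches neither, scores zero.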

Hence, the taste model for consumers is a higher-order classification of the first-order item-to-item affinities across categories. This is why taste is category-independent, and why one can predict with greater confidence which item a user will select in a new category, even with no transaction history in that category. Another popular example is a taste for mechanical precision across watches, pens, tool sets and cars. Hence, an ontology based on taste is a more effective driver of choice in both the online and physical worlds.
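A minimal sketch of “higher-order classification” is to group items into taste clusters (here, connected components of edges whose affinity clears a threshold) and then recommend, in a category the user has never bought from, items that share a cluster with their history. The affinity values, the 0.5 threshold and the union-find clustering are all assumptions for illustration; real systems may use very different clustering.

```python
# Hypothetical first-order affinities and item categories.
AFFINITY = {
    ("swiss_watch", "fountain_pen"): 0.7,
    ("fountain_pen", "precision_toolset"): 0.6,
    ("precision_toolset", "sports_car"): 0.55,
    ("fast_fashion_tee", "sports_car"): 0.1,
    ("fast_fashion_tee", "disposable_pen"): 0.65,
}
CATEGORY = {
    "swiss_watch": "watches", "fountain_pen": "pens",
    "precision_toolset": "tools", "sports_car": "cars",
    "fast_fashion_tee": "apparel", "disposable_pen": "pens",
}


def taste_clusters(affinity, threshold=0.5):
    """Taste clusters = connected components of edges with affinity >= threshold."""
    parent = {item: item for pair in affinity for item in pair}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), w in affinity.items():
        if w >= threshold:
            parent[find(a)] = find(b)
    clusters = {}
    for item in parent:
        clusters.setdefault(find(item), set()).add(item)
    return list(clusters.values())


def recommend_in_new_category(history, category, clusters, category_of):
    """Items in `category` that share a taste cluster with the user's history."""
    picks = set()
    for cluster in clusters:
        if cluster & set(history):
            picks |= {i for i in cluster if category_of[i] == category}
    return picks


clusters = taste_clusters(AFFINITY)
print(recommend_in_new_category(["swiss_watch", "fountain_pen"], "cars",
                                clusters, CATEGORY))
```

The user has never bought a car, yet the “mechanical precision” cluster (watch, pen, tool set, car) still yields a suggestion in that category.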

As the study of taste is still in its infancy, it is difficult to build such an ontology a priori and impose it on the collected data. A more pragmatic approach is to build the first-order affinities across items in different categories and blend them with user context to develop curated choices for each domain, such as retail, media, hospitality and banking. This injects some novelty into the list of curated choices while remaining aware of the user and domain context.
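Blending affinity with context can be as simple as a weighted sum, echoing the earlier point that novelty comes from taste and relevance from context. The sketch assumes hypothetical, pre-computed scores in [0, 1] and a linear blend with an illustrative weight `alpha`; the post does not specify how the blend is done.

```python
# Hypothetical per-item scores for one user, all in [0, 1]:
#   affinity: first-order cross-category affinity to the user's past purchases
#   context:  fit with the current context (e.g. location, time, domain)
CANDIDATES = {
    "organic_cotton_shirt": {"affinity": 0.67, "context": 0.2},
    "eco_resort_stay": {"affinity": 0.5, "context": 0.9},
    "city_hotel_stay": {"affinity": 0.1, "context": 0.9},
}


def curate(candidates, alpha=0.5, top_k=2):
    """Rank by a linear blend: higher alpha favours taste (novelty) over context (relevance)."""
    blended = {
        item: alpha * s["affinity"] + (1 - alpha) * s["context"]
        for item, s in candidates.items()
    }
    return sorted(blended, key=blended.get, reverse=True)[:top_k]


print(curate(CANDIDATES))             # balanced blend
print(curate(CANDIDATES, alpha=0.9))  # taste-heavy blend
```

With a balanced weight, a travel context pushes both hotel stays up; leaning on taste brings the organic shirt to the top. Choosing `alpha` per domain is one way the domain context mentioned above could enter.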

Computing taste graphs

Having covered the basic principles of taste and choice, it is time to explore the computational side of this approach: processing vast data sets across categories.

From a topological viewpoint, the connections across items in different categories are best represented by a graph. Graph theory is a popular choice in theoretical computer science for representing connections between items or natural events in the real world, as a graph mirrors, both visually and computationally, the use case it is designed to represent. At its simplest, a graph is a collection of nodes (also called vertices) connected by edges.

To model a physical-world phenomenon, nodes represent entities (nouns) and edges represent actions or relationships (verbs). For this reason, it is no surprise that many social platforms like Facebook use graphs at the back end to represent, analyse and model online relationships and behaviour.
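The nouns-and-verbs modelling can be shown with a tiny in-memory graph whose nodes are entities and whose directed edges carry a relation label. The class name, relation names and data below are hypothetical; production systems would typically use a graph database rather than a Python dictionary.

```python
from collections import defaultdict


class LabelledGraph:
    """Nodes are entities (nouns); directed edges carry a relation (verb)."""

    def __init__(self):
        self.out = defaultdict(list)  # node -> [(verb, node), ...]

    def add(self, subject, verb, obj):
        self.out[subject].append((verb, obj))

    def neighbours(self, node, verb=None):
        """Targets reachable from `node`, optionally filtered by relation."""
        return [o for v, o in self.out.get(node, []) if verb is None or v == verb]


g = LabelledGraph()
g.add("alice", "bought", "organic_tea")
g.add("alice", "bought", "organic_cotton_shirt")
g.add("alice", "follows", "bob")
g.add("bob", "bought", "eco_resort_stay")

# Items bought by people Alice follows: a two-hop traversal along labelled edges.
socially_curated = [item for friend in g.neighbours("alice", "follows")
                    for item in g.neighbours(friend, "bought")]
print(socially_curated)
```

The two-hop query (“follows”, then “bought”) is a small instance of the social curation idea from the start of the post, expressed as a graph traversal.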

While a graph, here called a taste graph, is an elegant structure for representing the concept of taste, it becomes computationally intensive as the number of items across categories, or the connections between them, grows in scale. This is because the fundamental objective of the graph model is to compute the semantic distance between nodes in order to find the relative affinity strength across different paths. This is why most of these systems are implemented on cloud-based infrastructure and leverage advances in MapReduce models and graph databases. Graph processing functions behave like the recalculation functions in spreadsheets: they need to recompute from the ground up when new data is added. Hence, graph processing is generally performed as a batch process.
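One way to turn “semantic distance across different paths” into something computable is to treat each edge affinity in (0, 1] as a probability-like weight and define distance as its negative logarithm. The shortest path then corresponds to the path with the highest product of affinities, which Dijkstra’s algorithm finds. This distance definition and the toy graph are assumptions for illustration, not the method described in the post.

```python
import heapq
import math

# Hypothetical undirected taste graph: edge weight = affinity in (0, 1].
EDGES = {
    ("organic_tea", "organic_vegetables"): 0.8,
    ("organic_vegetables", "organic_cotton_shirt"): 0.6,
    ("organic_cotton_shirt", "eco_resort_stay"): 0.5,
    ("organic_tea", "eco_resort_stay"): 0.2,
}


def build_adjacency(edges):
    adj = {}
    for (a, b), w in edges.items():
        # Assumed distance: -log(affinity), so adding distances along a path
        # corresponds to multiplying affinities.
        d = -math.log(w)
        adj.setdefault(a, []).append((b, d))
        adj.setdefault(b, []).append((a, d))
    return adj


def path_affinity(adj, source, target):
    """Dijkstra on -log(affinity); returns the strongest path affinity (0.0 if unreachable)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return math.exp(-d)
        if d > dist.get(node, math.inf):
            continue  # stale queue entry
        for nxt, w in adj.get(node, []):
            nd = d + w
            if nd < dist.get(nxt, math.inf):
                dist[nxt] = nd
                heapq.heappush(heap, (nd, nxt))
    return 0.0


adj = build_adjacency(EDGES)
print(round(path_affinity(adj, "organic_tea", "eco_resort_stay"), 3))
```

Here the indirect route through vegetables and the cotton shirt (0.8 × 0.6 × 0.5 = 0.24) beats the weak direct edge (0.2). Every new edge can change such path scores across the graph, which is part of why recomputation is usually batched.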

As the scale becomes very large, the affinities between categories do not change much and are not significantly affected by small changes in the connections. Interestingly, the principle of Markov chains is the computational equivalent of the behavioural-science view that taste is category-independent: the long-run behaviour of a large, well-connected Markov chain depends on its overall structure and shifts only slightly when individual transitions change.
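The stability claim can be illustrated with a random walk over a small, hypothetical category-affinity matrix: compute its stationary distribution by power iteration, nudge one connection slightly, and compare. For a symmetric weight matrix like this one, the stationary distribution is proportional to each node’s total edge weight, so a small local change moves it only a little. The matrix and the size of the perturbation are assumptions chosen for illustration.

```python
def stationary(P, iters=1000):
    """Power iteration for the stationary distribution of a row-stochastic matrix."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def normalise_rows(W):
    """Turn non-negative affinity weights into random-walk transition probabilities."""
    return [[w / sum(row) for w in row] for row in W]


# Hypothetical affinity weights between four categories (symmetric, all positive,
# so the walk is irreducible and aperiodic and power iteration converges).
W = [
    [1.0, 4.0, 2.0, 1.0],
    [4.0, 1.0, 3.0, 1.0],
    [2.0, 3.0, 1.0, 2.0],
    [1.0, 1.0, 2.0, 1.0],
]
base = stationary(normalise_rows(W))

# A small change to one connection (3.0 -> 3.1 in both directions).
W2 = [row[:] for row in W]
W2[1][2] = W2[2][1] = 3.1
perturbed = stationary(normalise_rows(W2))

shift = max(abs(a - b) for a, b in zip(base, perturbed))
print([round(x, 3) for x in base])
print(round(shift, 4))
```

The largest shift in any category’s long-run share stays well under one percentage point, a toy version of affinities that “do not change much” under small changes in the connections.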

Shades of grey that matter

Using taste graphs to drive curated choice for the user is a complex blend of mathematics, computer science and behavioural science. It also combines algorithmic complexity, computational efficiency and domain heuristics to manage the graph’s scale, speed and accuracy. The result is being “nearly right” rather than “exactly right”. These shades of grey in computational choice are what make it interesting and relevant in the consumer’s life.

As the saying goes, it is those imperfections that make life interesting, and this is the science and art of choice!

Vijaya Kumar Ivaturi (IVK) is the CTO and co-founder of Singapore-based Big Data start-up Crayon Data. This post originally appeared here.