
Property of Nick Bucheleres

MerLin SugX Algorithm


Do not distribute or reproduce any of the Intellectual Property outlined in this
document. This is a private document for internal use only. If you have any
questions or comments, reach the author here: nick.bucheleres@gmail.com

Nick Bucheleres
August 30, 2016

Feature-based User Suggestion Algorithm


The algorithm I outline here is a variance-weighted linear regression kernel
function with error correction terms. Its goal is to intelligently suggest
data to users that is more likely to cause a desired action by the target user.
Definitions

Target user: The user to whom we are serving dynamic content


Suggested user: A user who, given our assumptions, may reasonably be a
suggestion for the target user. Note: in the case of ******, we limit our
training set of suggested users to the city (or state) of the target user
Feature error: The degree to which a suggested user's feature
underperforms the feature mean of a training set of all possible suggested
users
Total suggested user error (SigVWE): The sum of all feature errors for a
given user, ultimately used to rank the set of all possible suggestions on
the basis of quality and relevance
Principal Component Feature: A feature with low variance within the
training set, which implies that the feature value is a significant
constituent of the overall identity of the user model
Training set: The set of all possible suggestions for the target user, usually
generated based on the intersection of meta-data tokens that the target
user and suggested user share
Free parameter: A constant that is set by the data scientist at will in order
to affect the weighting of output

User Features
The algorithm takes as inputs quantized user feature meta-data, where each value
is a floating-point number. The methodology can be extended to categorical, Boolean,
and string meta-data, but the scope of this paper is to describe the methodology
behind the floating-point case.
User features can include, but are not limited to: message reply rate, ***** term
tokens, in-app ***** *****, number of ***** sent/received, frequency of a
particular behavior, and so on. An example of a user feature vector is below, as are
its feature errors.
[ 7, 1, 0, 0, 0, 0, 0, 0, 2, 0, 0, 64 ] 'user_features'
=================================*
active:
[{ step: 'notification_opens', error: 5.5099 },
{ step: 'conversations', error: 5.075},
{ step: 'messages_sent', error: 1.412},
{ step: 'messages_response_rate', error: 1.041},
{ step: 'profiles_visited', error: 0.547 },
{ step: 'searches', error: 0 },
{ step: 'notification_clicks', error: 0 } ],
passive:
[ { step: 'prof_steps_completed', error: 6.983 },
{ step: 'liked_by', error: 1.8328 },
{ step: 'profile_visited_by', error: 0.9179 },
{ step: 'random', error: 0.0266 },
{ step: 'number_of_skills', error: 0 },
{ step: 'messages_received', error: 0 } ] }

Example of a user feature error object: zeros imply that the suggested user
has a better feature score than the training feature mean over all
suggested users.
If it makes sense for suggestion purposes, user features can be broken out into
passive and active categories. An example of a passive feature would be number
of skills listed on a profile, and an active feature would be number of messages
sent. This may be helpful for offering up multiple concurrent suggestions to a
target user in order to enhance profile richness (by targeting passive features),
while also encouraging catalytic interaction (by targeting active features).
Additionally, user features that we want to emphasize can be weighted through
free parameters. Say we want to ensure that the suggested user will answer a
message that our target user sends them. We can increase the error associated
with users who have a suboptimal message_response_rate feature by adding a
weight to that feature's error term:

{ step: 'messages_response_rate', error: 1.041 }


A larger weight more severely punishes suggested users whose
message_response_rate falls below the training set feature mean, decreasing
the suggested user's rank in the final returned list of suggestions.
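To make this concrete, here is a minimal TypeScript sketch of applying such a weight; the FeatureError shape mirrors the example object above, while the applyFeatureWeight helper and the weight value of 3 are illustrative assumptions rather than part of the production system.

// Shape of a single feature-error entry, mirroring the example object above.
interface FeatureError {
  step: string;   // feature name, e.g. 'messages_response_rate'
  error: number;  // feature error against the training mean
}

// Hypothetical helper: scale the error of one named feature by a free-parameter weight.
// A weight greater than 1 punishes underperformance on that feature more severely.
function applyFeatureWeight(
  errors: FeatureError[],
  step: string,
  weight: number
): FeatureError[] {
  return errors.map(e =>
    e.step === step ? { ...e, error: e.error * weight } : e
  );
}

// Example: emphasize message response rate with a weight of 3.
const active: FeatureError[] = [
  { step: 'messages_response_rate', error: 1.041 },
  { step: 'messages_sent', error: 1.412 },
];
console.log(applyFeatureWeight(active, 'messages_response_rate', 3));
// messages_response_rate error becomes 3.123, pushing this user down the ranking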

Starting with the Kernel Function


The kernel function delivers feature-specific errors (and thus total error) for each
suggested user. The first step in deriving the error of a single suggested user's
feature set against a noise-reduced training population (T) is to gather training
averages. The pre-kernel algorithm does so in a noise-reduced way through
variance-weighted linear regression.
Note: Variance-weighting also allows us to run Principal Component
Analysis (PCA)-like processing on individual user and population error
sets by determining principal and non-principal user features, showing
us which features are significant

The algorithm takes an initial pass on the training set, extracting each user's feature
quantizations, and then calculates a mean feature vector

TRAINING_FEATURE_MEAN:

    \mu_j = \frac{1}{n} \sum_{i=1}^{n} x_{i,j}

and a feature variance vector,

TRAINING_FEATURE_VARIANCE_SQUARED:

    \sigma_j^2 = \frac{1}{n} \sum_{i=1}^{n} (x_{i,j} - \mu_j)^2

both of which are of size length(j) (the number of features in the user model).
These vectors are used in the kernel function below to return the
Variance-Weighted Error (VWE).
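A minimal TypeScript sketch of this pre-kernel pass, assuming the training set is already available as an array of numeric feature vectors; the function names are illustrative, not taken from the production code.

// Training feature mean vector: mu[j] is the average of feature j over all training users.
function trainingFeatureMean(training: number[][]): number[] {
  const n = training.length;
  const k = training[0].length;
  const mu = new Array<number>(k).fill(0);
  for (const user of training) {
    for (let j = 0; j < k; j++) mu[j] += user[j] / n;
  }
  return mu;
}

// Training feature variance vector: sigma2[j] is the mean squared deviation from mu[j].
function trainingFeatureVariance(training: number[][], mu: number[]): number[] {
  const n = training.length;
  const sigma2 = new Array<number>(mu.length).fill(0);
  for (const user of training) {
    for (let j = 0; j < mu.length; j++) sigma2[j] += (user[j] - mu[j]) ** 2 / n;
  }
  return sigma2;
}

// Example with three training users and two features.
const trainingSet = [[7, 64], [3, 40], [5, 52]];
const mu = trainingFeatureMean(trainingSet);             // [5, 52]
const sigma2 = trainingFeatureVariance(trainingSet, mu); // [~2.67, ~96]
console.log(mu, sigma2);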

Variance-Weighted Error
The VWE is the variance-weighted kernel function value for suggested user i, for each user
i in [0, n], and free parameter \lambda:

    VWE_i = \lambda \sum_{j=0}^{k} \frac{\max(0,\; \mu_j - x_{i,j})}{2\,\sigma_j^2}

Expanding this equation, we can see how each training and user feature value is
used to build up the calculation, where VWE_i is the total error for user i against the
training set of all possible suggestions.
For each feature j of user i in the sample population, let us determine the
feature error e_{i,j} by calculating

    e_{i,j} = \lambda \, \frac{\max(0,\; \mu_j - x_{i,j})}{2\,\sigma_j^2}

where user i's error array E_i is made up of the feature errors e_{i,j} for each
feature j in [0, k].
Important: We are weighting each feature error value by the inverse of the total
training feature variance, in order to give smaller errors for features whose
training values have high variance, indicating they are not principal component
features of the user model.
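The same calculation as a minimal TypeScript sketch, assuming the mean and variance vectors from the pre-kernel pass are already computed; the clipping at zero follows the feature-error definition above (features that beat the training mean contribute no error), and the function names and lambda default are illustrative.

// Per-feature error: how far the user's feature x[j] falls below the training mean mu[j],
// weighted by the inverse of the training variance sigma2[j] and a free parameter lambda.
function featureErrors(
  x: number[],       // suggested user's feature vector
  mu: number[],      // training feature mean vector
  sigma2: number[],  // training feature variance vector
  lambda = 1         // free parameter
): number[] {
  return x.map((xj, j) => (lambda * Math.max(0, mu[j] - xj)) / (2 * sigma2[j]));
}

// Total variance-weighted error for one suggested user: the sum of its feature errors.
function vwe(x: number[], mu: number[], sigma2: number[], lambda = 1): number {
  return featureErrors(x, mu, sigma2, lambda).reduce((a, b) => a + b, 0);
}

// Example: a user who underperforms the mean on feature 0 and beats it on feature 1.
const mean = [5, 52];
const variance = [2.67, 96];
console.log(featureErrors([3, 64], mean, variance)); // [~0.374, 0]
console.log(vwe([3, 64], mean, variance));           // ~0.374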
Token-Based Suggestion Generation
We have defined the kernel function, which is used to grade user features against
a suggestion training set and evaluate the richness of their profile in a completely
implementation-agnostic sense. We want to put this corpus to work and actually
deliver suggestions, though. In order to generate and rank possible suggestions
by their total error (SigVWE) value, we need to be able to dynamically populate our
training set (the pool of all possible suggestions).

Assumption 1: a plausible suggestion for a target user is a factor of the target
user's tokenized ***** *****.

Let's assume that we want to generate suggestions for a single target user. We
first start by tokenizing the target user's ***** ***** and storing a hash of token
frequencies in memory for that user, freq(t_target). For the target user's ***** token
j, we store the frequency of token j in a variable f_j^{target}, for all tokens j.

The training set is the set of users who share at least 1 token with the target user's
tokenized ***** *****. This is one of many (token-based) suggestion generation
methods.
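As a rough sketch of this generation step, assuming the redacted token source is simply a free-text field on the user model; the field name, tokenizer, and sample data below are placeholders, not the production implementation.

// Tokenize a free-text field into a token -> frequency hash.
function tokenFrequencies(text: string): Map<string, number> {
  const freq = new Map<string, number>();
  for (const token of text.toLowerCase().split(/\W+/).filter(Boolean)) {
    freq.set(token, (freq.get(token) ?? 0) + 1);
  }
  return freq;
}

interface Candidate {
  id: string;
  text: string; // placeholder for the redacted token-source field
}

// The training set: every candidate who shares at least one token with the target user.
function buildTrainingSet(targetText: string, candidates: Candidate[]): Candidate[] {
  const targetFreq = tokenFrequencies(targetText);
  return candidates.filter(c =>
    Array.from(tokenFrequencies(c.text).keys()).some(token => targetFreq.has(token))
  );
}

// Example usage with placeholder data.
const pool: Candidate[] = [
  { id: 'a', text: 'javascript react design' },
  { id: 'b', text: 'gardening cooking' },
];
console.log(buildTrainingSet('senior javascript engineer', pool)); // only candidate 'a'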
Error Correction (Bias) Terms
Based on our assumptions, we can add error correction terms to our kernel
function that forgive total suggested user kernel error (VWE) on the basis of
Assumption 1.
We already have the ***** token frequency for the target user, freq(t_target), and
we are going to forgive a possible suggested user's error proportional to the
intersection of the frequencies of terms the target and suggested users share:

    c_{i,j} = \min(f_j^{target},\; f_j^{i})

for the target user and possible suggested user i over all tokens j. The error
correction term m_i for user i takes the form

    m_i = \gamma \sum_{j} c_{i,j}

for each token j that the target and suggestion share, and free parameter \gamma.

Our VWE function now looks like

    VWE_i = \lambda \sum_{j} \frac{\max(0,\; \mu_j - x_{i,j})}{2\,\sigma_j^2} \;-\; m_i

for each error correction term m_i for user i.
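A minimal sketch of the correction term and the corrected VWE, assuming the token-frequency hashes and the uncorrected VWE value from the earlier sketches; gamma and the function names are illustrative.

// Error correction term m_i: the shared token mass between the target user and
// suggested user i, scaled by a free parameter gamma.
function correctionTerm(
  targetFreq: Map<string, number>,
  suggestedFreq: Map<string, number>,
  gamma = 1
): number {
  let shared = 0;
  targetFreq.forEach((count, token) => {
    // Intersection of frequencies: the smaller of the two counts for each shared token.
    shared += Math.min(count, suggestedFreq.get(token) ?? 0);
  });
  return gamma * shared;
}

// Corrected VWE: the uncorrected variance-weighted error minus the correction term.
function correctedVWE(uncorrected: number, m: number): number {
  return uncorrected - m;
}

// Example: two shared tokens at gamma = 0.5 forgive 1.0 of error.
const targetTokens = new Map([['javascript', 2], ['react', 1]]);
const suggestedTokens = new Map([['javascript', 1], ['react', 3]]);
const m = correctionTerm(targetTokens, suggestedTokens, 0.5); // (min(2,1) + min(1,3)) * 0.5 = 1
console.log(correctedVWE(0.374, m));                          // -0.626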


Returning Ranked Suggestions
Our ideal suggestion is a user with a negative VWE and an underlying kernel
function value of zero, which indicates that the suggested user performs better on
every user model feature than the training set mean. Said user's VWE then goes
negative once error correction terms are considered; the more negative a
suggested user's error, the stronger the error correction terms, and thus the more
relevant the suggested user is to the target user.

Assumption 2: in order to maximize relevance to the target user, we must
minimize the total error (SigVWE) between the suggested user and the
training mean, as well as the error correction between the suggested and
target users.
    SigVWE_i = \sum_{j} e_{i,j} \;-\; m_i

for all features j of user i


Therefore, the most cogent suggestion, the suggestion that most effectively
minimizes error, follows the form

    suggestions[n] = \arg\min_{\{i_1, \dots, i_n\}} \; \sum_{m=1}^{n} SigVWE_{i_m}

which delivers the set of n users whose combined error is minimized compared
to any other set of the same size; for a single suggestion, n = 1.
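A minimal sketch of the ranking step, assuming each candidate's corrected total error has already been computed; the SuggestedUser shape and sample values are illustrative.

interface SuggestedUser {
  id: string;
  sigVWE: number; // corrected total error; more negative means more relevant
}

// Return the n suggested users whose combined error is the lowest (most negative).
function rankSuggestions(candidates: SuggestedUser[], n = 1): SuggestedUser[] {
  return [...candidates].sort((a, b) => a.sigVWE - b.sigVWE).slice(0, n);
}

// Example: user 'c' is the single best suggestion.
const candidatePool: SuggestedUser[] = [
  { id: 'a', sigVWE: 0.37 },
  { id: 'b', sigVWE: -0.21 },
  { id: 'c', sigVWE: -1.45 },
];
console.log(rankSuggestions(candidatePool));    // [{ id: 'c', sigVWE: -1.45 }]
console.log(rankSuggestions(candidatePool, 2)); // 'c', then 'b'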

Closing Remarks
From a high level, the algorithm can be thought of as having two parts. First, we have
the feature-based variance-weighted kernel function, which grades users
according to the performance of their meta-data user features, compared to a
noise-reduced total suggestion training population (based on Assumption 1).
Then we have the error correction terms, which reward suggested users for the
degree of their relevance to the target user, primarily based on tokenized meta-data
(the intersection of the target user's ***** ***** and the suggested user's profile tokens).
These error correction terms can (and should) be added whenever a new source
of suggested users is added to the training population.
Further Optimization
We can optimize this algorithm by further pre-filtering inputs, which, like in the
example above, has the effect of adding error correction terms based on the
degree to which the target and suggested user share meta-data and feature
scores.
Next Steps: Feature Error Weights
Granular control over the bias of the algorithm's suggestions can be achieved by
weighting significant features. Variance-weighting achieves this to a certain
degree, by implicitly emphasizing principal (low variance) component features,
but we can explicitly influence the ranking of suggestions through feature
weights.
These weights can be set through trial and error, but there is room to use
machine learning to train proper weights (including zero) for each user model
feature.
