In Math: We have N jobs. Every day we generate a vector of N integers between 0 and 100. We feed this vector into a black box that’s mostly just Google. If we do a good job, the black box rewards us with many job applications.
By putting the “right” jobs at the top of the page (loaded word there), we can improve upon a chronological sort. Before we can identify the right jobs, we need to know whether Google actually rewards higher-placed jobs and, if so, by how much.
Sometimes, just to justify all the simplifying assumptions I’m going to make later, I start a project by writing down the math equation I’d like to solve. I imagine ours looks something like this:
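In symbols, and taking the 0-to-100 integer range from the setup above as the constraint, the objective presumably reads roughly as follows. The starred S for the best daily vector is notation assumed here for illustration.

```latex
S^{*} \;=\; \arg\max_{S \,\in\, \{0, 1, \dots, 100\}^{N}} \ \mathrm{applies}(S),
\qquad S = (s_1, s_2, \dots, s_N)
```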
- S is our vector of relevancy scores. There are N jobs, so each s_i (an element of S) corresponds to a different job. A function called applies turns S into a scalar. Each day we’d like to find the S that makes that number as large as possible: the relevancy scores that generate the greatest number of job applications for intelycare.com/jobs.
- applies is a fine objective function on Day 0. Later on our objective function could change (e.g. revenue, lifetime value). Applies are easy to count, though, and counting them lets me spend my complexity tokens elsewhere. It’s Day 0. We’ll come back to these questions on Day 1.
- Problem. We know nothing about the applies function until we start feeding it relevancy scores. 😱
First things first: Seeing that we know nothing about the applies function, our first question is, “how do we choose an ongoing wave of daily S vectors so we can learn what the applies function looks like?”
- We know (1) which jobs are boosted and when, and (2) how many applies each job receives each day. Note the absence of page-load data. It’s Day 0! You might not have all the data you want on Day 0, but if we’re clever we can make do with what we have.
- Note the subtle change in our objective. Earlier our goal was to accomplish some business objective (maximize applies), and eventually we’ll come back to that goal. We’ve taken off the business hat for a minute and put on our science hat. Our only goal now is to learn something. If we can learn something, we can use it (later) to help achieve some business objective.🤓
- Since our goal is to learn something, above all we want to avoid learning nothing. Remember it’s Day 0 and we have no guarantee that the Google Monster will pay any attention to how we sort things. We may as well go for broke and make sure this thing even works before throwing more time at improving it.
How do we choose an initial wave of daily S vectors? We’ll give every job a score of 0 (default score), and choose a random subset of jobs to boost to 100.
- Maybe I’m stating the obvious, but it has to be random if you want to isolate the effect of page-position on job applications. We want the only difference between boosted jobs and other jobs to be their relative ordering on the page as determined by our relevance scores. [I can’t tell you how many phone screens I’ve conducted where a candidate doubled down on running an A/B test with the good customers in one group and the bad customers in the other group. In fairness, I’ve also vetted marketing vendors who do the same thing 😭].
- The randomness will be nice later on for other reasons. It’s likely that some jobs benefit from page-placement more than others. We’ll have an easier time identifying those jobs with a big, randomly-generated dataset.
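Here is a minimal sketch of that assignment. The function name `assign_scores`, the `boost_fraction` parameter, the made-up job IDs, and the 20% figure are all assumptions for illustration, not the production code.

```python
import random


def assign_scores(job_ids, boost_fraction, seed=None):
    """Give every job the default score of 0, then boost a random subset to 100.

    boost_fraction is a hypothetical knob: the share of jobs to boost.
    """
    rng = random.Random(seed)  # seeded so a given day's draw can be reproduced
    job_ids = list(job_ids)
    n_boost = round(len(job_ids) * boost_fraction)
    boosted = set(rng.sample(job_ids, n_boost))  # uniform draw without replacement
    return {job_id: (100 if job_id in boosted else 0) for job_id in job_ids}


# Example: 10 made-up jobs, 20% boosted.
scores = assign_scores([f"job-{i}" for i in range(10)], boost_fraction=0.2, seed=7)
```

Because the draw is uniform, nothing about a job (age, location, employer) influences whether it gets boosted, which is the property the comparison between boosted and unboosted jobs depends on.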
We know we can’t boost every job. Anytime I put a job at the top of the page, I bump all other jobs down the page (classic example of a “spillover”).
- The spillover gets worse as I boost more and more jobs: each additional boost imposes a greater and greater punishment on all other jobs by pushing them down in the sort (including other boosted jobs).
- With little exception, nursing jobs are in-person and local, so any boosting spillovers will be limited to other nearby jobs. This is important.
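A toy illustration of the spillover within a single local market. This assumes the page sorts by relevancy score first and falls back to the chronological order for ties; that tie-breaking rule and the `page_positions` helper are assumptions made for the sketch.

```python
def page_positions(jobs, scores):
    """Return {job_id: 1-based page position}.

    jobs is assumed to already be in chronological display order.
    Python's sort is stable, so jobs with equal scores keep that order.
    """
    ordered = sorted(jobs, key=lambda job_id: -scores.get(job_id, 0))
    return {job_id: pos for pos, job_id in enumerate(ordered, start=1)}


jobs = ["a", "b", "c", "d", "e"]  # five made-up jobs in one market

baseline = page_positions(jobs, {})
one_boost = page_positions(jobs, {"e": 100})
two_boosts = page_positions(jobs, {"d": 100, "e": 100})

# Unboosted job "a" slides from position 1 to 2 to 3 as boosts are added.
# Boosted job "e" sits at position 1 when boosted alone, but position 2
# once "d" is boosted as well.
```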
How do we choose an initial wave of daily S vectors? (final answer) We’ll give every job a score of 0 (default score), and choose a random subset of jobs to boost to 100. The size of the random subset will vary across geographies.
- We create 4 groups of distinct geographies with roughly the same amount of web traffic in each group. Each group is balanced along the key dimensions we think are important. We randomly boost a different percentage of jobs in each group.
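One way this design could be sketched. The greedy traffic balancing, the specific boost percentages, and every name below are hypothetical; the post doesn't say how the groups were balanced or which percentages were used, and this sketch balances on traffic only.

```python
import random

# Hypothetical boost percentages, one per group. Not the values actually used.
BOOST_FRACTIONS = [0.05, 0.10, 0.20, 0.40]


def balance_groups(traffic_by_geo, n_groups=4):
    """Greedy balancing: place each geography, largest traffic first,
    into whichever group has the least total traffic so far."""
    groups = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for geo, traffic in sorted(traffic_by_geo.items(), key=lambda kv: -kv[1]):
        k = totals.index(min(totals))
        groups[k].append(geo)
        totals[k] += traffic
    return groups


def daily_scores(jobs_by_geo, groups, fractions, seed=None):
    """Score every job 0, then boost a random subset to 100 within each
    geography, at the fraction assigned to that geography's group."""
    rng = random.Random(seed)
    scores = {}
    for group, fraction in zip(groups, fractions):
        for geo in group:
            jobs = jobs_by_geo.get(geo, [])
            boosted = set(rng.sample(jobs, round(len(jobs) * fraction)))
            for job in jobs:
                scores[job] = 100 if job in boosted else 0
    return scores


# Made-up traffic numbers and job lists for eight geographies.
traffic = {"geo-A": 900, "geo-B": 700, "geo-C": 650, "geo-D": 600,
           "geo-E": 300, "geo-F": 250, "geo-G": 150, "geo-H": 50}
jobs_by_geo = {geo: [f"{geo}-job-{i}" for i in range(20)] for geo in traffic}

groups = balance_groups(traffic)
scores = daily_scores(jobs_by_geo, groups, BOOST_FRACTIONS, seed=1)
```

Varying the boosted share by group is what lets us compare markets where few jobs compete for the top of the page against markets where many do.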
Here’s how it looked…