In the Data Science tradition of deploying new code on Friday, giving it the whole weekend to be broken, we’re pushing out a new version of the bright score code today.
As far as the changes go, the nerdy explanation is that we ran a Principal Component Analysis (PCA) on some of our strongest features to create an orthogonal feature space that reduces double-counting. What this means is that we’re combining some of the features so that we don’t give too much extra credit for scoring well on similar features. For example, we have two different metrics for matching the text in a person’s resume against the text in a job description: one is a fuzzy match of words and one is an exact match. If a person scores high on the exact match, they’re going to score well on the fuzzy match too, but a good fuzzy match doesn’t guarantee a good exact match. So, instead of treating these two features as completely independent, we combine them in an equation that accounts for how much they overlap and scores them based on that.
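For the curious, here’s a rough sketch of the idea in Python. This is not our production code: the function names (`pca_2d`, `covariance`) and the `exact` and `fuzzy` scores are made up for illustration, and it only handles two features, so the rotation can be computed by hand with the standard library. It shows how two overlapping scores can be rotated into two components that no longer move together.

```python
import math


def covariance(xs, ys):
    """Sample covariance of two equal-length lists (variance if xs is ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def pca_2d(xs, ys):
    """Project two correlated features onto their two principal components."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cx = [x - mean_x for x in xs]  # centered first feature
    cy = [y - mean_y for y in ys]  # centered second feature

    # Entries of the 2x2 sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = covariance(xs, xs)
    syy = covariance(ys, ys)
    sxy = covariance(xs, ys)

    # Rotation angle that diagonalizes a symmetric 2x2 matrix; the first
    # axis points along the direction of greatest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)

    pc1 = [c * a + s * b for a, b in zip(cx, cy)]
    pc2 = [-s * a + c * b for a, b in zip(cx, cy)]
    return pc1, pc2


# Hypothetical scores for six candidates (illustrative only).
exact = [0.10, 0.20, 0.05, 0.60, 0.80, 0.30]
fuzzy = [0.40, 0.35, 0.50, 0.70, 0.85, 0.75]

pc1, pc2 = pca_2d(exact, fuzzy)
print("covariance before:", covariance(exact, fuzzy))
print("covariance after: ", covariance(pc1, pc2))
```

The raw scores have a clearly positive covariance, while the covariance of the two components comes out at zero up to floating-point error. The first component carries the signal the two metrics share, so it only gets counted once.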
With these changes, we should see a reduction in over-scoring and thus false positives. The speed should stay about the same. If you notice anything funky going down, alert your nearest DS member and we’ll check it out.