Stanford University

WORKING PAPER

job2vec:
Using Language Models to Understand Wage Premia

October 8, 2021


Does the text of a job posting predict the salary offered for the role? There is ample evidence that, even within an occupation, a job’s skills and tasks affect its salary. Capturing this fine-grained information from postings can provide real-time insight into the prices of various job characteristics. Using a new dataset from Greenwich.HR with salary information linked to posting data from Burning Glass Technologies, I apply natural language processing (NLP) techniques to build a model that predicts salaries from job posting text. This approach follows the rich tradition in the economics literature of estimating wage premia for job characteristics with hedonic regression. My model explains 73 percent of the variation in salaries, 12 percentage points more than a model with occupation and location fixed effects. I apply this model to the question of online certifications by creating counterfactual postings and estimating the difference in predicted salary. I find substantial variation in the predicted value of different certifications. This information is a crucial input as firms and workers make strategic decisions about human capital.
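
The counterfactual-posting idea can be illustrated with a minimal sketch. This is not the paper's model: as a stand-in for the language model, it fits a ridge-regularized hedonic regression of log salary on bag-of-words indicators, then compares the predicted log salary of a posting with and without certification text. The toy postings, salaries, the penalty `lam`, and all function names are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data (NOT from the paper): posting text and annual salary in USD.
POSTINGS = [
    ("software engineer python", 100_000),
    ("software engineer python aws certification", 115_000),
    ("data analyst sql", 70_000),
    ("data analyst sql aws certification", 80_000),
    ("software engineer java", 98_000),
    ("data analyst excel", 65_000),
]

def tokenize(text):
    return text.lower().split()

def featurize(text, vocab):
    """Intercept followed by one 0/1 indicator per vocabulary word."""
    tokens = set(tokenize(text))
    return [1.0] + [1.0 if word in tokens else 0.0 for word in vocab]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def fit_ridge(X, y, lam=0.1):
    """Solve (X'X + lam * D) beta = X'y, where D leaves the intercept unpenalized."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    for i in range(1, k):
        XtX[i][i] += lam
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def predict_log_salary(text, vocab, beta):
    return sum(x * b for x, b in zip(featurize(text, vocab), beta))

texts = [text for text, _ in POSTINGS]
vocab = sorted({token for text in texts for token in tokenize(text)})
X = [featurize(text, vocab) for text in texts]
y = [math.log(salary) for _, salary in POSTINGS]
beta = fit_ridge(X, y)

# Counterfactual: the same posting with certification text appended.
base = "data analyst excel"
counterfactual = base + " aws certification"
premium = (predict_log_salary(counterfactual, vocab, beta)
           - predict_log_salary(base, vocab, beta))
print(f"Predicted log-salary premium: {premium:.3f}")
```

Because this toy model is linear in word indicators, the estimated premium is the same for every base posting. A model that captures interactions in the text would not have this restriction, so its counterfactual differentials could vary with the rest of the posting.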
