A Survey of Probabilistic Record Matching Models, Techniques and Tools

Abstract

Probabilistic record linkage refers to the use of stochastic decision models to solve the problem of record linkage (also known as record matching). Data quality has become a key concern in many institutions, and the demand for novel, effective techniques is increasing. Record linkage has been studied extensively over the last three decades, and a solid probabilistic decision framework has been proposed, along with several extensions and specific estimation methods. This paper surveys the most recent and promising approaches and also reviews a selection of data cleansing tools based on probabilistic decision models.