Noise-tolerant approximate blocking for dynamic real-time entity resolution

Text - Accepted Version

Liang, H., Wang, Y., Christen, P. and Gayler, R. (2014) Noise-tolerant approximate blocking for dynamic real-time entity resolution. In: The 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining, 13-16 May 2014, Taiwan, pp. 449-460.

Abstract/Summary

Entity resolution is the process of identifying records in one or multiple data sources that represent the same real-world entity. This process needs to deal with noisy data that contain, for example, pronunciation or spelling errors. Many real-world applications require rapid responses to entity queries on dynamic datasets. This poses challenges for existing approaches, which are mainly aimed at the batch matching of records in static data. Locality sensitive hashing (LSH) is an approximate blocking approach that hashes objects within a certain distance into the same block with high probability. How to make approximate blocking approaches scalable to large datasets and effective for real-time entity resolution remains an open question. Targeting this problem, we propose a noise-tolerant approximate blocking approach that indexes records based on their distance ranges, using LSH and sorting trees within large hash blocks. Experiments conducted on both synthetic and real-world datasets show the effectiveness of the proposed approach.
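
To make the idea of LSH-based blocking concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes MinHash over character bigrams with banding as the LSH family, and it leaves out the sorting trees the paper uses inside large blocks. All function names, the MD5-based hash, and the parameter values (10 bands of 2 rows) are illustrative assumptions.

```python
import hashlib
from collections import defaultdict


def bigrams(text):
    """Return the set of character bigrams of a lower-cased string."""
    s = text.lower()
    if len(s) < 2:
        return {s}  # fall back to the whole string for very short values
    return {s[i:i + 2] for i in range(len(s) - 1)}


def _hash(seed, token):
    """Deterministic 64-bit hash of a token under a given seed (assumed choice)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(tokens, num_hashes):
    """MinHash signature: the minimum hash value of the token set per seed."""
    return [min(_hash(seed, t) for t in tokens) for seed in range(num_hashes)]


def band_keys(text, bands, rows):
    """Split the signature into bands; each band yields one block key."""
    sig = minhash_signature(bigrams(text), bands * rows)
    return [(b, tuple(sig[b * rows:(b + 1) * rows])) for b in range(bands)]


def build_blocks(records, bands=10, rows=2):
    """Index records: similar strings are likely to share at least one block."""
    blocks = defaultdict(set)
    for rec_id, text in records.items():
        for key in band_keys(text, bands, rows):
            blocks[key].add(rec_id)
    return blocks


def candidate_ids(query_text, blocks, bands=10, rows=2):
    """Return ids of all records sharing at least one block with the query."""
    found = set()
    for key in band_keys(query_text, bands, rows):
        found |= blocks.get(key, set())
    return found


# Hypothetical records; the second contains a spelling variation.
records = {"r1": "jonathan smith", "r2": "jonathon smith", "r3": "mary jones"}
blocks = build_blocks(records)
print(candidate_ids("jonathan smith", blocks))
```

In this sketch a new record is added to the index by inserting its id under each of its band keys, so the index can grow while queries are being served. Using more bands with fewer rows per band makes noisy variants more likely to collide, at the cost of larger candidate sets.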

Additional Information: Part of the Lecture Notes in Computer Science book series (LNCS, volume 8444). ISBN 9783319066042.
Item Type: Conference or Workshop Item (Paper)
URI: https://reading-clone.eprints-hosting.org/id/eprint/82137
Refereed: Yes
Divisions: Science > School of Mathematical, Physical and Computational Sciences > Department of Computer Science