Discovering Related Data at Scale

[[paper-reading]]


Microsoft Cosmos

embedding-enhanced column-name similarity column-name uniqueness

We show that a metadata-only approach to build data graphs is indeed feasible, provides value, and is scalable. We also show that data-based features do provide value, though more marginal than expected.

Pasted image 20211228163445.png