Discovering Related Data at Scale
[[paper-reading]]
Microsoft Cosmos
embedding-enhanced column-name similarity column-name uniqueness
We show that a metadata-only approach to build data graphs is indeed feasible, provides value, and is scalable. We also show that data-based features do provide value, though more marginal than expected.
