Marius: Learning Massive Graph Embeddings on a Single Machine
We identify that current systems for learning the embeddings of large-scale graphs are bottlenecked by data movement, which results in poor resource utilization and inefficient training. These limitations require state-of-the-art systems to distribute training across multiple machines.
We introduce a new pipelined training architecture that can interleave data access, transfer, and computation to achieve high utilization.
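To make the interleaving concrete, here is a minimal sketch of such a three-stage pipeline (disk read, host-to-device transfer, GPU compute) using bounded queues so that no stage runs far ahead of the others. The stage functions and constants are illustrative placeholders, not the Marius API.

```python
import queue
import threading

NUM_PARTITIONS = 8   # illustrative number of data chunks to train on
QUEUE_DEPTH = 2      # bounded queues cap memory use and prefetch depth

def load_from_disk(pid):
    # Placeholder: read embedding partition `pid` into CPU memory.
    return f"cpu_batch_{pid}"

def transfer_to_gpu(batch):
    # Placeholder: host-to-device copy.
    return batch.replace("cpu", "gpu")

def train_on_gpu(batch):
    # Placeholder: forward/backward pass and embedding update on GPU.
    print(f"trained on {batch}")

def run_pipeline():
    disk_q = queue.Queue(maxsize=QUEUE_DEPTH)   # stage 1 -> stage 2
    gpu_q = queue.Queue(maxsize=QUEUE_DEPTH)    # stage 2 -> stage 3

    def loader():
        for pid in range(NUM_PARTITIONS):
            disk_q.put(load_from_disk(pid))     # disk -> CPU
        disk_q.put(None)                        # end-of-stream marker

    def transferrer():
        while (batch := disk_q.get()) is not None:
            gpu_q.put(transfer_to_gpu(batch))   # CPU -> GPU
        gpu_q.put(None)

    stages = [threading.Thread(target=loader),
              threading.Thread(target=transferrer)]
    for t in stages:
        t.start()
    while (batch := gpu_q.get()) is not None:   # GPU compute stage
        train_on_gpu(batch)
    for t in stages:
        t.join()

run_pipeline()
```

Because each queue is bounded, the loader can prefetch only a fixed number of partitions ahead of the GPU; this overlaps IO with computation while also limiting how far in-flight data can drift from the latest parameters.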
Our architecture introduces asynchronous training of node embeddings with bounded staleness. We combine this with synchronous training of edge-type embeddings to handle graphs that may contain edges of different types.
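The sketch below illustrates the general bounded-staleness idea with an explicit stale-synchronous clock; in a pipelined design the bound can instead arise implicitly from the pipeline depth, so the SSPClock class, STALENESS constant, and worker loop here are purely illustrative, not the mechanism above.

```python
import threading

STALENESS = 2      # assumed bound: fastest worker may lead slowest by at most this
NUM_WORKERS = 4
ITERATIONS = 10

class SSPClock:
    """Toy stale-synchronous clock: a worker that finished iteration c
    blocks until no other worker is more than STALENESS iterations behind."""
    def __init__(self, n):
        self.clocks = [0] * n
        self.cond = threading.Condition()

    def tick(self, worker):
        with self.cond:
            self.clocks[worker] += 1
            self.cond.notify_all()
            # Block while this worker's view would exceed the staleness bound.
            self.cond.wait_for(
                lambda: self.clocks[worker] - min(self.clocks) <= STALENESS)

def worker(clock, wid):
    for _ in range(ITERATIONS):
        # ... asynchronously read node embeddings, compute, write back ...
        clock.tick(wid)  # enforce the bounded-staleness invariant

clock = SSPClock(NUM_WORKERS)
threads = [threading.Thread(target=worker, args=(clock, w))
           for w in range(NUM_WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("final clocks:", clock.clocks)
```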
The contributions of this work are:
1) to show that existing state-of-the-art graph embedding systems are hindered by IO inefficiencies when moving data from disk to CPU and from CPU to GPU;
2) to introduce the Buffer-aware Edge Traversal Algorithm (BETA), an algorithm that generates an IO-minimizing data ordering for graph learning (a sketch of the underlying idea follows this list);
3) to combine the BETA ordering with a partition buffer and asynchronous IO via pipelining to introduce the first graph learning system that utilizes the full memory hierarchy (Disk-CPU-GPU).
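To illustrate contribution 2, the sketch below gives a simple greedy buffer-aware ordering: simulate a partition buffer of capacity c and always process next the edge bucket (a pair of node partitions) whose partitions require the fewest swaps into the buffer. This greedy heuristic is a stand-in for, not a reproduction of, the exact BETA algorithm; all names and constants are illustrative.

```python
from itertools import product

NUM_PARTITIONS = 4   # illustrative: n node partitions on disk
BUFFER_SIZE = 2      # illustrative: c partitions fit in memory (requires c >= 2)

def greedy_buffer_order(n, c):
    """Greedy stand-in for a buffer-aware ordering (not the exact BETA
    algorithm): repeatedly process the edge bucket (i, j) whose source
    and destination partitions need the fewest swaps into the buffer."""
    remaining = set(product(range(n), repeat=2))   # all edge buckets (i, j)
    buffer, order, swaps = set(), [], 0
    while remaining:
        # Cost of a bucket = number of its partitions not yet buffered.
        bucket = min(remaining, key=lambda b: len(set(b) - buffer))
        remaining.remove(bucket)
        needed = set(bucket)
        for p in needed - buffer:
            if len(buffer) >= c:
                # Evict a partition the current bucket does not need.
                victim = next(q for q in buffer if q not in needed)
                buffer.remove(victim)
            buffer.add(p)
            swaps += 1                              # each swap = one disk read
        order.append(bucket)
    return order, swaps

order, swaps = greedy_buffer_order(NUM_PARTITIONS, BUFFER_SIZE)
print(f"processed {len(order)} edge buckets with {swaps} partition loads")
```

Every swap is a disk read of a full partition, so the swap count is the quantity an IO-minimizing ordering drives down; the greedy rule above only approximates what a purpose-built ordering such as BETA is designed to achieve.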