GRACE: A scalable graph-based approach to accelerating recommendation model inference
Ye H, Vedula S, Chen Y, Yang Y, Bronstein AM, Dreslinski R, Mudge T, Talati N. 2023. GRACE: A scalable graph-based approach to accelerating recommendation model inference. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. vol. 11, 282–301.
Conference Paper | Published | English
Scopus indexed
Author
Ye, Haojie;
Vedula, Sanketh;
Chen, Yuhan;
Yang, Yichen;
Bronstein, Alex M. (ISTA);
Dreslinski, Ronald;
Mudge, Trevor;
Talati, Nishil
Abstract
The high memory bandwidth demand of sparse embedding layers continues to be a critical challenge in scaling the performance of recommendation models. While prior works have exploited heterogeneous memory system designs and partial embedding sum memoization techniques, they offer limited benefits. This is because prior designs either target a very small subset of embeddings to simplify their analysis or incur a high processing cost to account for all embeddings, which does not scale with the large sizes of modern embedding tables. This paper proposes GRACE, a lightweight and scalable graph-based algorithm-system co-design framework to significantly improve the embedding layer performance of recommendation models. GRACE proposes a novel Item Co-occurrence Graph (ICG) that scalably records item co-occurrences. GRACE then presents a new system-aware ICG clustering algorithm to find frequently accessed item combinations of arbitrary lengths to compute and memoize their partial sums. High-frequency partial sums are stored in a software-managed cache space to reduce memory traffic and improve the throughput of computing sparse features. We further present a cache data layout and low-cost address computation logic to efficiently look up item embeddings and their partial sums. Our evaluation shows that GRACE significantly outperforms the state-of-the-art techniques SPACE and MERCI by 1.5x and 1.4x, respectively.
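The abstract's core idea (record which items co-occur, then memoize the partial sums of frequent combinations) can be illustrated with a deliberately simplified sketch. This is not the GRACE algorithm: it assumes each query lists distinct item IDs, memoizes pairs only rather than combinations of arbitrary length, replaces the system-aware ICG clustering with a plain frequency threshold, and matches cached pairs greedily. All names (`build_cooccurrence_graph`, `memoize_frequent_pairs`, `pooled_lookup`, `min_count`) and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_graph(queries):
    """Count how often each unordered item pair appears in the same query."""
    edge_weights = Counter()
    for items in queries:
        for a, b in combinations(sorted(set(items)), 2):
            edge_weights[(a, b)] += 1
    return edge_weights


def vector_sum(vectors, dim):
    """Element-wise sum of equal-length vectors."""
    total = [0.0] * dim
    for vec in vectors:
        for i, x in enumerate(vec):
            total[i] += x
    return total


def memoize_frequent_pairs(edge_weights, table, min_count):
    """Precompute partial sums for pairs seen at least min_count times."""
    dim = len(next(iter(table.values())))
    return {
        pair: vector_sum([table[pair[0]], table[pair[1]]], dim)
        for pair, count in edge_weights.items()
        if count >= min_count
    }


def pooled_lookup(items, table, cache):
    """Sum the embeddings of `items`, reusing cached pair sums greedily.

    Returns the pooled vector and the number of vectors actually read.
    """
    dim = len(next(iter(table.values())))
    remaining = sorted(set(items))
    parts = []
    for pair in sorted(cache):  # fixed order keeps the result deterministic
        if pair[0] in remaining and pair[1] in remaining:
            parts.append(cache[pair])
            remaining.remove(pair[0])
            remaining.remove(pair[1])
    parts.extend(table[item] for item in remaining)
    return vector_sum(parts, dim), len(parts)


# Toy embedding table and query trace (made-up values for illustration).
table = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [2.0, 2.0], 3: [0.5, 0.5]}
queries = [[0, 1, 2], [0, 1], [0, 1, 3], [2, 3]]

graph = build_cooccurrence_graph(queries)
cache = memoize_frequent_pairs(graph, table, min_count=2)
pooled, reads = pooled_lookup([0, 1, 2], table, cache)
```

In this toy trace only the pair (0, 1) crosses the threshold, so pooling items 0, 1, and 2 reads two vectors (one cached partial sum plus one embedding) instead of three. That reduction in reads is the kind of memory-traffic saving the abstract describes.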
Publishing Year
2023
Date Published
2023-03-01
Proceedings Title
Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
Publisher
Association for Computing Machinery
Volume
11
Issue
3
Page
282-301
ISBN
IST-REx-ID
Cite this
Ye H, Vedula S, Chen Y, et al. GRACE: A scalable graph-based approach to accelerating recommendation model inference. In: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. Vol 11. Association for Computing Machinery; 2023:282-301. doi:10.1145/3582016.3582029
Ye, H., Vedula, S., Chen, Y., Yang, Y., Bronstein, A. M., Dreslinski, R., … Talati, N. (2023). GRACE: A scalable graph-based approach to accelerating recommendation model inference. In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (Vol. 11, pp. 282–301). Association for Computing Machinery. https://doi.org/10.1145/3582016.3582029
Ye, Haojie, Sanketh Vedula, Yuhan Chen, Yichen Yang, Alex M. Bronstein, Ronald Dreslinski, Trevor Mudge, and Nishil Talati. “GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference.” In Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 11:282–301. Association for Computing Machinery, 2023. https://doi.org/10.1145/3582016.3582029.
H. Ye et al., “GRACE: A scalable graph-based approach to accelerating recommendation model inference,” in Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2023, vol. 11, no. 3, pp. 282–301.
Ye H, Vedula S, Chen Y, Yang Y, Bronstein AM, Dreslinski R, Mudge T, Talati N. 2023. GRACE: A scalable graph-based approach to accelerating recommendation model inference. Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems. vol. 11, 282–301.
Ye, Haojie, et al. “GRACE: A Scalable Graph-Based Approach to Accelerating Recommendation Model Inference.” Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, vol. 11, no. 3, Association for Computing Machinery, 2023, pp. 282–301, doi:10.1145/3582016.3582029.