{"conference":{"name":"FCCM: Field-Programmable Custom Computing Machines"},"extern":"1","user_id":"2DF688A6-F248-11E8-B48F-1D18A9856A87","publisher":"IEEE","doi":"10.1109/FCCM.2017.39","publist_id":"6865","year":"2017","status":"public","citation":{"ieee":"K. Kara, D.-A. Alistarh, G. Alonso, O. Mutlu, and C. Zhang, “FPGA-accelerated dense linear machine learning: A precision-convergence trade-off,” presented at the FCCM: Field-Programmable Custom Computing Machines, 2017, pp. 160–167.","ista":"Kara K, Alistarh D-A, Alonso G, Mutlu O, Zhang C. 2017. FPGA-accelerated dense linear machine learning: A precision-convergence trade-off. FCCM: Field-Programmable Custom Computing Machines, 160–167.","chicago":"Kara, Kaan, Dan-Adrian Alistarh, Gustavo Alonso, Onur Mutlu, and Ce Zhang. “FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off,” 160–67. IEEE, 2017. https://doi.org/10.1109/FCCM.2017.39.","ama":"Kara K, Alistarh D-A, Alonso G, Mutlu O, Zhang C. FPGA-accelerated dense linear machine learning: A precision-convergence trade-off. In: IEEE; 2017:160-167. doi:10.1109/FCCM.2017.39","short":"K. Kara, D.-A. Alistarh, G. Alonso, O. Mutlu, C. Zhang, in:, IEEE, 2017, pp. 160–167.","apa":"Kara, K., Alistarh, D.-A., Alonso, G., Mutlu, O., & Zhang, C. (2017). FPGA-accelerated dense linear machine learning: A precision-convergence trade-off (pp. 160–167). Presented at the FCCM: Field-Programmable Custom Computing Machines, IEEE. https://doi.org/10.1109/FCCM.2017.39","mla":"Kara, Kaan, et al. FPGA-Accelerated Dense Linear Machine Learning: A Precision-Convergence Trade-Off. IEEE, 2017, pp. 160–67, doi:10.1109/FCCM.2017.39."},"author":[{"first_name":"Kaan","last_name":"Kara","full_name":"Kara, Kaan"},{"full_name":"Alistarh, Dan-Adrian","id":"4A899BFC-F248-11E8-B48F-1D18A9856A87","last_name":"Alistarh","first_name":"Dan-Adrian","orcid":"0000-0003-3650-940X"},{"first_name":"Gustavo","last_name":"Alonso","full_name":"Alonso, Gustavo"},{"full_name":"Mutlu, Onur","first_name":"Onur","last_name":"Mutlu"},{"first_name":"Ce","last_name":"Zhang","full_name":"Zhang, Ce"}],"publication_status":"published","type":"conference","month":"06","date_published":"2017-06-30T00:00:00Z","language":[{"iso":"eng"}],"article_processing_charge":"No","title":"FPGA-accelerated dense linear machine learning: A precision-convergence trade-off","oa_version":"None","_id":"790","date_created":"2018-12-11T11:48:31Z","page":"160 - 167","date_updated":"2023-02-23T13:19:52Z","day":"30","abstract":[{"lang":"eng","text":"Stochastic gradient descent (SGD) is a commonly used algorithm for training linear machine learning models. Based on vector algebra, it benefits from the inherent parallelism available in an FPGA. In this paper, we first present a single-precision floating-point SGD implementation on an FPGA that provides similar performance as a 10-core CPU. We then adapt the design to make it capable of processing low-precision data. The low-precision data is obtained from a novel compression scheme - called stochastic quantization, specifically designed for machine learning applications. We test both full-precision and low-precision designs on various regression and classification data sets. We achieve up to an order of magnitude training speedup when using low-precision data compared to a full-precision SGD on the same FPGA and a state-of-the-art multi-core solution, while maintaining the quality of training. We open source the designs presented in this paper."}]}