Abstract
We study entrywise scalar quantization of two matrices prior to multiplication. Given A\in\mathbb{R}^{m\times k} and B\in\mathbb{R}^{k\times n}, we quantize the entries of A and B independently using scalar quantizers with K_X and K_Y levels per entry, respectively, and form \widehat C=\widehat A\,\widehat B. The objective is to minimize the matrix multiplication mean-squared error (MSE) \mathbb{E}[\|AB-\widehat A\widehat B\|_F^2] under a pair-i.i.d.\ inner-product model. In the high-resolution regime K_X,K_Y\to\infty, we derive a sharp K^{-2} asymptotic expansion for the MSE, identify the exact optimal leading constants, and characterize asymptotically optimal quantization center densities in terms of conditional second moments. We then specialize to correlated Gaussian multiplicative pairs, obtaining a closed-form optimal point density \[ \lambda^\star(u)\ \propto\ \exp\!\left(-\frac{u^2}{6}\right)\bigl((1-\rho^2)+\rho^2u^2\bigr)^{1/3}, \qquad u=\frac{x}{\sigma_X}, \] with the same form for y/\sigma_Y, and prove a correlation-driven phase transition: the density is unimodal with its peak at the origin for |\rho|\leq 1/\sqrt{3} and becomes bimodal for |\rho|>1/\sqrt{3}, with peaks at u_{\mathrm{peak}}=\pm\sqrt{3-1/\rho^2}. We demonstrate the method in synthetic experiments, including quantized matrix multiplication and least-squares optimization, and in the quantization of key and query activations in large language models.
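The stated phase transition can be checked numerically from the closed-form density alone. The following is a minimal sketch, not the authors' code: the function names, the grid range `u_max`, and the grid size `n` are illustrative assumptions. It evaluates the unnormalized density \lambda^\star(u) on a grid over u\geq 0 (the density is even in u) and compares the location of its maximum with the predicted peak \sqrt{3-1/\rho^2}.

```python
import numpy as np


def point_density(u, rho):
    """Unnormalized density exp(-u^2/6) * ((1 - rho^2) + rho^2 u^2)^(1/3)."""
    return np.exp(-u**2 / 6.0) * ((1.0 - rho**2) + rho**2 * u**2) ** (1.0 / 3.0)


def numerical_peak(rho, u_max=6.0, n=200001):
    """Grid argmax of the density on [0, u_max] (assumed range and resolution)."""
    u = np.linspace(0.0, u_max, n)
    return u[np.argmax(point_density(u, rho))]


def predicted_peak(rho):
    """Peak from the abstract: 0 if |rho| <= 1/sqrt(3), else sqrt(3 - 1/rho^2)."""
    if abs(rho) <= 1.0 / np.sqrt(3.0):
        return 0.0
    return float(np.sqrt(3.0 - 1.0 / rho**2))


if __name__ == "__main__":
    # One correlation below and one above the threshold 1/sqrt(3) ~ 0.577.
    for rho in (0.3, 0.9):
        print(rho, numerical_peak(rho), predicted_peak(rho))
```

The predicted location follows from setting the derivative of \log\lambda^\star(u) to zero, which gives u=0 or \rho^2u^2=3\rho^2-1; the nonzero root is real only when |\rho|>1/\sqrt{3}.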