Ichitaro Yamazaki

Linear Algebra PACKage (LAPACK)
Matrix Algebra on GPU and Multicore Architectures (MAGMA)
Parallel Linear Algebra for Scalable Multi-core Architectures (PLASMA)
Direct solution of large-scale sparse nonsymmetric systems of equations (SuperLU_DIST)
Solution of Hermitian eigenvalue problems by a thick-restart Lanczos method (TRLan)

Dense Linear Algebra

I. Yamazaki, S. Tomov, and J. Dongarra: Non-GPU-resident symmetric indefinite factorization. Concurrency Computat.: Pract. Exper. (2016) (link)
I. Yamazaki, S. Tomov, and J. Dongarra: Stability and performance of various Singular Value QR implementations on multicore CPU with a GPU. ACM Trans. Math. Soft. 42(2): (2016) (link)

I. Yamazaki, S. Tomov, and J. Dongarra: Mixed-precision Cholesky QR factorization and its case studies on multicore CPU with multiple GPUs. SIAM J. Sci. Comput. 37(3): C307-C330 (2015).
▷ Initial performance studies: Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs. VECPAR'14 (Best paper), LNCS 8969: 17-30 (2015) (link, preprint).
▷ Extension to orthogonalize a larger number of columns: Mixed-precision block Gram-Schmidt orthogonalization. ScalA'15 (link).

G. Ballard, D. Becker, J. Demmel, J. Dongarra, A. Druinsky, I. Peled, O. Schwartz, S. Toledo, and I. Yamazaki: Communication-avoiding symmetric-indefinite factorization. SIAM J. Matrix Anal. Appl. 35(4): 1364-1460 (2014). (link, preprint).
▷ Implementation and performance studies in: Implementing a blocked Aasen's algorithm with a dynamic scheduler on multicore architectures. IPDPS 2013 (Best paper): 895-907 (link)
I. Yamazaki, T. Dong, R. Solca, S. Tomov, J. Dongarra, and T. Schulthess: Tridiagonalization of a dense symmetric matrix on multiple GPUs and its application to symmetric eigenvalue problems. Concurrency Computat.: Pract. Exper. 26(16): 2652-2666 (2014) (link, preprint).

I. Yamazaki, S. Tomov, and J. Dongarra: One-sided dense matrix factorizations on a multicore with multiple GPU accelerators. ICCS 2012: 37-46 (link)
Y. Nakatsukasa, K. Aishima, and I. Yamazaki: dqds with aggressive early deflation. SIAM J. Matrix Anal. Appl. 33(1): 22-51 (2012) (preprint, link).

Sparse Linear Algebra

I. Yamazaki, S. Rajamanickam, E. Boman, M. Hoemmen, M. Heroux, and S. Tomov: Domain decomposition preconditioners for communication-avoiding Krylov methods on a hybrid CPU/GPU cluster. SC 2014: 933-944 (link).
I. Yamazaki, S. Tomov, and J. Dongarra: Deflation strategies to improve the convergence of communication-avoiding GMRES. ScalA Workshop 2014: 39-46 (link).
I. Yamazaki, S. Tomov, T. Dong, and J. Dongarra: Mixed-precision orthogonalization scheme and adaptive step size for CA-GMRES on GPUs. VECPAR'14 (Best paper), LNCS 8969: 17-30 (2015) (link, preprint).
I. Yamazaki, H. Anzt, S. Tomov, M. Hoemmen, and J. Dongarra: Improving the performance of CA-GMRES on multicores with multiple GPUs. IPDPS 2014: 382-391 (link, preprint).

I. Yamazaki, X. Li, F.-H. Rouet, and B. Ucar: Partitioning, ordering, and load balancing in a hierarchically parallel hybrid linear solver. PDSEC 2013 (preprint, link)

X. Lacoste, P. Ramet, M. Faverge, I. Yamazaki, and J. Dongarra: Sparse direct solvers with accelerators over DAG runtimes. (preprint)
I. Yamazaki and X. Li: New scheduling strategies and hybrid programming for a parallel right-looking sparse LU factorization algorithm on multicore cluster systems. IPDPS 2012: 619-630 (preprint, link)

I. Yamazaki and K. Wu: A communication-avoiding thick-restart Lanczos method on a distributed-memory system. HPSS Workshop at Euro-Par 2011: 345-354 (preprint, link).

I. Yamazaki and X. Li: On techniques to improve robustness and scalability of a parallel hybrid linear solver. VECPAR 2010: 421-434 (preprint, link).
I. Yamazaki, Z. Bai, H. Simon, L.-W. Wang, and K. Wu: Adaptive projection subspace dimension for the thick-restart Lanczos method. ACM Trans. Math. Soft. 37(3): (2010) (preprint, link).

Randomized Linear Algebra

I. Yamazaki, J. Kurzak, P. Luszczek, and J. Dongarra: Randomized algorithms to update partial singular value decomposition on a hybrid CPU/GPU cluster. SC 2015: 59:1-59:12 (link).
T. Mary, I. Yamazaki, J. Kurzak, P. Luszczek, S. Tomov, and J. Dongarra: Performance of random sampling for computing low-rank approximations of a dense matrix on GPUs. SC 2015: 60:1-60:11 (link).

I. Yamazaki, T. Mary, J. Kurzak, S. Tomov, and J. Dongarra: Access-averse framework for computing low-rank matrix approximations. BigData Workshop 2014: 70-77 (link).

Applications

I. Yamazaki, T. Ikegami, T. Sakurai, and H. Tadano: Performance comparison of parallel eigensolvers based on a contour integral method and a Lanczos method, PMAA 2010 (Parallel Computing 39: 280-290, 2013, link)

X. Yuan, X. Li, I. Yamazaki, S. Jardin, A. Koniges, and D. Keyes: Application of PDSLin to the magnetic reconnection problem. ICNSP 2011 (link).

I. Yamazaki, V. Natarajan, Z. Bai, and B. Hamann: Segmenting point-sampled surfaces. The Visual Computer 26(12): 1421-1433 (2010) (preprint, link).
▷ Initial results in: Segmenting point sets. SMI 2006: 6-15 (preprint, link).

Z. Bai, W. Chen, R. Scalettar, and I. Yamazaki: Numerical methods for quantum Monte Carlo simulations of the Hubbard model. In Multi-Scale Phenomena in Complex Fluids, Higher Education Press and World Scientific, 2009 (preprint).
I. Yamazaki, Z. Bai, W. Chen, and R. Scalettar: A high-quality preconditioning technique for multi-length-scale symmetric positive definite linear systems. Numer. Math. Theor. Meth. Appl. 2(4): (2009) (preprint, link).

S. Chatterji, I. Yamazaki, Z. Bai, and J. Eisen: CompostBin: A DNA composition-based algorithm for binning environmental shotgun reads. RECOMB 2008: 17-28 (preprint).

Talks

Performance of random sampling to compute or update partial SVD on hybrid CPU/GPU architectures, with T. Mary, J. Kurzak, S. Tomov, and J. Dongarra, at SIAM CSE15 in Salt Lake City, Utah (slides).
Communication-avoiding GMRES with GPUs, with H. Anzt, M. Hoemmen, S. Tomov, and J. Dongarra, at SIAM PP14 in Portland, Oregon.
Implementing a blocked Aasen's algorithm with a dynamic scheduler on multicore architectures, with G. Ballard, D. Becker, J. Demmel, J. Dongarra, A. Druinsky, I. Peled, O. Schwartz, and S. Toledo, at IPDPS13 in Boston, Massachusetts (slides).
Symmetric dense matrix tridiagonalization on a GPU cluster, with T. Dong, S. Tomov, and J. Dongarra, at SIAM CSE13 in Boston, Massachusetts (slides).
Performance of a parallel direct solver SuperLU_DIST on multicore clusters, with X. Li, at SIAM PP12 in Savannah, Georgia (slides).

Email: ic[dot]yamazaki[at]gmail[dot]com
Mail: 1122 Volunteer Blvd, Claxton Building, Office: 351
Knoxville, Tennessee 37996-3450