Publications

Showing records 1 - 10 of 18

Herault, T., Bouteiller, A., Bosilca, G., Gamell, M., Teranishi, K., Parashar, M., Dongarra, J. "Practical Scalable Consensus for Pseudo-Synchronous Distributed Systems: Formal Proof," University of Tennessee Computer Science Technical Report, ICL-UT-15-01, April, 2015.

Benoit, A., Robert, Y., Raina, S.K. "Efficient checkpoint/verification patterns for silent error detection," University of Tennessee Computer Science Technical Report, ICL-UT-14-03, May, 2014.

Aurelien Bouteiller, Thomas Herault, George Bosilca, Peng Du, and Jack Dongarra "Algorithm-based Fault Tolerance for Dense Matrix Factorizations, Multiple Failures and Accuracy," ACM Transactions on Parallel Computing, Phillip B. Gibbons ed. ACM, New York, NY, USA, (to appear), 2014.

Yulu Jia, George Bosilca, Piotr Luszczek, and Jack Dongarra "Parallel Reduction to Hessenberg Form with Algorithm-Based Fault Tolerance," International Conference for High Performance Computing, Networking, Storage and Analysis, IEEE-SC 2013, Denver, CO, November, 2013.

Yulu Jia, Piotr Luszczek, George Bosilca, Jack Dongarra "CPU-GPU Hybrid Bidiagonal Reduction With Soft Error Resilience," ScalA '13 Proceedings of the Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems, November, 2013.

Bosilca, G., Bouteiller, A., Herault, T., Robert, Y., and Dongarra, J. "Assessing the impact of ABFT and Checkpoint composite strategies," University of Tennessee Computer Science Technical Report, ICL-UT-13-03, September, 2013.

Jia, Y., Luszczek, P., Dongarra, J. "Transient Error Resilient Hessenberg Reduction on GPU-based Hybrid Architectures," University of Tennessee Computer Science Technical Report, UT-CS-13-712 (lawn279), June, 2013.

Bland, W., Bouteiller, A., Herault, T., Hursey, J., Bosilca, G., Dongarra, J.J. "An evaluation of User-Level Failure Mitigation support in MPI," Computing, Springer, Vienna, DOI 10.1007/s00607-013-0331-3, 1-14, May, 2013.

Jack Dongarra, Thomas Herault and Yves Robert "Revisiting the Double Checkpointing Algorithm," 15th Workshop on Advances in Parallel and Distributed Computational Models, at the IEEE International Parallel & Distributed Processing Symposium, Boston, MA, January, 2013.

Bland, W., Du, P., Bouteiller, A., Herault, T., Bosilca, G., Dongarra, J. "A Checkpoint-on-Failure Protocol for Algorithm-Based Recovery in Standard MPI," 18th International European Conference on Parallel and Distributed Computing (Euro-Par 2012) (Best Paper Award), Christos Kaklamanis, Theodore Papatheodorou and Paul Spirakis eds. Springer-Verlag, Rhodes, Greece, August 27-31, 2012.



Jun 29 2022