MT-MPI: Multithreaded MPI for many-core environments M Si, AJ Peña, P Balaji, M Takagi, Y Ishikawa Proceedings of the 28th ACM international conference on Supercomputing, 125-134, 2014 | 69 | 2014 |
Casper: An asynchronous progress model for MPI RMA on many-core architectures M Si, AJ Peña, J Hammond, P Balaji, M Takagi, Y Ishikawa 2015 IEEE International Parallel and Distributed Processing Symposium, 665-676, 2015 | 60 | 2015 |
The glorious Glasgow Haskell compilation system user’s guide GHC Team Version 7 (3), 2002-2007, 2005 | 47* | 2005 |
Scalable deep learning via I/O analysis and optimization S Pumma, M Si, WC Feng, P Balaji ACM Transactions on Parallel Computing (TOPC) 6 (2), 1-34, 2019 | 39 | 2019 |
Why is MPI so slow? analyzing the fundamental limits in implementing MPI-3.1 K Raffenetti, A Amer, L Oden, C Archer, W Bland, H Fujita, Y Guo, ... Proceedings of the international conference for high performance computing …, 2017 | 38 | 2017 |
Parallel I/O optimizations for scalable deep learning S Pumma, M Si, W Feng, P Balaji 2017 IEEE 23rd International Conference on Parallel and Distributed Systems …, 2017 | 30 | 2017 |
Direct MPI library for Intel Xeon Phi co-processors M Si, Y Ishikawa, M Takagi 2013 IEEE International Symposium on Parallel & Distributed Processing …, 2013 | 28 | 2013 |
Process-in-process: techniques for practical address-space sharing A Hori, M Si, B Gerofi, M Takagi, J Dayal, P Balaji, Y Ishikawa Proceedings of the 27th International Symposium on High-Performance Parallel …, 2018 | 24 | 2018 |
Towards scalable deep learning via I/O analysis and optimization S Pumma, M Si, W Feng, P Balaji 2017 IEEE 19th International Conference on High Performance Computing and …, 2017 | 23 | 2017 |
Process-based asynchronous progress model for MPI point-to-point communication M Si, P Balaji 2017 IEEE 19th International Conference on High Performance Computing and …, 2017 | 21 | 2017 |
Design of direct communication facility for many-core based accelerators M Si, Y Ishikawa 2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012 | 20 | 2012 |
Software combining to mitigate multithreaded MPI contention A Amer, C Archer, M Blocksome, C Cao, M Chuvelev, H Fujita, ... Proceedings of the ACM International Conference on Supercomputing, 367-379, 2019 | 12 | 2019 |
Scaling NWChem with efficient and portable asynchronous communication in MPI RMA M Si, AJ Peña, J Hammond, P Balaji, Y Ishikawa 2015 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2015 | 11 | 2015 |
CAB-MPI: Exploring interprocess work-stealing towards balanced MPI communication K Ouyang, M Si, A Hori, Z Chen, P Balaji SC20: International Conference for High Performance Computing, Networking …, 2020 | 8 | 2020 |
Dynamic adaptable asynchronous progress model for MPI RMA multiphase applications M Si, AJ Peña, J Hammond, P Balaji, M Takagi, Y Ishikawa IEEE Transactions on Parallel and Distributed Systems 29 (9), 1975-1989, 2018 | 7 | 2018 |
A FACT-based approach: Making machine learning collective autotuning feasible on exascale systems M Wilkins, Y Guo, R Thakur, N Hardavellas, P Dinda, M Si 2021 Workshop on Exascale MPI (ExaMPI), 36-45, 2021 | 6 | 2021 |
An MPI Library Implementing Direct Communication for Many-Core based Accelerators M Si, Y Ishikawa 2012 SC Companion: High Performance Computing, Networking, Storage and …, 2012 | 6 | 2012 |
OpenSHMEM over MPI as a Performance Contender: Thorough Analysis and Optimizations M Si, H Fu, JR Hammond, P Balaji Workshop on OpenSHMEM and Related Technologies, 39-60, 2021 | 4 | 2021 |
DAPS: A dynamic asynchronous progress stealing model for MPI communication K Ouyang, M Si, A Hori, Z Chen, P Balaji 2021 IEEE International Conference on Cluster Computing (CLUSTER), 516-527, 2021 | 3 | 2021 |
Dynamic scaling for low-precision learning R Han, M Si, J Demmel, Y You Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021 | 3 | 2021 |