Time-stamp Incremental Checkpointing and Its Application for an Optimization of Execution Model to Improve Performance of CAPE

Van Long Tran, Éric Renault, Viet Hai Ha, Xuan Huyen Do

Abstract


CAPE, which stands for Checkpointing-Aided Parallel Execution, is a checkpoint-based approach to automatically translating and executing OpenMP programs on distributed-memory architectures. This approach delivers high performance and complete compatibility with OpenMP on distributed-memory systems. In CAPE, checkpointing is one of the main factors affecting system performance. The two existing versions of CAPE show this: the first version, based on complete checkpoints, is considerably slower than the second, based on Discontinuous Incremental Checkpointing. This paper presents an improvement to Discontinuous Incremental Checkpointing and a new execution model for CAPE that relies on these new checkpointing techniques. Together, these contributions improve performance and make CAPE even more flexible.
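The abstract contrasts complete checkpoints with incremental ones but does not describe how either is built. The following is a minimal, hypothetical sketch of the general idea, not CAPE's actual mechanism. It assumes that:

- process memory is modeled as a small byte array;
- a checkpoint is a dict mapping an offset to a `(value, timestamp)` pair;
- successive incremental checkpoints are merged with a "newest time stamp wins" rule. This rule is an assumption suggested by the title's "time-stamp incremental checkpointing".

The function names `complete_checkpoint`, `incremental_checkpoint`, `merge`, and `apply` are illustrative only.

```python
"""Hypothetical sketch: complete vs. incremental checkpoints on a toy memory image.

Assumptions (not taken from the paper): memory is a bytearray, a checkpoint is a
dict {offset: (value, timestamp)}, and merging keeps the entry with the newest
time stamp for each offset.
"""

from typing import Dict, Tuple

Checkpoint = Dict[int, Tuple[int, int]]  # offset -> (byte value, time stamp)


def complete_checkpoint(memory: bytes, ts: int) -> Checkpoint:
    """Record every byte of the image, whether or not it changed."""
    return {i: (b, ts) for i, b in enumerate(memory)}


def incremental_checkpoint(before: bytes, after: bytes, ts: int) -> Checkpoint:
    """Record only the bytes that differ between two snapshots of equal size."""
    return {i: (a, ts) for i, (b, a) in enumerate(zip(before, after)) if a != b}


def merge(*checkpoints: Checkpoint) -> Checkpoint:
    """Combine checkpoints in any order; per offset, keep the newest time stamp."""
    merged: Checkpoint = {}
    for ckpt in checkpoints:
        for off, (val, ts) in ckpt.items():
            if off not in merged or ts > merged[off][1]:
                merged[off] = (val, ts)
    return merged


def apply(memory: bytearray, ckpt: Checkpoint) -> None:
    """Write the checkpointed values back into a memory image in place."""
    for off, (val, _) in ckpt.items():
        memory[off] = val


if __name__ == "__main__":
    base = bytes(8)                              # initial image: 8 zero bytes
    s1 = bytearray(base); s1[0] = 1; s1[1] = 2   # state after the first interval
    s2 = bytearray(s1); s2[1] = 9; s2[5] = 7     # state after the second interval

    full = complete_checkpoint(s2, ts=2)
    inc1 = incremental_checkpoint(base, s1, ts=1)
    inc2 = incremental_checkpoint(s1, s2, ts=2)
    print(len(full), len(inc1), len(inc2))       # 8 2 2

    restored = bytearray(base)
    apply(restored, merge(inc2, inc1))           # merge order does not matter
    print(list(restored))                        # [1, 9, 0, 0, 0, 7, 0, 0]
```

The incremental checkpoints hold 2 entries each instead of 8, which illustrates why the incremental version of CAPE is lighter than the complete one. The time stamps let the merge produce the latest state even when the checkpoints are combined out of order.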







DOI: https://doi.org/10.31449/inf.v42i3.2244

This work is licensed under a Creative Commons Attribution 3.0 License.