Performance Analysis of Parallel Merge Sort Using MPI (Message Passing Interface) on Big Data Dataset

Authors

  • Erwin Panggabean, STMIK Pelita Nusantara Medan
  • Yuda Perwira, STMIK Pelita Nusantara Medan
  • Dedi Candro Parulian Sinaga, STMIK Pelita Nusantara Medan
  • Annisa Tri Utami, STMIK Pelita Nusantara Medan
  • Vincha Swe Meiya Pricilla Sembiring, STMIK Pelita Nusantara Medan

DOI:

https://doi.org/10.30865/json.v7i2.9307

Keywords:

Parallel Merge Sort; Message Passing Interface (MPI); Big Data; Parallel Computing; Performance Analysis; Speedup; Efficiency

Abstract

The rapid growth of data in the era of Big Data demands efficient and scalable algorithms to handle large datasets. Sorting, as a fundamental operation in data processing, plays a crucial role in various computational tasks. This study focuses on the performance analysis of the Parallel Merge Sort algorithm using the Message Passing Interface (MPI) to accelerate sorting operations on large-scale datasets. The implementation utilizes MPI for distributed memory communication across multiple processes, enabling concurrent data partitioning and merging. Experiments were conducted on datasets ranging from several hundred megabytes to multiple gigabytes to evaluate performance metrics such as execution time, speedup, and efficiency. The results demonstrate that the parallel implementation significantly reduces computation time compared to the sequential version, especially as the dataset size and the number of processes increase. However, the performance gain tends to decrease when communication overhead between MPI processes becomes dominant. Overall, the findings indicate that MPI-based Parallel Merge Sort is an effective approach for large-scale data sorting, providing a balance between computation and communication efficiency in parallel environments.
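The workflow described in the abstract (scatter the data across processes, sort each partition locally, then merge the sorted partitions back together) can be illustrated with a short sketch. The snippet below is a sequential Python emulation of that MPI communication pattern, not the authors' implementation: each list in `chunks` stands in for the data held by one MPI rank, the initial split stands in for `MPI_Scatter`, and each pairwise merge round stands in for a rank receiving its partner's sorted chunk (e.g. via `MPI_Send`/`MPI_Recv`) in a reduction tree. The function name and parameters are illustrative only.

```python
def merge(a, b):
    # Two-way merge of two sorted lists -- the same step an MPI rank
    # performs after receiving its partner's sorted chunk.
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i]); i += 1
        else:
            out.append(b[j]); j += 1
    out.extend(a[i:])
    out.extend(b[j:])
    return out

def parallel_merge_sort_emulated(data, n_procs=4):
    """Sequentially emulate MPI-style parallel merge sort:
    1. 'Scatter' the input into n_procs chunks (stands in for MPI_Scatter).
    2. Each emulated rank sorts its chunk locally.
    3. Merge chunks pairwise over ~log2(n_procs) rounds, as ranks would
       exchange chunks along a reduction tree."""
    if not data:
        return []
    size = (len(data) + n_procs - 1) // n_procs
    chunks = [sorted(data[i * size:(i + 1) * size]) for i in range(n_procs)]
    while len(chunks) > 1:
        # One communication round: even-indexed 'ranks' merge in the
        # chunk of their odd-indexed partner.
        chunks = [merge(chunks[i], chunks[i + 1]) if i + 1 < len(chunks)
                  else chunks[i]
                  for i in range(0, len(chunks), 2)]
    return chunks[0]
```

The pairwise-merge rounds also make the communication overhead discussed in the abstract visible: each round moves roughly the full dataset between partners, so as `n_procs` grows the merge/communication phase takes a larger share of total time relative to the local sorts.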

References

T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 4th ed. Cambridge, MA: MIT Press, 2022.

J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 6th ed. San Francisco, CA: Morgan Kaufmann, 2019.

W. Gropp, T. Hoefler, R. Thakur, and E. Lusk, Using MPI: Portable Parallel Programming with the Message Passing Interface, 3rd ed. Cambridge, MA: MIT Press, 2014.

A. D. Brown, “Performance Evaluation of Parallel Sorting Algorithms Using MPI,” Journal of Parallel and Distributed Computing, vol. 157, pp. 34–44, 2021, doi: 10.1016/j.jpdc.2021.07.004.

H. Li, Y. Zhang, and J. Zhao, “Optimized Parallel Merge Sort Algorithm on Multi-Core and Distributed Systems,” IEEE Access, vol. 10, pp. 85723–85734, 2022, doi: 10.1109/ACCESS.2022.3195009.

P. Zulian et al., "Data-centric workloads with MPI_Sort," Journal of Parallel and Distributed Computing, 2024. (Discusses MPI-based distributed sorting design, including a multi-way mergesort implementation on supercomputers.)

R. Patel and S. Nair, “Improving the Efficiency of Merge Sort in Big Data Applications Using MPI,” in Proc. 2020 Int. Conf. on Computational Science and Engineering (CSE), Singapore, pp. 512–519, 2020, doi: 10.1109/CSE51234.2020.00090.

K. Agarwal, N. Singh, and P. Chauhan, “A Comparative Study of Parallel Sorting Techniques on Distributed Memory Systems,” International Journal of Advanced Computer Science and Applications (IJACSA), vol. 13, no. 5, pp. 87–96, 2022, doi: 10.14569/IJACSA.2022.0130511.

Y. Zheng and J. Wu, "Towards Efficient HPC: Exploring Overlap Strategies Using MPI Non-Blocking Communication," Mathematics, vol. 13, no. 11, art. 1848, Jun. 2025, doi: 10.3390/math13111848.

K. Nakajima, "Optimization of communication-computation overlapping in parallel multigrid methods by process/thread allocation," Japan Journal of Industrial and Applied Mathematics, vol. 42, pp. 1029–1062, Sep. 2025, doi: 10.1007/s13160-025-00732-3.

M. R. Khan and F. Anwar, “Analyzing Communication Overheads in MPI-Based Parallel Algorithms,” International Journal of Computer Applications, vol. 183, no. 48, pp. 15–22, 2022, doi: 10.5120/ijca2022922173.

P. S. Kumar and D. S. Bhatia, “Scalability Analysis of Parallel Merge Sort Using Message Passing Interface,” in Proc. IEEE Int. Conf. on High Performance Computing (HiPC), pp. 223–231, 2019, doi: 10.1109/HiPC.2019.00034.

J. Dean and S. Ghemawat, "MapReduce: Simplified Data Processing on Large Clusters," Communications of the ACM, vol. 51, no. 1, pp. 107–113, 2008, doi: 10.1145/1327452.1327492.

H. Liu and L. Chen, “Performance Optimization of MPI Programs for Big Data Applications,” Future Generation Computer Systems, vol. 137, pp. 145–156, 2023, doi: 10.1016/j.future.2022.12.011.

A. Sharma, “Parallel Merge Sort Implementation and Performance Evaluation Using MPI in Python,” IEEE Xplore Digital Library, 2021. [Online]. Available: https://ieeexplore.ieee.org/document/9621847

R. Elmasri and S. B. Navathe, Fundamentals of Database Systems, 8th ed. Harlow, U.K.: Pearson Education, 2020.

M. Zaharia et al., “Apache Spark: A Unified Engine for Big Data Processing,” Communications of the ACM, vol. 59, no. 11, pp. 56–65, 2016.

M. Silberschatz, P. B. Galvin, and G. Gagne, Operating System Concepts, 10th ed. Hoboken, NJ: Wiley, 2020.

Published

2025-12-31

How to Cite

Panggabean, E., Yuda Perwira, Dedi Candro Parulian Sinaga, Annisa Tri Utami, & Vincha Swe Meiya Pricilla Sembiring. (2025). Performance Analysis of Parallel Merge Sort Using MPI (Message Passing Interface) on Big Data Dataset. Jurnal Sistem Komputer Dan Informatika (JSON), 7(2), 594–560. https://doi.org/10.30865/json.v7i2.9307