Performance Analysis of Parallel Merge Sort Using MPI (Message Passing Interface) on Big Data Dataset
DOI: https://doi.org/10.30865/json.v7i2.9307

Keywords: Parallel Merge Sort; Message Passing Interface (MPI); Big Data; Parallel Computing; Performance Analysis; Speedup; Efficiency

Abstract
The rapid growth of data in the era of Big Data demands efficient and scalable algorithms to handle large datasets. Sorting, as a fundamental operation in data processing, plays a crucial role in various computational tasks. This study focuses on the performance analysis of the Parallel Merge Sort algorithm using the Message Passing Interface (MPI) to accelerate sorting operations on large-scale datasets. The implementation utilizes MPI for distributed memory communication across multiple processes, enabling concurrent data partitioning and merging. Experiments were conducted on datasets ranging from several hundred megabytes to multiple gigabytes to evaluate performance metrics such as execution time, speedup, and efficiency. The results demonstrate that the parallel implementation significantly reduces computation time compared to the sequential version, especially as the dataset size and the number of processes increase. However, the performance gain tends to decrease when communication overhead between MPI processes becomes dominant. Overall, the findings indicate that MPI-based Parallel Merge Sort is an effective approach for large-scale data sorting, providing a balance between computation and communication efficiency in parallel environments.
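To make the partition/sort/merge workflow described above concrete, the following is a minimal C sketch of an MPI-based parallel merge sort; it is an illustration, not the authors' implementation. The dataset size N, the use of qsort() for the local sort, and the binary-tree merge pattern are assumptions chosen for brevity.

/* Minimal sketch (illustrative, not the paper's code) of MPI parallel merge sort:
 * the root scatters the data, every rank sorts its chunk locally, and the sorted
 * runs are combined with a binary-tree merge. N, qsort() for the local sort, and
 * the merge pattern are assumptions, not details taken from the paper. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Merge two sorted arrays into a freshly allocated array. */
static int *merge(const int *a, int na, const int *b, int nb) {
    int *out = malloc((size_t)(na + nb) * sizeof *out);
    int i = 0, j = 0, k = 0;
    while (i < na && j < nb) out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
    while (i < na) out[k++] = a[i++];
    while (j < nb) out[k++] = b[j++];
    return out;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 1 << 20;                 /* illustrative dataset size        */
    int local_n = N / size;                /* assumes N is divisible by size   */
    int *local = malloc((size_t)local_n * sizeof *local);
    int *data = NULL;

    if (rank == 0) {                       /* root holds the unsorted dataset  */
        data = malloc((size_t)N * sizeof *data);
        for (int i = 0; i < N; i++) data[i] = rand();
    }

    double t0 = MPI_Wtime();               /* execution-time measurement       */

    /* Partitioning phase: distribute equal-sized chunks to all processes.    */
    MPI_Scatter(data, local_n, MPI_INT, local, local_n, MPI_INT, 0, MPI_COMM_WORLD);

    /* Computation phase: each process sorts its own chunk independently.     */
    qsort(local, (size_t)local_n, sizeof(int), cmp_int);

    /* Merging phase: binary-tree reduction; at each step the "odd" partner
     * ships its sorted run to the "even" partner, which merges and continues. */
    int my_n = local_n;
    for (int step = 1; step < size; step *= 2) {
        if (rank % (2 * step) != 0) {
            int partner = rank - step;
            MPI_Send(&my_n, 1, MPI_INT, partner, 0, MPI_COMM_WORLD);
            MPI_Send(local, my_n, MPI_INT, partner, 0, MPI_COMM_WORLD);
            break;                         /* sender is finished after sending */
        }
        int partner = rank + step;
        if (partner < size) {
            int recv_n;
            MPI_Recv(&recv_n, 1, MPI_INT, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            int *buf = malloc((size_t)recv_n * sizeof *buf);
            MPI_Recv(buf, recv_n, MPI_INT, partner, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            int *merged = merge(local, my_n, buf, recv_n);
            free(local); free(buf);
            local = merged;
            my_n += recv_n;
        }
    }

    double t1 = MPI_Wtime();
    if (rank == 0)                         /* rank 0 ends up with all N items  */
        printf("sorted %d elements on %d processes in %.3f s\n", my_n, size, t1 - t0);

    free(local); free(data);
    MPI_Finalize();
    return 0;
}

In this layout the local sorting work shrinks as the process count grows, while the number of merge steps grows roughly as log2 of the process count and each step moves progressively larger runs; this is one way the communication overhead noted in the abstract can come to dominate at higher process counts.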
License
Copyright (c) 2025 Jurnal Sistem Komputer dan Informatika (JSON)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Authors who publish with this journal agree to the following terms:
- Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under Creative Commons Attribution 4.0 International License that allows others to share the work with an acknowledgment of the work's authorship and initial publication in this journal.
- Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgment of its initial publication in this journal.
- Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (Refer to The Effect of Open Access).

