Performance Tuning Of Apache Spark Framework In Big Data Processing with Respect To Block Size And Replication Factor

Brijesh Y Joshi; Poornashankar .; Deepali Sawai

doi:10.18090/samriddhi.v14i02.4

Published Jun 30, 2022

Brijesh Y Joshi

Institute for Industrial and Computer Management and Research, Pradhikaran, Pune, Maharashtra, India

Poornashankar .

Institute for Industrial and Computer Management and Research, Pradhikaran, Pune, Maharashtra, India

Deepali Sawai

Institute for Industrial and Computer Management and Research, Pradhikaran, Pune, Maharashtra, India

Abstract

Apache Spark has recently become the most popular big data analytics framework, and it ships with default configuration values. In the Hadoop Distributed File System (HDFS), large files are split into blocks that are physically stored on multiple nodes in a distributed fashion. The block size determines how a large file is divided and distributed, while the replication factor determines how reliably it is stored. If there is only one copy of each block of a file and the node holding a block fails, the data in that file becomes unreadable. Both the block size and the replication factor are configurable per file. This paper presents the results and analysis of an experimental study that evaluates how effectively tuning these settings reduces the application execution time of Apache Spark compared with the default values. Based on a large number of studies, we employed a trial-and-error strategy to fine-tune these values. For the comparative analysis, we chose two workloads, Wordcount and Terasort, and used elapsed time as the evaluation metric.
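To make the roles of the two parameters concrete, the following is a minimal sketch of the arithmetic behind them. It is not taken from the paper: the function name `hdfs_layout`, the 1 GiB file size, and the candidate values (64 MiB and 128 MiB blocks, replication factors 1 and 3) are illustrative assumptions, and the failure-tolerance figure assumes each replica of a block sits on a different node.

```python
def hdfs_layout(file_size_bytes, block_size_bytes, replication_factor):
    """Estimate how a file is laid out for a given block size and replication factor.

    Hypothetical helper for illustration only; it does not talk to a cluster.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0 or replication_factor < 1:
        raise ValueError("sizes must be positive and replication factor >= 1")

    # Ceiling division: the last block may be only partially filled.
    num_blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes

    return {
        "blocks": num_blocks,
        # Every block is stored replication_factor times across the cluster.
        "block_replicas": num_blocks * replication_factor,
        # A partially filled last block only occupies its actual size on disk.
        "raw_storage_bytes": file_size_bytes * replication_factor,
        # Assumption: replicas of a block are placed on distinct nodes.
        "node_failures_tolerated_per_block": replication_factor - 1,
    }


MIB = 1024 * 1024
GIB = 1024 * MIB

if __name__ == "__main__":
    # Illustrative sweep over candidate settings for a 1 GiB input file.
    for block_mib in (64, 128):
        for replication in (1, 3):
            layout = hdfs_layout(1 * GIB, block_mib * MIB, replication)
            print(block_mib, replication, layout)
```

With a replication factor of 1 the sketch reports zero tolerated node failures per block, which matches the single-copy case described above. A smaller block size yields more blocks for the same file, which typically means more input splits for a Spark job to schedule.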

How to Cite

1.

Joshi B, . P, Sawai D. Performance Tuning Of Apache Spark Framework In Big Data Processing with Respect To Block Size And Replication Factor. sms [Internet]. 30Jun.2022 [cited 6Oct.2025];14(02):152-8. Available from: https://smsjournals.com/index.php/SAMRIDDHI/article/view/2719

Issue

Vol 14 No 02 (2022): SAMRIDDHI : A Journal of Physical Sciences, Engineering and Technology (2022)

Section

Research Article

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
