Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı

Zeynep Orman; Mert Kavi

Research Article

Year 2020, Volume: 1 Issue: 1, 1 - 12, 22.08.2020

Abstract

Building and operating large-scale distributed systems that process streaming data is a complex and costly undertaking. Such systems must adapt to changing data stream rates and scale when necessary. For this reason, an effective automatic scaling system integrated into the distributed stream processing system is in most cases unavoidable. In recent years, interest in systems that can process rapidly growing streaming data sources has increased considerably, and the literature contains many studies in this area. However, most of these studies concentrate on how the system operates under normal conditions rather than on adaptation to changing workloads and on scalability. The few studies that do address scalability generally realize it with a set of resources. In addition, very few studies have been carried out on Apache Flink. Starting from these gaps in the literature, this study proposes a system design that runs on Apache Flink and adapts to changing workloads. Apache Flink is used both to develop the system and to compute the scaling metrics. Scaling is carried out by evaluating the expected system latency, calculated using Queuing Theory, together with critical system metrics. The model, which can work in integration with big data processing systems, aims to improve system performance and reduce quality losses. Finally, the conditions under which the system scales and its state after scaling are realized through simulation studies, demonstrating the effectiveness of the proposed system.

Keywords

Distributed Systems, Big Data, Stream Processing, Scalability, Queuing Theory

References

Basanta-Val P., Garcia N., Fernandez L., Fiesteus J. (2017). Patterns for Distributed Real-Time Stream Processing, IEEE Transactions on Parallel and Distributed Systems, 28(11), 3243-3257.
Botran T.L., Alonso J.M., Lozano J.A. (2014). A Review of Auto-scaling Techniques for Elastic Applications in Cloud Environments, Journal of Grid Computing, 12(4), 559-592.
Buddhika T., Stern R., Lindburg K., Ericson K., Pallickara S. (2017). Online Scheduling and Interference Alleviation for Low-Latency, High-Throughput Processing of Data Streams, IEEE Transactions on Parallel and Distributed Systems, 28(12), 3553-3569.
Chakraborty R., Majumdar S. (2016). A Priority Based Resource Scheduling Technique for Multitenant Storm Clusters, International Symposium on Performance Evaluation of Computer and Telecommunication Systems, pp. 1-6, 24-27 July, Canada.
Chen H., Zhang F., Jin H. (2017). Popularity-aware Differentiated Distributed Stream Processing on Skewed Streams, IEEE 25th International Conference on Network Protocols, pp. 1-10, 10-13 October, Canada.
De Matteis T., Mencagli G. (2017). Elastic Scaling for Distributed Latency-sensitive Data Stream Operators, 25th Euromicro International Conference on Parallel, Distributed and Network-Based Processing, pp. 61-68, 6-8 March, Russia.
Farahabady M.R.H., Samani H.R.D., Wang Y., Zomaya A.Y., Tari Z. (2016). A QoS-Aware Controller for Apache Storm, IEEE 15th International Symposium on Network Computing and Applications, pp. 334-342, 31 October-2 November, USA.
HoseinyFarahabady M., Zomaya A.Y., Tari Z. (2017). QoS- and Contention-Aware Resource Provisioning in a Stream Processing Engine, IEEE International Conference on Cluster Computing, pp. 137-146, 5-8 September, USA.
Khoshkbarforoushha A., Ranjan R., Gaire R., Abbasnejad E., Wang L., Zomaya A.Y. (2016). Distribution Based Workload Modelling of Continuous Queries in Clouds, IEEE Transactions on Emerging Topics in Computing, 5(1), 120-133.
Kingman J.F.C. (1962). On queues in heavy traffic, Journal of the Royal Statistical Society. Series B (Methodological), 24(2), 383-392.
Kombi R.K., Lumineau N., Lamarre P. (2017). A preventive auto-parallelization approach for elastic stream processing, IEEE 37th International Conference on Distributed Computing Systems, pp. 1532-1542, 5-8 June, USA.
Li T., Tang J., Xu J. (2016). Performance Modeling and Predictive Scheduling for Distributed Stream Data Processing, IEEE Transactions on Big Data, 2(4), 353-364.
Liu X., Buyya R. (2017). Performance-Oriented Deployment of Streaming Applications on Cloud, IEEE Transactions on Big Data, 5(1), 46-59.
Papageorgiou A., Poormohammady E., Cheng B. (2016). Edge-Computing-aware Deployment of Stream Processing Tasks based on Topology-external Information: Model, Algorithms, and a Storm-based Prototype, IEEE International Congress on Big Data, pp. 259-266, 27 June-2 July, USA.
Qian W., Shen Q., Qin J., Yang D., Yang Y., Wu Z. (2016). S-Storm: A Slot-aware Scheduling Strategy for Even Scheduler in Storm, IEEE 18th International Conference on High Performance Computing and Communications, pp. 623-630, 12-14 December, Australia.
Renart E.G., Diaz-Montes J., Parashar M. (2017). Data-driven Stream Processing at the Edge, IEEE 1st International Conference on Fog and Edge Computing, pp. 31-40, 14-15 May, Spain.
Runsewe O., Samaan N. (2017). Cloud Resource Scaling for Big Data Streaming Applications Using A Layered Multi-dimensional Hidden Markov Model, 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, pp. 848-857, 14-17 May, Spain.
Wang C., Meng X., Guo Q., Weng Z., Yang C. (2017). Automating Characterization Deployment in Distributed Data Stream Management Systems, IEEE Transactions on Knowledge and Data Engineering, 29(12), 2669-2681.
Wang Y., Tari Z., HoseinyFarahabady M., Zomaya A.Y. (2017). QoS-aware resource allocation for stream processing engines using priority channels, IEEE 16th International Symposium on Network Computing and Applications, pp. 1-9, 30 October-1 November, USA.
Xu L., Peng B., Gupta I. (2016). Stela: Enabling Stream Processing Systems to Scale-in and Scale-out On-demand, IEEE International Conference on Cloud Engineering, pp. 22-31, 4-8 April, Germany.
Zhang J., Li C., Zhu L., Liu Y. (2016). The Real-time Scheduling Strategy Based on Traffic and Load Balancing in Storm, IEEE 18th International Conference on High Performance Computing and Communications, pp. 372-379, 12-14 December, Australia.

A System Design for Latency Aware Dynamic Scaling at Distributed Data Stream Processing System

Abstract

Building and operating large-scale distributed stream processing systems is a complex and costly process. These systems must adapt to varying data stream rates and scale when required. An effective automatic scaling system that can be integrated into such systems is therefore usually unavoidable. The recent literature contains numerous studies on this issue, but many of them focus on how these systems operate under normal conditions. Studies on scalability are limited, and in them scaling is usually implemented with a set of resources. To address these shortcomings, this study proposes a system design that runs on Apache Flink and adapts to changing workloads. Apache Flink is used both for system development and for calculating the scaling metrics. Scaling decisions are made by evaluating the expected latency, calculated with Queuing Theory, together with some critical metrics. The model, which can be integrated into big data processing systems, aims to improve system performance and reduce quality losses. Pre-scaling and post-scaling cases are demonstrated by simulations to show the effectiveness of the proposed system.
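
The abstract does not say which queuing model produces the expected latency. As a rough illustration only, the sketch below uses Kingman's heavy-traffic approximation for a G/G/1 queue; whether the paper uses this exact formula is an assumption. The function names, the even split of load across parallel instances, and the cap on parallelism are all hypothetical and are not taken from the paper or from the Apache Flink API.

```python
import math


def kingman_wait(arrival_rate, mean_service_time, ca, cs):
    """Approximate mean queueing delay of a G/G/1 queue (Kingman's formula).

    arrival_rate: records per second reaching one operator instance
    mean_service_time: seconds needed to process one record
    ca, cs: coefficients of variation of inter-arrival and service times
    """
    rho = arrival_rate * mean_service_time  # utilization of the instance
    if rho >= 1.0:
        return math.inf  # the queue is unstable, delay grows without bound
    return (rho / (1.0 - rho)) * ((ca ** 2 + cs ** 2) / 2.0) * mean_service_time


def expected_latency(arrival_rate, mean_service_time, ca, cs, parallelism):
    """Queueing delay plus service time for one of `parallelism` instances.

    Assumption: the stream is split evenly and the variability (ca, cs)
    seen by each instance is unchanged by the split.
    """
    per_instance_rate = arrival_rate / parallelism
    wait = kingman_wait(per_instance_rate, mean_service_time, ca, cs)
    return wait + mean_service_time


def required_parallelism(arrival_rate, mean_service_time, ca, cs,
                         latency_target, max_parallelism=64):
    """Smallest parallelism whose expected latency meets the target, or None."""
    for p in range(1, max_parallelism + 1):
        latency = expected_latency(arrival_rate, mean_service_time, ca, cs, p)
        if latency <= latency_target:
            return p
    return None


if __name__ == "__main__":
    # Hypothetical workload: 200 records/s, 10 ms per record, 25 ms target.
    print(required_parallelism(200.0, 0.01, 1.0, 1.0, latency_target=0.025))
```

A controller built on this idea would recompute the estimate from measured arrival and service statistics at each monitoring interval and rescale only when the estimate crosses the latency target. With `ca = cs = 1` the formula reduces to the exact M/M/1 waiting time, which makes it easy to sanity-check.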

Keywords

Distributed systems, Big data, Stream processing, Scalability, Queuing theory

There are 21 citations in total.

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Research Articles
Authors	Zeynep Orman 0000-0002-0205-4198; Mert Kavi 0000-0001-6496-6400
Publication Date	August 22, 2020
Submission Date	July 13, 2020
Acceptance Date	August 11, 2020
Published in Issue	Year 2020 Volume: 1 Issue: 1

Cite

APA	Orman, Z., & Kavi, M. (2020). Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı. İleri Mühendislik Çalışmaları Ve Teknolojileri Dergisi, 1(1), 1-12.
AMA	Orman Z, Kavi M. Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı. imctd. August 2020;1(1):1-12.
Chicago	Orman, Zeynep, and Mert Kavi. “Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı”. İleri Mühendislik Çalışmaları Ve Teknolojileri Dergisi 1, no. 1 (August 2020): 1-12.
EndNote	Orman Z, Kavi M (August 1, 2020) Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı. İleri Mühendislik Çalışmaları ve Teknolojileri Dergisi 1 1 1–12.
IEEE	Z. Orman and M. Kavi, “Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı”, imctd, vol. 1, no. 1, pp. 1–12, 2020.
ISNAD	Orman, Zeynep - Kavi, Mert. “Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı”. İleri Mühendislik Çalışmaları ve Teknolojileri Dergisi 1/1 (August 2020), 1-12.
JAMA	Orman Z, Kavi M. Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı. imctd. 2020;1:1–12.
MLA	Orman, Zeynep and Mert Kavi. “Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı”. İleri Mühendislik Çalışmaları Ve Teknolojileri Dergisi, vol. 1, no. 1, 2020, pp. 1-12.
Vancouver	Orman Z, Kavi M. Akan Veri İşleyen Dağıtık Sistemlerde Gecikme Duyarlı Dinamik Ölçekleme İçin Bir Sistem Tasarımı. imctd. 2020;1(1):1-12.
