Publications

Copyright Notice: All rights are reserved to the copyright holders: authors and publishers. All persons copying this information are expected to adhere to the corresponding copyright terms. In most cases, these works may be copied for personal use but may not be reposted without the explicit permission of the copyright holder.

[39] Ali Shoker, Fernando Alves, Paulo Esteves-Verissimo. "ScalOTA: Scalable Secure Over-the-Air Software Updates for Vehicles". In the 42nd International Symposium on Reliable Distributed Systems (SRDS 2023), Marrakech, Morocco, September, 2023

(Click for Abstract)

Abstract. Over-the-Air (OTA) software updates are becoming essential for electric/electronic vehicle architectures in order to reduce recalls amid the increasing software bugs and vulnerabilities. Current OTA update architectures rely heavily on direct cellular repository-to-vehicle links, which makes the repository a communication bottleneck, and increases the cellular bandwidth utilization cost as well as the software download latency. In this paper, we introduce ScalOTA, an end-to-end scalable OTA software update architecture and secure protocol for modern vehicles. For the first time, we propose using a network of update stations, as part of Electric Vehicle charging stations, to boost the download speed through these stations, and reduce the cellular bandwidth overhead significantly. Our formalized OTA update protocol ensures proven end-to-end chain-of-trust including all stakeholders: manufacturer, suppliers, update stations, and all layers of in-vehicle Electric Control Units (ECUs). The empirical evaluation shows that ScalOTA reduces the bandwidth utilization and download latency up to an order of magnitude compared with current OTA update systems.

[38] Ali Shoker, Paulo Esteves-Verissimo, Marcus Völp. "The Path to Fault- and Intrusion-Resilient Manycore Systems on a Chip". In the Disrupt track of the 53rd Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2023), Porto, Portugal, June, 2023

(Click for Abstract)

Abstract. The hardware computing landscape is changing. What used to be distributed systems can now be found on a chip with highly configurable, diverse, specialized and general purpose units. Such Systems-on-a-Chip (SoC) are used to control today's cyber-physical systems, being the building blocks of critical infrastructures. They are deployed in harsh environments and are connected to the cyberspace, which makes them exposed to both accidental faults and targeted cyberattacks. This is in addition to the changing fault landscape that continued technology scaling, emerging devices and novel application scenarios will bring. In this paper, we discuss how the very features, distributed, parallelized, reconfigurable, heterogeneous, that cause many of the imminent and emerging security and resilience challenges, also open avenues for their cure though SoC replication, diversity, rejuvenation, adaptation, and hybridization. We show how to leverage these techniques at different levels across the entire SoC hardware/software stack, calling for more research on the topic.

[37] Ali Shoker, Vincent Rahli, Jérémie Decouchant, and Paulo Esteves-Verissimo. "Intrusion Resilience Systems for Modern Vehicles". In the 97th IEEE Vehicular Technology Conference: VTC2023, Florence, Italy, June 2023. 

(Click for Abstract)

Abstract. Current vehicular Intrusion Detection and Prevention Systems either incur high false-positive rates or do not capture zero-day vulnerabilities, leading to safety-critical risks. In addition, prevention is limited to few primitive options like dropping network packets or extreme options, e.g., ECU Bus-off state. To fill this gap, we introduce the concept of vehicular Intrusion Resilience Systems (IRS) that ensures the resilience of critical applications despite assumed faults or zero-day attacks, as long as threat assumptions are met. IRS enables running a vehicular application in a replicated way, i.e., as a Replicated State Machine, over several ECUs, and then requiring the replicated processes to reach a form of Byzantine agreement before changing their local state. Our study rides the mutation of modern vehicular environments, which are closing the gap between simple and resource-constrained "real-time and embedded systems", and complex and powerful "information technology" ones. It shows that current vehicle (e.g., Zonal) architectures and networks are becoming plausible for such modular fault and intrusion tolerance solutions—deemed too heavy in the past. Our evaluation on a simulated Automotive Ethernet network running two state-of-the-art agreement protocols (Damysus and Hotstuff) shows that the achieved latency and throughout are feasible for many Automotive applications.

[36] Ahmad T Sheikh, Ali Shoker, Paulo Esteves-Verissimo. "Resilient and Secure System on Chip with Rejuvenation in the Wake of Persistent Attacks". At the 16th European Workshop on Systems Security (EuroSec), EuroSys'23, May, 2023.

May 8, 2023, January, 2023

(Click for Abstract)

Abstract. To cope with the ever increasing threats of dynamic and adaptive persistent attacks, Fault and Intrusion Tolerance (FIT) is being studied at the hardware level to increase critical systems resilience. Based on state-machine replication, FIT is known to be effective if replicas are compromised and fail independently. This requires different ways of diversification at the software and hardware levels. In this paper, we introduce the first FIT hardware-based rejuvenation framework, we call Samsara, that allows for creating new FIT replicas with computing cores of diverse architectures. This is made possible by taking advantage of the reconfiguration features of MPSoC with FPGAs. A persistent attack that analyzes and exploits the vulnerability of a core will not be effective as rejuvenation using a different core architecture can be done periodically. Samsara allows for both replacing and adding/removing new cores to adapt to varying levels of threat severity. We introduce this concept and discuss the feasibility using a preliminary design we propose. A more rigorous study and empirical evaluation are left for future work.

[35] Ali Shoker. "Digital Sovereignty Strategies for Every Nation". In the Applied Cybersecurity & Internet Governance (ACIG), V. 1 (1): 56-72, NASK, December 2022. 

(Invited journal)

(Click for Abstract)

Abstract. Digital Sovereignty must be on the agenda of every modern nation. Digital technology is becoming part of our life details, from the vital essentials, like food and water management, to transcendence in the Metaverse and Space. Protecting these digital assets will, therefore, be inevitable for a modern country to live, excel and lead. Digital Sovereignty is a strategic necessity to protect these digital assets from the monopoly of friendly rational states, and the threats of unfriendly Malicious states and behaviors. In this work, we revisit the definition and scope of digital sovereignty through extending it to cover the entire value chain of using, owning, and producing digital assets. We emphasize the importance of protecting the operational resources, both raw materials and human expertise, in addition to research and innovation necessary to achieve sustainable sovereignty. We also show that digital sovereignty by autonomy is often impossible, and by mutual cooperation is not always sustainable. To this end, we propose implementing digital sovereignty using Nash Equilibrium, often studied in Game Theory, to govern the relation with Rational states. Finally, we propose a digital sovereignty agenda for different country’s digital profiles, based on their status quo, priorities, and capabilities. We survey state-of-the-art digital technology that is useful to make the current digital assets sovereign. Additionally, we propose a roadmap that aims to develop a sovereign digital nation, as close as possible to autonomy. Finally, we draw attention to the need of more research to better understand and implement digital sovereignty from different perspectives: technological, economic, and geopolitical.

[34] Ali Shoker and Paulo Esteves-Verissimo. "Intrusion Resilience Systems for Modern Vehicles - Position Paper". In the 7th Critical Automotive applications: Robustness & Safety  workshop, CARS@EDCC, September 2022. 

(Position Paper)

(Click for Abstract)

Abstract. We introduce the concept of Intrusion Resilience Systems (IRS) for modern vehicles. An IRS is a middleware that enables running a vehicular application in a replicated way, i.e., as a Replicated State Machine, over several ECUs. By requiring the replicated processes to reach a form of Byzantine agreement before changing their local state, the IRS ensures the resilience of critical vehicular applications despite assumed faults or attacks, as long as threat assumptions are met. This position paper proposes the tentative architecture of IRS and discusses its conceptual feasibility and underlying challenges. Our study rides the mutation of modern vehicular environments, which are closing the gap between simple and resource-scarce 'real-time and embedded systems', and complex and powerful 'information technology' ones. We show that current architectures are becoming plausible for such modular fault and intrusion tolerance solutions-deemed too heavy in the past. Our conclusion is that this topic deserves more attention in both academia and industry.

[33] Ziad Kassam, Paulo Sérgio Almeida, and Ali Shoker. "Exon: An Oblivious Exactly-Once Messaging Protocol". In the 31st International Conference on Computer Communications and Networks (ICCCN 2022). IEEE/IEEE Communication Society, July 25, 2022.

(Click for Abstract)

Abstract. TCP is typically the default transport protocol of choice for its supposed reliability, even for message-oriented middleware (e.g., ZeroMQ) or inter-actor communication (e.g., distributed Erlang). However, under network issues, TCP connections can fail, which requires ensuring both at-least-once and at-most-once delivery at the upper middleware layer. Moreover, the use of TCP at scale, in highly concurrent systems, can lead to drastic performance loss due to the need for TCP connection multiplexing and the resulting head-of-line blocking. This paper introduces Exon, an oblivious exactly-once messaging protocol, and a corresponding lightweight library implementation. Exon uses a novel strategy of a per-message four-way protocol to ensure oblivious exactly-once messaging, with on-demand protocol-level “soft half-connections” that are established when needed and safely discarded. This achieves correctness, obliviousness, and performance, through merging and pipelining basic protocol messages. The empirical evaluation of Exon demonstrates significant improvements in throughput and latency under packet loss, while maintaining a negligible overhead over TCP in healthy networks.

[32] Ali Shoker. "Blockchain Technology as a Means for Sustainable Development". In the One Earth Journal , Cell Press and Elsevier, June 2021. 

(Invited Paper)

(Click for Abstract)

Abstract. Blockchain technology is a novel computing model and ecosystem for trusted digital services. The technology promotes trust and decentralization, which has inspired new ways to develop smart digital business, engage communities, and automate societies. This makes it a typical multidisciplinary technology enabler for sustainable development with unique properties including trust, automation, decentralization, immutability, and resilience. This Primer aims to emphasize the potential role of blockchain technology in assisting a transition to a sustainable society through shedding light on the rationale behind blockchains, their core properties, and the available variants to get started. The article also conveys the main challenges impeding blockchain technology from attaining its full potential. Therefore, we encourage researchers, professionals, and enthusiasts from different sectors to continue exploring blockchains and investigate their potential for multidisciplinary applications while raising awareness about the considerate and wise use of this technology.

[31] Houssam Yactine, Ali Shoker, and Georges Younes. "ASPAS: As Secure as Possible Available Systems". In the IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS). , Malta, 2021. 

(DisCoTec Best Paper Award, and DAIS Best Paper).

Abstract. Available-Partition-tolerant (AP) geo-replicated systems trade consistency for availability. They allow replicas to serve clients' requests without prior synchronisation. Potential conflicts due to concurrent operations can then be resolved using a conflict resolution mechanism if operations are commutative and execution is deterministic. However, a Byzantine replica can diverge from deterministic execution of operations and break convergence. In this paper, we introduce ASPAS: As Secure as Possible highly Available System that is a Byzantine resilient AP system. ASPAS follows an optimistic approach to maintain a single round-trip response time. It then allows the detection of Byzantine replicas in the background, i.e., off the critical path of clients requests. Our empirical evaluation of ASPAS in a geo-replicated setting shows that its latency in the normal case is close to that of an AP system, and one order of magnitude better than classical BFT protocols that provide stronger (total ordering) guarantees, unnecessary in AP systems.

[30] Ali Shoker, Peter Moertl, Ramiro Robles. "A First Step Towards Holistic Trustworthy Platoons". In the IEEE 7th World Forum on Internet of Things (WF-IoT). Track 13: Workshop on Wireless Intelligent Secure Trustable Things: bringing IoT and AI together, Louisiana, USA, June 2021.

Abstract. Truck platooning is a form of convoy cooperative driving of connected trucks assisted by a lead truck. The aim is to reduce the fuel and driving costs, improve road safety, and reduce CO2 emission. Being semi-autonomous, platoons must be trustworthy in many perspectives. This paper presents a high-level trustworthy requirements analysis on three key perspectives: driver, communication, and security. In addition, we observed that any trustworthy requirement analysis is incomplete if perspectives are addressed independently. Therefore, we propose a simple holistic methodology that addresses the different perspectives as well as their dependencies, and we exemplify the use of the methodology with two use cases presented in the paper. However, we draw attention to the importance of more research to drive a more exhaustive and validated methodology.

[29] Ali Shoker and Houssam Yactine. "On the Feasibility of Byzantine Agreement to Secure Fog/Edge Data Management". Chapter in Book 'Fog/Edge Computing For Security, Privacy, and Applications', DOI: 10.1007/978-3-030-57328-7, Springer International Publishing, January, 2021.

Abstract. Fog/Edge computing improves the latency and security of data by keeping storage and computation close to the data source. Nevertheless, this raises other security challenges against malicious, a.k.a, Byzantine, attacks that can exploit the isolation of nodes, or when access to distributed data is required in untrusted environments. In this work, we study the feasibility of deploying Byzantine Agreement protocols to improve the security of fog/edge systems in untrusted environments. In particular, we explore existing Byzantine Agreement protocols, heavily developed in the Blockchain area, emphasizing the Consistency, Availability, and Partition-Tolerance tradeoffs in a geo-replicated system. Our work identifies and discusses three different approaches that follow the Strong Consistency, Eventual Consistency, and Strong Eventual Consistency models. Our conclusions show that Byzantine Agreement protocols are still immature to be used by fog/edge computing in untrusted environment due to their high finality latency; however, they are promising candidates that encourage further research in this direction.

[28] Ali Shoker. "TorMass: Tor for the Masses Domestic and Monetized Anonymous Communication". In the International Conference on ENTERprise Information Systems (CENTERIS) , Vilamoura, Algarve, Portugal, 2020.

Abstract. The increasing demand on privacy is driving a notable quest for privacy legal instruments and practical techniques. The Onion Router (Tor) is considered the most practical service for anonymous public communications; however, it is not yet mainstream despite more than one decade of practical use and research. The reason is referred to using message encryption layers, i.e., onions, via relays offered through unknown volunteers which does not fully protect legitimate users nor services. It can also be used for illegal purposes which hampers its admissibility in many countries. In this paper, we introduce a new ecosystem built on top of Tor to broaden its use by legitimate users and at the same time provide provenance when users violate the usage policy terms. We propose Anonymity Service Providers that provide paid relays as a service to users. A user buys this service from different providers to diversify the onion circuit and avoid collude and thus disclosing her identity. Anonymity is maintained as long as the user abides to the policy; otherwise, her identity is disclosed via a reporting system. This accountability reporting system can be implemented over a Smart Contract to make arbitration automatic. This work proposes new use cases and business opportunities that are worth consideration by both the research and business communities.

[27] Ali Shoker et al. "LightKone Reference Architecture (LiRA V0.9)". White Paper. Available: https://www.lightkone.eu/ , December, 2019.


Abstract. The LightKone Reference Architecture (LiRA) presents a novel edge reference architecture that takes advantage of decentralized lateral data sharing and convergent vertical data semantics across a myriad of different edge resources. These two principles are key to bridge the existing gaps in current edge-based proposals, namely when considering fog reference architectures, and in particular, by removing the need of centralized lateral data sharing among components that exist in close vicinity and promoting unified data sharing semantics across components in the edge and the core of the system, respectively. LiRA achieves this by moving its focus to application level semantics and exploiting them to enable convergence properties for decentralized data management mechanisms. LiRA is supported by proved and sound techniques like Conflict-free Replicated Data Types (CRDTs) and Transaction Causal Consistency (TCC). These are key to achieve fundamental properties in edge-based solutions such as: autonomy, availability, consistency, and robustness. LiRA is compatible with, and complementary to, cutting edge/fog standards such as the OpenFog Reference Architecture. To show the practical feasibility of LiRA, this paper presents a reference implementation of LIRA, called i-LiRA, composed of a coherent collection of software artifacts and components, that ease the use of the concepts underlying to LiRA in the development of practical applications that take advantage of edge computing. The paper concludes with a discussion on how the artifacts of i-LiRA were leveraged to implement four different edge computing case studies across different application domains.

[26] Ali Shoker. "Successful Systems in Production Graduate Teaching". In the SuperComputing'19 Workshop on Education for High Performance Computing (EduHPC) , Denver 2019.


Abstract. We present our experience in coordinating and teaching a novel graduate systems and computing course named “Successful Systems in Production” (SSP). The course targets graduate students of different research interests in Computer Science. The course aims at giving a breadth knowledge on cutting-edge well-known systems in production, and exploring the potential synergies across different areas of research. Having its roots in Distributed Computing, SSP addresses those systems that overlap with other research areas like Computational Systems, Parallel Computing, Databases, Cloud Computing, Artificial Intelligence, Security, etc. SSP exhibits an agile topic selection model that fits several students’ backgrounds in each academic year. The topics focus on the practical aspects of each selected system that is considered “successful”, i.e., based on its worldwide impact and technical significance. This is important for graduate students to acquire best practices in industry and academia, necessary to build practical computing systems. In the same vein, the assessment method includes a project that is based on one of the presented systems and also intersects with the student’s own research plan. Based on our teaching experience and the excellent feedback of the students, we strongly recommend this graduate course to be taught at other universities.

[25] Ali Shoker, Anna Queralt, and Toni Cortes. "Advanced Conflict-free Replicated DataTypes". In Book chapter 'Data management techniques' in 'Ultrascale Computing Systems', Chap. 4, pp. 85-126. Computing, 2019. DOI: 10.1049/PBPC024E_ch4, IET Digital Library, January, 2019.


Abstract. One approach to achieve ultra-scale data management is to use a full data replication paradigm with a relaxed consistency model. This is advocated given the tradeoffs of Availability, Consistency, and network Partition addressed by the CAP theorem. While a relaxed consistency model allows for prompt local updates, this can lead to potential conflicts in the data once merged else- where. With the premise to eventually converge to a single state, manual and case-tailored solutions are unproven correct and cumbersome to use. In this section, we present a more generic and mathematically proven method though Conflict-free Replicated DataTypes that guarantee eventual convergence. We present four variants of CRDT models: operation-based, pure operation- based, state-based, and delta-state based CRDTs. We aim to keep the presentation simple by addressing a common “set” datatype example throughout all CRDT variants to show their differences. We finally present a case study, on the dataClay distributed platform, demonstrating how CRDTs can be used in practice.

Note: please cite the Book chapter 'Data management techniques' 

[24] Ali Shoker, Joao Leitao, Peter Van Roy, and Albert van der Linde. "Towards General Purpose Computations at the Edge". In Book chapter 'Programming models and runtimes' in 'Ultrascale Computing Systems', Chap. 2, pp. 9-63. Computing, 2019. DOI: 10.1049/PBPC024E_ch2, IET Digital Library, January, 2019.


Abstract. Originally designed to exploit the power of multi-core processors through virtualization, Cloud Computing has changed over the past decade to support ultrascale computations. The new paradigm, often called aggregation, collects a large number of resources in a pool to form a single service with huge storage and computation capacities. Unfortunately, with the huge amounts of data generated via modern applications, the cloud center has become a bottleneck and a single point of failure. This advocated an extended paradigm, called Edge Computing, that brings part of the data storage and computation closer to the user. The benefits are plenty: reduced delays, high availability, low bandwidth usage, improved data privacy, etc. In this section, we introduce recent advances in edge computing that makes the coordination of edge networks synchronization-free and convergent. We address the main challenges facing applications on the data management and communication aspects. The section also provides convenient runtime environments for different categories of edge computing scenarios.

Note: please cite the Book chapter 'Programming models and runtimes'

[23]  Ali Shoker. "Brief Announcement: Sustainable Blockchains through Proof of eXercise". In PODC'18: ACM Symposium on Principles of Distributed Computing, ACM, UK, July, 2018.



Abstract. Cryptocurrency and blockchain technologies are recently gaining wide adoption since the introduction of Bitcoin, being distributed, authority-free, and secure. Proof of Work (PoW) is at the heart of blockchain’s security, asset generation, and maintenance. Although simple and secure, a hash-based PoW like Bitcoin’s puzzle is often referred to as “useless”, and the used intensive computations are considered “waste” of energy. A myriad of Proof of “something” alternatives have been proposed to mitigate energy consumption; however, they either introduced new security threats and limitations, or the “work” remained far from being really “useful”. In this work, we introduce Proof of eXercise (PoX): a sustainable alternative to PoW where an eXercise is a real world matrix-based scientific computation problem. We provide a novel study of the properties of Bitcoin’s PoW, the challenges of a more “rational” solution as PoX, and we suggest a comprehensive approach for PoX.

[22]  Paulo Sérgio Almeida, Ali Shoker, and Carlos Baquero. "Delta state replicated data types". In the Journal of Parallel and Distributed Computing (JPDC), Elsevier, January, 2018.



Abstract. Conflict-free Replicated Data Types (CRDTs) are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas. We introduce Delta State Conflict-Free Replicated Data Types (δ-CRDT) that can achieve the best of both operation-based and state-based CRDTs: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. This is achieved by defining δ-mutators to return a delta-state, typically with a much smaller size than the full state, that to be joined with both local and remote states. We introduce the δ-CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present an anti-entropy algorithm for eventual convergence, and another one that ensures causal consistency. Finally, we introduce several δ-CRDT specifications of both well-known replicated datatypes and novel datatypes, including a generic map composition.

[21]  Ali Shoker. "Sustainable Blockchain through Proof of eXercise". In the proceedings of the 16th IEEE International Symposium on Network Computing and Applications (NCA'17), IEEE Computer Society, Cambridge, MA, USA; October 2017.



Abstract. Cryptocurrency and blockchain technologies are recently gaining wide adoption since the introduction of Bitcoin, being distributed, authority-free, and secure. Proof of Work (PoW) is at the heart of blockchain’s security, asset generation, and maintenance. Although simple and secure, a hash-based PoW like Bitcoin’s puzzle is often referred to as “useless”, and the used intensive computations are considered “waste” of energy. A myriad of Proof of “something” alternatives have been proposed to mitigate energy consumption; however, they either introduced new security threats and limitations, or the “work” remained far from being really “useful”. In this work, we introduce Proof of eXercise (PoX): a sustainable alternative to PoW where an eXercise is a real world matrix- based scientific computation problem. We provide a novel study of the properties of Bitcoin’s PoW, the challenges of a more “rational” solution as PoX, and we suggest a comprehensive approach for PoX.

[20]  Ziad Kassam, Ali Shoker, Paulo Sérgio Almeida and Carlos Baquero. "Aggregation Protocols in Light of Reliable Communication". In the proceedings of the 16th IEEE International Symposium on Network Computing and Applications (NCA'17), IEEE Computer Society, Cambridge, MA, USA; October 2017.


Abstract. Aggregation protocols allow for distributed lightweight computations deployed on ad-hoc networks in a peer-to-peer fashion. Due to reliance on wireless technology, the communication medium is often hostile which makes such protocols susceptible to correctness and performance issues. In this paper, we study the behavior of aggregation protocols when subject to communication failures: message loss, duplication, and network partitions. We show that resolving communication failures at the communication layer, through a simple reliable communication layer, reduces the overhead of using alternative fault tolerance techniques at upper layers, and also preserves the original accuracy and simplicity of protocols. The empirical study we drive shows that tradeoffs exist across various aggregation protocols, and there is no one-size-fits-all protocol.

[19]   Carlos Baquero, Paulo Sérgio Almeida, and Ali Shoker. "Pure Operation-Based Replicated Data Types". arXiv CoRR . October, 2017.

Abstract. Distributed systems designed to serve clients across the world often make use of geo-replication to attain low latency and high availability. Conflict-free Replicated Data Types (CRDTs) allow the design of predictable multi-master replication and support eventual consistency of replicas that are allowed to transiently diverge. CRDTs come in two flavors: state-based, where a state is changed locally and shipped and merged into other replicas; operation-based, where operations are issued locally and reliably causal broadcast to all other replicas. However, the standard definition of op-based CRDTs is very encompassing, allowing even sending the full-state, and thus imposing storage and dissemination overheads as well as blurring the distinction from state-based CRDTs. We introduce pure op-based CRDTs, that can only send operations to other replicas, drawing a clear distinction from state-based ones. Data types with commutative operations can be trivially implemented as pure op-based CRDTs using standard reliable causal delivery; whereas data types having non-commutative operations are implemented using a PO-Log, a partially ordered log of operations, and making use of an extended API, i.e., a Tagged Causal Stable Broadcast (TCSB), that provides extra causality information upon delivery and later informs when delivered messages become causally stable, allowing further PO-Log compaction. The framework is illustrated by a catalog of pure op-based specifications for classic CRDTs, including counters, multi-value registers, add-wins and remove-wins sets.

[18] Ali Shoker, Houssam Yactine, and Carlos Baquero. "As Secure As Possible Eventual Consistency: Work in Progress". In the proceedings of the 3rd International Workshop on Principles and Practice of Consistency for Distributed Data (EuroSys PAPOC’17), ACM, Belgrade, Serbia; April 2017.

Abstract. Eventual consistency (EC) is a relaxed data consistency model that, driven by the CAP theorem, trades prompt consistency for high availability. Although, this model has shown to be promising and greatly adopted by industry, the state of the art only assumes that replicas can crash and recover. However, a Byzantine replica (i.e., arbitrary or malicious) can hamper the eventual convergence of replicas to a global consistent state, thus compromising the entire service. Classical BFT state machine replication protocols cannot solve this problem due to the blocking nature of consensus, something at odd with the availability via replica divergence in the EC model. In this work in progress paper, we introduce a new secure highly available protocol for the EC model that assumes a fraction of replicas and any client can be Byzantine. To respect the essence of EC, the protocol gives priority to high availability, and thus Byzantine detection is performed off the critical path on a consistent data offset. The paper concisely explains the protocol and discusses its feasibility. We aim at presenting a more comprehensive and empirical study in the future.

[17]  Ali Shoker, Ziad Kassam, Paulo Sérgio Almeida, and Carlos Baquero. "Life Beyond Distributed Transactions on the Edge". In the proceedings of the International Middleware Workshop for Edge Clouds and Cloudlets (MECC’16), ACM, Trento, Italy; December 2016.

Abstract. Edge/Fog Computing is an extension to the Cloud Computing model, primarily proposed to pull some of the load on cloud data center towards the edge of the network, i.e., closer to the clients. Despite being a promising model, the foundations to adopt and fully exploit the edge model are yet to be clear, and thus new ideas are continuously advocated. In his paper on “Life beyond Distributed Transactions: an Apostate’s Opinion”, Pat Helland proposed his vision to build “almost infinite” scale future applications, demonstrating why Distributed Transactions are not very practical under scale. His approach models the applications data state as independent “entities” with separate serialization scopes, thus allowing efficient local transactions within an entity, but precluding transactions involving different entities. Accessing remote data (which is assumed rare) can be done through separate channels in a more message-oriented manner. In this paper, we recall Helland’s vision in the aforementioned paper, explaining how his model fits the Edge Computing Model either regarding scalability, applications, or assumptions, and discussing the potential challenges leveraged.

[16]  Ali Shoker. "Exploiting Universal Redundancy". In the proceedings of the 15th IEEE International Symposium on Network Computing and Applications (NCA), IEEE Computer Society, Cambridge, MA, USA; November 2016.


Abstract. Fault tolerance is essential for building reliable services; however, it comes at the price of redundancy, mainly the “replication factor” and “diversity”. With the increasing reliance on Internet-based services, more machines (mainly servers) are needed to scale out, multiplied with the extra expense of replication. This paper revisits the very fundamentals of fault tolerance and presents “artificial redundancy”: a formal generalization of “exact copy” redundancy in which new sources of redundancy are exploited to build fault tolerant systems. On this concept, we show how to build “artificial replication” and design “artificial fault tolerance” (AFT). We discuss the properties of these new techniques showing that AFT extends current fault tolerant approaches to use other forms of redundancy aiming at reduced cost and high diversity.

[15]  Georges Younes, Ali Shoker, Paulo Sergio Almeida, and Carlos Baquero. "Integration Challenges of Pure Operation-based CRDTs in Redis". In the proceedings of the ECOOP Programming Models and Languages for Distributed Computing Workshop (PMLDC'16) , Rome, Italy, July, 2016.


Abstract. Pure operation-based (op-based) Conflict-free Replicated Data Types (CRDTs) are generic and very efficient as they allow for compact solutions in both sent messages and state size. Although the pure op-based model looks promising, it is still not fully understood in terms of practical implementation. In this paper, we explain the challenges faced in implementing pure op-based CRDTs in a real system: the well-known in-memory cache key-value store Redis. Our purpose of choosing Redis is to implement a multi-master replication feature, which the current system lacks. The experience demonstrates that pure op-based CRDTs can be implemented in existing systems with minor changes in the original API.

[14]  Vitor Enes, Carlos Baquero, Paulo Sergio Almeida, and Ali Shoker. "Join Decompositions for Efficient Synchronization of CRDTs after a Network Partition". In the proceedings of the ECOOP Programming Models and Languages for Distributed Computing Workshop (PMLDC'16) , Rome, Italy, July, 2016.


Abstract. State-based CRDTs allow updates on local replicas with- out remote synchronization. Once these updates are propagated, possible conflicts are resolved deterministically across all replicas. δ-CRDTs bring significant advantages in terms of the size of messages exchanged between replicas during normal operation. However, when a replica joins the system after a network partition, it needs to receive the updates it missed and propagate the ones performed locally. Current systems solve this by exchanging the full state bidirectionally or by storing additional metadata along the CRDT. We introduce the concept of join-decomposition for state- based CRDTs, a technique orthogonal and complementary to delta-mutation, and propose two synchronization methods that reduce the amount of information exchanged, with no need to modify current CRDT definitions.

[13]   Paulo Sergio Almeida, Ali Shoker, and Carlos Baquero. "Delta State Replicated Data Types". arXiv CoRR abs/1603.01529. Version 1, March, 2016.


Abstract. CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce Delta State Conflict-Free Replicated Data Types (δ-CRDTs) that can achieve the best of both worlds: small messages with an incremental nature, as in operation-based CRDTs, disseminated over unreliable communication channels, as in traditional state-based CRDTs. This is achieved by defining delta mutators to return a delta-state, typically with a much smaller size than the full state, that to be joined with both local and remote states. We introduce the δ-CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present an anti-entropy algorithm for eventual convergence, and another one that ensures causal consistency. Finally, we introduce several δ-CRDT specifications of both well-known replicated datatypes and novel datatypes, including a generic map composition.

[12]  Ali Shoker, Paulo Sergio Almeida, and Carlos Baquero. "Exactly-Once Quantity Transfer". In the proceedings of SRDS Planetary-Scale Distributed Systems Workshop (W-PSDS 2015) , Montreal, Canada, September, 2015.


Abstract. Strongly consistent systems supporting distributed transactions can be prone to high latency and do not tolerate partitions. The present trend of using weaker forms of consistency, to achieve high availability, poses notable challenges in writing applications due to the lack of linearizability, e.g., to ensure global invariants, or perform mutator operations on a distributed datatype. This paper addresses a specific problem: the exactly-once transfer of a “quantity” from one node to another on an unreliable network (coping with message duplication, loss, or reordering) and without any form of global synchronization. This allows preserving a global property (the sum of quantities remains unchanged) without requiring global linearizability and only through using pairwise interactions between nodes, therefore allowing partitions in the system. We present the novel quantity- transfer algorithm while focusing on a specific use-case: a redistribution protocol to keep the quantities in a set of nodes balanced; in particular, averaging a shared real number across nodes. Since this is a work in progress, we briefly discuss the correctness of the protocol, and we leave potential extensions and empirical evaluations for future work.

[11]   Jean-Paul Bahsoun, Rachid Guerraoui, and Ali Shoker. "Making BFT Protocols Really Adaptive". In the proceedings of the 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS'15), Hyderabad, INDIA, May, 2015.


Abstract. Many state-machine Byzantine Fault Tolerant (BFT) protocols have been introduced so far. Each protocol addressed a different subset of conditions and use-cases. However, if the underlying conditions of a service span different subsets, choosing a single protocol will likely not be a best fit. This yields robustness and performance issues which may be even worse in services that exhibit fluctuating conditions and workloads. In this paper, we reconcile existing state-machine Byzantine Fault Tolerant (BFT) protocols in a single adaptive BFT system, called ADAPT, aiming at covering a larger set of conditions and use-cases, probably the union of individual subsets of these protocols. At anytime, a launched protocol in ADAPT can be aborted and replaced by another protocol according to a potential change (an event) in the underlying system conditions. The launched protocol is chosen according to an “evaluation process” that takes into consideration both: protocol characteristics and its performance. This is achieved by applying some mathematical formulas that match the profiles of protocols to given user (e.g., service owner) preferences. ADAPT can assess the profiles of protocols (e.g., throughput) at run-time using Machine Learning prediction mechanisms to get accurate evaluations. We compare ADAPT with well known BFT protocols showing that it outperforms others as system conditions change and under dynamic workloads.

[10]  Paulo Sergio Almeida, Ali Shoker, and Carlos Baquero. "Efficient State-based CRDTs by Delta-Mutation". In the proceedings of the International Conference of Networked sYStems (NETYS'15) , Agadir, Morocco, May 2015.


Abstract. CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs ensure convergence through disseminating the entire state, that may be large, and merging it to other replicas; whereas operation-based CRDTs disseminate operations (i.e., small states) assuming an exactly-once reliable dissemination layer. We introduce Delta State Conflict-Free Replicated Datatypes ( 𝛿 -CRDT) that can achieve the best of both worlds: small messages with an incremental nature, disseminated over unreliable communication channels. This is achieved by defining   𝛿  -mutators to return a delta-state, typically with a much smaller size than the full state, that is joined to both: local and remote states. We introduce the   𝛿 -CRDT framework, and we explain it through establishing a correspondence to current state-based CRDTs. In addition, we present an anti-entropy algorithm that ensures causal consistency, and two   𝛿 -CRDT specifications of well-known replicated datatypes.

[9]   Carlos Baquero, Paulo Sergio Almeida, and Ali Shoker. "Making Operation-Based CRDTs Operation-Based". In the proceedings of the International Conference on Distributed Applications and Interoperable Systems - (DAIS) , Berlin, Germany, pages 126-140, June, 2014. 

(DisCoTec Best Paper Award finalist)

Abstract. Conflict-free Replicated Datatypes (CRDT) can simplify the design of eventually consistent systems. They can be classified into state- based or operation-based. Operation-based designs have the potential for allowing very compact solutions in both the sent messages and the object state size. Unfortunately, the current approaches are still far from this objective. In this paper, we introduce a new pure operation-based framework that makes the design and the implementation of these CRDTs more simple and efficient. We show how to leverage the meta-data of the messaging middleware to design very compact CRDTs, while only disseminating operation names and their optional arguments.

[8]   Paulo Sergio Almeida, Ali Shoker, and Carlos Baquero. "Efficient State-based CRDTs by Decomposition". In the proceedings of EuroSys PAPEC Workshop , The Netherlands, ACM EuroSys, pp. 2, May, 2014.


Abstract. Eventual consistency is a relaxed consistency model used in large-scale distributed systems that seek better availability when consistency can be delayed. CRDTs are distributed data types that make eventual consistency of a distributed object possible and non ad-hoc. Specifically, state-based CRDTs achieve this through shipping the entire replica state that is, eventually, merged to other replicas ensuring convergence. This imposes a large communication overhead when the replica size or the number of replicas gets larger. In this work, we introduce a decomposable version of state-based CRDTs, called Delta State-based CRDTs (𝛿-CRDT). A 𝛿-CRDT is viewed as a join of multiple fine-grained CRDTs of the same type, called deltas (𝛿). The deltas are produced by applying Î𝛿-mutators, on a replica state, which are modified versions of the original CRDT mutators. This makes it possible to ship small deltas (or batches) instead of shipping the entire state. The challenges are to make the join of deltas equivalent to the join of the entire object in classical state-based CRDTs, and to find a way to derive the 𝛿-mutators. We address this challenge in this work, and we explore the minimal requirements that a communication algorithm must offer according to the guarantees provided by the underlying messaging middleware.

[7]  Carlos Baquero, Paulo Sergio Almeida, and Ali Shoker. "Making Operation-Based CRDTs Operation-Based". In the proceedings of EuroSys PAPEC Workshop , The Netherlands, ACM EuroSys, pp. 2, May, 2014


Abstract. Conflict-free Replicated Datatypes can simplify the design of predictable eventual consistency. They can be classified into state-based or operation-based. Operation-based approaches have the potential for allowing compact designs in both the sent message and the object state size, but current approaches are still far from this objective. Here we explore the design space for operation-based solutions, and we leverage the interaction with the middleware by offering a technique that delivers very compact solutions, while only broadcasting operation names and arguments.

[6]  Sonia Ben Mokhtar, Gautier Berthou, Amadou Diarra, Vivien Quema, and Ali Shoker. "RAC: a Freerider-resilient, Scalable, Anonymous Communication Protocol". In the proceedings of the 33rd International Conference on Distributed Computing Systems (ICDCS'13), Philadelphia, USA, July 2013.

Abstract. Enabling anonymous communication over the Internet is crucial. The first protocols that have been devised for anonymous communication are subject to freeriding. Recent protocols have thus been proposed to deal with this issue. However, these protocols do not scale to large systems, and some of them further assume the existence of trusted servers. In this paper, we present RAC, the first anonymous communication protocol that tolerates freeriders and that scales to large systems. Scalability comes from the fact that the complexity in terms of the number of message exchanges is independent from the number of nodes in the system. Another important aspect of RAC is that it does not rely on any trusted third party. We theoretically prove, using game theory that our protocol is a Nash equilibrium, i.e, that freeriders have no interest in deviating from the protocol. Further, we experimentally evaluate RAC using simulations. Our evaluation shows that, whatever the size of the system (up to 100.000 nodes), the nodes participating in the system observe the same throughput.

[5]  Ali Shoker, Jean-Paul Bahsoun, and Maysam Yabandeh. "Improving Independence of Failures in BFT". In the proceedings of the 12th IEEE International Symposium on Network Computing and Applications, IEEE Computer Society, Cambridge, MA, USA; August 2013.

Abstract. Independence of failures is a basic assumption for the correctness of BFT protocols. In literature, this subject was addressed by providing N-version like abstractions. Though this can provide a good level of obfuscation against semantic-based attacks, if the replicas know each others identities then non-semantic attacks like DoS can still compromise all replicas together. In this paper, we address the obfuscation problem in a different way by keeping replicas unaware of each other. This makes it harder for attackers to sneak from one replica to another and reduces the impact of simultaneous attacks on all replicas. For this sake, we present a new obfuscated BFT protocol, called OBFT, where the replicas remain unaware of each other by exchanging their messages through the clients. Thus, OBFT assumes honest, but possibly crash-prone clients. We show that obfuscation in our context could not be achieved without this assumption, and we give possible applications where this assumption can be accepted. We evaluated our protocol on an Emulab cluster with a wide area topology. Our experiments show that the scalability and throughput of OBFT remain comparable to existing BFT protocols despite the obfuscation overhead.

[4]  Ali Shoker and Jean-Paul Bahsoun. "BFT Selection". In the proceedings of the International Conference of Networked Systems (NETYS'13), Marrakech, Morocco, May 2013.


Abstract. This paper presents the first BFT selection model and algorithm that can be used to choose the most convenient protocol according to the BFT user (i.e., an enterprise) preferences. The selection algorithm applies some mathematical formulas to make the selection process easy and automatic. The algorithm operates in three modes: Static, Dynamic, and Heuristic. The Static mode addresses the cases where a single protocol is needed; the Dynamic mode assumes that the system conditions are quite fluctuating and thus requires runtime decisions, and the Heuristic mode uses additional heuristics to improve user choices. To the best of our knowledge, this is the first work that addresses selection in BFT.

[3]  Ali Shoker. "Byzantine Fault Tolerance: From Static Selection to Dynamic Switching". PhD, held at the School of Mathematics, Informatics, and Telecommunications (MITT), University of Toulouse, IRIT Lab. , Toulouse, France. November 2012.


Abstract. Byzantine Fault Tolerance (BFT) is becoming crucial with the revolution of online applications and due to the increasing number of innovations in computer technologies. Although dozens of BFT protocols have been introduced in the previous decade, their adoption by practitioners sounds disappointing. To some extant, this indicates that existing protocols are, perhaps, not yet too convincing or satisfactory. The problem is that researchers are still trying to establish `the best protocol' using traditional methods, e.g., through designing new protocols. However, theoretical and experimental analyses demonstrate that it is hard to achieve one-size-fits-all BFT protocols. Indeed, we believe that looking for smarter tactics like `fasten fragile sticks with a rope to achieve a solid stick' is necessary to circumvent the issue. In this thesis, we introduce the first BFT selection model and algorithm that automate and simplify the election process of the `preferred' BFT protocol among a set of candidate ones. The selection mechanism operates in three modes: Static, Dynamic, and Heuristic. For the two latter modes, we present a novel BFT system, called Adapt, that reacts to any potential changes in the system conditions and switches dynamically between existing BFT protocols, i.e., seeking adaptation. The Static mode allows BFT users to choose a single BFT protocol only once. This is quite useful in Web Services and Clouds where BFT can be sold as a service (and signed in the SLA contract). This mode is basically designed for systems that do not have too fluctuating states. In this mode, an evaluation process is in charge of matching the user preferences against the profiles of the nominated BFT protocols considering both: reliability, and performance. The elected protocol is the one that achieves the highest evaluation score. The mechanism is well automated via mathematical matrices, and produces selections that are reasonable and close to reality. Some systems, however, may experience fluttering conditions, like variable contention or message payloads. In this case, the static mode will not be efficient since a chosen protocol might not fit the new conditions. The Dynamic mode solves this issue. Adapt combines a collection of BFT protocols and switches between them, thus, adapting to the changes of the underlying system state. Consequently, the `preferred' protocol is always polled for each system state. This yields an optimal quality of service, i.e., reliability and performance. Adapt monitors the system state through its Event System, and uses a Support Vector Regression method to conduct run time predictions for the performance of the protocols (e.g., throughput, latency, etc). Adapt also operates in a Heuristic mode. Using predefined heuristics, this mode optimizes user preferences to improve the selection process. The evaluation of our approach shows that selecting the `preferred' protocol is automated and close to reality in the static mode. In the Dynamic mode, Adapt always achieves the optimal performance among available protocols. The evaluation demonstrates that the overall system performance can be improved significantly too. Other cases explore that it is not always worthy to switch between protocols. This is made possible through conducting predictions with high accuracy, that can reach more than 98% in many cases. Finally, the thesis shows that Adapt can be smarter through using heuristics.

[2]  Ali Shoker and Jean-Paul Bahsoun. "Towards Byzantine Resilient Directories". In the proceedings of the 11th IEEE International Symposium on Network Computing and Applications (NCA), IEEE Computer Society, Cambridge, MA, USA; August 2012.

Abstract. Notable Byzantine Fault Tolerant protocols have been designed so far. These protocols are often evaluated on simple benchmarks, and few times on NFS systems. On the contrary, studies that addressed the behavior of BFT on large back-ends, like Directories, are few. We believe that studying such systems is crucial for practice community due to their popularity. In this paper, we integrate BFT with OpenLDAP Directory. We introduce the design of the integrated system, that we call BFT-LDAP. Then, we study its behavior accompanied with some useful observations. In addition, we discuss the cost overhead of this integration. Our approach ensures that OpenLDAP legacy code remains completely intact, and that the integration with BFT is straightforward using APIs. Moreover, we convey that the additional performance cost of BFT-LDAP is negligible as compared to that of stand-alone OpenLDAP. We conducted our experiments on Emulab. The experiments indicate that the performance discrepancy of BFT-LDAP is negligible whenever different state-of-the-art BFT protocols are used. Other experiments demonstrate that a little sacrifice in throughput (less than 10%) is needed in order to leverage the resiliency of OpenLDAP against Byzantine faults (i.e., through applying BFT).

[1]  Rachid Guerraoui, Maysam Yabandeh, Ali Shoker and Jean-Paul Bahsoun. "Trustful Cumulus Clouds". EPFL Technical Report, LPD-REPORT-2010-10, December, 2010.


Abstract. Cloud computing offers an appealing business model and it is tempting for companies to delegate their IT services to the cloud. Yet, a company might find it risky to do so for sensible services and to depend entirely on a single provider, which can be vulnerable and constitute a clearly identified target for attackers. We explore in this paper a replication approach where copies of the same IT service are placed on several (cumulus) clouds that are not only independent but actually unaware of each other. Replica consistency is ensured using CBFT, a new BFT protocol designed for wide area networks. CBFT uses a primary to handle contention among multiple client requests but shares the load of multicasting and encrypting them among the clients. We evaluate CBFT on an Emulab cluster with a wide area topology and convey its scalability with respect to state of the art BFT protocols.