Modeling and Analysis of P2P Data Storage Systems

Julian Monteiro

Supervised by Stéphane Pérennes and Olivier Dalle

Abstract

Large scale peer-to-peer systems are foreseen as a way to provide highly reliable data storage at low cost. To achieve high durability, such P2P systems encode the user data in a set of redundant fragments and distribute them among the peers (see Figure 1). This redundancy has to be constantly monitored and maintained by a reconstruction process due to the occurrence of peer failures or departures. Indeed, the system performance depends on many parameters that need to be well tuned, such as the factor of redundancy, the code scheme used, the frequency of data repair and the size of a data block. These parameters impact the amount of resources (e.g., bandwidth, storage space, etc.) needed to obtain a certain reliability (e.g., probability to lose data). This thesis aims at providing tools to analyse and predict the performance of large scale data storage systems. Thus, different techniques are studied and applied. They range from formal analysis (using Markov chains and fluid models), simulations, and experimentation (using Grid5000 platform). By comparing to simulations, we show that Markov chain models give correct approximations of the system average behavior, but fails to capture its variations over time. Our main contribution is a new stochastic model based on fluid approximation, that capture the deviations around the mean behavior. We also use queuing models to estimate the repair time under limited bandwidth. We additionally study different ways to distribute fragments (placement strategies), and study a hybrid code that reduces the repair bandwidth consumption. Lastly, we conducted real experimentation on the Grid5000 platform to validate our results.

P2P storage scheme
Figure 1. Files or raw data are cut into data-blocks. Each data-block is divided into s initial fragments, from which r fragments of redundancy are added. Any s fragments among s+r are sufficient to recover the original data-block.

The contributions are organized as following: