Scholar - SciOpen

Most blockchain systems currently adopt resource-consuming protocols to achieve consensus between miners; for example, the Proof-of-Work (PoW) and Practical Byzantine Fault Tolerant (PBFT) schemes, which have a high consumption of computing/communication resources and usually require reliable communications with bounded delay. However, these protocols may be unsuitable for Internet of Things (IoT) networks because the IoT devices are usually lightweight, battery-operated, and deployed in an unreliable wireless environment. Therefore, this paper studies an efficient consensus protocol for blockchain in IoT networks via reinforcement learning. Specifically, the consensus protocol in this work is designed on the basis of the Proof-of-Communication (PoC) scheme directly in a single-hop wireless network with unreliable communications. A distributed MultiAgent Reinforcement Learning (MARL) algorithm is proposed to improve the efficiency and fairness of consensus for miners in the blockchain system. In this algorithm, each agent uses a matrix to depict the efficiency and fairness of the recent consensus and tunes its actions and rewards carefully in an actor-critic framework to seek effective performance. Empirical results from the simulation show that the fairness of consensus in the proposed algorithm is guaranteed, and the efficiency nearly reaches a centralized optimal solution.