Energized with the Coffee , let's get back to modeling distributed systems. After the two phase commit protocol, the next protocol that we will jump to is a simple broadcast-based failure detector!
By this point in the tutorial, we have gotten familiar with the P language and most of its features. So, working through this example should be super fast!
How to use this example
We assume that you have cloned the P repository locally.
git clone https://github.com/p-org/P.git
The recommended way to work through this example is to open the P/Tutorial folder in IntelliJ side-by-side a browser using which you can simultaneously read the description for each example and browse the P program in IntelliJ.
To know more about P language primitives used in the example, please look them up in the language manual.
System: We consider a simple failure detector that basically broadcasts ping messages to all the nodes in the system and uses a timer to wait for pong responses from all nodes. If a node does not respond with a pong message after multiple attempts (either because of network failure or node failure), the failure detector marks the node as down and notifies the clients about the nodes that are potentially down. We use this example to show how to model network message loss in P and discuss how to model other types of network behaviours.
Correctness Specification: We would like to check - using a liveness specification - that if the failure injector shuts down a particular node then the failure detector always eventually detects the node failure and notifies the client.
The P models (PSrc) for the FailureDetector example consist of four files:
- FailureDetector.p: Implements the
[Expand]: Let's walk through FailureDetector.p
- (L1 - L4) → Event
ePongare used to communicate between the
Nodestate machines (manual: event declaration).
- (L6) → Event
eNotifyNodesDownis used by the FailureDetector to inform the clients about the nodes that are potentially down.
- (L14 - L129) → Declares the
FailureDetectorstate machine (manual: P state machine). The key points to note in the
FailureDetectormachine are the usage of the Timer machine to model the usage of OS timer, the usage of ReliableBroadCast, and the usage of UnReliableBroadCast defined in NetworkFunctions.p.
- Node.p: Implements the
[Expand]: Let's walk through Node.p
- (L4 - L14) → Declares the
Nodestate machine. The
Nodemachine responds with a
ePongmessage on receiving a
ePingmessage from the
FailureDetector. On receiving a
eShutDownmessage from the
FailureInjector, the machine halts itself.
- Client.p: Declares the
[Expand]: Let's walk through Client.p
Client machine is a dummy machine that gets a set of alive nodes when the system starts and maintains the set of currently alive nodes by removing the nodes that are marked as down by the
- FailureDetectorModules.p: Declares the
[Expand]: Let's walk through FailureDetectorModules.p
FailureDetector module which is the union of the module consisting of the
Client machines and the
The P Specification (PSpec) for the FailureDetector is implemented in ReliableFailureDetector.p. We define a simple
ReliableFailureDetector liveness specification to assert that all nodes that have been shutdown
by the failure injector will eventually be detected by the failure detector as failed nodes.
[Expand]: Let's walk through ReliableFailureDetector.p
- (L6 - L57) → Declares the
ReliableFailureDetectorspec machine basically maintains two sets
nodesDownDetected(nodes that are detected as down by the detector) and
nodesShutdownAndNotDetected(nodes that are shutdown by the failure injector but not yet detected).
ReliableFailureDetectormonitor observes the
eShutDownevents to update these maps and move between the
hotstate (unstable state) and non-hot states. The system is in a hot state if there are nodes that are shutdown but not yet detected by the failure detector. The system violates a liveness specification if any of its execution paths terminates in a hot state.
- To understand the semantics of the P spec machines and the details about liveness monitors, please read the manual: p monitors.
The test scenarios folder in P has two parts: TestDrivers and TestScripts. TestDrivers are collections of state machines that implement the test harnesses (or environment state machines) for different test scenarios. TestScripts are collections of test cases that are automatically run by the P checker.
[Expand]: Let's walk through TestDriver.p
This file consists of a single test driver machine that sets up the system under test given the number of nodes and clients in the system. The
SetupSystemWithFailureInjector function creates the clients, nodes, failure injector and the failure detector machines.
[Expand]: Let's walk through TestScript.p
There is a single testcase (TestFailureDetector) defined for the FailureDetector system. The test case asserts the
ReliableFailureDetector specification on a system which is a composition of the
FailureInjector, and the test-driver
Run the following command to compile the FailureDetector project:
---------------------------------------- ==== Loading project file: FailureDetector.pproj ....... includes p file: P/Tutorial/4_FailureDetector/PSrc/FailureDetectorModules.p ....... includes p file: P/Tutorial/4_FailureDetector/PSrc/FailureDetector.p ....... includes p file: P/Tutorial/4_FailureDetector/PSrc/Node.p ....... includes p file: P/Tutorial/4_FailureDetector/PSpec/ReliableFailureDetector.p ....... includes p file: P/Tutorial/4_FailureDetector/PTst/TestDriver.p ....... includes p file: P/Tutorial/4_FailureDetector/PTst/TestScript.p ....... includes p file: P/Tutorial/4_FailureDetector/PTst/Client.p ==== Loading project file: P/Tutorial/Common/Timer/Timer.pproj ....... includes p file: P/Tutorial/Common/Timer/PSrc/Timer.p ....... includes p file: P/Tutorial/Common/Timer/PSrc/TimerModules.p ==== Loading project file: P/Tutorial/Common/FailureInjector/FailureInjector.pproj ....... includes p file: P/Tutorial/Common/FailureInjector/PSrc/NetworkFunctions.p ....... includes p file: P/Tutorial/Common/FailureInjector/PSrc/FailureInjector.p ---------------------------------------- ---------------------------------------- Parsing .. Type checking ... Code generation .... Generated FailureDetector.cs ---------------------------------------- Compiling FailureDetector.csproj .. Microsoft (R) Build Engine version 16.10.2+857e5a733 for .NET Copyright (C) Microsoft Corporation. All rights reserved. Determining projects to restore... Restored P/Tutorial/4_FailureDetector/FailureDetector.csproj (in 880 ms). FailureDetector -> P/Tutorial/4_FailureDetector/POutput/netcoreapp3.1/FailureDetector.dll Build succeeded. 0 Warning(s) 0 Error(s)
There is only a single test case in the FailureDetector program and we can directly run the test case for 10000 iterations:
pmc <Path>/FailureDetector.dll -i 10000
Discussion: Modeling Message Reordering
(to be added soon)
What did we learn through this example?
In this example, we saw how to use data nondeterminism to model message loss and unreliable sends. We also discussed how to model other types of network nondeterminism.