Compare commits


2 Commits

Author SHA1 Message Date
a1fbbf5942 Actual final 2023-05-04 21:40:54 +01:00
7e2c92d9c3 .... 2023-05-04 13:37:44 +01:00
2 changed files with 61 additions and 54 deletions

Binary file not shown.


@ -50,6 +50,13 @@
\pagenumbering{arabic}
\maketitle
\newpage
\consultation{0}
\newpage
\declaration{Cryptographic protocol for playing Risk in an untrusted setting}{Jude Southworth}
\newpage
\abstract
We present a modern implementation of the Paillier cryptosystem for the browser, using Jurik's form to optimise encryption. Furthermore, we present an application of this cryptosystem with zero-knowledge proofs to enable peers to maintain private state in a peer-to-peer implementation of the board game Risk. Use of novel zero-knowledge proofs enables peers to verify that the actions of other players adhere to the game's rules without learning additional information about their private state. Finally, we present benchmarks of the implementation.
@ -105,7 +112,7 @@ For playing games over an internet connection, multiple solutions already exist.
In highly centralised networks, traffic is routed to a number of servers operated by the same organisation that maintains the game or service. This is the current standard for the majority of the internet: in fact, this is the methodology used by the official version of Risk, which is available as an app.
Without patching the executables, there is no way for a user to run their own servers, or to connect to a third party's server. This has two main advantages: \begin{itemize}
\item \textbf{Moderation.} The developers can enforce their own rules through some form of EULA, and this would be properly enforceable, as if a user is banned from the official servers, there is no alternative.
\item \textbf{Moderation.} The developers can create rules for the platform through a EULA, and these rules are enforceable: if a user is banned from the official servers, there is no alternative.
\item \textbf{Security.} The server acts as a trusted party, and validates all communications from players. Hence, players cannot subvert a (properly implemented) service's protocol.
\end{itemize}
@ -132,7 +139,7 @@ P2P and federated networks address each of the disadvantages listed above. \begi
\item \textbf{Ownership.} Games such as Unreal Tournament 99 still have playable servers, as the servers are community-run, and so as long as people still wish to play the game, they will remain online (despite the original developers no longer officially supporting the title) \cite{eatsleeput.com_2022,epic}.
\end{itemize}
However, general privacy can often be worse in fully P2P networks than that of fully centralised networks. As there is no trusted server, there is no easy way to hide traffic of certain users or maintain, validate, and operate on private state. For example, most popular cryptocurrencies (such as Bitcoin and Ethereum) are public-ledger, meaning transactions can be publicly tracked. This is far less private than services such as banks or cash \cite{bitcoin08,bitcoindeanon}.
However, general privacy can often be worse in fully P2P networks than that of fully centralised networks. As there is no trusted server, there is no easy way to obscure the traffic of each user or to maintain, validate, and operate on private data. For example, most popular cryptocurrencies (such as Bitcoin and Ethereum) are public-ledger, meaning transactions can be publicly tracked. This is far less private than services such as banks or cash \cite{bitcoin08,bitcoindeanon}.
Some P2P services try to address issues with privacy. Monero and Zcash are privacy coins that use cryptographic protocols to obscure transaction amounts \cite{monero,zcash}. The P2P file-sharing protocol Soulseek keeps a user's download history private by using direct communication between the two peers \cite{slsk}. The downside of this approach is that if the sharing user goes offline, their files are no longer available. BitTorrent overcomes this by pooling peers with the file into a "seed pool". The disadvantage of this approach is that the set of users downloading a file is now public knowledge \cite{cohen_2017}.
@ -209,7 +216,7 @@ One solution to this is to transform a proof into a non-interactive zero-knowled
\subsection{Games as graphs}
The board used to play Risk can be viewed as an undirected graph. Each region is a node, with edges connecting it to the adjacent regions. For convenience, we also consider the player's hand to be a node, which has all units not in play placed upon it.
Risk's board layout can be viewed as an undirected graph. Each region is a node, with edges connecting it to the adjacent regions. For convenience, we also consider the player's hand to be a node, which has all units not in play placed upon it.
Furthermore, the actions taken when playing the game can be seen as constructing new edges on a directed weighted graph. We are therefore interested in the ability to prove that these new edges conform to certain rules.
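As an illustration, the board graph and the actions taken upon it could be represented as follows; region names and the action format are placeholders rather than the implementation's.
\begin{minted}{javascript}
// Illustrative adjacency-list representation of the board as an undirected graph.
// The special "hand" node holds all units not currently in play.
const board = new Map([
    ["hand",     []],
    ["region-a", ["region-b", "region-c"]],
    ["region-b", ["region-a"]],
    ["region-c", ["region-a"]],
]);

// Actions during play construct edges on a directed, weighted graph: an edge
// (from, to, weight) moves `weight` units between nodes, e.g. placing
// reinforcements moves units from the hand onto a region.
const actions = [
    { from: "hand", to: "region-a", weight: 3 },
];
\end{minted}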
@ -302,7 +309,7 @@ An alternative general protocol is the $\Sigma$-protocol \cite{groth2004honest}.
\end{itemize}
This reduces the number of communications to a constant, even for varying numbers of challenges.
The Fiat-Shamir heuristic \cite{fiatshamir}, as discussed above, is another way to reduce communication by using a random oracle. For ledgers, non-interactive zero-knowledge proofs are necessary, as the ledger must be resilient to a user going offline. This is not the same in our case, however non-interactive zero-knowledge proofs are still beneficial as the amount of communications can be reduced significantly, resulting in simpler network code.
The Fiat-Shamir heuristic \cite{fiatshamir}, as discussed above, is another way to reduce communication by using a random oracle. For ledgers, non-interactive zero-knowledge proofs are necessary, as the ledger must be resilient to a user going offline. In our case, users do not go offline. However, non-interactive zero-knowledge proofs are still beneficial as the amount of communications can be reduced significantly, resulting in simpler network code.
The downside of using the Fiat-Shamir heuristic in our implementation is that any third party can verify proofs. In some situations, we do not want this to be the case.
@ -470,7 +477,7 @@ It must be noted that \texttt{BigInt} is inappropriate for cryptography in pract
\subsection{Modular exponentiation}\label{subsection:modexp}
As \texttt{BigInt}'s V8 implementation does not optimise modular exponentiation itself, we employ the use of addition chaining \cite{schneier_1996}. Addition chaining breaks a modular exponentiation into repeated square-and-modulo operations, which are less expensive to perform.
As the implementation of \texttt{BigInt}s in V8 does not optimise modular exponentiation itself, we employ addition chaining \cite{schneier_1996}. Addition chaining breaks a modular exponentiation into repeated multiply-and-modulo operations, which are less expensive to perform.
The number of operations is dependent primarily on the size of the exponent. For an exponent $b$, between $|b|$ and $2|b|$ multiply-and-modulo operations are performed.
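A minimal sketch of the binary square-and-multiply method, the simplest such addition chain, is given below; the name \texttt{modPow} and its interface are illustrative rather than the implementation's exact code.
\begin{minted}{javascript}
// Illustrative square-and-multiply modular exponentiation over BigInt arguments.
// The exponent is scanned bit-by-bit; every intermediate is reduced modulo `mod`,
// so operands never grow beyond mod squared.
function modPow(base, exponent, mod) {
    let result = 1n;
    base %= mod;
    while (exponent > 0n) {
        if (exponent & 1n) {
            result = (result * base) % mod;   // multiply-and-modulo when the bit is set
        }
        base = (base * base) % mod;           // square-and-modulo every iteration
        exponent >>= 1n;
    }
    return result;
}
\end{minted}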
@ -570,7 +577,7 @@ We also need to compute $\mu = \lambda^{-1} \mod n$ as part of decryption. Fortu
Let $c$ be the ciphertext. The corresponding plaintext is computed as \begin{align*}
m = L(c^\lambda \mod n^2) \cdot \mu \mod n,
\end{align*} where $L(x) = \frac{x - 1}{n}$. This operation can be optimised by applying Chinese remainder theorem. However, in the application presented, decryption is not used and is only useful as a debugging measure. So this optimisation is not applied.
\end{align*} where $L(x) = \frac{x - 1}{n}$. This operation can be optimised by applying the Chinese remainder theorem. However, decryption is not used in the application, and is only useful as a debugging measure, so this optimisation is not applied.
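As a sketch, the decryption equation above translates directly to \texttt{BigInt} code along the following lines, reusing the \texttt{modPow} sketch from \hyperref[subsection:modexp]{Section~\ref*{subsection:modexp}}; names are illustrative.
\begin{minted}{javascript}
// Illustrative Paillier decryption: m = L(c^lambda mod n^2) * mu mod n,
// where L(x) = (x - 1) / n. lambda and mu are private key values, n is the modulus.
function decrypt(c, lambda, mu, n) {
    const nSquared = n * n;
    const x = modPow(c, lambda, nSquared);
    const l = (x - 1n) / n;       // exact BigInt division for well-formed ciphertexts
    return (l * mu) % n;
}
\end{minted}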
\subsection{Implementation details}
@ -996,15 +1003,15 @@ We combine some ideas from the graph isomorphism proofs with set bijection proof
\subsection{Optimising}
It is preferred that these proofs can be performed with only a few communications: this issue is particularly prevalent for protocols requiring multiple rounds to complete. The independence of each round on the next means the proof can be performed in parallel, so the prover computes all of their private state, then the verifier computes all of their challenges. However, still is the issue of performing proofs of zero.
It is preferred that these proofs can be performed with only a few communications: this issue is particularly prevalent for protocols requiring multiple rounds to complete. As each round is independent of the next, the proof can be performed in parallel: the prover computes all of their proofs, then the verifier computes all of their challenges. However, the issue remains of performing the proofs of zero.
We can apply the Fiat-Shamir heuristic to make proofs of zero non-interactive \cite{fiatshamir}. In place of a random oracle, we use a cryptographic hash function. We hash a number of public parameters, so that a prover cannot cheat by searching for values that hash in a preferable manner. In this case, selecting $e = H(g, m, a)$ is a valid choice. To get a hash of the desired length, an extendable output function such as SHAKE256 can be used \cite{FIPS202}. The library jsSHA \cite{jssha} provides an implementation of SHAKE256 that works within a browser.
We can then apply the Fiat-Shamir heuristic to each higher-level protocol to make them non-interactive. The proving party first generates $t$ proofs (this can be done independently of verification) and serialises them to JSON. Then, use SHA-3 to compute a $t$-bit hash of the serialised proofs, and use each bit of this hash as the verifier's $a$. It is assumed that the prover cannot compute preimages of SHA-3, so they cannot control which challenges will be "requested". This makes it computationally infeasible for the prover to cheat.
We can then apply the Fiat-Shamir heuristic to each higher-level protocol to make them non-interactive. The proving party first generates $t$ proofs (this can be done independently of verification) and serialises them to JSON. Then, use SHA-3 to compute a $t$-bit hash of the serialised proofs, and use each bit of this hash as the verifier's challenge. It is assumed that the prover cannot compute preimages of SHA-3, so they cannot control which challenges will be "requested". This makes it computationally infeasible for the prover to cheat.
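To illustrate the challenge derivation, a sketch follows; \texttt{shake256Bits} is a hypothetical helper (it could be built on jsSHA's SHAKE256) that returns the first $t$ bits of the digest as a \texttt{BigInt}, and all names are ours rather than the implementation's.
\begin{minted}{javascript}
// Hypothetical helper: shake256Bits(message, bits) returns the first `bits` bits
// of SHAKE256(message) as a BigInt (it could be implemented with jsSHA).
// Derives one single-bit challenge per round from the serialised proofs.
function deriveChallenges(serialisedProofs, t) {
    const digest = shake256Bits(serialisedProofs, t);
    const challenges = [];
    for (let i = 0n; i < BigInt(t); i++) {
        challenges.push((digest >> i) & 1n);  // bit i is round i's challenge
    }
    return challenges;
}
\end{minted}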
\subsection{Application to domain}
We finally present a diagram showing how each protocol presented ties into the domain. We highlight the interactions between two particular players: Player 1 is the current player, and Player 2 controls a region neighbouring a region of Player 1.
Finally, the following diagram shows how each protocol presented ties into the domain. We highlight the interactions between two particular players: Player 1 is the current player, and Player 2 controls a region neighbouring a region of Player 1.
\begin{figure}[H]
\centering
@ -1059,10 +1066,10 @@ We finally present a diagram showing how each protocol presented ties into the d
\fill (6,-15) circle [radius=2pt] ;
\fill (12,-15) circle [radius=2pt] ;
\end{tikzpicture}
\caption{An example turn in Risk incorporates each of the protocols presented above.}
\caption{An example turn in our P2P implementation of Risk.}
\end{figure}
\chapter{Review}
\chapter{Discussion}
\section{Theoretic considerations}
@ -1082,56 +1089,56 @@ Paillier is broken if factoring large numbers is computationally feasible \cite[
The proof of zero is honest-verifier \cite[Section~5.2]{damgard2003}. However, applying the Fiat-Shamir heuristic converts such a proof into a general zero-knowledge proof \cite[Section~5]{fiatshamir}. This means that, supposing the choice of transform used is appropriate, \hyperref[protocol1]{Protocol~\ref*{protocol1}} should also be general zero-knowledge. However, the interactive proofs performed as part of the game are still only honest-verifier. Consequently, a malicious verifier may be able to extract additional information from the prover (such as the blinding value used).
\section{Security}
\subsection{Soundness}
Assuming $t = 24$, the chance of an undetected cheater in a single execution of a multi-round protocol is $2^{-24} \approx 6.0 \times 10^{-8}$.
It is possible that even if a prover cheats a proof at one point, the cheat would be detected later on in the game. For example, suppose a player cheated the range proof during an attack, and as a result won the attack. This instance of cheating is likely to be detected in the final range proofs to prove that regions are non-negative in value at the end of a turn.
We previously discussed the soundness issues relating to the BCDG proof \cite{bcdg1987}. These issues are overcome by \hyperref[protocol4]{Protocol~\ref*{protocol4}}, which instead aims to fix an upper bound on the bit length of a value, rather than prove a value falls within a specific range.
From the additive homomorphic property, this proof can be easily manipulated to cover other ranges without affecting the soundness: for example, to prove a value $v$ is in the range $1 \leq v \leq 256$, each party first multiplies the ciphertext by $g^{-1} \mod n^2$, and then proceeds with the proof as normal.
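Concretely, assuming the textbook Paillier encryption $E(v) = g^v r^n \mod n^2$, the shift follows from \begin{align*}
E(v) \cdot g^{-1} = g^v r^n \cdot g^{-1} = g^{v - 1} r^n = E(v - 1) \mod n^2,
\end{align*} so an 8-bit bound $0 \leq v - 1 \leq 255$ on the shifted ciphertext establishes $1 \leq v \leq 256$ for the original value.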
\subsection{Collusion}
Assuming $n$ players, we discussed that \hyperref[protocol2]{Protocol~\ref*{protocol2}} is resistant to $n-1$ colluding parties. Similarly, as the Fiat-Shamir heuristic is used for most proofs, colluding parties cannot agree beforehand to use specific challenges, which would allow cheating of proofs.
The only instance of a zero-knowledge proof that doesn't use Fiat-Shamir is the proof of neighbouring values. However, this proof is not important to the integrity of the game, as all state-changing actions are verified by the other non-interactive multi-round proofs.
Colluding players could agree to not verify each other's proofs. However, this would just result in the colluding players' games diverging from the games of the players who are playing fairly, effectively ending the game session for the non-colluding players.
\section{Efficiency}
\subsection{Storage complexity}
Let $n$ be the Paillier modulus.
For this section, let $n$ be the Paillier modulus.
Paillier ciphertexts are constant size, each $2|n|$ in size (as they are taken modulo $n^2$). This is small enough for the memory and network limitations of today.
Paillier ciphertexts are constant size, each $2|n|$ in size (as they are taken modulo $n^2$). This is within today's memory and network limitations.
The interactive proof of zero uses two Paillier ciphertexts (each size $2|n|$), a challenge of size $|n|$, and a proof statement of size $|n|$. In total, this is a constant size of $6|n|$.
On the other hand, the non-interactive variant needs not communicate the challenge (as it is computed as a function of other variables). So the non-interactive proof size is $5|n|$.
On the other hand, the non-interactive variant need not communicate the challenge (as it is computed as a function of other variables). So the non-interactive proof size is $5|n|$.
The non-interactive \hyperref[protocol1]{Protocol~\ref*{protocol1}} requires multiple rounds. Assume that we use 48 rounds: this provides a good level of soundness, with a cheat probability of $\left(\frac{1}{2}\right)^{48} \approx 3.6 \times 10^{-15}$. Additionally, assume that there are five regions to verify. Each prover round then requires five Paillier ciphertexts, and each verifier round five non-interactive proofs of zero plus some negligible amount of additional storage for the bijection.
This results in a proof size of $(10|n| + 10|n|) \times 48 = 960|n|$. For key size $|n| = 2048$, this is 240kB. This is a fairly reasonable size for memory and network, but risks exceeding what can be placed within a processor's cache, leading to potential slowdown during verification.
The non-interactive \hyperref[protocol1]{Protocol~\ref*{protocol1}} requires multiple rounds. Assume that we use 24 rounds, and additionally assume that there are five regions to verify. Each prover round then requires five Paillier ciphertexts, and each verifier round requires five non-interactive proofs of zero plus some negligible amount of additional storage for the bijection.
This results in a proof size of $(10|n| + 10|n|) \times 24 = 480|n|$. For key size $|n| = 2048$, this is 120kB. This is a reasonable size for memory and network, but risks exceeding what can be placed within a processor's cache, leading to potential slowdown during verification.
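As a worked check for $|n| = 2048$: \begin{align*}
480|n| = 480 \times 2048 \text{ bits} = 983040 \text{ bits} = 122880 \text{ bytes} \approx 120\text{kB}.
\end{align*}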
This could be overcome by reducing the number of rounds, which comes at the cost of increasing the probability of cheating. In a protocol designed to only facilitate a single game session, this may be acceptable to the parties involved. For example, reducing the number of rounds to 24 will increase the chance of cheating to $\left(\frac{1}{2}\right)^{24} \approx 6.0 \times 10^{-8}$, but the size would reduce by approximately half.
This could be overcome by reducing the number of rounds, which comes at the cost of decreasing the soundness. In a protocol designed to only facilitate a single game session, this may be acceptable to the parties involved. For example, reducing the number of rounds to 12 will increase the chance of cheating to $\frac{1}{4096}$, but the size would reduce to approximately half.
Each of these calculations is in an ideal situation without compression or signatures: in the implementation presented, the serialisation of a ciphertext is larger than this, since it serialises to a string of the hexadecimal representation and includes a digital signature for authenticity. In JavaScript, encoding a byte string as hexadecimal should yield approximately a four times increase in size, as one byte uses two hexadecimal characters, which are encoded as UTF-16. Results for the actual sizes of each proof are given in \hyperref[table3]{Table~\ref*{table3}}. Some potential solutions are discussed here.
Each of these calculations is in an ideal situation without compression or signatures. In the implementation presented, the serialisation of a ciphertext is larger than this for two main reasons. First, each value serialises to a string of its hexadecimal representation, and secondly each message includes a digital signature for authenticity. In JavaScript, encoding a byte string as hexadecimal should yield approximately a four times increase in size, as one byte uses two hexadecimal characters, which are encoded as UTF-16. Results for the actual sizes of each proof are given in \hyperref[table3]{Table~\ref*{table3}}. Some potential solutions are discussed here.
\textbf{Compression.} One solution is to use string compression. String compression can reduce the size considerably, as despite the ciphertexts being random, the hex digits only account for a small amount of the UTF-8 character space. LZ-String, a popular JavaScript string compression library \cite{lzstring}, can reduce the size of a single hex-encoded ciphertext to about 35\% of its original size. This will result in some slowdown due to compression time. However, this is somewhat negligible in the face of the time taken to produce and verify proofs in the first place.
\textbf{Compression.} One solution is to use string compression. String compression can reduce the size considerably, as despite the ciphertexts being random, the hex digits only account for a small amount of the UTF-8 character space. LZ-String, a popular JavaScript string compression library \cite{lzstring}, can reduce the size of a single hex-encoded ciphertext to about 35\% of its original size. This will result in some slowdown due to compression time. However, this is negligible in the face of the time taken to produce and verify proofs in the first place.
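A sketch of this approach follows, assuming the LZ-String library is loaded and exposes the global \texttt{LZString} object; the ciphertext value is a stand-in.
\begin{minted}{javascript}
// Stand-in for a hex-encoded ciphertext modulo n^2 (4096 bits = 1024 hex characters).
const ciphertextHex = "0123456789abcdef".repeat(64);

// Compress before transmission and restore on receipt.
// compressToUTF16 keeps the result safe to embed in a JavaScript string message.
const compressed = LZString.compressToUTF16(ciphertextHex);
const restored = LZString.decompressFromUTF16(compressed);

console.log(compressed.length, restored === ciphertextHex);
\end{minted}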
\textbf{Message format.} Another solution is to use a more compact message format, for example msgpack \cite{msgpack} (which also has native support for binary literals).
\textbf{Message format.} Another solution is to use a more compact message format, for example msgpack \cite{msgpack}, which also has native support for binary literals.
\textbf{Smaller key size.} The size of ciphertexts depends directly on the size of the key. Using a shorter key will reduce the size of the ciphertexts linearly.
\subsection{Time complexity}
Theoretic timing results versus RSA are backed experimentally by the implementation. The following benchmarking code was executed.
Theoretic timing results versus RSA are backed experimentally by the implementation. Performing 250 Paillier encrypts required 47,000ms. On the other hand, performing 250 RSA encrypts required just 40ms. Results are shown in \hyperref[table1]{Table~\ref*{table1}}.
\begin{minted}{javascript}
console.log("Warming up")
for (let i = 0n; i < 100n; i++) {
keyPair.pubKey.encrypt(i);
}
console.log("Benching")
performance.mark("start")
for (let i = 0n; i < 250n; i++) {
keyPair.pubKey.encrypt(i);
}
performance.mark("end")
console.log(performance.measure("duration", "start", "end").duration)
\end{minted}
Performing 250 Paillier encrypts required 47,000ms. On the other hand, performing 250 RSA encrypts required just 40ms. Results are shown in \hyperref[table1]{Table~\ref*{table1}}.
Some potential further optimisations to the implementation are considered below.
Potential further optimisations to the implementation are considered below.
\textbf{Caching.} As the main values being encrypted are 0 or 1, a peer could maintain a cache of encryptions of these values and transmit these instantly. Caching may be executed in a background "web worker". A concern is whether an adversary could mount a timing-related attack by first exhausting a peer's cache of a known plaintext value, and then requesting an unknown value and using the time taken to determine whether the value was served from the exhausted cache.
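A minimal sketch of such a cache is given below, assuming a public key object with an \texttt{encrypt} method; names are illustrative. Note that the fallback to on-demand encryption when the cache is empty is precisely the observable timing difference described above.
\begin{minted}{javascript}
// Illustrative cache of pre-computed encryptions of the common plaintexts 0 and 1.
// refill() could be driven from a background web worker; take() serves requests.
class EncryptionCache {
    constructor(pubKey, depth = 16) {
        this.pubKey = pubKey;
        this.depth = depth;
        this.entries = new Map([[0n, []], [1n, []]]);
    }

    refill() {
        for (const [plaintext, ciphertexts] of this.entries) {
            while (ciphertexts.length < this.depth) {
                ciphertexts.push(this.pubKey.encrypt(plaintext));
            }
        }
    }

    take(plaintext) {
        const ciphertexts = this.entries.get(plaintext);
        if (ciphertexts && ciphertexts.length > 0) {
            return ciphertexts.pop();           // fast path: served from the cache
        }
        return this.pubKey.encrypt(plaintext);  // slow path: observable timing difference
    }
}
\end{minted}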
@ -1228,17 +1235,17 @@ Multi-round proofs combining set membership and graph isomorphism are among the
\section{Domain}
The protocols devised are effective in the target domain of online games. With multi-round proofs of 24 rounds, players can be reasonably confident that other players are not cheating. The chance of an undetected cheater in a single execution of
The protocols devised are effective in the target domain of online games. With multi-round proofs using 24 rounds, players can be reasonably confident that other players are not cheating.
For the most part, the protocols shown run in a time-frame that would not disrupt the experience, with the exception of the bit length proof. With additional work, this proof could be replaced with a Bulletproof \cite{bulletproofs}, which may use less bandwidth and perform faster.
For the most part, the proposed protocols run in a time-frame that would not disrupt the experience, with the exception of the bit length proof. With additional work, this proof could be replaced with a Bulletproof \cite{bulletproofs}, which may use less bandwidth and perform faster.
A large outstanding problem with the implementation is conflict resolution. Currently, if a player submits proofs that do not verify, other players simply ignore the message. However, a better solution would be to allow other players to remove a misbehaving player from the protocol.
\section{Wider application}
P2P software solutions have many benefits to end users: mainly being greater user freedom. I believe that the content presented here shows clear ways to extend P2P infrastructure, and reduce dependence on centralised services.
P2P software solutions have many benefits to end users, chief among them greater user freedom. The content presented here shows clear ways to extend P2P infrastructure, and reduce dependence on centralised services.
I propose some ideas which could build off the content here.
We propose some further ideas which could build off the content here.
\subsection{Larger scale games}
@ -1248,7 +1255,7 @@ The downsides of this are that the complexity of P2P networking is far greater t
\subsection{Decentralised social media}
The schemes presented here could be applies to the concept of a decentralised social media platform. Such a platform may use zero-knowledge proofs as a way to allow for "private" profiles: the content of a profile may stay encrypted, but zero-knowledge proofs could be used as a way to allow certain users to view private content in a manner that allows for repudiation, and disallows one user from sharing private content to unauthorised users.
The schemes presented here could be applied to the concept of a decentralised social media platform. Such a platform may use zero-knowledge proofs to support "private" profiles: the content of a profile stays encrypted, but zero-knowledge proofs allow certain users to view private content in a manner that allows for repudiation, and prevents one user from sharing private content with unauthorised users.
To store data, IPFS could be used. IPFS is a P2P data storage protocol \cite{ipfs}. This has the advantage that users can store their own data, while other users can mirror it to protect against outages or peers going offline. The amount of effective storage would also grow as more users join the network.
@ -1262,26 +1269,26 @@ The ability to prove the contents of a dataset to a second party without guarant
Another consideration in this domain is the use of homomorphic encryption schemes to allow a third party to process data without actually viewing it. This protects the data from being viewed by the third party, and the processing methods from being viewed by the first party. For example, common statistical functions such as regression can be performed on data that is encrypted under fully homomorphic encryption schemes.
\section{Limitations}
\section{Limitations encountered}
\subsection{Implementation}
\subsection{JavaScript}
JavaScript was the incorrect choice of language for this project. Whilst the event-based methodology was useful, I believe that JavaScript made development much more difficult.
JavaScript was the incorrect choice of language for this project. Whilst the event-based methodology was useful, JavaScript overall made development much more difficult.
JavaScript, in its most common implementations, is a slow language for number processing. Prime generation takes a considerable amount of time, and this extends to encryption and decryption being slower than in an implementation in an optimising compiled language.
JavaScript, in its most common implementations, is a slow language for number processing. Prime generation takes a considerable amount of time, and this extends to encryption being slower than in an implementation in an optimising compiled language.
\begin{table}[H]
\caption{Time to generate safe primes}
\begin{tabularx}{\hsize}{c *2{>{\Centering}X}}
\toprule
& My implementation & \texttt{openssl dhparam 512} \\
& Our implementation & \texttt{openssl dhparam 512} \\
\midrule
$|p| = 512$ & 8,660ms & 66ms \\
\bottomrule
\end{tabularx}
\end{table}
JavaScript's type system makes debugging difficult. It is somewhat obvious that this problem is far worse in systems with more interacting parts. TypeScript may have been a suitable alternative, but most likely the easiest solution was to avoid both and go with a language that was designed with stronger typing in mind from the outset.
JavaScript's type system made debugging difficult, and this problem is naturally worse in systems with more interacting parts. TypeScript could have been a suitable alternative, but most likely the easiest solution was to avoid both and use a language designed with stronger typing in mind from the outset.
JavaScript is an asynchronous but single-threaded language: the interpreter uses an event loop to handle new events \cite{mozdoc}. This introduces the possibility of race conditions despite no explicit threading being used. The asynchronous nature is beneficial to a degree, as it means that long-running code will not cause the WebSocket to close or block other communications from being processed. Using a language with explicit threading would allow for speed-ups in prime generation and proof construction, as these can be parallelised trivially.
@ -1291,7 +1298,7 @@ Using a language that can interact with the operating system would also have adv
The P2P implementation requires more processing power and more bandwidth on each peer than a client-server implementation would. This is the main limitation of the P2P implementation. The program ran in a reasonable time, using a reasonable amount of resources on the computers I had access to, but these are not representative of the majority of computers in use today. Using greater processing power increases power consumption, which is undesirable. In a client-server implementation, the power consumption should be lower than the P2P implementation presented as no processing time is spent validating proofs or using the Paillier cryptosystem, which is less efficient than the hybrid cryptosystems used in standard online communication.
\emph{Final word count: 9,190}
\emph{Final word count: 9,355}
\bibliography{Dissertation}