diff --git a/whitepaper/Dissertation.bib b/whitepaper/Dissertation.bib index 63f9245..4f96086 100644 --- a/whitepaper/Dissertation.bib +++ b/whitepaper/Dissertation.bib @@ -285,6 +285,15 @@ howpublished = {\url{https://zcash.readthedocs.io/en/latest/rtd_pages/basics.htm howpublished = {\url{https://github.com/ipfs/specs}}, } +@misc{unciv, + author = {Morgenstern, Yair}, + title = {Unciv - {Civ V} remake for {Android} \& {Desktop}}, + year = {2023}, + publisher = {GitHub}, + journal = {GitHub repository}, + howpublished = {\url{https://github.com/yairm210/Unciv}}, +} + @misc{msgpack, author = {msgpack}, title = {{MessagePack}: Spec}, @@ -532,4 +541,17 @@ title = {Frequently Asked Questions | Soulseek}, howpublished = {\url{http://www.soulseekqt.net/news/faq-page#t10n606}} } -@ \ No newline at end of file +@misc{mozdoc, +author = {Mozilla}, +title = {{JavaScript} language overview}, +howpublished = {\url{https://developer.mozilla.org/en-US/docs/Web/JavaScript/Language_Overview}}} + +@misc{matrix, + author = {{Matrix Spec Core Team}}, + title = {{Matrix Specification}}, + howpublished = {\url{https://spec.matrix.org/latest}}} + +@misc{xmpp, + author = {XMPP}, + title = {XMPP Specifications}, + howpublished = {\url{https://xmpp.org/extensions/}}} \ No newline at end of file diff --git a/whitepaper/Dissertation.pdf b/whitepaper/Dissertation.pdf index ffb254d..98a83fd 100644 Binary files a/whitepaper/Dissertation.pdf and b/whitepaper/Dissertation.pdf differ diff --git a/whitepaper/Dissertation.tex b/whitepaper/Dissertation.tex index 1f39dff..e4da6e6 100644 --- a/whitepaper/Dissertation.tex +++ b/whitepaper/Dissertation.tex @@ -2,13 +2,6 @@ % !TeX TXS-program:compile = txs:///pdflatex/[--shell-escape] \documentclass[12pt,a4paper]{report} -%todo writing style stream of concisense -% move stuff to lit review and explain/justify -% finish implementation -% requirements spec, clear evaluation of security with links to proofs etc. 
-% clear achievements of content in regards to goals -% reflect on achievemtns, difficulties, novelty etc. - \usepackage{algorithmicx} \usepackage{algpseudocode} \usepackage[toc,page]{appendix} @@ -58,12 +51,13 @@ \maketitle -\chapter*{} - -\section*{Abstract} - +\abstract We present a modern implementation of the Paillier cryptosystem for the browser, using Jurik's form to optimise encryption. Furthermore, we present an application of this cryptosystem with zero-knowledge proofs to enable peers to maintain private state in a peer-to-peer implementation of the board game Risk. Use of novel zero-knowledge proofs enables peers to verify that the actions of other players adhere to the game's rules without learning additional information about their private state. Finally, we present benchmarks of the implementation. +\tableofcontents + +\chapter*{} + \section*{Disambiguation} \begin{table}[htp] @@ -88,7 +82,7 @@ We present a modern implementation of the Paillier cryptosystem for the browser, \end{tabularx} \end{table} -\vspace*{0.05\paperheight} +\vspace*{0.1\paperheight} \begin{center} "Never create anything, it will be misinterpreted, it will chain you and follow you for the rest of your life." - Hunter S. Thompson @@ -265,6 +259,8 @@ In general, this approach uses a decomposition of the plaintext message $m$ into Some cryptosystems admit an additive homomorphic property: that is, given the public key and two encrypted values $\sigma_1 = E(m_1), \sigma_2 = E(m_2)$, the value $\sigma_1 + \sigma_2 = E(m_1 + m_2)$ is the ciphertext of the underlying operation. +\subsection{Paillier cryptosystem} + The Paillier cryptosystem, which is based on composite residuosity classes, express the additive homomorphic property \cite{paillier1999public}. This is due to the structure of ciphertexts in the Paillier cryptosystem. A public key is of structure $(n, g)$, where $n$ is the product of two large primes and $g$ is a generator of $\mathbb{Z}^*_n$. 
Under the public key, the encryption $c$ of a message $m$ is computed as \begin{align*} c = g^mr^n \mod n^2 \end{align*} @@ -880,6 +876,8 @@ Validating $E(m)$ is done with the proof of zero. Then it remains to prove that The downside of this proof over the BCDG proof \cite{bcdg1987} is that the time to perform and verify this proof grows linearly with $|m|$. However, in most cases $|m|$ should be "small": i.e, $|m| \leq 6$, as Risk unit counts rarely exceed 64 on a single region. In fact, to prevent revealing additional information from the bit length, this protocol in practice will pad out numbers to a minimum of 8 bits, which in our application makes this protocol constant time. +This proof is still sound, complete, and, supposing sufficient padding is used, zero-knowledge. The soundness depends on the number of rounds $t$ performed. To cheat this protocol, the prover must transmit some $c_j \notin \{ 0, 1 \}$ such that the product computed in step 1 is still zero. When $c_j$ is tested in step 2, there is then a $\frac{1}{2}$ chance that the prover fails the challenge selected by the verifier. This gives a probability of cheating of $2^{-t}$, as before. + Range proof is used in points (3), (4), and (5). In (3), this is to convince other players that the number of units is sufficient for the action. In (4), this is to show that the region is not totally depleted. In (5), this is to ensure the number of units being fortified is less than the strength of the region. All of these are performed using \hyperref[protocol4]{Protocol~\ref*{protocol4}} and by using the additive homomorphic property to subtract the lower range from $m$ first. \subsection{Proving fortifications} @@ -976,7 +974,7 @@ Point (5) still remains, as the range proof alone only works to prevent negative \caption{Valid and invalid fortify messages.} \end{figure} -We combine some ideas from the graph isomorphism proofs with ideas from before to get the following protocol. 
+We combine some ideas from the graph isomorphism proofs with set bijection proofs from before to get the following protocol. \begin{protocol}\label{protocol3} The prover transmits the set \begin{align*} @@ -988,10 +986,10 @@ We combine some ideas from the graph isomorphism proofs with ideas from before t \begin{enumerate} \item Prover transmits $\{ (\psi(R_i), E(-n_i, r_i^*)) \mid 0 < i \leq N \}$ where $\psi$ is a random bijection on the regions, and $\{ H(R_i, R_j, s_{ij}) \mid R_i \text{ neighbours } R_j \}$ where $s_{ij}$ is a random salt. - \item Verifier chooses a random $c \in \{0, 1\}$. \begin{enumerate} - \item If $c = 0$, the verifier requests the definition of $\psi$ and each salt. They check that the resulting graph is isomorphic to the original graph. They then compute $E(n_i, r_i) \cdot E(-n_i, r_i^*)$ for each $i$ and request a proof that each is zero. Finally, they compute each edge hash and check that there are precisely the correct number of hashes. + \item Verifier chooses $a \in_R \{0, 1\}$. \begin{enumerate} + \item If $a = 0$, the verifier requests the definition of $\psi$ and each salt. They check that the resulting graph is isomorphic to the original graph. They then compute $E(n_i, r_i) \cdot E(-n_i, r_i^*)$ for each $i$ and request a proof that each is zero. Finally, they compute each edge hash and check that there are precisely the correct number of hashes. - \item If $c = 1$, the verifier requests proofs that $|S| - 2$ elements are zero and that the remaining pair add to zero. They then request the salt used to produce the hash along the edge joining the two non-zero elements, and test that this hash is correct. + \item If $a = 1$, the verifier requests proofs that $|S| - 2$ elements are zero and that the remaining pair add to zero. They then request the salt used to produce the hash along the edge joining the two non-zero elements, and test that this hash is correct. 
\end{enumerate} \end{enumerate} \end{protocol} @@ -1002,8 +1000,12 @@ It is preferred that these proofs can be performed with only a few communication We can apply the Fiat-Shamir heuristic to make proofs of zero non-interactive \cite{fiatshamir}. In place of a random oracle, we use a cryptographic hash function. We take the hash of some public parameters to prevent cheating by searching for some values that hash in a preferable manner. In this case, selecting $e = H(g, m, a)$ is a valid choice. To get a hash of desired length, an extendable output function such as SHAKE256 can be used \cite{FIPS202}. The library jsSHA \cite{jssha} provides an implementation of SHAKE256 that works within a browser. +We can then apply the Fiat-Shamir heuristic to each higher-level protocol to make it non-interactive. The proving party first generates $t$ proofs (this can be done independently of verification) and serialises them to JSON. The prover then uses SHA-3 to compute a $t$-bit hash of the serialised proofs, and each bit of this hash is used as the verifier's challenge $a$. It is assumed that the prover cannot compute preimages of SHA-3, so they cannot control which challenges will be "requested". This makes it computationally infeasible for the prover to cheat. + \subsection{Application to domain} +Finally, we present a diagram showing how each protocol ties into the domain. We highlight the interactions between two particular players: Player 1 is the current player, and Player 2 controls a region neighbouring a region of Player 1. 
+ \begin{figure}[H] \centering \begin{tikzpicture}[every node/.append style={ @@ -1012,7 +1014,7 @@ We can apply the Fiat-Shamir heuristic to make proofs of zero non-interactive \c minimum height=20pt}] \node[draw,rectangle] (P1) at (0,-0.5) {Player 1}; - \node[draw,rectangle] (V) at (6,-0.5) {World}; + \node[draw,rectangle] (V) at (6,-0.5) {Network}; \node[draw,rectangle] (P2) at (12,-0.5) {Player 2}; \draw [very thick] (P1)-- (0,-15); @@ -1057,7 +1059,7 @@ We can apply the Fiat-Shamir heuristic to make proofs of zero non-interactive \c \fill (6,-15) circle [radius=2pt] ; \fill (12,-15) circle [radius=2pt] ; \end{tikzpicture} - \caption{An example turn during the game incorporates each of the protocols presented above, some multiple times.} + \caption{An example turn in Risk incorporates each of the protocols presented above.} \end{figure} \chapter{Review} @@ -1129,15 +1131,11 @@ Theoretic timing results versus RSA are backed experimentally by the implementat Performing 250 Paillier encrypts required 47,000ms. On the other hand, performing 250 RSA encrypts required just 40ms. Results are shown in \hyperref[table1]{Table~\ref*{table1}}. -The speed of decryption is considerably less important in this circumstance, as Paillier ciphertexts are not decrypted during the execution of the program. +Some potential further optimisations to the implementation are considered below. -Some potential further optimisations to the implementation are as follows. +\textbf{Caching.} As the main values being encrypted are 0 or 1, a peer could maintain a cache of encryptions of these values and transmit these instantly. Caching may be executed in a background "web worker". A concern is whether a peer may be able to execute a timing-related attack by first exhausting a peer's cache of a known plaintext value, and then requesting an unknown value and using the time taken to determine if the value was sent from the exhausted cache or not. 
-\textbf{Caching.} As the main values being encrypted are 0 or 1, a peer could maintain a cache of encryptions of these values and transmit these instantly. Caching may be executed in a background "web worker". A consideration is whether a peer may be able to execute a timing-related attack by first exhausting a peer's cache of a known plaintext value, and then requesting an unknown value and using the time taken to determine if the value was sent from the exhausted cache or not. - -\textbf{Smaller key size.} The complexity of Paillier encryption increases with key size. Using a smaller key could considerably reduce the time taken \cite{paillier1999public}. - -I tested this on top of the alternative Paillier scheme from above. This resulted in linear reductions in encryption time: encryption under a 1024-bit modulus took a sixth of the amount of time as under a 2048-bit modulus, and encryption under a 2048-bit modulus took a sixth of the amount of time as under a 4096-bit modulus. +\textbf{Smaller key size.} The complexity of Paillier encryption increases with key size. Using a smaller key considerably reduces the time complexity \cite{paillier1999public}. \textbf{Vectorised plaintexts.} The maximum size of a plaintext is $|n|$: in our case, this is 4096 bits. By considering this as a vector of 128 32-bit values, peers could use a single ciphertext to represent their entire state. This process is discussed as a way to allow embedded devices to use Paillier encryption \cite{10.1145/2809695.2809723}. @@ -1230,7 +1228,7 @@ Multi-round proofs combining set membership and graph isomorphism are among the \section{Domain} -The protocols devised are effective in the target domain of online games. With multi-round proofs of 24 rounds, players can be reasonably confident that other players are not cheating. +The protocols devised are effective in the target domain of online games. 
With multi-round proofs of 24 rounds, players can be reasonably confident that other players are not cheating. The chance of an undetected cheater in a single execution of For the most part, the protocols shown run in a time-frame that would not disrupt the experience, with the exception of the bit length proof. With additional work, this proof could be replaced with a Bulletproof \cite{bulletproofs}, which may use less bandwidth and perform faster. @@ -1244,7 +1242,7 @@ I propose some ideas which could build off the content here. \subsection{Larger scale games} -Many other games exist that the ideas presented could be applied to. Games of larger scale with a similar structure, such as Unciv, could benefit from P2P networking implemented in a similar manner. In particular, similar protocols to \hyperref[protocol4]{Protocol~\ref*{protocol4}} would form an intrinsic part of such games, as they have a similar graph structure which requires guarantees of adjacency for many actions. +Many other games exist that the ideas presented could be applied to. Games of larger scale with a similar structure, such as Unciv \cite{unciv}, could benefit from P2P networking implemented similarly. In particular, similar protocols to \hyperref[protocol4]{Protocol~\ref*{protocol4}} would form an intrinsic part of such games, as they have a similar graph structure which requires guarantees of adjacency for many actions. The downsides of this are that the complexity of P2P networking is far greater than in a centralised model. This would be a considerable burden on the developers, and could hurt the performance of such a game. Additionally, some modern routers no longer support NAT hole-punching or UPnP due to security concerns \cite{upnp}, which makes accessing P2P services more difficult for end users. @@ -1254,9 +1252,9 @@ The schemes presented here could be applies to the concept of a decentralised so To store data, IPFS could be used. IPFS is a P2P data storage protocol \cite{ipfs}. 
This poses an advantage that users can store their own data, but other users can mirror data to protect against outages or users going offline. The amount of effective storage would also grow as more users join the network. -Decentralised platforms promote user privacy, as users can control their own data. Additionally, decentralised platforms promote standardisation of common operations such as instant messaging. This can include end-to-end encryption, and so confidentiality is then a choice of the user rather than the platform, and the consequences of backdoors or legislation targetting platforms is reduced. +Decentralised platforms promote user privacy, as users can control their own data. Additionally, decentralised platforms promote standardisation of common operations such as instant messaging. This can include end-to-end encryption, and so confidentiality is then a choice of the user rather than the platform. Furthermore, the consequences of security issues in individual configurations or legislation targeting platforms are reduced. -Some P2P messaging standards already coexist that could be used here, for example Matrix and XMPP. +Some P2P messaging standards already exist that could be used here, for example Matrix and XMPP \cite{matrix,xmpp}. \subsection{Handling of confidential data} @@ -1266,11 +1264,11 @@ Another consideration in this domain is the use of homomorphic encryption scheme \section{Limitations} -\subsection{JavaScript} +\subsection{Implementation} JavaScript was the incorrect choice of language for this project. Whilst the event-based methodology was useful, I believe that JavaScript made development much more difficult. -JavaScript is a slow language. Prime generation takes a considerable amount of time, and this extends to encryption and decryption being slower than in an implementation in an optimising compiled language. +JavaScript, in its most common implementations, is a slow language for number processing. 
Prime generation takes a considerable amount of time, and this extends to encryption and decryption being slower than in an implementation in an optimising compiled language. \begin{table}[H] \caption{Time to generate safe primes} @@ -1285,15 +1283,15 @@ JavaScript is a slow language. Prime generation takes a considerable amount of t JavaScript's type system makes debugging difficult. It is somewhat obvious that this problem is far worse in systems with more interacting parts. TypeScript may have been a suitable alternative, but most likely the easiest solution was to avoid both and go with a language that was designed with stronger typing in mind from the outset. -JavaScript is a re-entrant language: this means that the interpreter does not expose threads or parallelism to the developer, but it may still use threads under-the-hood and switch contexts to handle new events. This introduces the possibility of race conditions despite no explicit threading being used. The re-entrant nature is however beneficial to a degree, as it means that long-running code won't cause the WebSocket to close or block other communications from being processed. +JavaScript is an asynchronous, but single-threaded language: this means that the interpreter uses an event loop to handle new events \cite{mozdoc}. This introduces the possibility of race conditions despite no explicit threading being used. The asynchronous nature is beneficial to a degree, as it means that long-running code won't cause the WebSocket to close or block other communications from being processed. Using a language with explicit threading would allow for speed up in prime generation and proof construction, as these can be parallelised trivially. -Using a language that can interact with the operating system would have further advantages, as key generation can be performed by standard tools such as OpenSSL and stored in the system keychain, and features like SIMD could be utilised for parallelism. 
+Using a language that can interact with the operating system would also have advantages, as key generation can be performed by standard tools such as OpenSSL and stored in the system keychain, and platform features such as SIMD could be utilised for parallelism. \subsection{Resources} The P2P implementation requires more processing power and more bandwidth on each peer than a client-server implementation would. This is the main limitation of the P2P implementation. The program ran in a reasonable time, using a reasonable amount of resources on the computers I had access to, but these are not representative of the majority of computers in use today. Using greater processing power increases power consumption, which is undesirable. In a client-server implementation, the power consumption should be lower than the P2P implementation presented as no processing time is spent validating proofs or using the Paillier cryptosystem, which is less efficient than the hybrid cryptosystems used in standard online communication. -\emph{Final word count: 9,166} +\emph{Final word count: 9,190} \bibliography{Dissertation}