jude 2023-04-06 20:42:24 +01:00
parent f645c234ce
commit eccf482192
6 changed files with 50 additions and 33 deletions

View File

@@ -15,8 +15,14 @@ class Cyphertext {
// Compute g^m by binomial theorem.
let gm = (1n + key.n * plainText) % key.n ** 2n;
// Compute g^m r^n from crt.
this.cyphertext = (gm * mod_exp(r, key.n, key.n ** 2n)) % key.n ** 2n;
// Force into range.
while (this.cyphertext < 0n) {
this.cyphertext += key.n ** 2n;
}
this.r = r;
this.pubKey = key;
this.plainText = plainText;
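The "binomial theorem" comment above holds because, with generator g = n + 1, every term of (1 + n)^m beyond the linear one carries a factor of n^2 and vanishes modulo n^2, leaving 1 + nm. A quick numeric check of the identity (a sketch, not part of this file):

const n = 35n, m = 6n;
const nSq = n ** 2n;
let gm = 1n;
for (let i = 0n; i < m; i++) gm = (gm * (n + 1n)) % nSq; // naive (n+1)^m mod n^2
console.log(gm === (1n + n * m) % nSq); // logs true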

View File

@@ -22,6 +22,7 @@ window.addEventListener("beforeunload", () => {
document.addEventListener("DOMContentLoaded", () => {
socket = io();
random = new Random();
barrier = new Barrier();
@@ -33,6 +34,8 @@ document.addEventListener("DOMContentLoaded", () => {
});
socket.on("message", async (packet) => {
window.console.log(`Received size: ${JSON.stringify(packet).length}`);
let data = packet.payload;
if (data.type !== "KEEPALIVE") window.console.log(data);

View File

@@ -19,7 +19,7 @@
% Bibliography to appear in TOC
\usepackage{microtype} % Micro typography features
\usepackage[margin=1in]{geometry}
% Margins
%\usepackage{sfmath} % These two lines set the fonts to
%\renewcommand{\familydefault}{\sfdefault} % sans-serif
@@ -27,7 +27,7 @@
\usepackage{natbib} % Bibliography and citations
\newcommand*{\urlprefix}{Available from: }% in the Harvard-Bath style
\newcommand*{\urldateprefix}{Accessed } %
\bibliographystyle{abbrv} %
\pagestyle{headings}
@@ -49,7 +49,7 @@
\ifnum #1 = 0
This dissertation may be made available for consultation within the University Library and may be photocopied or lent to other libraries for the purposes of consultation.
\else
This dissertation may not be consulted, photocopied or lent to other libraries without the permission of the author for #1
\ifnum #1 = 1
year
\else

View File

@@ -254,6 +254,15 @@ doi={10.1109/SP.2014.36}}
howpublished = {\url{https://github.com/tc39/proposal-bigint}},
}
@misc{msgpack,
author = {msgpack},
title = {MessagePack: Spec},
year = {2021},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/msgpack/msgpack}},
}
@article{RABIN1980128,
title = {Probabilistic algorithm for testing primality},
journal = {Journal of Number Theory},
@@ -283,24 +292,24 @@ doi={10.1109/SP.2014.36}}
@article{Shor_1997,
doi = {10.1137/s0097539795293172},
url = {https://doi.org/10.1137%2Fs0097539795293172},
year = 1997,
month = {oct},
publisher = {Society for Industrial {\&} Applied Mathematics ({SIAM})},
volume = {26},
number = {5},
pages = {1484--1509},
author = {Peter W. Shor},
title = {Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer},
journal = {{SIAM} Journal on Computing}
}

Binary file not shown.

View File

@@ -59,11 +59,11 @@ Without patching the executables, there is no way for a user to run their own se
In peer-to-peer (P2P) networks, traffic may be routed directly to other peers, or servers may be operated by third parties (sometimes called ``federated networks''). This form of communication is still popular in certain games and services: for example, BitTorrent is primarily a P2P service, and titles from the Counter-Strike series are federated, with a wide selection of third-party hosts.
The main advantage of peer-to-peer networks over centralised networks is longevity. Games such as Unreal Tournament 99 (which is federated) still have playable servers, as the servers are community-run, and so as long as people still wish to play the game, they will remain online (despite the original developers no longer making any profit from the title) \cite{eatsleeput.com_2022}.
However, security can often be worse in fully peer-to-peer networks than in fully centralised networks. Peers may send malicious communications, or behave in ways that violate the general rules of the service. As there is no trusted server, there is no easy way to validate communications to prevent peers from cheating.
Some peer-to-peer services try to address issues with security. In file-sharing protocols such as BitTorrent, a tracker supplies hashes of the file pieces to validate the file being downloaded \cite{cohen_2017}. However, the downside of this approach is that a trusted party (in this case the tracker) is still required. A malicious tracker could supply bad hashes, or an outdated tracker may expose the peer to security vulnerabilities.
\subsection{Untrusted setups}
@@ -183,7 +183,7 @@ Adjacency proofs are necessary to ensure that players move units fairly.
Zerocash is a ledger system that uses zero-knowledge proofs to ensure consistency and prevent cheating. Ledgers are the main existing use case of zero-knowledge proofs, and there are some limited similarities between ledgers and Risk in how they wish to obscure values of tokens within the system.
\emph{Publicly-verifiable preprocessing zero-knowledge succinct non-interactive arguments of knowledge} (zk-SNARKs) are the building blocks of Zerocash \cite{6956581}, and its successor Zcash. A zk-SNARK consists of three algorithms: \texttt{KeyGen}, \texttt{Prove}, \texttt{Verify}.
These are utilised to construct and verify transactions called \texttt{POUR}s. A \texttt{POUR} takes a certain ``coin'' as input, and splits this coin into multiple outputs whose values are non-negative and sum to the same value as the input. The output coins may also be associated with different wallet addresses.
@@ -205,14 +205,14 @@ So, using some such scheme to obscure edge weights should enable verification of
In the presented algorithms, interaction occurs frequently, leading to a large number of communications. This slows the system considerably, and network latency makes proofs take longer to complete.
An alternative general protocol is the $\Sigma$-protocol \cite{groth2004honest}. In the $\Sigma$-protocol, three communications occur: \begin{itemize}
\item The prover sends the conjecture.
\item The verifier sends a random string.
\item The prover sends some proofs generated using the random string.
\end{itemize}
This reduces the number of communications to a constant, even for varying numbers of challenges.
The Fiat-Shamir heuristic \cite{fiatshamir} provides another method to reduce communication by constructing non-interactive zero-knowledge proofs using a random oracle. For ledgers, non-interactive zero-knowledge proofs are necessary, as the ledger must be resilient to a user going offline. However, in our case, users should be expected to stay online for an entire session of Risk, and each session is self-contained. So this full transformation is not necessary.
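As a rough sketch of the heuristic's shape (assuming SHA-256 stands in for the random oracle; the function below is illustrative, not the implementation's API), the verifier's random string is replaced by a hash of the prover's first message:
\begin{verbatim}
const { createHash } = require("node:crypto");

// Derive the challenge from the prover's commitment rather than
// from verifier-supplied randomness (oracle modelled by SHA-256).
function fiatShamirChallenge(commitment) {
  const digest = createHash("sha256").update(commitment).digest("hex");
  return BigInt("0x" + digest); // 256-bit challenge as a BigInt
}
\end{verbatim}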
\subsubsection{Set membership proofs}
@@ -230,7 +230,7 @@ Defined by \cite{10.1007/3-540-48285-7_24}, accumulators form a subset of one-wa
\subsubsection{Merkle trees}
Merkle trees \cite{merkle} provide an alternative way of proving set membership that is more space-efficient than accumulators, and doesn't require special hashing functions (any one-way function will work). A Merkle tree stores the hashes of some data in the leaf nodes, and each node above stores the hash of the two nodes below it. The commitment is then the hash of the topmost node.
With this scheme, the data stored in the leaf nodes is totally obscured. However, the constructor of the tree can demonstrate to another user the presence of some data in the tree by revealing the hashes of a subset of the other nodes in the tree. They can also reveal the tree's structure without revealing any contents by revealing all hashes constituting the tree.
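A minimal sketch of the construction (the hash choice and odd-node padding here are assumptions, not part of the cited design):
\begin{verbatim}
const { createHash } = require("node:crypto");
const h = (x) => createHash("sha256").update(x).digest("hex");

// Leaves hold hashes of the data; each parent holds the hash of
// its two children; the final remaining hash is the commitment.
function merkleRoot(items) {
  let level = items.map(h);
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      next.push(h(level[i] + (level[i + 1] ?? level[i]))); // pad odd tail
    }
    level = next;
  }
  return level[0];
}
\end{verbatim}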
@@ -244,7 +244,7 @@ To overcome this issue we want to devise some zero-knowledge system for proving
\cite{10.1007/978-3-540-89255-7_15} demonstrates how blind signatures can be used to construct zero-knowledge set membership proofs for some element $\sigma$ in a public set $\Phi$, using pairing-based cryptography.
Blind signatures can also be performed with RSA \cite{bellare2003one}. In RSA-based blind signatures, the signing party computes primes $p_A, q_A$ and exponents $d, e$ such that $(m^d)^e \equiv m \mod p_Aq_A$. The 2-tuple $(p_Aq_A, e)$ is the public key, and is released publicly. The other party computes a random value $R$, and computes and publishes $B = m \cdot R^e \mod p_Aq_A$ for some message $m$. The signing party then replies with $B^d = (m \cdot R^e)^d \equiv m^d \cdot R \mod p_Aq_A$, so that the other party can then extract $m^d$, as $R$ is known only to them. Due to the difficulty of factoring $p_Aq_A$, determining the signing key $d$ from the public key is not computationally feasible. Similarly, it is not feasible for the signer to determine $m$, as $R$ is not known to them.
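The exchange can be sketched with \texttt{BigInt} arithmetic as follows, where \texttt{modExp} and the precomputed inverse \texttt{rInv} of $R$ modulo $p_Aq_A$ are assumed helpers:
\begin{verbatim}
// Requester: hide the message m behind the blinding factor R.
function blind(m, R, e, n) {
  return (m * modExp(R, e, n)) % n; // B = m * R^e mod n
}

// The signer returns B^d mod n, which equals m^d * R mod n.
// Requester: divide R back out to recover the bare signature m^d.
function unblind(signedB, rInv, n) {
  return (signedB * rInv) % n;
}
\end{verbatim}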
RSA blinding can incur a security risk, as by using the same keys to sign and encrypt, a player can be tricked into revealing their private key through a chosen-plaintext attack.
@@ -275,9 +275,9 @@ RSA keys are accepted by peers on a first-seen basis.
Paillier requires the calculation of two large primes for the generation of public and private key pairs. ECMAScript typically stores integers as floating-point numbers, giving exact integer precision only up to $2^{53}$. This is clearly inappropriate for the generation of sufficiently large primes.
In 2020,
ECMAScript introduced \texttt{BigInt} \cite{tc39}, which are, as described in the spec, ``arbitrary precision integers''. Whilst this does not hold true in common ECMAScript implementations (such as Chrome's V8), these ``big integers'' still provide sufficient precision for the Paillier cryptosystem, given that some optimisations and specialisations are made with regard to the Paillier algorithm, and in particular the modular exponentiation operation.
It must be noted that \texttt{BigInt} is inappropriate for cryptography in practice, due to the possibility of timing attacks as operations are not necessarily constant time \cite{tc39}. In particular, modular exponentiation is non-constant time, and operates frequently on secret data. A savvy attacker may be able to use this to leak information about an adversary's private key.
\subsection{Modular exponentiation}
@@ -289,7 +289,7 @@ The number of operations is dependent primarily on the size of the exponent. For
I chose to use primes of length 2048 bits. This is a typical prime size for public-key cryptography, as this generates a modulus $n = pq$ of length 4096 bits.
Generating these primes is a basic application of the Rabin-Miller primality test \cite{RABIN1980128}. This produces probable primes; however, upon completing sufficiently many rounds of verification, the likelihood of these numbers actually not being prime is dwarfed by the likelihood of hardware failure.
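Both operations reduce to a few lines of \texttt{BigInt} arithmetic. The following is a sketch (a generic square-and-multiply and a single Rabin-Miller round; the actual implementation may differ):
\begin{verbatim}
// Square-and-multiply: one squaring per exponent bit plus one
// multiplication per set bit, so cost scales with exponent size.
function modExp(base, exp, mod) {
  let result = 1n;
  base %= mod;
  while (exp > 0n) {
    if (exp & 1n) result = (result * base) % mod;
    base = (base * base) % mod;
    exp >>= 1n;
  }
  return result;
}

// One Rabin-Miller round for odd n, with a random base a in [2, n-2]:
// write n - 1 = 2^s * d (d odd), then search for a square root of 1.
function rabinMillerRound(n, a) {
  let d = n - 1n, s = 0n;
  while ((d & 1n) === 0n) { d >>= 1n; s += 1n; }
  let x = modExp(a, d, n);
  if (x === 1n || x === n - 1n) return true; // probably prime
  for (let i = 1n; i < s; i += 1n) {
    x = (x * x) % n;
    if (x === n - 1n) return true;
  }
  return false; // a witnesses that n is composite
}
\end{verbatim}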
\subsection{Public key}
@@ -430,7 +430,7 @@ Instead of proving a value is within a range, the prover will demonstrate that a
\end{enumerate}
\end{protocol}
This protocol has the following properties, given that the proof of zero from before also holds the same properties \cite{damgard2003}.
\begin{itemize}
\item \textbf{Complete.} The verifier will clearly always accept $S$ given that $S$ is valid.
\item \textbf{Sound.} A cheating prover will trick a verifier with probability $2^{-t}$. So select a sufficiently high $t$.
@@ -522,13 +522,11 @@ Random values are used in two places. \begin{itemize}
\item Rolling dice.
\end{itemize}
\section{Review}
\subsection{Random oracles}
Various parts of the implementation use the random oracle model: in particular, the zero-knowledge proof sections. The random oracle model is used only in the construction of truly random values that will not reveal information about the prover's state. In practice, a cryptographically secure pseudo-random number generator will suffice for this application, as CSPRNGs typically incorporate environmental data to ensure outputs are unpredictable \cite{random(4)}.
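For instance, a uniform \texttt{BigInt} below a bound can be drawn from a CSPRNG by rejection sampling (sketched here with Node's \texttt{crypto.randomBytes}; a browser would use \texttt{crypto.getRandomValues}):
\begin{verbatim}
const { randomBytes } = require("node:crypto");

// Rejection sampling avoids the modulo bias of `random % n`.
function randomBigIntBelow(n) {
  const byteLen = Math.ceil(n.toString(16).length / 2);
  for (;;) {
    const candidate = BigInt("0x" + randomBytes(byteLen).toString("hex"));
    if (candidate < n) return candidate;
  }
}
\end{verbatim}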
\subsection{Efficiency}
@@ -538,13 +536,13 @@ Paillier ciphertexts are constant size, each $\sim$1.0kB in size (as they are ta
The proof of zero uses two Paillier ciphertexts, a challenge of size 2048 bits, and a proof statement of size 4096 bits. In total, this is $\sim$2.8kB. These are constant size, and since they run in a single round, take constant time.
On the other hand, \hyperref[protocol1]{Protocol~\ref*{protocol1}} requires multiple rounds. Assume that we use 42 rounds: this provides an acceptable level of soundness, with a cheat probability of $\left(\frac{1}{2}\right)^{42} \approx 2.3 \times 10^{-13}$. Additionally, assume that there are 10 regions to verify. Each round then requires ten Paillier ciphertexts alongside ten proofs of zero (at $\sim$1.0kB and $\sim$2.8kB each respectively). This results in a proof size of $\sim$1.7MB. Whilst this is still within current memory limitations, the network cost is extreme; and this value may exceed what can be reasonably operated on within a processor's cache.
This could be overcome by reducing the number of rounds, which comes at the cost of increasing the probability of cheating. In a protocol designed to facilitate only a single game session, this may be acceptable to the parties involved. For example, reducing the number of rounds to 19 will increase the chance of cheating to $\left(\frac{1}{2}\right)^{19} \approx 1.9 \times 10^{-6}$, but the size would reduce considerably to $\sim$770kB.
Another potential solution is to change the second challenge's structure to only verify a single random ciphertext each turn. This would approximately halve the amount of data required at the expense of soundness. However, the advantage over lowering the number of rounds is that the change in soundness depends on the number of items in the set being verified. With a tactful selection, a high probability of soundness can still be maintained.
This is all in an ideal situation without compression: in the implementation presented, the serialisation of a ciphertext is larger than this, since it serialises to a string of the hexadecimal representation. Compression shouldn't be expected to make a considerable difference, as the ciphertexts should appear approximately random.
To see this, we consider the ways in which a prover may try to ``cheat'' a proof. The prover wishes to submit a set $S$ which contains negative values, so the sum of the values is still 1 but multiple non-zero values exist. %todo
The size of the proof of zero communication is, in total, $3290 + 1744 + 2243$ characters, i.e.\ $\sim$7.3kB. This is about 2--3 times larger than the ideal size. A solution to this is to use a more compact format, for example msgpack \cite{msgpack} (which also has native support for binary literals).
\subsubsection{Time complexity}
@@ -574,8 +572,7 @@ This is not without its downsides: I found that the complexity of P2P networking
\subsection{Decentralised social media}
The schemes presented here and in \cite{damgard2003} could be applied to the concept of a decentralised social media platform. Such a platform may use ZKPs as a way to allow for ``private'' profiles: the content of a profile may stay encrypted, but ZKPs could be used as a way to allow certain users to view private content in a manner that allows for repudiation, and disallows one user from sharing private content to unauthorised users.
The obvious issue is P2P data storage. Users could host their own platforms, but this tends to lead to low adoption, due to the complexity involved for non-technical users. IPFS is a P2P data storage protocol that could be considered. This has the advantage that users with large amounts of data can store it themselves, whilst other users can mirror data to protect against outages. The amount of storage available grows as more users join the network.
@@ -589,7 +586,7 @@ Another consideration in this domain is the use of fully-homomorphic encryption
Finally, I present the limitations that I encountered.
\subsection{JavaScript}
To summarise, JavaScript was the incorrect choice of language for this project. Whilst the event-based methodology was useful, I believe overall that JavaScript hampered development.
@@ -601,6 +598,8 @@ JavaScript's type system makes debugging difficult. It is somewhat obvious that
Peer-to-peer programming requires a lot more care than client-server programming. This makes development far slower and far more bug-prone. As a simple example, consider the action of taking a turn in Risk. In the peer-to-peer implementation presented, each separate peer must keep track of how far into a turn a player is, check whether a certain action would end their turn (or whether it is invalid), contribute to verifying proofs, and contribute to generating randomness for dice rolls. In a client-server implementation, the server would be able to handle a turn by itself, and could then propagate the results to the other clients in a single predictable request.
The use of big integers leads to peculiar issues relating to signedness. This is in some ways a JavaScript issue, but would also be true in other languages. Taking modulo $n$ of a negative number tends to return a negative number, rather than a number squashed into the range $[0, n)$. This leads to inconsistencies when calculating the GCD or finding Bézout coefficients. In particular, this became an issue when trying to validate proofs of zero, as the GCD returned $-1$ rather than $1$ in some cases. Resolving this simply required changing the update and encrypt functions to add the modulus until the representation of the ciphertext was signed correctly. Whilst the fix was simple, having to make it at all was frustrating, and using a non-numerical type (such as a byte stream) may avoid this class of issue in general.
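The fix can be captured in a small helper, sketched below (the commit applies the same loop inline rather than through a named function):
\begin{verbatim}
// Normalise a BigInt into [0, n): JavaScript's % keeps the sign of
// the dividend, so add n until the result is non-negative.
function nonNegativeMod(x, n) {
  let r = x % n;
  while (r < 0n) r += n;
  return r;
}
\end{verbatim}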
\subsection{Resources}
The peer-to-peer implementation requires more processing power and more bandwidth on each peer than a client-server implementation would. This is its main limitation. The program ran in a reasonable time, using a reasonable amount of resources, on the computers I had access to, but these are not representative of typical hardware. Using greater processing power increases power consumption, which is undesirable. In a client-server implementation, even with an extra computer, I predict that the power consumption should be lower than in the peer-to-peer implementation presented. %todo justify