Fast prime cluster search - or building a fast Riecoin miner (part 1)

Introduction

In the wake of Bitcoin's success, many "alternative" cryptocurrencies, or "alt-coins" have emerged that change various aspects of the coin in an attempt to improve upon Bitcoin.  (We'll ignore the many that have emerged with the goal of simply duplicating it to put money in their creators' pockets).

I've been particularly interested in variants that change something with the "proof-of-work" function.  This sits at the heart of Bitcoin's decentralization:  If a node in the Bitcoin network wishes to be the one permitted to sign a block of transactions, and receive payment for doing so, it must prove that it did a certain amount of computational effort.  This mechanism spreads the authority for the transaction history over all of the participants in the protocol in rough proportion to their computational horsepower.  The goal of this mechanism is to ensure that no one entity can control the currency without spending an enormous amount of money to do so.

Bitcoin itself uses a very simple proof-of-work:  iterations of the SHA256 hash function.  Given a target t, find a nonce n for which  SHA256(nonce, transaction-data) < t.  As far as we know, the only way to find such a nonce is by brute force.

While there are many variants that use different hash functions, the ones that are more interesting have different objectives for their proof of work.  Two that seem particularly popular are:

  • ASIC or GPU-resistance:  Develop a proof-of-work function that targets the strong points of general purpose CPUs to prevent GPUs or ASICs from having a huge advantage.
  • Extract some general social good from the proof-of-work.
The rationale for the first is a philosophical question I don't want to get in to - I'm personally skeptical that there's too much value in it.  From an energy perspective, it's more efficient to have a bunch of ASICs spinning than it is to have loads of general-purpose CPUs spinning, and non-experts underestimate the degree to which even modestly complex computations can be accelerated on GPU or ASIC if the price is right.  I've blogged before about making GPU-resistant coins mineable on GPUs, so I won't belabor the point here.

The second, however, is fascinating:  If (and that's a huge if that's again not my point) a Bitcoin-like payment mechanism takes off, and it still requires computational proof-of-work, can we engineer a PoW function that does something useful as a byproduct?

Nobody has a good answer yet, but at least two coins are trying a little bit:  Primecoin and Riecoin.  Both require finding "interesting" prime numbers as part of the proof-of-work, which I'll get into in a moment.  The social value of this is relatively small, but at least it creates a dataset with vaguely plausible interest when the numbers get big.  But they also have the benefit that, e.g., effort put into the mining software for them might eventually feed back into software such as gmp that is used more widely in computation and cryptography.

I've spent the last month playing with Riecoin and wanted to share some of the fun.  Let's jump in.

The Riecoin Proof-of-Work

Riecoin is named after Bernhard Riemann, but its proof of work is actually related to the Hardy-Littlewood k-tuple conjecture.  That's a mouthful, though, and Riemann deserves some public props too, so I think it's a nice name.

The proof-of-work is, given a target number t and a search limit l:
    Find t <= n <= t+l
    Such that six values are all prime numbers:  n, n+4, n+6, n+10, n+12, and n+16

Such a dense cluster of prime numbers all occurring "in a row" (except multiples of two and three) is called a prime sextuplet, or a prime 6-tuple.

The relationship to the Hardy-Littlewood conjecture is that it's unknown if there are an infinite number of such clusters or if we can accurately estimate their density.  In fact, we don't even know for certain that there are infinitely many pairs of the form n, n+2 both prime -- this is the famous twin prime conjecture, on which there has been exciting recent progress.  But there probably are. :)

So, building a fast Riecoin solver is pretty straightforward:  There's some very standard bitcoin-stuff involved in computing the target t, but once that's done, it's all primes all the time.

In the rest of this post, we'll start out by first enumerating the k-tuples from 0 in order to understand some of the building blocks.  Then we'll look at how we change it to start searching from a very large target number.  Break out your Python interpreter and follow along at home:  you'll be able to run all of these examples yourself, and by the end, have built a Riecoin miner fast enough to solve the first Riecoin block.

Finding Prime 6-Tuples

Here's a concrete example of a prime sextuplet:  7, 11, 13, 17, 19, 23.  Here, n is 7, and n, n+4, n+6, n+10, n+12, and n+16 are all prime.  (Here's a list of primes if you want to check.)

Grab the source from Github and follow along in the explain directory.

Method 1:  Brute force using a primality test

If you have a function is_prime_p(n) that will tell you whether or not a number is prime, you can easily create a brute-force 6-tuple finder.   This is slow, but it's a good start.  We'll use the Fermat test for simplicity, but you don't need to understand it - just pretend for now that it tells us whether a number is prime or not.  It can be wrong, but don't sweat it.  

First, let's define our test function to see if something is a valid Riecoin PoW (see fermat_test.py):

def fermat_is_valid_pow(n):
    for offset in [0, 4, 6, 10, 12, 16]:
        if not fermat_prime_p(n+offset):
            return False
    return True

And now we'll run through all odd candidates from 3 to wherever we want to stop, N:

count = 0
do_output = True

for candidate in range(17, N, 2):
    if fermat_test.fermat_is_valid_pow(candidate):
        count += 1
        if do_output:
            print candidate

print "Total of ", count, " prime sextuplets"

Acknowledging that this is a silly way of doing it, how well does it work?  Let's see:

time ./brute_force_ric.py 10000000
Total of  17  prime sextuplets

real 0m7.366s

7.3 seconds to search the first ten million numbers.  That doesn't seem too bad - until you think that finding a Riecoin proof-of-work at the most trivial difficulties requires searching through 4 billion or so numbers, plus or minus a few, and that the Fermat test is much slower the larger the numbers involved get.

Method #2:  Using a Basic Sieve of Eratosthenes

Anyone who's taken an introductory computer science course knows that there's a better way to find prime numbers:  using a sieve.  The most common, of course, is the sieve of Eratosthenes, which has been described more and better than I will do here.  The core idea is to eliminate all multiples of prime factors:  Create a vector of N "True" values, and then mark all of the even ones false.  Then mark all of the multiples of 3 false.  Then the multiples of 5.  And so on.

Here's a basic sieve implemented in python (basic_sieve_e.py), and we've defined a similar is_valid_pow function in it that just checks the array at each spot.  We'll use the sieve to generate all of the prime numbers up to N and then to test them for primality.  This is in basic_ric.py:

for candidate in basic_sieve_e.genprime(N):
    # Skip the first spots until our sieve will have eliminated
    # invalid candiates.  Only needed because we're starting small.
    if (candidate > 17 and candidate < N-17):
        if (basic_sieve_e.sieve_is_valid_pow(candidate)):
            print candidate

How does this do?

time ./basic_ric.py 10000000
Total of  17  prime sextuplets

real 0m3.278s

Hey, twice as fast.  And a casual examination of the sieve we wrote suggests that it's horribly slow and inefficient.  If we apply some very basic optimizations (faster_sieve_e.py) to speed it up (storing only the odd-numbered entries and not sieving out multiples of two or three), we get a bit of a speedup:

Faster Sieve:

time basic_ric2.py 
Total of  17  prime sextuplets

real 0m2.454s

This basic idea represents the state of Riecoin mining in the initial release.  We could speed it up by at least 10x through better engineering of the sieve, but it illustrates the point.  However, as demonstrated by the huge increase in difficulty when ypool's jh00 released his miner, one can do better.  A lot better:


How did jh00's miner accomplish this?  The answer boils down to another well-known technique, often called wheel factorization, but applied specifically for finding k-tuples.

Wheel Factorization for K-Tuples

Let's start with an observation.  Imagine you want to sieve out all even numbers.  You write down a sieve filled with all 1s (true):
1111111111111111

and then you strike out all of the even ones:
0101010101010101

It's a very simple observation, but note that the pattern of zeros and ones repeats with a period of 2.  Obviously - we're striking out every other one.  What if we sieve out both 2 and 3?  Let's start at 6:

     6  7  8  9  10 11 12 13 14 15 16 17 18 19 20 21 22 23 
  2: .     .      .     .     .     .     .     .     .
  3: .        .         .        .        .        .
sum: .     .  .   .     .     .  .  .     .     .  .  .

The pattern "010001" repeats.  It has a period of 2*3=6, which again, is intuitive:  the "2" pattern repeats every two digits, and the "3" pattern repeats every 3, so the first point at which they both repeat is 6.

You can extend this idea - the intersection of 2, 3, and 5 repeats every 2*3*5=30 digits.

This means you don't really need to sieve these small numbers over and over, because the pattern repeats itself.  You can just skip over them once you've established the basic pattern.

The term for the product of primes up to a number is called a Primorial.  2*3*5=30 is the third primorial, also called Pn5.  We can use this observation with any primorial we want.

Now let's extend this idea to k-tuples.  Because we know that the sieve pattern repeats every Pn digits, it must also be the case that the pattern of possible k-tuples relative to the primorial repeats every Pn digits.  By relative to, I simply mean "possible k-tuples that are eliminated by the prime factors of the primorial".  In other words, the repeating pattern of the 2,3,5,7, ..., Pn factors indicates that one or more of the 6 offsets in the k-tuple are composite.

This observation doesn't mean that a valid position will be a prime tuple:  One of the positions might have a factor > Pn.  But it gives us a way to rapidly eliminate a huge number of candidate locations.  Let's see this in Python using the 4th primorial, or Pn7.  We'll sieve using the first four primes, but this time, we'll eliminate from consideration any n where any of the 6 positions are composite (ric_siever.py):

primorial = 2*3*5*7 # This equals 210

isCandidate = [True] * primorial

# Kill any location that is divisible by the primes in our primorial
for i in [2, 3, 5, 7]:
    ni = i
    while ni < (primorial+16):
        for offset in [0, 4, 6, 10, 12, 16]:
            if (ni-offset > 0 and ni-offset < primorial):
                isCandidate[ni-offset] = False
        ni += i

for i in range(3, primorial):
    if isCandidate[i]:
        print i

If we run this, we see something interesting:  Only the number 97.  This means that the only possible solutions to the Riecoin PoW occur at 210*n + 97 for some numbers n.  In other words, instead of considering all odd numbers, or all prime numbers, we can consider only one in every 210 digits as a candidate location.  What if we apply this to our brute force searcher?  We'll just change it from evaluating every odd number to every 210*n+97 (brute_force_ric210.py)

max_n = N/210

for candidate in range(0, max_n):
    if fermat_test.fermat_is_valid_pow(candidate*210 + 97):
        count += 1
        if do_output:
            print candidate

time ./brute_force_ric210.py 10000000
Total of  17  prime sextuplets

real 0m0.107s

Now we're starting to get a little more serious - about a 10x or more speed boost relative to our naïve brute force version, and we haven't combined our tricks yet.

We can then extend this idea recursively to larger primorials:  We know that only 210*n+97 produces valid candidates, so we can then examine those locations to see if they're divisible by the next element of the primorial.  For 2*3*5*7*11  (2310), for example, many of the candidates in 210*n+97 are divisible by 11.  So going to this larger primorial gives us these candidates mod 2310:

97  937  1147  1357  2197

Instead of the 11 possibilities, there are now only 5.  Doing this recursively up to the limit of what we're trying to find yields a further speedup (ric_gen.py)

time ./ric_gen.py 10000000
Total of  17  prime sextuplets

real 0m0.045s

But a lot of that is just the time to start Python.  Using 100m instead:

time ./ric_gen.py 100000000
Total of  81  prime sextuplets

real 0m0.194s

Has accomplished a more than 100-fold speedup over our initial version.

This basic idea - that matching k-tuples can only occur at particular locations - forms the basis for the second and third generation RIC miners.  I'll explain the rest of the speed-up in the next blog post.

Comments

Popular posts from this blog

Reflecting on CS Graduate Admissions

Chili Crisp Showdown: Laoganma and Flybyjing

Two examples from the computer science review and publication process