dabs of grue banner




Numbers of k-almost
primes gawk program

More gawk programs

1. Background and significance

This article* stems from my naive interest in seeking connections between the structure of arithmetic and the structure of reality, as intimated in Remarks on the physical nature of mathematics.... Needless to say, I am not a mathematician and would not expect a mathematician to indulge in this sort of fantasy. But I hope, at least, that much of the following will be understood by ordinary folk, like me, with a less than remarkable quota of grey matter, but with a basic high-school education in maths.

*This article is out of date. A fair bit of work has recently been done on almost primes and formulae for estimating their distribution now appear to be available. I will try to assess the implications of these studies before long.

Prime number theory has for long been considered an important branch of mathematics. However, until very recently there had been little investigation of the numbers known as almost primes. A k-almost prime is a number with k (prime) factors. Thus the series of 2-almost primes (also called semiprimes) begins 4, 6, 9, 10, 14, 15, 21, 22 … , these numbers each comprising two factors as follows:

2 x 2 = 4
2 x 3 = 6
3 x 3 = 9
2 x 5 = 10
2 x 7 = 14
3 x 5 = 15
3 x 7 = 21
2 x 11 = 22

Similarly the series of 3-almost primes begins 8, 12, 18, 20, 27, 28, 30, 42, 44, 45, 50, 52, 63 ..... (2 x 2 x 2,
2 x 2 x 3 etc). The series of 4-almost primes begins 16, 24, 36, 40, 54, 56, 60, 81, 84, 88, 90, 100 .....
(2 x 2 x 2 x 2,  2 x 2 x 2 x 3 etc). Large semiprimes are very difficult to factorise and for this reason are sometimes used in encryption programs. Therefore most of the interest in this subject to date has been on developing algorithms to factorise large semiprimes.

It is inviting to think of almost primes as k-dimensional numbers. In the domain of integers >1, a k-dimensional number is one that has exactly k factors (the same or different). Prime numbers themselves can thus be regarded as 1-dimensional numbers (1-almost primes).

A loose definition. The way that the k-almost primes (for a given value of k) are scattered amongst the series of integers (1, 2, 3, 4, 5... ) is often called the distribution of that kind of number. In this article distribution will also have a more specific meaning: for a given kind of number, it means a formula, graph or table that describes the total number of those numbers that occur up to any given integer value. For example, the first few prime numbers are 2, 3, 5, 7, 11, 13, 17, 19, and there are 8 of them in the first 20 integers, but if you look at line 1 of Table 1 (below), you will find the number of primes up to 100, 1000, 10000 ...107. So this is a table describing the distribution of primes up to 107. (Subsequent lines show the distributions of 2, 3 and 4-almost primes.)

There are two areas of particular interest relating to k-dimensional numbers, where k is very small:

(i) The distribution of primes. It is well known that the exact distribution of primes cannot be captured by any formula. Although the prime number theorem (and various modifications of it) provides a good estimate of the number of primes up to any given large number, the actual distribution of primes is quite erratic. Because there are very many small-k numbers with a common small factor (e.g. 2, 3), the distribution of these numbers strongly influences the small-scale distribution of the primes. The "bumps" in the distribution of the primes are largely accounted for by the presence of intervening small-k, small-factor numbers. In other words, the distribution of primes may be influenced by the distribution of numbers that are not primes in a way that might be illuminated by an analysis of k-dimensional domains.

(ii) Relationship between universes of different dimensionality. For any given k, the domain of
k-dimensional numbers is unique. The numbers thus correspond to geometrical points in unique
k-dimensional universes, which I call “real worlds”, by analogy with the 3-dimensional local real world. (This supposition is, of course, totally at odds with modern set theory, which asserts that any n-dimensional Euclidean space complies with the set of real numbers and not with the set of natural numbers or any of its subsets.) In a real world of given dimensionality nothing exists having any other dimensionality. For example, in real space, if an object were to have width and breadth but no height, it wouldn't exist at all. Each of the k axes of a k-dimensional real universe is scaled only with the sequence of prime numbers (see remarks below). (Since every integer can be represented as a factorisation of primes, the class of k-dimensional domains is the class of integers >1. According to conventional maths, there is an infinite number of such classes).

This area is intriguing because of its possible, but highly improbable, relevance to physics and the geometry of the actual world. (Thus 3- and 4-dimensional numbers are of greatest immediate interest.) However, recent advances in information-theory and speculation about the nature of reality tend to put a cloud over these thoughts. For one thing, it seems that the contents of a 3-dimensional universe can be represented in only two dimensions (and in general an x-dimensional universe can be represented in x-1 dimensions); and for another, some of the gurus of cosmology - exponents of the so-called "holographic" theory of reality - are now saying that our universe is in fact a projection into three dimensions from a 2-dimensional boundary, which is where all the information in the universe really exists. For me, this is not a big deal, because dimensions are nothing but mathematical constructs anyway (see various other articles).

2. Algorithms for finding almost primes

It is quite easy to construct computer algorithms for finding k-almost primes when k is very small, or for counting the number of k-almost primes that are smaller than a few powers of 10 (depending on the size of k). After that, computer power and time may be limiting. Examples of both kinds of algorithm can be found here and here. I have programmed these as lucidly as possible in gawk, so they will run in unix, linux or a unix emulation program for Windows such as Gnu; or they can easily be transcribed into another language. Some results from the counting program are shown below:

Table 1: Number of k-almost primes for k ≤ 4 and n ≤ 107

k      n =  100   1000  10000   105        106            107

1              25    168     1229     9592      78498       664579
2              34    299     2625     23378    210035    1904324
3              22    247     2569     25556    250853    2444359
4              12    149     1712     18744    198062    2050696
>4             7     137     1865     22730    262552    2936042

The problem is that the distributions of almost primes tend to have large scale fluctuations at the lower end of the number scale. Strong patterns don't emerge until you look at a very large number of numbers! Also, the larger the k number, the longer it takes for the distribution to settle down to a definite trend. In fact with semiprimes we need to look at a range of integers much larger than the 0-107 in Table 1 - probably more like around 0-1020. And for 3 and 4-almost primes we need to go much higher again. So we immediately run into computing problems. A partial answer is to try to compute the densities of almost primes in small regions of integers, and to calculate estimated total numbers from those data. This is practical for semiprimes, of limited use for 3-almost primes and useless for 4-almost primes (see Table 2).

3. Existing (?) formulae for approximating almost prime distributions

What we really want are some formulae analogous to those for primes based on the prime number theorem. The internet literature has little to offer. Some approximations for 2 and 3-almost primes, found in Wikipedia and attributed to E. Landau, are clearly wrong, but with a rearrangement of the terms and other minor adjustments we come up with the following (along with a version of the prime number theorem for comparison):

N(k=1) ~ n / (log n -1)

N(k=2) ~ n(log log n) / (log n -1)

N(k=3) ~ n(log log n)2 / 2(log n -1)

where log n is the natural logarithm of n.

Table 2 shows some results from these formulae (see here for gawk programs) alongside my best estimates using algorithm sampling methods. For n>102k the match is remarkably good.

Table 2: Prime and almost prime distribution by formula and algorithm (italics)

n Primes Semiprimes 3-almost primes
104 1218
106 78030
2.049 x 105
2.100 x 105
2.690 x 105
2.509 x 105
108 5.74 x 106
5.76 x 106
1.672 x 107
1.69 x 107
2.436 x 107
2.37 x 107
1012 3.76 x 1010
3.76 x 1010
1.246 x 1011
1.28 x 10
2.068 x 1011
2.06 x 1011
1016 2.790 x 1014
2.790 x 1014
1.006 x 1015
1.06 x 1015
1.815 x 1015
1.82 x 1015
1020 2.220 x 1018
2.220 x 1018
8.50 x 1018
8.8 x 1018
1.628 x 1019
1.67 x 1019

The problem is that I cannot validate these formulae, or improve upon them except in a totally arbitrary manner. [For example, try this formula for semiprimes: N(k=2) = n*log(log(n))/(log(n)-log(log(n^0.25))) ] Therefore it seems necessary to study the nature of the distributions in more depth, starting with semiprimes. The ultimate aim is to find a general method of building approximate formulae for the distributions of almost primes, analogous to (one of) the approximations already known for prime numbers.

4. Estimation of distribution of semiprimes

The problem

As we have seen, the distribution of semiprimes cannot be satisfactorily determined by computing individual numbers, as it is necessary to compute a large number of numbers > 1016 (approx), which takes too long on most computers. This much can be learnt from preliminary computations and by looking at the sorts of formulae that might be involved, both approaches suggesting that k-dimensional numbers other than primes approach their asymptotes extremely slowly.

We need to try to arrive at an approximate formula that relates the distribution of semiprimes to the distribution of primes, in so far as the latter is known. We are concerned with the density of semiprimes in the integer range 4 to n, where n is extremely large.

First approach to a solution

The following approach seems to me to be the most obvious way of tackling the problem using a computer.

Let F1(n) be the number of primes ≤n and let F2(n) be the number of semiprimes ≤n, where n is very large.

The number of semiprimes up to n having the lowest prime, 2, as one of the factors is approximately equal to the number of primes up to n/2, F1(n/2). Similarly the number of semiprimes up to n with 3 as one of the factors is approximately equal to the number of primes up to n/3, F1(n/3). Excluding the case (2*3) already accounted for above, the number of semiprimes using primes from 3 up to n/3 is F1(n/3)-1. Similarly the number of semiprimes using primes from 5 up to n/5 is F1(n/5)-2. In general, the number of semiprimes using primes from p to n/p with p as one of the factors is F1(n/p)-F1(p)+1. Thus F2(n) is the sum of this series up to √n, ie the last term is approximately F1(n/√n)-F1(√n)+1 =1. (The “+1” in these terms is a doubtful figure that needs to be checked by computation; intuitively the average addendum is less than 1 and may be closer to 0). This can be expressed as follows:

F2(n) = Σ[p=2 to p=√n] F1(n/p) – (1+2+3+4+…+ F1(√n) + z
     where p ranges over primes, n is arbitrarily constant and z ≤F1(√n)

Taking these terms in reverse order:

The standard approximation for the number of primes ≤n, F1(n), is n/(log n), where log n is the natural logarithm of n. A better approximation, which we will use (although for very large n it makes little difference), is:

F1(n) = n/((log n) -1).

Therefore let us assume z = √n/(log √n -1)

Next, the middle (negative) term is equivalent to an expression of the form (a2+a)/2, ie:

1+2+3+4+…+F1(√n) = 1+2+3+4+…+√n/(log √n -1)
= { [√n/(log √n -1)]2 + √n/(log √n -1) } /2

Let s = √n/(log √n -1). Combining the last 2 terms we have:

– (s2 + s)/2 + s = (s - s2)/2

and if the first term is A, F2(n) = A + (s - s2)/2

Finally, the first term, A, presents greater difficulties. A is approximately equivalent to:

n.{ 1/[2(log n/2 -1)] + 1/[3(log n/3 -1)] + 1/[5(log n/5 -1)]+ …. + 1/[√n(log √n -1)] }


Σ[p=2 to p=√n] n/[p(log n/p -1)]     (p ranging over primes)

There appears to be no readily available formula for this. However, it can be computed for p<1012 (n<1024), but whether this is accurate enough remains to be seen.

Computation based on the above method for n up to approx 1014 apparently gives higher density values from n=104 onwards than computation of actual semiprimes using only algorithms. Therefore I assume something is wrong with my maths!

Another approach to a solution

A slightly different approach is to first calculate the number of semiprimes ≤n, x, whose factors are primes ≤√n. The number of such semiprimes equals every combination of pairs of primes (including pairs of identical primes) ≤√n. Let p = F1(√n). Then x = (p2 + p)/2. In the range 4-n, this leaves all semiprimes having one factor larger than √n and one smaller. No such factor is larger than n/2. Therefore the total number of semiprimes in the range 4-n can be calculated from the number of primes using the formula (p2 + p)/2 along with an algorithm to calculate the number of semiprimes having just one factor, f, such that n< f < n/2. Notice, however, that the influence of (p2 + p)/2 on the final figure is small and decreases as n increases. For very large n it is negligible, so we are basically still left with an algorithm from which we have to try to extract a formula. (I've thought about shortcuts around this without much luck.)

An alternative might be to consider how the expression (p2 + p)/2 increases with each squaring of n, as it's reasonable to suppose this might be related to the the semiprime distribution. However, at high values of n, this expression is almost equivalent to p2/2, so no new light is shed on the comparison of semiprimes with primes. The following table puts this into perspective, with some relevant figures based on values of p obtained by formula.

Table 3: Values of (p2 + p)/2  for n = 10 to n = 1032

n p (p2+p)/2 No. of semiprimes
(by formula)
10 7.7 33.3 6.4
102 27.7 399 42.4
104 1218 7.42 x 105  2704
108 5.74 x 106    1.648 x 1013  2.436 x 107 
1016 2.79 x 1014  3.892 x 1028    1.815 x 1015 
1032 1.38 x 1030  9.465 x 1059  5.92 x 1030 

4. Conjectures and remarks

From computer algorithm calculations of k-almost primes up to approx n=1020, I'm hazarding a guess that in this range there are more 3-dimensional numbers than numbers of any other k-dimensional domain.

It might be remarked that if a Cartesian model of dimensions is adopted, with all the prime numbers used in scaling every axis, then there will be many more points in any given finite k-dimensional universe than there are k-almost primes. In a domain where the largest k-almost prime is n, there will be Pk points, where P is the number of primes up to k√n. However, it's quite possible that the Cartesian model is "incorrect" (or rather, I'm suggesting that it might be more fruitful to investigate a model based on k-almost primes as such).

.......Dabs of Grue........09/07/04 - 26/12/07.........................HOME