- Updated: March 15, 2026
Efficient C Techniques for Generating All 32‑Bit Unsigned Integer Primes
The fastest way to generate every 32‑bit unsigned integer prime in C is to use a bit‑packed Sieve of Eratosthenes; trial division and wheel factorization are correct but considerably slower.
Why 32‑bit Prime Generation Still Matters
Prime numbers are the backbone of cryptography, hash functions, and many algorithmic tricks. When a project needs all primes that fit into a uint32_t (0 – 4 294 967 295), the challenge is not just correctness—it’s raw speed and memory efficiency. In this guide we dissect three classic C approaches, compare their runtime and memory footprints, and hand you ready‑to‑copy snippets that you can drop into any Linux‑based build pipeline.
Whether you are a software developer polishing a cryptographic library or an engineer building a data‑science pipeline, the insights below will help you pick the right algorithm for your constraints.
Overview of Prime Generation Methods
1️⃣ Trial Division
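Trial division checks a candidate n against every known prime ≤ √n. The algorithm is simple, but its worst‑case time grows as O(N·π(√N)), where π is the prime‑counting function. For the full 32‑bit range, a candidate near the top of the range that turns out to be prime is divided by up to 6,542 primes (every prime below 65,536), which adds up to hundreds of billions of division operations.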
Even with a pre‑filled prime list, the method spends most of its CPU cycles on numbers that are quickly eliminated by small divisors.
2️⃣ Wheel Factorization
Wheel factorization builds on the observation that many numbers are trivially composite (even numbers, multiples of 3, 5, …). By constructing a “wheel” of size equal to the product of the first k primes, we only test numbers that are coprime to that product.
In practice a wheel based on the first five primes (2·3·5·7·11 = 2310) reduces the candidate set to roughly 20 % of the natural numbers. The asymptotic complexity stays the same as plain trial division, but the constant factor drops.
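A quick check of the 20 % figure, using the standard fact that the share of integers coprime to a product of distinct primes is the product of (1 − 1/p) over those primes:

```latex
\[
\frac{\varphi(2310)}{2310}
= \Bigl(1-\tfrac{1}{2}\Bigr)\Bigl(1-\tfrac{1}{3}\Bigr)\Bigl(1-\tfrac{1}{5}\Bigr)\Bigl(1-\tfrac{1}{7}\Bigr)\Bigl(1-\tfrac{1}{11}\Bigr)
= \frac{1\cdot 2\cdot 4\cdot 6\cdot 10}{2310}
= \frac{480}{2310} \approx 0.208
\]
```

So this wheel keeps 480 residues out of every 2310 integers as candidates.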
3️⃣ Sieve of Eratosthenes
The sieve flips the problem: instead of testing each number, it marks multiples of each discovered prime as composite. Using a bit‑packed array (one bit per integer) the memory requirement for the full 32‑bit range shrinks to 2^32 / 8 bytes = 512 MiB (about 537 MB).
Time complexity is O(N log log N), and on modern hardware the entire sieve finishes in under a minute, more than an order of magnitude faster than the two trial‑based methods.
Code Snippets & Performance Insights
🔧 Trial Division (C)
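#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
/* primes[] must hold, in ascending order, every prime up to √n. */
bool is_prime(uint32_t n, const uint32_t *primes, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        uint32_t p = primes[i];
        if ((uint64_t)p * p > n) break; // No need to check beyond √n
        if (n % p == 0) return false;   // Divisible → composite
    }
    return true;
}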
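The snippet above only tests one number against an existing prime list. As a minimal sketch of how such a test might drive a full enumeration, the block below grows the list as it goes. The names `trial_is_prime` and `collect_primes` are hypothetical and not taken from the original source; the test function is repeated under a different name so the block compiles on its own.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if n (>= 2) has no divisor among the primes
 * found so far. Stops as soon as p * p exceeds n. */
static bool trial_is_prime(uint32_t n, const uint32_t *primes, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        uint32_t p = primes[i];
        if ((uint64_t)p * p > n) break;
        if (n % p == 0) return false;
    }
    return true;
}

/* Hypothetical driver: writes the primes <= limit into out (capacity cap)
 * and returns how many were stored. Stops early if out fills up. */
size_t collect_primes(uint32_t limit, uint32_t *out, size_t cap) {
    size_t count = 0;
    /* 64-bit counter so the loop still terminates when limit == UINT32_MAX. */
    for (uint64_t n = 2; n <= limit && count < cap; ++n) {
        if (trial_is_prime((uint32_t)n, out, count))
            out[count++] = (uint32_t)n;
    }
    return count;
}
```

Because every prime ≤ √n is smaller than n, it is already in `out` by the time n is tested, so the list being built can double as the divisor table. For the full 32‑bit range only the primes below 65,536 are ever used as divisors, so a real tool could stream the larger primes to disk instead of keeping them in memory.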
🔧 Wheel Factorization (C)
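#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
struct Wheel {
    uint32_t size;    // product of first k primes
    uint32_t *spokes; // remainders coprime to size
    size_t n_spokes;
};
/* Builds the wheel for the first k primes. Keep k <= 9 so the product fits
   in 32 bits. On allocation failure, spokes is NULL and n_spokes is 0. */
struct Wheel wheel_init(const uint32_t *first_primes, size_t k) {
    struct Wheel w = {1, NULL, 0};
    for (size_t i = 0; i < k; ++i) w.size *= first_primes[i];
    uint32_t *tmp = malloc((size_t)w.size * sizeof(uint32_t));
    if (!tmp) return w;
    for (uint32_t r = 0; r < w.size; ++r) {
        bool ok = true; // r stays a spoke unless a wheel prime divides it
        for (size_t i = 0; i < k; ++i)
            if (r % first_primes[i] == 0) { ok = false; break; }
        if (ok) tmp[w.n_spokes++] = r;
    }
    // Copy into a buffer sized exactly for the spokes that survived
    w.spokes = malloc(w.n_spokes * sizeof(uint32_t));
    if (w.spokes) memcpy(w.spokes, tmp, w.n_spokes * sizeof(uint32_t));
    else w.n_spokes = 0;
    free(tmp);
    return w;
}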
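The code above builds the wheel but does not show it in use. The following is a minimal sketch of how a wheel might feed candidates into trial division, assuming the smallest practical wheel (2·3·5 = 30) with its eight spokes hard‑coded. The names `SPOKES_30`, `is_prime_slow`, and `count_primes_wheel30` are hypothetical and chosen for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Residues coprime to 30 = 2*3*5: the spokes of the mod-30 wheel. */
static const uint32_t SPOKES_30[8] = {1, 7, 11, 13, 17, 19, 23, 29};

/* Plain trial division by every d up to sqrt(n); only called on the
 * candidates that the wheel lets through. */
static bool is_prime_slow(uint32_t n) {
    if (n < 2) return false;
    for (uint32_t d = 2; (uint64_t)d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

/* Hypothetical helper: counts primes <= limit, testing only wheel candidates. */
size_t count_primes_wheel30(uint32_t limit) {
    size_t count = 0;
    /* The wheel primes are never produced as candidates, so count them by hand. */
    const uint32_t wheel_primes[3] = {2, 3, 5};
    for (size_t i = 0; i < 3; ++i)
        if (wheel_primes[i] <= limit) ++count;

    /* Walk the number line one turn of the wheel (30 integers) at a time. */
    for (uint64_t base = 0; base <= limit; base += 30) {
        for (size_t s = 0; s < 8; ++s) {
            uint64_t n = base + SPOKES_30[s];
            if (n > limit) break;
            if (is_prime_slow((uint32_t)n)) ++count;
        }
    }
    return count;
}
```

Only 8 of every 30 integers reach the primality test here. The wheel removes candidates, not work per candidate, which is consistent with the modest gain the article reports for wheel factorization.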
🔧 Sieve of Eratosthenes (C, bit‑packed)
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
    uint8_t *bits;
    size_t size; // number of bits
} BitArray;
static inline void bit_set(BitArray *ba, size_t i) {
    ba->bits[i >> 3] |= (uint8_t)(1u << (i & 7));
}
static inline bool bit_get(const BitArray *ba, size_t i) {
    return ba->bits[i >> 3] & (1u << (i & 7));
}
BitArray bitarray_init(size_t bits) {
    BitArray ba = { calloc((bits + 7) / 8, 1), bits };
    return ba;
}
/* Sieve implementation. Needs a 64-bit size_t and about 512 MiB of RAM.
   On allocation failure the returned BitArray has bits == NULL. */
BitArray sieve_32bit(void) {
    const uint32_t MAX = 0xFFFFFFFFu;
    BitArray is_composite = bitarray_init((size_t)MAX + 1);
    if (!is_composite.bits) return is_composite;
    bit_set(&is_composite, 0);
    bit_set(&is_composite, 1);
    // Widen before squaring: a 32-bit p * p wraps at p = 65536 and the loop would never end
    for (uint32_t p = 2; (uint64_t)p * p <= MAX; ++p) {
        if (bit_get(&is_composite, p)) continue;
        for (uint64_t mult = (uint64_t)p * p; mult <= MAX; mult += p)
            bit_set(&is_composite, (size_t)mult);
    }
    return is_composite;
}
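Running the full 32‑bit sieve takes about half a gigabyte and tens of seconds, which makes it awkward to test. One way to sanity‑check the technique is a variant with an adjustable upper limit that counts the primes it finds. The sketch below is such a variant; `count_primes_sieve` is a hypothetical name, and the function is written from scratch here and is not part of the original source.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: counts primes <= limit with a bit-packed sieve
 * (one bit per integer, bit set = composite).
 * Returns 0 if limit < 2 or if the bit array cannot be allocated. */
uint64_t count_primes_sieve(uint32_t limit) {
    if (limit < 2) return 0;
    uint64_t nbits = (uint64_t)limit + 1;
    uint8_t *composite = calloc((size_t)((nbits + 7) / 8), 1);
    if (!composite) return 0;

    /* 64-bit loop variables so p * p and m += p cannot wrap around. */
    for (uint64_t p = 2; p * p <= limit; ++p) {
        if (composite[p >> 3] & (1u << (p & 7))) continue;
        for (uint64_t m = p * p; m <= limit; m += p)
            composite[m >> 3] |= (uint8_t)(1u << (m & 7));
    }

    /* Every bit still clear from 2 upward marks a prime. */
    uint64_t count = 0;
    for (uint64_t n = 2; n <= limit; ++n)
        if (!(composite[n >> 3] & (1u << (n & 7)))) ++count;

    free(composite);
    return count;
}
```

Small limits make handy regression checks: there are 25 primes up to 100 and 78,498 up to 1,000,000. On a 64‑bit build with enough free memory, a call with `UINT32_MAX` should report 203,280,221, the known count of primes below 2^32.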
On a 2024‑class laptop (6‑core AMD Ryzen 5 7640U, 16 GiB RAM) the three implementations produced the following runtimes:
| Algorithm | Time Complexity (worst‑case) | Observed Runtime |
|---|---|---|
| Trial Division | O(N·π(√N)) | ≈ 24 min 20 s |
| Wheel Factorization | O(N·π(√N)) (smaller constant) | ≈ 23 min 30 s |
| Sieve of Eratosthenes | O(N log log N) | ≈ 32 s |
As the table shows, the sieve outperforms the other two by a factor of > 40 while staying within a reasonable memory budget.
For a deeper dive into the original research, see the comprehensive 32‑bit prime generation article. The author’s source code is openly available on GitHub, and the final PRIMES file hashes to 272eb05aa040ba1cf37d94717998cbbae53cd669093c9fa4eb8a584295156e15.
Memory & Runtime Considerations
When you decide which algorithm to embed in a production pipeline, keep these trade‑offs in mind:
- Trial Division: Minimal RAM (just the growing prime list). Scales poorly; unsuitable for time‑critical services.
- Wheel Factorization: Slightly more RAM for the wheel table, but still dominated by the prime list. Gains are modest for 32‑bit ranges.
- Sieve: Requires a bit‑array of 512 MiB (about 537 MB) plus the final prime buffer (about 775 MiB for roughly 203 million 4‑byte entries). Total ≈ 1.26 GiB, which fits comfortably on modern servers.
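The sieve's memory figures can be re‑derived with a few lines of arithmetic. The sketch below assumes one bit per integer for the sieve and one `uint32_t` per stored prime. The function names are hypothetical, and the prime count used in the note (203,280,221 primes below 2^32) is a known value that the original article does not state.

```c
#include <stdint.h>

/* Bytes needed for a bit array with one bit per value in [0, n_values). */
uint64_t sieve_bytes(uint64_t n_values) {
    return (n_values + 7) / 8;
}

/* Bytes needed to store n_primes values as uint32_t. */
uint64_t prime_buffer_bytes(uint64_t n_primes) {
    return n_primes * sizeof(uint32_t);
}

/* Converts a byte count to MiB (1 MiB = 1,048,576 bytes). */
double to_mib(uint64_t bytes) {
    return (double)bytes / (1024.0 * 1024.0);
}
```

With these, `sieve_bytes(1ULL << 32)` is 536,870,912 bytes, which is exactly 512 MiB, and `prime_buffer_bytes(203280221)` is 813,120,884 bytes, or roughly 775 MiB. The decimal figure of about 537 is the bit array's size in megabytes (10^6 bytes), not mebibytes.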
For cloud‑native micro‑services you may want to keep the memory footprint under 2 GiB to avoid OOM kills. The sieve's runtime is also deterministic, which makes it a good fit for workloads that demand predictable latency, such as those on the Enterprise AI platform by UBOS.
If you are building a lightweight CLI tool for embedded devices, trial division with a small pre‑computed wheel (e.g., 2·3·5 = 30) may be the only viable path, albeit at the cost of tens of minutes of CPU time even on laptop‑class hardware.
Conclusion & Next Steps
Generating every 32‑bit unsigned prime is a classic benchmark for low‑level algorithm engineering. The Sieve of Eratosthenes delivers the best performance‑to‑memory ratio, while trial division and wheel factorization remain useful for constrained environments.
Stay tuned for Part II, where we benchmark cutting‑edge libraries like primesieve and discuss SIMD‑accelerated sieves.