- Testing requirements after changes:
  * Test all functions return either native or bigints.  Functions that
    return raw MPU::GMP results will return strings, which isn't right.
  * Valgrind, coverage
  * use:  -O2 -g -Wall -Wextra -Wdeclaration-after-statement -fsigned-char
  * Test on 32-bit Perl, Cygwin, Win32.
  * Test on gcc70 (NetBSD), gcc119 (AIX/Power8), gcc22 (MIPS64), gcc115 (aarch)
  * prove -b -I../Math-Prime-Util-GMP/blib/lib -I../Math-Prime-Util-GMP/blib/arch

- For new functions:
  XS, .h, .c, PP, PPFE, export, t exports, lib/ntheory.pm, Changes, doc, test


- Move .c / .h files into separate directory.
  version does it in a painful way.  Something simpler to be had?

- finish test suite for bignum.  Work on making it faster.

- An assembler version of mulmod for i386.

- It may be possible to have a more efficient ranged totient.  We're using
  the sieve up to n/2, which is better than most people seem to use, but I'm
  not completely convinced we can't do better.  The method at:
  http://codegolf.stackexchange.com/a/26747/30069 ends up very similar.  For
  the monolithic results the main bottleneck seems to be the array return.

- Big features:
   - QS factoring

- Figure out a way to make the internal FOR_EACH_PRIME macros use a segmented
  sieve.

- Rewrite 23-primality-proofs.t for new format (keep some of the old tests?).

- Factoring in PP code is really wasteful -- we're calling _isprime7 before
  we've done enough trial division, and later we're calling it on known
  composites.  Note how the XS code splits the factor code into the public
  API (small factors, isprime, then call main code) and main code (just the
  algorithm).  The PP code isn't doing that, which means we're doing lots of
  extra primality checks, which aren't cheap in PP.

- Consider using Test::Number::Delta for many tests

- More tweaking of LMO prime count.
    - OpenMP.  The step 7 inner loop is available.
    - Convert to 32-bit+GMP to support large inputs, add to MPU::GMP.
    - Try __int128.
    - Variable sieve size
    - look at sieve.c style prime walking
    - Fenwick trees for prefix sums

- Iterators speedup:
  1) config option for sieved next_prime.  Very general, would make
     next_prime run fast when called many times sequentially.  Nasty
     mixing with threads.
  2) iterator, PrimeIterator, or PrimeArray in XS using segment sieve.

- Perhaps have main segment know the filled in range.  That would allow
  a sieved next_prime, and might speed up some counts and the like.

- Benchmark simple SoEs, SoA.  Include Sisyphus SoE hidden in Math::GMPz.

- Try using malloc/free for win32 cache memory.  #define NO_XSLOCKS

- Investigate optree constant folding in PP compilation for performance.
  Use B::Deparse to check.

- Ensure a fast path for Math::GMP from MPU -> MPU:GMP -> GMP, and back.

- More Pari:  parforprime

- znlog:
    = GMP BSGS for znlog.
    = Clean up znlog (PH, BSGS, Rho).
    = Experiment with Wang/Zhang 2012 Rho cycle finding

- consider using Ramanujan Li for PP code.

- xt/pari-compare:  add chinese, factorial, vecmin, vecmax,
                        bernfrac, bernreal, LambertW.

- Proth test using LLR.  Change mersenne test file to test both.
  Note: what does this mean?  Both LLR and Proth are in GMP now.

- harmreal and harmfrac for general $k

- For PP, do something like the fibprime examples.  At load time, look for
  the best library (GMPz, GMP, Pari, BigInt) and set $BICLASS.  Then we
  should use that class for everything.  Go ahead and return that type.
  Make a config variable to allow get/set.

- Support FH for print_primes.  PerlIO_write is giving me fits.

- Test for print_primes.  Not as easy with filenos.

- divsum and divsummult as block functions.
  The latter does sum = vecprod(1 + f(p_i) + f(p_i^2) + ... f(p_i^e) for all p.

- Consider Lim-Lee random prime generation, optionally with proof.
  https://pdfs.semanticscholar.org/fd1d/864a95d7231eaf133b00a1757ee5d0bf0e07.pdf
  libgcrypt/cipher/primegen.c

- More formal random prime generation for pedantic FIPS etc. users, with
  guarantee of specific algorithm.

- surround_primes

- More Montgomery: znlog, catalan

- polymul, polyadd, polydiv, polyneg, polyeval, polyorder, polygcd, polylcm, polyroots, ...
  A lot of our ops do these mod n, we could make ..mod versions of each.

- poly_is_reducible

- use word-based for-sieve for non-segment.

- remove start/end partial word tests from inner loop in for-sieve.

- sieve.h and util.h should get along better.

- compare wheel_t with primes separated and possibly cached.

- urandomm with bigints could be faster.
   7.7s  my $f=factorial(144); urandomm($f) for 1..5e5;
   6.2s  my $f=factorial(144); urandomm("$f") for 1..5e5;
   4.8s  my $f="".factorial(144); urandomm($f) for 1..5e5;
   5.3s  use Math::GMP qw/:constant/; my $f=factorial(144); urandomm($f) for 1..5e5;
   1.7s  my $f=Math::Prime::Util::GMP::factorial(144); Math::Prime::Util::GMP::urandomm($f) for 1..5e5;
  In the first case, we're calling ""->bstr->_str once for validation in MPU
  and and once for use in MPU::GMP.
  The last case is all strings with no read/write bigint objects anywhere.

- Destroy csprng context on thread destruction.

- submit bug report for Perl error in 30b8ab3

- localized a/b in vecreduce, see:
  https://metacpan.org/diff/file?target=REHSACK/List-MoreUtils-XS-0.428/&source=HERMES%2FList-MoreUtils-XS-0.427_001#XS.xs
  perl #92264 (sort in 5.27.7)

- consider #define PERL_REENTRANT

- add back formultiperm optimization if we can get around lastfor issue.

- make a uint128_t version of montmath.  Needs to handle 64-bit.

- sieve_range does trial division

- srand with no args should be calling GMP's srand with the selected seed
  value.  This is all a hacky artifact of having the two codebases.

- Look at using Ramanujan series for PP Li.

- _reftyped as XS call

- update prime count lower/upper from https://arxiv.org/pdf/1703.08032.pdf

- urandomr

- circular primes ... just use repdigits after 1M?  https://oeis.org/A068652

- perhaps square-free flag for factor for early stop.  Use in moebius etc.

- make a NVCONST, define log, sqrt, etc. for quadmath vs. long double

- move most of our long double routines to NVCONST (see above).

- Change from Kalai to Bach's algorithm for random factored integers
  https://maths-people.anu.edu.au/~brent/pd/multiplication-HK.pdf

- Adjust crossover in random_factored_integer PP code for Kalai vs. naive

- Things from Pari/GP 2.12 beta:
   - rewritten (much faster) Bernoulli.
   - factorial
   - divisors?
   - DLP/PH

- semiprime_count PP just walk if small range.

- add b125527.txt to oeis 125527.  (semiprime counts 2^n)

- limit for twin prime and ramanujan prime

- add to A033843 (twin prime count < 2^n).  Oliviera e Silva has good data.

- consider adding multifactorial.  See MPU::GMP.

- multicall in forpart/forcomp.

- check memory use for non-multicall.  We need enter/leave which were removed.

- Add aliquot sum

- split 80-PP into multiple tests: random, float, numeric, factor, prime, etc.

- testing:  lehman_factor, print_primes, aks

- powint, divint, mulint, addint:
  - optimize XS for (1) both neg, (2) either neg

- factor, factor_exp should accept negative inputs

- NEGMAXINT testing in PP.

- NEGMAXINT input in XS.

- Looks like lots of dodgy cases with -(2**63+1) and higher.
  Build in the svn conversion as well?  stat = get(&n, svn, flags);
  We're now restricting negative inputs to -ivmax, which is better, but
  means we're skipping optimizations for where we want abs(n) or just want
  to know if it is negative.
  If we built this into the flags, it would be slightly messy but now in
  exactly one place instead of all over.

- euler_phi should do XS -> GMP directly.  Maybe make totient in PP for uniform name.

- Make prime_omega, prime_bigomega, and liouville take a range like moebius.

- think about making an iterator for range omega/bigomega.  We can precalc
  the primes and offsets, which should enable fast sieving of small windows.

- dickman_rho, debruijn_psi
  See: https://arxiv.org/pdf/1301.5293.pdf
       https://www.ams.org/journals/mcom/1969-23-106/S0025-5718-1969-0247789-3/S0025-5718-1969-0247789-3.pdf
       Hunter/Sorenson(1997)
       https://cosec.bit.uni-bonn.de/fileadmin/user_upload/teaching/08us/08us-cryptabit/AnatInt_Crypto-sld.pdf

- consider random_smooth_integer, random_rough_integer.  nbit or range or mod?
  See: https://cr.yp.to/papers/epsi.pdf
       https://arxiv.org/pdf/2006.07445.pdf

- ipowsafe could have the limits using hard-coded sqrt and cbrt to avoid div.

- almost primes check and enter new for
   3 http://oeis.org/A109251
   4 http://oeis.org/A114106
   5 http://oeis.org/A114453
   6 http://oeis.org/A120047
   7 http://oeis.org/A120048
   8 http://oeis.org/A120049
   9 http://oeis.org/A120050
  10 http://oeis.org/A120051
  11 http://oeis.org/A120052
  12 http://oeis.org/A120053
  4^n https://oeis.org/A116426
  6^n https://oeis.org/A116427
  8^n https://oeis.org/A116428
  9^n https://oeis.org/A116429
  nth https://oeis.org/A101695
  Try sequence A052130.
    a(n) = {my m = ceil(n*log(3)/log(1.5)); return apc(1<<m,m-n); }
    Or: apc(m=floor(n*(1+1/sqrt(2))); return apc(2^(n+m+2), m+2)
    mpu '$n=19; $m=int($n*log(3)/log(1.5)+1); say almost_prime_count($m-$n,1<<$m);'
    mpu '$n=19; $m=int($n*(1+1/sqrt(2))); say almost_prime_count($m+2,1<<($n+$m+2));'
    It looks like the Mathematica table has a misplaced paren.
  Sequence A078843:
    mpu 'use POSIX; $|=1; sub app3i { my($k,$n)=@_; almost_prime_count($k,$n) - (($k < 1) ? 0 : almost_prime_count($k-1, divint($n,3))); } my $a=1; for my $n (1..60) { $k=POSIX::ceil($n*log(5/3)/log(5/2)); $a +=app3i($n-$k,divint(powint(3,$n),powint(2,$k))); print "$a, "; }'


- Almost primes things:
  - Once we have a range bigomega, use it in PP almost_primes.
  - XS almost_primes should call this in segments, use count_approx for seg size
  - more work on verifying count_upper for k=5+
  - more work on verifying count_lower for k=5+
  - max_almost_prime_count fill in k=5-9.
    almost_prime_count(n,~0): n=17   23m   n=16   41m   n=15  71m   n=14  127m
                              n=13  225m   n=12  397m   n=11 699m   n=10 1217m
    We could estimate n=9 at 38 hours, n=8 at 66 hours, n=7 at 116 hours.
    Note this is *significantly* faster than when this was first written.
  - max_almost_prime_count improve k=2,3,4.  Somehow.
  - optimize PP is_almost_prime
  - better PP almost_prime_count bounds for k > 3
  - optimize PP nth_almost_prime and bounds (this is super slow)
  - optimize PP almost_primes() (uses forfactored)
  - Consider getting rid of the unused construction code.
  - foralmostprimes k reduction in PP
  - foralmostprimes XS tuning (size, when to do incremental, etc.)
  - rethink the structure for foralmostprimes: iterator?  smart segment size?
  - foralmostprimes ssize, maybe use approx counts?
  - almost_prime_count(k,beg,end)

- optimize nth_powerful, both in C and PP.

- Add OEIS sequence, a(n) = the k-th k-powerful number
  mpu 'for my $k (1..40) { say "$k ",nth_powerful($k,$k); }'

- Improve inverse_interpolation.  We're using linear interpolation.
  Consider 2nd order or Aitken's method.  Dubiously useful here.

- For almost primes in PP, maybe use Lagrange estimator to start, given
  that we have no bounds.

- omega primes, work on figuring out formula for omega_prime_count
  better construction and counting
  OEIS sequence for omega k=3 counts for 10^i

- There might be a better method for _sqrtmod_prime_power.

- PP submod native ints should be streamlined.

- extend qnr with optional root argument, cubic non-residue, etc.

- practical numbers: make OEIS sequence with count <= 10^n.
  mpu '$s=1; for my $e (1..9) { $s += is_practical($_) for 10**($e-1)+1..10**$e; say "10^$e  $s"; }'
  mpu 'for my $e (1..9) { $s=0; $s += is_practical($_) for 1..10**$e; say "10^$e  $s"; }'
  5, 30, 198, 1456, 11751, 97385, 829157, 7266286, 64782731, 582798892, 5283879886

- consider forprimepowers { ... } beg,end

- PP next_prime_power, prev_prime_power: skip evens.

- test coverage for PP.

- Revisit AKS in PP.  Essentially all the time is spent in two lines of
  poly_mod_mul.  (1) Move to Bern 4.1, (2) try using bigint multiply
  (Kronecker) like we do in GMP.

- 81-bignum.t takes too much time with PP

- PP random_safe_prime(180) is very slow.  GMP algorithm doesn't help.

- any way to make non-GMP random_strong_prime faster

- sum_primes128, try using Meissel.

- Speed up XS RiemannR with quadmath.

- binomialmod:
   - PP implementation for primes, squarefree, and general composites.

- Possible new:
  - checksums?  Rather than print | md5sum.  adler32, sha1/2/3, blake2 (b2sum)

- inverse sigma.  Better, determine how to generalize this somehow.

- stronger BPSW test:  https://arxiv.org/abs/2006.14425

- faster prime gaps:  https://arxiv.org/abs/2012.03771
                      https://github.com/sethtroisi/prime-gap

- Pari 2.13 has faster exp, Catalan, log2, gamma, factorial, lngamma.
  Completely new Bernoulli and bernvec.
  eulerreal, eulervec, ramanujantau
  MPQS
  Take a look.

- Pari 2.13 added an optional third argument to sqrtint, just like rootint.
  Not high priority since we can just call rootint with k=2.

- Faster Mertens. Helfgott/Thompson 2020.  https://arxiv.org/pdf/2101.08773.pdf

- gcdext, see Sorenson, Jabelean, and kernel/none/gcdll.c.

- Add optional base to is_delicate_prime(n,base=10).  This should be relatively
  easy in C, but we'll need to put the math back in PP.

- refactor lucas code.
  split out all the different codes, benchmark them all.

- Complete reviewing docs for positive vs. non-negative, and comparing
  to XS.xs.  Everything after modular functions was not reviewed.


- make new XS routine: _validate_simple_int(SV *svn, int negok)
  then make PP link _valint($n,negok) to either XS or PP.
  especially PP valuation base 2 is really slow with current validation
  .
  Similarly, a routine to return the best form (int or bigint) for post-calc

- PP needs a revamp of the bigint->int downgrade.  Using the babs(BMAX) is
  wrong.  We can use max/min.  Better come up with something more consistent.
  Possibly XS, or try "$x=$n->numify; return ($x eq $n) ? $x : $n;"


- GMP
  - rootmod, binomialmod
  - almost primes
  - omega primes
  - powerful numbers, counts
  - prime_power_count
  - smooth_count
  - rough_count
  - fdivrem, lucasuv, lucasumod, lucasvmod, lucasuvmod
  - prime_bigomega
  - prime_omega
  - qnr
  - fdivrem


- consider (in GMP 0.53):
  - setbit, clrbit, combit, tstbit
  - bitand, bitor, bitxor, bitcom

- other *int functions?  See overload and GMP for possible things we want.
  is_divisible, remove_factors, clzint, ctzint (scan0,scan1), is_even, is_odd

- an overload option or module, to call our *int functions.
  have: add, sub, mul, pow, div, rem, neg, ++, --, <<, >>, <=>
  need: not, bnot, and, or, xor, gt, lt, geq, leq, eq, neq,
        as_bool, as_string, as_num, clone

- 32-bit testing

- Lots of older PP code, especially factoring, is built on a bigint vs PP
  idea.  It might be useful to write normally using Mpowmod etc. and
  benchmark various inputs.

- incremental factoring.  Maybe a stateful iterator?

- new semiprime approximations should be used in PP.

- get rid of the pp shell for prime_powers, semiprimes, etc.  Just use XS.
