I am working with extremely large numbers and would like to verify my Karatsuba multiplication result, (2^136279841 - 1)^2. The operand occupies 532,344 __m256i vectors (2,129,376 uint64_t), so the square needs twice that: 4,258,752 uint64_t.
All the required data arrays are stored in one preallocated block of memory:
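The size arithmetic above can be checked at compile time. This is only a sketch: the constant names (kNumBits, kOperandLimbs, and so on) are made up for illustration and do not appear in my program.

```cpp
#include <cstddef>
#include <cstdint>

// Exponent p of the Mersenne number 2^p - 1 (name is illustrative).
constexpr std::size_t kNumBits = 136279841;

// Number of 256-bit vectors needed to hold p bits, rounded up.
constexpr std::size_t kNumVec256 = (kNumBits + 255) / 256;

// Each 256-bit vector holds four 64-bit limbs.
constexpr std::size_t kOperandLimbs = kNumVec256 * 4;

// The square of an n-limb number fits in 2n limbs.
constexpr std::size_t kProductLimbs = 2 * kOperandLimbs;

// Limbs that are entirely ones, and the bits left over for the top limb.
constexpr std::size_t kFullLimbs = kNumBits / 64;
constexpr std::size_t kTopBits = kNumBits % 64;

static_assert(kNumVec256 == 532344, "operand size in 256-bit vectors");
static_assert(kOperandLimbs == 2129376, "operand size in 64-bit limbs");
static_assert(kProductLimbs == 4258752, "product size in 64-bit limbs");
static_assert(kFullLimbs == 2129372, "limbs that are all ones");
static_assert(kTopBits == 33, "bits set in the top non-zero limb");
```

The last two constants say that the operand consists of 2,129,372 all-ones limbs followed by one limb with its low 33 bits set; the remaining three limbs of the final 256-bit block are zero padding.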
size_t num_bits = 136279841;
size_t num_uint64 = (num_bits + 255) / 256 * 4;
size_t n = num_uint64;
constexpr size_t GB = 3;
static const SIZE_T giga = 1024 * 1024 * 1024;
static const SIZE_T size = GB * giga;
// Byte offset of the top 3 * num_uint64 limbs of the 3 GB block (a multiple of 32, so 256-bit aligned)
size_t First_256_offset = (GB * 0x40000000ULL) - (3ULL * num_uint64 * sizeof(uint64_t));
// Reserve and commit the whole block; VirtualAlloc returns page-aligned, zero-initialized memory
uint64_t* ARRAY = static_cast<uint64_t*>(VirtualAlloc(NULL, size, MEM_RESERVE | MEM_COMMIT, PAGE_READWRITE));
uint64_t* number = ARRAY + First_256_offset / sizeof(uint64_t);
// Store the number 2^136279841 - 1: every 256-bit block except the last is all ones
__m256i ones = _mm256_set1_epi64x(-1);
size_t i = 0;
for (; i < (num_uint64 - 4); i += 4) {
    _mm256_store_si256((__m256i*)&number[i], ones);
}
// Last block: 136279841 % 64 = 33 bits remain, so limb 0 holds the low 33 bits and limbs 1..3 are zero
_mm256_store_si256((__m256i*)&number[i], _mm256_setr_epi64x(0x00000001FFFFFFFFLL, 0, 0, 0));
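The intended limb layout of 2^p - 1 can be cross-checked with a scalar version. This is a minimal sketch: fill_mersenne is a hypothetical helper, not part of my program, and it assumes the buffer is large enough to hold p bits (p <= 64 * n) with the least significant limb first.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical scalar helper: write 2^p - 1 into limbs[0..n),
// least significant limb first. Assumes p <= 64 * n.
inline void fill_mersenne(std::uint64_t* limbs, std::size_t n, std::size_t p) {
    const std::size_t full = p / 64;  // limbs that are entirely ones
    const std::size_t rem = p % 64;   // bits set in the next limb, if any
    for (std::size_t i = 0; i < n; ++i) {
        if (i < full) {
            limbs[i] = ~std::uint64_t{0};
        } else if (i == full && rem != 0) {
            limbs[i] = (std::uint64_t{1} << rem) - 1;
        } else {
            limbs[i] = 0;  // padding above the top bit
        }
    }
}
```

For example, with p = 97 and four limbs this produces an all-ones limb, then 0x00000001FFFFFFFF, then two zero limbs. Since 136279841 % 64 is also 33, the last 256-bit block of the real number has the same shape as limbs 1 to 3 plus one more zero limb.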
I need to compute MOD(A, B), where A is the multiplication result (which takes about 3 minutes to compute on my laptop), stored starting at ARRAY, and B is the number built in the code above. The memory above A and below First_256_offset is used as temporary space by the Karatsuba multiplication. For the MOD(A, B) result, I may use the space below First_256_offset.
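To make the operation concrete, here is a small-scale sketch of MOD(A, B) for a modulus of the form B = 2^p - 1, restricted to values that fit in one uint64_t. It relies on the identity 2^p ≡ 1 (mod 2^p - 1), so the high part of A can be added onto the low p bits instead of dividing. The function name mod_mersenne_small is hypothetical, and whether this folding carries over efficiently to the multi-limb layout above is an assumption on my part.

```cpp
#include <cstdint>

// Hypothetical small-scale helper: a mod (2^p - 1) for 1 <= p <= 63.
// Writing a = hi * 2^p + lo and using 2^p ≡ 1 (mod 2^p - 1) gives a ≡ hi + lo.
inline std::uint64_t mod_mersenne_small(std::uint64_t a, unsigned p) {
    const std::uint64_t m = (std::uint64_t{1} << p) - 1;  // B = 2^p - 1
    // Fold the bits above position p onto the low p bits until a <= m.
    while (a > m) {
        a = (a & m) + (a >> p);
    }
    // A value equal to the modulus itself is congruent to 0.
    return (a == m) ? 0 : a;
}
```

For instance, mod_mersenne_small(100, 5) folds 100 = 3 * 32 + 4 into 3 + 4 = 7, which matches 100 % 31. In the full-size case A has at most 2p bits, so the same idea would be a p-bit shift of the upper half followed by one multi-limb addition and possibly one more fold for the carry. For A = B^2 the expected remainder is 0.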
I need to avoid external libraries, std::vector, std::string, and memory-allocation functions such as malloc.
P.S. Note that my C++ Karatsuba program uses uint64_t operations, because the 64-bit __m256i intrinsics work on int64_t and I need unsigned uint64_t data.