#pragma GCC optimize("O2")
#include <algorithm>
#include <array>
#include <bit>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <random>
#include <ranges>
#include <set>
#include <utility>
#include <vector>

//#define int ll
#define INT128_MAX (__int128)(((unsigned __int128) 1 << ((sizeof(__int128) * __CHAR_BIT__) - 1)) - 1)
#define INT128_MIN (-INT128_MAX - 1)

#define clock chrono::steady_clock::now().time_since_epoch().count()

#ifdef DEBUG
#define dbg(x) cout << (#x) << " = " << x << '\n'
#else
#define dbg(x)
#endif

namespace R = std::ranges;
namespace V = std::views;

using namespace std;

using ll = long long;
using ull = unsigned long long;
using ldb = long double;
using pii = pair<int, int>;
using pll = pair<ll, ll>;
//#define double ldb

template<class T1, class T2>
ostream& operator<<(ostream& os, const pair<T1, T2> pr) {
  return os << pr.first << ' ' << pr.second;
}
template<class T, size_t N>
ostream& operator<<(ostream& os, const array<T, N> &arr) {
  for(const T &X : arr)
    os << X << ' ';
  return os;
}
template<class T>
ostream& operator<<(ostream& os, const vector<T> &vec) {
  for(const T &X : vec)
    os << X << ' ';
  return os;
}
template<class T>
ostream& operator<<(ostream& os, const set<T> &s) {
  for(const T &x : s)
    os << x << ' ';
  return os;
}

/**
 * template name: MontgomeryModInt
 * author: Misuki
 * reference: https://github.com/NyaanNyaan/library/blob/master/modint/montgomery-modint.hpp#L10
 * last update: 2023/11/30
 * note: mod should be a prime less than 2^30.
 */

template<uint32_t mod>
struct MontgomeryModInt {
  using mint = MontgomeryModInt<mod>;
  using i32 = int32_t;
  using u32 = uint32_t;
  using u64 = uint64_t;

  // Computes -mod^{-1} modulo 2^32 by repeated squaring.
  static constexpr u32 get_r() {
    u32 res = 1, base = mod;
    for(i32 i = 0; i < 31; i++)
      res *= base, base *= base;
    return -res;
  }

  static constexpr u32 get_mod() { return mod; }

  static constexpr u32 n2 = -u64(mod) % mod; //2^64 % mod
  static constexpr u32 r = get_r(); //-P^{-1} % 2^32

  u32 a; // value in Montgomery form, kept in [0, 2 * mod)

  static u32 reduce(const u64 &b) {
    return (b + u64(u32(b) * r) * mod) >> 32;
  }

  static u32 transform(const u64 &b) {
    return reduce(u64(b) * n2);
  }

  MontgomeryModInt() : a(0) {}
  MontgomeryModInt(const int64_t &b) : a(transform(b % mod + mod)) {}

  mint pow(u64 k) const {
    mint res(1), base(*this);
    while(k) {
      if (k & 1)
        res *= base;
      base *= base, k >>= 1;
    }
    return res;
  }

  // Inverse via Fermat's little theorem, so mod must be prime.
  mint inverse() const { return (*this).pow(mod - 2); }

  u32 get() const {
    u32 res = reduce(a);
    return res >= mod ? res - mod : res;
  }

  mint& operator+=(const mint &b) {
    if (i32(a += b.a - 2 * mod) < 0) a += 2 * mod;
    return *this;
  }

  mint& operator-=(const mint &b) {
    if (i32(a -= b.a) < 0) a += 2 * mod;
    return *this;
  }

  mint& operator*=(const mint &b) {
    a = reduce(u64(a) * b.a);
    return *this;
  }

  mint& operator/=(const mint &b) {
    a = reduce(u64(a) * b.inverse().a);
    return *this;
  }

  mint operator-() const { return mint() - mint(*this); }

  bool operator==(mint b) const {
    return (a >= mod ? a - mod : a) == (b.a >= mod ? b.a - mod : b.a);
  }
  bool operator!=(mint b) const {
    return (a >= mod ? a - mod : a) != (b.a >= mod ?
            b.a - mod : b.a);
  }

  friend mint operator+(mint a, mint b) { return a += b; }
  friend mint operator-(mint a, mint b) { return a -= b; }
  friend mint operator*(mint a, mint b) { return a *= b; }
  friend mint operator/(mint a, mint b) { return a /= b; }

  friend ostream& operator<<(ostream& os, const mint& b) {
    return os << b.get();
  }
  friend istream& operator>>(istream& is, mint& b) {
    int64_t val;
    is >> val;
    b = mint(val);
    return is;
  }
};

using mint = MontgomeryModInt<998244353>;

/**
 * template name: NTTmint
 * reference: https://judge.yosupo.jp/submission/69896
 * last update: 2024/01/07
 * include: mint
 * remark: MOD = 2^K * C + 1, R is a primitive root modulo MOD
 * remark: a.size() <= 2^K must be satisfied
 * some common modulo:
 *   998244353  = 2^23 * 119 + 1, R = 3
 *   469762049  = 2^26 * 7 + 1,   R = 3
 *   1224736769 = 2^24 * 73 + 1,  R = 3
 * verify: Library Checker - Convolution
 */

template<int32_t k = 23, int32_t c = 119, class Mint = MontgomeryModInt<998244353>>
struct NTT {

  using u32 = uint32_t;
  static constexpr u32 mod = (1 << k) * c + 1;
  static constexpr u32 get_mod() { return mod; }

  // In-place transform; a.size() must be a power of two.
  static void ntt(vector<Mint> &a, bool inverse) {
    static array<Mint, 30> w, w_inv;
    if (w[0] == 0) {
      // Search for the smallest quadratic non-residue >= 2 and derive the twiddle factors from it.
      Mint root = 2;
      while(root.pow((mod - 1) / 2) == 1) root += 1;
      for(int i = 0; i < 30; i++)
        w[i] = -(root.pow((mod - 1) >> (i + 2))), w_inv[i] = 1 / w[i];
    }
    int n = ssize(a);
    if (not inverse) {
      for(int m = n; m >>= 1; ) {
        Mint ww = 1;
        for(int s = 0, l = 0; s < n; s += 2 * m) {
          for(int i = s, j = s + m; i < s + m; i++, j++) {
            Mint x = a[i], y = a[j] * ww;
            a[i] = x + y, a[j] = x - y;
          }
          ww *= w[__builtin_ctz(++l)];
        }
      }
    } else {
      for(int m = 1; m < n; m *= 2) {
        Mint ww = 1;
        for(int s = 0, l = 0; s < n; s += 2 * m) {
          for(int i = s, j = s + m; i < s + m; i++, j++) {
            Mint x = a[i], y = a[j];
            a[i] = x + y, a[j] = (x - y) * ww;
          }
          ww *= w_inv[__builtin_ctz(++l)];
        }
      }
      Mint inv = 1 / Mint(n);
      for(Mint &x : a) x *= inv;
    }
  }

  // Linear convolution of a and b (result has a.size() + b.size() - 1 entries).
  static vector<Mint> conv(vector<Mint> a, vector<Mint> b) {
    int sz = ssize(a) + ssize(b) - 1;
    int n = bit_ceil((u32)sz);

    a.resize(n, 0);
    ntt(a, false);
    b.resize(n, 0);
    ntt(b, false);

    for(int i = 0; i <
        n; i++)
      a[i] *= b[i];

    ntt(a, true);

    a.resize(sz);

    return a;
  }
};

//source: KACTL(https://github.com/kth-competitive-programming/kactl)
ull modmul(ull a, ull b, ull M) {
  ll ret = a * b - M * ull(1.L / M * a * b);
  return ret + M * (ret < 0) - M * (ret >= (ll)M);
}
ull modpow(ull b, ull e, ull mod) {
  ull ans = 1;
  for (; e; b = modmul(b, b, mod), e /= 2)
    if (e & 1) ans = modmul(ans, b, mod);
  return ans;
}

// Deterministic Miller-Rabin with a fixed set of witness bases.
bool isPrime(ull n) {
  if (n < 2 || n % 6 % 4 != 1) return (n | 1) == 3;
  ull A[] = {2, 325, 9375, 28178, 450775, 9780504, 1795265022},
      s = __builtin_ctzll(n-1), d = n >> s;
  for (ull a : A) {   // ^ count trailing zeroes
    ull p = modpow(a%n, d, n), i = s;
    while (p != 1 && p != n - 1 && a % n && i--)
      p = modmul(p, p, n);
    if (p != n-1 && i != s) return 0;
  }
  return 1;
}

// Pollard's rho: returns a non-trivial factor of a composite n.
ull pollard(ull n) {
  static mt19937_64 rng(clock);
  uniform_int_distribution<ull> unif(0, n - 1);
  ull c = 1;
  auto f = [n, &c](ull x) { return modmul(x, x, n) + c % n; };
  ull x = 0, y = 0, t = 30, prd = 2, i = 1, q;
  while (t++ % 40 || __gcd(prd, n) == 1) {
    if (x == y) c = unif(rng), x = ++i, y = f(x);
    if ((q = modmul(prd, max(x,y) - min(x,y), n))) prd = q;
    x = f(x), y = f(f(y));
  }
  return __gcd(prd, n);
}

// Prime factorization of n (with multiplicity, unsorted).
vector<ull> factor(ull n) {
  if (n == 1) return {};
  if (isPrime(n)) return {n};
  ull x = pollard(n);
  auto l = factor(x), r = factor(n / x);
  l.insert(l.end(), r.begin(), r.end());
  return l;
}

//#include "fastFactorize.cpp"

// Returns a random primitive root modulo the prime p.
ull primitiveRoot(ull p) {
  auto fac = factor(p - 1);
  R::sort(fac);
  fac.resize(unique(fac.begin(), fac.end()) - fac.begin());
  // x is a primitive root iff x^((p - 1) / d) != 1 for every prime d dividing p - 1.
  auto test = [p, fac](ull x) {
    for(ull d : fac)
      if (modpow(x, (p - 1) / d, p) == 1)
        return false;
    return true;
  };

  static mt19937_64 rng(clock);
  uniform_int_distribution<ull> unif(1, p - 1);
  ull root;
  while(!test(root = unif(rng)));
  return root;
}

// Maps multiplication of nonzero residues mod P to addition of exponents mod P - 1.
struct mulConvolution {
  const int P, root;
  vector<int> powR, logR; // powR[i] = root^i mod P, logR[powR[i]] = i

  mulConvolution(int _P) : P(_P), root(primitiveRoot(_P)), powR(P - 1), logR(P) {
    for(int i = 0, tmp = 1; i < P - 1; i++, tmp = (ll)tmp * root % P)
      powR[i] = tmp, logR[tmp] = i;
  }
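
  // Re-indexes f from residues 1..P-1 to discrete logs 0..P-2; f[0] is dropped.
  template<class T>
  vector<T> transform(vector<T> &f) {
    assert(ssize(f) == P);
    vector<T> g(P - 1);
    for(int i = 1; i < P; i++)
      g[logR[i]] = f[i];
    return g;
  }

  // Inverse of transform: re-indexes f from discrete logs back to residues; g[0] stays 0.
  template<class T>
  vector<T> invTransform(vector<T> &f) {
    assert(ssize(f) == P - 1);
    vector<T> g(P);
    for(int i = 0; i < P - 1; i++)
      g[powR[i]] = f[i];
    return g;
  }
};

int p;
int fac[200000], facInv[200000];

// Binomial coefficient C(a, b) mod p, valid for 0 <= a < p.
int C(int a, int b) {
  if (b > a or b < 0)
    return 0;
  else
    return (ll)fac[a] * facInv[b] % p * facInv[a - b] % p;
}

NTT ntt;

signed main() {
  ios::sync_with_stdio(false), cin.tie(NULL);

  ll n; cin >> n >> p;

  // Factorials and inverse factorials modulo p for arguments below p.
  fac[0] = 1;
  for(int i = 1; i < p; i++)
    fac[i] = (ll)fac[i - 1] * i % p;
  facInv[p - 1] = modpow(fac[p - 1], p - 2, p);
  for(int i = p - 2; i >= 0; i--)
    facInv[i] = (ll)facInv[i + 1] * (i + 1) % p;

  mulConvolution mu(p);

  // f[e] counts the digit choices so far whose product of binomials equals root^e mod p.
  vector<mint> f(p - 1);
  f[0] = 1;
  while(n) {
    int nd = n % p; n /= p; // next base-p digit of n

    // g[v] = number of i in [0, p) with C(nd, i) mod p == v.
    vector<mint> g(p);
    for(int i = 0; i < p; i++)
      g[C(nd, i)] += 1;
    g = mu.transform(g);

    // Cyclic convolution of length p - 1: fold the upper half of the linear result back.
    f = ntt.conv(f, g);
    for(int i = p - 1; i < 2 * p - 3; i++)
      f[i - (p - 1)] += f[i];
    f.resize(p - 1);
  }

  f = mu.invTransform(f);

  // Sum of residue * count over all nonzero residues, modulo 998244353.
  mint ans = 0;
  for(int i = 1; i < p; i++)
    ans += f[i] * i;

  cout << ans << '\n';

  return 0;
}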