#include <algorithm>
#include <array>
#include <bit>
#include <cassert>
#include <chrono>
#include <concepts>
#include <cstdint>
#include <functional>
#include <iostream>
#include <iterator>
#include <map>
#include <numeric>
#include <queue>
#include <ranges>
#include <set>
#include <tuple>
#include <utility>
#include <vector>
#define int ll
#define INT128_MAX (__int128)(((unsigned __int128) 1 << ((sizeof(__int128) * __CHAR_BIT__) - 1)) - 1)
#define INT128_MIN (-INT128_MAX - 1)
#define pb push_back
#define eb emplace_back
#define clock chrono::steady_clock::now().time_since_epoch().count()
using namespace std;

// Stream output for tuples, pairs and standard containers (elements separated by spaces).
template<size_t I = 0, typename ...args>
ostream& print_tuple(ostream& os, const tuple<args...> tu) {
    os << get<I>(tu);
    if constexpr (I + 1 != sizeof...(args)) { os << ' '; print_tuple<I + 1>(os, tu); }
    return os;
}
template<typename ...args>
ostream& operator<<(ostream& os, const tuple<args...> tu) { return print_tuple(os, tu); }
template<typename T1, typename T2>
ostream& operator<<(ostream& os, const pair<T1, T2> pr) { return os << pr.first << ' ' << pr.second; }
template<typename T, size_t N>
ostream& operator<<(ostream& os, const array<T, N> &arr) { for(size_t i = 0; T x : arr) { os << x; if (++i != N) os << ' '; } return os; }
template<typename T>
ostream& operator<<(ostream& os, const vector<T> &vec) { for(size_t i = 0; T x : vec) { os << x; if (++i != size(vec)) os << ' '; } return os; }
template<typename T>
ostream& operator<<(ostream& os, const set<T> &s) { for(size_t i = 0; T x : s) { os << x; if (++i != size(s)) os << ' '; } return os; }
template<typename T>
ostream& operator<<(ostream& os, const multiset<T> &s) { for(size_t i = 0; T x : s) { os << x; if (++i != size(s)) os << ' '; } return os; }
template<typename T1, typename T2>
ostream& operator<<(ostream& os, const map<T1, T2> &m) { for(size_t i = 0; pair<T1, T2> x : m) { os << x.first << " : " << x.second; if (++i != size(m)) os << ", "; } return os; }

// dbg(a, b, ...) prints "(a, b, ...) = <values>" to stderr when compiled with -DDEBUG.
#ifdef DEBUG
#define dbg(...) cerr << '(', _do(#__VA_ARGS__), cerr << ") = ", _do2(__VA_ARGS__)
template<typename T> void _do(T &&x) { cerr << x; }
template<typename T, typename ...S> void _do(T &&x, S&&...y) { cerr << x << ", "; _do(y...); }
template<typename T> void _do2(T &&x) { cerr << x << endl; }
template<typename T, typename ...S> void _do2(T &&x, S&&...y) { cerr << x << ", "; _do2(y...); }
#else
#define dbg(...)
#endif

using ll = long long;
using ull = unsigned long long;
using ldb = long double;
using pii = pair<int, int>;
using pll = pair<ll, ll>;
//#define double ldb
template<typename T> using vc = vector<T>;
template<typename T> using vvc = vc<vc<T>>;
template<typename T> using vvvc = vc<vvc<T>>;
using vi = vc<int>;
using vll = vc<ll>;
using vvi = vvc<int>;
using vvll = vvc<ll>;
template<typename T> using min_heap = priority_queue<T, vc<T>, greater<T>>;
template<typename T> using max_heap = priority_queue<T>;

// F is callable with Args... and returns exactly R.
template<class F, class R, class... Args>
concept R_invocable = requires(F&& f, Args&&... args) {
    { std::invoke(std::forward<F>(f), std::forward<Args>(args)...) } -> std::same_as<R>;
};

// In-place prefix fold: v[i] = f(v[i - 1], v[i]) (prefix sums by default).
template<ranges::forward_range rng, class T = ranges::range_value_t<rng>, typename F>
requires R_invocable<F, T, T, T>
void pSum(rng &&v, F f) { if (!v.empty()) for(T p = *v.begin(); T &x : v | views::drop(1)) x = p = f(p, x); }
template<ranges::forward_range rng, class T = ranges::range_value_t<rng>>
void pSum(rng &&v) { if (!v.empty()) for(T p = *v.begin(); T &x : v | views::drop(1)) x = p = p + x; }

template<class rng>
void Unique(rng &v) { ranges::sort(v); v.resize(unique(v.begin(), v.end()) - v.begin()); }
template<class rng>
rng invPerm(rng p) { rng ret = p; for(int i = 0; i < ssize(p); i++) ret[p[i]] = i; return ret; }
template<class rng>
vi argSort(rng p) {
    vi id(size(p));
    iota(id.begin(), id.end(), 0);
    ranges::sort(id, {}, [&](int i) { return pair(p[i], i); });
    return id;
}
template<class rng, class T = ranges::range_value_t<rng>, typename F>
requires invocable<F, T>
vi argSort(rng p, F proj) {
    vi id(size(p));
    iota(id.begin(), id.end(), 0);
    ranges::sort(id, {}, [&](int i) { return pair(proj(p[i]), i); });
    return id;
}

template<bool directed>
vvi read_graph(int n, int m, int base) {
    vvi g(n);
    for(int i = 0; i < m; i++) {
        int u, v; cin >> u >> v;
        u -= base, v -= base;
        g[u].emplace_back(v);
        if constexpr (!directed) g[v].emplace_back(u);
    }
    return g;
}
template<bool directed>
vvi adjacency_list(int n, vc<pii> e, int base) {
    vvi g(n);
    for(auto [u, v] : e) {
        u -= base, v -= base;
        g[u].emplace_back(v);
        if constexpr (!directed) g[v].emplace_back(u);
    }
    return g;
}

// Maximal runs of equal elements, as half-open index ranges [l, r).
template<class T>
vc<pii> equal_subarrays(vc<T> &v) {
    vc<pii> lr;
    for(int i = 0, j = 0; i < ssize(v); i = j) {
        while(j < ssize(v) and v[i] == v[j]) j++;
        lr.eb(i, j);
    }
    return lr;
}
template<class T, class F>
requires invocable<F, T>
vc<pii> equal_subarrays(vc<T> &v, F proj) {
    vc<pii> lr;
    for(int i = 0, j = 0; i < ssize(v); i = j) {
        while(j < ssize(v) and proj(v[i]) == proj(v[j])) j++;
        lr.eb(i, j);
    }
    return lr;
}

template<class T> void setBit(T &msk, int bit, bool x) { (msk &= ~(T(1) << bit)) |= T(x) << bit; }
template<class T> void onBit(T &msk, int bit) { setBit(msk, bit, true); }
template<class T> void offBit(T &msk, int bit) { setBit(msk, bit, false); }
template<class T> void flipBit(T &msk, int bit) { msk ^= T(1) << bit; }
template<class T> bool getBit(T msk, int bit) { return msk >> bit & T(1); }

// Division rounding toward negative / positive infinity.
template<class T> T floorDiv(T a, T b) { if (b < 0) a *= -1, b *= -1; return a >= 0 ? a / b : (a - b + 1) / b; }
template<class T> T ceilDiv(T a, T b) { if (b < 0) a *= -1, b *= -1; return a >= 0 ? (a + b - 1) / b : a / b; }
template<class T> bool chmin(T &a, T b) { return a > b ? a = b, 1 : 0; }
template<class T> bool chmax(T &a, T b) { return a < b ? a = b, 1 : 0; }

//reference: https://github.com/NyaanNyaan/library/blob/master/modint/montgomery-modint.hpp#L10
//note: mod should be an odd prime less than 2^30.
template<uint32_t mod>
struct MontgomeryModInt {
    using mint = MontgomeryModInt;
    using i32 = int32_t;
    using u32 = uint32_t;
    using u64 = uint64_t;

    static constexpr u32 get_r() {
        u32 res = 1, base = mod;
        for(i32 i = 0; i < 31; i++) res *= base, base *= base;
        return -res;
    }
    static constexpr u32 get_mod() { return mod; }
    static constexpr u32 n2 = -u64(mod) % mod; //2^64 % mod
    static constexpr u32 r = get_r(); //-P^{-1} % 2^32

    u32 a; // value in Montgomery form, kept in [0, 2 * mod)

    static u32 reduce(const u64 &b) { return (b + u64(u32(b) * r) * mod) >> 32; }
    static u32 transform(const u64 &b) { return reduce(u64(b) * n2); }

    MontgomeryModInt() : a(0) {}
    MontgomeryModInt(const int64_t &b) : a(transform(b % mod + mod)) {}

    mint pow(u64 k) const {
        mint res(1), base(*this);
        while(k) {
            if (k & 1) res *= base;
            base *= base, k >>= 1;
        }
        return res;
    }
    mint inverse() const { return (*this).pow(mod - 2); }
    u32 get() const { u32 res = reduce(a); return res >= mod ? res - mod : res; }

    mint& operator+=(const mint &b) { if (i32(a += b.a - 2 * mod) < 0) a += 2 * mod; return *this; }
    mint& operator-=(const mint &b) { if (i32(a -= b.a) < 0) a += 2 * mod; return *this; }
    mint& operator*=(const mint &b) { a = reduce(u64(a) * b.a); return *this; }
    mint& operator/=(const mint &b) { a = reduce(u64(a) * b.inverse().a); return *this; }
    mint operator-() { return mint() - mint(*this); }
    bool operator==(mint b) const { return (a >= mod ? a - mod : a) == (b.a >= mod ? b.a - mod : b.a); }
    bool operator!=(mint b) const { return (a >= mod ? a - mod : a) != (b.a >= mod ? b.a - mod : b.a); }
    friend mint operator+(mint c, mint d) { return c += d; }
    friend mint operator-(mint c, mint d) { return c -= d; }
    friend mint operator*(mint c, mint d) { return c *= d; }
    friend mint operator/(mint c, mint d) { return c /= d; }
    friend ostream& operator<<(ostream& os, const mint& b) { return os << b.get(); }
    friend istream& operator>>(istream& is, mint& b) { int64_t val; is >> val; b = mint(val); return is; }
};
//using mint = MontgomeryModInt<1'000'000'007>;
using mint = MontgomeryModInt<998'244'353>;

//reference: https://judge.yosupo.jp/submission/69896
//remark: MOD = 2^K * C + 1, R is a primitive root modulo MOD
//remark: a.size() <= 2^K must be satisfied
//some common moduli: 998244353  = 2^23 * 119 + 1, R = 3
//                    469762049  = 2^26 * 7   + 1, R = 3
//                    1224736769 = 2^24 * 73  + 1, R = 3 (exceeds 2^30, so not usable with MontgomeryModInt above)
template<int32_t k = 23, int32_t c = 119, class Mint = MontgomeryModInt<(1u << k) * c + 1>>
struct NTT {
    using u32 = uint32_t;
    static constexpr u32 mod = (1 << k) * c + 1;
    static constexpr u32 get_mod() { return mod; }

    // Forward transform (inverse == false) or inverse transform including the 1/n scaling.
    static void ntt(vector<Mint> &a, bool inverse) {
        static array<Mint, 30> w, w_inv;
        if (w[0] == 0) {
            Mint root = 2;
            while(root.pow((mod - 1) / 2) == 1) root += 1;
            for(int i = 0; i < 30; i++) w[i] = -(root.pow((mod - 1) >> (i + 2))), w_inv[i] = 1 / w[i];
        }
        int n = ssize(a);
        if (not inverse) {
            for(int m = n; m >>= 1; ) {
                Mint ww = 1;
                for(int s = 0, l = 0; s < n; s += 2 * m) {
                    for(int i = s, j = s + m; i < s + m; i++, j++) {
                        Mint x = a[i], y = a[j] * ww;
                        a[i] = x + y, a[j] = x - y;
                    }
                    ww *= w[__builtin_ctz(++l)];
                }
            }
        } else {
            for(int m = 1; m < n; m *= 2) {
                Mint ww = 1;
                for(int s = 0, l = 0; s < n; s += 2 * m) {
                    for(int i = s, j = s + m; i < s + m; i++, j++) {
                        Mint x = a[i], y = a[j];
                        a[i] = x + y, a[j] = (x - y) * ww;
                    }
                    ww *= w_inv[__builtin_ctz(++l)];
                }
            }
            Mint inv = 1 / Mint(n);
            for(Mint &x : a) x *= inv;
        }
    }

    static vector<Mint> conv(vector<Mint> a, vector<Mint> b) {
        int sz = ssize(a) + ssize(b) - 1;
        int n = bit_ceil((u32)sz);
        a.resize(n, 0); ntt(a, false);
        b.resize(n, 0); ntt(b, false);
        for(int i = 0; i < n; i++) a[i] *= b[i];
        ntt(a, true);
        a.resize(sz);
        return a;
    }
};
NTT<> ntt;

//#include
// Factorials, inverse factorials and binomial coefficients modulo Mint::get_mod(), for arguments below `size`.
template<class Mint>
struct binomial {
    vector<Mint> _fac, _facInv;
    binomial(int size) : _fac(size), _facInv(size) {
        assert(size <= (int)Mint::get_mod());
        if (size > 0) _fac[0] = 1;
        for(int i = 1; i < size; i++) _fac[i] = _fac[i - 1] * i;
        if (size > 0) _facInv.back() = 1 / _fac.back();
        for(int i = size - 2; i >= 0; i--) _facInv[i] = _facInv[i + 1] * (i + 1);
    }
    Mint fac(int i) { return i < 0 ? 0 : _fac[i]; }
    Mint faci(int i) { return i < 0 ? 0 : _facInv[i]; }
    Mint inv(int i) { return _facInv[i] * _fac[i - 1]; }
    Mint binom(int n, int r) { return r < 0 or n < r ? 0 : fac(n) * faci(r) * faci(n - r); }
    Mint catalan(int i) { return binom(2 * i, i) - binom(2 * i, i + 1); }
    Mint excatalan(int n, int m, int k) {
        // number of sequences of n steps of +1 and m steps of -1 whose every prefix sum is > -k
        if (k > m) return binom(n + m, m);
        else if (k > m - n) return binom(n + m, m) - binom(n + m, m - k);
        else return Mint(0);
    }
};
binomial<mint> bn(1 << 20);

signed main() {
    ios::sync_with_stdio(false), cin.tie(NULL);

    int n, m; cin >> n >> m;
    vc<bool> in_a(n + 1, false);
    while(m--) {
        int x; cin >> x;
        in_a[x] = true;
    }

    // G[k] = (k + 1) * k! for k >= 1, G[0] = 0
    vc<mint> G(n + 1);
    for(int i = 1; i <= n; i++) G[i] = (i + 1) * bn.fac(i);
    dbg(G);

    mint ans = 0;
    for(int c : {-2, 0}) {
        vc<mint> F(n + 1);
        // Divide-and-conquer online convolution:
        // F[i] = c * (i! + sum_{j < i} F[j] * G[i - j]) if i is in A, otherwise F[i] = 0.
        auto dc = [&](int l, int r, auto &self) -> void {
            if (l + 1 == r) {
                if (!in_a[l]) F[l] = 0;
                else F[l] += bn.fac(l), F[l] *= c;
                return;
            }
            int mid = (l + r) / 2;
            self(l, mid, self);
            {
                // add the contribution of F[l, mid) to F[mid, r)
                vc<mint> F2(F.begin() + l, F.begin() + mid);
                vc<mint> G2(G.begin(), G.begin() + min(r - l, (int)G.size()));
                auto H = ntt.conv(F2, G2);
                for(int i = mid - l; i < ssize(H) and i + l < r; i++) F[i + l] += H[i];
            }
            self(mid, r, self);
        };
        dc(0, n + 1, dc);
        dbg(c, F);

        ans += bn.fac(n);
        for(int i = 1; i <= n; i++) ans += F[i] * (n - i + 1) * bn.fac(n - i);
    }
    cout << ans / 2 << '\n';

    return 0;
}
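// Cross-check sketch (hypothetical helper, not part of the original solution).
// naive_F evaluates the same recurrence that `dc` in main() computes online, but directly
// in O(n^2). It uses plain 64-bit arithmetic instead of `mint` so it also compiles on its own,
// assuming the modulus 998244353 used above:
//   F[i] = 0                                        if i is not in A,
//   F[i] = c * (i! + sum_{j < i} F[j] * G[i - j])   if i is in A,
// where G[0] = 0 and G[k] = (k + 1) * k! for k >= 1. With c = 0 every F[i] is 0.
// Comparing its output with F for small n is one way to test the divide and conquer.
#include <cstdint>
#include <vector>

namespace brute_check {
using u64 = std::uint64_t;
constexpr u64 MOD = 998'244'353;

// Reduce a possibly negative value into [0, MOD).
inline u64 norm(long long x) {
    const long long m = static_cast<long long>(MOD);
    x %= m;
    return static_cast<u64>(x < 0 ? x + m : x);
}

// in_a[i] is true iff i belongs to A (in_a must be non-empty); returns F[0..n] reduced into [0, MOD).
inline std::vector<u64> naive_F(const std::vector<bool> &in_a, long long c) {
    const u64 n = in_a.size() - 1;
    std::vector<u64> fac(n + 1, 1), G(n + 1, 0), F(n + 1, 0);
    for (u64 i = 1; i <= n; i++) fac[i] = fac[i - 1] * i % MOD;
    for (u64 k = 1; k <= n; k++) G[k] = (k + 1) % MOD * fac[k] % MOD;
    const u64 cm = norm(c);
    for (u64 i = 0; i <= n; i++) {
        if (!in_a[i]) continue;  // i is not in A: F[i] stays 0
        u64 s = fac[i];
        for (u64 j = 0; j < i; j++) s = (s + F[j] * G[i - j]) % MOD;
        F[i] = s * cm % MOD;
    }
    return F;
}
}  // namespace brute_check

// Example: n = 3, A = {1, 2, 3}, c = -2 gives F = {0, -2, 4, -4} (mod 998244353).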