Theory
KANs
KANs are based on the Kolmogorov-Arnold representation theorem, which states that any continuous multivariate function \(f(x_1, x_2, \dots, x_n)\) can be expressed as a composition of univariate functions: \(f(x_1, x_2, \dots, x_n) = \sum_{q=1}^{2n+1} \Phi_q \left( \sum_{p=1}^n \phi_{q,p}(x_p) \right)\), where \(\Phi_q\) and \(\phi_{q,p}\) are continuous univariate functions. In practice, KANs realize this idea by replacing the fixed linear transformations of traditional neural networks with layers whose mappings are learnable univariate functions, each approximated by an adaptive basis expansion.
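To make the construction concrete, here is a minimal, illustrative sketch of a KAN-style layer in PyTorch: each input coordinate is expanded in a set of basis functions and mixed by learnable coefficients. The class name `KANLayer` and the `basis_fn` argument are placeholders chosen for exposition, not TimeKAN's actual API; the concrete basis expansions are described in the next section.

```python
import torch
import torch.nn as nn

class KANLayer(nn.Module):
    """Illustrative KAN-style layer: each input coordinate is expanded in a set
    of basis functions and mixed by learnable coefficients, approximating the
    sum of univariate functions phi_{q,p}(x_p) in the representation above."""
    def __init__(self, in_dim, out_dim, basis_fn, num_basis):
        super().__init__()
        # basis_fn maps (batch, in_dim) -> (batch, in_dim, num_basis).
        self.basis_fn = basis_fn
        # One learnable coefficient per (input feature, output feature, basis function).
        self.coeffs = nn.Parameter(torch.randn(in_dim, out_dim, num_basis) * 0.1)

    def forward(self, x):
        B = self.basis_fn(x)                      # (batch, in_dim, num_basis)
        # Contract over input features and basis functions to get the outputs.
        return torch.einsum("bik,iok->bo", B, self.coeffs)
```

Any expansion that returns a tensor of shape `(batch, in_dim, num_basis)` can be plugged in as `basis_fn`, including the Fourier, spline, and Chebyshev expansions below.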
Basis Functions
We employ three types of basis functions to approximate the univariate functions \(\phi_{q,p}(x)\):
Fourier Series: Represent functions as sums of sines and cosines, \(\phi(x) = a_0 + \sum_{k=1}^\infty (a_k \cos(kx) + b_k \sin(kx))\). These are ideal for periodic or oscillatory data, efficiently capturing repeating patterns in time series.
Splines: Use piecewise polynomials, \(\phi(x) = \sum_{i} c_i B_i(x)\), where \(B_i(x)\) are B-spline basis functions defined over a grid. Splines excel at fitting smooth, non-periodic data with local adaptability.
Chebyshev Polynomials: Approximate functions via \(\phi(x) = \sum_{k=0}^n c_k T_k(x)\), where \(T_k(x) = \cos(k \arccos(x))\) are the Chebyshev polynomials of the first kind, defined on \([-1, 1]\) (inputs are therefore typically normalized into this range). Their orthogonality and near-uniform error distribution make them robust approximators for data with varying scales. Illustrative sketches of all three expansions are given below.
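The following helpers sketch the three expansions. They are simplified examples with assumed degrees and knot grids, not TimeKAN's exact implementation; each returns a tensor of shape `(batch, in_dim, num_basis)` compatible with the `KANLayer` sketch above.

```python
import torch

def fourier_basis(x, K=5):
    """Constant term plus cos(kx) and sin(kx) for k = 1..K, per input coordinate."""
    k = torch.arange(1, K + 1, dtype=x.dtype, device=x.device)
    kx = x.unsqueeze(-1) * k                              # (batch, in_dim, K)
    ones = torch.ones_like(x).unsqueeze(-1)
    return torch.cat([ones, torch.cos(kx), torch.sin(kx)], dim=-1)   # (batch, in_dim, 2K+1)

def chebyshev_basis(x, degree=5):
    """T_0..T_degree via the recurrence T_k = 2x T_{k-1} - T_{k-2}; inputs are
    squashed into [-1, 1], the domain of T_k(x) = cos(k arccos x)."""
    x = torch.tanh(x)
    T = [torch.ones_like(x), x]
    for _ in range(2, degree + 1):
        T.append(2 * x * T[-1] - T[-2])
    return torch.stack(T[: degree + 1], dim=-1)           # (batch, in_dim, degree+1)

def bspline_basis(x, grid, k=3):
    """Cox-de Boor recursion for B-spline bases on a strictly increasing knot grid
    that should cover the input range. x: (batch, in_dim); grid: (in_dim, num_knots)."""
    x = x.unsqueeze(-1)                                   # (batch, in_dim, 1)
    t = grid.unsqueeze(0)                                 # (1, in_dim, num_knots)
    # Degree-0 bases: indicator of each knot interval.
    B = ((x >= t[..., :-1]) & (x < t[..., 1:])).to(x.dtype)
    for d in range(1, k + 1):
        left = (x - t[..., : -(d + 1)]) / (t[..., d:-1] - t[..., : -(d + 1)]) * B[..., :-1]
        right = (t[..., d + 1:] - x) / (t[..., d + 1:] - t[..., 1:-d]) * B[..., 1:]
        B = left + right
    return B                                              # (batch, in_dim, num_knots - k - 1)
```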
Architecture
Recurrent Neural Networks (RNNs) process sequential data by maintaining a hidden state \(h_t\) that evolves over time. Standard LSTM and GRU architectures mitigate vanishing gradient issues in vanilla RNNs. For LSTM, the state update involves a cell state \(c_t\) and gates: \(f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)\), \(i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)\), and \(o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)\), with \(c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)\) and \(h_t = o_t \odot \tanh(c_t)\). GRU simplifies this with update and reset gates: \(z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)\), \(r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)\), and \(\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)\), yielding \(h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t\).
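For reference, the GRU equations above correspond to a cell step like the one below. This is a textbook implementation written out for exposition (`torch.nn.GRUCell` provides the same update); the LSTM case is analogous with the cell state and three gates.

```python
import torch
import torch.nn as nn

class GRUCellManual(nn.Module):
    """Textbook GRU cell matching the equations above."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.Wz = nn.Linear(input_size, hidden_size)
        self.Uz = nn.Linear(hidden_size, hidden_size, bias=False)
        self.Wr = nn.Linear(input_size, hidden_size)
        self.Ur = nn.Linear(hidden_size, hidden_size, bias=False)
        self.Wh = nn.Linear(input_size, hidden_size)
        self.Uh = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, x_t, h_prev):
        z_t = torch.sigmoid(self.Wz(x_t) + self.Uz(h_prev))       # update gate
        r_t = torch.sigmoid(self.Wr(x_t) + self.Ur(h_prev))       # reset gate
        h_tilde = torch.tanh(self.Wh(x_t) + self.Uh(r_t * h_prev))  # candidate state
        return (1 - z_t) * h_prev + z_t * h_tilde
```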
In TimeKAN, KAN layers enhance these recurrent cells by replacing key linear transformations with nonlinear approximations:
tKANLSTM: KAN layers replace the output-gate computation. The aggregated input \(\text{agg\_input} = W_x x_t + W_h h_{t-1}\) is passed through the KAN layers, yielding \(o_t = \sigma(\text{KAN}(\text{agg\_input}))\).
tKANGRU: KAN layers compute the candidate hidden state, transforming \(\text{agg\_input} = W_x x_t + W_h (r_t \odot h_{t-1})\) into \(\tilde{h}_t = \tanh(\text{KAN}(\text{agg\_input}))\).
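As a hedged illustration of this pattern, the sketch below routes the GRU candidate hidden state through a KAN layer; the tKANLSTM variant analogously routes the output-gate pre-activation through the KAN layer. Class and argument names here are invented for the sketch, not TimeKAN's actual API.

```python
import torch
import torch.nn as nn

class TKANGRUCellSketch(nn.Module):
    """Sketch of the tKANGRU idea: the candidate hidden state is produced by a
    KAN layer instead of a plain linear map (illustrative, not TimeKAN's API)."""
    def __init__(self, input_size, hidden_size, kan_layer):
        super().__init__()
        self.Wz = nn.Linear(input_size + hidden_size, hidden_size)
        self.Wr = nn.Linear(input_size + hidden_size, hidden_size)
        self.Wx = nn.Linear(input_size, hidden_size, bias=False)
        self.Wh = nn.Linear(hidden_size, hidden_size, bias=False)
        # e.g. KANLayer(hidden_size, hidden_size, basis_fn=..., num_basis=...)
        self.kan = kan_layer

    def forward(self, x_t, h_prev):
        xh = torch.cat([x_t, h_prev], dim=-1)
        z_t = torch.sigmoid(self.Wz(xh))                       # update gate
        r_t = torch.sigmoid(self.Wr(xh))                       # reset gate
        agg_input = self.Wx(x_t) + self.Wh(r_t * h_prev)       # aggregated input
        h_tilde = torch.tanh(self.kan(agg_input))              # KAN replaces the linear candidate map
        return (1 - z_t) * h_prev + z_t * h_tilde
```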