#] "$d_web"'Neural nets/Paper reviews/240621 journal paper review- math only.txt' http://www.BillHowell.ca/Neural nets/Paper reviews/240621 journal paper review- math only.txt www.BillHowell.ca 21Jun2024 initial # view in text editor, using constant-width font (eg Liberation Mono 10pt), tabWidth = 3 The step-by-step checks below are for : p6c1L27 3.3.4 Characterization of α key [confirmation, comment]s using : $ grep 'ccc\|qqq\|sss\|www' "$d_web"'Neural nets/Paper reviews/240621 journal paper review- math only.txt' ************************** C6. MATH [NOTES, CHECKS] : These are simply my [retyped notes, confirmations for checking against convention] : Overall, the authors' developments look very solid and well-reasoned. I did NOT spot any [error, omission]s. The full text file of the math checks can be found at : https://www.BillHowell.ca/Neural nets/Paper reviews/240621 paper review- math only.txt /home/bill/web/Neural nets/Paper reviews/240621 journal paper review- math only.txt The math notes are too [long, detailed] to be included in the review submission, plus it mainly consists of re-typing the authors' [text, formulae] to ensure that I go though their work in detail. Key [confirmation, comment]s are included in the peer review submission. These are indicated below by contiguous lines following [sss, www, qqq, ccc]. I do feel that the authors' approach is important enough to look closely. +-----+ p3c2L13 2.2 LIF Module classical LIF model to roughly model the equivalent circuit of a neuron: (1) τ*d[t: V(t)] = -(V(t) - Vreset) + X(t) In Eq.(1), V(t) membrane potential of the neuron at time t X(t) presynaptic input at time Vreset represents the reset membrane potential τ time constant The charge, discharge, and reset equations can be simplified into three distinct processes: (2) H(t) = f(V(t-1), X(t)) represents the decay of neuronal potential (leakage?) (3) S(t) = Θ(H(t) - Vth) describes the activation behavior of neurons (4) V(t) = Ht(1 - S(t)) + Vreset*S(t) represents the charging process of neurons where (including difference approximations due to [discrete, nondifferentiable] problems : H(t) membrane potential at time t S(t) activation spike at time t, and has the value of 0 or 1 f(·) equation relationship of Eq.(1) Θ(·) Heaviside function sss >> The authors' equations 1-4 are similar standard definitions of the "Leaky Integrate and Fire" (LIF) model. for example : Wulfram Gerstner, Werner Kistler 2002 "Spiking neuon models : Single neurons, populations, plasticity" Cambridge University Press, 4th printing 2008, 480pp ISBN 978-0-521-89079-3 paperback www.cambridge.org/9780521823555 +-----+ p5c1L45 membrane potential dynamics of BAPLIF : below. (5) H(t) = V(t-1) + 1/τ*(-(V(t-1) - F(t-1)) + X(t)) membrane potential before decay where : F(t-1) membrane potential at the previous moment : degree of reset V(t-1) membrane potential at the previous moment X(t) synaptic input at the current moment (6) 1/τ = sigmoid(a) (7) S(t) = Θ(H(t) - Vth) same as standard LIF model (8) V(t) = H(t, 1 - S(t)) + F(t)*S(t) membrane potential composition equation, activation spike at time t where : [τ, Θ(·), Vth, S(t)] same as standard LIF model (τ not constant here) >> reviewer: The BAPLIF model (Eq. 
>> reviewer: The BAPLIF model (Eq. (5)) is similar to the classical LIF model (Eq. (1)),
assuming the rough correspondences :
   LIF                 ->  BAPLIF
   τ (time constant)   ->  τ = 1/sigmoid(a)  (time step)
   Heaviside           ->  sigmoid
   Vreset              ->  F(t-1), as per the sigmoid path rather than Heaviside (trainable)
   τ*d[t: V(t)]        ->  τ*H(t)  (here τ = delta(t) = 1 time step)
I find the nomenclature and presentation of this sequence of [term, equation]s to be
somewhat [error-prone, ambiguous] (Equations (5) to (12)), but they are OK as-is.

Key adaptations of the standard LIF model to make it more [train, adapt]able for BAPLIF are :
   Heaviside -> sigmoid
   parameters [α, β] :
      α   makes BAPLIF more dependent on external data rather than the previous output
      β   dynamic resetting (versus the original LIF static resetting)
As stated by the authors :
p7c1L0 "... This means that the introduction of dynamic resetting should result in a smaller
degree of resetting than the original full dependence on static resetting, allowing the
neuron to have a higher firing strength. At the same time, it is important to ensure that
the membrane potential after the spike emission is lower than the membrane potential before
the spike emission. Therefore, there is H(t-1) > V(t-1). ..."

So how does this differ from the standard LIF model (or, more importantly, PLIF)? It is
easier to adapt BAPLIF, and to train parameters [α, β] (gradient descent etc) to balance the
temporal aspects of [[extern, intern]al data, reset].

p4c1L20 "... The BAPLIF (Balanced and Parametric LIF Module) was designed to incorporate
temporal factors into neurons. This module takes into account the input of temporal factors
from the previous moment, which is consistent with the biological behavior of neurons. The
module is more aligned with real data and has good biological interpretability. ..."

+-----+
p5c2L37 3.3.1 Membrane Potential Representations

(9)   V(t) = F(t)*S(t) = F(t)            spike is activated, S(t) = 1
(10)  V(t) = H(t)*(1 - S(t)) = H(t)      no spike is activated, S(t) = 0

+-----+
p5c2L37 3.3.2 Composition of input at time t

(11)  X(t) = α*I(t) + (1 - α)*S(t-1)
where :
   I(t)     external input part
   S(t-1)   cyclic input part
   α        balancing factor

+-----+
p6c1L0 3.3.3 The level of reset at time t

(12)  F(t) = β*H(t) + (1 - β)*Vreset
where :
   F(t)     reset level at time t
   H(t)     dynamic reset part
   Vreset   fixed reset part, a given hyperparameter
   β        balancing factor, which is learnable

sss
p6c1L12 "... Through the above kinetic formula and the elaboration of the input composition
and reset degree, it can be seen that the previous LIF neurons did not consider the time
factor in the input composition and reset mechanism, which caused the activation degree of
the neurons to become rigid, making the activation not very efficient. ..."
>> timely comment

qqq
p6c1L0 3.3.3 The level of reset at time t
>> Can the [Vreset, β] lead to "local minima trapping", as is also common with gradient
descent? I guess I'm looking for at least some component of randomness to avoid trapping.

A numeric spot-check of the reset level Eq (12) follows below.
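+--+
>> reviewer's illustration: a Python spot-check, with assumed illustrative values
[Vth = 1, Vreset = 0, H(t-1) = 1.2, α = 0.7, I(t) = 0.8] (none from the paper), of
Eqs (11)-(12) and the p7c1L0 claim : for 0 < β < 1 and H(t-1) > Vth > Vreset, the dynamic
reset level of Eq (12) satisfies Vreset < F(t-1) < H(t-1), so V(t-1) = F(t-1) > Vreset
after a spike (Eq (9)).

Vreset, Vth = 0.0, 1.0
H_prev = 1.2                                        # a spike fired at t-1, so H(t-1) > Vth
for beta in (0.1, 0.5, 0.9):
   F_prev = beta * H_prev + (1.0 - beta) * Vreset   # Eq (12) at t-1
   assert Vreset < F_prev < H_prev                  # matches Eq (21): H(t-1) - Vreset > 0
   print(f"beta={beta}:  F(t-1) = {F_prev:.2f}  ->  Vreset < F(t-1) < H(t-1)")

alpha, I_t, S_prev = 0.7, 0.8, 1.0                  # assumed values; spike at previous step
X_t = alpha * I_t + (1.0 - alpha) * S_prev          # Eq (11): balance external vs cyclic input
+--+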
+-----+
p6c1L27 3.3.4 Characterization of α

(13)  ∂S(t)/∂α = ∂S(t)/∂H(t) * ∂H(t)/∂X(t) * ∂X(t)/∂α     chain rule

(14)  given that Θ(·) is the undifferentiable Heaviside function, a sigmoid function δ is
used as an approximation :
   sigmoid δ(x) = (1 + exp(-λ*x))^-1,   where λ is the neuron gain
   d[dx: δ(x)] = δ(x)*(1 - δ(x))
>> note: strictly, d[dx: δ(x)] = λ*δ(x)*(1 - δ(x)); the form above assumes λ = 1. As λ > 0,
the sign arguments below are unaffected.

+--+
(14a) ∂S(t)/∂H(t) = δ′(H(t) - Vth) = δ(H(t) - Vth)*(1 - δ(H(t) - Vth)) > 0
check :
   S(t) = δ(H(t) - Vth)
   H(t) = V(t-1) + 1/τ*(-(V(t-1) - F(t-1)) + X(t))     (from (5))
therefore :
   ∂S(t)/∂H(t) = ∂S(t)/∂δ(H(t) - Vth) * ∂δ(H(t) - Vth)/∂(H(t) - Vth) * ∂(H(t) - Vth)/∂H(t)
where :
   ∂S(t)/∂δ(H(t) - Vth)
      = ∂[∂δ(H(t) - Vth): S(t)]                 ("flat-line notation": no subscripts)
      = ∂[∂δ(H(t) - Vth): δ(H(t) - Vth)]
      = 1
and :
   ∂δ(H(t) - Vth)/∂(H(t) - Vth)
      = ∂[∂(H(t) - Vth): δ(H(t) - Vth)]         ("flat-line notation": no subscripts)
      = δ(H(t) - Vth)*(1 - δ(H(t) - Vth))       (see (14) above)
and :
   ∂(H(t) - Vth)/∂H(t) = 1
combined :
   ∂S(t)/∂H(t) = 1 * δ(H(t) - Vth)*(1 - δ(H(t) - Vth)) * 1
               = δ(H(t) - Vth)*(1 - δ(H(t) - Vth))
sss
>> p6c1L41 OK correct: step-by-step check of Eq (14a)

+--+
(14b) ∂H(t)/∂X(t) = 1/τ > 0
check :
   H(t) = V(t-1) + 1/τ*(-(V(t-1) - F(t-1)) + X(t))     (from (5))
therefore this is straightforward :
   ∂H(t)/∂X(t) = 1/τ
>> OK, correct, and as τ is always positive, so is the result
sss
>> p6c1L45 OK correct: step-by-step check of Eq (14b)

+--+
(14c) ∂X(t)/∂α = I(t) - S(t-1) = I(t) or I(t) - 1
(for current external inputs I(t) outside [0, 1], the sign of ∂X(t)/∂α is certain regardless
of the neuron's previous discharge)
check :
   X(t) = α*I(t) + (1 - α)*S(t-1)                      (from (11))
where :
   I(t)                        external input part
   S(t-1) = δ(H(t-1) - Vth)    cyclic input part       (from (7), surrogate form)
   α                           balancing factor
therefore this is straightforward :
   ∂X(t)/∂α = I(t) - S(t-1)
            = I(t)        if no spike was activated at the previous timestep (S(t-1) = 0)
            = I(t) - 1    if a spike was activated at the previous timestep (S(t-1) = 1)
sss
>> p6c1L47 OK correct: step-by-step check of Eq (14c)

+--+
p6c2L5 α is updated solely based on ∂Loss/∂S(t), which represents the discharge at the
current moment t :
(15)  α = α - η * ∂Loss/∂S(t) * ∂S(t)/∂α
>> Reviewer: I did NOT re-derive Eq (15) step-by-step, as I did with 14[a, b, c].

sss
p6c2L5 "... At this timestep, the value of α is updated solely based on ∂Loss/∂S(t), which
represents the discharge at the current moment t. If 0 ≤ I(t) ≤ 1, the sign of ∂X/∂α will be
influenced by the neuron discharge at the previous moment. This means that the direction of
the update of α at this time is not only related to the discharge at the current moment, but
also to the discharge at the previous moment. From this perspective, BAPLIF can implicitly
synthesize neighboring time node neuron discharges to update the parameter α, which can
better handle the temporal characteristics of the data. ..."
>> helpful comment

A numeric check of the chain rule Eqs (13), (14a-c) follows below.
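+--+
>> reviewer's illustration: a Python finite-difference check that the chain-rule product of
Eqs (14a)-(14c) matches a numeric derivative of the surrogate S(t) with respect to α. All
state values are assumed for the test (none from the paper), and λ = 1 is assumed so that
δ′ = δ(1 - δ) as written in Eq (14).

import math

def sigmoid(x):
   return 1.0 / (1.0 + math.exp(-x))

# assumed illustrative state (not from the paper)
a, Vth = 0.0, 1.0
V_prev, F_prev, I_t, S_prev = 0.3, 0.1, 0.8, 1.0
inv_tau = sigmoid(a)                                  # Eq (6)

def S_of_alpha(alpha):
   X = alpha * I_t + (1.0 - alpha) * S_prev           # Eq (11)
   H = V_prev + inv_tau * (-(V_prev - F_prev) + X)    # Eq (5)
   return sigmoid(H - Vth)                            # Eq (14) surrogate for Eq (7)

alpha = 0.6
X = alpha * I_t + (1.0 - alpha) * S_prev
H = V_prev + inv_tau * (-(V_prev - F_prev) + X)
d = sigmoid(H - Vth)
dS_dH = d * (1.0 - d)                                 # Eq (14a)
dH_dX = inv_tau                                       # Eq (14b)
dX_da = I_t - S_prev                                  # Eq (14c)
analytic = dS_dH * dH_dX * dX_da                      # Eq (13), chain rule

eps = 1e-6
numeric = (S_of_alpha(alpha + eps) - S_of_alpha(alpha - eps)) / (2.0 * eps)
print(f"analytic = {analytic:.9f},  numeric = {numeric:.9f}")   # should agree closely
+--+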
+-----+
p6c2L19 3.3.5 Characterization of β

(16)  ∂S(t)/∂β = ∂S(t)/∂H(t) * ∂H(t)/∂V(t-1) * ∂V(t-1)/∂F(t-1) * ∂F(t-1)/∂β

+--+
(17)  ∂H(t)/∂V(t-1) = 1 - 1/τ = 1 - sigmoid(a) > 0
(18)  ∂V(t-1)/∂F(t-1) = S(t-1) ≥ 0
(19)  ∂F(t-1)/∂β = H(t-1) - Vreset
>> Reviewer: I did NOT re-derive these step-by-step, as I did with 14[a, b, c].
Equations [17, 18, 19] look very [simple, similar] in approach, and appear to be correct.

+--+
p7c2L48 Focus on the case of S(t-1) = 1. It is evident that the sign of ∂S/∂β will be
entirely determined by ∂F(t-1)/∂β (the other factors in Eq (16) are positive, and Eq (18)
gives ∂V(t-1)/∂F(t-1) = 1 when S(t-1) = 1). As the discharge was carried out in the previous
moment, one can derive Eq.(20) from the membrane potential kinetics Eq.(9) provided by
BAPLIF.
(20)  V(t-1) = F(t-1) = β*H(t-1) + (1 - β)*Vreset

+--+
p7c1L0 Note that if the neuron discharges at time t-1, then H(t-1) > Vth > Vreset must be
fulfilled. This is because once Vth ≤ Vreset holds, it means that the reset degree exceeds
the neuron's spike firing threshold, and any neuron could discharge at any moment, which is
not common sense. Therefore, we can obtain
   V(t-1) > β*Vreset + (1 - β)*Vreset = Vreset
This means that the introduction of dynamic resetting should result in a smaller degree of
resetting than the original full dependence on static resetting, allowing the neuron to have
a higher firing strength. At the same time, it is important to ensure that the membrane
potential after the spike emission is lower than the membrane potential before the spike
emission. Therefore, there is H(t-1) > V(t-1). From the information provided, it is possible
to derive Eq.(21).
(21)  ∂F(t-1)/∂β = H(t-1) - Vreset > V(t-1) - Vreset > 0
Thus, if the neuron fired a spike in the previous moment (∂S/∂β > 0), the parameter update
Eq.(22) for β can be obtained.
(22)  β = β - η * ∂Loss/∂S(t) * ∂S(t)/∂β
The update direction of β is solely related to ∂Loss/∂S(t), indicating that it only
considers whether the neuron is currently activating the spike at time t or not.

+-----+
p7c2L0 Section 3.4 Graph Probability Sampling (GPS)

The feature similarity between node i and node j can be calculated using the Gaussian radial
basis kernel function, as shown in Eq.(23). This calculation requires the feature vectors
hi and hj.
(23)  l(i,j) = exp( -||hi - hj||^2 / (2*σ^2) )
As a result, an N-order matrix L can be obtained, which represents the affinity matrix of
features between all nodes in the graph. We prefer that the central node chooses neighbors
that are NOT similar to its features, introducing an evaluation matrix C.
(24)  C = 1 - L
To make the results correctable for the sampling model, the introduction of the learnable
parameter, the weight mapping matrix W, is followed by a layer of sigmoid function so that
all elements of the matrix fall between [0, 1]; the resulting matrix is denoted as T.
(25)  T = Sigmoid(W⊙C)
Meanwhile, it is necessary to perform a mask operation on T to obtain the sampling
probability matrix P, to avoid sampling non-neighboring nodes.
(26)  P = Adj⊙T
where :
   ⊙     element-by-element multiplication
   Adj   adjacency matrix
Algorithm Table 1 gives the specific procedure for GPS to sample neighboring nodes by the
elements of the probability matrix. A runnable sketch of Eqs (23)-(26) and Algorithm 1
follows below.

+--+
Algorithm 1 Graph Probability Sampling Algorithm
Input: Temporal graph G = G1, G2, ..., GT ; Time step T ; Probability matrix P ;
       Vertex set V ; Neighbor set of V is N ;
Output: Sampling sequence S(v,t), ∀v ∈ V, t = {1, ..., T};
 1: for t = 1 to T do
 2:    for v ∈ V do
 3:       The sampling probability p(t,v) is
 4:       given by the probability matrix
 5:       if p(t,v) ≤ random(0, 1) then
 6:          S(v,t) ← S(v,t-1) ∪ {n}, n ∈ Ni
 7:       else
 8:          S(v,t) ← S(v,t-1)
 9:       end if
10:    end for
11: end for
12: return The sampling sequence of length T, S(V,T)
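+--+
>> reviewer's illustration (NOT the authors' code): a minimal numpy sketch of GPS Eqs
(23)-(26) plus the per-timestep sampling of Algorithm 1. The node features h and weight
matrix W are random stand-ins, σ is an assumed kernel width, and the acceptance test uses
the conventional direction random(0,1) < p (the comparison as printed in Algorithm 1 line 5
reads inverted).

import numpy as np

def gps_probability_matrix(h, Adj, W, sigma=1.0):
   """h: (N, d) node features; Adj: (N, N) 0/1 adjacency; W: (N, N) learnable weights."""
   sq = ((h[:, None, :] - h[None, :, :]) ** 2).sum(axis=-1)
   L = np.exp(-sq / (2.0 * sigma ** 2))      # Eq (23): Gaussian RBF affinity l(i,j)
   C = 1.0 - L                               # Eq (24): prefer dissimilar neighbors
   T = 1.0 / (1.0 + np.exp(-(W * C)))        # Eq (25): Sigmoid(W ⊙ C)
   return Adj * T                            # Eq (26): P = Adj ⊙ T, mask non-neighbors

def gps_sample(P, Adj, T_steps, seed=0):
   """Algorithm 1: per time step t and vertex v, add neighbor n with probability P[v, n]."""
   rng = np.random.default_rng(seed)
   N = P.shape[0]
   S = [set() for _ in range(N)]             # S(v,0) = empty set
   sequence = []
   for t in range(T_steps):
      for v in range(N):
         for n in np.flatnonzero(Adj[v]):
            if rng.random() < P[v, n]:       # conventional acceptance test
               S[v].add(int(n))              # S(v,t) <- S(v,t-1) ∪ {n}
      sequence.append([set(s) for s in S])   # snapshot S(·,t)
   return sequence                           # sampling sequence of length T

# usage with random stand-in data :
N, d = 6, 4
rng = np.random.default_rng(1)
h = rng.normal(size=(N, d))
Adj = (rng.random((N, N)) < 0.4).astype(float)
np.fill_diagonal(Adj, 0.0)
W = rng.normal(size=(N, N))
P = gps_probability_matrix(h, Adj, W)
print(gps_sample(P, Adj, T_steps=3)[-1])
+--+
# enddoc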