GTD’s convergence is possible because the O.D.E is stablized by $A^TA$ instead of the problematic $A$ in TD for off-policy learning.
Was “NEU” new in what sense?