Hyper-Connections (HC) widen the usual single residual shortcut in neural networks into several parallel streams and let the model learn how to mix them, but that extra mixing can become unstable when many layers are stacked.
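To make the idea concrete, here is a minimal sketch of one hyper-connection block, simplified from the description above. The names (`n_streams`, `alpha`, `beta`, `mix`) and the exact update are illustrative assumptions, not the paper's precise formulation: the point is only that several residual streams coexist and a small learned matrix mixes them around the wrapped layer.

```python
import torch
import torch.nn as nn

class HyperConnection(nn.Module):
    """Minimal sketch of a hyper-connection block (simplified, static variant).

    Instead of one residual stream, we keep `n_streams` parallel copies of the
    hidden state and let learned weights mix them around a wrapped layer.
    """

    def __init__(self, n_streams: int, dim: int, layer: nn.Module):
        super().__init__()
        self.layer = layer
        # alpha: how much each stream contributes to the wrapped layer's input
        self.alpha = nn.Parameter(torch.ones(n_streams) / n_streams)
        # beta: how the layer's output is written back into each stream
        self.beta = nn.Parameter(torch.ones(n_streams))
        # mix: the small n x n matrix that mixes the parallel streams themselves
        self.mix = nn.Parameter(torch.eye(n_streams))

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (batch, n_streams, dim)
        x = torch.einsum("n,bnd->bd", self.alpha, streams)      # collapse streams into one layer input
        y = self.layer(x)                                        # wrapped layer, e.g. attention or MLP
        mixed = torch.einsum("mn,bnd->bmd", self.mix, streams)   # mix the parallel streams
        return mixed + self.beta[None, :, None] * y[:, None, :]  # write the output back to every stream
```

In use, the input would be copied into `n_streams` streams at the first layer and averaged back into a single vector at the last; it is the repeated application of `mix` across depth that the stability concern is about.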
Large AI models have mostly improved by growing wider or by reading longer inputs, but the gains from those two levers are slowing down, which is why adding parallel streams to the residual path is attractive as another axis to scale.
The paper fixes this stability problem by gently steering each layer's stream-mixing matrix onto a safe set of matrices (a manifold) on which signals neither blow up nor vanish as depth grows.
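A rough sketch of what "gently steering onto a norm-preserving manifold" could look like is below. The choice of the orthogonal manifold and the interpolation step are my assumptions for illustration; the summary above only says the matrix is nudged toward a shape where signal norms are preserved, and orthogonal matrices are one such shape.

```python
import torch

@torch.no_grad()
def steer_toward_isometry(mix: torch.Tensor, step: float = 0.1) -> torch.Tensor:
    """Gently pull a stream-mixing matrix toward the nearest orthogonal matrix.

    Orthogonal matrices preserve vector norms, so repeated mixing across many
    layers neither amplifies nor shrinks the streams. A small `step` keeps the
    steering gentle rather than a hard projection.
    """
    u, _, vh = torch.linalg.svd(mix)     # polar factor via SVD
    nearest_orthogonal = u @ vh          # closest isometry in Frobenius norm
    return (1.0 - step) * mix + step * nearest_orthogonal

# Hypothetical usage: applied to the `mix` parameter of the sketch above,
# e.g. after each optimizer step during training.
mix = torch.randn(4, 4)
for _ in range(50):
    mix = steer_toward_isometry(mix, step=0.1)
print(torch.linalg.svdvals(mix))  # singular values drift toward 1
```

Under this iteration the singular values of `mix` shrink or grow toward 1, which is exactly the "no blow-up, no vanishing" condition the summary describes, while the gradual step avoids abruptly overriding what the model has learned.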