Implementation details of Backpropagation in Siamese networks. [D]

Reddit r/MachineLearning / 4/13/2026

💬 Opinion · Ideas & Deep Analysis · Models & Research

Key Points

  • A Reddit thread asks how to correctly implement backpropagation for Siamese networks, noting that the original 1993 paper lacks detailed guidance for training procedures.
  • The question compares two training approaches: running inputs sequentially and updating weights after computing loss on the last pair versus processing two inputs in parallel like a bi-encoder and backpropagating jointly.
  • It highlights a core concern about whether and how gradients should be shared across the twin networks that must use the same parameters.
  • The discussion is framed as clarification on the correct training loop rather than proposing a new method or reporting a newly released system.

Hey Folks,
Could someone please share the correct implementation of backprop in Siamese networks? The explanation in the original paper is not very detailed.

I found a random implementation on GitHub, ref. The inputs are passed one after the other, the loss is computed for the last two inputs, and the weights are updated afterward. Is this the correct implementation?
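For concreteness, here is a minimal sketch of that sequential scheme in PyTorch: one network, both inputs passed through it in turn, loss computed on the pair, one weight update. The encoder architecture, contrastive loss, and margin are illustrative assumptions, not details from the linked repo.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical small encoder; the real architecture is unspecified.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
opt = torch.optim.SGD(encoder.parameters(), lr=0.1)

x1, x2 = torch.randn(4, 16), torch.randn(4, 16)
y = torch.randint(0, 2, (4,)).float()  # 1 = similar pair, 0 = dissimilar

z1 = encoder(x1)  # first input through the network
z2 = encoder(x2)  # second input through the SAME network
d = F.pairwise_distance(z1, z2)

# Contrastive loss (assumed; margin chosen arbitrarily).
margin = 1.0
loss = (y * d.pow(2) + (1 - y) * F.relu(margin - d).pow(2)).mean()

opt.zero_grad()
loss.backward()  # gradients from both forward passes accumulate
opt.step()       # single update of the one set of weights
```

Because both inputs go through the same `encoder`, autograd accumulates gradient contributions from both forward passes into the same parameters before the single `opt.step()`.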

Another implementation I can think of is to keep two copies of the same network, like a bi-encoder. The two inputs are passed simultaneously, the loss is backprop'd, weights are updated for both networks, and both networks' weights are replaced with the aggregate (mean) of the two before the next forward pass.
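A sketch of that two-copy scheme, purely to pin down what is being described (same hypothetical encoder and contrastive loss as above; nothing here is from the paper):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

net_a = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
net_b = copy.deepcopy(net_a)  # identical initial weights
opt_a = torch.optim.SGD(net_a.parameters(), lr=0.1)
opt_b = torch.optim.SGD(net_b.parameters(), lr=0.1)

x1, x2 = torch.randn(4, 16), torch.randn(4, 16)
y = torch.randint(0, 2, (4,)).float()

z1, z2 = net_a(x1), net_b(x2)  # parallel forward passes, one per copy
d = F.pairwise_distance(z1, z2)
loss = (y * d.pow(2) + (1 - y) * F.relu(1.0 - d).pow(2)).mean()

opt_a.zero_grad(); opt_b.zero_grad()
loss.backward()                # each copy receives its own gradients
opt_a.step(); opt_b.step()

# Replace both copies with their mean so they stay tied
# for the next forward pass, as described above.
with torch.no_grad():
    for pa, pb in zip(net_a.parameters(), net_b.parameters()):
        mean = (pa + pb) / 2
        pa.copy_(mean)
        pb.copy_(mean)
```

The averaging step keeps the two copies identical between iterations; whether this is equivalent to the single-shared-network version is exactly the question being asked.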

Which one is correct?
Please clarify.

submitted by /u/red_dhinesh_it