CptPicard

December 6th, 2008, 09:08 PM

Has anyone here been playing with backprop? I have been puzzling over an adaptation of the procedure for a long time now and am feeling a little insecure in my understanding...

Now, in traditional backpropagation we work with an error function: the output unit gets a local error computed for it, which is then pushed back over the network and distributed over the links as we go. So far so good; we get a gradient for minimizing our error function.
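Just to make sure we're talking about the same thing, here's roughly what I mean, as a toy sketch (my own made-up example, a 1-hidden-layer sigmoid net with squared error; the variable names are mine, not from any particular library):

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 2))   # hidden-layer weights (3 hidden units, 2 inputs)
W2 = rng.normal(size=(1, 3))   # output-layer weights (1 output unit)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, target = np.array([0.5, -0.3]), np.array([1.0])

# Forward pass
h = sigmoid(W1 @ x)            # hidden activations
y = sigmoid(W2 @ h)            # network output

# Local error at the output: dE/dz for E = 0.5*(y - target)^2,
# with the sigmoid derivative y*(1-y) folded in
delta_out = (y - target) * y * (1 - y)
grad_W2 = np.outer(delta_out, h)

# Push the error back over the links to the hidden layer
delta_hid = (W2.T @ delta_out) * h * (1 - h)
grad_W1 = np.outer(delta_hid, x)
```

So `grad_W1` and `grad_W2` together are the gradient of the error with respect to the weights.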

I also keep coming across claims that backprop lets us compute a gradient for the actual value of the function represented by the NN. But all the implementations and detailed descriptions of the algorithm work with the error, and this variant seems to be left "as an exercise for the reader". :)

In particular, see here: http://www.cs.ualberta.ca/~sutton/book/11/node2.html

http://www.cs.ualberta.ca/~sutton/book/11/img27.gif

"The gradient in this equation can be computed efficiently by the backpropagation procedure."

Umm... ok. So how do we do that? :)

So far I've been working under the assumption that I just replace the error with the value: the "local error" becomes 1 at the output node, and this "effect" is then pushed back analogously to the usual backprop algorithm. Am I totally mistaken, or is this how it's supposed to be done?
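Concretely, what I have in mind is the following (again my own sketch of what I *think* is meant, not anything taken from Sutton's book): seed the backward pass with 1 at the output instead of the error term, and propagate exactly as usual. The finite-difference check at the end is how I've been convincing myself it really is the gradient of the output value.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 2))   # hidden-layer weights
W2 = rng.normal(size=(1, 3))   # output-layer weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -0.3])
h = sigmoid(W1 @ x)
y = sigmoid(W2 @ h)

# The "local error" at the output is just 1 (dy/dy = 1); the sigmoid's own
# derivative y*(1-y) still applies when going from activation to net input.
delta_out = 1.0 * y * (1 - y)
grad_W2 = np.outer(delta_out, h)                 # dy/dW2
delta_hid = (W2.T @ delta_out) * h * (1 - h)
grad_W1 = np.outer(delta_hid, x)                 # dy/dW1

# Sanity check against a finite difference on one weight
eps = 1e-6
W1p = W1.copy(); W1p[0, 0] += eps
y_p = sigmoid(W2 @ sigmoid(W1p @ x))
numerical = float((y_p - y)[0]) / eps
```

If that's right, then `grad_W1[0, 0]` and `numerical` should agree, and the whole thing is just ordinary backprop with the error term swapped out for 1.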
