The deep learning flow goes like this.
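Expressed as a formula, it is D(S(Wx+b), L), where x is the input, W and b are the weights and bias, S is the softmax function, D is the cross-entropy function, and L is the label.
So it is multinomial logistic classification.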
A smaller value of the cross-entropy function D means the prediction is closer to the correct classification.
So the smaller the sum of D over all the inputs and their labels, the better the W and b we have found.
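
To make the formula concrete, here is a minimal sketch in plain Python of the sum of D(S(Wx+b), L) over all inputs. The function names (softmax, cross_entropy, linear, total_loss) and the list-of-rows layout for W are my own assumptions for illustration, and the labels are assumed to be one-hot.

```python
import math

def softmax(scores):
    # S: turn raw scores into probabilities that sum to 1.
    # Subtracting the max score keeps exp() from overflowing.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot_label):
    # D(S, L) = -sum_i L_i * log(S_i); only entries with L_i > 0 contribute.
    return -sum(l * math.log(p) for p, l in zip(probs, one_hot_label) if l > 0)

def linear(W, x, b):
    # Wx + b, with W stored as a list of rows (one row per class).
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def total_loss(W, b, inputs, labels):
    # Sum of D(S(Wx+b), L) over every (input, label) pair.
    return sum(cross_entropy(softmax(linear(W, x, b)), L)
               for x, L in zip(inputs, labels))
```

With all-zero W and b, every class gets the same probability, so the loss for one example with two classes is log(2). Weights that push the correct class's score up give a smaller value.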
Then how do we find the optimal values of W and b?
That is an optimization problem.
It is usually solved with gradient descent, which uses the derivative of the loss with respect to W and b.
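
As a rough illustration of gradient descent for this loss, here is a small pure-Python sketch. It relies on the standard result that, for softmax followed by cross-entropy with one-hot labels, the derivative of D with respect to the scores Wx+b is S - L. The function names, the toy data, the learning rate of 0.5, and the 100 steps are all assumptions I made up for the example, not values from any particular course or library.

```python
import math

def softmax(scores):
    # Subtract the max score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    # S(Wx + b), with W stored as a list of rows (one row per class).
    return softmax([sum(w * xj for w, xj in zip(row, x)) + bi
                    for row, bi in zip(W, b)])

def loss(W, b, inputs, labels):
    # Sum of the cross-entropy D(S(Wx+b), L) over all examples.
    total = 0.0
    for x, L in zip(inputs, labels):
        S = predict(W, b, x)
        total -= sum(l * math.log(s) for s, l in zip(S, L) if l > 0)
    return total

def gradient_step(W, b, inputs, labels, learning_rate):
    # Accumulate dLoss/dW and dLoss/db over all examples.
    grad_W = [[0.0] * len(W[0]) for _ in W]
    grad_b = [0.0] * len(b)
    for x, L in zip(inputs, labels):
        S = predict(W, b, x)
        for i in range(len(W)):
            delta = S[i] - L[i]          # derivative w.r.t. score i
            grad_b[i] += delta
            for j in range(len(x)):
                grad_W[i][j] += delta * x[j]
    # Move W and b a small step against the gradient.
    new_W = [[w - learning_rate * g for w, g in zip(row, grow)]
             for row, grow in zip(W, grad_W)]
    new_b = [bi - learning_rate * g for bi, g in zip(b, grad_b)]
    return new_W, new_b

# Toy data (made up): two inputs, two classes, one-hot labels.
inputs = [[1.0, 0.0], [0.0, 1.0]]
labels = [[1, 0], [0, 1]]
W = [[0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0]

initial_loss = loss(W, b, inputs, labels)
for _ in range(100):
    W, b = gradient_step(W, b, inputs, labels, 0.5)
final_loss = loss(W, b, inputs, labels)
```

After the loop, final_loss is smaller than initial_loss, which is the whole point: each step nudges W and b in the direction that reduces the sum of D.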
Sorry, my English is not good. ^^