Aug 05, 2020
Stanford CS330: Multi-Task and Meta-Learning, 2019 | Lecture 2 - Multi-Task & Meta-Learning Basics
Notes:
- While training a classification NN, we learn θ* = argmin_θ L(θ, f, D), i.e. the parameters that minimize the loss
- The final layer predicts P(y|x)
- What if we compute P(y|x) at earlier layers to reduce network computation? (see the sketch after this list)
- How far back in the layers can we go?
- Will it actually reduce computation and still optimize state-of-the-art NNs?
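A minimal PyTorch sketch of the idea above, my own illustration rather than anything from the lecture: a classifier trained to argmin_θ L(θ, D), with an extra "early-exit" head reading P(y|x) off an intermediate layer. The `EarlyExitMLP` class, layer sizes, and the 0.3 early-exit loss weight are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EarlyExitMLP(nn.Module):
    """Two-block MLP with a classification head at each block (hypothetical example)."""
    def __init__(self, in_dim=784, hidden=256, n_classes=10):
        super().__init__()
        self.block1 = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.block2 = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())
        self.early_head = nn.Linear(hidden, n_classes)  # P(y|x) from an earlier layer
        self.final_head = nn.Linear(hidden, n_classes)  # P(y|x) from the final layer

    def forward(self, x):
        h1 = self.block1(x)
        h2 = self.block2(h1)
        return self.early_head(h1), self.final_head(h2)

model = EarlyExitMLP()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))  # dummy batch D
early_logits, final_logits = model(x)
# Joint loss L(theta, D); the 0.3 weighting on the early exit is an arbitrary choice
loss = F.cross_entropy(final_logits, y) + 0.3 * F.cross_entropy(early_logits, y)
opt.zero_grad(); loss.backward(); opt.step()
```

At inference, reading predictions from `early_head` skips `block2` entirely, which is where the computation savings would come from.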
When to apply?
- Maybe you have conflicting objectives (tasks) and want to optimize for all of them, or just one (see the sketch below)
- E.g., YouTube recommendations
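A hedged sketch of optimizing conflicting objectives with one shared model, combining the per-task losses as L(θ) = Σ_i w_i · L_i(θ). The task names ("click", "watch_time") and weights are made up, loosely inspired by the YouTube-recommendation example:

```python
import torch
import torch.nn as nn

shared = nn.Linear(64, 32)                   # shared representation
heads = {"click": nn.Linear(32, 1), "watch_time": nn.Linear(32, 1)}
weights = {"click": 1.0, "watch_time": 0.5}  # trade-off knob (hypothetical values)

x = torch.randn(16, 64)
targets = {"click": torch.rand(16, 1), "watch_time": torch.rand(16, 1)}

h = torch.relu(shared(x))
# Weighted sum of per-task losses; raising one weight biases training toward that task
loss = sum(weights[t] * nn.functional.mse_loss(heads[t](h), targets[t])
           for t in heads)
loss.backward()  # one gradient step trades off both objectives
```

Setting one weight to zero recovers single-task training, so the same setup covers the "optimize for all" and "optimize for one" cases.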
Generative vs. Discriminative:

Generative | Discriminative |
---|---|
argmax_θ P(θ \| Data) | argmin_θ L(θ, Data) |
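A minimal sketch of the two columns in practice, my own example rather than the lecture's: a generative classifier (Gaussian Naive Bayes, which models the data distribution) next to a discriminative one (logistic regression, which directly minimizes a loss over θ). The dataset and model choices are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
gen = GaussianNB().fit(X, y)            # generative: models the data, P(x, y)
disc = LogisticRegression().fit(X, y)   # discriminative: argmin_θ L(θ, Data)
print(gen.score(X, y), disc.score(X, y))
```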