A model fine-tuned for a specific task tends to get worse at the generic
task on which it was pre-trained.
This is called catastrophic forgetting.
To retain the earlier learning, you can rehearse by mixing a few samples
from the previous task with the samples from the new specific task (see the sketch below).
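A minimal sketch of this rehearsal idea in plain Python; the dataset names, batch size, and replay fraction are illustrative assumptions, not from any particular paper:

```python
import random

def build_rehearsal_batches(new_task_data, old_task_data,
                            batch_size=32, replay_fraction=0.2):
    """Yield training batches that mix new-task samples with a small
    'rehearsal' slice of old-task samples to fight catastrophic forgetting.

    new_task_data / old_task_data: lists of (input, label) pairs.
    replay_fraction: share of each batch drawn from the old task (assumed value).
    """
    n_replay = max(1, int(batch_size * replay_fraction))
    n_new = batch_size - n_replay
    new_task_data = list(new_task_data)  # copy so we don't mutate the caller's list
    random.shuffle(new_task_data)
    for start in range(0, len(new_task_data), n_new):
        batch = new_task_data[start:start + n_new]
        # Rehearse: sprinkle in a few samples from the original task.
        batch += random.sample(old_task_data, min(n_replay, len(old_task_data)))
        random.shuffle(batch)
        yield batch
```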
Are p-values really that valuable?
In hypothesis testing (I get confused every time), based on the p-value
we either reject the null hypothesis (Ho) or we fail to reject Ho.
Ho is never truly accepted.
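A small example of this decision rule, assuming SciPy is available; the two samples and the 0.05 threshold are illustrative:

```python
from scipy import stats

# Two illustrative samples; Ho: both groups have the same mean.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
group_b = [5.6, 5.4, 5.8, 5.5, 5.7, 5.3]

t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05  # conventional significance level (an assumption, not a law)
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject Ho")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject Ho (never 'accept' Ho)")
```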
Inference itself is hard; statistical inference is harder still. All you
can do is rule out the other possibilities and conclude that what remains
is the right inference.
Here is a statement from the American Statistical Association: Statement
Frequentist vs Bayesian Statistical Approaches:
A comparison of the frequentist and Bayesian views of probability. Good Read.
Three ways of looking at probability:
The frequency of an event in the long run (Frequentist)
A degree of belief that an event will happen (Bayesian)
An extension of logical probability (Bayesian)
The same data can be viewed from both perspectives. The frequentist approach cannot
assign a probability to a non-repeatable event, while the Bayesian approach can
(for example, the probability that Trump wins, which is not an experiment you can repeat many times).
The frequentist approach gives a maximum likelihood estimate with a confidence interval.
But reducing it to just a p-value and a significance cutoff is the wrong way to apply it
(a quick sketch of the estimate and interval follows).
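Here is a minimal sketch of what an MLE plus confidence interval looks like for a coin-flip proportion, using only the standard library; the data and the 95% level are illustrative assumptions:

```python
import math

# Illustrative data: 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
n = len(flips)

# Maximum likelihood estimate of the heads probability.
p_hat = sum(flips) / n

# 95% confidence interval via the normal approximation (Wald interval).
z = 1.96  # z-score for 95% coverage
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"MLE = {p_hat:.2f}, "
      f"95% CI = ({p_hat - half_width:.2f}, {p_hat + half_width:.2f})")
```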
One useful check when doing frequentist NHST is to take an Ho contrary
to the earlier Ho and see whether the conclusion still holds.
For most of the 20th century, frequentist NHST (Null Hypothesis Significance Testing)
was used everywhere, but often applied wrongly.
In the last 15-20 years people have realised the advantages of the Bayesian approach,
and renowned practitioners of it blame the poor quality of research that relies only
on NHST and p-values for significance.
The Bayesian approach combines prior knowledge (the prior probability of the parameter) with the data
to arrive at the posterior probability of the parameter.
$$P(\theta \mid Data) = \frac{P(Data \mid \theta) \cdot P(\theta)}{P(Data)}$$
$$P(\theta \mid Data) = \text{Posterior probability}$$
$$P(Data \mid \theta) = \text{Likelihood}$$
$$P(\theta) = \text{Prior probability}$$
$$P(Data) = \text{Evidence}$$
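To make the formula concrete, here is a small sketch of a conjugate Beta-Binomial update for a coin's heads probability; the prior parameters and the data are illustrative assumptions:

```python
# Beta(a, b) prior over theta (heads probability); Beta is conjugate to the
# Binomial likelihood, so the posterior is again a Beta with simple updates.
prior_a, prior_b = 2, 2          # mild prior belief centred on theta = 0.5

heads, tails = 7, 3              # observed data: 7 heads, 3 tails

# Posterior: Beta(a + heads, b + tails); the evidence P(Data) cancels out
# when comparing parameter values, so no integral is needed here.
post_a, post_b = prior_a + heads, prior_b + tails

posterior_mean = post_a / (post_a + post_b)
print(f"Posterior is Beta({post_a}, {post_b}); mean = {posterior_mean:.3f}")
```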
Classical vs Quantum physics:
I view frequentist vs Bayesian as similar to classical vs quantum physics.
Classical physics applies and scales well to large objects,
while quantum physics is more powerful and more general: classical physics is just
a special case, derivable from quantum physics.
Just as quantum physics gives the probability of finding an electron at different places,
the Bayesian approach gives the posterior probability of a parameter given its prior and the evidence.