A model fine-tuned for a specific task tends to get worse at the generic
task on which it was pre-trained.
This is called catastrophic forgetting.
To retain the earlier learning, you can rehearse by mixing a few samples
from the previous task with the samples from the new specific task (see the sketch below).
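A minimal sketch of this rehearsal idea in plain Python; the dataset names, batch size, and replay fraction are illustrative assumptions, not from any particular paper:

```python
import random

def build_rehearsal_batches(new_task_data, old_task_data,
                            batch_size=32, replay_fraction=0.2):
    """Yield training batches that mix new-task samples with a small
    'rehearsal' slice of old-task samples to fight catastrophic forgetting.

    new_task_data / old_task_data: lists of (input, label) pairs.
    replay_fraction: share of each batch drawn from the old task (assumed value).
    """
    n_replay = max(1, int(batch_size * replay_fraction))
    n_new = batch_size - n_replay
    new_task_data = list(new_task_data)  # copy so we don't mutate the caller's list
    random.shuffle(new_task_data)
    for start in range(0, len(new_task_data), n_new):
        batch = new_task_data[start:start + n_new]
        # Rehearse: sprinkle in a few samples from the original task.
        batch += random.sample(old_task_data, min(n_replay, len(old_task_data)))
        random.shuffle(batch)
        yield batch
```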
Are p-values really that valuable?
In hypothesis testing (I get confused every time), based on the p-value
we either reject the null hypothesis (Ho) or we fail to reject Ho.
Ho is never truly accepted.
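A small example of this decision rule, assuming SciPy is available; the two samples and the 0.05 threshold are illustrative:

```python
from scipy import stats

# Two illustrative samples; Ho: both groups have the same mean.
group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
group_b = [5.6, 5.4, 5.8, 5.5, 5.7, 5.3]

t_stat, p_value = stats.ttest_ind(group_a, group_b)

alpha = 0.05  # conventional significance level (an assumption, not a law)
if p_value < alpha:
    print(f"p = {p_value:.4f} < {alpha}: reject Ho")
else:
    print(f"p = {p_value:.4f} >= {alpha}: fail to reject Ho (never 'accept' Ho)")
```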
Inference itself is hard; statistical inference is harder still. All you
can do is rule out the other possibilities and conclude that what remains
is the right inference.
Here is a statement from the American Statistical Association: Statement
Frequentist vs Bayesian Statistical Approaches:
A comparison of the frequentist and Bayesian views of probability. Good Read.
Three ways of looking at probability:
The frequency of an event in the long run (Frequentist)
A degree of belief that an event will happen (Bayesian)
An extension of logical probability (Bayesian)
The same data can be viewed from both perspectives. The frequentist approach cannot
assign a probability to a non-repeatable event, while the Bayesian approach can
(for example, the probability that Trump wins, which is not an experiment you can repeat many times).
The frequentist approach gives a maximum likelihood estimate with a confidence interval.
But reducing it to just a p-value and a significance cutoff is the wrong way to apply it
(a quick sketch of the estimate and interval follows).
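Here is a minimal sketch of what an MLE plus confidence interval looks like for a coin-flip proportion, using only the standard library; the data and the 95% level are illustrative assumptions:

```python
import math

# Illustrative data: 1 = heads, 0 = tails.
flips = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]
n = len(flips)

# Maximum likelihood estimate of the heads probability.
p_hat = sum(flips) / n

# 95% confidence interval via the normal approximation (Wald interval).
z = 1.96  # z-score for 95% coverage
half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
print(f"MLE = {p_hat:.2f}, "
      f"95% CI = ({p_hat - half_width:.2f}, {p_hat + half_width:.2f})")
```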
One useful check when doing frequentist NHST is to take an Ho contrary
to the earlier Ho and see whether the conclusion still holds.
For most of the 20th century, frequentist NHST (Null Hypothesis Significance Testing)
was used everywhere, but often applied wrongly.
In the last 15-20 years people have realised the advantages of the Bayesian approach,
and renowned practitioners of it blame the poor quality of research that relies only
on NHST and p-values for significance.
The Bayesian approach combines prior knowledge (the prior probability of the parameter) with the data
to arrive at the posterior probability of the parameter.
$$P(\theta \mid Data) = \frac{P(Data \mid \theta) \cdot P(\theta)}{P(Data)}$$
$$P(\theta \mid Data) = \text{Posterior probability}$$
$$P(Data \mid \theta) = \text{Likelihood}$$
$$P(\theta) = \text{Prior probability}$$
$$P(Data) = \text{Evidence}$$
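To make the formula concrete, here is a small sketch of a conjugate Beta-Binomial update for a coin's heads probability; the prior parameters and the data are illustrative assumptions:

```python
# Beta(a, b) prior over theta (heads probability); Beta is conjugate to the
# Binomial likelihood, so the posterior is again a Beta with simple updates.
prior_a, prior_b = 2, 2          # mild prior belief centred on theta = 0.5

heads, tails = 7, 3              # observed data: 7 heads, 3 tails

# Posterior: Beta(a + heads, b + tails); the evidence P(Data) cancels out
# when comparing parameter values, so no integral is needed here.
post_a, post_b = prior_a + heads, prior_b + tails

posterior_mean = post_a / (post_a + post_b)
print(f"Posterior is Beta({post_a}, {post_b}); mean = {posterior_mean:.3f}")
```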
Classical vs Quantum physics:
I view frequentist vs Bayesian as similar to classical vs quantum physics.
Classical physics applies and scales well to large objects,
while quantum physics is more powerful and more general: classical physics is just
a special case, derivable from quantum physics.
Just as quantum physics gives the probability of finding an electron at different places,
the Bayesian approach gives the posterior probability of a parameter given its prior and the evidence.