Beyond heuristics: take token weighting as optimization, not guesswork. Improving both in-domain accuracy and out-of-domain generalization. Serves as a more effective initialization for subsequent RL.