This kind of work is limited by finite training data much more than language models are: there are about 150,000 known protein structures, though a lot of them are very close homologues, often ...