Discussion about this post

Tom Dietterich:

I like this analysis. The story in ImpossibleDistillation is a bit more complex than simple generate-and-filter. ImpossibleDistillation combines information from four models:

1. The initial generation model L_LM

2. The keyword extraction model (KeyBERT)

3. The Natural Language Inference model (RoBERTa-Large fine-tuned on WANLI)

4. The model L_0 that is fine-tuned (T5-large)

Items 2 and 3 form part of the filter chain, which also includes additional encoded knowledge defining the task (summarization vs. paraphrase, compression, etc.). Viewed abstractly, this shows that we can use initial LMs in both the generation and filtering phases, combined with additional human input, to create a target system that is better constrained to the specific task.
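Concretely, you can picture it as a generate-then-filter loop. Here is a rough sketch in Python; the model names, thresholds, and the specific keyword/entailment checks are my own illustrative assumptions, not the paper's exact setup:

```python
# Rough sketch of the generate-then-filter loop described above.
# Model names, thresholds, and filter details are illustrative assumptions.
import torch
from keybert import KeyBERT
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

generator = pipeline("text-generation", model="gpt2-large")   # item 1: initial generation LM
kw_model = KeyBERT()                                           # item 2: keyword extraction
nli_name = "roberta-large-mnli"                                # item 3: NLI filter (the paper uses a WANLI-tuned checkpoint)
nli_tok = AutoTokenizer.from_pretrained(nli_name)
nli_model = AutoModelForSequenceClassification.from_pretrained(nli_name)

def generate_candidates(context: str, n: int = 8) -> list[str]:
    """Sample n candidate outputs (e.g. summaries/paraphrases) from the initial LM."""
    outs = generator(context, max_new_tokens=40, num_return_sequences=n,
                     do_sample=True, return_full_text=False,
                     pad_token_id=generator.tokenizer.eos_token_id)
    return [o["generated_text"].strip() for o in outs]

def entailment_prob(premise: str, hypothesis: str) -> float:
    """Probability that the premise entails the hypothesis, per the NLI model."""
    enc = nli_tok(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = nli_model(**enc).logits.softmax(dim=-1)[0]
    return probs[nli_model.config.label2id.get("ENTAILMENT", 2)].item()

def keep(source: str, candidate: str, kw_overlap: int = 1, entail_thresh: float = 0.9) -> bool:
    """Filter chain: require shared keywords (topical faithfulness) and entailment."""
    src_kws = {k for k, _ in kw_model.extract_keywords(source, top_n=5)}
    cand_kws = {k for k, _ in kw_model.extract_keywords(candidate, top_n=5)}
    if len(src_kws & cand_kws) < kw_overlap:
        return False
    return entailment_prob(source, candidate) >= entail_thresh

# (source, candidate) pairs that survive the filters become the training data
# on which the target model (item 4, e.g. T5-large) is fine-tuned.
```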

We are paying for the "free lunch" in many ways (training corpora for the various models, human-provided task definition, etc.). If we iterate the process, the gains will diminish once we have fully incorporated the constraints from the various input models into the target model.

Byndnglsh:

Although your weekly summaries are essential reading, this analysis is among the best I've read in ages. Keep up the fantastic work!

