Language Models are Few-Shot Learners
(arxiv.org)

Abstract
Scaling up language models greatly improves task-agnostic, few-shot performance.
This few-shot performance is reached without any gradient updates or fine-tuning; the task is specified to the model purely as text (see the sketch below).
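"Few-shot" here means the frozen model is conditioned on a handful of solved demonstrations placed directly in the prompt, and the weights are never updated. The following is a minimal sketch of how such a prompt could be assembled; the translation task mirrors the example format used in the paper, while the `build_few_shot_prompt` helper and the `generate` call are illustrative assumptions, not the paper's code.

```python
# Minimal sketch of few-shot "in-context learning": the task is specified
# purely via text, with no gradient updates or fine-tuning.
# build_few_shot_prompt and generate() are hypothetical names for illustration.

def build_few_shot_prompt(task_description, demonstrations, query):
    """Concatenate a task description, K solved demonstrations, and a new query."""
    parts = [task_description]
    for source, target in demonstrations:
        parts.append(f"{source} => {target}")
    parts.append(f"{query} =>")  # the model is asked to continue this line
    return "\n".join(parts)

demonstrations = [
    ("sea otter", "loutre de mer"),
    ("cheese", "fromage"),
]
prompt = build_few_shot_prompt(
    "Translate English to French:", demonstrations, "peppermint"
)
print(prompt)
# The frozen language model would then complete the prompt, e.g.:
# completion = generate(model, prompt)  # hypothetical call; weights stay fixed
```

The point of the sketch is that everything task-specific lives in the prompt string: changing the task means changing the text, not running any training step.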