Can Transformers Solve Everything?

Harys Dalvi

Looking into the math and the data reveals that transformers are both overused and underused.

Transformers are best known for their applications in natural language processing. They were originally designed for translating between languages,[1] and are now most famous for their use in large language models like ChatGPT (the GPT stands for generative pretrained transformer).

But since their introduction, transformers have been applied to ever more tasks, with great results. These include image recognition,[2] reinforcement learning,[3] and even weather prediction.[4]

Even the seemingly specific task of language generation with transformers holds a number of surprises, as we’ve already seen. Large language models have emergent properties that feel more intelligent than simply predicting the next word. For example, they may recall facts about the world, or replicate the nuances of a person’s style of speech.
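
To make “predicting the next word” concrete, here is a minimal sketch of next-token prediction with a small pretrained transformer. It uses GPT-2 through the Hugging Face transformers library; the model and the prompt are illustrative choices, not anything specific to this article.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load a small pretrained transformer language model (GPT-2, purely for illustration)
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The Eiffel Tower is located in the city of"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocab_size)
    logits = model(**inputs).logits

# Probability distribution over the vocabulary for the very next token
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")
```

On a well-trained model, most of the probability mass typically lands on “ Paris”: the kind of world knowledge that emerges purely from training on next-word prediction.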

The success of transformers has led some people to ask whether transformers can do everything. If transformers generalize to so many tasks, is there any reason not to use a transformer?

Clearly, there is still a case for other machine learning models and, as is often forgotten these days, non-machine learning models and human intellect. But transformers do have a number of unique properties, and have shown incredible results so far. There is also a considerable mathematical and empirical basis…