An alternative approach to the transformer model for text generation

Can fractal patterns help us create a more efficient text generation model? Photo by Giulia May on Unsplash

Since the release of ChatGPT at the end of November 2022, LLMs (Large Language Models) have almost become a household name.

Worldwide search interest for ‘LLM’. Source: Google Trends

There is good reason for this: their success lies in their architecture, particularly the attention mechanism, which allows the model to compare every word it processes with every other word.

This gives LLMs the extraordinary ability to understand and generate human-like text that we are all familiar with.
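To make the comparison concrete, here is a minimal NumPy sketch of scaled dot-product attention, the core operation behind the mechanism. The function name, shapes, and toy values are my own illustration, not code from any particular model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Compare every token's query against every token's key,
    then blend the value vectors according to the resulting weights."""
    d_k = Q.shape[-1]
    # Pairwise similarity between all tokens: an (n_tokens x n_tokens) matrix.
    scores = Q @ K.T / np.sqrt(d_k)
    # Softmax over each row turns the scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output token is a weighted mix of every token's value vector.
    return weights @ V

# Toy example: 4 tokens with 8-dimensional embeddings (values are illustrative only).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
output = scaled_dot_product_attention(x, x, x)  # self-attention: Q, K, V from the same tokens
print(output.shape)  # (4, 8)
```

Note that the score matrix holds one entry for every pair of tokens, so the work needed to compute it grows quadratically with sequence length; that blow-up is a large part of the computational cost discussed next.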

However, these models are not without flaws. They demand immense computational resources to train. For example, Meta’s Llama 3 model took 7.7 million GPU hours of training[1]. Moreover, their reliance on enormous datasets — spanning trillions of tokens — raises questions about scalability, accessibility, and environmental impact.

Despite these challenges, ever since the paper ‘Attention Is All You Need’ appeared in mid-2017, much of the recent progress in AI has focused on scaling the attention mechanism further rather than exploring fundamentally new architectures.