The sixth anniversary of Transformer: it didn't even get a NeurIPS Oral, yet its 8 authors have founded several AI unicorns
Today marks the sixth anniversary of the submission of the famous Transformer paper.
Six years ago, a paper with a rather grandiose title was uploaded to the preprint platform arXiv. The phrase "xx is All You Need" has since been echoed by developers throughout the AI field and has even become a fashion in paper titles, and "Transformer" no longer brings to mind the Transformers franchise; it now stands for the most advanced technology in AI.
Six years on, looking back at that paper reveals many interesting or little-known details, as summarized by Jim Fan, an AI scientist at Nvidia.
The Transformer model abandons the traditional CNN and RNN units; the entire network structure is built from attention mechanisms.
Although the Transformer paper is titled "Attention is All You Need", and it is largely because of that paper that the attention mechanism keeps being promoted, note an interesting fact: the Transformer researchers did not invent attention; rather, they pushed the mechanism to its extreme.
The attention mechanism was proposed in 2014 by a team led by deep learning pioneer Yoshua Bengio.
In their ICLR 2015 paper, Bengio et al. proposed combining an RNN with a "context vector" (i.e., attention). Although it is one of the greatest milestones in NLP, it is far less well known than Transformer: Bengio's team's paper has been cited about 29,000 times to date, while the Transformer paper has about 77,000.
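To make the idea concrete, here is a minimal NumPy sketch of that kind of RNN-plus-context-vector (additive) attention. The function names, weight shapes, and toy data are illustrative assumptions, not taken from the original paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Score each encoder state against the current decoder state,
    then build a weighted "context vector" from the encoder states."""
    # encoder_states: (seq_len, hidden); decoder_state: (hidden,)
    scores = np.tanh(encoder_states @ W_enc + decoder_state @ W_dec) @ v  # (seq_len,)
    weights = softmax(scores)            # how much to attend to each source position
    context = weights @ encoder_states   # (hidden,) weighted sum = the context vector
    return context, weights

# Toy example with random states (illustrative only)
rng = np.random.default_rng(0)
hidden = 8
enc = rng.normal(size=(5, hidden))       # 5 source positions
dec = rng.normal(size=hidden)
W_enc, W_dec = rng.normal(size=(hidden, hidden)), rng.normal(size=(hidden, hidden))
v = rng.normal(size=hidden)
context, weights = additive_attention(dec, enc, W_dec, W_enc, v)
print(weights)  # sums to 1; larger entries mark the source positions the decoder "attends" to
```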
An organism that refuses to let go of any local information inevitably does a great deal of useless work, which is not conducive to survival. Likewise, introducing a similar mechanism in deep learning networks can simplify models and speed up computation. In essence, attention filters a small amount of important information out of a large amount of information, focuses on that important information, and ignores most of the rest.
In recent years, the attention mechanism has been widely used across deep learning, for example in computer vision to capture receptive fields on images, or in NLP to locate key tokens or features. A large number of experiments have shown that models equipped with attention achieve significant performance gains in tasks such as image classification, segmentation, tracking, and enhancement, as well as natural language recognition, understanding, question answering, and translation.
The Transformer model built on the attention mechanism can be regarded as a general-purpose sequence computer. When processing an input sequence, attention lets the model assign different weights to different positions according to how relevant they are to one another, which enables the Transformer to capture long-range dependencies and contextual information and thereby improves sequence processing.
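As a rough sketch of how those weights are computed inside the Transformer, the scaled dot-product attention at its core can be written in a few lines of NumPy (a single head, no masking; the dimensions and data below are illustrative assumptions):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each output position is a weighted
    mix of all value vectors, so distant positions can influence each other."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len) relevance of every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax: the attention weights
    return weights @ V, weights                        # outputs (seq_len, d_v) and the weight matrix

# Toy input: a sequence of 6 token embeddings of size 16 (random, illustrative)
rng = np.random.default_rng(1)
x = rng.normal(size=(6, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(attn.shape)  # (6, 6): how strongly each position attends to every other position
```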
At the time, however, neither the Transformer paper nor the original attention paper spoke of a general-purpose sequence computer. Instead, the authors framed attention as a mechanism for solving a narrow, specific problem: machine translation. So when we trace the origins of AGI in the future, we may well trace them back to the "inconspicuous" Google Translate.
Although it was accepted by NeurIPS 2017, it didn't even get an Oral
Although the Transformer paper is enormously influential now, it didn't even get an Oral, let alone an award, at NeurIPS 2017, the world's top AI conference. That year the conference received 3,240 submissions, of which 678 were accepted; the Transformer paper was one of them. Among the accepted papers, 40 were Orals and 112 were Spotlights, and three Best Paper awards plus one Test of Time award were given; the Transformer paper received none of them.
Although it missed out on the NeurIPS 2017 awards, Transformer's influence is plain for all to see.
Jim Fan commented that it is hard to recognize the importance of an influential study before it becomes influential, and that this is not the reviewers' fault. Still, some papers are lucky enough to be recognized right away: ResNet, proposed by Kaiming He and colleagues, won the Best Paper award at CVPR 2016, recognition that was well deserved and correctly given by the top AI conference. But back in 2017, even very smart researchers could hardly have predicted the changes that LLMs have brought today, just as in the 1980s few could foresee the tsunami deep learning has unleashed since 2012.
The paper had eight authors, from Google and the University of Toronto. Five years later, most of them had left their original institutions.
On April 26, 2022, a company called "Adept" was officially established with nine co-founders, including two authors of the Transformer paper, Ashish Vaswani and Niki Parmar.
Niki Parmar graduated from the University of Southern California with a master's degree and joined Google in 2016. There she developed several successful question-answering and text-similarity models for Google Search and ads, and led early work on extending the Transformer model to image generation, computer vision, and other areas. She left Google in 2021.
After leaving, the two co-founded Adept, serving as Chief Scientist (Ashish Vaswani) and Chief Technology Officer (Niki Parmar). Adept's vision is to build an "AI teammate" trained to use a wide variety of software tools and APIs.
In March 2023, Adept announced the completion of a US$350 million Series B round, pushing its valuation past US$1 billion and into unicorn territory. By the time that round was made public, however, Niki Parmar and Ashish Vaswani had already left Adept to start another AI company, which is still under wraps with no details publicly available.
Another author, Noam Shazeer, was one of Google's most important early employees. He joined Google at the end of 2000 and stayed until he finally left in 2021, after which he became CEO of a startup called Character.AI.
Character.AI was founded by Noam Shazeer together with Daniel De Freitas, both from Google's LaMDA team, where they had built LaMDA, the language model underpinning Google's conversational programs.
In March of this year, Character.AI announced a US$150 million funding round at a US$1 billion valuation. It is one of the few startups with the potential to compete with OpenAI, the company behind ChatGPT, and one of the rare companies to reach unicorn status in only 16 months. Its app, Character.AI, is a neural-language-model chatbot that generates human-like text responses and holds contextual conversations.
Character.AI launched on the Apple App Store and Google Play Store on May 23, 2023, with over 1.7 million downloads in its first week. In May 2023 the service added a $9.99-per-month paid subscription called c.ai+, which gives users priority chat access, faster response times, early access to new features, and other perks.
Another author, Aidan Gomez, co-founded Cohere, a generative AI startup established in 2019 whose core business is providing NLP models and helping companies improve human-computer interaction. Its three founders are Ivan Zhang, Nick Frosst, and Aidan Gomez; Gomez and Frosst are former members of the Google Brain team. In November 2021, Google Cloud announced a partnership with Cohere: Google Cloud would use its infrastructure to power the Cohere platform, and Cohere would use Cloud TPUs to develop and deploy its products.
Notably, Cohere just raised $270 million in Series C funding, making it a $2.2 billion unicorn.
While at Google, another author, Jakob Uszkoreit, helped build the language understanding team for Google Assistant and, early on, worked on Google Translate.