The sixth anniversary of Transformer: it didn't even get a NeurIPS Oral, yet its 8 authors have founded several AI unicorns
Today marks the sixth anniversary of the submission of the famous Transformer paper.
Six years ago, a paper with a rather grandiose title was uploaded to the preprint platform arXiv. The phrase "xx is All You Need" has since been echoed by developers throughout the AI field and has even become a fashion in paper titles, and "Transformer" no longer brings to mind the Transformers franchise; it now stands for the most advanced technology in AI.
Six years on, looking back at that paper reveals many interesting or little-known details, as summarized by Jim Fan, an AI scientist at Nvidia.
The Transformer model abandons the traditional CNN and RNN units; the entire network structure is built from attention mechanisms.
Although the Transformer paper is titled "Attention is All You Need", and it is largely because of that paper that the attention mechanism keeps being promoted, note an interesting fact: the Transformer researchers did not invent attention; rather, they pushed the mechanism to its extreme.
The attention mechanism was proposed in 2014 by a team led by deep learning pioneer Yoshua Bengio.
In their ICLR 2015 paper, Bengio et al. proposed combining an RNN with a "context vector" (i.e., attention). Although it is one of the greatest milestones in NLP, it is far less well known than Transformer: Bengio's team's paper has been cited about 29,000 times to date, while the Transformer paper has about 77,000.
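To make the idea concrete, here is a minimal NumPy sketch of that kind of RNN-plus-context-vector (additive) attention. The function names, weight shapes, and toy data are illustrative assumptions, not taken from the original paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(decoder_state, encoder_states, W_dec, W_enc, v):
    """Score each encoder state against the current decoder state,
    then build a weighted "context vector" from the encoder states."""
    # encoder_states: (seq_len, hidden); decoder_state: (hidden,)
    scores = np.tanh(encoder_states @ W_enc + decoder_state @ W_dec) @ v  # (seq_len,)
    weights = softmax(scores)            # how much to attend to each source position
    context = weights @ encoder_states   # (hidden,) weighted sum = the context vector
    return context, weights

# Toy example with random states (illustrative only)
rng = np.random.default_rng(0)
hidden = 8
enc = rng.normal(size=(5, hidden))       # 5 source positions
dec = rng.normal(size=hidden)
W_enc, W_dec = rng.normal(size=(hidden, hidden)), rng.normal(size=(hidden, hidden))
v = rng.normal(size=hidden)
context, weights = additive_attention(dec, enc, W_dec, W_enc, v)
print(weights)  # sums to 1; larger entries mark the source positions the decoder "attends" to
```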
An organism that refuses to let go of any local information inevitably does a great deal of useless work, which is not conducive to survival. Likewise, introducing a similar mechanism in deep learning networks can simplify models and speed up computation. In essence, attention filters a small amount of important information out of a large amount of information, focuses on that important information, and ignores most of the rest.
In recent years, the attention mechanism has been widely used across deep learning, for example in computer vision to capture receptive fields on images, or in NLP to locate key tokens or features. A large number of experiments have shown that models equipped with attention achieve significant performance gains in tasks such as image classification, segmentation, tracking, and enhancement, as well as natural language recognition, understanding, question answering, and translation.
The Transformer model built on the attention mechanism can be regarded as a general-purpose sequence computer. When processing an input sequence, attention lets the model assign different weights to different positions according to how relevant they are to one another, which enables the Transformer to capture long-range dependencies and contextual information and thereby improves sequence processing.
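As a rough sketch of how those weights are computed inside the Transformer, the scaled dot-product attention at its core can be written in a few lines of NumPy (a single head, no masking; the dimensions and data below are illustrative assumptions):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V: each output position is a weighted
    mix of all value vectors, so distant positions can influence each other."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                    # (seq_len, seq_len) relevance of every pair of positions
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # row-wise softmax: the attention weights
    return weights @ V, weights                        # outputs (seq_len, d_v) and the weight matrix

# Toy input: a sequence of 6 token embeddings of size 16 (random, illustrative)
rng = np.random.default_rng(1)
x = rng.normal(size=(6, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
out, attn = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)
print(attn.shape)  # (6, 6): how strongly each position attends to every other position
```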
At the time, however, neither the Transformer paper nor the original attention paper spoke of a general-purpose sequence computer. Instead, the authors framed attention as a mechanism for solving a narrow, specific problem: machine translation. So when we trace the origins of AGI in the future, we may well trace them back to the "inconspicuous" Google Translate.
Although it was accepted by NeurIPS 2017, it didn't even get an Oral
Although the Transformer paper is enormously influential now, it didn't even get an Oral, let alone an award, at NeurIPS 2017, the world's top AI conference. That year the conference received 3,240 submissions, of which 678 were accepted; the Transformer paper was one of them. Among the accepted papers, 40 were Orals and 112 were Spotlights, and three Best Paper awards plus one Test of Time award were given; the Transformer paper received none of them.
Although it missed out on the NeurIPS 2017 awards, Transformer's influence is plain for all to see.
Jim Fan commented that it is hard to recognize the importance of an influential study before it becomes influential, and that this is not the reviewers' fault. Still, some papers are lucky enough to be recognized right away: ResNet, proposed by Kaiming He and colleagues, won the Best Paper award at CVPR 2016, recognition that was well deserved and correctly given by the top AI conference. But back in 2017, even very smart researchers could hardly have predicted the changes that LLMs have brought today, just as in the 1980s few could foresee the tsunami deep learning has unleashed since 2012.
The paper had eight authors, from Google and the University of Toronto. Five years later, most of them had left their original institutions.
On April 26, 2022, a company called "Adept" was officially established with nine co-founders, including two authors of the Transformer paper, Ashish Vaswani and Niki Parmar.
Niki Parmar graduated from the University of Southern California with a master's degree and joined Google in 2016. There she developed several successful question-answering and text-similarity models for Google Search and ads, and led early work on extending the Transformer model to image generation, computer vision, and other areas. She left Google in 2021.
After leaving, the two co-founded Adept, serving as Chief Scientist (Ashish Vaswani) and Chief Technology Officer (Niki Parmar). Adept's vision is to build an "AI teammate" trained to use a wide variety of software tools and APIs.
In March 2023, Adept announced the completion of a US$350 million Series B round, pushing its valuation past US$1 billion and into unicorn territory. By the time that round was made public, however, Niki Parmar and Ashish Vaswani had already left Adept to start another AI company, which is still under wraps with no details publicly available.
Another author, Noam Shazeer, was one of Google's most important early employees. He joined Google at the end of 2000 and stayed until he finally left in 2021, after which he became CEO of a startup called Character.AI.
Character.AI was founded by Noam Shazeer together with Daniel De Freitas, both from Google's LaMDA team, where they had built LaMDA, the language model underpinning Google's conversational programs.
In March of this year, Character.AI announced a US$150 million funding round at a US$1 billion valuation. It is one of the few startups with the potential to compete with OpenAI, the company behind ChatGPT, and one of the rare companies to reach unicorn status in only 16 months. Its app, Character.AI, is a neural-language-model chatbot that generates human-like text responses and holds contextual conversations.
Character.AI launched on the Apple App Store and Google Play Store on May 23, 2023, with over 1.7 million downloads in its first week. In May 2023 the service added a $9.99-per-month paid subscription called c.ai+, which gives users priority chat access, faster response times, early access to new features, and other perks.
Another author, Aidan Gomez, co-founded Cohere, a generative AI startup established in 2019 whose core business is providing NLP models and helping companies improve human-computer interaction. Its three founders are Ivan Zhang, Nick Frosst, and Aidan Gomez; Gomez and Frosst are former members of the Google Brain team. In November 2021, Google Cloud announced a partnership with Cohere: Google Cloud would use its infrastructure to power the Cohere platform, and Cohere would use Cloud TPUs to develop and deploy its products.
Notably, Cohere just raised $270 million in Series C funding, making it a $2.2 billion unicorn.
While at Google, another author, Jakob Uszkoreit, helped build the language understanding team for Google Assistant and, early on, worked on Google Translate.