A complete victory over GPT-4, instantly crushing the closed-source model! The mysterious Code Llama version exposed
Original source: Xinzhiyuan
Just two days after its release, Code Llama has once again ignited the AI coding revolution.
Remember Unnatural Code Llama, the mysterious version that appeared in Meta's Code Llama paper and can fully match GPT-4?
The well-known researcher Sebastian explained in his blog:
It is a version of Code Llama-Python 34B fine-tuned on 15,000 unnatural-language instructions.
And now WizardCoder 34B, fine-tuned from Code Llama, has beaten GPT-4 outright on the HumanEval benchmark.
In addition, WizardCoder 34B outperforms the latest versions of GPT-3.5 and Claude 2.
According to Jim Fan, a senior scientist at Nvidia, this is essentially an open-source version of "Unnatural Code Llama".
While the benchmark numbers look good, HumanEval only tests a narrow distribution and is easy to overfit; testing on real-world coding data is what really matters. Coding benchmarks need a major upgrade.
On Friday, Meta officially open-sourced three versions of Code Llama.
In the HumanEval and MBPP benchmarks, many people noticed a version not mentioned in Meta's official release: Unnatural Code Llama.
According to its introduction, WizardCoder 34B is a version of the Code Llama model fine-tuned with the synthetic dataset Evol-Instruct.
The following is a visualization of its performance compared with all the open-source and closed-source models.
OpenAI's official GPT-4 report (2023/03/15) gives HumanEval results of 67.0% for GPT-4 and 48.1% for GPT-3.5; the researchers' own tests with the latest API (2023/08/26) give 82.0% and 72.5%, respectively.
A netizen ran a head-to-head test of GPT-3.5 and Code Llama Instruct-34B, using the Code Llama 34B access provided by Perplexity.AI.
The result: GPT-3.5 won 8:5.
The following are the specific test results.
Question 1
Use Python to accomplish this task: given two strings word1 and word2, merge them by adding letters in alternating order, starting with word1. If one string is longer than the other, append the extra letters to the end of the merged string.
Output the merged string.
For example:
Input: word1 = "abc", word2 = "pqr" Output: "apbqcr"
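For reference, here is a minimal Python sketch of one possible solution (illustrative only, not the output of either model; the function name is made up):

```python
# Illustrative reference solution: merge two strings character by character,
# then append whatever remains of the longer string.
def merge_alternately(word1: str, word2: str) -> str:
    merged = []
    for a, b in zip(word1, word2):
        merged.append(a)
        merged.append(b)
    # zip stops at the shorter string; append the leftover tail of either string.
    merged.append(word1[len(word2):])
    merged.append(word2[len(word1):])
    return "".join(merged)

print(merge_alternately("abc", "pqr"))  # apbqcr
```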
Question 2
Use Python to accomplish this task, given a string s, just reverse all vowels in the string and return it.
The vowels are "a", "e", "i", "o", and "u", which can appear multiple times in both lowercase and uppercase.
For example: Input: s = "hello" Output: "holle"
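A possible reference solution in Python (again illustrative, not from either model) uses two pointers moving inward and swapping vowels:

```python
# Illustrative reference solution: two pointers swap vowels from both ends.
def reverse_vowels(s: str) -> str:
    vowels = set("aeiouAEIOU")
    chars = list(s)
    i, j = 0, len(chars) - 1
    while i < j:
        if chars[i] not in vowels:
            i += 1
        elif chars[j] not in vowels:
            j -= 1
        else:
            chars[i], chars[j] = chars[j], chars[i]
            i += 1
            j -= 1
    return "".join(chars)

print(reverse_vowels("hello"))  # holle
```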
Question 3
Use Python to accomplish this task, given an integer array nums, move all 0s to the end of it while maintaining the relative order of the non-zero elements.
Note that you have to do this in-place, without making a copy of the array.
For example: Input: nums = [0,1,0,3,12] Output: [1,3,12,0,0]
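One way to solve this in place, sketched in Python for reference (not taken from the test transcripts):

```python
# Illustrative reference solution: compact non-zero values to the front,
# then overwrite the remaining positions with zeros. O(n) time, O(1) extra space.
def move_zeroes(nums: list) -> None:
    write = 0
    for x in nums:
        if x != 0:
            nums[write] = x
            write += 1
    for i in range(write, len(nums)):
        nums[i] = 0

nums = [0, 1, 0, 3, 12]
move_zeroes(nums)
print(nums)  # [1, 3, 12, 0, 0]
```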
Question 4
Using Python for this task: you have a long flowerbed in which some plots are planted with flowers and some are not.
However, flowers cannot be planted in adjacent plots. Given an integer array flowerbed containing 0s and 1s, where 0 means empty and 1 means not empty, and an integer n, output true if n new flowers can be planted in the flowerbed without violating the no-adjacent-flowers rule; otherwise output false.
Example 1: Input: Flowerbed = [1,0,0,0,1], n = 1 Output: true Example 2: Input: Flowerbed = [1,0,0,0,1], n = 2 Output: false
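A greedy Python sketch of one possible answer (illustrative only, not from either model):

```python
# Illustrative reference solution: greedily plant in every empty plot whose
# neighbours are empty, treating positions beyond both ends as empty.
def can_place_flowers(flowerbed: list, n: int) -> bool:
    planted = 0
    for i in range(len(flowerbed)):
        if flowerbed[i] == 0:
            left_empty = i == 0 or flowerbed[i - 1] == 0
            right_empty = i == len(flowerbed) - 1 or flowerbed[i + 1] == 0
            if left_empty and right_empty:
                flowerbed[i] = 1  # plant here
                planted += 1
    return planted >= n

print(can_place_flowers([1, 0, 0, 0, 1], 1))  # True
print(can_place_flowers([1, 0, 0, 0, 1], 2))  # False
```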
Question 5
Using Python, given an input string s, reverse the order of the words. A word is defined as a sequence of non-whitespace characters. Words in s will be separated by at least one space.
Output a string of words joined by single spaces in reverse order. Note that s may contain leading or trailing spaces or multiple spaces between two words.
The returned string should have only one space to separate words. Do not include any extra spaces.
Example: Input: s = "the sky is blue" Output: "blue is sky the"
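In Python, a compact reference solution is possible (illustrative only; str.split() with no arguments already collapses leading, trailing, and repeated whitespace):

```python
# Illustrative reference solution: split() discards extra spaces,
# so reversing the word list and re-joining is enough.
def reverse_words(s: str) -> str:
    return " ".join(reversed(s.split()))

print(reverse_words("the sky is blue"))    # blue is sky the
print(reverse_words("  hello   world  "))  # world hello
```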
Question 6
Use Python to accomplish this task. Given a string s and an integer k, return the maximum number of vowels in any substring of length k in s.
The vowels in English are "a", "e", "i", "o" and "u". Example: Input: s = "leetcode", k = 3 Output: 2
Explanation: "lee", "eet" and "ode" contain 2 vowels.
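A sliding-window sketch in Python, for reference only (not either model's answer):

```python
# Illustrative reference solution: keep a window of length k and track how
# many vowels are currently inside it.
def max_vowels(s: str, k: int) -> int:
    vowels = set("aeiou")
    count = sum(1 for c in s[:k] if c in vowels)
    best = count
    for i in range(k, len(s)):
        count += (s[i] in vowels) - (s[i - k] in vowels)
        best = max(best, count)
    return best

print(max_vowels("leetcode", 3))  # 2
```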
Question 7
Use Python to accomplish this task: given a string s that contains asterisks *, in one operation you can select an asterisk in s,
remove the nearest non-asterisk character to its left, and remove the asterisk itself. Output the string after all asterisks have been removed. Example: Input: s = "leet**cod*e" Output: "lecoe"
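A stack-based Python sketch of one possible solution (illustrative only):

```python
# Illustrative reference solution: push normal characters onto a stack;
# each '*' pops the most recently kept character.
def remove_stars(s: str) -> str:
    stack = []
    for c in s:
        if c == "*":
            stack.pop()  # remove the nearest non-asterisk character to the left
        else:
            stack.append(c)
    return "".join(stack)

print(remove_stars("leet**cod*e"))  # lecoe
```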
Question 8
Use Python to accomplish this task: given an integer array temperatures representing the daily temperature, return an array answer where answer[i] is the number of days you have to wait after day i for a warmer temperature.
If there is no such future day, keep answer[i] == 0. Example: Input: temperatures = [73,74,75,71,69,72,76,73] Output: [1,1,4,2,1,1,0,0]
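A monotonic-stack sketch in Python, for reference only (not from either model's answer):

```python
# Illustrative reference solution: keep a stack of indices whose warmer day
# has not been found yet; temperatures on the stack are strictly decreasing.
def daily_temperatures(temperatures: list) -> list:
    answer = [0] * len(temperatures)
    stack = []  # indices still waiting for a warmer day
    for i, t in enumerate(temperatures):
        while stack and temperatures[stack[-1]] < t:
            j = stack.pop()
            answer[j] = i - j
        stack.append(i)
    return answer

print(daily_temperatures([73, 74, 75, 71, 69, 72, 76, 73]))
# [1, 1, 4, 2, 1, 1, 0, 0]
```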
Regarding the two models' performance, the netizen stressed that this was not a rigorous study, just a quick test. Regenerating the code would usually produce a better answer, but that was not systematically tested,
so the results should not be taken as the final performance of either model.
On par with GPT-4, Llama 3 is also expected to be open source
Since the release of Llama and Llama 2, the machine learning community has exploded, and all kinds of fine-tuned ChatGPT alternatives have sprung up.
OpenAI researcher Jason Wei said that he learned at a Meta GenAI social event that Llama 3 and Llama 4 will also be open-sourced in the future.
I want to be clear about what this means: no kill switch.
If something goes wrong, an agent goes out of control, or a bad actor weaponizes it, there is no easy way to shut it down. It can run on any small cluster. There is no safety at all.
Safety research becomes meaningless.
All the work people have done to make AI systems honest, aligned, ethical, and so on becomes meaningless. The world's AI systems will evolve toward whichever system yields the greatest economic benefit, regardless of values or motivations. There are no guardrails. Anyone can change an AI's values or capabilities at will, for better or worse.
If Meta keeps open-sourcing its models while we build smarter and smarter AI, it's clear to me that things will get messy. The arrival of these alien intelligences is already making a mess of the world, and it will only get worse if we give up what little control humans have.
As far as I know, Meta's push for open source mainly stems from the "open-source community dogma" that "open source is good". And as far as I know, they weren't especially pro-open-source until their first model, LLaMA, was accidentally leaked, and they have been posing as open-source champions ever since.
Llama 2 is a very strong model in all aspects.
However, it has one very obvious weakness: its coding ability.
According to the data in Meta's Llama 2 paper, Llama 2's performance on HumanEval (a benchmark for evaluating LLM coding ability) is even worse than GPT-3.5's, let alone GPT-4's.
But coding ability will certainly be an important direction for the open-source community's use of Llama 2, and Meta naturally cannot afford to be weak there. Hence Code Llama, which is heavily optimized for coding.
Two days ago, Meta officially released the Code Llama family (7B, 13B and 34B) in three variants: the general-purpose code model Code Llama, the instruction-following model Code Llama-Instruct, and the Python-specialized version Code Llama-Python.
These models are free for academic and commercial use, under the same license as Llama 2.
The coding ability of the Code Llama 34B model is almost twice that of Llama 2, greatly narrowing the gap with GPT-4.
Why is there no 70B Code Llama model?
Interestingly, Code Llama only comes in 7B, 13B and 34B parameter versions; unlike Llama 2, there is no 70B version.
Although Meta did not explain why in the paper, technology expert Sebastian offered two possible reasons:
First, Code Llama was trained on only about a quarter as much data as Llama 2. With insufficient training data, and given the constraints of LLM scaling laws, a Code Llama 70B might simply not perform well.
Second, while Code Llama supports context lengths of up to 100k tokens, Llama 2 only supports inputs up to 4k. Making a 70B model handle 100k-token inputs might push its computational requirements to an impractical level.