Researchers From China Propose a Generate-and-Edit Approach that Utilizes Execution Results of the Generated Code from LLMs to Improve the Code Quality in the Competitive Programming Task

May 18, 11:04



Researchers draw inspiration from the way humans program to help LLMs perform better on competitive programming tasks. Competitive programming has recently become a target task for large language models. It requires understanding a sophisticated natural-language problem description with example test cases and correctly implementing solutions that can span hundreds of lines. Solutions are evaluated by executing them on hidden test cases. However, the accuracy and pass rates of current LLMs on this task remain low. For instance, on the widely used APPS competitive programming benchmark, GPT3, one of the most powerful models, achieves only 7% accuracy.

When solving competitive programming problems, programmers often write an initial program, run a few example test cases, and then revise the code based on the test results. During this step, the programmer can use key information from the test results to debug the program. The researchers implement this idea with a similar workflow built around a neural editor. Examining code produced by a pre-trained LLM, they found that many of the generated programs could be improved with minor modifications.

They observe that an error message pinpoints the coding fault, which allows the problem to be corrected quickly. This motivates them to investigate editing techniques that improve the quality of LLM-generated code with the help of execution results. In this study, researchers from Peking University propose a novel generate-and-edit approach to improve LLMs on competitive programming tasks. Their method uses LLMs within a three-phase process that emulates the behavior of the human programmers described above:

  1. Generation with LLMs. They generate the program from the problem description, using large language models as black-box generators.
  2. Execution. They run the LLM-generated code on the example test case to obtain the execution results. They also wrap the execution results in templates and attach them as supplementary comments, giving the editing step more useful information.
  3. Edit. They develop a fault-aware neural code editor that takes the generated code and the supplementary comments as input and refines the code. The editor aims to improve the quality and accuracy of LLM-based code generation.
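The three phases above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the function names (`execute_on_example`, `generate_and_edit`), the wording of the result comments, and the toy generator and editor are all hypothetical stand-ins. In the paper the generator is an LLM and the editor is a trained neural model; here both are plain Python callables so that the control flow can be run end to end.

```python
import subprocess
import sys
from typing import Callable


def execute_on_example(code: str, example_input: str, expected_output: str,
                       timeout: float = 5.0) -> str:
    """Run a candidate program on one example test and summarize the outcome
    as a comment line. The comment wording is a hypothetical template."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            input=example_input, capture_output=True, text=True, timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "# Result: time limit exceeded"
    if proc.returncode != 0:
        # Keep only the last stderr line, which names the exception.
        stderr_lines = proc.stderr.strip().splitlines()
        detail = stderr_lines[-1] if stderr_lines else "unknown error"
        return f"# Result: runtime error ({detail})"
    actual = proc.stdout.strip()
    if actual == expected_output.strip():
        return "# Result: passed"
    return (f"# Result: wrong answer (expected {expected_output.strip()!r}, "
            f"got {actual!r})")


def generate_and_edit(problem: str, example_input: str, expected_output: str,
                      generate: Callable[[str], str],
                      edit: Callable[[str, str, str], str]) -> str:
    """Phase 1: generate. Phase 2: execute on the example. Phase 3: edit."""
    code = generate(problem)
    comment = execute_on_example(code, example_input, expected_output)
    return edit(problem, code, comment)


def toy_generate(problem: str) -> str:
    # Stand-in for an LLM: returns a program with a type bug.
    return 'n = int(input())\nprint(n + "1")'


def toy_edit(problem: str, code: str, comment: str) -> str:
    # Stand-in for the neural editor: one hard-coded repair rule.
    if "TypeError" in comment:
        return code.replace('"1"', "1")
    return code


problem = "Read an integer n and print n + 1."
fixed = generate_and_edit(problem, "41\n", "42\n", toy_generate, toy_edit)
print(execute_on_example(fixed, "41\n", "42\n"))
```

In this toy run the first draft fails with a `TypeError`, the execution comment carries that message to the editor, and the edited program passes the example test. A real editor would learn such repairs from data rather than apply a fixed rule.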

They conduct extensive experiments on two public benchmarks, APPS and HumanEval. To demonstrate generality, they apply their methodology to 9 well-known LLMs with parameter counts ranging from 110M to 175B. Their approach substantially raises the LLMs’ performance. In particular, it raises the average pass@1 on APPS-dev and APPS-test by 89% and 31%, respectively. Even for the largest language model used, GPT3-175B, their small editor model increases pass@1 from 26.6% to 32.4% on APPS-dev. They also demonstrate that the method transfers to an out-of-distribution benchmark, improving the average pass@1 by 48% on HumanEval, a dataset of a different style.

Several methods for post-processing programs generated by LLMs have been proposed recently. These methods sample extensively from the LLM, rerank the sampled programs, and output the final program. In contrast, the researchers’ approach offers two benefits. First, it keeps the sample budget constant and drastically lowers the computational burden on LLMs. Second, the editor modifies the programs directly and outperforms these reranking-based techniques, particularly with a constrained sample budget such as pass@1. As far as they are aware, they are the first to use an editing-based post-processing technique for competitive programming.
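For readers unfamiliar with the metric, pass rates on benchmarks such as APPS and HumanEval are commonly reported as pass@k: the probability that at least one of k sampled programs passes all hidden tests. The sketch below uses the widely used unbiased estimator 1 - C(n-c, k) / C(n, k), where n programs are sampled and c of them are correct. The article does not say how the authors computed their numbers, so treat this as an assumption about common practice. The helper `relative_gain` is a hypothetical name, included only to show the arithmetic for turning two absolute scores into a relative improvement.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, as a fraction."""
    return (after - before) / before


# With 10 samples of which 3 are correct, pass@1 is the plain success rate.
print(round(pass_at_k(10, 3, 1), 3))

# The GPT3-175B figures quoted in the article, 26.6% rising to 32.4%,
# correspond to a relative gain of roughly 22%.
print(round(relative_gain(26.6, 32.4), 3))
```

This also clarifies why a small sample budget is the hard case for reranking: at k = 1 only one program can be submitted, so there is little to rerank, whereas an editor can still repair that single program.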

The contributions are as follows:

• They propose a generate-and-edit approach that lets large language models produce high-quality code for challenging programming tasks.

• They develop a fault-aware neural code editor that takes error messages and generated code as input to improve the code’s accuracy and quality.

• They run experiments on two well-known datasets and nine LLMs to demonstrate the effectiveness and generality of their approach.


Aneesh Tickoo

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence at the Indian Institute of Technology (IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing, and he is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.
