Startec

Perp-Neg: Unveiling Image Potential with Negative Prompts and Stable Diffusion

May 25, 02:51

Text-to-image diffusion models have advanced remarkably, yet recent research finds that the images they generate do not always convey the intended meaning of the text prompt. Generating images that align with the semantic content of a text query is challenging because it requires a deep understanding of textual concepts and of how they map to visual representations.

Because detailed annotations are hard to acquire, current text-to-image models struggle to capture the intricate relationship between text and images. Consequently, they tend to generate images resembling the text-image pairs that occur most often in the training data, so the output often lacks requested attributes or contains undesired ones. Recent research has focused on reintroducing missing objects or attributes, modifying images based on well-crafted text prompts. Techniques for removing redundant attributes, or for explicitly instructing the model to exclude unwanted objects through negative prompts, remain comparatively unexplored.

To fill this research gap, a new approach has been proposed that addresses the limitations of the existing negative-prompt algorithm. According to the authors, the current implementation of negative prompts can produce unsatisfactory results, particularly when the main prompt and the negative prompts overlap.

To address this issue, they propose a novel algorithm called Perp-Neg, which does not require any training and can be applied to a pre-trained diffusion model. The architecture is reported below. 

The name “Perp-Neg” comes from the algorithm’s use of the perpendicular component of the score that the denoiser estimates for the negative prompt. Specifically, Perp-Neg restricts the negative prompt’s denoising process to be perpendicular to the direction of the main prompt. This geometric constraint is the key principle behind the algorithm.

By limiting the negative prompt’s denoising process to directions perpendicular to the main prompt, Perp-Neg addresses the issue of undesired perspectives in the negative prompts. The negative prompt can then only eliminate aspects that are orthogonal, that is, unrelated, to the main semantics of the prompt. In other words, Perp-Neg lets the model remove undesirable attributes or objects that are not aligned with the text’s intended meaning while preserving the core essence of the main prompt.
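The geometric idea can be illustrated with a small sketch. It is not the authors’ implementation: the function names, the two-dimensional toy vectors, and the `neg_weight` parameter are all hypothetical, and real score estimates are high-dimensional tensors produced by the denoiser. The sketch only shows how removing the component of a negative direction that is parallel to the main direction leaves the main direction intact, whereas a plain subtraction erodes it when the two overlap.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def perpendicular_component(neg: Vector, main: Vector) -> Vector:
    """Return the part of `neg` that is orthogonal to `main`.

    This is a standard vector rejection: subtract from `neg` its
    projection onto `main`.
    """
    main_norm_sq = dot(main, main)
    if main_norm_sq == 0.0:
        # Nothing to project onto, so leave the negative direction unchanged.
        return list(neg)
    scale = dot(neg, main) / main_norm_sq
    return [n - scale * m for n, m in zip(neg, main)]


def combine(main: Vector, neg: Vector, neg_weight: float = 1.0) -> Vector:
    """Subtract only the perpendicular part of `neg` from `main`.

    `neg_weight` is a hypothetical knob for the strength of the
    negative prompt; it is an assumption of this sketch.
    """
    perp = perpendicular_component(neg, main)
    return [m - neg_weight * p for m, p in zip(main, perp)]


# Toy directions: the negative prompt overlaps heavily with the main prompt.
main_dir = [1.0, 0.0]
neg_dir = [0.8, 0.6]

perp = perpendicular_component(neg_dir, main_dir)
guided = combine(main_dir, neg_dir)

# Plain subtraction, for comparison: it also removes the shared part.
naive = [m - n for m, n in zip(main_dir, neg_dir)]

print("perpendicular part:", perp)
print("perpendicular guidance:", guided)
print("plain subtraction:", naive)
```

In this toy case the component along the main direction stays at 1.0 under the perpendicular rule, while plain subtraction shrinks it to roughly 0.2, which mirrors the failure the authors describe when the main and negative prompts overlap.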

This approach helps in enhancing the overall quality and coherence of the generated images, ensuring a stronger alignment with the original text input.

Some results obtained via Perp-Neg are presented in the figure below.

Beyond image synthesis, the authors extend Perp-Neg to DreamFusion, an advanced text-to-3D model, and demonstrate its effectiveness in mitigating the Janus problem. The Janus (or multi-faced) problem refers to situations where a generated 3D object is rendered predominantly according to its canonical view rather than from other perspectives. This mainly happens because the training dataset is unbalanced: animals and people, for instance, are usually depicted from the front and only sporadically from the side or back.

This was the summary of Perp-Neg, a novel AI algorithm that leverages the geometrical properties of the score space to address the shortcomings of the current negative prompts algorithm. If you are interested, you can learn more about this technique in the links below.


Check out the Paper, Project, and GitHub.

Daniele Lorenzi

Daniele Lorenzi received his M.Sc. in ICT for Internet and Multimedia Engineering in 2021 from the University of Padua, Italy. He is a Ph.D. candidate at the Institute of Information Technology (ITEC) at the Alpen-Adria-Universität (AAU) Klagenfurt. He is currently working in the Christian Doppler Laboratory ATHENA and his research interests include adaptive video streaming, immersive media, machine learning, and QoS/QoE evaluation.

