🖌️Generation Menus

A detailed look at the different settings in the Retro Diffusion image generation menus.

Text to Image

"Text to Image" generation uses a text prompt as the input to create an image. This is found in "Help" -> "Retro Diffusion Scripts" -> "Text to Image," or in "Sprite" -> "Text to Image."

This requires an active sprite canvas and the RGB color mode.

Prompt

To generate an image with this tool, type a description of your desired image in the "Prompt" field. The tool will then generate an image based on that description. Some general tips for prompt writing are found here or in the help menus.

Negative

The "Negative" section informs the tool about what concepts to avoid in the generation. Many negatives are included automatically, so you can normally leave this field blank. If you do identify common elements in your generations that are undesirable, use this field to target and remove them.

Style

The "Style" field is used to save and load different styles (styles are different combinations of prompts and negatives). It also contains the helpful "Reset" and "Clear" buttons, which are used to help manage the "Prompt" and "Negative" fields.

Width & Height

"Width" and "Height" control the size of the generated image. By default, these sliders allow any input between 8 and 512. This can be changed in the "Retro Diffusion Settings" menu, but the recommended size range for generated images is <256. Images larger than 256 are likely to crash on most hardware and are prone to having poor quality due to the massive size.

If you make a selection on the canvas before opening the menu, these settings will automatically be set to that selection.

Sliders can be modified by clicking and dragging, or by using the mouse scroll wheel.

They can also be replaced with direct number entry fields. Do this by unchecking the "Input size with sliders" option in the generation menu.

The size preset buttons (found under the "Width" and "Height" sliders) are used to quickly set the width and height. These presets include several common sizes for pixel art.

The "Swap" button switches the width and height values with each other.

Quality

"Quality" affects many factors of image generation on the back-end. Its impact is evident because it produces similar images across quality values and reduces generation time by nearly 50% per image. Factors that contribute to the speed increase include generation size, composition enhancement, and other settings.

When testing prompts, it is best to use lower quality settings (because the image generation is much faster with lower settings than higher ones). Higher settings are recommended for final images.

The speed difference is especially noticeable when generating large batches.

Composition

The "Edit composition" button opens the "Composition Preview" menu.

More about this menu is found here.

The "Reset" composition button resets the composition menu to its default values.

Generations

"Generations" is the number of images that will be created in one run.

Model

By default, the "Model" setting is set to "Pixel art." This option dynamically changes between the four available base models so that the image always gets the best model for the given size.

If you want to choose the specific model directly, you must change the model to one of the "forced" size models. This is an advanced feature and requires experience and knowledge to use effectively.

The models will perform best at or around their specified sizes.

You can also load custom models in this section by placing the files inside of the model folder.

Modifier

Modifiers are additional models that can be loaded on top of the pixel art model to create different styles, characters, or effects. By default, Retro Diffusion has over a dozen different modifiers that offer a range of unique effects and styles.

You can limit how strongly the modifiers affect the image generation. To do this, adjust the strength controls shown below the drop-down menu for each modifier. The recommended range for the included modifiers is between 30% and 80%. Higher or lower values may substantially degrade image quality.

Modifiers can be mixed to create different styles. For example, a unique watercolor style can be made by mixing 30% top-down, 50% game boy advance, and 30% front-facing.

If you do not want to use a modifier, set the modifier to "None" or the strength to 0.
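
The strength guidance above can be expressed as a small check. This is only an illustrative sketch: the function name and the dictionary of modifier strengths are hypothetical and are not part of Retro Diffusion, and the names "custom lora" and "unused" are invented for the example.

```python
# Hypothetical helper, not part of Retro Diffusion.
RECOMMENDED_RANGE = (30, 80)  # percent, taken from the guidance above


def out_of_range_modifiers(strengths):
    """Return names of enabled modifiers whose strength is outside the recommended range."""
    low, high = RECOMMENDED_RANGE
    return [
        name
        for name, percent in strengths.items()
        # A strength of 0 means the modifier is disabled, so it is skipped.
        if percent != 0 and not low <= percent <= high
    ]


# The watercolor mix from the text, plus two made-up entries.
mix = {
    "top-down": 30,
    "game boy advance": 50,
    "front-facing": 30,
    "custom lora": 95,
    "unused": 0,
}
flagged = out_of_range_modifiers(mix)
print(flagged)
```

Only the entry at 95% is flagged here. The range applies to the included modifiers, so a custom LoRA may behave differently.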

Additionally, you can use your own modifiers in the form of LoRA models. These can be placed in the lora folder and then selected from the modifier drop-down menus. Note that these must be in the "safetensors" format, and must be LoRA models (not LyCORIS, LoCon, or Textual Inversions).

The number of modifiers that you can enable at once, as well as the strength range, can be modified in the settings menu.

Processing section

This contains settings related to image generation and rendering.

Show preview

This setting displays a preview window of the image as it is being generated. This slightly slows generation speed.

You can select between different images in the same batch as they are returned. A slider is provided for quick and easy navigation.

Use LLM enhanced prompt (If LLM is enabled)

If this option doesn't appear and your system has more than 8GB of VRAM, you can add it by going through the setup process again and enabling the LLM.

Using LLM enhanced prompts will modify your prompts with a Large Language Model. This can help align prompts with the model to create higher quality, more detailed images. It also enables several other prompt-based features.

Enhance composition

This setting enables automatic composition corrections by first generating a smaller image, then using that image to guide the larger generation.

"Enhance composition" also generates large images faster. It does this by performing fewer steps at large resolutions (which can be very slow).

Use fast pixel decoder

The "fast pixel decoder" is an alternate way of converting the "latents" generated by the image model into RGB images. Enabling the "fast pixel decoder" slightly reduces quality, but can greatly lower VRAM usage and improve generation speed for large images.

Automatically reduce output colors

Once an image is generated, it can still contain lots of unnecessary colors. This setting attempts to correct that by reducing those colors without losing image quality.

Disable this setting for more precise manual control over image colors.
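
To make the idea of color reduction concrete, here is a minimal sketch of one naive approach: keep the most frequently used colors and remap every other pixel to its nearest kept color. This is an assumption made for illustration only. The algorithm Retro Diffusion actually uses is not described here, and the function name and the `max_colors` parameter are hypothetical.

```python
from collections import Counter


def reduce_colors(pixels, max_colors):
    """Keep the max_colors most frequent colors; remap all others to the nearest kept color."""
    counts = Counter(pixels)
    palette = [color for color, _ in counts.most_common(max_colors)]

    def nearest(color):
        # Squared Euclidean distance in RGB space.
        return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(color, p)))

    return [c if c in palette else nearest(c) for c in pixels]


# Ten pixels: pure red, pure blue, and one stray near-red pixel.
pixels = [(255, 0, 0)] * 5 + [(0, 0, 255)] * 4 + [(250, 5, 5)]
reduced = reduce_colors(pixels, max_colors=2)
print(sorted(set(reduced)))
```

The stray near-red pixel is folded into pure red, so the image ends up with two colors instead of three.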

UI settings section

This contains settings for how to display images once they're generated as well as for advanced UI options.

Save as Frames/Layers/Grid

Once generated, images are returned to the Aseprite canvas in one of three ways:

1) As a set of sequential frames on a new layer, with the layer name being the seed of the first image.
2) As a sequence of new layers, with each layer being named after the seed that generated the image.
3) As a single frame and layer which contains a grid of all the generated images laid out next to each other, with the layer name being the seed of the first image.
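
For the grid option, the placement of each image can be pictured with a short sketch. The roughly square column count used here is an assumption, since the text only says the images are laid out next to each other. The function name is hypothetical.

```python
import math


def grid_positions(count, width, height, columns=None):
    """Return the (x, y) top-left offset of each of `count` images of size width x height."""
    if columns is None:
        # Assumption: aim for a roughly square grid.
        columns = math.ceil(math.sqrt(count))
    return [((i % columns) * width, (i // columns) * height) for i in range(count)]


positions = grid_positions(4, 64, 64)
print(positions)
```

With four 64x64 images this gives a 2x2 grid that fills a 128x128 canvas.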

Show advanced options

Enabling this setting will display more advanced options in the generation menu. These options and what they do are covered further down on this page.

Show modifiers

This setting shows or hides the modifier section. Hiding it simplifies the generation menu.

Input size with sliders

This option toggles the width and height input method between sliders and number inputs.

Process in background

This will stop the terminal window from being pushed to the front when generating an image.

Advanced settings

Using the "Show advanced options" checkbox will add several other options to the generation menu.

Scale

This is a technical setting corresponding to "CFG scale" in traditional diffusion models. Lowering the scale setting causes the prompt to "guide" the image generation less, while increasing the scale setting makes the prompt guide it more. Values that are too high also tend to "deep fry" the image, meaning it gets highly saturated colors, patches of noise, and incoherent details.
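
In standard diffusion pipelines, CFG scale is usually applied by pushing the model's unconditioned prediction toward its prompt-conditioned prediction. The sketch below shows that arithmetic on plain lists of numbers. Whether Retro Diffusion applies the scale in exactly this way internally is an assumption, and the function name is hypothetical.

```python
def apply_guidance(uncond, cond, scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element by element."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]  # prediction made without the prompt
cond = [1.0, 1.0]    # prediction made with the prompt

print(apply_guidance(uncond, cond, 0.0))  # the prompt is ignored
print(apply_guidance(uncond, cond, 1.0))  # the plain prompt-conditioned prediction
print(apply_guidance(uncond, cond, 7.0))  # the difference is amplified
```

Large scales amplify the difference between the two predictions well beyond either one, which is one way to picture why very high values produce oversaturated results.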

Seed

Using the same seed with otherwise identical settings will reproduce the same image.
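
The reason a seed makes results repeatable is that it fixes the random starting noise. The sketch below imitates this with Python's standard random number generator. It is only an analogy: the function name is hypothetical, and the real tool generates its noise differently.

```python
import random


def starting_noise(seed, count):
    """Produce `count` pseudo-random values that depend only on the seed."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


first = starting_noise(seed=1234, count=4)
again = starting_noise(seed=1234, count=4)
other = starting_noise(seed=1235, count=4)

print(first == again)  # same seed, same noise
print(first == other)  # different seed, different noise
```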

Tiling

Enabling tiling forces the model to "wrap" the sides of the image around itself, making the edges seamless. This is a heavy-handed method that forces the model to tile and can result in distorted, noisy images.
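
One way to picture "wrapping" is that pixels on one edge are treated as neighbors of pixels on the opposite edge. The sketch below shows that neighbor lookup. It illustrates the general wrap-around idea only; how Retro Diffusion implements tiling is not described here, and the function name is hypothetical.

```python
def wrapped_neighbors(x, y, width, height):
    """Return the left, right, upper, and lower neighbors of (x, y) with wrap-around edges."""
    return [
        ((x - 1) % width, y),
        ((x + 1) % width, y),
        (x, (y - 1) % height),
        (x, (y + 1) % height),
    ]


# The top-left pixel of a 4x4 image borders the right and bottom edges.
corner = wrapped_neighbors(0, 0, 4, 4)
print(corner)
```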

Image to Image

"Image to Image" generation uses both a starting image and a text prompt as the input to create a new image. This is found in "Help" -> "Retro Diffusion Scripts" -> "Image to Image," or in "Sprite" -> "Image to Image."

This requires an active sprite canvas, a selection containing some pixels, and the RGB color mode.

"Image to Image" has all the same settings as "Text to Image" except "Enhance composition," which it lacks. It also has the added features of "Strength," "Snap to selection," and "Inpaint."

Strength

This setting determines how far the generated image is allowed to depart from the input image.

0 strength outputs the original image with no modification. 100 strength outputs a completely new image.

The recommended range is between 50 and 80.
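
In many image-to-image diffusion pipelines, strength decides how many of the denoising steps are actually run on the input image. The sketch below shows that mapping. It is an assumption about how such pipelines commonly work, not a description of Retro Diffusion's internals, and the function name and step count are hypothetical.

```python
def steps_to_run(total_steps, strength_percent):
    """Map a 0-100 strength to the number of denoising steps applied to the input image."""
    clamped = max(0, min(100, strength_percent))
    return total_steps * clamped // 100


for strength in (0, 50, 80, 100):
    print(strength, steps_to_run(20, strength))
```

At 0 no steps run, so the input comes back unchanged. At 100 every step runs, so little of the input survives.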

Snap to selection

Unlike "Text to Image," width and height will not automatically be set to the input selection. You can use this button to snap the width and height settings to the selection width and height.

Inpaint

The "Inpaint" button opens the masking menu.

Here, you can use the left and right mouse buttons to paint sections of the image to be replaced by the new image generation. You can also use the scroll wheel and middle mouse button to navigate the canvas.

In this example the house has been painted out, and its spot will be replaced by a new image generation.
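
The effect of the mask on the final image can be sketched as a per-pixel choice between the original and the new generation. This is a simplification under stated assumptions: real inpainting also uses the unmasked area as context while generating, and the function name is hypothetical.

```python
def composite(original, generated, mask):
    """Take the generated pixel where the mask is painted, otherwise keep the original."""
    return [g if m else o for o, g, m in zip(original, generated, mask)]


original = ["sky", "house", "house", "grass"]
generated = ["cloud", "tree", "tree", "rock"]
mask = [False, True, True, False]  # only the house is painted out

result = composite(original, generated, mask)
print(result)
```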
