The Midjourney Power User Playbook
Most Midjourney users type a description and hit enter. Power users understand that Midjourney responds to a specific language — a combination of subject, style references, technical parameters, and compositional guidance. Here's how to speak it fluently.
The Prompt Formula That Actually Works
The structure that consistently produces professional-quality images:
[SUBJECT + ACTION] [ENVIRONMENT/SETTING] [STYLE/AESTHETIC]
[LIGHTING] [CAMERA/PERSPECTIVE] [MOOD] [TECHNICAL PARAMS]
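The slot order above can be sketched as a small string-builder. This is a minimal illustration of the formula, not a Midjourney API; the function and slot names are my own:

```python
def build_prompt(subject, environment, style, lighting, camera, mood, params):
    """Assemble a Midjourney prompt following the slot order above.

    Earlier elements are weighted more heavily, so subject and style
    come before technical details; parameters always go last.
    """
    slots = [subject, environment, style, lighting, camera, mood]
    text = ", ".join(s for s in slots if s)  # skip any empty slot
    return f"{text} {params}".strip()

prompt = build_prompt(
    subject="glass perfume bottle with minimalist gold cap",
    environment="floating on black marble surface",
    style="editorial product photography",
    lighting="rim lighting with soft fill",
    camera="macro lens 85mm",
    mood="luxury brand aesthetic, photorealistic",
    params="--ar 4:5 --stylize 200 --v 6.1",
)
```

Filling the slots one at a time forces you to make a deliberate choice for lighting, camera, and mood instead of leaving them to chance.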
Working Examples
Commercial photography style:
glass perfume bottle with minimalist gold cap, floating on black marble surface,
editorial product photography, rim lighting with soft fill, macro lens 85mm,
luxury brand aesthetic, photorealistic --ar 4:5 --stylize 200 --v 6.1
Brand illustration style:
confident woman using laptop in modern coworking space, flat design illustration,
teal and coral color palette, bold geometric shapes, positive professional mood,
Dribbble aesthetic --ar 16:9 --stylize 500 --v 6.1
Cinematic scene:
abandoned lighthouse on rocky coast at golden hour, storm approaching from behind,
dramatic atmospheric perspective, shot on 35mm film, Kodak Portra 400 grain,
cinematic widescreen, moody isolation --ar 21:9 --stylize 300 --v 6.1
What Each Part Does
- Subject first, adjectives close to what they modify. "a sleek black electric car" works better than "a car that is electric and black and sleek."
- Style references before technical params. Midjourney weights earlier prompt elements more heavily — put style guidance before camera specs.
- Lighting matters enormously. "rim lighting," "golden hour," "overcast diffused light," "dramatic studio lighting" — these single phrases transform the output more than almost anything else.
- Camera/lens references for photography. "shot on 35mm," "50mm f/1.4," "aerial drone perspective" — these ground photorealistic images in recognizable visual languages.
What NOT to Include
- Negation in the main prompt ("not dark," "without people"); use the --no parameter instead
- Overly abstract concepts without visual anchors ("freedom," "happiness" alone)
- Too many competing styles — pick one primary aesthetic
- Exact color hex codes — Midjourney doesn't read these; use color names or references
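The negation rule above can even be automated: strip "no X" / "without X" phrases out of the prompt text and collect them into a trailing --no parameter. A toy sketch using a regex heuristic of my own, not a Midjourney feature:

```python
import re

def move_negations(prompt):
    """Strip 'no X' / 'without X' phrases and move them into --no."""
    negated = re.findall(r"\b(?:without|no)\s+(\w+)", prompt)
    cleaned = re.sub(r",?\s*\b(?:without|no)\s+\w+", "", prompt).strip(" ,")
    if negated:
        cleaned += " --no " + ", ".join(negated)
    return cleaned

print(move_negations("sunny beach scene, no people, without text"))
# → sunny beach scene --no people, text
```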
Style & Character Reference Techniques
/describe: Reverse-Engineer Any Image
Upload any image with /describe [image] and Midjourney gives you 4 prompt variations that would produce similar results. This is invaluable for:
- Replicating a style you found somewhere
- Understanding why your image looks the way it does
- Building a prompt vocabulary for styles you like
--sref: Style References (The Consistency Tool)
Use --sref [image URL] to pull style (colors, textures, mood, artistic technique) from a reference image without copying subject matter. This is how professionals maintain visual consistency.
product packaging design for artisan coffee, kraft paper texture, hand-lettered typography
--sref https://[your-style-reference-image.jpg] --sw 500 --ar 2:3
--sw (style weight, 0-1000) controls how strongly the style reference influences output. Start at 500 and adjust. High values can override your text prompt significantly.
--cref: Character References (Consistent Faces)
The problem Midjourney struggled with longest: consistent characters across images. --cref [image URL] locks in a character's appearance:
same character, now in a coffee shop, casual weekend mood,
candid portrait --cref [character-reference.jpg] --cw 75 --ar 4:5
--cw (character weight, 0-100): 100 = strict face match but may override clothing/style; 75 = good balance for most use cases; 0-50 = loose inspiration only.
Combining --sref + --cref
This is the professional brand photography workflow:
[scene description] --sref [brand-style-image.jpg] --cref [brand-character.jpg]
--sw 300 --cw 75 --ar 16:9 --v 6.1
You get consistent character + consistent brand aesthetic across an entire campaign. This used to require a professional photo shoot; now it's a Discord command.
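The campaign workflow above lends itself to scripting the prompt text: hold the reference images and weights constant and vary only the scene. A minimal sketch, in which the reference filenames and scene descriptions are placeholders of my own:

```python
STYLE = "--sref brand-style.jpg --sw 300"
CHARACTER = "--cref brand-character.jpg --cw 75"

def campaign_prompt(scene, aspect="16:9"):
    # The same style and character references on every image keep the
    # campaign visually consistent; only the scene description changes.
    return f"{scene} {STYLE} {CHARACTER} --ar {aspect} --v 6.1"

scenes = [
    "founder presenting at a whiteboard, bright modern office",
    "team celebrating a launch, confetti, warm evening light",
]
prompts = [campaign_prompt(s) for s in scenes]
```

Generating the whole prompt list up front makes it easy to review the campaign as a set before spending any GPU time.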
Workflow: Creating Brand Identity with Midjourney
This is how design agencies are actually using Midjourney — not as a final deliverable but as a rapid ideation and client presentation tool.
Phase 1: Mood Board Generation (30 min)
- Start with --chaos 70 and 5-6 concept directions in the same prompt session to see wildly different interpretations
- Use high --stylize (700-900) in this phase; you want Midjourney's aesthetic intelligence, not literal interpretation
- Collect the promising seeds (copy the seed number from successful images)
Phase 2: Direction Refinement
Once you have 2-3 promising directions:
minimalist tech brand visual identity, primary color electric blue,
geometric sans-serif logo concept, clean white space, professional B2B aesthetic,
Silicon Valley startup energy --stylize 300 --chaos 20 --ar 1:1
Lower --chaos (20-30) now — you want variations on a theme, not wild divergence.
Phase 3: Asset Generation
Once the direction is locked, generate the full asset set:
- Hero image: --ar 16:9 with your established style reference via --sref
- Social media: --ar 1:1 and --ar 4:5
- Email header: --ar 3:1
- Favicon concept: --ar 1:1 --stylize 50 (lower stylize for simpler, icon-appropriate output)
Phase 4: Variation and Zoom
- Vary (Region): Inpainting — select specific areas of an image to regenerate while keeping the rest. Perfect for fixing hands, faces, or background elements.
- Zoom Out: Extends the canvas outward, adding more scene. Great for creating wider crops of a tight shot.
- Pan: Extends in one direction — useful for creating wider banner formats from square images.
Workflow: Product Mockups for E-Commerce
Product photography is one of the clearest ROI use cases for Midjourney. A professional product photo shoot costs $500-5,000. Midjourney delivers usable results for $0.01-0.05 per image at the Standard plan rate.
The Product Photography Prompt Stack
[product description], product photography, [surface], [background],
[lighting setup], photorealistic, high detail, commercial quality
--ar 1:1 --stylize 150 --v 6.1 --no hands, people, text
Real example:
minimalist white ceramic mug with subtle blue geometric pattern,
product photography, white marble surface, white background,
soft studio lighting with slight shadow, photorealistic, Shopify hero image quality,
beverage branding --ar 1:1 --stylize 150 --v 6.1 --no hands, people
Lifestyle Shots Without Models
artisan coffee mug on wooden table near window, cozy home office setting,
morning light, bokeh background showing plants and books, lifestyle product photography,
Canon 5D Mark IV look, warm color grade --ar 4:5 --stylize 200 --v 6.1
Upload Your Product for Consistent Results
Upload your actual product image and use it as a reference:
/imagine [your-product-image.jpg] placed on marble kitchen counter,
luxury lifestyle setting, natural window light, editorial product placement,
photorealistic --iw 0.8 --stylize 100
--iw (image weight, 0-3): Controls how strongly your uploaded image influences the result. 0.5-1.0 for product consistency; lower for more creative interpretation.
Advanced Parameters Most People Ignore
| Parameter | Range | What It Does | When to Use It |
|---|---|---|---|
| --stylize / --s | 0–1000 | How much Midjourney's aesthetic training influences output. Higher = more artistic/opinionated. | Low (0-100) for literal accuracy; high (700+) for artistic interpretation |
| --chaos / --c | 0–100 | Variety between the 4 initial grid images. High = very different options. | High for ideation; low for refinement |
| --weird / --w | 0–3000 | Experimental, unusual aesthetics. Makes outputs surreal and unexpected. | Creative/editorial work where unusual is a feature |
| --quality / --q | 0.25, 0.5, 1 | Rendering time/quality. 0.25 = fast draft; 1 = full quality. | 0.25 for ideation iterations; 1 for final outputs |
| --seed | Any number | Fixes the random seed for reproducible results. Same seed + same prompt = same image. | Reproducing a result, creating subtle variations |
| --tile | Flag | Creates seamlessly tileable images. | Backgrounds, textures, pattern design |
| --no | Word list | Negative prompting; excludes specific elements. | Any time you want to prevent something common: --no text, watermark, hands |
| --repeat / --r | 2–40 | Runs the same prompt multiple times. Useful for batch generation. | When you need many variations at once |
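The numeric ranges in the table can serve as a sanity check before you submit a prompt. A sketch of my own (the range values come from the table; the parser is deliberately simple and only handles integer-valued flags):

```python
PARAM_RANGES = {  # numeric parameters and their valid ranges, per the table
    "--stylize": (0, 1000),
    "--chaos": (0, 100),
    "--weird": (0, 3000),
    "--repeat": (2, 40),
}

def check_params(prompt):
    """Return (flag, value) pairs whose value falls outside its range."""
    out_of_range = []
    tokens = prompt.split()
    for i, tok in enumerate(tokens[:-1]):
        if tok in PARAM_RANGES and tokens[i + 1].isdigit():
            lo, hi = PARAM_RANGES[tok]
            val = int(tokens[i + 1])
            if not lo <= val <= hi:
                out_of_range.append((tok, val))
    return out_of_range
```

For example, check_params("castle --stylize 1500 --chaos 50") flags the out-of-range --stylize value while accepting the valid --chaos setting.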
Midjourney vs DALL-E vs Stable Diffusion: Honest Comparison
| Criteria | Midjourney | DALL-E 3 | Stable Diffusion |
|---|---|---|---|
| Image quality ceiling | 🥇 Highest artistic quality | Good, improving | Variable (model-dependent) |
| Prompt understanding | Excellent (artistic language) | Excellent (conversational) | Good (technical) |
| Photorealism | Excellent | Good | Excellent (with right models) |
| Text in images | Poor (v6.1 improved but still unreliable) | Good | Variable |
| Cost | $10-120/mo subscription | Included in ChatGPT Plus | Free (local) or per-API-call |
| Setup friction | Low (Discord/web) | None (built into ChatGPT) | High (local setup) / Low (API) |
| Commercial rights | ✅ Paid plans own outputs | ✅ Outputs owned by user | ✅ Full ownership |
| Custom model training | ❌ No | ❌ No | ✅ Full LoRA/finetune support |
| Privacy (no data retention) | Paid stealth mode only | OpenAI privacy terms apply | ✅ Fully local option |
Use Midjourney when: You want professional-quality artistic or photographic images with minimal technical friction. The output-per-prompt quality is simply the highest of any commercial tool.
Use DALL-E when: You're already in ChatGPT and need a quick image. Or when you need to generate images from very detailed textual descriptions — DALL-E handles instruction following better for complex scenes with specific text elements.
Use Stable Diffusion when: You need custom fine-tuning (your own characters, products, styles), you process high volumes that make per-image pricing prohibitive, or you require full data privacy with no cloud processing.
Cost Per Image Breakdown
| Plan | Price/mo | Fast GPU Hours | Est. Images | Cost/Image |
|---|---|---|---|---|
| Basic | $10 | 3.3 hrs | ~200 (fast) | ~$0.05 |
| Standard | $30 | 15 hrs | ~900 fast + unlimited relaxed | ~$0.03 fast; ~$0.01 relaxed |
| Pro | $60 | 30 hrs | ~1,800 fast + unlimited relaxed + stealth | ~$0.03 fast; ~$0.005 relaxed |
| Mega | $120 | 60 hrs | ~3,600 fast + unlimited relaxed | ~$0.03 fast; <$0.005 relaxed |
Standard at $30/mo is the right tier for 90% of users. Unlimited relaxed generations means cost-per-image effectively approaches zero for patient users. Pro adds Stealth mode (images don't appear in public gallery) — required if you're generating commercially sensitive work.
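The per-image figures in the table follow from simple arithmetic, assuming roughly one image per minute of fast GPU time (an approximation; actual render time varies with quality and version settings):

```python
def cost_per_image(price_per_month, fast_gpu_hours, minutes_per_image=1.0):
    # ~1 minute of fast GPU time per image is a rough rule of thumb
    images = fast_gpu_hours * 60 / minutes_per_image
    return price_per_month / images

basic = cost_per_image(10, 3.3)    # ~200 images -> about $0.05 each
standard = cost_per_image(30, 15)  # ~900 images -> about $0.03 each
```

Relaxed-mode generations on Standard and above are not metered against fast hours, which is why the effective relaxed cost per image trends toward zero.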