Most designers generating images with AI are writing prompts from scratch every time. You open a tool, describe what you want in a paragraph or two, and hope the output lands close enough to be usable. Sometimes it does. More often you end up in a cycle of rewrites, adjusting language, swapping descriptors, trying to steer the model toward something that actually fits the visual identity you’re working within. And yet the alternative, stock photography, is worse: generic, overused, and disconnected from anything specific to your brand or project.
The problem is that there’s no system behind the prompts. Every request starts from zero, which means every output is a coin flip in terms of consistency. If you’re producing imagery for a brand across a website, a pitch deck, social content, and editorial features, that inconsistency becomes a real problem. You need your images to feel like they belong together, and ad hoc prompting doesn’t get you there reliably.
This post introduces a framework we’ve been developing at Origin to solve that. It’s a modular prompt structure designed for photography-style image generation that locks in the constants of a visual language while giving you controlled flexibility across different image types. The result is a reusable system prompt that produces cohesive, brand-aligned imagery every time, without starting over. Think of it as a direct replacement for stock photography: imagery that’s actually tailored to your brand, generated on demand, and consistent across every touchpoint.
Why Modular Prompting Matters
The framework handles this consistency problem by defining a fixed set of visual rules that persist across every prompt, then layering in category-specific adjustments that modify only the variables: scene and subject description, how the image is shot, and lighting design. You select a category, provide your project context, and the system assembles a complete prompt that’s structurally consistent but contextually specific.
This means you can generate a detail shot of print materials, an environmental portrait, and a candid collaboration scene, and all three will share the same visual DNA without you having to manually reconcile the language each time.
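The mechanics are easy to picture in code. Here is a minimal sketch of the modular structure described above; the dict keys mirror the framework’s five sections, but the abbreviated section text and the `assemble` helper are illustrative, not part of the framework itself.

```python
# Fixed sections: identical in every generated prompt (illustrative text).
CORE_RULES = {
    "Visual Style & Atmosphere": "Low contrast, soft depth, natural texture, matte finishes.",
    "Stylistic Reference & Emotional Intent": "Quiet, supportive, editorial; design as a tool, not the story.",
}

# Variable sections: swapped per category (two of the four shown).
CATEGORIES = {
    "Details": {
        "Scene & Subject Description": "Close-up still life: hands, tools, paper textures.",
        "How It Is Shot": "50mm or macro lens, shallow depth of field, tight framing.",
        "Lighting Design": "Close, soft, directional morning light.",
    },
    "Portraiture": {
        "Scene & Subject Description": "Person centered in workspace context, natural expression.",
        "How It Is Shot": "50mm at f/2.2, eye-level portrait framing.",
        "Lighting Design": "Strong directional window light shaping the face.",
    },
}

def assemble(category: str, context: str) -> str:
    """Combine a category's variable sections with the fixed core rules."""
    sections = dict(CATEGORIES[category])  # the three variable sections
    sections["Scene & Subject Description"] += f" Context: {context}."
    sections.update(CORE_RULES)  # the two stable sections, always identical
    return "\n\n".join(f"{name}:\n{text}" for name, text in sections.items())
```

Call `assemble("Details", "tactile print materials for brand review")` and then swap the category: the first three sections change, the last two never do, which is exactly what keeps the output cohesive.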
The Four Categories
The framework is built around four image types that cover the majority of what design-oriented brands need: the same categories you’d typically fill with stock photography, but with results that actually belong to your project. Each one adjusts the three variable sections while leaving the core visual rules untouched.
- Details covers close-ups and still-life moments: hands interacting with tools, tactile surfaces, paper textures, swatches, sketches. Tight framing, shallow depth of field, close directional light.
- Sense of Place shifts the focus to environment: studios, home offices, cafes, community spaces. Human presence is implied rather than explicit. Ambient daylight filling the space.
- Portraiture puts a person at the center. Natural expressions, workspace context, strong directional window light shaping the face. Editorial, not corporate.
- Authentic Moments captures real interaction and movement. People reviewing prints, collaborating, arranging products. Slight motion blur is acceptable. Candid and editorial.
How the Output Works
Every prompt the framework generates follows the same five-section structure, regardless of which category you select. This consistency is what makes the imagery feel cohesive across an entire project.
The sections are: Scene & Subject Description, How It Is Shot, Lighting Design, Visual Style & Atmosphere, and Stylistic Reference & Emotional Intent. The first three shift based on category. The last two remain stable, anchored to your project’s established visual language.
When you feed the framework a category and some context, say “Details” and “show tactile print materials for brand review,” it assembles a prompt that adjusts the scene, framing, and light for a close-up detail shot while maintaining the overarching style and philosophy you’ve defined. Swap the category to “Portraiture” and provide “designer reviewing proofs at their desk,” and the structural logic shifts accordingly. The feel stays the same.
The System Prompt
This is the full framework, designed to be used as a system prompt for any AI model that generates images or image descriptions. Paste it into your model’s system instructions, define your project’s visual language within it, and use it as-is. When you need an image, just tell it which category and what context. It handles the rest.
You are a specialized image-prompt generator. Your job is to take the user's requested category and their project-specific context, then output a fully formed, production-ready photography prompt.
CORE VISUAL RULES (Always Maintain):
Lighting: Always natural, always directional, always window-based. Prefer morning or late-day warmth. Real shadows, no studio lights.
Palette: Define per project. Reference the project's established palette. Low contrast, soft depth.
Texture: Linen, paper grain, wood, matte finishes, stone, soft shadows.
Human Presence: Always present or implied through hands, gestures, tools, papers, mugs, sketches, devices.
Philosophy: The image communicates that good design is a quiet, supportive, and meaningful tool, not the center of the story.
CATEGORY LOGIC:
1. DETAILS (Objects & Tools)
- Scene: Close-ups, still-life moments, hands interacting with design tools, tactile surfaces, paper textures, print pieces, sketches. Emphasize residue of work.
- Lighting: Close, soft, directional morning or afternoon light. Highlight surface textures with shallow depth of field.
- Shot: Nikon Z8 with 50mm or macro lens. Shallow depth of field. Tight framing.
2. SENSE OF PLACE (Environment)
- Scene: Environment first; studios, home offices, cafes, community spaces. Lived-in details: plants, shelves, notes, moodboards, signage, windows, brick, wood. Human presence implied.
- Lighting: Ambient daylight filling space. Natural shadows across walls, desks, corners. Wider depth, soft edges.
- Shot: Nikon Z8, eye-level or slight off-center framing. Wider composition. Imperfect framing allowed, editorial realism.
3. PORTRAITURE (People / Presence)
- Scene: Human is primary focus. Natural, thoughtful expressions. Person seated or standing within workspace context. Subtle props, not overcrowded.
- Lighting: Strong directional natural window light shaping the face. Soft shadows, no harsh drama.
- Shot: Nikon Z8, 50mm at f/2.2 or f/2.8. Eye-level portrait framing, intentional but not stiff. Optional slight matte film grain.
4. AUTHENTIC MOMENTS (Action / Interaction)
- Scene: Humans engaged in real movement or interaction. Reviewing prints, gesturing, collaborating, laughing, arranging products. Tools in use.
- Lighting: Mixed daylight. Subtle motion or imperfect focus allowed. Natural reflections on surfaces.
- Shot: Nikon Z8, 35mm or 50mm. Slight motion blur acceptable. Candid editorial feel.
OUTPUT FORMAT (Always use these sections):
→ Scene & Subject Description
→ How It Is Shot
→ Lighting Design
→ Visual Style & Atmosphere
→ Stylistic Reference & Emotional Intent
INSTRUCTIONS:
1. Confirm category and user context in one sentence.
2. Output one cohesive prompt using the five sections above.
3. Adjust scene, shot, and lighting based on the selected category.
4. Keep visual style and stylistic intent stable across all outputs.
5. If the user provides additional context about purpose, let it influence props, framing, people, and implied story.
6. Never use studio lighting, hard shadows, glamour styling, corporate headshot tone, sterile setups, flash photography language, or artificial lighting effects.
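If you’re calling a model programmatically rather than pasting into a chat interface, the framework slots in as the system message. The sketch below assumes an OpenAI-style chat API; the exact client call varies by provider, and `SYSTEM_PROMPT` stands in for the full framework text above.

```python
# Placeholder: replace with the complete framework text from this post.
SYSTEM_PROMPT = "You are a specialized image-prompt generator. ..."

def build_request(category: str, context: str) -> list:
    """Package the framework and a category request as chat messages."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": f"Category: {category}. Context: {context}"},
    ]

# With an OpenAI-compatible client, the call would look roughly like:
# messages = build_request("Details", "tactile print materials for brand review")
# response = client.chat.completions.create(model="...", messages=messages)
```

The point of the split is that the system message never changes within a project; only the short user message does.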
Adapting It to Your Projects
The framework is intentionally brand-agnostic. The version above ships without a specific color palette or mood language. Those are variables you define per project. When you set it up for a client, you add their palette references, their tonal descriptors, and any specific textural or environmental preferences into the core visual rules section. The category logic and output structure stay the same.
This means you build the system once, then customize the skin for each engagement. A studio working across multiple clients can maintain a single structural framework and swap in project-specific visual language as needed. No more licensing stock photos that five other companies are also using on their homepage. The consistency comes from the architecture, not from copying and pasting the same adjectives into every prompt.
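One way to keep that separation honest is to treat the core visual rules as a template with per-project slots. This is a hypothetical sketch, not part of the framework: the project names, palette text, and `core_rules_for` helper are all illustrative.

```python
# Shared structural framework: one template, with slots for the
# project-specific visual language (abbreviated for illustration).
CORE_RULES_TEMPLATE = (
    "Lighting: Always natural, directional, window-based.\n"
    "Palette: {palette}. Low contrast, soft depth.\n"
    "Texture: {textures}."
)

# Per-engagement "skins": only the visual language changes.
PROJECTS = {
    "client-a": {
        "palette": "warm neutrals, muted terracotta, off-white",
        "textures": "linen, paper grain, matte finishes",
    },
    "client-b": {
        "palette": "cool greys, deep green, bone",
        "textures": "stone, wood, soft shadows",
    },
}

def core_rules_for(project: str) -> str:
    """Fill the shared template with one project's visual language."""
    return CORE_RULES_TEMPLATE.format(**PROJECTS[project])
```

Switching clients means editing a small dict of descriptors, not rewriting the framework.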
What’s Next
This is one piece of a larger set of workflows we’re building and sharing through Origin Supply: practical frameworks for designers integrating AI into their process without sacrificing intentionality or craft. The goal is to automate the scaffolding around taste, so the decisions that actually matter get more of your attention.
If you’re working on something similar or have an approach worth sharing, we’d like to hear about it. Origin Supply is an open resource. The link below is where you can contribute.