Demo GPT-5.5 with an inbound-lead-to-proposal-pack workflow
A visible GPT-5.4 vs GPT-5.5 workflow demo for founders and operators: give both models the same messy inbound lead packet, then compare how much of the proposal, CRM update, follow-up email, and risk review each lane completes.
- Time: 45 to 75 minutes
- Cost: $5 to $20
- Stack: GPT-5.4, GPT-5.5, CRM notes, pricing rules, proposal template, risk checklist
You’re stuck with
You need to show a model upgrade in a way a business user can feel. Benchmarks are too abstract, and side-by-side prose outputs do not show workflow completion.
You end up with
A demoable two-lane replay: GPT-5.4 handles the narrow drafting work but pauses for human stitching; GPT-5.5 carries the larger packet and returns a proposal pack ready for review.
This workflow produced
GPT-5.5 Workflow Completion Map
A practical scorecard for GPT-5.5: agentic coding, computer work, research loops, review workflows, and the routing rules I would actually use after the release.
The recipe
The easiest way to make a model upgrade visible is to stop comparing answers. Compare finished workflow state instead.
For a ShipWithTez audience, the best workflow is not a puzzle, a benchmark, or an abstract coding task. It is a messy operator task with money nearby:
Messy inbound lead -> client-ready proposal pack
Give GPT-5.4 and GPT-5.5 the same packet. The visible difference is not prose quality. It is how many surfaces each model can complete before the human has to take over.
1. Freeze the inbound packet
Use one packet with five pieces:
```markdown
# Inbound lead packet
## Email thread
- founder asks for help automating a weekly reporting workflow
- budget range is implied, not explicit
- timeline is "before next board meeting"
## Call transcript
- current process takes 6 hours every Friday
- data comes from Stripe, HubSpot, and a spreadsheet
- founder wants a weekly PDF plus Slack summary
## Pricing rules
- under 10 hours of setup: productized audit
- 10 to 30 hours: fixed proposal
- over 30 hours: phased discovery first
## CRM fields
- company:
- pain:
- budget signal:
- urgency:
- next action:
## Risk checklist
- unclear buyer
- unclear data access
- unclear success metric
- compliance-sensitive data
```
Both lanes get this exact packet.
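One way to guarantee that is to freeze the packet as a single file both lanes read. A minimal sketch, abbreviated here for space (the filename matches the `codex exec` commands later in this recipe; paste the full five-section packet in practice):

```shell
# Freeze the packet to disk so both lanes read the identical bytes.
# Abbreviated; the real file carries all five sections in full.
cat > inbound-lead-packet.md <<'EOF'
# Inbound lead packet
## Email thread
- founder asks for help automating a weekly reporting workflow
## Pricing rules
- 10 to 30 hours: fixed proposal
## CRM fields
- company:
- pain:
## Risk checklist
- unclear buyer
EOF
wc -l inbound-lead-packet.md
```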
2. Ask for four artifacts, not one answer
The output should be easy to inspect on a screen:
- proposal summary
- CRM update
- follow-up email
- risk checklist
That matters because a founder can see the workflow difference without reading a long transcript. GPT-5.4 might write a good proposal section but leave CRM and risk work incomplete. GPT-5.5 should earn the premium by returning a fuller pack.
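One way to enforce the four-artifact contract is to wrap the frozen packet in an explicit prompt. A sketch, with illustrative wording (it assumes the packet file from step 1 exists; the stub fallback is only so the snippet runs standalone):

```shell
# Illustrative four-artifact prompt; the wording is a sketch, not a tested prompt.
[ -f inbound-lead-packet.md ] || echo '# Inbound lead packet' > inbound-lead-packet.md
prompt="Return exactly four sections: Proposal summary, CRM update, Follow-up email, Risk checklist.
Fill CRM fields only with evidence from the packet. Leave unknowns blank rather than guessing.

$(cat inbound-lead-packet.md)"
printf '%s\n' "$prompt" | head -n 1
```

Pinning the section names in the prompt is what makes the side-by-side inspectable: both lanes are forced onto the same four surfaces.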
3. Score visible completion
Use a small rubric:
| Artifact | GPT-5.4 question | GPT-5.5 question |
|---|---|---|
| Proposal | Did it draft the offer? | Did it draft the offer and choose the right scope? |
| CRM | Did it identify useful fields? | Did it fill fields with evidence from the packet? |
| Follow-up | Did it write a nice email? | Did it ask for the exact missing inputs? |
| Risk | Did it list generic risks? | Did it flag the concrete blockers before sending? |
| Human work left | What still needs stitching? | What only needs review? |
The winner is not the model with prettier language. The winner is the one that leaves less operational residue.
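Part of that residue can even be spot-checked mechanically. A toy sketch, assuming each lane's output keeps the packet's `- field:` line format (the two sample outputs here are fabricated stand-ins, not real model output):

```shell
# Hypothetical residue check: count CRM fields each lane left blank.
# The two sample lane files below are fabricated stand-ins, not real model output.
mkdir -p comparisons/inbound-proposal-pack
printf -- '- company: Acme\n- pain:\n- budget signal:\n' \
  > comparisons/inbound-proposal-pack/gpt-5.4.md
printf -- '- company: Acme\n- pain: 6h weekly reporting\n- budget signal: implied\n' \
  > comparisons/inbound-proposal-pack/gpt-5.5.md
for lane in gpt-5.4 gpt-5.5; do
  blanks=$(grep -cE '^- [a-z ]+:[[:space:]]*$' \
    "comparisons/inbound-proposal-pack/$lane.md") || true
  echo "$lane leaves $blanks CRM fields empty"
done
# prints:
# gpt-5.4 leaves 2 CRM fields empty
# gpt-5.5 leaves 0 CRM fields empty
```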
4. Make the demo honest
Label the first version as an illustrative replay unless you have saved real model outputs. Then rerun it with actual `codex exec` or API traces and replace the demo copy with measured artifacts.
```shell
mkdir -p comparisons/inbound-proposal-pack
codex exec -m gpt-5.4 "$(cat inbound-lead-packet.md)" \
  > comparisons/inbound-proposal-pack/gpt-5.4.md
codex exec -m gpt-5.5 "$(cat inbound-lead-packet.md)" \
  > comparisons/inbound-proposal-pack/gpt-5.5.md
```
If GPT-5.5 does not visibly reduce the human stitching work, do not route the workflow to GPT-5.5. Keep it checkpointed.
5. The routing rule
Use GPT-5.4 when the job is a narrow draft or a single artifact.
Use GPT-5.5 when the job needs:
- messy source synthesis
- business judgment under ambiguity
- multiple output artifacts
- tool or CRM state updates
- risk review before handoff
That is the operator-facing story: GPT-5.5 is not just a better answer model. It is a better workflow-completion lane when the packet crosses sales, ops, docs, and review.
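The routing rule can be sketched as a toy dispatcher, with a hypothetical `surfaces` count standing in for how many artifacts or systems the job touches:

```shell
# Toy routing sketch: a hypothetical surface count drives the lane choice.
route() {
  # $1 = number of distinct artifacts/systems the job must touch
  if [ "$1" -gt 1 ]; then echo gpt-5.5; else echo gpt-5.4; fi
}
route 1   # single-artifact draft -> gpt-5.4
route 4   # proposal + CRM + email + risk pack -> gpt-5.5
```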
Steal this starter
```markdown
# GPT-5.4 vs GPT-5.5 inbound proposal demo
## Frozen packet
- email thread:
- call notes:
- pricing rules:
- CRM fields:
- risk checklist:
## Required artifacts
- proposal summary
- CRM update
- follow-up email
- risk checklist
## Visible score
| Artifact | GPT-5.4 | GPT-5.5 | Human work left |
|---|---|---|---|
| Proposal | | | |
| CRM | | | |
| Follow-up | | | |
| Risk | | | |
## Routing decision
```
The demo works when the viewer can tell, at a glance, which lane created a business artifact they would actually use.