Demo GPT-5.5 with an inbound-lead-to-proposal-pack workflow
A visible GPT-5.4 vs GPT-5.5 workflow demo for founders and operators: give both models the same messy inbound lead packet, then compare how much of the proposal, CRM update, follow-up email, and risk review each lane completes.
- Time: 45 to 75 minutes
- Cost: $5 to $20
- Stack: GPT-5.4, GPT-5.5, CRM notes, pricing rules, proposal template, risk checklist
You’re stuck with
You need to show a model upgrade in a way a business user can feel. Benchmarks are too abstract, and side-by-side prose outputs do not show workflow completion.
You end up with
A demoable two-lane replay: GPT-5.4 handles the narrow drafting work but pauses for human stitching; GPT-5.5 carries the larger packet and returns a proposal pack ready for review.
This workflow produced
GPT-5.5 Workflow Completion Map
A practical scorecard for GPT-5.5: agentic coding, computer work, research loops, review workflows, and the routing rules I would actually use after the release.
The recipe
The easiest way to make a model upgrade visible is to stop comparing answers. Compare finished workflow state instead.
For a ShipWithTez audience, the best workflow is not a puzzle, a benchmark, or an abstract coding task. It is a messy operator task with money nearby:
Messy inbound lead -> client-ready proposal pack
Give GPT-5.4 and GPT-5.5 the same packet. The visible difference is not prose quality. It is how many surfaces each model can complete before the human has to take over.
1. Freeze the inbound packet
Use one packet with five pieces:
```markdown
# Inbound lead packet
## Email thread
- founder asks for help automating a weekly reporting workflow
- budget range is implied, not explicit
- timeline is "before next board meeting"
## Call transcript
- current process takes 6 hours every Friday
- data comes from Stripe, HubSpot, and a spreadsheet
- founder wants a weekly PDF plus Slack summary
## Pricing rules
- under 10 hours of setup: productized audit
- 10 to 30 hours: fixed proposal
- over 30 hours: phased discovery first
## CRM fields
- company:
- pain:
- budget signal:
- urgency:
- next action:
## Risk checklist
- unclear buyer
- unclear data access
- unclear success metric
- compliance-sensitive data
```
Both lanes get this exact packet.
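One way to guarantee that is to freeze the packet as a single file both lanes read. A minimal sketch, abbreviated here for space (the filename matches the `codex exec` commands later in this recipe; paste the full five-section packet in practice):

```shell
# Freeze the packet to disk so both lanes read the identical bytes.
# Abbreviated; the real file carries all five sections in full.
cat > inbound-lead-packet.md <<'EOF'
# Inbound lead packet
## Email thread
- founder asks for help automating a weekly reporting workflow
## Pricing rules
- 10 to 30 hours: fixed proposal
## CRM fields
- company:
- pain:
## Risk checklist
- unclear buyer
EOF
wc -l inbound-lead-packet.md
```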
2. Ask for four artifacts, not one answer
The output should be easy to inspect on a screen:
- proposal summary
- CRM update
- follow-up email
- risk checklist
That matters because a founder can see the workflow difference without reading a long transcript. GPT-5.4 might write a good proposal section but leave CRM and risk work incomplete. GPT-5.5 should earn the premium by returning a fuller pack.
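One way to enforce the four-artifact contract is to wrap the frozen packet in an explicit prompt. A sketch, with illustrative wording (it assumes the packet file from step 1 exists; the stub fallback is only so the snippet runs standalone):

```shell
# Illustrative four-artifact prompt; the wording is a sketch, not a tested prompt.
[ -f inbound-lead-packet.md ] || echo '# Inbound lead packet' > inbound-lead-packet.md
prompt="Return exactly four sections: Proposal summary, CRM update, Follow-up email, Risk checklist.
Fill CRM fields only with evidence from the packet. Leave unknowns blank rather than guessing.

$(cat inbound-lead-packet.md)"
printf '%s\n' "$prompt" | head -n 1
```

Pinning the section names in the prompt is what makes the side-by-side inspectable: both lanes are forced onto the same four surfaces.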
3. Score visible completion
Use a small rubric:
| Artifact | GPT-5.4 question | GPT-5.5 question |
|---|---|---|
| Proposal | Did it draft the offer? | Did it draft the offer and choose the right scope? |
| CRM | Did it identify useful fields? | Did it fill fields with evidence from the packet? |
| Follow-up | Did it write a nice email? | Did it ask for the exact missing inputs? |
| Risk | Did it list generic risks? | Did it flag the concrete blockers before sending? |
| Human work left | What still needs stitching? | What only needs review? |
The winner is not the model with prettier language. The winner is the one that leaves less operational residue.
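Part of that residue can even be spot-checked mechanically. A toy sketch, assuming each lane's output keeps the packet's `- field:` line format (the two sample outputs here are fabricated stand-ins, not real model output):

```shell
# Hypothetical residue check: count CRM fields each lane left blank.
# The two sample lane files below are fabricated stand-ins, not real model output.
mkdir -p comparisons/inbound-proposal-pack
printf -- '- company: Acme\n- pain:\n- budget signal:\n' \
  > comparisons/inbound-proposal-pack/gpt-5.4.md
printf -- '- company: Acme\n- pain: 6h weekly reporting\n- budget signal: implied\n' \
  > comparisons/inbound-proposal-pack/gpt-5.5.md
for lane in gpt-5.4 gpt-5.5; do
  blanks=$(grep -cE '^- [a-z ]+:[[:space:]]*$' \
    "comparisons/inbound-proposal-pack/$lane.md") || true
  echo "$lane leaves $blanks CRM fields empty"
done
# prints:
# gpt-5.4 leaves 2 CRM fields empty
# gpt-5.5 leaves 0 CRM fields empty
```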
4. Make the demo honest
Label the first version as an illustrative replay unless you have saved real model outputs. Then rerun it with actual `codex exec` or API traces and replace the demo copy with measured artifacts.
```shell
mkdir -p comparisons/inbound-proposal-pack
codex exec -m gpt-5.4 "$(cat inbound-lead-packet.md)" \
  > comparisons/inbound-proposal-pack/gpt-5.4.md
codex exec -m gpt-5.5 "$(cat inbound-lead-packet.md)" \
  > comparisons/inbound-proposal-pack/gpt-5.5.md
```
If GPT-5.5 does not visibly reduce the human stitching work, do not route the workflow to GPT-5.5. Keep it checkpointed.
5. The routing rule
Use GPT-5.4 when the job is a narrow draft or a single artifact.
Use GPT-5.5 when the job needs:
- messy source synthesis
- business judgment under ambiguity
- multiple output artifacts
- tool or CRM state updates
- risk review before handoff
That is the operator-facing story: GPT-5.5 is not just a better answer model. It is a better workflow-completion lane when the packet crosses sales, ops, docs, and review.
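The routing rule can be sketched as a toy dispatcher, with a hypothetical `surfaces` count standing in for how many artifacts or systems the job touches:

```shell
# Toy routing sketch: a hypothetical surface count drives the lane choice.
route() {
  # $1 = number of distinct artifacts/systems the job must touch
  if [ "$1" -gt 1 ]; then echo gpt-5.5; else echo gpt-5.4; fi
}
route 1   # single-artifact draft -> gpt-5.4
route 4   # proposal + CRM + email + risk pack -> gpt-5.5
```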
Steal this starter
```markdown
# GPT-5.4 vs GPT-5.5 inbound proposal demo
## Frozen packet
- email thread:
- call notes:
- pricing rules:
- CRM fields:
- risk checklist:
## Required artifacts
- proposal summary
- CRM update
- follow-up email
- risk checklist
## Visible score
| Artifact | GPT-5.4 | GPT-5.5 | Human work left |
|---|---|---|---|
| Proposal | | | |
| CRM | | | |
| Follow-up | | | |
| Risk | | | |
## Routing decision
```
The demo works when the viewer can tell, at a glance, which lane created a business artifact they would actually use.