Session 3 of 3
M0-S3: Evaluate Any AI Tool in 15 Minutes
Type: APPLY | Duration: 45 min | Prerequisites: M0-S2
Tools: Your Claude account + 1-2 other AI tools you've been meaning to try
In M0-S2, you learned the 5 mental models that govern AI-first PMs. Two of them come together in this session.
Model 3, "The PM's Judgment Is the Moat," says that AI can compress admin work but cannot make trade-off decisions. The same logic applies to AI tools themselves. You can't evaluate tools by watching demos or scanning Product Hunt reviews. You need a framework that forces YOU, the PM, to apply judgment systematically. That's the scorecard.
Model 4 — "Quality Gates Are for AI Too" — says AI output deserves the same review rigor as human work. Tools are no different. Before you adopt any tool into your workflow, it needs to pass a gate. The 5-factor scorecard IS that gate.
And from your M0-S2 task map: recall which of your tasks you classified as AI-feasible. That classification is your north star. A tool is only worth evaluating if it maps to a task you actually want automated. If your biggest pain point is, say, feedback synthesis, and a tool doesn't address it, the tool's demo quality doesn't matter.
You built a system in M0-S1 (your weekly review Project). You classified your work in M0-S2. Now you're building the filter that protects your time from the firehose of new AI tools. This session is Model 4 applied to the market itself.
Every week there's a new AI tool for PMs. One promises auto-generated PRDs. Another promises "customer insight at scale." A third claims to automate your entire product workflow. Your LinkedIn feed is a firehose of demos. Every founder says their tool is "the one."
You cannot evaluate them all. You need a framework that tells you, in 15 minutes, whether a tool deserves more of your time or should be ignored until it matures. This session gives you that framework — the same one you'll use for every AI tool you encounter for the rest of your career.
For every AI tool you evaluate, answer these 5 questions. Not "is it cool?" Not "do other PMs use it?" These five:
1. Problem Fit — Does it solve YOUR actual problem?
What to ask: What does this tool claim to solve? Is that a real pain point in MY work — or a solution looking for a problem?
Red flags:
- The tool solves a problem you don't have ("AI-powered roadmap visualization" — you use a Google Sheet and it works fine)
- The tool claims to replace PM judgment ("AI decides what to build next")
- The tool description is vague about what it actually does ("revolutionize your workflow")
Green flags:
- You read the description and think "this is exactly the thing that takes me 3 hours every Friday"
- The tool has a specific, narrow use case (not "AI for PMs" but "AI that generates stakeholder updates from Jira data")
2. Workflow — How many steps from open to useful output?
What to ask: If I signed up right now, how many minutes until I have a useful output I'd actually use? Not a demo. Not a tutorial. MY work output.
Red flags:
- Requires setup beyond account creation (API keys, configuration files, integrations)
- No free tier — you can't test without paying
- "Book a demo" instead of "Try it now"
- Requires CLI, terminal, Python, or coding
- The onboarding tutorial is longer than 10 minutes
Green flags:
- Sign up with Google → upload a file → get output in under 10 minutes
- Free tier that lets you test with real (not sample) data
- Web UI — no installation, no configuration, no terminal
3. Setup Cost — What's the time investment to get to first useful output?
What to ask: Accounting for account creation, learning the interface, and generating my first useful output — how much time? Is the time investment proportional to the time it'll save me?
Red flags:
- Setup > 30 minutes before you see any output
- Requires importing data from 3 different tools before it works
- Needs ongoing configuration (weekly maintenance, context re-upload)
- Enterprise sales process — "we'll get back to you with pricing"
Green flags:
- First useful output in under 10 minutes
- Works with files you already have (export from Jira, upload a PDF, paste text)
- Remembers your context (you don't re-upload every time)
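The "is the time investment proportional to the time it'll save me?" question is simple arithmetic, and it can help to see it written out. The sketch below is only an illustration: the function name `breakeven_weeks`, its parameters, and the sample numbers are assumptions made for this example, not part of the scorecard. You can do the same math on paper.

```python
from typing import Optional


def breakeven_weeks(setup_minutes: float,
                    weekly_upkeep_minutes: float,
                    weekly_minutes_saved: float) -> Optional[float]:
    """Weeks until the time saved has paid back the setup time.

    Returns None when weekly upkeep eats all of the saving,
    meaning the tool never pays back its setup cost.
    """
    net_weekly_saving = weekly_minutes_saved - weekly_upkeep_minutes
    if net_weekly_saving <= 0:
        return None
    return setup_minutes / net_weekly_saving


# Illustrative numbers only (assumptions, not measurements):
# a 30-minute setup that saves 3 hours a week with 10 minutes of upkeep
quick_win = breakeven_weeks(setup_minutes=30,
                            weekly_upkeep_minutes=10,
                            weekly_minutes_saved=180)

# a 4-hour setup that saves 20 minutes a week but needs 30 minutes of upkeep
never_pays = breakeven_weeks(setup_minutes=240,
                             weekly_upkeep_minutes=30,
                             weekly_minutes_saved=20)

print(quick_win)   # about 0.18 weeks: pays back within the first week
print(never_pays)  # None: upkeep costs more than the tool saves
```

A tool that pays back within the first week is an easy call. A tool that returns `None` here is a red flag no matter how polished the demo is.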
4. Maintenance Burden — What's the ongoing cost?
What to ask: After I set this up, what do I have to do every week/month to keep it working?
Red flags:
- Re-upload context every session
- Manually trigger every run (no scheduling)
- Output quality degrades over time unless you tweak settings
- Expensive: more than $50/month for individual use without clear ROI
Green flags:
- "Set it once, it runs on schedule" (e.g., n8n)
- Context persists across sessions (e.g., Claude Projects)
- Pricing is transparent and proportional to value
- Free tier covers the core workflow; paid is for power users
5. Integration — Does it work with my actual toolchain?
What to ask: Does this tool talk to Jira, Slack, Notion, Google Workspace, and/or Salesforce? Or is it a walled garden that requires me to work inside it?
Red flags:
- "Import your data" — but only supports CSV upload, not live integrations
- Creates a separate silo you have to maintain alongside your existing tools
- No export — output lives inside the tool, can't be sent to Slack/email/Notion
Green flags:
- Reads from your existing tools (Jira API, Google Drive, Slack)
- Output can be emailed, posted to Slack, or saved to Notion
- Works alongside your toolchain, doesn't try to replace it
After answering all 5 questions, assign one of three verdicts.
✅ Adopt
- Scores well on all 5 factors
- Solves a real pain point you have TODAY
- Takes < 15 minutes to first useful output
- Free or affordable
- Integrates with your toolchain
Action: Start using it this week. Incorporate into your workflow. Come back in 2 weeks and re-evaluate: is it saving the time you expected?
👀 Watch
- Promising but has 1-2 red flags
- Might be right for you in 3-6 months (missing integration, too expensive, too new)
- Solves a real problem but the execution isn't there yet
Action: Bookmark it. Set a calendar reminder for 3 months. Check if the red flags have been resolved. Don't spend more time on it now.
❌ Pass
- Multiple red flags
- Requires CLI, coding, or technical setup beyond your comfort level
- Claims to do things AI can't actually do (strategy, decision-making, judgment replacement)
- Solves a problem you don't have
Action: Move on. Don't feel FOMO. There will be 10 more tools next month. Your time is better spent mastering the tools you've already adopted than chasing every new launch.
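For readers who like to see decision rules spelled out, here is one possible way to turn five 1-5 scores into a verdict. Treat it as a sketch under stated assumptions, not the official scoring rule: the lesson does not define numeric cut-offs, so the direction of the scale (5 is best), the thresholds `RED_FLAG_MAX` and `ADOPT_MIN`, and the names `Scorecard` and `verdict` are all invented for this example. Your own judgment still makes the final call.

```python
from dataclasses import dataclass, fields

# Assumed thresholds (not defined in the lesson):
RED_FLAG_MAX = 2   # a factor scored 1 or 2 counts as a red flag
ADOPT_MIN = 4      # "scores well" is read as 4 or 5


@dataclass(frozen=True)
class Scorecard:
    """One row of scores per factor, each from 1 (worst) to 5 (best)."""
    problem_fit: int
    workflow: int
    setup_cost: int
    maintenance: int
    integration: int

    def scores(self) -> dict:
        return {f.name: getattr(self, f.name) for f in fields(self)}


def verdict(card: Scorecard) -> str:
    scores = card.scores()
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {value}")

    red_flags = [name for name, value in scores.items() if value <= RED_FLAG_MAX]

    # Pass: the tool solves a problem you don't have, or red flags pile up.
    if "problem_fit" in red_flags or len(red_flags) >= 3:
        return "Pass"
    # Adopt: no red flags and every factor scores well.
    if not red_flags and all(value >= ADOPT_MIN for value in scores.values()):
        return "Adopt"
    # Watch: a real problem, but 1-2 red flags or some lukewarm scores.
    return "Watch"


print(verdict(Scorecard(5, 4, 4, 4, 5)))  # Adopt
print(verdict(Scorecard(5, 4, 4, 2, 2)))  # Watch
print(verdict(Scorecard(2, 5, 5, 5, 5)))  # Pass
```

The sketch gives problem fit a veto on purpose: a tool that aces the other four factors but solves a problem you don't have is still a Pass.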
Tool 1: Claude (You Already Use It)
Apply the 5-factor scorecard to Claude based on your M0-S1 experience. Be honest — critique where warranted.
| Factor | Your Assessment | Score (1-5) |
|---|---|---|
| Problem fit | Does it solve YOUR documentation/admin problem? | |
| Workflow | How fast from open to useful weekly review? | |
| Setup cost | How long to set up your Project? | |
| Maintenance | How much work to run it next Monday? | |
| Integration | Does it work with your existing tools? | |
| Verdict | Adopt / Watch / Pass | |
Tool 2: Pick a Tool You've Been Curious About
Choose one AI tool you've seen mentioned, bookmarked, or been "meaning to try." Options if you don't have one:
- NotebookLM (notebooklm.google.com) — Google's free RAG tool, upcoming in Module 2
- Perplexity (perplexity.ai) — AI search with citations, good for competitive research
- Granola (granola.ai) — AI meeting notes, if you haven't tried one yet
| Factor | Your Assessment | Score (1-5) |
|---|---|---|
| Problem fit | | |
| Workflow | | |
| Setup cost | | |
| Maintenance | | |
| Integration | | |
| Verdict | Adopt / Watch / Pass | |
You now have a decision framework for every AI tool you'll encounter. The landscape changes monthly. The framework doesn't.
When a colleague sends you a link with "you have to try this AI tool," you have two choices:
1. Drop everything, sign up, spend 30 minutes exploring, realize it doesn't solve your problem
2. Apply the 5-factor scorecard in 15 minutes, assign a verdict, move on with your day
Option 2 is how AI-first PMs operate. They don't chase every tool. They evaluate systematically, adopt what works, watch what's promising, and pass on the rest.
Save your scorecard template. Use it every time a new AI tool crosses your radar. In 6 months, you'll have a personal library of evaluated tools (adopted, watched, and passed), and you'll have lost far fewer hours to demos that go nowhere.