AI First builder

Session 3 of 3

M0-S3: Evaluate Any AI Tool in 15 Minutes



Type: APPLY | Duration: 45 min | Prerequisites: M0-S2
Tools: Your Claude account + 1-2 other AI tools you've been meaning to try


In M0-S2, you learned the 5 mental models that guide AI-first PMs. Two of them come together in this session.

Model 3, "The PM's Judgment Is the Moat," says that AI can compress admin work but cannot make trade-off decisions. The same logic applies to AI tools themselves. You can't evaluate tools by watching demos or scanning Product Hunt reviews. You need a framework that forces YOU, the PM, to apply judgment systematically. That's the scorecard.

Model 4 — "Quality Gates Are for AI Too" — says AI output deserves the same review rigor as human work. Tools are no different. Before you adopt any tool into your workflow, it needs to pass a gate. The 5-factor scorecard IS that gate.

And from your M0-S2 task map, recall which tasks you classified as AI-feasible. That classification is your north star: a tool is only worth evaluating if it maps to a task you actually want automated. If your biggest pain point is feedback synthesis and a tool doesn't address it, its demo quality doesn't matter.

You built a system in M0-S1 (your weekly review Project). You classified your work in M0-S2. Now you're building the filter that protects your time from the firehose of new AI tools. This session is Model 4 applied to the market itself.


Every week there's a new AI tool for PMs. One promises auto-generated PRDs. Another promises "customer insight at scale." A third claims to automate your entire product workflow. Your LinkedIn feed is a firehose of demos. Every founder says their tool is "the one."

You cannot evaluate them all. You need a framework that tells you, in 15 minutes, whether a tool deserves more of your time or should be ignored until it matures. This session gives you that framework — the same one you'll use for every AI tool you encounter for the rest of your career.


For every AI tool you evaluate, answer these 5 questions. Not "is it cool?" Not "do other PMs use it?" These five:

1. Problem Fit — Does it solve YOUR actual problem?

What to ask: What does this tool claim to solve? Is that a real pain point in MY work — or a solution looking for a problem?

Red flags:

  • The tool solves a problem you don't have ("AI-powered roadmap visualization" — you use a Google Sheet and it works fine)
  • The tool claims to replace PM judgment ("AI decides what to build next")
  • The tool description is vague about what it actually does ("revolutionize your workflow")

Green flags:

  • You read the description and think "this is exactly the thing that takes me 3 hours every Friday"
  • The tool has a specific, narrow use case (not "AI for PMs" but "AI that generates stakeholder updates from Jira data")

2. Workflow — How many steps from open to useful output?

What to ask: If I signed up right now, how many minutes until I have a useful output I'd actually use? Not a demo. Not a tutorial. MY work output.

Red flags:

  • Requires setup beyond account creation (API keys, configuration files, integrations)
  • No free tier — you can't test without paying
  • "Book a demo" instead of "Try it now"
  • Requires CLI, terminal, Python, or coding
  • The onboarding tutorial is longer than 10 minutes

Green flags:

  • Sign up with Google → upload a file → get output in under 10 minutes
  • Free tier that lets you test with real (not sample) data
  • Web UI — no installation, no configuration, no terminal

3. Setup Cost — What's the time investment to get to first useful output?

What to ask: Accounting for account creation, learning the interface, and generating my first useful output — how much time? Is the time investment proportional to the time it'll save me?

Red flags:

  • Setup > 30 minutes before you see any output
  • Requires importing data from 3 different tools before it works
  • Needs ongoing configuration (weekly maintenance, context re-upload)
  • Enterprise sales process — "we'll get back to you with pricing"

Green flags:

  • First useful output in under 10 minutes
  • Works with files you already have (export from Jira, upload a PDF, paste text)
  • Remembers your context (you don't re-upload every time)
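If you want to check "proportional" with numbers rather than gut feel, the arithmetic is a simple break-even: divide the one-time setup time by the net time the tool saves you each week. The Python sketch below is illustrative only; the function name and its inputs are hypothetical, and it assumes you can estimate your own weekly savings and upkeep in minutes.

```python
import math


def weeks_to_break_even(setup_min: float,
                        weekly_upkeep_min: float,
                        weekly_saved_min: float) -> float:
    """Weeks until a tool's time savings cover its one-time setup cost.

    Hypothetical helper: all inputs are your own estimates, in minutes.
    Returns math.inf when weekly upkeep eats all of the weekly savings,
    i.e. the tool never pays back its setup time.
    """
    net_weekly_min = weekly_saved_min - weekly_upkeep_min
    if net_weekly_min <= 0:
        return math.inf
    return setup_min / net_weekly_min


# Example estimates: 30 min setup, 10 min/week upkeep, saves 60 min/week.
print(weeks_to_break_even(30, 10, 60))  # 0.6
```

In that example the tool pays for itself in 30 / (60 - 10) = 0.6 weeks. If upkeep matches or exceeds the savings, it never pays back, which is the Maintenance Burden factor showing up in the math.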

4. Maintenance Burden — What's the ongoing cost?

What to ask: After I set this up, what do I have to do every week/month to keep it working?

Red flags:

  • Re-upload context every session
  • Manually trigger every run (no scheduling)
  • Output quality degrades over time unless you tweak settings
  • Expensive: more than $50/month for individual use without clear ROI

Green flags:

  • "Set it once, it runs on schedule" (e.g., an n8n workflow)
  • Context persists across sessions (e.g., Claude Projects)
  • Pricing is transparent and proportional to value
  • Free tier covers the core workflow; paid is for power users

5. Integration — Does it work with my actual toolchain?

What to ask: Does this tool talk to Jira, Slack, Notion, Google Workspace, and/or Salesforce? Or is it a walled garden that requires me to work inside it?

Red flags:

  • "Import your data" — but only supports CSV upload, not live integrations
  • Creates a separate silo you have to maintain alongside your existing tools
  • No export — output lives inside the tool, can't be sent to Slack/email/Notion

Green flags:

  • Reads from your existing tools (Jira API, Google Drive, Slack)
  • Output can be emailed, posted to Slack, or saved to Notion
  • Works alongside your toolchain, doesn't try to replace it

After answering all 5 factors, assign one of three verdicts.

✅ Adopt

  • Scores well on all 5 factors
  • Solves a real pain point you have TODAY
  • Takes < 15 minutes to first useful output
  • Free or affordable
  • Integrates with your toolchain

Action: Start using it this week. Incorporate into your workflow. Come back in 2 weeks and re-evaluate: is it saving the time you expected?

👀 Watch

  • Promising but has 1-2 red flags
  • Might be right for you in 3-6 months (missing integration, too expensive, too new)
  • Solves a real problem but the execution isn't there yet

Action: Bookmark it. Set a calendar reminder for 3 months. Check if the red flags have been resolved. Don't spend more time on it now.

❌ Pass

  • Multiple red flags
  • Requires CLI, coding, or technical setup beyond your comfort level
  • Claims to do things AI can't actually do (strategy, decision-making, judgment replacement)
  • Solves a problem you don't have

Action: Move on. Don't feel FOMO. There will be 10 more tools next month. Your time is better spent mastering the tools you've already adopted than chasing every new launch.
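If you keep your scorecards in a spreadsheet or a script, the verdict rules above can be written down as a small function. Treat this as a sketch under stated assumptions, not an official scoring rule: the verdicts above don't define numeric cutoffs, so here a score of 2 or lower counts as a red flag, "scores well" means 4 or higher, and three or more red flags means Pass. The `Scorecard` class, its field names, and the thresholds are all hypothetical; tune them to your own judgment.

```python
from dataclasses import dataclass

FACTORS = ("problem_fit", "workflow", "setup_cost", "maintenance", "integration")


@dataclass
class Scorecard:
    tool: str
    # Factor name -> your score from 1 (poor) to 5 (great).
    scores: dict[str, int]
    # Hard stop regardless of scores, e.g. the tool needs coding you don't do,
    # or it claims to replace PM judgment.
    dealbreaker: bool = False

    def verdict(self) -> str:
        missing = [f for f in FACTORS if f not in self.scores]
        if missing:
            raise ValueError(f"Score every factor first; missing: {missing}")

        # Assumption: a score of 2 or lower is a red flag on that factor.
        red_flags = [f for f in FACTORS if self.scores[f] <= 2]

        # Pass: a dealbreaker, no real problem fit, or 3+ red flags (assumed cutoff).
        if self.dealbreaker or self.scores["problem_fit"] <= 2 or len(red_flags) >= 3:
            return "Pass"

        # Adopt: assumed to mean 4 or higher on every factor.
        if all(self.scores[f] >= 4 for f in FACTORS):
            return "Adopt"

        # Everything in between (1-2 red flags, or middling scores) is Watch.
        return "Watch"


card = Scorecard(
    tool="Hypothetical meeting-notes tool",
    scores={"problem_fit": 5, "workflow": 4, "setup_cost": 4,
            "maintenance": 4, "integration": 2},
)
print(card.tool, "->", card.verdict())  # Hypothetical meeting-notes tool -> Watch
```

Disqualifiers are checked first, so a tool with a slick demo but no problem fit still lands on Pass. When the 3-month reminder for a Watch tool fires, update the scores and run the check again.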


Tool 1: Claude (You Already Use It)

Apply the 5-factor scorecard to Claude based on your M0-S1 experience. Be honest — critique where warranted.

| Factor | Your Assessment | Score (1-5) |
|---|---|---|
| Problem fit | Does it solve YOUR documentation/admin problem? | |
| Workflow | How fast from open to useful weekly review? | |
| Setup cost | How long to set up your Project? | |
| Maintenance | How much work to run it next Monday? | |
| Integration | Does it work with your existing tools? | |
| Verdict | Adopt / Watch / Pass | |

Tool 2: Pick a Tool You've Been Curious About

Choose one AI tool you've seen mentioned, bookmarked, or been "meaning to try." Options if you don't have one:

  • NotebookLM (notebooklm.google.com) — Google's free RAG tool, upcoming in Module 2
  • Perplexity (perplexity.ai) — AI search with citations, good for competitive research
  • Granola (granola.ai) — AI meeting notes, if you haven't tried one yet

| Factor | Your Assessment | Score (1-5) |
|---|---|---|
| Problem fit | | |
| Workflow | | |
| Setup cost | | |
| Maintenance | | |
| Integration | | |
| Verdict | Adopt / Watch / Pass | |

You now have a decision framework for every AI tool you'll encounter. The landscape changes monthly. The framework doesn't.

When a colleague sends you a link with "you have to try this AI tool," you have two choices:

  1. Drop everything, sign up, spend 30 minutes exploring, realize it doesn't solve your problem
  2. Apply the 5-factor scorecard in 5 minutes, assign a verdict, move on with your day

Option 2 is how AI-first PMs operate. They don't chase every tool. They evaluate systematically, adopt what works, watch what's promising, and pass on the rest.


Save your scorecard template. Use it every time a new AI tool crosses your radar. In 6 months, you'll have a personal library of evaluated tools — adopted, watched, and passed — and you'll have wasted zero hours on demos that go nowhere.