Tool #12 · marketplace-ops-toolkit

Vendor Performance Scorecard

Rank your vendors against a weighted multi-dimensional scorecard. Surface underperformers, build defensible data for vendor conversations, and classify the portfolio into A / B / C tiers, with sensitivity analysis on the weights. Four operator presets cover kit sourcing, 3PL last-mile, BPO offshore, and balanced multi-vendor ops.

Operator preset

Pick the scenario closest to your stack. Each preset rebalances the dimension weights for the operating context and loads a realistic vendor sample.

Dimension weights

Each dimension contributes to a weighted score from 0 to 100. The weights should sum to 100. Adjust them to match what matters most for your operation.
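
A minimal sketch of how the weights could combine into one score, assuming the weighted score is the weight-times-score sum divided by 100 so that it stays on the 0 to 100 scale. The function name, dimension keys, and example numbers are hypothetical and are not taken from the tool itself.

```python
from typing import Dict


def weighted_score(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    """Combine 0-100 dimension scores into one 0-100 weighted score."""
    total_weight = sum(weights.values())
    if abs(total_weight - 100) > 1e-9:
        raise ValueError(f"weights sum to {total_weight}, expected 100")
    # Dividing by 100 keeps the result on the same 0-100 scale as the inputs.
    return sum(weights[d] * scores[d] for d in weights) / 100


# Hypothetical weights and dimension scores, for illustration only.
example_weights = {"on_time": 30, "defect": 25, "cost": 20, "lead_time": 15, "response": 10}
example_scores = {"on_time": 95, "defect": 98, "cost": 90, "lead_time": 80, "response": 70}
example_total = weighted_score(example_weights, example_scores)
```

Rejecting weights that do not sum to 100 is a design choice in this sketch; a tool could equally rescale them instead.
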
How dimension scoring works

Each vendor is scored 0 to 100 on each dimension based on the inputs you provide. The weighted total is the sum of (weight × dimension score) divided by 100, so it stays on the 0 to 100 scale when the weights sum to 100. Scoring rules:

  • On-Time Delivery %: raw % becomes the score directly. 95% on-time = 95 points.
  • Defect Rate %: inverted. 2% defect = 98 points. The score is set to 0 if the defect rate is above 20%.
  • Cost Variance %: 0% variance = 100 points. Each 1% over target subtracts 5 points. Each 1% under target adds 1 point (capped at 110, normalized to 100).
  • Lead Time Days: normalized against the best vendor in the set. Best = 100, others scale down (1 extra lead-time day = 4 point penalty).
  • Response Time Hours: normalized against the best vendor. Best = 100, others scale down (1 extra hour = 3 point penalty).
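
The scoring rules above can be sketched as small functions. This is one reading of the rules, not the tool's actual code: all function names are hypothetical, flooring scores at 0 is an assumption, and the cost variance function returns points capped at 110 and leaves out the final "normalized to 100" step because the rule does not pin down how that step is applied.

```python
from typing import List


def clamp(value: float, low: float = 0.0, high: float = 100.0) -> float:
    """Keep a value inside [low, high]."""
    return max(low, min(high, value))


def on_time_score(on_time_pct: float) -> float:
    # The raw percentage is the score.
    return clamp(on_time_pct)


def defect_score(defect_pct: float) -> float:
    # Inverted: 2% defect gives 98 points; above 20% the score is 0.
    if defect_pct > 20:
        return 0.0
    return clamp(100 - defect_pct)


def cost_variance_points(variance_pct: float) -> float:
    # Positive variance = over target (5 points per 1%).
    # Negative variance = under target (1 bonus point per 1%).
    if variance_pct > 0:
        raw = 100 - 5 * variance_pct
    else:
        raw = 100 + abs(variance_pct)
    # Capped at 110; the floor at 0 is an assumption.
    return clamp(raw, 0, 110)


def relative_scores(values: List[float], penalty_per_unit: float) -> List[float]:
    # Best (lowest) value in the set scores 100; others lose points per extra unit.
    best = min(values)
    return [clamp(100 - penalty_per_unit * (v - best)) for v in values]


def lead_time_scores(lead_days: List[float]) -> List[float]:
    return relative_scores(lead_days, 4)  # 4 points per extra day


def response_scores(response_hours: List[float]) -> List[float]:
    return relative_scores(response_hours, 3)  # 3 points per extra hour
```

Because lead time and response time are scored relative to the best vendor in the set, adding or removing a vendor can change every other vendor's score on those two dimensions.
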

Vendors

Enter your vendors' actual performance numbers. Cost variance is % vs your target unit cost. Negative = under budget, positive = over.
Vendor | Volume (units / mo) | On-Time (%) | Defect (%) | Cost Var (% vs target) | Lead Time (days) | Response (hours)

Portfolio snapshot

Ranked scorecard

Sorted by weighted score, descending. Tier A = top 25%, Tier B = middle 50%, Tier C = bottom 25%.
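
A minimal sketch of the tier split, assuming the 25% cutoffs round up to a whole number of vendors and that Tier A is assigned before Tier C when the portfolio is very small. The function name and the rounding rule are assumptions, not the tool's documented behavior.

```python
import math
from typing import Dict, List, Tuple


def assign_tiers(scores: Dict[str, float]) -> List[Tuple[str, float, str]]:
    """Rank vendors by weighted score (descending) and label them A, B, or C."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    n = len(ranked)
    # Assumption: a quartile is 25% of the vendor count, rounded up.
    quartile = math.ceil(n * 0.25)
    result = []
    for rank, (vendor, score) in enumerate(ranked):
        if rank < quartile:
            tier = "A"  # top 25%
        elif rank >= n - quartile:
            tier = "C"  # bottom 25%
        else:
            tier = "B"  # everyone in between
        result.append((vendor, score, tier))
    return result


# Hypothetical portfolio of eight vendors, for illustration only.
sample = {f"Vendor {i}": s for i, s in enumerate([91, 88, 84, 80, 77, 73, 65, 52], start=1)}
tiers = assign_tiers(sample)
```

With eight vendors this labels the top two as A and the bottom two as C. Vendors with tied scores keep their input order, because Python's sort is stable.
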
# | Vendor | On-Time | Defect | Cost Var | Lead Time | Response | Score | Tier

Sensitivity analysis (top vendor score)

If you shift the weight on each dimension by ± 10 points, how much does your top vendor's score change? Bars show absolute change in points. Orange = change above 3 points (sensitive). Use this to validate weights before taking the scorecard to vendors.
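
One way to reproduce this check offline is sketched below. It assumes that when one weight moves by 10 points the other weights are rescaled proportionally so the total stays at 100, and that the reported change is the larger of the up and down shifts. The text does not say how the tool rebalances, so treat both as assumptions; all function names are hypothetical.

```python
from typing import Dict, Tuple


def weighted_score(weights: Dict[str, float], scores: Dict[str, float]) -> float:
    # Normalizing by the weight total keeps the result on a 0-100 scale.
    return sum(weights[d] * scores[d] for d in weights) / sum(weights.values())


def shift_weight(weights: Dict[str, float], dimension: str, delta: float) -> Dict[str, float]:
    """Move one weight by delta and rescale the others so the total is unchanged."""
    total = sum(weights.values())
    new_value = min(max(weights[dimension] + delta, 0), total)
    others_total = total - weights[dimension]
    remaining = total - new_value
    shifted = {}
    for d, w in weights.items():
        if d == dimension:
            shifted[d] = new_value
        else:
            # Assumption: the other weights shrink or grow proportionally.
            shifted[d] = w * remaining / others_total if others_total else 0.0
    return shifted


def sensitivity(
    weights: Dict[str, float],
    top_vendor_scores: Dict[str, float],
    delta: float = 10,
    threshold: float = 3,
) -> Dict[str, Tuple[float, bool]]:
    """For each dimension, return (absolute change in points, is_sensitive)."""
    base = weighted_score(weights, top_vendor_scores)
    report = {}
    for d in weights:
        change = max(
            abs(weighted_score(shift_weight(weights, d, step), top_vendor_scores) - base)
            for step in (delta, -delta)
        )
        report[d] = (change, change > threshold)
    return report
```

A dimension flags as sensitive when the top vendor's score on it sits far from that vendor's average on the other dimensions, which is the case where a weight choice deserves a second look.
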

Action recommendations

Generated from tier classification + cost variance + volume. Use as a starting point for vendor conversations, not the final decision.

Use cases where I've seen this work

Patterns from operations I've run before. Adjust the framework to your specific business, but the operating logic transfers.
Case 1: Multifamily kit sourcing with overseas manufacturers
You're sourcing renovation kits directly from manufacturers in China and Vietnam. Eight to fifteen suppliers across cabinets, flooring, lighting, plumbing fixtures, hardware. Lead times are unpredictable, defect rates vary, and communication latency across time zones makes everything slower.

How this maps: use the "Kit sourcing (overseas mfg)" preset. Lead time and defect rate weighted heaviest because a single container delay can blow a unit-turnover schedule. Cost variance also high because your first PO is the negotiation moment.

The conversation it enables: walk into each renegotiation with the vendor's own data. Best performers earn first-call status on the next PO. Worst performers either pivot to a remediation plan or get cut in cycle two.
Case 2: 3PL last-mile carrier portfolio
You're running last-mile delivery with 8 to 15 carrier vendors across regions. Each carrier has a different cost per handover, SLA performance, and defect rate. Vendor performance drifts month over month and nobody has time to chase it down weekly.

How this maps: use the "3PL last-mile" preset. On-time and defect rate heaviest. Cost variance is the lever you use in quarterly business reviews.

What I built: a scorecard cadence with carrier data on the table every month. On-time delivery moved from 78% to 91% in two quarters across the carrier base, without losing a single carrier. The conversation gets tighter because the data does the talking.
Case 3: BPO / offshore support vendor evaluation
You've outsourced part or all of your support operations to an offshore BPO. Quality drift, response time creep, and escalation gaps show up as customer feedback months later. Hard to manage from inside the home office without a scorecard rhythm.

How this maps: use the "BPO offshore" preset. Defect rate and response time weighted heaviest because those are your two biggest leaks. Volume matters less here because the BPO is a single contract; you're scoring sub-team performance within it.

The reframe: instead of one weekly call with the BPO lead, you bring this scorecard to a daily 15-minute standup. Drift gets caught in days, not months.
Case 4: Renegotiation cycle prep ahead of contract renewal
You have a vendor master agreement renewing in 60 to 90 days. Walking into the renegotiation without numbers is the most expensive thing a procurement leader does.

How this maps: any preset works. Run the scorecard with the prior 12 months of data, sort by score, then look at which vendors sit above the cost-variance line (where you are paying more than their performance justifies).

The play: bring the scorecard to each renewal conversation. Anchor on the vendor's own performance, never on the peer comparison, which stays private. Vendors self-correct because nobody wants their own data to tell the bad story.