Telemetry Is Math. Start With The Constraint.
Telemetry is math.
If it does not change a decision, it is trivia.
I learned this the expensive way.
By asking engineering for “a few more events” every week.
Then wondering why nobody trusted the dashboard.
Then realizing the dashboard was not the problem.
My telemetry design was.
So I went back to first principles.
Pick an outcome.
Pick the smallest set of signals that explain it.
Turn those signals into triggers that drive actions.
And yes.
I started with regression.
It failed.
That was useful.
The Real Issue Was Shape
Most product-led growth (PLG) motions are not smooth.
They look smooth in your head.
Because you imagine a buyer slowly warming up over time.
Then you look at behavior.
And it is a cliff.
Nothing changes.
Then the user hits a structural constraint.
Free tier cap.
Rate limit.
Quota.
Seat limit.
Usage ceiling.
Whatever your product calls the wall.
Now you do not have “more usage equals slightly higher probability.”
You have “fine, fine, fine, decision.”
That shape matters.
Because your modeling tool is only as good as your assumptions.
Why Regression Struggled
Regression works great when the world is continuous.
It wants a relationship that behaves like a dimmer switch.
A little more X.
A little more probability.
A capped PLG motion behaves like a turnstile.
You can push on it all day.
Nothing happens.
Then you hit the stop.
Then you decide whether to pay, churn, or hack around it.
In that world, regression can still work.
But it often needs help.
You end up engineering the step into the feature set.
At that point, you are spending time translating the product’s constraint into math.
Instead of letting the math surface the constraint.
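Here is a rough sketch of the two shapes, and of what "engineering the step into the feature set" means.
The cap value and the curve parameters are made up for illustration.
They are not numbers from my data.

```python
import math

FREE_CAP_MINUTES = 100.0  # hypothetical free tier cap


def dimmer(minutes, intercept=-3.0, slope=0.03):
    """Smooth logistic curve: each extra minute nudges the probability up."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * minutes)))


def turnstile(minutes, low=0.02, high=0.40):
    """Step curve: flat until the cap, then a jump."""
    return high if minutes >= FREE_CAP_MINUTES else low


def with_step_feature(minutes):
    """Hand-built feature row: raw minutes plus a 'hit the cap' indicator."""
    hit_cap = 1.0 if minutes >= FREE_CAP_MINUTES else 0.0
    return [minutes, hit_cap]
```

The indicator column is the manual translation.
You have to already know where the wall is to write it.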
So I changed tools.
Why CART Worked
A decision tree, here CART (classification and regression trees), is built for thresholds.
It is not trying to explain the world with a single smooth equation.
It is trying to split the world into buckets that behave differently.
That maps to how PLG actually runs.
Because PLG operations are routing.
If usage crosses X, do Y.
If cap proximity is within Z, prompt upgrade.
If errors spike, intervene.
If activity drops after onboarding, rescue.
That is what a tree gives you.
A split you can point at.
A threshold you can build around.
In practice, the tree surfaced a tipping point in recent usage minutes.
That threshold became the heavy tester window.
And it gave me something that matters more than a pretty model.
A defensible trigger.
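To make the split idea concrete, here is a minimal sketch of the single-feature search a CART-style tree runs at each node.
It scores every candidate threshold by weighted Gini impurity and keeps the best one.
The data is invented for illustration, and a real analysis would use a tree library rather than this hand-rolled stump.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best single split.

    Candidate thresholds are midpoints between adjacent distinct values.
    Returns (None, parent_impurity) if no split improves on the parent.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)
    return best


# Made-up accounts: recent usage minutes and whether each one converted.
minutes = [5, 12, 20, 33, 41, 58, 64, 72, 90, 110]
converted = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

threshold, impurity = best_split(minutes, converted)
```

On this toy data the split lands at 49.5 minutes.
Nobody below it converted, and most accounts above it did.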
Where Regression Comes Back In
Regression is still valuable when you use it as a filter and a validator.
Once the tree showed me where the signal lived, I stopped chasing clever behavioral features.
I focused on volume primitives.
Usage minutes.
Cap proximity.
Error volume.
Maybe one or two operational health signals.
Then multivariable logistic regression held up fine.
The takeaway was blunt.
Minutes carried most of the conversion signal.
That is the whole game for telemetry.
Find the boring variables that explain outcomes.
Boring is cheap to maintain.
Boring stays stable when the UI changes.
Boring survives redesigns, experiments, and onboarding tweaks.
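A sketch of the filter step, assuming a plain logistic regression fit by gradient descent on standardized primitives.
The data, the step count, and the learning rate are all invented.
In practice you would reach for a statistics library and check the result on held-out accounts.

```python
import math


def standardize(column):
    """Rescale a column to mean 0 and standard deviation 1."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0  # guard against a constant column
    return [(v - mean) / std for v in column]


def fit_logistic(columns, labels, steps=2000, lr=0.1):
    """Fit logistic regression by batch gradient descent.

    Returns (weights, bias), one weight per input column.
    """
    n = len(labels)
    weights = [0.0] * len(columns)
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(columns)
        grad_b = 0.0
        for i in range(n):
            z = bias + sum(w * col[i] for w, col in zip(weights, columns))
            err = 1.0 / (1.0 + math.exp(-z)) - labels[i]
            grad_b += err
            for j, col in enumerate(columns):
                grad_w[j] += err * col[i]
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Made-up accounts: two primitives and a conversion flag.
primitives = {
    "minutes": [5, 12, 20, 33, 41, 58, 64, 72, 90, 110],
    "errors": [3, 1, 4, 1, 5, 2, 2, 3, 1, 4],
}
converted = [0, 0, 0, 0, 0, 1, 0, 1, 1, 1]

columns = [standardize(col) for col in primitives.values()]
weights, bias = fit_logistic(columns, converted)
coefficients = dict(zip(primitives, weights))
```

Because the inputs are standardized, the coefficient sizes are roughly comparable.
A primitive whose coefficient sits near zero is a candidate to kill.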
Primitives Beat Event Spam
Most teams instrument nouns.
Clicked pricing
Viewed docs
Hovered tooltip
Opened modal
Those can be useful for product discovery.
They are also fragile.
They break when UX changes.
They create endless debate.
They grow into 200 events that nobody owns.
Primitives travel.
Time
Volume
Rate
Proximity to a constraint
Failures and friction counts
If you are building a system, primitives win.
Because they support thresholds.
Thresholds support plays.
Plays change outcomes.
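A volume primitive can come straight from session records, with no UI event involved.
This sketch assumes a hypothetical log of (start time, minutes used) pairs.
Your own usage records will have a different shape.

```python
from datetime import datetime, timedelta


def rolling_minutes(sessions, now, window):
    """Sum usage minutes for sessions that started within `window` before `now`."""
    cutoff = now - window
    return sum(minutes for started, minutes in sessions if cutoff < started <= now)


# Made-up session log for one account.
now = datetime(2024, 1, 8, 12, 0)
sessions = [
    (datetime(2024, 1, 1, 9, 0), 30.0),
    (datetime(2024, 1, 3, 15, 0), 45.0),
    (datetime(2024, 1, 7, 20, 0), 20.0),
    (datetime(2024, 1, 8, 10, 30), 12.5),
]

minutes_24h = rolling_minutes(sessions, now, timedelta(hours=24))
minutes_7d = rolling_minutes(sessions, now, timedelta(days=7))
```

Nothing in that function knows what the interface looks like.
That is why it survives a redesign.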
Two Classes Of Telemetry
I bucket telemetry into two classes.
Action telemetry: signals that trigger an intervention.
Diagnosis telemetry: signals that explain what happened after the fact.
Both have value.
They should not be funded the same way.
Action telemetry deserves engineering time.
Because it drives routing.
Diagnosis telemetry should start lightweight.
Sampled.
Aggregated.
Added only when there is a specific learning goal.
If you treat every curiosity as a first-class event, you bury the team.
Then you get what you deserve.
A noisy dashboard and a cynical engineering org.
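One way to fund the two classes differently is at emit time.
This is a sketch under assumptions: the signal names and the 5% sample rate are hypothetical.
Sampling by account hash is my choice for the example, not the only option.

```python
import hashlib

# Hypothetical action signals: always emitted because they drive routing.
ACTION_SIGNALS = {"usage_minutes", "cap_proximity", "error_count"}

# Assumed rate: keep diagnosis telemetry for roughly 5% of accounts.
DIAGNOSIS_SAMPLE_RATE = 0.05


def in_sample(account_id, rate):
    """Deterministically place an account in or out of the sample via a hash."""
    digest = hashlib.sha256(account_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # value in [0, 1)
    return bucket < rate


def should_emit(signal_name, account_id):
    """Action signals always ship; everything else is sampled per account."""
    if signal_name in ACTION_SIGNALS:
        return True
    return in_sample(account_id, DIAGNOSIS_SAMPLE_RATE)
```

Hashing the account ID keeps the same accounts in the sample across sessions.
That way a sampled account still has a complete history when you need to diagnose it.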
The Build Sequence
Here is the sequence that avoids wasted instrumentation.
Pick one outcome.
Conversion, activation, retention, or expansion. One at a time.
Start with primitives.
Minutes, cap proximity, errors, latency, retry volume. Anything that reflects friction or value.
Use regression as a filter.
Does this primitive explain the outcome at all? If it does not, kill it.
Use CART to find thresholds.
Where does behavior change? What window separates converters from non-converters?
Turn thresholds into plays.
Product prompts, lifecycle messaging, sales assist, support interventions. Whatever your motion uses.
This keeps telemetry tied to decisions.
It keeps the backlog sane.
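The last step, thresholds into plays, is just routing.
Here is a minimal sketch.
Every threshold, field name, and play name is a placeholder, and the priority order is an assumption you would set for your own motion.

```python
from dataclasses import dataclass


@dataclass
class AccountSignals:
    minutes_7d: float
    cap_used_fraction: float
    errors_24h: int
    days_since_active: int


# Placeholder thresholds, not values from any real analysis.
HEAVY_TESTER_MINUTES = 50.0
NEAR_CAP_FRACTION = 0.8
ERROR_SPIKE_COUNT = 10
INACTIVE_DAYS = 7


def route(signals):
    """Return the first play whose condition matches, most urgent first."""
    if signals.errors_24h >= ERROR_SPIKE_COUNT:
        return "support_intervention"
    if signals.cap_used_fraction >= NEAR_CAP_FRACTION:
        return "upgrade_prompt"
    if signals.minutes_7d >= HEAVY_TESTER_MINUTES:
        return "sales_assist"
    if signals.days_since_active >= INACTIVE_DAYS:
        return "lifecycle_rescue"
    return "no_action"
```

If a signal never appears in a function like this, ask why you are collecting it.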
Cap Proximity Management
If your PLG motion has a cap, cap proximity management is core infrastructure.
You are not tracking minutes for fun.
You are tracking minutes because the wall creates a predictable moment.
The system you want is simple.
Measure how close someone is to the wall
Measure how fast they are approaching it
Measure friction near the wall
Trigger the right intervention before they collide with payment friction
That is where conversion happens.
Or does not.
Most teams wait until the user hits the wall.
Then they call it a “pricing problem.”
It is rarely a pricing problem.
It is a sequencing problem.
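The measurements are small.
This sketch computes remaining capacity, days-to-cap at current pace, and a proximity band.
The 2-day and 7-day cutoffs are assumptions for illustration, as is banding by days rather than by fraction used.

```python
# Assumed cutoffs for the bands; tune these to your own motion.
IMMINENT_DAYS = 2.0
NEAR_DAYS = 7.0


def cap_status(used, cap, daily_pace):
    """Return (remaining, days_to_cap, band) for one account.

    `daily_pace` is recent usage per day, in the same unit as `used` and `cap`.
    """
    remaining = max(cap - used, 0.0)
    if remaining == 0:
        days_to_cap = 0.0
    elif daily_pace > 0:
        days_to_cap = remaining / daily_pace
    else:
        days_to_cap = float("inf")  # not approaching the wall at all

    if days_to_cap <= IMMINENT_DAYS:
        band = "imminent"
    elif days_to_cap <= NEAR_DAYS:
        band = "near"
    else:
        band = "far"
    return remaining, days_to_cap, band
```

An account at 80 of 100 minutes, burning 5 a day, has 4 days left and sits in the near band.
That is the moment to intervene, not day 4.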
Engineering Checklist
This is what I ask engineering for now.
It is boring on purpose.
Core primitives:
Rolling usage minutes by window (24h, 7d)
Remaining free capacity and days-to-cap at current pace
Error count and error rate near cap proximity
Latency and timeout rates if relevant
Core derived signals:
Cap proximity band (far, near, imminent)
Heavy tester band based on the tree threshold
Friction band based on recent failures
If a proposed event cannot be mapped to one of those signals or to a specific decision, it does not ship.
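That rule is simple enough to write down as a gate.
This is a sketch only.
The signal identifiers and the shape of the proposal are hypothetical, and a real review is a conversation, not a function call.

```python
# Hypothetical identifiers for the checklist signals above.
CHECKLIST_SIGNALS = {
    "rolling_usage_minutes",
    "remaining_capacity",
    "days_to_cap",
    "error_rate_near_cap",
    "latency",
    "cap_proximity_band",
    "heavy_tester_band",
    "friction_band",
}


def review_event(proposal):
    """Return (ships, reason) for a proposed event described as a dict."""
    signal = proposal.get("feeds_signal")
    decision = proposal.get("decision")
    if signal in CHECKLIST_SIGNALS:
        return True, f"feeds {signal}"
    if decision:
        return True, f"supports decision: {decision}"
    return False, "no signal or decision attached"
```

A proposal like `{"name": "hovered_tooltip"}` fails the gate until someone names the decision it would change.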
The Point
Telemetry is not a scrapbook.
It is a control system.
Regression helps you identify which primitives matter.
CART helps you turn those primitives into thresholds.
Thresholds turn into plays.
Plays change outcomes.
Everything else can wait.

