Capstone · Six-Step System · Step 4

Assess the evidence.

Inherited risk and data quality. Step 4 pivots from structural analysis to evidence assessment — track record depth, sample size, Sharpe ratio realism, and why live performance alone does not guarantee reliability.

In this article
  • The pivot from structural analysis to evidence assessment and the concept of inherited risk.
  • Equity curve texture and Sharpe ratio realism as diagnostic tools for data quality.
  • The dual-sufficiency requirement: track record length and sample size must both be adequate.
  • The live-versus-backtest reframing — why live status is necessary but not sufficient.

Steps 2 and 3 examine a system's structure. Step 4 marks a pivot: from structural analysis to evidence assessment. The question is no longer whether the system is built soundly, but whether the evidence supporting its performance claims deserves the weight being placed on it.

This step deploys the Performance Validation pillar toolkit and introduces the concept of inherited risk — the risk an investor absorbs by trusting a track record whose evidentiary foundation may not warrant that trust. A system can pass both structural tests and still present evidence that is too thin, too short, or too favorable to support reliable conclusions.

§ 01

The pivot from structure to evidence.

A structurally sound system operating on live markets with genuine risk management can still present inherited risk through its evidence base. The track record may be too short to encompass different market regimes. The sample size may be too small for statistical conclusions. The performance data may come from a backtest rather than live trading.

Each of these evidence quality dimensions introduces a distinct form of inherited risk. The investor who deploys capital based on a track record that does not meet evidentiary standards inherits the risk that the track record does not represent what the system will actually deliver.

Definition
Inherited risk
The risk an investor absorbs by trusting a track record whose evidentiary foundation may not warrant that trust. Structure and evidence are independent dimensions: a system with sound architecture and insufficient evidence has not demonstrated reliability.

§ 02

Diagnostic checks for evidence quality.

A system operating in real market conditions produces an equity curve with texture: meaningful drawdowns, flat periods, varying rates of recovery, and visible differences between strong and weak performance periods. An equity curve that is too consistently upward raises a specific evidentiary question: does this curve reflect real conditions, or parameters optimized against historical data?
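Texture can be made concrete by measuring it. The sketch below is a minimal illustration under stated assumptions. It takes one equity value per period. The function name `texture_stats` and its three metrics are hypothetical choices, not the Institute's published diagnostics:

- maximum drawdown;
- the longest stretch spent below a prior peak;
- the share of periods in which equity fell.

A curve that scores zero on all three has the "too consistently upward" shape described above. The sketch deliberately sets no pass/fail threshold.

```python
from typing import Sequence


def texture_stats(equity: Sequence[float]) -> dict:
    """Summarize the drawdown texture of an equity curve (one value per period).

    Hypothetical diagnostic for illustration; not a published methodology.
    """
    if len(equity) < 2:
        raise ValueError("need at least two equity observations")
    peak = equity[0]
    max_drawdown = 0.0       # largest peak-to-trough decline, as a fraction of the peak
    underwater = 0           # length of the current run below the running peak
    longest_underwater = 0   # longest such run seen so far
    down_periods = 0         # periods where equity fell versus the prior period
    for prev, value in zip(equity, equity[1:]):
        if value < prev:
            down_periods += 1
        if value >= peak:
            peak = value     # new high: the underwater run ends
            underwater = 0
        else:
            underwater += 1
            longest_underwater = max(longest_underwater, underwater)
            max_drawdown = max(max_drawdown, (peak - value) / peak)
    return {
        "max_drawdown": max_drawdown,
        "longest_underwater": longest_underwater,
        "down_fraction": down_periods / (len(equity) - 1),
    }
```

For example, `texture_stats([100, 110, 99, 105, 120])` reports three things:

- a 10% maximum drawdown;
- a two-period stretch underwater;
- one down period out of four.

A strictly rising series such as `[100, 101, 102, 103]` reports zero on every measure.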

Sharpe ratio realism threshold
Sharpe > 3.0
Sustained Sharpe ratios above 3.0 are virtually nonexistent in audited performance data. A system claiming sustained ratios above this level presents an evidence quality signal, not a performance quality signal. The evaluator examines whether the ratio reflects genuine risk-adjusted returns or an artifact of a short sample period, favorable conditions, or optimized parameters.
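A short sample can produce a high Sharpe ratio by chance. One way to see this is to report the ratio together with its sampling uncertainty.

The sketch below is a minimal illustration under several assumptions:

- The 3.0 threshold refers to an annualized ratio.
- Returns are per period, with 252 periods per year.
- Returns are iid normal, so the standard error follows the large-sample approximation popularized by Lo (2002).

The names `sharpe_with_error` and `realism_check` are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import Sequence

# Threshold from the text; treating it as an annualized ratio is an assumption.
REALISM_THRESHOLD = 3.0


def sharpe_with_error(returns: Sequence[float], periods_per_year: int = 252,
                      risk_free_per_period: float = 0.0) -> tuple:
    """Return (annualized Sharpe, approximate standard error).

    Uses the iid-normal approximation SE ~ sqrt((1 + SR_p**2 / 2) / T) for the
    per-period ratio SR_p over T periods, then scales both by sqrt(periods_per_year).
    Serially correlated returns would need a different correction.
    """
    excess = [r - risk_free_per_period for r in returns]
    sr_period = mean(excess) / stdev(excess)
    se_period = math.sqrt((1 + sr_period ** 2 / 2) / len(excess))
    scale = math.sqrt(periods_per_year)
    return sr_period * scale, se_period * scale


def realism_check(returns: Sequence[float], periods_per_year: int = 252) -> dict:
    """Flag a Sharpe above the threshold and report a rough 95% interval around it."""
    sr, se = sharpe_with_error(returns, periods_per_year)
    return {
        "sharpe": sr,
        "ci95": (sr - 1.96 * se, sr + 1.96 * se),
        "above_threshold": sr > REALISM_THRESHOLD,
    }
```

Consider the hypothetical noisy series `[0.01, -0.01, 0.012, -0.008] * 63`, which represents one year of daily returns. Its measured Sharpe is about 1.6, yet the approximate 95% interval runs from below zero to above 3.0. A single year of data cannot reliably separate a mediocre system from an exceptional one.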

§ 03

The dual-sufficiency requirement.

Both track record length and sample size must be independently sufficient. These are separate conditions, and meeting one does not compensate for failing the other.

Requirement A
Track record length
A sufficient track record spans multiple market conditions: trending, ranging, high-volatility events, low-volatility consolidation. A system operating only during favorable conditions has not demonstrated behavior under the conditions that will test its architecture.
Requirement B
Sample size
Statistical conclusions from small samples are unreliable regardless of how favorable the results appear. A system with 40 trades over two years has length but not sample depth. The confidence intervals around its statistics are too wide to support the precision its marketing implies.
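A worked interval makes the 40-trade example concrete. Assume, for illustration only, a hypothetical system whose observed win rate is 50%. The interval below uses the normal approximation to a binomial proportion, which is one common choice; the text does not specify a method.

```latex
\begin{aligned}
\text{95\% interval} &= \hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n} \\
n = 40,\ \hat{p} = 0.5 &: \quad 1.96\sqrt{0.25/40} \approx 0.155 \\
n = 500,\ \hat{p} = 0.5 &: \quad 1.96\sqrt{0.25/500} \approx 0.044
\end{aligned}
```

At 40 trades, the observed 50% win rate is consistent with a true rate anywhere from roughly 35% to 65%. At 500 trades, the interval narrows to roughly 46% to 54%. The 500-trade line is shown only for contrast: under the dual-sufficiency requirement, it still says nothing about regime coverage.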

The evidence base is only as strong as the weaker of these two dimensions. A five-year track record with 30 total trades has temporal length but insufficient statistical foundation. A system with 500 trades over three months has statistical volume but insufficient temporal diversity.
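The "weaker of the two" rule can be expressed as a minimum over two separately scored dimensions. The sketch below assumes hypothetical cut-offs (`MIN_MONTHS = 24`, `MIN_TRADES = 100`), chosen only so that the examples above behave as described. The text publishes no numeric minimums. Calendar months are also only a rough proxy for the regime coverage that Requirement A actually asks for.

```python
from dataclasses import dataclass

# Hypothetical cut-offs for illustration only; the text publishes no numeric minimums.
MIN_MONTHS = 24
MIN_TRADES = 100


@dataclass
class TrackRecord:
    months: float   # calendar length of the record (a proxy for regime coverage)
    trades: int     # number of closed trades in the record


def evidence_strength(record: TrackRecord) -> float:
    """Score each dimension against its own minimum, capped at 1.0, and return the
    weaker score, so a surplus in one dimension cannot offset a shortfall in the other."""
    length_score = min(record.months / MIN_MONTHS, 1.0)
    sample_score = min(record.trades / MIN_TRADES, 1.0)
    return min(length_score, sample_score)


def is_sufficient(record: TrackRecord) -> bool:
    """Sufficient only when both dimensions independently meet their minimums."""
    return evidence_strength(record) >= 1.0
```

Under these assumptions, the two examples above behave as follows:

- `evidence_strength(TrackRecord(months=60, trades=30))` returns 0.3.
- `evidence_strength(TrackRecord(months=3, trades=500))` returns 0.125.

In each case, the surplus on one axis never lifts the overall score.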

§ 04

Reframing the live vs. backtest question.

The common assumption is binary: live performance is reliable, and backtested performance is not. The Institute's framework challenges this simplification. Live performance is preferable, all else being equal. But the live-versus-backtest distinction is not the question that determines evidence quality.

Key finding
Live performance without sufficient track record length and sample size is just as unreliable as a backtest. A system trading live for three months with 47 trades has not generated enough evidence to establish reliability, despite every trade being executed in real market conditions. The diagnostic question is not "is it live?" but "is the evidence base sufficient to support the claims being made?"
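The reframing can be written as a decision order in which sufficiency is checked before live status. This is a hypothetical sketch: `verdict`, the `Evidence` fields, and the cut-offs (`MIN_MONTHS = 24`, `MIN_TRADES = 100`) are illustrative assumptions, not published criteria.

```python
from dataclasses import dataclass

# Hypothetical minimums for illustration; the text does not specify numeric cut-offs.
MIN_MONTHS = 24
MIN_TRADES = 100


@dataclass
class Evidence:
    is_live: bool    # True if the record comes from live trading rather than a backtest
    months: float    # calendar length of the record
    trades: int      # number of trades in the record


def verdict(ev: Evidence) -> str:
    """Check evidence depth first; live status is considered only once depth is met."""
    sufficient = ev.months >= MIN_MONTHS and ev.trades >= MIN_TRADES
    if not sufficient:
        # Live or not, a thin record cannot establish reliability.
        return "insufficient evidence"
    if not ev.is_live:
        return "sufficient depth, backtest only"
    return "sufficient depth, live"
```

Under these assumptions, the live and backtested versions of the same thin record receive the same verdict:

- `verdict(Evidence(is_live=True, months=3, trades=47))` returns `"insufficient evidence"`.
- `verdict(Evidence(is_live=False, months=3, trades=47))` also returns `"insufficient evidence"`.

This matches the key finding: live execution does not rescue a thin record.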
Key takeaway
Step 4 ensures that structural soundness is not confused with evidentiary sufficiency. Structure measures what the system does. Evidence measures how much confidence the data supports. They are weighted independently because they measure different things.

§ 05

Frequently asked questions.

How do you assess evidence quality in algo trading systems?

The Algo Institute's Step 4 examines four dimensions: equity curve texture, Sharpe ratio realism (sustained ratios above 3.0 are virtually nonexistent in audited data), dual sufficiency of track record length and sample size, and whether performance comes from live trading or backtesting. Live performance alone does not guarantee evidence reliability.

Is live algo trading performance always more reliable than backtested?

Not necessarily. Live performance without sufficient track record length and sample size is just as unreliable as a backtest. A system trading live for three months with minimal trades has not generated enough evidence to establish reliability. The question is whether the evidence base is sufficient, not whether it is live.

What Sharpe ratio is realistic for an algorithmic trading system?

A sustained Sharpe ratio above 3.0 is virtually nonexistent in audited performance data. A system claiming sustained ratios above this threshold warrants scrutiny of its evidence base — the ratio may reflect a favorable sample period, overfitted parameters, or backtest optimization rather than genuine risk-adjusted performance.

Cite
The Algo Institute, "Step 4 — Assess the Evidence: Inherited Risk and Data Quality," Six-Step Evaluation System, filed 24 May 2026, Methodology v3.1. thealgoinstitute.com/six-step-system/assess-evidence/