Diversity or Precision? A Deep Dive into Next Token Prediction
ArXiv: 2512.22955
🎯 Pitch
Reframes next-token cross-entropy as a single-step policy-gradient objective and introduces a reward-shaping pre-training objective that explicitly trades off precision (peaked probability on the ground truth) against diversity (higher local/global entropy). By sculpting the token-output distribution before RL, the paper shows that precision-oriented priors (e.g., rewarding positives or suppressing tail negatives) create a more effective exploration space for on-policy RL, yielding generally better end-to-end reasoning performance across dense and MoE models up to 10B-A0.5B.
1. Executive Summary (2-3 sentences)
This work studies how the pre-trained next-token probability distribution controls the “exploration space” available to later reinforcement learning (RL) for reasoning in large language models (LLMs). It reframes standard next-token cross-entropy as a single-step policy-gradient objective and then introduces a reward-shaping pre-training loss that can explicitly trade off precision (peaked probability on the correct token) vs diversity (higher-entropy distributions). Across dense and MoE models up to 10B-A0.5B, the experiments show that a precision-oriented pre-training prior (e.g., β < 0 or penalizing tail negative tokens) yields better downstream RL reasoning performance than a global high-entropy prior.
2. Context and Motivation
- Problem / gap.
  - RL training for reasoning (especially on-policy RL with verifiable rewards) is sensitive to what the model can plausibly generate early in training.
  - That early behavior is determined by the pre-trained model’s token distribution πθ(· | s), so pre-training implicitly sets the “reachable” exploration space for later RL.
- Why it matters.
  - If the initial token distribution makes the model explore poorly (e.g., it collapses too early or wastes mass on implausible tail tokens), RL may converge slowly or to worse solutions, even when the reward signal is strong.
  - A controllable way to shape the token distribution during pre-training could improve end-to-end reasoning, not just perplexity.
- Prior approaches and their shortcomings (as framed here).
  - Standard pre-training uses cross-entropy (teacher forcing) and does not explicitly assign meaningful structure to “negative” tokens beyond the softmax constraint.
  - Common modifications like label smoothing and focal loss change the effective weighting of examples/tokens, but they are not presented as a unified on-policy reward design for exploration shaping.
- Positioning.
  - The core conceptual move is to treat next-token prediction as a stochastic decision process and interpret cross-entropy as a special case of policy gradient in a single-step episode.
  - Then, the work proposes a generalized reward function that can systematically manipulate entropy and negative-token treatment to study downstream RL effects.
3. Technical Approach
3.1 Reader orientation (approachable technical breakdown)
- The system is a pre-training objective for autoregressive LLMs that replaces plain cross-entropy with a policy-gradient-style, reward-shaped next-token training rule.
- It solves the problem of controlling the pre-trained token distribution (entropy, head vs tail mass) so that subsequent on-policy RL has a better exploration starting point.
3.2 Big-picture architecture (diagram in words)
- Input: A context/prefix s_t = X_{<t} and a ground-truth next token x_t from the dataset.
- Model policy: The LLM defines a token distribution πθ(· | s_t) over vocabulary V.
- Action sampling (on-policy): Sample a token a_t ~ πθ(· | s_t).
- Reward shaping module: Compute a token-level reward \bar r(s_t, a_t) using:
  - a positive reward term (only if a_t = x_t) modulated by the exponent β, and
  - a rank-aware negative reward term (if a_t ≠ x_t) using Top-k sets and (\tilde λ, \hat λ).
- Policy-gradient update: Apply E[ \bar r(s_t, a_t) ∇θ log πθ(a_t | s_t) ] (with stop-gradient through reward construction).
3.3 Roadmap for the deep dive
- Explain next-token prediction as an RL problem (states/actions/rewards).
- Derive cross-entropy as a single-step policy-gradient objective with an intrinsic reward.
- Introduce the generalized reward with:
  - global entropy control via β (positive token shaping),
  - local entropy/head–tail control via Top-k and (\tilde λ, \hat λ) (negative token shaping).
- Connect reward shaping to distributional effects (entropy, head/tail mass) and why that matters for downstream RL.
- Detail the training pipeline (pre-training → mid-training → RLVR) and the exact hyperparameters used in experiments.
3.4 Detailed, sentence-based technical breakdown
- Framing. This is primarily an algorithmic/objective-function paper with an empirical study: the core idea is that next-token pre-training can be written as single-step policy optimization, which enables explicit reward shaping to sculpt the initial policy for later RL.
- Step 1: Next-token prediction as a decision process (Eq. (1)–(5)).
  - A token sequence is X = {x_1, …, x_n}.
  - At time t, the state is the prefix s_t = X_{<t} and the action is the next token a_t ∈ V, sampled from the model policy πθ(· | s_t).
  - A general RL objective maximizes expected cumulative reward:
    - J(θ) = E_{τ~πθ} [ Σ_{t=1..n} r(s_t, a_t) ] (Eq. (1)),
    - with policy gradient ∇θ J(θ) = E[ Σ_t (G_t - b(s_t)) ∇θ log πθ(a_t|s_t) ] (Eq. (3)), where G_t is the return from step t onward and b(s_t) is a baseline.
  - To connect to next-token prediction, the work treats each token emission as a complete episode, i.e., for a fixed state s_t:
    - J_t(θ | s_t) = E_{a_t~πθ(·|s_t)} [ r(s_t, a_t) ] (Eq. (4)),
    - ∇θ J_t(θ | s_t) = E[ r(s_t, a_t) ∇θ log πθ(a_t | s_t) ] (Eq. (5)).
  - A key constraint highlighted is that, for this to be consistent with the single-step formulation, the reward r(s_t, a_t) must depend only on the immediate (s_t, a_t) pair.
- Step 2: Cross-entropy as a special case (Eq. (6)–(10)).
  - Standard supervised pre-training maximizes the log-likelihood of the ground-truth next token:
    - J_CE(θ) = log πθ(x_t | s_t) (Eq. (6)),
    - ∇θ J_CE(θ) = ∇θ log πθ(x_t | s_t) (Eq. (7)).
  - The work rewrites this gradient as an expectation over the model’s own distribution πθ(·|s_t) by inserting an indicator 1(a_t = x_t) and converting sums over the vocabulary into expectations (Eq. (8)–(9)).
  - This yields an “intrinsic reward” view of cross-entropy:
    - r_CE(s_t, a_t) = sg( 1(a_t = x_t) / πθ(a_t | s_t) ) (Eq. (10)),
    - where sg(·) is a stop-gradient operator (the reward is treated as a constant w.r.t. θ when taking gradients).
  - Mechanistically:
    - If the sampled token matches the ground truth, the reward is 1 / πθ(x_t|s_t), so a low-probability correct token receives a larger reward and a larger per-sample update. In expectation, the 1/π factor cancels the probability of sampling that token, which is why the result is exactly the cross-entropy gradient.
    - If the token is incorrect, the reward is exactly 0.
    - Negative tokens are suppressed implicitly via the softmax constraint Σ_{a∈V} πθ(a|s_t) = 1: increasing the correct token’s probability forces others down.
- Micro-example (to make Eq. (10) concrete).
  - Suppose the vocabulary is {A, B, C} and the ground-truth token is B.
  - If the model currently assigns πθ(B|s) = 0.05, then when a=B is sampled, r_CE = 1/0.05 = 20, making that sample’s gradient step large and pushing probability mass strongly toward B.
  - If instead πθ(B|s) = 0.8, then r_CE = 1/0.8 = 1.25, yielding a smaller update (the model is already confident).
  - If the sample is A or C, the reward is 0, and the only “pressure” on those tokens comes indirectly from raising πθ(B|s).
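The identity behind Eq. (10) can be checked numerically: summing over the vocabulary, Σ_a πθ(a) · [1(a = x)/πθ(a)] · ∇ log πθ(a) collapses to ∇ log πθ(x). The sketch below is an illustration, not the paper's code. It assumes a softmax policy over the three-token vocabulary of the micro-example, uses the standard fact that the gradient of log π(a) with respect to the logits is onehot(a) − π, and all function names are hypothetical.

```python
import math

def softmax(logits):
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_log_prob(probs, a):
    """Gradient of log pi(a) w.r.t. the logits of a softmax policy: onehot(a) - pi."""
    return [(1.0 if i == a else 0.0) - p for i, p in enumerate(probs)]

def r_ce(probs, a, target):
    """Intrinsic cross-entropy reward of Eq. (10); a constant (stop-gradient)."""
    return 1.0 / probs[a] if a == target else 0.0

def expected_policy_gradient(probs, target):
    """Exact expectation over a ~ pi of r_CE(a) * grad log pi(a)."""
    grad = [0.0] * len(probs)
    for a, p_a in enumerate(probs):
        weight = p_a * r_ce(probs, a, target)
        for i, g in enumerate(grad_log_prob(probs, a)):
            grad[i] += weight * g
    return grad

# Vocabulary {A, B, C}; the ground truth is B (index 1) with pi(B) = 0.05.
target = 1
probs = softmax([math.log(0.60), math.log(0.05), math.log(0.35)])

pg = expected_policy_gradient(probs, target)
ce = grad_log_prob(probs, target)  # gradient of the log-likelihood log pi(B)
max_gap = max(abs(x - y) for x, y in zip(pg, ce))

print("reward when B is sampled:", r_ce(probs, target, target))
print("max |policy gradient - CE gradient|:", max_gap)
```

The per-sample reward is large when B is unlikely, but B is also sampled rarely, so the two effects cancel in expectation and the gap to the cross-entropy gradient is at floating-point level.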
- Step 3: Generalized reward shaping to control diversity vs precision (Eq. (11)–(13)).
  - The work introduces a generalized single-step reward \bar r(s_t, a_t) that separately designs:
    1) positive-token reward strength (global entropy control), and
    2) negative-token treatment (local entropy/head–tail control).
  - (A) Positive reward shaping with β (Eq. (11)).
    - For the correct token case (a_t = x_t), define:
      - \bar r_pos(s_t, a_t) = sg( (1 / πθ(a_t|s_t)) * (1 - πθ(a_t|s_t))^β ) (Eq. (11)).
    - Interpretation:
      - The factor (1 - π)^β modulates the reward depending on the model’s current confidence π = πθ(x_t|s_t).
      - If β < 0, then (1 - π)^β > 1 for every π in (0, 1) and grows as π approaches 1, so the reward is amplified even for tokens the model already predicts confidently. This pushes the model to concentrate mass more aggressively on the ground truth (lower global entropy / higher precision).
      - If β > 0, then (1 - π)^β < 1 and shrinks toward 0 as π approaches 1, so the reward is attenuated and the model is less forced to concentrate on the ground truth (higher global entropy / more diversity).
      - The baseline cross-entropy behavior is recovered at β = 0 (since (1-π)^0 = 1).
  - (B) Rank-aware negative shaping with Top-k and (\tilde λ, \hat λ) (Eq. (12)).
    - Define K_t = TopK(πθ(· | s_t), k), the set of the top-k most probable tokens under the current policy.
    - For incorrect tokens (a_t ≠ x_t), assign:
      - \bar r_neg(s_t, a_t) = \tilde λ · 1(a_t ∈ K_t ∧ a_t ≠ x_t) + \hat λ · 1(a_t ∉ K_t ∧ a_t ≠ x_t) (Eq. (12)).
    - Interpretation:
      - \tilde λ controls how the model treats high-ranking incorrect alternatives (the “head” competitors).
        - If \tilde λ is positive, it rewards sampling plausible-but-wrong head tokens, which can preserve multiple plausible continuations.
      - \hat λ controls how the model treats low-probability tail tokens.
        - If \hat λ is negative, it penalizes the tail, concentrating mass into the head and away from implausible tokens.
  - (C) Combined reward (Eq. (13)).
    - The final reward is:
      - \bar r(s_t, a_t) = \bar r_pos(s_t, a_t) · 1(a_t = x_t) + \bar r_neg(s_t, a_t) · 1(a_t ≠ x_t) (Eq. (13)).
    - Setting β = 0, \tilde λ = 0, \hat λ = 0 recovers standard cross-entropy as a special case.
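To make Eq. (11)–(13) concrete, here is a minimal sketch of the shaped reward and of the exact expected update it induces on the logits of a softmax policy. It is an illustration under stated assumptions, not the paper's implementation: the function names, the four-token toy distribution, and k = 2 are hypothetical, the gradient is taken with respect to logits only, and the target probability is assumed to be strictly below 1 (otherwise a negative β divides by zero).

```python
def top_k_set(probs, k):
    """Indices of the k most probable tokens under the current policy (K_t)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return set(order[:k])

def shaped_reward(probs, a, target, beta=0.0, lam_head=0.0, lam_tail=0.0, k=100):
    """Reward of Eq. (13) for a sampled token a; a constant w.r.t. the parameters."""
    if a == target:
        p = probs[a]
        return (1.0 / p) * (1.0 - p) ** beta      # Eq. (11), requires p < 1
    in_head = a in top_k_set(probs, k)
    return lam_head if in_head else lam_tail      # Eq. (12)

def expected_logit_grad(probs, target, **reward_kwargs):
    """Exact E_{a~pi}[ r(a) * grad log pi(a) ] w.r.t. softmax logits."""
    grad = [0.0] * len(probs)
    for a, p_a in enumerate(probs):
        weight = p_a * shaped_reward(probs, a, target, **reward_kwargs)
        for i, p_i in enumerate(probs):
            grad[i] += weight * ((1.0 if i == a else 0.0) - p_i)
    return grad

# Toy distribution; the ground truth is index 1. With k = 2 the head is {0, 1}.
probs = [0.5, 0.3, 0.15, 0.05]
ce = expected_logit_grad(probs, 1, k=2)                      # beta = 0, lambdas = 0
precise = expected_logit_grad(probs, 1, beta=-0.25, k=2)     # stronger push on target
diverse = expected_logit_grad(probs, 1, beta=0.5, k=2)       # weaker push on target
tail_pen = expected_logit_grad(probs, 1, lam_tail=-0.1, k=2)  # extra push-down on tail
head_rew = expected_logit_grad(probs, 1, lam_head=0.1, k=2)   # softer push-down on head

for name, g in [("ce", ce), ("precise", precise), ("diverse", diverse),
                ("tail_pen", tail_pen), ("head_rew", head_rew)]:
    print(name, [round(x, 4) for x in g])
```

With all shaping terms at zero, the expected update equals the cross-entropy gradient onehot(x) − π. The other settings only rescale the push on the target (β) or add a signed push on head or tail negatives (\tilde λ, \hat λ).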
- Step 4: Training pipeline and configurations (Section 3.1 + Appendices A/B).
  - The end-to-end pipeline has three stages:
    1) Pre-training: 500B tokens (general knowledge focused).
    2) Mid-training: 100B tokens, includes ~5% synthetic data and more reasoning-oriented content; synthetic long-reasoning data is deliberately excluded to observe “activation trends” of long-CoT capability.
    3) RLVR: on mathematical reasoning tasks.
  - Architectures (Table 1). Models include:
    - Dense: 1B (28 layers, d_model=1536, d_ffn=4608, n_head=16, n_kvhead=4), and 4B (36 layers, d_model=2560, d_ffn=9728, n_head=32, n_kvhead=8).
    - MoE: 5B-A0.3B (12 layers, d_model=1024, d_expert=320, n_head=32, n_kvhead=4, total experts E=384, active experts E_a=12) and 10B-A0.5B (16 layers, d_model=1536, d_expert=320, n_head=32, n_kvhead=4, E=384, E_a=12).
    - MoE training uses an “auxiliary loss free approach” (Appendix A.2 references Liu et al., 2024), but the internal details are not expanded in the provided excerpt.
  - Pre-training + mid-training optimizer/schedule (Appendix A.1).
    - Optimizer: AdamW, weight decay 0.1.
    - Gradient clipping: 1.0.
    - Learning-rate schedule: “warmup-stable-decay.”
    - Pre-training: warmup 2000 steps, then stable LR 3 × 10^-4 over 500B tokens.
    - Mid-training: LR decays from 3 × 10^-4 to 3 × 10^-5 over 100B tokens.
    - Global batch size: 16M (as written; the unit is not stated in the excerpt).
    - Sequence length: max 4096 in pre-training; extended to 16384 in mid-training.
    - Long-context adjustment: RoPE base frequency increased from 1e4 to 1e6 in mid-training.
  - Reward-shaping hyperparameters explored (Section 3.1).
    - Positive shaping: β = -0.25 (precision/low entropy) and β = 0.5 (higher entropy) compared to baseline β = 0.
    - Negative shaping:
      - Tail penalty: \hat λ = -0.1, \tilde λ = 0, k=100
      - Head reward: \hat λ = 0, \tilde λ = 0.1, k=100
  - RLVR setup (Appendix B.1).
    - Algorithm: on-policy GRPO (named; internal derivation not given here).
    - No KL regularization.
    - Stabilization: “clip-higher” and “dynamic sampling” strategies (named, not fully specified).
    - Two-stage RL sequence length: first 700 steps at 8K, then continue at 16K.
    - RL batch size: 128.
    - RL learning rate: constant 1 × 10^-6.
    - During RL training: sample 16 outputs per prompt, temperature 1.0.
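The warmup-stable-decay schedule described above can be sketched as a single function of the optimizer step. Several details are assumptions made only for illustration: the excerpt does not give the warmup or decay shape (linear is assumed for both), and the step counts are derived by reading the "16M" global batch size as 16,000,000 tokens per step. The names `wsd_lr`, `TOKENS_PER_STEP`, `PRETRAIN_STEPS`, and `MIDTRAIN_STEPS` are hypothetical.

```python
# Assumption: "16M" global batch size is read as 16,000,000 tokens per step.
TOKENS_PER_STEP = 16_000_000
PRETRAIN_STEPS = 500_000_000_000 // TOKENS_PER_STEP   # 500B tokens of pre-training
MIDTRAIN_STEPS = 100_000_000_000 // TOKENS_PER_STEP   # 100B tokens of mid-training

def wsd_lr(step, warmup=2000, pretrain_steps=PRETRAIN_STEPS,
           midtrain_steps=MIDTRAIN_STEPS, peak=3e-4, final=3e-5):
    """Learning rate at a given step: warmup, then stable, then decay."""
    if step < warmup:
        # Linear warmup to the peak rate (shape assumed, not stated in the source).
        return peak * (step + 1) / warmup
    if step < pretrain_steps:
        # Stable phase for the remainder of pre-training.
        return peak
    # Mid-training: decay from peak to final (linear shape assumed).
    frac = min(1.0, (step - pretrain_steps) / midtrain_steps)
    return peak + frac * (final - peak)

for s in (0, 1999, 20_000, PRETRAIN_STEPS, PRETRAIN_STEPS + MIDTRAIN_STEPS):
    print(s, wsd_lr(s))
```

Under these assumptions pre-training spans 31,250 steps and mid-training 6,250 steps; a different reading of the batch-size unit would change both counts but not the shape of the schedule.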
4. Key Insights and Innovations
- (1) Cross-entropy as single-step policy gradient with an explicit intrinsic reward (Eq. (10)).
  - Novelty here is not “policy gradient exists,” but the specific mapping: the cross-entropy gradient equals E_{a~πθ}[ r_CE(s,a) ∇ log πθ(a|s) ] with r_CE(s,a) = sg(1(a=x)/πθ(a|s)).
  - Significance: this makes next-token pre-training directly compatible with RL-style reward design, enabling controlled experiments on exploration shaping.
- (2) A unified reward-shaped pre-training objective that subsumes cross-entropy (Eq. (11)–(13)).
  - The method provides two “knobs”:
    - β for global entropy / precision via positive reward scaling,
    - (\tilde λ, \hat λ, k) for local entropy / head–tail redistribution via rank-aware negative rewards.
  - Significance: it separates effects that cross-entropy conflates (rewarding correct token vs structuring negatives).
- (3) Rank-aware asymmetry between high-probability and tail negatives (Eq. (12)).
  - Many losses treat all negatives similarly (implicitly or explicitly).
  - Here, negatives are split into:
    - plausible competitors (Top-k), and
    - tail tokens (outside Top-k), with different reward signs/magnitudes.
  - Significance: this targets where diversity is preserved (head) vs suppressed (tail), which the experiments link to RL stability and performance.
- (4) Empirical claim: precision-oriented priors can improve RL exploration more than high entropy.
  - The central empirical takeaway contradicts the simple heuristic “higher entropy ⇒ better exploration.”
  - Significance: it suggests exploration quality depends on structured probability mass (credible head vs noisy tail), not just entropy magnitude.
5. Experimental Analysis
Evaluation methodology (Section 3.2)
- Base-model evaluation (pre-trained and mid-trained checkpoints).
  - Capabilities grouped into:
    - Knowledge-based: general knowledge + commonsense reasoning.
    - Reasoning-based: logical reasoning + mathematics + coding.
  - Benchmarks (19 total) include:
    - General knowledge: MMLU (4-shot, CoT), MMLU-Pro (5-shot, CoT), TriviaQA (5-shot), NaturalQuestions (5-shot).
    - Commonsense: Hellaswag, SIQA, PIQA, WinoGrande, OpenBookQA, CommonsenseQA.
    - Logic: ARC-Easy, ARC-Challenge, BBH (3-shot, CoT).
    - Math: GSM8K, MATH-500, Minerva, OlympiadBench.
    - Code: HumanEval+, MBPP+.
  - For tasks needing multiple samples (math/code), they use Pass@k with the unbiased estimator (Eq. (14)), sampling m = 128 responses at temperature 0.7 and top-p 0.95, and reporting Pass@64.
  - Max output length: 4K for pre-trained models, 16K for mid-trained models.
- RL model evaluation.
  - Datasets: AMC23, AIME (labeled as AIME24 and AIME25 in RL tables), MATH-500, Minerva, OlympiadBench.
  - Sampling: 128 responses per problem.
  - Metrics:
    - Avg@128: average accuracy over 128 samples.
    - Cons@128: majority-vote accuracy over 128 samples.
    - Pass@64: pass rate at k = 64, estimated from the same pool of 128 samples.
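The three RL metrics can be sketched for a single problem as follows. This assumes Eq. (14) is the standard combinatorial unbiased estimator, pass@k = 1 − C(n−c, k)/C(n, k) for n samples of which c are correct; the excerpt does not reproduce the equation, so treat that as an assumption. The function names and the toy answers are hypothetical.

```python
from collections import Counter
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k for n samples with c correct: 1 - C(n-c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def avg_at_n(correct_flags):
    """Avg@n: mean accuracy over the sampled responses."""
    return sum(correct_flags) / len(correct_flags)

def cons_at_n(answers, reference):
    """Cons@n for one problem: 1.0 if the majority-vote answer is correct.

    Ties are broken by first occurrence, which is how Counter orders equal counts.
    """
    majority, _ = Counter(answers).most_common(1)[0]
    return 1.0 if majority == reference else 0.0

# Hypothetical problem: 128 sampled answers, 3 of them correct ("42").
answers = ["42"] * 3 + ["41"] * 100 + ["40"] * 25
flags = [a == "42" for a in answers]

print("Avg@128:", avg_at_n(flags))
print("Cons@128:", cons_at_n(answers, "42"))
print("Pass@64:", pass_at_k(len(answers), sum(flags), 64))
```

The example shows why the metrics can diverge: a problem solved by only 3 of 128 samples contributes little to Avg@128 and nothing to Cons@128, yet has a high Pass@64.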
Main quantitative results
(A) Pre-training: perplexity converges similarly, entropy shifts (Figures 1–2)
- Figures 1–2 show:
  - PPL curves converge to similar low values across configurations for both dense and MoE models.
  - Entropy is strongly affected by β: β < 0 reduces entropy (more peaked distribution), β > 0 maintains higher entropy (flatter distribution).
- This supports the claim that reward shaping can change distributional properties without obviously harming next-token predictive convergence (as measured by PPL).
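One way to see why perplexity and entropy can move independently: the per-token loss behind PPL depends only on the probability of the ground-truth token, while entropy depends on how the remaining mass is spread. The toy sketch below is an illustration only, not the paper's analysis; the two three-token distributions and all function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def token_nll(probs, target):
    """Per-token negative log-likelihood; perplexity is exp of its average."""
    return -math.log(probs[target])

def top_k_mass(probs, k):
    """Total probability held by the k most probable tokens (head mass)."""
    return sum(sorted(probs, reverse=True)[:k])

target = 0
spread_negatives = [0.60, 0.20, 0.20]        # leftover mass shared evenly
concentrated_negatives = [0.60, 0.39, 0.01]  # leftover mass on one competitor

for name, dist in [("spread", spread_negatives),
                   ("concentrated", concentrated_negatives)]:
    print(name,
          "nll=%.4f" % token_nll(dist, target),
          "entropy=%.4f" % entropy(dist),
          "top2_mass=%.2f" % top_k_mass(dist, 2))
```

Both distributions give the same loss on the ground-truth token, yet the second has lower entropy and almost no tail mass. This is the kind of difference that the negative-shaping terms target.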
(B) Pre-training: performance at 500B tokens favors precision as scale grows (Tables 2–5)
- The pattern is most visible in larger models (4B, 5B MoE, 10B MoE):
  - 4B dense, overall average at 500B tokens (Table 3):
    - β=-0.25: 43.11
    - β=0: 42.62
    - β=0.50: 42.44
  - 10B-A0.5B MoE, overall average at 500B tokens (Table 5):
    - β=-0.25: 44.89
    - β=0: 44.11
    - β=0.50: 44.52
- Interpretation consistent with the paper’s narrative: precision-oriented shaping (β < 0) tends to help scaling/performance at larger capacity, even if not uniformly best at small scale (e.g., 1B has mixed deltas in Table 2). Note that for 10B, β=0.50 also edges out the baseline, so the ordering is not monotone in β.
(C) Mid-training: β=-0.25 is consistently strong; negative shaping effects are smaller (Tables 10–13)
- 4B dense at 100B mid-training tokens (Table 10):
  - Average: β=-0.25 → 52.76, β=0 → 52.75, β=0.50 → 52.23.
  - The main separation is between β=0.50 and the other two.
- 10B-A0.5B MoE at 100B mid-training tokens (Table 11):
  - Average: β=-0.25 → 51.13, β=0 → 50.85, β=0.50 → 50.60.
- Negative shaping in mid-training:
  - For 4B (Table 12), \hat λ=-0.1 yields the best overall average at 100B (53.00).
  - For 10B (Table 13), \tilde λ=0.1 is slightly higher (51.06) than \hat λ=-0.1 (50.92) and baseline (50.85) at 100B.
  - This suggests mid-training results do not cleanly pick a single negative-shaping winner, but they do not contradict the main downstream RL findings.
(D) RLVR: precision-oriented pre-training improves downstream reasoning (Tables 14–23)
The RL stage is where the clearest separations appear, especially for the 10B-A0.5B MoE model.
- Global entropy control via β (baseline vs β=-0.25 vs β=0.50).
  - 10B-A0.5B MoE, RL Average Pass@64 at step 1000:
    - Baseline β=0 (Table 15): 47.52
    - Precision-oriented β=-0.25 (Table 18): 50.75
    - Higher-entropy β=0.50 (Table 19): 49.59
  - This directly supports the main claim: β=-0.25 provides a better RL initialization than both baseline and high-entropy in this setting.
  - For 4B dense, differences are smaller, but β=-0.25 beats β=0.50 at step 1000 on average Pass@64:
    - β=-0.25 (Table 16): 50.43
    - β=0.50 (Table 17): 50.09
    - Baseline (Table 14): 50.99 (here the baseline is slightly higher than β=-0.25 on the final average, so the “always better than CE” claim does not strictly hold for the 4B final average, although the paper emphasizes trajectory and robustness across metrics).
- Local head–tail shaping via (\hat λ, \tilde λ).
  - Tail penalty is the stronger of the two negative-shaping settings in RL on both models, and the gap over the baseline is largest for 10B.
  - 10B-A0.5B MoE, RL Average Pass@64 at step 1000:
    - Baseline (Table 15): 47.52
    - Tail penalty \hat λ=-0.1 (Table 22): 49.87
    - Head reward \tilde λ=0.1 (Table 23): 49.42
  - 4B dense, RL Average Pass@64 at step 1000:
    - Baseline (Table 14): 50.99
    - Tail penalty \hat λ=-0.1 (Table 20): 51.18
    - Head reward \tilde λ=0.1 (Table 21): 50.26
- Concrete dataset-level improvements (illustrative examples).
  - For 10B, MATH-500 Pass@64 at RL step 1000 increases from 85.75 (baseline, Table 15) to 86.89 (β=-0.25, Table 18), while Minerva Pass@64 is 44.90 (baseline) vs 44.46 (β=-0.25). Gains are therefore not uniform per dataset, but the overall average improves substantially due to broader gains (notably on AMC23 and OlympiadBench in the tables).
  - For 4B, the average improvements from negative shaping are modest but positive for tail penalty (51.18 vs 50.99).
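For quick comparison, the step-1000 average Pass@64 values quoted in this section can be turned into deltas against the cross-entropy baseline. The numbers below are copied from the tables cited above; the dictionary layout and function name are hypothetical conveniences.

```python
# Average Pass@64 at RL step 1000, as quoted from Tables 14-23.
PASS_AT_64 = {
    "10B-A0.5B MoE": {
        "baseline": 47.52, "beta=-0.25": 50.75, "beta=0.50": 49.59,
        "tail penalty": 49.87, "head reward": 49.42,
    },
    "4B dense": {
        "baseline": 50.99, "beta=-0.25": 50.43, "beta=0.50": 50.09,
        "tail penalty": 51.18, "head reward": 50.26,
    },
}

def deltas_vs_baseline(scores):
    """Difference of each configuration from the baseline, in points."""
    base = scores["baseline"]
    return {name: round(value - base, 2)
            for name, value in scores.items() if name != "baseline"}

for model, scores in PASS_AT_64.items():
    print(model, deltas_vs_baseline(scores))

# Configurations that beat the baseline on every listed model.
robust = [name for name in PASS_AT_64["4B dense"] if name != "baseline"
          and all(s[name] > s["baseline"] for s in PASS_AT_64.values())]
print("beats baseline on both models:", robust)
```

On these figures every shaped configuration improves on the baseline for the 10B MoE model, while only the tail penalty does so for 4B dense.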
Do the experiments support the claims?
- Supported well:
  - Reward shaping changes entropy without derailing perplexity convergence (Figures 1–2).
  - Precision-oriented configurations improve RL outcomes for the 10B MoE model with clear margins (Tables 15 vs 18; 15 vs 22).
  - Tail-token suppression (\hat λ < 0) is particularly beneficial in RL across both 4B and 10B (Tables 20, 22).
- More conditional / nuanced:
  - For 4B dense, the baseline CE configuration is competitive and slightly better in the final averaged Pass@64 than β=-0.25 (Tables 14 vs 16), though β=-0.25 still beats the high-entropy β=0.50 and the paper emphasizes trajectory/robustness across metrics.
  - Negative shaping in mid-training is not uniformly dominated by one setting (Table 13 slightly favors \tilde λ=0.1 at 100B on the 10B model).
- Ablations / robustness checks present in the provided content:
  - Systematic sweeps over β and over negative-shaping settings (\hat λ=-0.1 vs \tilde λ=0.1) across multiple model sizes and architectures.
  - RL metrics include Avg@128, Cons@128, and Pass@64, which helps distinguish “average sample quality” from “majority vote” stability and from “best-of-many” capability.
6. Limitations and Trade-offs
- Single-step episode assumption is specific.
  - The derivation treats each token emission as a full episode, requiring the reward to be a function only of (s_t, a_t).
  - This is appropriate for next-token prediction, but it does not directly model multi-step credit assignment within a reasoning trace at the pre-training stage.
- Top-k negative shaping introduces discrete, rank-based behavior.
  - The negative reward depends on membership in TopK(πθ(·|s_t), k), which changes discontinuously as ranks change.
  - The paper uses k=100 in experiments; sensitivity to other values of k is not shown in the provided content.
- Hyperparameter coverage is limited.
  - Only a few settings are explored (β ∈ { -0.25, 0, 0.5 }, \hat λ ∈ { -0.1, 0 }, \tilde λ ∈ { 0, 0.1 }).
  - It remains unclear from the excerpt how robust the conclusions are to intermediate values (e.g., β=-0.1 vs β=-0.5) or to nonzero \hat λ and \tilde λ applied simultaneously.
- RL scope is math-focused and uses a specific RL setup.
  - RLVR is run “prioritizing mathematical reasoning tasks,” and uses on-policy GRPO without KL regularization (Appendix B.1).
  - Conclusions about exploration may change under other RL algorithms, reward models, or KL-constrained objectives; this is not tested here (based on the provided text).
- Entropy is not the only confound; response length dynamics appear important.
  - Figure 7 (as described) links entropy collapse with a drastic response-length decrease in the β=0.5 setting, suggesting that the failure mode may be about prematurely shortening reasoning rather than “diversity” per se.
  - The exact causal mechanism (why higher-entropy pre-training leads to faster entropy collapse during RL) is not derived; the effect is only observed empirically.
- Compute/throughput details are incomplete in the excerpt.
  - The excerpt provides token counts, batch sizes, learning rates, and sequence lengths, but not hardware, wall-clock time, or compute budget (e.g., PF-days), so reproducibility at the systems level is only partially specified.
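The rank-based discontinuity noted above can be illustrated with a toy case in which a tiny probability shift moves a token across the Top-k boundary and flips its reward. This is a hypothetical sketch: k = 3, the four-token distributions, and the use of nonzero head and tail coefficients at the same time (a combination the experiments do not cover) are all chosen only to make the jump visible.

```python
def negative_reward(probs, a, k, lam_head, lam_tail):
    """Eq. (12)-style reward for an incorrect token a.

    The caller is responsible for ensuring a is not the ground-truth token.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return lam_head if a in ranked[:k] else lam_tail

# Token 2 is ranked third before the shift and fourth after it.
before = [0.50, 0.20, 0.151, 0.149]
after = [0.50, 0.20, 0.149, 0.151]

r_before = negative_reward(before, 2, k=3, lam_head=0.1, lam_tail=-0.1)
r_after = negative_reward(after, 2, k=3, lam_head=0.1, lam_tail=-0.1)
print("probability change:", round(after[2] - before[2], 3))
print("reward before / after:", r_before, r_after)
```

A probability change of 0.002 flips the reward for that token from the head value to the tail value, which is the kind of discontinuity a sweep over k would help characterize.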
7. Implications and Future Directions
- How this changes the landscape (within the paper’s scope).
  - It reframes pre-training not just as “minimize perplexity,” but as “initialize a policy” whose distributional shape can materially affect later on-policy RL.
  - It argues that “more entropy” is not a reliable proxy for “better exploration”; precision-oriented priors (especially suppressing tail tokens) can yield better RL learning dynamics and final reasoning performance.
- Follow-up research directions suggested by the presented framework.
  - Richer reward design for negatives: Explore combined settings where both \tilde λ and \hat λ are nonzero to jointly preserve plausible alternatives while suppressing tail noise.
  - Sensitivity studies: Sweep k in TopK(·,k) and expand the grid for β, \tilde λ, \hat λ to map phase transitions (e.g., when entropy collapse occurs).
  - Task generalization: Apply the same pre-training shaping to RL on domains beyond math (e.g., code, tool use), to test whether the precision-oriented prior generalizes.
  - Mechanistic diagnostics: Since Figure 7 links entropy and response length, future work could explicitly incorporate length-aware constraints/rewards or analyze which token types (“forking tokens”) are most impacted by the shaping.
- Practical applications / downstream use cases.
  - If you plan to do on-policy RL for verifiable reasoning (unit tests, math proof checking, etc.), this work suggests that shaping the pre-training distribution toward stronger correct-token concentration (β < 0) and/or tail-token suppression (\hat λ < 0) can improve RL outcomes even when base perplexity looks similar.
- Repro/Integration Guidance (based on provided details).
  - A minimal integration path is to replace cross-entropy with the single-step policy-gradient objective using \bar r(s_t,a_t):
    - Use the same optimizer as the baseline (AdamW, weight decay 0.1, grad clip 1.0) and keep the same LR schedule, so the only change is reward shaping (as done in the experiments).
  - If your main goal is downstream RL performance, the most consistent configuration in the provided RL tables is the tail-penalty negative shaping (β=0, \hat λ=-0.1, \tilde λ=0, k=100), the only reported setting that beats the baseline on both the 4B dense and the 10B MoE model:
    - 10B-A0.5B MoE, RL Average Pass@64 at step 1000: baseline 47.52 (Table 15) vs 49.87 with \hat λ=-0.1 (Table 22).
    - 4B dense, RL Average Pass@64 at step 1000: baseline 50.99 (Table 14) vs 51.18 with \hat λ=-0.1 (Table 20).
  - If you prefer a single scalar knob affecting global entropy, β=-0.25 shows the largest RL improvement for 10B among the reported settings, though on 4B dense it trails the baseline (50.43 vs 50.99):
    - 10B-A0.5B MoE, RL Average Pass@64 at step 1000: 47.52 (β=0, Table 15) vs 50.75 (β=-0.25, Table 18).
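As a starting point for such an integration, the sketch below collects the five reported settings as presets and estimates the shaped policy-gradient update from on-policy samples at a single position. It is a toy under stated assumptions, not the paper's training code: the preset names, `token_reward`, and `sampled_logit_grad` are hypothetical, the gradient is taken with respect to the logits of a four-token softmax policy, k = 2 is used only because the toy vocabulary is tiny, and the target probability is assumed to be below 1.

```python
import random

# Settings reported in the experiments (the paper uses k = 100 throughout).
PRESETS = {
    "baseline_ce":  dict(beta=0.0,   lam_head=0.0, lam_tail=0.0),
    "precision":    dict(beta=-0.25, lam_head=0.0, lam_tail=0.0),
    "high_entropy": dict(beta=0.5,   lam_head=0.0, lam_tail=0.0),
    "tail_penalty": dict(beta=0.0,   lam_head=0.0, lam_tail=-0.1),
    "head_reward":  dict(beta=0.0,   lam_head=0.1, lam_tail=0.0),
}

def token_reward(p_a, is_target, in_head, beta, lam_head, lam_tail):
    """Shaped reward for one sampled token; treated as a constant (stop-gradient)."""
    if is_target:
        return (1.0 - p_a) ** beta / p_a   # assumes p_a < 1 when beta < 0
    return lam_head if in_head else lam_tail

def sampled_logit_grad(probs, target, preset, num_samples, rng, k=100):
    """Monte Carlo estimate of E_{a~pi}[ r(a) * grad log pi(a) ] w.r.t. logits."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    head = set(ranked[:k])
    grad = [0.0] * len(probs)
    actions = rng.choices(range(len(probs)), weights=probs, k=num_samples)
    for a in actions:
        r = token_reward(probs[a], a == target, a in head, **preset)
        if r == 0.0:
            continue  # zero-reward samples contribute nothing
        for i, p in enumerate(probs):
            grad[i] += r * ((1.0 if i == a else 0.0) - p) / num_samples
    return grad

rng = random.Random(0)
probs = [0.5, 0.3, 0.15, 0.05]   # toy policy; the ground truth is index 1
estimate = sampled_logit_grad(probs, target=1, preset=PRESETS["baseline_ce"],
                              num_samples=20000, rng=rng, k=2)
exact_ce = [(1.0 if i == 1 else 0.0) - p for i, p in enumerate(probs)]
print("sampled:", [round(g, 3) for g in estimate])
print("exact CE gradient:", exact_ce)
```

With the baseline preset the sampled estimate should approach the cross-entropy gradient as the number of samples grows; swapping in another preset changes only the reward, which matches the "only change is reward shaping" guidance above.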