<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Madhav&#39;s Blog</title>
<link>https://madhavpr191221.github.io/blog/</link>
<atom:link href="https://madhavpr191221.github.io/blog/index.xml" rel="self" type="application/rss+xml"/>
<description>A blog built with Quarto</description>
<generator>quarto-1.9.36</generator>
<lastBuildDate>Sat, 18 Apr 2026 18:30:00 GMT</lastBuildDate>
<item>
  <title>Part 3: Fitting Survival Distributions to Data</title>
  <dc:creator>Madhav Prashanth Ramachandran</dc:creator>
  <link>https://madhavpr191221.github.io/blog/posts/part-3-fitting-survival-distributions/</link>
  <description><![CDATA[ 





<section id="recap-of-part-2-and-whats-coming-in-part-3" class="level2">
<h2 class="anchored" data-anchor-id="recap-of-part-2-and-whats-coming-in-part-3">Recap of Part 2 and What’s Coming in Part 3</h2>
<p>In <a href="https://madhavpr191221.github.io/blog/posts/part2-distributions-in-survival-analysis/"><strong>Part 2</strong></a>, we built a vocabulary of survival distributions — the Exponential for constant hazard, the Rayleigh for linearly increasing hazard, and the Weibull as the unifying family that contains both as special cases. Along the way we took a detour through the connection between the Rayleigh, Normal, and Exponential distributions, and ended with the bathtub curve — the most common failure pattern in real engineered systems.</p>
<p>But a distribution is not a model until it is fit to data. In <strong>Part 3</strong>, we ask: given a dataset of failure times — some exact, some censored — how do we estimate the parameters of these distributions? We will derive the likelihood function under censoring from first principles, apply maximum likelihood estimation, and assess the quality of our fits using diagnostic tools, with Python code throughout using <code>lifelines</code>.</p>
<section id="maximum-likelihood-estimation" class="level3">
<h3 class="anchored" data-anchor-id="maximum-likelihood-estimation">Maximum Likelihood Estimation</h3>
<p>We begin by reviewing the maximum likelihood estimation (MLE) framework with a simple example. We then extend the framework to handle censored data, which is what makes survival analysis different from classical statistical inference: we derive the likelihood function for censored data, apply MLE to estimate the parameters of the distributions we have discussed, and simulate synthetic data to demonstrate the process in practice. If you are new to MLE and feel lost, any standard statistics textbook or online resource offers a more detailed introduction. The key idea is that MLE provides a systematic way to estimate the parameters of a statistical model: find the parameter values that maximize the likelihood of observing the given data.</p>
<section id="the-likelihood-function" class="level4">
<h4 class="anchored" data-anchor-id="the-likelihood-function">The Likelihood Function</h4>
<p>This is the heart of the matter. Likelihoods are often confused with probabilities: the two look similar in form, but they are conceptually different. In probability, we ask: given a model with known parameters, what is the chance of observing the data? In likelihood, we ask: given the observed data, how plausible is each candidate set of parameters as the source of that data? To give a concrete example, suppose we observe a failure time of 5 hours, and suppose our model says the failure time follows an exponential distribution with a mean of 10 hours. Probability asks: what is the probability of observing a failure between 4 and 6 hours given this model? Likelihood asks: across all possible values of the mean (or rate) parameter, for which value does the observed time of 5 hours seem most plausible?</p>
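<p>To make the distinction concrete in code, here is a quick numerical sketch of the example above (the grid of candidate means is an illustrative choice, not part of the model):</p>

```python
import numpy as np

# Model: exponential failure times with mean 10 hours, i.e. rate lam = 1/10.
lam = 1 / 10

# Probability question: chance of a failure between 4 and 6 hours,
# F(6) - F(4) with CDF F(t) = 1 - exp(-lam * t).
prob = np.exp(-lam * 4) - np.exp(-lam * 6)

# Likelihood question: hold the observation t = 5 fixed and scan candidate
# mean parameters; one data point contributes its pdf value
# f(t) = (1/mean) * exp(-t/mean).
t_obs = 5.0
means = np.linspace(0.5, 30, 1000)
likelihood = (1 / means) * np.exp(-t_obs / means)

best_mean = means[np.argmax(likelihood)]
print(f"P(4 <= T <= 6) under the model: {prob:.4f}")
print(f"most plausible mean for t = 5 : {best_mean:.2f}")  # peaks at mean = t_obs = 5
```

<p>The likelihood is maximized when the mean equals the single observed time, 5 hours, which previews the estimation machinery developed next.</p>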
<p>Here is the setup. We have a sequence of observed failure times <img src="https://latex.codecogs.com/png.latex?t_1,%20t_2,%20%5Cldots,%20t_n"> that we assume are independent and identically distributed (i.i.d.) samples from some distribution with a parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. In the simplest setup, all observations are exact failure times (no censoring). In that case, the likelihood function is the joint probability of observing this data given the parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta">:</p>
<p><img src="https://latex.codecogs.com/png.latex?L(%5Ctheta)%20=%20P(t_1,%20t_2,%20%5Cldots,%20t_n%20;%20%5Ctheta)"></p>
<p>Note that this is a function of <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> - we are treating the data as fixed and asking how likely different values of <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> are to have generated this data. Since most of our work will involve continuous distributions, we will often work with the probability density function (pdf) instead of the probability mass function (pmf). In that case, the likelihood becomes:</p>
<p><img src="https://latex.codecogs.com/png.latex?L(%5Ctheta)%20=%20f(t_1%20;%20%5Ctheta)%20%5Ccdot%20f(t_2%20;%20%5Ctheta)%20%5Ccdot%20%5Cldots%20%5Ccdot%20f(t_n%20;%20%5Ctheta)%20=%20%5Cprod_%7Bi=1%7D%5En%20f(t_i%20;%20%5Ctheta)"></p>
<p>Note that some authors use the conditional probability notation <img src="https://latex.codecogs.com/png.latex?P(t_i%20%7C%20%5Ctheta)"> instead of the joint probability notation <img src="https://latex.codecogs.com/png.latex?P(t_1,%20t_2,%20%5Cldots,%20t_n%20;%20%5Ctheta)">, but the meaning is the same. The likelihood function captures how well the model with parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> explains the observed data, and the MLE process involves finding the value of <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> that maximizes this likelihood function. We use semicolons in the notation to emphasize that we are treating the data as fixed and the parameter as an unknown constant. If the parameter is itself a random variable with a prior distribution, then we are in the realm of Bayesian inference, which is a different framework. In MLE, we do not assign a prior distribution to the parameter; we simply find the value that maximizes the likelihood of the observed data.</p>
</section>
<section id="the-maximum-likelihood-estimation-process" class="level4">
<h4 class="anchored" data-anchor-id="the-maximum-likelihood-estimation-process">The Maximum Likelihood Estimation Process</h4>
<p>The MLE process involves finding the value of <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> that maximizes the likelihood function <img src="https://latex.codecogs.com/png.latex?L(%5Ctheta)">. In practice, it is often easier to work with the log-likelihood function, which is the natural logarithm of the likelihood:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cell(%5Ctheta)%20=%20%5Clog%20L(%5Ctheta)%20=%20%5Csum_%7Bi=1%7D%5En%20%5Clog%20f(t_i%20;%20%5Ctheta)"></p>
<p>The MLE estimate <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D_%7B%5Ctext%7BMLE%7D%7D"> is the value of <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> that maximizes <img src="https://latex.codecogs.com/png.latex?%5Cell(%5Ctheta)">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D_%7B%5Ctext%7BMLE%7D%7D%20=%20%5Carg%5Cmax_%7B%5Ctheta%7D%20%5Cell(%5Ctheta)"></p>
<p>If the likelihood function is well-behaved (e.g., it is differentiable), we can find the MLE estimate by taking the derivative of the log-likelihood with respect to <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, setting it equal to zero, and solving for <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. This gives us the critical points of the log-likelihood function, which we can then evaluate to find the maximum. To show that the critical point we find is indeed a maximum, we can check the second derivative or use other methods if necessary. Let’s see how this works in practice with a simple example.</p>
<p>Suppose we have a dataset of failure times that we believe follows an exponential distribution. The exponential distribution has a single parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda"> (the rate), and its pdf is given by:</p>
<p><img src="https://latex.codecogs.com/png.latex?f(t%20;%20%5Clambda)%20=%20%5Clambda%20e%5E%7B-%5Clambda%20t%7D"></p>
<p>Given a dataset of failure times <img src="https://latex.codecogs.com/png.latex?t_1,%20t_2,%20%5Cldots,%20t_n">, the likelihood function for the exponential distribution is:</p>
<p><img src="https://latex.codecogs.com/png.latex?L(%5Clambda)%20=%20%5Cprod_%7Bi=1%7D%5En%20%5Clambda%20e%5E%7B-%5Clambda%20t_i%7D%20=%20%5Clambda%5En%20e%5E%7B-%5Clambda%20%5Csum_%7Bi=1%7D%5En%20t_i%7D"></p>
<p>The log-likelihood function is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cell(%5Clambda)%20=%20n%20%5Clog%20%5Clambda%20-%20%5Clambda%20%5Csum_%7Bi=1%7D%5En%20t_i"></p>
<p>To find the MLE estimate <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Clambda%7D">, we take the derivative of <img src="https://latex.codecogs.com/png.latex?%5Cell(%5Clambda)"> with respect to <img src="https://latex.codecogs.com/png.latex?%5Clambda">, set it equal to zero, and solve for <img src="https://latex.codecogs.com/png.latex?%5Clambda">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cfrac%7Bd%5Cell%7D%7Bd%5Clambda%7D%20=%20%5Cfrac%7Bn%7D%7B%5Clambda%7D%20-%20%5Csum_%7Bi=1%7D%5En%20t_i%20=%200"></p>
<p>Solving for <img src="https://latex.codecogs.com/png.latex?%5Clambda"> gives us: <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Clambda%7D_%7B%5Ctext%7BMLE%7D%7D%20=%20%5Cfrac%7Bn%7D%7B%5Csum_%7Bi=1%7D%5En%20t_i%7D"></p>
<p>This is the MLE estimate for the rate parameter of the exponential distribution based on our observed failure times. Notice that this estimate is simply the reciprocal of the sample mean of the failure times, which makes intuitive sense given the properties of the exponential distribution. We can verify that this critical point is indeed a maximum by checking the second derivative of the log-likelihood function evaluated at <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Clambda%7D_%7B%5Ctext%7BMLE%7D%7D">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cfrac%7Bd%5E2%5Cell%7D%7Bd%5Clambda%5E2%7D%20=%20-%5Cfrac%7Bn%7D%7B%5Clambda%5E2%7D"></p>
<p>which is negative for all <img src="https://latex.codecogs.com/png.latex?%5Clambda%20%3E%200">. Therefore, we have a maximum at <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Clambda%7D_%7B%5Ctext%7BMLE%7D%7D">. In fact, the (only) critical point we find is a global maximum.</p>
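<p>As a sanity check on the closed-form result, the following sketch (simulated data with an assumed true rate of 0.1) compares the analytical estimator with a brute-force grid maximization of the log-likelihood:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
true_lam = 0.1                        # true rate: mean failure time = 10
t = rng.exponential(1 / true_lam, size=500)

# Closed-form MLE derived above: lambda_hat = n / sum(t_i)
lam_hat = len(t) / t.sum()

# Cross-check by maximizing ell(lam) = n*log(lam) - lam*sum(t_i) on a grid
grid = np.linspace(0.01, 0.5, 5000)
ell = len(t) * np.log(grid) - grid * t.sum()
lam_grid = grid[np.argmax(ell)]

print(f"closed-form MLE: {lam_hat:.4f}")
print(f"grid maximizer : {lam_grid:.4f}")  # agrees with the closed form
```

<p>Both approaches land on the reciprocal of the sample mean, close to the true rate of 0.1.</p>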
<p>For a vector of parameters <img src="https://latex.codecogs.com/png.latex?%5Cboldsymbol%7B%5Ctheta%7D"> with components <img src="https://latex.codecogs.com/png.latex?%5Ctheta_1,%20%5Ctheta_2,%20%5Cldots,%20%5Ctheta_p">, the MLE process is similar but we take the gradient of the log-likelihood with respect to the vector of parameters and set it equal to the zero vector to find the critical points. We can then use the Hessian matrix to check whether we have a maximum, minimum, or saddle point. For instance, the Weibull distribution has two parameters, the shape <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> and the scale <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. The MLE process would involve taking the gradient of the log-likelihood with respect to both <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> and <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, setting it equal to zero, and solving for both parameters simultaneously. Let’s derive the MLE equations for the Weibull distribution. Suppose you have a dataset of failure times <img src="https://latex.codecogs.com/png.latex?t_1,%20t_2,%20%5Cldots,%20t_n"> that you believe follows a Weibull distribution with shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> and scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. The pdf of the Weibull distribution is given by:</p>
<p><img src="https://latex.codecogs.com/png.latex?f(t%20;%20%5Cgamma,%20%5Ctheta)%20=%20%5Cfrac%7B%5Cgamma%7D%7B%5Ctheta%7D%20%5Cleft(%20%5Cfrac%7Bt%7D%7B%5Ctheta%7D%20%5Cright)%5E%7B%5Cgamma%20-%201%7D%20e%5E%7B-(t/%5Ctheta)%5E%5Cgamma%7D"></p>
<p>The likelihood function for the Weibull distribution is:</p>
<p><img src="https://latex.codecogs.com/png.latex?L(%5Cgamma,%20%5Ctheta)%20=%20%5Cprod_%7Bi=1%7D%5En%20%5Cfrac%7B%5Cgamma%7D%7B%5Ctheta%7D%20%5Cleft(%20%5Cfrac%7Bt_i%7D%7B%5Ctheta%7D%20%5Cright)%5E%7B%5Cgamma%20-%201%7D%20e%5E%7B-(t_i/%5Ctheta)%5E%5Cgamma%7D"></p>
<p>The log-likelihood function is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cell(%5Cgamma,%20%5Ctheta)%20=%20n%5Clog%5Cgamma%20-%20n%5Cgamma%5Clog%5Ctheta%20+%20(%5Cgamma%20-%201)%5Csum_%7Bi=1%7D%5En%20%5Clog%20t_i%20-%20%5Csum_%7Bi=1%7D%5En%20%5Cleft(%5Cfrac%7Bt_i%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma"></p>
<p>To find the MLE estimates <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cgamma%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D">, we take the gradient of <img src="https://latex.codecogs.com/png.latex?%5Cell(%5Cgamma,%20%5Ctheta)"> with respect to both parameters, set it equal to zero, and solve for <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> and <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. This will give us a system of equations that we can solve numerically to find the MLE estimates. The resulting equations are nonlinear and do not have a closed-form solution, which is why we need numerical optimization methods to find the MLE estimates for the Weibull distribution. We will see how to do this in practice using <code>lifelines</code> in the simulation below.</p>
<div id="cell-fig-weibull-mle" class="cell" data-execution_count="1">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb1-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lifelines <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> WeibullFitter</span>
<span id="cb1-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> matplotlib.pyplot <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> plt</span>
<span id="cb1-4"></span>
<span id="cb1-5"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Generate synthetic Weibull data</span></span>
<span id="cb1-6">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>)</span>
<span id="cb1-7">true_gamma <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>   <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># shape</span></span>
<span id="cb1-8">true_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">100.0</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># scale</span></span>
<span id="cb1-9">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">200</span>  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># number of samples</span></span>
<span id="cb1-10"></span>
<span id="cb1-11"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Weibull random samples using inverse CDF</span></span>
<span id="cb1-12">u <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb1-13">t <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> true_theta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>np.log(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> u))<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span>true_gamma)</span>
<span id="cb1-14"></span>
<span id="cb1-15"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># All observed (no censoring yet)</span></span>
<span id="cb1-16">event_observed <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.ones(n)</span>
<span id="cb1-17"></span>
<span id="cb1-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Fit</span></span>
<span id="cb1-19">wf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> WeibullFitter()</span>
<span id="cb1-20">wf.fit(t, event_observed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>event_observed)</span>
<span id="cb1-21">wf.print_summary()</span></code></pre></div></div>
<div id="fig-weibull-mle" class="cell-output cell-output-display quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-weibull-mle-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<div>


<table class="dataframe caption-top table table-sm table-striped small" data-border="1">
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">model</th>
<td>lifelines.WeibullFitter</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">number of observations</th>
<td>200</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">number of events observed</th>
<td>200</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">log-likelihood</th>
<td>-1035.10</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">hypothesis</th>
<td>lambda_ != 1, rho_ != 1</td>
</tr>
</tbody>
</table>

</div>
<table class="dataframe caption-top table table-sm table-striped small" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th" style="min-width: 12px"></th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">coef</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">se(coef)</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">coef lower 95%</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">coef upper 95%</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">cmp to</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">z</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">p</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">-log2(p)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">lambda_</th>
<td>97.18</td>
<td>3.63</td>
<td>90.06</td>
<td>104.29</td>
<td>1.00</td>
<td>26.50</td>
<td>&lt;0.005</td>
<td>511.45</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">rho_</th>
<td>1.99</td>
<td>0.11</td>
<td>1.78</td>
<td>2.21</td>
<td>1.00</td>
<td>8.93</td>
<td>&lt;0.005</td>
<td>60.96</td>
</tr>
</tbody>
</table>
<br><div>


<table class="dataframe caption-top table table-sm table-striped small" data-border="1">
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">AIC</th>
<td>2074.19</td>
</tr>
</tbody>
</table>

</div>
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig quarto-uncaptioned" id="fig-weibull-mle-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1
</figcaption>
</figure>
</div>
</div>
<p><code>lifelines</code> uses <code>lambda_</code> for the scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and <code>rho_</code> for the shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> — different notation, same parameters. With <img src="https://latex.codecogs.com/png.latex?n%20=%20200"> observations, the estimated parameters are already close to the true values:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cgamma%7D%20=%201.99%20%5Capprox%202.0%20=%20%5Cgamma_%7B%5Ctext%7Btrue%7D%7D,%20%5Cqquad%20%5Chat%7B%5Ctheta%7D%20=%2097.18%20%5Capprox%20100.0%20=%20%5Ctheta_%7B%5Ctext%7Btrue%7D%7D"></p>
<p>MLE recovers the true parameters with high accuracy even at modest sample sizes.</p>
<div id="cell-fig-weibull-fit" class="cell" data-execution_count="2">
<div class="cell-output cell-output-display">
<div id="fig-weibull-fit" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-weibull-fit-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://madhavpr191221.github.io/blog/posts/part-3-fitting-survival-distributions/index_files/figure-html/fig-weibull-fit-output-1.png" width="853" height="470" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-weibull-fit-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Fitted Weibull survival curve vs empirical survival curve
</figcaption>
</figure>
</div>
</div>
</div>
<p>The fitted Weibull survival curve (gold) closely tracks the empirical survival function (blue) across the entire time range. The empirical survival function is simply the fraction of systems still surviving at each time point — no distributional assumptions, just the raw data speaking for itself. The fact that the fitted curve hugs it so tightly tells us two things: MLE found good parameter estimates, and the Weibull is a reasonable model for this data. We will make this “goodness of fit” assessment more rigorous later using diagnostic tools.</p>
<p>But there is a catch — this was the easy case. All 200 observations were exact failure times. In practice, not all machines will have failed by the end of the study — some will still be running when the study ends. How do we modify the likelihood function to account for censoring? This is the key question that makes survival analysis different from classical statistical inference. We derive the censored likelihood function from first principles next.</p>
</section>
</section>
<section id="mle-with-censored-data" class="level3">
<h3 class="anchored" data-anchor-id="mle-with-censored-data">MLE with Censored Data</h3>
<p>Remember that in survival analysis, we often have censored data — we know that a system survived up to a certain time, but we don’t know the exact failure time. This is called right-censoring. We need to modify our likelihood function to account for this type of data. Recall from Part 1 of the series that we observe pairs of the form <img src="https://latex.codecogs.com/png.latex?(Y_i,%20%5Cdelta_i)"> where <img src="https://latex.codecogs.com/png.latex?Y_i%20=%20%5Cmin(T_i,%20C_i)"> is the observed time (either failure time or censoring time) and <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i"> is an indicator variable that is 1 if the event (failure) was observed and 0 if it was censored. If <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%201">, we know that the failure time <img src="https://latex.codecogs.com/png.latex?T_i"> was observed and is equal to <img src="https://latex.codecogs.com/png.latex?Y_i">. If <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%200">, we know that the system survived up to time <img src="https://latex.codecogs.com/png.latex?Y_i">, but we don’t know the exact failure time.</p>
<p>A brief notational remark before we proceed further. <img src="https://latex.codecogs.com/png.latex?T_i"> and <img src="https://latex.codecogs.com/png.latex?C_i"> are random variables representing the (unknown) failure time and censoring time of the <img src="https://latex.codecogs.com/png.latex?i">-th unit. Their realizations — the actual observed values — are denoted <img src="https://latex.codecogs.com/png.latex?t_i"> and <img src="https://latex.codecogs.com/png.latex?c_i"> in lowercase. <img src="https://latex.codecogs.com/png.latex?Y_i%20=%20%5Cmin(T_i,%20C_i)"> is also a random variable, and its realization is <img src="https://latex.codecogs.com/png.latex?y_i%20=%20%5Cmin(t_i,%20c_i)">. We use uppercase for random variables and lowercase for their observed values throughout.</p>
<p>How does this affect the likelihood function? For an observed failure time (when <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%201">), the contribution to the likelihood is simply the pdf evaluated at the observed time <img src="https://latex.codecogs.com/png.latex?y_i">: <img src="https://latex.codecogs.com/png.latex?f(y_i%20;%20%5Ctheta)">. Since <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%201"> implies <img src="https://latex.codecogs.com/png.latex?y_i%20=%20t_i">, this equals the pdf evaluated at the failure time <img src="https://latex.codecogs.com/png.latex?t_i">.</p>
<p>For a censored observation (when <img src="https://latex.codecogs.com/png.latex?%5Cdelta_j%20=%200">), the observation tells us that the failure time <img src="https://latex.codecogs.com/png.latex?T_j"> is greater than the observed time <img src="https://latex.codecogs.com/png.latex?y_j">. The contribution to the likelihood is the probability that the system survived past <img src="https://latex.codecogs.com/png.latex?y_j">, which is given by the survival function: <img src="https://latex.codecogs.com/png.latex?S(y_j%20;%20%5Ctheta)%20=%20P(T_j%20%3E%20y_j)">.</p>
<p>The full likelihood function for a dataset with both exact failure times and censored observations is the product of the contributions from all observations:</p>
<p><img src="https://latex.codecogs.com/png.latex?L(%5Ctheta)%20=%20%5Cprod_%7Bi=1%7D%5En%20%5Bf(y_i%20;%20%5Ctheta)%5D%5E%7B%5Cdelta_i%7D%20%5BS(y_i%20;%20%5Ctheta)%5D%5E%7B1%20-%20%5Cdelta_i%7D"></p>
<p>You might be wondering why we have the exponents <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i"> and <img src="https://latex.codecogs.com/png.latex?1%20-%20%5Cdelta_i"> in the likelihood function. This is a common way to write the likelihood function in survival analysis to compactly represent both types of observations. When <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%201">, the term <img src="https://latex.codecogs.com/png.latex?%5Bf(y_i%20;%20%5Ctheta)%5D%5E%7B%5Cdelta_i%7D"> becomes <img src="https://latex.codecogs.com/png.latex?f(y_i%20;%20%5Ctheta)"> and the term <img src="https://latex.codecogs.com/png.latex?%5BS(y_i%20;%20%5Ctheta)%5D%5E%7B1%20-%20%5Cdelta_i%7D"> becomes 1, so the contribution to the likelihood is just <img src="https://latex.codecogs.com/png.latex?f(y_i%20;%20%5Ctheta)">. When <img src="https://latex.codecogs.com/png.latex?%5Cdelta_j%20=%200">, the term <img src="https://latex.codecogs.com/png.latex?%5Bf(y_j%20;%20%5Ctheta)%5D%5E%7B%5Cdelta_j%7D"> becomes 1 and the term <img src="https://latex.codecogs.com/png.latex?%5BS(y_j%20;%20%5Ctheta)%5D%5E%7B1%20-%20%5Cdelta_j%7D"> becomes <img src="https://latex.codecogs.com/png.latex?S(y_j%20;%20%5Ctheta)">, so the contribution to the likelihood is just <img src="https://latex.codecogs.com/png.latex?S(y_j%20;%20%5Ctheta)">. This way of writing the likelihood function allows us to handle both types of observations in a unified framework. This is very similar to writing the likelihood function (or loss function) in logistic regression, where we have a binary outcome and we use the observed labels to determine which term contributes to the likelihood for each observation.</p>
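<p>Here is a small sketch of this censored likelihood for the exponential case (synthetic data; the uniform censoring distribution is an arbitrary choice for illustration). For the exponential distribution the censored MLE also has a closed form, the number of observed events divided by the total observed time, which the grid search should reproduce:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
n, true_lam = 500, 0.1
t = rng.exponential(1 / true_lam, size=n)   # latent failure times
c = rng.uniform(0, 30, size=n)              # independent censoring times
y = np.minimum(t, c)                        # observed time Y_i = min(T_i, C_i)
delta = (t <= c).astype(float)              # 1 = failure observed, 0 = censored

def log_lik(lam):
    # log L = sum_i [ delta_i * log f(y_i) + (1 - delta_i) * log S(y_i) ]
    # For the exponential: log f(y) = log(lam) - lam*y and log S(y) = -lam*y.
    return np.sum(delta * (np.log(lam) - lam * y) + (1 - delta) * (-lam * y))

grid = np.linspace(0.01, 0.5, 5000)
lam_hat = grid[np.argmax([log_lik(l) for l in grid])]

# Closed form for the censored exponential MLE: (# events) / (total observed time)
closed_form = delta.sum() / y.sum()
print(f"grid MLE    : {lam_hat:.4f}")
print(f"closed form : {closed_form:.4f}")
```

<p>Each unit contributes a pdf term if it failed and a survival term if it was censored, exactly as in the product above.</p>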
<p>Finally, a point worth observing is that the likelihood function for censored data is not a simple product of pdfs, but a product that mixes pdfs and survival functions. How can we combine the pdf (which is a density) and the survival function (which is a probability) in the same likelihood function? The answer lies in the fact that we are not comparing incompatible objects. For a censored observation, <img src="https://latex.codecogs.com/png.latex?S(y_j;%5Ctheta)%20=%20P(T_j%20%3E%20y_j)"> is a genuine probability. For an exact failure, <img src="https://latex.codecogs.com/png.latex?f(y_i;%5Ctheta)"> is a density — but both contribute to the same likelihood because we are asking the same question for each observation: what parameter value <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> makes this observation most plausible? The likelihood function combines these two types of contributions in a unified framework that allows MLE to work even in the presence of censoring.</p>
<p>If you prefer an abstract view and do not want to dwell on the fact that we are mixing densities and probabilities, you can simply treat the likelihood as a product of functions and focus on finding the parameter values that maximize that product. This view strips away the details and lets you concentrate on the optimization problem, but it sacrifices the intuition for why we use the pdf for exact failures and the survival function for censored observations. The concrete view emphasizes the different contributions that the two types of observations make to the likelihood, which helps build intuition about how MLE works in the presence of censoring. Both views are valid and useful in different contexts.</p>
<p>Let’s build a little more intuition for how a true failure (<img src="https://latex.codecogs.com/png.latex?%5Cdelta%20=%201">) and a censored observation (<img src="https://latex.codecogs.com/png.latex?%5Cdelta%20=%200">) each contribute to the likelihood function.</p>
<p>Case 1: <strong><img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%201"> (exact failure observed)</strong>.</p>
<p>In this case, we know that the <strong>observed</strong> failure time <img src="https://latex.codecogs.com/png.latex?t_i"> (a realization of <img src="https://latex.codecogs.com/png.latex?T_i">) is less than or equal to the censoring time <img src="https://latex.codecogs.com/png.latex?C_i">. Consider the event that the failure time lies in a small interval around <img src="https://latex.codecogs.com/png.latex?t_i"> and is uncensored (<img src="https://latex.codecogs.com/png.latex?T_i%20%5Cleq%20C_i">). If the censoring time <img src="https://latex.codecogs.com/png.latex?C_i"> and the failure time <img src="https://latex.codecogs.com/png.latex?T_i"> are independent, then the joint probability of this event can be expressed as:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(T_i%20%5Cin%20%5Bt_i,%20t_i%20+%20%5CDelta%20t%5D,%20T_i%20%5Cleq%20C_i)%20=%20P(T_i%20%5Cin%20%5Bt_i,%20t_i%20+%20%5CDelta%20t%5D)%20%5Ccdot%20P(C_i%20%5Cgeq%20t_i)"> The first term <img src="https://latex.codecogs.com/png.latex?P(T_i%20%5Cin%20%5Bt_i,%20t_i%20+%20%5CDelta%20t%5D)"> is approximately equal to the pdf evaluated at <img src="https://latex.codecogs.com/png.latex?t_i">: <img src="https://latex.codecogs.com/png.latex?f(t_i%20;%20%5Ctheta)%20%5CDelta%20t">. The second term <img src="https://latex.codecogs.com/png.latex?P(C_i%20%5Cgeq%20t_i)"> is the probability that the censoring time is greater than or equal to <img src="https://latex.codecogs.com/png.latex?t_i">, which is given by the survival function of the censoring distribution evaluated at <img src="https://latex.codecogs.com/png.latex?t_i">: <img src="https://latex.codecogs.com/png.latex?S_C(t_i)">. Therefore, the contribution to the likelihood from an exact failure observation can be expressed as:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(T_i%20%5Cin%20%5Bt_i,%20t_i%20+%20%5CDelta%20t%5D,%20T_i%20%5Cleq%20C_i)%20=%20f(t_i%20;%20%5Ctheta)%20%5C,%20%5CDelta%20t%20%5Ccdot%20S_C(t_i)"></p>
<p>Since we are maximizing the likelihood with respect to <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, the term <img src="https://latex.codecogs.com/png.latex?S_C(t_i)"> can be treated as a constant: it is the survival function of the censoring distribution, representing the probability that the censoring time is greater than or equal to <img src="https://latex.codecogs.com/png.latex?t_i">, and it has nothing to do with the failure parameters <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> that we are trying to estimate. The contribution of an exact failure to the likelihood is therefore effectively proportional to the pdf evaluated at <img src="https://latex.codecogs.com/png.latex?t_i">: <img src="https://latex.codecogs.com/png.latex?f(t_i%20;%20%5Ctheta)">. This is why we use the pdf for exact failure observations in the likelihood function.</p>
<p>Case 2: <strong><img src="https://latex.codecogs.com/png.latex?%5Cdelta_j%20=%200"> (censored observation)</strong>.</p>
<p>In this case, we know that the system survived up to time <img src="https://latex.codecogs.com/png.latex?y_j">, but we don’t know the exact failure time; all we can say with certainty is that <img src="https://latex.codecogs.com/png.latex?T_j%20%3E%20y_j">. Consider the event that the failure time <img src="https://latex.codecogs.com/png.latex?T_j"> is greater than <img src="https://latex.codecogs.com/png.latex?y_j"> and the censoring time <img src="https://latex.codecogs.com/png.latex?C_j"> is in a small interval around <img src="https://latex.codecogs.com/png.latex?y_j">. If the censoring time <img src="https://latex.codecogs.com/png.latex?C_j"> and the failure time <img src="https://latex.codecogs.com/png.latex?T_j"> are independent, then the joint probability of observing a censored observation at time <img src="https://latex.codecogs.com/png.latex?y_j"> can be expressed as:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(T_j%20%3E%20y_j,%20C_j%20%5Cin%20%5By_j,%20y_j%20+%20%5CDelta%20t%5D)%20=%20P(T_j%20%3E%20y_j)%20%5Ccdot%20P(C_j%20%5Cin%20%5By_j,%20y_j%20+%20%5CDelta%20t%5D)"></p>
<p>The first term <img src="https://latex.codecogs.com/png.latex?P(T_j%20%3E%20y_j)"> is the probability that the failure time is greater than <img src="https://latex.codecogs.com/png.latex?y_j">, which is given by the survival function evaluated at <img src="https://latex.codecogs.com/png.latex?y_j">: <img src="https://latex.codecogs.com/png.latex?S(y_j%20;%20%5Ctheta)">. The second term <img src="https://latex.codecogs.com/png.latex?P(C_j%20%5Cin%20%5By_j,%20y_j%20+%20%5CDelta%20t%5D)"> is approximately equal to the pdf of the censoring distribution evaluated at <img src="https://latex.codecogs.com/png.latex?y_j">: <img src="https://latex.codecogs.com/png.latex?f_C(y_j)%20%5CDelta%20t">. Therefore, the contribution to the likelihood from a censored observation can be expressed as:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(T_j%20%3E%20y_j,%20C_j%20%5Cin%20%5By_j,%20y_j%20+%20%5CDelta%20t%5D)%20=%20S(y_j%20;%20%5Ctheta)%20%5Ccdot%20f_C(y_j)%20%5CDelta%20t"></p>
<p>Since we are maximizing the likelihood with respect to <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, the term <img src="https://latex.codecogs.com/png.latex?f_C(y_j)%20%5CDelta%20t"> does not depend on <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and can be treated as a constant. Therefore, the contribution to the likelihood from a censored observation is effectively proportional to the survival function evaluated at <img src="https://latex.codecogs.com/png.latex?y_j">: <img src="https://latex.codecogs.com/png.latex?S(y_j%20;%20%5Ctheta)">. This is why we use the survival function for censored observations in the likelihood function. Note that the term <img src="https://latex.codecogs.com/png.latex?f_C(y_j)%20%5CDelta%20t"> is the pdf of the censoring distribution, which represents the probability of observing a censoring event near time <img src="https://latex.codecogs.com/png.latex?y_j"> and therefore has nothing to do with the failure parameters <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> that we are trying to estimate.</p>
<p>Combining both cases and all the observations, the full censored likelihood is:</p>
<p><img src="https://latex.codecogs.com/png.latex?L(%5Ctheta)%20=%20%5Cprod_%7Bi=1%7D%5En%20%5Bf(y_i%20;%20%5Ctheta)%5D%5E%7B%5Cdelta_i%7D%20%5BS(y_i%20;%20%5Ctheta)%5D%5E%7B1%20-%20%5Cdelta_i%7D"></p>
<p>where the <img src="https://latex.codecogs.com/png.latex?%5CDelta%20t"> and censoring distribution terms have been absorbed into a proportionality constant that does not affect the location of the maximum.</p>
<p>The assumption of independence between the failure time and censoring time is crucial for this derivation. This is known as the <strong>non-informative censoring assumption</strong>: knowing when a subject was censored gives us no information about its failure time. This assumption is violated in some real-world scenarios, such as when a machine operator decides to stop a machine that is showing signs of imminent failure, or when a patient drops out of a clinical trial due to worsening health. In such cases, the censoring is informative and the standard MLE approach may yield biased estimates. There are methods to handle informative censoring, such as joint modeling of the failure and censoring processes, but these are beyond the scope of this post.</p>
<section id="the-log-likelihood-function-with-censoring" class="level4">
<h4 class="anchored" data-anchor-id="the-log-likelihood-function-with-censoring">The Log-Likelihood Function with Censoring</h4>
<p>The full censored likelihood function is:</p>
<p><img src="https://latex.codecogs.com/png.latex?L(%5Ctheta)%20=%20%5Cprod_%7Bi=1%7D%5En%20%5Bf(y_i%20;%20%5Ctheta)%5D%5E%7B%5Cdelta_i%7D%20%5BS(y_i%20;%20%5Ctheta)%5D%5E%7B1-%5Cdelta_i%7D"></p>
<p>Taking the natural logarithm of the likelihood function gives us the log-likelihood function:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cell(%5Ctheta)%20=%20%5Csum_%7Bi=1%7D%5En%20%5Cleft%5B%20%5Cdelta_i%20%5Clog%20f(y_i%20;%20%5Ctheta)%20+%20(1%20-%20%5Cdelta_i)%20%5Clog%20S(y_i%20;%20%5Ctheta)%20%5Cright%5D"></p>
<p>Based on our previous study of the relationship between the pdf and the survival function, we can express the log-likelihood function in terms of the hazard function <img src="https://latex.codecogs.com/png.latex?%5Clambda(t%20;%20%5Ctheta)">. Recall that the pdf can be expressed as <img src="https://latex.codecogs.com/png.latex?f(t%20;%20%5Ctheta)%20=%20%5Clambda(t%20;%20%5Ctheta)%20S(t%20;%20%5Ctheta)">, so we can rewrite the log-likelihood function as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cell(%5Ctheta)%20=%20%5Csum_%7Bi=1%7D%5En%20%5Cleft%5B%20%5Cdelta_i%20%5Clog%20%5Clambda(y_i%20;%20%5Ctheta)%20+%20%5Clog%20S(y_i%20;%20%5Ctheta)%20%5Cright%5D"></p>
<p>Notice that <img src="https://latex.codecogs.com/png.latex?%5Clog%20S(y_i;%5Ctheta)"> appears for <strong>every</strong> observation regardless of whether it failed or was censored — survival information contributes to the likelihood for all units. The hazard term <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20%5Clog%20%5Clambda(y_i;%5Ctheta)"> only contributes for observed failures. This is the elegance of the censored likelihood. Finally, we can express the log-likelihood function in terms of the cumulative hazard function <img src="https://latex.codecogs.com/png.latex?%5CLambda(t%20;%20%5Ctheta)"> using the relationship <img src="https://latex.codecogs.com/png.latex?S(t%20;%20%5Ctheta)%20=%20e%5E%7B-%5CLambda(t%20;%20%5Ctheta)%7D">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cell(%5Ctheta)%20=%20%5Csum_%7Bi=1%7D%5En%20%5Cleft%5B%20%5Cdelta_i%20%5Clog%20%5Clambda(y_i%20;%20%5Ctheta)%20-%20%5CLambda(y_i%20;%20%5Ctheta)%20%5Cright%5D"></p>
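<p>As a quick sanity check, we can verify numerically that the pdf/survival form and the hazard/cumulative-hazard form of the censored log-likelihood agree. The sketch below uses plain NumPy and a Weibull with illustrative scale and shape values; none of these numbers come from the dataset used later in this post.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
theta, gamma = 100.0, 2.0   # illustrative Weibull scale and shape
n = 200

# Simulate Weibull failure times via the inverse CDF, with random right censoring
T = theta * (-np.log(rng.uniform(size=n))) ** (1 / gamma)
C = rng.uniform(50, 400, size=n)
y = np.minimum(T, C)
delta = (T <= C).astype(int)

# Weibull building blocks evaluated at the observed times
cum_haz = (y / theta) ** gamma                       # cumulative hazard Λ(y; θ)
haz = (gamma / theta) * (y / theta) ** (gamma - 1)   # hazard λ(y; θ)
surv = np.exp(-cum_haz)                              # survival S(y; θ) = exp(-Λ)
pdf = haz * surv                                     # pdf f(y; θ) = λ(y) S(y)

# The two algebraically equivalent forms of the censored log-likelihood
ll_pdf_form = np.sum(delta * np.log(pdf) + (1 - delta) * np.log(surv))
ll_haz_form = np.sum(delta * np.log(haz) - cum_haz)

print(np.isclose(ll_pdf_form, ll_haz_form))  # the two forms agree
```

This works because <img src="https://latex.codecogs.com/png.latex?%5Clog%20f%20=%20%5Clog%20%5Clambda%20+%20%5Clog%20S">, so the survival term for failures folds into the common <img src="https://latex.codecogs.com/png.latex?-%5CLambda(y_i%20;%20%5Ctheta)"> term shared by all observations.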
<p>Now we use our calculus machinery to find the MLE estimates. Taking the derivative of the log-likelihood function with respect to <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and setting it equal to zero gives us the MLE equations that we can solve to find the parameter estimates. The specific forms of these equations will depend on the distribution we are fitting. And sometimes, these equations will be so nonlinear and complex that we won’t be able to solve them analytically. We will see how to handle this in practice using numerical optimization methods in <code>lifelines</code> when we fit the Weibull distribution with censored data in the next section.</p>
<p>But first, let’s calculate the analytical MLE estimates for the exponential distribution with censored data to see how the presence of censoring modifies the MLE equations. The exponential distribution has a single parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda"> (the rate), and its pdf and survival function are given by:</p>
<p><img src="https://latex.codecogs.com/png.latex?f(t%20;%20%5Clambda)%20=%20%5Clambda%20e%5E%7B-%5Clambda%20t%7D"> <img src="https://latex.codecogs.com/png.latex?S(t%20;%20%5Clambda)%20=%20e%5E%7B-%5Clambda%20t%7D"></p>
<p>The log-likelihood function for the exponential distribution with censored data is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cell(%5Clambda)%20=%20%5Csum_%7Bi=1%7D%5En%20%5Cleft%5B%20%5Cdelta_i%20%5Clog%20(%5Clambda%20e%5E%7B-%5Clambda%20y_i%7D)%20+%20(1%20-%20%5Cdelta_i)%20%5Clog%20(e%5E%7B-%5Clambda%20y_i%7D)%20%5Cright%5D"></p>
<p>This simplifies to:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cell(%5Clambda)%20=%20%5Csum_%7Bi=1%7D%5En%20%5Cleft%5B%20%5Cdelta_i%20%5Clog%20%5Clambda%20-%20%5Clambda%20y_i%20%5Cright%5D"></p>
<p>Taking the derivative of the log-likelihood function with respect to <img src="https://latex.codecogs.com/png.latex?%5Clambda"> and setting it equal to zero gives us:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cfrac%7Bd%5Cell%7D%7Bd%5Clambda%7D%20=%20%5Csum_%7Bi=1%7D%5En%20%5Cleft%5B%20%5Cfrac%7B%5Cdelta_i%7D%7B%5Clambda%7D%20-%20y_i%20%5Cright%5D%20=%200"></p>
<p>Solving for <img src="https://latex.codecogs.com/png.latex?%5Clambda"> gives us the MLE estimate for the exponential distribution with censored data:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Clambda%7D_%7B%5Ctext%7BMLE%7D%7D%20=%20%5Cfrac%7B%5Csum_%7Bi=1%7D%5En%20%5Cdelta_i%7D%7B%5Csum_%7Bi=1%7D%5En%20y_i%7D"></p>
<p>The quantity <img src="https://latex.codecogs.com/png.latex?%5Csum_%7Bi=1%7D%5En%20y_i"> is known as the <strong>total time at risk</strong>, which is the sum of the observed times for all units, regardless of whether they failed or were censored. Each unit contributes to the total time at risk based on its observed duration. The quantity <img src="https://latex.codecogs.com/png.latex?%5Csum_%7Bi=1%7D%5En%20%5Cdelta_i"> is the total number of observed failures.</p>
<p>Isn’t this marvelous? The MLE estimate for the rate parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda"> in the presence of censoring is simply the total number of observed failures (the sum of <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i">) divided by the total time at risk (the sum of <img src="https://latex.codecogs.com/png.latex?y_i">). If the data were fully observed with no censoring, then <img src="https://latex.codecogs.com/png.latex?%5Csum_%7Bi=1%7D%5En%20%5Cdelta_i%20=%20n"> and we would recover the MLE estimate for the exponential distribution without censoring: <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Clambda%7D_%7B%5Ctext%7BMLE%7D%7D%20=%20%5Cfrac%7Bn%7D%7B%5Csum_%7Bi=1%7D%5En%20y_i%7D"> (I have used <img src="https://latex.codecogs.com/png.latex?y_i"> instead of <img src="https://latex.codecogs.com/png.latex?t_i"> for consistency with the current topic). Censored units contribute to the total time at risk in the denominator but not to the failure count in the numerator, so the estimator uses the partial information they carry instead of discarding them or treating their censoring times as failures. This is the power of the censored likelihood function: it allows us to make valid inferences about the parameters of the failure distribution even when the data are incomplete due to censoring. It is left as an exercise for the reader to verify that this estimate is indeed a maximum by checking the second derivative of the log-likelihood function evaluated at <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Clambda%7D_%7B%5Ctext%7BMLE%7D%7D">.</p>
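<p>We can check this closed-form estimator in a few lines of NumPy. In the sketch below the true rate, sample size, and censoring window are illustrative assumptions; with enough units, failures divided by total time at risk should land very close to the true rate.</p>

```python
import numpy as np

rng = np.random.default_rng(42)
true_rate = 0.01   # assumed true exponential rate (mean lifetime 100)
n = 50_000

T = rng.exponential(scale=1 / true_rate, size=n)   # latent failure times
C = rng.uniform(50, 400, size=n)                   # random right-censoring times
y = np.minimum(T, C)                               # observed durations
delta = (T <= C).astype(int)                       # 1 = failure, 0 = censored

# Closed-form censored MLE: observed failures / total time at risk
lambda_hat = delta.sum() / y.sum()
print(lambda_hat)  # close to 0.01

# Dropping censored units instead of keeping their time at risk shrinks the
# denominator while leaving the numerator unchanged, inflating the estimate
naive = delta.sum() / y[delta == 1].sum()
print(naive > lambda_hat)
```

The comparison at the end illustrates why censored observations must stay in the denominator: excluding them systematically overstates the failure rate.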
</section>
<section id="mle-with-censoring-a-synthetic-data-example" class="level4">
<h4 class="anchored" data-anchor-id="mle-with-censoring-a-synthetic-data-example">MLE with Censoring: A synthetic data example</h4>
<p>Rather than working with abstract numbers and symbols, let’s build a dataset that will accompany us for the rest of the series: a synthetic fleet of 1000 machines, each with its own age, operating conditions, and failure history.</p>
<div id="dataset-generation" class="cell" data-execution_count="3">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb2-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb2-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> os</span>
<span id="cb2-4"></span>
<span id="cb2-5">np.random.seed(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">42</span>)</span>
<span id="cb2-6">n <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1000</span></span>
<span id="cb2-7"></span>
<span id="cb2-8"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Continuous covariates ---</span></span>
<span id="cb2-9">machine_age        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">15</span>, n)</span>
<span id="cb2-10">usage_intensity    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span>, n)</span>
<span id="cb2-11">operating_temp     <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">60</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">120</span>, n)</span>
<span id="cb2-12">load_factor        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.3</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>, n)</span>
<span id="cb2-13">rpm                <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">500</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3000</span>, n)</span>
<span id="cb2-14">vibration_level    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.5</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>, n)</span>
<span id="cb2-15">oil_quality        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb2-16">maintenance_count  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.randint(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">21</span>, n).astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">float</span>)</span>
<span id="cb2-17"></span>
<span id="cb2-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Categorical covariates ---</span></span>
<span id="cb2-19">environment  <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.choice([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'indoor'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'outdoor'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'harsh'</span>], n, p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.2</span>])</span>
<span id="cb2-20">manufacturer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.choice([<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'A'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'B'</span>, <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'C'</span>], n, p<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>[<span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.4</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.35</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.25</span>])</span>
<span id="cb2-21"></span>
<span id="cb2-22"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Encode categoricals for survival time generation ---</span></span>
<span id="cb2-23">env_effect <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.where(environment <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'indoor'</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>,</span>
<span id="cb2-24">             np.where(environment <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'outdoor'</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.85</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.65</span>))</span>
<span id="cb2-25"></span>
<span id="cb2-26">mfr_effect <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.where(manufacturer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'A'</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">1.0</span>,</span>
<span id="cb2-27">             np.where(manufacturer <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'B'</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.9</span>, <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.75</span>))</span>
<span id="cb2-28"></span>
<span id="cb2-29"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- True Weibull parameters ---</span></span>
<span id="cb2-30"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Scale theta depends on covariates</span></span>
<span id="cb2-31"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Higher stress covariates -&gt; smaller theta -&gt; shorter survival</span></span>
<span id="cb2-32">gamma_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">2.0</span></span>
<span id="cb2-33"></span>
<span id="cb2-34">theta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (</span>
<span id="cb2-35">    <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">300</span></span>
<span id="cb2-36">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> env_effect</span>
<span id="cb2-37">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> mfr_effect</span>
<span id="cb2-38">    <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> (</span>
<span id="cb2-39">        <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span></span>
<span id="cb2-40">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.03</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> machine_age</span>
<span id="cb2-41">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.15</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> usage_intensity</span>
<span id="cb2-42">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.008</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> operating_temp</span>
<span id="cb2-43">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.10</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> load_factor</span>
<span id="cb2-44">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.0001</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> rpm</span>
<span id="cb2-45">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> vibration_level</span>
<span id="cb2-46">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.10</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> oil_quality</span>
<span id="cb2-47">        <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.02</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> maintenance_count  <span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># more maintenance -&gt; longer survival</span></span>
<span id="cb2-48">    )</span>
<span id="cb2-49">)</span>
<span id="cb2-50"></span>
<span id="cb2-51"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Generate Weibull failure times via inverse CDF ---</span></span>
<span id="cb2-52">u <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>, n)</span>
<span id="cb2-53">T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> theta_true <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span> (<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>np.log(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span> u))<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">**</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span> <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">/</span> gamma_true)</span>
<span id="cb2-54"></span>
<span id="cb2-55"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Random right censoring ---</span></span>
<span id="cb2-56">C <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.random.uniform(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">50</span>, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">400</span>, n)</span>
<span id="cb2-57"></span>
<span id="cb2-58"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Observed time and event indicator ---</span></span>
<span id="cb2-59">Y     <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> np.minimum(T, C)</span>
<span id="cb2-60">delta <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> (T <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;=</span> C).astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>)</span>
<span id="cb2-61"></span>
<span id="cb2-62"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Build dataframe ---</span></span>
<span id="cb2-63">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.DataFrame({</span>
<span id="cb2-64">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'machine_id'</span>:         [<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f'M</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>i<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">+</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:04d}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">'</span> <span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">for</span> i <span class="kw" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">in</span> <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">range</span>(n)],</span>
<span id="cb2-65">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'machine_age'</span>:        machine_age.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),</span>
<span id="cb2-66">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'usage_intensity'</span>:    usage_intensity.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),</span>
<span id="cb2-67">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'operating_temp'</span>:     operating_temp.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),</span>
<span id="cb2-68">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'load_factor'</span>:        load_factor.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),</span>
<span id="cb2-69">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'rpm'</span>:                rpm.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>).astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>),</span>
<span id="cb2-70">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'vibration_level'</span>:    vibration_level.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>),</span>
<span id="cb2-71">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'oil_quality'</span>:        oil_quality.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">3</span>),</span>
<span id="cb2-72">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'maintenance_count'</span>:  maintenance_count.astype(<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">int</span>),</span>
<span id="cb2-73">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'environment'</span>:        environment,</span>
<span id="cb2-74">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'manufacturer'</span>:       manufacturer,</span>
<span id="cb2-75">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'observed_time'</span>:      Y.<span class="bu" style="color: null;
background-color: null;
font-style: inherit;">round</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">2</span>),</span>
<span id="cb2-76">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'event_observed'</span>:     delta</span>
<span id="cb2-77">})</span>
<span id="cb2-78"></span>
<span id="cb2-79"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Summary ---</span></span>
<span id="cb2-80"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Total machines    : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>n<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb2-81"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Observed failures : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>delta<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>delta<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">%)"</span>)</span>
<span id="cb2-82"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Censored          : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>delta)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">sum</span>()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> (</span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">100</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">*</span>(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span><span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">-</span>delta)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">%)"</span>)</span>
<span id="cb2-83"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Mean observed time: </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>Y<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">.</span>mean()<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.1f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> hours"</span>)</span>
<span id="cb2-84"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">First 10 rows:"</span>)</span>
<span id="cb2-85"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(df.head(<span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">10</span>).to_string(index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>))</span>
<span id="cb2-86"></span>
<span id="cb2-87"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># --- Save ---</span></span>
<span id="cb2-88">os.makedirs(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../../data'</span>, exist_ok<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">True</span>)</span>
<span id="cb2-89">df.to_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../../data/machine_fleet.csv'</span>, index<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span><span class="va" style="color: #111111;
background-color: null;
font-style: inherit;">False</span>)</span>
<span id="cb2-90"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span><span class="ch" style="color: #20794D;
background-color: null;
font-style: inherit;">\n</span><span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">Dataset saved to ../../data/machine_fleet.csv"</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>Total machines    : 1000
Observed failures : 886 (88.6%)
Censored          : 114 (11.4%)
Mean observed time: 81.4 hours

First 10 rows:
machine_id  machine_age  usage_intensity  operating_temp  load_factor  rpm  vibration_level  oil_quality  maintenance_count environment manufacturer  observed_time  event_observed
     M0001         5.62             0.78           75.70         0.77 1930            4.240        0.648                 10     outdoor            A         111.39               1
     M0002        14.26             1.31           74.82         0.86 2514            4.998        0.172                 16     outdoor            A          68.56               1
     M0003        10.98             1.81          114.38         0.48 2400            8.618        0.872                  4      indoor            C          49.03               1
     M0004         8.98             1.60           74.97         0.74  885            3.730        0.613                  6       harsh            A          88.65               1
     M0005         2.34             1.71           76.32         0.70  873            8.762        0.157                 18      indoor            B         205.67               1
     M0006         2.34             1.49          105.56         0.88 1170            1.337        0.962                 14       harsh            B          69.69               1
     M0007         0.87             1.54           86.98         0.93 1403            7.880        0.518                  2     outdoor            B          79.77               0
     M0008        12.99             1.77          106.60         0.31 1521            8.552        0.073                 14      indoor            B          28.99               1
     M0009         9.02             0.87           63.92         0.77 2199            2.227        0.627                  4       harsh            B          37.09               1
     M0010        10.62             1.23           89.25         0.34  642            4.588        0.253                 11      indoor            A         106.60               1

Dataset saved to ../../data/machine_fleet.csv</code></pre>
</div>
</div>
<p>We generate a synthetic fleet of 1000 machines, each with 8 continuous and 2 categorical covariates representing realistic machine characteristics — age, operating temperature, vibration level, manufacturer, and so on. The true failure times are drawn from a Weibull distribution with shape <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%202.0"> and a scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> that depends on the covariates — higher stress (temperature, vibration, load) shrinks <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and shortens expected lifetime, while more maintenance history extends it.</p>
<p>For now, we work only with the observed times and the event indicator: whether each machine failed or was censored. But in reality, a machine’s lifetime depends on its physical characteristics — how old it is, how hard it runs, the temperature it operates at, how well it has been maintained. Modeling the relationship between these characteristics and the failure time is exactly what survival regression models like the Cox Proportional Hazards model are built to do — and that is where we are headed.</p>
<p>Now we fit a Weibull to the observed times with <code>lifelines</code>. Here is the code.</p>
<div id="cell-weibull-fit-censored" class="cell" data-execution_count="4">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb4-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lifelines <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> WeibullFitter</span>
<span id="cb4-3"></span>
<span id="cb4-4">wf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> WeibullFitter()</span>
<span id="cb4-5">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../../data/machine_fleet.csv'</span>)</span>
<span id="cb4-6"></span>
<span id="cb4-7">wf.fit(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'observed_time'</span>], event_observed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'event_observed'</span>])</span>
<span id="cb4-8">wf.print_summary()</span></code></pre></div></div>
<div id="weibull-fit-censored" class="cell-output cell-output-display">
<div>


<table class="dataframe caption-top table table-sm table-striped small" data-border="1">
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">model</th>
<td>lifelines.WeibullFitter</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">number of observations</th>
<td>1000</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">number of events observed</th>
<td>886</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">log-likelihood</th>
<td>-4684.21</td>
</tr>
<tr class="odd">
<th data-quarto-table-cell-role="th">hypothesis</th>
<td>lambda_ != 1, rho_ != 1</td>
</tr>
</tbody>
</table>

</div>
<table class="dataframe caption-top table table-sm table-striped small" data-border="1">
<thead>
<tr class="header">
<th data-quarto-table-cell-role="th" style="min-width: 12px"></th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">coef</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">se(coef)</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">coef lower 95%</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">coef upper 95%</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">cmp to</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">z</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">p</th>
<th data-quarto-table-cell-role="th" style="min-width: 12px">-log2(p)</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">lambda_</th>
<td>97.26</td>
<td>1.80</td>
<td>93.72</td>
<td>100.79</td>
<td>1.00</td>
<td>53.41</td>
<td>&lt;0.005</td>
<td>inf</td>
</tr>
<tr class="even">
<th data-quarto-table-cell-role="th">rho_</th>
<td>1.86</td>
<td>0.05</td>
<td>1.76</td>
<td>1.95</td>
<td>1.00</td>
<td>17.69</td>
<td>&lt;0.005</td>
<td>230.18</td>
</tr>
</tbody>
</table>
<br><div>


<table class="dataframe caption-top table table-sm table-striped small" data-border="1">
<tbody>
<tr class="odd">
<th data-quarto-table-cell-role="th">AIC</th>
<td>9372.42</td>
</tr>
</tbody>
</table>

</div>
</div>
</div>
<p>886 out of 1000 machines failed during the study: an 88.6% event rate with 11.4% censored. The fitted shape parameter <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cgamma%7D%20=%201.86"> is close to, but not exactly, the true value of <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%202.0">. This is expected — we are fitting a single Weibull to all 1000 machines while ignoring the fact that each machine has a different <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> driven by its covariates. We are forcing one distribution to describe a heterogeneous fleet of machines. This is precisely why survival regression methods exist: to model how machine characteristics influence lifetime. We will get there eventually. But first, let’s assess how well this naive fit describes the data using diagnostic tools.</p>
</section>
</section>
<section id="diagnostic-tool-q-q-plot" class="level3">
<h3 class="anchored" data-anchor-id="diagnostic-tool-q-q-plot">Diagnostic Tool: Q-Q Plot</h3>
<p>The Q-Q plot is a graphical method to assess whether a dataset follows a particular distribution, or whether two datasets come from the same population. For our purposes, we are testing whether the dataset comes from a particular Weibull. Note that the game is rigged: we simulated the data to follow a particular Weibull, so we already know the answer. But the Q-Q plot is a diagnostic tool we will use repeatedly on real data, where we do not know the answer.</p>
<p>The basic idea of the Q-Q plot is to compare theoretical quantiles vs empirical quantiles. If the data fits a particular Weibull, these quantiles should match and the points should fall on a straight line. Here is how we’ll build the Q-Q plot for the Weibull step by step.</p>
<section id="step-1-theoretical-quantiles" class="level4">
<h4 class="anchored" data-anchor-id="step-1-theoretical-quantiles">Step 1: Theoretical Quantiles</h4>
<p>Recall that the <img src="https://latex.codecogs.com/png.latex?p">-th quantile of a probability distribution is the value <img src="https://latex.codecogs.com/png.latex?t_p"> such that <img src="https://latex.codecogs.com/png.latex?P(T%20%5Cleq%20t_p)%20=%20p"> where <img src="https://latex.codecogs.com/png.latex?p%20%5Cin%20%5B0,%201%5D">. For instance, the 0.5 quantile (also called the 50th percentile) is the median. Let’s derive the theoretical quantiles of the Weibull distribution.</p>
<p>We want to find <img src="https://latex.codecogs.com/png.latex?t_p"> such that <img src="https://latex.codecogs.com/png.latex?F(t_p)%20=%20p">. If <img src="https://latex.codecogs.com/png.latex?T%20%5Csim%20%5Ctext%7BWeibull%7D(%5Ctheta,%20%5Cgamma)">, then:</p>
<p><img src="https://latex.codecogs.com/png.latex?1%20-%20%5Cexp%5Cleft%5C%7B-%5Cleft(%5Cfrac%7Bt_p%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma%5Cright%5C%7D%20=%20p"></p>
<p>Rearranging:</p>
<p><img src="https://latex.codecogs.com/png.latex?1%20-%20p%20=%20%5Cexp%5Cleft%5C%7B-%5Cleft(%5Cfrac%7Bt_p%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma%5Cright%5C%7D"></p>
<p>Taking logarithms of both sides:</p>
<p><img src="https://latex.codecogs.com/png.latex?-%5Cln(1-p)%20=%20%5Cleft(%5Cfrac%7Bt_p%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma"></p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cln%5Cleft(%5Cfrac%7B1%7D%7B1-p%7D%5Cright)%20=%20%5Cleft(%5Cfrac%7Bt_p%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma"></p>
<p>Taking the <img src="https://latex.codecogs.com/png.latex?%5Cgamma">-th root:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cboxed%7Bt_p%20=%20%5Ctheta%20%5Cleft%5B%5Cln%5Cleft(%5Cfrac%7B1%7D%7B1-p%7D%5Cright)%5Cright%5D%5E%7B1/%5Cgamma%7D%7D"></p>
<p>This is the quantile function (inverse CDF) of the Weibull distribution. Given a probability <img src="https://latex.codecogs.com/png.latex?p">, it tells us the time by which a fraction <img src="https://latex.codecogs.com/png.latex?p"> of systems are expected to have failed.</p>
<p>Here is a special value: <img src="https://latex.codecogs.com/png.latex?p%20=%201%20-%20%5Cfrac%7B1%7D%7Be%7D">. Substituting into the quantile function:</p>
<p><img src="https://latex.codecogs.com/png.latex?t_p%20=%20%5Ctheta%20%5Cleft%5B%5Cln%5Cleft(%5Cfrac%7B1%7D%7B1-(1-%5Cfrac%7B1%7D%7Be%7D)%7D%5Cright)%5Cright%5D%5E%7B1/%5Cgamma%7D%20=%20%5Ctheta%20%5Cleft%5B%5Cln%5Cleft(%5Cfrac%7B1%7D%7B%5Cfrac%7B1%7D%7Be%7D%7D%5Cright)%5Cright%5D%5E%7B1/%5Cgamma%7D%20=%20%5Ctheta%20%5Cleft%5B%5Cln(e)%5Cright%5D%5E%7B1/%5Cgamma%7D%20=%20%5Ctheta%20%5Ccdot%201%5E%7B1/%5Cgamma%7D%20=%20%5Ctheta"></p>
<p>So <img src="https://latex.codecogs.com/png.latex?t_%7B1-1/e%7D%20=%20%5Ctheta"> — the scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> is exactly the <img src="https://latex.codecogs.com/png.latex?(1%20-%201/e)%20%5Capprox%2063.2">th percentile of the Weibull distribution, regardless of the shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma">. This confirms what we noted earlier: by time <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, approximately 63.2% of systems will have failed.</p>
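<p>The boxed quantile function is one line of code. A minimal sketch, using the fitted values from the summary above (<code>lambda_</code> plays the role of <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and <code>rho_</code> of <img src="https://latex.codecogs.com/png.latex?%5Cgamma">):</p>

```python
import numpy as np

def weibull_quantile(p, theta, gamma):
    """Time t_p by which a fraction p of systems is expected to have failed."""
    return theta * np.log(1.0 / (1.0 - p)) ** (1.0 / gamma)

theta, gamma = 97.26, 1.86                         # fitted scale and shape
print(weibull_quantile(0.5, theta, gamma))         # median lifetime
print(weibull_quantile(1 - 1/np.e, theta, gamma))  # equals theta, for any gamma
```

<p>The second call confirms the special value derived above: at <img src="https://latex.codecogs.com/png.latex?p%20=%201%20-%201/e"> the quantile is the scale parameter itself, whatever the shape.</p>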
</section>
<section id="step-2-empirical-quantiles" class="level4">
<h4 class="anchored" data-anchor-id="step-2-empirical-quantiles">Step 2: Empirical Quantiles</h4>
<p>The empirical quantiles are the ones you get directly from your dataset.</p>
<p><strong>Step 2a: Sort your data.</strong> Arrange the observed failure times in ascending order:</p>
<p><img src="https://latex.codecogs.com/png.latex?t_%7B(1)%7D%20%5Cleq%20t_%7B(2)%7D%20%5Cleq%20%5Ccdots%20%5Cleq%20t_%7B(n)%7D"></p>
<p>The notation <img src="https://latex.codecogs.com/png.latex?t_%7B(i)%7D"> denotes the <img src="https://latex.codecogs.com/png.latex?i">-th order statistic — <img src="https://latex.codecogs.com/png.latex?t_%7B(1)%7D"> is the smallest observed failure time, <img src="https://latex.codecogs.com/png.latex?t_%7B(2)%7D"> is the next smallest, and <img src="https://latex.codecogs.com/png.latex?t_%7B(n)%7D"> is the largest.</p>
<p><strong>Step 2b: Assign plotting positions.</strong> The <img src="https://latex.codecogs.com/png.latex?i">-th order statistic <img src="https://latex.codecogs.com/png.latex?t_%7B(i)%7D"> corresponds to an approximate quantile level:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D_i%20=%20%5Cfrac%7Bi%20-%200.3%7D%7Bn%20+%200.4%7D"></p>
<p>This is known as the <strong>median rank formula</strong> (Bénard’s approximation to the median ranks). It estimates the probability level associated with the <img src="https://latex.codecogs.com/png.latex?i">-th smallest observation. The corrections <img src="https://latex.codecogs.com/png.latex?-0.3"> and <img src="https://latex.codecogs.com/png.latex?+0.4"> reduce bias compared to the naive estimate <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D_i%20=%20i/n">, particularly in the tails of the distribution.</p>
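<p>A quick numeric illustration of why the corrections matter, sketched with <code>n = 10</code>: the naive estimate reaches exactly 1 at the largest observation, which would map to an infinite theoretical quantile, while the median rank positions stay strictly inside (0, 1).</p>

```python
import numpy as np

n = 10
i = np.arange(1, n + 1)            # ranks of the sorted failure times

p_naive = i / n                    # hits exactly 1.0 at i = n
p_hat = (i - 0.3) / (n + 0.4)      # median rank plotting positions

print(p_naive[-1])                 # 1.0 -> infinite Weibull quantile
print(p_hat[0], p_hat[-1])         # both strictly inside (0, 1)
```
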
</section>
<section id="step-3-constructing-the-q-q-plot" class="level4">
<h4 class="anchored" data-anchor-id="step-3-constructing-the-q-q-plot">Step 3: Constructing the Q-Q Plot</h4>
<p><strong>Step 3a:</strong> Fit the Weibull distribution to get the estimated parameters <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cgamma%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D">.</p>
<p><strong>Step 3b:</strong> For each data point <img src="https://latex.codecogs.com/png.latex?i%20=%201,%202,%20%5Cldots,%20n">:</p>
<ul>
<li>The empirical quantile is <img src="https://latex.codecogs.com/png.latex?t_%7B(i)%7D"> — the <img src="https://latex.codecogs.com/png.latex?i">-th ordered failure time from the data.</li>
<li>The approximate quantile level is <img src="https://latex.codecogs.com/png.latex?%5Chat%7Bp%7D_i%20=%20%5Cfrac%7Bi%20-%200.3%7D%7Bn%20+%200.4%7D"> from the median rank formula.</li>
<li>The theoretical quantile is:</li>
</ul>
<p><img src="https://latex.codecogs.com/png.latex?q_%7B%5Chat%7Bp%7D_i%7D%20=%20%5Chat%7B%5Ctheta%7D%5Cleft%5B%5Cln%5Cleft(%5Cfrac%7B1%7D%7B1%20-%20%5Chat%7Bp%7D_i%7D%5Cright)%5Cright%5D%5E%7B1/%5Chat%7B%5Cgamma%7D%7D"></p>
<p><strong>Step 3c:</strong> Plot the pairs <img src="https://latex.codecogs.com/png.latex?(q_%7B%5Chat%7Bp%7D_i%7D,%5C%20t_%7B(i)%7D)"> for each <img src="https://latex.codecogs.com/png.latex?i%20=%201,%202,%20%5Cldots,%20n">.</p>
<p><strong>What to look for:</strong></p>
<ul>
<li><strong>Straight line through the origin with slope 1</strong> — perfect fit. The theoretical and empirical quantiles agree at every probability level.</li>
<li><strong>Points above the diagonal</strong> — the data has heavier tails than the Weibull predicts. Some machines lasted much longer than expected.</li>
<li><strong>Points below the diagonal</strong> — the data has lighter tails than the Weibull predicts. Failures are happening earlier than the model expects.</li>
<li><strong>S-shaped curve</strong> — the data comes from a mixed distribution or the shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> is wrong.</li>
</ul>
<p>We will make these observations more precise and illuminating with a worked example shortly. Do not panic. For now we construct the Q-Q plot using only the observed failures, ignoring censored observations. This is a simplification — a more rigorous approach uses the Kaplan-Meier estimator for the empirical quantiles, which we will introduce in Part 4.</p>
<p>Let’s see how well our fitted Weibull describes the machine fleet data. Here is the Q-Q plot using the 886 observed failures.</p>
<div id="cell-fig-qqplot-fleet" class="cell" data-execution_count="5">
<div class="cell-output cell-output-display">
<div id="fig-qqplot-fleet" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-qqplot-fleet-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://madhavpr191221.github.io/blog/posts/part-3-fitting-survival-distributions/index_files/figure-html/fig-qqplot-fleet-output-1.png" width="663" height="662" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-qqplot-fleet-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Q-Q plot: fitted Weibull vs observed failure times (machine fleet)
</figcaption>
</figure>
</div>
</div>
</div>
<p>The Q-Q plot tells a clear story. Most points fall below the <img src="https://latex.codecogs.com/png.latex?y%20=%20x"> diagonal — the empirical quantiles are smaller than the theoretical quantiles at the same probability level, i.e., <img src="https://latex.codecogs.com/png.latex?t_%7B(i)%7D%20%3C%20q_%7B%5Chat%7Bp%7D_i%7D">. What does that mean mathematically? Since the CDF <img src="https://latex.codecogs.com/png.latex?F"> is monotone increasing, this implies:</p>
<p><img src="https://latex.codecogs.com/png.latex?F(t_%7B(i)%7D)%20%3C%20F(q_%7B%5Chat%7Bp%7D_i%7D)%20=%20%5Chat%7Bp%7D_i%20%5Cquad%20%5CLongleftrightarrow%20%5Cquad%20S(t_%7B(i)%7D)%20%3E%201%20-%20%5Chat%7Bp%7D_i"></p>
<p>The fitted Weibull assigns too high a survival probability at the actual observed failure times: it thinks more machines should still be running than actually are. In simple words, the Weibull estimator is too optimistic. It predicts machines will last longer than they actually do.</p>
<p>This is the signature of a heterogeneous fleet being forced into a single distribution. Each machine has its own true <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> driven by its covariates — usage, age, temperature, and so on. A single Weibull cannot capture all of that simultaneously, and the Q-Q plot is telling us exactly that. This is precisely why survival regression exists, and we will explore it in later parts.</p>
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-1-contents" aria-controls="callout-1" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>A note on other Q-Q plot patterns
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-1" class="callout-1-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<ul>
<li><p><strong>Points above the diagonal</strong> — <img src="https://latex.codecogs.com/png.latex?t_%7B(i)%7D%20%3E%20q_%7B%5Chat%7Bp%7D_i%7D">, which implies <img src="https://latex.codecogs.com/png.latex?F(t_%7B(i)%7D)%20%3E%20%5Chat%7Bp%7D_i"> and <img src="https://latex.codecogs.com/png.latex?S(t_%7B(i)%7D)%20%3C%201%20-%20%5Chat%7Bp%7D_i">. The Weibull underestimates survival — it predicts more failures by time <img src="https://latex.codecogs.com/png.latex?t_%7B(i)%7D"> than actually occurred. The data has heavier tails than the model expects.</p></li>
<li><p><strong>S-shaped curve</strong> — points below the diagonal in the lower tail and above in the upper tail (or vice versa). The shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> is likely misspecified, or the data comes from a mixture of two distinct populations with different failure regimes.</p></li>
</ul>
</div>
</div>
</div>
</section>
</section>
<section id="empirical-cdf-and-the-binomial-connection" class="level3">
<h3 class="anchored" data-anchor-id="empirical-cdf-and-the-binomial-connection">Empirical CDF and the Binomial Connection</h3>
<p>I cannot help but talk about a beautiful connection between the empirical CDF and the Binomial distribution. Here is how it goes.</p>
<p>Given data <img src="https://latex.codecogs.com/png.latex?t_1,%20t_2,%20%5Cldots,%20t_n">, the empirical CDF estimates the true CDF from the data alone — no distributional assumptions, no parameters to fit. For any time <img src="https://latex.codecogs.com/png.latex?t">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t)%20=%20%5Cfrac%7B%5Ctext%7Bnumber%20of%20observations%7D%20%5Cleq%20t%7D%7Bn%7D"></p>
<p>Using indicator notation, this can be written compactly as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t)%20=%20%5Cfrac%7B1%7D%7Bn%7D%20%5Csum_%7Bi=1%7D%5E%7Bn%7D%20%5Cmathbf%7B1%7D%5C%7Bt_i%20%5Cleq%20t%5C%7D"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7B1%7D%5C%7Bt_i%20%5Cleq%20t%5C%7D"> is 1 if the <img src="https://latex.codecogs.com/png.latex?i">-th observation is less than or equal to <img src="https://latex.codecogs.com/png.latex?t">, and 0 otherwise.</p>
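<p>The definition translates directly into code. A minimal sketch, using the five failure times from the worked example below:</p>

```python
import numpy as np

def ecdf(data, t):
    """Empirical CDF at time t: fraction of observations <= t."""
    data = np.asarray(data, dtype=float)
    return np.count_nonzero(data <= t) / data.size

times = [200, 350, 400, 550, 600]   # observed failure times (hours)
print(ecdf(times, 100))   # 0.0  (no failures yet)
print(ecdf(times, 400))   # 0.6  (three of five machines have failed)
print(ecdf(times, 700))   # 1.0  (all machines have failed)
```
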
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-2-contents" aria-controls="callout-2" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Indicator Notation
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-2" class="callout-2-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>The indicator function <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7B1%7D%5C%7BA%5C%7D"> for an event <img src="https://latex.codecogs.com/png.latex?A"> is defined as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7B1%7D%5C%7BA%5C%7D%20=%20%5Cbegin%7Bcases%7D%201%20&amp;%20%5Ctext%7Bif%20%7D%20A%20%5Ctext%7B%20occurs%7D%20%5C%5C%200%20&amp;%20%5Ctext%7Botherwise%7D%20%5Cend%7Bcases%7D"></p>
<p>An equivalent notation uses a set <img src="https://latex.codecogs.com/png.latex?A"> and a point <img src="https://latex.codecogs.com/png.latex?x">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7B1%7D_A(x)%20=%20%5Cbegin%7Bcases%7D%201%20&amp;%20%5Ctext%7Bif%20%7D%20x%20%5Cin%20A%20%5C%5C%200%20&amp;%20%5Ctext%7Botherwise%7D%20%5Cend%7Bcases%7D"></p>
<p>In our case, <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7B1%7D%5C%7Bt_i%20%5Cleq%20t%5C%7D"> is shorthand for <img src="https://latex.codecogs.com/png.latex?%5Cmathbf%7B1%7D_%7B(-%5Cinfty,%20t%5D%7D(t_i)"> — it equals 1 if the observation <img src="https://latex.codecogs.com/png.latex?t_i"> falls in the set <img src="https://latex.codecogs.com/png.latex?(-%5Cinfty,%20t%5D">, and 0 otherwise.</p>
</div>
</div>
</div>
<p>Here is a simple example. Consider five observed failure times: <img src="https://latex.codecogs.com/png.latex?t_1%20=%20200,%5C%20t_2%20=%20350,%5C%20t_3%20=%20400,%5C%20t_4%20=%20550,%5C%20t_5%20=%20600"> hours.</p>
<ul>
<li>For <img src="https://latex.codecogs.com/png.latex?t%20%3C%20200">: no failures yet, so <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_5(t)%20=%200"></li>
<li>For <img src="https://latex.codecogs.com/png.latex?200%20%5Cleq%20t%20%3C%20350">: one failure observed, so <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_5(t)%20=%20%5Cfrac%7B1%7D%7B5%7D%20=%200.20"></li>
<li>For <img src="https://latex.codecogs.com/png.latex?350%20%5Cleq%20t%20%3C%20400">: two failures observed, so <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_5(t)%20=%20%5Cfrac%7B2%7D%7B5%7D%20=%200.40"></li>
<li>For <img src="https://latex.codecogs.com/png.latex?400%20%5Cleq%20t%20%3C%20550">: three failures observed, so <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_5(t)%20=%20%5Cfrac%7B3%7D%7B5%7D%20=%200.60"></li>
<li>For <img src="https://latex.codecogs.com/png.latex?550%20%5Cleq%20t%20%3C%20600">: four failures observed, so <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_5(t)%20=%20%5Cfrac%7B4%7D%7B5%7D%20=%200.80"></li>
<li>For <img src="https://latex.codecogs.com/png.latex?t%20%5Cgeq%20600">: all five failures observed, so <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_5(t)%20=%20%5Cfrac%7B5%7D%7B5%7D%20=%201.0"></li>
</ul>
<p>The empirical CDF is a step function — it jumps by <img src="https://latex.codecogs.com/png.latex?%5Cfrac%7B1%7D%7Bn%7D"> at each observed failure time and stays flat in between. Here is a plot of the empirical CDF for this example.</p>
<div id="cell-fig-ecdf-example" class="cell" data-execution_count="6">
<div class="cell-output cell-output-display">
<div id="fig-ecdf-example" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-ecdf-example-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://madhavpr191221.github.io/blog/posts/part-3-fitting-survival-distributions/index_files/figure-html/fig-ecdf-example-output-1.png" width="756" height="470" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-ecdf-example-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: Empirical CDF for a simple example with 5 failure times
</figcaption>
</figure>
</div>
</div>
</div>
<p>Each jump represents one observed failure. The empirical CDF makes no assumptions about the underlying distribution — it simply counts. This is what makes it so powerful as a diagnostic tool, and as you will see in Part 4, it is also the foundation of the Kaplan-Meier estimator.</p>
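<p>The counting logic above translates directly into code. A minimal sketch (the function name <code>ecdf</code> is ours, not from any library), using the five failure times from the worked example:</p>

```python
import numpy as np

def ecdf(times, t):
    """Empirical CDF: fraction of observations less than or equal to t."""
    times = np.asarray(times)
    return np.mean(times <= t)

failure_times = [200, 350, 400, 550, 600]  # hours, from the worked example

for t in [100, 250, 375, 450, 575, 700]:
    print(f"F_hat({t}) = {ecdf(failure_times, t):.2f}")
```

<p>Evaluating at points between the jumps reproduces the values in the bullet list: 0.00 before 200 hours, 0.20, 0.40, 0.60, 0.80 in the intermediate intervals, and 1.00 from 600 hours onward.</p>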
<section id="the-binomial-connection" class="level4">
<h4 class="anchored" data-anchor-id="the-binomial-connection">The Binomial Connection</h4>
<p>Fix a time value <img src="https://latex.codecogs.com/png.latex?t">. For the <img src="https://latex.codecogs.com/png.latex?i">-th observation, with corresponding random variable <img src="https://latex.codecogs.com/png.latex?T_i">, define the indicator random variable:</p>
<p><img src="https://latex.codecogs.com/png.latex?X_i%20=%20%5Cbegin%7Bcases%7D%201%20&amp;%20%5Ctext%7Bif%20%7D%20T_i%20%5Cleq%20t%20%5C%5C%200%20&amp;%20%5Ctext%7Botherwise%7D%20%5Cend%7Bcases%7D"></p>
<p>Since <img src="https://latex.codecogs.com/png.latex?X_i"> is a random variable, it has an associated probability. What is <img src="https://latex.codecogs.com/png.latex?P(X_i%20=%201)">? It is simply the probability that <img src="https://latex.codecogs.com/png.latex?T_i%20%5Cleq%20t">, which is exactly the CDF of <img src="https://latex.codecogs.com/png.latex?T_i"> evaluated at <img src="https://latex.codecogs.com/png.latex?t">:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(X_i%20=%201)%20=%20P(T_i%20%5Cleq%20t)%20=%20F(t)"></p>
<p>Doesn’t this remind you of a coin toss? If we think of “failure by time <img src="https://latex.codecogs.com/png.latex?t">” as analogous to obtaining heads, then <img src="https://latex.codecogs.com/png.latex?F(t)"> plays the role of the probability of heads. The indicator <img src="https://latex.codecogs.com/png.latex?X_i"> is therefore a Bernoulli random variable with parameter <img src="https://latex.codecogs.com/png.latex?F(t)">:</p>
<p><img src="https://latex.codecogs.com/png.latex?X_i%20%5Csim%20%5Ctext%7BBernoulli%7D(F(t))"></p>
<p>Now let <img src="https://latex.codecogs.com/png.latex?X%20=%20%5Csum_%7Bi=1%7D%5En%20X_i">. This counts the total number of observations that have failed by time <img src="https://latex.codecogs.com/png.latex?t">. Since the failure times <img src="https://latex.codecogs.com/png.latex?T_1,%20T_2,%20%5Cldots,%20T_n"> are independent and identically distributed, the indicators <img src="https://latex.codecogs.com/png.latex?X_1,%20X_2,%20%5Cldots,%20X_n"> are independent Bernoulli<img src="https://latex.codecogs.com/png.latex?(F(t))"> random variables. A sum of <img src="https://latex.codecogs.com/png.latex?n"> independent Bernoulli<img src="https://latex.codecogs.com/png.latex?(p)"> random variables follows a Binomial<img src="https://latex.codecogs.com/png.latex?(n,%20p)"> distribution, so:</p>
<p><img src="https://latex.codecogs.com/png.latex?X%20%5Csim%20%5Ctext%7BBinomial%7D(n,%5C%20F(t))"></p>
<p>The expectation and variance of <img src="https://latex.codecogs.com/png.latex?X"> are:</p>
<p><img src="https://latex.codecogs.com/png.latex?E%5BX%5D%20=%20n%20%5Ccdot%20F(t),%20%5Cqquad%20%5Ctext%7BVar%7D%5BX%5D%20=%20n%20%5Ccdot%20F(t)%20%5Ccdot%20(1%20-%20F(t))%20=%20n%20%5Ccdot%20F(t)%20%5Ccdot%20S(t)"></p>
</section>
<section id="connection-with-the-empirical-cdf" class="level4">
<h4 class="anchored" data-anchor-id="connection-with-the-empirical-cdf">Connection with the Empirical CDF</h4>
<p>The empirical CDF is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t)%20=%20%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi=1%7D%5En%20%5Cmathbf%7B1%7D%5C%7Bt_i%20%5Cleq%20t%5C%7D%20=%20%5Cfrac%7B1%7D%7Bn%7D%5Csum_%7Bi=1%7D%5En%20X_i%20=%20%5Cfrac%7BX%7D%7Bn%7D"></p>
<p>Since <img src="https://latex.codecogs.com/png.latex?X%20%5Csim%20%5Ctext%7BBinomial%7D(n,%20F(t))">, we have:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(X%20=%20k)%20=%20%5Cbinom%7Bn%7D%7Bk%7D%20%5BF(t)%5D%5Ek%20%5B1%20-%20F(t)%5D%5E%7Bn-k%7D"></p>
<p><img src="https://latex.codecogs.com/png.latex?X"> takes integer values <img src="https://latex.codecogs.com/png.latex?%5C%7B0,%201,%20%5Cldots,%20n%5C%7D">, so <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t)%20=%20X/n"> takes values <img src="https://latex.codecogs.com/png.latex?%5Cleft%5C%7B0,%20%5Cfrac%7B1%7D%7Bn%7D,%20%5Cfrac%7B2%7D%7Bn%7D,%20%5Cldots,%201%5Cright%5C%7D"> — and the difference between consecutive values is exactly the jump size <img src="https://latex.codecogs.com/png.latex?%5Cfrac%7B1%7D%7Bn%7D"> we observed in the step function plot. The probability that the empirical CDF takes the value <img src="https://latex.codecogs.com/png.latex?k/n"> is:</p>
<p><img src="https://latex.codecogs.com/png.latex?P%5Cleft(%5Chat%7BF%7D_n(t)%20=%20%5Cfrac%7Bk%7D%7Bn%7D%5Cright)%20=%20P(X%20=%20k)%20=%20%5Cbinom%7Bn%7D%7Bk%7D%20%5BF(t)%5D%5Ek%20%5B1%20-%20F(t)%5D%5E%7Bn-k%7D"></p>
<p>The moments of <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t)"> follow directly from those of <img src="https://latex.codecogs.com/png.latex?X">:</p>
<p><img src="https://latex.codecogs.com/png.latex?E%5B%5Chat%7BF%7D_n(t)%5D%20=%20%5Cfrac%7BE%5BX%5D%7D%7Bn%7D%20=%20%5Cfrac%7Bn%20%5Ccdot%20F(t)%7D%7Bn%7D%20=%20F(t)"></p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D%5B%5Chat%7BF%7D_n(t)%5D%20=%20%5Cfrac%7B%5Ctext%7BVar%7D%5BX%5D%7D%7Bn%5E2%7D%20=%20%5Cfrac%7Bn%20%5Ccdot%20F(t)%20%5Ccdot%20S(t)%7D%7Bn%5E2%7D%20=%20%5Cfrac%7BF(t)%20%5Ccdot%20S(t)%7D%7Bn%7D%20=%20%5Cfrac%7BF(t)(1%20-%20F(t))%7D%7Bn%7D"></p>
<p>Two beautiful results — the empirical CDF is an <strong>unbiased estimator</strong> of the true CDF <img src="https://latex.codecogs.com/png.latex?F(t)">, and its variance shrinks as <img src="https://latex.codecogs.com/png.latex?n"> grows. The larger the dataset, the more precisely <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t)"> estimates <img src="https://latex.codecogs.com/png.latex?F(t)"> at every time point <img src="https://latex.codecogs.com/png.latex?t">.</p>
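<p>A quick simulation makes both results concrete. This is a sketch under an assumed true distribution (an Exponential with unit scale, chosen purely for convenience — any CDF works the same way):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, reps, t = 50, 20_000, 1.0
F_t = 1 - np.exp(-t)  # true CDF of an Exponential(1) evaluated at t

# Simulate `reps` datasets of size n; X counts the failures by time t in each
X = (rng.exponential(scale=1.0, size=(reps, n)) <= t).sum(axis=1)
F_hat = X / n

print(f"E[F_hat(t)]   ~ {F_hat.mean():.4f}   (theory: F(t) = {F_t:.4f})")
print(f"Var[F_hat(t)] ~ {F_hat.var():.6f} (theory: F(t)(1-F(t))/n = {F_t*(1-F_t)/n:.6f})")

# The counts X themselves follow Binomial(n, F(t))
k = 32
print(f"P(X = {k}): empirical {np.mean(X == k):.4f}, "
      f"Binomial pmf {stats.binom.pmf(k, n, F_t):.4f}")
```

<p>The empirical mean and variance of <code>F_hat</code> land on top of the theoretical values, and the distribution of the count matches the Binomial pmf.</p>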
</section>
</section>
<section id="the-kolmogorov-smirnov-k-s-test" class="level3">
<h3 class="anchored" data-anchor-id="the-kolmogorov-smirnov-k-s-test">The Kolmogorov-Smirnov (K-S) Test</h3>
<p>The core idea of the K-S test is simple. You have two CDFs:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t)"> — the empirical CDF, computed from the data</li>
<li><img src="https://latex.codecogs.com/png.latex?F(t;%20%5Chat%7B%5Ctheta%7D,%20%5Chat%7B%5Cgamma%7D)"> — the fitted Weibull CDF</li>
</ul>
<p>At every point <img src="https://latex.codecogs.com/png.latex?t">, these two curves have some vertical distance between them. The K-S test statistic is simply the largest such distance:</p>
<p><img src="https://latex.codecogs.com/png.latex?D_n%20=%20%5Csup_t%20%5Cleft%7C%5Chat%7BF%7D_n(t)%20-%20F(t;%20%5Chat%7B%5Ctheta%7D,%20%5Chat%7B%5Cgamma%7D)%5Cright%7C"></p>
<p><img src="https://latex.codecogs.com/png.latex?D_n%20=%20%5C%7C%5Chat%7BF%7D_n%20-%20F(%5Ccdot%5C,%20;%5Chat%7B%5Ctheta%7D,%20%5Chat%7B%5Cgamma%7D)%5C%7C_%5Cinfty"></p>
<p>where <img src="https://latex.codecogs.com/png.latex?%5Csup"> stands for supremum — the least upper bound of a set of real numbers that is bounded above, i.e., a set for which there exists some number <img src="https://latex.codecogs.com/png.latex?M"> with every element <img src="https://latex.codecogs.com/png.latex?%5Cleq%20M"> (any such <img src="https://latex.codecogs.com/png.latex?M"> is called an upper bound). The axiom of completeness states that any nonempty set of real numbers that is bounded above has a supremum. If you are not familiar with the supremum, replace it with the maximum and you will be fine for all practical purposes.</p>
<p>Intuitively, if the fit is perfect, <img src="https://latex.codecogs.com/png.latex?D_n%20=%200">. If the fit is terrible, <img src="https://latex.codecogs.com/png.latex?D_n"> is large.</p>
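<p>Because the empirical CDF is a step function and the fitted CDF is continuous, the supremum is attained just before or at one of the jump points, so computing <img src="https://latex.codecogs.com/png.latex?D_n"> reduces to a maximum over <img src="https://latex.codecogs.com/png.latex?2n"> one-sided gaps. A sketch on synthetic data (the Exponential here is purely illustrative, with its parameters treated as known):</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
data = np.sort(rng.exponential(scale=2.0, size=200))
n = len(data)

# Hypothesized CDF with *known* parameters, evaluated at the sorted data
F = stats.expon(scale=2.0).cdf(data)

# The ECDF equals i/n just after the i-th point and (i-1)/n just before it,
# so the supremum is the largest of these 2n one-sided gaps
i = np.arange(1, n + 1)
D_manual = np.max(np.maximum(i / n - F, F - (i - 1) / n))

D_scipy = stats.kstest(data, 'expon', args=(0, 2.0)).statistic
print(f"manual D_n = {D_manual:.6f}, scipy D_n = {D_scipy:.6f}")
```

<p>The hand-rolled maximum agrees with <code>scipy.stats.kstest</code>, which uses exactly this reduction internally.</p>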
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-3-contents" aria-controls="callout-3" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>The Infinity Norm
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-3" class="callout-3-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>A function <img src="https://latex.codecogs.com/png.latex?f"> on a domain <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BD%7D"> is said to be <strong>bounded</strong> if there exists a real number <img src="https://latex.codecogs.com/png.latex?M%20%3E%200"> such that <img src="https://latex.codecogs.com/png.latex?%7Cf(x)%7C%20%5Cleq%20M"> for all <img src="https://latex.codecogs.com/png.latex?x%20%5Cin%20%5Cmathcal%7BD%7D">.</p>
<p>The infinity norm (or sup norm) of a bounded function <img src="https://latex.codecogs.com/png.latex?f"> on a domain <img src="https://latex.codecogs.com/png.latex?%5Cmathcal%7BD%7D"> is defined as:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5C%7Cf%5C%7C_%5Cinfty%20=%20%5Csup_%7Bx%20%5Cin%20%5Cmathcal%7BD%7D%7D%20%7Cf(x)%7C"></p>
<p>The infinity norm naturally induces a metric between two functions <img src="https://latex.codecogs.com/png.latex?f"> and <img src="https://latex.codecogs.com/png.latex?g">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5C%7Cf%20-%20g%5C%7C_%5Cinfty%20=%20%5Csup_%7Bx%20%5Cin%20%5Cmathcal%7BD%7D%7D%20%7Cf(x)%20-%20g(x)%7C"></p>
<p>It measures the <strong>largest</strong> absolute value the function attains over its entire domain — the worst-case deviation. In our case, <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n%20-%20F(%5Ccdot%5C,;%5Chat%7B%5Ctheta%7D,%5Chat%7B%5Cgamma%7D)"> is a function of time <img src="https://latex.codecogs.com/png.latex?t">, and <img src="https://latex.codecogs.com/png.latex?D_n%20=%20%5C%7C%5Chat%7BF%7D_n%20-%20F%5C%7C_%5Cinfty"> is its largest absolute value over all <img src="https://latex.codecogs.com/png.latex?t%20%5Cgeq%200">. The infinity norm is the natural norm on the space of bounded functions <img src="https://latex.codecogs.com/png.latex?B(%5Cmathbb%7BR%7D)"> and plays a central role in functional analysis and approximation theory.</p>
<p>The infinity norm is indeed a norm — it satisfies non-negativity, homogeneity, and the triangle inequality. Verifying these properties is a standard exercise in real analysis. The triangle inequality in particular:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5C%7Cf%20+%20g%5C%7C_%5Cinfty%20%5Cleq%20%5C%7Cf%5C%7C_%5Cinfty%20+%20%5C%7Cg%5C%7C_%5Cinfty"></p>
<p>follows directly from <img src="https://latex.codecogs.com/png.latex?%7Cf(x)%20+%20g(x)%7C%20%5Cleq%20%7Cf(x)%7C%20+%20%7Cg(x)%7C"> and taking the supremum on both sides. In fact, the space of bounded functions on <img src="https://latex.codecogs.com/png.latex?%5Cmathbb%7BR%7D"> equipped with the infinity norm, denoted <img src="https://latex.codecogs.com/png.latex?B(%5Cmathbb%7BR%7D)">, is not just a normed vector space but a <strong>Banach space</strong> — a complete normed vector space where every Cauchy sequence converges. This completeness property is what makes the infinity norm so powerful in analysis.</p>
</div>
</div>
</div>
<section id="the-hypothesis-test" class="level4">
<h4 class="anchored" data-anchor-id="the-hypothesis-test">The Hypothesis Test</h4>
<p><img src="https://latex.codecogs.com/png.latex?H_0:%20%5Ctext%7Bthe%20data%20comes%20from%20%7D%20F(t;%20%5Chat%7B%5Ctheta%7D,%20%5Chat%7B%5Cgamma%7D)"> <img src="https://latex.codecogs.com/png.latex?H_1:%20%5Ctext%7Bit%20doesn't%7D"></p>
<p>If <img src="https://latex.codecogs.com/png.latex?D_n"> is large enough — larger than what you would expect by chance if <img src="https://latex.codecogs.com/png.latex?H_0"> is true — you reject the null.</p>
</section>
<section id="distribution-of-d_n" class="level4">
<h4 class="anchored" data-anchor-id="distribution-of-d_n">Distribution of <img src="https://latex.codecogs.com/png.latex?D_n"></h4>
<p>Without going into the depths of hell, we simply state that under <img src="https://latex.codecogs.com/png.latex?H_0"> with <strong>known parameters</strong>, <img src="https://latex.codecogs.com/png.latex?%5Csqrt%7Bn%7D%20%5Ccdot%20D_n"> converges in distribution to the Kolmogorov distribution <img src="https://latex.codecogs.com/png.latex?K">. The CDF of <img src="https://latex.codecogs.com/png.latex?K"> has the closed form:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(K%20%5Cleq%20x)%20=%201%20-%202%5Csum_%7Bk=1%7D%5E%7B%5Cinfty%7D%20(-1)%5E%7Bk-1%7D%20e%5E%7B-2k%5E2x%5E2%7D"></p>
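<p>The alternating series converges very quickly — a handful of terms suffices for moderate <img src="https://latex.codecogs.com/png.latex?x">. As a sanity check, a truncated version can be compared against SciPy (<code>scipy.special.kolmogorov</code> returns the complementary CDF <img src="https://latex.codecogs.com/png.latex?P(K%20%3E%20x)">):</p>

```python
import numpy as np
from scipy.special import kolmogorov

def kolmogorov_cdf(x, terms=100):
    """P(K <= x) via the truncated alternating series."""
    k = np.arange(1, terms + 1)
    return 1 - 2 * np.sum((-1.0) ** (k - 1) * np.exp(-2 * k**2 * x**2))

for x in [0.5, 1.0, 1.36, 2.0]:
    print(f"x = {x}: series = {kolmogorov_cdf(x):.6f}, "
          f"scipy = {1 - kolmogorov(x):.6f}")
```

<p>The value <img src="https://latex.codecogs.com/png.latex?x%20%5Capprox%201.36"> is the familiar 5% critical point, where the CDF is approximately 0.95.</p>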
</section>
<section id="a-critical-caveat" class="level4">
<h4 class="anchored" data-anchor-id="a-critical-caveat">A Critical Caveat</h4>
<p>The standard K-S test assumes the parameters <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> are <strong>known in advance</strong> — not estimated from the same data. In our case, we estimated <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Ctheta%7D"> and <img src="https://latex.codecogs.com/png.latex?%5Chat%7B%5Cgamma%7D"> from the machine fleet data using MLE, and then used those estimates to construct <img src="https://latex.codecogs.com/png.latex?F(t;%20%5Chat%7B%5Ctheta%7D,%20%5Chat%7B%5Cgamma%7D)">. This makes <img src="https://latex.codecogs.com/png.latex?D_n"> artificially small — the fitted distribution has already been pulled toward the data, so the two curves are closer than they would be with truly known parameters. As a result, the standard K-S p-values are too optimistic and should not be taken at face value.</p>
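<p>One standard remedy is a parametric bootstrap in the spirit of the Lilliefors test: simulate datasets from the fitted model, re-estimate the parameters on each simulated dataset, and recompute <img src="https://latex.codecogs.com/png.latex?D_n">, so that the null distribution reflects the estimation step. A sketch on synthetic Exponential data, kept deliberately small and simple — this is not the fleet analysis itself:</p>

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
data = rng.exponential(scale=3.0, size=100)

# Fit by MLE (for the Exponential, the MLE of the scale is the sample mean)
scale_hat = data.mean()
D_obs = stats.kstest(data, 'expon', args=(0, scale_hat)).statistic

# Parametric bootstrap: simulate from the *fitted* model, re-fit, recompute D_n
B = 500
D_boot = np.empty(B)
for b in range(B):
    sim = rng.exponential(scale=scale_hat, size=len(data))
    D_boot[b] = stats.kstest(sim, 'expon', args=(0, sim.mean())).statistic

# Honest p-value that accounts for parameters estimated from the data
p_boot = np.mean(D_boot >= D_obs)
print(f"D_obs = {D_obs:.4f}, bootstrap p-value = {p_boot:.3f}")
```

<p>The bootstrap p-value is typically larger than the one from the standard K-S tables, because the null distribution of <img src="https://latex.codecogs.com/png.latex?D_n"> shifts toward zero once the estimation step is simulated too.</p>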
<p>For our purposes, we use the K-S test as a descriptive tool — a way to quantify how close the fit is — rather than as a strict hypothesis test. The Q-Q plot already told us the story visually. The K-S statistic puts a number on it. Let’s see via code what that number looks like.</p>
<div id="ks-test-fleet" class="cell" data-execution_count="7">
<div class="code-copy-outer-scaffold"><div class="sourceCode cell-code" id="cb5" style="background: #f1f3f5;"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> numpy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> np</span>
<span id="cb5-2"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> pandas <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">as</span> pd</span>
<span id="cb5-3"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> lifelines <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> WeibullFitter</span>
<span id="cb5-4"><span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">from</span> scipy <span class="im" style="color: #00769E;
background-color: null;
font-style: inherit;">import</span> stats</span>
<span id="cb5-5"></span>
<span id="cb5-6">df <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> pd.read_csv(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'../../data/machine_fleet.csv'</span>)</span>
<span id="cb5-7"></span>
<span id="cb5-8">wf <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> WeibullFitter()</span>
<span id="cb5-9">wf.fit(df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'observed_time'</span>], event_observed<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'event_observed'</span>])</span>
<span id="cb5-10"></span>
<span id="cb5-11">gamma_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> wf.rho_</span>
<span id="cb5-12">theta_hat <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> wf.lambda_</span>
<span id="cb5-13"></span>
<span id="cb5-14"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Use only observed failures</span></span>
<span id="cb5-15">t_obs <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> df[df[<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'event_observed'</span>] <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">==</span> <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">1</span>][<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'observed_time'</span>].values</span>
<span id="cb5-16"></span>
<span id="cb5-17"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># K-S test against fitted Weibull</span></span>
<span id="cb5-18"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># scipy uses the standard Weibull parameterization</span></span>
<span id="cb5-19"><span class="co" style="color: #5E5E5E;
background-color: null;
font-style: inherit;"># Weibull(c, scale) where c = gamma, scale = theta</span></span>
<span id="cb5-20">ks_stat, p_value <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span> stats.kstest(</span>
<span id="cb5-21">    t_obs,</span>
<span id="cb5-22">    <span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">'weibull_min'</span>,</span>
<span id="cb5-23">    args<span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">=</span>(gamma_hat, <span class="dv" style="color: #AD0000;
background-color: null;
font-style: inherit;">0</span>, theta_hat)</span>
<span id="cb5-24">)</span>
<span id="cb5-25"></span>
<span id="cb5-26"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"K-S statistic  : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ks_stat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.4f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb5-27"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"p-value        : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>p_value<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.4f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb5-28"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"Sample size    : </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">len</span>(t_obs)<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">"</span>)</span>
<span id="cb5-29"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>()</span>
<span id="cb5-30"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Interpretation:"</span>)</span>
<span id="cb5-31"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">if</span> p_value <span class="op" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">&lt;</span> <span class="fl" style="color: #AD0000;
background-color: null;
font-style: inherit;">0.05</span>:</span>
<span id="cb5-32">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  D_n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ks_stat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.4f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> — reject H0 at 5% significance."</span>)</span>
<span id="cb5-33">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"  The fitted Weibull does not describe the data well."</span>)</span>
<span id="cb5-34"><span class="cf" style="color: #003B4F;
background-color: null;
font-weight: bold;
font-style: inherit;">else</span>:</span>
<span id="cb5-35">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;">f"  D_n = </span><span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">{</span>ks_stat<span class="sc" style="color: #5E5E5E;
background-color: null;
font-style: inherit;">:.4f}</span><span class="ss" style="color: #20794D;
background-color: null;
font-style: inherit;"> — fail to reject H0 at 5% significance."</span>)</span>
<span id="cb5-36">    <span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"  The fitted Weibull is a reasonable description of the data."</span>)</span>
<span id="cb5-37"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>()</span>
<span id="cb5-38"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"Note: p-value is anti-conservative since parameters were"</span>)</span>
<span id="cb5-39"><span class="bu" style="color: null;
background-color: null;
font-style: inherit;">print</span>(<span class="st" style="color: #20794D;
background-color: null;
font-style: inherit;">"estimated from the same data. Use as descriptive tool only."</span>)</span></code></pre></div></div>
<div class="cell-output cell-output-stdout">
<pre><code>K-S statistic  : 0.0669
p-value        : 0.0007
Sample size    : 886

Interpretation:
  D_n = 0.0669 — reject H0 at 5% significance.
  The fitted Weibull does not describe the data well.

Note: p-value is anti-conservative since parameters were
estimated from the same data. Use as descriptive tool only.</code></pre>
</div>
</div>
<p><img src="https://latex.codecogs.com/png.latex?D_n%20=%200.067"> and <img src="https://latex.codecogs.com/png.latex?p%20=%200.0007"> are both computed from the same test statistic, but they answer different questions. <img src="https://latex.codecogs.com/png.latex?D_n"> measures the size of the deviation — 6.7 percentage points at most, which is practically small. The p-value measures whether a deviation this large is surprising given the sample size. With <img src="https://latex.codecogs.com/png.latex?n%20=%20886">, even a small <img src="https://latex.codecogs.com/png.latex?D_n"> becomes highly significant — the test has enough power to confidently declare the fit imperfect. This is a well known phenomenon in hypothesis testing — with large sample sizes, even small and practically irrelevant deviations from the null become statistically significant. The test is not broken. It is doing exactly what it is designed to do: detect any deviation from <img src="https://latex.codecogs.com/png.latex?H_0">, no matter how small, given enough data. Whether that deviation matters in practice is a separate question that the p-value cannot answer.</p>
<p>Let’s spend a few minutes analyzing the output <img src="https://latex.codecogs.com/png.latex?D_n%20=%200.067"> and <img src="https://latex.codecogs.com/png.latex?p%20=%200.0007"> in practical terms. The small p-value tells us that the observed data is unlikely to have come from the fitted Weibull. Fine. The test statistic <img src="https://latex.codecogs.com/png.latex?D_n"> tells us that the fitted CDF and the empirical CDF are never off by more than 6.7 percentage points at any time point.</p>
<p>This is because, for any time <img src="https://latex.codecogs.com/png.latex?t">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%7C%5Chat%7BF%7D_n(t)%20-%20F(t;%5Chat%7B%5Ctheta%7D,%20%5Chat%7B%5Cgamma%7D)%7C%20%5Cleq%20D_n%20=%200.067"></p>
<p>Now suppose we use the fitted Weibull to make a decision: schedule maintenance at the time t* at which the fitted CDF reaches 30%, i.e., act before 30% of the fleet has failed. Solving for t* gives:</p>
<p><img src="https://latex.codecogs.com/png.latex?t%5E*%20=%20%5Chat%7B%5Ctheta%7D%5Cleft%5B%5Cln%5Cleft(%5Cfrac%7B1%7D%7B1-0.30%7D%5Cright)%5Cright%5D%5E%7B1/%5Chat%7B%5Cgamma%7D%7D%20%5Capprox%2071%5C%20%5Ctext%7Bhours%7D"></p>
<p>We schedule maintenance at 71 hours. But by the K-S bound, the actual observed failure fraction at <img src="https://latex.codecogs.com/png.latex?t=71"> hours satisfies:</p>
<p><img src="https://latex.codecogs.com/png.latex?%7C%5Chat%7BF%7D_n(71)%20-%200.30%7C%20%5Cleq%200.067%20%5Cimplies%20%5Chat%7BF%7D_n(71)%20%5Cin%20%5B0.233,%5C%200.367%5D"></p>
<p>Out of 1000 machines, between 233 and 367 have actually failed by the time we trigger maintenance — a swing of <img src="https://latex.codecogs.com/png.latex?%5Cpm%2067"> machines driven entirely by the model’s imprecision. Whether that uncertainty is acceptable depends on the cost of an unplanned failure vs the cost of preventive maintenance. Let’s assume:</p>
<ul>
<li>Cost of an <strong>unplanned failure</strong> (breakdown, emergency repair, downtime): ₹5,00,000 per machine</li>
<li>Cost of <strong>preventive maintenance</strong> (scheduled, planned): ₹50,000 per machine</li>
</ul>
<p>At <img src="https://latex.codecogs.com/png.latex?t%5E*%20=%2071"> hours, you service all 1000 machines regardless. The question is how many had already failed before you arrived.</p>
<p><strong>Best case</strong> <img src="https://latex.codecogs.com/png.latex?%5Cleft(%5Chat%7BF%7D_n(71)%20=%200.233%5Cright)">: 233 machines had already failed, 767 are still running.</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BTotal%20cost%7D%20=%20233%20%5Ctimes%205%7B,%7D00%7B,%7D000%20+%20767%20%5Ctimes%2050%7B,%7D000%20=%20%5Ctext%7B%E2%82%B911,65,00,000%7D%20+%20%5Ctext%7B%E2%82%B93,83,50,000%7D%20=%20%5Ctext%7B%E2%82%B915,48,50,000%7D"></p>
<p><strong>Worst case</strong> <img src="https://latex.codecogs.com/png.latex?%5Cleft(%5Chat%7BF%7D_n(71)%20=%200.367%5Cright)">: 367 machines had already failed, 633 are still running.</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BTotal%20cost%7D%20=%20367%20%5Ctimes%205%7B,%7D00%7B,%7D000%20+%20633%20%5Ctimes%2050%7B,%7D000%20=%20%5Ctext%7B%E2%82%B918,35,00,000%7D%20+%20%5Ctext%7B%E2%82%B93,16,50,000%7D%20=%20%5Ctext%7B%E2%82%B921,51,50,000%7D"></p>
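The best- and worst-case arithmetic is straightforward to reproduce in a few lines; all numbers below are taken directly from the text (amounts in plain rupees).

```python
# Reproducing the best/worst-case cost arithmetic; all values are from the text.
n = 1000
c_failure = 500_000       # Rs 5,00,000 per unplanned failure
c_maintenance = 50_000    # Rs 50,000 per preventive service

def total_cost(frac_failed):
    failed = round(frac_failed * n)
    return failed * c_failure + (n - failed) * c_maintenance

best = total_cost(0.233)   # Rs 15,48,50,000
worst = total_cost(0.367)  # Rs 21,51,50,000
print(best, worst, worst - best)  # the difference is the Rs 6,03,00,000 swing
```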
<p>The uncertainty from <img src="https://latex.codecogs.com/png.latex?D_n%20=%200.067"> alone translates into a cost swing of:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7B%E2%82%B921,51,50,000%7D%20-%20%5Ctext%7B%E2%82%B915,48,50,000%7D%20=%20%5Ctext%7B%E2%82%B96,03,00,000%7D"></p>
<p>approximately ₹6 crore. And crucially — we cannot identify <em>which</em> machines are driving this uncertainty without knowing their individual characteristics. That requires modeling the effect of covariates on failure time. This is precisely what survival regression is built to do — and where we are headed. Let’s visualize how the cost swing and the trigger time <img src="https://latex.codecogs.com/png.latex?t%5E*"> vary across all possible thresholds from 5% to 95%.</p>
<div id="cell-fig-cost-swing" class="cell" data-execution_count="8">
<div class="cell-output cell-output-display">
<div id="fig-cost-swing" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-cost-swing-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://madhavpr191221.github.io/blog/posts/part-3-fitting-survival-distributions/index_files/figure-html/fig-cost-swing-output-1.png" width="1335" height="470" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-cost-swing-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;5: Cost swing due to model uncertainty (D_n = 0.067) across maintenance thresholds
</figcaption>
</figure>
</div>
</div>
</div>
<p>The cost swing curve is nearly flat at ₹6 crore across all thresholds — this is not a coincidence. Since <img src="https://latex.codecogs.com/png.latex?D_n"> is a uniform bound over all <img src="https://latex.codecogs.com/png.latex?t">, the worst-case cost uncertainty is approximately:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCost%20swing%7D%20%5Capprox%202%20D_n%20%5Ctimes%20n%20%5Ctimes%20(c_%7B%5Ctext%7Bfailure%7D%7D%20-%20c_%7B%5Ctext%7Bmaintenance%7D%7D)%0A=%202%20%5Ctimes%200.067%20%5Ctimes%201000%20%5Ctimes%204%7B,%7D50%7B,%7D000%20=%20%5Ctext%7B%E2%82%B96.03%20crore%7D"> regardless of which threshold you choose. A better model — one that reduces <img src="https://latex.codecogs.com/png.latex?D_n"> — would shift this entire curve downward uniformly. The right panel shows the trigger time <img src="https://latex.codecogs.com/png.latex?t%5E*"> growing rapidly with threshold — waiting for 90% of the fleet to fail before intervening means waiting nearly 180 hours.</p>
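A small sketch confirms the threshold-independence: because total cost is linear in the failed fraction, shifting that fraction by ±D_n moves the cost by the same amount at every threshold.

```python
import numpy as np

# The cost is linear in the failed fraction p, so a +/- D_n shift in p
# produces the same cost swing at every threshold p*.
D_n, n = 0.067, 1000
c_failure, c_maintenance = 500_000, 50_000

def cost(p):
    return p * n * c_failure + (1 - p) * n * c_maintenance

thresholds = np.arange(0.05, 0.951, 0.05)
swings = cost(thresholds + D_n) - cost(thresholds - D_n)
print(np.allclose(swings, 2 * D_n * n * (c_failure - c_maintenance)))  # True
```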
<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center collapsed" data-bs-toggle="collapse" data-bs-target=".callout-4-contents" aria-controls="callout-4" aria-expanded="false" aria-label="Toggle callout">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
<span class="screen-reader-only">Note</span>Derivation of the Cost Swing Formula
</div>
<div class="callout-btn-toggle d-inline-block border-0 py-1 ps-1 pe-0 float-end"><i class="callout-toggle"></i></div>
</div>
<div id="callout-4" class="callout-4-contents callout-collapse collapse">
<div class="callout-body-container callout-body">
<p>At threshold <img src="https://latex.codecogs.com/png.latex?p%5E*">, you trigger maintenance at <img src="https://latex.codecogs.com/png.latex?t%5E*"> and service all <img src="https://latex.codecogs.com/png.latex?n"> machines. Of those, <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t%5E*)%20%5Ctimes%20n"> had already failed and each costs <img src="https://latex.codecogs.com/png.latex?c_%7B%5Ctext%7Bfailure%7D%7D">, while <img src="https://latex.codecogs.com/png.latex?(1%20-%20%5Chat%7BF%7D_n(t%5E*))%20%5Ctimes%20n"> are still running and each costs <img src="https://latex.codecogs.com/png.latex?c_%7B%5Ctext%7Bmaintenance%7D%7D">. The total cost is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BTotal%20cost%7D%20=%20%5Chat%7BF%7D_n(t%5E*)%20%5Ccdot%20n%20%5Ccdot%20c_%7B%5Ctext%7Bfailure%7D%7D%20+%20(1%20-%20%5Chat%7BF%7D_n(t%5E*))%20%5Ccdot%20n%20%5Ccdot%20c_%7B%5Ctext%7Bmaintenance%7D%7D"></p>
<p>By the K-S bound, <img src="https://latex.codecogs.com/png.latex?%5Chat%7BF%7D_n(t%5E*)"> lies in <img src="https://latex.codecogs.com/png.latex?%5Bp%5E*%20-%20D_n,%5C%20p%5E*%20+%20D_n%5D">, so:</p>
<p><img src="https://latex.codecogs.com/png.latex?p_%7B%5Ctext%7Bbest%7D%7D%20=%20p%5E*%20-%20D_n,%20%5Cqquad%20p_%7B%5Ctext%7Bworst%7D%7D%20=%20p%5E*%20+%20D_n"></p>
<p>The cost swing is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Ctext%7BCost%20swing%7D%20=%20%5Ctext%7BCost%7D_%7B%5Ctext%7Bworst%7D%7D%20-%20%5Ctext%7BCost%7D_%7B%5Ctext%7Bbest%7D%7D"></p>
<p><img src="https://latex.codecogs.com/png.latex?=%20%5Cleft%5Bp_%7B%5Ctext%7Bworst%7D%7D%20%5Ccdot%20n%20%5Ccdot%20c_%7B%5Ctext%7Bfailure%7D%7D%20+%20(1-p_%7B%5Ctext%7Bworst%7D%7D)%20%5Ccdot%20n%20%5Ccdot%20c_%7B%5Ctext%7Bmaintenance%7D%7D%5Cright%5D%20-%20%5Cleft%5Bp_%7B%5Ctext%7Bbest%7D%7D%20%5Ccdot%20n%20%5Ccdot%20c_%7B%5Ctext%7Bfailure%7D%7D%20+%20(1-p_%7B%5Ctext%7Bbest%7D%7D)%20%5Ccdot%20n%20%5Ccdot%20c_%7B%5Ctext%7Bmaintenance%7D%7D%5Cright%5D"></p>
<p><img src="https://latex.codecogs.com/png.latex?=%20n(p_%7B%5Ctext%7Bworst%7D%7D%20-%20p_%7B%5Ctext%7Bbest%7D%7D)(c_%7B%5Ctext%7Bfailure%7D%7D%20-%20c_%7B%5Ctext%7Bmaintenance%7D%7D)"></p>
<p><img src="https://latex.codecogs.com/png.latex?=%20n%20%5Ccdot%202D_n%20%5Ccdot%20(c_%7B%5Ctext%7Bfailure%7D%7D%20-%20c_%7B%5Ctext%7Bmaintenance%7D%7D)"></p>
<p>This is constant across all thresholds <img src="https://latex.codecogs.com/png.latex?p%5E*"> — the cost swing depends only on <img src="https://latex.codecogs.com/png.latex?D_n">, <img src="https://latex.codecogs.com/png.latex?n">, and the cost difference, not on which threshold you choose.</p>
</div>
</div>
</div>
</section>
</section>
<section id="whats-next" class="level3">
<h3 class="anchored" data-anchor-id="whats-next">What’s Next?</h3>
<p>We have covered a lot of ground in Part 3. We started with the likelihood function, a purely theoretical object, and derived the censored likelihood from first principles. We then fit a Weibull to our machine fleet and assessed the fit using two diagnostic tools: the Q-Q plot, which told the story visually, and the K-S test, which put a number on it.</p>
<p>Along the way, we took a detour that went from the supremum norm of functional analysis, <img src="https://latex.codecogs.com/png.latex?D_n%20=%20%5C%7C%5Chat%7BF%7D_n%20-%20F%5C%7C_%5Cinfty">, all the way to a ₹6 crore cost swing for a fleet of 1000 machines. This is the range in which this series (and survival analysis) operates: rigorous mathematics grounded in practical consequences. The K-S test is rooted in advanced concepts from stochastic processes and functional analysis, but its practical meaning is completely accessible: how wrong can my model be, and what does that cost me?</p>
<p>One thing Part 3 has made very clear: a single Weibull is not enough for a heterogeneous fleet of machines. The Q-Q plot showed deviations from the diagonal, and the K-S test rejected the null hypothesis. The cost analysis showed ₹6 crore of uncertainty. We need a better model, one that accounts for the fact that different machines have different failure characteristics.</p>
<p>But before we get to regression, we need a better estimator of the survival function itself, one that handles censored data properly, unlike the naive empirical CDF we used in the Q-Q plot.</p>
<p>In Part 4, we derive the famous <strong>Kaplan-Meier estimator</strong> from first principles, prove <strong>Greenwood’s formula</strong> for its variance, handle tied failure times rigorously, and apply both to our machine fleet. Most survival analysis blogs introduce this estimator at the very beginning, as an entry point to the field. We arrive at it in Part 4, after three parts of solid mathematical groundwork: censoring, hazard functions, parametric distributions, MLE, and the empirical CDF. This is not a detour. This is the foundation the Kaplan-Meier estimator deserves. The clock is still ticking. See you in the next post.</p>


</section>
</section>

 ]]></description>
  <category>survival analysis</category>
  <category>statistics</category>
  <category>python</category>
  <category>MLE</category>
  <guid>https://madhavpr191221.github.io/blog/posts/part-3-fitting-survival-distributions/</guid>
  <pubDate>Sat, 18 Apr 2026 18:30:00 GMT</pubDate>
</item>
<item>
  <title>Part 2: Distributions in Survival Analysis</title>
  <dc:creator>Madhav Prashanth Ramachandran</dc:creator>
  <link>https://madhavpr191221.github.io/blog/posts/part2-distributions-in-survival-analysis/</link>
  <description><![CDATA[ 





<section id="recap-of-part-1-and-what-to-expect-in-part-2" class="level2">
<h2 class="anchored" data-anchor-id="recap-of-part-1-and-what-to-expect-in-part-2">Recap of Part 1 and What to Expect in Part 2</h2>
<p>In <a href="https://madhavpr191221.github.io/blog/posts/part-1-why-survival-analysis-exists/"><strong>Part 1</strong></a>, we established why survival analysis exists as a separate field: regression breaks, censoring is real, and the functions we care about are fundamentally different from conditional expectations. Using basic calculus and probability theory, we introduced the key actors in survival analysis: the survival function, the hazard function, and the cumulative hazard function. We showed that if you know one of these, you know all of them. In Part 2, we ask a natural follow-up question: what does a survival distribution look like? The answer depends entirely on the shape of <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)">, the hazard function. We will see how different shapes of the hazard function lead to different survival curves and what that means in real life. A constant hazard leads to something familiar from undergrad probability. A linearly increasing hazard leads somewhere surprising. And a two-parameter family called the Weibull quietly unifies both of these and more. Finally, we will meet the bathtub curve, a common pattern of failure in real life that cannot be captured by a single parametric distribution but can be approximated by stitching together different phases of the Weibull distribution.</p>
</section>
<section id="distributions-in-survival-analysis" class="level2">
<h2 class="anchored" data-anchor-id="distributions-in-survival-analysis">Distributions in Survival Analysis</h2>
<section id="constant-hazard-the-exponential-distribution" class="level3">
<h3 class="anchored" data-anchor-id="constant-hazard-the-exponential-distribution">Constant Hazard: The Exponential Distribution</h3>
<p>What happens if the hazard function is constant over time? That is, <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Clambda"> for all <img src="https://latex.codecogs.com/png.latex?t">? From the interpretation of the hazard function, this means for any time interval <img src="https://latex.codecogs.com/png.latex?%5CDelta%20t">, the probability of failure in that interval is the same regardless of how long the object has survived so far. That is, for any time intervals <img src="https://latex.codecogs.com/png.latex?%5Bt_1,%20t_1%20+%20%5CDelta%20t%5D"> and <img src="https://latex.codecogs.com/png.latex?%5Bt_2,%20t_2%20+%20%5CDelta%20t%5D">, we have the same probability of failure (conditional on survival up to <img src="https://latex.codecogs.com/png.latex?t_1"> and <img src="https://latex.codecogs.com/png.latex?t_2"> respectively). This is a strong assumption, but it leads to a very simple distribution. Let’s derive it.</p>
<p>We know that <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Clambda"> is constant, so we can write the cumulative hazard function as <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)%20=%20%5Cint_0%5Et%20%5Clambda(s)%20ds%20=%20%5Clambda%20t">. Using the relationship between the survival function and the cumulative hazard function, we have <img src="https://latex.codecogs.com/png.latex?S(t)%20=%20e%5E%7B-%5CLambda(t)%7D%20=%20e%5E%7B-%5Clambda%20t%7D">. The Cumulative distribution function (CDF) is then <img src="https://latex.codecogs.com/png.latex?F(t)%20=%201%20-%20S(t)%20=%201%20-%20e%5E%7B-%5Clambda%20t%7D">. The probability density function (PDF) is the derivative of the CDF, which gives us <img src="https://latex.codecogs.com/png.latex?f(t)%20=%20%5Clambda%20e%5E%7B-%5Clambda%20t%7D">. This distribution is known as the Exponential distribution with <strong>rate</strong> parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda"> - one of the most important continuous distributions in probability theory (probably after the normal distribution). The <strong>rate</strong> parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda"> is in fact the rate of failure per unit time. If you didn’t understand why the parameter lambda is called the rate parameter in your previous courses, you should now. We write <img src="https://latex.codecogs.com/png.latex?T%20%5Csim%20%5Ctext%7BExponential%7D(%5Clambda)"> to denote that the random variable <img src="https://latex.codecogs.com/png.latex?T"> follows an Exponential distribution with rate parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda">.</p>
<p>Once we have a distribution, the next natural quantities to compute are the moments of the distribution. It is a simple exercise in integration to show that the mean <img src="https://latex.codecogs.com/png.latex?E%5BT%5D"> of the exponential distribution is <img src="https://latex.codecogs.com/png.latex?1/%5Clambda"> and the variance <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BVar%7D%5BT%5D"> is <img src="https://latex.codecogs.com/png.latex?1/%5Clambda%5E2">. Other quantities related to the moments are the median and the mode. The median is the time <img src="https://latex.codecogs.com/png.latex?t"> such that <img src="https://latex.codecogs.com/png.latex?F(t)%20=%200.5">, which gives us the half-life (the time at which 50% of the systems have failed) <img src="https://latex.codecogs.com/png.latex?t_%7B1/2%7D%20=%20%5Cln(2)/%5Clambda">. Notice that the half-life is inversely proportional to the rate parameter, which is quite intuitive given the physical interpretation of <img src="https://latex.codecogs.com/png.latex?%5Clambda">. The mode is the time at which the PDF is maximized, which for the exponential distribution is at <img src="https://latex.codecogs.com/png.latex?t=0">. The distribution is right-skewed: most systems fail early, but there is a long tail of systems that survive for a long time. Here is what the Exponential distribution looks like for different values of <img src="https://latex.codecogs.com/png.latex?%5Clambda">.</p>
<div id="cell-fig-exponential" class="cell" data-execution_count="1">
<div class="cell-output cell-output-display">
<div id="fig-exponential" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-exponential-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://madhavpr191221.github.io/blog/posts/part2-distributions-in-survival-analysis/index_files/figure-html/fig-exponential-output-1.png" width="1718" height="469" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-exponential-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;1: Exponential survival functions for different rate parameters
</figcaption>
</figure>
</div>
</div>
</div>
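The closed-form moments above are easy to sanity-check against scipy, which parameterizes the exponential by its scale 1/λ (a quick verification sketch, not part of the original derivation):

```python
import numpy as np
from scipy import stats

# scipy parameterizes the exponential by scale = 1/lambda.
lam = 0.5
T = stats.expon(scale=1 / lam)

assert np.isclose(T.mean(), 1 / lam)            # E[T] = 1/lambda
assert np.isclose(T.var(), 1 / lam ** 2)        # Var[T] = 1/lambda^2
assert np.isclose(T.median(), np.log(2) / lam)  # half-life = ln(2)/lambda
print("all moments match")
```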
<p>An important property of the exponential distribution is that it is <strong>memoryless</strong>. What does that mean?</p>
<p>Suppose the system has already survived for time <img src="https://latex.codecogs.com/png.latex?t_0">. What is the probability that it will survive for an additional time <img src="https://latex.codecogs.com/png.latex?t">? Let’s compute this probability. We want <img src="https://latex.codecogs.com/png.latex?P(T%20%3E%20t_0%20+%20t%20%7C%20T%20%3E%20t_0)">. Using the definition of conditional probability, we have: <img src="https://latex.codecogs.com/png.latex?P(T%20%3E%20t_0%20+%20t%20%5Cmid%20T%20%3E%20t_0)%20=%20%5Cfrac%7BP(%5C%7BT%20%3E%20t_0%20+%20t%5C%7D%20%5Ccap%20%5C%7BT%20%3E%20t_0%5C%7D)%7D%7BP(T%20%3E%20t_0)%7D"></p>
<p>Since <img src="https://latex.codecogs.com/png.latex?%5C%7BT%20%3E%20t_0%20+%20t%5C%7D%20%5Csubseteq%20%5C%7BT%20%3E%20t_0%5C%7D"> (because if the system survives for <img src="https://latex.codecogs.com/png.latex?t_0%20+%20t">, it must have survived for <img src="https://latex.codecogs.com/png.latex?t_0">), we can simplify this to:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(T%20%3E%20t_0%20+%20t%20%5Cmid%20T%20%3E%20t_0)%20=%20%5Cfrac%7BP(T%20%3E%20t_0%20+%20t)%7D%7BP(T%20%3E%20t_0)%7D"></p>
<p>Substituting the survival function for the Exponential distribution, we get:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(T%20%3E%20t_0%20+%20t%20%5Cmid%20T%20%3E%20t_0)%20=%20%5Cfrac%7Be%5E%7B-%5Clambda(t_0%20+%20t)%7D%7D%7Be%5E%7B-%5Clambda%20t_0%7D%7D%20=%20e%5E%7B-%5Clambda%20t%7D%20=%20P(T%20%3E%20t)"></p>
<p>What does this mean? It means that the probability of surviving for an additional time <img src="https://latex.codecogs.com/png.latex?t"> does not depend on how long the system has already survived. In other words, the system has no memory of its past survival time. This property is unique: the exponential is the only memoryless continuous distribution.</p>
<p>Let’s look at a simple example to illustrate this. Suppose we have a light bulb that has an Exponential lifetime with a rate of <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%200.1"> failures per hour. The mean lifetime of the light bulb is <img src="https://latex.codecogs.com/png.latex?1/%5Clambda%20=%2010"> hours. If the light bulb has already been on for 5 hours, what is the probability that it will last for another 5 hours? Using the memoryless property, we can compute this as:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(T%20%3E%2010%20%7C%20T%20%3E%205)%20=%20P(T%20%3E%205)%20=%20e%5E%7B-0.1%20%5Ccdot%205%7D%20=%20e%5E%7B-0.5%7D%20%5Capprox%200.6065"></p>
<p>This means that even though the light bulb has already lasted for 5 hours, it still has a 60.65% chance of lasting for another 5 hours. Taking this to the extreme, if the light bulb has already lasted for 1000 hours, the probability that it will last for another 5 hours is still 60.65%. This is a direct consequence of the memoryless property of the exponential distribution.</p>
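Here is the light bulb example in code, a minimal check of the memoryless property using scipy's survival function `sf`:

```python
import numpy as np
from scipy import stats

lam = 0.1                          # failures per hour
T = stats.expon(scale=1 / lam)     # Exponential(lambda = 0.1)

conditional = T.sf(10) / T.sf(5)   # P(T > 10 | T > 5) = S(10)/S(5)
unconditional = T.sf(5)            # P(T > 5) = e^{-0.5} ~ 0.6065
print(np.isclose(conditional, unconditional))  # True
```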
<p>This is clearly unrealistic for most real-world systems. For example, if a machine has been running for 10 years, it is likely to be more prone to failure than a brand new machine. But it makes the exponential distribution a useful starting point for understanding survival analysis and serves as a building block for more complex distributions. When your data looks like it has a constant hazard, the exponential distribution is a good first choice for modeling it.</p>
<p>Next, we will look at what happens when the hazard function is not constant, but instead increases linearly with time. This leads us to the Rayleigh distribution, which has some surprising properties.</p>
</section>
<section id="linearly-increasing-hazard-the-rayleigh-distribution" class="level3">
<h3 class="anchored" data-anchor-id="linearly-increasing-hazard-the-rayleigh-distribution">Linearly Increasing Hazard: The Rayleigh Distribution</h3>
<p>Suppose for a fixed constant <img src="https://latex.codecogs.com/png.latex?%5Clambda%20%3E%200">, the hazard function increases linearly with time as <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Clambda%20t">. This means that the probability of failure in a small time interval <img src="https://latex.codecogs.com/png.latex?%5CDelta%20t"> increases with time. This is a more realistic assumption for many real-world systems, as they tend to wear out over time. Let’s derive the corresponding distribution (if it exists) and explore its properties. The cumulative hazard function is given by <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)%20=%20%5Cint_0%5Et%20%5Clambda(s)%5C,%20ds%20=%0A%5Cint_0%5Et%20%5Clambda%20s%5C,%20ds%20=%20%5Cfrac%7B%5Clambda%20t%5E2%7D%7B2%7D">. Using the relationship between the survival function and the cumulative hazard function, we have <img src="https://latex.codecogs.com/png.latex?S(t)%20=%20e%5E%7B-%5CLambda(t)%7D%20=%0Ae%5E%7B-%5Cfrac%7B%5Clambda%20t%5E2%7D%7B2%7D%7D">. The CDF is then <img src="https://latex.codecogs.com/png.latex?F(t)%20=%201%20-%20S(t)%20=%201%20-%20e%5E%7B-%5Cfrac%7B%5Clambda%20t%5E2%7D%7B2%7D%7D">. The PDF is the derivative of the CDF, which gives us <img src="https://latex.codecogs.com/png.latex?f(t)%20=%20%5Clambda%20t%20e%5E%7B-%5Cfrac%7B%5Clambda%20t%5E2%7D%7B2%7D%7D">. This distribution is known as the Rayleigh distribution, and we write <img src="https://latex.codecogs.com/png.latex?T%20%5Csim%20%5Ctext%7BRayleigh%7D%5Cleft(%5Cfrac%7B1%7D%7B%5Csqrt%7B%5Clambda%7D%7D%5Cright)">. The mean and variance are:</p>
<p><img src="https://latex.codecogs.com/png.latex?E%5BT%5D%20=%20%5Csqrt%7B%5Cfrac%7B%5Cpi%7D%7B2%5Clambda%7D%7D,%20%5Cqquad%20%5Ctext%7BVar%7D%5BT%5D%20=%20%5Cfrac%7B2%7D%7B%5Clambda%7D%5Cleft(1%20-%20%5Cfrac%7B%5Cpi%7D%7B4%7D%5Cright)"></p>
<p>The mode of the Rayleigh distribution is at <img src="https://latex.codecogs.com/png.latex?t%20=%20%5Csqrt%7B%5Cfrac%7B1%7D%7B%5Clambda%7D%7D">, which is the time at which the PDF is maximized. The median can be computed by solving <img src="https://latex.codecogs.com/png.latex?F(t)%20=%200.5">, which gives us <img src="https://latex.codecogs.com/png.latex?t_%7B1/2%7D%20=%20%5Csqrt%7B%5Cfrac%7B2%5Cln(2)%7D%7B%5Clambda%7D%7D">.</p>
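These formulas can be checked against scipy's Rayleigh, whose scale parameter is σ = 1/√λ in the notation above (a verification sketch, not from the original post):

```python
import numpy as np
from scipy import stats

# scipy's Rayleigh uses scale sigma; here lambda = 1/sigma^2, so sigma = 1/sqrt(lambda).
lam = 0.5
R = stats.rayleigh(scale=1 / np.sqrt(lam))

t = np.linspace(0.01, 6, 200)
assert np.allclose(R.sf(t), np.exp(-lam * t ** 2 / 2))        # S(t) = e^{-lambda t^2 / 2}
assert np.isclose(R.median(), np.sqrt(2 * np.log(2) / lam))   # median t_{1/2}
assert np.isclose(R.mean(), np.sqrt(np.pi / (2 * lam)))       # E[T]
print("Rayleigh formulas verified")
```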
<p>As we will see shortly, the Rayleigh distribution is a special case of the Weibull distribution — nature’s way of telling us that linearly increasing hazard and Weibull are secretly the same thing. But before we get there, let’s look at what the Rayleigh distribution looks like for different values of <img src="https://latex.codecogs.com/png.latex?%5Clambda">.</p>
<div id="cell-fig-rayleigh" class="cell" data-execution_count="2">
<div class="cell-output cell-output-display">
<div id="fig-rayleigh" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-rayleigh-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://madhavpr191221.github.io/blog/posts/part2-distributions-in-survival-analysis/index_files/figure-html/fig-rayleigh-output-1.png" width="1718" height="469" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-rayleigh-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;2: Rayleigh distribution: survival function, density, and hazard for different rate parameters λ
</figcaption>
</figure>
</div>
</div>
</div>
<p>Before we move on, let’s take a short but illuminating detour to understand a connection between independent normal random variables and the Rayleigh distribution. Think of a rotating machine (motor, pump, compressor, etc.) that has two independent sources of random vibration in the horizontal (X) and vertical (Y) directions. Let <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> be independent normal random variables with mean 0 and variance <img src="https://latex.codecogs.com/png.latex?%5Csigma%5E2">. The magnitude of the vibration is given by <img src="https://latex.codecogs.com/png.latex?R%20=%20%5Csqrt%7BX%5E2%20+%20Y%5E2%7D">. Can we find the distribution of <img src="https://latex.codecogs.com/png.latex?R">?</p>
<p>To find the distribution of <img src="https://latex.codecogs.com/png.latex?R">, we can use the fact that <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> are independent normal random variables. The joint distribution of <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y"> is given by: <img src="https://latex.codecogs.com/png.latex?f_%7BX,Y%7D(x,y)%20=%20%5Cfrac%7B1%7D%7B2%5Cpi%5Csigma%5E2%7D%20e%5E%7B-%5Cfrac%7Bx%5E2%20+%20y%5E2%7D%7B2%5Csigma%5E2%7D%7D"> (Because the joint distribution of two independent normal random variables is the product of their individual distributions).</p>
<p>To find the distribution of <img src="https://latex.codecogs.com/png.latex?R">, let’s fix a value <img src="https://latex.codecogs.com/png.latex?R%20=%20r_0"> and calculate the probability that <img src="https://latex.codecogs.com/png.latex?R%20%5Cleq%20r_0">. Mathematically, we want to compute <img src="https://latex.codecogs.com/png.latex?P(R%20%5Cleq%20r_0)%20=%20P(%5Csqrt%7BX%5E2%20+%20Y%5E2%7D%20%5Cleq%20r_0)">. Writing this in terms of the joint distribution of <img src="https://latex.codecogs.com/png.latex?X"> and <img src="https://latex.codecogs.com/png.latex?Y">, we have:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(R%20%5Cleq%20r_0)%20=%20%5Ciint_%7Bx%5E2%20+%20y%5E2%20%5Cleq%20r_0%5E2%7D%20f_%7BX,Y%7D(x,y)%20dx%20dy"></p>
<p>By the previous independence expression above, we can write this as:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(R%20%5Cleq%20r_0)%20=%20%5Ciint_%7Bx%5E2%20+%20y%5E2%20%5Cleq%20r_0%5E2%7D%20%5Cfrac%7B1%7D%7B2%5Cpi%5Csigma%5E2%7D%20e%5E%7B-%5Cfrac%7Bx%5E2%20+%20y%5E2%7D%7B2%5Csigma%5E2%7D%7D%20dx%20dy"></p>
<p>Go back to your multivariable calculus notes and recall that the region of integration is a disk of radius <img src="https://latex.codecogs.com/png.latex?r_0"> centered at the origin. It is easier to evaluate this integral in polar coordinates, where <img src="https://latex.codecogs.com/png.latex?x%20=%20r%20%5Ccos(%5Ctheta)"> and <img src="https://latex.codecogs.com/png.latex?y%20=%20r%20%5Csin(%5Ctheta)">. The Jacobian of the transformation from Cartesian to polar coordinates is <img src="https://latex.codecogs.com/png.latex?r">, so we have:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(R%20%5Cleq%20r_0)%20=%20%5Cint_0%5E%7B2%5Cpi%7D%20%5Cint_0%5E%7Br_0%7D%20%5Cfrac%7B1%7D%7B2%5Cpi%5Csigma%5E2%7D%20e%5E%7B-%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D%7D%20r%20dr%20d%5Ctheta"></p>
<p>The integrand can be decoupled into a product of a function of <img src="https://latex.codecogs.com/png.latex?r"> and a function of <img src="https://latex.codecogs.com/png.latex?%5Ctheta">, so we can first evaluate the outer integral with respect to <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and get some cancellations:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(R%20%5Cleq%20r_0)%20=%20%5Cint_0%5E%7Br_0%7D%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20e%5E%7B-%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D%7D%20r%20dr"></p>
<p>Taking the constant <img src="https://latex.codecogs.com/png.latex?%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D"> outside the integral, we have: <img src="https://latex.codecogs.com/png.latex?P(R%20%5Cleq%20r_0)%20=%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D%20%5Cint_0%5E%7Br_0%7D%20r%20e%5E%7B-%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D%7D%20dr"></p>
<p>To evaluate this integral, we can use the substitution <img src="https://latex.codecogs.com/png.latex?u%20=%20%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D">, which gives us <img src="https://latex.codecogs.com/png.latex?du%20=%20%5Cfrac%7Br%7D%7B%5Csigma%5E2%7D%20dr">. The limits of integration change accordingly: when <img src="https://latex.codecogs.com/png.latex?r%20=%200">, we have <img src="https://latex.codecogs.com/png.latex?u%20=%200">, and when <img src="https://latex.codecogs.com/png.latex?r%20=%20r_0">, we have <img src="https://latex.codecogs.com/png.latex?u%20=%20%5Cfrac%7Br_0%5E2%7D%7B2%5Csigma%5E2%7D">. Substituting these into the integral, we get:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(R%20%5Cleq%20r_0)%20=%20%5Cint_0%5E%7B%5Cfrac%7Br_0%5E2%7D%7B2%5Csigma%5E2%7D%7D%20e%5E%7B-u%7D%20du"></p>
<p>If we set U to be an exponential random variable with rate parameter 1, then the above integral is just the CDF of U evaluated at <img src="https://latex.codecogs.com/png.latex?%5Cfrac%7Br_0%5E2%7D%7B2%5Csigma%5E2%7D">. The CDF of an exponential random variable with rate parameter 1 is given by <img src="https://latex.codecogs.com/png.latex?F_U(u)%20=%201%20-%20e%5E%7B-u%7D"> for <img src="https://latex.codecogs.com/png.latex?u%20%5Cgeq%200">. Substituting <img src="https://latex.codecogs.com/png.latex?u%20=%20%5Cfrac%7Br_0%5E2%7D%7B2%5Csigma%5E2%7D">, we get:</p>
<p><img src="https://latex.codecogs.com/png.latex?F_R(r_0)%20=%201%20-%20e%5E%7B-%5Cfrac%7Br_0%5E2%7D%7B2%5Csigma%5E2%7D%7D"></p>
<p>For a general <img src="https://latex.codecogs.com/png.latex?r">, the CDF of <img src="https://latex.codecogs.com/png.latex?R"> is given by:</p>
<p><img src="https://latex.codecogs.com/png.latex?F_R(r)%20=%201%20-%20e%5E%7B-%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D%7D"></p>
<p>The PDF of <img src="https://latex.codecogs.com/png.latex?R"> can be obtained by differentiating the CDF with respect to <img src="https://latex.codecogs.com/png.latex?r">, which gives us:</p>
<p><img src="https://latex.codecogs.com/png.latex?f_R(r)%20=%20%5Cfrac%7Br%7D%7B%5Csigma%5E2%7D%20e%5E%7B-%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D%7D"></p>
<p>This is exactly the PDF of a Rayleigh distribution with parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D">. To verify that the hazard rate is indeed linearly increasing, we can compute the hazard function as follows:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Clambda_R(r)%20=%20%5Cfrac%7Bf_R(r)%7D%7BS_R(r)%7D%20=%20%5Cfrac%7B%5Cfrac%7Br%7D%7B%5Csigma%5E2%7D%20e%5E%7B-%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D%7D%7D%7Be%5E%7B-%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D%7D%7D%20=%20%5Cfrac%7Br%7D%7B%5Csigma%5E2%7D"> which is linearly increasing in <img src="https://latex.codecogs.com/png.latex?r">. Thus, the magnitude of the vibration <img src="https://latex.codecogs.com/png.latex?R"> follows a Rayleigh distribution with parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%20%5Cfrac%7B1%7D%7B%5Csigma%5E2%7D">, and its hazard function is linearly increasing in <img src="https://latex.codecogs.com/png.latex?r">. Before we end this section, let’s stare at the expression for the CDF of <img src="https://latex.codecogs.com/png.latex?R"> for a moment above. Notice that the substitution <img src="https://latex.codecogs.com/png.latex?u%20=%20%5Cfrac%7Br%5E2%7D%7B2%5Csigma%5E2%7D"> we used earlier was not just a computational trick. If we define <img src="https://latex.codecogs.com/png.latex?W%20=%20%5Cfrac%7BR%5E2%7D%7B2%5Csigma%5E2%7D">, then:</p>
<p><img src="https://latex.codecogs.com/png.latex?P(W%20%5Cleq%20w)%20=%20P%5Cleft(%5Cfrac%7BR%5E2%7D%7B2%5Csigma%5E2%7D%20%5Cleq%20w%5Cright)%20=%201%20-%20e%5E%7B-w%7D"></p>
<p>which is exactly the CDF of a standard <img src="https://latex.codecogs.com/png.latex?%5Ctext%7BExponential%7D(1)"> random variable. So <img src="https://latex.codecogs.com/png.latex?W%20=%20%5Cfrac%7BR%5E2%7D%7B2%5Csigma%5E2%7D%20%5Csim%20%5Ctext%7BExponential%7D(1)">, or equivalently <img src="https://latex.codecogs.com/png.latex?R%5E2%20%5Csim%20%5Ctext%7BExponential%7D%5Cleft(%5Cfrac%7B1%7D%7B2%5Csigma%5E2%7D%5Cright)">. Note that you can show this derivation rigorously with the change of variables formula for PDFs but I am not going to do that here. The key takeaway is that the Rayleigh distribution can be obtained by applying a squaring transformation to an exponential random variable. The substitution variable <img src="https://latex.codecogs.com/png.latex?u"> was the exponential random variable all along — the Rayleigh and exponential distributions are secretly the same family, just related by a squaring transformation.</p>
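The whole detour can be verified by simulation: draw independent normals, form R, and compare empirical survival fractions with the Rayleigh and Exponential(1) predictions (a Monte Carlo sketch with an arbitrary choice of σ):

```python
import numpy as np

rng = np.random.default_rng(42)
sigma, n = 2.0, 200_000

# Independent X, Y ~ N(0, sigma^2); R = sqrt(X^2 + Y^2)
x = rng.normal(0.0, sigma, n)
y = rng.normal(0.0, sigma, n)
r = np.hypot(x, y)

# Rayleigh check: S_R(sigma) = exp(-sigma^2 / (2 sigma^2)) = e^{-1/2}
rayleigh_ok = abs((r > sigma).mean() - np.exp(-0.5)) < 0.01

# W = R^2 / (2 sigma^2) should be Exponential(1): S_W(1) = e^{-1}
w = r ** 2 / (2 * sigma ** 2)
expon_ok = abs((w > 1).mean() - np.exp(-1)) < 0.01

print(rayleigh_ok, expon_ok)  # True True
```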
<p>With the Rayleigh distribution under our belt, we now return to the main story. The exponential and Rayleigh are special cases of a more powerful family: one distribution to rule them all, one distribution to find them, one distribution to bring them all and in the darkness bind them. Enter the Weibull.</p>
</section>
<section id="the-weibull-distribution-a-unifying-family-of-distributions" class="level3">
<h3 class="anchored" data-anchor-id="the-weibull-distribution-a-unifying-family-of-distributions">The Weibull Distribution: A Unifying Family of Distributions</h3>
<p>We have seen two specific distributions that arise from particular shapes of the hazard function: the Exponential distribution from a constant hazard and the Rayleigh distribution from a linearly increasing hazard. A natural question is: What if the hazard follows a power law, i.e., <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Clambda%20t%5E%7B%5Cgamma-1%7D"> for some <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%3E%200">? When <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%201">, we recover the exponential distribution with a constant hazard. When <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%202">, we recover the Rayleigh distribution with a linearly increasing hazard. And for other values of <img src="https://latex.codecogs.com/png.latex?%5Cgamma">, we get an entire family of distributions known as the Weibull distribution. It is the workhorse of survival analysis and reliability engineering: flexible enough to model increasing, decreasing, or constant hazard, and simple enough to be analytically tractable.</p>
<p>Before we explore the Weibull distribution in detail, we must make a change. So far a single parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda"> has controlled both the shape of the hazard (how fast it grows) and the scale of the distribution (how long the system lasts). Mixing two roles into a single parameter is not ideal and makes it harder to understand the effect of each role on the distribution. Think of the normal distribution: if it were parameterized by a single number controlling both the mean and the variance, interpretation would be a nightmare. For the Weibull distribution, we separate these roles by introducing a scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20%3E%200">. Making the substitution <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%20%5Cfrac%7B%5Cgamma%7D%7B%5Ctheta%5E%5Cgamma%7D">, the power-law hazard becomes:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Cfrac%7B%5Cgamma%7D%7B%5Ctheta%5E%5Cgamma%7D%20t%5E%7B%5Cgamma%20-%201%7D%20=%20%5Cfrac%7B%5Cgamma%7D%7B%5Ctheta%7D%5Cleft(%5Cfrac%7Bt%7D%7B%5Ctheta%7D%5Cright)%5E%7B%5Cgamma-1%7D"></p>
<p>Now <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> controls the <strong>shape</strong> of the hazard — is it growing, shrinking, or flat? And <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> controls the <strong>scale</strong> — at what timescale are failures happening? Two parameters, two jobs, clean interpretation. We write <img src="https://latex.codecogs.com/png.latex?T%20%5Csim%20%5Ctext%7BWeibull%7D(%5Ctheta,%20%5Cgamma)"> to denote that the random variable <img src="https://latex.codecogs.com/png.latex?T"> follows a Weibull distribution with scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> and shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma">. The derivation follows the same pattern as before — integrate the hazard to get <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)">, exponentiate to get <img src="https://latex.codecogs.com/png.latex?S(t)">, and differentiate to get <img src="https://latex.codecogs.com/png.latex?f(t)">. I encourage you to verify this yourself.</p>
<p>Integrating the hazard function, we get the cumulative hazard function: <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)%20=%20%5Cint_0%5Et%20%5Clambda(s)%20ds%20=%20%5Cint_0%5Et%20%5Cfrac%7B%5Cgamma%7D%7B%5Ctheta%7D%5Cleft(%5Cfrac%7Bs%7D%7B%5Ctheta%7D%5Cright)%5E%7B%5Cgamma%20-%201%7D%20ds%20=%20%5Cleft(%5Cfrac%7Bt%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma"></p>
<p>The probability density function (PDF) of the Weibull distribution is given by: <img src="https://latex.codecogs.com/png.latex?f(t)%20=%20%5Cfrac%7B%5Cgamma%7D%7B%5Ctheta%7D%5Cleft(%5Cfrac%7Bt%7D%7B%5Ctheta%7D%5Cright)%5E%7B%5Cgamma%20-%201%7D%20e%5E%7B-%5Cleft(%5Cfrac%7Bt%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma%7D"></p>
<p>The cumulative distribution function (CDF) is: <img src="https://latex.codecogs.com/png.latex?F(t)%20=%201%20-%20e%5E%7B-%5Cleft(%5Cfrac%7Bt%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma%7D"></p>
<p>And finally, the survival function is: <img src="https://latex.codecogs.com/png.latex?S(t)%20=%20e%5E%7B-%5Cleft(%5Cfrac%7Bt%7D%7B%5Ctheta%7D%5Cright)%5E%5Cgamma%7D"></p>
<p>The mean and variance of the Weibull distribution can be expressed in terms of the gamma function <img src="https://latex.codecogs.com/png.latex?%5CGamma(%5Ccdot)"> as follows: <img src="https://latex.codecogs.com/png.latex?E%5BT%5D%20=%20%5Ctheta%20%5CGamma%5Cleft(1%20+%20%5Cfrac%7B1%7D%7B%5Cgamma%7D%5Cright),%20%5Cqquad%20%5Ctext%7BVar%7D%5BT%5D%20=%20%5Ctheta%5E2%20%5Cleft%5B%5CGamma%5Cleft(1%20+%20%5Cfrac%7B2%7D%7B%5Cgamma%7D%5Cright)%20-%20%5Cleft(%5CGamma%5Cleft(1%20+%20%5Cfrac%7B1%7D%7B%5Cgamma%7D%5Cright)%5Cright)%5E2%5Cright%5D"></p>
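<p>These formulas are easy to cross-check numerically. The sketch below is my illustration, assuming <code>scipy</code> is available; its <code>weibull_min</code> distribution maps our shape and scale to the arguments <code>c</code> and <code>scale</code>:</p>

```python
import numpy as np
from math import gamma as gamma_fn
from scipy import stats

theta, gam = 2.0, 1.5          # scale and shape parameters
t = np.linspace(0.1, 5.0, 50)

# Formulas from the text
S = np.exp(-(t / theta) ** gam)                                              # survival
f = (gam / theta) * (t / theta) ** (gam - 1) * np.exp(-(t / theta) ** gam)   # PDF

# scipy's weibull_min: c = shape (our gamma), scale = our theta
dist = stats.weibull_min(c=gam, scale=theta)
assert np.allclose(S, dist.sf(t))
assert np.allclose(f, dist.pdf(t))

# Mean via the gamma function matches scipy's mean
assert np.isclose(theta * gamma_fn(1 + 1 / gam), dist.mean())
```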
<p>where the gamma function <img src="https://latex.codecogs.com/png.latex?%5CGamma(z)"> is defined as <img src="https://latex.codecogs.com/png.latex?%5CGamma(z)%20=%20%5Cint_0%5E%5Cinfty%20t%5E%7Bz-1%7D%20e%5E%7B-t%7D%20dt"> for <img src="https://latex.codecogs.com/png.latex?z%20%3E%200">. The mode of the Weibull distribution can be computed by finding the value of <img src="https://latex.codecogs.com/png.latex?t"> that maximizes the PDF, which gives us:</p>
<p><img src="https://latex.codecogs.com/png.latex?t_%7B%5Ctext%7Bmode%7D%7D%20=%20%5Ctheta%20%5Cleft(%5Cfrac%7B%5Cgamma%20-%201%7D%7B%5Cgamma%7D%5Cright)%5E%7B%5Cfrac%7B1%7D%7B%5Cgamma%7D%7D%20%5Cquad%20%5Ctext%7Bfor%20%7D%20%5Cgamma%20%3E%201"></p>
<p>For <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%5Cleq%201">, the PDF is maximized at <img src="https://latex.codecogs.com/png.latex?t%20=%200">, just like the exponential distribution. This makes sense — when the hazard is constant or decreasing, failures are most concentrated near the start.</p>
<p>The median can be computed by solving <img src="https://latex.codecogs.com/png.latex?F(t)%20=%200.5">, which gives us: <img src="https://latex.codecogs.com/png.latex?t_%7B1/2%7D%20=%20%5Ctheta%20(%5Cln(2))%5E%7B%5Cfrac%7B1%7D%7B%5Cgamma%7D%7D"></p>
<p>The scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> has a beautiful interpretation: At time <img src="https://latex.codecogs.com/png.latex?t%20=%20%5Ctheta">, the CDF is <img src="https://latex.codecogs.com/png.latex?F(%5Ctheta)%20=%201%20-%20e%5E%7B-1%7D%20%5Capprox%200.632">. This means that regardless of the shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma">, about 63.2% of the systems will have failed by the time we reach the scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta">. This is a neat property of the Weibull distribution and gives us an intuitive way to interpret the scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta"> — it is the time by which approximately 63.2% of the systems have failed.</p>
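<p>Both the median formula and the 63.2% property take one line each to verify. Here is a quick check with <code>scipy</code> (my illustration, not part of the post's code), sweeping over several shapes to show that the 63.2% property really is shape-independent:</p>

```python
import numpy as np
from scipy import stats

theta = 3.0
for gam in [0.5, 1.0, 2.0, 5.0]:
    d = stats.weibull_min(c=gam, scale=theta)
    # F(theta) = 1 - e^{-1} ≈ 0.632, regardless of the shape parameter
    assert np.isclose(d.cdf(theta), 1 - np.exp(-1))
    # Median formula: theta * (ln 2)^(1/gamma)
    assert np.isclose(d.median(), theta * np.log(2) ** (1 / gam))
```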
<section id="special-cases-of-the-weibull-distribution" class="level4">
<h4 class="anchored" data-anchor-id="special-cases-of-the-weibull-distribution">Special Cases of the Weibull Distribution</h4>
<p>As we have noted earlier, the Weibull distribution is a unifying family: the two distributions we have seen so far are special cases of it.</p>
<p><strong><img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%201">: The Exponential Distribution.</strong> When the shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> is equal to 1, the hazard function simplifies to <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Cfrac%7B1%7D%7B%5Ctheta%7D">, which is a constant hazard. But we already know that a constant hazard corresponds to the exponential distribution. Thus a Weibull distribution with <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%201"> is equivalent to an exponential distribution with rate parameter <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%20%5Cfrac%7B1%7D%7B%5Ctheta%7D"> or mean <img src="https://latex.codecogs.com/png.latex?E%5BT%5D%20=%20%5Ctheta">. In other words, <img src="https://latex.codecogs.com/png.latex?T%20%5Csim%20%5Ctext%7BWeibull%7D(%5Ctheta,%201)"> is the same as <img src="https://latex.codecogs.com/png.latex?T%20%5Csim%20%5Ctext%7BExponential%7D%5Cleft(%5Cfrac%7B1%7D%7B%5Ctheta%7D%5Cright)">. The physical interpretation of this is that when <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%201">, the system has a constant failure rate over time, which is a hallmark of the exponential distribution. Memorylessness (as we have explored earlier) is a direct consequence of constant hazard. Sudden unexpected failures, lightning strikes and random external shocks are examples of phenomena that can be modeled using the exponential distribution.</p>
<p><strong><img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%202">: The Rayleigh Distribution.</strong> When the shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> is equal to 2, the hazard function simplifies to <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Cfrac%7B2%7D%7B%5Ctheta%5E2%7D%20t">, which is a linearly increasing hazard. We have already seen that a linearly increasing hazard <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Clambda%20t"> corresponds to the Rayleigh distribution. Matching coefficients gives <img src="https://latex.codecogs.com/png.latex?%5Clambda%20=%20%5Cfrac%7B2%7D%7B%5Ctheta%5E2%7D">, so <img src="https://latex.codecogs.com/png.latex?T%20%5Csim%20%5Ctext%7BWeibull%7D(%5Ctheta,%202)"> is the same as <img src="https://latex.codecogs.com/png.latex?T%20%5Csim%20%5Ctext%7BRayleigh%7D%5Cleft(%5Cfrac%7B2%7D%7B%5Ctheta%5E2%7D%5Cright)">. The physical interpretation is that when <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20=%202">, the system has a failure rate that increases linearly over time — the older it gets, the more dangerous the next instant. Wear-out failures, fatigue in materials, and aging processes are phenomena that can be modeled using the Rayleigh distribution.</p>
<p><strong><img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%3C%201">: Decreasing Hazard.</strong> When the shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> is less than 1, the hazard function decreases over time. This models systems with high early failure rates that stabilize over time — think of manufacturing defects that cause early failures, while the units that survive are robust. This is sometimes called <strong>infant mortality</strong>. As we will see in the next section, decreasing hazard is just one phase of a richer failure pattern known as the <strong>bathtub curve</strong>.</p>
<p><strong><img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%3E%201">: Increasing Hazard.</strong> The hazard function increases over time. This models systems that wear out — the longer they run, the more likely they are to fail. This is the most common scenario in reliability engineering.</p>
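<p>The two exact equivalences above can be confirmed numerically. One caveat in the sketch below (my illustration, assuming <code>scipy</code>): <code>scipy</code> parameterizes the Rayleigh by the sigma of the underlying normals, so a Weibull with shape 2 and scale theta corresponds to <code>scale = theta / sqrt(2)</code> there:</p>

```python
import numpy as np
from scipy import stats

theta = 2.0
t = np.linspace(0.01, 10.0, 200)

# gamma = 1: Weibull(theta, 1) is Exponential with mean theta
assert np.allclose(stats.weibull_min.pdf(t, c=1, scale=theta),
                   stats.expon.pdf(t, scale=theta))

# gamma = 2: Weibull(theta, 2) is Rayleigh with sigma = theta / sqrt(2)
assert np.allclose(stats.weibull_min.pdf(t, c=2, scale=theta),
                   stats.rayleigh.pdf(t, scale=theta / np.sqrt(2)))
```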
<p>Let us look at what the Weibull distribution looks like for different values of <img src="https://latex.codecogs.com/png.latex?%5Cgamma"> and a fixed scale parameter <img src="https://latex.codecogs.com/png.latex?%5Ctheta%20=%201">.</p>
<div id="cell-fig-weibull" class="cell" data-execution_count="3">
<div class="cell-output cell-output-display">
<div id="fig-weibull" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-weibull-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://madhavpr191221.github.io/blog/posts/part2-distributions-in-survival-analysis/index_files/figure-html/fig-weibull-output-1.png" width="1718" height="469" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-weibull-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;3: Weibull distribution: effect of shape parameter γ on survival function, density, and hazard (θ = 1)
</figcaption>
</figure>
</div>
</div>
</div>
</section>
</section>
<section id="the-bathtub-curve-a-common-failure-pattern-in-real-life" class="level3">
<h3 class="anchored" data-anchor-id="the-bathtub-curve-a-common-failure-pattern-in-real-life">The Bathtub Curve: A Common Failure Pattern in Real Life</h3>
<p>Unfortunately, real-world failures are often more complex than a single parametric distribution like the Weibull can capture. One common pattern observed in many systems is the <strong>bathtub curve</strong>, which describes a failure rate with three distinct phases:</p>
<ol type="1">
<li><p><strong>Phase 1: Infant Mortality (Decreasing Hazard, <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%3C%201">).</strong> In the early life of a system, there is a high failure rate due to manufacturing defects, installation errors, or weak components. Systems that survive this phase are typically more robust and have a lower failure rate.</p></li>
<li><p><strong>Phase 2: Useful Life (Constant Hazard, <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%5Capprox%201">).</strong> After the initial phase, the failure rate stabilizes and remains relatively constant. This is the “useful life” phase, where failures are mostly random and not due to wear-out. This is the exponential regime: memorylessness is a good approximation here.</p></li>
<li><p><strong>Phase 3: Wear-Out (Increasing Hazard, <img src="https://latex.codecogs.com/png.latex?%5Cgamma%20%3E%201">).</strong> As the system ages, components wear out and the failure rate increases. Failures become more likely as time goes on — the older the system, the more dangerous the next instant.</p></li>
</ol>
<p>Here is a schematic of the bathtub curve:</p>
<div id="cell-fig-bathtub" class="cell" data-execution_count="4">
<div class="cell-output cell-output-display">
<div id="fig-bathtub" class="quarto-float quarto-figure quarto-figure-center anchored">
<figure class="quarto-float quarto-float-fig figure">
<div aria-describedby="fig-bathtub-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
<img src="https://madhavpr191221.github.io/blog/posts/part2-distributions-in-survival-analysis/index_files/figure-html/fig-bathtub-output-1.png" width="852" height="374" class="figure-img">
</div>
<figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-bathtub-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
Figure&nbsp;4: The bathtub curve: three phases of failure
</figcaption>
</figure>
</div>
</div>
</div>
<p>The Weibull distribution can model each phase of the bathtub curve individually by adjusting the shape parameter <img src="https://latex.codecogs.com/png.latex?%5Cgamma">. However, a single Weibull cannot capture all three phases simultaneously — the hazard can only be monotonically increasing, decreasing, or constant. To model the full bathtub curve, reliability engineers often use a mixture of Weibull distributions or more flexible models that allow the hazard to change shape over time. This is an active area of research in reliability engineering and survival analysis, and we will revisit it in later parts of this series.</p>
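<p>One simple way to get a bathtub-shaped hazard is to treat the three phases as independent competing failure modes, whose hazards add. The sketch below is my illustration of that idea, with parameters chosen purely to produce the shape:</p>

```python
import numpy as np

def weibull_hazard(t, theta, gam):
    """Weibull hazard: (gam / theta) * (t / theta)^(gam - 1)."""
    return (gam / theta) * (t / theta) ** (gam - 1)

t = np.linspace(0.01, 10.0, 500)

# Independent competing failure modes: the hazards add.
h = (weibull_hazard(t, 1.0, 0.5)       # infant mortality (gamma < 1)
     + 0.05                            # random shocks (constant hazard)
     + weibull_hazard(t, 8.0, 4.0))    # wear-out (gamma > 1)

# The combined hazard is high early, low in mid-life, high late: a bathtub.
assert h[0] > h[len(h) // 2] and h[-1] > h[len(h) // 2]
```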
</section>
<section id="whats-next" class="level3">
<h3 class="anchored" data-anchor-id="whats-next">What’s Next?</h3>
<p>We now have a vocabulary of distributions to model different shapes of hazard functions: the exponential distribution for constant hazard, the Rayleigh distribution for linearly increasing hazard, and the Weibull distribution for power-law hazard. We have also seen how the Weibull unifies these special cases and provides a flexible framework for modeling a wide range of failure patterns. But a distribution, unfortunately, is not a model. In <strong>Part 3</strong>, we ask: given a dataset of failure times (some censored, some not), how do we fit a distribution to the data? How do we estimate its parameters? We will extend the maximum likelihood estimation framework to handle censored observations — the key ingredient that makes survival analysis different from standard statistical inference. We will revisit familiar diagnostic tools like the Q-Q plot and the K-S test to assess the quality of our fits. We will also write Python code to fit these distributions to real (synthetic) data using the library <code>lifelines</code>. Make no mistake: the series is still very mathematical, but we will also have plenty of code and practical examples to keep things grounded. The clock is still ticking.</p>
</section>
<section id="a-note-on-agentic-predictive-maintenance" class="level3">
<h3 class="anchored" data-anchor-id="a-note-on-agentic-predictive-maintenance">A Note on Agentic Predictive Maintenance</h3>
<p>The mathematical framework we are building in this series is not purely academic. Modern predictive maintenance systems are increasingly agentic, or at least striving to be — AI agents that continuously monitor machine health, estimate survival probabilities in real time, and autonomously trigger maintenance actions before failures occur. The survival function <img src="https://latex.codecogs.com/png.latex?S(t)"> and the hazard function <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)"> are the core quantities these agents reason about. An agent that knows a machine’s hazard rate is spiking can schedule maintenance, reroute workloads, or escalate to a human operator — all without being explicitly programmed for every failure scenario. We will dedicate a future part of this series to this intersection of survival analysis and agentic AI. For now, keep this application in the back of your mind as we build the mathematical foundations. And no — the mathematics is not going anywhere. Every agent decision we discuss will be grounded in the theory and proofs we are building right now.</p>


</section>
</section>

 ]]></description>
  <category>survival analysis</category>
  <category>statistics</category>
  <category>python</category>
  <guid>https://madhavpr191221.github.io/blog/posts/part2-distributions-in-survival-analysis/</guid>
  <pubDate>Fri, 17 Apr 2026 18:30:00 GMT</pubDate>
</item>
<item>
  <title>Part 1: Introduction to Survival Analysis</title>
  <dc:creator>Madhav Prashanth Ramachandran</dc:creator>
  <link>https://madhavpr191221.github.io/blog/posts/part-1-why-survival-analysis-exists/</link>
  <description><![CDATA[ 





<div class="callout callout-style-default callout-note callout-titled">
<div class="callout-header d-flex align-content-center">
<div class="callout-icon-container">
<i class="callout-icon"></i>
</div>
<div class="callout-title-container flex-fill">
Note
</div>
</div>
<div class="callout-body-container callout-body">
<p><strong>A note on platform and process.</strong> Part 0 of this series was published on <a href="https://medium.com/@madhavwas/a-survival-guide-to-survival-analysis-1ed6faaf8fea">Medium</a>. Starting from Part 1, I have moved to Quarto on GitHub Pages for one reason: proper LaTeX rendering. A series this mathematical deserves a math-native home.</p>
<p>This article is not AI generated. I used Claude for proofreading, LaTeX syntax, and occasional structural feedback. Every derivation, example, and word is mine.</p>
</div>
</div>
<p>Time = 0 has passed. There’s a machine fresh off the assembly line, there’s a patient that got a second shot at life after a surgery, there’s a customer who just signed up with your services. The question is not whether — machines fail, patients die, customers leave. The question is <strong>when</strong>, and what we can say about that when, given everything we know. The branch of statistics that deals with answering this and other related questions is called survival analysis.</p>
<p>If you are trained in basic statistics and machine learning methods, your first instinct would probably be to reach for regression. Time is a continuous variable and (linear) regression — the workhorse of statistics — is the best fit for dealing with continuous variables. Right? No.&nbsp;Wrong. Here are the reasons regression breaks.</p>
<p><strong>1.</strong> Regression does not place any constraints on the range of the response variable whereas the time to failure is always non-negative.</p>
<p><strong>2.</strong> In the linear regression setup, the fitted value is an estimate of the conditional expectation of the response given the covariates, i.e., <img src="https://latex.codecogs.com/png.latex?E%5BY%20%5Cmid%20X%5D">. For survival analysis, conditional expectation is sometimes not the right quantity to estimate. In a clinical setting, you might care about “What fraction of patients survive beyond 10 years post surgery?” In a predictive maintenance setting, an engineer might ask “At what rate are machines failing after 10,000 hours of operation?” These are fundamentally different questions from “what is the average time to failure given these covariates?” — and regression, by construction, can only answer the latter.</p>
<p><strong>3.</strong> And here comes the most important part. Imagine you are studying the time-to-failure of 10 machines. You run the study for 10,000 hours and stop. At the end of your study, 6 machines have failed and you know their exact failure times. The 4 machines that survived are, well, still running and all you know is that their survival time is greater than 10,000. If you naively regress on the six machines that failed and discard the ones that did not, your estimate of the time to failure is biased downward — because the machines you discarded are precisely the most durable ones in your sample. There is a name for this situation which we will get to very shortly.</p>
<section id="the-setup" class="level2">
<h2 class="anchored" data-anchor-id="the-setup">The Setup</h2>
<p>From here on, let’s fix our language. We’ll talk about the lifetime <img src="https://latex.codecogs.com/png.latex?T"> of a system — a machine, a patient, a customer — and the results will be general enough to apply to all of them.</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?T"> be the lifetime of a system — a non-negative continuous random variable with Probability Density Function (PDF) <img src="https://latex.codecogs.com/png.latex?f"> and Cumulative Distribution Function (CDF) <img src="https://latex.codecogs.com/png.latex?F">, which is the probability that the system’s lifetime is at most <img src="https://latex.codecogs.com/png.latex?t"> units. <img src="https://latex.codecogs.com/png.latex?%0AF(t)%20=%20P(T%20%5Cleq%20t)%0A"></p>
<p>The Survival Function of <img src="https://latex.codecogs.com/png.latex?T">, denoted <img src="https://latex.codecogs.com/png.latex?S(t)">, is the probability that the system survives at least <img src="https://latex.codecogs.com/png.latex?t"> units of time:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AS(t)%20=%20P(T%20%3E%20t)%0A"></p>
<p>The CDF and the Survival Function are related by the following simple relationship:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AS(t)%20+%20F(t)%20=%201%0A"></p>
</section>
<section id="censoring" class="level2">
<h2 class="anchored" data-anchor-id="censoring">Censoring</h2>
<p>Censoring is the reason survival analysis exists as a separate field of statistics. It is a situation where the exact time-to-event is unknown for the subject. All we know is that the failure had not occurred by the time we stopped observing — either because the study ended, or the subject was lost to follow-up. Going back to our machine example, the four machines that are still running after 10,000 hours are censored machines.</p>
<p>Censoring has three types.</p>
<ol type="1">
<li><strong>Right censoring</strong> — the most common. The study ends before the event of interest occurs. The four machines that are still running after 10,000 hours are right censored.</li>
<li><strong>Left censoring</strong> — The failure happened before you started observing. You inspect a machine for the first time and find it has already failed — you know failure occurred, but not when. Or a patient comes in with a disease that has already progressed to a certain stage — you know the disease started but not when.</li>
<li><strong>Interval censoring</strong> — You don’t know when the system failed but it happened between two inspection times.</li>
</ol>
<p>We will work with right censored data for pretty much the entire 12-part series.</p>
<blockquote class="blockquote">
<p><strong>Censoring vs.&nbsp;Missing Data.</strong> A missing value tells you nothing about a variable whereas a censored value conveys partial but concrete information. A machine that survived beyond 10,000 hours tells you exactly that — it lasted <em>at least</em> that long. That lower bound is real, and throwing it away is a statistical crime.</p>
</blockquote>
<p>Here is a formal mathematical setup of right censored data. We have <img src="https://latex.codecogs.com/png.latex?n"> independent and identically distributed samples of the form <img src="https://latex.codecogs.com/png.latex?(Y_i,%20%5Cdelta_i)"> where <img src="https://latex.codecogs.com/png.latex?Y_i%20=%20%5Cmin(T_i,%20C_i)"> and <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%20%5Cmathbb%7B1%7D%5C%7BT_i%20%5Cleq%20C_i%5C%7D">. Here <img src="https://latex.codecogs.com/png.latex?T_i"> is the lifetime of the <img src="https://latex.codecogs.com/png.latex?i">-th system and <img src="https://latex.codecogs.com/png.latex?C_i"> is the censoring time for that system. <img src="https://latex.codecogs.com/png.latex?%5Cdelta"> is an indicator variable which is 1 if failure happened before censoring, 0 otherwise. <img src="https://latex.codecogs.com/png.latex?Y_i"> is the observed time, which is the minimum of the true lifetime and the censoring time.</p>
<p>You never observe <img src="https://latex.codecogs.com/png.latex?T_i"> and <img src="https://latex.codecogs.com/png.latex?C_i"> separately — only their minimum and whether the event got there first. Here’s what you know about the <img src="https://latex.codecogs.com/png.latex?i">-th system based on the observed data:</p>
<ul>
<li>If <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%201">, failure happened before the study ended. You know exactly when.</li>
<li>If <img src="https://latex.codecogs.com/png.latex?%5Cdelta_i%20=%200">, the unit was still alive when you stopped watching. You only know that <img src="https://latex.codecogs.com/png.latex?T_i%20%3E%20Y_i">.</li>
</ul>
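<p>This data-generating mechanism is easy to simulate. The sketch below is my illustration (not the post's code): it assumes exponential lifetimes and administrative censoring at 10,000 hours, mirroring the machine example:</p>

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10

T = rng.exponential(scale=8_000, size=n)   # latent lifetimes (never fully observed)
C = np.full(n, 10_000.0)                   # administrative censoring at 10,000 hours

Y = np.minimum(T, C)                       # observed time
delta = (T <= C).astype(int)               # 1 = failure observed, 0 = censored

for y, d in zip(Y, delta):
    print(f"Y = {y:8.1f}  delta = {d}")
```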
<p>Here is what a survival analysis dataset looks like in the real world; each row is one machine.</p>
<table class="caption-top table">
<caption>Survival data for 10 machines. <img src="https://latex.codecogs.com/png.latex?%5Cdelta%20=%201"> indicates failure observed; <img src="https://latex.codecogs.com/png.latex?%5Cdelta%20=%200"> indicates censoring.</caption>
<thead>
<tr class="header">
<th style="text-align: center;">Machine ID</th>
<th style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?Y%20=%20%5Cmin(T,%20C)"> (hours)</th>
<th style="text-align: center;"><img src="https://latex.codecogs.com/png.latex?%5Cdelta"></th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td style="text-align: center;">M01</td>
<td style="text-align: center;">2,341</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="even">
<td style="text-align: center;">M02</td>
<td style="text-align: center;">10,000</td>
<td style="text-align: center;">0</td>
</tr>
<tr class="odd">
<td style="text-align: center;">M03</td>
<td style="text-align: center;">7,823</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="even">
<td style="text-align: center;">M04</td>
<td style="text-align: center;">10,000</td>
<td style="text-align: center;">0</td>
</tr>
<tr class="odd">
<td style="text-align: center;">M05</td>
<td style="text-align: center;">1,205</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="even">
<td style="text-align: center;">M06</td>
<td style="text-align: center;">9,441</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="odd">
<td style="text-align: center;">M07</td>
<td style="text-align: center;">10,000</td>
<td style="text-align: center;">0</td>
</tr>
<tr class="even">
<td style="text-align: center;">M08</td>
<td style="text-align: center;">4,678</td>
<td style="text-align: center;">1</td>
</tr>
<tr class="odd">
<td style="text-align: center;">M09</td>
<td style="text-align: center;">10,000</td>
<td style="text-align: center;">0</td>
</tr>
<tr class="even">
<td style="text-align: center;">M10</td>
<td style="text-align: center;">6,102</td>
<td style="text-align: center;">1</td>
</tr>
</tbody>
</table>
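<p>Using the numbers from this table, we can see the downward bias from discarding censored machines that reason 3 warned about. A quick illustration with <code>numpy</code>:</p>

```python
import numpy as np

Y = np.array([2341, 10000, 7823, 10000, 1205, 9441, 10000, 4678, 10000, 6102])
delta = np.array([1, 0, 1, 0, 1, 1, 0, 1, 0, 1])

# Naive estimate: average only the machines that failed
naive_mean = Y[delta == 1].mean()

# Even treating censored times as if they were failure times (also biased
# downward, since those machines lasted *longer* than 10,000 hours) gives
# a much larger number — the censored machines are the durable ones.
mean_with_censored = Y.mean()

print(naive_mean, mean_with_censored)  # 5265.0 7159.0
```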
</section>
<section id="the-hazard-function" class="level2">
<h2 class="anchored" data-anchor-id="the-hazard-function">The Hazard Function</h2>
<p>We have established that <img src="https://latex.codecogs.com/png.latex?S(t)"> tells us the probability of surviving past time <img src="https://latex.codecogs.com/png.latex?t">. But consider a different and more pointed question: given that a system has already survived until time <img src="https://latex.codecogs.com/png.latex?t_0">, how likely is it to fail in the next small instant?</p>
<p>Let <img src="https://latex.codecogs.com/png.latex?%5CDelta%20t"> be a small time interval. The probability that a system which has survived at least <img src="https://latex.codecogs.com/png.latex?t_0"> units of time fails in <img src="https://latex.codecogs.com/png.latex?%5Bt_0,%20t_0%20+%20%5CDelta%20t%5D"> is, by the definition of conditional probability:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AP(t_0%20%5Cleq%20T%20%5Cleq%20t_0%20+%20%5CDelta%20t%20%5Cmid%20T%20%3E%20t_0)%20=%20%5Cfrac%7BP(t_0%20%3C%20T%20%5Cleq%20t_0%20+%20%5CDelta%20t)%7D%7BP(T%20%3E%20t_0)%7D%20=%20%5Cfrac%7BF(t_0%20+%20%5CDelta%20t)%20-%20F(t_0)%7D%7BS(t_0)%7D%0A"></p>
<p>Let us call this quantity <img src="https://latex.codecogs.com/png.latex?G(t_0)">. Dividing both sides by <img src="https://latex.codecogs.com/png.latex?%5CDelta%20t">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BG(t_0)%7D%7B%5CDelta%20t%7D%20=%20%5Cfrac%7B1%7D%7BS(t_0)%7D%20%5Ccdot%20%5Cfrac%7BF(t_0%20+%20%5CDelta%20t)%20-%20F(t_0)%7D%7B%5CDelta%20t%7D%0A"></p>
<p>Now, where have you seen an expression of the form <img src="https://latex.codecogs.com/png.latex?(f(a%20+%20h)%20-%20f(a))%20/%20h"> where <img src="https://latex.codecogs.com/png.latex?h"> is small? That’s right — calculus. As <img src="https://latex.codecogs.com/png.latex?%5CDelta%20t%20%5Cto%200">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BF(t_0%20+%20%5CDelta%20t)%20-%20F(t_0)%7D%7B%5CDelta%20t%7D%20%5Cto%20f(t_0)%0A"></p>
<p>because the derivative of the CDF is the PDF. So in the limit:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Clim_%7B%5CDelta%20t%20%5Cto%200%7D%20%5Cfrac%7BG(t_0)%7D%7B%5CDelta%20t%7D%20=%20%5Cfrac%7Bf(t_0)%7D%7BS(t_0)%7D%0A"></p>
<p>This quantity has a name. It is the <strong>hazard function</strong> of <img src="https://latex.codecogs.com/png.latex?T"> evaluated at <img src="https://latex.codecogs.com/png.latex?t_0">, denoted <img src="https://latex.codecogs.com/png.latex?%5Clambda(t_0)">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cboxed%7B%5Clambda(t_0)%20=%20%5Cfrac%7Bf(t_0)%7D%7BS(t_0)%7D%7D%0A"></p>
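<p>To see the limit at work numerically, here is a small sketch (an illustration, not part of the derivation; the Exponential lifetime with rate 0.001 per hour is an arbitrary choice) showing that the quotient above converges to <img src="https://latex.codecogs.com/png.latex?f(t_0)/S(t_0)"> as the interval shrinks:</p>

```python
import math

# Assumed model for illustration: Exponential lifetime, rate 0.001 per hour.
RATE = 0.001

def F(t):  # CDF: probability of failure by time t
    return 1.0 - math.exp(-RATE * t)

def S(t):  # survival function: probability of surviving past t
    return math.exp(-RATE * t)

def f(t):  # PDF: derivative of the CDF
    return RATE * math.exp(-RATE * t)

t0 = 5000.0
hazard_exact = f(t0) / S(t0)  # lambda(t0) = f(t0)/S(t0) = 0.001 here

for dt in (100.0, 10.0, 1.0, 0.01):
    # G(t0)/dt = [F(t0 + dt) - F(t0)] / [S(t0) * dt]
    quotient = (F(t0 + dt) - F(t0)) / (S(t0) * dt)
    print(dt, quotient)  # approaches hazard_exact = 0.001 as dt shrinks
```

For the Exponential distribution the quotient settles at the same constant regardless of <img src="https://latex.codecogs.com/png.latex?t_0"> — a preview of the constant-hazard property discussed in the next part.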
<p>Now I know what you’re thinking: the math is straightforward, but why in the world would I care about this silly expression?</p>
<section id="what-does-lambdat-actually-mean" class="level3">
<h3 class="anchored" data-anchor-id="what-does-lambdat-actually-mean">What does <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)"> actually mean?</h3>
<p>Recall that for a continuous random variable, probabilities over small intervals can be approximated using the PDF:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0AP(t_0%20%5Cleq%20T%20%5Cleq%20t_0%20+%20%5CDelta%20t)%20=%20%5Cint_%7Bt_0%7D%5E%7Bt_0%20+%20%5CDelta%20t%7D%20f(x)%5C,%20dx%20%5Capprox%20f(t_0)%20%5Ccdot%20%5CDelta%20t%0A"></p>
<p>So the conditional probability of failure in <img src="https://latex.codecogs.com/png.latex?%5Bt_0,%20t_0%20+%20%5CDelta%20t%5D"> becomes:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cfrac%7BP(t_0%20%5Cleq%20T%20%5Cleq%20t_0%20+%20%5CDelta%20t)%7D%7BS(t_0)%7D%20%5Capprox%20%5Cfrac%7Bf(t_0)%20%5Ccdot%20%5CDelta%20t%7D%7BS(t_0)%7D%20=%20%5Clambda(t_0)%20%5Ccdot%20%5CDelta%20t%0A"></p>
<p>This is the key insight. <img src="https://latex.codecogs.com/png.latex?%5Clambda(t_0)%20%5Ccdot%20%5CDelta%20t"> is approximately the conditional probability that a machine which has survived until <img src="https://latex.codecogs.com/png.latex?t_0"> will fail in the next <img src="https://latex.codecogs.com/png.latex?%5CDelta%20t"> hours. This is why <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)"> is called the <strong>rate of failure per unit time</strong> — it tells you, at every moment, how risky the next instant is for a system that has made it this far.</p>
<p>Let us clear up any confusion between the probability density function <img src="https://latex.codecogs.com/png.latex?f(t)"> and the hazard function <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)">. The probability density function <img src="https://latex.codecogs.com/png.latex?f(t)"> is the <strong>unconditional density</strong> of failure at <img src="https://latex.codecogs.com/png.latex?t">, whereas <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)"> conditions on survival up to <img src="https://latex.codecogs.com/png.latex?t">. To be explicit, <img src="https://latex.codecogs.com/png.latex?f(t)"> is a probability <em>density</em>, not a probability. <img src="https://latex.codecogs.com/png.latex?f(5000)%20=%200.0001"> does not mean “the probability of failure at exactly 5000 hours is 0.0001.” It means the probability of failure in a small interval around 5000 hours is approximately <img src="https://latex.codecogs.com/png.latex?f(5000)%20%5Ccdot%20%5CDelta%20t%20=%0A0.0001%20%5Ccdot%20%5CDelta%20t">. Here is a concrete example to illustrate the difference.</p>
<p>Consider two machines — call them <strong>Machine A</strong> and <strong>Machine B</strong>. At <img src="https://latex.codecogs.com/png.latex?t_0%20=%205000"> hours, both have the same unconditional failure density: <img src="https://latex.codecogs.com/png.latex?f(5000)%20=%200.0001">. A naive reading suggests they are equally “failure-prone” at this moment. They are not.</p>
<ul>
<li><strong>Machine A</strong> has <img src="https://latex.codecogs.com/png.latex?S(5000)%20=%200.9"> — 90% of such machines survive to 5000 hours. Reaching this point is unremarkable.</li>
<li><strong>Machine B</strong> has <img src="https://latex.codecogs.com/png.latex?S(5000)%20=%200.1"> — only 10% of such machines survive to 5000 hours. This machine is a true survivor against the odds.</li>
</ul>
<p>Their hazard rates at <img src="https://latex.codecogs.com/png.latex?t_0%20=%205000">:</p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Clambda_A(5000)%20=%20%5Cfrac%7Bf(5000)%7D%7BS_A(5000)%7D%20=%20%5Cfrac%7B0.0001%7D%7B0.9%7D%20%5Capprox%200.000111%0A"></p>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Clambda_B(5000)%20=%20%5Cfrac%7Bf(5000)%7D%7BS_B(5000)%7D%20=%20%5Cfrac%7B0.0001%7D%7B0.1%7D%20=%200.001%0A"></p>
<p>Machine B’s hazard rate is <strong>9 times higher</strong> than Machine A’s at the exact same moment, despite having the same <img src="https://latex.codecogs.com/png.latex?f(5000)">. The reason is simple: Machine B has survived longer than 90% of its kind. The few that remain are under severe stress — and the hazard function knows this. <img src="https://latex.codecogs.com/png.latex?f(t)"> does not. This is the power of conditioning on survival. The hazard function carries information that the PDF simply cannot.</p>
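<p>The Machine A versus Machine B arithmetic above, spelled out in a few lines of code:</p>

```python
# Same unconditional failure density at t0 = 5000 hours for both machines,
# but very different survival probabilities (numbers from the example above).
f_t0 = 0.0001          # shared failure density at t0
S_A, S_B = 0.9, 0.1    # survival probabilities at t0

hazard_A = f_t0 / S_A  # lambda_A(5000) ~ 0.000111
hazard_B = f_t0 / S_B  # lambda_B(5000) = 0.001

print(hazard_B / hazard_A)  # ratio is 9 (up to float rounding)
```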
</section>
<section id="another-way-to-write-lambdat-an-introduction-to-the-cumulative-hazard-function" class="level3">
<h3 class="anchored" data-anchor-id="another-way-to-write-lambdat-an-introduction-to-the-cumulative-hazard-function">Another way to write <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)">, an introduction to the cumulative hazard function</h3>
<p>The hazard function can also be expressed purely in terms of the survival function. Starting from the definition <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Cfrac%7Bf(t)%7D%7BS(t)%7D%0A">, we rewrite the PDF in terms of the survival function.</p>
<p>The survival function is related to the CDF by <img src="https://latex.codecogs.com/png.latex?S(t)%20=%201%20-%20F(t)">. Differentiating both sides with respect to <img src="https://latex.codecogs.com/png.latex?t"> gives us <img src="https://latex.codecogs.com/png.latex?f(t)%20=%20-%5Cfrac%7Bd%7D%7Bdt%7DS(t)">.</p>
<p>Substituting this into the hazard function: <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20%5Cfrac%7B-%5Cfrac%7Bd%7D%7Bdt%7DS(t)%7D%7BS(t)%7D%20=%20-%5Cfrac%7B1%7D%7BS(t)%7D%20%5Ccdot%20%5Cfrac%7Bd%7D%7Bdt%7DS(t)"> By the chain rule, <img src="https://latex.codecogs.com/png.latex?%5Cfrac%7Bd%7D%7Bdt%7D%5Clog%20S(t)%20=%20%5Cfrac%7B1%7D%7BS(t)%7D%20%5Ccdot%20%5Cfrac%7Bd%7D%7Bdt%7DS(t)">, so this can be rewritten as: <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20-%5Cfrac%7Bd%7D%7Bdt%7D%5Clog%20S(t)"></p>
<p>This expression shows that the hazard function is the negative derivative of the logarithm of the survival function. Integrating both sides from <img src="https://latex.codecogs.com/png.latex?0"> to <img src="https://latex.codecogs.com/png.latex?t">, and using the fact that <img src="https://latex.codecogs.com/png.latex?S(0)%20=%201"> so that <img src="https://latex.codecogs.com/png.latex?%5Clog%20S(0)%20=%200">, gives us:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cboxed%7B%5CLambda(t)%20=%20%5Cint_0%5Et%20%5Clambda(u)%5C,du%20=%20-%5Clog%20S(t)%7D"></p>
<p>This integral is known as the <strong>cumulative hazard function</strong>, often denoted by <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)"> or <img src="https://latex.codecogs.com/png.latex?H(t)">. This relationship between the cumulative hazard function and the survival function is fundamental in survival analysis. We can express the survival function in terms of the cumulative hazard function as follows:</p>
<p><img src="https://latex.codecogs.com/png.latex?%5Cboxed%7BS(t)%20=%20e%5E%7B-%5CLambda(t)%7D%7D"></p>
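<p>We can check the identity <img src="https://latex.codecogs.com/png.latex?S(t)%20=%20e%5E%7B-%5CLambda(t)%7D"> numerically. As a sketch (the linearly increasing hazard <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)%20=%20ct"> and the value of <img src="https://latex.codecogs.com/png.latex?c"> are illustrative choices; this hazard shape reappears later as the Rayleigh distribution), we integrate the hazard with the trapezoidal rule and compare against the closed form <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)%20=%20ct%5E2/2">:</p>

```python
import math

c = 1e-6  # arbitrary slope for the illustrative hazard lambda(t) = c*t

def hazard(t):
    return c * t

def cumulative_hazard(t, steps=10_000):
    # Lambda(t) = integral from 0 to t of lambda(u) du, trapezoidal rule
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * h) for i in range(1, steps))
    return total * h

t = 2000.0
S_numeric = math.exp(-cumulative_hazard(t))  # S(t) = exp(-Lambda(t))
S_exact = math.exp(-c * t**2 / 2)            # closed form Lambda(t) = c t^2 / 2
print(S_numeric, S_exact)                    # both ~ exp(-2) ~ 0.1353
```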
</section>
<section id="what-does-the-cumulative-hazard-function-lambdat-mean" class="level3">
<h3 class="anchored" data-anchor-id="what-does-the-cumulative-hazard-function-lambdat-mean">What does the cumulative hazard function <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)"> mean?</h3>
<p>The cumulative hazard function <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)"> can be interpreted as the accumulated risk of failure or death up to time <img src="https://latex.codecogs.com/png.latex?t"> since the start of observation. Think of it as a “risk score” that increases over time. The higher <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)"> is, the more likely the system is to fail by time <img src="https://latex.codecogs.com/png.latex?t">.</p>
<p>However, the most important consequence of this relationship is that the survival probability decays exponentially in the cumulative hazard: if <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)"> grows linearly over time, <img src="https://latex.codecogs.com/png.latex?S(t)"> decays exponentially, and if <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)"> grows faster, <img src="https://latex.codecogs.com/png.latex?S(t)"> decays faster still. This exponential relationship is the bridge between the hazard function (which can be estimated from data, as we will see in later sections) and the survival function (which is often what engineers or clinicians actually care about).</p>
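<p>A quick sketch of the linear-cumulative-hazard case (the constant hazard rate of 0.001 per hour is a hypothetical choice): when <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)"> grows linearly, the fraction surviving any fixed window is the same no matter when the window starts — the signature of exponential decay.</p>

```python
import math

rate = 0.001  # hypothetical constant hazard, so Lambda(t) = rate * t is linear

def S(t):
    return math.exp(-rate * t)  # S(t) = exp(-Lambda(t))

# Survival ratio over a 500-hour window, starting at different times:
for start in (0.0, 1000.0, 5000.0):
    print(S(start + 500.0) / S(start))  # always exp(-0.5) ~ 0.6065
```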
</section>
<section id="the-unified-framework-of-survival-analysis" class="level3">
<h3 class="anchored" data-anchor-id="the-unified-framework-of-survival-analysis">The Unified Framework of Survival Analysis</h3>
<p>We have met three key functions in survival analysis: the PDF <img src="https://latex.codecogs.com/png.latex?f(t)">, the survival function <img src="https://latex.codecogs.com/png.latex?S(t)">, and the hazard function <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)">. These functions are not independent of each other. Know any one of them — you know them all.</p>
<p><img src="https://latex.codecogs.com/png.latex?f(t)%20%5Clongleftrightarrow%20S(t)%20%5Clongleftrightarrow%20%5CLambda(t)%20%5Clongleftrightarrow%20%5Clambda(t)"></p>
<p>Here’s how:</p>
<p><strong>From <img src="https://latex.codecogs.com/png.latex?f(t)">:</strong> <img src="https://latex.codecogs.com/png.latex?S(t)%20=%201%20-%20%5Cint_0%5Et%20f(u)%5C,du,%20%5Cqquad%20%5Clambda(t)%20=%20%5Cfrac%7Bf(t)%7D%7BS(t)%7D"></p>
<p><strong>From <img src="https://latex.codecogs.com/png.latex?S(t)">:</strong> <img src="https://latex.codecogs.com/png.latex?f(t)%20=%20-%5Cfrac%7Bd%7D%7Bdt%7DS(t),%20%5Cqquad%20%5Clambda(t)%20=%20-%5Cfrac%7Bd%7D%7Bdt%7D%5Clog%20S(t)"></p>
<p><strong>From <img src="https://latex.codecogs.com/png.latex?%5Clambda(t)">:</strong> <img src="https://latex.codecogs.com/png.latex?%5CLambda(t)%20=%20%5Cint_0%5Et%20%5Clambda(u)%5C,du,%20%5Cqquad%20S(t)%20=%20e%5E%7B-%5CLambda(t)%7D,%20%5Cqquad%20f(t)%20=%20%5Clambda(t)%5Ccdot%20e%5E%7B-%5CLambda(t)%7D"></p>
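<p>The conversion chain above can be sketched in code. Starting from an assumed hazard function (a constant 0.002 per hour, chosen only for illustration), we recover the cumulative hazard, the survival function, and the PDF:</p>

```python
import math

def hazard(t):
    return 0.002  # assumed constant hazard for illustration

def cumulative_hazard(t, steps=10_000):
    # Lambda(t) = integral from 0 to t of lambda(u) du, trapezoidal rule
    h = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))
    total += sum(hazard(i * h) for i in range(1, steps))
    return total * h

def survival(t):
    return math.exp(-cumulative_hazard(t))  # S(t) = exp(-Lambda(t))

def pdf(t):
    return hazard(t) * survival(t)          # f(t) = lambda(t) * exp(-Lambda(t))

t = 800.0
print(survival(t))  # exp(-1.6) ~ 0.2019
print(pdf(t))       # 0.002 * exp(-1.6) ~ 0.000404
```

Knowing only the hazard, we reconstructed everything else — which is precisely why models like Cox regression can afford to work with the hazard alone.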
<p>In survival analysis, we can choose to work with any of these functions depending on the context and the question at hand. The Cox Proportional Hazards model, for example, is a regression model that directly models the hazard function. The Kaplan-Meier estimator is a non-parametric estimator of the survival function. The choice of which function to work with is often guided by the nature of the data and the specific research question being addressed.</p>
</section>
<section id="whats-next" class="level3">
<h3 class="anchored" data-anchor-id="whats-next">What’s next?</h3>
<p>We now have the language of survival analysis. We know what the key functions are and how they relate to each other. In the next part, we will meet some commonly used distributions in survival analysis and see how each is characterized purely by its hazard function. We will start with the simplest: the distribution with a constant hazard function, the well-known <strong>Exponential distribution</strong>, and we will understand its memorylessness property and its consequences. We will then move on to a linearly increasing hazard, which gives the <strong>Rayleigh distribution</strong>, and a decreasing hazard, exemplified by the <strong>Pareto distribution</strong>. We will see how the shape of the hazard function dictates the shape of the survival curve and what that means in real life. And finally, we will meet the workhorse of survival analysis and reliability engineering — the <strong>Weibull distribution</strong>, which can model a wide variety of hazard shapes and is used in a huge range of applications. We will also see how to estimate the parameters of these distributions from data and how to use them for prediction. The clock is still ticking. See you in Part 2.</p>


</section>
</section>

 ]]></description>
  <category>survival analysis</category>
  <category>statistics</category>
  <guid>https://madhavpr191221.github.io/blog/posts/part-1-why-survival-analysis-exists/</guid>
  <pubDate>Fri, 03 Apr 2026 18:30:00 GMT</pubDate>
  <media:content url="https://madhavpr191221.github.io/blog/posts/part-1-why-survival-analysis-exists/cover.png" medium="image" type="image/png" height="86" width="144"/>
</item>
</channel>
</rss>
