Commit: modified

Demi-wlw committed Jul 28, 2024
1 parent 0525bf5 commit 8910fa7
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions _posts/2023-03-19-ChatGPT.md
@@ -92,9 +92,9 @@ GPT has been a major breakthrough in natural language processing and the version
The term _generative pre-training_ represents the unsupervised pre-training of the generative model.<d-footnote>They used a multi-layer Transformer decoder to produce an output distribution over target tokens.</d-footnote> Given an unsupervised corpus of tokens $\mathcal{U} = (u_1,\dots,u_n)$, they use a standard language modelling objective to maximize the following likelihood:
{: .text-justify}

-$
+$$
L_1(\mathcal{U})=\sum_i\log P(u_i\mid u_{i-k},\dots,u_{i-1};\Theta)
-$
+$$

where $k$ is the size of the context window, and the conditional probability $P$ is modelled using a neural network with parameters $\Theta$ trained using stochastic gradient descent. **Intuitively, we train the Transformer-based model to predict the next token within the $k$-context window using unlabeled text from which we also extract the latent features $h$.**
{: .text-justify}
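
For a concrete reading of the objective above, here is a minimal Python sketch (not part of the original post). The function `l1_likelihood`, the `model` callable, and the uniform toy model are illustrative assumptions; `model` stands in for any network with parameters $\Theta$ that maps a context to a next-token distribution.

```python
import math
from typing import Callable, Sequence

def l1_likelihood(
    tokens: Sequence[int],
    model: Callable[[Sequence[int]], Sequence[float]],
    k: int,
) -> float:
    """Return L1(U) = sum_i log P(u_i | u_{i-k}, ..., u_{i-1})."""
    total = 0.0
    for i in range(1, len(tokens)):
        context = tokens[max(0, i - k):i]  # at most k preceding tokens
        probs = model(context)             # hypothetical: distribution over the vocabulary
        total += math.log(probs[tokens[i]])
    return total

# Toy check with a uniform model over a 50-token vocabulary:
# each of the 3 predicted tokens contributes log(1/50).
uniform = lambda context: [1.0 / 50] * 50
print(l1_likelihood([3, 7, 7, 1], uniform, k=2))  # 3 * log(1/50) ≈ -11.736
```

In practice the model is the multi-layer Transformer decoder mentioned in the footnote, and the sum is maximized with stochastic gradient descent rather than evaluated as here.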
