From 8910fa7a791f1d4373a03f4d8e5db67bd4022231 Mon Sep 17 00:00:00 2001
From: Demi-wlw <88136271+Demi-wlw@users.noreply.github.com>
Date: Sun, 28 Jul 2024 19:27:41 +0100
Subject: [PATCH] modified

---
 _posts/2023-03-19-ChatGPT.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_posts/2023-03-19-ChatGPT.md b/_posts/2023-03-19-ChatGPT.md
index f565fc6bc271..8db98592bde4 100644
--- a/_posts/2023-03-19-ChatGPT.md
+++ b/_posts/2023-03-19-ChatGPT.md
@@ -92,9 +92,9 @@ GPT has been a major breakthrough in natural language processing and the version
 The term _generative pre-training_ represents the unsupervised pre-training of the generative model.They used a multi-layer Transformer decoder to produce an output distribution over target tokens. Given an unsupervised corpus of tokens $\mathcal{U} = (u_1,\dots,u_n)$, they use a standard language modelling objective to maximize the following likelihood:
 {: .text-justify}
 
-$
+$$
 L_1(\mathcal{U})=\sum_i\log P(u_i\mid u_{i-k},\dots,u_{i-1};\Theta)
-$
+$$
 
 where $k$ is the size of the context window, and the conditional probability $P$ is modelled using a neural network with parameters $\Theta$ trained using stochastic gradient descent. **Intuitively, we train the Transformer-based model to predict the next token within the $k$-context window using unlabeled text from which we also extract the latent features $h$.**
 {: .text-justify}
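
The hunk above fixes the delimiters around the pre-training objective $L_1(\mathcal{U})=\sum_i\log P(u_i\mid u_{i-k},\dots,u_{i-1};\Theta)$. For readers of this patch, the following is a minimal sketch of that objective, assuming PyTorch; it uses a toy embedding-plus-linear predictor over a fixed $k$-token context rather than the multi-layer Transformer decoder the post describes, and all sizes and names (`ContextWindowLM`, `vocab_size`, `emb_dim`) are hypothetical.

```python
# Sketch of L1(U) = sum_i log P(u_i | u_{i-k}, ..., u_{i-1}; Theta),
# maximized by stochastic gradient descent. Toy model, not GPT itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, k, emb_dim = 50, 3, 16             # hypothetical sizes
tokens = torch.randint(0, vocab_size, (200,))  # stand-in for the unlabeled corpus U

class ContextWindowLM(nn.Module):
    """Toy next-token predictor over a fixed k-token context (not a Transformer)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(k * emb_dim, vocab_size)

    def forward(self, context):             # context: (batch, k)
        h = self.embed(context).flatten(1)  # latent features h
        return self.out(h)                  # unnormalized log-probabilities over the vocabulary

model = ContextWindowLM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Build (context, next-token) pairs: predict u_i from u_{i-k}, ..., u_{i-1}.
contexts = torch.stack([tokens[i - k:i] for i in range(k, len(tokens))])
targets = tokens[k:]

logits = model(contexts)
log_probs = F.log_softmax(logits, dim=-1)
L1 = log_probs[torch.arange(len(targets)), targets].sum()  # the likelihood objective

(-L1).backward()  # maximizing L1 is minimizing -L1
opt.step()
print(f"L1 = {L1.item():.2f}")
```

In practice the sum is computed over mini-batches of contexts rather than the whole corpus at once, which is what makes the gradient descent "stochastic".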