From 8910fa7a791f1d4373a03f4d8e5db67bd4022231 Mon Sep 17 00:00:00 2001
From: Demi-wlw <88136271+Demi-wlw@users.noreply.github.com>
Date: Sun, 28 Jul 2024 19:27:41 +0100
Subject: [PATCH] modified

---
 _posts/2023-03-19-ChatGPT.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/_posts/2023-03-19-ChatGPT.md b/_posts/2023-03-19-ChatGPT.md
index f565fc6bc271..8db98592bde4 100644
--- a/_posts/2023-03-19-ChatGPT.md
+++ b/_posts/2023-03-19-ChatGPT.md
@@ -92,9 +92,9 @@ GPT has been a major breakthrough in natural language processing and the version
 The term _generative pre-training_ represents the unsupervised pre-training of the generative model.They used a multi-layer Transformer decoder to produce an output distribution over target tokens. Given an unsupervised corpus of tokens $\mathcal{U} = (u_1,\dots,u_n)$, they use a standard language modelling objective to maximize the following likelihood:
 {: .text-justify}
 
-$
+$$
 L_1(\mathcal{U})=\sum_i\log P(u_i\mid u_{i-k},\dots,u_{i-1};\Theta)
-$
+$$
 
 where $k$ is the size of the context window, and the conditional probability $P$ is modelled using a neural network with parameters $\Theta$ trained using stochastic gradient descent. **Intuitively, we train the Transformer-based model to predict the next token within the $k$-context window using unlabeled text from which we also extract the latent features $h$.**
 {: .text-justify}
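
The hunk above fixes the delimiters around the pre-training objective $L_1(\mathcal{U})=\sum_i\log P(u_i\mid u_{i-k},\dots,u_{i-1};\Theta)$. For readers of this patch, the following is a minimal sketch of that objective, assuming PyTorch; it uses a toy embedding-plus-linear predictor over a fixed $k$-token context rather than the multi-layer Transformer decoder the post describes, and all sizes and names (`ContextWindowLM`, `vocab_size`, `emb_dim`) are hypothetical.

```python
# Sketch of L1(U) = sum_i log P(u_i | u_{i-k}, ..., u_{i-1}; Theta),
# maximized by stochastic gradient descent. Toy model, not GPT itself.
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, k, emb_dim = 50, 3, 16             # hypothetical sizes
tokens = torch.randint(0, vocab_size, (200,))  # stand-in for the unlabeled corpus U

class ContextWindowLM(nn.Module):
    """Toy next-token predictor over a fixed k-token context (not a Transformer)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(k * emb_dim, vocab_size)

    def forward(self, context):             # context: (batch, k)
        h = self.embed(context).flatten(1)  # latent features h
        return self.out(h)                  # unnormalized log-probabilities over the vocabulary

model = ContextWindowLM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Build (context, next-token) pairs: predict u_i from u_{i-k}, ..., u_{i-1}.
contexts = torch.stack([tokens[i - k:i] for i in range(k, len(tokens))])
targets = tokens[k:]

logits = model(contexts)
log_probs = F.log_softmax(logits, dim=-1)
L1 = log_probs[torch.arange(len(targets)), targets].sum()  # the likelihood objective

(-L1).backward()  # maximizing L1 is minimizing -L1
opt.step()
print(f"L1 = {L1.item():.2f}")
```

In practice the sum is computed over mini-batches of contexts rather than the whole corpus at once, which is what makes the gradient descent "stochastic".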