How to prepare a dataset to train a chatbot LLM #1203
-
Hi everyone! Newbie here, so perhaps the question sounds very stupid. Let's assume I have a dataset prepared for a chatbot that includes function calling. How should I feed this dataset to the LLM? Since each conversation has multiple rounds, should we generate an individual training sample from the beginning of the conversation up to each round, or should we have only one training sample per conversation, where the input is the full conversation except the last LLM response, and the output is that last LLM response? Thank you in advance!
Replies: 1 comment
-
You should do the former! Each assistant message becomes one training sample, with everything before it as the input. You can check out the ShareGPT or Vicuna datasets to see examples of this!
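In case it helps other newcomers, here is a minimal sketch of the "former" approach: expanding one multi-round conversation into several samples, one per assistant reply. The message format (role/content dicts) and the `get_weather` function name are just placeholders for illustration, not part of any specific library's API:

```python
def conversation_to_samples(conversation):
    """For each assistant message, emit one (input, output) sample:
    the input is every message before it, the output is that message."""
    samples = []
    for i, message in enumerate(conversation):
        if message["role"] == "assistant":
            samples.append({
                "input": conversation[:i],     # full history up to this turn
                "output": message["content"],  # the reply the model should learn
            })
    return samples

# Hypothetical two-round conversation with a function call.
conversation = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": 'call get_weather(city="Paris")'},
    {"role": "user", "content": "And tomorrow?"},
    {"role": "assistant", "content": 'call get_weather(city="Paris", day="tomorrow")'},
]

samples = conversation_to_samples(conversation)
# One sample per assistant turn: this conversation yields 2 samples,
# and the second sample's input contains the first three messages.
```

This way the model sees every intermediate assistant reply (including function calls) as a target, not just the final one, which is what the per-round expansion buys you over one sample per conversation.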