The Application of RAG in Langchain Framework in Classical Chinese

Bibliographic Details
Main Authors: Liu, Zhi Hao; Leong, Wai Yie
Format: Article
Language: English
Published: INTI International University 2025
Online Access: http://eprints.intimal.edu.my/2163/
http://eprints.intimal.edu.my/2163/1/ij2025_22.pdf
http://eprints.intimal.edu.my/2163/2/710
Description
Summary: Currently, the world's mainstream Large Language Models (LLMs) offer significantly less support for Chinese than for English, which makes it difficult to use generative LLMs to produce high-quality works of traditional Chinese literature. This paper proposes a method for creating a data source. The method interprets words according to their extended meanings, that is, the related meanings that one sense of a word gives rise to as the language develops. A word segmentation tool then separates the different meanings of each word, which re-quantifies the nouns, verbs, stories and histories in classical Chinese. The advantage of quantifying the data in this way is that it can effectively solve the problem of polysemy and enhance the logical correlation between contexts. The results show that the correlation between the generated classical Chinese and the real results is greatly improved. We use the Retrieval Augmented Generation (RAG) method to obtain these results at the least cost, without retraining a new LLM.
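
The retrieval idea in the summary can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the glossary entries, the field names, the function names `retrieve` and `build_prompt`, and the use of character-overlap cosine similarity in place of the word segmentation tool and the LangChain retriever the authors describe. It only shows how storing one entry per sense of a polysemous word lets the retriever pass the matching sense to an LLM as context, with no retraining.

```python
from collections import Counter
import math

# Hypothetical sense-split data source: one entry per meaning of a word.
# The example phrases are well-known classical lines chosen for illustration.
SENSE_ENTRIES = [
    {"word": "兵", "sense": "weapon", "example": "收天下之兵"},
    {"word": "兵", "sense": "soldier", "example": "赵亦盛设兵以待秦"},
    {"word": "兵", "sense": "warfare", "example": "兵者国之大事"},
]


def cosine(a, b):
    """Cosine similarity between two character-count vectors."""
    dot = sum(a[ch] * b[ch] for ch in a if ch in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, entries, k=1):
    """Return the k entries whose example text is most similar to the query.

    Assumption: character overlap stands in for a real embedding model.
    """
    query_vec = Counter(query)
    ranked = sorted(
        entries,
        key=lambda e: cosine(query_vec, Counter(e["example"])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, entries, k=1):
    """Prepend the retrieved sense entries to the task as context."""
    hits = retrieve(query, entries, k)
    context = "\n".join(
        f'{e["word"]} ({e["sense"]}): {e["example"]}' for e in hits
    )
    return f"Context:\n{context}\n\nTask: {query}"


if __name__ == "__main__":
    print(build_prompt("兵者国之大事", SENSE_ENTRIES))
```

In a real pipeline the prompt returned by `build_prompt` would be sent to an existing LLM, so only the data source changes while the model stays fixed.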