Beijing’s latest attempt to control how artificial intelligence informs Chinese internet users has arrived in the form of a chatbot trained on the thoughts of President Xi Jinping, the Financial Times reported.
The country’s newest large language model has been learning from its leader’s political philosophy, known as “Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era”, as well as other official literature provided by the Cyberspace Administration of China.
“The expertise and authority of the corpus ensures the professionalism of the generated content,” the CAC’s magazine said in a social media post on Monday about the new LLM.
The effort to ensure AI understands Xi’s philosophy comes as Chinese officials navigate balancing the country’s draconian controls on free speech with fostering AI development and creating rivals to the likes of OpenAI’s ChatGPT.
For now, the new model is being used at a research centre under the powerful internet regulator, but it may eventually be released for wider use, according to a person involved in the project. The model can answer questions, create reports, summarise information and translate between Chinese and English, the post said.
The creation of the LLM follows extensive efforts by Chinese officials to disseminate Xi’s ideas on politics, economics and culture in a variety of formats.
More than a dozen books have been published in Xi’s name and his bestsellers typically take centre stage at book fairs in the country. Popular news apps from companies such as Tencent and NetEase reserve slots at the top of user feeds for articles from official media, most of the time featuring Xi.
Officials have also required school children as young as 10 to study his political philosophy. They created the Study Xi Strong Nation app to teach and test the country’s roughly 100mn party members on their knowledge. In 2018, his ideas were written into the state constitution.
CAC, which has led the way in issuing rules for generative AI and introduced a licensing regime, mandates that generative AI providers “embody core socialist values” and says generated content cannot “contain any content that subverts state power”. Companies are responsible for their AI output.
This is a particular challenge for model developers because of the relatively sparse Chinese language data sets available to train their LLMs. Most groups train on English language information as well, introducing the potential for generative AI to produce responses that fall foul of China’s speech norms.
Tech giants such as Baidu and Alibaba have ensured their models strictly control generated content related to Xi or other potentially sensitive issues. Both groups’ generative AI chatbots typically ask users to restart chats when pressed about sensitive topics.
To help developers deal with the issue, the Cyber Security Association of China, a non-profit aligned with the CAC, in December released the first public database of 100mn entries of “high-quality and trustworthy data” for groups to use in model training. The training set draws heavily from government regulations and policy documents, state media reports and other official publications, according to portions reviewed by the Financial Times.
One of the dozens of text documents in the data package contains 86,314 mentions of Xi Jinping. “Let us unite more closely around the Party Central Committee with Comrade Xi Jinping at its core,” reads one line.
We must “ensure that in thought, politics, and action, we are always in high alignment with the Party Central Committee with General Secretary Xi Jinping at its core,” says another.