Microsoft (MSFT.US) is teaming up with HarperCollins, a unit of News Corp. (NWS.US), to train AI models on vast amounts of book data.
Microsoft Corp. (MSFT.US) has reached an agreement with HarperCollins, a division of News Corp. (NWS.US), to use the publisher's non-fiction books to train Microsoft's artificial intelligence models and improve their quality and performance, according to people familiar with the matter. The collaboration is limited to using selected older titles for model training and does not involve generating new books, and authors can choose whether to participate.
Specifically, Microsoft hopes to use HarperCollins books in AI models it has not yet announced, expanding its supply of high-quality text and improving the models' accuracy and their ability to provide expertise. Microsoft declined to comment, but HarperCollins confirmed the agreement and said it would "allow limited use of select nonfiction backlist titles for training AI models."
At the same time, HarperCollins emphasized that the agreement is limited in scope, places clear restrictions on model output to respect authors' rights, and leaves the decision to participate with each author.
"Part of our role is to present authors with opportunities for their consideration while protecting the underlying value of their works and our shared revenue and royalty streams," HarperCollins said. "This agreement, with its limited scope and clear guardrails around model output that respect authors' rights, does that."
Tech companies, Microsoft included, have been seeking more high-quality text to train their AI models. They license data ranging from social media posts to news articles to make their programs more accurate and better at answering questions or providing expertise on specific topics.
It's worth noting that News Corp. has previously signed an agreement with OpenAI to allow it to use content from several of its publications. Microsoft has also collaborated with multiple publishers on AI projects.
Earlier this year, Google also reached a $60 million deal with Reddit that allows the search giant to use content from the platform to train its AI models.
However, some publishers have expressed dissatisfaction with AI companies' unauthorized use of content and have filed lawsuits. For example, The New York Times sued OpenAI and Microsoft, alleging copyright infringement.
Microsoft's agreement with HarperCollins marks another step in tech companies' efforts to secure high-quality text for training AI models. How to respect authors' rights while using that material remains a challenge that publishers and tech companies will have to address together.