Microsoft Partners with HarperCollins to Train AI Using Classic Non-Fiction Books
Microsoft (MSFT.US) has reportedly entered into an agreement with HarperCollins, a subsidiary of News Corp-B (NWS.US), to use HarperCollins' non-fiction catalog to train its artificial intelligence models. The collaboration focuses on using selected older books to improve the quality and capabilities of these models. Importantly, the partnership does not involve the creation of new books, and authors retain the discretion to opt into the program.
Microsoft reportedly intends to incorporate HarperCollins' books into AI models that have yet to be announced, expanding its sources of high-quality text and improving the models' precision and command of specialized knowledge. Although Microsoft has not commented, HarperCollins has confirmed the agreement, stating it allows for a "limited use of selected non-fiction older books to train AI models".
HarperCollins has emphasized the limited scope of this agreement, highlighting strict boundaries on model outputs that respect author rights. Authors have the choice of whether or not to participate in the initiative.
“One of our missions is to create opportunities for authors to consider carefully while safeguarding the essential value of their work and shared revenue and royalties,” HarperCollins stated. “The agreement has a limited scope and clearly defines boundaries on works that respect author rights, effectively achieving this goal.”
Technology companies have consistently sought high-quality text sources for training AI models, and Microsoft is no exception. By securing licenses, they draw on data ranging from social media content to news articles to improve the accuracy and knowledge of their models.
This deal marks a significant step for tech companies like Microsoft as they pursue diverse, high-quality text resources to train AI models. However, the challenge remains of balancing the use of these resources against the rights of authors, a matter that publishers and tech firms must navigate together.