Google’s Deal With StackOverflow Is the Latest Proof That AI Giants Will Pay for Data

Last year StackOverflow became one of the first websites to announce it would charge AI giants for access to content used to train chatbots. Now the popular Q&A service for coders has signed up its first customer—Google—in what CEO Prashanth Chandrasekar says is the start of a “meaningful” new stream of revenue.

The deal is significant because it remains unclear how broadly Google and other AI developers will pay for content needed for AI projects. Millions of books and websites have fueled the development of AI systems, but most publishers have not been compensated, and some are suing over what they allege is misuse. Many publishers, including StackOverflow, appear threatened by ChatGPT and other generative AI products, which can answer queries that would have previously sent coders their way.

The deal will see Google’s cloud division use questions and answers from StackOverflow about Google Cloud services to provide coding assistance and technical support through a version of Google’s Gemini chatbot. Google’s cloud computing customers will also be able to ask questions through Google Cloud’s command-line interface. “Their AI may not have all the answers, and so we have a huge ability to help complete that loop,” Chandrasekar says. “We are the biggest place where community knowledge is curated and validated.”

Gemini will summarize answers drawn from StackOverflow in its own words but include the company’s logo, a link back to the original material, and the username of the site contributor who supplied it. The companies plan to demonstrate the system at Google Cloud Next, the search company’s annual cloud conference in April, and launch it soon after.

Chandrasekar says there are no significant restrictions on how Google Cloud can use StackOverflow data, meaning it can be used to train large language models and other AI systems. “Where we want to stand firm on is (nonnegotiable things for us) trust, accuracy, quality, and attribution back to the sources of these AI outputs,” he says.

He declined to say how much StackOverflow is being paid by Google for the data. “This will be a meaningful commercial offering for us in the near term, medium term, and long term,” Chandrasekar says.

Covert Scraping

Google and other AI developers have previously gathered data from StackOverflow and other websites without much notice. As demand for generative AI technologies has surged, and the valuations of the companies developing them have rocketed, the websites supplying the foundational text have begun demanding what they view as their fair share. Fortunately for StackOverflow, prospective customers have heeded the message, Chandrasekar says. “We’re not having to chase people,” he says.

StackOverflow data is particularly beneficial to AI systems that generate computer code, which have proven to be popular with software engineers and a significant source of revenue for Microsoft and OpenAI.

The new StackOverflow deal comes just a week after Google reached a licensing agreement to hoover up data from Reddit, the discussion forums operator, whose content has helped improve chatbots’ ability to converse. Reddit unveiled plans to start charging for data access last year, shortly before StackOverflow did.

Author: showrunner