The open-source AI boom is built on Big Tech’s handouts. How long will it last?

Like the wider open-source community, Pineau and her colleagues believe that transparency should be the norm. “One thing I push my researchers to do is start a project thinking that you want to open-source,” she says. “Because when you do that, it sets a much higher bar in terms of what data you use and how you build the model.”

But there are serious risks, too. Large language models spew misinformation, prejudice, and hate speech. They can be used to mass-produce propaganda or power malware factories. “You have to make a trade-off between transparency and safety,” says Pineau.

For Meta AI, that trade-off might mean some models do not get released at all. For example, if Pineau’s team has trained a model on Facebook user data, then it will stay in house, because the risk of private information leaking out is too great. Otherwise, the team might release the model with a click-through license that specifies it must be used only for research purposes.

This is the approach it took for LLaMA. But within days of its release, someone posted the full model and instructions for running it on the internet forum 4chan. “I still think it was the right trade-off for this particular model,” says Pineau. “But I’m disappointed that people will do this, because it makes it harder to do these releases.”

“We’ve always had strong support from company leadership all the way to Mark [Zuckerberg] for this approach, but it doesn’t come easily,” she says.

The stakes for Meta AI are high. “The potential liability of doing something crazy is a lot lower when you’re a very small startup than when you’re a very large company,” she says. “Right now we release these models to thousands of individuals, but if it becomes more problematic or we feel the safety risks are greater, we’ll close down the circle and we’ll release only to known academic partners who have very strong credentials—under confidentiality agreements or NDAs that prevent them from building anything with the model, even for research purposes.”

If that happens, then many darlings of the open-source ecosystem could find that their license to build on whatever Meta AI puts out next has been revoked. Without LLaMA, open-source models such as Alpaca, Open Assistant, or HuggingChat would not be nearly as good. And the next generation of open-source innovators won't get the leg up the current batch has had.
