The AI industry is trying to undermine the definition of “open source AI”
The Open Source Initiative has published (news article here) its definition of “open source AI,” and it’s terrible. It allows for secret training data and secret mechanisms. It allows for development to be done in secret. Since, for a neural network, the training data is the source code (it’s how the model gets programmed), the definition makes no sense.
And it’s confusing; most “open source” AI models, such as LLAMA, are open source in name only. But the OSI appears to have been co-opted by industry players who want both corporate secrecy and the “open source” label. (Here is one rebuttal to the definition.)
This is worth fighting for. We need a public AI option, and open source (true open source) is a necessary component of that.
But while open source should mean open source, there are some partially open models that need some sort of definition. There is a large field of research on privacy-preserving, federated ML model training, and I think that’s a good thing. And the OSI has a point here:
Why do you allow some training data to be excluded?
Because we want open source AI to exist also in fields where data cannot be legally shared, such as medical AI. Laws that permit training on data often limit the resharing of that same data, to protect copyright or other interests. Privacy rules also give a person the rightful ability to control their most sensitive information, such as decisions about their health. Similarly, much of the world’s Indigenous knowledge is protected through mechanisms that are incompatible with later-developed frameworks for rights exclusivity and sharing.
How about we call it “open weights” instead of “open source”?