But while the open-source movement has spawned a colossal ecosystem that we all depend on, we do not fully understand it, experts like Aitel argue. There are countless software projects, millions of lines of code, numerous mailing lists and forums, and an ocean of contributors whose identities and motivation are often obscure, making it hard to hold them accountable.
That can be dangerous. For example, hackers have quietly inserted malicious code into open-source projects numerous times in recent years. Back doors can long escape detection, and, in the worst case, entire projects have been handed over to bad actors who take advantage of the trust people place in open-source communities and code. Sometimes there are disruptions or even takeovers of the very social networks that these projects depend on. Tracking it all has been mostly—though not entirely—a manual effort, which means it does not match the astronomical size of the problem.
Bratus argues that we need machine learning to digest and comprehend the expanding universe of code—meaning useful tricks like automated vulnerability discovery—as well as tools to understand the community of people who write, fix, implement, and influence that code.
The ultimate goal is to detect and counteract any malicious campaigns to submit flawed code, launch influence operations, sabotage development, or even take control of open-source projects.
To do this, the researchers will use tools such as sentiment analysis to analyze the social interactions within open-source communities such as the Linux kernel mailing list, which should help identify who is being positive or constructive and who is being negative and destructive.
The researchers want insight into what kinds of events and behavior can disrupt or hurt open-source communities, which members are trustworthy, and whether there are particular groups that justify extra vigilance. These answers are necessarily subjective. But right now there are few ways to find them at all.
Experts are worried that blind spots about the people who run open-source software make the whole edifice ripe for potential manipulation and attacks. For Bratus, the primary threat is the prospect of “untrustworthy code” running America’s critical infrastructure—a situation that could invite unwelcome surprises.
Unanswered questions
Here’s how the SocialCyber program works. DARPA has contracted with multiple teams of what it calls “performers,” including small, boutique cybersecurity research shops with deep technical chops.