Subj : CRYPTO-GRAM, November 15, 2024 Part 5
To   : All
From : Sean Rima
Date : Fri Nov 15 2024 04:13 pm

Expect lots of developments in this area over the next few years.

This is what I said in a recent interview:

    Let’s stick with software. Imagine that we have an AI that finds
    software vulnerabilities. Yes, the attackers can use those AIs to break
    into systems. But the defenders can use the same AIs to find software
    vulnerabilities and then patch them. This capability, once it exists,
    will probably be built into the standard suite of software development
    tools. We can imagine a future where all the easily findable
    vulnerabilities (not all the vulnerabilities; there are lots of
    theoretical results about that) are removed in software before
    shipping.

    When that day comes, all legacy code would be vulnerable. But all new
    code would be secure. And, eventually, those software vulnerabilities
    will be a thing of the past. In my head, some future programmer shakes
    their head and says, “Remember the early decades of this century when
    software was full of vulnerabilities? That’s before the AIs found them
    all. Wow, that was a crazy time.” We’re not there yet. We’re not even
    remotely there yet. But it’s a reasonable extrapolation.

EDITED TO ADD: And Google’s LLM just discovered an exploitable zero-day.

** *** ***** ******* *********** ************* IoT Devices in
Password-Spraying Botnet

[2024.11.06] Microsoft is warning Azure cloud users that a Chinese
controlled botnet is engaging in “highly evasive” password spraying. Not
sure about the “highly evasive” part; the techniques seem basically what
you get in a distributed password-guessing attack:

    “Any threat actor using the CovertNetwork-1658 infrastructure could
    conduct password spraying campaigns at a larger scale and greatly
    increase the likelihood of successful credential compromise and initial
    access to multiple organizations in a short amount of time,” Microsoft
    officials wrote. “This scale, combined with quick operational turnover
    of compromised credentials between CovertNetwork-1658 and Chinese
    threat actors, allows for the potential of account compromises across
    multiple sectors and geographic regions.”

    Some of the characteristics that make detection difficult are:

	The use of compromised SOHO IP addresses The use of a rotating set
	of IP addresses at any given time. The threat actors had thousands
	of available IP addresses at their disposal. The average uptime for
	a CovertNetwork-1658 node is approximately 90 days.  The low-volume
	password spray process; for example, monitoring for multiple failed
	sign-in attempts from one IP address or to one account will not
	detect this activity.

** *** ***** ******* *********** ************* Subverting LLM Coders

[2024.11.07] Really interesting research: “An LLM-Assisted Easy-to-Trigger
Backdoor Attack on Code Completion Models: Injecting Disguised
Vulnerabilities against Strong Detection“:

    Abstract: Large Language Models (LLMs) have transformed code completion
    tasks, providing context-based suggestions to boost developer
    productivity in software engineering. As users often fine-tune these
    models for specific applications, poisoning and backdoor attacks can
    covertly alter the model outputs. To address this critical security
    challenge, we introduce CODEBREAKER, a pioneering LLM-assisted backdoor
    attack framework on code completion models. Unlike recent attacks that
    embed malicious payloads in detectable or irrelevant sections of the
    code (e.g., comments), CODEBREAKER leverages LLMs (e.g., GPT-4) for
    sophisticated payload transformation (without affecting
    functionalities), ensuring that both the poisoned data for fine-tuning
    and generated code can evade strong vulnerability detection.
    CODEBREAKER stands out with its comprehensive coverage of
    vulnerabilities, making it the first to provide such an extensive set
    for evaluation. Our extensive experimental evaluations and user studies
    underline the strong attack performance of CODEBREAKER across various
    settings, validating its superiority over existing approaches. By
    integrating malicious payloads directly into the source code with
    minimal transformation, CODEBREAKER challenges current security
    measures, underscoring the critical need for more robust defenses for
    code completion.

Clever attack, and yet another illustration of why trusted AI is essential.

** *** ***** ******* *********** ************* Prompt Injection Defenses
Against LLM Cyberattacks

[2024.11.07] Interesting research: “Hacking Back the AI-Hacker: Prompt
Injection as a Defense Against LLM-driven Cyberattacks“:

    Large language models (LLMs) are increasingly being harnessed to
    automate cyberattacks, making sophisticated exploits more accessible
    and scalable. In response, we propose a new defense strategy tailored
    to counter LLM-driven cyberattacks. We introduce Mantis, a defensive
    framework that exploits LLMs’ susceptibility to adversarial inputs to
    undermine malicious operations. Upon detecting an automated
    cyberattack, Mantis plants carefully crafted inputs into system
    responses, leading the attacker’s LLM to disrupt their own operations
    (passive defense) or even compromise the attacker’s machine (active
    defense). By deploying purposefully vulnerable decoy services to
    attract the attacker and using dynamic prompt injections for the
    attacker’s LLM, Mantis can autonomously hack back the attacker. In our
    experiments, Mantis consistently achieved over 95% effectiveness
    against automated LLM-driven attacks. To foster further research and
    collaboration, Mantis is available as an open-source tool: this https
    URL.

This isn’t the solution, of course. But this sort of thing could be part of
a solution.

** *** ***** ******* *********** ************* AI Industry is Trying to
Subvert the Definition of “Open Source AI”

[2024.11.08] The Open Source Initiative has published (news article here)
its definition of “open source AI,” and it’s terrible. It allows for secret
training data and mechanisms. It allows for development to be done in
secret. Since for a neural network, the training data is the source code --
it’s how the model gets programmed -- the definition makes no sense.

And it’s confusing; most “open source” AI models -- like LLAMA -- are open
source in name only. But the OSI seems to have been co-opted by industry
players that want both corporate secrecy and the “open source” label.
(Here’s one rebuttal to the definition.)
--- 
 * Origin: High Portable Tosser at my node (21:1/229.1)

.