January 15, 2026

-
min read

LLM Security Checklist: Essential Steps for Identifying and Blocking Jailbreak Attempts

Follow a simple checklist to help prevent jailbreaking attempts on your organization’s Large Language Model (LLM) tools.

If your organization uses a private large language model (LLM), then it’s time to start thinking about countermeasures for jailbreaking. A jailbroken LLM can lead to leaked information, compromised devices, or even a large-scale data breach. Even more troubling: Jailbreaking LLMs is often as simple as feeding them a series of clever prompts. If your customers can access your LLM, your potential risk is even higher.

The good news is that protecting your LLM is much like protecting any other digital asset at your organization. You’ll need to manually configure security protocols, monitor user behavior, and occasionally probe your system for weaknesses and shortcomings. With a few smart precautions and some regular maintenance, your organization’s LLM can dispense valuable insights without also dispensing sensitive data.

What is LLM jailbreaking?

In its most basic form, jailbreaking is a way to bypass the limitations of a given piece of hardware or software. For example, jailbreaking mobile devices allows you to install apps that iOS and Android don’t usually allow. Jailbreaking game consoles lets you install pirated games or run unofficial mods. For software, jailbreaking could let a standard user exercise admin privileges or ignore standard cybersecurity restrictions.

LLM jailbreaking can short-circuit ethical guidelines, tricking the tool into generating incendiary content or revealing private information. Threat actors can jailbreak LLMs with both technical attacks and carefully constructed prompts. 

Most widely available LLMs are vulnerable to at least one form of jailbreaking. However, jailbreaking ChatGPT and other public-facing programs shouldn’t have much impact on your organization. As long as you haven’t shared any private data with these tools (and you shouldn’t), the worst a threat actor could do is create realistic social engineering schemes. That’s a risk, but your organization will have to deal with social engineering attempts with or without jailbroken LLMs.

If your organization uses a proprietary LLM, though, jailbreaking can be a big issue.

5 steps to prevent LLM jailbreaking

LLMs are versatile, intuitive, and relatively easy to program. There are plenty of tools that offer private LLMs for businesses, such as Google Gemini or Jasper, but even if your LLM is strictly internal, all it takes is one compromised account to open up a whole new avenue for cyber attacks.

To reduce your LLM jailbreaking risk:

1. Configure robust API protocols

When your LLM accesses organizational data, an application programming interface (API) acts as the go-between. If you want to lock down your LLM, start by locking down your APIs. That might be a complex task, considering that any given LLM could require dozens of APIs to run. The process of securing APIs is also highly dependent on which ones you use, making specific recommendations difficult.

Instead, focus on coming up with a set of API protocols that suit your organization’s LLM. To start:

  • Make a complete list of all the APIs you use
  • Check APIs for known vulnerabilities
  • Limit access privileges and permissions

2. Enforce prompt isolation

In addition to traditional cyber attacks, LLMs are also vulnerable to prompt injection. Jailbreaking AI systems can be as simple as prompting an LLM to ignore its own protocols. Prompt isolation can help LLMs distinguish between user requests and specific system instructions. You can set up rules at the coding level to help an LLM differentiate between prompts, which it can accept or reject, and policies, which it must obey at all times. That way, if a user tells an LLM to ignore all previous instructions or provide sensitive data through a loophole, the prompt won’t go through.

3. Implement output validation

Most users are familiar with AI hallucinations, which happen when LLMs incorrectly parse instructions or misinterpret training data; more severe output errors can happen for similar reasons. An LLM might accidentally provide sensitive data in response to an ambiguous query, or escalate privileges for anyone who claims to be an administrator. Output validation systems help ensure that your LLM returns sensible, accurate, and secure responses to any potential query. Your exact strategy will vary, depending on the size of your organization and the scale of your LLM, but you’ll want to run through a mix of automated tests and manual attempts to “trick” or override your system.

4. Monitor systems continuously

Most AI jailbreaking attempts don’t happen in a vacuum. If someone is attempting to compromise your LLM, they may need dozens or hundreds of prompts to do so. Those prompts also won’t look like normal use-cases. Whereas normal users will ask your LLM questions or give it instructions related to your work, threat actors may attempt to:

  • Trick the LLM into giving up data
  • Include malicious code within a prompt
  • Break the LLM with text overflow or unparseable characters

By continuously monitoring your LLM, you can find and deal with these issues before they do any damage. Keep a record of user prompts and set up alerts for phrases such as “override” or “disregard.” Consider investigating or blocking IP addresses from repeat offenders.

5. Maintain your LLM’s integrity

Left to their own devices, LLMs will degrade over time. This phenomenon is called “model drift,” and it happens due to mismatches between training data and user behavior. There’s no single, definitive cause for model drift, but some contributing factors are:

  • Users providing more complex questions or tasks for an LLM
  • Training data losing relevance over time
  • LLMs “learning” recursively from their own output

Degraded LLMs may be more susceptible to jailbreaking, as their data becomes less reliable and their responses become less predictable. To keep your LLM functioning at peak efficiency, update its databases and test its output frequently. You may also have to create a new LLM from scratch after a while.

Defend your sensitive LLM data

Whether your LLM is public or private, you can be sure that people will be accessing it from both computers and smartphones. To defend your sensitive data across a variety of devices, download The Mobile EDR Playbook: Key Questions for Protecting Your Data from Lookout. This resource helps you answer four key questions about your organization’s mobile data security, from endpoint protection to zero trust principles. As jailbreaking attempts become more sophisticated, a mobile EDR strategy can help keep your private LLM data exactly where it’s supposed to be.

Book a personalized demo today to learn:

  • How adversaries are leveraging avenues outside traditional email to conduct phishing on iOS and Android devices
  • Real-world examples of phishing and app threats that have compromised organizations

Book a personalized, no-pressure demo today to learn:

  • How adversaries are leveraging avenues outside traditional email to conduct phishing on iOS and Android devices
  • Real-world examples of phishing and app threats that have compromised organizations
  • How an integrated endpoint-to-cloud security platform can detect threats and protect your organization

Contact Lookout to
try out Smishing AI

Book a Demo

Discover how adversaries use non-traditional methods for phishing on iOS/Android, see real-world examples of threats, and learn how an integrated security platform safeguards your organization.