CamoText: Offline Text Anonymization Software

March 6, 2025

CamoText is a laptop-friendly desktop application with a simple user interface and no data retention or external communication, ideal for sanitizing text before using it to prompt generative AI and large language models (LLMs).

Background

CamoText was born from a practicing lawyer's [1] firsthand experience with the challenges of maintaining client confidentiality while leveraging generative AI and LLMs for text analysis, research, and form generation. Entirely manual redaction was too time-consuming, but existing cloud-based and API-calling anonymization solutions posed extensive privacy risks and privilege/compliance issues. A new tool was necessary to make text anonymization fast, intuitive, and secure.

Market research showed that every anonymization tool or service surveyed was either hosted on third-party servers (and therefore introduced data security, privacy, and privilege risks) or usable only through a command line or other technically sophisticated interface on sufficiently performant hardware, rather than as a simple download-and-use application.

So, CamoText was built to:

  • be private and compliant by design, with zero external connectivity and zero user data retained
  • be easy to use, with a simple and familiar interface
  • perform well on an average laptop
  • ensure its anonymizations are irreversible by third parties, and
  • ensure the human-in-the-loop can review the auto-detected terms, revert false positives, and manually anonymize additional text. [2]

    AI Privacy Problems Prevent Adoption

    Absent expensive and complex on-premises hardware installations, current personal, enterprise, government, and professional-facing AI services all require users to transmit plaintext to external servers for processing.

    This requirement to send text and data blindly creates understandable hesitation about using AI in the workplace. For government agencies, that hesitation may stem from data privacy laws, agency guidance, and confidentiality policies; for lawyers, from client confidentiality and privilege; for healthcare professionals, from HIPAA compliance; for startups, from the risk of leaking trade secrets and product designs. The list goes on.

    Those hesitations are justified. The amount of personal and confidential information freely given to AI, together with AI’s ability to extrapolate details and repeat user-submitted text, is alarming:

    "Given that LLMs are prone to memorization and can reproduce training data under certain conditions, the presence of such sensitive disclosures in their training corpora raises concerns about regurgitation of PII or sensitive topics in future outputs." [3]

    In other words, providing sensitive data to third parties is problematic not only because of the usual fears of misuse by data controllers or security breaches. LLMs can also regurgitate exact details provided by one user to other users, either unprompted or in response to a query or prompt requesting such information.

    CamoText's Solution

    Secure AI workflows require thoughtful top-to-bottom design and empowerment of the human-in-the-loop, but the first step is to prevent as much sensitive text as possible from ever leaving the source. CamoText ensures the anonymization process is conducted entirely on the user’s computer. The user can then copy and paste the sanitized text anywhere, for example to prompt an LLM of their choice (e.g. ChatGPT or Claude) to analyze the text for a specific purpose or to generate a basic document form from the context.

    The app cannot access the internet, and fully resets when closed.

    Currently, CamoText uses natural language processing [4] and custom pattern-matching to detect and anonymize many categories of text that are commonly considered PII (personally identifiable information) or otherwise commercially sensitive:

  • PERSON: Full or partial names that might identify an individual.
  • EMAIL_ADDRESS: Email addresses.
  • PHONE_NUMBER: Phone numbers.
  • ADDRESS / STREET_ADDRESS: Physical addresses, including address lines and road names.
  • ENTITY: Company or other organization names.
  • MONEY: Monetary amounts (with currency symbols or references, in numerals or words).
  • LOCATION: General location identifiers (e.g. Baltimore, Maryland).
  • CREDIT_CARD: Numerical strings matching common credit card number patterns.
  • ACCOUNT: Account handles, PINs, and names (e.g. @CamoText1).
  • US_SSN: Numerical strings matching the US Social Security number pattern.
  • US_PASSPORT: Strings matching the US passport number pattern.
  • US_BANK_NUMBER: US bank account numbers.
  • US_ROUTING_NUMBER: US bank routing numbers.
  • IBAN_CODE: IBAN banking codes.
  • IP_ADDRESS: Both IPv4 and IPv6 patterns (e.g., 192.168.0.1).
  • MEDICAL_LICENSE: Strings matching common medical license patterns.
  • API_KEY: Strings of characters that match known API credential formats.
  • UUID: Universally Unique Identifier strings (e.g., e-signed document identifiers, GUIDs).
  • CRYPTO_ADDRESS: Cryptocurrency addresses (Bitcoin, Ethereum/EVM, Solana).
  • URL: Website URLs or hyperlink patterns.
  • FILE: Common file types and file paths (e.g., C:\Users\Username\Documents\secret.pdf, HomeSecurityCam.mp4).
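
The pattern-matching half of this list can be illustrated with a minimal Python sketch. It is not CamoText's implementation: the regular expressions, the `PATTERNS` table, the `anonymize` function, and the `[LABEL_n]` placeholder format are all hypothetical, and the simplified patterns cover only four of the categories above. Categories such as PERSON or ENTITY rely on a language model rather than regular expressions and are not shown.

```python
import re

# Hypothetical, deliberately simplified patterns for illustration only.
# They are assumptions, not CamoText's actual recognizers.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 only
    "UUID": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
}


def anonymize(text):
    """Replace every pattern match with a numbered category placeholder.

    Returns the sanitized text and an in-memory mapping of
    (label, original value) -> placeholder.
    """
    mapping = {}
    counters = {}
    for label, pattern in PATTERNS.items():

        def substitute(match, label=label):
            key = (label, match.group(0))
            if key not in mapping:
                # First time this value is seen: assign the next counter.
                counters[label] = counters.get(label, 0) + 1
                mapping[key] = f"[{label}_{counters[label]}]"
            return mapping[key]

        text = pattern.sub(substitute, text)
    return text, mapping


sample = "Contact jane.doe@example.com from 192.168.0.1; SSN 123-45-6789."
sanitized, mapping = anonymize(sample)
print(sanitized)
```

In this sketch the placeholders are plain counters, so they carry no information derived from the original value, and a repeated value receives the same placeholder so that a reader (or an LLM) can still follow references. The mapping exists only in memory for the duration of the call.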

    CamoText is not perfect at detecting and anonymizing sensitive text, because no tool is: whether a given term or passage is confidential is often subjective and depends on the user’s circumstances and preferences.

    It’s imperative that users have an easy way to manually anonymize text that the auto-detection missed or that they deem confidential, privileged, or sensitive, including entire sections or pages. For this reason, CamoText includes a simple highlight-and-anonymize feature that works on text of any length, alongside the auto-detected terms.
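
The human-in-the-loop review described here (accept detections, revert false positives, add manual selections) can be sketched in a few lines of Python. Everything in this example is an assumption made for illustration: the `Span` type, the `apply_review` function, the `MANUAL` label, and the `[LABEL]` placeholder format are hypothetical and do not describe CamoText's internals. The sketch also assumes spans do not overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset
    end: int    # exclusive character offset
    label: str  # category name, e.g. "PERSON" or "MANUAL"


def apply_review(text, detected, rejected=(), manual=()):
    """Redact accepted detections plus manual selections.

    `rejected` holds detections the reviewer marked as false positives;
    `manual` holds spans the reviewer highlighted by hand.
    Assumes the resulting spans do not overlap.
    """
    rejected_set = set(rejected)
    spans = [s for s in detected if s not in rejected_set] + list(manual)
    # Replace from right to left so earlier offsets remain valid.
    for span in sorted(spans, key=lambda s: s.start, reverse=True):
        text = text[:span.start] + f"[{span.label}]" + text[span.end:]
    return text


text = "Alice met Bob at Acme on Monday."
detected = [Span(0, 5, "PERSON"), Span(10, 13, "PERSON"), Span(25, 31, "PERSON")]
false_positive = Span(25, 31, "PERSON")  # "Monday" is not a person
highlighted = Span(17, 21, "MANUAL")     # reviewer selects "Acme" by hand
print(apply_review(text, detected, rejected=[false_positive], manual=[highlighted]))
```

Applying the replacements from the end of the text backwards is a design choice that avoids recomputing offsets after each substitution.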

    Let Code Do the Work

    CamoText is a desktop application for Windows and macOS, available for download now. Email contact@camotext.ai for custom recognizer patterns, implementation consultations, training, and other options available for enterprise.

    Future versions will prioritize user feedback, better detection algorithms, and broader categories of PII while remaining usable on basic computers. As noted on the website, [5] customers receive free access to updates for the year following their purchase, even if the price increases for their version.

    Exact performance depends on the user's machine and other open applications, but on a six-year-old laptop with 8 GB of RAM and an i5 CPU, CamoText analyzed 10,000 words (about 25 pages) in about 2.8 seconds.
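
For readers who want to compare that figure with their own hardware, the arithmetic is simple: 10,000 words in 2.8 seconds is roughly 3,571 words per second. The Python sketch below shows the calculation and a generic way to time any text-processing function; `words_per_second` and `time_call` are hypothetical helper names, and the stand-in workload is not CamoText.

```python
import time


def words_per_second(word_count, seconds):
    """Throughput implied by processing `word_count` words in `seconds`."""
    return word_count / seconds


def time_call(func, text):
    """Return the wall-clock seconds `func(text)` takes to run once."""
    start = time.perf_counter()
    func(text)
    return time.perf_counter() - start


# Figure reported in the article: 10,000 words in about 2.8 seconds.
print(round(words_per_second(10_000, 2.8)))

# Timing a stand-in workload (str.upper is only a placeholder function).
sample_text = "word " * 10_000
elapsed = time_call(str.upper, sample_text)
print(f"placeholder workload took {elapsed:.6f} s")
```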

    What started as a solution for a single legal practice is evolving into a tool for professionals across industries. CamoText is designed to help organizations embrace AI innovation while maintaining high standards of compliance, starting from a position of security and privacy. The human in the loop has ultimate control over what AI is able to read.

    Visit our site to see a demo video, review features, and get in touch!

    camotext.ai


    Endnotes

    1. Varia Law
    2. CamoText: Features
    3. Trust No Bot: Discovering Personal Disclosures in Human-LLM Conversations in the Wild. Section 5.1, p. 13.
    4. Including a bundled local spaCy model. Some of the more tech-savvy readers might wonder: why don’t I just build this myself with my own patterns and Cursor/Windsurf/etc.? Because these tools were already used in building CamoText, and refining and testing the software’s capabilities and interface still took months!
    5. CamoText: Pricing