CamoTextCLI: Automated Anonymization
July 11, 2025
CamoTextCLI takes the zero-egress CamoText anonymization engine and drops it into your terminal or onto your server. No GUI, no internet, uniquely suited for headless tasks: CI/CD pipelines, cron jobs, containerized data processors, and AI agents. If your workflow can launch a shell command, it can privatize text with CamoTextCLI.

How to Automate Redaction and Anonymization
Apps with graphical interfaces are excellent when a human is steering, but organizations and their privacy officers increasingly need automation-first redaction:
- Continuous integration: sanitize docs pushed to a repository before they ever reach cloud storage.
- Serverless workloads: auto-strip PII from incoming documents.
- ChatOps: bots that auto-redact snippets users paste into a channel.
- RAG pipelines: anonymize knowledge-base documents before vectorizing them for retrieval-augmented generation.
A GUI can’t sit in those tight loops that precede human eyes—but a powerful NLP-powered CLI executable can.
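As a rough sketch of the continuous-integration case, a pipeline step can run batch mode and stop the job if CamoTextCLI reports an error, so no upload step ever sees the raw files. Only the camo flags below come from this post. The script name, the `CAMO` override variable, the `sanitize_docs` function, and the assumption that `camo` exits with a non-zero status on failure are all hypothetical.

```shell
#!/usr/bin/env bash
# ci_sanitize.sh: hypothetical CI step; only the camo flags come from the docs.
set -euo pipefail

# Path to the CamoTextCLI binary (assumption: on PATH unless CAMO is set).
CAMO="${CAMO:-camo}"

# sanitize_docs SRC DST: redact every file under SRC into DST, or fail.
sanitize_docs() {
  local src="$1" dst="$2"
  if [[ ! -d "$src" ]]; then
    echo "source folder not found: $src" >&2
    return 1
  fi
  mkdir -p "$dst"
  # Assumption: camo exits non-zero on failure, which stops the pipeline
  # before any later step can publish unredacted files.
  if ! "$CAMO" --input-dir "$src" --output-dir "$dst" --recursive; then
    echo "camo failed; not publishing $dst" >&2
    return 1
  fi
}
```

In a pipeline you would call `sanitize_docs ./docs ./docs_redacted` and point your own upload step only at `./docs_redacted`.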
Feature Highlights
- Headless, cross-platform. No Python runtime required: it ships as a fully bundled executable. Just drop camo.exe (Windows) or camo (macOS/Linux) on disk and go.
- Batch & recursive directory processing with multi-threaded workers for large archives.
- Flexible inputs. Feed a single file, raw text string, or an entire folder tree of mixed formats (TXT, PDF, DOCX, CSV…).
- Config settings for prioritized terms, hash length (set it to 1 for simple masking/redaction; hashing uses SHA-256 under the hood), and any data categories to ignore.
- Entity introspection. List detected categories without altering the file—useful for audits.
- JSON hash key export for reversible re-linking or auditing.
- Zero network calls. Like its GUI sibling, CamoTextCLI never touches an API endpoint or telemetry server, making it ideal for air-gapped hosts.
Quick Start
# Get comprehensive help
camo --help # Windows
./camo --help # macOS/Linux
# One-liner redaction (STDOUT)
camo -i "User last accessed from IP 192.168.0.1, username janedoe@gmail.com"
camo -i log.txt
# Save output + key file
camo --input contract.docx --output sanitized.docx --dump-key audit_key.json
Deeper Usage Patterns
Batch-sanitize an entire client folder
# Recursively process sub-folders
camo --input-dir ./ClientFiles --output-dir ./ClientFilesRedacted --recursive --dump-key batch_key.json
Configure a client- or use-case-specific workflow
Prioritize known sensitive terms so they are always anonymized before anything reaches a remote origin.
# config.json
{
"priority": ["AcmeCorp", "Jane Doe", "Project Varia"],
"hash_length": 10
}
# Run CamoTextCLI with custom config
camo --config config.json -i sensitive.txt -o sanitized.txt
RAG pre-processor for AI agents
Feed every document through CamoTextCLI before embedding:
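# inside ingestion.sh
mkdir -p ./redacted
# NUL-delimited find output keeps file names with spaces intact
find ./docs -name '*.pdf' -print0 | while IFS= read -r -d '' f; do
  camo -i "$f" -o "./redacted/${f##*/}" < /dev/null
done
python embed.py ./redacted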
Sample Command & Flag Reference
Flag | Purpose | Example |
---|---|---|
--config | Specify a JSON config file for priorities, hash length, and ignored categories | --config config.json |
-i / --input | Single file path or raw string | -i notes.txt |
-o / --output | Write redacted file; omit for STDOUT | --output clean.txt |
--input-dir | Input folder for batch mode | --input-dir ./reports |
--output-dir | Destination folder (batch) | --output-dir ./reports_redacted |
--recursive | Descend into sub-folders | --recursive |
--priority | Anonymize specific words first | --priority "Project Zephyr" |
--ignore-category | Leave enumerated categories visible | --ignore-category "LOCATION" |
--dump-key | Write JSON mapping of hashes→original | --dump-key key.json |
--list-entities | Scan & report detected categories, then exit | --list-entities |
Full argument matrix: camo --help
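For the audit use of entity introspection, a loop can run --list-entities on each file in a folder and keep one report per document. This is a sketch that assumes --list-entities can be combined with -i and prints its report to STDOUT. The `audit_entities` function, the `CAMO` variable, and the report naming scheme are hypothetical.

```shell
#!/usr/bin/env bash
# audit_entities.sh: hypothetical helper; report content is whatever camo prints.
set -euo pipefail

CAMO="${CAMO:-camo}"   # CamoTextCLI binary (assumption: on PATH)

# audit_entities SRC REPORT_DIR: write one "<name>.entities.txt" per file in SRC.
audit_entities() {
  local src="$1" reports="$2" f rel out
  mkdir -p "$reports"
  # -print0 with read -d '' keeps file names containing spaces intact.
  while IFS= read -r -d '' f; do
    rel="${f#"$src"/}"                           # path relative to SRC
    out="$reports/${rel//\//__}.entities.txt"    # flatten sub-folders into the name
    # Assumption: --list-entities scans without modifying the file and
    # prints to STDOUT; /dev/null keeps camo from reading the loop's input.
    "$CAMO" -i "$f" --list-entities < /dev/null > "$out"
  done < <(find "$src" -type f -print0)
}
```

For example, `audit_entities ./ClientFiles ./entity_reports` leaves the source files untouched and gives reviewers a folder of per-document reports.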
Example Use Cases
Law Firms & E-Discovery
Run CamoTextCLI as an intake filter so first-pass reviewers never see raw PII; export anonymization keys for later re-linking when privilege checks are complete.
Data-Privacy Officers
Schedule a nightly cron job to strip employee or customer identifiers from log archives before they roll into SIEM or SEC compliance storage.
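A nightly job along those lines might redact only the files that changed in the last day, using the documented -i and -o flags and mirroring the archive's folder layout. This is a sketch: the `redact_recent` function, the `CAMO` variable, and the use of `find -mtime -1` as the "changed today" test are illustrative assumptions.

```shell
#!/usr/bin/env bash
# nightly_redact.sh: hypothetical cron helper; adjust paths to your archive.
set -euo pipefail

# cron runs with a minimal PATH, so an absolute path is safer here.
CAMO="${CAMO:-camo}"

# redact_recent SRC DST: redact files under SRC modified in the last 24 hours.
redact_recent() {
  local src="$1" dst="$2" f rel
  while IFS= read -r -d '' f; do
    rel="${f#"$src"/}"
    mkdir -p "$dst/$(dirname "$rel")"         # mirror the sub-folder layout
    "$CAMO" -i "$f" -o "$dst/$rel" < /dev/null
  done < <(find "$src" -type f -mtime -1 -print0)
}
```

Add a call such as `redact_recent /var/log/archive /srv/redacted_logs` at the end of the script and schedule it with a crontab line like `30 2 * * * /opt/scripts/nightly_redact.sh`. All of these paths are placeholders.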
Private AI Agent Startups
Bundle the CamoTextCLI executable into a Docker image; your agent sanitizes context snippets locally, mitigating privacy headaches when you lean on a public LLM API downstream.
Public-Records Teams
Automate redaction of FOIA packets by pointing --input-dir at a document dump and giving interns only the anonymized output plus the audit key.
Enterprise Customization & Accelerated Pipelines
CamoText's out-of-the-box recognizers cover a broad swath of PII and other sensitive information, but enterprises often track domain-specific tokens, like case IDs, policy numbers, proprietary SKUs, or research subject IDs. Many of these IDs may still be caught by the default recognizers (such as the UUID, CONTACT_NUMBER, or ACCOUNT recognizers), but larger organizations can request a quote from CamoText for custom recognizers.
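To see whether the default recognizers already catch your IDs, you can run a representative sample through CamoTextCLI and search the output for your ID format. The `find_leftover_ids` function and the `CASE-` pattern in the usage line are hypothetical; substitute your own regular expression. This only works on plain-text outputs, since DOCX or PDF files would need text extraction first.

```shell
#!/usr/bin/env bash
# check_ids.sh: hypothetical post-redaction check for domain-specific IDs.
set -euo pipefail

# find_leftover_ids DIR REGEX: print files in DIR whose text still matches REGEX.
# Returns 0 if clean, 1 if any match survived, 2 if grep itself failed.
find_leftover_ids() {
  local dir="$1" pattern="$2" status=0
  # -r recurse, -l list file names only, -E extended regular expressions
  grep -rlE -- "$pattern" "$dir" || status=$?
  case "$status" in
    0) echo "IDs matching '$pattern' survived redaction" >&2; return 1 ;;
    1) return 0 ;;   # grep found no matches: output looks clean
    *) return 2 ;;   # grep error, e.g. the folder does not exist
  esac
}
```

For example: `find_leftover_ids ./sample_redacted 'CASE-[0-9]{6}'`. A non-zero result suggests those IDs are candidates for --priority terms or a custom recognizer.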
Enterprises should also use the --config JSON spec for large-scale, high-volume pipelines and long lists of priorities.
• Consistency: Maintain the same anonymization rules across multiple runs
• Reusability: Share configuration files across team members
• Version Control: Track anonymization settings in source control
• Reduced Command Length: Avoid typing long command lines repeatedly
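For long priority lists, it can be easier to keep terms in a plain text file (one term per line, tracked in version control) and generate the config from it. This sketch writes only the two keys shown in the config.json example earlier, priority and hash_length. The key for ignored categories isn't shown in this post, so it is left out. The `build_config` and `json_escape` functions, the terms.txt name, and the "#" comment convention are hypothetical.

```shell
#!/usr/bin/env bash
# build_config.sh: hypothetical generator for a CamoTextCLI config file.
set -euo pipefail

# json_escape STRING: escape backslashes and double quotes for a JSON string.
# (Control characters are not handled; keep terms to printable text.)
json_escape() {
  local s="$1"
  s="${s//\\/\\\\}"
  s="${s//\"/\\\"}"
  printf '%s' "$s"
}

# build_config TERMS_FILE HASH_LENGTH: print a config with one priority per line.
build_config() {
  local terms_file="$1" hash_length="$2" line sep=""
  printf '{\n  "priority": ['
  # "|| [[ -n $line ]]" also keeps a last line that has no trailing newline
  while IFS= read -r line || [[ -n "$line" ]]; do
    line="${line%$'\r'}"                              # tolerate CRLF files
    [[ -z "$line" || "$line" == \#* ]] && continue    # skip blanks and comments
    printf '%s"%s"' "$sep" "$(json_escape "$line")"
    sep=", "
  done < "$terms_file"
  printf '],\n  "hash_length": %d\n}\n' "$hash_length"
}
```

For example, `build_config terms.txt 10 > config.json` followed by `camo --config config.json -i sensitive.txt -o sanitized.txt`.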
Best Practices
• Human-in-the-loop is still required. NLP is probabilistic; always check the output, and use priority terms to guarantee that must-catch terms are recognized.
• Leverage the user config. CamoTextCLI's user config file options are a powerful way to customize and expedite the anonymization process—prioritized terms are deterministically matched.
• Guard your key files. Sharing the hash map broadly defeats the purpose of anonymization—only save it if you need it.
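When you do need the key on a shared Unix host, one option is to write it into a folder that only your account can open, leaving the redacted output's permissions alone. This is a sketch. The `redact_with_private_key` function and the `CAMO` variable are hypothetical, and it assumes CamoTextCLI writes the key to exactly the path given to --dump-key.

```shell
#!/usr/bin/env bash
# private_key_run.sh: hypothetical wrapper that keeps --dump-key output owner-only.
set -euo pipefail

CAMO="${CAMO:-camo}"   # CamoTextCLI binary (assumption: on PATH)

# redact_with_private_key INPUT OUTPUT KEY_DIR [KEY_NAME]
redact_with_private_key() {
  local input="$1" output="$2" key_dir="$3" key_name="${4:-key.json}"
  mkdir -p "$key_dir"
  chmod 700 "$key_dir"               # only the owner can list or open files inside
  "$CAMO" -i "$input" -o "$output" --dump-key "$key_dir/$key_name"
  chmod 600 "$key_dir/$key_name"     # owner read/write only
}
```

Because the folder is locked down before camo runs, other users cannot reach the key even in the brief window before the final chmod.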
For bespoke recognizers and features, integration consulting, or bulk licensing, email contact@camotext.ai.