Sensitive data protection for LLM requests

Mask sensitive data before it reaches an LLM.

Globesword is a locally deployable filtering gateway that detects configured PII, financial identifiers, credentials, national IDs, and customer-specific sensitive data before an LLM request is sent.

Sensitive values are replaced with request-scoped tokens. Approved values can be restored after inference through a controlled mapping layer.

Local or private deployment
Client-specific policies
Measurable Proof of Value (PoV) results

Request inspection

Globesword Masking Gateway

POLICY ACTIVE

Original request

Contact maya@example.com and transfer funds to IBAN DE89 3704 0044 0532 0130 00.

Payload sent to the LLM

Contact [EMAIL_1] and transfer funds to IBAN [IBAN_1].

Email: masked (exact span)

IBAN: masked (MOD-97 valid)

Restoration is limited to authorized tokens associated with the current request context.
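The example above can be reproduced with a minimal Python sketch. It assumes two simplified detectors: an email regex, and an IBAN regex whose candidates are confirmed with the MOD-97 check. The function names, patterns, and token-numbering rule are illustrative assumptions, not Globesword's implementation. In this sketch, repeated occurrences of the same value reuse one token.

```python
import re

# Simplified detectors assumed for this sketch; production patterns would be stricter.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Country code, two check digits, then groups of four (optionally space-separated).
IBAN_RE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]{4}){2,7}(?: ?[A-Z0-9]{1,3})?\b")


def iban_is_valid(candidate: str) -> bool:
    """MOD-97: move the first four characters to the end, map A-Z to 10-35, expect remainder 1."""
    compact = candidate.replace(" ", "")
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace detected values with typed tokens; return the masked text and the token map."""
    findings = [(m.start(), m.end(), "EMAIL") for m in EMAIL_RE.finditer(text)]
    findings += [
        (m.start(), m.end(), "IBAN")
        for m in IBAN_RE.finditer(text)
        if iban_is_valid(m.group())  # regex proposes, checksum confirms
    ]
    findings.sort()  # this sketch assumes the detectors never overlap

    counters: dict[str, int] = {}
    token_for: dict[tuple[str, str], str] = {}
    mapping: dict[str, str] = {}
    parts, cursor = [], 0
    for start, end, label in findings:
        value = text[start:end]
        if (label, value) not in token_for:  # same value, same token
            counters[label] = counters.get(label, 0) + 1
            token = f"[{label}_{counters[label]}]"
            token_for[(label, value)] = token
            mapping[token] = value
        parts.append(text[cursor:start])
        parts.append(token_for[(label, value)])
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts), mapping


masked, mapping = mask(
    "Contact maya@example.com and transfer funds to IBAN DE89 3704 0044 0532 0130 00."
)
print(masked)  # Contact [EMAIL_1] and transfer funds to IBAN [IBAN_1].
```

The token map stays on the gateway side; only the masked string would be sent to the model.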

Baseline functional validation

Initial evidence, presented with its scope.

These results demonstrate functional correctness for the tested examples. They are not presented as universal production accuracy.

13 baseline scenarios: end-to-end functional tests

33 expected entities: all detected with exact spans

0 observed leaks within the baseline suite

0.154 ms deterministic p95 latency for tested short-text inputs

Client PoVs use larger customer-specific datasets containing positive, negative, malformed, overlap, multilingual, and document-specific test cases.

How it works

A controlled layer between enterprise data and the model.

Globesword is designed to sit in the request path before data is sent to an external or internally hosted LLM.

01

Inspect the request

The gateway scans prompts, messages, and extracted file text before an LLM request is created.

02

Apply layered detection

Deterministic patterns, checksum validators, contextual detection, and tenant rules identify configured data classes.

03

Replace sensitive values

Detected values are replaced with request-scoped typed tokens before the content reaches the target model.

04

Restore approved values

Authorized tokens can be restored after inference through the separate controlled mapping layer.
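The restoration step can be sketched as a per-request token map combined with an allow-list of token types. `TokenVault`, its methods, and the in-memory store are hypothetical; a real mapping layer would also need persistence, access control, and expiry. Tokens that belong to another request, or whose type is not approved, stay masked.

```python
import re
import secrets
from dataclasses import dataclass, field

# Matches typed tokens such as [EMAIL_1] or [API_KEY_2].
TOKEN_RE = re.compile(r"\[([A-Z][A-Z_]*)_(\d+)\]")


@dataclass
class TokenVault:
    """Hypothetical in-memory mapping layer keyed by request ID."""

    _store: dict[str, dict[str, str]] = field(default_factory=dict)

    def open_request(self, mapping: dict[str, str]) -> str:
        """Store the token map produced during masking and return a fresh request ID."""
        request_id = secrets.token_hex(16)
        self._store[request_id] = dict(mapping)
        return request_id

    def restore(self, request_id: str, text: str, allowed_types: set[str]) -> str:
        """Restore only tokens that belong to this request and whose type is approved."""
        mapping = self._store.get(request_id, {})

        def substitute(match: re.Match) -> str:
            token, token_type = match.group(0), match.group(1)
            if token_type in allowed_types and token in mapping:
                return mapping[token]
            return token  # unknown, foreign, or unapproved tokens stay masked

        # Single pass, so restored values are never re-scanned for tokens.
        return TOKEN_RE.sub(substitute, text)

    def close_request(self, request_id: str) -> None:
        """Discard the mapping once the request is finished."""
        self._store.pop(request_id, None)


vault = TokenVault()
rid = vault.open_request(
    {"[EMAIL_1]": "maya@example.com", "[IBAN_1]": "DE89 3704 0044 0532 0130 00"}
)
print(vault.restore(rid, "Reply to [EMAIL_1] about [IBAN_1].", allowed_types={"EMAIL"}))
# Reply to maya@example.com about [IBAN_1].
```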

Detection coverage

Broad detector library. Narrow deployment policy.

The platform includes reusable detectors, but each deployment activates only what is relevant to the customer.

A German manufacturer, a US financial institution, and an Indian healthcare provider should not run the same national identifier policy.

Personal and contact data

Detect common personal identifiers before prompt content leaves the application boundary.

Email addresses
Phone numbers
Dates of birth identified through nearby labels
Usernames and account identifiers

Financial identifiers

Use structural and checksum-aware detection for common financial data.

Payment cards
IBAN and SWIFT/BIC
Bank routing numbers
Cryptocurrency addresses

Government identifiers

Enable only the country-specific identifiers relevant to each deployment.

US SSN, EIN, and ITIN
India Aadhaar and PAN
Brazil CPF and CNPJ
Selected UK and European identifiers

Credentials and secrets

Identify provider-specific credentials and explicitly labelled secrets.

API keys and access tokens
Bearer tokens and JWTs
Passwords and client secrets
Private key blocks
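A minimal sketch of credential detection for these categories follows. The patterns are generic illustrations assumed for this example (provider-specific key formats are not reproduced), and `find_credentials` is a hypothetical helper. Where a pattern has a capture group, only the secret itself is reported, so a keyword such as "Bearer" or "password =" is left in place.

```python
import re

# Illustrative patterns only; provider-specific key formats are not reproduced here.
CREDENTIAL_PATTERNS = {
    "PRIVATE_KEY": re.compile(
        r"-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----.*?-----END (?:[A-Z]+ )?PRIVATE KEY-----",
        re.DOTALL,
    ),
    # JWT: three base64url segments; encoded JSON headers typically begin with "eyJ".
    "JWT": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    # Capture group 1 is the token itself, not the "Bearer" keyword.
    "BEARER": re.compile(r"(?i)\bbearer\s+([A-Za-z0-9._~+/-]+=*)"),
    # Explicitly labelled secrets such as "password = ..." or "api_key: ...".
    "LABELLED_SECRET": re.compile(
        r"(?i)\b(?:password|passwd|client_secret|secret|api_key)\s*[:=]\s*[\"']?([^\s\"']+)"
    ),
}


def find_credentials(text: str) -> list[tuple[int, int, str]]:
    """Return (start, end, label) spans; patterns with a capture group report only that group."""
    findings = []
    for label, pattern in CREDENTIAL_PATTERNS.items():
        group = 1 if pattern.groups else 0
        for m in pattern.finditer(text):
            findings.append((m.start(group), m.end(group), label))
    return sorted(findings)


sample = "Authorization: Bearer abc.def-123\npassword = hunter2"
for start, end, label in find_credentials(sample):
    print(label, sample[start:end])
# BEARER abc.def-123
# LABELLED_SECRET hunter2
```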

Network and infrastructure data

Protect technical identifiers that may expose private systems or environments.

IPv4 and IPv6 addresses
MAC addresses
URL-embedded credentials
Internal identifiers through custom policies
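Network identifiers fit the same propose-then-validate approach. In this sketch, a loose regex proposes IPv4 candidates and Python's standard `ipaddress` module rejects impossible addresses such as `999.1.1.1`. `urllib.parse.urlsplit` isolates the `user:password@` part of a URL so that only the password is reported. IPv6 candidates could be checked the same way with `ipaddress.IPv6Address`. The function names are hypothetical.

```python
import ipaddress
import re
from urllib.parse import urlsplit

# Loose candidate patterns; validation happens afterwards.
IPV4_CANDIDATE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")
URL_CANDIDATE = re.compile(r"\b[a-z][a-z0-9+.-]*://\S+", re.IGNORECASE)


def find_ipv4(text: str) -> list[tuple[int, int, str]]:
    """Report IPv4 candidates that the standard library accepts as real addresses."""
    hits = []
    for m in IPV4_CANDIDATE.finditer(text):
        try:
            ipaddress.IPv4Address(m.group())
        except ValueError:  # e.g. an octet above 255
            continue
        hits.append((m.start(), m.end(), "IPV4"))
    return hits


def find_url_credentials(text: str) -> list[tuple[int, int, str]]:
    """Report only the password inside scheme://user:password@host URLs."""
    hits = []
    for m in URL_CANDIDATE.finditer(text):
        url = m.group()
        netloc = urlsplit(url).netloc
        if "@" not in netloc:
            continue
        userinfo = netloc.rsplit("@", 1)[0]
        if ":" not in userinfo:
            continue
        netloc_start = m.start() + url.index("://") + 3  # netloc follows "://"
        pw_start = netloc_start + userinfo.index(":") + 1
        hits.append((pw_start, netloc_start + len(userinfo), "URL_PASSWORD"))
    return hits


log = "db at postgres://app:s3cret@10.0.0.12:5432/prod, bad 999.1.1.1"
print([log[s:e] for s, e, _ in find_ipv4(log)])             # ['10.0.0.12']
print([log[s:e] for s, e, _ in find_url_credentials(log)])  # ['s3cret']
```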

Customer-defined identifiers

Add organization-specific patterns without retraining the detection model.

Employee IDs
Customer and patient IDs
Matter and project codes
Facility and device identifiers
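Because customer-defined identifiers are pattern-based, adding one can be a configuration change rather than a model change. The sketch below assumes a tenant supplies a label and a regular expression for each identifier. `CustomIdentifier`, both helpers, and the `EMP-` and `MTR-` formats are invented for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomIdentifier:
    """Hypothetical tenant-defined detector: a token label plus a regular expression."""

    label: str
    pattern: str


def compile_custom(defs: list[CustomIdentifier]) -> list[tuple[str, re.Pattern]]:
    """Compile tenant patterns once at load time; no model retraining is involved."""
    return [(d.label, re.compile(d.pattern)) for d in defs]


def detect_custom(text: str, compiled: list[tuple[str, re.Pattern]]) -> list[tuple[int, int, str]]:
    """Run every tenant pattern and return (start, end, label) spans in text order."""
    return sorted(
        (m.start(), m.end(), label)
        for label, regex in compiled
        for m in regex.finditer(text)
    )


# Invented formats for illustration; a real tenant would supply its own.
TENANT_IDENTIFIERS = [
    CustomIdentifier("EMPLOYEE_ID", r"\bEMP-\d{6}\b"),
    CustomIdentifier("MATTER_CODE", r"\bMTR-[A-Z]{2}-\d{4}\b"),
]

compiled = compile_custom(TENANT_IDENTIFIERS)
note = "Assign EMP-004211 to matter MTR-DE-2024 today."
print([(note[s:e], label) for s, e, label in detect_custom(note, compiled)])
# [('EMP-004211', 'EMPLOYEE_ID'), ('MTR-DE-2024', 'MATTER_CODE')]
```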

Technical approach

Built for measurable risk reduction, not absolute claims.

Structured identifiers, contextual entities, and ambiguous data classes require different detection and validation strategies.

Policy-driven detection

Only the detectors relevant to the customer, geography, document type, and workflow need to be enabled.

Checksum validation

Where supported, regex candidates are validated using algorithms such as Luhn, MOD-97, Verhoeff, and identifier-specific checks.
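Two of the algorithms named above can be sketched in a few lines: Luhn, used by payment card numbers, and Verhoeff, used for example in Aadhaar numbers. These are textbook formulations. The helper names are hypothetical, and how Globesword connects them to regex candidates is not specified here. In this sketch, a candidate that fails its checksum would simply not be reported.

```python
# Verhoeff tables: multiplication in the dihedral group D5 and the position permutation.
_D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6], [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8], [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2], [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4], [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
_P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2], [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0], [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5], [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]


def _ascii_digits(value: str) -> list[int]:
    """Keep ASCII digits only, dropping separators such as spaces and hyphens."""
    return [int(c) for c in value if c in "0123456789"]


def luhn_valid(number: str) -> bool:
    """Luhn: double every second digit from the right, subtract 9 above 9, sum must be 0 mod 10."""
    digits = _ascii_digits(number)
    if len(digits) < 2:
        return False
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def verhoeff_valid(number: str) -> bool:
    """Verhoeff: fold digits right to left through the P and D tables; valid if the result is 0."""
    digits = _ascii_digits(number)
    if not digits:
        return False
    check = 0
    for i, d in enumerate(reversed(digits)):
        check = _D[check][_P[i % 8][d]]
    return check == 0


print(luhn_valid("4111 1111 1111 1111"), luhn_valid("4111 1111 1111 1112"))  # True False
print(verhoeff_valid("2363"), verhoeff_valid("2364"))                        # True False
```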

Client-specific tuning

Confidence thresholds, custom identifiers, token prefixes, severity, and active policies can be configured per deployment.
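A per-deployment policy of this kind can be sketched as plain data plus a filter. `Finding`, `DeploymentPolicy`, and every field name are hypothetical. The example profile mirrors the earlier point that a US financial institution should not run India's Aadhaar detector, and it demands higher confidence from a contextual name detector than from structured ones.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Finding:
    """A candidate span reported by some detector, with a confidence score in [0, 1]."""

    start: int
    end: int
    detector: str
    confidence: float


@dataclass
class DeploymentPolicy:
    """Hypothetical per-deployment policy; all field names are illustrative."""

    active_detectors: set[str]
    min_confidence: dict[str, float] = field(default_factory=dict)
    token_prefixes: dict[str, str] = field(default_factory=dict)
    default_min_confidence: float = 0.5

    def admits(self, finding: Finding) -> bool:
        """Keep a finding only if its detector is enabled and it clears that detector's threshold."""
        if finding.detector not in self.active_detectors:
            return False
        threshold = self.min_confidence.get(finding.detector, self.default_min_confidence)
        return finding.confidence >= threshold

    def token_prefix(self, detector: str) -> str:
        """Prefix used when building tokens such as [SSN_1]; defaults to the detector name."""
        return self.token_prefixes.get(detector, detector)


us_bank = DeploymentPolicy(
    active_detectors={"US_SSN", "EMAIL", "PERSON_NAME"},
    min_confidence={"PERSON_NAME": 0.85},
    token_prefixes={"US_SSN": "SSN"},
)
findings = [
    Finding(0, 11, "US_SSN", 1.0),
    Finding(20, 30, "PERSON_NAME", 0.6),  # below the 0.85 name threshold
    Finding(40, 52, "IN_AADHAAR", 1.0),   # detector not enabled for this deployment
]
print([f.detector for f in findings if us_bank.admits(f)])  # ['US_SSN']
```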

Private deployment

The detection layer is designed to run locally, in a private cloud, or inside an enterprise-controlled container environment.

Exact-span masking

The system tracks character offsets so only the identified value is replaced while surrounding text remains intact.
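Exact-span masking depends on two details: resolving overlapping findings, and splicing in replacements without invalidating the remaining offsets. The sketch below assumes a simple precedence rule (earliest start wins, then the longest span), which is an illustrative choice rather than a documented Globesword rule. It replaces spans from right to left so that earlier offsets stay valid. Both function names are hypothetical.

```python
def resolve_overlaps(findings: list[tuple[int, int, str]]) -> list[tuple[int, int, str]]:
    """Keep non-overlapping spans; assumed precedence: earliest start, then longest span."""
    ordered = sorted(findings, key=lambda f: (f[0], -(f[1] - f[0])))
    kept: list[tuple[int, int, str]] = []
    last_end = 0
    for start, end, label in ordered:
        if start >= last_end:  # no overlap with the previously kept span
            kept.append((start, end, label))
            last_end = end
    return kept


def mask_spans(text: str, findings: list[tuple[int, int, str]]) -> str:
    """Replace exactly the kept character ranges; all other text is left untouched."""
    kept = resolve_overlaps(findings)
    counters: dict[str, int] = {}
    planned = []
    for start, end, label in kept:  # number tokens in reading order
        counters[label] = counters.get(label, 0) + 1
        planned.append((start, end, f"[{label}_{counters[label]}]"))
    for start, end, token in reversed(planned):  # splice right to left
        text = text[:start] + token + text[end:]
    return text


print(mask_spans("Card 4111 1111 1111 1111 exp 12/29.", [(5, 9, "NUMBER"), (5, 24, "CARD")]))
# Card [CARD_1] exp 12/29.
```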

Measurable evaluation

PoV results are reported using precision, recall, F1, exact-span accuracy, leakage, restoration accuracy, and latency.
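One common convention for span-level scoring is strict matching: a prediction counts as a true positive only if its start, end, and label all equal a gold annotation. The sketch below uses that convention, plus a simple leakage measure (the share of gold values still present verbatim in the masked output). Both helpers are hypothetical, and the actual PoV scoring rules may differ.

```python
def span_metrics(gold, predicted) -> dict[str, float]:
    """Strict matching over (start, end, label) triples: all three must agree."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def leakage_rate(gold_values: list[str], masked_text: str) -> float:
    """Share of sensitive values that still appear verbatim in the text sent to the model."""
    leaked = [v for v in gold_values if v in masked_text]
    return len(leaked) / len(gold_values) if gold_values else 0.0


gold = [(8, 24, "EMAIL"), (45, 72, "IBAN")]
pred = [(8, 24, "EMAIL"), (45, 70, "IBAN")]  # IBAN span truncated, so not an exact match
print(span_metrics(gold, pred))  # {'precision': 0.5, 'recall': 0.5, 'f1': 0.5}
```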

Where it fits

Add masking before the workflow reaches the model.

The gateway can be integrated into applications that send user input, documents, retrieved context, logs, or tool output to an LLM.

Document and knowledge assistants

Mask sensitive content extracted from PDFs, DOCX files, spreadsheets, emails, tickets, and knowledge-base records.

Developer copilots

Detect credentials, private keys, tokens, network identifiers, and labelled secrets before code or logs reach an LLM.

Internal enterprise assistants

Apply department-specific policies for HR, finance, customer support, legal, healthcare, and operational workflows.

RAG and agent workflows

Place a masking layer before retrieval context, agent messages, tool calls, or external model requests.

Proof of Value

Validate it against the client’s real risk profile.

Every PoV is scoped around the customer’s countries, departments, document types, identifiers, model workflows, and acceptable risk.

Scope: selected workflows and data classes

Policy: customer-specific detector profile

Evaluation: positive and negative test datasets

Outcome: measured results and limitations

PoV deliverables

Customer-specific detector and policy profile
Synthetic or approved sanitized evaluation dataset
Positive, negative, malformed, and overlap test cases
Precision, recall, F1, and exact-span results
Sensitive-data leakage and restoration measurements
Latency and throughput measurements
Known limitations and remediation recommendations
Deployment and integration handoff

Grounded security claims.

The product is evaluated against defined policies and datasets. Results are reported with their scope and known limitations.

No claim that regex alone can identify every form of sensitive information
Ambiguous detectors are enabled only when relevant to the customer workflow
Structured and contextual detection results are evaluated separately
Baseline results are not presented as universal production accuracy
Client-specific validation is completed before broader deployment

Evaluate what reaches your LLM before expanding its access.

Start with one workflow, one policy profile, and a measurable customer-specific dataset.

Private deployment options
Scoped evaluation
Integration handoff