Build a Maskbits deidentification job end to end — from selecting a datasource through to configuring masking rules, filtering rows, choosing an output destination, and reviewing before submission.
Before you begin
You need:
- The Maskbits Designer role
- At least one datasource configured
- A destination datasource where the deidentified output will be stored
What a deidentification job does. A deidentification job produces a deidentified copy of your source data. Sensitive values (names, emails, addresses, dates of birth, and similar) are replaced with masked or synthetic values, and the overall structure of the data is preserved. The original data is not changed. Instead, depending on your output choice, Maskbits writes new columns or a new dataset containing the masked values.
Creating a job is a 6-step process: Select Project, Select Data Source, Configure, Filter, Output, Preview.
Steps
Step 1. Select Project and name the job
Open Maskbits from the main navigation and start a new Deidentification job.
Fill in three fields:
- Select Project — pick the project this job belongs to, or create one inline from the dropdown
- Title — a descriptive name you’ll use to find this job later
- Select Datasource Type — defaults to Datasource; leave as-is unless you have a specific reason to change it
Click Next to continue.
Worth knowing. Give the job a descriptive title like “Patient records — monthly deidentification” rather than something generic. You’ll likely have multiple jobs over time, and good names save you from having to open each one to figure out what it does.
Step 2. Select the data source and configuration method
Step 2 has two parts. First, use the Select Data Source dropdown to pick the datasource that holds the data you want to deidentify. Then choose how you want to configure the masking rules.
Maskbits offers three configuration methods:
| Method | How it works | When to use it |
|---|---|---|
| Auto Configuration | Maskbits scans your data, automatically detects sensitive fields (PII) using AI, then proposes a generator for each. | The fastest path to a working job. |
| Excel Based | Upload a pre-filled Excel configuration template. | When you’ve already mapped fields to generators in a spreadsheet, which is often the case for compliance teams who maintain masking inventories externally. |
| Manual Configuration | Define every field and rule yourself through the UI. | When you want full control, or when Auto Configuration hasn’t detected the fields you need. |
The rest of this article walks through Manual Configuration because it exposes every option the product offers and shows how all the pieces fit together. Once you’ve created a job manually, using Auto Configuration is straightforward: Maskbits pre-fills the same screens based on what it detected.
Select a method and click Next.
Step 3. Configure the deidentification rules
Step 3 is where you define what gets masked and how. The screen has two main areas:
- Dataset Names (left panel) — a searchable list of all datasets in the selected datasource. Tick the datasets you want to include in this job.
- Columns (right panel) — where you configure rules for each column in the selected dataset
For each column you want to deidentify, set three things:
1. Generator — choose how the replacement value is produced. Generators are grouped by data type, including names (First Name, Last Name), contact (Email, Phone), addresses (Address1, Address2, City, State, Zip), dates (Date Past, DOB), and numeric / generic (Alpha, Alpha Numeric, Numeric, Decimal, Random Existing Data). See the Maskbits Product Docs for the full generator reference.
2. Mode — choose one of three modes: Random (each occurrence is masked independently, so the same source value can receive different replacements), Consistent (the same source value always produces the same masked value), or Pass through (the value is copied to the output unchanged). Consistent mode matters when deidentified data must preserve joins. For example, the same patient ID needs to map to the same masked ID everywhere it appears.
Why Pass through on State matters. When you set State to Pass through and set City and Zip to deidentify using the City Generator and Zip Generator, Maskbits uses the real state value as an anchor — the masked City and Zip values are generated as valid combinations for that state. A record originally showing California / Los Angeles / 90001 might become California / Fresno / 93650 — still a valid California city and zip. Without Pass through on State, the City and Zip would be randomly generated across the US, producing geographically invalid combinations like Los Angeles / 33101 (a Miami zip). If downstream systems validate geographic data, you want Pass through on State.
3. Field Path — for nested or complex data (JSON, BSON, Array), specify which field inside the structure to mask. Click the target icon next to the Field Path input to open the Select Field Path modal. The modal shows a tree view of the document structure with field types (string, object) and sample values, so you can drill into the exact nested field you want.
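The difference between the three modes, and the State anchoring described above, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Maskbits’ implementation: the `mask` and `mask_location` functions, the tiny name pool, and the `CITY_ZIP_BY_STATE` table are hypothetical, and deriving Consistent output from a keyed hash (HMAC-SHA256) is one common technique, not necessarily the one Maskbits uses.

```python
import hashlib
import hmac
import random
from typing import Callable, Dict, List, Tuple

# A "generator" here is any function that builds a fake value from an RNG.
Generator = Callable[[random.Random], str]


def first_name_generator(rng: random.Random) -> str:
    # Hypothetical, tiny name pool for illustration only.
    return rng.choice(["Avery", "Jordan", "Riley", "Morgan", "Casey", "Quinn"])


def mask(value: str, mode: str, generator: Generator,
         secret: bytes, rng: random.Random) -> str:
    if mode == "pass_through":
        # Pass through: copy the source value unchanged.
        return value
    if mode == "consistent":
        # Consistent: seed a fresh RNG from a keyed hash (HMAC-SHA256) of the
        # source value, so equal inputs always yield equal outputs.
        digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).digest()
        return generator(random.Random(int.from_bytes(digest, "big")))
    if mode == "random":
        # Random: draw from a shared RNG, so repeated source values are
        # masked independently and may get different replacements.
        return generator(rng)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical reference data: valid (city, zip) pairs per state, taken from
# the examples above. A real table would cover every state in full.
CITY_ZIP_BY_STATE: Dict[str, List[Tuple[str, str]]] = {
    "California": [("Los Angeles", "90001"), ("Fresno", "93650")],
    "Florida": [("Miami", "33101")],
}


def mask_location(record: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    # State is passed through and acts as the anchor; City and Zip are drawn
    # together as one pair that is valid for that state.
    city, zip_code = rng.choice(CITY_ZIP_BY_STATE[record["State"]])
    return {**record, "City": city, "Zip": zip_code}


rng = random.Random(42)
key = b"example-secret"
print(mask("Maria", "consistent", first_name_generator, key, rng))
print(mask_location({"State": "California", "City": "Los Angeles", "Zip": "90001"}, rng))
```

Seeding from a keyed hash keeps Consistent mode repeatable across runs without storing a lookup table, and the secret key makes it harder for someone without it to recompute the mapping from guessed inputs. Consistent mode guarantees that equal inputs give equal outputs; with a small generator pool, different inputs can still collide.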
Handling complex data types. If your column contains JSON, BSON, or Array data, click Add Type to register it as a Complex Data Type. In the dialog, pick the data type (JSON, BSON, or Array) and select which columns it applies to. This tells Maskbits how to parse the column before applying generators to its inner fields.
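To make the Field Path idea concrete, here is a minimal Python sketch that parses a JSON column value, follows a dotted path such as `patient.contact.email`, and masks only that leaf. It assumes object-only nesting (no arrays or BSON), and `mask_field_path` and the sample document are hypothetical; it illustrates the concept and is not Maskbits code.

```python
import json
from typing import Callable


def mask_field_path(raw: str, field_path: str, masker: Callable[[str], str]) -> str:
    # Parse the column value first; registering a column as a JSON
    # Complex Data Type is what tells Maskbits to do this step.
    doc = json.loads(raw)
    *parents, leaf = field_path.split(".")
    node = doc
    for key in parents:
        # Walk down one nested object per path segment (KeyError if absent).
        node = node[key]
    # Replace only the target leaf; sibling fields are left untouched.
    node[leaf] = masker(node[leaf])
    return json.dumps(doc)


raw = '{"patient": {"name": "Maria", "contact": {"email": "maria@example.com", "phone": "555-0100"}}}'
print(mask_field_path(raw, "patient.contact.email", lambda _: "user@example.org"))
```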
When you’ve finished configuring all the columns for a dataset, click Done in the top-right of the Columns panel. Repeat for each additional dataset you selected, then click Next.
Step 4. Add filters (optional)
Step 4 lets you scope the job to a subset of rows. If you don’t need filtering, leave this step empty and click Next — the job will process every row in the selected datasets.
To add a filter:
- Enter a descriptive filter name in the Name field
- Click Configure Filter to set the criteria
- Click Select Datasets to pick which datasets this filter applies to
Use the + icon to add multiple filters. Each filter can target different datasets with different criteria.
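One way to picture how filters scope a job is as row predicates attached to named datasets. The sketch below is a hypothetical model, not Maskbits’ filter engine: `RowFilter` and `rows_to_process` are illustrative names, and two behaviors are assumptions this article doesn’t confirm. It assumes that a dataset no filter targets keeps all its rows, and that when several filters target the same dataset, a row must pass every one of them.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class RowFilter:
    # Hypothetical model of one filter: a name, its criteria, and its targets.
    name: str
    predicate: Callable[[Row], bool]
    datasets: List[str]


def rows_to_process(dataset: str, rows: List[Row], filters: List[RowFilter]) -> List[Row]:
    applicable = [f for f in filters if dataset in f.datasets]
    if not applicable:
        # Assumption: no filter targets this dataset, so every row is processed.
        return list(rows)
    # Assumption: a row must satisfy every filter that targets its dataset.
    return [row for row in rows if all(f.predicate(row) for f in applicable)]


ca_only = RowFilter("California patients", lambda r: r["state"] == "California", ["patients"])
patients = [{"id": 1, "state": "California"}, {"id": 2, "state": "Florida"}]
print(rows_to_process("patients", patients, [ca_only]))
```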
Step 5. Configure the output
Step 5 defines where the deidentified data goes. Select an Output Data Source from the dropdown — this is where Maskbits writes the processed data.
You can also set a Prefix and Suffix that Maskbits applies to the output column names:
- Prefix — text added before each output column name (e.g., masked_)
- Suffix — text added after each output column name (e.g., _anonymized)
Leave both fields empty if you want the output columns to keep their original names. Set a prefix or suffix when you plan to merge deidentified data with the original datasets and need to tell the two apart.
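Why a prefix or suffix helps when merging is easiest to see with a single row. In this minimal Python sketch, `merge_with_original` is a hypothetical helper, not part of Maskbits. It assumes the output column name is simply prefix + original name + suffix, matching the examples above, and it refuses to merge when names would collide.

```python
from typing import Any, Dict


def merge_with_original(original: Dict[str, Any], masked: Dict[str, Any],
                        prefix: str = "", suffix: str = "") -> Dict[str, Any]:
    # Rename each masked column as prefix + original name + suffix.
    renamed = {f"{prefix}{name}{suffix}": value for name, value in masked.items()}
    # Without a prefix or suffix, masked columns reuse the original names
    # and would overwrite the real values in a merged row.
    clashes = set(original) & set(renamed)
    if clashes:
        raise ValueError(f"column name clash: {sorted(clashes)}")
    return {**original, **renamed}


row = {"email": "maria@example.com"}
masked = {"email": "user@example.org"}
print(merge_with_original(row, masked, prefix="masked_"))
# {'email': 'maria@example.com', 'masked_email': 'user@example.org'}
```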
The original data is preserved. Deidentification doesn’t modify your source data. Maskbits writes new columns (or a new dataset) containing the masked values at the output destination you choose. You can re-run the job at any time without affecting the source.
Step 6. Review the summary and submit
Step 6 shows the Configuration Summary — a review of everything you’ve set up:
- Project, Title, Data Source, Configuration Method, Output destination
- Per-dataset configuration chips showing field paths and generators
Each line has an edit icon — click it to jump back to the relevant step and adjust that specific piece before submitting.
Once you’re satisfied, click Submit in the bottom-right to run the job.
Troubleshooting
- Next is disabled on one of the wizard steps. A required field isn’t filled in. Look for fields marked with a red asterisk and complete them. Commonly missed: Title on Step 1, selecting at least one dataset on Step 3, and choosing an Output Data Source on Step 5.
- I can’t find a field in the Select Field Path modal. Use the search bar at the top of the modal to filter fields by name, or click Expand to open every nested object at once. For deeply nested structures, the field path you’ll see is a dotted path like patient.contact.email.
- Auto Configuration missed a sensitive field I expected it to detect. Auto Configuration is good at common patterns but not exhaustive. Switch to Manual Configuration (or edit the Auto-generated config) and add the missing field yourself. Consider reporting the miss so future Auto Configuration runs can improve.
Related
- Maskbits generators reference (Product docs)
- Random vs Consistent mode — when to use each
- How to use Auto Configuration
- How to upload an Excel-based configuration
- How to schedule a recurring deidentification job
