AT&T 2021
Aug 20, 2021
In March 2024, approximately 70 million records allegedly breached from AT&T were posted to BreachForums by ShinyHunters. The data originally dates to August 2021 and was previously offered for sale before being freely released. AT&T initially denied a breach before later acknowledging data fields specific to their systems were present. The dataset contains AT&T customer records including full names, physical addresses, email addresses, phone numbers, dates of birth, US Social Security Numbers (encrypted), government-issued IDs, and account passcodes. The data is pipe-delimited and includes both current and billing address information for US consumers.
Source files
Each file below lists its column headers and the LLM's field-mapping reasoning, recorded during ingestion.
Breached_Info.txt (0 rows)
File structure
Notes: No actual data rows are present in the provided 50 lines — the entire sample is the BreachForums distribution preamble/advertisement block. No PII columns can be mapped from this sample alone. Based on the breach context (AT&T 2021, pipe-delimited), the known schema reportedly includes: firstName, lastName, address1, city, state, zip, email, phone, dob, ssn (encrypted), government ID, and account passcode fields — but these cannot be responsibly assigned column indices without seeing actual data rows. Provide data rows for accurate column mapping.
DOB1.csv (2 rows)
File structure
Notes: File contains only encrypted/decrypted value pairs with no delimited column structure. This is NOT a structured data export. The data shown is a reference table or lookup listing encryption mappings for dates (1900-01-01 through 1900-07-14), not customer PII records. No customer names, emails, phones, addresses, SSNs, or other PII fields are present in this sample. This does not match the AT&T breach context description (which mentions full names, addresses, emails, phones, DOB, SSNs as structured columns). This appears to be an encryption key reference document rather than the actual customer data file. No importable PII columns can be identified.
Info.txt (28 columns, 12 rows)
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | Numeric account/customer ID, auto-increment style AT&T internal identifier |
| 1 | fullName | high | Values are uppercase full names (e.g. 'JESSICA KEUREN', 'MARY KEYS') — current/service name |
| 2 | phone | high | 10-digit numeric values matching US phone number format (e.g. '3174428521') |
| 3 | phone | high | Second 10-digit phone number, likely alternate/secondary phone (e.g. '3173531767'); empty in some rows |
| 4 | skip | medium | Small integer values (1, 2, 3) — likely account type or service tier code, non-PII |
| 5 | skip | high | Constant value 'CONSUMER' across all rows — account segment/category label, non-PII |
| 6 | skip | high | Constant value 'CONSUMER' across all rows — duplicate segment label, non-PII |
| 7 | password | high | Encrypted/encoded strings beginning with '*1' pattern (e.g. '*1UexozmUvT7E=') — AT&T account passcodes, base64-encoded encrypted values; AT&T confirmed passcode reset after breach |
| 8 | ssn | high | Longer encrypted/encoded strings beginning with '*0' pattern (e.g. '*0Um4ZYEfz7NgDgk8rwbd7fQ==') — consistent with encrypted SSN per breach disclosure; AT&T confirmed SSNs were present in encrypted form |
| 9 | address1 | high | Street address values (e.g. '30 MARIN AV', '5305 E 10TH ST') — current/service address line 1 |
| 10 | city | high | City name values, sometimes abbreviated (e.g. 'NATCHEZ', 'FMT' for Fremont, 'ANH' for Anaheim) — current address city |
| 11 | state | high | Two-letter US state abbreviations (e.g. 'MS', 'IN', 'GA') — current address state |
| 12 | zip | high | 5-digit US ZIP codes (e.g. '39120', '46219') — current address ZIP |
| 13 | dob | high | Encrypted/encoded strings beginning with '*0' pattern, shorter than SSN column — consistent with encrypted date of birth per breach disclosure |
| 14 | skip | medium | Binary flag values (0 or 1) — likely account status, opt-in flag, or paperless billing indicator, non-PII |
| 15 | skip | medium | Single character values ('T' or 'C') — likely account type code (e.g. T=tablet, C=consumer), non-PII |
| 16 | email | high | Email address values where present (e.g. 'RONDYM@PBP1.COM', 'JERILEE2010@GMAIL.COM'); empty in some rows |
| 17 | fullName | high | Uppercase full names (e.g. 'JESSICA KEUREN', 'RONALD DYMOND') — billing/account name, mirrors column 1 in most rows |
| 18 | address1 | high | Billing address line 1 or address line 2 where unit/apt present (e.g. '30 MARTIN RD', 'APT C', 'APT 212') — billing address first line |
| 19 | address2 | medium | Some rows contain a second address line (e.g. '1148 N CITRON ST', '39800 FREMONT BLVD'); other rows contain city+state+ZIP formatted string (e.g. 'NATCHEZ MS 39120-9199') — billing address line 2 or formatted city-state-zip continuation |
| 20 | address2 | medium | Values appear to be city+state+ZIP formatted billing address strings (e.g. 'INDIANAPOLIS IN 46219-4311') or additional address line — billing address overflow field |
| 21 | skip | medium | Mostly empty in sample rows — possible additional address or notes field, insufficient data to classify confidently |
| 22 | city | high | City name values for billing address (e.g. 'INDIANAPOLIS', 'MARIETTA', 'FREMONT') — billing address city, full form vs abbreviated column 10 |
| 23 | state | high | Two-letter US state abbreviations — billing address state |
| 24 | zip | high | 5-digit US ZIP codes — billing address ZIP |
| 25 | skip | medium | Mostly empty in sample rows — possible secondary ZIP+4 or extension field |
| 26 | skip | medium | Single character value 'W' consistent across all rows — likely rate code, market segment, or internal flag, non-PII |
| 27 | skip | medium | Empty in all sample rows — unknown trailing field, insufficient data; likely padding or reserved column |
Notes: File has a 6-line metadata/header block before data rows begin (date, description, compromised data summary, record count, HIBP link, forum thread link). Actual data starts at the 'SAMPLES:' marker, with the 'SAMPLES:' prefix on the first data row needing to be stripped. Delimiter is pipe ('|'), no column headers in data rows. Columns 7 and 8 contain AT&T-specific encrypted values: column 7 ('*1...' prefix) maps to encrypted account passcodes (AT&T confirmed resets post-breach), column 8 ('*0...' longer strings) maps to encrypted SSNs. Column 13 ('*0...' shorter strings) maps to encrypted dates of birth. Billing address spans columns 17-24 with some variation in field population depending on whether apt/unit info is present, causing a one-column shift in some rows. Column 10 contains abbreviated city names in current address (e.g. 'FMT'='Fremont', 'ANH'='Anaheim', 'ORM BCH'='Ormond Beach', 'TWN HRT'='Twain Harte', 'MRETA'='Marietta') while column 22 contains full city names for billing address. This dataset matches the AT&T August 2021 breach profile disclosed in March 2024 by ShinyHunters on BreachForums, confirmed to contain ~70M US consumer records.
MASTER.csv (26 columns, 73,481,539 rows)
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | numeric identifiers, appears to be internal customer/account IDs |
| 1 | fullName | high | contains full names in uppercase format (e.g., JESSICA KEUREN, MARY KEYS) |
| 2 | phone | high | 10-digit phone numbers in standard format |
| 3 | phone | high | secondary phone number field, also 10-digit format |
| 4 | skip | high | numeric codes, appears to be account type or status flags |
| 5 | skip | high | text codes (CONSUMER, SMALL OFFICE), account classification |
| 6 | skip | high | text codes (CONSUMER, SMALL OFFICE), account classification duplicate |
| 7 | ssn | high | encrypted values prefixed with *, AT&T context indicates encrypted SSN per breach documentation |
| 8 | password | high | encrypted values prefixed with *, AT&T context indicates encrypted account passcode |
| 9 | address1 | high | street addresses (e.g., 30 MARIN AV, 5305 E 10TH ST) |
| 10 | city | high | city names (abbreviated and full format) |
| 11 | state | high | US state abbreviations (MS, IN, GA, TX, CA, IL, etc.) |
| 12 | zip | high | 5-digit ZIP codes |
| 13 | skip | high | encrypted values prefixed with *, unknown AT&T internal data field |
| 14 | skip | high | binary flags (0 or 1), internal status indicator |
| 15 | skip | high | single character flags (C, T, O), account status or type codes |
| 16 | email | high | email addresses with @ symbol or empty values |
| 17 | fullName | high | full names, duplicate/canonical entry of field 1 |
| 18 | address2 | medium | secondary address lines (APT designations, PO BOX, UNIT numbers) or full street name |
| 19 | address1 | high | expanded street addresses with full street names |
| 20 | skip | high | combined city-state-zip or partial address components |
| 21 | city | high | full city names, canonical entry |
| 22 | state | high | US state abbreviations, canonical entry |
| 23 | zip | high | 5-digit ZIP codes, canonical entry |
| 24 | skip | high | appears to be duplicate/alternate ID field or timestamp |
| 25 | gender | high | single character values (W, U, M, etc.) representing gender indicators |
Notes: AT&T 2021 data breach from August 2021. Pipe-delimited format with 26 fields. Contains current and billing address information. Fields 7 and 8 are encrypted using AT&T's encryption (prefix *). Multiple fields appear duplicated for data redundancy/validation. Field 25 gender values: W=likely female, U=unknown, M=likely male. Records include both individual consumers and small office/business accounts.
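The Info.txt and MASTER.csv mappings were recorded independently, and they do not fully agree. The sketch below is one way to surface the disagreements by comparing the two mappings column by column. It works only on the mapped field names transcribed from the two tables, not on any record data. The names `INFO_TXT`, `MASTER_CSV`, and `mapping_conflicts` are illustrative, and column 16 is assumed to map to `email` in both files, based on its description.

```python
# Mapped field per column index, transcribed from the Info.txt and MASTER.csv tables.
# Assumption: column 16 maps to "email" in both files (inferred from its description).
INFO_TXT = [
    "skip", "fullName", "phone", "phone", "skip", "skip", "skip",
    "password", "ssn", "address1", "city", "state", "zip", "dob",
    "skip", "skip", "email", "fullName", "address1", "address2",
    "address2", "skip", "city", "state", "zip", "skip", "skip", "skip",
]
MASTER_CSV = [
    "skip", "fullName", "phone", "phone", "skip", "skip", "skip",
    "ssn", "password", "address1", "city", "state", "zip", "skip",
    "skip", "skip", "email", "fullName", "address2", "address1",
    "skip", "city", "state", "zip", "skip", "gender",
]


def mapping_conflicts(a, b):
    """Return (index, field_a, field_b) for shared column indices whose mappings differ."""
    return [(i, fa, fb) for i, (fa, fb) in enumerate(zip(a, b)) if fa != fb]


conflicts = mapping_conflicts(INFO_TXT, MASTER_CSV)
for index, info_field, master_field in conflicts:
    print(f"column {index}: Info.txt={info_field}, MASTER.csv={master_field}")
```

Two disagreements seem worth a manual review before import. Columns 7 and 8 are mapped as password/ssn in Info.txt but as ssn/password in MASTER.csv. Column 13 is mapped as dob in Info.txt but skipped in MASTER.csv. Differences from column 18 onward may partly reflect the differing column counts (28 versus 26) rather than genuine mapping errors.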
PRODUCTION.csv (43,998,287 rows)
File structure
Notes: Pre-LLM auto-detection: free-form text with visible emails / phones