USDoD

Apr 1, 2024

1,238,395,718 records · 3 files · Added Jun 3, 2026

In April 2024, a large trove of data made headlines as having exposed "3 billion people" through a breach of the National Public Data background check service. The initial corpus released in the breach contained billions of rows of personal information, including US Social Security numbers. Further partial data sets were later released, including extensive personal information and 134M unique email addresses, although the origin and accuracy of the data remain in question.

Data found in this dataset

First name · Last name · Middle name · Address · City · State · Suffix · dob · zip · phone · fullName · ssn · address2

Source files

Each file below lists its column headers and the LLM's field-mapping reasoning, recorded during ingestion.

ssn2_ab
14 columns · 370,097,699 rows

File structure

Format: CSV·Delimiter: comma·Has header: no·Quote: "

Source column · Mapped field · Confidence · LLM assessment
1firstNamehigh[1] values are common given names (SABRINA, CARRIE)
2lastNamehigh[2] values are surnames (BIANCHI, MIDDLETON)
3middleNamehigh[3] values are middle names (LYNN, LEE), position between first and last name columns
5dobhigh[5] values match YYYYMMDD date pattern (19710113, 19780309)
6address1high[6] values are street addresses with house numbers and street names
7cityhigh[7] values are city names (OCEAN SPRINGS, JACKSONVILLE, CHICAGO)
9statehigh[9] values are 2-letter US state abbreviations (MS, AR, IL, AE)
10ziphigh[10] values are 5-digit ZIP codes (39564, 72076, 60645)
11phonehigh[11] values are 10-digit phone numbers (4029325087, 7732622608)
12fullNamehigh[12] values are full names with middle initial (SABRINA N BIANCHI, CARRIE LEE BRUNK)
13fullNamehigh[13] alternate full name variant (SABRINA LYNN DEMEMBER, CARRIE LEE) — alias/maiden name common in background check data
14fullNamemedium[14] third full name variant (CARRIE M BRUNK) — sparse but contains real full name PII
15dobmedium[15] values are YYYYMM format (199505, 201807) — partial date, likely year+month of a significant date in background check context
19ssnhigh[19] 9-digit numbers (594481480, 320788124) consistent with SSN format; NPD breach is known to contain SSNs

Notes: 20 columns total, no header row detected — data appears headerless. 14 contain PII. Column 0 contains sequential numeric IDs (skip). Columns 4, 8, 16, 17, 18 are empty. Multiple fullName columns (12–14) represent name aliases/variants typical of background check aggregator data. Column 15 contains YYYYMM partial dates of uncertain purpose but mapped as dob. Column 19 contains 9-digit SSNs consistent with the National Public Data breach profile.
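The distinction the notes draw between full YYYYMMDD dates and YYYYMM partial dates can be sketched as a small pattern check. This is only an illustration under stated assumptions: the function name `classify_date_token` is hypothetical, the sample values are synthetic, and nothing here is taken from the actual ingestion pipeline.

```python
from datetime import datetime


def classify_date_token(value: str) -> str:
    """Classify a digit string as 'YYYYMMDD', 'YYYYMM', or 'unknown'.

    Hypothetical helper: it only checks that the digits form a valid
    calendar date (or year+month); it says nothing about what the date means.
    """
    if not value.isdigit():
        return "unknown"
    try:
        if len(value) == 8:
            datetime.strptime(value, "%Y%m%d")  # rejects e.g. Feb 30
            return "YYYYMMDD"
        if len(value) == 6:
            datetime.strptime(value, "%Y%m")  # rejects e.g. month 13
            return "YYYYMM"
    except ValueError:
        return "unknown"
    return "unknown"


# Synthetic examples, not values from the dataset.
for token in ("20000102", "200001", "20000230", "abc"):
    print(token, "->", classify_date_token(token))
```

A check like this can separate the two formats, but it cannot resolve the "uncertain purpose" the notes mention: a valid YYYYMM value could be a birth month or any other year and month.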

ssn_aa
17 columns · 65,100,000 rows

File structure

Format: CSV·Delimiter: comma·Has header: yes·Quote: "

Source column · Mapped field · Confidence · LLM assessment
1firstNamehigh[1] header 'firstname', values are uppercase given names like 'AURETTA', 'JUNE'
2lastNamehigh[2] header 'lastname', values are uppercase surnames like 'TERRY'
3middleNamehigh[3] header 'middlename', values are middle names/initials like 'JUNE', 'A'
4suffixhigh[4] header 'name_suff', name suffix field
5dobhigh[5] header 'dob', value '19461201' matches YYYYMMDD date of birth pattern
6address1high[6] header 'address', values are street addresses like '6530 DONNA DR'
7cityhigh[7] header 'city', values are city names like 'ANCHORAGE'
9statehigh[9] header 'st', values are 2-letter US state codes like 'AK'
10ziphigh[10] header 'zip', values are 5-digit US postal codes like '99504'
11phonehigh[11] header 'phone1', phone number field
12fullNamehigh[12] header 'aka1fullname', full name alias/AKA field
13fullNamehigh[13] header 'aka2fullname', second full name alias/AKA field
14fullNamehigh[14] header 'aka3fullname', third full name alias/AKA field
16dobhigh[16] header 'alt1DOB', alternate date of birth field
17dobhigh[17] header 'alt2DOB', second alternate date of birth field
18dobhigh[18] header 'alt3DOB', third alternate date of birth field
19ssnhigh[19] header 'ssn', values are 9-digit numbers like '574182899' consistent with US Social Security Numbers

Notes: 20 columns total; 17 are mapped as PII. Column 0 (ID) is an internal record identifier and is skipped. Column 8 (county_name) is a geographic/administrative subdivision rather than a standard PII field and is skipped. Column 15 (StartDat) appears to be a timestamp/date flag and is skipped. Columns 12–14 (aka1–3fullname) are AKA/alias full names, mapped as fullName because they contain searchable personal identity data. Columns 16–18 (alt1–3DOB) are alternate DOBs, mapped as dob. SSNs appear without hyphens (9 raw digits).

ssn_ab
15 columns · 373,698,020 rows

File structure

Format: CSV·Delimiter: comma·Has header: no·Quote: "

Source column · Mapped field · Confidence · LLM assessment
1firstNamehigh[1] no header (headerless file), values are all-caps given names: JOHN, KAREN
2lastNamehigh[2] no header, values are all-caps surnames: TRACEY, TREACY
3middleNamehigh[3] no header, single-letter values (S, A) consistent with middle initials
5dobhigh[5] no header, 8-digit values in YYYYMMDD format: 19210410, 19680731
6address1high[6] no header, values are street addresses: 157 SERGEANTSVILLE RD, 13 ROBIN RD
7cityhigh[7] no header, values are city names: DEMAREST, WEST CALDWELL, DOVER
8address2medium[8] no header, values are county names (BERGEN, ESSEX, MORRIS); no county field available, mapped to address2 as geographic subdivision
9statehigh[9] no header, 2-letter US state abbreviations: NJ
10ziphigh[10] no header, 5-digit US ZIP codes: 08822, 07627, 07006
12firstNamemedium[12] no header, sparse values appear to be given names or surnames (Tracy, Caggiano, Treece); inconsistent mix, likely alternate name field
13fullNamehigh[13] no header, values are full names with spaces: ' John', ' Karen A', ' Dan W'
14fullNamehigh[14] no header, values are reversed full names: 'Caggiano Karen'
16dobhigh[16] no header, 8-digit YYYYMMDD dates: 19680731, 19600119 — alternate/duplicate DOB field
17dobhigh[17] no header, 8-digit YYYYMMDD dates: 19680731 — third DOB variant field
19ssnhigh[19] no header, 9-digit numeric values (131054158, 154644580) consistent with US Social Security Numbers; matches NPD breach context

Notes: File appears headerless (row 0 contains data values, not column labels). 20 columns total; column [0] contains large sequential numeric IDs (500000000+) treated as internal record IDs. Columns [4], [11], [15], [18] are empty/null. Three separate DOB columns ([5], [16], [17]) suggest denormalized or deduplicated source records. Column [19] 9-digit numbers strongly indicate SSNs given National Public Data breach context. County data in [8] mapped to address2 as no county field is available.
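The column accounting in these notes can be checked with a few lines of set arithmetic. The sketch below takes the indices as stated for this file (one ID column, four empty columns, fifteen mapped columns out of 20). The variable names are illustrative only.

```python
# Column indices as described in the notes and mapping table for this file.
TOTAL_COLUMNS = 20
id_columns = {0}
empty_columns = {4, 11, 15, 18}
mapped_columns = {1, 2, 3, 5, 6, 7, 8, 9, 10, 12, 13, 14, 16, 17, 19}

# Every index should fall into exactly one of the three groups.
accounted = id_columns | empty_columns | mapped_columns
unaccounted = set(range(TOTAL_COLUMNS)) - accounted
overlap = (id_columns & empty_columns) | (id_columns & mapped_columns) | (
    empty_columns & mapped_columns
)

print("mapped:", len(mapped_columns))
print("unaccounted:", sorted(unaccounted))
print("overlapping:", sorted(overlap))
```

With the indices above, no column is left unaccounted for or counted twice, which matches the "15 columns" figure shown for the file.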