comelec.gov.ph
Mar 27, 2016
A breach of the Commission on Elections (COMELEC) of the Philippines exposed the entire Philippine voter registration database. The archive contains voter registration records (new_id_released.txt, web_id_onhand.txt, web_id_disapproved.txt), overseas absentee voter data (overseas_absentee_all.txt, overseas_absentee_scratch.txt), geographic reference codes, embassy and country codes, web application user accounts with hashed passwords (dbadmin_usersinformation.txt), and internal system user accounts (fum_users.txt). The data includes full names, dates of birth, addresses, voter identification numbers (VINs), passport numbers, and fingerprint and other biometric data for millions of Filipino voters, including overseas absentee voters.
Source files
For each file, the column headers and the LLM's field-mapping reasoning, recorded during ingestion, are listed below.
code_tables.txt · 0 rows
File structure
Notes: The provided text is NOT DATA. It contains reference documentation: registration type codes, absentee disapproval codes, disapproval codes, system removal codes, change codes, and 'Pages' codes with their descriptions. This is a data dictionary/reference guide explaining what various codes mean in the COMELEC breach dataset, not a structured data file with actual voter records or PII. No columns to map. The actual data files (new_id_released.txt, web_id_onhand.txt, overseas_absentee_all.txt, dbadmin_usersinformation.txt, fum_users.txt, etc.) are not provided in this submission.
dbadmin_usersinformation.txt · 5 columns · 11 rows
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | username | high | [1] header 'username', values are login identifiers (alvin, chief.sungahid, rose.gomez, etc.) |
| 3 | lastName | high | [3] header 'lastname', values are surnames (GENOTA, SUNGAHID, GOMEZ, etc.) |
| 4 | firstName | high | [4] header 'firstname', values are given names (ALVIN, ROMEL, ROSEMARIE, etc.) |
| 5 | middleName | high | [5] header 'maternalname', values are middle/maternal names (VILLANUEVA, UMALI, FERNANDEZ, etc.) |
| 7 | password | high | [7] header 'password', values are hashed passwords (V12L6tEVPpv4JxIz40LS+/Llnic=, U2y5D2athOMZaN0ph1qp/+bAp28=, etc.) |
Notes: This is a user account table from dbadmin_usersinformation.txt (COMELEC web application user accounts). Columns 0 (Id), 2 (userrole), 6 (nickname), 8 (status), 9 (connected), and 10 (lastconnection) are skipped as non-PII (internal IDs, status flags, connection timestamps). Of 11 columns in total, 5 contain PII.
embassy_country_codes_ref.txt · 4 columns · 0 rows
File structure
Format: CSV·Delimiter: pipe·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | country | high | [0] header 'country', values are full country names |
| 1 | country | high | [1] header 'mailcountry', values are country codes representing mailing country |
| 2 | skip | high | [2] header 'embassy', values are embassy location names - geographic reference data, not PII |
| 3 | skip | high | [3] header 'mailembassy', values are embassy codes - geographic reference data, not PII |
Notes: This file contains geographic reference data (countries and embassies) for the COMELEC overseas absentee voter system. Only country fields map to PII; embassy and location codes are non-PII reference data. The 'mailcountry' column contains country codes and maps to the country field as it represents actual country information.
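A mapping table of this kind can be read as a lookup from source column index to target field, with "skip" marking columns that are dropped. The following is a minimal sketch only, assuming a pipe-delimited file with a header row and double-quote quoting as stated in the format line; `COLUMN_MAP`, `apply_mapping`, `read_rows`, and the sample text are hypothetical names and invented placeholder values, not part of the ingestion pipeline or the file.

```python
import csv
import io

# Hypothetical mapping mirroring the table above:
# source column index -> mapped field ("skip" means the column is dropped).
COLUMN_MAP = {0: "country", 1: "country", 2: "skip", 3: "skip"}


def apply_mapping(row, column_map):
    """Collect non-empty values per mapped field, ignoring skipped or missing columns."""
    mapped = {}
    for index, field in column_map.items():
        if field == "skip" or index >= len(row):
            continue
        value = row[index].strip()
        if value:
            # Several source columns may share one target field, so keep a list.
            mapped.setdefault(field, []).append(value)
    return mapped


# Invented placeholder text; these values are not taken from the file.
SAMPLE = 'country|mailcountry|embassy|mailembassy\n"Exampleland"|XX|Example City PE|X01\n'


def read_rows(text):
    """Parse pipe-delimited text, skip the header row, and map each data row."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quotechar='"')
    next(reader, None)  # discard the header row
    return [apply_mapping(row, COLUMN_MAP) for row in reader]


records = read_rows(SAMPLE)
```

Keeping a list per field is one possible design choice here, since both column 0 and column 1 map to the same `country` field.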
fum_users.txt · 52 columns · 2,559 rows
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | sequential ID numbers (email_id) |
| 1 | email | high | email addresses with @comelec.gov.ph domain |
| 2 | skip | high | department codes (ECAD, FINANCE, etc.) |
| 3 | skip | high | internal classification codes (0-9, A, B, etc.) |
| 4 | username | high | user login identifiers |
| 5 | password | high | bcrypt hashed passwords ($2a$15$ format), many empty (not set) |
| 6 | skip | high | boolean flag (is_passwdchanged) |
| 7 | skip | medium | region code |
| 8 | skip | medium | province code |
| 9 | skip | medium | municipality code |
| 10 | skip | medium | municipality value |
| 11 | skip | medium | region value |
| 12 | skip | medium | province value |
| 13 | skip | medium | embassy code |
| 14 | skip | medium | embassy value |
| 15 | country | medium | country code |
| 16 | skip | medium | country value |
| 17 | skip | medium | continent code |
| 18 | skip | medium | continent value |
| 19 | skip | medium | visit number |
| 20 | skip | high | timestamp (date_updated) |
| 21 | skip | medium | person flag |
| 22 | skip | medium | latitude coordinate |
| 23 | skip | medium | longitude coordinate |
| 24 | skip | medium | query district |
| 25 | skip | medium | total registered |
| 26 | skip | medium | barangay list |
| 27 | skip | medium | district number |
| 28 | skip | medium | per day slots |
| 29 | skip | medium | per hour slots |
| 30 | skip | medium | per schedule slots |
| 31 | skip | medium | total doc table |
| 32 | suffix | medium | title/suffix (ATTY., etc.) |
| 33 | lastName | high | surname of OIC |
| 34 | firstName | high | given name of OIC |
| 35 | middleName | high | middle name of OIC |
| 36 | skip | medium | additional name field |
| 37 | gender | high | gender (M/F) |
| 38 | phone | high | telephone number |
| 39 | phone | high | alternate telephone number |
| 40 | address1 | high | office address |
| 41 | skip | medium | position office code |
| 42 | skip | medium | designation office |
| 43 | email | high | alternate email address |
| 44 | skip | medium | is_acting flag |
| 45 | skip | medium | is_dfa_emailadd flag |
| 46 | skip | medium | total_formtype |
| 47 | skip | medium | extract_formtype |
| 48 | skip | medium | toextract_formtype |
| 49 | skip | medium | blocksatsun flag |
| 50 | skip | medium | is_activepost flag |
| 51 | skip | medium | is_delete flag |
Notes: Web application user accounts export from COMELEC database. Contains COMELEC staff/administrator credentials with bcrypt-hashed passwords. Includes personal information for Officer-in-Charge (OIC) staff members including names, contact info, addresses, and gender. Many password fields are empty (users without set passwords). Columns 7-31 contain geographic/administrative codes. Columns 32-42 contain personnel information for supervisory staff.
geo_codes_ref.txt · 0 rows
new_id_released.txt · 28 columns · 19,435,262 rows
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 6 | lastName | high | [6] header 'LASTNAME', values are common Filipino surnames (DEMABILDO, TABAMO, CASTRO) |
| 7 | firstName | high | [7] header 'FIRSTNAME', values are given names (ROSEMARIE, MERCY, ORESTES) |
| 8 | middleName | high | [8] header 'MATERNALNAME', values are maternal/middle names (PAJARILLO, LOPEZ, HERNANDEZ) |
| 9 | gender | high | [9] header 'SEX', values are M/F gender codes |
| 11 | fullName | high | [11] header 'SPOUSENAME', contains spouse's full names for married individuals |
| 12 | address1 | high | [12] header 'RESSTREET', values are street addresses (VILLA ANGELA SUBD. BRGY. PILAR, PUROK 7) |
| 16 | city | high | [16] header 'RESCITY', values are city names (HINIGARAN, INITAO) |
| 17 | state | high | [17] header 'RESPROVINCE', values are province names (NEGROS OCCIDENTAL, MISAMIS ORIENTAL) |
| 23 | lastName | high | [23] header 'FLASTNAME', father's last name—still a surname/searchable name identifier |
| 24 | firstName | high | [24] header 'FFIRSTNAME', father's first name—still a name identifier |
| 25 | middleName | high | [25] header 'FMATERNALNAME', father's maternal name |
| 26 | lastName | high | [26] header 'MLASTNAME', mother's last name—still a surname identifier |
| 27 | firstName | high | [27] header 'MFIRSTNAME', mother's first name—still a name identifier |
| 28 | middleName | high | [28] header 'MMATERNALNAME', mother's maternal name |
| 29 | lastName | high | [29] header 'REPLASTNAME', representative/contact last name |
| 30 | firstName | high | [30] header 'REPFIRSTNAME', representative/contact first name |
| 31 | middleName | high | [31] header 'REPMATERNALNAME', representative/contact maternal name |
| 32 | dob | high | [32] header 'DOBYEAR', birth year component (1959, 1986, 1983) |
| 33 | dob | high | [33] header 'DOBMONTH', birth month component (02, 10, 01) |
| 34 | dob | high | [34] header 'DOBDAY', birth day component (10, 24, 11) |
| 35 | city | high | [35] header 'BIRTHCITY', city of birth (CITY OF PARAÑAQUE, INITAO) |
| 36 | state | high | [36] header 'BIRTHPROVINCE', province of birth (NATIONAL CAPITAL REGION, MISAMIS ORIENTAL) |
| 37 | country | high | [37] header 'CITIZENSHIP', country/citizenship code (B for Filipino) |
| 40 | zip | high | [40] header 'COUNTRYRES', country of residence; also appears to encode postal/geographic codes |
| 59 | ssn | medium | [59] header 'TIN', Tax Identification Number (9-digit numeric identifier similar to SSN) |
| 62 | lastName | high | [62] header 'PASSPORTPLACE', passport place contains names; values like PASSPORTLOST, PASSPORTNB indicate passport data related to identity |
| 71 | city | high | [71] header 'REGCITY', registration city (residence registration) |
| 72 | state | high | [72] header 'REGPROVINCE', registration province |
Notes: COMELEC 2016 Philippine voter registration database. File contains full voter records with names (last, first, middle, maternal/patronymic), dates of birth (split into year/month/day), addresses, birthplace, citizenship, passport/identification numbers, family member names (parents, spouse, representative), and biometric/fingerprint data. Total columns: 95+. Columns mapped: 28 PII fields including personal identification data, family relationships, and location information. Fingerprint and biometric data columns (FINGER_INFO, FINGER_TOPO_COORD, QUALITY, MATCHING_FINGER, etc.) were excluded because they are encoded/binary. Internal administrative columns (IDs, timestamps, flags, status codes) were excluded under the ingestion exclusion rules.
overseas_absentee_all.txt · 37 columns · 1,763,311 rows
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 6 | lastName | high | [6] header 'LASTNAME', values are family names (DE JESUS, CABANALAN, CHAN) |
| 7 | firstName | high | [7] header 'FIRSTNAME', values are given names (LEOPOLDO, TITO, ROSEMARIE) |
| 8 | middleName | high | [8] header 'MATERNALNAME', values are maternal/middle names (BARTOLOME, AMOYAN, CONSTANTINO) |
| 9 | gender | high | [9] header 'SEX', values are M/F gender codes |
| 12 | address1 | high | [12] header 'RESSTREET', residential street addresses (3 HORN BILL, 1351 LEGASPI ST.) |
| 15 | city | high | [15] header 'RESCITY', city names (MEYCAUAYAN CITY, ALIMODIAN) |
| 16 | state | high | [16] header 'RESPROVINCE', province/state names (BULACAN, ILOILO) |
| 21 | address1 | high | [21] header 'ABROADSTREET', overseas street addresses |
| 22 | zip | high | [22] header 'ABROADZIP', postal codes (249969, 13066) |
| 24 | city | high | [24] header 'ABROADCITY', overseas city names (MACAU, SENTRUM) |
| 25 | country | high | [25] header 'ABROADCOUNTRY', country codes (QA, NO, SG, HK, KW) |
| 39 | email | high | [39] header 'EMAIL', values contain @ symbol (anamarieamador@yahoo.com, pauneml@cpchem) |
| 51 | lastName | high | [51] header 'FLASTNAME', father's last name (DE JESUS, CABANALAN) |
| 52 | firstName | high | [52] header 'FFIRSTNAME', father's first name (LIWANAG, CALIXTO) |
| 53 | middleName | high | [53] header 'FMATERNALNAME', father's maternal name |
| 54 | lastName | high | [54] header 'MLASTNAME', mother's last name (DE JESUS, AMOYAN) |
| 55 | firstName | high | [55] header 'MFIRSTNAME', mother's first name (PRISCILA, ROSARIO) |
| 56 | middleName | high | [56] header 'MMATERNALNAME', mother's maternal name (B., LACSON) |
| 57 | lastName | high | [57] header 'REPLASTNAME', representative last name |
| 58 | firstName | high | [58] header 'REPFIRSTNAME', representative first name |
| 59 | middleName | high | [59] header 'REPMATERNALNAME', representative maternal name |
| 60 | dob | high | [60] header 'DOBYEAR', birth year component (1951, 1964, 1962) |
| 61 | dob | high | [61] header 'DOBMONTH', birth month component (12, 12, 10) |
| 62 | dob | high | [62] header 'DOBDAY', birth day component (27, 14, 11) |
| 63 | city | high | [63] header 'BIRTHCITY', birth city names |
| 64 | state | high | [64] header 'BIRTHPROVINCE', birth province/state |
| 130 | address1 | high | [130] header 'MAILSTREET', mailing street address |
| 131 | zip | high | [131] header 'MAILZIP', mailing postal code |
| 132 | city | high | [132] header 'MAILCITY', mailing city |
| 133 | country | high | [133] header 'MAILCOUNTRY', mailing country |
| 135 | address1 | high | [135] header 'REPSTREET', representative street |
| 136 | city | high | [136] header 'REPBARANGAY', representative barangay (area/subdivision) |
| 137 | city | high | [137] header 'REPCITY', representative city |
| 138 | state | high | [138] header 'REPPROVINCE', representative province |
| 168 | country | high | [168] header 'CONTINENT', continent identifier |
| 169 | country | high | [169] header 'COUNTRY', country name |
| 170 | city | high | [170] header 'POST', postal/city reference |
Notes: Philippine COMELEC voter registration database. File contains overseas absentee voter records with full PII: names (voter, parents, representatives), dates of birth (year/month/day components), addresses (residential, overseas, mailing, birth), email, and geographic locations. 171 total columns; 37 PII columns containing searchable personal information are mapped. Columns not listed are non-PII (application IDs, registration codes, biometric fingerprint data, processing flags, timestamps, profession codes, physical characteristics like height/weight, internal system fields).
overseas_absentee_scratch.txt · 69 columns · 138,928 rows
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | FORM_ID - internal identifier |
| 1 | skip | high | APP_TYPE - application type code |
| 2 | skip | high | REGISTRATION - registration status code |
| 3 | lastName | high | LASTNAME field |
| 4 | firstName | high | FIRSTNAME field |
| 5 | middleName | high | MATERNALNAME field (mother's maiden name, used as middle name) |
| 6 | gender | high | SEX field containing M/F values |
| 7 | skip | high | MARITALSTATUS - not PII |
| 8 | skip | high | SPOUSENAME - relationship data, not direct voter PII |
| 9 | address1 | high | RESSTREET - residential street address |
| 10 | skip | high | RESPRECINCTCODE - precinct code |
| 11 | skip | high | RESREGION - region code |
| 12 | skip | high | RESBARANGAY - barangay code |
| 13 | city | high | RESCITY - residential city |
| 14 | state | high | RESPROVINCE - residential province |
| 15 | skip | high | VILLAGE - address component |
| 16 | skip | high | CITY - duplicate city field |
| 17 | skip | high | PROVINCE - duplicate province field |
| 18 | email | high | EMAIL field containing email addresses |
| 19 | skip | high | ABROADSTATUS - overseas voter status code |
| 20 | skip | high | ABROADSTATUSSPECIF - status specification |
| 21 | lastName | high | FLASTNAME - father's last name |
| 22 | firstName | high | FFIRSTNAME - father's first name |
| 23 | middleName | high | FMATERNALNAME - father's maternal name |
| 24 | lastName | high | MLASTNAME - mother's last name |
| 25 | firstName | high | MFIRSTNAME - mother's first name |
| 26 | middleName | high | MMATERNALNAME - mother's maternal name |
| 27 | skip | high | DOBYEAR - date of birth year (separate component) |
| 28 | skip | high | DOBMONTH - date of birth month |
| 29 | skip | high | DOBDAY - date of birth day |
| 30 | skip | high | BIRTHCITY - birth city |
| 31 | skip | high | BIRTHPROVINCE - birth province |
| 32 | skip | high | CITIZENSHIP - citizenship status |
| 33 | skip | high | NATURALIZATIONDATE - date field |
| 34 | skip | high | CERTIFICATENB - certificate number |
| 35 | country | high | COUNTRYRES - country of residence |
| 36 | skip | medium | CITYRESYEAR - years in city |
| 37 | skip | medium | CITYRESMONTH - months in city |
| 38 | skip | high | PROFESSION - occupation |
| 39 | skip | high | SECTOR - employment sector |
| 40 | skip | high | HEIGHT - physical characteristic |
| 41 | skip | high | WEIGHT - physical characteristic |
| 42 | skip | high | DISABLED - disability status |
| 43 | skip | high | ASSISTEDBY - assistance indicator |
| 44 | skip | high | TIN - tax ID (not voter ID) |
| 45 | skip | high | PASSPORTNB - passport number (encrypted/masked when present) |
| 46 | skip | high | PASSPORTPLACE - passport place |
| 47 | skip | high | PASSYEAR - passport year |
| 48 | skip | high | PASSMONTH - passport month |
| 49 | skip | high | PASSDAY - passport day |
| 50 | skip | high | REGBARANGAY - registration barangay |
| 51 | skip | high | REGREGION - registration region |
| 52 | skip | high | REGCITY - registration city |
| 53 | skip | high | REGPROVINCE - registration province |
| 54 | skip | high | REG_DATE - registration date |
| 55 | skip | high | STATIONID - station identifier |
| 56 | skip | high | LOCAL_ID - local identifier |
| 57 | skip | high | ANNEXTYPE - annex type code |
| 58 | skip | high | ANNEXRECORD - annex record |
| 59 | skip | high | CREATE_TIME - timestamp |
| 60 | skip | high | UPDATE_TIME - timestamp |
| 61 | phone | high | CONTACTNUMBER - phone contact number |
| 62 | skip | high | REFERENCENUMBER - voter reference number |
| 63 | skip | high | EMAIL_ID - email identifier |
| 64 | skip | high | UPDATED_DATETIME - timestamp |
| 65 | skip | high | IS_FRONTPAGE - boolean flag |
| 66 | skip | high | IS_REPRINT - boolean flag |
| 67 | skip | high | IS_OV - overseas voter flag |
| 68 | skip | high | IS_COUNTED - ballot counted flag |
Notes: Philippine COMELEC voter registration database. Contains full voter registration records with personal identification data. Fields 3-5 are the voter's name; fields 21-26 contain the names of the voter's parents. Some name fields appear encrypted/hashed (base64-encoded values) in certain records. DOB is stored as separate year/month/day fields (27-29). Phone numbers sometimes contain multiple entries separated by 'or'. The email field is frequently empty. This is structured voter registry data, not a combo list.
web_id_disapproved.txt · 11 columns · 13,001,017 rows
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] FORM_ID - internal form identifier, numeric/auto-generated |
| 4 | lastName | high | [4] LASTNAME header, values are surnames (DEL ROSARIO, VALEROS, BARRERAS, etc.) |
| 5 | firstName | high | [5] FIRSTNAME header, values are given names (CORAZON, FERDINAND, ANALYN, etc.) |
| 6 | middleName | high | [6] MATERNALNAME header - maternal/middle name (CAGALPIN, MARIANO, SAWADAN, etc.) |
| 7 | gender | high | [7] SEX header, values are F/M or S/M/W (single/married/widow indicators mixed with gender) |
| 9 | address1 | high | [9] RESSTREET - residential street address (BUENDIA BIGNAY I SARIAYA, etc.) |
| 13 | city | high | [13] CITY header, values are city names (SARIAYA, BANGUED, etc.) |
| 14 | state | high | [14] PROVINCE header, values are province names (QUEZON, ABRA, etc.) |
| 19 | dob | high | [19] DOBYEAR - birth year component (1971, 1969, 1993, etc.) |
| 20 | dob | high | [20] DOBMONTH - birth month component (09, 04, 01, etc.) |
| 21 | dob | high | [21] DOBDAY - birth day component (19, 10, 23, etc.) |
Notes: Philippine voter registration database (COMELEC 2016 breach). File contains delimited voter records with 40 total columns. Excluded: APP_TYPE, REGISTRATION, ABSENTIA, MARITALSTATUS, RESPRECINCT, RESPRECINCTCODE, VILLAGE, RESBARANGAY, RESCITY, RESPROVINCE, BIRTHCITY, BIRTHPROVINCE, DISABLED (status flag), VINP1/VINP2/VINP3/VINCONTROLCODE (voter ID numbers), REG_DATE, UPDATE_TIME (timestamps), DISAPPROVED (status flag), LOCAL_ID, GOV_ID, APPLICATION_ID, PAGES_DESCR, ID, N_ID (all internal identifiers). Date of birth mapped across three separate columns (DOBYEAR, DOBMONTH, DOBDAY) all coded as dob field since they collectively represent DOB.
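Because the date of birth is spread over three columns that all map to the same dob field, a consumer of the mapping has to recombine them at some point. The sketch below shows one way this might be done, assuming the components arrive as strings such as a four-digit year and zero-padded month and day; `combine_dob` is a hypothetical helper, not part of the recorded ingestion, and the example values are invented.

```python
from datetime import date
from typing import Optional


def combine_dob(year: str, month: str, day: str) -> Optional[str]:
    """Join separate year/month/day strings into an ISO 8601 date string.

    Returns None when a component is missing, non-numeric, or does not
    form a real calendar date.
    """
    try:
        return date(int(year), int(month), int(day)).isoformat()
    except (TypeError, ValueError):
        return None


# Invented placeholder components, not taken from the dataset.
example = combine_dob("2001", "02", "03")
```

Returning None instead of raising keeps a single malformed row from stopping a batch; whether that trade-off is right depends on how strictly the consumer needs to validate.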
web_id_onhand.txt · 121 columns · 5,752,070 rows
File structure
Format: CSV·Delimiter: pipe·Has header: yes·Quote: none
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | skip | high | [1] APPLICATION_ID - internal application identifier, auto-generated numeric |
| 2 | skip | high | [2] FORM_ID - internal form identifier, system reference |
| 3 | skip | high | [3] APP_TYPE - application type code (L, H) |
| 4 | skip | high | [4] ABSENTEE - status flag (L, O, V, R, N) |
| 5 | skip | high | [5] REGISTRATION - registration status code |
| 6 | lastName | high | [6] LASTNAME - surname values (TABAMO, CASTRO, SADE, etc.) |
| 7 | firstName | high | [7] FIRSTNAME - given name values (MERCY, ORESTES, WALING WALING, etc.) |
| 8 | middleName | high | [8] MATERNALNAME - maternal/middle name values (LOPEZ, HERNANDEZ, SACAY, etc.) |
| 9 | gender | high | [9] SEX - values are F/M (Female/Male) |
| 10 | skip | high | [10] MARITALSTATUS - marital status code (S, M, W, etc.) |
| 11 | skip | high | [11] SPOUSENAME - spouse full name (non-voter PII, mixed with empty) |
| 12 | skip | high | [12] SPOUSEFIRSTNAME - spouse first name (non-voter PII) |
| 13 | skip | high | [13] SPOUSELONGNAME - spouse long name (non-voter PII) |
| 14 | address1 | high | [14] RESSTREET - residential street address |
| 15 | skip | high | [15] VILLAGE - village/barangay subdivision (geographic, not direct PII) |
| 16 | city | high | [16] CITY - city name (INITAO, BALAGTAS, etc.) |
| 17 | state | high | [17] PROVINCE - province name (MISAMIS ORIENTAL, BULACAN, etc.) |
| 18 | skip | high | [18] RESPRECINCT - precinct code, numeric identifier |
| 19 | skip | high | [19] RESPRECINCTCODE - precinct code suffix |
| 20 | skip | high | [20] RESBARANGAY - barangay code numeric identifier |
| 21 | skip | high | [21] RESCITY - city code numeric identifier |
| 22 | skip | high | [22] RESPROVINCE - province code numeric identifier |
| 23 | address1 | medium | [23] ABROADSTREET - overseas/abroad street address |
| 24 | zip | medium | [24] ABROADZIP - overseas postal/zip code |
| 25 | skip | high | [25] ABSENTIA - absentia flag status |
| 26 | city | medium | [26] ABROADCITY - overseas city of residence |
| 27 | country | high | [27] ABROADCOUNTRY - overseas country of residence |
| 28 | skip | high | [28] ABROADPERIOD - period of overseas residence (duration) |
| 29 | skip | high | [29] ABROADRESCONT - residential contact status overseas |
| 30 | country | high | [30] REGCOUNTRY - registration country |
| 31 | skip | high | [31] REGEMBASSY - embassy code identifier |
| 32 | address1 | medium | [32] MAILSTREET - mailing address street |
| 33 | zip | medium | [33] MAILZIP - mailing postal code |
| 34 | city | medium | [34] MAILCITY - mailing city |
| 35 | country | medium | [35] MAILCOUNTRY - mailing country |
| 36 | skip | high | [36] MAILEMBASSY - mailing embassy code |
| 37 | address1 | medium | [37] REPSTREET - representative/authorized agent street address |
| 38 | skip | high | [38] REPBARANGAY - representative barangay code |
| 39 | city | medium | [39] REPCITY - representative city |
| 40 | state | medium | [40] REPPROVINCE - representative province |
| 41 | email | high | [41] EMAIL - email addresses with @ symbol |
| 42 | skip | high | [42] ABROADSTATUS - status flag for overseas voters |
| 43 | skip | high | [43] ABROADSTATUSSPECIF - specific status details (non-PII descriptor) |
| 44 | skip | high | [44] LASTENTRYDATE - timestamp of last entry, not DOB |
| 45 | skip | high | [45] ABSREGISTERED - registration status flag |
| 46 | skip | high | [46] OLDPRECINCT - old precinct code identifier |
| 47 | skip | high | [47] OLDREGBARANGAY - old barangay code |
| 48 | skip | high | [48] OLDREGCITY - old city code |
| 49 | skip | high | [49] OLDREGPROVINCE - old province code |
| 50 | skip | high | [50] OLDREGDATE - old registration date, not DOB |
| 51 | lastName | high | [51] FLASTNAME - father's last name (family lineage, still personal) |
| 52 | firstName | high | [52] FFIRSTNAME - father's first name |
| 53 | middleName | high | [53] FMATERNALNAME - father's maternal name |
| 54 | lastName | high | [54] MLASTNAME - mother's last name |
| 55 | firstName | high | [55] MFIRSTNAME - mother's first name |
| 56 | middleName | high | [56] MMATERNALNAME - mother's maternal name |
| 57 | lastName | high | [57] REPLASTNAME - representative last name |
| 58 | firstName | high | [58] REPFIRSTNAME - representative first name |
| 59 | middleName | high | [59] REPMATERNALNAME - representative maternal name |
| 60 | dob | high | [60] DOBYEAR - birth year (1949-1987 values) |
| 61 | dob | high | [61] DOBMONTH - birth month component (01-12) |
| 62 | dob | high | [62] DOBDAY - birth day component (01-31) |
| 63 | city | medium | [63] BIRTHCITY - city of birth |
| 64 | state | medium | [64] BIRTHPROVINCE - province of birth |
| 65 | skip | high | [65] CITIZENSHIP - citizenship status (B=Filipino, etc.) |
| 66 | skip | high | [66] NATURALIZATIONDATE - naturalization date (non-DOB timestamp) |
| 67 | skip | high | [67] CERTIFICATENB - certificate number, internal ID |
| 68 | skip | high | [68] COUNTRYRES - country of residence code |
| 69 | skip | high | [69] CITYRESYEAR - year moved to city (duration, not DOB) |
| 70 | skip | high | [70] CITYRESMONTH - month moved to city |
| 71 | skip | high | [71] PROFESSION - occupation descriptor |
| 72 | skip | high | [72] SECTOR - employment sector code |
| 73 | skip | high | [73] HEIGHT - physical height (biometric, not PII identifier) |
| 74 | skip | high | [74] WEIGHT - physical weight (biometric, not PII identifier) |
| 75 | skip | high | [75] MARKS - physical marks/distinguishing features (biometric) |
| 76 | skip | high | [76] DISABLED - disability status flag |
| 77 | skip | high | [77] ASSISTEDBY - assisted by designation (operational flag) |
| 78 | skip | high | [78] OLD_VIN - old voter ID number, internal identifier |
| 79 | skip | high | [79] VINP1 - voter ID part 1 (internal reference) |
| 80 | skip | high | [80] VINP2 - voter ID part 2 |
| 81 | skip | high | [81] VINP3 - voter ID part 3 |
| 82 | skip | high | [82] VINCONTROLCODE - voter ID control code |
| 83 | skip | high | [83] TIN - tax identification number (financial, not standard PII) |
| 84 | skip | high | [84] PASSPORTLOST - passport lost status flag |
| 85 | skip | high | [85] PASSPORTNB - passport number (travel document, not voter PII for mapping) |
| 86 | skip | high | [86] PASSPORTPLACE - passport issuance place (descriptor) |
| 87 | skip | high | [87] PASSYEAR - passport year (non-DOB timestamp) |
| 88 | skip | high | [88] PASSMONTH - passport month |
| 89 | skip | high | [89] PASSDAY - passport day |
| 90 | skip | high | [90] REGBARANGAY - registration barangay code |
| 91 | skip | high | [91] REGCITY - registration city code |
| 92 | skip | high | [92] REGPROVINCE - registration province code |
| 93 | skip | high | [93] REG_DATE - registration date (non-DOB timestamp) |
| 94 | skip | high | [94] INTERNAME - internal operator name (system staff, not voter) |
| 95 | skip | high | [95] OFFICERNAME - officer name (system staff, not voter) |
| 96 | skip | high | [96] OPERNAME - operator name (system staff, not voter) |
| 97 | skip | high | [97] STATIONID - station identifier code |
| 98 | skip | high | [98] CDID - CD identifier (internal) |
| 99 | skip | high | [99] SETID - set identifier (internal) |
| 100 | skip | high | [100] PRINT_FLAG - printing flag status |
| 101 | skip | high | [101] FINGER_INFO - fingerprint data (biometric, not searchable PII) |
| 102 | skip | high | [102] FINGER_TOPO_COORD - fingerprint topographical coordinates (biometric) |
| 103 | skip | high | [103] QUALITY - fingerprint quality metric |
| 104 | skip | high | [104] MATCHING_FINGER - matching finger code (biometric) |
| 105 | skip | high | [105] TRANSFER_STATUS - transfer status flag |
| 106 | skip | high | [106] TRANSFER_UPDATE_TIME - transfer update timestamp |
| 107 | skip | high | [107] PAGES_DESCR - pages description (document metadata) |
| 108 | skip | high | [108] LOCAL_ID - local identifier code |
| 109 | skip | high | [109] CREATE_TIME - record creation timestamp |
| 110 | skip | high | [110] UPDATE_TIME - record update timestamp |
| 111 | skip | high | [111] LOCK_USER - lock user identifier |
| 112 | skip | high | [112] LOCK_TIME - lock timestamp |
| 113 | skip | high | [113] PROCESSING - processing status flag |
| 114 | skip | high | [114] IS_CURRENT - current status flag |
| 115 | skip | high | [115] DOC_VERSION - document version number |
| 116 | skip | high | [116] CD_STAT_ENTY - CD status entry code |
| 117 | skip | high | [117] DISAPPROVED - disapproval status flag |
| 118 | skip | high | [118] VOTING_HIST1 - voting history flag 1 |
| 119 | skip | high | [119] VOTING_HIST2 - voting history flag 2 |
| 120 | skip | high | [120] OP_CODE - operation code |
| 121 | skip | high | [121] OP_DATE - operation date (non-DOB timestamp) |
Notes: Philippine Commission on Elections (COMELEC) 2016 voter registration database. This is a 122-column voter registration export containing full voter records with personal identifiers (names, DOB, addresses, email), parental information (father/mother names), and overseas voter data. Columns 0-5, 10-13, 15, 18-22, 25, 28-29, 31, 36, 38, 42-50, and 65-121 are skipped as non-PII or system metadata. Multiple address fields (residential, overseas, mailing, representative) are mapped. Biometric fields (fingerprints, height, weight, physical marks) are excluded. Names of the father, mother, and representative are mapped because they constitute searchable personal identifiers in the context of voter records; the spouse name columns (11-13) are skipped.
webvs_primary_7therb.txt · 13 columns · 75,302,279 rows
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 4 | lastName | high | [4] header 'LASTNAME', values are surnames (DEL ROSARIO, PERLAS, VALEROS, etc.) |
| 5 | firstName | high | [5] header 'FIRSTNAME', values are given names (CORAZON, JOANNA ROSE, JOHN LENON, etc.) |
| 6 | middleName | high | [6] header 'MATERNALNAME', values are middle names (CAGALPIN, PRINCENA, BRINGAS, etc.) |
| 7 | gender | high | [7] header 'SEX', values are M/F (F, F, M, F, etc.) |
| 9 | address1 | high | [9] header 'RESSTREET', values are street addresses (BUENDIA BIGNAY I SARIAYA, _, DALNETAN, etc.) |
| 10 | address2 | high | [10] header 'VILLAGE', values are village/barangay names (BIGNAY 1, AGTANGAO, etc.) |
| 11 | city | high | [11] header 'CITY', values are city names (SARIAYA, BANGUED, etc.) |
| 12 | state | high | [12] header 'PROVINCE', values are province names (QUEZON, ABRA, etc.) |
| 18 | dob | high | [18] header 'DOBYEAR', year component of DOB (1971, 1992, 1993, etc.) |
| 19 | dob | high | [19] header 'DOBMONTH', month component of DOB (09, 12, 03, etc.) |
| 20 | dob | high | [20] header 'DOBDAY', day component of DOB (19, 14, 10, etc.) |
| 21 | city | medium | [21] header 'BIRTHCITY', values are city names (SARIAYA, BANGUED, etc.) — birth location city |
| 22 | state | medium | [22] header 'BIRTHPROVINCE', values are province names (QUEZON, ABRA, etc.) — birth location province |
Notes: This is a voter registration database from the Philippine COMELEC 2016 breach. The file contains 39 columns total; 13 map to PII fields (names, DOB components, address components, city, state, gender). Columns not listed (FORM_ID, APP_TYPE, REGISTRATION, ABSENTIA, MARITALSTATUS, RESPRECINCT, RESPRECINCTCODE, RESBARANGAY, RESCITY, RESPROVINCE, DISABLED, VINP1, VINP2, VINP3, VINCONTROLCODE, REG_DATE, UPDATE_TIME, DISAPPROVED, LOCAL_ID, APPLICATION_ID, PAGES_DESCR, ID, N_ID) are skipped as they are internal IDs, registration metadata, VIN data, timestamps, or system identifiers.
webvs_primary_data_errors.txt · 0 rows
File structure
Notes: This file contains only internal administrative data with no PII fields. All 8 columns are non-PII: [0] ID is an internal voter ID number (skip), [1] REGISTRATION is a registration status code (skip), [2] REG_DATE is a timestamp (skip), [3] CREATE_TIME is a timestamp (skip), [4] RESPROVINCE is a geographic code (skip), [5] RESCITY is a geographic code (skip), [6] REMARKS contains only data quality error descriptions and flags (skip), [7] ID_NEW is a derived internal reference ID (skip). No personally identifiable information is present in this extract.