yatra.com
Jan 1, 2019
A breach of Yatra (yatra.com), a major Indian online travel booking platform. The archive contains 109 numbered CSV files of user account records.

These records include:

- numeric user IDs
- email addresses
- salutations/titles
- first and last names
- physical addresses (street, city, state, country, PIN code)
- mobile and alternate phone numbers

The data is Indian throughout. It contains Indian addresses, Indian phone numbers, Indian email providers (rediffmail.com, yahoo.co.in), and references to Indian cities and states. One record gives 'Yatra Office' in Gurgaon as its address, which supports the attribution. The archive was distributed via BreachForums.
Data found in this dataset
Source files
Each file below lists its column headers and the LLM's field-mapping reasoning, recorded during ingestion.
1.csv · 12 columns · 98,991 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email; all values contain @ and are valid email addresses |
| 2 | suffix | high | [2] header 'Mr.' and values are salutations/titles (Mr, Mr) |
| 3 | firstName | high | [3] header 'MOHAMED'; values are common given names (Shubham, Wilbert, NABI, Mithilesh, Sajmon) |
| 4 | lastName | high | [4] header 'RAFIQ'; values are surnames (Chadravanshi, Vaz, ANSARI, kumar, Rajamani) |
| 5 | address1 | high | [5] header 'METTUPALAYAM'; values are street addresses (81/G Risali Sector, Swan, 11 A Ashoka road new delhi, mariapuram, dispur) |
| 6 | address2 | medium | [6] header empty; values mostly empty or duplicate address info; secondary address field |
| 7 | city | high | [7] header 'COIMBATORE'; values are Indian city names (Bhilai, Siwan, New delhi, coimbatore, guwahati) |
| 8 | state | high | [8] header 'Tamilnadu'; values are Indian states (Chhattisgarh, Bihar, Delhi, Tamil Nadu, Assam) |
| 9 | country | high | [9] header 'India'; all values are 'India' or 'IND' |
| 10 | zip | high | [10] header '641301'; values are 6-digit Indian postal codes (490006, 841436, 110001, 641104, 781005) |
| 11 | phone | high | [11] header '9943834335'; values are 10-digit Indian mobile numbers (9885332000, 8602184719, 9821238286, 9971268719, 9868301865) |
| 12 | phone | high | [12] header empty; values are 10-digit numbers (23782736, 9500974345, 9864970759); secondary/alternate phone field |
Notes: Yatra 2019 Indian travel booking breach. Column [0] is numeric user ID (skip). All PII identified: email, name components (firstName, lastName, suffix), full address (address1, address2, city, state, zip), country, and dual phone fields. Indian phone numbers (10 digits starting with 7-9) and Indian postal codes (6 digits) confirmed.
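The note above relies on two format checks: 10-digit mobile numbers and 6-digit PIN codes. What follows is a minimal sketch of those checks. It makes two assumptions.

- Mobile numbers start with 6-9. The note says 7-9; the 6 series is included here as an assumption about current Indian numbering.
- PIN codes never start with 0.

The function names are hypothetical and are not part of any ingestion tool described on this page.

```python
import re

# Assumption: Indian mobile numbers are 10 digits with a leading digit of 6-9.
MOBILE_RE = re.compile(r"[6-9]\d{9}")
# Assumption: PIN codes are 6 digits and the first digit is never 0.
PIN_RE = re.compile(r"[1-9]\d{5}")


def is_indian_mobile(value: str) -> bool:
    """Return True if the whole cell is a 10-digit number starting with 6-9."""
    return MOBILE_RE.fullmatch(value.strip()) is not None


def is_pin_code(value: str) -> bool:
    """Return True if the whole cell is a 6-digit PIN code."""
    return PIN_RE.fullmatch(value.strip()) is not None
```

Under these rules, an 8-digit landline-style value fails `is_indian_mobile`. The alternate-phone column sampled for this file contains one such value, so the check can flag that column's mixed contents.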
10.csv · 12 columns · 98,981 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ signs; headers and sample values are clearly email addresses (yahoo.com, gmail.com, yahoo.co.in) |
| 2 | suffix | high | [2] Header 'Mr'; all sample values are salutations/titles (Mr, consistent with Indian naming conventions) |
| 3 | firstName | high | [3] Values are common given names (Shashidhar, Amit, Dinesh, prabhakarkumar, aditya, syedfaiz) |
| 4 | lastName | high | [4] Values are surnames (mahavadi, Rishi, Namjoshi, singh, garg, hasan) |
| 5 | address1 | high | [5] Street-level address components (vishal nagar, B1/402 Supernal Garden, 7 d b gupta road, kharghar, raj nagar) |
| 6 | address2 | high | [6] Secondary address details (Kolshet RD., paharganj, sector23raj nagar gzb.) |
| 7 | city | high | [7] Indian city names (pune, THANE, new delhi, mumbai, ghaziabad, CHENNAI) |
| 8 | state | high | [8] Indian state abbreviations and names (maharashtra, delhi, Maharashtra, Uttar Pradesh, TN) |
| 9 | country | high | [9] Country codes (IN, IND) — India |
| 10 | zip | high | [10] Indian postal codes/PIN codes (411027, 110055, 410210, 201002, 600032) |
| 11 | phone | high | [11] Indian mobile numbers, 10 digits (9850832696, 9810678244, 8128992377, 7498081429) |
| 12 | phone | high | [12] Alternate/secondary phone numbers, 10 digits or variable length (9869266264, 9868208976, 666666 appears to be invalid/test data, 9958691421) |
Notes: 13 columns total. Column [0] is a numeric user ID (skip). All other columns map to PII. Indian travel booking platform context confirms address/phone/email structure. Columns [11] and [12] are both phone fields (primary and alternate mobile/contact numbers, common in Indian databases).
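In the headerless files, fields can only be identified by position. The sketch below turns one 13-column row into named fields using the mapping recorded in the table above. It drops column [0] and folds columns [11] and [12] into a single list of phone numbers. The dictionary layout and the `map_row` name are illustrative assumptions.

```python
from typing import Dict, List

# Hypothetical positional mapping for the headerless 13-column layout.
# Column 0 (numeric user ID) is deliberately absent, so it is skipped.
FIELD_BY_INDEX = {
    1: "email", 2: "suffix", 3: "firstName", 4: "lastName",
    5: "address1", 6: "address2", 7: "city", 8: "state",
    9: "country", 10: "zip",
}
# Columns 11 and 12 are primary and alternate phone numbers.
PHONE_INDICES = (11, 12)


def map_row(row: List[str]) -> Dict[str, object]:
    """Map one positional CSV row to named fields, omitting blank cells."""
    record: Dict[str, object] = {}
    for index, field in FIELD_BY_INDEX.items():
        if index < len(row) and row[index].strip():
            record[field] = row[index].strip()
    # Gather every non-blank phone column into one list, primary first.
    record["phone"] = [row[i].strip() for i in PHONE_INDICES
                       if i < len(row) and row[i].strip()]
    return record
```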
11.csv · 13 columns · 98,964 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] numeric user IDs (3623523, 3623524, etc.), auto-generated internal identifiers |
| 1 | email | high | [1] all values contain @ symbol, valid email addresses (yahoo.com, rediffmail.com, gmail.com, hotmail.com) |
| 2 | suffix | high | [2] header 'Mr', 'Mrs' — salutation/title values, matches suffix field type |
| 3 | firstName | high | [3] single given names (Nitesh, umesh, SEEMA, Sharmila, Nikhil, SAIF), typical first names |
| 4 | lastName | high | [4] single family names (Sahni, patil, PANDEY, Jain, Agarwal, ALI), typical last names |
| 5 | address1 | high | [5] street addresses (Camp road, 12-1-334/17/18 katariya niwas, a-62 new agra, 29 BAJRANG VIHAR COLONY,JAITPURA) |
| 6 | address2 | medium | [6] secondary address component (Lalapet), appears to be locality/area designation |
| 7 | city | high | [7] Indian city names (Amravati, secunderabad, agra, JAIPUR) |
| 8 | state | high | [8] Indian states (Maharashtra, Andhra Pradesh, Uttar Pradesh, Rajasthan) |
| 9 | country | high | [9] country codes/names (IND, India) |
| 10 | zip | high | [10] Indian postal codes/PIN codes (444605, 500017, 282005, 303704), 6-digit format |
| 11 | phone | high | [11] Indian mobile phone numbers (9675850123, 7798055456, 9930015853, 9553906777, 9869033220, 9923453994), 10-digit format |
| 12 | phone | high | [12] alternate phone numbers (9052989121, 9458815226), 10-digit Indian format |
Notes: Yatra.com 2019 breach — 13 columns total, 12 contain PII (names, email, addresses, phone). Column [0] is auto-generated user_id (skip). Breach context confirms Indian travel booking platform with Indian address/phone data.
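Across these files the country column uses 'IN', 'IND' and 'India' for the same country. A small normalization step can collapse them. This sketch assumes the ISO 3166-1 alpha-2 code 'IN' is the desired canonical form. Unrecognized values pass through unchanged for manual review.

```python
# Aliases observed in these files; the canonical target "IN" is an assumption.
COUNTRY_ALIASES = {"in": "IN", "ind": "IN", "india": "IN"}


def normalize_country(value: str) -> str:
    """Collapse known spellings of India to "IN"; leave anything else as-is."""
    cleaned = value.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)
```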
12.csv · 12 columns · 98,931 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format (Gmail, Yahoo.co.in); clearly email addresses |
| 2 | suffix | high | [2] values are 'Mr', 'Mr.', 'Ms' — salutation/title prefixes |
| 3 | firstName | high | [3] values are common given names (Subbiah, rajit, Rohit, saloni) |
| 4 | lastName | high | [4] values are surnames (Ramiah, venkataramanamurthy, singh, Kashyap, sinha) |
| 5 | address1 | high | [5] values are street addresses (371 SFS FLATS HAUZ KHAS, Flat No. 100, etc.) |
| 6 | city | high | [6] values are Indian city names (chembur — Chembur, Mumbai) |
| 7 | city | high | [7] values are Indian city names (NEW DELHI, Ranchi, mumbai) |
| 8 | state | high | [8] values are Indian states (Delhi, Jharkhand, Maharashtra) |
| 9 | country | high | [9] values are 'India' and 'IND' — country codes/names |
| 10 | zip | high | [10] values are Indian postal codes (110016, 834002, 400074) — 6-digit PIN codes |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9830658682, 9979862072, etc.) |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers (8797758664, 9386751758) — alternate/secondary phone |
Notes: Yatra 2019 travel booking breach. Column [0] is a numeric user ID (skip). Columns [6] and [7] were both mapped to city. The [6] sample (Chembur, a neighbourhood of Mumbai) suggests [6] is a locality or secondary address field rather than a second city. All records are Indian, with Indian addresses, 10-digit phone numbers (the +91 country code is implicit), and Indian email providers (yahoo.co.in, rediffmail.com). Breach context confirms Indian travel platform data.
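The stored numbers do not include the +91 country code, so a consumer that needs international format has to add it. This minimal sketch assumes E.164 output and allows for a stray leading 0 or 91. Anything that does not reduce to a 10-digit number starting with 6-9 is rejected rather than guessed. The `to_e164` name is hypothetical.

```python
import re
from typing import Optional


def to_e164(number: str) -> Optional[str]:
    """Format an Indian mobile number as +91XXXXXXXXXX, or return None."""
    digits = re.sub(r"\D", "", number)  # drop spaces, dashes, plus signs
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]  # country code already present
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]  # domestic trunk prefix
    if len(digits) == 10 and digits[0] in "6789":
        return "+91" + digits
    return None
```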
13.csv · 12 columns · 99,169 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ signs and match standard email format (gmail.com, rediffmail patterns typical of Indian users) |
| 2 | suffix | high | [2] Header 'Mr.' and values are salutations/titles (Mr, Mr.) |
| 3 | firstName | high | [3] Values are common given names (Tapeesh, Abhishek, ardhendu, sandeep, Shashank, KRISHAN) |
| 4 | lastName | high | [4] Values are surnames (Gupta, panda, patil, Chansoria, GOSWAMI) |
| 5 | address1 | high | [5] Values are street addresses and building identifiers (3043 Pocket B4 Vasant Kunj, plot 21, T7 Imperial Residency, D-486) |
| 6 | address2 | high | [6] Values are secondary address components (HSR Layout, TAGORE GARDEN EXTN) or empty |
| 7 | city | high | [7] Values are Indian city names (New Delhi, mau, bhubaneswar, baroda, Bangalore) |
| 8 | state | high | [8] Values are Indian states/provinces (Delhi, Uttar Pradesh, Orissa, Gujarat, Karnataka) |
| 9 | country | high | [9] Values are country codes/names (India, IND, IN) |
| 10 | zip | high | [10] Values are 6-digit Indian PIN codes (110070, 275101, 751012, 391775, 560034, 110027) |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers (9818657776, 9999979468, 9919661666, 8763216420) |
| 12 | phone | high | [12] Values are 10-digit alternate/secondary phone numbers or empty (2667266608, 7381195997) |
Notes: Yatra 2019 breach — Indian travel booking platform. 13 columns total, 12 contain PII (names, emails, addresses, phones, location). Column [0] contains numeric user IDs and is skipped as internal identifier. All addresses, phone numbers, and email providers confirm Indian origin. Record structure matches expected user account data from travel booking platform.
14.csv · 13 columns · 98,992 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] numeric sequence (3713924–3713930), internal user IDs |
| 1 | email | high | [1] all values contain @ symbol; email addresses from common providers (yahoo.com, rediffmail.com, gmail.com) |
| 2 | suffix | high | [2] values are titles: 'Ms', 'Mr', 'Mr.' — salutation/suffix field |
| 3 | firstName | high | [3] common given names (SHEETAL, VINOD, SHASHI, ashok, Rohit, manish) |
| 4 | lastName | high | [4] surnames (RAJPUT, SHARMA, SHEKHAR, jain, Shinde, Chavan) |
| 5 | address1 | high | [5] street addresses (916 MAHALAXMI NAGAR, G137A Sector 10 DLF, 754 GULABI BAGH) |
| 6 | address2 | medium | [6] secondary address component, appears to be building/landmark (NEW NKS HOSP, vaiduwadi) |
| 7 | city | high | [7] Indian city names (Faridabad, DELHI, Pune, Indore) |
| 8 | state | high | [8] Indian states (MADHYA PREDESH, Haryana, Delhi, Maharashtra) |
| 9 | country | high | [9] country codes/names: 'IN', 'India', 'IND' |
| 10 | zip | high | [10] 6-digit Indian postal codes (452010, 121006, 110007, 411007, 411013) |
| 11 | phone | high | [11] 10-digit Indian mobile numbers (9893091064, 9446917384, 9350447442, 9609802185) |
| 12 | phone | high | [12] 10-digit Indian phone number (8959848488), alternate/secondary phone |
Notes: Yatra 2019 breach: 13 columns, 12 of which contain PII (column [0] is the internal user ID). The file has no header row. Its column positions still match travel booking user records: ID, email, name components, full address with Indian states/cities/postal codes, and two phone numbers. All values are consistent with Indian user accounts from yatra.com.
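The salutation column mixes 'Mr', 'Mr.', 'Ms' and 'Mrs' for the same small set of titles. A normalization step can strip the trailing dot and case differences. The sketch below is illustrative. The canonical spellings and the inclusion of 'Dr' are assumptions, and unknown titles are returned trimmed but otherwise unchanged.

```python
# Hypothetical canonical titles, keyed by lower-case form without a dot.
KNOWN_TITLES = {"mr": "Mr", "mrs": "Mrs", "ms": "Ms", "dr": "Dr"}


def normalize_title(value: str) -> str:
    """Map 'Mr.', 'MR', ' mr ' and similar to one canonical spelling."""
    key = value.strip().rstrip(".").lower()
    return KNOWN_TITLES.get(key, value.strip())
```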
15.csv · 11 columns · 97,611 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email; all values contain @ and are email addresses |
| 2 | suffix | high | [2] header 'Mr'; values are titles (Mr, Mr., Mr) — salutation/suffix field |
| 3 | firstName | high | [3] header 'Bhoj'; values are given names (Bhoj, Pulin Das, anurag, mahendra, Sidharth, sameer) |
| 4 | lastName | high | [4] header 'Sao'; values are family names (Sao, bhardwaj, Patel, Handa, chauhan, SEKARAN) |
| 5 | address1 | high | [5] street/primary address lines (Chakardhar Nagra Bangla Para, vijaynagar, C/33 Rameshwar 3rd Flr S V Road, ghasiyawas radhanpur, apartment addresses, Yatra.Com office) |
| 6 | address2 | high | [6] secondary address component (delhi, Mumbai, cross streets like '10th cross BTM 1st stage') |
| 7 | city | high | [7] header 'Raigarh'; values are Indian cities (Raigarh, delhi, Mumbai, radhanpur, TRICHY, Gurgoan) |
| 8 | state | high | [8] header 'Chhattisgarh'; values are Indian states (Chhattisgarh, Maharashtra, Gujarat, Tamilnadu, karnatka) |
| 9 | country | high | [9] header 'IN'; values are country codes/names (IN, IND, India) |
| 10 | zip | high | [10] header '496001'; values are 5-6 digit Indian postal codes (496001, 400054, 385340, 621216, 560029) |
| 11 | phone | high | [11] header '9762162480'; values are 10-digit Indian mobile numbers starting with 9 |
Notes: 13 columns total, 11 contain PII (email, name components, full address, phone). Column [0] is numeric user ID (skip). Column [12] contains sparse numeric values — appears to be an internal reference or secondary ID (skip). Breach context confirms Indian travel booking platform with personal account data including contact info and residential addresses.
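Several files flagged 'Has header: yes', such as 15.csv, report "header" values that are themselves data. These include an email address, a PIN code and a mobile number, which suggests the first row is a record misread as a header.

A stricter check treats a row as a header only if none of its cells looks like an email address or a bare number. The sketch below is a hypothetical heuristic. It does not describe what the ingestion actually did, and the email pattern is deliberately loose.

```python
import re
from typing import List

# Loose email shape: something@something.something, no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def looks_like_header(first_row: List[str]) -> bool:
    """Return False if any cell looks like data (an email or a bare number)."""
    for cell in first_row:
        value = cell.strip()
        if EMAIL_RE.fullmatch(value) or value.isdigit():
            return False
    return True
```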
16.csv · 12 columns · 98,904 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format (yahoo.com, gmail.com, etc.) |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles (Mr, Mr.) |
| 3 | firstName | high | [3] values are common given names (VIKASH, SAAGAR, Anubinda, BASANTA, JAHNAVI) |
| 4 | lastName | high | [4] values are surnames (KUMAR, Banshkar, Patra, NAYAK, MARATHE) |
| 5 | address1 | high | [5] values are street addresses (AT+ PO - JALPURA, Barjhai post panagar, D 601 REGENCY, etc.) |
| 6 | address2 | high | [6] values are secondary address components (PS +DIST- ARWAL, spaces suggesting address line 2) |
| 7 | city | high | [7] values are Indian city names (ARWAL, jabalpur, THANE, Thane) |
| 8 | state | high | [8] values are Indian states (BIHAR, Madhya Pradesh, Maharashtra) |
| 9 | country | high | [9] values are country codes/names (IN, India, IND) |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (804401, 483220, 401202, 400607) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (7305510321, 8699750340, 9898321401, 9755852574) |
| 12 | phone | high | [12] values are 10-digit Indian phone numbers (9324532709, 9920696417), alternate contact number |
Notes: Yatra 2019 breach — Indian travel booking platform. Column [0] is numeric user_id (skip). Columns [11] and [12] both map to phone as alternate contact numbers. All addresses, names, and contact details are Indian in origin.
17.csv · 12 columns · 99,257 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ symbol and match email address format (domains include sbi.co.in, yahoo.com, gmail.com) |
| 2 | suffix | high | [2] Values are 'Mr', 'Mr.', 'Mr' — salutation/title prefix |
| 3 | firstName | high | [3] Values are common Indian given names (Sumit, Arun, Raunak, KESHAV) |
| 4 | lastName | high | [4] Values are Indian surnames (Lakhotiya, DASARI, Rana, Agrawal, SHENDE) |
| 5 | address1 | high | [5] Street-level addresses (2b Rajkamal Complex, B-22 East Uttam Nagar, 11 SOMWAR PETH) |
| 6 | address2 | high | [6] Secondary address components (Panchsheel Square Dhantoli, sri nagar colony) |
| 7 | city | high | [7] Indian city names (Nagpur, Hyderabad, New Delhi, KARAD) |
| 8 | state | high | [8] Indian states/provinces (Maharashtra, Andhra Pradesh, Delhi) |
| 9 | country | high | [9] Country indicators (IN, IND, India) |
| 10 | zip | high | [10] Indian postal codes (440012, 500045, 110059, 415110 are valid PIN formats) |
| 11 | phone | high | [11] Indian mobile phone numbers (10 digits starting with 9: 9448993063, 9869345389, 9703580777, 9876644001, 9311056668) |
| 12 | phone | high | [12] Secondary phone numbers including landlines (040-44430777 is Hyderabad area code, 9960196954 is mobile) |
Notes: 13 columns total. Column [0] is a numeric user_id (skip). Columns [1]–[12] contain PII. The structure is consistent with a Yatra travel platform breach: Indian user records containing contact and address information.
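The alternate-phone column in 17.csv mixes mobile numbers with landlines written with an STD code (the 040 Hyderabad example). A rough classifier can separate the two after stripping punctuation. It rests on two assumptions.

- Mobiles are 10 digits starting with 6-9.
- A landline with a trunk prefix is 11 digits starting with 0.

A mobile written with a leading 0 would also be labelled a landline here, so treat this as a sketch rather than a complete rule.

```python
import re


def classify_phone(value: str) -> str:
    """Return 'mobile', 'landline', or 'unknown' for an Indian phone cell."""
    digits = re.sub(r"\D", "", value)  # e.g. "040-1234..." -> "0401234..."
    if len(digits) == 10 and digits[0] in "6789":
        return "mobile"
    # STD code plus subscriber number is 10 digits; the trunk 0 makes 11.
    # Simplification: a mobile written with a leading 0 also lands here.
    if len(digits) == 11 and digits.startswith("0"):
        return "landline"
    return "unknown"
```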
18.csv · 12 columns · 99,136 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and match email address patterns (gmail.com, yahoo.com, accenture.com, etc.) |
| 2 | suffix | high | [2] header 'Mr', consistent salutation/title prefix values |
| 3 | firstName | high | [3] values are common given names (SHANKAR, Mohamed Saifulla, Khawar, sushant) |
| 4 | lastName | high | [4] values are surnames (MISHRA, Shakeel, Hussain, Nethala, shukla) |
| 5 | address1 | high | [5] street/mailing address values (marble mkt trikuta nagar jammu, H No: 9-7-8/7, Shivajinagar, etc.) |
| 6 | address2 | high | [6] secondary address line values (P O Lawsons Bay, banda(u.p.), main road) |
| 7 | city | high | [7] Indian city names (jammu, Visakhapatnam, lucknow, VIJAYAWADA) |
| 8 | state | high | [8] Indian state names (Jammu and Kashmir, Uttar Pradesh, Andhra Pradesh) |
| 9 | country | high | [9] country codes and names (IND, India) |
| 10 | zip | high | [10] Indian PIN codes (180012, 226001, 520001) — numeric postal codes |
| 11 | phone | high | [11] 10-digit Indian mobile phone numbers (9003698543, 7305591205, 9960517609, etc.) |
| 12 | phone | high | [12] alternate 10-digit Indian phone numbers (9815926631, 9450169787) — secondary contact |
Notes: 13 columns total, 12 contain PII. Column [0] is numeric user ID — skipped as internal identifier. Breach context confirms Indian travel booking platform (Yatra.com) with Indian addresses, states, PIN codes, and phone numbers. No header row present in data.
19.csv · 10 columns · 96,370 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and email domains (yahoo.in, rediffmail.com, gmail.com) |
| 2 | suffix | high | [2] header pattern matches salutation/title; values are 'Mr.' and 'Mr' |
| 3 | firstName | high | [3] values are common Indian given names (siddhartha, santhosh, ARIJEET, nongthombam) |
| 4 | lastName | high | [4] values are Indian surnames (singh, RAIKAR, kangleinganba) |
| 5 | address1 | high | [5] values contain street addresses and landmarks (varuna, sunrise town, chowkaghat) |
| 7 | city | high | [7] values are Indian city names (varanasi, chennai, Imphal) |
| 8 | state | high | [8] values are Indian state names (Uttar Pradesh, Manipur) |
| 9 | country | high | [9] values are 'India' and 'IN' country codes |
| 10 | zip | high | [10] values are 6-digit Indian postal codes (211002, 795001) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9415015092, 8756538111, etc.) |
Notes: Yatra 2019 breach: Indian travel booking platform. Column [0] is numeric user ID (skip). Columns [6] and [12] are empty (skip). Total PII columns: 10 of 13.
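In 19.csv, columns [6] and [12] are reported as empty, and 25.csv has an empty column [12]. Empty columns like these can be found by measuring how many cells in each column are non-blank. This is a minimal sketch. The `threshold` parameter is an assumption that lets nearly empty columns be flagged too.

```python
from typing import List, Sequence


def empty_columns(rows: Sequence[Sequence[str]], threshold: float = 0.0) -> List[int]:
    """Return indices of columns whose share of non-blank cells is <= threshold."""
    if not rows:
        return []
    width = max(len(r) for r in rows)  # tolerate ragged rows
    result = []
    for col in range(width):
        filled = sum(1 for r in rows if col < len(r) and r[col].strip())
        if filled / len(rows) <= threshold:
            result.append(col)
    return result
```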
2.csv · 12 columns · 98,870 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header 'email', values contain @ symbols and match email format (gmail.com, rediffmail.com, yahoo.co.in) |
| 2 | suffix | high | [2] header 'salutation/title', values are 'Mr', a salutation/title prefix stored in the suffix field |
| 3 | firstName | high | [3] header 'first_name', values are common given names (Sandhya, sahil, prabhat, NITIN, Samidurai) |
| 4 | lastName | high | [4] header 'last_name', values are surnames (Gunjote, gaikwad, kumar, GUPTA, Nadarajan) |
| 5 | address1 | high | [5] header 'address', values are street addresses (1/7 ram nager, K61/119 SAPTSAGAR, 7 1st Floor Kamatchiamman Colony) |
| 6 | address2 | high | [6] header 'address2', values are secondary address components (wagle estate behind r p compnay thane w, Sriramapuram) |
| 7 | city | high | [7] header 'city', values are Indian city names (thane, VARANASI, Chennai, Bangalore) |
| 8 | state | high | [8] header 'state', values are Indian state names (Maharashtra, Uttar Pradesh, Tamilnadu, Karnataka) |
| 9 | country | high | [9] header 'country', values are country codes/names (IND, India, IN) |
| 10 | zip | high | [10] header 'PIN/postal_code', values are 6-digit Indian postal codes (400604, 221002, 600026, 560021) |
| 11 | phone | high | [11] header 'mobile_phone', values are 10-digit Indian phone numbers (9873323693, 8886801515, 9619222942, 9950674190) |
| 12 | phone | high | [12] header 'alternate_phone', values are 10-digit Indian phone numbers (67341600, 9198418199) — secondary phone field |
Notes: Yatra-2019 breach. Column [0] is numeric user_id (skip). All 13 columns analyzed. 12 PII columns identified. Data is entirely Indian travel platform user records with standard contact information and address fields.
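Unlike most files, 2.csv is reported with descriptive headers ('email', 'first_name', 'PIN/postal_code', and so on). The record does not show whether these are the literal header strings or the LLM's paraphrase, so the mapping below is an assumption, and the `altPhone` key is hypothetical. Given real header names, `csv.DictReader` can rename columns directly instead of relying on position.

```python
import csv
import io
from typing import Dict, List

# Assumed header strings, taken from the assessment column for 2.csv.
HEADER_TO_FIELD = {
    "email": "email", "salutation/title": "suffix",
    "first_name": "firstName", "last_name": "lastName",
    "address": "address1", "address2": "address2", "city": "city",
    "state": "state", "country": "country", "PIN/postal_code": "zip",
    "mobile_phone": "phone", "alternate_phone": "altPhone",
}


def read_named(text: str) -> List[Dict[str, str]]:
    """Read CSV text with a header row, keeping and renaming known columns."""
    reader = csv.DictReader(io.StringIO(text))
    out = []
    for row in reader:
        # Unknown headers (such as a leading user-ID column) are dropped;
        # missing cells come back as None and are turned into "".
        out.append({HEADER_TO_FIELD[k]: (v or "").strip()
                    for k, v in row.items() if k in HEADER_TO_FIELD})
    return out
```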
20.csv · 13 columns · 95,946 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] header '4066017', values are sequential numeric IDs (4066018, 4066019, etc.) — internal customer/user identifiers |
| 1 | email | high | [1] header '[email protected]', all values are valid email addresses with @ symbol (yahoo.co.in, gmail.com, hotmail.com, etc.) |
| 2 | suffix | high | [2] header empty, values are titles: 'Mr', 'Mr.', 'Mr' — salutation/suffix field |
| 3 | firstName | high | [3] header 'reetagupta', values are given names: 'RAHUL', 'DRTG', 'megh', 'mohankumar', 'Geeta' — first names |
| 4 | lastName | high | [4] header empty, values are surnames: 'CHAUDHARY', 'CHANDRAMOHAN', 'singh', 'N', 'kolkar' — last names |
| 5 | address1 | high | [5] header '1112seergovardhanpur', values are street addresses: 'MPHASIS LTD', '3 CITY PARK SAMA', 'AKE Road NO3 Tiruchengode', 'shri venkateshwara residency' — primary address lines |
| 6 | address2 | medium | [6] header 'Enter Address 2', mostly empty with occasional secondary address data ('Namakkal', '313/7 B&C colony R&D Pashan') |
| 7 | city | high | [7] header 'varanasi', values are Indian city names: 'MANGALORE', 'VADODARA', 'koderma', 'Namakkal', 'bangalore' |
| 8 | state | high | [8] header empty, values are Indian states: 'Karnataka', 'Gujarat', 'Jharkhand', 'Tamil Nadu' — mailing_state equivalent |
| 9 | country | high | [9] header empty, all values are 'IND' or 'India' — country code/name field |
| 10 | zip | high | [10] header empty, values are 6-digit Indian PIN codes: '575001', '390008', '825413', '637211', '562157' — postal codes |
| 11 | phone | high | [11] header '9795510893', values are 10-digit Indian mobile numbers: '9901121130', '9212412662', '9567228210' — primary phone |
| 12 | phone | high | [12] header empty, values are 10-digit Indian mobile numbers: '9430314543', '9171224815', '9341704489' — alternate/secondary phone |
Notes: 13 columns total. Yatra 2019 breach: Indian travel booking platform. Data includes user IDs, emails, names with titles, complete Indian addresses (street, city, state, country, PIN), and primary + secondary mobile phone numbers. All addresses and phone numbers confirm Indian origin. Column [0] is internal customer ID (skipped). Columns [11] and [12] are both phone fields (primary and secondary mobile contacts).
21.csv · 13 columns · 99,318 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] numeric user IDs (4174487, 4174489, etc.) — internal identifiers, not PII |
| 1 | email | high | [1] all values contain @ symbol and are valid email addresses ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] salutation/title values: 'Ms', 'Mr', 'Mr.'; honorific titles stored in the suffix field |
| 3 | firstName | high | [3] given names (sivakumar, Nisha, AYUSH, SANGAY, Amit) — first name values in Indian context |
| 4 | lastName | high | [4] family names (Sharma, KHURANA, LAMA, EMANI) — last name values, though some rows contain anomalies |
| 5 | address1 | high | [5] street addresses (Vrajbhumi soc, A-1382 BAPU NAGAR BHILWAR, 6/6 mahant layout) — primary address line |
| 6 | address2 | medium | [6] mixed secondary address/contact data ([email protected], bull temple road, chola block, phone 9963785987) — appears to be secondary address or overflow field |
| 7 | city | high | [7] Indian city names (Vadodara, bhilwara, New Delhi, bangalore) — city field |
| 8 | state | high | [8] Indian state names (Gujarat, Rajasthan) and locality descriptors — state/region field |
| 9 | country | high | [9] country code 'IND' and empty values — country field |
| 10 | zip | high | [10] Indian postal codes (390021, 311001) — PIN/zip code field |
| 11 | phone | high | [11] 10-digit Indian mobile numbers (9442333343, 9558724370, 8058757575) — primary phone field |
| 12 | phone | high | [12] 10-digit Indian mobile numbers (9558724370, 8058757575) — alternate/secondary phone field |
Notes: 13 columns total. The data structure matches the Yatra breach context: Indian user account records with numeric user IDs, email addresses, names, Indian addresses (street, city, state, PIN), and mobile phone numbers. The raw data has no header row. Column [6] mixes field types (emails, addresses, phone numbers), which suggests either data corruption or overflow of secondary contact information.
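Column [6] of 21.csv holds a mix of email addresses, phone numbers and genuine address fragments. A rough classifier can flag cells that probably belong in another field. This sketch uses loose patterns and should be read as a sketch of the idea, not a validated schema. It treats any embedded 10-digit run starting with 6-9 as a phone number, which is an assumption.

```python
import re

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
# A 10-digit run starting with 6-9 that is not part of a longer number.
MOBILE_IN_TEXT_RE = re.compile(r"(?<!\d)[6-9]\d{9}(?!\d)")


def misplaced_kind(value: str) -> str:
    """Classify an address2 cell as 'empty', 'email', 'phone' or 'address'."""
    text = value.strip()
    if not text:
        return "empty"
    if EMAIL_RE.search(text):
        return "email"
    if MOBILE_IN_TEXT_RE.search(text):
        return "phone"
    return "address"
```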
22.csv · 12 columns · 97,711 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is email; all values contain @ and are valid email addresses |
| 2 | suffix | high | [2] header 'Mr'; all values are titles/salutations (Mr, etc.) |
| 3 | firstName | high | [3] header 'Nitin'; values are given names (Amos, Tathagata, sainmi, Manoj) |
| 4 | lastName | high | [4] header 'Gupta'; values are family names (Samson, Sureddy, Ghosh, Kumar) |
| 5 | address1 | high | [5] header '45 Old Ashoka Garden'; values are street addresses (NSK Street, Mohammad Villa, etc.) |
| 6 | address2 | high | [6] header ' '; values are secondary address components (apt/area names: koperkhairene) |
| 7 | city | high | [7] header 'Bhopal'; values are Indian cities (Chromepet Chennai, Hyderabad, navi mumbai, PATHANKOT) |
| 8 | state | high | [8] header 'Madhya Pradesh'; values are Indian states/provinces (Tamil Nadu, Andhra Pradesh, Punjab, Maharashtra) |
| 9 | country | high | [9] header 'IND'; all values are 'IND' (India country code) |
| 10 | zip | high | [10] header '462023'; values are 6-digit Indian postal codes (600044, 500072, 145001, 401202) |
| 11 | phone | high | [11] header '9893084352'; values are 10-digit Indian mobile numbers |
| 12 | phone | high | [12] header '9893084352'; values are 10-digit Indian mobile numbers (alternate/secondary phone) |
Notes: 13 columns total. Column [0] is numeric user ID (skip). Columns [11] and [12] both contain phone numbers; [12] appears to be alternate/secondary mobile. Indian travel booking platform breach with complete address and contact information.
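Both phone columns in 22.csv report the same header value, and in 21.csv the sample numbers in [12] also appear in [11]. Duplicates should therefore collapse when the two columns are combined into one phone list. This is a minimal, order-preserving sketch. Comparing raw strings is an assumption; differently formatted copies of the same number would need normalizing first.

```python
from typing import List


def merge_phones(*columns: str) -> List[str]:
    """Combine phone cells, dropping blanks and exact duplicates, keeping order."""
    seen = set()
    merged: List[str] = []
    for cell in columns:
        value = cell.strip()
        if value and value not in seen:
            seen.add(value)
            merged.append(value)
    return merged
```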
23.csv · 12 columns · 99,132 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ signs and match email format (gmail.com, yahoo.com, rediffmail patterns) |
| 2 | suffix | high | [2] Values are 'Mr', 'Ms', 'Mr.' — salutation/title indicators |
| 3 | firstName | high | [3] Values are common given names (jagdish, yashmeen, SUNIL, anil, Ravi, venkat) |
| 4 | lastName | high | [4] Values are surnames (mamgai, kaur, KUMAR, satish, Sonule) |
| 5 | address1 | high | [5] Values are street addresses (E 499 sc 11 pratap vihar, flat no-240 dwarka delhi, b-84 indupuram aurangabad) |
| 6 | address2 | medium | [6] Secondary address component; mostly empty/spaces but contains some location data (kothapet guntur) |
| 7 | city | high | [7] Values are Indian city names (ghaziabad, delhi, Visakhapatnam, mumbai, guntur) |
| 8 | state | high | [8] Values are Indian states (Uttar Pradesh, Delhi, Andhra Pradesh, Maharashtra, Punjab) |
| 9 | country | high | [9] All values are 'IND' — country code for India |
| 10 | zip | high | [10] Values are 6-digit Indian PIN codes (201001, 110075, 530013, 400072, 522001, 143001) |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers (9910629882, 8826955689, 9502889933, 9819382329) |
| 12 | phone | high | [12] Values are 10-digit Indian mobile numbers — alternate/secondary phone field |
Notes: 13 columns total, 12 contain PII (email, suffix, firstName, lastName, address1, address2, city, state, country, zip, phone×2). Column [0] appears to be user_id (numeric sequential identifiers like 4384367-4384371) and is skipped as internal ID. No header row present in data.
24.csv · 13 columns · 99,047 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] header '3674926', values are sequential numeric IDs (3674927, 3674928, etc.) — internal user/customer IDs |
| 1 | email | high | [1] header contains email address, all values contain @ sign and match email format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles (Mr, Mr., etc.) |
| 3 | firstName | high | [3] header 'sunil', values are given names (Prabhjot, chongkholun, Najmul, pratik) |
| 4 | lastName | high | [4] header 'kuriakose', values are surnames (Gill, haokip, Khan, amar) |
| 5 | address1 | high | [5] header 'Puthiyadathuparambil', values are street addresses (17/4 zorawar enclave, C-53 New Raipur Rd, etc.) |
| 6 | address2 | high | [6] header 'Puthiyadathuparambil', values are secondary address components (mall road near church, sadar hills dist.) |
| 7 | city | high | [7] header 'Kannur', values are Indian city names (Jalandhar, manipur, Kolkata) |
| 8 | state | high | [8] header 'Kerala', values are Indian states/provinces (Punjab, manipur, West Bengal) |
| 9 | country | high | [9] header 'IND', values are country codes/names (IND, IN, India) |
| 10 | zip | high | [10] header '670632', values are 6-digit postal codes (144005, 795001, 700084) — Indian PIN codes |
| 11 | phone | high | [11] header '9.19496E+11', values are 10-digit Indian mobile numbers (8123150766, 8872489814, 9415050816, 9957570748, 8657109081) |
| 12 | phone | high | [12] header 'Kerala', values are 10-digit phone numbers (9041660921, 8861797747) — alternate/secondary phone field |
Notes: 13 columns total, 12 contain PII. Column 0 is internal user ID (skip). Columns 11 and 12 both contain phone numbers, likely mobile and alternate phone as noted in breach context. Data confirms Indian travel booking platform with Indian addresses, phone numbers, and email providers.
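Column [11] of 24.csv reports a header of '9.19496E+11'. This looks like a 12-digit number (plausibly 91 plus a 10-digit mobile) that spreadsheet software re-saved in scientific notation. Only six significant digits survive, so the original number cannot be recovered. The sketch below flags such cells. The regex is an assumption about how these values appear.

```python
import re

# A decimal mantissa followed by an exponent, e.g. 9.19496E+11.
SCI_RE = re.compile(r"[+-]?\d+(\.\d+)?[eE][+-]?\d+")


def is_scientific(value: str) -> bool:
    """True if a cell looks like a number re-saved in scientific notation."""
    return SCI_RE.fullmatch(value.strip()) is not None


def significant_digits(value: str) -> int:
    """Count the mantissa digits that survived the conversion."""
    mantissa = value.strip().lower().split("e")[0].lstrip("+-")
    return len(mantissa.replace(".", "").lstrip("0"))
```

A cell such as '9.19496E+11' is flagged, and with six surviving digits it should be treated as lost rather than expanded back to a phone number.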
25.csv · 12 columns · 98,808 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] Numeric user IDs (3781279, 3781280, etc.) — internal identifiers, not usernames |
| 1 | email | high | [1] Values contain @ symbol and email domains (gmail.com, sify.com, hotmail.com, zyduscadila.com) |
| 2 | suffix | high | [2] Values are 'Mr.' and 'Mr' — salutation/title indicators |
| 3 | firstName | high | [3] Values are given names (Biju, Ignatius, Felix, Abhishek, PRASAD, NITISH) |
| 4 | lastName | high | [4] Values are family names (George, Richard, Dupont, Dani, KAPPAZHASANKARA, AMIN) |
| 5 | address1 | high | [5] Values are street addresses (26 GANDHI NAGAR DINDIGUL ROAD TRICHY, 21-C Vrindavan 2, IPSIT,21B, 80/1 A+B) |
| 6 | address2 | high | [6] Values are secondary address components (Panchavati, Pashan Rd, Varanasi soc, warje) |
| 7 | city | high | [7] Values are Indian city names (trichy, Pune) |
| 8 | state | high | [8] Values are Indian states (Tamilnadu, Mah/Maharashtra) |
| 9 | country | high | [9] Values are 'India' and 'IN' — country code and name |
| 10 | zip | high | [10] Values are 6-digit Indian PIN codes (620001, 411008, 411058) |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers (9974051970, 9841322640, 9969352104, etc.) |
Notes: 13 columns total. Column 12 is empty. Breach context confirms Indian travel booking platform with Indian addresses, phone numbers, and email providers. All PII fields identified and mapped.
26.csv · 12 columns · 99,260 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] numeric user IDs (3886583, 3886584, etc.) — internal identifiers, auto-generated |
| 1 | email | high | [1] all values contain @ symbol, standard email addresses (gmail.com, yahoo.com, rediffmail.com) |
| 2 | suffix | high | [2] salutation/title values ('Mr') — matches suffix field type |
| 3 | firstName | high | [3] personal given names (Deokumar, hitendra, RAKESH, venkatasi) |
| 4 | lastName | high | [4] personal family names (Singh, pawar, TIWARI, bommarajupeta) |
| 5 | address1 | high | [5] street addresses (Vighnharta Colony, H.NO. 107, 2ND FLOOR, 19-4-121/e3) |
| 6 | address2 | high | [6] secondary address lines (panchavati Gas Agency Road, GALI NO. 2, ASHOK VIHAR, RAILWAY ROAD) |
| 7 | city | high | [7] Indian city names (Dhule, GURGAON) |
| 8 | state | high | [8] Indian states (maharashtra, HARYANA) |
| 9 | country | high | [9] country code 'IN' (India), consistent with breach context |
| 10 | zip | high | [10] Indian postal codes/PIN codes (424002, 122001) — 6-digit format |
| 11 | phone | high | [11] Indian mobile phone numbers (9962266133, 9246647888, etc.) — 10 digits starting with 7-9 |
Notes: 12 columns total, 11 contain PII. Breach context (Yatra 2019, Indian travel platform) confirmed by Indian addresses, phone numbers, email providers (rediffmail.com, yahoo.co.in), and city/state data. No header row present in sample.
27.csv · 12 columns · 96,519 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is sample data; values contain @ signs and are valid email addresses |
| 2 | suffix | high | [2] header 'Mr.' is sample data; values are salutations (Mr., Mrs) |
| 3 | firstName | high | [3] header 'Prince' is sample data; values are common given names (vijaylakshmi, Pradyumna, ramchandraprasad, AndrewWilson, Balu) |
| 4 | lastName | high | [4] header 'Chakma' is sample data; values are surnames (srikanth, Mohapatra, kalavala, Cornforth, Pathangey) |
| 5 | address1 | high | [5] header is sample data; values are street addresses (Bodhicariya, Maitree Nagar; ashok nagar; a-504,plot no -23) |
| 6 | address2 | high | [6] header is sample data; values are secondary address components (PO:Khandagiri, Bilandpur, 110colleg road) |
| 7 | city | high | [7] header 'Kolkata' is sample data; values are city names (chennai, Bhubaneswar, navi mumbai, Scottsdale, Gorakhpur) |
| 8 | state | high | [8] header 'West Bengal' is sample data; values are state/province codes and names (Tamilnadu, Odisha, Maharashtra, AZ, UP) |
| 9 | country | high | [9] header 'India' is sample data; values are country names/codes (IN, India, United States of America) |
| 10 | zip | high | [10] header '700135' is sample data; values are postal codes (600008, 751030, 400706, 85259, 273001) |
| 11 | phone | high | [11] header '9038432966' is sample data; values are 10-digit Indian mobile numbers (9940074243, 9937563105, 9619394141) |
| 12 | phone | high | [12] header is sample data; values are phone numbers (9436121525), appears to be alternate/secondary phone field |
Notes: File contains Indian travel platform user records. Column [0] (numeric user IDs like 3991230) is skipped as internal identifier. All columns except [0] map to PII fields. Context confirms Indian addresses, Indian phone numbering, and Indian email providers.
28.csv · 11 columns · 99,455 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ symbol and match email format (gmail.com, yahoo.com, yahoo.co.in domains) |
| 2 | suffix | high | [2] Header 'Mr' and all sample values are salutations/titles (Mr) |
| 3 | firstName | high | [3] Values are common given names (sahil, ajay, Nitesh, Sajad, Tarun) |
| 4 | lastName | high | [4] Values are surnames (singla, jain, Mishra, bhat, Chanana) |
| 5 | address1 | high | [5] Street addresses with building/plot numbers (466 sector 31, 357 tagan street, samanvay nagar, JG1-105 B, Yatra.Com office) |
| 6 | address2 | high | [6] Neighborhood/area names (Vikas Puri, ameerpet) |
| 7 | city | high | [7] Indian city names (gurgaon, khatauli, bhopal, new delhi) |
| 8 | state | high | [8] Indian state/territory names (haryana, Uttar Pradesh, Madhya Pradesh, delhi) |
| 9 | country | high | [9] Country codes (IN, IND) |
| 10 | zip | high | [10] Indian PIN codes (122001, 251201, 462023, 110018) |
| 11 | phone | high | [11] Indian mobile phone numbers (10-digit format: 9819996726, 9811667166, 9412711594, etc.) |
Notes: 13 columns total, 11 contain PII. Column [0] is numeric user ID (skip). Column [12] is empty (skip). All addresses are Indian, consistent with Yatra.com breach context. Salutation field mapped as suffix.
29.csv · 12 columns · 95,992 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email address; all sample values contain @ signs and are valid email addresses |
| 2 | suffix | high | [2] values are 'Mr', 'Mr.', 'Dr' — salutation/title suffixes |
| 3 | firstName | high | [3] header 'ARUNDABIR'; values are given names (Sadanand, PEEYUSH, SAMI, ASEM, Pradipta) |
| 4 | lastName | high | [4] values are surnames (Gharat, SAJJAD, SUNILKUMARSINGH, taj, KDas) |
| 5 | address1 | high | [5] street/building-level address component (Growel House, SAI SECURITIES, A-162/32, Hostel No.2) |
| 6 | address2 | high | [6] secondary address component (Dapodi, SIMALGAIR BAZAAR, Nandalaya,Subachani Road,Tinsukia,Assam) |
| 7 | city | high | [7] values are Indian city names (Pune, DELHI, THOUBAL, Silchar) |
| 8 | state | high | [8] values are Indian state names (Delhi, Manipur, Maharashtra, Madhya Pradesh) |
| 9 | country | high | [9] values are 'India' and 'IND' — country designator |
| 10 | zip | high | [10] values are 6-digit Indian postal codes (110053, 795138, 400706, 452001) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers |
| 12 | phone | high | [12] alternate phone numbers; 10-digit Indian mobile format (9953262896, 9004795001) |
Notes: Yatra 2019 breach. 13 columns total, 12 contain PII. Column [0] is numeric user_id (skipped). Addresses span Indian cities/states with postal codes. Phone numbers are Indian mobile format. Breach context confirms Indian travel booking platform data.
3.csv · 13 columns · 99,171 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] numeric user IDs (3116178, 3116179, etc.), internal identifier pattern |
| 1 | email | high | [1] values contain @ signs, email addresses ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] salutation/title values (Mr), maps to suffix field |
| 3 | firstName | high | [3] given names (Pardeep, thangjam, Rohit, MOHAN, VelluMadhom) |
| 4 | lastName | high | [4] family names (Khatri, bishwarjitsingh, Sharma, KUMAR, Rajen) |
| 5 | address1 | high | [5] street addresses (V. % P. O. Ismaila Haryana 9Beswa, c-43 Ganga CHS, PCDA(NC)JAMMU) |
| 6 | address2 | high | [6] secondary address/locality lines (Sane Guruji Nagar Mulund E), often empty |
| 7 | city | high | [7] Indian city names (Distt. Rohtak, Mumbai, JAMMU) |
| 8 | state | high | [8] Indian states (Haryana, Maharastra, Jammu and Kashmir) |
| 9 | country | high | [9] country codes and names (India, IN, IND) |
| 10 | zip | high | [10] Indian PIN codes (124517, 400081, 180003) |
| 11 | phone | high | [11] 10-digit Indian mobile numbers (9416206295, 9716555414, 9780323332, etc.) |
| 12 | phone | high | [12] alternate 10-digit Indian phone numbers (9469000923), often empty |
Notes: Yatra travel booking platform breach (2019). 13 columns total, 11 contain PII. No header row present in data. All addresses are Indian, phone numbers follow Indian format (10 digits starting with 7-9), email providers include Indian domains (yahoo.co.in, rediffmail.com). Column 0 is user_id (skip). Columns 5-6 are multi-part address. Columns 11-12 are primary and alternate phone numbers.
30.csv · 9 columns · 97,592 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and email domains (yahoo.co.in, gmail.com, yahoo.com), clearly email addresses |
| 2 | suffix | high | [2] header empty but values are 'Mrs', 'Mr' — salutation/title suffixes |
| 3 | fullName | high | [3] values are full names like 'Anju Puri', 'jayantha sanjeeva shetty', 'nirmala saagar' — full name field |
| 4 | lastName | high | [4] values are surnames: 'Shelat', 'Balram', 'Verma' — last name field |
| 5 | address1 | high | [5] values are street addresses like 'pari house, bunglow no.:4' and 'No,1 Nolambur main road' |
| 6 | address2 | medium | [6] values include apartment/suite details like 'A/2-C/4 golden fortune mogappair w' — secondary address |
| 7 | city | high | [7] header empty but values are Indian cities: 'mumbai', 'chennai', 'Ahemdabad', 'BANGALORE', 'delhi' |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers like '9870593838', '9930240545', '9980673487' |
| 12 | phone | high | [12] values are 10-digit Indian phone numbers like '8826749428' — alternate/secondary phone |
Notes: 13 columns total. Yatra 2019 breach — Indian travel booking platform. Columns [0], [8], [9], [10] are empty or non-PII (internal IDs, flags) and excluded. Columns [5] and [6] both map to address fields per Indian address structure (street + apt/suite). No state, zip, country, DOB, or SSN fields present in this file sample.
31.csv · 12 columns · 99,256 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and match email format (gmail.com, yahoo.com, yahoo.co.in) |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles |
| 3 | firstName | high | [3] values are common given names (Ashraf, Vishu, ANIL, Sandeep, manoj, RAKESH) |
| 4 | lastName | high | [4] values are surnames (Shaikh, Kapoor, SURI, Pradhan, singh, MANJUNATH) |
| 5 | address1 | high | [5] values are street addresses (139 Green Avenue, 22,1st Floor,Sena Vihar, sultanpuri, #58, plot no 287 rawatpur) |
| 6 | address2 | medium | [6] column contains only spaces or empty values, consistent with optional address2 field |
| 7 | city | high | [7] values are Indian cities (Amritsar, Bangalore, delhi, BANGALORE, kanpur) |
| 8 | state | high | [8] values are Indian states (Punjab, Karnataka, Delhi, Uttar Pradesh) |
| 9 | country | high | [9] all values are 'IND' (India country code) |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (143001, 560043, 110086, 560078, 208019) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9699925818, 7893414806, 9845062142, 9096333326, 8604092136) |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers, alternate/second phone number |
Notes: 13 columns total, 12 contain PII. Column [0] is user_id (numeric identifier, skipped). Breach context confirms Indian travel booking platform with Indian addresses, phone numbers, and email providers. All address fields populated with valid Indian data.
32.csv · 12 columns · 99,407 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbol and match email format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] values are salutations/titles ('Mr.', 'Mr') |
| 3 | firstName | high | [3] values are given names (ganesh, kamal, Anisha, Parikshit, venkat, PRAKASH) |
| 4 | lastName | medium | [4] values appear to be surnames (krishnan, gulati, Parse, rangan, PATIL) though some entries are unclear or may be abbreviations |
| 5 | address1 | high | [5] values contain street addresses (c-15 orange block orchrad, SRI RAM NAGAR COLONY, etc.) or are empty/NA |
| 6 | address2 | medium | [6] secondary address field, mostly empty or 'NA', consistent with optional address2 column |
| 7 | city | high | [7] values are Indian city names (chennai, BAGALKOT, RAICHUR) |
| 8 | state | high | [8] values are Indian states (Tamil Nadu, Karnataka) |
| 9 | country | high | [9] values are 'IND' or 'India', indicating country field |
| 10 | zip | high | [10] values are 6-digit Indian postal codes (600026, 587101, 534001) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9160126226, 9810484932, 9920079979, etc.) |
| 12 | phone | high | [12] secondary phone field with 10-digit Indian mobile numbers, consistent with alternate phone number column |
Notes: Yatra-2019 breach file: Indian travel booking platform. Column [0] is numeric user ID (skip). All address, phone, and personal identifiers are Indian. Two phone columns present ([11] and [12]) mapping to primary and alternate phone numbers.
33.csv · 13 columns · 99,276 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] Sequential numeric IDs (4484483, 4484484, etc.) — internal user_id identifiers |
| 1 | email | high | [1] Values contain @ signs, standard email format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] Salutation/title values: 'Mr', 'Ms' — standard name suffix field |
| 3 | firstName | high | [3] Given names: 'mohd', 'HANIF', 'Rajat', 'hari', 'Rajesh', 'pooja' — first name component |
| 4 | lastName | high | [4] Family names: 'idrees', 'CHHATRIWALA', 'Singh', 'babu', 'Yadav', 'hagawane' — last name component |
| 5 | address1 | high | [5] Street addresses: 'SHANTI NIKETAN SCTY', 'Chandni Agar, Sangam Naga' — primary address line |
| 6 | address2 | medium | [6] Column appears empty in sample but positioned as secondary address field |
| 7 | city | high | [7] City names: 'MUMBAI', 'pune', 'Mumbai' — Indian cities |
| 8 | state | high | [8] State values: 'Maharashtra', 'Maharashtra' — Indian state codes/names |
| 9 | country | high | [9] Country codes: 'IND', 'India' — standardized to India |
| 10 | zip | high | [10] Indian postal codes: '400061', '411043', '400037' — PIN code format (6 digits) |
| 11 | phone | high | [11] 10-digit Indian mobile numbers: '9997579189', '9768378694', '9999012953' — primary phone |
| 12 | phone | high | [12] 10-digit Indian mobile numbers: '9768378694', '9604420117', '9619875434' — alternate/secondary phone |
Notes: Yatra.com 2019 breach. 13 columns, 11 contain PII. Full Indian user records including contact details, address, and phone numbers. Two phone columns ([11] and [12]) both map to phone field as they represent primary and alternate contact numbers.
34.csv · 11 columns · 96,873 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email address; all values contain @ and follow email format |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles (Mr, Mrs) |
| 3 | firstName | high | [3] header 'sunil', values are common given names (anupma, vicky, shubham) |
| 4 | lastName | high | [4] header 'kumar', values are common surnames (bhatnagar, rajak, arora) |
| 5 | address1 | high | [5] header 'hno1b/49bstcollny', values are street addresses (hno patterns, 'a 1604 3rd floor', 'VIVEKANAND') |
| 7 | city | high | [7] header 'ganaur', values are Indian city names (delhi, sonepat, SULTANPUR, Bangalore) |
| 8 | state | high | [8] header 'Haryana', values are Indian state names (Delhi, Haryana, Uttar Pradesh) |
| 9 | country | high | [9] header 'IND', values are country codes/names (IND, India) |
| 10 | zip | high | [10] header '131101', values are 6-digit Indian postal codes (110033, 131001, 228001) |
| 11 | phone | high | [11] header '9812860304', values are 10-digit Indian mobile numbers |
| 12 | phone | high | [12] header '9812860304', values are 10-digit Indian mobile numbers (alternate phone) |
Notes: 13 columns total, 10 contain PII. Column [0] is numeric user ID (skip). Column [6] is empty (skip). Yatra-2019 Indian travel booking breach; all addresses, phone numbers, and email providers confirm Indian origin.
35.csv · 12 columns · 99,230 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs, match email format (gmail.com, rediffmail.com, bheltry.co.in) |
| 2 | suffix | high | [2] values are salutations/titles: 'Mr', 'Mrs', 'Mr.' |
| 3 | firstName | high | [3] values are given names: 'shikher', 'priyanka', 'lalit', 'ashish', 'KARUNANIDHY' |
| 4 | lastName | high | [4] values are family names: 'verma', 'sharma', 'luthra', 'modi', 'S' |
| 5 | address1 | high | [5] values are street addresses: 'Mayur Vihar-III', 'kalander chowk', '19 meghna arcade', 'EZHIL NAGAR' |
| 6 | phone | high | [6] values are 10-digit Indian mobile numbers: '9898067155' |
| 7 | city | high | [7] values are Indian city names: 'Delhi', 'ahmendnagar', 'panipat', 'TRICHY', 'Ahmedabad' |
| 8 | state | high | [8] values are Indian states: 'Delhi', 'Maharashtra', 'Haryana', 'Tamil Nadu', 'Gujarat' |
| 9 | country | high | [9] all values are 'IND' (India country code) |
| 10 | zip | high | [10] values are Indian postal codes: '110096', '414003', '132103', '620014', '380008' |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers: '9212201486', '8446688741', '9812020026' |
| 12 | phone | high | [12] alternate phone column, values are 10-digit Indian mobile numbers matching [11] pattern |
Notes: Yatra 2019 breach (Indian travel booking platform). Column [0] contains user IDs (skip). Columns [11] and [12] appear to be primary and alternate phone numbers — both mapped as phone. File lacks header row (hasHeader: false). All records contain Indian addresses, phone numbers, and email providers consistent with breach context.
36.csv · 11 columns · 96,059 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email address; all sample values contain @ and are valid email addresses |
| 2 | suffix | high | [2] header 'Mr' and all values are 'Mr', 'Mr.' — salutation/title field |
| 3 | firstName | high | [3] header 'jothi' and sample values 'Namboodiri', 'abhishek', 'Hunijandi', 'bhupendra Singh', 'Mahendra' are given names; context confirms first name column |
| 4 | lastName | high | [4] header 'lakshmi' and sample values 'AGV', 'garg', 'Anthena', 'ranawat', 'Punmia' are family names; positioned after firstName |
| 5 | address1 | high | [5] sample values '3-13-102/1, Madhuranagar', '1695 / room no 5,', 'Yatra Office', '118-H,Thiru flats,lakshmi' are street addresses; breach context confirms physical addresses included |
| 7 | city | high | [7] sample values 'HYDERABAD', 'Gurgaon', 'Gurgaon', 'chennai' are Indian city names |
| 8 | state | high | [8] sample values 'Andhra Pradesh', 'Haryana', 'Haryana', 'Tamil Nadu' are Indian state names |
| 9 | country | high | [9] sample values 'IND', 'India', 'India', 'IND', 'India' are country indicators |
| 10 | zip | high | [10] sample values '500013', '122002', '122003', '600116' are 6-digit Indian postal codes (PIN codes) |
| 11 | phone | high | [11] sample values '9094864296', '8886000022', '9953535282', '8009123', '7428814684', '9860203638' are 10-digit Indian mobile numbers |
| 12 | phone | high | [12] sample values '8886000022', '7428814684', '9445191807' are 10-digit Indian phone numbers; alternate/secondary phone column |
Notes: 13 columns total; 11 contain PII (email, suffix, firstName, lastName, address1, city, state, country, zip, phone, alternate phone). Column [0] is numeric user_id (skipped). Column [6] is empty (skipped). Breach confirmed as Yatra 2019 with Indian user records containing contact and address information.
37.csv · 6 columns · 99,377 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ symbol and match email format (yahoo.com, gmail.com domains) |
| 2 | suffix | high | [2] Values are 'Mr', 'Mr.' — salutation/title prefix |
| 3 | firstName | high | [3] Values are common Indian given names (jinu, cijoy, Sangeetha, Pramod, karthik) |
| 4 | lastName | high | [4] Values are surnames (garg, varghese, Kumar, mudaliar) |
| 9 | country | high | [9] Value is 'India' |
| 11 | phone | high | [11] All values are 10-digit Indian mobile numbers (8893212763, 9894939142, etc.) |
Notes: File appears to be headerless CSV from Yatra 2019 breach. Columns [0], [5], [6], [7], [8], [10], [12] contain sparse/empty data and are treated as skip. Column [0] appears to be user_id (skip). Columns [5], [6] appear to be incomplete address/zip fragments. Remaining empty columns skipped. Total 13 columns, 6 contain PII.
38.csv · 12 columns · 96,183 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header missing but values are email addresses with @ signs ([email protected], [email protected], [email protected]) |
| 2 | suffix | high | [2] values are salutations/titles (Mr, Mr.) matching breach description of 'salutation/title' |
| 3 | firstName | high | [3] values are common given names (Jasmail, Christy, Anurag, Satyajitkumar, bhavesh) |
| 4 | lastName | high | [4] values are family names (Sidhu, Fernandez, Singh, kala) |
| 5 | address1 | high | [5] street/house addresses (ST NO 2 SEC NO 13, A-68, mokram manzil, b.b ganj) |
| 6 | address2 | high | [6] secondary address lines (South Extension Part 1) matching address2 pattern |
| 7 | city | high | [7] Indian city names (Malout, Gurgoan, cuttack, muzaffarpur) |
| 8 | state | high | [8] Indian state abbreviations/names (Punjab, Orissa, Bihar) |
| 9 | country | high | [9] country code/name (IND, India) |
| 10 | zip | high | [10] Indian PIN codes (152107, 753001, 842001) matching postal_code pattern |
| 11 | phone | high | [11] Indian mobile numbers (9317766144, 9711062778, 9431218870, 10-digit format) |
| 12 | phone | high | [12] alternate/secondary phone number (9317766144, duplicate pattern) |
Notes: 13 columns total, 11 contain PII. Column [0] contains numeric user IDs (4804981, etc.) and mixed junk data — treated as skip. Breach context confirms Indian travel platform with user account records including addresses, phones, names matching observed data.
39.csv · 11 columns · 99,330 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs; email addresses from common webmail providers (gmail.com, outlook.com, yahoo.com) |
| 2 | suffix | high | [2] header value 'Dr.' and sample values 'Mr', 'Mr.' are salutations/titles |
| 3 | firstName | high | [3] header 'Tarundeep' with samples 'KISHOR', 'jigyasa', 'ABDUL', 'sudesh', 'yogesh' are given names |
| 4 | lastName | high | [4] header 'kaur' with samples 'KUNAL', 'srivastava', 'KF', 'subba', 'bhanushali' are surnames |
| 5 | address1 | high | [5] values are street addresses (e.g., '1548,pushpac complex,sector-49-b,chandigarh', 'RAHAM KHAN', 'murmah tea estate') |
| 7 | city | high | [7] values are Indian city names: 'chandigarh', 'DARBHANGA', 'mirik' |
| 8 | state | high | [8] values are Indian states: 'Chandigarh', 'Bihar', 'West Bengal' |
| 9 | country | high | [9] values are 'India' or 'IND', country identifier |
| 10 | zip | high | [10] values are 6-digit Indian postal codes: '160047', '846004', '734214' |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers: '9872876818', '9716688666', '7838594981' |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers, alternate/secondary phone field |
Notes: Yatra.com travel booking platform breach. Column [0] is numeric user ID (skipped). Column [6] is empty/blank (skipped). All PII fields identified: email, name components (suffix, first, last), full address (street, city, state, country, ZIP), and dual phone numbers.
4.csv · 12 columns · 99,017 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email; all values contain @ and are valid email addresses |
| 2 | suffix | high | [2] header 'Mr'; all values are salutations/titles (Mr) |
| 3 | firstName | high | [3] values are common given names (ujjwal, vivek, prashant, priyanka, Sunil) |
| 4 | lastName | high | [4] values are surnames (gupta, tiwari, sangwan, vardhan, Sajnani) |
| 5 | address1 | high | [5] values are street addresses (168,A.P.R.,Colony Katanga, 772,main market, etc.) |
| 6 | address2 | medium | [6] values appear to be secondary address components (sec 27d, Brook Road Gadag Road, Nagar) |
| 7 | city | high | [7] values are Indian city names (Jabalpur, katni, chandigarh, Hubli, Ajmer) |
| 8 | state | high | [8] values are Indian states (Madhya Pradesh, Chandigarh, Karnataka, Rajasthan) |
| 9 | country | high | [9] all values are 'IND' (India country code) |
| 10 | zip | high | [10] values are Indian PIN codes (482001, 483770, 160019, 580020, 305001) |
| 11 | phone | high | [11] header '9811348731'; all values are 10-digit Indian mobile numbers |
| 12 | phone | high | [12] values are 10-digit phone numbers (alternate/secondary phone number) |
Notes: 13 columns total, 12 contain PII. Column [0] is a numeric user ID (skip). Breach context confirms Indian travel platform with Indian addresses, phone numbers, and email providers. Address broken into components: street (address1), secondary (address2), city, state, country, postal code. Two phone columns: primary [11] and alternate [12].
40.csv · 11 columns · 99,299 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] values are titles/salutations (Mr, Mrs) |
| 3 | firstName | high | [3] header position and values are common given names (Santosh, Prashant, deepa, harmohan, NAMAN, Daya) |
| 4 | lastName | high | [4] header position and values are surnames (Gupta, Sharma, kabra, bhatia, KAUSHIK, Nand) |
| 5 | address1 | high | [5] values are street addresses (champasari more, Sangam Vihar New Delhi, bougen villa aundh pune, h no 548 st no 5 guru nan, 31-SHIVAJI ENCLAVE) |
| 7 | city | high | [7] values are Indian cities (Siliguri, New Delhi, pune, patiala, LUCKNOW) |
| 8 | state | high | [8] values are Indian states (West Bengal, Delhi, Maharashtra, Punjab, Uttar Pradesh) |
| 9 | country | high | [9] values are country codes/names (IND, India) |
| 10 | zip | high | [10] values are Indian postal codes (734001, 110062, 411007, 147001, 226016) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9832043310, 9968238272, 9552570365, 9569585244, 9674304005) |
| 12 | phone | high | [12] values are 10-digit Indian phone numbers, appears to be alternate phone field (9832043310, 29913128, 9552570365, 9569585244, 9674304005) |
Notes: Yatra 2019 breach of Indian travel booking platform. Column [0] is numeric user ID (skip). Column [6] is empty/blank (skip). Columns [1], [11], [12] represent email and two phone number fields. All addresses confirmed as Indian with Indian postal codes and states.
41.csv · 12 columns · 99,313 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and match email format (gmail.com, rediffmail.com, yahoo.com) |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles (Mr, Mrs.) |
| 3 | firstName | high | [3] values are common given names (zeeshana, Manas, SUKHENDU, James, Shiv) |
| 4 | lastName | high | [4] values are surnames (mushtaq, Bhattacharjee, CHANDA, Dillon, Mani) |
| 5 | address1 | high | [5] values are street addresses (New Rausa Patna Colony, kulgachia, K-509 Near Mata) |
| 6 | city | high | [6] values are Indian city names (howrah, Cuttack, kolkata, Delhi) |
| 7 | city | high | [7] values are Indian city names (Cuttack, kolkata, Delhi) — duplicate city field or city variant |
| 8 | state | high | [8] values are Indian state names (Orissa, Delhi) |
| 9 | country | high | [9] values are country codes (IND = India) |
| 10 | zip | high | [10] values are Indian postal codes/PIN codes (753001, 110037) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9906816378, 9777171714, 9477032166) |
| 12 | phone | high | [12] values are 10-digit Indian phone numbers, likely alternate/secondary phone |
Notes: Yatra.com travel booking platform breach. Column [0] is numeric user ID (skip). Columns [6] and [7] both contain city data — may represent primary/alternate city or data duplication. All phone numbers are Indian format. Addresses, cities, states, and postal codes are consistent with Indian geography. Column [2] contains salutation/title data mapped as suffix field.
42.csv · 10 columns · 99,346 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ signs and are email addresses (yahoo.in, gmail.com, rediffmail.com) |
| 2 | suffix | high | [2] Header 'Mr.' and values are salutations (Mr., Mr) |
| 3 | firstName | high | [3] Values are common given names (siddhant, VINEETA, Chetan, dicky, sayyed, BRAJESWAR) |
| 4 | lastName | high | [4] Values are surnames (surana, SINGH, Kawatia, hora, khalid, MISHRA) |
| 5 | address1 | high | [5] Sample contains 'Yatra.Com 1101-03', typical street/building address format |
| 7 | city | high | [7] Values are Indian city names (kolkata, Gurgoan, mumbai) |
| 8 | state | high | [8] Values are Indian state names (West Bengal, Maharashtra) |
| 9 | country | high | [9] All values are 'India' |
| 10 | zip | high | [10] Sample '400089' is an Indian PIN code format |
| 11 | phone | high | [11] All values are 10-digit Indian mobile numbers starting with 9 |
Notes: 13 columns total. Column [0] is numeric user ID (skip). Columns [6] and [12] are empty (skip). Breach context confirms Indian travel platform with Indian addresses, phone numbers, and email providers.
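Applying a mapping like the one above comes down to selecting columns by index and discarding the rest. The user ID in column [0] and the empty columns [6] and [12] therefore never reach the output. The sketch below is minimal and uses Python's standard `csv` module. `COLUMN_MAP`, `map_row`, and `map_file` are illustrative names and are not part of any ingestion tool described here. Values are collected into lists, so files that have two `phone` columns keep both numbers.

```python
import csv
import io

# Illustrative index-to-field map for 42.csv, taken from the table above.
# Column 0 (numeric user ID) and the empty columns 6 and 12 are omitted,
# which is what "skip" means in practice.
COLUMN_MAP = {
    1: "email", 2: "suffix", 3: "firstName", 4: "lastName",
    5: "address1", 7: "city", 8: "state", 9: "country",
    10: "zip", 11: "phone",
}


def map_row(row, column_map=COLUMN_MAP):
    """Return {field: [values]} for the mapped, non-blank columns of one row."""
    record = {}
    for index, field in column_map.items():
        if index < len(row):
            value = row[index].strip()
            if value:
                # Lists keep both numbers when two columns map to "phone".
                record.setdefault(field, []).append(value)
    return record


def map_file(text, column_map=COLUMN_MAP):
    """Parse headerless CSV text and map every row."""
    return [map_row(row, column_map) for row in csv.reader(io.StringIO(text))]
```

For a file that has an alternate phone column, adding `12: "phone"` to the map produces a two-element `phone` list. The second number does not overwrite the first.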
43.csv · 11 columns · 99,397 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and are clearly email addresses (rediffmail.com, gmail.com, yahoo.com) |
| 2 | suffix | high | [2] header 'Mr.', values are salutations/titles (Mr., Mr) |
| 3 | firstName | high | [3] values are given names (Rajneesh, HIMALAYA, akansha, Mapuia, NAVINRAMJI) |
| 4 | lastName | high | [4] values are family names (Bhimte, GAMOT, bilolikar, pant, Hmar, PARMAR) |
| 5 | address1 | high | [5] values are street addresses (Ward no 01 banjar coloney, 208 center plaza, 1/17 vivekanand path) |
| 7 | city | high | [7] values are Indian city names (balaghat, mumbai, patna, amravati) |
| 8 | state | high | [8] values are Indian states (Madhya Pradesh, Maharashtra, Bihar) |
| 9 | country | high | [9] values are country identifiers (India, IND) |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (481111, 400097, 800013, 444809) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9406766015, 9833406888, 9422165920) |
| 12 | phone | high | [12] values are 10-digit Indian phone numbers, alternate/second phone field (9798423128, 9404339108) |
Notes: 13 columns total. Column [0] is numeric user ID (skip). Column [6] is empty (skip). Yatra-2019 Indian travel booking breach with complete user profiles including names, emails, addresses, and phone numbers.
44.csv · 12 columns · 99,358 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles (Mr, Mr, Mr) |
| 3 | firstName | high | [3] values are common given names (MURTAZA, DHRUVA, Ankan, vinoth, madhu) |
| 4 | lastName | high | [4] values are surnames (TRAVADI, REDDY, Sarma, kumar, venkatesh) |
| 5 | address1 | high | [5] values are street addresses (Raj paints, 17-1-478/19 krishna nagar, Lakhinagar, no 93 otthavadai st) |
| 6 | address2 | medium | [6] mostly empty/spaces, secondary address line field |
| 7 | city | high | [7] values are Indian city names (porbandar, hyderabad, Guwahati, arakkonam, hyd) |
| 8 | state | high | [8] values are Indian states (Gujarat, Andhra Pradesh, Assam, Tamil Nadu) |
| 9 | country | high | [9] values are 'IND', country code for India |
| 10 | zip | high | [10] values are 6-digit Indian postal codes (360575, 500059, 781005, 631002, 534275) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9825281392, 9789985126, 9706217852, 9196771286) |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers, alternate/secondary phone number |
Notes: Yatra-2019 breach. 13 columns total. Column [0] is numeric user_id (skipped). Columns [11] and [12] both contain phone numbers — likely primary and secondary/alternate mobile. All addresses, names, phones, emails, and postal codes are Indian in origin, consistent with Yatra.com platform.
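Treating [11] and [12] as primary and alternate numbers raises a practical issue. In some samples elsewhere in this archive, for example 48.csv and 60.csv, the same number appears in both columns. The sketch below merges each pair into a list of distinct numbers. It compares digits only, so differences in spacing or punctuation do not produce false duplicates. `contact_numbers` is a hypothetical helper.

```python
import re


def contact_numbers(primary, alternate):
    """Return the distinct phone numbers from a primary/alternate pair.

    Numbers are compared on their digits only, so "98765 43210" and
    "9876543210" count as the same number. Order is preserved, and
    empty or None values are dropped.
    """
    numbers = []
    for raw in (primary, alternate):
        digits = re.sub(r"[^0-9]", "", raw or "")
        if digits and digits not in numbers:
            numbers.append(digits)
    return numbers
```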
45.csv · 12 columns · 99,364 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and are clearly email addresses (gmail.com, rediffmail.com) |
| 2 | suffix | high | [2] header 'Mr.', values are salutations/titles (Mr, Mr.) |
| 3 | firstName | high | [3] values are given names (SUBHASISH, DEVENDRA, Chandan, Tapi, senthil, Ankit) |
| 4 | lastName | high | [4] values are family names (SAHA, WALDE, Dheeraj, Nalo, kumar, Murarka) |
| 5 | address1 | high | [5] values are street addresses and apartment numbers (202,195/34, A6/203, azhuvalappil) |
| 6 | address2 | high | [6] values are secondary address components (Shrinath Soceity, thalakap — building/locality names) |
| 7 | city | high | [7] values are Indian city names (Tiruchirappalli, lucknow, Pune, kottakkal) |
| 8 | state | high | [8] values are Indian states (Tamilnadu, Uttar Pradesh, Kerala) |
| 9 | country | high | [9] values are country indicators (India, IND) |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (620024, 226018, 676503) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9994593516, 9766576353, 8130808225) |
| 12 | phone | high | [12] values are 10-digit Indian phone numbers (8081863718, 4832706593) — alternate/secondary phone |
Notes: Yatra.com 2019 breach — Indian travel platform. Column [0] is numeric user_id (skipped). Complete address structure present: address1, address2, city, state, country, zip. Two phone columns capture mobile and alternate numbers. Suffix field captures Mr./Ms. salutations common in Indian records.
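Across these files, the same salutation recurs in spellings that differ only in case or a trailing period ('Mr' and 'Mr.', 'Mrs' and 'Mrs.', 'Ms' and 'Ms.'). The sketch below normalizes them using a small hand-written lookup table. `SALUTATIONS` and `normalize_salutation` are hypothetical, and the table covers only the variants visible in these samples. Unrecognized values come back as `None`, so they can be reviewed rather than silently coerced.

```python
# Hypothetical lookup covering the variants seen in these samples.
SALUTATIONS = {"mr": "Mr", "mrs": "Mrs", "ms": "Ms", "miss": "Miss"}


def normalize_salutation(value):
    """Map 'Mr.', ' mr ', 'MR' and similar to a canonical form, or None if unknown."""
    key = value.strip().rstrip(".").casefold()
    return SALUTATIONS.get(key)
```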
46.csv · 12 columns · 99,358 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ signs, clearly email addresses from various providers (gmail.com, yahoo.co.in, rediffmail.com, etc.) |
| 2 | suffix | high | [2] Values are Mr., Mrs, Ms, consistent with salutation/title field |
| 3 | firstName | high | [3] Values are common given names (Sujit, ANKIT, Mamta, venu, INZAMAMUL, etc.) |
| 4 | lastName | high | [4] Values are surnames (Jaiswal, GUPTA, Jha, dorna, HOQUE, etc.) |
| 5 | address1 | high | [5] Values are street addresses (Room No:344 Narmada Hostel, H.N.66 CHRIAG DELHI, flat.no.102 reddy apartment, etc.) |
| 6 | address2 | medium | [6] Sparse column, appears to be secondary address/apartment info, mostly empty |
| 7 | city | high | [7] Values are Indian city names (Dharwad, NEW DELHI, secunderabad, Chennai, Patna, Darbhanga, Delhi, etc.) |
| 8 | state | high | [8] Values are Indian states (Karnataka, Delhi, Andhra Pradesh, Bihar, Tamil Nadu, Maharashtra, Uttar Pradesh, etc.) |
| 9 | country | high | [9] Values are consistently 'India' or 'IND' |
| 10 | zip | high | [10] Values are 6-digit Indian PIN codes (580002, 110017, 500003, 600117, 800004, 846001, 110092, 411019, 522626, 452008, 209724, 110009, 424108, etc.) |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers (8123472948, 875062562, 7738560101, 9959224411, 9775890989, 9474283010, 8977385909, 9900968850, etc.) |
| 12 | phone | high | [12] Alternate/secondary phone field, also 10-digit Indian mobile numbers where populated |
Notes: Yatra-2019 breach data. 13 columns total (0-12), 12 contain PII. Column 0 is user_id (skip). Records are Indian travel booking platform user accounts with complete personal information including addresses, phone numbers, email addresses, names, and location data.
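The zip column is described as holding 6-digit Indian PIN codes. The sketch below is a format check for that claim. It accepts exactly six ASCII digits with a non-zero first digit. It tests shape only and does not confirm that India Post has actually assigned the code. `is_pin_code` is a hypothetical name.

```python
import re

# Six ASCII digits, first digit 1-9. [0-9] is used instead of \d because \d
# also matches non-ASCII digits in Python 3 string patterns.
PIN_RE = re.compile(r"[1-9][0-9]{5}")


def is_pin_code(value):
    """True if value, ignoring surrounding whitespace, is PIN-shaped."""
    return PIN_RE.fullmatch(value.strip()) is not None
```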
47.csv · 10 columns · 99,392 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbol and match email patterns ([email protected], [email protected], etc.) |
| 3 | suffix | high | [3] values are salutations/titles (Mr., Mrs.) indicating suffix field |
| 4 | firstName | high | [4] values are common given names (Nikhil, Varun, parshuram, Apratim) |
| 5 | lastName | high | [5] values are family names (Gupta, Sharma, Yadavalli, yadav, Agrawal) |
| 6 | address1 | high | [6] values are street addresses and building identifiers (A-3, 9930919568 appears to be phone in this position, Viman Nagar, 632/308 Sanatan Nagar) |
| 7 | address1 | high | [7] continuation of address data (Bhushan Chs - apartment/building name) |
| 8 | address1 | high | [8] street/road names (V.P Road, Pune, Lucknow - mixed address components) |
| 9 | address2 | high | [9] landmark/area descriptors (Near Andhra Bank, Maharashtra, Uttar Pradesh) |
| 10 | country | high | [10] values are 'India' and 'IND' indicating country field |
| 11 | zip | high | [11] values are postal codes (421201, 411014, 226028) and city-PIN combinations (Dombivli(East)-421201) |
Notes: File has no header row. Column [0] contains numeric user IDs (customer_id) and is skipped. Column [2] appears to contain city names but is mostly empty in the sample, so it is likely a sparse city field. Columns [6-11] contain fragmentary address data spread across multiple fields, typical of parsed address storage. Indian context is confirmed by address patterns, PIN codes, and city names (Thane, Dombivli, Pune, Lucknow). Total 12 columns, 11 contain PII.
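In 47.csv, address fragments are spread across several columns, and column [11] sometimes holds combined values such as 'Dombivli(East)-421201'. The PIN code may therefore need to be extracted from free text before it can be stored in `zip`. The sketch below makes two assumptions: the PIN is a standalone 6-digit run with a non-zero first digit, and the last such run in a value is the one wanted. `extract_pin` is a hypothetical helper.

```python
import re

# Six digits, first digit 1-9, not embedded in a longer digit run
# (so a 10-digit phone number is not mistaken for a PIN code).
PIN_IN_TEXT = re.compile(r"(?<![0-9])[1-9][0-9]{5}(?![0-9])")


def extract_pin(value):
    """Return the last PIN-shaped token in value, or None if there is none."""
    matches = PIN_IN_TEXT.findall(value)
    return matches[-1] if matches else None
```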
48.csv · 11 columns · 99,235 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs, standard email format (gmail.com, yahoo.co.in, rediffmail patterns) |
| 2 | suffix | high | [2] values are 'Mr', 'Mr.', 'Mr' — salutation/title field |
| 3 | firstName | high | [3] values are given names: Ramakrishnarao, ZHANG, govind, Shanmuga, JITENDRA, pintu |
| 4 | lastName | high | [4] values are family names: Achanta, YUANCHAO, singh, Chitipiralla, VYAS, kumar |
| 5 | address1 | high | [5] values are street addresses: 'PLOT NO.176, WARD 12-C', 'model house ldh', 'singjamei thongam leikai' |
| 7 | city | high | [7] values are Indian city names: delhi, GANDHIDHAM, ludhiana, hoshangabad, Imphal |
| 8 | state | high | [8] values are Indian states: Delhi, Gujarat, Punjab, Madhya Pradesh, Manipur |
| 9 | country | high | [9] values are 'IND', 'India' — country codes/names |
| 10 | zip | high | [10] values are Indian PIN codes: 110044, 370201, 141003, 461001, 795008 (6-digit postal codes) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers: 9448066554, 9597227977, 8765730865, 9490843701, 9898038586 |
| 12 | phone | high | [12] alternate phone numbers, same format as [11]: 8765730865, 9560759461 |
Notes: 13 columns total. Column [0] is numeric user_id (skipped). Column [6] is entirely empty (skipped). Breach context confirms Indian travel platform with Indian addresses, phone numbers, email providers, and cities. All PII columns identified and mapped.
49.csv · 10 columns · 99,300 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ signs and match email format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] Values are salutations/titles: 'Ms', 'Mr.', 'Mr' |
| 3 | firstName | high | [3] Values are given names: 'Lisa', 'k james', 'ravi', 'XAVIER', 'Rajasridhar', 'deepi' |
| 4 | lastName | high | [4] Values are family names: 'Stenmark', 'Mathai', 'mallian', 'SHERMEILA', 'Lankapalli', 'kaur' |
| 5 | address1 | high | [5] Values are street addresses: '38 sector A , Ambedkar Colony Govindp...' |
| 7 | city | high | [7] Values are city names: 'Bhopal' |
| 8 | state | high | [8] Values are Indian states: 'Madhya Pradesh' |
| 9 | country | high | [9] Values are 'India' |
| 10 | zip | high | [10] Values are Indian PIN codes: '462023' |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers: '9815825646', '9826361390', etc. |
Notes: Yatra.com travel booking breach. Column [0] is numeric user ID (skip). Columns [6] and [12] are empty (skip). No numbered PII columns detected.
5.csv · 12 columns · 99,271 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs, match email format with domains (yahoo.com, gmail.com, yahoo.co.in, yahoo.in) |
| 2 | suffix | high | [2] values are salutations/titles: 'Mr', 'Mrs.', 'Ms' |
| 3 | firstName | high | [3] values are given names (vivek, krushna, SENTHIL, SK, preethy, Harpreetsingh) |
| 4 | lastName | high | [4] values are family names (mahajan, sahoo, MURGAN, Joshi, gopaal, Punjabi) |
| 5 | address1 | high | [5] values are street addresses (railwayrod, IOCL, suryasoma, Heliconia,Magarpatta city,Hadapsar) |
| 6 | address2 | medium | [6] secondary address component, sparse values (thirunageswaram, JAMMU) |
| 7 | city | high | [7] values are Indian city names (kumbakonam, JAMMU, munnar, Pune) |
| 8 | state | high | [8] values are Indian states (Karnataka, Tamilnadu, Jammu and Kashmir, Kerala, Maharashtra) |
| 9 | country | high | [9] values are country codes/names (India, IND, IN) |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (612204, 180015, 685612, 411028) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (8085660666, 9611672288, 8233586063, 9426422681, 9086088447, 9447004303) |
| 12 | phone | high | [12] alternate phone numbers, 10-digit format (9571003885, 1912460038) — secondary phone field |
Notes: Yatra-2019 breach, Indian travel booking platform. Column [0] is numeric user ID (skip). All 13 columns present; 12 contain PII. Data is entirely Indian: addresses, phone numbers, email providers (yahoo.co.in, rediffmail.com), cities, and states confirm Indian origin.
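The country column mixes 'India', 'IND', and 'IN' across files. These are the English short name and the ISO 3166-1 alpha-3 and alpha-2 codes, so one reasonable normalization is to map all three to the alpha-2 code. The sketch below uses a hypothetical alias table. Any value outside the table returns `None` for review, including the 'Other' value that appears in the 53.csv sample.

```python
# Hypothetical alias table; keys are case-folded.
COUNTRY_ALIASES = {"india": "IN", "ind": "IN", "in": "IN"}


def normalize_country(value):
    """Return the ISO 3166-1 alpha-2 code for known spellings, else None."""
    return COUNTRY_ALIASES.get(value.strip().casefold())
```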
50.csv · 11 columns · 99,367 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs, clear email addresses (gmail.com, yahoo.co.in, hotmail.com) |
| 2 | suffix | high | [2] header 'Mr', values are salutations (Mr, Mr.) |
| 3 | firstName | high | [3] values are given names (julius, Babyrani, jaydeepnarayan, rahul, Ganesh) |
| 4 | lastName | high | [4] values are family names (amstrong, Arambam, srivastava, dey, Sahoo) |
| 5 | address1 | high | [5] values contain street addresses and building numbers (107classic business center, Yatra.Com 1101-03, #503) |
| 6 | address2 | medium | [6] values appear to be secondary address components (4th Main) or building details |
| 7 | city | high | [7] values are Indian cities (bengaluru, Gurgoan) |
| 8 | state | high | [8] values are Indian states (Karnataka) |
| 9 | country | high | [9] values indicate country (India) |
| 10 | zip | high | [10] values are 6-digit Indian postal codes (560001, 560019) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (8281255381, 8750788331, 9835021326) |
Notes: 13 columns total, 11 contain PII. Column [0] is numeric user_id (skip). Column [12] is empty (skip). Breach context confirms Indian travel booking platform with Indian user data.
51.csv · 11 columns · 99,387 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols, standard email format (gmail.com, yahoo.com) |
| 2 | suffix | high | [2] values are 'Mr', 'Mr.' — salutation/title prefix |
| 3 | firstName | high | [3] header pattern and values are common given names (ganesh, vijay, sreenivasapavankumar, Rushik, KNIRMALA, RS) |
| 4 | lastName | high | [4] values are surnames (pillai, bhushan, surey, Patel, NIRMALA, Sharma) |
| 5 | address1 | high | [5] street addresses with house numbers and locality names (5/89 A PRIYABHAVAN, House No.3977) |
| 7 | city | high | [7] values are Indian city names (CHENNAI, Rewari, RAJAHMUNDRY, Gurgoan) |
| 8 | state | high | [8] values are Indian state/province names (Tamil Nadu, Haryana, Andhra Pradesh, Delhi) |
| 9 | country | high | [9] values are 'IND' and 'India' — country code/name |
| 10 | zip | high | [10] 6-digit postal codes matching Indian PIN code format (629704, 123401, 533101, 110043) |
| 11 | phone | high | [11] 10-digit numbers matching Indian mobile phone format (9469732301, 9355763377, 9848156621) |
| 12 | phone | high | [12] 10-digit numbers matching Indian mobile phone format — alternate/secondary phone number |
Notes: Yatra.com 2019 breach — Indian travel booking platform. Column [0] is user_id (skipped). Column [6] is empty (skipped). All phone numbers are Indian mobile format (10 digits starting with 9). Addresses and location data all Indian. Two phone columns ([11] and [12]) represent primary and alternate contact numbers.
52.csv · 11 columns · 96,890 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and are valid email addresses (yahoo.co.in, gmail.com, hotmail.com) |
| 2 | suffix | high | [2] header empty but values are 'Mr.' and 'Mr', which are salutation titles |
| 3 | firstName | high | [3] header empty but values are given names (abubakkar, surendra, Anil, diptakshi, Kamal) |
| 4 | lastName | high | [4] header empty but values are family names (siddique, kumar, Kumar, mondal, Bhardwaj) |
| 5 | address1 | high | [5] header empty but values are street addresses (No.302 moolai streeet, vill-silaich pos-balapur, Yatra.Com 1101-03, 6/1 J M LANE, 12/92 geeta colony) |
| 7 | city | high | [7] header empty but values are Indian city names (thiruvannamalai, ghazipur, Kolkata, Gurgoan, Bangalore) |
| 8 | state | high | [8] header empty but values are Indian states (Tamilnadu, Uttar Pradesh, West Bengal, Karnataka) |
| 9 | country | high | [9] header empty but values are 'India' and 'IND', country codes/names |
| 10 | zip | high | [10] header empty but values are Indian postal codes (606708, 273227, 700008, 560053, 110031) |
| 11 | phone | high | [11] header empty but values are 10-digit Indian mobile phone numbers (9994420065, 9790434352, 9721771231, 9818363222, 9883078522) |
| 12 | phone | high | [12] header empty but values are 10-digit Indian phone numbers (9538383882), appears to be alternate/secondary phone |
Notes: Yatra 2019 breach: 13 columns total. Column [0] is user_id (skipped). Column [6] contains no header and no sample values (skipped). All other columns contain valid PII. Breach context confirms Indian travel booking platform with Indian addresses, phone numbers, and email providers.
53.csv · 11 columns · 99,489 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and email domains (gmail.com, yahoo.co.in, rediffmail.com) |
| 2 | suffix | high | [2] header shows salutation patterns: 'Mr', 'Mr.', 'Mrs' |
| 3 | firstName | high | [3] values are common given names (Nishabh, Rahul, ravindra, umesh, Antara, sujan) |
| 4 | lastName | high | [4] values are family names (Jauhari, Jadon, limay, sharma, Bhattacharjee, saha) |
| 5 | address1 | high | [5] values contain street addresses and building numbers (Yatra.Com 1101-03, gazna,east bishnupur,24pg) |
| 7 | city | high | [7] values are Indian cities (Gurgoan, kolkata) |
| 8 | state | high | [8] values are Indian states (West Bengal) |
| 9 | country | high | [9] values are country codes/names (IND, India, Other) |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (743273) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9003226678, 9329294999, 9224582575, 9158989735, 9830595722) |
| 12 | phone | high | [12] values are 10-digit Indian phone numbers (alternate/secondary phone) |
Notes: Yatra.com travel booking platform breach (2019). File contains 13 columns; 11 contain PII. Column [0] is user_id (skip), column [6] is empty (skip). Data is entirely Indian: Indian addresses, phone numbers, email providers (rediffmail.com, yahoo.co.in), cities (Kolkata, Gurgaon), and states (West Bengal). One record confirms 'Yatra Office' in Gurgaon.
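City names in these files vary in case and spelling. 'Gurgoan', a misspelling of Gurgaon, recurs alongside the 'Yatra.Com 1101-03' address. Elsewhere, 'hyd' stands for Hyderabad, and both 'Bangalore' and 'bengaluru' occur. The sketch below cleans these up using a hand-maintained alias table. `CITY_ALIASES` is hypothetical and deliberately small. Values not in the table are only whitespace-collapsed and title-cased.

```python
# Hypothetical alias table; keys are whitespace-collapsed and case-folded.
CITY_ALIASES = {
    "gurgoan": "Gurgaon",
    "gurgaon": "Gurgaon",
    "hyd": "Hyderabad",
    "bangalore": "Bengaluru",
    "bengaluru": "Bengaluru",
}


def normalize_city(value):
    """Resolve known aliases; otherwise collapse whitespace and title-case."""
    collapsed = " ".join(value.split())
    return CITY_ALIASES.get(collapsed.casefold(), collapsed.title())
```

Title-casing is a blunt fallback. A run-together value such as 'GoregaonMumbai' would become 'Goregaonmumbai', so unmatched values may still need manual review.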
54.csv · 7 columns · 99,436 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format (yahoo.com, gmail.com, ingvysyabank.com) |
| 2 | suffix | high | [2] header 'Mr', all values are salutations/titles |
| 3 | firstName | high | [3] values are common given names (Anil, Ravindra, Shiladitya, Ashish, Nikhil) |
| 4 | lastName | high | [4] values are common surnames (Sharma, Mate, Nag, Pahuja, Ahire) |
| 5 | address1 | medium | [5] sample value 'Yatra.Com 1101-03' matches street/building address pattern |
| 7 | city | medium | [7] sample value 'Gurgoan' (variant of Gurgaon) is an Indian city name |
| 11 | phone | high | [11] values are 10-digit numbers matching Indian mobile phone format |
Notes: 13 columns total. Column [0] is user_id (skip). Columns [6], [8], [9], [10], [12] are empty across samples (skip). Breach context confirms Indian travel platform with Indian addresses, phone numbers, and email providers.
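In 54.csv, five of the thirteen columns are marked empty based on the sampled rows alone. A sparse column can look empty in a small sample, so a full pass over the file could confirm that the columns are blank everywhere before they are dropped. The sketch below is minimal and uses the standard `csv` module. `empty_columns` is a hypothetical helper that takes the expected column count as `width`.

```python
import csv
import io


def empty_columns(text, width):
    """Return the indices in range(width) that are blank in every row of text."""
    filled = set()
    for row in csv.reader(io.StringIO(text)):
        for index, value in enumerate(row[:width]):
            if value.strip():
                filled.add(index)
    return [index for index in range(width) if index not in filled]
```

For a large file, the same loop can read from `open(path, newline="")` instead of an in-memory string.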
55.csv · 11 columns · 98,005 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email address; all values contain @ signs and are valid email addresses |
| 2 | suffix | high | [2] values are salutations/titles: 'Mr', 'Mrs.', 'Mr.' |
| 3 | firstName | high | [3] values are given names: 'Akshaynidhi', 'Nima', 'shipinder', 'Mastinder Nath' |
| 4 | lastName | high | [4] values are family names: 'Rathore', 'Galjan', 'kaur', 'Yadav', 'AUGUSTIN' |
| 5 | address1 | high | [5] values are street addresses: 'NISHA RESIDENCY 1001', '34,anantya apartments,naththam link r...' |
| 7 | city | high | [7] values are Indian cities: 'jalandhar', 'GoregaonMumbai', 'chennai' |
| 8 | state | high | [8] values are Indian states: 'Punjab', 'Maharashtra', 'Tamil Nadu' |
| 9 | country | high | [9] values are country codes/names: 'India', 'IND' |
| 10 | zip | high | [10] values are Indian PIN codes: '144002', '400062', '603103' |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers: '9234667050', '8684012345', '9036721822' |
| 12 | phone | high | [12] alternate phone number; values are 10-digit Indian mobile numbers: '9867673042' |
Notes: Yatra.com 2019 breach. Column [0] is numeric user ID (skip). Column [6] is empty (skip). All PII columns identified: email, name components (firstName, lastName, suffix), full address (address1, city, state, country, zip), and dual phone numbers.
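55.csv is recorded as having a header row, yet the "header" of column [1] is itself an email address. This suggests the first line is actually a data record. The sketch below is a heuristic check. It assumes that a genuine header row never contains an email address or a long run of digits such as a user ID or phone number. `first_row_looks_like_data` is a hypothetical name. The standard library's `csv.Sniffer().has_header()` is an alternative, although it relies on a different, type-based heuristic.

```python
import re

EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
LONG_DIGITS_RE = re.compile(r"[0-9]{6,}")


def first_row_looks_like_data(row):
    """Heuristic: True if any cell is an email address or a number of 6+ digits."""
    for cell in row:
        cell = cell.strip()
        if EMAIL_RE.fullmatch(cell) or LONG_DIGITS_RE.fullmatch(cell):
            return True
    return False
```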
56.csv · 11 columns · 99,444 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ signs and match email format (yahoo.com, yahoo.co.in, rediffmail.com, gmail.com) |
| 2 | suffix | high | [2] Values are 'Mr', 'Mr.', 'Mr' — salutation/title prefix |
| 3 | firstName | high | [3] Values are common given names (Hardik, surajit, PARTH, Nalin, riyaz) |
| 4 | lastName | high | [4] Values are surnames (Kapadi, singh, VERMA, Shah, Khan) |
| 5 | address1 | high | [5] Values contain street addresses and building references (Yatra.Com 1101-03, C/ 803 Shree Niketan, Guwahati assam) |
| 7 | city | high | [7] Values are Indian city names (Gurgoan, guwahati, Mumbai) |
| 8 | state | high | [8] Values are Indian states (Assam, Maharashtra) |
| 9 | country | high | [9] Values are 'IND' and 'India' — country code/name |
| 10 | zip | high | [10] Values are 6-digit postal codes (788736, 400067) — Indian PIN codes |
| 11 | phone | high | [11] Values are 10-digit numbers matching Indian mobile phone format |
| 12 | phone | high | [12] Values are 10-digit numbers — alternate/secondary phone number |
Notes: Yatra 2019 breach. 13 columns total, 11 contain PII. Column [0] is numeric user ID (skip). Column [6] is empty (skip). Data contains Indian user records with email addresses, names, postal addresses, and phone numbers from a major Indian travel booking platform.
57.csv · 13 columns · 99,499 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] Numeric sequential IDs (5437935, 5437936, etc.) — internal user/record identifiers |
| 1 | email | high | [1] All values contain @ symbol and match email format (gmail.com, infosys.com, yahoo.co.in) |
| 2 | suffix | high | [2] Values are 'Mr', 'Mr.', 'Ms.' — salutation/title indicators |
| 3 | firstName | high | [3] Values are given names: Rajendra, srinivasa, KUNAL, idrish, vinit, Komal |
| 4 | lastName | high | [4] Values are surnames: Shende, karanam, SINHA, khan, kumar, Tak |
| 5 | address1 | high | [5] Values are street addresses: 'gandhidham', 'Bhaironath varanasi', 'Ramdas Colony,Ram nagar vistar, Sodala' |
| 6 | skip | high | [6] All values empty/blank across sample |
| 7 | city | high | [7] Values are Indian cities: bhuj, varanasi, Jaipur |
| 8 | state | high | [8] Values are Indian states: Uttar Pradesh, Rajasthan |
| 9 | country | high | [9] Values are 'IND' and 'India' — country codes/names |
| 10 | zip | high | [10] Values are 6-digit Indian PIN codes: 370110, 221001, 302006 |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers: 8975657235, 9987303997, 9930566628, etc. |
| 12 | phone | high | [12] Values are 10-digit Indian phone numbers (alternate/secondary phone): 9099908462, 8795822111 |
Notes: Yatra 2019 breach — Indian travel platform. 13 columns total, 11 contain PII (email, names, full address, two phone fields). Columns 0 and 6 are non-PII (user_id and empty field). Breach context confirms Indian addresses, phone numbers, and email providers.
58.csv · 6 columns · 99,502 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and email domains (yahoo.com, gmail.com, rediffmail patterns); matches email format |
| 2 | suffix | high | [2] values are 'Mr.' and 'Mr' — salutation/title fields typical in breach context |
| 3 | firstName | high | [3] header 'Abha' with values like 'abhik', 'Karthik', 'Jaz', 'ABDUL', 'Ram' — all common given names |
| 4 | lastName | high | [4] header 'Bhatnagarsaluja' with values like 'bajaj', 'Gurumoorthy', 'Manak', 'LEYMAN', 'Unrivaled' — surname pattern |
| 9 | country | high | [9] value 'India' matches country field; breach context confirms all Indian records |
| 11 | phone | high | [11] values are 10-digit numbers (9412522621, 858606047, 8884182631, 9415131305, 6593847985, 9820384897) — Indian mobile phone format |
Notes: Yatra-2019 Indian travel booking breach. Column [0] is numeric user_id (skip). Columns [5–8], [10], [12] are empty across all samples. Column [2] contains salutation/prefix (suffix field). 6 PII columns identified out of 13 total.
59.csv · 5 columns · 99,367 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and valid email addresses (Gmail, Yahoo, corporate domains) |
| 2 | suffix | high | [2] values are salutations/titles: 'Mr', 'Miss' |
| 3 | firstName | high | [3] values are common given names (Sushil, Shirish, Ashish, Shubham, Chandreyi, satyabrata) |
| 4 | lastName | high | [4] values are surnames (Gautam, Chandavarkar, Agarwal, Singh, daklai, Das sharma) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (8294222522, 8587078399, 9833554423, etc.) |
Notes: File contains 13 columns. Column [0] is a numeric user ID (skip). Columns [5-10] and [12] are empty in all sample rows (skip). This is a structured PII breach from Yatra.com containing Indian user records with email, name, title, and phone data.
6.csv · 12 columns · 99,321 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header 'email', values contain @ signs and valid email addresses (gmail.com, yahoo.com, rediffmail.com) |
| 2 | suffix | high | [2] header 'salutation', values are titles: 'Mr', 'Mrs' |
| 3 | firstName | high | [3] header 'first_name', values are common given names: Vidhu, binod, Sanjay, madderla, abiskar |
| 4 | lastName | high | [4] header 'last_name', values are surnames: Sood, kejriwal, Shinde, sathaiah, sinha |
| 5 | address1 | high | [5] header 'address1', values are street addresses with building numbers and street names |
| 6 | address2 | high | [6] header 'address2', values are secondary address components (apartment/area details) |
| 7 | city | high | [7] header 'city', values are Indian city names: Ghaziabad, muzaffarpur, Vasai, viziyanagaram, kolkata |
| 8 | state | high | [8] header 'state', values are Indian state names: Andhra Pradesh, Uttar Pradesh, bihar, Maharashtra, west bengal |
| 9 | country | high | [9] header 'country', values are country codes: 'IND', 'IN' |
| 10 | zip | high | [10] header 'zip', values are 6-digit Indian postal codes: 516001, 201005, 842001, 401208 |
| 11 | phone | high | [11] header 'phone', values are 10-digit Indian mobile numbers: 9966785475, 9971471113, 9431238778 |
| 12 | phone | high | [12] header 'alternate_phone', values are Indian phone numbers (mobile and landline formats): 0120-4134190, 8922223222, 9830325718 |
Notes: 13 columns total, 12 contain PII. Column [0] is user_id (numeric identifier, skipped). Breach context confirms this is Yatra 2019 Indian travel booking platform data with Indian addresses, phone numbers, and email providers.
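The alternate_phone column in 6.csv mixes 10-digit mobile numbers with hyphenated landline numbers written with an STD code. Under India's current numbering plan, mobile numbers are 10 digits starting with 6, 7, 8, or 9. Landlines dialled domestically begin with a trunk 0 followed by the STD code. The sketch below is a shape-based classifier built on those assumptions. `classify_indian_phone` is hypothetical. It would also flag the 9-digit values that appear in some other samples (46.csv, 58.csv).

```python
import re


def classify_indian_phone(raw):
    """Classify a phone value by shape only: 'mobile', 'landline', or 'other'.

    Assumes Indian numbering: 10-digit mobiles starting 6-9, and landlines
    written as trunk 0 + STD code + subscriber number (11 digits in total).
    A mobile written with a leading 0 would also be reported as 'landline'.
    """
    digits = re.sub(r"[^0-9]", "", raw)
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]  # drop the +91 country code
    if len(digits) == 10 and digits[0] in "6789":
        return "mobile"
    if len(digits) == 11 and digits.startswith("0"):
        return "landline"
    # Anything else, e.g. 9-digit values, or 10 digits not starting 6-9
    # (a landline missing its leading 0 would land here).
    return "other"
```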
60.csv · 12 columns · 99,302 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] Numeric user IDs (5547528, 5547529, etc.) — internal customer identifiers |
| 1 | email | high | [1] Values contain @ symbol and are valid email addresses ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] Salutation/title values: 'Ms', 'Mr', 'Mr.' — formal address prefixes |
| 3 | firstName | high | [3] Common given names (ranjitha, Reinhard, Abdul, ELAKKIYAVENDAN, OZA, Sameer) |
| 4 | lastName | high | [4] Common family names (nayak, Buck, Jabbar, RAJENDRAN, PUNAMCHAND, Belusonti) |
| 5 | address1 | high | [5] Street address values (e.g., '18-7-312/1d/43 Aman Nagar') — mailing street/first address line |
| 7 | city | high | [7] Indian city names (Hyderabad, srikakulam) |
| 8 | state | high | [8] Indian state names (Andhra Pradesh, Andhra Pradesh) |
| 9 | country | high | [9] Country code/name values (IND, India) |
| 10 | zip | high | [10] Indian PIN/postal codes (500023, 532001) |
| 11 | phone | high | [11] 10-digit Indian mobile/phone numbers (8105003571, 9059060173, 9944420624, etc.) |
| 12 | phone | high | [12] Alternate phone numbers — 10-digit Indian mobile format (9059060173) |
Notes: Yatra 2019 breach, Indian travel booking platform. 11 PII columns identified: email, suffix, firstName, lastName, address1, city, state, country, zip, and two phone fields. Columns 0 (user_id) and 6 (empty in all samples) are skipped.
61.csv · 11 columns · 99,376 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and email domains (gmail.com, rediffmail patterns) |
| 2 | suffix | high | [2] header/values are 'Mr', 'Mr.', 'Ms' — salutation titles |
| 3 | firstName | high | [3] values are common given names (Dhananjay, Ananthula, Pankaj, Veera, Dhairya, swetha) |
| 4 | lastName | high | [4] values are surnames (Kumar, Giridhar, kumar, manikandan, Ashar, s) |
| 5 | address1 | high | [5] values are street addresses (Yatra Office, A-7 Zavernagar Society) |
| 7 | city | high | [7] values are Indian cities (Gurgaon, vadodara) |
| 8 | state | high | [8] values are Indian states (Haryana, Gujarat) |
| 9 | country | high | [9] values are 'India', 'IND' — country codes/names |
| 10 | zip | high | [10] values are Indian PIN codes (122003, 390022) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9002135246, 9866559484, etc.) |
| 12 | phone | high | [12] alternate/secondary phone field, 10-digit Indian mobile numbers |
Notes: Yatra 2019 breach dataset. Column [0] is numeric user_id (skip). No header row present. All addresses, phone numbers, and email domains are Indian. 11 of 13 columns contain PII.
62.csv · 10 columns · 98,134 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email address; all sample values contain @ and are valid email addresses |
| 2 | suffix | high | [2] values are 'Mr' and 'Mr.', common salutations/titles |
| 3 | firstName | high | [3] values are 'Amit', 'sakib', 'jahir', 'Aun' — common given names |
| 4 | lastName | high | [4] values are 'kumar', 'shaikh', 'abbas', 'Alia' — common family names |
| 5 | address1 | high | [5] values are street addresses like 'vill+post.=saryan', 'House no1 & 2 Gali no 1', '110052' (mixed with postal codes) |
| 6 | phone | high | [6] value '8588851512' is a 10-digit number matching Indian mobile phone format |
| 8 | state | high | [8] value 'Uttar Pradesh' is an Indian state name |
| 9 | country | high | [9] value 'India' is a country name |
| 10 | zip | high | [10] value '277121' is a 6-digit Indian PIN/postal code |
| 11 | phone | high | [11] values are 10-digit numbers ('9699545608', '9812195202', etc.) matching Indian mobile phone format; alternate phone number |
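Notes: 13 columns total. Column [0] contains numeric user IDs (skip). Column [7] ('baliia') was skipped as a possible username or login handle. In the other files, however, this position holds the city, and the value may be a misspelling of Ballia, a district in Uttar Pradesh, which is the state given in column [8]. Column [12] is empty. Breach context confirms an Indian travel booking platform with Indian addresses, phone numbers, and email providers.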
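Several files marked `Has header: yes`, including 62.csv, have a first row whose "header" is an email address. That suggests the row is really data. One rough way to flag this is sketched below in Python. It assumes the first row is available as a list of strings, and the function and pattern names are hypothetical.

```python
import re

# Loose patterns: they only need to catch values that would never be column labels.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
LONG_DIGITS_RE = re.compile(r"\d{6,}")  # user IDs, PIN codes, phone numbers


def first_row_looks_like_data(first_row: list[str]) -> bool:
    """True if any cell of the supposed header looks like an email or a long number."""
    for cell in first_row:
        value = cell.strip()
        if EMAIL_RE.fullmatch(value) or LONG_DIGITS_RE.fullmatch(value):
            return True
    return False


if __name__ == "__main__":
    # Synthetic rows, not taken from the dataset.
    print(first_row_looks_like_data(["5547001", "user@example.com", "Mr."]))  # True
    print(first_row_looks_like_data(["user_id", "email", "title"]))           # False
```

Python's standard `csv.Sniffer().has_header()` takes a different, statistical approach. It compares the first row with the types and lengths of values in later rows. On sparse files like these, either heuristic can misfire.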
63.csv · 11 columns · 98,283 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email; all values are email addresses with @ signs |
| 2 | suffix | high | [2] header 'Mr.' is a salutation/title; values are 'Mr' and 'Mr.', common titles |
| 3 | firstName | high | [3] header 'rambhajan' is a given name; values are common first names (kranthi, Omkar, veer, dilshad) |
| 4 | lastName | high | [4] header 'kumawat' is a surname; values are family names (Chennamsetty, Mohite, sharma, alam, Gholap) |
| 5 | address1 | high | [5] header is a street address; values contain street addresses with house numbers, lane names, and building details |
| 7 | city | high | [7] header 'pune' is an Indian city; values are Indian cities (Gurgoan, NEW DELHI, Vapi) |
| 8 | state | high | [8] header 'Maharashtra' is an Indian state; values are Indian states (Maharashtra, Gujarat) |
| 9 | country | high | [9] header 'India' is a country; all values are 'India' or 'IND' |
| 10 | zip | high | [10] header '411033' is an Indian postal code (PIN); values are 6-digit Indian PIN codes (110019, 396191) |
| 11 | phone | high | [11] header '9049494641' is a 10-digit phone number; all values are 10-digit Indian mobile numbers |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers or empty; alternate/secondary phone field |
Notes: 13 columns total, 11 contain PII. Column [0] is a numeric user ID (skip). Column [6] is empty (skip). The breach is confirmed as Yatra 2019: an Indian travel booking platform with Indian addresses, phone numbers, email providers, and city/state references.
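Column [10] is identified as a PIN code from its digit pattern. Indian PIN codes are exactly six digits, and the first digit, which identifies the postal region, is never 0. A stricter format check is therefore easy to sketch in Python. `is_valid_pin` is a hypothetical helper. It checks format only and does not confirm that a code is actually assigned.

```python
import re

# Six digits, first digit 1-9 (0 is not used as a PIN region digit).
PIN_RE = re.compile(r"[1-9][0-9]{5}")


def is_valid_pin(value: str) -> bool:
    """Format check for an Indian PIN code; does not confirm the code exists."""
    return PIN_RE.fullmatch(value.strip()) is not None


if __name__ == "__main__":
    for candidate in ["411033", "110019", "41103", "011033", "4110330", ""]:
        print(candidate or "<empty>", is_valid_pin(candidate))
```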
64.csv · 11 columns · 99,469 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and are clearly email addresses (gmail.com, live.com, northwestern.edu) |
| 2 | suffix | high | [2] values are 'Mr', 'Mr.', 'Mr' — salutation/title field |
| 3 | firstName | high | [3] values are given names (Murali, SUBHASIS, Edmond, Sandeep, venkat) |
| 4 | lastName | high | [4] values are family names (Mohan, BOSE, Ferrao, Goel, sundar) |
| 5 | address1 | high | [5] sample values are street addresses (chakdaha kanthalpuli, kannppa bilding) |
| 7 | city | high | [7] values are Indian city names (kolkata, banglore) |
| 8 | state | high | [8] values are Indian states (West Bengal, Karnataka) |
| 9 | country | high | [9] values are 'IND', 'India' — country codes/names |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (741222, 560036) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9676438881, 7407535001, 9642525252) |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers — alternate/second phone number |
Notes: 13 columns total. Breach context confirms Yatra (Indian travel platform) with Indian addresses, Indian phone numbers (10-digit format), Indian cities/states, Indian PIN codes. Column [0] is numeric user ID (skip). Column [6] is empty. Columns [5], [7], [8], [9], [10] together form complete Indian postal address. Columns [11] and [12] are both phone numbers.
65.csv · 10 columns · 99,397 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and email domains (gmail.com, rediffmail.com patterns consistent with Indian travel booking) |
| 2 | suffix | high | [2] values are 'Mr.' / 'Mr' — salutation/title prefix |
| 3 | firstName | high | [3] column position and values indicate common given names (vaibhav, Parag, john, Mohd, CEENA, DULAL) |
| 4 | lastName | high | [4] values are surnames (dubey, Mital, thapa, Ahmad, JOSEPH, SHIL) |
| 5 | address1 | high | [5] values are street addresses ('Srikona daily bazar', 'Yatra.Com 1101-03' — Gurgaon office reference) |
| 7 | city | high | [7] values are Indian city names (silchar, Gurgoan/Gurgaon) |
| 8 | state | high | [8] values are Indian states (Assam) |
| 9 | country | high | [9] values are 'India' — country field |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (788001) |
| 11 | phone | high | [11] values are 10-digit numbers consistent with Indian mobile phone format |
Notes: 13 columns total; 10 contain PII (email, suffix, firstName, lastName, address1, city, state, country, zip, phone). Columns [0], [6], and [12] are empty or internal identifiers (skip). Breach context confirms an Indian travel platform (Yatra.com) with user account records including addresses, emails, names, and phone numbers.
66.csv · 10 columns · 99,299 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match the email pattern (gmail.com, yahoo.co.in, etc.) |
| 2 | suffix | high | [2] values are 'Mr', 'Mrs', 'Mr.' — salutation/title indicators |
| 3 | firstName | high | [3] column position and values indicate common Indian given names (Nagaraj, Ramaswamy, Abhijit, geeta, Sudhir, viswanath) |
| 4 | lastName | high | [4] values are surnames following first names (Sitaram, Ragu, nair, singh, Malik, ravikumar) |
| 5 | address1 | medium | [5] mostly empty but sample value 'Yatra Office' indicates street/mailing address line |
| 7 | city | high | [7] mostly empty but sample 'Gurgaon' is Indian city name |
| 8 | state | high | [8] mostly empty but sample 'Haryana' is Indian state |
| 9 | country | high | [9] mostly empty but sample 'India' confirms country field |
| 10 | zip | high | [10] mostly empty, but sample '122003' matches the 6-digit Indian PIN code format |
| 11 | phone | high | [11] values are 10-digit numbers matching Indian mobile phone format (9739166243, 9873422360, etc.) |
Notes: 13 columns total. Yatra.com travel booking platform breach from 2019 containing Indian user account records. Column [0] contains numeric user IDs (skip). Columns [6] and [12] are empty (skip). Breach context confirms Indian addresses, phone numbers, email providers consistent with India.
67.csv · 5 columns · 98,098 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and match email format (gmail.com, rediffmail.com, etc.) |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles (Mr, Ms) |
| 3 | firstName | high | [3] values are common given names (Sanjay, UNNIKRISHNA, Vicky, monika, sk, Atul) |
| 4 | lastName | high | [4] values are surnames (Sanjay, PANICKER, Hans, chandra, azijul, Tandon) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9431092293, 9869167114, 9213262144, etc.) |
Notes: The file contains 13 columns; 5 contain PII. Column [0] is user_id (skip), and columns [5-10] and [12] are empty (skip). No address fields appear in the sampled rows even though the breach context mentions addresses. Addresses may be populated in rows outside the sample, or they may be absent from this file.
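Notes like this one decide that a column is empty based on a handful of sampled rows. The sketch below shows that check in Python. It assumes rows arrive as lists of strings, for example from `csv.reader`, and `empty_columns` is a hypothetical helper.

```python
def empty_columns(rows: list[list[str]], total_columns: int) -> list[int]:
    """Indices that are blank (or missing) in every row supplied."""
    return [
        col
        for col in range(total_columns)
        if all(col >= len(row) or not row[col].strip() for row in rows)
    ]


if __name__ == "__main__":
    # Synthetic rows shaped like 67.csv: 13 columns, only 0-4 and 11 populated.
    sample = [
        ["1001", "a@example.com", "Mr", "Test", "User", *([""] * 6), "9000000001", ""],
        ["1002", "b@example.com", "Ms", "Demo", "Person", *([""] * 6), "9000000002", ""],
    ]
    print(empty_columns(sample, 13))  # [5, 6, 7, 8, 9, 10, 12]
```

An empty result only shows that the supplied rows are blank. As the 81.csv notes point out, these columns may still hold data elsewhere in the file. A pass over every row is therefore needed before a column is treated as unused.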
68.csv · 10 columns · 99,548 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and are email addresses ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] values are titles/salutations (Mr, Ms, Mr., etc.) |
| 3 | firstName | high | [3] values are given names (Kapur, GEORGE, Susannah, Liyaqat, ankit, Samyak) |
| 4 | lastName | high | [4] values are family names (Shukla, WILLIAM, Robinson, Khan, prajapati, Jain) |
| 5 | address1 | high | [5] values are street addresses (Yatra Office, kadipur sultanpur uttar pradesh, Yatra.Com 1101-03) |
| 7 | city | high | [7] values are Indian city names (Gurgaon, sultanpur, Gurgoan) |
| 8 | state | high | [8] values are Indian states (Haryana, Uttar Pradesh) |
| 9 | country | high | [9] values are country (India) |
| 10 | zip | high | [10] values are Indian postal codes (122003, 228145) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers (7709460812, 9709230221, 7841631742, etc.) |
Notes: Breach context confirms Yatra.com Indian travel platform. File contains user account records with names, email addresses, addresses (street, city, state, country, PIN), and mobile phone numbers. Column [0] is numeric user ID (skip). Columns [6] and [12] are empty (skip).
69.csv · 10 columns · 99,358 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain the @ symbol and match email address format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] values are salutations/titles: 'Mrs', 'Mr', 'Mrs.', 'Miss', 'Mr.' |
| 3 | firstName | high | [3] values are given names (MALATHY, KANDURU, Meenu, Mehak, surya, Mary) positioned after salutation |
| 4 | lastName | high | [4] values are family names (PILLAI, MAJHI, Bhullar, Sharma, khatri, kerketta) positioned after first name |
| 5 | address1 | high | [5] values are street addresses (F-21 GALI NO. 61 LAKSHMI NAGAR, 685/8 nanda nagar, a-13 Coral Block Vatika Green City) |
| 7 | city | high | [7] values are Indian city names (indore, Jamshedpur) — city field in address sequence |
| 8 | state | high | [8] values are Indian states (Uttar Pradesh, Madhya Pradesh, Jharkhand) — state field after city |
| 9 | country | high | [9] values are 'India' — country field in address hierarchy |
| 10 | zip | high | [10] values are Indian PIN codes (452001, 831018) — numeric postal codes |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9427623326, 9496559666, 9873653307, 7534078000, 9165077070) |
Notes: Yatra 2019 breach, Indian travel booking platform. Column [0] is a numeric user ID (skip). Columns [6] and [12] are empty across all samples (skip). All remaining columns contain valid PII: email, name components (title, first name, last name), a complete Indian address (street, city, state, country, PIN), and mobile phone. No passwords, SSNs, or DOB fields are present in this extract.
7.csv · 11 columns · 99,473 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and match email address format ([email protected], [email protected], etc.) |
| 3 | suffix | high | [3] values are 'Mr' and 'Ms' — salutation/title indicators |
| 4 | firstName | high | [4] values are common given names (Parag, sachin, Sudhir, gopal, Ajit, kalpana) |
| 5 | lastName | high | [5] values are surnames (Shah, sirohi, Singh, solanki, Gandhi, devi) |
| 6 | address1 | high | [6] values match street address format (193, h-8 , iitb , mumbai; flat no 106 sector 29; etc.) |
| 7 | address2 | high | [7] values are secondary address components or locality names (m.r.nagar,kodungaiyur; Sukchar; Lower Parel West; etc.) |
| 8 | city | high | [8] values are Indian city names (mumbai, noida, Mumbai, chennai, Kolkata) |
| 9 | state | high | [9] values are Indian state names (Maharashtra, Uttar Pradesh, Tamil Nadu, West Bengal) |
| 10 | country | high | [10] all values are 'IND' — country code for India |
| 11 | zip | high | [11] values are 6-digit Indian PIN codes (400076, 201301, 400056, 600118, 700115) |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers (9924013901, 9920740849, 9910013923, etc.) |
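Notes: 13 columns total. Column [0] is a numeric user_id (skip, internal identifier). Column [2] appears to hold passwords or authentication tokens (short alphanumeric and numeric strings). It is described as mapped to password, although the table above has no row for it. Because of this extra column, every later field sits one position to the right of its index in the other files: the suffix is at [3] rather than [2], and the phone is at [12] rather than [11]. Breach context confirms an Indian travel platform with Indian addresses, phone numbers, and email providers.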
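Because 7.csv carries an extra column, an index map built from the other files would read its column [2] as the title and shift every later field by one, so PIN codes would land in the phone field. One way to detect the shift is to check whether the salutation sits at [2] or [3]. The Python sketch below does this. `SALUTATIONS` and `salutation_column` are hypothetical names, and the title list covers only values seen in these files.

```python
from typing import Optional

# Titles observed in these files, compared without a trailing period.
SALUTATIONS = {"mr", "mrs", "ms", "miss", "dr"}


def salutation_column(row: list[str]) -> Optional[int]:
    """Return 2 or 3, whichever first holds a recognised title, or None."""
    for col in (2, 3):
        if col < len(row) and row[col].strip().rstrip(".").lower() in SALUTATIONS:
            return col
    return None


if __name__ == "__main__":
    usual = ["1001", "a@example.com", "Mr.", "Test", "User"]
    shifted = ["1002", "b@example.com", "<redacted>", "Ms", "Demo", "Person"]
    print(salutation_column(usual))    # 2
    print(salutation_column(shifted))  # 3
```

A per-file offset derived from a few rows could then be added to every index in the usual map. Rows with an empty title, which 81.csv shows are common, return None and should not be used to decide the layout.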
70.csv · 8 columns · 98,503 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and email domains (gmail.com, yahoo.com, rediffmail patterns) |
| 2 | suffix | high | [2] values are 'Mr.', 'Dr.' — salutation/title indicators |
| 3 | firstName | high | [3] values are given names (davinder, bhargavi, PR RAVI, Saket) |
| 4 | lastName | high | [4] values are family names (vashisht, ravi, SHANTHI, Jha) |
| 5 | address1 | high | [5] values contain street addresses and location data (Yatra Office address, street address placeholders) |
| 6 | phone | high | [6] value '8870538888' is 10-digit Indian mobile number |
| 9 | country | high | [9] value is 'India' |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9448614523, 9041134608, etc.) |
Notes: 13 columns total. Column [0] appears to be user_id (skip). Columns [7], [8], [10], [12] are empty or non-PII. Column [2] contains title/suffix mixed with some geographic data ('Tamil Nadu', 'Gurgaon') but primary values are salutations. Two phone columns detected ([6] and [11]) — both valid Indian mobile numbers. Indian context confirmed by address formats, phone patterns (10-digit mobile), and country field.
71.csv · 10 columns · 99,400 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and are clearly email addresses (rediffmail.com, gmail.com, icicibank.com domains) |
| 2 | suffix | high | [2] values are 'Mr.', 'Mr' — salutation/title indicators |
| 3 | firstName | high | [3] values are given names (Geetha, RAMANUJA, seetharaman, teja, Raqib) |
| 4 | lastName | high | [4] values are surnames or family names (Murugaiah, P, AVR, ananthakrishnan, rayasam, Baba) |
| 5 | address1 | high | [5] values contain street addresses (e.g., '117/H-1/255,Model town,Pandu Nagar', 'Kukatpally') |
| 7 | city | high | [7] values are Indian city names (hyderabad, Kanpur) |
| 8 | state | high | [8] values are Indian states (Andhra Pradesh, Uttar Pradesh) |
| 9 | country | high | [9] values are 'India' — country field |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (500072, 208005) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers |
Notes: 13 columns total. Column [0] contains numeric user IDs (skip). Columns [6] and [12] are empty (skip). Column [5] may also contain address2 data (apartment/suite info embedded), but mapping as address1. File has no header row.
72.csv · 11 columns · 99,584 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format (gmail.com, aol.pk, rediffmail patterns) |
| 2 | suffix | high | [2] header 'Mr.' / values are salutations (Mr, Mr.) |
| 3 | firstName | high | [3] header 'abhijit', values are common given names (Shoueeb, Paras, Dileep, gaurav, Andrea) |
| 4 | lastName | high | [4] header 'mukherjee', values are surnames (Dar, Banka, Gunda, kumar, ketherton) |
| 5 | address1 | high | [5] values are full street addresses (35/6 kayastha para main road kol-700078, etc.) |
| 7 | city | high | [7] header 'kolkata', values are Indian city names (Gurgoan, Gurgaon, delhi) |
| 8 | state | high | [8] header 'West Bengal', values are Indian states/territories (Haryana, National Capital Territory of Delhi) |
| 9 | country | high | [9] header 'India', all values are 'India' |
| 10 | zip | high | [10] header '700078', values are 6-digit Indian PIN codes (121002, 110059) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9433852491, 9643313902, 8013120406) |
| 12 | phone | high | [12] alternate phone field, values are 10-digit numbers (8970353472) or empty |
Notes: Yatra 2019 breach. 13 columns total, 11 contain PII. Column [0] is a numeric user_id (skipped). Column [6] appears empty across all samples (skipped). Although the file has no header row, several assessments above describe the first row's values as header values. The data is entirely Indian: addresses, phone numbers, states, PIN codes, and email providers all point to an Indian user base.
73.csv · 5 columns · 99,584 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and match email format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] values are 'Mr', 'Mr.' — salutation/title indicators |
| 3 | firstName | high | [3] values are common given names (Sai, Rajib, Manjesh, Monindersingh, VIJAY, Animesh) |
| 4 | lastName | high | [4] values are surnames (Varma, Banerjee, Thomas, Vasant, KUMAR, Dey) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers (9494774499, 8986880570, 9946031800, etc.) |
Notes: 13 columns total, 5 contain PII. Column [0] is numeric user_id (skip). Columns [5-10] and [12] appear empty in samples and are skipped. Breach context confirms Indian travel platform with user account data including emails, names, phone numbers.
74.csv · 9 columns · 99,550 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email address format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] values are 'Mr.' — standard salutation/title prefix |
| 3 | firstName | high | [3] values are common given names (Mohmmed, Vikas, Misato, bipinbhai, sarita) |
| 4 | lastName | high | [4] values are surnames (Muazzam, Gupta, Higashi, padhiyar, rewri) |
| 5 | address1 | high | [5] value 'B-188/2' matches street address format common in Indian addresses |
| 8 | state | high | [8] value 'National Capital Territory of Delhi' is an Indian state/territory |
| 9 | country | high | [9] values are 'India', 'Other' — country field |
| 10 | zip | high | [10] value '110009' is a 6-digit Indian postal code (PIN) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9940940671, 9827122704, etc.) |
Notes: 12 columns total. Column [0] is numeric user ID (skip). Columns [6] and [7] are empty in sample and skipped. Breach context confirms Indian travel booking platform with user account data including addresses, phones, and emails.
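The country column mixes 'India' and 'IND' across files, and 74.csv also contains 'Other'. 'IND' is the ISO 3166-1 alpha-3 code for India, whose alpha-2 code is 'IN'. The Python sketch below reconciles these spellings using a hypothetical alias table that covers only the values seen here.

```python
from typing import Optional

# Spellings seen in these files, mapped to the ISO 3166-1 alpha-2 code for India.
COUNTRY_ALIASES = {"india": "IN", "ind": "IN"}


def normalize_country(value: str) -> Optional[str]:
    """Return 'IN' for recognised spellings; None for blanks, 'Other', or unknown values."""
    return COUNTRY_ALIASES.get(value.strip().lower())


if __name__ == "__main__":
    for raw in ["India", "IND", " india ", "Other", ""]:
        print(repr(raw), normalize_country(raw))
```

Returning None for 'Other' leaves those rows unassigned instead of guessing a country.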
75.csv · 5 columns · 98,894 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header is an email address; values contain the @ symbol and match email format (gmail.com, jd.com domains) |
| 2 | suffix | high | [2] header is blank, but values contain 'Ms.', a salutation/title |
| 3 | firstName | high | [3] header is empty/blank but values are common given names (YASIN, Nitesh, Anuradha, mohammed, Yati) |
| 4 | lastName | high | [4] header is empty/blank but values are surnames (MOHIDEEN, ABDUL RAZAK, Marjara, Tiwari, khaliq, Bawri) |
| 11 | phone | high | [11] header is empty/blank but values are 10-digit Indian mobile phone numbers (7598208158, 9438262776, 8447963651, 9177485105, 6392581470) |
Notes: 13 columns total. Columns [0] (numeric user ID), [5-10] (empty), [12] (empty) are skipped as non-PII or unpopulated. Yatra 2019 Indian travel booking breach; phone numbers and email addresses confirm Indian origin.
76.csv · 7 columns · 99,445 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ symbols and are clearly email addresses (gmail.com, yahoo domains, corporate emails) |
| 2 | suffix | high | [2] Values are salutations/titles: 'Mr', 'Mrs.', 'Mr.' |
| 3 | firstName | high | [3] Values are given names: Balaji, Neha, SHIV, ravi, Pranav, Amrita |
| 4 | lastName | high | [4] Values are family names: Subramanian, Maney, YADAV, kumar, Punjabi, Rai |
| 5 | address1 | medium | [5] Sample value 'Yatra.Com 1101-03' suggests address/building information; mostly empty but when populated contains address data |
| 7 | city | medium | [7] Sample value 'Gurgoan' (variant spelling of Gurgaon, Indian city); city field mostly empty but populated values are city names |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers: 8451900852, 9743666660, 9950595563, 7396560040, 9538106042, 8238391804 |
Notes: Yatra-2019 breach (Indian travel booking platform). Column [0] appears to be user/record ID (skip). Columns [6], [8], [9], [10], [12] are entirely empty in sample and unmapped. No header row provided; analysis based on value patterns. Data is consistently Indian (mobile numbers, cities, email providers).
77.csv · 11 columns · 99,358 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ signs and email domains (gmail.com, rediffmail.com, yahoo.com, ymail.com) |
| 2 | suffix | high | [2] Values are 'Mr.', 'Mr', 'Ms' — salutation/title indicators |
| 3 | firstName | high | [3] Values are common given names (daulat, SHIVASHANKAR, Parkash, abhishek, GOVIND) |
| 4 | lastName | high | [4] Values are surnames (bhansali, MUGALI, Shankar, sharma, JAGIRDAR) |
| 5 | address1 | high | [5] Values are street addresses (e.g., '1/21 champa nagar beawar') |
| 6 | address2 | medium | [6] Column typically follows address1, though values appear empty in sample |
| 7 | city | high | [7] Values are city names (beawar) |
| 8 | state | high | [8] Values are Indian states (Rajasthan) |
| 9 | country | high | [9] Values are country names (India) |
| 10 | zip | high | [10] Values are 6-digit Indian postal codes (305901) |
| 11 | phone | high | [11] Values are 10-digit Indian mobile phone numbers (9414009587, 8586968292, etc.) |
Notes: 13 total columns. Column [0] is numeric user ID (skip). Column [12] is empty (skip). Breach context confirms Indian travel booking platform with Indian addresses, phone numbers, and email providers.
78.csv · 10 columns · 98,468 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match standard email format (Gmail, Hotmail, Yahoo) |
| 2 | suffix | high | [2] header 'Mr.' and values are salutations/titles (Mr, Mr) |
| 3 | firstName | high | [3] values are common given names (naramalla, Tribhuwannath, Yuthika, APARNA, Bunty) |
| 4 | lastName | high | [4] values are surnames (purushotham, Srivastava, Kathuria, RAWAL, Thokchom) |
| 5 | address1 | high | [5] values are street addresses with house numbers and road names (h.no.22-118/3,saikunta road, chunnambattiwada,mancherial) |
| 7 | city | high | [7] values are Indian city names (mancherial) |
| 8 | state | high | [8] values are Indian state names (Andhra Pradesh) |
| 9 | country | high | [9] values are country names (India) |
| 10 | zip | high | [10] values are Indian postal codes/PIN codes (504208) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers (9989145950, 8197557799, 9001714000, 9538916585) |
Notes: Yatra.com travel booking platform breach. Column [0] is a numeric user ID (skip). Columns [6] and [12] are empty (skip). This is a clean Indian user database with addresses, contact info, and names. All PII columns identified.
79.csv · 10 columns · 99,524 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs in standard email format (gmail.com, yahoo.co.in, yahoo.com) |
| 2 | suffix | high | [2] values are salutations/titles (Mr) |
| 3 | firstName | high | [3] values are given names (Suresh, AMIIT, adf, Gopal singh, SUNIL, tarun) |
| 4 | lastName | high | [4] values are family names (Kumar, AMAR, basdas, Dasana, NAGORI, bhatnagar) |
| 5 | address1 | high | [5] sample shows 'TATA Consulting Engineers Limited, A-...' — company/street address format |
| 7 | city | high | [7] sample value 'Kolkata' is an Indian city |
| 8 | state | high | [8] sample value 'West Bengal' is an Indian state |
| 9 | country | high | [9] sample value 'India' |
| 10 | zip | high | [10] sample value '700091' is a 6-digit Indian PIN code |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers (8378982262, 7042265840, 9999999999, etc.) |
Notes: Yatra 2019 breach — Indian travel booking platform. Column [0] is numeric user_id (skip). Columns [6] and [12] are empty. No DOB, SSN, username, password, or other sensitive PII detected in sample.
8.csv · 12 columns · 98,921 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ signs and match email format (rediffmail.com, gmail.com, hotmail.com) |
| 2 | suffix | high | [2] Values are salutations (Mr, Mr., etc.) |
| 3 | firstName | high | [3] Values are common given names (Abhishek, DILIP, Arun, Vipin, nikhil) |
| 4 | lastName | high | [4] Values are surnames (Jain, PATIL, Jayaraman, Katiyar, vaibhav) |
| 5 | address1 | high | [5] Street addresses (Goyal Nagar, patil hospital, No 24 2nd Cross St, Yatra Office) |
| 6 | address2 | high | [6] Supplementary address info (near lalbag road, weavers colony) |
| 7 | city | high | [7] Indian city names (Indore, karad, Venkata Nagar, Gurgaon, banglore, ranchi) |
| 8 | state | high | [8] Indian states (Madhya Pradesh, Maharashtra, Pondicherry, Haryana, Karnataka, Jharkhand) |
| 9 | country | high | [9] Country codes/names (IND, India) |
| 10 | zip | high | [10] Indian PIN codes (452001, 415110, 605011, 122003, 560027, 834003) |
| 11 | phone | high | [11] 10-digit Indian mobile numbers (9926478790, 9822031498, 9443500362, 8446340001) |
| 12 | phone | high | [12] Alternate phone numbers, 10-digit or shorter (9922955171, 22238788, 9431079981, 4442055381) |
Notes: Yatra 2019 breach. 13 columns total, 12 contain PII (column 0 is user_id, skipped). All data is Indian: Indian addresses, Indian phone numbers, Indian email providers (rediffmail.com, yahoo.co.in), Indian cities/states, country code IND. Column 0 contains auto-incremented numeric IDs. Columns 11 and 12 both map to phone (primary and alternate contact numbers).
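Column [12] in 8.csv mixes 10-digit mobile numbers with shorter values that look like landlines, and 79.csv's phone sample includes the placeholder 9999999999. Indian mobile numbers are 10 digits beginning with 6, 7, 8, or 9, so a rough classifier can be sketched in Python. `classify_phone` and its labels are hypothetical. The rules are approximate: some landline numbers written with their STD code but without the leading 0 also start with 6 to 9, so a "likely_mobile" result is not proof.

```python
import re

# 10 digits starting with 6-9: the shape of an Indian mobile number (approximate).
MOBILE_RE = re.compile(r"[6-9][0-9]{9}")


def classify_phone(value: str) -> str:
    """Label a phone value as 'empty', 'placeholder', 'likely_mobile', or 'other'."""
    digits = re.sub(r"\D", "", value)
    # Drop a +91 country code or a leading trunk 0 if present.
    if len(digits) == 12 and digits.startswith("91"):
        digits = digits[2:]
    elif len(digits) == 11 and digits.startswith("0"):
        digits = digits[1:]
    if not digits:
        return "empty"
    if len(digits) >= 6 and len(set(digits)) == 1:
        return "placeholder"  # e.g. 9999999999
    if MOBILE_RE.fullmatch(digits):
        return "likely_mobile"
    return "other"  # landline, truncated, or malformed value


if __name__ == "__main__":
    for raw in ["9000000001", "+91 90000 00001", "22000000", "9999999999", ""]:
        print(repr(raw), classify_phone(raw))
```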
80.csv · 5 columns · 99,594 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ signs and are clearly email addresses (gmail.com, yahoo.com domains) |
| 2 | suffix | high | [2] Values are 'Mr', a salutation/title prefix |
| 3 | firstName | high | [3] Values are common given names (Sahil, Rehana, Nikhil, Amita, Hiten, harsh) |
| 4 | lastName | high | [4] Values are surnames (Jain, Islam, Patel, Kumari, Karani, shukla) |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers (9034567082, 9830674770, 8123895937, etc.) |
Notes: 12 columns total. [0] is numeric user ID (skip). [5-10] are empty columns (skip). Columns [1,3,4,11] contain core PII. Column [2] contains salutation. Indian phone numbers and email providers confirm breach context.
81.csv · 5 columns · 99,368 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain the @ symbol and match email format (yahoo.com, rediffmail.com, gmail.com, hotmail.com) |
| 2 | suffix | high | [2] values are 'Mr' and empty strings, consistent with salutation/title field |
| 3 | firstName | high | [3] values are common given names (Naga, jayadeep, Mayank, sri, Rachit, rabin) |
| 4 | lastName | high | [4] values are surnames (Vemula, uppalapati, Prakash, ch, Mehrotra, ray) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers (9491328661, 9930468956, etc.) |
Notes: Yatra-2019 travel booking platform breach. Column [0] is numeric user ID (skip). Columns [5-10] and [12] are empty in sample rows (likely address/demographic fields with no data in these particular records, cannot be reliably mapped from empty values alone). Breach context confirms Indian user data with phone numbers, emails, and names.
82.csv · 5 columns · 99,503 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format (gmail.com, rediffmail.com patterns) |
| 2 | suffix | high | [2] values are 'Mr.' — salutation/title indicator |
| 3 | firstName | high | [3] values are common given names (Alex, Raju, Ravi, Shameer, arun, Manjeet) |
| 4 | lastName | high | [4] values are surnames (Thomas, chirra, Puri, Hakk, Saminathan, Dabas) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (8893955294, 8754514873, etc.) |
Notes: 12 columns total. Column [0] is numeric user_id (skip). Columns [5-10] are empty or contain no sample values (skip). Yatra-2019 Indian travel booking platform breach: email addresses, names, salutations, and Indian phone numbers confirmed.
83.csv · 5 columns · 99,492 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format (gmail.com, rediffmail.com, etc.) |
| 2 | suffix | high | [2] values are 'Mr', 'Ms.' — standard salutation/title suffixes |
| 3 | firstName | high | [3] header context and values are common given names (Suman, Tapan, Harsha, Anuj, ashish, Divya) |
| 4 | lastName | high | [4] values are surnames (KUMAR, Gupta, GC, Ishu, k, Sruthi) following firstName |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (8147760998, 9344870960, etc.) |
Notes: Column [0] is numeric user ID (skip). Columns [5], [6], [7], [8], [9], [10] are mostly empty in sample rows; [7] has one value 'Gurgoan' (city) but insufficient data to confirm address mapping. No address1, address2, city, state, zip, country, dob, ssn, or password columns present in header row sample.
84.csv · 10 columns · 99,394 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs, clearly email addresses (gmail.com, rediffmail domains typical of Indian users) |
| 2 | suffix | high | [2] header pattern indicates salutation/title, values are 'Mr.' — standard suffix field |
| 3 | firstName | high | [3] values are individual given names (jay, AMOD, Anand, Promina) |
| 4 | lastName | high | [4] values are individual family names (patel, KESHRI, Singh, steels, Barve) |
| 5 | address1 | medium | [5] sparse but contains 'Yatra Office' — street/mailing address line |
| 7 | city | high | [7] values are Indian cities (Gurgaon) |
| 8 | state | high | [8] values are Indian states (Haryana) |
| 9 | country | high | [9] values are 'India' |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (122003) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9753038723, 7567739881, etc.) |
Notes: 12 columns total. Column [0] is numeric user ID (skip). Column [6] is empty (skip). Remaining 10 columns contain PII. Breach context confirms Indian travel booking platform with Indian phone numbers, cities, states, PIN codes, and email providers.
85.csv · 4 columns · 99,531 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs, standard email addresses (gmail.com, yahoo.com, adani.com domains) |
| 3 | firstName | high | [3] values are common given names (Mehul, Kranthi, Abhijith, ehtesham, krishna) |
| 4 | lastName | high | [4] values are surnames (Rupera, Kumar, Balakrishnan, khan, miglani) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (7676662525, 9099002463, 9666698666, 9999999999) |
Notes: File has no header row. Column [0] contains numeric user IDs (skip). Columns [2], [5], [6], [7], [8], [9], [10] are predominantly empty in sample rows; [5] shows 'Yatra.Com' (company name, skip); [7] shows 'Gurgoan' (likely city but too sparse to confidently map). Total 12 columns; 4 contain identifiable PII.
86.csv · 5 columns · 99,495 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbol, standard email format with common email providers (gmail.com) |
| 2 | suffix | high | [2] header empty but values are 'Mr', a salutation/title suffix |
| 3 | firstName | high | [3] header empty but values are common given names (Shalender, MANJIT, Prakash, Madhuri, Dipak) |
| 4 | lastName | high | [4] header empty but values are family names (Sharma, Wadhwani, Salgaonkar, Sarda) |
| 11 | phone | high | [11] values are 10-digit numbers consistent with Indian mobile phone format (9822264981, 9867683858, etc.) |
Notes: Yatra 2019 breach (Indian travel booking platform). File structure suggests 12 columns total. Column [0] appears to be an auto-generated numeric user_id (skip). Columns [5]-[10] are mostly empty in the sample and mapped as skip; they likely hold address fields (company name, address components, state) and, if populated in the full dataset, would map to address1, city, state, country, and zip.
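Whether columns [5]-[10] are truly empty or merely blank in the sampled rows can be settled by measuring each column's fill rate over the whole file rather than a handful of rows. A minimal sketch, assuming rows are already parsed into lists of strings (for example with Python's `csv.reader`); `fill_rates`, `sparse_columns`, and the 5% cutoff are hypothetical.

```python
def fill_rates(rows, n_cols):
    """Fraction of rows with a non-blank value at each column index."""
    counts = [0] * n_cols
    for row in rows:
        for i in range(n_cols):
            # Short rows are treated as blank in their missing columns.
            if i < len(row) and row[i].strip():
                counts[i] += 1
    total = len(rows) or 1  # avoid dividing by zero on an empty file
    return [c / total for c in counts]


def sparse_columns(rows, n_cols, min_fill=0.05):
    """Column indices whose fill rate falls below min_fill (hypothetical cutoff)."""
    return [i for i, rate in enumerate(fill_rates(rows, n_cols)) if rate < min_fill]
```

Running this over all ~99,000 rows, instead of a sample, would show whether a column is empty throughout or populated only for some users.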
87.csv · 5 columns · 99,330 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header is an email address; values contain @ symbol and are clearly email addresses from Indian providers (gmail.com, yahoo.com, yahoo.co.in, rediffmail.com) |
| 2 | suffix | high | [2] values are 'Mrs.' and similar salutations/titles |
| 3 | firstName | high | [3] values are common given names (Ravi, Sesha, UDAY, avais) |
| 4 | lastName | high | [4] values are surnames/family names (Vadrevu, Kaushik, KAMAT, ahmmed) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers (8168014290, 8104192360, 9676142910, etc.) |
Notes: Yatra 2019 travel booking breach. Column [0] is numeric user ID (skip). Columns [5-10] are empty in sample rows. Columns [3] and [4] sometimes contain full names or single names, but are mapped as firstName/lastName based on context and the breach description, which mentions 'first and last names' fields.
88.csv · 9 columns · 99,329 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbol and are clearly email addresses (gmail.com, yahoo.co.in domains) |
| 3 | firstName | high | [3] header pattern suggests given name, values are common Indian first names (Dhiarj, RUshi, Amit, ramakrishna) |
| 4 | lastName | high | [4] values are surnames (Patel, Kumar, reddy) paired with firstName column |
| 5 | address1 | high | [5] values include street-level addresses (Yatra Office) |
| 7 | city | high | [7] values are Indian cities (Gurgaon) |
| 8 | state | high | [8] values are Indian states/provinces (Haryana) |
| 9 | country | high | [9] values are country names (India) |
| 10 | zip | high | [10] values are Indian postal codes/PIN codes (122003) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9766032846, 9554629112, 9999999999) |
Notes: Yatra travel booking platform breach from India. Column [0] is numeric user_id (skip). Columns [2] and [6] are empty (skip). Remaining columns map to core PII: email, name, address components, and phone.
89.csv · 10 columns · 99,526 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is a sample email; all values contain @ signs and are valid email addresses |
| 2 | suffix | high | [2] values are 'Mr.' — salutation/title indicator |
| 3 | firstName | high | [3] values are single given names (Shaik, sambath, Ganesh) |
| 4 | lastName | high | [4] values are surnames (Chand, koshy, Venkatraman) |
| 5 | address1 | high | [5] values are street addresses (Yatra Office) |
| 7 | city | high | [7] values are Indian cities (Gurgaon) |
| 8 | state | high | [8] values are Indian states (Haryana) |
| 9 | country | high | [9] values are country names (India) |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (122003) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers (8805477500, 9394706583, 9629178770) |
Notes: Yatra 2019 breach — Indian travel platform. 13 total columns. Column [0] is numeric user_id (skip). Columns [6] and [12] are empty (skip). Mapped 10 PII columns: email, suffix, firstName, lastName, address1, city, state, country, zip, phone.
9.csv · 12 columns · 99,258 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbol and email domains (gmail.com, yahoo.co.in); clearly email addresses |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles (Mr) |
| 3 | firstName | high | [3] values are common given names (Vinod, Viraf, Rahul, JAYANT, Manish, Nipom) |
| 4 | lastName | high | [4] values are family names (Gandhi, Chinoy, Raghavan, KUMAR, Jaiswal, Boruah) |
| 5 | address1 | high | [5] values contain street/building addresses (E2/506 Bharat Nagar, #218 Rich field appt., etc.) |
| 6 | address2 | high | [6] values contain secondary address components (342 Grant Road, ORR Marathahalli, etc.) |
| 7 | city | high | [7] values are Indian city names (Mumbai, BHARUCH, Bangalore, Guwahati) |
| 8 | state | high | [8] values are Indian state names (Gujarat, karnataka, Assam) |
| 9 | country | high | [9] values are country codes (IND, IN) representing India |
| 10 | zip | high | [10] values are 6-digit Indian postal codes (400007, 392001, 560037, 781003) |
| 11 | phone | high | [11] values are 10-digit Indian mobile phone numbers (9322230917, 9820636090, etc.) |
| 12 | phone | high | [12] values are 10-digit Indian alternate phone numbers (9820636090, 8040998026, etc.); duplicate phone column |
Notes: 13 columns total, 12 contain PII. Column [0] is numeric user_id (skip). File is structured user account records from Yatra.com breach with Indian user data. Columns [11] and [12] are both phone numbers (primary and alternate mobile).
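Because [11] and [12] both map to `phone`, and the alternate column sometimes repeats the primary value (the same number appears in both sample lists for this file), a consumer of these records would likely merge the two into one deduplicated list. A minimal sketch; `merge_phones` is hypothetical, and comparing numbers by their digits only is an assumption about how formatting differences should be handled.

```python
def merge_phones(primary, alternate):
    """Combine primary and alternate phone values, dropping blanks and duplicates.

    Non-digit characters (spaces, hyphens) are stripped before comparison;
    the primary number, if present, stays first.
    """
    merged = []
    for value in (primary, alternate):
        digits = "".join(ch for ch in (value or "") if ch.isdigit())
        if digits and digits not in merged:
            merged.append(digits)
    return merged
```

For example, `merge_phones("98765 43210", "9876543210")` yields a single entry, since both values reduce to the same digits.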
90.csv · 4 columns · 99,450 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ symbol and match email format (gmail.com, yatra.com, yahoo.in domains) |
| 3 | firstName | high | [3] Values are single given names (mohd, Padma, krishna) |
| 4 | lastName | high | [4] Values are surnames (faisal, BHIMIREDDY, Ghadhi, kumar) |
| 11 | phone | high | [11] All values are 10-digit numbers consistent with Indian mobile phone format |
Notes: File contains 12 columns. Column [0] is a numeric user ID (skip). Column [2] contains a salutation/title (Mr.); it could map to suffix, but its values are sparse and it is excluded under the PII-only instruction. Columns [5-10] are empty in all sample rows (skip). Breach context confirms Indian travel booking platform with Indian addresses, emails, and phone numbers.
91.csv · 4 columns · 99,483 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and match email format (gmail.com, yahoo.co.in, rediffmail.com, sbi.co.in) |
| 3 | firstName | high | [3] values are common Indian given names (Chandrasekhar, Umesh, Rohit, Anju, Acharya) |
| 4 | lastName | high | [4] values are common Indian surnames (Reddy, Pati, Bhardwaj, Sawai, Ashwin) |
| 11 | phone | high | [11] values are 10-digit numbers matching Indian mobile phone format (7066820264, 9989131113, 8468895380, etc.) |
Notes: 13 columns total; 4 contain PII. Column [0] appears to hold numeric user IDs (skip). Columns [2], [5], [6], [7], [8], [9], [10], [12] are empty or contain no readable data. Breach context confirms Indian travel platform with Indian addresses, phone numbers, and email providers.
92.csv · 10 columns · 99,463 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and email domains (gmail.com, yahoo.co.in); clear email addresses |
| 2 | suffix | high | [2] values are 'Mr.' and empty strings; salutation/title indicator |
| 3 | firstName | high | [3] header position and values are common given names (Anita, Amrita, lindaj, Satish, ilangovan) |
| 4 | lastName | high | [4] header position after firstName; values are surnames (Behera, Bais, bos, Sasanapuri, ramakrishnan) |
| 5 | address1 | high | [5] street/mailing address field; contains 'Yatra Office' and mostly empty values |
| 7 | city | high | [7] values are Indian city names (Gurgaon); city field in standard address structure |
| 8 | state | high | [8] values are Indian state names (Haryana); state field in standard address structure |
| 9 | country | high | [9] values are 'India'; country field |
| 10 | zip | high | [10] values are Indian PIN codes (6 digits: 122003); postal code field |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9925224870, 9952400470, etc.); mobile phone field |
Notes: Yatra travel booking platform breach (2019). 13 columns total, 10 contain PII. Column [0] is numeric user ID (skip). Columns [6] and [12] are empty (skip). Indian addresses confirmed by PIN codes, Indian phone numbers, Indian cities/states, and Indian email providers.
93.csv · 4 columns · 99,346 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ signs and are valid email addresses (gmail.com, rediffmail patterns consistent with Indian breach context) |
| 3 | firstName | high | [3] Header absent but values are common given names (Vamsi, SOWMYA, Gokul, sheetal, Debiprasanna) |
| 4 | lastName | high | [4] Header absent but values are common surnames (Krishna, KUMAR, Sampath, sharma, Mahanta) |
| 11 | phone | high | [11] 10-digit numbers matching Indian mobile phone format (9790712356, 9840402744, etc.) |
Notes: 13 columns total. [0] is internal user_id (skip). [2], [5], [6], [7], [8], [9], [10], [12] are empty or non-PII (skip). Yatra-2019 breach context confirmed: Indian phone numbers, Indian email providers, consistent with travel platform user records.
94.csv · 12 columns · 99,143 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header is an email address; all values contain @ symbol and are valid email addresses |
| 2 | suffix | high | [2] header 'Mr'/'Ms', values are salutations (Mr, Ms) |
| 3 | firstName | high | [3] values are given names (Arumugaperumal, Raviprakash, Amin, mayank, Kanupriya, jayesh) |
| 4 | lastName | high | [4] values are family names (chelliah, Baruah, MERCHANT, shah, Bhardwaj, parmar) |
| 5 | address1 | high | [5] header contains street address, values are building/street addresses with street numbers and names |
| 6 | address2 | high | [6] values are neighborhoods/sub-localities (Ambur, SANTACRUZ, Janakpuri, Mirchandani Gardens) |
| 7 | city | high | [7] header 'city', values are Indian city names (Ambasumdaram, guwahati, MUMBAI, New Delhi) |
| 8 | state | high | [8] header 'state', values are Indian state names (Tamilnadu, Assam, Maharashtra, Delhi) |
| 9 | country | high | [9] all values are 'IND' (India country code) |
| 10 | zip | high | [10] header contains postal code, values are 6-digit Indian PIN codes |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9787005915, 9864012243, etc.) |
| 12 | phone | high | [12] alternate phone number column, values are phone numbers (some blank, some 10-digit) |
Notes: 13 columns total. Column [0] is a numeric user ID (skip). All other columns map to PII fields. This is Indian travel booking platform data with complete address records and phone numbers.
95.csv · 13 columns · 99,152 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] header '1993765', values are sequential numeric IDs (1993765-1993770), characteristic of auto-generated user_id/record_id |
| 1 | email | high | [1] header '[email protected]'; all values contain @ symbol and are valid email addresses from Indian providers (yahoo.in, gmail.com, rediffmail.com) |
| 2 | suffix | high | [2] header 'Mr', values are salutation titles (Mr, Mrs), classic suffix/title field |
| 3 | firstName | high | [3] header 'VASUDEVAN', values are common given names (Biju, Dinesh, vivek, PARAMITA, Satpal) |
| 4 | lastName | high | [4] header 'KRISHNAN', values are surnames (vatavathi, kumar, malik, DAS, Singh) |
| 5 | address1 | high | [5] header 'AIKKARASERIL PALLARIMANGALAM...', values are street addresses (full mailing addresses with building/locality details, Indian format) |
| 6 | address2 | high | [6] header 'AIKKARASERIL PALLARIMANGALAM...', values are secondary address components (locality/area names: REYAMI, LODI ROAD) |
| 7 | city | high | [7] header 'MAVELIKARA', values are Indian city names (ERNAKULAM, chennai, new delhi, BHILAI, Mumbai) |
| 8 | state | high | [8] header 'Kerala', values are Indian state names (Kerala, Tamilnadu, Delhi, Chhattisgarh, Maharashtra) |
| 9 | skip | high | [9] header 'IND', all values are 'IND' (constant country code), so the column carries no per-user information |
| 10 | zip | high | [10] header '690107', values are 6-digit Indian postal codes (PIN codes: 682011, 600104, 110085, 491006, 400028) |
| 11 | phone | high | [11] header '9328112572', values are 10-digit Indian mobile numbers (9746678670, 9840310039, 9891398331, 9163669229, 9827873816) |
| 12 | phone | high | [12] header '' (empty), values are 10-digit Indian phone numbers (9746678670, 9962443677, 9830168804, 9819217173, 9796742972), alternate/secondary phone field with some empty cells |
Notes: 13 columns total. Yatra-2019 travel booking platform breach containing Indian user account records. All phone numbers are Indian format (10 digits starting with 9). All addresses, cities, states, and postal codes are Indian. 11 PII columns mapped (email, suffix, firstName, lastName, address1, address2, city, state, zip, phone, phone). Column [9] is constant country code 'IND' (skip). Column [0] is sequential numeric user ID (skip).
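Column [9] here is skipped because every value is the constant 'IND'. Detecting such columns automatically is straightforward; the sketch below is hypothetical (`constant_columns` is not part of any named tool) and treats blank cells as missing rather than as a distinct value.

```python
def constant_columns(rows, n_cols):
    """Column indices whose non-blank values are all identical (e.g. a fixed country code)."""
    result = []
    for i in range(n_cols):
        # Collect distinct non-blank values; blanks are ignored, not counted.
        values = {row[i].strip() for row in rows if i < len(row) and row[i].strip()}
        if len(values) == 1:
            result.append(i)
    return result
```

Other files in this archive map an all-'IND' column [9] to `country` instead of skipping it (94.csv, for example), so whether a constant column is dropped or kept is a policy choice rather than something the data dictates.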
96.csv · 12 columns · 99,410 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs; email addresses from Gmail, IIT, corporate domains |
| 2 | suffix | high | [2] header/values are 'Mr' — salutation/title prefix |
| 3 | firstName | high | [3] values are given names: ARUN, VSKmurthy, mithilesh, MURALI, virendra |
| 4 | lastName | high | [4] values are family names: KUMAR, Balijepalli, prasad, dharan, pandey |
| 5 | address1 | high | [5] values are street addresses: '1A kannan avenue', 'IIT Bombay', '46assamrifle', 'C-1/202, NILGIRI GARDEN' |
| 6 | address2 | high | [6] values are secondary address components: apt/suite/unit lines like 'c/o99a.p.o', 'SECTOR 24, AMRA ROAD, C.B.D.BELAPUR' |
| 7 | city | high | [7] values are Indian cities: chennai, Mumbai, tezpur, delhi, NAVI MUMBAI |
| 8 | state | high | [8] values are Indian states: Tamil Nadu, Maharashtra, Assam, Delhi |
| 9 | country | high | [9] values are country codes: IND, IN (India) |
| 10 | zip | high | [10] values are 6-digit Indian postal codes: 600063, 400076, 784110, 110096, 400614 |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers: 7299926890, 9967178363, 9864806036, 9970050045 |
| 12 | phone | high | [12] values are 10-digit Indian phone numbers (alternate); same format as [11] |
Notes: 13 columns total. Column [0] is numeric user_id (internal identifier, skipped). Columns [11] and [12] both map to phone as they represent primary and alternate phone numbers. All data is Indian in origin (Yatra travel platform breach). No DOB, SSN, username, or password fields present in sample.
97.csv · 12 columns · 51,771 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' contains @ symbol; all values are email addresses from Indian providers (yahoo.co.in, yahoo.com, gmail.com) |
| 2 | suffix | high | [2] header 'Mr'; all values are salutations/titles (Mr, Mrs, Ms) |
| 3 | firstName | high | [3] header 'MOHIT'; values are given names (Mohit, Sonaal, shubhamoy, wrickban, vinoy, Satish) |
| 4 | lastName | high | [4] header 'GUPTA'; values are family names (Gupta, Goel, Mukherjee, mazumdar, vincent, Mishra) |
| 5 | address1 | high | [5] street addresses: 'E-165 GREATER KAILASH PART-2', 'Virat House', 'madhpur', 'flat no.-104.shahi palace' |
| 6 | address2 | high | [6] secondary address lines: '49/1 Kishanpur', 'near kulharia complex.ashok rajpath.' — continuation of address blocks |
| 7 | city | high | [7] Indian cities: NEW DELHI, Dehradun, hyderabad, patna |
| 8 | state | high | [8] Indian states/territories: DELHI, Uttarakhand, 'Andaman and Nicobar Islands', Bihar |
| 9 | country | high | [9] country codes: 'IN', 'IND' (India) |
| 10 | zip | high | [10] Indian PIN codes: 110048, 248001, 123456, 800001 |
| 11 | phone | high | [11] Indian mobile numbers (10 digits starting with 9): 9810779217, 9411570350, 8939900215, 9830411474 |
| 12 | phone | high | [12] alternate phone numbers (mix of mobile and landline): 9810779217, 1147020804, 2677097, 7923290121 |
Notes: Yatra 2019 travel booking platform breach. 13 columns total; all except column [0] contain PII. Column [0] (user IDs) is skipped automatically as an internal identifier. Data is entirely Indian: addresses, phone numbers, email providers (yahoo.co.in, gmail.com), cities (Delhi, Dehradun, Hyderabad, Patna), and states (Delhi, Uttarakhand, Bihar) confirm an Indian user base. Columns [5] and [6] together form complete street addresses. Columns [11] and [12] are both phone fields; [11] appears to be the primary number and [12] an alternate contact.
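Country values across these files appear as 'India', 'IND', and 'IN'. A hedged sketch of normalizing them to the ISO 3166-1 alpha-2 code `IN`; the alias table and `normalize_country` are hypothetical and cover only the spellings seen in this archive.

```python
# Hypothetical alias table: only the spellings observed in these files.
COUNTRY_ALIASES = {"in": "IN", "ind": "IN", "india": "IN"}


def normalize_country(value):
    """Map 'India', 'IND', or 'IN' (any case) to 'IN'; otherwise return the stripped input."""
    cleaned = value.strip()
    return COUNTRY_ALIASES.get(cleaned.lower(), cleaned)
```

Unrecognized values pass through unchanged rather than being guessed, so any unexpected spelling remains visible for review.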
98.csv · 11 columns · 10,440 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' is an email address; all values contain @ and are valid email addresses |
| 2 | suffix | high | [2] header 'Mr'; values are salutations (Mr, Mr., etc.) |
| 3 | firstName | high | [3] header 'shaikh'; values are common given names (ramasamy, jamie, sridher, sandeep, gurpreet) |
| 4 | lastName | high | [4] header 'mdrehan'; values are surnames (muthukrishnan, chithung, kumar, singh) |
| 5 | address1 | high | [5] values are street addresses (v.p.o- wadala veeram, Yatra.Com 1101-03, New BEL Road, 203-A sector) |
| 7 | city | high | [7] values are Indian cities (amritsar, gurgoan, bangalore, hsr, bhopal) |
| 8 | state | high | [8] values are Indian states (Punjab, Karnataka, Haryana, Madhya Pradesh) |
| 9 | country | high | [9] values are country codes/names (India, IND) |
| 10 | zip | high | [10] values are 6-digit Indian postal codes (143601, 560094, 125005, 462023) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9892667779, 9840013010, 8586054672, 9597165868) |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers; alternate phone field |
Notes: 13 columns total, 11 contain PII. Column [0] is a numeric user ID (skip). Column [6] is empty (skip). Yatra 2019 Indian travel booking breach with typical user profile data: name, email, address, and phone numbers.
99.csv · 10 columns · 96,105 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ symbol and email domain patterns (gmail.com, yahoo.com, etc.) |
| 2 | suffix | high | [2] Values are salutations/titles: 'Mr', 'Mr.', 'Mrs' — matches suffix field type |
| 3 | firstName | high | [3] Values are individual given names: 'SUBASH', 'prasad', 'kareem', 'Talat', 'jinesh' |
| 4 | lastName | high | [4] Values are family names: 'SUBRAMANIYAM T', 'jadhav', 'mullah', 'Taj', 'sheth' |
| 5 | address1 | high | [5] Values are street addresses: 'At Post:- Sakurede' — primary address line |
| 6 | zip | high | [6] Values are mostly 6-digit Indian PIN codes ('412303'); at least one row holds a 10-digit phone number instead |
| 7 | city | high | [7] Values are Indian city names: 'chennai' |
| 8 | state | high | [8] Values are Indian state names: 'Tamilnadu', 'Maharashtra' |
| 9 | country | high | [9] Values are 'India' — country field |
| 11 | phone | high | [11] All values are 10-digit Indian mobile numbers: '8123740534', '9003054257', '9821842298', '9654990353', '9901665970' |
Notes: 13 columns total, 10 contain PII. Column [0] contains numeric user IDs (skip). Columns [10] and [12] are empty/non-PII (skip). Column [6] appears to hold mixed data (6-digit PIN codes in some rows, 10-digit phone numbers in others); it is mapped as zip based on header position and primary values. Breach context confirms Indian travel booking platform with user account records.
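Column [6] in 99.csv mixes PIN codes and phone numbers, so a zip mapping based on a few sample rows may misclassify part of the column. A minimal sketch of counting value shapes across the full column to quantify the mix; `pattern_mix` is hypothetical and classifies purely by digit count, so it cannot tell a 10-digit landline from a mobile number.

```python
import re

PIN_RE = re.compile(r"^\d{6}$")     # PIN-like: exactly 6 digits
PHONE_RE = re.compile(r"^\d{10}$")  # phone-like: exactly 10 digits


def pattern_mix(values):
    """Count PIN-like, phone-like, and other non-blank values in a column."""
    counts = {"pin": 0, "phone": 0, "other": 0}
    for raw in values:
        v = raw.strip()
        if not v:
            continue  # blanks are not counted in any bucket
        if PIN_RE.match(v):
            counts["pin"] += 1
        elif PHONE_RE.match(v):
            counts["phone"] += 1
        else:
            counts["other"] += 1
    return counts
```

If the `phone` share turned out to be non-trivial, splitting the column by shape before mapping would be safer than assigning the whole column to zip.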
tuser__1_.csv · 12 columns · 198,934 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] All values contain @ symbol and match standard email format (gmail.com, yahoo.com, tatkal.in, etc.) |
| 3 | suffix | high | [3] Header 'Mr.' and values are salutation/title markers (Mr, Mr.) |
| 4 | firstName | high | [4] Common given names (Deepak, Milind, Jaya, Upendra, Sharique) matching Indian naming patterns |
| 5 | lastName | high | [5] Surname values (Kulkarni, Krishnan, Singh, Ahmad) following first names in typical name order |
| 6 | address1 | high | [6] Street addresses with building/house numbers and locality names (484/75 Sawali, F-13 Ravindra Bhawan, HN 933 Sunita Nagar) |
| 7 | address2 | high | [7] Secondary address components (neighborhood/area names: Mitramandal Colony, Surabhi Nagar, Wadgaonsheri) |
| 8 | city | high | [8] Indian city names (Pune, Haridwar, Kochi) |
| 9 | state | high | [9] Indian state abbreviations and names (Maharashtra, Uttarkhand) |
| 10 | country | high | [10] Country codes (IND, IN) indicating India |
| 11 | zip | high | [11] Indian postal/PIN codes (411009, 247667, 411014) — 6-digit format standard for India |
| 12 | phone | high | [12] Indian mobile phone numbers (10 digits, starting with 7-9 as per Indian telecom standards: 9428330969, 9765361020, 9811178409, 9820296136) |
| 13 | phone | high | [13] Alternate/secondary Indian mobile phone numbers (10-digit format: 2024459307, 9446740168) |
Notes: Yatra.com travel platform breach (2019). 14 columns total, 12 contain searchable PII. Column [0] is numeric user ID (skip). Column [2] appears to be an internal username/platform identifier (skip). File has no header row. All addresses, phone numbers, and email providers confirm Indian origin as documented in breach context.
tuser__2_.csv · 10 columns · 99,035 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header is an email address; values contain @ signs and are clearly email addresses (yahoo.co.in, gmail.com) |
| 2 | suffix | high | [2] header 'Mr', values are salutations/titles (Mr) |
| 3 | firstName | high | [3] header 'shyamal', values are common given names (Vijay, jitendra, Bhalchandra, ajay, Aditya) |
| 4 | lastName | high | [4] header 'bandyopadhyay', values are surnames (Vijay, jain, Badve, oberoi, Aditya) |
| 5 | address1 | high | [5] header 'P-12 nilachal complex...', values are street addresses (P-12 nilachal complex, 504 Shanti Kamal Bldg, House No-2328) |
| 6 | address2 | high | [6] header blank, values appear to be secondary address info (Dr. B. A. Road, Chinchpokli) |
| 7 | city | high | [7] header 'kolkata', values are Indian city names (Mumbai, tilak nagar, Mohali) |
| 8 | state | high | [8] header 'West Bengal', values are Indian states (Maharashtra, Delhi, Punjab) |
| 10 | zip | high | [10] header '700103', values are 6-digit Indian PIN codes (400012, 110018, 160065) |
| 11 | phone | high | [11] header '9546274404', values are 10-digit Indian mobile numbers (9833222314, 9819739937, 8939136619) |
Notes: 13 columns total, 10 contain PII. Column [0] is numeric user_id (skip). Column [9] is country code 'IND' (skip). Column [12] is empty (skip). All PII fields mapped including Indian addresses, phone numbers, and email addresses consistent with Yatra.com travel booking platform breach context.
tuser__3_.csv · 12 columns · 99,150 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ symbols and match email format ([email protected], [email protected], etc.) |
| 2 | suffix | high | [2] Header 'Mr', consistent salutation/title across all rows |
| 3 | firstName | high | [3] Values are given names (SWAMINATHAN, Venugopal, aswin, asif, TANMOY, Chelaram) |
| 4 | lastName | high | [4] Values are family names (LANKA, Thangamuthu, fernandis, masud, CHOWDHURY, Choudary) |
| 5 | address1 | high | [5] Street addresses (#28-7-6 ANNADANA SAMAJAM ROAD, 24 Bharathi Nagar, orthodox church center sector 10 a, BARRACKPORE) |
| 6 | address2 | high | [6] Secondary address component (Opp: AMBEDKAR BHAVAN ARUNDELPET, Kovaipudur) |
| 7 | city | high | [7] Indian city names (VIJAYAWADA, Coimbatore, navi mumbai, KOLKATA) |
| 8 | state | high | [8] Indian states (Andhra Pradesh, Tamilnadu, Maharashtra, West Bengal) |
| 9 | country | high | [9] Country codes (IND, IN) representing India |
| 10 | zip | high | [10] Indian PIN codes (520002, 641042, 400703, 700122) |
| 11 | phone | high | [11] 10-digit Indian mobile numbers (9347406048, 9894019632, 9210920151, etc.) |
| 12 | phone | high | [12] Alternate phone numbers, 10-digit format matching Indian numbering, some empty cells allowed |
Notes: 13 columns total, 12 contain PII. Column [0] (numeric IDs like 2569869) is skipped as internal user_id. The data is consistent with the Indian travel booking platform (Yatra): Indian addresses, phone numbers, email providers (yahoo.co.in, rediffmail.com), and city/state references.
tuser__4_.csv · 12 columns · 99,200 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header '[email protected]' contains @ signs; all values are valid email addresses |
| 2 | suffix | high | [2] values are 'Mr', 'Ms' — salutation/title indicators |
| 3 | firstName | high | [3] values are given names: 'Venugopal', 'Manoj', 'Molle', 'Anil', 'sateeshkumar' — first name column |
| 4 | lastName | high | [4] values are family names: 'Joshi', 'Sethi', 'kumar', 'Kulkarni' — last name column |
| 5 | address1 | high | [5] values are street/building details: 'sotla', 'Shop No 5', 'delhi', '201 Batavia Chambers' — primary address line |
| 6 | address2 | high | [6] values are secondary address components: 'Sector 14', 'Kumarakrupa Road' — secondary address line |
| 7 | city | high | [7] values are Indian city names: 'hoshiarpur', 'Gurgaon', 'Delhi', 'Bangalore' |
| 8 | state | high | [8] values are Indian states: 'Punjab', 'Haryana', 'Delhi', 'Karnataka' |
| 9 | country | high | [9] values are country codes: 'IND', 'IN' — country identifier |
| 10 | zip | high | [10] values are 6-digit postal codes: '144210', '122001', '110001', '560001' — Indian PIN codes |
| 11 | phone | high | [11] values are 10-digit phone numbers: '9716203040', '7666519936' — mobile numbers |
| 12 | phone | high | [12] values are 10-digit phone numbers: '9999410009' — alternate/secondary phone |
Notes: Yatra.com 2019 breach. Column [0] is numeric user ID (skipped). All other columns contain valid PII. Indian addresses, email providers (gmail.com, yahoo.co.in), and phone format (10 digits starting with 7-9) confirm Indian travel booking records.
tuser__5_.csv · 12 columns · 98,921 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ symbols and email domain patterns (gmail.com, yahoo.co.in, hotmail.com) |
| 2 | suffix | high | [2] values are 'Mr', 'Mrs': salutations/titles |
| 3 | firstName | high | [3] values are given names (Anjana, Pranay, viswanathan, sajad, Gopal) |
| 4 | lastName | high | [4] values are family names (Naskar, Sinha, vijaya, sofi, Raju) |
| 5 | address1 | high | [5] values are street addresses (J.P.Nagar, F 401 SBOQ) |
| 6 | address2 | high | [6] values are secondary address components (Sector 17, Vashi) |
| 7 | city | high | [7] values are Indian city names (Bangalore, Navi Mumbai) |
| 8 | state | high | [8] values are Indian state abbreviations/names (Karnataka, Maharashtra) |
| 9 | country | high | [9] values are country codes (IN, IND) — all India, consistent with breach context |
| 10 | zip | high | [10] values are Indian PIN codes (560078, 400705) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (8600520494, 9433498635, etc.) |
| 12 | phone | high | [12] values are 10-digit alternate phone numbers (4422243847) |
Notes: 13 columns total. Column [0] is numeric user ID (skip). All addresses, phone numbers, and email domains are Indian in origin, consistent with Yatra 2019 breach context. Suffix field indicates formal salutation data.
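Country values are inconsistent across these files ('India', 'IND', 'IN', and in later files 'GB' and 'AU'). One way to normalize them to ISO 3166-1 alpha-2 codes, sketched in Python; the alias table is a hypothetical example covering only the values listed on this page, and unknown values are returned as `None` rather than guessed.

```python
from typing import Optional

# Hypothetical alias table built from the country values seen on this page.
COUNTRY_ALIASES = {
    "india": "IN",
    "ind": "IN",
    "in": "IN",
    "gb": "GB",
    "au": "AU",
}


def normalize_country(value: str) -> Optional[str]:
    """Map a raw country value to an ISO 3166-1 alpha-2 code, or None if unknown."""
    return COUNTRY_ALIASES.get(value.strip().lower())


print(normalize_country("IND"))
```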
tuser__6_.csv · 12 columns · 98,887 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] values contain @ signs and are email addresses (rediffmail.com, gmail.com, hotmail.com) |
| 2 | suffix | high | [2] header 'MR', values are titles (Mr, MR) |
| 3 | firstName | high | [3] values are common given names (RAHUL, Jagadish, Krishnamurthy, shaikh, bhanumurthy, Rajeev) |
| 4 | lastName | high | [4] values are surnames (NIGAM, Prabhala, Viswanathan, sayeed, govindu, Sharma) |
| 5 | address1 | high | [5] values contain street addresses and building details (Basement 25Vishwas Market, 267 jawahar nagar, 7/104 Prithvi Brahmand) |
| 6 | address2 | high | [6] values contain secondary address components (Ghodbunder Road, landmarks, street names) |
| 7 | city | high | [7] values are Indian city names (Lucknow, moulali, THANE, Bangalore, nagpur) |
| 8 | state | high | [8] values are Indian states (Uttar Pradesh, Andhra Pradesh, MAHARASHTRA, Karnataka) |
| 9 | country | high | [9] values are country codes (IN, IND) indicating India |
| 10 | zip | high | [10] values are 6-digit Indian PIN codes (226020, 500040, 400607, 560032) |
| 11 | phone | high | [11] values are 10-digit Indian mobile numbers (9415049287, 9903138911, 9654426143) |
| 12 | phone | high | [12] values are 10-digit Indian mobile numbers, alternate phone field (9905417398, 9945567054, 9441231084) |
Notes: Yatra 2019 breach. Column [0] is numeric user_id (skip). All other columns contain searchable PII from Indian travel booking records.
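The zip rows treat a six-digit value as an Indian PIN code. Indian PIN codes are six digits whose first digit is never 0, so a quick format check is possible. A Python sketch, where `is_indian_pin` is a hypothetical helper; the same check would also flag the non-Indian postcodes (UK and Australian formats) that appear in some later files.

```python
import re

# Six digits, first digit 1-9 (Indian PIN codes never start with 0).
PIN_PATTERN = re.compile(r"[1-9]\d{5}")


def is_indian_pin(value: str) -> bool:
    """Return True if `value` has the Indian PIN code format.

    Spaces are removed first, so a value written as '411 045' also passes.
    """
    return PIN_PATTERN.fullmatch(value.replace(" ", "")) is not None


print(is_indian_pin("560001"))
```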
tuser__7_.csv · 12 columns · 98,763 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] Values contain @ signs and match standard email format (gmail.com, rediffmail.com, yahoo.com) |
| 2 | suffix | high | [2] First value is 'Mr'; all values are titles/salutations |
| 3 | firstName | high | [3] Column position and values match a first name: common Indian given names (Siddhartha, Vanshika, SATYENDRA, etc.) |
| 4 | lastName | high | [4] Column position and values match a last name: common Indian surnames (Bhargava, Saksena, JAIN, SUBRAMANIAM, etc.) |
| 5 | address1 | high | [5] Values match street address format (D-1, 11A 3rd Cross 1st Main, D-No 38-34-1 marripalem, Milan apartment thatipur) |
| 6 | address2 | high | [6] Values are locality/area names (Defence Colony, KHB Colony Basaveshwara Nagar, daman) |
| 7 | city | high | [7] Values are Indian city names (New Delhi, Bangalore, Visakhapatnam, Gwalior, daman) |
| 8 | state | high | [8] Values are Indian states/territories (Delhi, Karnataka, Andhra Pradesh, Madhya Pradesh, Daman and Diu) |
| 9 | country | high | [9] Values are country codes/names (IND, India) consistent with breach context |
| 10 | zip | high | [10] Values are 6-digit Indian PIN codes (110024, 560079, 530018, 474011, 396210) |
| 11 | phone | high | [11] Values are 10-digit Indian mobile numbers (8126968932, 8600600843, 9879966325, etc.) |
| 12 | phone | high | [12] Values are Indian phone numbers for an alternate/secondary contact; most are 10 digits, though some are 7 to 9 digits |
Notes: Yatra 2019 breach data. Column [0] is numeric user ID (skip). 12 PII columns identified, together forming complete user profiles with email, name, address, and phone contact information. All addresses and phone numbers are Indian, consistent with the breach context.
tuser__8_.csv · 11 columns · 98,470 rows
File structure
Format: CSV·Delimiter: comma·Has header: yes·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 1 | email | high | [1] header 'email' (inferred), values contain @ signs and email domains (gmail.com, yahoo.co.in, whsmith.co.uk) |
| 2 | suffix | high | [2] header 'salutation/title', values are honorifics (Mr, Mrs, Dr) |
| 3 | firstName | high | [3] header 'first_name' (inferred), values are common given names (Mathew, Shailendra, Ankur, SANTOSH) |
| 4 | lastName | high | [4] header 'last_name' (inferred), values are surnames (Edes, Singh, Bhatnagar, BHARTI, ARUL) |
| 5 | address1 | high | [5] header 'address1' (inferred), values are street addresses (65 Riverpark Drive, 229 G.Floor Mandakini Enclave, Flat 131) |
| 6 | address2 | high | [6] header 'address2' (inferred), values are secondary address components (C-58/21 Sector 62, BY PASS ROAD) |
| 7 | city | high | [7] header 'city' (inferred), values are Indian city names (New Delhi, Noida, MUMBAI, MADURAI, surat) |
| 8 | state | high | [8] header 'state' (inferred), values are Indian states (Delhi, UP, Maharashtra, Tamilnadu, Gujarat) |
| 9 | country | high | [9] header 'country' (inferred), values are country codes (GB, IND, IN) |
| 10 | zip | high | [10] header 'zip/PIN' (inferred), values are postal codes (SL7 1QT, 110019, 201301, 400005, Indian PIN formats) |
| 11 | phone | high | [11] header 'mobile' (inferred), values are 10-digit Indian mobile phone numbers (7792166476, 9873338291, 8010609426) |
Notes: Yatra 2019 breach (Indian travel platform). 13 columns total, 11 contain PII. Column [0] is numeric user_id (skip). Column [12] contains incomplete phone numbers/data (skip). Nearly all addresses, emails, and phone numbers are consistent with Indian geography and telecom formats; the sample also includes a UK record (country GB, UK-format postcode).
tuser__9_.csv · 13 columns · 98,929 rows
File structure
Format: CSV·Delimiter: comma·Has header: no·Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | [0] Sequential numeric user IDs (2816970, 2816971, etc.) — internal identifier pattern |
| 1 | email | high | [1] Values contain @ symbols and email domain patterns (gmail.com, yahoo.co.in, verizon.net, hotmail.com) |
| 2 | suffix | high | [2] Header 'Mr' and values are salutation/title prefixes (Mr, Mr, Mr) — common Indian address convention |
| 3 | firstName | high | [3] Common given names (SESHADEV, Anil, Luke, Ambarish, Sahil, mohit) — typical first name values |
| 4 | lastName | high | [4] Family names (RAY, Puri, Hay, Bhusari, Ravjit, wasson) — typical last name values |
| 5 | address1 | high | [5] Street-level addresses (332 khajor road, 155 Victoria Drive, Flat B-104 Prime Heights, house no. 323 sector-71 mohali, c-9/6 sector-8 rohini) |
| 6 | address2 | medium | [6] Suburb/locality names (Jimboomba, Sus Road Pashan) — secondary address component |
| 7 | city | high | [7] City/locality names (karol bagh, Logan City, PUNE, mohali, delhi) — Indian and international cities |
| 8 | state | high | [8] State names (Delhi, Queensland, Maharashtra, Punjab): Indian states plus an Australian state |
| 9 | country | high | [9] Country codes and names (India, IND, AU, IN) — country identifiers |
| 10 | zip | high | [10] Indian postal codes and international postcodes (110005, 4280, 411045, 160071, 110085) — ZIP/PIN patterns |
| 11 | phone | high | [11] Indian mobile numbers (9439559613, 9810013834, 9797186126, 9890923172, 9930022115) — 10-digit pattern starting with 9, typical Indian mobile format |
| 12 | phone | high | [12] Alternate phone numbers (9814361138, 1127941766) — secondary contact numbers |
Notes: Yatra.com 2019 breach. 13 columns total; 12 contain searchable PII (email, title, names, address, PIN/postcode, country, phone numbers). Column [0] is numeric user ID (skip). The suffix column holds formal salutations. The two phone columns hold primary and alternate contact numbers, as is common in Indian travel booking data.