appen.com
Mar 1, 2019
Database dump from Appen, a data annotation and AI training data company. The breach contains user account records including names, email addresses, bcrypt-hashed passwords, authentication tokens, phone numbers, company affiliations, sign-in metadata, and account creation timestamps. The data file is named after CrowdFlower, the platform (renamed Figure Eight in 2018) that Appen acquired in 2019. Records span approximately 2014 to early 2019.
Data found in this dataset
Source files
Expand any file to inspect its column headers and the LLM's field-mapping reasoning, recorded during ingestion.
Appen_BF__data__crowdflower.txt · 38 columns · 5,888,237 rows
File structure
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 0 | skip | high | UUID identifier field (id) |
| 1 | fullName | high | Full names of users (e.g., 'Denis kemei', 'Brian Ward') |
| 2 | email | high | Email addresses with @ symbol (e.g., '[email protected]') |
| 3 | password | high | bcrypt-hashed passwords starting with $2a$12$ prefix |
| 4 | skip | high | reset_password_token - security token, not PII |
| 5 | skip | high | reset_password_sent_at - timestamp metadata |
| 6 | skip | high | remember_created_at - timestamp metadata |
| 7 | skip | high | sign_in_count - internal counter |
| 8 | skip | high | current_sign_in_at - timestamp metadata |
| 9 | skip | high | last_sign_in_at - timestamp metadata |
| 10 | skip | high | current_sign_in_ip - login metadata |
| 11 | skip | high | last_sign_in_ip - login metadata |
| 12 | skip | high | failed_attempts - internal counter |
| 13 | skip | high | unlock_token - security token |
| 14 | skip | high | locked_at - timestamp metadata |
| 15 | skip | high | authentication_token - API/session token |
| 16 | skip | high | salt - password salt for hashing |
| 17 | skip | high | created_at - timestamp |
| 18 | skip | high | updated_at - timestamp |
| 19 | skip | high | email_verified_at - timestamp |
| 20 | skip | high | email_verification_sent_at - timestamp |
| 21 | skip | high | email_verification_token - security token |
| 22 | skip | medium | unverified_email - duplicate of verified email, metadata |
| 23 | phone | medium | phone_number field, mostly empty in samples |
| 24 | skip | high | company - employer affiliation, not core PII field |
| 25 | skip | high | email_subscriber - boolean flag |
| 26 | skip | high | title - honorific prefix (Mr, Mrs, Ms, Dr, Prof, Rev), no PII field exists per instructions |
| 27 | skip | high | roles_updated_at - timestamp |
| 28 | skip | high | quick_sign_up - boolean flag |
| 29 | skip | high | internal_contributor_created_at - timestamp |
| 30 | skip | high | external_contributor_created_at - timestamp |
| 31 | skip | high | requestor_created_at - timestamp |
| 32 | skip | high | resend_verification_email_count - counter |
| 33 | skip | high | identity_id - internal identifier |
| 34 | skip | high | disabled_at - timestamp |
| 35 | skip | high | terms_of_service_accepted_at - timestamp |
| 36 | skip | high | current_team_id - internal team identifier |
| 37 | skip | high | api_team_id - internal team identifier |
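
The `$2a$12$` prefix noted for column 3 follows the standard bcrypt string layout: a version tag, a two-digit cost factor, then a 22-character salt and a 31-character digest. As a minimal sketch, the check below splits such a string into its parts. The function name `describe_bcrypt` and the placeholder value are hypothetical and are not taken from the dataset or the ingestion pipeline.

```python
import re

# Standard bcrypt layout: $<version>$<cost>$<22-char salt><31-char digest>
BCRYPT_RE = re.compile(
    r"\$(?P<version>2[abxy]?)\$(?P<cost>\d{2})\$"
    r"(?P<salt>[./A-Za-z0-9]{22})(?P<digest>[./A-Za-z0-9]{31})"
)


def describe_bcrypt(value):
    """Return version, cost, and round count, or None if not bcrypt-shaped."""
    match = BCRYPT_RE.fullmatch(value)
    if match is None:
        return None
    cost = int(match.group("cost"))
    # The cost factor is a base-2 logarithm of the number of key-setup rounds.
    return {"version": match.group("version"), "cost": cost, "rounds": 2 ** cost}


# Made-up placeholder with the right shape (not a real hash from the dump).
placeholder = "$2a$12$" + "." * 53
print(describe_bcrypt(placeholder))  # {'version': '2a', 'cost': 12, 'rounds': 4096}
```

A cost of 12 means 4,096 key-setup rounds per password guess, which is why these values are treated as hashes and not as recoverable plaintext.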
Notes: This is a pipe-delimited database dump of Appen's user accounts table, not a traditional combo list. Extracted PII: fullName, email, bcrypt-hashed password, and phone where present. Most other fields are timestamps, tokens, or internal metadata. The title field contains honorifics (Mr, Dr, etc.) and is skipped per instructions because no PII field type exists for it. The phone_number field is present but mostly empty in the sample data.
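
The mapping in the table can be illustrated with a short sketch that splits one pipe-delimited row and keeps only the four mapped columns (1, 2, 3, and 23). This is not the ingestion pipeline's actual code. The names `FIELD_MAP`, `map_row`, and `parse_dump` are hypothetical, the sample row is synthetic, and the sketch assumes field values contain no literal pipe characters or quoting.

```python
import csv
import io

# Column index -> mapped field, per the table above; all other columns are skipped.
FIELD_MAP = {1: "fullName", 2: "email", 3: "password", 23: "phone"}
EXPECTED_COLUMNS = 38


def map_row(columns):
    """Return the mapped fields for one row, or None if the column count is off."""
    if len(columns) != EXPECTED_COLUMNS:
        return None
    record = {}
    for index, field in FIELD_MAP.items():
        value = columns[index].strip()
        if value:  # phone is mostly empty, so blank values are dropped
            record[field] = value
    return record


def parse_dump(text):
    """Yield mapped records from pipe-delimited text, skipping malformed rows."""
    # Assumption: no quoting or escaping, so every "|" is a field separator.
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    for columns in reader:
        record = map_row(columns)
        if record is not None:
            yield record


# Synthetic row: 38 columns, only the mapped name, email, and password filled in.
sample = [""] * EXPECTED_COLUMNS
sample[1] = "Example User"
sample[2] = "user@example.com"
sample[3] = "$2a$12$" + "." * 53
records = list(parse_dump("|".join(sample) + "\ntoo|short\n"))
print(records)
```

Rows with an unexpected column count are dropped instead of being repaired, which keeps a shifted row from mapping a token or timestamp into a PII field.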