DataCamp (datacamp.com)
Jan 27, 2017
A breach of DataCamp (datacamp.com), an online learning platform for data science and programming education. The dataset contains user account records including numeric user IDs, email addresses, bcrypt-hashed passwords, password reset tokens, sign-in counts, last sign-in timestamps and IP addresses, account creation timestamps, authentication tokens, full names, first names, last names, locations, education, biography, avatar file information, Coursera integration flags, Stripe/PayPal/Braintree customer IDs, payment method tokens, company names, group membership hashes, inviter IDs, and anonymous email flags. The earliest account creation dates are from May 2013, and the latest activity timestamps are from January 2017. The data includes accounts belonging to DataCamp co-founders Dieter De Mesmaeker and Jonathan Cornelissen. The archive was distributed via BreachForums.
Source files
Each file is listed with its column headers and the LLM's field-mapping reasoning, recorded during ingestion.
codecamp.txt · 5 columns · 760,598 rows
File structure
Format: CSV · Delimiter: comma · Has header: yes · Quote: "
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| 2 | email | high | [2] header 'email', values are clearly email addresses |
| 15 | fullName | medium | [15] header 'name', maps to fullName; values are NA in sample but header and context indicate full name field |
| 30 | username | high | [30] header 'slug', values are URL-safe username slugs (dieter, jonathancornelissen, dieterdm90-test) — searchable user identifiers |
| 41 | firstName | high | [41] header 'first_name', values are given names (Dieter, Jonathan) |
| 42 | lastName | high | [42] header 'last_name', values are family names (De Mesmaeker, Cornelissen) |
Notes: 44 columns total, 5 contain PII. encrypted_password is excluded per rules (password_hash pattern). IP address columns (current_sign_in_ip, last_sign_in_ip) are skipped per exclusion rules. company_name is skipped per company exclusion rule. authentication_token, reset_password_token are internal tokens — skipped. location, education, biography, marketing_biography are all NA in sample and too vague to map confidently. All timestamp, counter, flag, and payment ID columns are skipped.