topsy.com
Mar 15, 2014
A database dump that appears to come from Topsy, a social media analytics platform acquired by Apple in 2013. It contains user account records: names, email addresses, usernames, profile photo URLs, account metadata, and OAuth access tokens for Google and Facebook that were live at the time of the breach. Records are dated around March 2014.
Source files
Each file lists its column headers and the LLM's field-mapping reasoning, recorded during ingestion.
part2.json: 66 columns, 100,000 rows
File structure
Format: NDJSON
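NDJSON stores one JSON object per line, so dotted column names such as `google.data.displayName` presumably come from flattening nested objects. The following is a minimal sketch of that idea, not the actual ingestion code: the helper names `flatten` and `column_names` are hypothetical, the sample records are synthetic, and treating `$`-keyed wrappers (such as `$numberLong`) and arrays as single columns is an assumption.

```python
import json


def flatten(obj, prefix=""):
    """Yield (dotted_path, value) pairs for every leaf of a nested dict."""
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        # Assumption: wrappers like {"$numberLong": "1"} count as one value,
        # and empty dicts and arrays are kept as leaves rather than expanded.
        is_wrapper = isinstance(value, dict) and any(k.startswith("$") for k in value)
        if isinstance(value, dict) and value and not is_wrapper:
            yield from flatten(value, path)
        else:
            yield path, value


def column_names(lines):
    """Return the sorted set of dotted column names seen across NDJSON lines."""
    columns = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        columns.update(path for path, _ in flatten(json.loads(line)))
    return sorted(columns)


# Synthetic records shaped like the columns listed for this file.
sample = [
    '{"id": {"$numberLong": "1"}, "name": "Example User", '
    '"google": {"data": {"displayName": "Example User", "emails": []}}}',
    '{"id": {"$numberLong": "2"}, "username": "example", '
    '"twitter": {"data": {"screen_name": "example"}}}',
]
print(column_names(sample))
```

Because columns are collected across all lines, a column appears in the result even if only a few records carry it.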
| Source column | Mapped field | Confidence | LLM assessment |
|---|---|---|---|
| date | skip | high | record creation date, not PII |
| google.data.ageRange | skip | high | age range metadata |
| facebook.url | skip | high | profile URL |
| google.data.profile_url | skip | high | profile URL |
| microsoft | skip | high | nested provider object (Microsoft OAuth) |
| twitter.id | skip | high | Twitter user ID |
| google.data.displayName | fullName | high | display name same as full name |
| google.id | skip | high | provider user ID |
|  | skip | high | nested provider object |
| facebook.data.token_expired | skip | high | token metadata |
| google.data.objectType | skip | high | metadata |
| google.data.emails | skip | high | nested array; extract individual email values |
| permissions | skip | high | metadata array |
| facebook.data.profile_url | skip | high | profile URL |
| id | skip | high | internal numeric ID with $numberLong type |
| google.data.screen_name | username | high | social media screen name from Google profile |
| is_searchable | skip | high | boolean flag |
| google.data.gender | gender | high | key is 'gender', values are 'male', 'female' |
| google.data.image | skip | high | image URL |
| created | skip | high | record creation timestamp |
|  | skip | high | nested provider object containing OAuth tokens and profile data; nested objects require separate analysis |
| email_confirmed | skip | high | boolean flag |
| google.data.profile_img_url | skip | high | image URL |
| facebook.email | email | high | email within provider object (often blank for Facebook) |
| google.data.occupation | skip | high | occupation/employment data |
| name | fullName | high | contains full personal names like 'Van Harold', 'angel sadika khan', 'Lupita Alvarado' |
| facebook.data.name | fullName | high | full name from Facebook profile |
| username_changed | skip | high | metadata boolean flag |
| google.data.language | skip | high | language preference |
| _id | skip | high | MongoDB internal ID |
| google.data.email | email | high | email from provider data |
| updated | skip | high | record update timestamp |
| google.data.cover | skip | high | profile cover photo metadata |
| google.email | email | high | email address within provider object |
| twitter.token | skip | high | OAuth token |
| twitter.data.screen_name | username | high | Twitter handle |
| google.data.circledByCount | skip | high | social metric |
| twitter.secret | skip | high | OAuth token secret |
| google.data.token_expired | skip | high | token metadata |
| google.data.url | skip | high | profile URL |
| top | skip | high | boolean flag, not PII |
| google.url | skip | high | profile URL, not PII |
| provider | skip | high | OAuth provider name (google, facebook, twitter, microsoft) |
| google.data.etag | skip | high | API metadata |
| google.data.kind | skip | high | schema metadata |
| origins | skip | high | metadata array |
| facebook.id | skip | high | provider user ID |
| categories | skip | high | metadata array |
| facebook.data.email | email | high | email from Facebook provider data |
| key | skip | high | internal UUID key |
| email | email | high | key is 'email', values contain email addresses with @ symbol |
| subscribe | skip | high | boolean flag |
|  | skip | high | nested provider object; contains OAuth tokens and nested data |
| verified | skip | high | boolean flag |
| photo | skip | high | profile image URL, not PII field |
| google.token | skip | high | OAuth access token, not a PII field type in scope |
| blacklist | skip | high | boolean flag |
| facebook.data.token | skip | high | OAuth token |
| google.data.token | skip | high | OAuth token |
| google.data.name | fullName | high | full name from OAuth provider data |
| twitter.data.name | fullName | high | full name from Twitter profile |
| comment | skip | high | user comment field |
| google.data.isPlusUser | skip | high | boolean flag |
| facebook.token | skip | high | OAuth access token |
| google.data.verified | skip | high | metadata flag |
| username | username | high | social media handle/account username |
Notes: Topsy social media platform dump from March 2014. Data contains OAuth-linked user profiles from Google, Facebook, Twitter, and Microsoft. Primary PII fields are: name (fullName), email, and username (social handle). Nested provider objects (google, facebook, twitter, microsoft) contain OAuth tokens and linked profile data. Extract email and fullName/displayName from nested provider.data objects separately. Keys like 'id' with numeric values or UUIDs are internal IDs and must be skipped. Values in nested 'data' objects should be analyzed independently. Gender values observed: 'male', 'female'. No address, phone, DOB, SSN, or password fields detected in sample.
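As an illustration of how a mapping like the one above might be applied, the sketch below resolves dotted source columns inside a nested record and collects the values under their mapped fields. Several source columns can feed the same field, and columns marked skip are left out of the map. `FIELD_MAP` is a hypothetical subset of the table, `get_path` and `map_record` are illustrative names, and the record is synthetic; none of this is taken from the ingestion pipeline itself.

```python
# Hypothetical subset of the mapping table; "skip" columns are omitted.
FIELD_MAP = {
    "name": "fullName",
    "google.data.displayName": "fullName",
    "username": "username",
    "twitter.data.screen_name": "username",
    "google.data.gender": "gender",
}


def get_path(record, dotted):
    """Follow a dotted path through nested dicts; return None if any part is missing."""
    current = record
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def map_record(record, field_map=FIELD_MAP):
    """Collect non-empty string values per mapped field, dropping duplicates."""
    out = {}
    for column, field in field_map.items():
        value = get_path(record, column)
        if isinstance(value, str) and value.strip():
            values = out.setdefault(field, [])
            if value not in values:
                values.append(value)
    return out


# Synthetic record; the token is a placeholder and is never read by the mapping.
record = {
    "name": "Example User",
    "username": "example",
    "google": {
        "token": "REDACTED",
        "data": {"displayName": "Example User", "gender": "female"},
    },
    "twitter": {"data": {"screen_name": "example_tw"}},
}
print(map_record(record))
```

Keeping a list per field preserves distinct values from different providers, such as a site username and a Twitter handle, while identical names reported by several providers collapse into one entry.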