Hello, this is my jupyter notebook counterpart for my '2 Ways of Analyzing Geographic Culture Through X API v2 + British LLM?' article on Medium.com. Link --> https://medium.com/@adelnsahuc/analyzing-geographic-culture-through-x-api-v2-british-llm-53d26f832876?postPublishedType=repub
Feel free to report issues. Notice that the findings should not be considered a complete analysis of these regions cultures as a set of people from X do not represent the full cultural identity of any region, but it does give us some insights.
Config Files (configs/*.json)
- Region settings: bio keywords, output paths, API limits
- Example:
uk.json,nyc.json,sg.json
Local Seeds (data/local_seeds/*.json)
- Categorized seed accounts (sports, music, tech_lifestyle, comedy)
- Used as starting points to find regional users
Users (data/users/*.jsonl)
- User objects:
id,username,name,description,location,public_metrics - Filtered by bio keywords to confirm regional association
Tweets (data/tweets/*.jsonl)
- Tweet objects:
id,text,author_id,created_at,public_metrics - Collected from confirmed local users
Adjacent Users (data/users_adjacent/*.jsonl)
- Users found via mentions of seed accounts
- Same structure as users, before bio filtering
-
Data Collection (Pipeline Notebook)
- Load region config and seed accounts
- Resolve seeds via X API
/2/users/by - Find adjacent users via
/2/tweets/search/recent(mentions) - Filter adjacent users by bio keywords → confirmed locals
- Collect tweets from confirmed locals via
/2/users/:id/tweets
-
Data Analysis (Analysis Notebook)
- Load pre-collected tweets for NYC, SG, UK
- Sentiment analysis with NLTK VADER
- Topic profiling with Empath dictionary
- Dimensionality reduction with PCA/UMAP
- Compare regional "cultural profiles" in topic space