This repository offers a wide range of datasets and queries collected from open data sources or from our own practice (desensitized where necessary).
The datasets span many typical domains and have diverse data characteristics (e.g., varying numbers of columns and tuples).
The queries are real SQL statements that cover a variety of workloads, such as feature extraction, transactions, and analytical queries (coming soon).
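To give a sense of what a feature-extraction query does, here is a minimal sketch. It assumes a hypothetical `sales(store_id, day, sales)` table; this is not one of the repository's SQL statements, and the schema is not that of any dataset listed below. It uses Python's built-in `sqlite3` module to run a `GROUP BY` query that turns raw daily rows into one row of per-store features.

```python
import sqlite3

# Hypothetical schema for illustration only; it is NOT the schema of any
# dataset in this repository.
SCHEMA = """
CREATE TABLE sales (
    store_id INTEGER,
    day      TEXT,
    sales    REAL
);
"""

# Feature-extraction style query: aggregate raw rows into per-store
# features (number of days, mean daily sales, maximum daily sales).
FEATURE_QUERY = """
SELECT store_id,
       COUNT(*)   AS n_days,
       AVG(sales) AS avg_sales,
       MAX(sales) AS max_sales
FROM sales
GROUP BY store_id
ORDER BY store_id;
"""


def extract_features(rows):
    """Load (store_id, day, sales) rows into an in-memory SQLite database
    and return one feature tuple per store."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        return conn.execute(FEATURE_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        (1, "2015-07-01", 100.0),
        (1, "2015-07-02", 300.0),
        (2, "2015-07-01", 50.0),
    ]
    for row in extract_features(sample):
        print(row)  # (1, 2, 200.0, 300.0) then (2, 1, 50.0, 50.0)
```

Real feature-extraction workloads typically add joins and time windows on top of this basic aggregation pattern; the sketch keeps to plain `GROUP BY` so it runs on any SQLite version.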

| name | description | # tables | # columns | SQL | source |
| --- | --- | --- | --- | --- | --- |
| GEF2012-wind-forecasting | Hourly power generation at 7 wind farms | 10 | 61 |  | kaggle |
| electric-power-consumption | Per capita energy consumption in Morocco | 1 | 9 |  | kaggle |
| energydata_complete |  | 2 | 59 |  |  |
| ashrae-energy-prediction | Energy usage from over 1,000 buildings over a three-year timeframe | 5 | 32 |  | kaggle |

| name | description | # tables | # columns | SQL | source |
| --- | --- | --- | --- | --- | --- |
| recruit-restaurant-visitor-forecasting | The browsing statistics of two restaurant websites | 8 | 28 |  | kaggle |
| santander-customer-satisfaction | Hundreds of anonymized features that could reflect whether a customer is satisfied with their banking experience | 1 | 372 |  | kaggle |
| GiveMeSomeCredit | Credit features of 250,000 borrowers in a banking scenario | 1 | 13 |  | kaggle |
| daily-financial-news | Daily financial news for over 6,000 stocks | 2 | 12 |  | tianchi |
| restaurant-revenue-prediction | Demographic, real estate, and commercial data for investment in new restaurant sites | 2 | 85 |  | kaggle |
| homesite-quote-conversion | An anonymized database of information on customer and sales activity | 2 | 597 |  | kaggle |
| allstate-claims-severity | Insurance claims for worry-free customer experiences | 3 | 265 |  | kaggle |
| tiantian | Price-related features constructed from fund market data downloaded from the TianTian Fund website | 1 | 332 |  | tianchi |
| sberbank-russian-housing-market | Information about overall conditions in the country's economy and finance sector | 4 | 685 |  | kaggle |
| dow_jones_index |  | 1 | 16 |  |  |
| robinhood-stock-data | The historical stock price of Robinhood (ticker symbol HOOD) | 1 | 6 |  | kaggle |
| porto-seguro-safe-driver-prediction | Features that affect whether an auto insurance policyholder files a claim | 1 | 60 |  | kaggle |
| amex-default-prediction |  | 4 | 384 |  |  |
| house-rent-prediction-dataset | Information on 4,700+ houses, apartments, and flats available for rent | 1 | 12 |  | kaggle |

| name | description | # tables | # columns | SQL | source |
| --- | --- | --- | --- | --- | --- |
| big-data-derby-2022 | A wealth of collected data, including measures of heart rate, EKG, longitudinal movement, etc. | 3 | 24 |  | kaggle |
| predict-west-nile-virus | Weather, location, testing, and spraying data | 5 | 51 |  | kaggle |
| covid19-global-forecasting-week-2 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 |  | kaggle |
| covid19-global-forecasting-week-5 | Statistics of COVID-19 cases in various locations across the world | 1 | 9 |  | kaggle |
| covid19-global-forecasting-week-4 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 |  | kaggle |
| covid19-global-forecasting-week-1 | Statistics of COVID-19 cases in various locations across the world | 1 | 8 |  | kaggle |
| covid19-global-forecasting-week-3 | Statistics of COVID-19 cases in various locations across the world | 1 | 6 |  | kaggle |

| name | description | # tables | # columns | SQL | source |
| --- | --- | --- | --- | --- | --- |
| facebook-v-predicting-check-ins |  | 3 | 13 |  |  |
| telstra-recruiting-network |  | 7 | 18 |  |  |
| twitter-threads | Thread functionality in Twitter | 5 | 35 |  | tianchi |
| spotify-app-reviews-2022 | Spotify reviews on Google Play Store | 1 | 6 |  | kaggle |

| name | description | # tables | # columns | SQL | source |
| --- | --- | --- | --- | --- | --- |
| PRSA2017_Data_20130301-20170228 |  | 12 | 216 |  |  |
| AirQualityUCI | The responses of a gas multisensor device deployed in the field in an Italian city | 1 | 1 |  | UCI_ML |
| historicalweatherdataforindiancities | Temperature data (minimum, average, maximum) in degrees Celsius and precipitation data | 7 | 34 |  | kaggle |

| name | description | # tables | # columns | SQL | source |
| --- | --- | --- | --- | --- | --- |
| store-sales-time-series-forecasting | Dates, store and product information | 5 | 22 |  | kaggle |
| coupon-purchase-prediction | A year of transactional data for 22,873 users on the site ponpare.jp | 9 | 80 |  | kaggle |
| grupo-bimbo-inventory-demand | 9 weeks of sales transactions in Mexico | 6 | 28 |  | kaggle |
| rossmann-store-sales | Historical sales data for 1,115 Rossmann stores | 2 | 19 |  | kaggle |
| favorita-grocery-sales-forecasting | Dates, store and item information, whether that item was being promoted, and the unit sales | 6 | 26 |  | kaggle |
| walmart-recruiting-store-sales-forecasting |  | 5 | 26 |  |  |
| walmart-recruiting-sales-in-stormy-weather | Sales data for 111 products whose sales may be affected by the weather (e.g., milk, bread, umbrellas) | 4 | 28 |  | kaggle |
| ecommerce-customerssales-record | Order statistics | 1 | 41 |  | kaggle |
| competitive-data-science-predict-future-sales | Daily historical sales data | 5 | 16 |  | kaggle |
| m5-forecasting-accuracy | Item sales at stores in various locations for two 28-day time periods | 3 | 1965 |  | kaggle |
| goods | Public product introduction information | 41 | 807 |  |  |
| material | Historical inventory statistics | 79 | 1265 |  |  |
| orders | Historical order details | 35 | 809 |  |  |
| shopmall | Comments and shelf status of goods | 35 | 809 |  |  |
| transaction | Order details (query only) | 50 | 1069 |  |  |

| name | description | # tables | # columns | SQL | source |
| --- | --- | --- | --- | --- | --- |
| pkdd-15-taxi-trip-time-prediction-ii |  | 4 | 24 |  | kaggle |
| nyc-taxi-trip-duration | NYC Yellow Cab trip record data | 3 | 22 |  | kaggle |
| taxi-trajectory | A complete year (from 01/07/2013 to 30/06/2014) of the trajectories for all the 442 taxis running in the city of Porto | 1 | 9 |  | tianchi |
| pkdd-15-predict-taxi-service-trajectory-i |  | 4 | 25 |  | kaggle |

| name | description | # tables | # columns | SQL | source |
| --- | --- | --- | --- | --- | --- |
| talkingdata-mobile-user-demographics |  | 8 | 34 |  | kaggle |
| sf-crime | Incidents derived from the SFPD Crime Incident Reporting system | 3 | 57 |  | tianchi |
| detecting-insults-in-social-commentary | Detect social spam, account hacking, bot attacks, and more | 1 | 5 |  | kaggle |
| expedia-hotel-recommendations | Customer behavior | 2 | 174 |  | kaggle |
| nfl-big-data-bowl-2022 |  | 7 | 113 |  |  |
| airbnb-recruiting-new-user-bookings | Users along with their demographics, web session records, and some summary statistics | 6 | 51 |  | kaggle |
| unimelb | Information on the investigators applying for the grant | 1 | 251 |  | kaggle |
| Ipin2016Dataset |  | 8 | 314 |  |  |
| dspp1 |  | 4 | 19 |  |  |
| lish-moa |  | 4 | 1488 |  |  |
| foursquare-location-matching |  | 2 | 38 |  |  |
| bike-sharing-demand | The duration of travel, departure location, arrival location, and time elapsed | 1 | 12 |  | kaggle |
| web-traffic-time-series-forecasting |  | 6 | 1363 |  |  |
| web-traffic-time-series-forecasting-1 |  | 2 | 553 |  |  |
| korean-baseball-pitching-data-1982-2021 | Team pitching data from every season of KBO Baseball | 1 | 34 |  | kaggle |
| RSSI_dataset | RSSIs obtained on smartphones | 2 | 12 |  | UCI_ML |
| DontGetKicked | Car information | 2 | 67 |  | kaggle |
| cyclistic-bike-share-user-dataset-1-year | Cyclistic bikes | 1 | 18 |  | kaggle |
| data-science-job-salaries |  | 1 | 12 |  |  |
| Hybrid_Indoor_Positioning |  | 1 | 67 |  | UCI_ML |