dotnet tool install --global soddiIf you have .NET 10 or later, you can run soddi without installing it using DNX (like npx for Node.js):
dnx soddi [command] [options]DNX downloads and runs the tool on-demand without permanent installation. On first use, you'll see a confirmation prompt. All examples in this README can use dnx soddi instead of soddi.
Examples:
# List available archives
dnx soddi list
# Download and import (without installing soddi globally)
dnx soddi download math
dnx soddi import math -d math.stackexchange.comInspirited by the original Soddi, DotNet-Soddi is a console application that assists in not just importing the Stack Overflow data dumps, but also obtaining them.
- Discover archives from the command line.
- Importing data from xml files or stream directly from the .7z archive files.
- Download archive files from the command line via HTTP based on their site name.
- Download archive files via BitTorrent based on their site name.
- SQL Server (default)
- PostgreSQL
Use the --provider flag with the import command to specify your database provider (see examples below).
Not all features of the original Soddi are supported.
- Full Text isn't supported.
The following command will download the archive for the math.stackexchange.com, then import it into an existing database named math.stackexchange.com
soddi download math
soddi import math -d math.stackexchange.comOr with DNX (.NET 10+):
dnx soddi download math
dnx soddi import math -d math.stackexchange.comsoddi download math
soddi import math --provider postgres --connectionString "Host=localhost;Username=postgres;Password=yourpassword" -d math_stackexchangeOr with DNX (.NET 10+):
dnx soddi download math
dnx soddi import math --provider postgres --connectionString "Host=localhost;Username=postgres;Password=yourpassword" -d math_stackexchangeBecause of the size of the database and the bandwidth of archive.org, you might be better off using the torrent option. The following command will do everything required to connect to the appropriate trackers and peers to download the math database and exit upon completion.
soddi torrent mathDownloads a Stack Overflow site archive via HTTP from archive.org.
USAGE:
download archive for aviation.stackexchange.com:
soddi download aviation
download archive for math.stackexchange.com to a particular folder:
soddi download --output c:\stack-data math
pick from archives containing stack and download:
soddi download --pick stack
-o, --output Output folder
-p, --pick (Default: false) Pick from a list of archives to download
--help Display this help screen.
--version Display version information.
Archive (pos. 0) Required. Archive to downloadList available Stack Overflow data dumps.
USAGE:
List all archives:
soddi list
List all archives containing the letters "av":
soddi list av
List all archives containing the letters "av" including meta sites:
soddi list --includeMeta av
--includeMeta (Default: false) Include meta databases.
--help Display this help screen.
--version Display version information.
Pattern (pos. 0) (Default: ) Pattern to include (e.g. "av" includes all archives containing "av").Import data from a 7z archive or folder
USAGE:
Import data using defaults (SQL Server):
soddi import math.stackexchange.com.7z
Import data from a folder containing a collection of .7z or .xml files:
soddi import stackoverflow
Import data using a connection string and database name:
soddi import --connectionString "Server=(local)\Sql2017;User Id=admin;password=t3ddy" --database math
math.stackexchange.com.7z
Import data using defaults and create database:
soddi import --dropAndCreate math.stackexchange.com.7z
Import data using defaults without constraints:
soddi import --skipConstraints math.stackexchange.com.7z
PostgreSQL examples:
soddi import math.stackexchange.com.7z --provider postgres --connectionString "Host=localhost;Username=postgres;Password=yourpassword"
soddi import stackoverflow --provider postgresql --connectionString "Host=myserver;Port=5432;Database=postgres;Username=admin;Password=secret" -d stackoverflow
soddi import math.stackexchange.com.7z --provider pg --dropAndCreate --connectionString "Host=localhost;Username=postgres;Password=yourpassword"
-d, --database Name of database. If omitted the name of the file or folder will be used.
-c, --connectionString Connection string to server.
SQL Server (default): Server=.;Integrated Security=true
PostgreSQL: Host=localhost;Username=postgres;Password=yourpassword
Initial catalog/database in the connection string will be ignored and the --database
parameter will be used for the database name.
--provider (Default: sqlserver) Database provider to use. Options: sqlserver, postgres (aliases:
postgresql, pg). Case-insensitive.
--dropAndCreate (Default: false) Drop and recreate database. If a database already exists with this name it
will be dropped. Then a database will be created in the default server file location with
the default server options.
--skipConstraints (Default: false) Skip adding primary keys and unique constraints.
--skipTags (Default: false) Skip adding PostTags table.
--help Display this help screen.
--version Display version information.
Path (pos. 0) Required. File or folder containing xml archives. The file must be a .7z file. If using a
folder it can contain either .7z or .xml contentDownload database via BitTorrent. For larger databases this might prove significantly faster.
USAGE:
Download files associated with the math site from the torrent file:
soddi torrent math
Download to a specific folder:
soddi torrent --output "c:\torrent files" math
Enable port forwarding:
soddi torrent --portForward math
Pick from archives containing "stack":
soddi torrent --pick stack
-o, --output Output folder
-f, --portForward (Default: false) Experimental. Enable port forwarding
-p, --pick (Default: false) Pick from a list of archives to download
--help Display this help screen.
--version Display version information.
Archive (pos. 0) Required. Archive to downloadSuper experimental. Downloads and extracts one of the BrentO provided databases via BitTorrent.
If you don't supply an ARCHIVE_NAME you will be prompted for one.
USAGE:
soddi brent [ARCHIVE_NAME] [OPTIONS]
EXAMPLES:
soddi brent
soddi brent small
soddi brent medium
soddi brent large
soddi brent extra-large
ARGUMENTS:
[ARCHIVE_NAME] Archive to download
OPTIONS:
-h, --help Prints help information
-o, --output Output folder
--skipExtraction Don't extract the downloaded 7z files
-f, --portForward Experimental. Enable port forwarding
When using PostgreSQL as the database provider:
- Table names are created in lowercase (PostgreSQL convention)
- Uses PostgreSQL native types:
SERIAL,TEXT,TIMESTAMP,BOOLEAN,VARCHAR(n)
- Bulk insert uses PostgreSQL's high-performance
COPYcommand (similar to SQL Server'sSqlBulkCopy) - Significantly faster than individual INSERT statements
- Foreign keys are added with the
NOT VALIDclause to allow orphaned references - This is common in Stack Overflow data dumps where referenced records may not exist
- To find orphaned references after import:
-- Example: Find badges with non-existent users SELECT * FROM badges WHERE userid NOT IN (SELECT id FROM users);
- To validate constraints after import:
ALTER TABLE badges VALIDATE CONSTRAINT fk_badges_users;
PostgreSQL connection strings use Npgsql format:
Host=localhost;Port=5432;Database=postgres;Username=postgres;Password=yourpassword
Common options:
SSL Mode=Require- Require SSL/TLS encryptionPooling=true- Enable connection pooling (default)Timeout=30- Connection timeout in seconds