Merged
137 changes: 2 additions & 135 deletions README.md
@@ -9,15 +9,6 @@
1. [What is Csv2sql ?](#what)
2. [Why Csv2sql ?](#why)
3. [Using the browser based interface](#dashboard)
   1. [Installation and usage](#dashboardinstall)
4. [Running from source](#sourceinstall)
5. [Supported data types](#support)
6. [Handling custom date/datetime formats](#datetime)
7. [Known issues, caveats and troubleshooting](#issues)
8. [Future plans](#future)


*Please have a quick look over the [Known issues, caveats and troubleshooting](#issues) section before using the app.*

<a name="what"></a>
## What is Csv2sql?
@@ -47,134 +38,10 @@ Csv2Sql can automatically...
<a name="dashboard"></a>
## Use csv2sql from your browser

[Please refer to csv2sql-dashboard](https://github.com/kreeti/csv2sql-ui)

For ease of use, csv2sql has a browser interface that can be used to configure the tool and to monitor the progress of running tasks, including which files are currently being processed and the current CPU and memory usage.

<p align="center">
<img src="https://github.com/kreeti/csv2sql/assets/69915843/a657f0ba-6364-4658-b572-147f9b1d3700" alt="browser interface demo"/>
</p>

### Installation and usage: <a name="dashboardinstall"></a>

No other dependencies are needed to use the app via your browser; however, you must have MySQL or PostgreSQL installed.

Download the latest release of the app from the releases section in this repository.

You can then run the executable on your Linux system:

* Extract the zip file named `csv2sql_xx`
* cd into the extracted directory from your terminal: `cd csv2sql_xx`
* Execute the following command: `./bin/csv2sql_and_dashboard start`

This runs a local server, which you can access at `localhost:4000` in your browser.

That's all!

*Please create an issue with details of your OS distribution, architecture (for example, x86_64 or ARM) and ABI (for example, musl or gnu) if the app does not run on your system.*

Using the app via the browser is easy: once the app is running, go to `localhost:4000` in your browser.

Go to the `Change configuration` tab and enter the relevant configuration details; hover over any configuration option to see what it does.

When you are done, click on the `Start` tab and then click the `Start` button below it to begin the import process.

<a name="sourceinstall"></a>
## Running the app from source code

You must have Elixir and MySQL/PostgreSQL installed on your system to run Csv2Sql.
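Before cloning, a quick pre-flight check can save a confusing failure later. The sketch below is a hypothetical helper, not part of csv2sql: `check_prereqs` looks for `elixir` and `mix` and for either the `mysql` or `psql` client on your `PATH`. Finding a client binary is only a rough proxy for having a database server installed and running.

```shell
# Hypothetical pre-flight check (not part of csv2sql): verify that the tools
# needed to run csv2sql from source are on PATH. A mysql/psql client binary is
# only a rough proxy for an installed database server.
check_prereqs() {
  local tool missing=0
  for tool in elixir mix; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  # Either client is enough, since csv2sql supports both MySQL and PostgreSQL.
  if ! command -v mysql >/dev/null 2>&1 && ! command -v psql >/dev/null 2>&1; then
    echo "missing: mysql or psql client" >&2
    missing=1
  fi
  return "$missing"
}
```

Running `check_prereqs` prints each missing tool to stderr and returns a non-zero status if anything is missing, so it can also guard a setup script, e.g. `check_prereqs && mix deps.get`.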

To use the app, clone this repository and install the dependencies with `mix deps.get`.

Finally, start the application with `mix phx.server`.

This runs the Phoenix server at [localhost:4000](http://localhost:4000), which provides a browser-based interface to use the app.

That's all!

<a name="support"></a>
## Supported data types

Csv2sql currently supports [MySql](https://www.mysql.com/) and [PostgreSQL](https://www.postgresql.org/) databases.

Csv2Sql will map data in CSVs into one of the following data types:


| Type | MySQL | PostgreSQL |
|----------|------|----------|
| date | For values matching pattern like YYYY-MM-DD or [custom patterns](#datetime) | NOT SUPPORTED, will map to VARCHAR|
| datetime | For values matching pattern like YYYY-MM-DD hh:mm:ss or [custom patterns](#datetime) (WARNING: fractional seconds or timezone information will be lost if present) | NOT SUPPORTED, will map to VARCHAR|
| boolean | Maps values 0/1 or true/false to [BIT](https://dev.mysql.com/doc/refman/8.0/en/bit-type.html) type | Maps values 0/1 or true/false to [BOOLEAN](https://www.postgresql.org/docs/9.5/datatype-boolean.html) type |
| integer | [INT](https://dev.mysql.com/doc/refman/8.0/en/integer-types.html) | [INT](https://www.postgresql.org/docs/9.5/datatype-numeric.html#DATATYPE-INT) |
| float | [DOUBLE](https://dev.mysql.com/doc/refman/8.0/en/floating-point-types.html) | [NUMERIC(1000, 100)](https://www.postgresql.org/docs/9.5/datatype-numeric.html#DATATYPE-NUMERIC-DECIMAL) |
| varchar | VARCHAR | VARCHAR |
| text | TEXT | TEXT |

All other types of data will map to either VARCHAR or TEXT.

<a name="datetime"></a>
## Handling custom date/datetime formats

By default, csv2sql identifies dates matching the pattern `YYYY-MM-DD` and datetimes matching the pattern `YYYY-MM-DD hh:mm:ss`.
If a CSV file contains dates or datetimes in some other format, they are imported as VARCHAR by default; however, by specifying custom
patterns, such data in arbitrary formats can be imported as date or datetime.

csv2sql uses the [Timex](https://github.com/bitwalker/timex) library to parse date/datetime.
You can specify multiple custom patterns for date or datetime as a single string containing one or more patterns separated by `;`.

When using the web UI for csv2sql, enter these pattern strings on the config page under "Custom date patterns" or "Custom datetime patterns".

The patterns should be compatible with Timex directives specified [here](https://hexdocs.pm/timex/Timex.Format.DateTime.Formatters.Default.html#module-list-of-all-directives).

(Custom patterns are only supported when using the web UI and are not available in the CLI version of the application.)

#### Good to know/Caveats

* Fractional seconds or timezone information is not handled when importing datetime data.
* When multiple custom patterns are specified for large CSVs, the import process might be slower due to the additional overhead of matching patterns.
* Always double-check the patterns specified and verify the imported date or datetime data.

#### Examples

To parse a datetime like `11/14/2021 3:43:28 PM`, the following custom pattern can be specified:

`{0M}/{0D}/{YYYY} {h12}:{m}:{s} {AM}`

Consider a CSV whose date or datetime values come in multiple formats, such as:

|Example Date|Date Pattern|Example Datetime|Datetime Pattern|
|--|--|--|--|
|2021-11-14|{YYYY}-{0M}-{0D}|2021-11-14T15:43:28|{YYYY}-{0M}-{0D}T{0h24}:{m}:{s}|
|11-14-2021|{0M}-{0D}-{YYYY}|11-14-2021 15:43:28|{0M}-{0D}-{YYYY} {0h24}:{m}:{s}|
|11/14/2021|{0M}/{0D}/{YYYY}|11/14/2021 3:43:28 PM|{0M}/{0D}/{YYYY} {h12}:{m}:{s} {AM}|

The pattern strings to parse the above CSV would be:

For date:
`{YYYY}-{0M}-{0D};{0M}-{0D}-{YYYY};{0M}/{0D}/{YYYY}`

For datetime:
`{YYYY}-{0M}-{0D}T{0h24}:{m}:{s};{0M}-{0D}-{YYYY} {0h24}:{m}:{s};{0M}/{0D}/{YYYY} {h12}:{m}:{s} {AM}`
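Because a single mistyped character in a long `;`-separated string is easy to miss, a small helper like the one below can print each pattern on its own numbered line before you paste the string into the web UI. It is a hypothetical convenience script, not part of csv2sql. It wraps each pattern in brackets so stray spaces around `;` stand out, and it flags empty segments (for example from a trailing `;`); how csv2sql itself treats such spaces or empty segments is not documented here, so treat them as mistakes to fix.

```shell
# Hypothetical helper (not part of csv2sql): print each pattern of a
# ';'-separated pattern string on its own numbered line. Patterns are wrapped
# in [brackets] so stray leading/trailing spaces are visible, and empty
# segments (e.g. from ";;" or a trailing ";") are flagged.
list_patterns() {
  printf '%s\n' "$1" | tr ';' '\n' | awk '{
    if ($0 == "") printf "%d: (empty pattern)\n", NR; else printf "%d: [%s]\n", NR, $0
  }'
}

# Example: the datetime pattern string from the example above.
list_patterns '{YYYY}-{0M}-{0D}T{0h24}:{m}:{s};{0M}-{0D}-{YYYY} {0h24}:{m}:{s};{0M}/{0D}/{YYYY} {h12}:{m}:{s} {AM}'
# 1: [{YYYY}-{0M}-{0D}T{0h24}:{m}:{s}]
# 2: [{0M}-{0D}-{YYYY} {0h24}:{m}:{s}]
# 3: [{0M}/{0D}/{YYYY} {h12}:{m}:{s} {AM}]
```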


<a name="issues"></a>
## Known issues, caveats and troubleshooting:

* Timestamp columns will lose their fractional seconds data or time zone information when importing to MySQL.

* When importing into a MySQL/PostgreSQL database, you must create the database manually before running the application; otherwise, the import will fail.

* Csv2sql uses the CSV file names as table names, so make sure that the CSV file names are valid table names.

* Make sure your CSVs have the correct encoding and valid column names to avoid errors.

* If you face database connection timeout errors try reducing the worker and db_worker count in the configurations or change the database timeout, pool size and other related database configurations.

* In case of errors, check your terminal for a clue, or create an issue.
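The database-creation and file-name caveats above can be handled with a little shell before starting an import. The sketch below is hypothetical and not part of csv2sql: `create_database` runs the standard PostgreSQL `createdb` command or `mysql -e "CREATE DATABASE ..."`, passing any extra arguments (user, host, password prompt) through to the client, and `check_csv_names` flags CSV files whose names are not plain identifiers. The identifier rule it uses (a letter or underscore followed by letters, digits or underscores, at most 63 characters) is a conservative assumption that should be safe for both databases, not the exact rule csv2sql applies.

```shell
# Hypothetical helpers (not part of csv2sql).

# Create the target database before starting an import.
# Usage: create_database postgres|mysql DB_NAME [extra client args...]
# Extra args go to the client, e.g. "-U postgres -h localhost" for createdb
# or "-u root -p" (password prompt) for mysql.
create_database() {
  if [ "$#" -lt 2 ]; then
    echo "usage: create_database postgres|mysql DB_NAME [client args...]" >&2
    return 2
  fi
  local adapter="$1" db="$2"
  shift 2
  case "$adapter" in
    postgres) createdb "$@" "$db" ;;
    mysql)    mysql "$@" -e "CREATE DATABASE \`$db\`" ;;
    *)        echo "unknown adapter: $adapter" >&2; return 2 ;;
  esac
}

# Warn about CSV files whose base names are not plain identifiers, since
# csv2sql uses the file name as the table name. Assumed conservative rule:
# letter or underscore first, then letters, digits or underscores, and at
# most 63 characters (PostgreSQL's default limit; MySQL allows 64).
# Only lowercase ".csv" files are checked.
check_csv_names() {
  local dir="${1:-.}" f name bad=0
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue            # no .csv files: the glob stays literal
    name=$(basename "$f" .csv)
    case "$name" in
      ''|[!A-Za-z_]*|*[!A-Za-z0-9_]*)
        echo "not a plain identifier: $name" >&2
        bad=1
        continue ;;
    esac
    if [ "${#name}" -gt 63 ]; then
      echo "name longer than 63 characters: $name" >&2
      bad=1
    fi
  done
  return "$bad"
}
```

For example, `create_database postgres my_import -U postgres` followed by `check_csv_names ./csvs`; the database name and directory here are placeholders.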

<a name="future"></a>
## Future

* Support for Windows
* Work on known issues and better support for various data types
17 changes: 16 additions & 1 deletion mix.exs
@@ -15,7 +15,21 @@ defmodule Csv2sql.MixProject do
aliases: aliases(),
compilers: Mix.compilers(),
test_coverage: [tool: ExCoveralls],
releases: [{@app, release()}]
releases: [{@app, release()}],
name: "Csv2sql",
description: "Csv2Sql is a blazing fast fully automated tool to load huge CSV files into an RDBMS.",
package: package(),
source_url: "https://github.com/kreeti/csv2sql"
]
end

defp package() do
[
name: "csv2sql",
licenses: ["MIT"],
links: %{
"GitHub" => "https://github.com/kreeti/csv2sql"
}
]
end

@@ -43,6 +57,7 @@ defmodule Csv2sql.MixProject do
{:stream_split, "~> 0.1.7"},
{:codepagex, "~> 0.1.6"},
{:bakeware, "~> 0.2.4"},
{:ex_doc, "~> 0.33.0", only: :dev, runtime: false},

# For dev and/or test
{:dotenv, github: "avdi/dotenv_elixir", only: [:test]},
6 changes: 6 additions & 0 deletions mix.lock
@@ -14,10 +14,12 @@
"dialyxir": {:hex, :dialyxir, "1.4.3", "edd0124f358f0b9e95bfe53a9fcf806d615d8f838e2202a9f430d59566b6b53b", [:mix], [{:erlex, ">= 0.2.6", [hex: :erlex, repo: "hexpm", optional: false]}], "hexpm", "bf2cfb75cd5c5006bec30141b131663299c661a864ec7fbbc72dfa557487a986"},
"dir_walker": {:hex, :dir_walker, "0.0.8", "5332225074e4887e6e60ca0242af490215f296511ded4df18d554ae25394f727", [:mix], [], "hexpm", "2f4fb16e6427523700df9eb12eece5679ad4459aaefb1ca3cb580184bfc8d173"},
"dotenv": {:git, "https://github.com/avdi/dotenv_elixir.git", "d6fd3f327173fe18a455203987da95ef9f6cd4c5", []},
"earmark_parser": {:hex, :earmark_parser, "1.4.39", "424642f8335b05bb9eb611aa1564c148a8ee35c9c8a8bba6e129d51a3e3c6769", [:mix], [], "hexpm", "06553a88d1f1846da9ef066b87b57c6f605552cfbe40d20bd8d59cc6bde41944"},
"ecto": {:hex, :ecto, "3.11.2", "e1d26be989db350a633667c5cda9c3d115ae779b66da567c68c80cfb26a8c9ee", [:mix], [{:decimal, "~> 2.0", [hex: :decimal, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: true]}, {:telemetry, "~> 0.4 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "3c38bca2c6f8d8023f2145326cc8a80100c3ffe4dcbd9842ff867f7fc6156c65"},
"ecto_sql": {:hex, :ecto_sql, "3.11.1", "e9abf28ae27ef3916b43545f9578b4750956ccea444853606472089e7d169470", [:mix], [{:db_connection, "~> 2.4.1 or ~> 2.5", [hex: :db_connection, repo: "hexpm", optional: false]}, {:ecto, "~> 3.11.0", [hex: :ecto, repo: "hexpm", optional: false]}, {:myxql, "~> 0.6.0", [hex: :myxql, repo: "hexpm", optional: true]}, {:postgrex, "~> 0.16.0 or ~> 0.17.0 or ~> 1.0", [hex: :postgrex, repo: "hexpm", optional: true]}, {:tds, "~> 2.1.1 or ~> 2.2", [hex: :tds, repo: "hexpm", optional: true]}, {:telemetry, "~> 0.4.0 or ~> 1.0", [hex: :telemetry, repo: "hexpm", optional: false]}], "hexpm", "ce14063ab3514424276e7e360108ad6c2308f6d88164a076aac8a387e1fea634"},
"elixir_make": {:hex, :elixir_make, "0.6.3", "bc07d53221216838d79e03a8019d0839786703129599e9619f4ab74c8c096eac", [:mix], [], "hexpm", "f5cbd651c5678bcaabdbb7857658ee106b12509cd976c2c2fca99688e1daf716"},
"erlex": {:hex, :erlex, "0.2.6", "c7987d15e899c7a2f34f5420d2a2ea0d659682c06ac607572df55a43753aa12e", [:mix], [], "hexpm", "2ed2e25711feb44d52b17d2780eabf998452f6efda104877a3881c2f8c0c0c75"},
"ex_doc": {:hex, :ex_doc, "0.33.0", "690562b153153c7e4d455dc21dab86e445f66ceba718defe64b0ef6f0bd83ba0", [:mix], [{:earmark_parser, "~> 1.4.39", [hex: :earmark_parser, repo: "hexpm", optional: false]}, {:makeup_c, ">= 0.1.0", [hex: :makeup_c, repo: "hexpm", optional: true]}, {:makeup_elixir, "~> 0.14 or ~> 1.0", [hex: :makeup_elixir, repo: "hexpm", optional: false]}, {:makeup_erlang, "~> 0.1 or ~> 1.0", [hex: :makeup_erlang, repo: "hexpm", optional: false]}, {:makeup_html, ">= 0.1.0", [hex: :makeup_html, repo: "hexpm", optional: true]}], "hexpm", "3f69adc28274cb51be37d09b03e4565232862a4b10288a3894587b0131412124"},
"excoveralls": {:hex, :excoveralls, "0.18.1", "a6f547570c6b24ec13f122a5634833a063aec49218f6fff27de9df693a15588c", [:mix], [{:castore, "~> 1.0", [hex: :castore, repo: "hexpm", optional: true]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: false]}], "hexpm", "d65f79db146bb20399f23046015974de0079668b9abb2f5aac074d078da60b8d"},
"expo": {:hex, :expo, "0.5.2", "beba786aab8e3c5431813d7a44b828e7b922bfa431d6bfbada0904535342efe2", [:mix], [], "hexpm", "8c9bfa06ca017c9cb4020fabe980bc7fdb1aaec059fd004c2ab3bff03b1c599c"},
"file_system": {:hex, :file_system, "1.0.0", "b689cc7dcee665f774de94b5a832e578bd7963c8e637ef940cd44327db7de2cd", [:mix], [], "hexpm", "6752092d66aec5a10e662aefeed8ddb9531d79db0bc145bb8c40325ca1d8536d"},
@@ -28,11 +30,15 @@
"idna": {:hex, :idna, "6.1.1", "8a63070e9f7d0c62eb9d9fcb360a7de382448200fbbd1b106cc96d3d8099df8d", [:rebar3], [{:unicode_util_compat, "~>0.7.0", [hex: :unicode_util_compat, repo: "hexpm", optional: false]}], "hexpm", "92376eb7894412ed19ac475e4a86f7b413c1b9fbb5bd16dccd57934157944cea"},
"jason": {:hex, :jason, "1.4.1", "af1504e35f629ddcdd6addb3513c3853991f694921b1b9368b0bd32beb9f1b63", [:mix], [{:decimal, "~> 1.0 or ~> 2.0", [hex: :decimal, repo: "hexpm", optional: true]}], "hexpm", "fbb01ecdfd565b56261302f7e1fcc27c4fb8f32d56eab74db621fc154604a7a1"},
"libgraph": {:hex, :libgraph, "0.16.0", "3936f3eca6ef826e08880230f806bfea13193e49bf153f93edcf0239d4fd1d07", [:mix], [], "hexpm", "41ca92240e8a4138c30a7e06466acc709b0cbb795c643e9e17174a178982d6bf"},
"makeup": {:hex, :makeup, "1.1.2", "9ba8837913bdf757787e71c1581c21f9d2455f4dd04cfca785c70bbfff1a76a3", [:mix], [{:nimble_parsec, "~> 1.2.2 or ~> 1.3", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "cce1566b81fbcbd21eca8ffe808f33b221f9eee2cbc7a1706fc3da9ff18e6cac"},
"makeup_elixir": {:hex, :makeup_elixir, "0.16.2", "627e84b8e8bf22e60a2579dad15067c755531fea049ae26ef1020cad58fe9578", [:mix], [{:makeup, "~> 1.0", [hex: :makeup, repo: "hexpm", optional: false]}, {:nimble_parsec, "~> 1.2.3 or ~> 1.3", [hex: :nimble_parsec, repo: "hexpm", optional: false]}], "hexpm", "41193978704763f6bbe6cc2758b84909e62984c7752b3784bd3c218bb341706b"},
"makeup_erlang": {:hex, :makeup_erlang, "1.0.0", "6f0eff9c9c489f26b69b61440bf1b238d95badae49adac77973cbacae87e3c2e", [:mix], [{:makeup, "~> 1.0", [hex: :makeup, repo: "hexpm", optional: false]}], "hexpm", "ea7a9307de9d1548d2a72d299058d1fd2339e3d398560a0e46c27dab4891e4d2"},
"metrics": {:hex, :metrics, "1.0.1", "25f094dea2cda98213cecc3aeff09e940299d950904393b2a29d191c346a8486", [:rebar3], [], "hexpm", "69b09adddc4f74a40716ae54d140f93beb0fb8978d8636eaded0c31b6f099f16"},
"mimerl": {:hex, :mimerl, "1.2.0", "67e2d3f571088d5cfd3e550c383094b47159f3eee8ffa08e64106cdf5e981be3", [:rebar3], [], "hexpm", "f278585650aa581986264638ebf698f8bb19df297f66ad91b18910dfc6e19323"},
"mix_unused": {:hex, :mix_unused, "0.4.1", "9f8d759a300a79d2077d6baf617f3a5af6935d50b0f113c09295b265afc3e411", [:mix], [{:libgraph, ">= 0.0.0", [hex: :libgraph, repo: "hexpm", optional: false]}], "hexpm", "fa21f688a88e0710e3d96ac1c8e5a6181aea8a75c8a4214f0edcfeb069b831a3"},
"myxql": {:hex, :myxql, "0.6.4", "1502ea37ee23c31b79725b95d4cc3553693c2bda7421b1febc50722fd988c918", [:mix], [{:db_connection, "~> 2.4.1 or ~> 2.5", [hex: :db_connection, repo: "hexpm", optional: false]}, {:decimal, "~> 1.6 or ~> 2.0", [hex: :decimal, repo: "hexpm", optional: false]}, {:geo, "~> 3.4", [hex: :geo, repo: "hexpm", optional: true]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: true]}, {:table, "~> 0.1.0", [hex: :table, repo: "hexpm", optional: true]}], "hexpm", "a3307f4671f3009d3708283649adf205bfe280f7e036fc8ef7f16dbf821ab8e9"},
"nimble_csv": {:hex, :nimble_csv, "1.2.0", "4e26385d260c61eba9d4412c71cea34421f296d5353f914afe3f2e71cce97722", [:mix], [], "hexpm", "d0628117fcc2148178b034044c55359b26966c6eaa8e2ce15777be3bbc91b12a"},
"nimble_parsec": {:hex, :nimble_parsec, "1.4.0", "51f9b613ea62cfa97b25ccc2c1b4216e81df970acd8e16e8d1bdc58fef21370d", [:mix], [], "hexpm", "9c565862810fb383e9838c1dd2d7d2c437b3d13b267414ba6af33e50d2d1cf28"},
"parse_trans": {:hex, :parse_trans, "3.4.1", "6e6aa8167cb44cc8f39441d05193be6e6f4e7c2946cb2759f015f8c56b76e5ff", [:rebar3], [], "hexpm", "620a406ce75dada827b82e453c19cf06776be266f5a67cff34e1ef2cbb60e49a"},
"postgrex": {:hex, :postgrex, "0.17.5", "0483d054938a8dc069b21bdd636bf56c487404c241ce6c319c1f43588246b281", [:mix], [{:db_connection, "~> 2.1", [hex: :db_connection, repo: "hexpm", optional: false]}, {:decimal, "~> 1.5 or ~> 2.0", [hex: :decimal, repo: "hexpm", optional: false]}, {:jason, "~> 1.0", [hex: :jason, repo: "hexpm", optional: true]}, {:table, "~> 0.1.0", [hex: :table, repo: "hexpm", optional: true]}], "hexpm", "50b8b11afbb2c4095a3ba675b4f055c416d0f3d7de6633a595fc131a828a67eb"},
"shorter_maps": {:git, "https://github.com/boyzwj/shorter_maps.git", "787c3447e48c74cf3224f35aba03c84f4c6a3c13", []},