diff --git a/README.md b/README.md
index 215867e..1799174 100644
--- a/README.md
+++ b/README.md
@@ -42,139 +42,19 @@ Csv2Sql can automatically...
* It is **completely automatic**, provide a path with hundreds of csvs having size in gigabytes and start the application, it will handle the rest!
-* It comes in **2 flavours**, as a **[command line tool](#cmd)** or a **[browser user interface](#dashboard)**, and is super easy to configure and use.
+* It comes as a **[browser user interface](#dashboard)** and is super easy to configure and use.
* While you can have maximum utilization of your cpu to get excellent performance, csv2sql is fully **customizable**, also comes with [lots of options](#cmdargs) which can be changed to fine tune the application based on requirement and to lower down resource usage and database load.
* Csv2Sql supports **partial operations**, so if you only want to generate a schema file from the csvs without touching the database or you want to only insert data from the csvs into already created tables without creating the tables again or just validate already imported data, Csv2Sql has got you covered !
-
-## Using from command line
-
-Csv2sql can be easily used as a command line tool, with lots of customizable options passing by different command line arguments.
-
-
-
-
-
-
-### Installation and usage:
-
-You must have erlang installed to use the command line tool on any linux distribution.
-##### Add the erlang repository using the following commands
-```
-wget https://packages.erlang-solutions.com/erlang-solutions_1.0_all.deb
-sudo dpkg -i erlang-solutions_1.0_all.deb
-```
-
-##### Install erlang
-
-```
-sudo apt-get update
-sudo apt-get install esl-erlang
-```
-
-Download the executable binary from the latest release in this repository
-and run the executable using: ```./csv2sql --```
-
-The next section describes all the available command line arguments.
-
-
-### Using command line args
-
-You can pass various command line arguments to Csv2Sql to configure how to process csvs and specify other important information.
-
-A description of all the available command line arguments that can be used are given below:
-
-
-| Flag| Description| Default value|
-|:-----------:|----------------------|------|
-|\-\-schema-file-path|The location were the generated schema file will be stored|If no value is supplied it saves the generated schema file in the same directory as the source csv files specified by "\-\-source-csv-directory" flag|
-|**\-\-source-csv-directory**|**The source directory where the csvs are located**|**Defaults to the current directory from which the program is run**|
-| **\-\-db-connection-string**|**A connection string to connect ot the database, in the format: "::@/"**|**This is a compulsory argument if database access is required**|
-|\-\-imported-csv-directory|The directory were the csvs will be moved after importing to database, make sure it is present and is empty|(source-csv-directory)/imported|
-|\-\-validated-csv-directory|The directory were the csvs will be moved after they are validated, make sure it is present and is empty|(source-csv-directory)/validated|
-|\-\-skip-make-schema|Skip inferring schema and making a schema file|false|
-|\-\-skip-insert-schema| Skip inserting the inferred schema in the database. Useful if the table structures are already present and you only wish to insert data from the csv files.(This will be true automatically if skip-make-schema is used)|false|
-|\-\-skip-insert-data|Skip inserting data from the csvs|false|
-|\-\-skip-validate-import|Skip validating the imported data|false|None, this is compulsory if the operations specified requires database access|
-|\-\-connection-socket|The mysql socket file path|/var/run/mysqld/mysqld.sock|
-|\-\-varchar-limit|The value of varchar type, and the limit after which a string is considered a text and not a varchar|100|
-|\-\-schema-infer-chunk-size|The chunk size to use when the schema fora CSV will be inferred parallelly. For example, a chunk size 100 means the CSV will be read 100 rows at a time and separate processes will be used to infer the schema for each 100-row chunk|100|
-|\-\-worker-count|The number of workers, directly related to how many CSVs will be processed parallelly|10|
-|\-\-db-worker-count|The number of database workers, lowering the value will lead to slow performance but lesser load on database, a higher value can lead to too many database connection errors.|15|
-|\-\-insertion-chunk-size|Number of records to insert into the database at once, increasing this may result in mysql error for too many placeholders|100|
-|\-\-job-count-limit|Number of chunks to keep in memory (Memory required=insertion_chunk_size * job_count_limit)|10|
-|\-\-log|Enable ecto logs, to log the queries being executed, possible values are :debug, :info, :warn|false|
-|\-\-timeout|The time in milliseconds to wait for the query call to finish|60000|
-|\-\-connect-timeout|The number of seconds that the mysqld server waits for a connect packet before responding with Bad handshake|60000|
-|\-\-pool-size|The pool_size controls how many connections you want to the database.|20|
-|\-\-queue-target|The time to wait for a database connection|5000|
-| \-\-queue-interval|If all connections checked out during a :queue_interval takes more than :queue_target, then we double the :queue_target.|1000|
-
-
-### Examples:
-
-##### Load csvs to database, this will infer the schema, insert the inferred schemas to the database, insert the data and then validate data for all the csvs
-
-`./csv2sql --source-csv-directory "/home/user/Desktop/csvs" --db-connection-string "mysql:root:pass@localhost/test_csv"`
-
-Here "mysql" is the database type, "root" is the mysql username, "pass" is the mysql password, "localhost" is the database host and "test_csv" is the database name where the data will be imported.
-
----
-##### Import schema only:
-
-`./csv2sql --source-csv-directory "/home/user/Desktop/csvs" --skip-insert-schema --skip-insert-data --skip-validate-import`
-
----
-##### Skip validation:
-
-`./csv2sql --source-csv-directory "/home/user/Desktop/csvs" --db-connection-string "postgres:root:pass@localhost/test_csv" --skip-validate-import`
-
-Here "postgres" is the database type.
-
----
-##### Only validate imported csv:
-
- `./csv2sql --skip-make-schema --skip-insert-data --imported-csv-directory "/home/user/Desktop/imported-csvs" --db-connection-string "mysql:root:pass@localhost/test_csv"`
-
-Here we are running simple validation check over a previously imported csvs, this check will NOT compare the actual data but will only compare the row count in the csv and in the database.
-
----
-##### Custom path for imported and validated csv files:
-
-`./csv2sql --source-csv-directory "/home/user/Desktop/csvs" --imported-csv-directory "/home/user/Desktop/imported_csvs" --validated-csv-directory "/home/user/Desktop/validated_csvs" --db-connection-string "postgres:root:pass@localhost/test_csv"`
-
----
-##### Only infer and create schema but don't insert data:
-
-`./csv2sql --source-csv-directory "/home/user/Desktop/csvs" --skip-insert-data --db-connection-string "postgres:root:pass@localhost/test_csv"`
-
-This will create empty table in the database after analyzing the csvs.
-
- ---
-##### Change the worker count, setting this to one will lead to processing a single csv at a time, this will be slower but will lead to lower cpu usage and Database load:
-
-`./csv2sql --source-csv-directory "/home/user/Desktop/csvs" --worker-count 1 --db-connection-string "mysql:root:pass@localhost/test_csv"`
-
----
-##### Enable logs, to log the queries being executed:
-
-`./csv2sql --source-csv-directory "/home/user/Desktop/csvs" --log debug --db-connection-string "mysql:root:pass@localhost/test_csv"`
-
----
-##### Set the number of workers inserting data into the database, lowering the value will lead to slow performance but lesser load on database, a higher value can lead to too many database connection errors:
-
-`./csv2sql --source-csv-directory "/home/user/Desktop/csvs" --db-worker-count 2 --db-connection-string "mysql:root:pass@localhost/test_csv"`
-
-## Using csv2sql from your browser
+## Use csv2sql from your browser
-For ease of use csv2sql also has a browser interface which can be used to easily configure the tool and also provides an interface that shows what is the progress of the various running tasks, which files are currently being processed, the current cpu and memory usage, etc.
+For ease of use, csv2sql has a browser interface that makes the tool easy to configure and shows the progress of the running tasks, which files are currently being processed, the current CPU and memory usage, and more.
-
+
### Installation and usage:
diff --git a/apps/csv2sql/lib/csv2sql/database/database.ex b/apps/csv2sql/lib/csv2sql/database/database.ex
index 6ddf1f3..165913b 100644
--- a/apps/csv2sql/lib/csv2sql/database/database.ex
+++ b/apps/csv2sql/lib/csv2sql/database/database.ex
@@ -122,7 +122,7 @@ defmodule Csv2sql.Database do
{:ok, str, replaced} =
Codepagex.to_string(str, :iso_8859_1, Codepagex.replace_nonexistent(""), 0)
- # TODO: fix t his can slow down things
+ # TODO: fix this, it can slow things down
if replaced > 0,
do:
Logger.warning("[#{Process.get(:file)}] Replaced #{replaced} characters in binary data")
@@ -133,13 +133,6 @@ defmodule Csv2sql.Database do
end
end
- @spec string_column_type(non_neg_integer()) :: :text | {:varchar, non_neg_integer()}
- def string_column_type(max_data_length) do
- if max_data_length > varchar_limit(),
- do: :text,
- else: {:varchar, max_data_length}
- end
-
# Callbacks to implement
@callback type_mapping(type_map()) :: String.t()
diff --git a/apps/csv2sql/lib/csv2sql/database/mysql.ex b/apps/csv2sql/lib/csv2sql/database/mysql.ex
index 4cbe210..1f7bb12 100644
--- a/apps/csv2sql/lib/csv2sql/database/mysql.ex
+++ b/apps/csv2sql/lib/csv2sql/database/mysql.ex
@@ -18,7 +18,7 @@ defmodule Csv2sql.Database.MySql do
type_map[:is_integer] -> "INT"
type_map[:is_float] -> "DOUBLE"
type_map[:is_text] -> "LONGTEXT"
- true -> type_map[:max_data_length] |> string_column_type() |> get_string_column_type()
+ true -> "VARCHAR(#{varchar_limit()})"
end
end
@@ -64,7 +64,4 @@ defmodule Csv2sql.Database.MySql do
defp to_datetime_string(val), do: val
- defp get_string_column_type(:text), do: "LONGTEXT"
- defp get_string_column_type({:varchar, 0}), do: "VARCHAR(#{varchar_limit()})"
- defp get_string_column_type({:varchar, size}), do: "VARCHAR(#{size})"
end
diff --git a/apps/csv2sql/lib/csv2sql/import_validator/import_validator.ex b/apps/csv2sql/lib/csv2sql/import_validator/import_validator.ex
index 1a3f7e9..ec823a7 100644
--- a/apps/csv2sql/lib/csv2sql/import_validator/import_validator.ex
+++ b/apps/csv2sql/lib/csv2sql/import_validator/import_validator.ex
@@ -40,12 +40,17 @@ defmodule Csv2sql.ImportValidator.ImportValidator do
defp validate_csv(file, row_count) do
IO.puts("Checking File: #{file.name}")
- db_count = Database.get_db_row_count_if_exists(file.path)
+ db_row_count = Database.get_db_row_count_if_exists(file.path)
+
+ inserted_db_row_count =
+ if Helpers.get_config(:drop_existing_tables),
+ do: db_row_count,
+ else: db_row_count - file.existing_db_row_count
IO.puts("Count in csv: #{row_count}")
- IO.puts("Count in database: #{db_count}")
+ IO.puts("Count in database: #{inserted_db_row_count}")
- if row_count == (db_count - file.existing_db_row_count) do
+ if row_count == inserted_db_row_count do
(IO.ANSI.green() <> "Correct !" <> IO.ANSI.reset()) |> IO.puts()
File.rename(
@@ -90,5 +95,7 @@ defmodule Csv2sql.ImportValidator.ImportValidator do
IO.puts("* #{Path.basename(file)}")
end)
end
+
+ ProgressTracker.check_files_status()
end
end
diff --git a/apps/csv2sql/lib/csv2sql/progress_tracker/progress_tracker.ex b/apps/csv2sql/lib/csv2sql/progress_tracker/progress_tracker.ex
index 52b3b5f..8127fb3 100644
--- a/apps/csv2sql/lib/csv2sql/progress_tracker/progress_tracker.ex
+++ b/apps/csv2sql/lib/csv2sql/progress_tracker/progress_tracker.ex
@@ -6,7 +6,7 @@ defmodule Csv2sql.ProgressTracker do
use GenServer
use Csv2sql.Types
- alias Csv2sql.ProgressTracker.State
+ alias Csv2sql.{Helpers, ProgressTracker.State}
require Logger
# Sets the files list
@@ -159,15 +159,25 @@ defmodule Csv2sql.ProgressTracker do
do: {:noreply, %{state | validation_status: status}}
defp check_files_status_and_update_state(files, %State{subscribers: subscribers} = state) do
- files
- |> Enum.all?(fn {_path, %Csv2sql.File{status: status}} -> status == :done end)
- |> if do
- end_time = DateTime.utc_now()
- Enum.each(subscribers, fn subscriber -> Process.send(subscriber, :finish, []) end)
-
- %{state | status: :finish, files: files, end_time: end_time}
- else
- %{state | files: files}
+ files_imported =
+ files
+ |> Enum.all?(fn {_path, %Csv2sql.File{status: status}} -> status == :done end)
+
+ insert_data = Helpers.get_config(:insert_data)
+
+ cond do
+ files_imported and
+ ((insert_data and not is_nil(state.validation_status)) or not insert_data) ->
+ end_time = DateTime.utc_now()
+ Enum.each(subscribers, fn subscriber -> Process.send(subscriber, :finish, []) end)
+
+ %{state | status: :finish, files: files, end_time: end_time}
+
+ files_imported and insert_data and is_nil(state.validation_status) ->
+ %{state | status: :imported, files: files}
+
+ true ->
+ %{state | files: files}
end
end
end
diff --git a/apps/csv2sql/lib/csv2sql/progress_tracker/state.ex b/apps/csv2sql/lib/csv2sql/progress_tracker/state.ex
index 32098da..0c1c88a 100644
--- a/apps/csv2sql/lib/csv2sql/progress_tracker/state.ex
+++ b/apps/csv2sql/lib/csv2sql/progress_tracker/state.ex
@@ -11,7 +11,7 @@ defmodule Csv2sql.ProgressTracker.State do
field(:start_time, DateTime.t())
field(:end_time, DateTime.t(), enforce: false)
field(:files, files_map(), default: %{})
- field(:status, :init | :working | :finish | {:error, String.t()}, default: :init)
+ field(:status, :init | :working | :imported | :validating | :finish | {:error, String.t()}, default: :init)
field(:subscribers, list(pid()), default: [])
field(:validation_status, :passed | :failed, default: nil)
end
diff --git a/apps/csv2sql/lib/csv2sql/stages/analyze.ex b/apps/csv2sql/lib/csv2sql/stages/analyze.ex
index 55be0fe..8591e98 100644
--- a/apps/csv2sql/lib/csv2sql/stages/analyze.ex
+++ b/apps/csv2sql/lib/csv2sql/stages/analyze.ex
@@ -48,14 +48,19 @@ defmodule Csv2sql.Stages.Analyze do
defp wait_for_finish() do
Csv2sql.ProgressTracker.get_state().status
|> case do
- status when status in [:finish] ->
- Csv2sql.ImportValidator.ImportValidator.validate_import()
+ :finish ->
:ok
{:error, reason} ->
IO.inspect("Error #{inspect(reason)}")
:ok
+ :imported ->
+ Csv2sql.ImportValidator.ImportValidator.validate_import()
+
+ ProgressTracker.check_files_status()
+ wait_for_finish()
+
_ ->
ProgressTracker.check_files_status()
wait_for_finish()
diff --git a/apps/csv2sql/lib/csv2sql/type_deducer/type_checker.ex b/apps/csv2sql/lib/csv2sql/type_deducer/type_checker.ex
index 2daea1b..a42ea8b 100644
--- a/apps/csv2sql/lib/csv2sql/type_deducer/type_checker.ex
+++ b/apps/csv2sql/lib/csv2sql/type_deducer/type_checker.ex
@@ -9,6 +9,7 @@ defmodule Csv2sql.TypeDeducer.TypeChecker do
@spec check_type(String.t(), type_map()) :: type_map()
# credo:disable-for-next-line Credo.Check.Refactor.CyclomaticComplexity
def check_type(item, existing_type_map) do
+ item_length = String.length(item)
if item |> String.trim() |> is_empty?() do
Map.put(existing_type_map, :is_empty, existing_type_map.is_empty && true)
else
@@ -21,8 +22,7 @@ defmodule Csv2sql.TypeDeducer.TypeChecker do
is_boolean: existing_type_map.is_boolean && is_boolean?(item),
is_integer: existing_type_map.is_integer && is_integer?(item),
is_float: existing_type_map.is_float && is_float?(item),
- is_text: existing_type_map.is_text || is_text?(item),
- max_data_length: max(existing_type_map.max_data_length, String.length(item))
+ is_text: existing_type_map.is_text || is_text?(item_length)
}
end
end
@@ -73,9 +73,9 @@ defmodule Csv2sql.TypeDeducer.TypeChecker do
end
end
- defp is_text?(item) do
+ defp is_text?(item_length) do
varchar_limit = Helpers.get_config(:varchar_limit)
- if String.length(item) > varchar_limit, do: true, else: false
+ if item_length > varchar_limit, do: true, else: false
end
defp parse_datetime_pattern(datetime_string, pattern) do
diff --git a/apps/csv2sql/lib/csv2sql/type_deducer/type_deducer.ex b/apps/csv2sql/lib/csv2sql/type_deducer/type_deducer.ex
index 974649e..714c4e4 100644
--- a/apps/csv2sql/lib/csv2sql/type_deducer/type_deducer.ex
+++ b/apps/csv2sql/lib/csv2sql/type_deducer/type_deducer.ex
@@ -96,8 +96,7 @@ defmodule Csv2sql.TypeDeducer do
is_boolean: true,
is_integer: true,
is_float: true,
- is_text: false,
- max_data_length: 0
+ is_text: false
}
|> List.duplicate(Enum.count(headers))
end
@@ -111,8 +110,7 @@ defmodule Csv2sql.TypeDeducer do
is_boolean: acc_map.is_boolean && current_map.is_boolean,
is_integer: acc_map.is_integer && current_map.is_integer,
is_float: acc_map.is_float && current_map.is_float,
- is_text: acc_map.is_text || current_map.is_text,
- max_data_length: max(acc_map.max_data_length, current_map.max_data_length)
+ is_text: acc_map.is_text || current_map.is_text
}
end
end
diff --git a/apps/dashboard/lib/dashboard_web/live/helpers.ex b/apps/dashboard/lib/dashboard_web/live/helpers.ex
index c8c6814..c4dfd4a 100644
--- a/apps/dashboard/lib/dashboard_web/live/helpers.ex
+++ b/apps/dashboard/lib/dashboard_web/live/helpers.ex
@@ -8,7 +8,7 @@ defmodule DashboardWeb.LiveHelpers do
def get_nav_item_attrs(_assigns, _curr_page), do: %{class: "nav-link bg-light-grey"}
- def get_config_item_attrs(status) when status != :working,
+ def get_config_item_attrs(status) when status not in [:working, :imported, :validating],
do: %{"phx-click" => "page-change", "phx-value-page" => "config"}
def get_config_item_attrs(_status), do: %{}
diff --git a/apps/dashboard/lib/dashboard_web/live/main_live.ex b/apps/dashboard/lib/dashboard_web/live/main_live.ex
index 0fae194..07440f1 100644
--- a/apps/dashboard/lib/dashboard_web/live/main_live.ex
+++ b/apps/dashboard/lib/dashboard_web/live/main_live.ex
@@ -59,7 +59,7 @@ defmodule DashboardWeb.Live.MainLive do
{:noreply, socket}
- socket_state.status == :working or is_nil(socket_state.validation_status) ->
+ socket_state.status in [:working, :imported, :validating] ->
{:noreply, socket}
true ->
@@ -182,7 +182,7 @@ defmodule DashboardWeb.Live.MainLive do
|> Float.round()
end
- if state.status in [:init, :working] or is_nil(state.validation_status) do
+ if state.status in [:init, :working, :imported, :validating] do
Process.send_after(self(), :update_state, 200)
end
diff --git a/apps/dashboard/lib/dashboard_web/live/start_live.ex b/apps/dashboard/lib/dashboard_web/live/start_live.ex
index d3f6462..71908e0 100644
--- a/apps/dashboard/lib/dashboard_web/live/start_live.ex
+++ b/apps/dashboard/lib/dashboard_web/live/start_live.ex
@@ -16,18 +16,16 @@ defmodule DashboardWeb.Live.StartLive do
Error! check logs
<% true -> %>
- <%= @state.status %>
+ <%= show_status(@state.status) %>
<% end %>
-
- <%= if not is_nil(@state.validation_status) do %>
-
- Validation Status:
- <%= @state.validation_status %>!
-
- <% end %>
-
+ <%= if not is_nil(@state.validation_status) do %>
+
+ Validation Status:
+ <%= @state.validation_status %>!
+
+ <% end %>
<%= if @state.status != :init do %>
Total Files: <%= Enum.count(Map.values(@state.files)) %>
Files Imported: <%= Enum.count(Map.values(@state.files), fn %{status: status} -> status == :done end) %>
@@ -90,8 +88,9 @@ defmodule DashboardWeb.Live.StartLive do
<%= cond do %>
<% @state.status == :init -> %> Start!
- <% @state.status == :working -> %> Working..
- <% @state.status == :finish and @state.validation_status == :passed -> %> Finished! Reset?
+ <% @state.status in [:working, :imported] -> %> Working..
+ <% @state.status == :validating -> %> Validating..
+ <% @state.status == :finish -> %> Finished! Reset?
<% true -> %> ERROR! Reset?
<% end %>
@@ -110,9 +109,15 @@ defmodule DashboardWeb.Live.StartLive do
end
defp spinner_loading_class(status) do
- if status == :working, do: "spinner loading", else: ""
+ if status in [:working, :imported, :validating],
+ do: "spinner loading",
+ else: ""
end
+ defp show_status(:imported), do: :working
+
+ defp show_status(status), do: status
+
defp button_class(changeset) do
if not changeset.valid?, do: "button-disabled", else: "button-enabled"
end