Added unique naming for output files by slmnsh · Pull Request #22 · davidverweij/csv2docx

slmnsh · 2020-05-05T20:10:56Z

Used @jawrainey's create-output-directory function, which is built on pathlib.
Used the same pathlib approach to generate unique names, now based on whether a file already exists in the directory.
The -n name option is now compulsory.
Slightly cleaner code with docstring comments; thanks @jawrainey for the suggestion.

slmnsh · 2020-05-05T20:12:39Z

Sorry for the messy commits. I didn't know how to clean up commits :)

jawrainey · 2020-05-05T21:11:11Z

Please address my comments in the code review and the following issues identified by flake8, then we can discuss it further:

Install flake8 locally (DO NOT COMMIT THIS!!):

poetry add --dev flake8

Use the python virtual environment

poetry shell

Run flake8 from the command line:

flake8

You'll see the following issues identified:

./csv2docx/csv2docx.py:5:1: E302 expected 2 blank lines, found 1
./csv2docx/csv2docx.py:17:1: E302 expected 2 blank lines, found 1
./csv2docx/csv2docx.py:29:62: E202 whitespace before ')'
./csv2docx/csv2docx.py:33:1: E302 expected 2 blank lines, found 1
./csv2docx/csv2docx.py:34:10: E211 whitespace before '('
./csv2docx/csv2docx.py:39:25: E711 comparison to None should be 'if cond is not None:'
./csv2docx/csv2docx.py:40:17: E117 over-indented
./csv2docx/csv2docx.py:51:18: E211 whitespace before '('
./csv2docx/csv2docx.py:60:35: E203 whitespace before ':'
./csv2docx/csv2docx.py:64:33: W292 no newline at end of file
./csv2docx/cli.py:27:1: W191 indentation contains tabs
./csv2docx/cli.py:27:1: E101 indentation contains mixed spaces and tabs
./csv2docx/cli.py:27:2: E117 over-indented

Rather than me writing an in-depth code review: the above are all important concerns that you should address, especially E711.

Create one more commit addressing the above and give it the message "Fixed Flake8 issues."

jawrainey · 2020-05-05T21:12:05Z

csv2docx/csv2docx.py

        csvdict = csv.DictReader(csvfile, delimiter=delimiter)
        csv_headers = csvdict.fieldnames
-
+        if (custom_name != None and custom_name not in csv_headers):


@salmannotkhan - This is E711 as identified by flake8. You can write custom_name instead of custom_name != None

Now that we have made custom_name required, we don't need that condition.

Great point -- the IF statement needs to stay of course due to checking if the argument exists in the csv 👍
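For context on E711, here is a minimal sketch of why flake8 prefers an identity check over `!= None`. The class `AlwaysEqual` is hypothetical and exists only for illustration; it is not part of this PR. It also shows that a plain truthiness check is a third, slightly different test.

```python
class AlwaysEqual:
    """Hypothetical class whose equality check claims to equal everything."""

    def __eq__(self, other):
        return True


value = AlwaysEqual()

# `!=` is derived from __eq__, which any class may override,
# so this object misleadingly appears to be None.
looks_like_none = not (value != None)  # noqa: E711 (deliberate, for illustration)

# `is` compares identity and cannot be overridden.
really_none = value is None

# Truthiness differs again: an empty string is not None, but it is falsy.
empty = ""
empty_is_not_none = empty is not None
empty_is_truthy = bool(empty)
```

In this PR the argument comes from a required CLI option, so either check would likely behave the same in practice; the identity form is simply the one that cannot be fooled.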

jawrainey · 2020-05-05T21:13:46Z

csv2docx/csv2docx.py

+    """Checks whether file with same name exists or not
+    Args:
+        filename: the file we want to check
+        path: the directory we want to check in
+    Returns:
+        An unique full path with directory.
+    """


Good documentation -- well done

jawrainey · 2020-05-05T21:14:26Z

csv2docx/csv2docx.py

+    filename += ".docx"
+    checkpath = path / filename
+    if checkpath.exists():
+        # Count available files with same name
+        counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
+        return f"{path}/{checkpath.stem}_{counter}.docx"
+    return checkpath


Very tidy and excellent solution, especially the use of both glob and pathlib.

I wonder whether, instead of using two return statements, you could assign the f-string to checkpath in line 30? e.g.

    if checkpath.exists():
        # Count available files with same name
        counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
        checkpath = f"{path}/{checkpath.stem}_{counter}.docx"
    return checkpath

This and other code review suggestions should be in a separate commit alongside the flake8 changes.
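A minimal sketch of the single-return version, using the names adopted later in this review (`create_unique_name`, `filepath`). It is an illustration under assumptions rather than the merged code: it calls `.strip()` inside the function and always returns a `Path` rather than a string.

```python
from pathlib import Path


def create_unique_name(filename: str, path: Path) -> Path:
    """Return a path inside `path` for `filename` that avoids an existing file."""
    filepath = path / f"{filename.strip()}.docx"
    if filepath.exists():
        # Count existing files whose names start with the same stem.
        counter = len(list(path.glob(f"{filepath.stem}*docx"))) + 1
        filepath = path / f"{filepath.stem}_{counter}.docx"
    return filepath
```

Note that the glob also matches unrelated names that merely start with the same stem, so the numbering may skip values; this sketch does not try to address that.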

jawrainey · 2020-05-05T21:18:08Z

csv2docx/cli.py

+@click.option(
+    '--name', '-n',
+    required=True,
+    help='naming scheme for output files.')


Can you capitalise naming -> Naming?

jawrainey · 2020-05-05T21:19:41Z

Couple of points for future PRs (we should probably make a standard guideline for PRs), but generally you should:

State what you changed and (most importantly) why;
Explain how we (the reviewers) can test and review your code. I want step-by-step instructions so anyone reading this PR can also test it. In this case, I had to run convert with -n and without -n to ensure it worked in both conditions. I also had to change the output parameter.
Reference any other issues or PRs that this PR addresses.

It's recommended to use titles/subtitles to document these 3 areas. Worth referring to this in future PRs as it makes code reviewing easier 👍

jawrainey · 2020-05-05T21:25:18Z

csv2docx/csv2docx.py

-def convert(data, template, delimiter=";"):
+from pathlib import Path
+
+def create_path_if_not_exists(path: str):


You need to annotate the return type, e.g. adding pathlib.Path:

def create_path_if_not_exists(path: str) -> pathlib.Path:
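As a rough sketch of what the annotated helper might look like (using the name `create_output_folder` suggested later in this review; the body is an assumption based on the `mkdir` call shown in the diff). Because the diff imports `Path` directly with `from pathlib import Path`, the annotation can be written as `Path`.

```python
from pathlib import Path


def create_output_folder(path: str) -> Path:
    """Create the output folder (and any missing parents) and return it as a Path."""
    folder = Path(path)
    # exist_ok=True makes the call safe when the folder already exists.
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```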

jawrainey · 2020-05-05T21:26:59Z

csv2docx/csv2docx.py

+        path.mkdir(parents=True, exist_ok=True)
+    return path
+
+def generate_name(filename, path):


You need to add type annotations to both the parameters and return type, e.g.

def generate_name(filename: str, path: pathlib.Path) -> pathlib.Path:

jawrainey · 2020-05-05T21:29:31Z

csv2docx/csv2docx.py

-            # TODO: write to user-defined subfolder
-            docx.write(f"{counter}.docx")
+            # Striping every name to remove extra spaces
+            filename = generate_name(row[custom_name].strip(), path)


Call .strip() inside generate_name; IMHO it's cleaner that way, since we can more easily understand which params are being sent.

jawrainey · 2020-05-05T21:32:13Z

csv2docx/csv2docx.py

+        filename: the file we want to check
+        path: the directory we want to check in


Update the docstring args to say:

filename: the name of the file to create
path: the path where the file is stored

jawrainey · 2020-05-05T21:32:46Z

csv2docx/csv2docx.py

+        An unique full path with directory.
+    """
+    filename += ".docx"
+    checkpath = path / filename


Can we rename checkpath to filepath, because this method returns the path of the file to create?

jawrainey · 2020-05-05T21:46:51Z

Once you've made the above changes I will have another review and discuss with @davidverweij.

For future reference: code changes made in a PR should not draw from other PRs, e.g. you using the absolute_path method I created in #18. My reasoning is two-fold: (i) you're attributing your name to my commit/contribution; and (ii) if we want to accept this PR and merge it into master then #18 will have to rebase, which is a pain.

You're basing this PR on multiple other issues and PRs (closes #16 #13, #18, #20) which are works in progress and INDEPENDENT branches. By bringing together all the code from across these independent PRs you have: (i) made it more difficult to test and review; and (ii) muddied the specific issues. Each issue and PR must isolate changes to make testing easier.

Having said that, this PR is an elegant and simple solution to address #20, so in that regard, good job @salmannotkhan. For future reference: if you have an alternative solution to a PR then present it in that PR via comment or gist and explain how it addresses the issue. This will save us opening more PRs for the same issue.

This solution will also close #13 and #18 and requires that we create a new issue for Dave to implement the testing infrastructure.

jawrainey · 2020-05-05T21:59:48Z

csv2docx/csv2docx.py

+    checkpath = path / filename
+    if checkpath.exists():
+        # Count available files with same name
+        counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1


Can we also update this to use an f-string instead, e.g.

counter = len(list(path.glob(f"{checkpath.stem}*docx"))) + 1

jawrainey · 2020-05-05T22:22:42Z

csv2docx/csv2docx.py

+        path.mkdir(parents=True, exist_ok=True)
+    return path
+
+def generate_name(filename, path):


Rename generate_name to create_unique_name

jawrainey · 2020-05-05T22:23:24Z

csv2docx/cli.py

+def main(data, template, delimiter, name, path):
+	csv2docx.convert(data, template, delimiter, name, path)


The proposed change below requires this to be reordered as:

data, template, path, delimiter, name

We have put them in this order:
data, template, custom_name, path, delimiter
because we made custom_name required, so it has no default argument,
and all non-default arguments must come first.

Another good point: name should be renamed to custom_name to mirror the library.

jawrainey · 2020-05-05T22:23:35Z

csv2docx/csv2docx.py

-def convert(data, template, delimiter=";"):
+from pathlib import Path
+
+def create_path_if_not_exists(path: str):


Can we also rename create_path_if_not_exists to: create_output_folder

jawrainey · 2020-05-05T22:24:34Z

csv2docx/csv2docx.py

+        return f"{path}/{checkpath.stem}_{counter}.docx"
+    return checkpath
+
+def convert(data, template, delimiter=";", custom_name=None, path="output"):


custom_name is required so should not have a default argument of None.

Once you update this, the program will not run as custom_name must come before delimiter as it's a required argument.

Update the CLI method to reflect these changes
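The ordering rule discussed here can be checked directly. The sketch below compiles a signature with a required parameter after a defaulted one to show that Python rejects it, then defines a stub `convert` with the ordering settled on in this review. The stub body is a placeholder, not the library's implementation.

```python
# A parameter without a default may not follow one that has a default.
invalid_signature = (
    'def convert(data, template, delimiter=";", custom_name, path="output"): pass'
)

try:
    compile(invalid_signature, "<example>", "exec")
    rejected = False
except SyntaxError:
    rejected = True


def convert(data, template, custom_name, path="output", delimiter=";"):
    """Stub with required arguments first; it only echoes what it received."""
    return data, template, custom_name, path, delimiter
```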

slmnsh · 2020-05-05T22:58:40Z

Thank you for all the suggestions, and sorry for all that mess. In case you didn't know, this is my first contribution to a project, which is why I don't know much about PR and repo management.

slmnsh · 2020-05-05T23:06:02Z

> Please address my comments in the code review and the following issues identified by flake8, then we can discuss it further:
>
> Install flake8 locally (DO NOT COMMIT THIS!!):
>
> poetry add --dev flake8
>
> Use the python virtual environment
>
> poetry shell
>
> Run flake8 from the command line:
>
> flake8

How can I ignore this in the commit?

jawrainey · 2020-05-06T10:22:51Z

> Thank you for all the suggestions, and sorry for all that mess. In case you didn't know, this is my first contribution to a project, which is why I don't know much about PR and repo management.

No worries. It's great that you're contributing and learning, as Dave and I have set up this project both because we would like to use the library and as a learning experience 👍

> How can I ignore this in the commit?

Make sure that you do not commit the changes to the pyproject.toml/poetry.lock files once you add the dependencies. If you use the terminal with git then check out those files, or do it through a visual git editor (e.g. GitHub Desktop) if that's what you use.

If you make the above changes I can review them this evening (am in GMT+1 timezone).

slmnsh · 2020-05-06T10:34:28Z

I'm done with the PR. If you guys think of any new features, please let me know.

jawrainey · 2020-05-06T10:36:25Z

csv2docx/cli.py

+def main(data, template, name, delimiter, path):
+    csv2docx.convert(data, template, name, path, delimiter)


Can you change 'main' to accept the same order of parameters as convert? e.g. switch delimiter and path

def convert(data, template, name, path="output", delimiter=";"):

Yeah, I didn't notice that.
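One way to make the call less sensitive to parameter order is to pass the optional arguments by keyword. This is only a sketch: `convert` below is a placeholder standing in for the library function, and `main` omits the click decorators used in cli.py.

```python
def convert(data, template, name, path="output", delimiter=";"):
    """Placeholder for the library function; it only echoes its arguments."""
    return {
        "data": data,
        "template": template,
        "name": name,
        "path": path,
        "delimiter": delimiter,
    }


def main(data, template, name, path, delimiter):
    # Keyword arguments keep the call correct even if a signature is reordered later.
    return convert(data, template, name, path=path, delimiter=delimiter)
```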

jawrainey · 2020-05-06T10:37:33Z

csv2docx/csv2docx.py

+        filename: the name of file to create
+        path: the path where the file is stored
+    Returns:
+        An unique full path with directory.


Can you update the doc of 'Returns' to read:

The absolute path to the filename.

jawrainey · 2020-05-06T10:37:58Z

csv2docx/csv2docx.py

+
+
+def create_unique_name(filename: str, path: Path) -> Path:
+    """Checks whether file with same name exists or not


Can you update this text (Checks whether ...) to: Creates a filename if it does not exist in the specified path.

This would be clearer, I guess:
"Creates an unique file name for specified path"

Yes, that sounds good 👍

jawrainey · 2020-05-06T10:38:31Z

csv2docx/csv2docx.py

            docx.merge_templates([single_document], separator='page_break')
-            # TODO: write to user-defined subfolder
-            docx.write(f"{counter}.docx")
+            # Striping every name to remove extra spaces


Can you remove this comment (# Striping every ... ) as it's left from previous commits 👍

davidverweij · 2020-05-06T13:22:55Z

csv2docx/cli.py

+    help='Delimiter used in your csv. Default is \';\'')
+@click.option(
+    '--name', '-n',
+    required=True,


I think making --name required is a reasonable approach, though I would argue we need to allow flexibility of use. I imagine the scenario of generating un-named tickets, numbered flyers, etc., in which case the name of the files does not need to have a relation to the data. I'll expand on this in a separate issue.

Just to add, some columns might be incomplete, which might produce a result the user does not expect. We can have a discussion in issue #25 on what approach might make more sense.

Great point about incomplete cells 👍

davidverweij · 2020-05-06T13:36:00Z

csv2docx/cli.py

+@click.option(
+    '--name', '-n',
+    required=True,
+    help='Naming scheme for output files.')


Perhaps we can expand on the help by adding:
'Specific column name to be used in the naming scheme for output files.'

Also - is this case sensitive?

At the moment, yes. Let's discuss in #25

davidverweij

Nicely done @salmannotkhan, I've added a few comments, but these are more related to future issues. I'll merge these into master, after which our focus will be to set up unit testing, linting and automating those processes.

davidverweij · 2020-05-06T14:21:09Z

I forgot to mention, I updated the README to reflect the changes before merging this PR, see 202fdf9.

slmnsh added 6 commits May 4, 2020 22:20

Customize output name

4e006a9

now can use custom naming using -n parameter -n [column_name]

Setting souce for poetry to work

63e26af

Renamed "csv2docx" to "src" now code looks clean

Merge branch 'master' into master

545b9bd

Optimized code and logic for file names

5d46d41

Using @jawrainey's pathlib solution for checking and creating directory Changed logic to name files now it is based on availability of file Now file name is required Overall Cleanup

Now ignoring all docx files except example files

1fae0e4

Merge branch 'master' of github.com:salmannotkhan/csv2docx

Changed source directory name back to package name

fefa3ab

now using relative import in cli

slmnsh mentioned this pull request May 5, 2020

Customize output name: round 2 #20

Closed

jawrainey reviewed May 5, 2020

jawrainey changed the title ~~Improved Unique naming function~~ Added unique naming for output files May 5, 2020

jawrainey reviewed May 5, 2020

jawrainey mentioned this pull request May 6, 2020

Destination Output and Unit Testing #18

Closed

slmnsh added 2 commits May 6, 2020 15:52

Merge remote-tracking branch 'upstream/master'

db57e56

Fixed Flake8 issues

a054d1b

and other improvment from review by @jawrainey Ignoring vscode config folder

jawrainey reviewed May 6, 2020

Improved Documentation and minor improvmens.

8c658fe

davidverweij reviewed May 6, 2020

davidverweij mentioned this pull request May 6, 2020

User Experience of naming scheme for output files #25

Open

davidverweij reviewed May 6, 2020

davidverweij approved these changes May 6, 2020

davidverweij merged commit 8c658fe into davidverweij:master May 6, 2020
