Added unique naming for output files#22
Added unique naming for output files#22davidverweij merged 9 commits intodavidverweij:masterfrom slmnsh:master
Conversation
now can use custom naming using -n parameter -n [column_name]
Renamed "csv2docx" to "src" now code looks clean
Using @jawrainey's pathlib solution for checking and creating directory Changed logic to name files now it is based on availability of file Now file name is required Overall Cleanup
Merge branch 'master' of github.com:salmannotkhan/csv2docx
now using relative import in cli
|
Sorry for messy commits. I didn't knew how to clean up commits :) |
|
Please address my comments in the code review and the following issues identified by Install flake8 locally (DO NOT COMMIT THIS!!):
Use the python virtual environment
Run flake8 from the command line:
You'll see the following issues identified: Rather than me write an in-depth code review, the above are all important concerns that you should address, especially E711. Create one more commit addressing the above and give it the message "Fixed Flake8 issues." |
csv2docx/csv2docx.py
Outdated
| csvdict = csv.DictReader(csvfile, delimiter=delimiter) | ||
| csv_headers = csvdict.fieldnames | ||
|
|
||
| if (custom_name != None and custom_name not in csv_headers): |
There was a problem hiding this comment.
@salmannotkhan - This is E711 as identified by flake8. You can write custom_name instead of custom_name != None
There was a problem hiding this comment.
now that we made custom_name required, we don't need that condition.
There was a problem hiding this comment.
Great point -- the IF statement needs to stay of course due to checking if the argument exists in the csv 👍
csv2docx/csv2docx.py
Outdated
| """Checks whether file with same name exists or not | ||
| Args: | ||
| filename: the file we want to check | ||
| path: the directory we want to check in | ||
| Returns: | ||
| An unique full path with directory. | ||
| """ |
There was a problem hiding this comment.
Good documentation -- well done
csv2docx/csv2docx.py
Outdated
| filename += ".docx" | ||
| checkpath = path / filename | ||
| if checkpath.exists(): | ||
| # Count available files with same name | ||
| counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1 | ||
| return f"{path}/{checkpath.stem}_{counter}.docx" | ||
| return checkpath |
There was a problem hiding this comment.
Very tidy and excellent solution, especially the use of both glob and pathlib.
I wonder if instead of using two return statements that you could assign checkpath to the f-string in line 30? e.g.
if checkpath.exists():
# Count available files with same name
counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1
checkpath = f"{path}/{checkpath.stem}_{counter}.docx"
return checkpathThere was a problem hiding this comment.
This and other code review suggestions should be in a separate commit alongside the flake8 changes.
csv2docx/cli.py
Outdated
| @click.option( | ||
| '--name', '-n', | ||
| required=True, | ||
| help='naming scheme for output files.') |
There was a problem hiding this comment.
Can you capitalise naming -> Naming?
|
Couple of points for future PRs (we should probably make a standard guideline for PRs), but generally you should:
It's recommended to use titles/subtitles to document these 3 areas. Worth referring to this in future PRs as it makes code reviewing easier 👍 |
csv2docx/csv2docx.py
Outdated
| def convert(data, template, delimiter=";"): | ||
| from pathlib import Path | ||
|
|
||
| def create_path_if_not_exists(path: str): |
There was a problem hiding this comment.
You need to annotate the return type, e.g. adding pathlib.Path:
def create_path_if_not_exists(path: str) -> pathlib.Path:
csv2docx/csv2docx.py
Outdated
| path.mkdir(parents=True, exist_ok=True) | ||
| return path | ||
|
|
||
| def generate_name(filename, path): |
There was a problem hiding this comment.
You need to add type annotations to both the paramters and return type, e.g.
def generate_name(str: filename, pathlib.Path: path) -> pathlib.Path:
csv2docx/csv2docx.py
Outdated
| # TODO: write to user-defined subfolder | ||
| docx.write(f"{counter}.docx") | ||
| # Striping every name to remove extra spaces | ||
| filename = generate_name(row[custom_name].strip(), path) |
There was a problem hiding this comment.
call .strip() inside generate_name as imho it's cleaner that way since we can more easily understand which params are being sent
csv2docx/csv2docx.py
Outdated
| filename: the file we want to check | ||
| path: the directory we want to check in |
There was a problem hiding this comment.
Update the docstring args to say:
filename: the name of the file to create
path: the path where the file is stored
csv2docx/csv2docx.py
Outdated
| An unique full path with directory. | ||
| """ | ||
| filename += ".docx" | ||
| checkpath = path / filename |
There was a problem hiding this comment.
can we rename checkpath to filepath because this method returns the path of the file to create
|
Once you've made the above changes I will have another review and discuss with @davidverweij. For future reference: code changes made in a PR should not draw from other PRs, e.g. you using the You're basing this PR on multiple other issues and PRs (closes #16 #13, #18, #20) which are work in progresses and INDEPENDENT branches. By bringing together all the code from across these independent PRs you have made it: (i) more difficult to test and review; and (ii) muddied the specific issues. Each issue and PR must isolate changes to make testing easier. Having said that, this PR is an elegant and simple solution to address #20, so from that regard, good job @salmannotkhan. For future reference: if you have an alternative solution to a PR then present it in that PR via comment or gist and explain how it addresses the issue. This will save us opening more PRs for the same issue. This solution will also close #13 and #18 and requires that we create a new issue for Dave to implement the testing infrastructure. |
csv2docx/csv2docx.py
Outdated
| checkpath = path / filename | ||
| if checkpath.exists(): | ||
| # Count available files with same name | ||
| counter = len(list(path.glob(checkpath.stem + "*docx" ))) + 1 |
There was a problem hiding this comment.
Can we also update this to use an f-string instead, e.g.
counter = len(list(path.glob(f"{checkpath.stem}*docx"))) + 1
csv2docx/csv2docx.py
Outdated
| path.mkdir(parents=True, exist_ok=True) | ||
| return path | ||
|
|
||
| def generate_name(filename, path): |
There was a problem hiding this comment.
Rename generate_name to create_unique_name
csv2docx/cli.py
Outdated
| def main(data, template, delimiter, name, path): | ||
| csv2docx.convert(data, template, delimiter, name, path) |
There was a problem hiding this comment.
The proposed change below requires this to be reordered as:
data, template, path, delimiter, name
There was a problem hiding this comment.
we have put it in order like:
data, template, custom_name, path, delimiter
Because we made custom_name required that doesn't have any default argument
and all non-default argument comes first.
There was a problem hiding this comment.
Another good point: name should be renamed to custom_name to mirror the library.
csv2docx/csv2docx.py
Outdated
| def convert(data, template, delimiter=";"): | ||
| from pathlib import Path | ||
|
|
||
| def create_path_if_not_exists(path: str): |
There was a problem hiding this comment.
Can we also rename create_path_if_not_exists to: create_output_folder
csv2docx/csv2docx.py
Outdated
| return f"{path}/{checkpath.stem}_{counter}.docx" | ||
| return checkpath | ||
|
|
||
| def convert(data, template, delimiter=";", custom_name=None, path="output"): |
There was a problem hiding this comment.
custom_name is required so should not have a default argument of None.
Once you update this, the program will not run as custom_name must come before delimiter as it's a required argument.
Update the CLI method to reflect these changes
|
Thank you for all the suggestion and sorry for all that mess. if you don't know this is my first contribution to a project that's why i don't know things about PR and repo management |
How can I ignore this in commit? |
No worries. It's great that you're contributing and learning as Dave and I have setup this project both because we would like to use the library but also as a learning experience 👍
Make sure that you do not commit the changes to the If you make the above changes I can review them this evening (am in GMT+1 timezone). |
and other improvment from review by @jawrainey Ignoring vscode config folder
|
I'm done with PR if your guys think of any new feature let me know please |
csv2docx/cli.py
Outdated
| def main(data, template, name, delimiter, path): | ||
| csv2docx.convert(data, template, name, path, delimiter) |
There was a problem hiding this comment.
Can you change 'main' to accept the same order of parameters as convert? e.g. switch delimiter and path
def convert(data, template, name, path="output", delimiter=";"):There was a problem hiding this comment.
Yeah I didn't noticed that
csv2docx/csv2docx.py
Outdated
| filename: the name of file to create | ||
| path: the path where the file is stored | ||
| Returns: | ||
| An unique full path with directory. |
There was a problem hiding this comment.
Can you update the doc of 'Returns' to read:
The absolute path to the filename.
csv2docx/csv2docx.py
Outdated
|
|
||
|
|
||
| def create_unique_name(filename: str, path: Path) -> Path: | ||
| """Checks whether file with same name exists or not |
There was a problem hiding this comment.
Can you update this text (Checks whether ...) to: Creates a filename if it does not exist in the specified path.
There was a problem hiding this comment.
this would be more clear I guess
"Creates an unique file name for specified path"
csv2docx/csv2docx.py
Outdated
| docx.merge_templates([single_document], separator='page_break') | ||
| # TODO: write to user-defined subfolder | ||
| docx.write(f"{counter}.docx") | ||
| # Striping every name to remove extra spaces |
There was a problem hiding this comment.
Can you remove this comment (# Striping every ... ) as it's left from previous commits 👍
| help='Delimiter used in your csv. Default is \';\'') | ||
| @click.option( | ||
| '--name', '-n', | ||
| required=True, |
There was a problem hiding this comment.
I think making --name required is a reasonable approach, though would argue we need to allow flexibility of use. I imagine the scenario of generating un-named tickets, numbered flyers, etc - in which case the name of the files does not need to have a relation to the data. I'll expand on this in a separate issue.
There was a problem hiding this comment.
Just to add, some columns might be incomplete - which might provide a result the user is not expected. We can have a discussion in the issue #25 on what approach might make more sense.
There was a problem hiding this comment.
Great point about incomplete cells 👍
| @click.option( | ||
| '--name', '-n', | ||
| required=True, | ||
| help='Naming scheme for output files.') |
There was a problem hiding this comment.
Perhaps we can expand on the help by adding:
Specific column name to be used in the naming scheme for output files. '
There was a problem hiding this comment.
Also - is this case sensitive?
davidverweij
left a comment
There was a problem hiding this comment.
Nicely done @salmannotkhan, I've added a few comments - but these are more related to future Issues. I'll merge these into the master - after which our focus will be to set up unittesting, linting and automating those processes.
|
I forgot to mention, I updated the README to reflect the changes before merging this PR, see 202fdf9. |
used @jawrainey create output directory function using pathlib
using same pathlib to generate unique names now based on file existance in directory
now -n name option is compulsary
just bit more cleaned code using docstring comments thanks @jawrainey for suggestion