Skip to content

Conversation

@bertrandhaut
Copy link
Contributor

closes #781

The 'decimal' option exists for read_csv method but not yet in 'to_csv' methods.
The lack of this option is particulary painful when we have to work with Excel with European regional settings.

This modification add this option to both Series.to_csv and Dataframe.to_csv.

@jreback jreback added IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label Output-Formatting __repr__ of pandas objects, to_string labels Oct 3, 2014
@jreback jreback added this to the 0.15.1 milestone Oct 3, 2014
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you write this in a way that avoids redundant code? e.g., define formatter = lambda x: x.replace('.', decimal, 1) depending on the desired formatting and then use the function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New proposal submited

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you actually want to create a new formatter, but NOT in the lambda, e.g.

if float_format and decimal != '.':
            f = float_format.replace('.',decimal,1)
            formatter = lambda v: f % v

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand you point.

If the float_format specified is something like '%.3f' just replacing the '.' by a ',' will lead to an uncorrect formater.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you construt the integer/decimal portion separatly then you can do it (you might have to 'parse' the formatter a bit), somethign like %d%s%d (but then you'll have to do some int truncation and such).

nvm. your original is prob ok (though will be slow)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest to keep this first version as such and I will submit another pull request if I'm able to find a better solution (ideally we should also handle the case where the float_format is a class accepting the '%' operator).

The slowness is real but will be close to what people are currently doing in (iterating over the final result to replace all '.' by ',')

@jreback
Copy link
Contributor

jreback commented Oct 9, 2014

along those lines, would be nice to add a vbench for this as well (and then you'll know!)
see here

@jorisvandenbossche
Copy link
Member

@jreback Can this be revisited?

Is there something that needs to be changed by @bertrandhaut (apart from adding a vbench)

@shoyer
Copy link
Member

shoyer commented Feb 9, 2015

I think this needs a test?

@bertrandhaut
Copy link
Contributor Author

As suggested, I've added a test for this new option.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use standard single quotes here? ' instead of

@bertrandhaut
Copy link
Contributor Author

Joris' comments taken into account

@jorisvandenbossche
Copy link
Member

@jreback Is this good to go? (not really familiar with this)

@bertrandhaut Before merging, we ask to squash the commits into one. Can you do that (https://github.com/pydata/pandas/wiki/Using-Git#fetch-and-then-rebase-interactively-to-squash-reword-fix-otherwise-change-some-commits, if you have questions, just ask!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue number as a comment here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried to squash the commits as described but I faced the following error:
$git rebase pandas/master
fatal: Needed a single revision
invalid upstream pandas/master

Any idea ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try this:

git checkout master
git pull pandas/master master
git checkout your-branch
git rebase master

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be something special that I do to create my local version...

When I do the command "git checkout master" in my to level directory I get
error: pathspec 'master' did not match any file(s) known to git.

If I do it in the "pandas" directory (the one with the setup.py file) I get:
Already on 'master'
Your branch is up-to-date with 'origin/master'.
but then the second command git pull pandas/master master lead to
fatal: 'pandas/master' does not appear to be a git repository
fatal : Could not read from remote repository

By the way, in my repository, I don't think I've made any branch. I've done
everything in master. Maybe was that not a good idea ?

On Tue, Mar 3, 2015 at 9:31 AM, Stephan Hoyer notifications@github.com
wrote:

In pandas/tests/test_format.py
#8448 (comment):

@@ -2343,7 +2343,21 @@ def test_csv_to_string(self):
df = DataFrame({'col' : [1,2]})
expected = ',col\n0,1\n1,2\n'

self.assertEqual(df.to_csv(), expected)

  • def test_to_csv_decimal(self):
  •    df = DataFrame({'col1' : [1], 'col2' : ['a'], 'col3' : [10.1] })
    

Try this:

git checkout master
git pull pandas/master master
git checkout your-branch
git rebase master


Reply to this email directly or view it on GitHub
https://github.com/pydata/pandas/pull/8448/files#r25670074.

Bertrand Haut

@jreback
Copy link
Contributor

jreback commented Mar 1, 2015

@bertrandhaut can you do a vbench run on the csv's to make sure that perf is unchanged, see here:https://github.com/pydata/pandas/wiki/Performance-Testing

@bertrandhaut
Copy link
Contributor Author

For the vbench I tried but without success. For information I've only python3 on my computer and it seems that vbench is still not ported to python3.

@jorisvandenbossche
Copy link
Member

About the vbench, I think that is correct that is not yet ported to python 3.

@jreback
Copy link
Contributor

jreback commented Mar 5, 2015

@bertrandhaut can you rebase and squash to a single commit. looks ok otherwise to me.

@jorisvandenbossche
Copy link
Member

@bertrandhaut You were indeed working on master. In the future, better to always first make a branch if you are going to work on a certain feature/fix. But for this PR, it is OK.

Given you are working on master, this should work to rebase/squash:

git fetch upstream
git rebase -i upstream/master

@jreback
Copy link
Contributor

jreback commented Mar 7, 2015

merged via 671c4b3

thanks!

@jreback jreback closed this Mar 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

IO CSV read_csv, to_csv IO Data IO issues that don't fit into a more specific label Output-Formatting __repr__ of pandas objects, to_string

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Handle European decimal formats in to_csv

4 participants