Type Safety: is silent truncation a bug?

Happy to move this to JIRA if it is confirmed as a bug.

```python
In [8]: import numpy as np
   ...: import pandas as pd
   ...: import pyarrow as arw

In [9]: df = pd.DataFrame({'A': list('abc'), 'B': np.arange(3)})
   ...: df
Out[9]:
   A  B
0  a  0
1  b  1
2  c  2

In [10]: schema = arw.schema([
    ...:     arw.field('A', arw.string()),
    ...:     arw.field('B', arw.int32()),
    ...: ])

In [11]: tbl = arw.Table.from_pandas(df, preserve_index=False, schema=schema)
    ...: tbl
Out[11]:
pyarrow.Table
A: string
B: int32
metadata
--------
{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
            b' "A", "field_name": "A", "pandas_type": "unicode", "numpy_type":'
            b' "object", "metadata": null}, {"name": "B", "field_name": "B", "'
            b'pandas_type": "int32", "numpy_type": "int32", "metadata": null}]'
            b', "pandas_version": "0.23.1"}'}

In [12]: tbl.to_pandas().equals(df)
Out[12]: True
```
So if the `schema` matches the pandas dtypes, all is well: we can round-trip the DataFrame.

Now, say we have some bad data such that column 'B' is of type float64. The dtypes of the DataFrame no longer match the explicitly supplied `schema` object, but rather than raising a `TypeError`, the data is silently truncated: the round-tripped DataFrame doesn't match our input DataFrame, and not even a warning is raised!
```python
In [13]: df.loc[0, 'B'] = 1.23
    ...: df
Out[13]:
   A     B
0  a  1.23
1  b  1.00
2  c  2.00

In [14]: # I would expect/want this to raise a TypeError since the schema doesn't match the pandas datatypes
    ...: tbl = arw.Table.from_pandas(df, preserve_index=False, schema=schema)
    ...: tbl
Out[14]:
pyarrow.Table
A: string
B: int32
metadata
--------
{b'pandas': b'{"index_columns": [], "column_indexes": [], "columns": [{"name":'
            b' "A", "field_name": "A", "pandas_type": "unicode", "numpy_type":'
            b' "object", "metadata": null}, {"name": "B", "field_name": "B", "'
            b'pandas_type": "int32", "numpy_type": "float64", "metadata": null'
            b'}], "pandas_version": "0.23.1"}'}

In [15]: tbl.to_pandas()  # <-- SILENT TRUNCATION!!!
Out[15]:
   A  B
0  a  1
1  b  1
2  c  2

```

To be clear, I would really like `Table.from_pandas` to raise a `TypeError` if the DataFrame dtypes don't match an explicitly supplied schema, and I would hope the current behaviour is considered a bug.

```
win64/py36
arrow 0.9.0
```

Type Safety: is silent truncation a bug? #2217
