Purpose
Import the values of object-dtype DataFrame columns into Treasure Data as the string type.
Behaviors
Currently, when we import such data into Treasure Data with `pandas_td.to_td`, object columns are not necessarily stored as strings. For example:
```python
import pytd.pandas_td as td
import pandas as pd

# Both columns hold strings (object dtype in the DataFrame)
df = pd.DataFrame([{'id':'00123', 'num': '45678'}])
con = td.connect()
td.to_td(df, 'kasai_test.pytd', con, if_exists='replace')
```
In this case, both `id` and `num` are stored as `long` in Treasure Data even though they are object columns in the DataFrame, so `id` loses its leading zeros.
I suspect the conversion to CSV causes this, since CSV has no type system and the column types have to be inferred from the text on import.
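The suspected mechanism can be reproduced with pandas alone. This sketch does not call pytd or Treasure Data; it only assumes that the CSV-based import infers column types from the text in a way similar to `pandas.read_csv`.

```python
import io

import pandas as pd

# Same data as the example above: both columns hold strings.
df = pd.DataFrame([{"id": "00123", "num": "45678"}])

# Write to CSV text and read it back. CSV stores no column types,
# so the reader has to guess them from the text.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
round_tripped = pd.read_csv(buf)

print(df.dtypes.to_dict())             # string columns
print(round_tripped.dtypes.to_dict())  # both columns inferred as int64
print(round_tripped.loc[0, "id"])      # 123: the leading zeros are gone
```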
However, `BulkImportWriter.write_dataframe` has accepted `fmt='msgpack'` since version 1.0.0.
Thus, we can store these values as strings by calling `write_dataframe` explicitly:
```python
import pytd
import pandas as pd

# Two rows of string values (object dtype)
df = pd.DataFrame({'id': '00324', 'test': '5678'}, index=range(2))

client = pytd.Client()
table = pytd.table.Table(client, "kasai_test", "pytd")
writer = pytd.writer.BulkImportWriter()
# fmt="msgpack" avoids the CSV conversion, so the values stay strings
writer.write_dataframe(df, table, if_exists="overwrite", fmt="msgpack")
```
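Until `to_td` exposes the format, a caller might want to decide when the msgpack workaround is needed. The sketch below uses a hypothetical helper, `choose_fmt`, that is not part of pytd; it assumes a CSV-based import may re-type any string column whose values look numeric, and it inspects values rather than dtypes so it does not depend on how the pandas version represents string columns.

```python
import pandas as pd


def choose_fmt(df: pd.DataFrame) -> str:
    """Hypothetical helper (not a pytd API): pick a bulk-import format.

    Returns "msgpack" if any column contains string values that look
    numeric, since a CSV-based import could re-type them; else "csv".
    """
    for col in df.columns:
        # Only string values are at risk; numeric columns are skipped.
        strings = [v for v in df[col].dropna() if isinstance(v, str)]
        if not strings:
            continue
        # errors="coerce" turns strings that are not numbers into NaN.
        parsed = pd.to_numeric(pd.Series(strings, dtype=object), errors="coerce")
        if parsed.notna().any():
            return "msgpack"
    return "csv"


df = pd.DataFrame({'id': '00324', 'test': '5678'}, index=range(2))
print(choose_fmt(df))  # msgpack
```

The result could then be passed as `fmt` to `write_dataframe`. Checking for *any* numeric-looking value is a deliberately conservative choice, since exactly how Treasure Data infers types from CSV is not specified here.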
Because pytd already lets users choose the format when using `bulk_import`, users might reasonably expect to set it through `to_td` as well, just as they can with `write_dataframe`.
Thus, I propose that `to_td` accept a format option so that `msgpack` can be selected.
What do you think?