Pandas reading csv as string type


Problem description

I have a data frame with alphanumeric keys that I want to save as a csv and read back later. For various reasons I need to read this key column explicitly as strings: some keys are strictly numeric or, even worse, look like 1234E5, which Pandas interprets as a float. This obviously makes the key completely useless.

The problem is that when I specify a string dtype for the data frame, or for any column of it, I just get garbage back. Here is some example code:

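import numpy as np
import pandas as pd
savefile = 'test.csv'  # placeholder path; the question does not show how savefile is defined
df = pd.DataFrame(np.random.rand(2, 2),
                  index=['1A', '1B'],
                  columns=['A', 'B'])
df.to_csv(savefile)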

The data frame looks like this:

           A         B
1A  0.209059  0.275554
1B  0.742666  0.721165

I then read it back like this:

df_read = pd.read_csv(savefile, dtype=str, index_col=0)

The result is:

   A  B
B  (  <

Is this a problem with my computer, something I'm doing wrong here, or just a bug?

Recommended answer

Update: this has been fixed. From 0.11.1 on, passing str/np.str is equivalent to using object.

Use the object dtype:

In [11]: pd.read_csv('a', dtype=object, index_col=0)
Out[11]:
                      A                     B
1A  0.35633069074776547     0.745585398803751
1B  0.20037376323337375  0.013921830784260236

Or better yet, just don't specify a dtype:

In [12]: pd.read_csv('a', index_col=0)
Out[12]:
           A         B
1A  0.356331  0.745585
1B  0.200374  0.013922
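
Leaving the dtype unspecified works for keys such as 1A, but not for the numeric-looking keys mentioned in the question. The following is a minimal sketch, not part of the original answer, that shows the difference and one way to keep a single column as strings by passing a per-column dtype. The column names key and value and the inline CSV text are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV with numeric-looking keys (illustrative data, not from the question).
csv_text = "key,value\n1234E5,0.5\n0012,1.5\n"

# Default type inference: the key column is parsed as floating point.
inferred = pd.read_csv(io.StringIO(csv_text))

# Per-column dtype: only the key column is kept as text, value is still inferred.
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"key": str})

print(inferred["key"].tolist())
print(as_text["key"].tolist())
```

Passing a dict to dtype leaves the other columns to normal inference, which is usually what you want when only the key must stay textual.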

But bypassing the type sniffer and truly returning only strings requires a hacky use of converters:

In [13]: pd.read_csv('a', converters={i: str for i in range(100)})
Out[13]:
                      A                     B
1A  0.35633069074776547     0.745585398803751
1B  0.20037376323337375  0.013921830784260236

where 100 is some number equal to or greater than your total number of columns.
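
If guessing an upper bound such as 100 feels fragile, one possible variation is to read only the header first and build the converters mapping from the actual column count. This is a sketch under assumptions, not the answer's own method: the inline CSV text and the column names key, A and B are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content (illustrative data only).
csv_text = "key,A,B\n1234E5,0.356,0.745\n0012,0.200,0.013\n"

# Read just the header row (nrows=0) to find out how many columns there are.
header = pd.read_csv(io.StringIO(csv_text), nrows=0)
n_columns = len(header.columns)

# One str converter per column position, so every field stays as its raw text.
converters = {i: str for i in range(n_columns)}
all_text = pd.read_csv(io.StringIO(csv_text), converters=converters)

print(all_text)
```

This costs a second pass over the start of the file, but it avoids hard-coding a column count.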

It's best to avoid the str dtype; see, for example, here.
