Note: this page is a translation of a popular Stack Overflow question, provided under the CC BY-SA 4.0 license. You are free to use and share it, but you must follow the same CC BY-SA license, cite the original URL and author information, and attribute it to the original authors (not me): Stack Overflow. Original URL: http://stackoverflow.com/questions/34156830/

Time: 2020-09-14 00:20:55  Source: igfitidea

Leave dates as strings using read_excel function from pandas in python

Tags: python, excel, datetime, pandas

Asked by MattB

Python 2.7.10
Tried pandas 0.17.1 -- function read_excel
Tried pyexcel 0.1.7 + pyexcel-xlsx 0.0.7 -- function get_records()

When using pandas in Python, is it possible to read Excel files (formats: xls|xlsx) and leave columns containing date or date + time values as strings, rather than auto-converting them to datetime.datetime or Timestamp types?

If this is not possible using pandas, can someone suggest an alternative method or library to read xls|xlsx files and leave date column values as strings?

For the pandas attempts, the df.info() output and the resulting date column types are shown below:

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 117 entries, 0 to 116
Columns: 176 entries, Mine to Index
dtypes: datetime64[ns](2), float64(145), int64(26), object(3)
memory usage: 161.8+ KB
>>> type(df['Start Date'][0])
Out[6]: pandas.tslib.Timestamp
>>> type(df['End Date'][0])
Out[7]: pandas.tslib.Timestamp

Attempt/Approach 1:

import pandas as pd

def read_as_dataframe(filename, ext):
    if ext in ('xls', 'xlsx'):
        # problem: date columns auto converted to datetime.datetime or timestamp!
        df = pd.read_excel(filename)  # unwanted - date columns converted!
    else:
        raise ValueError('unsupported extension: %s' % ext)

    return df, ext

Attempt/Approach 2:

import pandas as pd
# from datetime import datetime
# parse_date = lambda x: datetime.strptime(x, '%Y%m%d %H')
parse_date = lambda x: x  # identity parser, intended to leave values untouched
if ext in ('xls', 'xlsx'):
    df = pd.read_excel(filename, parse_dates=False)
    # positions of the two date columns in the sheet
    date_cols = [df.columns.get_loc(c) for c in df.columns if c in ('Start Date', 'End Date')]
    # problem: date columns auto converted to datetime.datetime or timestamp!
    df = pd.read_excel(filename, parse_dates=date_cols, date_parser=parse_date)

I have also tried the pyexcel library, but it performs the same automatic conversion:

Attempt/Approach 3:

import time

import pyexcel as pe
import pyexcel.ext.xls
import pyexcel.ext.xlsx

t0 = time.time()
if ext == 'xlsx':
    records = pe.get_records(file_name=filename)
    for record in records:
        # problem: the date values come back already converted, not as strings
        print("start date = %s (type=%s), end date = %s (type=%s)" %
              (record['Start Date'],
               str(type(record['Start Date'])),
               record['End Date'],
               str(type(record['End Date'])))
              )

Answered by YDD9

  • Passing converters={'Date': str} to pandas.read_excel helps: pandas.read_excel(xlsx, sheet, converters={'Date': str})
  • Alternatively, you can convert the timestamp back to the original format:
    df['Date'][0].strftime('%Y/%m/%d')
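
The converters option suggested above can be sketched for the two columns named in the question. This is a minimal sketch, not a confirmed fix: the helper names string_converters and read_excel_dates_as_text and the file name report.xlsx are hypothetical, and the text you get back is whatever str() produces for the value pandas read from the cell, which may differ from how Excel displays it.

```python
import pandas as pd

def string_converters(date_columns):
    """Build a converters dict that maps each named column to str."""
    return {name: str for name in date_columns}

def read_excel_dates_as_text(path, date_columns, **kwargs):
    """Read a workbook, passing each cell of the named columns through str()."""
    return pd.read_excel(path, converters=string_converters(date_columns), **kwargs)

# Column names taken from the question; adjust to your own sheet.
converters = string_converters(["Start Date", "End Date"])

# Hypothetical usage (requires an actual workbook and an Excel engine):
# df = read_excel_dates_as_text("report.xlsx", ["Start Date", "End Date"])
```

If the resulting strings carry an unwanted time part, the strftime approach in the second bullet gives explicit control over the format.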

Answered by Nolan Conaway

I ran into an identical problem, except that pandas was oddly converting only some cells into datetimes. I ended up manually converting each cell into a string like so:

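import pandas as pd

def undate(x):
    # leave missing values (NaN/NaT) untouched
    if pd.isnull(x):
        return x
    try:
        # datetime-like cells are formatted as day/month/year text
        return x.strftime('%d/%m/%Y')
    except AttributeError:
        # anything without strftime (numbers, existing strings) is returned as-is
        return x

for i in list_of_possible_date_columns:
    df[i] = df[i].apply(undate)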
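
When a whole column has already been read as datetime64 (as in the df.info() output in the question), the same after-the-fact conversion can be done per column rather than per cell. The sketch below assumes that situation; the helper name datetimes_to_strings, the default format, and the sample data are made up for illustration. It does not handle the mixed-type columns this answer describes, for which the per-cell undate function is the better fit.

```python
import pandas as pd

def datetimes_to_strings(df, fmt="%Y-%m-%d %H:%M:%S"):
    """Return a copy of df with every datetime64 column formatted as text."""
    out = df.copy()
    for col in out.select_dtypes(include=["datetime64"]).columns:
        # Series.dt.strftime formats the whole column in one call
        out[col] = out[col].dt.strftime(fmt)
    return out

# Small in-memory frame standing in for the result of read_excel
df = pd.DataFrame({
    "Start Date": pd.to_datetime(["2015-12-08 09:00:00", "2015-12-09 17:30:00"]),
    "Mine": ["A", "B"],
})
converted = datetimes_to_strings(df)
print(converted["Start Date"].tolist())
```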

Answered by Sriram Veturi

I tried saving the file manually in CSV UTF-8 format and reading it with pd.read_csv(), and that worked fine.

I tried a number of things to get the same result with read_excel, but nothing worked for me. So I am guessing that read_excel converts your strings into datetime objects in a way you cannot control.
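
The CSV workaround from this answer can be illustrated without a file on disk. This is a small sketch under stated assumptions: the CSV text below is invented sample data standing in for a workbook exported from Excel, and dtype=str is added here to keep every column as text (read_csv does not parse dates unless asked to).

```python
import io

import pandas as pd

# Invented sample standing in for a sheet saved as "CSV UTF-8" from Excel
csv_text = "Mine,Start Date,End Date\nA,2015/12/08,2015/12/09 17:30\n"

# dtype=str keeps every column, including the date columns, as plain text
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

start = df["Start Date"][0]
end = df["End Date"][0]
print(start, end)
```

With a real export, io.StringIO(csv_text) would be replaced by the path to the saved CSV file.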