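I am trying to merge a (pandas 0.14.1) DataFrame and a Series. The Series should form a new column containing some NAs, since the index values of the Series are a subset of the index values of the DataFrame.
This works for a toy example, but not for my data (details below).
Example: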
    import pandas as pd
    import numpy as np

    df1 = pd.DataFrame(np.random.randn(6, 4), columns=['A', 'B', 'C', 'D'],
                       index=pd.date_range('1/1/2011', periods=6, freq='D'))
    df1
                       A         B         C         D
    2011-01-01 -0.487926  0.439190  0.194810  0.333896
    2011-01-02  1.708024  0.237587 -0.958100  1.418285
    2011-01-03 -1.228805  1.266068 -1.755050 -1.476395
    2011-01-04 -0.554705  1.342504  0.245934  0.955521
    2011-01-05 -0.351260 -0.798270  0.820535 -0.597322
    2011-01-06  0.132924  0.501027 -1.139487  1.107873

    # Series indexed by every second day, i.e. a subset of df1's index
    s1 = pd.Series(np.random.randn(3), name='foo',
                   index=pd.date_range('1/1/2011', periods=3, freq='2D'))
    s1
    2011-01-01   -1.660578
    2011-01-03   -0.209688
    2011-01-05    0.546146
    Freq: 2D, Name: foo, dtype: float64

    pd.concat([df1, s1], axis=1)
                       A         B         C         D       foo
    2011-01-01 -0.487926  0.439190  0.194810  0.333896 -1.660578
    2011-01-02  1.708024  0.237587 -0.958100  1.418285       NaN
    2011-01-03 -1.228805  1.266068 -1.755050 -1.476395 -0.209688
    2011-01-04 -0.554705  1.342504  0.245934  0.955521       NaN
    2011-01-05 -0.351260 -0.798270  0.820535 -0.597322  0.546146
    2011-01-06  0.132924  0.501027 -1.139487  1.107873       NaN
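The situation with my data (see below) seems basically the same: concatenating a Series with a DatetimeIndex whose values are a subset of the DataFrame's. But it raises the ValueError from the title (blah1 = (5, 286), blah2 = (5, 276)). Why doesn't it work?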
    In [187]: df.head()
    Out[188]:
                             high       low     loc_h     loc_l
    time
    2014-01-01 17:00:00  1.376235  1.375945  1.376235  1.375945
    2014-01-01 17:01:00  1.376005  1.375775       NaN       NaN
    2014-01-01 17:02:00  1.375795  1.375445       NaN  1.375445
    2014-01-01 17:03:00  1.375625  1.375515       NaN       NaN
    2014-01-01 17:04:00  1.375585  1.375585       NaN       NaN

    In [186]: df.index
    Out[186]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2014-01-01 17:00:00, ..., 2014-01-01 21:30:00]
    Length: 271, Freq: None, Timezone: None

    In [189]: hl.head()
    Out[189]:
    2014-01-01 17:00:00    1.376090
    2014-01-01 17:02:00    1.375445
    2014-01-01 17:05:00    1.376195
    2014-01-01 17:10:00    1.375385
    2014-01-01 17:12:00    1.376115
    dtype: float64

    In [187]: hl.index
    Out[187]:
    <class 'pandas.tseries.index.DatetimeIndex'>
    [2014-01-01 17:00:00, 2014-01-01 21:30:00]
    Length: 89, Timezone: None

    In: pd.concat([df, hl], axis=1)
    Out: [stack trace] ValueError: Shape of passed values is (5,286), indices imply (5,276)
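Solution
I had a similar problem (join worked, but concat failed).
Check for duplicate index values in df1 and s1 (e.g. df1.index.is_unique).
Removing the duplicate index values (e.g. df.drop_duplicates(inplace=True)), or using one of the methods at https://stackoverflow.com/a/34297689/7163376, should resolve it. Note that drop_duplicates compares column values rather than index labels, so rows that share a timestamp but differ in their values will remain; df[~df.index.duplicated(keep='first')] filters on the index itself.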
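
The check and cleanup suggested above can be sketched roughly as follows. The data is made up for illustration: the names df and hl mirror the question, but the values are hypothetical and not the asker's data. Index.duplicated is used here because it filters on index labels; keeping the first of each repeated label (keep="first") is an assumption, and the right choice depends on the data.

```python
import numpy as np
import pandas as pd

# Hypothetical minute-level frame whose index repeats one timestamp (17:01).
idx = pd.to_datetime([
    "2014-01-01 17:00",
    "2014-01-01 17:01",
    "2014-01-01 17:01",  # duplicated label
    "2014-01-01 17:02",
])
df = pd.DataFrame({"high": [1.0, 2.0, 2.5, 3.0]}, index=idx)

# Hypothetical Series whose index is a subset of the frame's index.
hl = pd.Series(
    [10.0, 30.0],
    index=pd.to_datetime(["2014-01-01 17:00", "2014-01-01 17:02"]),
    name="hl",
)

# Step 1: check both objects for repeated index labels.
print(df.index.is_unique)  # False
print(hl.index.is_unique)  # True

# Show which labels are repeated before deciding what to drop.
print(df.index[df.index.duplicated(keep=False)].unique())

# Step 2: keep only the first row for each repeated label (an assumption).
df_unique = df[~df.index.duplicated(keep="first")]
hl_unique = hl[~hl.index.duplicated(keep="first")]

# Step 3: with unique indexes on both sides, align column-wise.
result = pd.concat([df_unique, hl_unique], axis=1)
print(result)
```

If the repeated rows carry different values that all matter, aggregating them first (for example with groupby(level=0)) may be a better fit than dropping them.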