问题描述
我编写了以下代码,它创建了两个数据帧 nq
和 cmnt
.nq
包含UserId
和对应的徽章获得时间date
.cmnt
包含 OwnerUserId
和用户发表评论的时间 CreationDate
.
我想计算获得徽章 1 周前后所有天的评论数,以便我可以从中创建时间序列线图.
I have written the following code which creates two dataframes nq
and cmnt
.
nq
contains the UserId
and corresponding time of Badge Attainment date
.
cmnt
contains OwnerUserId
and the time when the User made a comment CreationDate
.
I want to get a count of the comments made for all days before and after 1 week of badge attainment so that I can create a time series line plot out of it.
以下代码执行相同的操作,但会产生 KeyError.请提供为所有用户执行此操作的代码.
The following code perform the same but produces a KeyError. Please provide a code that performs this operations for all users.
nq
UserId | date
1 2009-10-17 17:38:32.590
2 2009-10-19 00:37:23.067
3 2009-10-20 08:37:14.143
4 2009-10-21 18:07:51.247
5 2009-10-22 21:25:24.483
cmnt
OwnerUserId | CreationDate
1 2009-10-16 17:38:32.590
1 2009-10-18 17:38:32.590
2 2009-10-18 00:37:23.067
2 2009-10-17 00:37:23.067
2 2009-10-20 00:37:23.067
3 2009-10-19 08:37:14.143
4 2009-10-20 18:07:51.247
5 2009-10-21 21:25:24.483
代码
nq.date = pd.to_datetime(nq.date)
cmnt.CreationDate = pd.to_datetime(cmnt.CreationDate)
count= []
for j in range(len(nq)):
for i in range(-7,8):
check_date = nq.date.iloc[j] + timedelta(days=i)
count = cmnt.loc[(cmnt.OwnerUserId == nq.UserId.iloc[j]) & (cmnt.CreationDate == check_date)].shape[0]
nq.iloc[j].append({nq[i]:count})
预期输出
UserId | date |-7|-6|-5|-4|-3|-2|-1|0 |1 |2 |3 |4 |5 |6 |7
1 2009-10-17 17:38:32.590 |0 |0 |0 |0 |0 |0 |1 |0 |1 |0 |0 |0 |0 |0 |0
2 2009-10-19 00:37:23.067 |0 |0 |0 |0 |0 |1 |1 |0 |1 |0 |0 |0 |0 |0 |0
3 2009-10-20 08:37:14.143 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
4 2009-10-21 18:07:51.247 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
5 2009-10-22 21:25:24.483 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
这里的列-1
表示获得徽章前1天发表的评论,1
表示获得徽章后一天发表的评论,依此类推.
Here column -1
means comment made 1 day before badge attainment and 1
means comment made one day after badge attainment and so on.
注意可以有一种完全替代的方法来做到这一点.我的主要目标是绘制一个时间序列线图,显示用户在获得徽章之前和之后发表的评论数.
Note There can be a completely alternately way to do this. My main objective is to draw a time series line plot which shows the number of comments made by the users before and after attainment of the badge.
推荐答案
可能你想要一个交叉合并,过滤,然后一个 crosstab
:
Probably you want a cross-merge, filter and then a crosstab
:
# merge the two dataframes
merged = (nq.merge(cmnt, left_on='UserId',
right_on='OwnerUserId',
how='left')
)
# extract the date difference between `date` and `CreationDate`
merged['date_diff'] = merged['date'].dt.normalize() - merged['CreationDate'].dt.normalize()
merged['date_diff'] = (merged['date_diff'] / pd.to_timedelta('1D')).astype(int)
# filter the comments within the range
merged = merged[merged['date_diff'].between(-7,7)]
# crosstab
pd.crosstab([merged['UserId'],merged['date']], merged['date_diff'])
输出:
date_diff -1 1 2
UserId date
1 2009-10-17 17:38:32.590 1 1 0
2 2009-10-19 00:37:23.067 1 1 1
3 2009-10-20 08:37:14.143 0 1 0
4 2009-10-21 18:07:51.247 0 1 0
5 2009-10-22 21:25:24.483 0 1 0
这篇关于Python:如何根据日期时间获取值的计数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持跟版网!