Numpy: conditional sum

Question

I have the following numpy array:

import numpy as np
arr = np.array([[1,2,3,4,2000],
                [5,6,7,8,2000],
                [9,0,1,2,2001],
                [3,4,5,6,2001],
                [7,8,9,0,2002],
                [1,2,3,4,2002],
                [5,6,7,8,2003],
                [9,0,1,2,2003]
              ])

I understand that np.sum(arr, axis=0) gives the result:

array([   40,    28,    36,    34, 16012])

What I would like to do (without a for loop) is sum the columns, grouped by the value of the last column, so that the result is:

array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])

I realize that it may be a stretch to do without a loop, but hoping for the best...

If a for loop must be used, then how would that work?

I tried np.sum(arr[:, 4] == 2000, axis=0) (where I would substitute 2000 with the variable from the for loop), but it gave a result of 2.
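The result of 2 is expected: arr[:, 4] == 2000 is a boolean array, so np.sum counts the matching rows instead of summing them. A minimal sketch of how the loop-based version could work, using the boolean array as a row mask before summing (the names mask, years and looped are illustrative and not from the question):

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

# Summing the comparison itself only counts the True entries.
mask = arr[:, 4] == 2000
print(mask.sum())

# Use the mask to select rows first, then sum those rows column-wise.
years = np.unique(arr[:, -1])
looped = np.vstack([arr[arr[:, -1] == year].sum(axis=0) for year in years])
print(looped)
```

This still loops once per distinct value in the last column, so it is reasonable when there are few groups.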

Recommended answer

You can do this in pure numpy using a clever application of np.diff and np.add.reduceat, provided that rows with the same value in the last column are adjacent, as they are in the example. np.diff gives an array that is nonzero wherever the rightmost column changes:

d = np.diff(arr[:, -1])

np.where will convert the nonzero entries of d into the integer indices that np.add.reduceat expects:

d = np.where(d)[0]

reduceat also expects to see a zero index for the first group, and every index needs to be shifted by one, because np.diff marks the row before each change:

indices = np.r_[0, d + 1]

Using np.r_ here is a bit more convenient than np.concatenate because it allows scalars. The sum then becomes:

result = np.add.reduceat(arr, indices, axis=0)
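To see what each step produces on the example array, the intermediate values can be printed. This is only a walkthrough of the steps described here; the name change is introduced for readability and is not part of the original answer:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

# Differences between consecutive values of the last column.
d = np.diff(arr[:, -1])
print(d)        # [0 1 0 1 0 1 0]

# Positions of the nonzero differences (last row of each group but the final one).
change = np.where(d)[0]
print(change)   # [1 3 5]

# Start index of every group: 0 plus the row after each change.
indices = np.r_[0, change + 1]
print(indices)  # [0 2 4 6]

# Sum the rows in each slice arr[indices[i]:indices[i + 1]].
result = np.add.reduceat(arr, indices, axis=0)
print(result)
```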

This can be combined into a one-liner of course:

>>> result = np.add.reduceat(arr, np.r_[0, np.where(np.diff(arr[:, -1]))[0] + 1], axis=0)
>>> result
array([[   6,    8,   10,   12, 4000],
       [  12,    4,    6,    8, 4002],
       [   8,   10,   12,    4, 4004],
       [  14,    6,    8,   10, 4006]])
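The reduceat approach relies on equal values in the last column being contiguous. If the rows might arrive in arbitrary order, one alternative (a different technique from the answer, sketched here under that assumption) is to label each row with np.unique(..., return_inverse=True) and accumulate with np.add.at. The names shuffled, keys, inverse and grouped are hypothetical:

```python
import numpy as np

arr = np.array([[1, 2, 3, 4, 2000],
                [5, 6, 7, 8, 2000],
                [9, 0, 1, 2, 2001],
                [3, 4, 5, 6, 2001],
                [7, 8, 9, 0, 2002],
                [1, 2, 3, 4, 2002],
                [5, 6, 7, 8, 2003],
                [9, 0, 1, 2, 2003]])

# Reorder the rows so that equal years are no longer adjacent.
shuffled = arr[[7, 0, 3, 4, 1, 6, 2, 5]]

# keys: sorted distinct years; inverse: group number of each row.
keys, inverse = np.unique(shuffled[:, -1], return_inverse=True)

# Unbuffered in-place add: every row is added to the output row of its group.
grouped = np.zeros((len(keys), shuffled.shape[1]), dtype=shuffled.dtype)
np.add.at(grouped, inverse, shuffled)
print(grouped)
```

Sorting the rows first (for example with arr[np.argsort(arr[:, -1], kind="stable")]) and then applying reduceat would be another option.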
