Problem description
Greetings all, I have two series of data: daily raw stock price returns (positive or negative floats) and trade signals (buy=1, sell=-1, no trade=0).
The raw price returns are simply the log of today's price divided by yesterday's price:
log(p_today / p_yesterday)
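As a minimal sketch of that formula, assuming a plain list of closing prices ordered oldest first (the `prices` values are made up for illustration and are not from the question):

```python
import math

# Hypothetical closing prices, oldest first (illustrative values only).
prices = [100.0, 101.0, 100.5, 102.0]

# One log return per day after the first: log(p_today / p_yesterday).
raw_returns = [
    math.log(today / yesterday)
    for yesterday, today in zip(prices, prices[1:])
]

# Log returns are additive, so their sum is the log of the overall price ratio.
total_log_return = sum(raw_returns)
```

Because log returns add across days, summing them and exponentiating recovers the total price ratio over the period.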
An example:
raw_return_series = [ 0.0063 -0.0031 0.0024 ..., -0.0221 0.0097 -0.0015]
The trade signal series looks like this:
signal_series = [-1. 0. -1. -1. 0. 0. -1. 0. 0. 0.]
To get the daily returns based on the trade signals:
daily_returns = [raw_return_series[i] * signal_series[i+1] for i in range(0, len(signal_series)-1)]
These daily returns might look like this:
[0.0, 0.00316, -0.0024, 0.0, 0.0, 0.0023, 0.0, 0.0, 0.0] # results in daily_returns; notice the 0s
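The same alignment (return at index i multiplied by the signal at index i+1) can be sketched with `zip` instead of explicit indexing. The short input lists below are illustrative, not the question's full data:

```python
# Illustrative inputs only; the real series are longer.
raw_return_series = [0.0063, -0.0031, 0.0024, -0.0221]
signal_series = [-1.0, 0.0, -1.0, -1.0, 0.0]

# Pair raw_return_series[i] with signal_series[i + 1], as in the
# list comprehension above; zip stops at the shorter input.
daily_returns = [
    r * s for r, s in zip(raw_return_series, signal_series[1:])
]
```

A signal of 0 produces a 0.0 daily return, which is where the zeros in the result come from.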
I need to use the daily_returns series to compute a compounded returns series. However, given that there are 0 values in the daily_returns series, I need to carry over the last non-zero compound return "through time" to the next non-zero compound return.
For example, I compute the compound returns like this (notice I am going "backwards" through time):
compound_returns = [(((1 + compounded[i + 1]) * (1 + daily_returns[i])) - 1) for i in range(len(compounded) - 2, -1, -1)]
And the resulting list:
[0.0, 0.0, 0.0023, 0.0, 0.0, -0.0024, 0.0031, 0.0] # (notice the 0s)
My goal is to carry over the last non-zero return so that these compound returns keep accumulating. That is, since the return at index i is dependent on the return at index i+1, the return at index i+1 should be non-zero. Every time the list comprehension encounters a zero in the daily_return series, it essentially restarts.
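One possible reading of this requirement, sketched with the standard library only: keep a running product of (1 + r), so that a 0.0 day multiplies by 1 and the previous compound value carries over unchanged. This sketch walks forward through time rather than backwards, and it treats the values as simple returns as the (1 + r) formula above does; both choices are assumptions, not something the question confirms.

```python
from itertools import accumulate
from operator import mul

daily_returns = [0.0, 0.0031, -0.0024, 0.0, 0.0, 0.0023, 0.0, 0.0, 0.0]

# Running growth factor: each step multiplies the previous factor
# by (1 + r). A 0.0 return multiplies by 1.0, so nothing restarts.
growth = list(accumulate((1.0 + r for r in daily_returns), mul))

# Compound return up to and including each day.
compound_returns = [g - 1.0 for g in growth]
```

If the inputs are kept as log returns instead, the equivalent would be a running sum followed by `exp(...) - 1`.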
Recommended answer
There is a fantastic module called pandas that was written by a guy at AQR (a hedge fund) that excels at calculations like this... what you need is a way to handle "missing data"... as someone mentioned above, the basics are using the nan (not a number) capabilities of scipy or numpy; however, even those libraries don't make financial calculations that much easier... if you use pandas, you can mark the data you don't want to consider as nan, and then any future calculations will reject it, while performing normal operations on other data.
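A minimal sketch of that idea, assuming a recent pandas and assuming that a 0.0 daily return always means "no trade" (the question does not confirm that). The variable names are illustrative:

```python
import numpy as np
import pandas as pd

daily_returns = pd.Series([0.0, 0.0031, -0.0024, 0.0, 0.0, 0.0023])

# Mark no-trade days as missing instead of as genuine 0.0 returns.
traded = daily_returns.replace(0.0, np.nan)

# Reductions skip NaN by default.
mean_traded = traded.mean()
n_traded = traded.count()

# cumprod also skips NaN but leaves NaN at those positions; ffill
# carries the last value forward, and fillna(1.0) covers any
# leading gap before the first trade.
growth = (1.0 + traded).cumprod().ffill().fillna(1.0)
compound = growth - 1.0
```

Here `mean_traded` averages only the three traded days, and `compound` stays flat across the no-trade days instead of dropping back to zero.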
I have been using pandas on my trading platform for about 8 months... I wish I had started using it sooner.
Wes (the author) gave a talk at pyCon 2010 about the capabilities of the module... see the slides and video on the pyCon 2010 webpage. In that video, he demonstrates how to get daily returns, run 1000s of linear regressions on a matrix of returns (in a fraction of a second), timestamp / graph data... all done with this module. Combined with psyco, this is a beast of a financial analysis tool.
The other great thing it handles is cross-sectional data... so you could grab daily close prices, their rolling means, etc... then timestamp every calculation, and get all this stored in something similar to a python dictionary (see the pandas.DataFrame class)... then you access slices of the data as simply as:
close_prices['stdev_5d']
See the pandas rolling moments doc for more information on how to calculate the rolling stdev (it's a one-liner).
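For illustration, here is how that one-liner looks in a recent pandas, where rolling statistics are spelled `.rolling(...).std()`; the pandas release this answer was written against used a different spelling. The `close` column and its values are made up:

```python
import pandas as pd

# Hypothetical daily closes (illustrative values only).
close_prices = pd.DataFrame(
    {"close": [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3]}
)

# 5-row rolling sample standard deviation; the first four rows are
# NaN because a full 5-row window is not yet available.
close_prices["stdev_5d"] = close_prices["close"].rolling(window=5).std()
```

After this, `close_prices['stdev_5d']` is an ordinary column that can be sliced like any other.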
Wes has gone out of his way to speed the module up with cython, although I'll concede that I'm considering upgrading my server (an older Xeon), due to my analysis requirements.
EDIT FOR STRIMP's QUESTION:
After you converted your code to use pandas data structures, it's still unclear to me how you're indexing your data in a pandas dataframe and the compounding function's requirements for handling missing data (or for that matter the conditions for a 0.0 return... or if you are using NaN in pandas..). I will demonstrate using my data indexing... a day was picked at random... df is a dataframe with ES Futures quotes in it... indexed per second... missing quotes are filled in with numpy.nan. DataFrame indexes are datetime objects, offset by the pytz module's timezone objects.
>>> df.info
<bound method DataFrame.info of <class 'pandas.core.frame.DataFrame'>
Index: 86400 entries , 2011-03-21 00:00:00-04:00 to 2011-03-21 23:59:59-04:00
etf 18390 non-null values
etfvol 18390 non-null values
fut 29446 non-null values
futvol 23446 non-null values
...
>>> # ET is a pytz object...
>>> et
<DstTzInfo 'US/Eastern' EST-1 day, 19:00:00 STD>
>>> # To get the futures quote at 9:45, eastern time...
>>> df.xs(et.localize(dt.datetime(2011,3,21,9,45,0)))['fut']
1291.75
>>>
To give a simple example of how to calculate a column of continuous returns (in a pandas.TimeSeries), which reference the quote 10 minutes ago (and filling in for missing ticks), I would do this:
>>> df['fut'].fill(method='pad')/df['fut'].fill(method='pad').shift(600)
No lambda is required in that case, just dividing the column of values by itself 600 seconds ago. That .shift(600) part is because my data is indexed per-second.
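The same pad-then-shift pattern on a tiny per-second series, as a sketch. It assumes a recent pandas, where forward-fill is spelled `ffill()`, and it uses a lag of 2 rows instead of 600 so the result is easy to inspect. The quotes are made up:

```python
import numpy as np
import pandas as pd

# Six one-second timestamps with some missing quotes (illustrative data).
idx = pd.date_range(
    "2011-03-21 09:30:00", periods=6, freq=pd.Timedelta(seconds=1)
)
fut = pd.Series(
    [1291.75, np.nan, 1292.00, np.nan, np.nan, 1292.50], index=idx
)

# Fill missing ticks with the last known quote.
filled = fut.ffill()

# Ratio of each quote to the quote `lag` rows (here: seconds) earlier.
lag = 2
ratio = filled / filled.shift(lag)

# Taking the log turns the price ratio into a log return.
log_return = np.log(ratio)
```

The first `lag` rows of `ratio` are NaN because there is no earlier quote to divide by.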
HTH, Mike