Market prices must be the most heavily studied type of time series ever. Bitcoins are a rather interesting example: since they pay no dividends and have no backing assets, their price at any moment is entirely due to speculation, namely to the market's prediction of their future price. I could not resist doing my own amateur analysis. (There must be tons of books on this subject, but books and papers are best read after you have spent a couple of weeks banging your head on the wall...)

[b]The Brownian model[/b]

I could not find any significant correlation between future prices and past prices earlier than the current one. In log scale (that is, considering price ratios rather than differences), the change between the mean price in one period (say, one hour) and the price in the next period looks pretty much like a random variable, with zero mean, that is independent of all earlier changes.

Specifically, let's take the weighted mean BTC/USD price at Bitstamp in successive 1-hour intervals. Let P[i] be that price in period number i (counted from some arbitrary starting point) and Z[i] be the log base 10 of P[i]. Thus, an increase of 1.0 in Z means that the price P was multiplied by 10, while a decrease of 1.0 means that P was divided by 10.

As said above, looking at the Bitstamp hourly data since last September I cannot find any significant correlation between the future changes in Z (that is, Z[i+n] - Z[i], for any n > 1) and the past changes (that is, Z[i] - Z[i-1], Z[i-1] - Z[i-2], etc.). Thus, the best predictive model I found that fits that data is a simple Brownian model

(1.1) Z[i+1] = Z[i] + C*RND[i]

where C ~ 0.01, and each RND[i] is an independent random variable with zero mean and unit standard deviation. That is, at each hour the mean price changes by a random factor, on the order of 1 percent, in either direction.

This model implies that, for any n > 0,

(1.2) Z[i+n] = Z[i] + C*sqrt(n)*RND[i,n]

where RND[i,n] is essentially a Gaussian random variable with mean 0 and unit deviation. (Note that these variables are [i]not[/i] independent when n > 1.)

I verified model (1.2) experimentally, by collecting a set S[n] of increments Z[i+n] - Z[i] for each value of n, computing their standard deviation dev[n] (assuming zero mean), and plotting dev[n] as a function of n. See below:

(1.3) [IMAGE]

The empirical deviation dev[n] is the red line with dots, and the mathematical model C*sqrt(n) is the solid green line.

Since ~95% of the probability in a Gaussian variable is within 2 deviations of its mean, we can expect that Z[i+n] will be within the interval

(1.4) Z[i] ± 2*C*sqrt(n)

with 95% probability. Thus, at any time in the future, the Bitstamp 1-hour weighted mean price should be within the two blue lines in the following graph, with 95% probability:

(1.5) [IMAGE]

(Note that this is not the same thing as saying that the [i]entire graph[/i] of Z[i+n], for all n > 1, will stay within that region with 95% probability!) As figure (1.3) shows, these algebraic bounds (smooth blue lines) fit quite well the 5%-95% percentiles of the samples S[n] (the stepped blue lines).

Moreover, at any future time, there is 50% probability that the price will be above (or below) the horizontal red line in figure (1.5), defined by the equation

(1.6) Z[i+n] = Z[i]

This model of course is not very helpful for traders, since it does not give any hint about whether the price will go up or down in the future, near or far.
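Incidentally, the check behind figure (1.3) is easy to reproduce. Below is a minimal Python sketch, not my actual analysis script; the file name "bitstamp_hourly.txt" and its format (one weighted hourly mean price per line) are just placeholder assumptions.

[code]
# Minimal sketch of the check behind figure (1.3).  Assumes a hypothetical
# file "bitstamp_hourly.txt" with one weighted mean BTC/USD price per hour.
import numpy as np

P = np.loadtxt("bitstamp_hourly.txt")   # P[i] = hourly weighted mean price
Z = np.log10(P)                         # Z[i] = log10 of P[i]

# Estimate C from the one-step increments, assuming zero mean as in (1.1):
dZ = np.diff(Z)
C = np.sqrt(np.mean(dZ**2))
print("C =", C)                         # should be on the order of 0.01

# For each span n, collect the set S[n] of increments Z[i+n] - Z[i] and
# compute its deviation dev[n] about zero; model (1.2) predicts C*sqrt(n).
for n in (1, 2, 4, 8, 16, 32, 64):
    S = Z[n:] - Z[:-n]
    dev = np.sqrt(np.mean(S**2))
    print(n, dev, C*np.sqrt(n))
[/code]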
However, I don't think one can get significantly better predictions from the price data alone, without external information (such as regulations, arrests, press coverage, etc.).

[b]Historical trend[/b]

But, you may say, what about the historic trend? Shouldn't the formula be

(2.1) Z[i+1] = Z[i] + T + C'*RND[i]

where T is a "trend" constant; so that, for any n > 0, we will have

(2.2) Z[i+n] = Z[i] + T*n + C'*sqrt(n)*RND[i,n]

and the 2-sigma confidence lines will be

(2.3) Z[i] + T*n ± 2*C'*sqrt(n)

The "trendy" model (2.1--2.2) is equivalent to assuming that the single-step increments Z[i+1] - Z[i] have a non-zero average, namely T; and defining C' as the standard deviation of the increments from that mean, rather than from zero.

I have experimented with a trendy model as well. While it yields seemingly tighter predictions (C' slightly smaller than C), I think that the no-trend model (1.1--1.2) is better, for several reasons:

* The T parameter depends strongly on what part of the data one uses to estimate C' and T. If one starts from 2013-09-01 (or earlier) and ends at 2014-01-17, one gets an increasing trend (positive T). But if one starts at 2013-11-29, or 2014-01-06, the trend will be strongly decreasing (T < 0). And, if one starts looking at 2013-11-22, the trend will be flat (T ~ 0). Therefore, the T parameter cannot be reliably determined, as the sketch after this list illustrates. (In contrast, the value of C (or C') seems to be fairly independent of the period of analysis.)

* Any finite segment of a purely Brownian series, as generated by the trendless model (1.1)--(1.2), will appear to have some general trend, since the sum of its n random increments is unlikely to be zero. Indeed, in a price evolution chart our eyes usually see many sections with increasing or decreasing trends, at all time scales. So the apparent presence of an overall trend in Bitcoin prices over certain time spans is not a sufficient argument to include that trend in the model.

* There seems to be no logical justification for a trend term. Bitcoin owners and fans are understandably fond of plotting the price evolution since the birth of the universe, and pointing out how much the thing has grown. But the traders who will decide its future prices do not care whether it was worth $1 or $1000 a year ago, given that it is worth $900 today, was $950 yesterday, and $850 last week. Most traders know that the remote past does not matter; and they know that most traders know that it does not matter; and they know that most traders know that most traders know that it does not matter; and so on. Which is precisely why most traders know that the remote past does not matter. So, there should be no term that takes into account the remote past.

* In any case, for predictions over relatively short time spans (in the Bitstamp data above, for n ~ 48 or less), the trend term T*n in (2.2) is fairly small compared to the deviation of the random term C*sqrt(n). Therefore, for that order of n, it can be included in the random term with little loss of precision.

* On the other hand, for larger values of n, a nonzero term T*n would eventually overpower the random term C'*sqrt(n)*RND[i,n]. If T is positive, for example, that term would eventually cause the blue curve Z[i] + T*n - 2*C'*sqrt(n) to start rising after going down for a while, and eventually rise above the present value Z[i]. See figure (2.4):

(2.4) [IMAGE]

In other words, a trendy model with positive T would say that the value of Z[i+n] is equally likely to go up or down in the short term, but after so many days it is 90% certain that it will be higher than now, and will continue rising and rising [i]forever[/i], with practically zero chance of going down. While that prediction will please the most "bullish" traders, it seems rather implausible, since the positive T value that yields it is entirely based on ancient data which, as discussed above, should have no influence on the future behavior of the market.

* If a nonzero trend term T*n did exist in reality, it would manifest itself in figure (1.3) as an increasing discrepancy between the model deviation C*sqrt(n) (green solid line) and the sample deviations dev[n] of the increment sets S[n] (red line with dots). That is because dev[n] was computed with an assumed zero mean: the omitted term T*n would cause dev[n] to eventually grow proportionally to n, rather than to sqrt(n). In fact, the match between dev[n] and C*sqrt(n) gets better as n increases, implying that any trend term T*n must be much smaller than C*sqrt(n) for the values of n shown in the plot.
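To illustrate the first point above, here is a sketch (same caveats and placeholder file as before) that re-fits T and C' of model (2.1) while varying the start of the sample; the start offsets in hours are illustrative, not the actual dates analyzed.

[code]
# Sketch of the window-dependence of the trend parameter T in model (2.1).
# T is the mean of the one-step increments, C' their deviation about T.
import numpy as np

Z = np.log10(np.loadtxt("bitstamp_hourly.txt"))  # placeholder input file

def fit_trendy(Z):
    dZ = np.diff(Z)
    return dZ.mean(), dZ.std()   # (T, C')

# Offsets standing in for start dates like 2013-09-01, 2013-11-29, or
# 2014-01-06: the fitted T changes sign with the window, C' hardly moves.
for start in (0, 1000, 2000, 3000):
    T, Cp = fit_trendy(Z[start:])
    print(start, T, Cp)
[/code]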
One argument in favor of the trendy model is that the term T*n could be due to an external cause (like expanding adoption by merchants), not just to speculative trading. In that case, analysis of a longer dataset would yield a more accurate value for the trend parameter T (in this case, positive) than the analysis of a shorter sample. However, the T one obtains from analysis of the data since 2013-11-30 is not only negative, but has a much larger magnitude than the positive T one obtains by starting at 2013-09-01 or earlier. See figure (2.5) below.

(2.5) [IMAGE estimated trend parameter T as a function of sample start date]

[b]Short term trends[/b]

More surprising than the lack of a long-term linear trend component was the apparent absence of any short-term trend, that is, the absence of correlation between successive increments. Visual examination of the charts suggests to many that the market has some inertia, so that if the price increased during the last time step, it is more likely to increase in the next step too; and similarly for decreases. However, statistical analysis shows very little or no correlation between successive increments DZ[i] = Z[i] - Z[i-1] and DZ[i+1] = Z[i+1] - Z[i]. If one subtracts from DZ[i+1] the part that can be ascribed to the influence of DZ[i], the residual DZ'[i+1] still has practically the same deviation as the raw increments DZ[i+1].

The apparent "correlations" and "short-term trends" seen by the eye may have a psychological explanation. Two or more successive increments with the same sign, say up-up or down-down, will usually create a conspicuous step in the plot, which attracts the viewer's attention; whereas mixed-sign sequences like up-down or down-up will often generate a "noise blip" on the plot but no significant step, and therefore will tend to be overlooked.

In fact, in some datasets (especially from minute-by-minute files) there seems to be a weak [i]negative[/i] correlation between successive increments: an increase is a bit more likely to be followed by a decrease than by another increase. This may be the result of certain high-frequency, low-volume robots that have been found to operate in some exchanges. It may be due also to
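For what it is worth, the lag-1 test described above is also only a few lines of code. This sketch (same placeholder file as before) computes the correlation between successive increments and the deviation of the residual DZ'[i+1] after regressing out DZ[i].

[code]
# Sketch of the lag-1 test: correlation between successive increments, and
# the residual deviation after subtracting the part ascribable to DZ[i].
import numpy as np

Z = np.log10(np.loadtxt("bitstamp_hourly.txt"))  # placeholder input file
dZ = np.diff(Z)
x, y = dZ[:-1], dZ[1:]                 # DZ[i] and DZ[i+1]

print("lag-1 correlation:", np.corrcoef(x, y)[0, 1])   # ~0 in hourly data

# Linear regression of DZ[i+1] on DZ[i]; the residual DZ'[i+1] should keep
# practically the same deviation as the raw increments DZ[i+1].
b = np.cov(x, y, ddof=0)[0, 1] / np.var(x)
resid = y - (y.mean() + b*(x - x.mean()))
print("raw deviation:     ", y.std())
print("residual deviation:", resid.std())
[/code]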