Lada Adamic, Celso Brunetti, Jeffrey Harris, and Andrei Kirilenko September 9, 2009
ABSTRACT We apply network analysis to trace patterns of information transmission in an electronic limit order market. If market orders or large executable limit orders are submitted by informed traders, then the resulting star-shaped or diamond-shaped patterns of trading networks should be associated with large changes in returns, smaller volume, and short duration between trades. In contrast, the execution of small limit orders from uninformed traders should result in networks with many triangular and reciprocal patterns and be associated with smaller changes in returns, larger volume, and longer duration between trades. We compute a time series of trading networks using audit trail, transaction-level data for all regular transactions in the September 2008 E-mini S&P 500 futures contract, the cornerstone of price discovery for the S&P 500 Index. We find that network metrics that quantify the shape of a network are statistically significantly related to returns, volatility, volume, and duration.
Lada Adamic is with the University of Michigan and the Commodity Futures Trading Commission, Celso Brunetti is with Johns Hopkins University and the Commodity Futures Trading Commission, Jeffrey Harris is with the Commodity Futures Trading Commission and the University of Delaware, and Andrei Kirilenko is with the Commodity Futures Trading Commission. We are grateful to Paul Tsyhura for invaluable assistance with the retrieval, organization, and processing of transaction-level data. We thank Pat Fishe, Pete Kyle, Antonio Mele, Han Ozsoylev, and seminar participants at the Chicago Mercantile Exchange, the Commodity Futures Trading Commission, 2009 Econometric Society Summer Meetings in Barcelona, the Federal Reserve Board of Governors, NASDAQ, the Securities and Exchange Commission, and the University of Maryland for very helpful comments and suggestions. The views expressed in this paper are our own and do not constitute an official position of the Commodity Futures Trading Commission, its Commissioners or staff.
Most securities exchanges around the world are electronic limit order markets. Yet, the analysis of electronic limit order trading has proven to be very challenging. To quote from the survey by Parlour and Seppi (2008): "Despite the simplicity of limit orders themselves, the economic interactions in limit order markets are complex because the associated state and action spaces are extremely large and because trading with limit orders is dynamic and generates non-linear payoffs." In this paper, we apply network analysis to quantify the dynamics of information transmission in an electronic limit order market - a complex dynamic problem. The networks we analyze are trading networks. We define a trading network as a set of traders engaged in transactions within a period of time. In graph-theoretic terminology, a trading network is a graph, consisting of a set of nodes and a set of edges. Each node denotes a unique trader, and an edge between two nodes denotes the occurrence of trading between two unique counterparties within a period of time. The direction of an edge indicates buy or sell transactions between unique counterparties. Namely, a directed edge from node A to node B indicates that trader A sold (one time or several times) to trader B during a specified period of time. A trading network formed over a designated number of transactions traces a pattern of order execution in the limit order book. By analyzing the shape of that pattern, we can quantify the structure of the executed portion of the book. For example, the execution of a market order will result in a star-shaped pattern, with the node that submitted the market order in the center and, in the periphery, the nodes it connected to as the market order marched through the limit order book. This star-shaped network will also not have any triangular or reciprocal connections.
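To make the graph construction concrete, the following is a minimal Python sketch (not part of the original analysis; the account labels are hypothetical) that builds the directed edge structure from a list of transactions and shows the star pattern left by a market order:

```python
# Sketch: a trading network as a directed graph. Each transaction is a
# (seller, buyer) pair of hypothetical account IDs; an edge A -> B means that
# A sold to B at least once during the period.
from collections import defaultdict

def build_network(transactions):
    """Adjacency sets: edges[a] is the set of accounts a sold to."""
    edges = defaultdict(set)
    for seller, buyer in transactions:
        edges[seller].add(buyer)
    return edges

# A market sell order sweeping the book leaves a star: one central seller S
# matched against many resting buy limit orders, with no triangles and no
# reciprocal edges.
star = build_network([("S", "B1"), ("S", "B2"), ("S", "B3"), ("S", "B4")])
print(sorted(star["S"]))  # ['B1', 'B2', 'B3', 'B4']
```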
In contrast, the execution of two large limit orders that arrived at different times will result in a diamond-shaped pattern with the two nodes that submitted large limit orders on the ends and market makers that provided the immediacy of execution (in small installments) in the middle. Finally, an execution of a sequence of small limit orders will look different from the execution of market or large executable limit orders. Some nodes will have more connections than others, but there will be no central dominant node or a diamond shape. There will be a number of triangular connections and some pairs of nodes will have edges that go both ways. If market orders or large executable limit orders are submitted by informed traders, then patterns of order execution should be informative beyond transaction prices, volume or trade duration. Intuitively, if market orders or large executable limit orders are submitted by informed traders, then resulting star-shaped or diamond-shaped trading networks should be associated with large changes in returns, possibly smaller volume, and short duration between trades. Conversely, trading networks that are very dissimilar to a star or a diamond - e.g., those with triangular and reciprocal patterns - should be associated with smaller changes in returns, possibly larger volume and longer duration between trades. Various network metrics that quantify the shape of a network - e.g., the number of central nodes or triangular connections in a network - should then be statistically related to returns, volatility, volume, and duration.
In this paper we find evidence that network metrics serve as primitive measures of limit order book dynamics. Namely, we compute network and financial variables for all regular transactions that occurred during August 2008 in the nearby E-mini S&P 500 futures contract and find that network variables strongly Granger-cause intertrade duration and volume. This suggests that network metrics presage the appearance of this information in duration and volume. We also find that the network variable that quantifies centrality (or how star-shaped a pattern is) exhibits a very high contemporaneous correlation with returns. Similarly, the network variables that quantify the assortativity of connections (or how diamond-shaped a pattern is) exhibit high contemporaneous correlation with volatility. These results are robust with respect to different equity index futures markets (E-mini Dow Jones and Nasdaq 100), different observation periods (May 2008 and August 2008), different levels of aggregation (at the broker level and the individual trading account level), and different sampling frequencies (240 and 600 transactions). The correlation results can also be replicated in a simulated model, confirming that these empirical regularities do not arise by chance. Furthermore, the results do not depend on any parametric specifications or modeling assumptions. This is the first paper to empirically link trading networks that trace the execution of the limit order book with the dynamics of high frequency financial variables - transaction prices, quantities, and duration. As such, it offers a way to analyze the dynamics of the executed portion of the limit order book from transaction-level data.
Empirical network analysis has previously been applied in finance to study investment decisions and corporate governance.1 In contrast to strategically formed networks, where participants prefer to associate with specific counterparties, the networks we study are trading networks in which connections are formed as a result of an automated matching algorithm and reflect the participants' beliefs about the valuation of an asset. These networks are also highly dynamic: whereas boards of directors and portfolio holdings evolve gradually, over weeks, months, or years, financial trading networks change second by second. Our paper proceeds as follows. In Section I, we describe our unique ultra high frequency data, explain how we chose the sampling frequency, and describe financial variables. In Section II, we describe network variables. In Section III, we outline our conjecture of why patterns of order execution (trading networks) contain valuable information beyond prices, quantities, or intertrade duration. In Section IV, we present the empirical properties of network and financial variables. In Section V, we analyze time series properties and employ Granger-causality tests between and among network and financial variables. Section VI demonstrates that our results are robust with respect to different markets, different observation periods, and different sampling frequencies. In Section VII, we use an agent-based simulation model of trading networks to further test that our empirical results do not arise by chance. Finally, Section VIII
1 For
summarizes our findings and suggests further applications of the network analysis methodology to trading networks.
The second technique is developed by Bandi and Russell (2006) to select the sampling frequency that minimizes the variance of market microstructure noise. According to the second technique, the optimal sampling frequency is just below 100 transactions. Neither technique makes any use of network variables. We adopt a very conservative approach and select 240 transactions as the sampling frequency for our data.6 For each period consisting of 240 transactions (which amounts to a total of 25,104 such periods in our sample), we compute the following financial variables: returns, volatility, intertrade duration, and trading volume. These four variables are typically assumed to both contain and convey valuable information to market participants about the true (but unobserved) stochastic price process.7 Intuitively, market participants can learn about the true underlying price process by observing transaction prices, trading volume, and times between trades. Transaction prices contain valuable information about the true underlying price process, but with a possibly significant amount of noise due to, among other reasons, market microstructure issues (e.g., bid-ask bounce), measurement issues (e.g., time scale, discrete realizations from a continuous process), and seasonality (e.g., predictable intraday patterns).8 Both returns and their volatility are computed from observed prices and, thus, suffer from the same noise issues. However, a number of techniques have been developed to reduce the impact of different noise components in ultra high-frequency data. The techniques we use to deal with measurement errors and reduce market microstructure noise (filters and the optimal sampling frequency) are described just above. In addition, we remove a predictable intraday seasonal component from the computed raw returns by regressing them on a constant and a sequence of dummy variables for each half-hour during the trading period.
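As an illustration of the deseasonalization step (not part of the original analysis): regressing returns on a constant plus half-hour dummies and keeping the residual is equivalent to demeaning returns within each half-hour bucket, and a minimal sketch of that equivalent computation, with hypothetical numbers, is:

```python
# Sketch: removing the intraday seasonal component. Regressing returns on a
# constant plus half-hour dummies and keeping the residual is equivalent to
# demeaning within each half-hour bucket, which this does directly.
from collections import defaultdict

def deseasonalize(values, bucket_ids):
    """values[i] observed in intraday bucket bucket_ids[i]; remove bucket means."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, b in zip(values, bucket_ids):
        sums[b] += v
        counts[b] += 1
    means = {b: sums[b] / counts[b] for b in sums}
    return [v - means[b] for v, b in zip(values, bucket_ids)]

raw = [0.002, 0.004, -0.001, 0.001]   # hypothetical raw returns
buckets = [0, 0, 1, 1]                # two half-hour buckets
print(deseasonalize(raw, buckets))    # each bucket now averages zero
```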
We then use the unexplained term as our measure of returns.9 We compute returns as differences in log prices, using both the last price relative to the first price within the same period (close-to-open) and the last prices of consecutive periods (close-to-close). The results reported below refer to the close-to-open deseasonalized returns, because we believe it to be an intuitively more appealing measure to compare with network variables (also cleaned of seasonality), which are defined within each sampling period. Having said that, the main results are affected neither by the two different ways to compute returns nor by the deseasonalization procedure.10 Volatility is another measure that contains valuable information about the true underlying price process. As mentioned above, because it is computed from observed prices, it suffers from the same noise issues as returns. Moreover, unlike prices, volatility is never directly observed. Thus, volatility estimates contain not only the volatility of the noise, but also a possibly nontrivial factor due to the covariance between the
6 In order to ensure the robustness of our results, we repeat our analysis at a higher sampling frequency (see our discussion on robustness later in the paper). The main results are unaffected. 7 There is a vast theoretical and empirical literature on the subject. For a recent summary, see Manganelli (2005). 8 See, for example, Engle (2000). 9 We apply the same technique to all financial and network variables. 10 We also used a Fourier flexible form to remove seasonality. It did not qualitatively change our results.
true price process and the noise component.11 We use three measures to estimate volatility during each period: absolute returns, squared returns, and the price range. Absolute and squared returns are proxies for the standard deviation and variance of returns, respectively. The price range is defined as the difference between the high and low price (in logs) during the period. For the results reported below, we use the price range as the measure of volatility. Range-based volatility estimators have been shown to be more efficient than return-based volatility estimators, because they incorporate the full sample path of observed prices (to select a maximum and a minimum) rather than just open and close prices.12 Our main results are not affected by the choice of volatility estimator. Intertrade duration contains valuable information, because the estimation of characteristics of the true price process obtained during periods of shorter intertrade duration can be more precise. This would happen irrespective of the reasons for shorter intertrade duration: whether more frequent trading occurs due to more informed trading or more liquidity trading, more frequent sampling would result in greater precision with respect to the true process. Having said that, there is a view that since information is disseminated through trading, the interval of time between trades can be interpreted as a proxy for the arrival of new information to the market.13 We compute duration as the time (in seconds) elapsed between the start and end of the period. We compute three measures of duration: total (unweighted) period duration, volume-weighted period duration, and the average of the 239 intertrade (within-period) durations. The results reported below are for total period duration. The main results are unaffected by the way we compute intertrade duration.
Trading volume contains valuable information, because volume together with observed transaction prices can be driven by a common latent factor often referred to in the literature as information intensity.14 Intuitively, during periods of higher volume, transaction prices also exhibit greater precision about the characteristics of the true underlying price process. We compute volume as the number of contracts both bought and sold during the observation period.
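A minimal sketch of the three per-period measures as used in the results below - price range in logs, total period duration, and contract volume - computed from hypothetical trade tuples (the exact volume convention, counting each contract once here, is a simplifying assumption):

```python
# Sketch of the three per-period measures: range-based volatility (log high
# minus log low), total period duration in seconds, and volume in contracts
# (counting each contract once; the trade tuples are hypothetical).
import math

def period_stats(trades):
    """trades: list of (price, quantity, timestamp_seconds) in time order."""
    prices = [price for price, _, _ in trades]
    price_range = math.log(max(prices)) - math.log(min(prices))
    duration = trades[-1][2] - trades[0][2]
    volume = sum(qty for _, qty, _ in trades)
    return price_range, duration, volume

trades = [(1250.00, 2, 0.0), (1250.25, 1, 0.4), (1249.75, 3, 1.1)]
vol, dur, qty = period_stats(trades)
print(vol, dur, qty)
```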
number of techniques have been developed to estimate volatility components separately by varying the time window. See, for example, Zhang, Mykland, Ait-Sahalia (2005). Application of these techniques to trading networks will be explored in our future research. 12 For the literature on price range as an efcient estimator of asset price volatility, see, for example, Parkinson (1980), Garman and Klass (1980), Beckers (1983), and Brunetti and Lildtholdt (2006). In recent years, the price range has also been used to compute realized volatility in high frequency data. See, for example, Christensen and Podolski (2009). 13 See, for example, Engle and Russell (1998) and Engle (2000). 14 There is a vast theoretical and empirical literature on the subject. See, for example, Clark (1973), Epps and Epps (1976), Tauchen and Pitts (1983), Admati and Peiderer (1988), Easley and OHara (1992),and Andersen (1996).
11 A
transaction prices, some nodes may decide to modify or remove some existing stubs or grow new stubs, thus affecting the network formation process. Empirically, we construct trading networks as follows. At 9:30:00 a.m. EST on August 1, 2008, we start counting transactions in the September 2008 E-mini S&P 500 futures contract. For each transaction, we know which account bought from or sold to which other account (or itself), at what price, and for what number of contracts. We designate 240 consecutive transactions as one period. Transactions 1 through 240 mark the first period, transactions 241 through 480 mark the second period, and so on. While for each period we do not observe the limit order book itself, we know that transactions occurred because market orders or limit orders were matched with existing orders in the limit order book. We can then trace the pattern of order execution, or a trading network, within each period. Even though the number of transactions for each period is the same, a pattern for a large market order executed over the period will look very different from a pattern for several smaller limit orders. Metrics that we compute for each network should be interpreted as quantitative measures of the pattern of order execution in the limit order book. We realize that by taking snapshots of the market at equal transaction-time intervals, we cannot hope to characterize the whole complexity of changes that take place in the underlying limit order book. Specifically, we cannot observe how the revelation of transaction prices translates into modifications or cancellations of existing orders and submissions of new orders. Or, in terms of the network formation process, we cannot observe how nodes remove some existing stubs and grow new stubs. While we know that the process of trading network formation - stubs, edges, transaction prices, new stubs - goes on continuously, we must designate the number of transactions that add up to a trading network at a point in time.
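A sketch of the segmentation just described: the transaction stream is cut into consecutive 240-transaction periods, and one set of directed counterparty edges is extracted per period (the two-account stream below is a toy example, not market data):

```python
# Sketch: cut the transaction stream into consecutive 240-transaction periods
# and extract one set of directed (seller, buyer) edges per period.
def periods(transactions, size=240):
    for start in range(0, len(transactions) - size + 1, size):
        yield transactions[start:start + size]

def network_edges(period):
    """One directed edge per distinct (seller, buyer) pair in the period."""
    return {(seller, buyer) for seller, buyer in period}

stream = [("A", "B")] * 300 + [("B", "C")] * 180   # 480 toy transactions
nets = [network_edges(p) for p in periods(stream)]
print(len(nets))  # 2 periods of 240 transactions each
```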
This designated number of transactions could be at times too small and at times too large to clearly capture the impact of order execution on the order book through network analysis within each period. However, as we analyze the time series properties of trading networks, a statistically significant pattern, if there is one, should emerge. In other words, the approach we take is to compute and analyze network metrics for a time series of consecutive trading networks rather than those for one aggregate network that emerges over the whole period. Given our intuition about how patterns should be related to the dynamics of transaction prices and quantities, we are interested in network metrics that can measure centrality (or how star-shaped a network is); assortativity of connections (or how diamond-shaped a network is); as well as those that can measure reciprocity, triangular connections, and the size of the network. The size of the network can be characterized in terms of the total number of nodes, denoted by N, and the total number of edges, denoted by E. From these two quantities we can also compute the average degree, AVDEG = E/N, the average number of nodes that a node is connected to, and the standard deviation of degree, STDEG, the standard deviation around
this average. These two variables characterize the first and second moment, respectively, of the unconditional degree distribution. Node centrality quantifies the position of a specific node in a network. There are several node centrality measures, the simplest one being degree, or how many edges a node has. In a directed network, degree can be further separated into indegree and outdegree in accordance with the number of incoming or outgoing edges of a node. However, degree alone may not necessarily capture the role of a node in the network. For example, a node that has a relatively low degree, but acts as a connector between otherwise disconnected parts of the network, can be thought of as very central. To that end, there are measures of centrality that take into account not just the degree of a node, but its position relative to all other nodes in the network. For example, betweenness measures how many other pairs of nodes would have to go through the given node in order to reach one another in the shortest number of hops. Similarly, closeness measures how many hops away a node is, on average, from every other node in the network. Figure 2 illustrates different node centrality measures. Node centrality is a critical input into the calculation of network centralization, a measure that characterizes the inequality of connectivity among the nodes. In order to capture this inequality in connectivity within the network (whether there are a small number of nodes with high centrality and a large number of nodes with low centrality), we compute a centralization measure defined as a centralization Gini:

G = \frac{\sum_{r=1}^{N} (2r - N - 1)\, k_r}{N E}, (1)

where k_r is the centrality measure of the node with rank order number r (nodes ranked in ascending order of centrality). Taking a node's degree as its centrality measure, we use the formula above to compute separate centralization measures for indegree and outdegree: incentralization, INCEN, and outcentralization, OUTCEN, respectively.
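Equation (1) is straightforward to implement. A pure-Python sketch, applied to a hypothetical star-shaped period with a single dominant seller, together with the combined measure CEN = INCEN - OUTCEN:

```python
# Sketch of equation (1): a degree-based centralization Gini, with degrees
# taken in ascending rank order. Applied to indegree and outdegree it gives
# INCEN and OUTCEN; CEN = INCEN - OUTCEN. The example degrees describe a
# hypothetical star period with one dominant seller.
def centralization_gini(degrees):
    n, e = len(degrees), sum(degrees)
    if e == 0:
        return 0.0
    return sum((2 * r - n - 1) * k
               for r, k in enumerate(sorted(degrees), start=1)) / (n * e)

outdeg = [4, 0, 0, 0, 0]   # one seller hit four different buyers
indeg = [0, 1, 1, 1, 1]    # each buyer received one fill
outcen = centralization_gini(outdeg)  # 0.8: out-edges concentrated on one node
incen = centralization_gini(indeg)    # 0.2: in-edges spread fairly evenly
cen = incen - outcen                  # about -0.6, negative: a dominant seller
print(outcen, incen, cen)
```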
By construction, these measures are 0 if every node has the same number of (incoming or outgoing) edges, and positive with increasing inequality: e.g., when one node has all the incoming (outgoing) edges and the others have no incoming (outgoing) edges. We also compute a combined measure of incentralization and outcentralization: CEN = INCEN - OUTCEN. Intuitively, since we use a node's degree as a measure of its centrality, the difference between the in- and out-centralization measures can be interpreted as the presence of a dominant buyer or seller. CEN will be equal to 1 if there is a dominant buyer and -1 if there is a dominant seller. To measure whether a node is both a buyer and a seller, we compute the Pearson correlation coefficient between the indegree and the outdegree of each node, INOUT. A positive
correlation indicates that nodes with many in-edges also have many out-edges - i.e., such a node is both buying and selling. We also calculate statistical properties of nodes one edge away from each individual node, i.e., the connectivity of node B conditional on it being connected to node A. Assortativity in networks can represent any tendency of like to be connected with like for any node property (see Newman (2002)), but here we apply it to degree. Large-degree nodes (i.e., those with many edges) may connect more frequently to other large-degree nodes, or they may tend to connect to small-degree nodes. Two large-degree nodes connecting to a number of small-degree nodes between them will result in a diamond-shaped network. One way to measure assortativity is by the Pearson correlation coefficient \rho(k_i, k_j) over all edges e_{ij}. When the edges are directed, there are four possible assortativity measures: \rho(k_i^{in}, k_j^{in}), \rho(k_i^{in}, k_j^{out}), \rho(k_i^{out}, k_j^{in}), and \rho(k_i^{out}, k_j^{out}), corresponding to the four conditional degree distributions. From these four correlation coefficients, we construct the following compound measure, which we call the assortativity index for directed networks:

AI = \frac{1}{4}\left[\rho(k_i^{in}, k_j^{in}) + \rho(k_i^{out}, k_j^{out}) - \rho(k_i^{in}, k_j^{out}) - \rho(k_i^{out}, k_j^{in})\right], (2)
computed over all edges e_{ij}. Figure 3 illustrates network assortativity. For example, in the context of trading networks, the coefficient \rho(k_i^{out}, k_j^{in}) measures the correlation between the number of unique buyers (connected by an outward-pointing edge) a seller is selling to (denoted by k_i^{out}) and the number of unique sellers those buyers are buying from (denoted by k_j^{in}). A negative \rho(k_i^{out}, k_j^{in}) would mean that when a seller has matched with many buyers, those buyers are likely to be transacting with few or no other sellers. We also measure whether nodes one edge away from each individual node form particular (e.g., triangular) patterns. Transitivity, also termed clustering, measures the prevalence of closed triads in the network. In this paper, we use the global clustering coefficient, denoted by CC, as a measure of transitivity:19

CC = \frac{3 \times \text{number of triangles}}{\text{number of connected triples}}, (3)

19 See Newman (2003).
where a connected triple means three nodes A, B, C such that there is an edge AB and an edge BC.20 The prevalence of specific directed triads can be used to conduct a motif analysis on a directed network.21 Finally, in addition to regularities in connections between pairs and triplets of nodes, a network as a whole may be composed of several separate connected components. A connected component is a maximal subset of nodes such that any node can be reached from any other node by traversing edges. Within a strongly connected component, any node can be reached from any other by following directed edges. Figure 4 illustrates the largest strongly connected component (LSCC). Once the largest strongly connected component is identified, we can measure the global network structure by computing LSCC, the proportion of the network occupied by this component. Intuitively, the largest strongly connected component can only occupy a significant portion of the network if many nodes have both incoming and outgoing edges during the same time period, and there are cycles (the simplest of which are reciprocal ties and the triads mentioned above) within the network. In other words, a large strongly connected component is much more likely to emerge as a result of a large number of limit orders than of one large market order.
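The assortativity index of equation (2) can be sketched in pure Python as follows, assuming the reconstructed sign convention (like-with-like degree correlations enter positively, cross correlations negatively); the diamond-shaped example with two large traders intermediated by two small ones is hypothetical:

```python
# Sketch of the assortativity index of equation (2), assuming the sign
# convention reconstructed above. For every directed edge i -> j, node i's
# and node j's in- and outdegrees are paired and the four Pearson
# correlations averaged.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def assortativity_index(edges):
    indeg, outdeg = {}, {}
    for i, j in edges:
        outdeg[i] = outdeg.get(i, 0) + 1
        indeg[j] = indeg.get(j, 0) + 1
        indeg.setdefault(i, 0)
        outdeg.setdefault(j, 0)
    def rho(fi, fj):
        return pearson([fi[i] for i, j in edges], [fj[j] for i, j in edges])
    return 0.25 * (rho(indeg, indeg) + rho(outdeg, outdeg)
                   - rho(indeg, outdeg) - rho(outdeg, indeg))

# Diamond: two large traders A and D intermediated by small traders M1, M2.
diamond = [("A", "M1"), ("A", "M2"), ("M1", "D"), ("M2", "D")]
print(assortativity_index(diamond))  # 1.0 for this diamond
```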
index means that large traders are mostly matched with many small traders rather than with each other. A small largest strongly connected component suggests that large traders are not trading with each other: rather, they buy from or sell to small traders who quickly trade with another large trader. This pattern is associated with negative returns, as well as with higher volume and volatility. The right column of Figure 5 presents a fairly uniform network with many buyers and sellers of various sizes. This situation is reflected in a pattern of connections that exhibits network parameters close to their sample averages, with the exception of the number of edges - reflecting a larger and more interconnected trading network. The financial variables estimated from transaction prices - the rate of return and volatility - are very close to their averages. Volume is somewhat above its sample average and period duration is quite high. The examples above provide illustrative evidence in support of our intuitive conjecture. Our next step is to take our conjecture to the data - a time series of over 25,000 trading networks - and to show that metrics of order execution patterns are statistically related to returns, volatility, volume, and duration.
positive: AI = 0.09 ± 0.07. This means that, on average, when a seller (buyer) is matched with many buyers (sellers), they are just as likely to be transacting with many other sellers (buyers) as with a few or no sellers (buyers). A deviation from this pattern indicates that one buyer or one seller is dominant. The global clustering coefficient, or the ratio of observed triangular connections among nodes to all possible triangular connections, is 0.04 ± 0.03, nearly one standard deviation below the average clustering coefficient for randomized graphs with the same assignments of degrees. In other words, there is no tendency for the traders to cluster together. Similar to the clustering coefficient, the size of the largest strongly connected component (0.04 ± 0.04) does not deviate from what would be expected for networks of that size, density, and distribution of in- and outdegrees. But as we will see in the following section, it does strongly correlate with density and other network variables.
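The two statistics discussed above - the global clustering coefficient of equation (3) and the LSCC share - can be computed from a directed edge list with a short pure-Python sketch (the four-node example network is hypothetical, and a brute-force reachability pass stands in for a production SCC algorithm):

```python
# Sketch: the global clustering coefficient of equation (3) and the share of
# nodes in the largest strongly connected component, from a directed edge list.
from collections import defaultdict

def clustering_coefficient(edges):
    """CC = 3 * triangles / connected triples, on the undirected projection."""
    adj = defaultdict(set)
    for i, j in edges:
        if i != j:
            adj[i].add(j)
            adj[j].add(i)
    triples = sum(len(nbrs) * (len(nbrs) - 1) // 2 for nbrs in adj.values())
    triangles = sum(1 for a in adj for b in adj[a] for c in adj[b]
                    if a < b < c and c in adj[a])
    return 3 * triangles / triples if triples else 0.0

def lscc_share(edges):
    """Fraction of nodes in the largest strongly connected component."""
    fwd, rev = defaultdict(set), defaultdict(set)
    nodes = set()
    for i, j in edges:
        fwd[i].add(j)
        rev[j].add(i)
        nodes.update((i, j))
    def reachable(start, adj):
        seen, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in seen:
                seen.add(n)
                stack.extend(adj[n])
        return seen
    # The SCC of n is the set reachable from n both forwards and backwards.
    biggest = max(len(reachable(n, fwd) & reachable(n, rev)) for n in nodes)
    return biggest / len(nodes)

# Hypothetical period: a 3-cycle A -> B -> C -> A plus a pendant trade C -> D.
example = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
print(clustering_coefficient(example), lscc_share(example))  # 0.6 0.75
```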
a situation when low-degree buyers are matched with low-degree sellers (several limit orders). Intuitively, in a deep and liquid market like the E-mini S&P 500 futures, an incoming market order has a significant chance of being executed against several limit orders sitting at (or near) the same tick, resulting in high assortativity and centralization, but very little price impact and, hence, a low high-frequency volatility estimate. At the same time, the intermediated execution of two large limit orders from both sides of the limit order book will result in a positive high-frequency estimate of volatility, if only due to the bid-ask bounce. Duration is positively correlated with the average degree and the in-out degree correlation, and negatively correlated with the standard deviation of degree and the assortativity index. Intuitively, a longer time interval between trades is associated with trades that are distributed more evenly among traders, increasing the average degree and decreasing the standard deviation of degree and the assortativity index. Over longer time intervals, it is also more likely that a node that has a high indegree also has a high outdegree (it has time to be both a buyer and a seller), which results in a positive in-out degree correlation.
B. Granger Causality
We next test for Granger causality in the context of Vector Autoregressive (VAR) models. Since the variables exhibit heteroskedasticity and serial correlation, we estimate VAR models using the generalized method of moments (GMM) and Newey-West robust standard errors. We first consider a VAR model with eight network variables. According to the Akaike Information Criterion, the system that includes all eight network variables has an optimal lag length of twenty.24 However, the results of the model with eight network variables (available from the authors upon request) show strong evidence of feedback effects among the network variables, i.e., network variables tend to Granger-cause each other. In light of this, we use standard tests to reduce the model to four network variables.25 Tables V-IX provide the results (p-values) of Granger-non-causality tests. The last column and the last row of each table are labelled "all". In the last column we test whether each variable is Granger-caused by all the other variables in the system, while in the last row we test whether each variable is Granger-causing any other variable in the system. The null hypothesis is that of Granger-non-causality. Therefore, a p-value greater than five percent indicates a failure to reject the null. Table V presents p-values for the Granger-non-causality test among three sets of four network variables (three panels). Panel 1 shows that centralization (CEN) is Granger-causing the other network variables (p-value = 0.5556), but is not Granger-caused by other network variables (p-value = 0.2387). On the other hand, Panels 2 and 3 show that the remaining network variables Granger-cause each other. Next, we test for Granger-causality between one financial variable and four network variables. Using standard techniques, we select groups of network variables that reflect degree properties at the level of a single node (e.g., centralization, standard deviation of degree, and in- and outdegree correlation), two nodes linked by an edge (assortativity index), connected triples of nodes (clustering coefficient), and the connectivity of the whole network (the proportion of nodes in the largest strongly connected component). Table VI presents p-values for the Granger-non-causality test for the rate of return and network variables. We find that the return process both is Granger-caused by and Granger-causes network variables. The network variable that has a strong impact on returns is centralization. This is in line with the correlation results in Table IV. Table VII reports Granger-non-causality test results for the volatility process and network variables. Similarly to the return process, we find a feedback effect between volatility and network variables: volatility is both Granger-caused by network variables and Granger-causes them. Table VIII reports Granger-non-causality test results for intertrade duration and network variables. We find that duration is Granger-caused by network variables (p-value = 0.0000), but does not Granger-cause network variables (p-value = 0.1811). Finally, Table IX presents p-values for the Granger-non-causality test for volume and network variables. The results show that volume is Granger-caused by network variables (p-value = 0.0000) but does not Granger-cause network variables (p-value = 0.3662). What are the possible reasons for the presence of feedback effects in the Granger causality test results for the rate of return and volatility (vis-a-vis the network variables) and the absence of such effects for volume and duration?

24 Throughout the analysis we use both the Akaike and Schwarz Information Criteria. 25 Standard test statistics are available from the authors upon request.
We believe that there is one fundamental reason for these empirical findings: our results for the price-based variables are polluted by noise. Unlike volume, duration, and all the network variables, which we can measure directly, the rate of return and volatility are estimated from transaction prices. As a result, the variables we call the rate of return and volatility are noisy proxies for the unobservable characteristics of the true price process. The level of noise at this very high frequency is so high that it is very hard to effectively measure the interaction between network variables and the true price process.
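The effect of microstructure noise on price-based measures can be illustrated numerically: adding i.i.d. noise to a latent random-walk price inflates realized variance computed from transaction-by-transaction returns. A minimal sketch, with all magnitudes chosen for illustration rather than calibrated to the E-mini data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Latent efficient log-price: a random walk with small per-trade volatility
true_price = np.cumsum(rng.normal(0.0, 0.0001, n))
# Observed price = efficient price + i.i.d. microstructure noise
observed = true_price + rng.normal(0.0, 0.0005, n)

# Realized variance = sum of squared trade-by-trade returns
rv_true = np.sum(np.diff(true_price) ** 2)
rv_obs = np.sum(np.diff(observed) ** 2)
print(f"true RV: {rv_true:.5f}  observed RV: {rv_obs:.5f}")
```

Because the noise is uncorrelated across trades, its variance enters every squared return twice, so at high frequency the observed realized variance is dominated by noise rather than by the true price process.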
VI. Robustness
Our results are robust with respect to different markets, different observation periods, different levels of aggregation, and different sampling frequencies. The results we report are for the E-mini S&P 500 futures for the month of August 2008 (over 6 million transactions). The results remain qualitatively the same when we repeat all procedures for the same market for the month of May 2008 (5.15 million transactions) at the sampling frequency of 240 transactions. The main results also remain the same for the sampling frequency of 600 transactions; namely, both the correlation and Granger-causality results hold. The results are also the same whether we construct networks at the broker level or the trading account level. Finally, the results remain the same for other stock index futures markets, as confirmed by the analysis of the E-mini Nasdaq 100 (2.3 and 2.8 million transactions in May 2008 and August 2008, respectively) and E-mini Dow Jones futures contracts (1.8 and 2.4 million transactions in May 2008 and August 2008, respectively) at the sampling frequencies of 240 and 600 transactions.
match the remaining quantity against another, more recently placed order. Orders are set to expire after a fixed amount of time from when they are first created, at which point they are cancelled and withdrawn from the market. Using the resulting simulated transactions, we construct trading networks using a procedure identical to the one used for the empirically observed data. Namely, we simulate 6 million transactions, segment the data into periods of 240 consecutive transactions, and compute network and financial statistics for each period. Just as in actual trading, a single order may be reflected in multiple transactions in adjacent time windows. This setup allows for the possibility of heterogeneous beliefs about the price process, but imparts no intentionality or memory upon the traders. It allows us to discern which features of the trading networks are due to the arrival of information to the market, and which may be due to strategic behavior on the part of the traders. We find that a sequence of orders with randomly distributed prices and quantities results in network and financial variables that are very similar to those obtained from the futures market data, with the notable (and anticipated) exception of the dynamic structure. Specifically, we find that contemporaneous correlations among the network variables, as well as correlations between network and financial variables, are very similar to those we estimate from the actual market data.28 This confirms that our empirical results do not arise by chance. We also use the agent-based simulation model to investigate possible sources of the high correlation between network centralization and returns.
By observing the simulation, we find that the high correlation between centralization and returns reflects the network mechanics of the information arrival process: a trader submitting a large buy order at a high price will be matched against several existing sell orders, giving that trader a high indegree and increasing the centralization of the network. At the same time, because a greater number of sell orders was matched, the market price goes up, yielding a positive rate of return. Moreover, we find that for the simulated data (but not the market data), centralization and other network variables Granger-cause returns, but not vice versa.29 At the same time, we also find that, as expected in a model with no intentionality or memory, Granger-causality tests among network variables and volatility, volume, and duration yield very weak results: feedback effects, lack of significance, or very poor fit. This suggests that the Granger-causality results that we find in the futures markets data arise from the behavior of traders and are not a statistical artifact.
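The mechanics described above can be sketched with a toy order-matching routine: one large buy order sweeps several resting sell orders, which simultaneously raises the buyer's in-degree and moves the transaction price up. This is a minimal zero-intelligence sketch, not the paper's NetLogo model; the traders, prices, and quantities are illustrative assumptions.

```python
# Resting sell orders: (price, quantity, trader id)
book = [(100, 1, "S0"), (101, 1, "S1"), (102, 1, "S2")]
edges = []        # one (seller, buyer) edge per transaction
last_price = None

def submit_buy(buyer, price, qty):
    """Match a buy order against resting sells priced at or below `price`,
    cheapest first; each fill draws a seller -> buyer edge."""
    global last_price
    for order in sorted(book, key=lambda o: o[0]):
        if qty == 0 or order[0] > price:
            break
        p, q, seller = order
        fill = min(qty, q)
        qty -= fill
        last_price = p           # price rises as deeper sells are hit
        edges.append((seller, buyer))
        book.remove(order)
        if q > fill:
            book.append((p, q - fill, seller))
    return qty                   # unfilled remainder

# One large buy order sweeps all three resting sells.
submit_buy("B0", price=105, qty=3)

indegree = sum(1 for _, b in edges if b == "B0")
print(indegree, last_price)  # 3 102
```

The single aggressive buyer ends with in-degree 3 (a star-shaped pattern, raising centralization), and the last transaction price is the highest swept sell price, yielding a positive return.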
28 Results are available from the authors upon request.
29 We use the lag-length of order one in the VAR. As expected from the lack of dynamics in the simulated data, the Akaike information criterion selects a lag-length of order one in the VAR specification and the Schwarz information criterion selects a lag-length of order zero.
References
[1] Allen, Franklin, and Ana Babus, 2008, Networks in Finance, Working Paper 08-07, Wharton Financial Institutions Center, University of Pennsylvania.
[2] Andersen, T., Bollerslev, T., Diebold, F.X., and Labys, P., 2000, Great Realizations, Risk 13, 105-108.
[3] Bandi, F.M., and Russell, J.R., 2006, Separating microstructure noise from volatility, Journal of Financial Economics 79, 655-692.
[4] Barndorff-Nielsen, O.E., Hansen, P.A., Lunde, A., and Shephard, N., 2008, Realised kernels in practice: trades and quotes, manuscript.
[5] Beckers, S., 1983, Variance of security price returns based on high, low and closing prices, Journal of Business 56, 97-112.
[6] Braha, Dan, and Bar-Yam, Y., 2006, From Centrality to Temporary Fame: Dynamic Centrality in Complex Networks, Complexity 12(2), 59-63.
[7] Brunetti, Celso, and Lildholdt, P.M., 2006, Relative efficiency of return- and range-based volatility estimators, manuscript.
[8] Clark, P., 1973, A subordinated stochastic process model with finite variance for speculative prices, Econometrica 41, 135-155.
[9] Christensen, K., and Podolskij, M., 2005, Asymptotic theory of range-based estimation of integrated variance of a continuous semi-martingale, manuscript.
[10] Engle, Robert, 2000, The econometrics of ultra-high-frequency data, Econometrica 68, 1-22.
[11] Engle, R., and Gallo, G., 2006, A multiple indicators model for volatility using intra-daily data, Journal of Econometrics 131, 3-27.
[12] Engle, R., and Russell, J., 1998, Autoregressive conditional duration: A new model for irregularly spaced transaction data, Econometrica 66, 1127-1162.
[13] Epps, T., and Epps, M., 1976, The stochastic dependence of security price changes and transaction volumes: Implications for the mixture-of-distribution hypothesis, Econometrica 44, 305-321.
[14] Fagiolo, G., 2007, Clustering in complex directed networks, Physical Review E 76(2), 026107.
[15] Garman, M., and Klass, M., 1980, On the estimation of security price volatilities from historical data, Journal of Business 53(1), 67-78.
[16] Hansen, P., and Lunde, A., 2006, Realized variance and market microstructure noise, Journal of Business and Economic Statistics 24, 127-218.
[17] Hasbrouck, Joel, 2003, Intraday Price Formation in U.S. Equity Index Markets, Journal of Finance 58(6), 2375-2400.
[18] Hong, Harrison, and Jeremy C. Stein, 1999, A unified theory of underreaction, momentum trading and overreaction in asset markets, Journal of Finance 54, 2143-2184.
[19] Kossinets, G., and Watts, D.J., 2006, Empirical Analysis of an Evolving Social Network, Science 311(5757), 88-90.
[20] Milo, R., Itzkovitz, S., Kashtan, N., Levitt, R., Shen-Orr, S., Ayzenshtat, I., Sheffer, M., and Alon, U., 2004, Superfamilies of Evolved and Designed Networks, Science 303, 1538-1542.
[21] Newman, M.E.J., 2002, Assortative mixing in networks, Physical Review Letters 89, 208701.
[22] Newman, M.E.J., 2003, The structure and function of complex networks, SIAM Review 45, 167-256.
[23] Oomen, R., 2005, Properties of bias-corrected realized variance under alternative sampling schemes, Journal of Financial Econometrics 3, 555-577.
[24] Parlour, Christine A., and Duane J. Seppi, 2008, Limit Order Markets: A Survey, in Boot, Arnoud W.A., and Anjan V. Thakor, eds., Handbook of Financial Intermediation and Banking, Elsevier B.V., Oxford, UK.
[25] Scheinkman, Jose A., and Wei Xiong, 2003, Overconfidence and Speculative Bubbles, Journal of Political Economy 111(6), 1183-1219.
[26] Tauchen, G., and Pitts, M., 1983, The price variability-volume relationship on speculative markets, Econometrica 51, 485-505.
[27] Wilensky, U., 1999, NetLogo, http://ccl.northwestern.edu/netlogo, Center for Connected Learning and Computer-Based Modeling, Northwestern University, Evanston, IL.
[28] Zhang, L., Mykland, P.A., and Ait-Sahalia, Y., 2005, A tale of two scales: Determining integrated volatility with noisy high-frequency data, Journal of the American Statistical Association 100, 1394-1411.
[Figure 2 panels: indegree, outdegree, betweenness, closeness]

Figure 2: Example networks with node X having greater centrality than node Y for the specified measure.
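For concreteness, the four centrality measures in Figure 2 can be computed directly with Python's networkx on a toy directed network. The graph below is an illustrative stand-in, not the figure's exact topology; an edge u -> v can be read as "u sold to v".

```python
import networkx as nx

# Toy trading network: three sellers hit X, X trades with Y, Y with D.
G = nx.DiGraph([("A", "X"), ("B", "X"), ("C", "X"), ("X", "Y"), ("Y", "D")])

print("indegree:   ", dict(G.in_degree()))
print("outdegree:  ", dict(G.out_degree()))
print("betweenness:", nx.betweenness_centrality(G))
print("closeness:  ", nx.closeness_centrality(G))
```

Here X has the highest indegree (three incoming edges) and also lies on more shortest paths than any other node, so it dominates the betweenness ranking as well.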
Figure 4: A network containing two connected components, ABCDE and FGH. The largest strongly connected component is BCDE.
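The component structure described in the Figure 4 caption can be reproduced with networkx. The exact edges below are an assumption consistent with the caption: a cycle through B, C, D, E (making BCDE strongly connected) and a one-way chain through F, G, H.

```python
import networkx as nx

# Two connected components, ABCDE and FGH; only B->C->D->E->B is a cycle.
G = nx.DiGraph([("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "B"),
                ("F", "G"), ("G", "H")])

largest = max(nx.strongly_connected_components(G), key=len)
print(sorted(largest))  # ['B', 'C', 'D', 'E']

# The paper's LSCC variable: proportion of nodes in the largest SCC.
frac = len(largest) / G.number_of_nodes()
print(f"fraction of nodes in LSCC: {frac:.2f}")  # 0.50
```

Only the cycle is strongly connected; A, F, G, and H are reachable in one direction but not the other, so each forms a singleton strongly connected component.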
Figure 6: A screenshot of the agent-based simulation using NetLogo (Wilensky 1999). As orders (denoted by squares) are randomly assigned to traders (denoted by human figures), an edge is drawn between them. When sell orders (black squares) are matched with buy orders (red squares), their quantities are reduced, and a directed edge is drawn between the traders.
Table I: Financial Variables: Summary Statistics

            Returns          Volatility      Volume          Duration
Mean        0.0002           0.0425          1236.6720       19.4941
Median      0.0000           0.0392          1153            14
Maximum     0.2165           0.2165          6645            176
Minimum     -0.1378          0.0190          459             0
Std. Dev.   0.0271           0.0140          407.1451        17.4485
Skewness    0.0485           0.8273          2.6259          2.0299
Kurtosis    2.9876           6.4663          16.9377         9.3735
ADF prob    0.0001           0.0000          0.0000          0.0000
AC Lag 1    -0.001 [0.895]   0.187 [0.000]   0.528 [0.000]   0.473 [0.000]
AC Lag 5    -0.006 [0.062]   0.167 [0.000]   0.376 [0.000]   0.289 [0.000]
AC Lag 10   -0.011 [0.139]   0.151 [0.000]   0.284 [0.000]   0.241 [0.000]

ADF prob refers to the p-value of the ADF test for the null of a unit root. AC Lag X [Q-test prob] refers to the p-value of the Portmanteau Q-test for no serial correlation at lags X = 1, 5, and 10.
Table III: Pairwise correlations between network variables

         CEN      AV DEG   SDDEG    INOUT    AI       CC       LSCC
CEN      1.0000
AV DEG   -0.0012  1.0000
SDDEG    -0.0015  0.0031   1.0000
INOUT    -0.0019  0.2367   0.5119   1.0000
AI       -0.0008  -0.1079  -0.2022  -0.6787  1.0000
CC       -0.0008  0.8074   -0.0248  0.2052   -0.1095  1.0000
LSCC     -0.0006  0.5042   0.4070   0.7226   -0.5019  0.4248   1.0000
Table IV: Correlations between financial and network variables

         Returns   Range     Volume    Duration
CEN      0.6774    -0.0076   0.0264    -0.0065
AV DEG   -0.0034   0.0415    0.0061    0.1000
SDDEG    0.0037    0.0747    0.2363    -0.1620
INOUT    -0.0061   0.0429    0.0853    0.0467
AI       0.0016    0.0635    0.0129    -0.0810
CC       -0.0032   0.0314    0.0320    0.0360
LSCC     -0.0076   0.0331    0.0884    0.0058
26
Table V: Network Variables: P-values for the Null Hypothesis of Granger Non-causality

Panel 1: 20 lags
         AI       CC       LSCC
         0.2143   0.4227   0.1731
         0.0004   0.0000   0.0000
         0.0000   0.0002   0.0219
         0.0000   0.0000   0.0000

Panel 2: 18 lags
         AI       CC       LSCC
         0.1601   0.0054   0.0000
         0.0643   0.0078   0.0000
         0.0000   0.0005   0.0001
         0.0000   0.0000   0.0000

Panel 3: 14 lags
         INOUT    AI       CC       LSCC     All
INOUT    --       0.1384   0.0794   0.0000   0.0000
AI       0.0000   --       0.0016   0.0029   0.0000
CC       0.0475   0.0000   --       0.0000   0.0000
LSCC     0.0000   0.0000   0.0185   --       0.0000
All      0.0000   0.0000   0.0000   0.0000   --

VAR estimated using GMM with HAC robust standard errors. Optimal lag-length (26) is selected using the Akaike Information Criterion.
Table VI: Returns and Network Variables: P-values for the Null Hypothesis of Granger Non-causality

         Returns  CEN      AI       CC       LSCC     All
Returns  --       0.0148   0.8984   0.3530   0.7630   0.0320
CEN      0.0000   --       0.4459   0.8080   0.2615   0.0000
AI       0.0306   0.1491   --       0.0006   0.0000   0.0000
CC       0.0235   0.0632   0.0000   --       0.0000   0.0000
LSCC     0.1056   0.0826   0.0003   0.0240   --       0.0002
All      0.0000   0.0132   0.0000   0.0001   0.0000   --

VAR estimated using GMM with HAC robust standard errors. Optimal lag-length (18) is selected using the Akaike Information Criterion.
Table VII: Volatility and Network Variables: P-values for the Null Hypothesis of Granger Non-causality

            Volatility  SDDEG    AI       CC       LSCC     All
Volatility  --          0.0005   0.2350   0.0000   0.0019   0.0000
SDDEG       0.0000      --       0.0263   0.0063   0.0000   0.0000
AI          0.0020      0.0000   --       0.0717   0.0093   0.0000
CC          0.0000      0.0000   0.0000   --       0.0000   0.0000
LSCC        0.0000      0.0000   0.0003   0.0116   --       0.0000
All         0.0000      0.0000   0.0000   0.0000   0.0000   --

VAR estimated using GMM with HAC robust standard errors. Optimal lag-length (18) is selected using the Akaike Information Criterion.
Table VIII: Period Duration and Network Variables: P-values for the Null Hypothesis of Granger Non-causality

          Duration  INOUT    AI       CC       LSCC     All
Duration  --        0.3328   0.0017   0.0000   0.0000   0.0000
INOUT     0.9526    --       0.0000   0.0000   0.1215   0.0000
AI        0.3345    0.0000   --       0.0020   0.0021   0.0000
CC        0.5520    0.0498   0.0000   --       0.0000   0.0000
LSCC      0.1211    0.0000   0.0000   0.0336   --       0.0000
All       0.1811    0.0000   0.0000   0.0000   0.0000   --

VAR estimated using GMM with HAC robust standard errors. Optimal lag-length (15) is selected using the Akaike Information Criterion.
Table IX: Volume and Network Variables: P-values for the Null Hypothesis of Granger Non-causality

         Volume   SDDEG    AI       CC       LSCC     All
Volume   --       0.0014   0.0012   0.0000   0.0063   0.0000
SDDEG    0.0669   --       0.1752   0.0053   0.0000   0.0000
AI       0.2008   0.0000   --       0.0911   0.0166   0.0000
CC       0.3970   0.0000   0.0000   --       0.0000   0.0000
LSCC     0.4034   0.0000   0.0014   0.0002   --       0.0000
All      0.3662   0.0000   0.0000   0.0000   0.0000   --

VAR estimated using GMM with HAC robust standard errors. Optimal lag-length (15) is selected using the Akaike Information Criterion.