North American Network Operators Group|
Date Prev | Date Next | Date Index | Thread Index | Author Index | Historical
Estimating Transition Probabilities
>>> "sgorman" == <firstname.lastname@example.org> writes: sgorman> Also it might be easier to calculate transition sgorman> probabilities by summing across the rows of the adjaceny sgorman> matrix then dividing the row components by the sum. I'm not sure that that works. That would cause systems not adjacent to i to contribute to the probability that a packet goes from i to j in one step, if I understand correctly. There is also a hidden imbedded Markov chain in my formulation. Firstly, to avoid confusion with terminology, i am using: A_i - adjacency density vector -- the number of systems adjacent to i. C_ij - connectivity matrix -- representing presence or absence of an interconnect between i and j as a 1 or 0. C_ij does not represent a Markov process. P_ij - transition probability matrix -- probability that a packet, now in i will end up in j next. We have, A_i = Sum_j C_ij In making P_ij, i think it is necessary to calculate according to the number of interconnections of j -- how big the neighbour is. This looks right intuitively: if i is a smaller, local isp or an non-isp organization they will tend to have a few upstream providers and maybe some private peering. The upstreams will tend to be more heavily interconnected and peered and their component of A_i will be larger than that of the private peering, or more of your packets will go to an upstream than to your peers. The assumption may or may not be valid; it is difficult to know without actual traffic statistics for a good sample of links on the internet. If we had a close complete data set of traffic statistics for the internet measured at the same instant, it would be possible to calculate the P_ij directly: pps on link connecting i and j P_ij = ------------------------------ total pps leaving i but this data is not available available in general. (aside: I belive that because of the scoping property of Little's Formula this approach would also be valid for analyzing packet flow within an AS where i and j represent each router indexed arbitrarily) Consider C_ij A_j P_ij = lambda_i -------------- Sum_k C_jk A_k Sum_k C_ij C_jk = lambda_i --------------------- Sum_k Sum_l C_jk C_kl where lambda_i is a normalizing factor appropriate to make the sums across each row of P_ij equal to 1. The product of C_ij with A_j is necessary since we don't want to take into account the adjacency density of ASj if we are not connected to it -- it doesn't matter how big it is, if i is not connected to j, a packet has zero chance of making the transition directly. This is taken care of by the fact that C_ij == 0 in that instance. Likewise the right inner product of C_jk with A_k in the denominator is necessary for the same reason, except now two hops out. Non existent links are never taken into account in calculating the transition probabilities. You might say that the numerator is skewed towards the current state, and the denominator is skewed towards the next state. This suggests that there is an implicitly imbedded Markov chain -- the denominator used in calculating the probability of going in one step from i to j takes into account information about what will happen after at the next step when it leaves j. -w -- William Waites <email@example.com> finger firstname.lastname@example.org for PGP keys Idiosyntactix Research Laboratories http://www.irl.styx.org -- NetBSD: ša marche!