Recall that a BGP routing table is a list of entries, each consisting of a prefix and its associated AS-path. Intuitively, the prefix belongs to the last AS in the path. We call this AS the destination AS of the prefix.
This simple way of mapping prefixes to Autonomous Systems works well as long as we deal only with complete AS-paths. Unfortunately, our experience with the RouteViews BGP data shows that not all AS-paths meet this condition. For example, some aggregated routes contain non-empty AS-sets in their AS-paths. Since AS-sets are unordered, we cannot determine the destination ASes for the associated prefixes. Worse yet, because these prefixes are compressed, each comprises smaller prefixes that may have different destination ASes, which we cannot determine either. Although aggregated routes are defined in the BGP standard, they are rare: they constitute less than 1% of the RouteViews BGP data.
Our solution to this problem is to consider the compressed prefix as a whole, and to treat the last unaggregated AS in the AS-path (the last one in the AS-sequence) as the destination AS of the prefix. Since this is the farthest AS that the route is guaranteed to traverse, no matter how the route is split, it is also the best possible approximation of the real destination AS.
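The rule for aggregated routes can be illustrated with a minimal sketch. It assumes that AS-paths are available as text, with AS numbers written as plain integers separated by spaces and AS-sets enclosed in braces (for example `3356 1239 {7018,701}`); this textual format and the function name `destination_as` are assumptions made for illustration, not part of the method described above.

```python
import re
from typing import Optional

# Assumed textual form of an AS-set: a brace-enclosed group such as {7018,701}.
_AS_SET = re.compile(r"\{[^}]*\}")


def destination_as(as_path: str) -> Optional[int]:
    """Return the last unaggregated AS of an AS-path, or None if there is none.

    AS-sets are unordered, so they are dropped before the last AS is taken.
    """
    sequence_only = _AS_SET.sub(" ", as_path)
    asns = [int(token) for token in sequence_only.split()]
    return asns[-1] if asns else None
```

For a path without AS-sets the function simply returns the last AS; for `3356 1239 {7018,701}` it returns 1239, the last AS that precedes the set.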
Another problem is posed by incomplete routes, that is, routes for which the ``origin'' mark indicates an unknown information source. They constitute about 12% of the RouteViews data. As noted in Section 4.2.3, an incomplete route is a consequence of switching to another routing protocol at some point in the route. Since we have no other way of establishing the end of the route, we decided to treat the last available AS in an AS-path as the destination AS of the corresponding prefix.
Finally, although the RouteViews BGP data set is extremely large, some valid IP addresses may not be covered by any prefix in it. We quantified such addresses based on the list of clients that have visited the VU Web server. The fraction of clients for which we were unable to determine any destination AS turned out to be negligible: less than 0.5%. In other words, we can establish home ASes for more than 99.5% of the clients of a typical Web site, even though these ASes are not always the true locations of the clients (due to the two irregularities described above).
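The coverage measurement can be sketched as follows. The sketch assumes that a client is mapped to the destination AS of the most specific (longest) prefix containing its address, which is the usual routing convention but is not stated explicitly here. The names `home_as` and `coverage` are hypothetical, and the linear scan over the table is chosen for clarity rather than speed.

```python
import ipaddress
from typing import Dict, Iterable, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def home_as(client_ip: str, table: Dict[Network, int]) -> Optional[int]:
    """Return the destination AS of the longest prefix that contains client_ip."""
    addr = ipaddress.ip_address(client_ip)
    best_len, best_asn = -1, None
    for net, asn in table.items():
        # Compare only prefixes of the same IP version as the address.
        if net.version == addr.version and addr in net and net.prefixlen > best_len:
            best_len, best_asn = net.prefixlen, asn
    return best_asn


def coverage(clients: Iterable[str], table: Dict[Network, int]) -> float:
    """Return the fraction of clients for which some destination AS is found."""
    clients = list(clients)
    if not clients:
        return 0.0
    resolved = sum(1 for ip in clients if home_as(ip, table) is not None)
    return resolved / len(clients)
```

Here `table` maps each prefix to the destination AS chosen by the rules above. A real table with hundreds of thousands of prefixes would call for a trie or similar structure instead of a linear scan.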