I think you are looking at a very different use case here. The systems that I think you are referring to analyze a static graph representation. The Graph500 benchmark in particular loads one big static, unlabeled, undirected, property-free graph and then runs extensive (BFS) analysis algorithms on it. The fact that the graph is not changing allows significant investment into building locality optimizing data structures (which is essentially what space decomposition is all about).
Titan on the other hand is a transactional database system to handle large, multi-relational (labeled) graphs with heterogeneous properties. A Titan graph is constantly evolving (as in the posted benchmark).
For graphs (unlike geo-spatial domains), applying space decomposition techniques first requires a metric space embedding which is a non-trivial and computationally expensive process. For changing graphs, this embedding will change as well making this very difficult to use in practice. The best approaches I know of for achieving locality therefore use adaptive graph partitioning techniques instead. However, for the types of OLTP workloads that Titan is optimized for, this would be overkill in the sense that the time spend on partitioning will likely exceed the time saved at runtime. At very large scale, it is most important for OLTP systems to focus on access path optimization based on the ACTUAL query load experienced by the system and not some perceived sense of locality based on connectedness. I published a paper a while ago suggesting one approach to do so:
http://www.knowledgefrominformation.com/2010/08/01/cosi-clou...
The Graph500 benchmark explicitly prohibits this optimization ("The first kernel constructs an undirected graph in a format usable by all subsequent kernels. No subsequent modifications are permitted to benefit specific kernels").
I was using Graph500 as a decently documented public example more than the only example. There are other problems based on real-world data in the trillion edge range that serve as "hello world" models for testing massively parallel graph algorithms. Directed and undirected, cyclic, and acyclic, properties and property-free. Semantic databases and entity analytics are popular test cases.
In the specific case of Graph500, the graph is significantly cyclic which creates coordination issues if you simply denormalize the data (e.g. replicating edges around a graph cut). Being able to do a massively parallel BFS from any vertex in the graph and producing the globally correct result without replicating edges means that you cannot know how to optimize the organization ahead of time. This was an intentional part of the benchmark design. The Graph 500 does not lend itself to optimizing for a particular set of adaptive graph cuts in any conventional sense; the algorithms used need to be general over the 64 randomized runs and that benchmark was designed to favor non-replicated edges when using massively parallel systems (the coordination costs of edge replicas will kill your performance). However, obviously the massively parallel systems are partitioning up the graph in some sense.
In the specific case of the work I did a couple years ago, the systems can ingest tens of millions of new edges per second concurrent with parallel queries (not serializable multi-statement transactions, obviously). The ingest structure can be identical to the structure against which ad hoc queries are run without any kind of query optimization. The fact that ingest rates that high are sustained effectively precludes dynamically reorganizing the data to satisfy particular queries more optimally. In truth, it could be made more optimal for batch-y type workloads (maybe 2-3x faster versus the dynamic version?) but the point was to be able to throw massive amounts of hardware at arbitrary graph data models rather than optimizing it for a specific query.
BTW, metric space embedding is non-trivial algorithmically but can also be computationally inexpensive. The Macbook Air I am using now can do tens of millions of those embedding operations per second on a single core for moderately complex spaces and data models. Maybe an order of magnitude or two slower if dealing with complex, non-Euclidean geometries. However, I also spent a couple years of computer science R&D developing the algorithms to make that fast. :-) I have been working in this particular area for a bit over half a decade now so my perspective takes some things for granted I think. There isn't just one problem you have to solve, there are actually several if you are starting from scratch.
Like I said, I didn't want to take anything away from Titan and true OLTP-oriented systems have their own complex problems, not the least of which is that they don't scale too far beyond a couple hundred compute nodes for the current state-of-the-art. Not my specialty. I work in a world of more basic consistency guarantees.
Titan on the other hand is a transactional database system to handle large, multi-relational (labeled) graphs with heterogeneous properties. A Titan graph is constantly evolving (as in the posted benchmark). For graphs (unlike geo-spatial domains), applying space decomposition techniques first requires a metric space embedding which is a non-trivial and computationally expensive process. For changing graphs, this embedding will change as well making this very difficult to use in practice. The best approaches I know of for achieving locality therefore use adaptive graph partitioning techniques instead. However, for the types of OLTP workloads that Titan is optimized for, this would be overkill in the sense that the time spend on partitioning will likely exceed the time saved at runtime. At very large scale, it is most important for OLTP systems to focus on access path optimization based on the ACTUAL query load experienced by the system and not some perceived sense of locality based on connectedness. I published a paper a while ago suggesting one approach to do so: http://www.knowledgefrominformation.com/2010/08/01/cosi-clou... The Graph500 benchmark explicitly prohibits this optimization ("The first kernel constructs an undirected graph in a format usable by all subsequent kernels. No subsequent modifications are permitted to benefit specific kernels").