NDB Queries

1401665940000 » Tagged as: GAE , NDB ,

This blog now runs on Google App Engine with the NDB serving as the storage backend. In the process of writing the (python) code I ran into some complications regarding how it uses indexes. First we will consider a query on one of the wordpress tables in a mysql database.

explain select * from wp_posts where post_type='post' and comment_count = 0;

explain select * from wp_posts where comment_count = 0 and post_type='post';

The comments_count column does not have an index so mysql query optimizer is smart enough to figure out that the 'type_status_date' index is the right one to use for both queries. 


| id | select_type | table    | type | possible_keys    | key              | key_len | ref   | rows | Extra                              |
|  1 | SIMPLE      | wp_posts | ref  | type_status_date | type_status_date | 62      | const |  974 | Using index condition; Using where |
1 row in set (0.38 sec)


Now what if you tried to do a similar query on NDB? You would need indexes on both columns to begin with and you would also find that NDB expect two different indexes for these queries. One would take the form Index(comment_count, post_type) while the other would be Index(post_type, comment_count). Welome to the world of noSQL where no one really cares too much about storage. But if you are a GAE user you should care because you are paying for it as well as the read and write operations to the data store. More indexes you have higher the write cost. Unless you plan your queries judiciously you will find that they soon baloon out of hand but you are limited to 200 indexes max!

comments powered by Disqus