• Mysql Sphinx Storage Engine

    I have blogged about Sphinx in the past, but that was about CMU Sphinx, a speech recognition engine. This Sphinx is different: it is a search engine, and a pretty good one too. The Sphinx search daemon has a MySQL-compatible API, which means you can use your trusty old mysql console or query browser to run a search. You could even use the PHP MySQL API if you wanted to, but there is no need because PHP has support for the native Sphinx protocol. However, if your preferred programming language doesn’t include support for Sphinx, you can still use the MySQL-compatible mode.

    The Sphinx index isn’t really a MySQL database, so you cannot join it to your other tables. That’s where the Sphinx storage engine comes into the picture. It allows you to create a table in your existing database to represent the Sphinx index, which lets you use the Sphinx data in a join or subquery. You might ask, why go to all this trouble? Why not just use full text? Full text search has its limitations, the most notable being how slow it is when you have a large table. A full text search on a table with 8 million rows (which is about as much as we have on Pitupasa.com at the moment) takes 10-15 seconds to complete. Sphinx, on the other hand, returns the results in a fraction of a second.
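
    To make the join idea concrete, here is a minimal sketch of the kind of query the storage engine is meant to allow. It assumes a SphinxSE table whose `query` column carries the search text plus semicolon-separated options, as described in the Sphinx docs. The table names, the `id` join column and the helper itself are hypothetical, and the placeholder style assumes a typical Python MySQL driver.

```python
def build_sphinxse_join(sphinx_table, data_table, search_terms, mode="any"):
    """Return (sql, params) joining a SphinxSE table to a regular table.

    Table names are interpolated directly, so they must come from trusted
    code, never from user input. The search text is passed as a parameter.
    """
    # SphinxSE receives the search text and its options in a single string,
    # separated by semicolons, through the `query` column.
    sphinx_query = f"{search_terms};mode={mode}"
    sql = (
        f"SELECT d.* FROM {data_table} AS d "
        f"JOIN {sphinx_table} AS s ON d.id = s.id "
        "WHERE s.query = %s"
    )
    return sql, (sphinx_query,)


if __name__ == "__main__":
    # Hypothetical table names, for illustration only.
    sql, params = build_sphinxse_join("sphinx_idx", "articles", "storage engine")
    print(sql)
    print(params)
```

    The returned pair can be handed to a driver's `cursor.execute(sql, params)`. Search text that itself contains semicolons would need extra escaping, which this sketch does not attempt.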

    At first glance, enabling the Sphinx storage engine seems like kid stuff. Just download the Sphinx and MySQL source tarballs, copy a set of files from Sphinx into the MySQL source tree, and run BUILD/autorun.sh followed by ./configure --with-plugins=sphinx (see full instructions at http://www.sphinxsearch.com/docs/current.html#sphinxse-mysql50 ). Except that it doesn’t work.

    configure error: unknown plugin: sphinx

    Yes, I know what I am doing. No, there weren’t any typos. I am not the first one to run into this, and the others who hit the problem don’t seem to have found a solution either. Then I came across another set of instructions at http://www.sphinxsearch.com/wiki/doku.php?id=sphinx_sphinxse_on_rhel where the instruction is to use ‘--with-plugin’, but shouldn’t that be ‘--with-plugins’? Since I had already tried ‘--with-plugins’, I tried ‘--with-plugin’ (note the missing s). This time configure didn’t complain about an unknown sphinx plugin as above, but the sources in the sphinx folder still were not getting built. By the way, ‘configure --help’ shows the correct parameter to be --with-plugins. Either way, none of these options results in the Sphinx storage engine being built.

    The docs refer to a prepatched MySQL source tarball; unfortunately, it appears to have been taken offline. There is a patch for MySQL 5.0.x, but I don’t want to use that because I want to be able to partition the table and indexes (a feature that is not available in MySQL 5.0.x). Oh well, looks like it’s time to look at mnoGoSearch.

    Sunday, March 14th, 2010 at 11:52
  • CentOS 5.4, PHP 5.3 and Harvard Referencing.

    A couple of days back, we did an update to Deadlinedue, the Harvard reference generator. The moment the database was updated and the new code was put in place, PHP started segfaulting. It was time to decide whether to roll back or press forward. I chose the latter, and it resulted in the site being offline for around an hour.

    It is not possible to assign complex types to nodes in /var/www/deadline/ISBN.php on line 480, referer: http://deadlinedue.com/index.php?lookfor=http%3A%2F%2Fraditha.com&find=find
    *** glibc detected *** /usr/sbin/httpd: double free or corruption (fasttop): 0x81cdfc78 ***

    PHP is pretty old now; it’s not something that you expect to segfault, so initially I thought the culprit would be APC, the Alternative PHP Cache. These PHP accelerators or caches are known to crash every once in a while, and that can be easily fixed by removing the accelerator. In this case we could afford to do so, since we had upgraded the server very recently and optimized the code, which resulted in a speed boost and reduced memory usage. But I was barking up the wrong tree: disabling APC didn’t do any good. So I guess it’s time to upgrade PHP?

    Now you might ask, shouldn’t we have used the same version of PHP on our development and production servers to ensure that this sort of thing didn’t happen? Right you are, but who expects PHP of all things to crash like this? It’s really ridiculous, because Deadlinedue is hosted in the cloud, so we could easily have made a snapshot of it and started another server in less than 5 minutes. We could have tested with that guinea pig and then gone live, but no, I was too cocky.

    This is CentOS 5.4 and there are no RPMs available for PHP 5.3.1, so it was time to compile from scratch. That usually means running ./configure about half a dozen times, fixing each missing dependency it reports until it completes without error. Fortunately there is plenty of bandwidth on Amazon EC2 and the servers are blazing fast. Most deps can be installed in less time than it takes to type out the yum install command.

    After all this the problem still wasn’t solved: a lot of modules we needed, like JSON, DOM and heck even MySQLi, were not getting loaded!

    PHP Warning:  PHP Startup: apc: Unable to initialize module\nModule compiled with module API=20050922\nPHP    compiled with module API=20090626\nThese options need to match\n in Unknown on line 0
    PHP Warning:  PHP Startup: dbase: Unable to initialize module\nModule compiled with module API=20050922\nPHP    compiled with module API=20090626\nThese options need to match\n in Unknown on line 0
    PHP Warning:  PHP Startup: dom: Unable to initialize module\nModule compiled with module API=20050922\nPHP    compiled with module API=20090626\nThese options need to match\n in Unknown on line 0
    PHP Warning:  PHP Startup: json: Unable to initialize module\nModule compiled with module API=20050922\nPHP    compiled with module API=20090626\nThese options need to match\n in Unknown on line 0
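
    Each of these warnings means that an extension built against an older PHP (module API 20050922) is being loaded by the new binary (module API 20090626). As a rough sketch, assuming the warnings have been saved to a text file, a few lines of Python can list which extensions are stale and need rebuilding. The function name is made up for illustration.

```python
import re

# Matches one PHP startup warning as it appears in the log, where "\n" is a
# literal backslash followed by "n" rather than a real newline.
MISMATCH = re.compile(
    r"PHP Startup: (?P<module>\w+): Unable to initialize module\\n"
    r"Module compiled with module API=(?P<module_api>\d+)\\n"
    r"PHP\s+compiled with module API=(?P<php_api>\d+)"
)


def stale_modules(log_text):
    """Return {module: (module_api, php_api)} for every API mismatch found."""
    return {
        m["module"]: (m["module_api"], m["php_api"])
        for m in MISMATCH.finditer(log_text)
    }


if __name__ == "__main__":
    import sys

    # Usage: python stale_modules.py error_log
    with open(sys.argv[1]) as handle:
        for name, (built_for, running) in sorted(stale_modules(handle.read()).items()):
            print(f"{name}: built for API {built_for}, PHP expects {running}")
```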

    Given enough time, I could track down the causes and fix these errors, but time was exactly what I did not have on my hands. So I looked around to find out whether any RPMs were available from third party repositories that would update PHP to version 5.3.1. Fortunately there were. The repo is called Web Tactic. Updating with the RPMs took just seconds and, good news, no more segmentation faults. The bad news is that APC is no longer available. I tried ‘yum install php-pecl-apc’ without success. Then I tried ‘pecl install apc’ and got the following error:

    /tmp/pear/temp/APC/php_apc.c: In function ‘zif_apc_compile_file’:
    /tmp/pear/temp/APC/php_apc.c:881: warning: unused variable ‘eg_class_table’
    /tmp/pear/temp/APC/php_apc.c:881: warning: unused variable ‘eg_function_table’
    /tmp/pear/temp/APC/php_apc.c: At top level:
    /tmp/pear/temp/APC/php_apc.c:959: error: duplicate ‘static’
    make: *** [php_apc.lo] Error 1
    ERROR: `make' failed

    Oh well, can live without APC for the moment.

    Monday, March 1st, 2010 at 17:15
  • Bug in Twitter list get members?

    If you add yourself to a list and then call the get list members method you might be surprised by the result.

    Add user to a list

    One of the data items returned by the get list id method is the number of members in your list. The get list members method returns 20 members at a time, so you need to call it multiple times with different cursor locations to retrieve the complete membership. When you count the members and compare the total against the number returned by the get list id method, you will find that they do not match. Your own account is not part of the dataset that is returned.
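
    The counting procedure can be sketched roughly as follows. This is an illustration under stated assumptions rather than real client code: `fetch_page` is a hypothetical callable standing in for the get list members call, and it is assumed to return a dict with `users` and `next_cursor` keys. Cursored Twitter calls start at cursor -1 and signal the last page with a next cursor of 0.

```python
def collect_members(fetch_page):
    """Follow cursors until they run out and return every member seen."""
    members = []
    cursor = -1  # cursored Twitter calls start at -1
    while cursor != 0:  # a next_cursor of 0 marks the last page
        page = fetch_page(cursor)
        members.extend(page["users"])
        cursor = page["next_cursor"]
    return members


def count_gap(reported_count, members):
    """Return how many members the reported count has that paging did not."""
    return reported_count - len(members)


if __name__ == "__main__":
    # Fake two-page response, for illustration only.
    pages = {
        -1: {"users": ["alice", "bob"], "next_cursor": 42},
        42: {"users": ["carol"], "next_cursor": 0},
    }
    members = collect_members(lambda cursor: pages[cursor])
    print(members, count_gap(4, members))
```

    A non-zero gap is the symptom described above: the reported member count includes an account that paging never returns.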

    the XML from the twitter api call

    By the time you read this post, the list might have changed, so I have saved a copy of the XML here. On the other hand, when you access the list through twitter.com, you can see that I am included in my own list. In fact my screen name (e4c5) appears twice!

    the twitter list

    The workaround, I suppose, is to not rely on the member count returned by the get list id method.

    Sunday, February 7th, 2010 at 07:02
  • Fedora 11 on amazon ec2

    Amazon has been offering cloud hosting services for longer than most. Unfortunately, they don’t seem to have updated their Amazon Machine Images since they started operation. Well, that’s not exactly accurate: they have reasonably up-to-date images for some Linux distributions and for Windows. However, many of the Linux images are actually owned by third parties and not by Amazon itself. As for my favourite distribution, Fedora, what’s available is Fedora 8! After reading the discussion at http://developer.amazonwebservices.com/connect/message.jspa?messageID=141707 I thought I would try my hand at creating my own Fedora 11 or 12 AMI.

    Unfortunately I didn’t meet with a lot of success. The Fedora 11 AMI refused to boot, but trouble started even before that, with yum reporting numerous depsolving errors. I tried to sort them out by using the --skip-broken flag, to no avail.


    Total size: 252 M
    Is this ok [y/N]: y
    Downloading Packages:
    Running rpm_check_debug
    ERROR with rpm_check_debug vs depsolve:
    libcrypto.so.7 is needed by (installed) httpd-tools-2.2.14-1.fc10.i386
    libssl.so.7 is needed by (installed) httpd-tools-2.2.14-1.fc10.i386
    Complete!
    (1, [u'Please report this error in http://yum.baseurl.org/report'])

    I worked around this by uninstalling the complaining RPM (yum remove httpd-tools-2.2.14-1.fc10.i386) and putting it back after the upgrade had completed. Nevertheless, the machine refused to boot up. I then tried to do this as a two-step process, stopping at F10, booting up that image and then upgrading to F11. Unfortunately that didn’t quite work out either. The terminal displayed by the Amazon Management Console remains blank, without anything ever showing up on it. Then I tried changing the RAM disk and kernel, using aki-20c12649 as the kernel and ari-21c12648 as the ramdisk, with the same result.

    In the end I gave up on this temporarily and used a third party Centos 5.4 image (uploaded by RightScale) for the task at hand. Going to come back to this problem later. Want to try installing a new image on the loopback device.

    Monday, February 1st, 2010 at 12:44