High Availability MySQL

Basic Guideline for High Availability MySQL

  1. Replication
  2. DRBD HA

We are discussing DRBD HA here.


The following steps are required:

  • Preparation of the OS
  • DRBD setup

Preparation of the OS:

We need to reserve a large partition which will later be used as the DRBD volume.
Don’t specify any file system type.

fdisk /dev/sda

Should print something like:

The number of cylinders for this disk is set to 9729.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:

  1.   Software that runs at boot time (e.g., old versions of LILO)
  2.   Booting and partitioning software from other OSs (e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /dev/sda: 80.0 GB, 80026361856 bytes
255 heads, 63 sectors/track, 9729 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot      Start      End      Blocks   Id  System
/dev/sda1   *        1     2611    20972826   83  Linux
/dev/sda2         2612     2872     2096482+  82  Linux swap
/dev/sda3         2873     3003     1052257+  8e  Linux LVM
/dev/sda4         3004     9729    54026595    5  Extended
/dev/sda5         3004     9729    54026563+  8e  Linux LVM

We are going to use /dev/sda5 as a DRBD device.

DRBD Setup

On Server1 and Server2

yum -y install drbd
yum -y install kernel-module-drbd-2.6.9-42.ELsmp
modprobe drbd

DRBD Configuration:
On both servers:

vi /etc/drbd.conf

# please have a look at the example configuration file in
# /usr/share/doc/drbd/drbd.conf
# Our MySQL share
resource db {
  protocol C;
  incon-degr-cmd "echo '!DRBD! pri on incon-degr' | wall ; sleep 60 ; halt -f";
  startup { wfc-timeout 0; degr-wfc-timeout 120; }
  disk { on-io-error detach; } # or panic, ...
  syncer {
    group 1;
    rate 6M;
  }
  on server1 {
    device /dev/drbd1;
    disk /dev/sda5;
    meta-disk internal;
  }
  on server2 {
    device /dev/drbd1;
    disk /dev/sda5;
    meta-disk internal;
  }
}

Start DRBD

On both servers:
drbdadm adjust db
On server1:
drbdsetup /dev/drbd1 primary --do-what-I-say
service drbd start
On server2:
service drbd start

DRBD Status

On both servers (see status):

service drbd status

On server1:
mkfs.ext3 /dev/drbd1
tune2fs -c -1 -i 0 /dev/drbd1
mkdir /db
mount -o rw /dev/drbd1 /db

On server2:
mkdir /db

Test failover

For manual switchover (this won’t be needed, as HA will do it for you):

On primary server

umount /db
drbdadm secondary db

On secondary server
drbdadm primary db
service drbd status
mount -o rw /dev/drbd1 /db

This finishes the DRBD part. You have created a DRBD mount which will be used as the data directory for your MySQL.


MySQL Setup

  1. We can do an RPM-based, a BINARY, or a SOURCE compilation.
  2. IMPORTANT (crucial for failover): Heartbeat uses LSB Resource Agents, OCF Resource Agents, or Heartbeat Resource Agents to start and stop heartbeat resources. Here, MySQL, DRBD, and IP are our heartbeat resources.
  3. Refer to the Heartbeat documentation on Resource Agents.
  4. As you are aware, many *nix services are started using LSB Resource Agents. They are found in /etc/init.d.
  5. A service is started/stopped using: /etc/init.d/servicename start/stop/status
  6. We should see to it that we have a similar LSB Resource Agent for MySQL.
  7. In a source-based installation, it will be created in the $PREFIX/share directory as mysql.server, where $PREFIX is the prefix you give during source compilation.
  8. Fix that script and copy it to /etc/init.d/.
  9. In case of an RPM-based installation, you will get the LSB Resource Agent in place.
  10. The end objective is that MySQL should be up and running.
  11. Now comes the hurdle.
  12. Stop MySQL.
  13. Move your data directory to a directory on the DRBD share.
  14. Then create a softlink (see the Unix command ln).
  15. This is how I would have done it, assuming my initial data directory was /home/mysql/data:


mkdir /db/mysql
NOTE: /db must be mounted to do this
mv /home/mysql/data /db/mysql/data
chown -R mysql /db/mysql/data
chgrp -R mysql /db/mysql/data
ln -s /db/mysql/data /home/mysql/data

Now, start MySQL on server1. Create a sample database and table. Stop MySQL. Do a manual switchover of DRBD. Start MySQL on server2 and query that table. It should work. But this is of no use if you have to switch over manually every time, so now we are heading to HA.
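The manual test above, sketched as a MySQL session (the database and table names are hypothetical):

```sql
-- On server1, while it is the DRBD primary and MySQL is running:
CREATE DATABASE failtest;
CREATE TABLE failtest.t (id INT PRIMARY KEY);
INSERT INTO failtest.t VALUES (1);

-- Stop MySQL on server1, perform the DRBD switchover shown above,
-- then start MySQL on server2 and verify the data followed the volume:
SELECT * FROM failtest.t;  -- should return the row (1)
```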



Install Heartbeat and its dependencies on both servers:

yum -y install gnutls*
yum -y install ipvsadm*
yum -y install heartbeat*


vi /etc/sysctl.conf
# Controls IP packet forwarding
net.ipv4.ip_forward = 1
/sbin/chkconfig --level 2345 heartbeat on
/sbin/chkconfig --del ldirectord

We need to setup the following conf files on both machines:
a. vi /etc/ha.d/ha.cf

#/etc/ha.d/ha.cf content
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
keepalive 2
deadtime 30
warntime 10
initdead 120
udpport 694 #(If you have multiple HA setup in same network.. use different ports)
bcast eth0 # Linux
auto_failback on #(This will failback to server1 after it comes back)
ping #(Your gateway IP)
apiauth ipfail gid=haclient uid=hacluster
node server1
node server2

On both machines:

b. vi /etc/ha.d/haresources
NOTE: Assuming a virtual IP is assigned to your MySQL resource (the argument to IPaddr2) and mysqld is the LSB resource agent.

#/etc/ha.d/haresources content
server1 LVSSyncDaemonSwap::master IPaddr2:: drbddisk::db Filesystem::/dev/drbd1::/db::ext3 mysqld

c. vi /etc/ha.d/authkeys
#/etc/ha.d/authkeys content
auth 2
2 sha1 YourSecretString
Now, make your authkeys secure:
chmod 600 /etc/ha.d/authkeys

On both machines(first on server1):
Stop MySQL.
Make sure MySQL does not start on system init.
For that:
/sbin/chkconfig --level 2345 mysqld off
/etc/init.d/heartbeat start
These commands will give you status about this LVS setup:
/etc/ha.d/resource.d/LVSSyncDaemonSwap master status
ip addr sh
/etc/init.d/heartbeat status
/etc/init.d/mysqld status

Access your HA-MySQL server like:
mysql -h10.10.150.3

Shut down server1 to see MySQL come up on server2.
Start server1 to see MySQL fail back to server1.

The BLACKHOLE Storage Engine

The BLACKHOLE storage engine acts as a “black hole” that accepts data but throws it away and does not store it. Retrievals always return an empty result:


mysql> CREATE TABLE test(i INT, c CHAR(10)) ENGINE=BLACKHOLE;
Query OK, 0 rows affected (0.03 sec)


mysql> INSERT INTO test VALUES(1,'record one'),(2,'record two');

Query OK, 2 rows affected (0.00 sec)

Records: 2  Duplicates: 0  Warnings: 0


mysql> SELECT * FROM test;

Empty set (0.00 sec)

MySQL Slow Query Log

Start MySQL Server:

/etc/init.d/mysqld start

Syntax for /etc/my.cnf on a Red Hat-family machine:


  • The 2nd line under [mysqld] turns on the slow query log and directs all slow queries to the configured log-slow-queries.log file.
  • The 1st line indicates that any query that takes more than 1 second to execute should be logged.
  • The last line tells MySQL to log *any* query that does not use indexes, regardless of the threshold set by the first line.
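Putting that description together, the fragment might look like this (a sketch; the option names are from the MySQL 5.0 era, where the slow log was enabled with log-slow-queries — newer versions use slow_query_log and slow_query_log_file instead):

```ini
[mysqld]
long_query_time = 1
log-slow-queries = /var/log/mysql/log-slow-queries.log
log-queries-not-using-indexes
```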

You must create the file manually and change owners this way:

mkdir /var/log/mysql
touch /var/log/mysql/log-slow-queries.log
chown mysql.mysql -R /var/log/mysql



shell> mysqldumpslow

Reading mysql slow query log from /usr/local/mysql/data/mysqld51-apple-slow.log

Count: 1  Time=4.32s (4s)  Lock=0.00s (0s)  Rows=0.0 (0), root[root]@localhost
  insert into t2 select * from t1

Count: 3  Time=2.53s (7s)  Lock=0.00s (0s)  Rows=0.0 (0), root[root]@localhost
  insert into t2 select * from t1 limit N

Count: 3  Time=2.13s (6s)  Lock=0.00s (0s)  Rows=0.0 (0), root[root]@localhost
  insert into t1 select * from t1

Common Replication Administration Tasks


1. Checking the Replication Status.

The most common task when managing a replication process is to ensure that replication is taking place and that there have been no errors between the slave and the master.

On the slave server:

mysql> SHOW SLAVE STATUS\G

The key fields from the status report to examine are:

Slave_IO_State: indicates the current status of the slave.

Slave_IO_Running: shows whether the IO thread that reads the master’s binary log is running.
Slave_SQL_Running: shows whether the SQL thread that executes events in the relay log is running.
Last_Error: shows the last error registered when processing the relay log. Ideally this should be blank, indicating no errors.

Seconds_Behind_Master: shows the number of seconds by which the slave SQL thread is behind in processing the master binary log. A high (or increasing) number can indicate that the slave is unable to cope with the large number of statements from the master.

On the master:

On the master, you can check the status of slaves by examining the list of running processes. Slaves execute the Binlog Dump command:

mysql> SHOW FULL PROCESSLIST;


2. Pausing Replication on the slave.

You can start and stop the replication of statements on the slave using the commands:

START SLAVE;
STOP SLAVE;

When execution is stopped, the slave stops reading the binary log from the master (the IO thread) and stops processing events from the relay log that have not yet been executed (the SQL thread).

You can pause either the IO or SQL threads individually by specifying the thread type.
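For example, using the thread-specific forms of the statements:

```sql
STOP SLAVE IO_THREAD;    -- stop fetching new events from the master
START SLAVE IO_THREAD;

STOP SLAVE SQL_THREAD;   -- stop applying events from the relay log
START SLAVE SQL_THREAD;
```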


Stopping the IO thread will allow the statements in the relay log to be executed up until the point where the relay log has ceased to receive new events.

Using this option can be useful when you want to pause execution to allow the slave to catch up with events from the master, when you want to perform administration on the slave but also ensure you have the latest updates to a specific point.

This method can also be used to pause execution on the slave while you conduct administration on the master while ensuring that there is not a massive backlog of events to be executed when replication is started again.


Stopping the SQL thread can be useful if you want to perform a backup or another task on a slave that only processes events from the master. The IO thread will continue reading from the master, but events will not be executed, which makes it easier for the slave to catch up when you start slave operations again.

Replication Solutions are :

  1. Using replication for Backups
  2. using replication with different Master and Slave Storage Engines
  3. Using replication for scale-out
  4. Replicating different databases to different slaves.
  5. Improving Replication Performance
  6. Switching Masters during failover.

Some important replication slave options for future use:

These are optional, not mandatory.

--log-slave-updates: Normally, a slave does not log to its own binary log any updates that are received from a master server. This option tells the slave to log the updates performed by its SQL thread to its own binary log. For this option to have any effect, the slave must also be started with the --log-bin option to enable binary logging.

This variable is used when you want to chain replication servers. For example, you might want to set up replication servers using this arrangement:

A -> B -> C
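A my.cnf sketch for the middle server B in that chain (the server-id value is an example):

```ini
[mysqld]
server-id = 2
log-bin = mysql-bin
# Write updates received from A into B's own binary log,
# so that C can replicate them from B.
log-slave-updates
```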

--read-only: Causes the slave to permit no updates except from slave threads or from users having the SUPER privilege. On a slave server, this can be useful to ensure that the slave accepts updates only from its master server and not from clients. This variable does not apply to temporary tables.

Implementation of Master-Master Replication Server Setup:

The advantages of master-master replication over the traditional master-slave replication are that you don’t have to modify your applications to make write accesses only to the master, and that it is easier to provide high-availability because if the master fails, you still have the other master.

server1.example.com replicates to server2.example.com and vice versa. Each system is the slave of the other master and the master of the other slave at the same time.

Steps for setup Master-Master Replication:

1) Create users for replication: a user (slave2) on Server1, used by Server2 to connect, and a user (slave1) on Server2, used by Server1 to connect.

On Server1:
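A sketch of the grant (the user name slave2 and the password 'password' match the CHANGE MASTER TO statement used later; the '%' host scope is an assumption — restrict it to Server2’s address in practice):

```sql
GRANT REPLICATION SLAVE ON *.* TO 'slave2'@'%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
```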


On Server2:
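Likewise on Server2 (the user name slave1 matches the CHANGE MASTER TO statement run on Server1 later; the '%' host scope is an assumption — restrict it to Server1’s address in practice):

```sql
GRANT REPLICATION SLAVE ON *.* TO 'slave1'@'%' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
```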


2) Now we set up master-master replication in /etc/my.cnf.

The crucial configuration options for master-master replication are auto_increment_increment and auto_increment_offset for auto-increment columns:

  • auto_increment_increment controls the increment between successive AUTO_INCREMENT values.
  • auto_increment_offset determines the starting point for AUTO_INCREMENT column values.
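The effect of these two options can be sketched outside MySQL: with an increment of 2 and offsets 1 and 2, the two nodes generate disjoint, interleaving AUTO_INCREMENT sequences, so inserts on either master never collide on an id.

```shell
# Node 1 (auto-increment-offset=1, increment=2): ids 1, 3, 5, 7, 9, ...
seq 1 2 9
# Node 2 (auto-increment-offset=2, increment=2): ids 2, 4, 6, 8, 10, ...
seq 2 2 10
```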
vi /etc/my.cnf:
server-id   =  1
# Assume that we have 2 nodes and offset values are even and odd
auto-increment-increment = 2
auto-increment-offset       = 1
#Binary Logging Enabled
log-bin = mysql-bin

After configuring the variables in my.cnf, restart the server.

On Server2:

vi /etc/my.cnf:

server-id   =  2
# Assuming that we have 2 nodes, and offset value start from 2 i.e.  even
auto-increment-increment = 2
auto-increment-offset       = 2
#Binary Logging Enabled
log-bin = mysql-bin

After configuring the variables in my.cnf, restart the server.

On Server 1:

3) Both servers must start from the same copy of the database; otherwise, take a backup on Server1 and restore it on Server2.

Before taking the dump flush the tables with read lock and check the bin log name and position:

flush tables with read lock;

show master status;

Take the db snapshot on server1 and then restore it on server2.

mysqldump -u user -p --all-databases > backup.sql

Unlock tables;

On Server2:

flush tables with read lock;

Show master status;

Unlock tables;

And finally, run the command on Server2 to make it a slave of Server1 (position is the value reported by SHOW MASTER STATUS on Server1):

CHANGE MASTER TO MASTER_HOST='', MASTER_USER='slave2', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin', MASTER_LOG_POS=position;

And then start the slave

Start slave;

Then check the slave status with the following command:

show slave status\G

It is important that both Slave_IO_Running and Slave_SQL_Running have the value Yes.

Now the replication from server1 to server2 is set up.

Next we should configure the replication from server2 to server1.

To do this, we stop the slave on server1 and make it a slave of server2:

On Server 1:

Stop slave;

Run the following command:

CHANGE MASTER TO MASTER_HOST='', MASTER_USER='slave1', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin', MASTER_LOG_POS=position;

Start the slave:

Start slave;

Check the slave status:

show slave status\G

It is important that both Slave_IO_Running and Slave_SQL_Running have the value Yes.

MySQL Replication

Implementing Master-Slave Replication:

Master-slave replication is mainly for scale-out solutions: spreading the load among multiple slaves to improve performance.

It is also used for analytics: analysis of the information can take place on the slave without affecting the performance of the master.

The target uses for Replication in MySQL include:

  1. Scale-out solutions: spreading the load among multiple slaves to improve performance. In this environment, all writes and updates must take place on the master server. Reads, however, may take place on one or more slaves
  2. Data Security: Because data is replicated to the slave, and the slave can pause the replication process, it is possible to run backup services on the slave without corrupting the corresponding master data.
  3. Analytics: while the analysis of the information can take place on the slave without affecting the performance of the master.

Replication Configuration:

Replication between servers in MySQL works through the binary logging mechanism.

Each slave must connect to the master using a standard MySQL username and password, so there must be a user account on the master that the slave can use to connect; the user needs only the REPLICATION SLAVE privilege.

Assume that the username is slave and the password is slave123.

On the master server:

1. Create a user for replication and grant REPLICATION SLAVE privilege.
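A sketch of the grant, using the slave/slave123 credentials assumed above (the '%' host scope is an assumption; restrict it to the slave’s address in practice):

```sql
GRANT REPLICATION SLAVE ON *.* TO 'slave'@'%' IDENTIFIED BY 'slave123';
FLUSH PRIVILEGES;
```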


Note: We can create a different user for each slave or we can use the same user for each slave that needs to connect and the user we want to use for replication has REPLICATION SLAVE privilege.

2. Setting the Replication Master Configuration.

Note: For replication to work you must enable binary logging on the master. If binary logging is not enabled, replication will not be possible as it is the binary log that is used to exchange data between the master and slaves.

Edit the configuration file on Master Server:  /etc/my.cnf

log-bin    = mysql-bin

Each server within a replication group must be configured with a unique server-id value.  Server-id is used to identify individual servers within the group.

And server-id must be an integer between 1 and 2^32 – 1.

server-id = 1 # selection of these integers are entirely up to you

3. Setting the Replication Slave Configuration:

On the slave server:

The only option we must configure on slave is to set the unique server-id.
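A minimal my.cnf fragment for the slave (the server-id value is an example; it must simply differ from the master’s id of 1):

```ini
[mysqld]
server-id = 2
```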


Note: If you do not specify a server-id value, it defaults to 0.  If you omit server-id (or set it explicitly to 0), a master refuses connections from all slaves, and a slave refuses to connect to a master. Thus, omitting server-id is good only for backup with a binary log.

We do not have to enable binary logging on the slave for replication to be enabled.

If we enable binary logging on the slave, we can use the binary log for data backup and crash recovery on the slave. We can also use the slave as part of a more complex replication topology, where the slave acts as a master to other slaves.

log-bin= mysql-bin

4. Getting the replication master information:

  1. From the command line, flush all tables and block write statements by executing the statement:

flush tables with read lock;

  2. Use the following MySQL statement to determine the current binary log file name and position:

show master status;

E.g., the file is mysql-bin.0001 and the position is 98.

Note: This represents the replication coordinates at which the slave should begin processing new updates from the master.

On the master server:

5. Take the database snapshot from the master with the tool mysqldump or whatever.

-  If you haven’t already locked the tables on the server to prevent statements that update data from executing, start by executing:

flush tables with read lock ;

-  In another session, use mysqldump to create a dump of all databases, or of selected databases to replicate:

mysqldump -u root -p --all-databases --lock-all-tables > db_dump.sql

An alternative is to use the --master-data option, which automatically appends the CHANGE MASTER TO statement required on the slave to start the replication process.

mysqldump -u root -p --all-databases --master-data --lock-all-tables > db_dump.sql

6. Restore the backup on the slave server:

mysql -u root -p < db_dump.sql

Or, from the master server: you will need either to copy the dump file to the slave, or to use the file from the master, connecting remotely to the slave to import the data:

mysql -h < db_dump.sql

7. After restoring the dump, point the slave at the master:
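Using the credentials from step 1 and the coordinates recorded in step 4, the slave is pointed at the master with CHANGE MASTER TO (the host value is a placeholder — substitute your master’s address; the file and position are the example values from step 4):

```sql
CHANGE MASTER TO
  MASTER_HOST='master_host',      -- placeholder for the master's address
  MASTER_USER='slave',
  MASTER_PASSWORD='slave123',
  MASTER_LOG_FILE='mysql-bin.0001',
  MASTER_LOG_POS=98;
```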


8.  After that, start the slave server.

Start slave;

9. Check the replication status with the following command

Show slave status\G

Note: To check the slave status on replication server, the user requires either the SUPER or REPLICATION CLIENT privilege.

Note: Check that both Slave_IO_Running and Slave_SQL_Running are Yes.


MySQL Performance

MySQL Performance: Replication, Partition and Memcached

Four main steps to increase MySQL performance:

  1. Split reads and writes
  2. Horizontally partition
  3. Implement Memcached
  4. Set up replication
  • You can spread your reads with replication, and that helps a lot, but you can’t spread writes (they have to process on all machines) and they’ll eventually consume all your resources. You’ll find yourself adding replicated slaves at an ever-increasing rate to make up for the diminishing returns each additional slave provides.
  • The next logical step is to horizontally partition your dataset onto different master/slave clusters so you can spread your writes, and then teach your application to connect to the correct cluster depending on the data it needs.
  • With memcached you can reduce your database reads to a mere fraction, leaving the databases to mainly do infrequent writes, and end up getting much more bang for your buck, since your databases won’t be blocking themselves doing ACID bookkeeping or waiting on writing threads.

How to setup Memcached server

A useful link for detailed study: http://www.danga.com/memcached/

Installing memcached

yum install memcached
yum install php5-memcache

Configure and Setting MemCache

  • After installing these two packages, the next step is to configure the memcached.conf file. ( /etc/memcached.conf )
  • After that, edit the memcache.ini file. ( /etc/httpd/conf.d/memcache.ini )

A sample memcache.ini file

; uncomment the next line to enable the module
session.save_handler = memcache
session.save_path = "tcp://,tcp://"

Command-Line Memcached

memcached -d   => start memcached as a background (daemon) process
memcached -vv  => start memcached in the foreground with verbose output

Another way to check whether the memcache extension is enabled is the phpinfo() function.

PHP Memcache Configuration


MySQL Step-by-Step Installations

If you are on Fedora or Red Hat Box

$ setenforce 0 # disables SELinux temporarily

$ yum install mysql

If you have a rpm then

$ rpm -i /mysqlSRC_RPM_Folder/MySQL*

You can download RPMs from http://www.mysql.org

$ restorecon -R /var/lib/mysql/ # restore SELinux labels
$ setenforce 1 # enable SELinux again

MySQL is now installed. Next, I wanted to change the location where the actual data is stored. I tried configuring the my.cnf file and using symlinks; I found symlinks easier and more compatible.

My data directory is /home/mysql.

The following is what I did to change the data directory after using RPM to install.

1. stop mysql server
2. mkdir /home/mysql
3. mv /var/lib/mysql/* /home/mysql/
4. rmdir /var/lib/mysql
5. ln -s /home/mysql /var/lib/mysql

Instead of using symlinks, we can just modify the my.cnf file (/etc/my.cnf) by including “datadir = /home/mysql”

6. chown -R mysql:mysql /home/mysql
7. start mysql server again


MySQL’s FEDERATED storage engine

One of the most exciting features introduced in MySQL 5 is the federated engine. The ability to access data from a remote server without the constraints of replication tickles every programmer’s fancy. The Federated engine allows a DBA to create logical pointers to tables that exist on other MySQL servers and thereby link together separate data islands to form one or more logical databases. The Federated storage engine of MySQL is extremely easy to use and set up, and can quickly turn into a DBA’s best friend if they have to answer customer demands to correlate data that exists on several different physical servers.

Let’s take a quick walk :

  1. MySQL Version: We can validate that our installation has Federated by issuing a simple SHOW ENGINES command from the mysql client program. The FEDERATED storage engine is available beginning with MySQL 5.0.3. It is a storage engine that accesses data in tables of remote databases rather than in local tables.
  2. File Structure: When we create a FEDERATED table, the server creates a table format file in the database directory. The file begins with the table name and has an .frm extension. No other files are created, because the actual data is in a remote table. With the MySQL FEDERATED storage engine, there are no local data files for a table (for example, there is no .MYD file).
  3. Data in Remote Server: A Federated table acts as a pointer to an actual table object that exists on the same or another server. Once this link/pointer has been established, we can perform whatever operations we would like on the remote object (inserts, updates, deletes, reads, etc.), as long as we have been given the privileges to do so. The local server connects to a remote server, and uses the MySQL client API to read, delete, update, and insert data in the remote table.
  4. Capability: Understand that our capabilities on the remote object are restricted to the underlying engine that is serving as the source of the Federated table. For example, if we create a Federated table that is pointing to a MyISAM table on another server, we do not have transaction (commit/rollback) capability. Likewise, a Federated pointer to an Archive engine table would not allow us to update or selectively delete data as Archive tables allow reads and inserts only.
  5. Communication: The local server communicates with the remote server using MySQL client C API functions. It invokes mysql_real_query() to send the statement. To read a result set, it uses mysql_store_result() and fetches rows one at a time using mysql_fetch_row().

How to Use:

Normally, we have two mysql servers running, either both on the same host or on different hosts.

On the remote server, we have:

CREATE TABLE `city_remote` (
  `city_id` int(11) NOT NULL auto_increment,
  `name` char(35) NOT NULL default '',
  `country_code` char(3) NOT NULL default '',
  `district` char(20) NOT NULL default '',
  `population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`city_id`)
) ENGINE=MyISAM;
To use that table from our local server, create a table with the same description:

CREATE TABLE `city_local` (
  `city_id` int(11) NOT NULL auto_increment,
  `name` char(35) NOT NULL default '',
  `country_code` char(3) NOT NULL default '',
  `district` char(20) NOT NULL default '',
  `population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`city_id`)
) ENGINE=FEDERATED
  CONNECTION='mysql://user:pass@remote.com/world/city_remote';

Here, user and pass are valid credentials to access the table city_remote in the database world on server remote.com. The structure of these tables must be exactly the same. With that done, we can query our federated table as if it were in our local server. Issue a query and get a record set.

select * from city_local where city_id = 1;

+---------+-------+--------------+----------+------------+
| city_id | name  | country_code | district | population |
+---------+-------+--------------+----------+------------+
|       1 | Noida | IN           | Noida    |    3000000 |
+---------+-------+--------------+----------+------------+

There are a few limitations concerning federated engine usage, namely:

  • The remote table must exist when you create your local one.
  • DDL operations are not supported through the Federated engine; we can’t issue ALTER TABLE commands on a federated table.
  • The federated table is not aware of any structural changes that may occur in the remote one; you may get an error at runtime. The structure definitions of both the source and federated object must stay identical.
  • Transactions are not supported (expected in version 5.1).
  • Query cache support is not enabled; FEDERATED tables do not work with the query cache.
  • Any DROP TABLE statement issued against a FEDERATED table drops only the local table, not the remote table.


FullText Search Solutions

Types of FullText Search Solutions

  • Special Database Features
    • MySQL Full Text Search, Senna
    • Solutions exists for PostgreSQL, Oracle and many others
  • Home baked database based solutions
    • Using stored procedures or set of queries and keyword tables for search
  • External Full Text Search Solutions
    • Lucene, Sphinx, Mnogosearch etc.

My Thoughts on Performance

  • Lucene and Sphinx can both handle large data sizes
  • Lucene has more features and dynamic updates
  • For the LAMP developer, Zend Framework’s Lucene search is based on Lucene, but it will not work on PHP 4
  • Fulltext search in Lucene is much faster than in MySQL
  • Lucene is much more complex to use than MySQL
  • Lucene does not allow us to modify a document; modifying a document is equivalent to deleting the current document and adding the modified document to the index
  • Lucene requires an object of the index to perform the search (we will see this when we use the API). Whenever we add a new document to the index, a new index object has to be created to include the new document. Creating the new object is not a major overhead, though it does slow down searching to some extent
  • MySQL fulltext search is good for (relatively) small data sizes

MySQL FullText Search and Updates

  • Simple: In mysql, we can simply mark an index on a text/varchar column as fulltext, and our work is done. All we need to do next is to fire MATCH AGAINST queries. Adding and modification of indexes is handled by mysql internally as and when new data is added.
  • MyISAM FullText Search is based on BTREE
    • Special form of BTREE index
  • Each word is index entry
  • Updating text with 1000 words – 1000 key entries needs to be updated
    • A lot of random IO if index is not in memory
  • Index degradation by fragmentation
    • Run OPTIMIZE TABLE for best performance
  • Minimum Length: With MySQL, the minimum length of a word to be indexed is 4 by default, so all words with fewer than 4 characters will not be indexed. What do we do if we want to index words like “php”, “asp”, “c”? We would have to decrease the minimum length from 4 to 1, which would drastically increase the index size and slow down all our searches as a consequence. There are no such issues in Lucene.
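As a minimal sketch of the MySQL side of this comparison (table and column names are hypothetical; FULLTEXT indexes require MyISAM in the MySQL versions discussed here):

```sql
CREATE TABLE articles (
  id INT AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200),
  body TEXT,
  FULLTEXT (title, body)     -- the fulltext index MATCH ... AGAINST will use
) ENGINE=MyISAM;

SELECT id, title
FROM articles
WHERE MATCH(title, body) AGAINST('replication');
```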


Lucene is a free/open source information retrieval library, originally implemented in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. Lucene has been ported to programming languages including Delphi, Perl, C#, C++, Python, Ruby and PHP.

  • Popular full text search Library written in Java
    • http://lucene.apache.org/
    • CLucene – a C++ port exists, but is not current
    • Is not specially tailored for indexing databases
    • Some coding is needed for integration
  • Dynamic index changes possible
  • Very Advanced query language
    • Wildcard searches:
    • Search in Fields: title:”The Right Way” AND text:go
    • Proximity Searches, Fuzzy Searches etc
  • Supports attributes (indexed and non indexed)
  • Some CJK Support
  • Easily integrates with Java

Sphinx Search

  • Designed for indexing Database content
  • Focuses
    • High performance
    • Search Quality
    • Ease of use
  • Supports multi-node clustering out of box
  • Supports multiple attributes
  • Different sort modes (relevance, date, etc.)
  • Supports trend queries
  • Support for snippets
  • Client available as a MySQL storage engine plugin
  • A number of limitations: no partial-word searches, and indexes are hard to update


  • Very fast for medium size data sizes
  • Only boolean searches available