Fedora 14: Wireless network is not working

Wireless Network is not working in Fedora 14 by default, here is the solution
su –
yum install b43-fwcutter wget

Next, download the Windows driver:
wget downloads.openwrt.org/sources/broadcom-wl-4.150.10.5.tar.bz2

Next decompress, extract and copy firmware to /lib/firmware directory.

tar xjf broadcom-wl-4.150.10.5.tar.bz2
cd broadcom-wl-4.150.10.5/driver
b43-fwcutter -w /lib/firmware/ wl_apsta_mimo.o .

REST Web Service & SOAP

Short Comparison with REST and SOAP

What is a REST Web Service

The acronym REST stands for Representational State Transfer, this basically means that each unique URL is a representation of some object. You can get the contents of that object using an HTTP GET, to delete it, you then might use a POST, PUT, or DELETE to modify the object (in practice most of the services use a POST for this).

Who’s using REST?

All of Yahoo’s web services use REST, including Flickr, del.icio.us API uses it, pubsub, bloglines, technorati, and both eBay, and Amazon have web services for both REST and SOAP.

Who’s using SOAP?

Google seams to be consistent in implementing their web services to use SOAP, with the exception of Blogger, which uses XML-RPC. You will find SOAP web services in lots of enterprise software as well.

REST vs SOAP

As you may have noticed the companies I mentioned that are using REST api’s haven’t been around for very long, and their apis came out this year mostly. So REST is definitely the trendy way to create a web service, if creating web services could ever be trendy (lets face it you use soap to wash, and you rest when your tired). The main advantages of REST web services are:

  • Lightweight – not a lot of extra xml markup
  • Human Readable Results
  • Easy to build – no toolkits required

SOAP also has some advantages:

  • Easy to consume – sometimes
  • Rigid – type checking, adheres to a contract
  • Development tools

.

MySQL Slow Query Log

Start MySQL Server:

/etc/init.d/mysqld start

Syntax for /etc/my.cnf in Red Hat Family Machine :

[mysqld]
set-variable=long_query_time=1
log-slow-queries=/var/log/mysql/log-slow-queries.log
log-queries-not-using-indexes

  • The 2nd line under [mysqld] turns on slow query log and logs all slow queries to log-slow-queries.log in the MySQL data directory.
  • The 1st line indicates that any queries that took more than 1 seconds to execute need to be logged.
  • The last line tells MySQL to log *any* queries that do not use indexes regardless of the setting in the second line.

You must create the file manually and change owners this way:

mkdir /var/log/mysql
touch /var/log/mysql/log-slow-queries.log
chown mysql.mysql -R /var/log/mysql

mysqldumpslow

http://dev.mysql.com/doc/refman/5.1/en/mysqldumpslow.html

shell> mysqldumpslow

Reading mysql slow query log from /usr/local/mysql/data/mysqld51-apple-slow.log Count: 1 Time=4.32s (4s) Lock=0.00s (0s) Rows=0.0 (0), root[root]@localhost insert into t2 select * from t1

Count: 3 Time=2.53s (7s) Lock=0.00s (0s) Rows=0.0 (0), root[root]@localhost insert into t2 select * from t1 limit N

Count: 3 Time=2.13s (6s) Lock=0.00s (0s) Rows=0.0 (0), root[root]@localhost insert into t1 select * from t1.

File Operation & Sudo Operations

Add the Linux groups to your server:

[[email protected] tmp]# groupadd parents
[[email protected] tmp]# groupadd children
[[email protected] tmp]# groupadd soho

Add the Linux users and assign them to their respective groups

[email protected] tmp]# useradd -g parents ram
[[email protected] tmp]# useradd -g parents jagmohan
[[email protected] tmp]# useradd -g children ramesh
[[email protected] tmp]# useradd -g children hema
[[email protected] tmp]# useradd -g soho accounts
[[email protected] tmp]# useradd -g soho sales

User root changing the password for user ram

[[email protected] root]# passwd ram
Changing password for user ram.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
[[email protected] root]#

Delete Users

userdel ram

There is also an optional -r switch that additionally removes all the contents of the user’s home directory.

# userdel -r ram

How to Tell the Groups to Which a User Belongs

[[email protected] root]# groups ram
ram : parents
[[email protected] root]#

How to Change the Ownership of a File

[[email protected] tmp]# ll test.txt
-rw-r–r–  1 root root 0 Nov 17 22:14 test.txt
[[email protected] tmp]# chown testuser:users test.txt

[[email protected] tmp]# ll test.txt
-rw-r–r–  1 testuser users 0 Nov 17 22:14 test.txt

[[email protected] tmp]#

SUDO Operation

Temporarily Gaining root Privileges

[[email protected] bob]$ more /etc/sudoers
/etc/sudoers: Permission denied

[[email protected] bob]$

Bob tries again using sudo and his regular user password and is successful:

[[email protected] bob]$ sudo more /etc/sudoers
Password:


[[email protected] bob]$

Becoming root for a Complete Login Session

The su command allows a regular user to become the system’s root user if they know the root password. A user with sudo rights to use the su command can become root, but they only need to know their own password, not that of root as seen here.

[email protected]:~$ sudo su –
Password:
[email protected]:~#

The /etc/sudoers file contains all the configuration and permission parameters needed for sudo to work.

Granting All Access to Specific Users

You can grant users bob and bunny full access to all privileged commands, with this sudoers entry.

bob, bunny  ALL=(ALL) ALL

This is generally not a good idea because this allows bob and bunny to use the su command to grant themselves permanent root privileges thereby bypassing the command logging features of sudo. The example on using aliases in the sudoers file shows how to eliminate this prob

Granting Access To Specific Users To Specific Files

This entry allows user peter and all the members of the group operator to gain access to all the program files in the /sbin and /usr/sbin directories, plus the privilege of running the command /usr/local/apps/check.pl. Notice how the trailing slash (/) is required to specify a directory location:

peter, %operator ALL= /sbin/, /usr/sbin, /usr/local/apps/check.pl

Granting Access to Specific Files as Another User

The sudo -u entry allows allows you to execute a command as if you were another user, but first you have to be granted this privilege in the sudoers file.

This feature can be convenient for programmers who sometimes need to kill processes related to projects they are working on. For example, programmer peter is on the team developing a financial package that runs a program called monthend as user accounts. From time to time the application fails, requiring “peter” to stop it with the /bin/kill, /usr/bin/kill or /usr/bin/pkill commands but only as user “accounts”. The sudoers entry would look like this:

peter ALL=(accounts) /bin/kill, /usr/bin/kill, /usr/bin/pkill
User peter is allowed to stop the monthend process with this command:

[[email protected] peter]# sudo -u accounts pkill monthend

Granting Access Without Needing Passwords

This example allows all users in the group operator to execute all the commands in the /sbin directory without the need for entering a password. This has the added advantage of being more convenient to the user:

%operator ALL= NOPASSWD: /sbin/

To create a Tar file
tar -czf archive.tar.gz *.txt

To list files in a compressed Tar file
tar -tzf archive.tar.gz.

Common Replication Administration Tasks

Common Replication Administration tasks :

1. Checking the Replication Status.

The most common task when managing a replication process is to ensure that replication is taking place and that there has been no errors between the slave and the master.

On Slave Server (192.168.36.2)

mysql>     Show slave statusG

The key fields from the status report to examine are:

Slave_io_state: indicates the current status of the slave.

Slave_io_running: shows whether the IO thread for the reading the master’s binary log is running
Slave_sql_running: shows whether the SQL thread for the executing events in the relay log is running
Last error: shows the last error registered when processing the relay log. Ideally this should be blank, indicating no errors.

Seconds_behind_master: shows the number of seconds that the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to cope with the large number of statements from the master.

On the MASTER :

On the master, you can check the status of slaves by examining the list of running processes. Slaves execute the Binlog Dump command:

mysql> show full processlist;

2. Pausing Replication on the slave.

You can start and stop the replication of the statements on the slave using the commands :


start slave ;
stop slave;

When execution is stopped, the slave does not read the binary log from the master (the IO_THREAD) and stops processing events from the relay log that have not yet been executed the (SQL_THREAD) .

You can pause either the IO or SQL threads individually by specifying the thread type.

mysql> STOP SLAVE IO_THREAD;

Stopping the IO thread will allow the statements in the relay log to be executed up until the point where the relay log has ceased to receive new events.

Using this option can be useful when you want to pause execution to allow the slave to catch up with events from the master, when you want to perform administration on the slave but also ensure you have the latest updates to a specific point.

This method can also be used to pause execution on the slave while you conduct administration on the master while ensuring that there is not a massive backlog of events to be executed when replication is started again.

mysql> STOP SLAVE SQL_THREAD;

Stopping the SQL thread can be useful if you want to perform a backup or other task on a slave that only processes events from the master. The IO thread will continue to be read from the master, but not executed, which will make it easier for the slave to catch up when you start slave operations again.

Replication Solutions are :

  1. Using replication for Backups
  2. using replication with different Master and Slave Storage Engines
  3. Using replication for scale-out
  4. Replicating different databases to different slaves.
  5. Improving Replication Performance
  6. Switching Masters during failover.

Some of the important replication slave options for future purpose:

These are not mandatory, these are optional

–log-slave-updates: Normally, a slave does not log to its own binary log any updates that are received from a master server. This option tells the slave to log the updates performed by its SQL thread to its own binary log. For this option to have any effect, the slave must also be started with the -–log-bin option to enable binary logging.

This variable is used when you want to chain replication servers. For example, you might want to set up replication servers using this arrangement:

A -> B -> C

–read-only: Cause the slave to allow no updates except from slave threads or from users having the SUPER privilege. On a slave server, this can be useful to ensure that the slave accepts updates only from its master server and not from clients. This variable does not apply to temporary tables.

Implementation of Master –Master Replication Server Setup:

The advantages of master-master replication over the traditional master-slave replication are that you don’t have to modify your applications to make write accesses only to the master, and that it is easier to provide high-availability because if the master fails, you still have the other master.

Server1.example.com with the IP address 192.168.1.1 to the server server2.example.com with the IP address 192.168.1.2 and vice versa. Each system is the slave of the other master and the master of the other slave at the same time.

Steps for setup Master-Master Replication:

1) Create a user (slave2) for replication to access the MySQL Database on Server1.

On Server1:


GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO ‘slave2’@’192.168.36.1.2’ IDENTIFIED BY ‘password';

On Server2:

GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO ‘slave1’@’192.168.36.1.1’ IDENTIFIED BY ‘password’;

2) Now we set up master-master replication in /etc/my.cnf.

The crucial configuration options for master-master replication are auto_increment_increment and auto_increment_offset for auto-increment columns:

  • auto_increment_increment controls the increment between successive AUTO_INCREMENT values.
  • auto_increment_offset determines the starting point for AUTO_INCREMENT column values.


vi /et/my.cnf:
[mysqld]
server-id   =  1
# Assume that we have 2 nodes and offset values are even and odd
auto-increment-increment = 2
auto-increment-offset       = 1
#Binary Logging Enabled
log-bin = mysql-bin
log-slave-updates

After configuring the variables in my.cnf need to restart the server.

On Server2, assuming 192.168.1.2:

Vi /etc/my.cnf:

[mysqld]
server-id   =  2
# Assuming that we have 2 nodes, and offset value start from 2 i.e. even
auto-increment-increment = 2
auto-increment-offset       = 2
#Binary Logging Enabled
log-bin = mysql-bin

After configuring the variables in my.cnf need to restart the server.

On Server 1:

3) Assuming that both the servers have the same database copy, otherwise take the backup and restore.

Before taking the dump flush the tables with read lock and check the bin log name and position:

flush tables with read lock;

show master status;

Take the db snapshot on server1 and then restore it on server2.

mysqldump –u user –p –all-databases > backup.sql;

Unlock tables;

On Server2:

flush tables with read lock;

Show master status;

Unlock tables;

And finally, run the command to make server2 as slave to server1

CHANGE MASTER TO MASTER_HOST='192.168.1.1', MASTER_USER='slave2', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin', MASTER_LOG_POS=position;

And then start the slave

Start slave;

Then check the slave status with the following command:

Show slave status;

It is important that both slave_io_running and slave_sql_running have the value yes.

Now the replication from server1 to server2 is set up.

Next we should configure the replication from server2 to server1.

To do this, we stop the slave on server1 and make it a slave of server2:

On Server 1:

Stop slave;

Run the following command:

CHANGE MASTER TO MASTER_HOST='192.168.1.2', MASTER_USER='slave1', MASTER_PASSWORD='password', MASTER_LOG_FILE='mysql-bin', MASTER_LOG_POS=position;

Start the slave:

Start slave;

Check the slave status:

Show slave statusG;

It is important that both slave_io_running and slave_sql_running have the value yes..

MySQL Replication

Implementing Master-Slave Replication:

Master-Slave replication is mainly for scale-out solutions- spreading the load among the multiple slaves to improve the performance.

And for Analytics – while the analysis of the information can take place on the slave without affecting the performance of the master.

The target uses for Replication in MySQL include:

  1. Scale-out solutions: spreading the load among multiple slaves to improve performance. In this environment, all writes and updates must take place on the master server. Reads, however, may take place on one or more slaves
  2. Data Security: Because data is replicated to the slave, and the slave can pause the replication process, it is possible to run backup services on the slave without corrupting the corresponding master data.
  3. Analytics: while the analysis of the information can take place on the slave without affecting the performance of the master.

Replication Configuration:

Replication between servers in MySQL works through binary logging mechanism.

Each slave must connect to the master using a standard mysql username and password, so there must be a user account on master that the slave can use to connect, and the user needs only REPLICATION SLAVE privilege.

Assuming that username is slave, password is slave123 and Master host name is 192.168.36.1, Slave host name is 192.168.36.2.

On Master Server: (192.168.36.1)

1. Create a user for replication and grant REPLICATION SLAVE privilege.

GRANT REPLICATION SLAVE ON *.* TO ‘slave’@’192.168.36.1’ IDENTIFIED BY ‘slave123′;

Note: We can create a different user for each slave or we can use the same user for each slave that needs to connect and the user we want to use for replication has REPLICATION SLAVE privilege.

2. Setting the Replication Master Configuration.

Note: For replication to work you must enable binary logging on the master. If binary logging is not enabled, replication will not be possible as it is the binary log that is used to exchange data between the master and slaves.

Edit the configuration file on Master Server:  /etc/my.cnf

log-bin    = mysql-bin

Each server within a replication group must be configured with a unique server-id value.  Server-id is used to identify individual servers within the group.

And server-id must be an integer between 1 and (232)–1.

server-id = 1 # selection of these integers are entirely up to you

3. Setting the Replication Slave Configuration:

On Slave Server: (192.168.36.2)

The only option we must configure on slave is to set the unique server-id.

server-id=2

Note: If you do not specify a server-id value, it defaults to 0.  If you omit server-id (or set it explicitly to 0), a master refuses connections from all slaves, and a slave refuses to connect to a master. Thus, omitting server-id is good only for backup with a binary log.

We do not have to enable binary logging on the slave for replication to be enabled.

If we enable the binary logging on slave , we can user binary log for data backup and crash recovery on the slave. And also we can use the slave as  part of a more complex replication topology,  where the slave acts as a Master to other slaves.

log-bin= mysql-bin

4. Getting the replication master information:

  1. Start the command-line and flush all tables and block write statements by executing  the statement :

flush tables with read lock;

  1. Use the following MySQL statement to determine the current binary log file name and position.

show master status;

Ex: mysql-bin.0001 and position is 98

Note: This represents the replication coordinates at which the slave should begin processing new updates from the master.

On Master Server (192.168.36.1)

5. Take the database snapshot from the master with the tool mysqldump or whatever.

–         If you haven’t already locked the tables on the server to prevent statements that update data from executing start by executing

flush tables with read lock ;

–         In another session , use mysqldump to create a dump of all –databases or selected databases to replicate.

mysqldump –u root –p –all-databases –lock-all-tables  > db_dump.sql

Another alternative is use with the –master-data option, which automatically appends the change master to statement required on the slave to start the eplication process.

mysqldump –u root –p –all-databases  –master-data –lock-all-tables  > db_dump.sql

6. Restore the backup on slave server. ( 192.168.36.2)

mysql –u root –p < db_dump.sql    (OR)

On Master Server (192.168.36.1)

You will need either to copy the dump file to the slave, or to use the file from the master when connecting remotely to the slave to import the data.

mysql -h 192.168.36.2 < db_dump.sql

7.  After restoring the db dump start the slave :

CHANGE MASTER TO MASTER_HOST=’192.168.36.1’, MASTER_USER=‘slave’, MASTER_PASSWORD=‘slave123d’, MASTER_LOG_FILE= ‘mysql-bin.0001‘, MASTER_LOG_POS=98;

8.  After that, start the slave server.

Start slave;

9. Check the replication status with the following command

Show slave status\G

Note: To check the slave status on replication server, the user requires either the SUPER or REPLICATION CLIENT privilege.

Note: Check both the Slave_IO_Running & Slave_SQL_Running should be YES.

.

MySQL Performance

MySQL Performance: Replication, Partition and Memcached

3 main step to increase mysql performance:

  1. Split Read and Write
  2. Horizontally partition
  3. Implement Memchache
  4. Setup Replication
  • You can spread your reads with replication, and that helps a lot, but you can’t spread writes (they have to process on all machines) and they’ll eventually consume all your resources. You’ll find yourself adding replicated slaves at an ever-increasing rate to make up for the diminishing returns each additional slave provides.
  • The next logical step is to horizontally partition your dataset onto different master/slave clusters so you can spread your writes, and then teach your application to connect to the correct cluster depending on the data it needs.
  • With memcached you can reduce your database reads to a mere fraction, leaving the databases to mainly do infrequent writes, and end up getting much more bang for your buck, since your databases won’t be blocking themselves doing ACID bookkeeping or waiting on writing threads.

How to setup Memcached server

Few Imp Links for detailed Study: http://www.danga.com/memcached/

Installing memcached

yum install memcached
yum install php5-memcache

Configure and Setting MemCache

  • After installing these two packages next step is to configure memcached.conf file. ( /etc/memcached.conf )
  • After that edit the memecached.ini file. ( /etc/httpd/conf.d/memcache.ini )

A sample memcache.ini file

; uncomment the next line to enable the module
extension=memcache.so
[memcache]
memcache.dbpath="/var/lib/memcache"
memcache.maxreclevel=0
memcache.maxfiles=0
memcache.archivememlim=0
memcache.maxfilesize=0
memcache.maxratio=0
session.save_handler = memcache
session.save_path = "tcp://192.168.1.1:11211?weight=1,tcp://192.168.1.2:11211"

Camand Line Memcache

memcached -d   ==> Start memcache as a background process
memcached -vv ==> Start memcached as a foreground process

Another method for checking whether memcached was enamble or not is to check the phpinfo() function

php MemChache Conf
php MemChache Conf
.

MySQL Step-by-Step Installations

If you are on Fedora or Red Hat Box

$ setenforce 0 #disbles selinux temporarily

$ yum install mysql

If you have a rpm then

$ rpm -i /mysqlSRC_RPM_Folder/MySQL*

You can download rpm’s from http://www.mysql.org

$ restorecon -R /var/lib/mysql/ #restore selinux labels
$ setenforce 1 #enable selinux again

Now, I successfully installed MySQL. Now I wanted to change the location of where the actual data would be stored, I tried configuring the my.cnf file and using symlinks. I found using the symlinks to be easier and more compatible.

My data directory is /home/mysql.

The following is what I did to change the data directory after using RPM to install.

1. stop mysql server
2. mkdir /home/mysql
3. mv /var/lib/mysql/* /home/mysql/
4. rmdir /var/lib/mysql
5. ln -s /home/mysql /var/lib/mysql

Instead of using symlinks,we can just modify the my.cnf file (/etc/my.cnf)  by including “datadir = /home/mysql”

6. chown -R mysql:mysql /home/mysql
7. start mysql server again
.

MySQL’s FEDERATED storage engine

One of the most exciting features introduced in MySQL 5 is the federated engine. The ability to access data from a remote server without the constraints of replication tickles every programmer’s fancy. The Federated engine allows a DBA to create logical pointers to tables that exist on other MySQL servers and thereby link together separate data islands to form one or more logical databases. The Federated storage engine of MySQL is extremely easy to use and set up, and can quickly turn into a DBA’s best friend if they have to answer customer demands to correlate data that exists on several different physical servers.

Let’s take a quick walk :

  1. MySQL Version: We can validate that our installation has Federated by issuing a simple SHOW ENGINES command from the mysql client program. The FEDERATED storage engine is available beginning with MySQL 5.0.3. It is a storage engine that accesses data in tables of remote databases rather than in local tables.
  2. File Structure: When we create a FEDERATED table, the server creates a table format file in the database directory. The file begins with the table name and has an .frm extension. No other files are created, because the actual data is in a remote table. With the MySQL FEDERATED storage engine, there are no local data files for a table (for example, there is no .MYD file).
  3. Data in Remote Server: A Federated table acts as a pointer to an actual table object that exists on the same or another server. Once this link/pointer has been established, we can perform whatever operations we would like on the remote object (inserts, updates, deletes, reads, etc.), as long as we have been given the privileges to do so. The local server connects to a remote server, and uses the MySQL client API to read, delete, update, and insert data in the remote table.
  4. Capability: Understand that our capabilities on the remote object are restricted to the underlying engine that is serving as the source of the Federated table. For example, if we create a Federated table that is pointing to a MyISAM table on another server, we do not have transaction (commit/rollback) capability. Likewise, a Federated pointer to an Archive engine table would not allow us to update or selectively delete data as Archive tables allow reads and inserts only.
  5. Communication: The local server communicates with the remote server using MySQL client C API functions. It invokes mysql_real_query() to send the statement. To read a result set, it uses mysql_store_result() and fetches rows one at a time using mysql_fetch_row().

How to Use:

Normally, we have two mysql servers running, either both on the same host or on different hosts.

On the remote server, we have:

CREATE TABLE `city_remote` (
  `city_id` int(11) NOT NULL auto_increment,
  `name` char(35) NOT NULL default '',
  `country_code` char(3) NOT NULL default '',
  `district` char(20) NOT NULL default '',
  `population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`city_id`)
) ENGINE=MyISAM;

To use that table in our local server, enter the description:

CREATE TABLE `city_local` (
  `city_id` int(11) NOT NULL auto_increment,
  `name` char(35) NOT NULL default '',
  `country_code` char(3) NOT NULL default '',
  `district` char(20) NOT NULL default '',
  `population` int(11) NOT NULL default '0',
  PRIMARY KEY  (`city_id`)
) ENGINE = FEDERATED
connection='mysql://user:[email protected]:3306/world/city_remote';

Here, user and pass are valid credentials to access the table city_remote in the database world on server remote.com. The structure of these tables must be exactly the same. With that done, we can query your federated table as if it were in our local server. Issue a query and get a record set.

select * from city_local where city_id = 1;

+----------+--------+----------------+----------+--------------+
| city_id  | name   | country_code   | district | population   |
+----------+--------+----------------+----------+--------------+
|  1       | Noida  | IN             | Noida    |    3000000   |
+----------+--------+----------------+----------+--------------+

There are a few limitations concerning federated engine usage, namely:

  • The remote table must exist when you create your local one.
  • DDL operations are not supported through the Federated engine (ALTER TABLE, etc.).
  • We can’t issue ALTER TABLE commands on a federated table.
  • The federated table is not aware of any structural changes that may occur in the remote one. You may get an error at runtime. The structure definitions of both the source and federated object must stay identical.
  • Transactions are not supported (should be in version 5.1).
  • Query cache support is not enabled.
  • Any DROP TABLE statement issued against a FEDERATED table drops only the local table, not the remote table.
  • FEDERATED tables do not work with the query cache.

.

FullText Search Solutions

Types of FullText Search Solutions

  • Special Database Features
    • MySQL Full Text Search, Sienna
    • Solutions exists for PostgreSQL, Oracle and many others
  • Home baked database based solutions
    • Using stored procedures or set of queries and keyword tables for search
  • External Full Text Search Solutions
    • Lucene, Sphinx, Mnogosearch etc.

My Thoughts on Performance

  • Lucene and Sphinx both can handle large data sizes
  • Lucene has more features, dynamic updates
  • For the lamp developer use the Zend Framework’s Lucene Search which is based on Lucene but it will not work on php 4.
  • speed of fulltext search in lucene is much faster as compared to mysql
  • lucene is much more complex to use as compared to mysql.
  • lucene does not allow us to modify a document. Modifying a document is equivalent to deleting the current document and adding the modified document to the index.
  • lucene requires an object of the index to perform the search. We will know about it when we use the api. Whenever we add a new document to the index, a new object of the index has to be created to include the new document in the index. But creation of a new object is not a major overhead. Though it does slow down the searching process to some extent.
  • MySQL FullText Search is good for Small (relatively) data sizes

MySQL FullText Search and Updates

  • Simple: In mysql, we can simply mark an index on a text/varchar column as fulltext, and our work is done. All we need to do next is to fire MATCH AGAINST queries. Adding and modification of indexes is handled by mysql internally as and when new data is added.
  • MyISAM FullText Search is based on BTREE
    • Special form of BTREE index
  • Each word is index entry
  • Updating text with 1000 words – 1000 key entries needs to be updated
    • A lot of random IO if index is not in memory
  • Index degradation by fragmentation
    • Run OPTIMIZE TABLE for best performance
  • Minimum Length: With mysql we have the minimum length of word to be indexed which is by default 4. So all words which have less than 4 characters will not be indexed. What will we do if we want to index words like “php”, “asp”, “c”? We will have to decrease the minimum length from 4 to 1. And this will increase the index size drastically and slow down all our searches as a consequence. There are no such issues in lucene.

Lucene

Lucene is a free/open source information retrieval library, originally implemented in Java by Doug Cutting. It is supported by the Apache Software Foundation and is released under the Apache Software License. Lucene has been ported to programming languages including Delphi, Perl, C#, C++, Python, Ruby and PHP.

  • Popular full text search Library written in Java
    • http://lucene.apache.org/
    • Clucene – C port exists, but is not current
    • Is not specially tailored for indexing databases
    • Some coding is needed for integration
  • Dynamic index changes possible
  • Very Advanced query language
    • Wildcard searches:
    • Search in Fields: title:”The Right Way” AND text:go
    • Proximity Searches, Fuzzy Searches etc
  • Supports attributes (indexed and non indexed)
  • Some CJK Support
  • Easily integrates with Java

Sphinx Search

  • Designed for indexing Database content
  • Focuses
    • High performance
    • Search Quality
    • Ease of use
  • Supports multi-node clustering out of box
  • Support Multiple Attributes
  • Different sort modes (relevance, data etc)
  • Supports trend queries
  • Support for snippets
  • Client available as MySQL Storage Engine plugin
  • Number of limitations
  • no partial word searches, hard to update indexes

Tbgsearch

  • Very fast for medium size data sizes
  • Only boolean searches available

.