First Revision: 2008 May 5
Editor: Daniel Wolfe, Bachelor of Science Candidate, John Brown University <wolfed@jbu.edu>
Advisor: Dr. Robert Norwood, Associate Professor of Engineering, John Brown University <rnorwood@jbu.edu>
© 2007-2008 John Brown University Computer Science Department
Catapult is a modification of MediaWiki that will allow a MediaWiki installation to contain content in multiple languages without the need for a separate configuration for each language. Catapult is intended as an academic project to prove that it is possible to use a single MediaWiki configuration to contain content in multiple languages. As such, it is not intended to be used for enterprise-grade content management. Catapult is merely a proof of concept.
Adapted from http://www.mediawiki.org/wiki/Manual:Running_MediaWiki_on_Ubuntu.
Although Catapult is a modification of MediaWiki that should work on any MediaWiki installation, it has only been tested on a single installation and is only guaranteed to work for that installation. The following installation procedure explains how to reproduce that environment. You are free to change the installation procedure at will, but be advised that there is then no guarantee that the software will work.
Obtain a copy of the appropriate version of Ubuntu. See http://www.ubuntu.com/. While other Linux distributions will probably work, Catapult has only been tested on Ubuntu. If you use another distribution, consult that distribution's documentation for obtaining packages.
While installing, select most options as seems appropriate. In particular:
If you decide to use another distribution of Linux or the desktop edition of Ubuntu, you will need to install Apache 2, MySQL, and PHP 5. You can do so on Ubuntu with the following commands:
$ sudo apt-get install apache2
$ sudo apt-get install mysql-server
$ sudo apt-get install php5
$ sudo apt-get install php5-mysql
All software packages should be checked for updates…
$ sudo apt-get update
$ sudo apt-get upgrade
$ sudo apt-get dist-upgrade
…and the system should be restarted so that any kernel updates take effect.
$ sudo shutdown -r now
Technically, this step is optional… optional in the sense that it is optional to put on a parachute before you jump out of an aeroplane. The following packages are not needed to run the software but will assist you in installing and configuring. Of course, if you've gotten this far, chances are that you know your way around Linux and know what you're doing. Feel free to modify the following list of packages to install according to your needs.
$ sudo apt-get install elinks
This package is needed if you need to browse the Internet from your console.
$ sudo apt-get install vim
The distribution already comes with vim-tiny. We don't want this. We want the full version of Vim.
$ sudo apt-get install ssh
SSH is required if you wish to work remotely.
$ sudo apt-get install php5-cli
This is useful if we ever want to run PHP command-line scripts. Some MediaWiki tools are command-line scripts, so it can be useful.
$ sudo apt-get install debian-helper-scripts
These are just some general scripts that make the maintenance of the server easier. This is helpful for restarting MySQL.
$ sudo apt-get install patch
This is needed to convert the specific files of MediaWiki to Catapult automatically. If you want to make the changes manually, this is unnecessary. Keep in mind that in order for this to be guaranteed to work, you must use MediaWiki 1.11.0. If you use another version, you might want to consider making the changes manually.
At this point, change the configuration files to your taste. If you don't know what that means, you can probably skip this part.
If you made any changes to the configuration files, you need to restart the appropriate services:
$ sudo apache2ctl restart
$ sudo service mysql restart
Catapult has only been proven to work with MediaWiki 1.11.0. Again, feel free to experiment with other versions if you want.
$ cd /var/www/
$ sudo wget http://download.wikimedia.org/mediawiki/1.11/mediawiki-1.11.0.tar.gz
$ sudo tar xzvf mediawiki-1.11.0.tar.gz
$ sudo mv mediawiki-1.11.0 wiki
$ sudo chown -R www-data:www-data wiki/
$ cd wiki/
Open up a web browser and navigate to your server. Follow the instructions to configure the wiki. Once you are done, you will need to move a newly created configuration file.
$ sudo mv config/LocalSettings.php .
In order to get the URL scheme to match the one used on Wikipedia (which you most likely will want), you will need to modify both the Apache configuration file located at /etc/apache2/sites-available/default and the /var/www/wiki/LocalSettings.php file that you just moved. The instructions for modifying them can be found at http://www.mediawiki.org/w/index.php?title=Manual:Short_URL/wiki/Page_title&oldid=142900. The rest of the instructions assume that you have made this change.
First, you will need to obtain Catapult and extract it.
$ cd /var/www/
$ sudo wget http://www.the-wolfeden.com/Catapult/catapult.tar.gz
$ sudo tar xzvf catapult.tar.gz
$ sudo chown -R www-data:www-data catapult/
$ cd catapult/
$ sudo ln -s ../wiki/languages/Names.php
Next, you will need to make the appropriate changes to the database. There exists a file with the appropriate SQL queries to do so. However, if during the installation you set MediaWiki up to use database prefixes, you must modify the queries to reflect this. The queries exist in the file Queries.sql. For the purposes of simplicity, we are assuming that you are not using database prefixes. (Plus, since you don't need separate configurations to store content in multiple languages, you don't need database prefixes anymore, do you?)
$ mysql -u root -p wikidb < Queries.sql
Substitute the name of the database that MediaWiki is using for wikidb. Enter your MySQL root password when prompted.
Now, you must change the code of MediaWiki in two specific files. Those changes exist in two files that were created with the diff utility. If you understand the format of a diff, you can make the changes manually. If you don't understand the diff format or have half of a brain and would like to do it automatically, you can use the patch utility.
$ sudo patch ../wiki/includes/OutputPage.php OutputPage.php.patch
$ sudo patch ../wiki/includes/SkinTemplate.php SkinTemplate.php.patch
One last issue exists: you must configure the associator to work with the database. In order to do that, you must edit a file and enter the appropriate database configuration values in the appropriate places.
$ sudo vi DatabaseCredentials.php
The values should match the ones entered in /var/www/wiki/LocalSettings.php. There are default values already in the file. You must change those to the appropriate values.
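If you would rather read the values out of LocalSettings.php than copy them by eye, something like the following Python sketch could help. It is only an illustration: the function name read_db_settings is hypothetical, the regular expression only handles simple quoted assignments, and the layout of DatabaseCredentials.php is not described here, so you still need to enter the values into that file yourself. The setting names ($wgDBserver, $wgDBname, $wgDBuser and $wgDBpassword) are the ones MediaWiki itself uses.

```python
import re

# MediaWiki's own database setting names, as written in LocalSettings.php.
DB_SETTINGS = ("wgDBserver", "wgDBname", "wgDBuser", "wgDBpassword")


def read_db_settings(local_settings_text):
    """Return the database settings found in the text of LocalSettings.php.

    Hypothetical helper: it only understands simple assignments such as
    $wgDBname = "wikidb"; and ignores anything more elaborate.
    """
    found = {}
    for name in DB_SETTINGS:
        pattern = r"^\s*\$" + name + r"\s*=\s*([\"'])(.*?)\1\s*;"
        match = re.search(pattern, local_settings_text, re.MULTILINE)
        if match:
            found[name] = match.group(2)
    return found


# Made-up sample content; a real file would be read from
# /var/www/wiki/LocalSettings.php instead.
sample = '''
$wgDBserver         = "localhost";
$wgDBname           = "wikidb";
$wgDBuser           = "wikiuser";
$wgDBpassword       = "secret";
'''

settings = read_db_settings(sample)
for name in DB_SETTINGS:
    print(name, "=", settings.get(name))
```

To use it on a real installation, replace the sample string with the contents of the file, for example open("/var/www/wiki/LocalSettings.php").read().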
You're done at this point!
The actual operations that need to be performed are done through an external web interface. You can set up your interface any way that you want, but if you followed the recommended procedure as detailed above, you can access this interface by going to the catapult folder of your Catapult server. There are three operations that can be performed: Set Language, Associate Articles and Disassociate Article.
Every article of content in your Catapult installation should have a specific language attached to it. Articles that are written in English should be flagged as English. Articles written in French should be flagged as French. Articles written in German… well, you get the idea. By default, a newly created article is not flagged with any language at all. In order to associate one article to another, you must first assign a language to both of them.
Setting the language of an article is very simple. All that you need to do is go to the Set Language section, select the article that you want to set the language for and then select the language that you want to set it to. Submit the change, and it's done. In order to set the language for an article, there must not be any other articles associated with it. If you want to change the language of an article that is already associated, you must first disassociate it.
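The rule about associated articles can be sketched in a few lines of Python against an in-memory SQLite table. This is only an illustration and not the associator's actual code: the function name set_language, the three-entry language list and the use of SQLite in place of MySQL are all assumptions. The fields page_language and page_index_node are the two fields that Catapult adds to the page table; an index node of zero means that the article is not associated.

```python
import sqlite3

# Assumption: a tiny stand-in for the language list that MediaWiki provides.
KNOWN_LANGUAGES = {"de", "en", "fr"}


def set_language(conn, page_id, language):
    """Flag an article with a language unless it is already associated."""
    if language not in KNOWN_LANGUAGES:
        raise ValueError("unknown language code: " + language)
    row = conn.execute(
        "SELECT page_index_node FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    if row is None:
        raise ValueError("no such article")
    if row[0] != 0:
        # An associated article must be disassociated first.
        raise ValueError("article is associated; disassociate it first")
    conn.execute(
        "UPDATE page SET page_language = ? WHERE page_id = ?",
        (language, page_id),
    )


# Made-up sample data: article 1 is unflagged, article 2 is already in a group.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,"
    " page_language TEXT DEFAULT '', page_index_node INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO page VALUES (1, 'Catapult', '', 0)")
conn.execute("INSERT INTO page VALUES (2, 'Katapult', 'de', 2)")

set_language(conn, 1, "en")
print(conn.execute("SELECT page_language FROM page WHERE page_id = 1").fetchone()[0])
```

Calling set_language(conn, 2, "fr") in this sketch raises an error, because article 2 is already associated.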
The list of languages is the same list that MediaWiki uses as its internal language list. It should contain any language that has an ISO 639 code.
Once you have selected a language for all of the articles that you want associated with each other, you are now ready to associate them. Select the article that you want to associate and select the article that you want to associate it to. Once you submit it, the two articles will contain links to each other in their Other Languages section.
You will not be able to associate an article to an article group if that article group already contains an article in that language.
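The checks involved in associating two articles might look roughly like the following Python sketch. It is a guess at the logic under stated assumptions rather than the associator's real code: the function name associate is hypothetical, SQLite stands in for MySQL, and the rule that an article can only belong to one group at a time is assumed. It relies on the two fields that Catapult adds to the page table, page_language and page_index_node, where articles in the same group share an index node and zero means "not associated".

```python
import sqlite3


def associate(conn, page_id, target_id):
    """Add the article page_id to the article group of target_id."""
    def fetch(pid):
        row = conn.execute(
            "SELECT page_language, page_index_node FROM page WHERE page_id = ?",
            (pid,),
        ).fetchone()
        if row is None:
            raise ValueError("no such article: %d" % pid)
        return row

    language, node = fetch(page_id)
    target_language, target_node = fetch(target_id)
    if not language or not target_language:
        raise ValueError("set a language for both articles first")
    if node != 0:
        # Assumption: an article belongs to at most one group.
        raise ValueError("article is already associated")
    if target_node == 0:
        # The target is not in a group yet, so it becomes the index node.
        target_node = target_id
    clash = conn.execute(
        "SELECT 1 FROM page WHERE page_language = ?"
        " AND (page_index_node = ? OR page_id = ?)",
        (language, target_node, target_id),
    ).fetchone()
    if clash:
        raise ValueError("the group already has an article in " + language)
    conn.execute(
        "UPDATE page SET page_index_node = ? WHERE page_id IN (?, ?)",
        (target_node, page_id, target_id),
    )


# Made-up sample data: one English article and two German ones.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,"
    " page_language TEXT DEFAULT '', page_index_node INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, 0)",
    [(1, "Catapult", "en"), (2, "Katapult", "de"), (3, "Wurfmaschine", "de")],
)

associate(conn, 2, 1)  # the German article joins the English one
print(conn.execute("SELECT page_id, page_index_node FROM page ORDER BY page_id").fetchall())
```

In this sketch a later call to associate(conn, 3, 1) raises an error, because the group already contains a German article.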
If you ever want to disassociate an article from an article group, all that you need to do is to select the article in question in the Disassociate Article section.
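Behind the scenes, disassociation has to keep the article group consistent. The following Python sketch shows one way that could work; it is an assumption-laden illustration, not the associator's code. The function name disassociate is hypothetical, SQLite stands in for MySQL, and two behaviours are guesses: when the departing article's page_id was serving as the group's index node, the remaining articles are moved to the smallest remaining page_id, and when only one article is left it is reset to zero (not associated).

```python
import sqlite3


def disassociate(conn, page_id):
    """Remove an article from its group and keep the group consistent."""
    row = conn.execute(
        "SELECT page_index_node FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    if row is None or row[0] == 0:
        return  # unknown or not associated: nothing to do
    node = row[0]
    conn.execute(
        "UPDATE page SET page_index_node = 0 WHERE page_id = ?", (page_id,)
    )
    rest = [
        r[0]
        for r in conn.execute(
            "SELECT page_id FROM page WHERE page_index_node = ? ORDER BY page_id",
            (node,),
        )
    ]
    if len(rest) == 1:
        new_node = 0  # assumption: a lone article is no longer a group
    elif rest and node == page_id:
        new_node = rest[0]  # assumption: pick the smallest remaining page_id
    else:
        return  # the group is still valid as it is
    conn.execute(
        "UPDATE page SET page_index_node = ? WHERE page_index_node = ?",
        (new_node, node),
    )


# Made-up sample data: three articles sharing index node 1.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,"
    " page_language TEXT DEFAULT '', page_index_node INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, 1)",
    [(1, "Catapult", "en"), (2, "Katapult", "de"), (3, "Catapulte", "fr")],
)

disassociate(conn, 1)
print(conn.execute("SELECT page_id, page_index_node FROM page ORDER BY page_id").fetchall())
```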
In the normal course of MediaWiki operation for processing an article to be displayed, the article is passed through the parser. In a garden-variety MediaWiki installation (e.g., Wikipedia), articles in other languages are indicated by the presence of WikiCode references that are inserted into the code of the article. The parser then scrapes these references and places them into a language list. The disadvantage of this method is that each article must carry a reference link to every other article on the same topic. For topics that are highly translated, this produces a rather large list of references.
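To make the scraping idea concrete, here is a rough Python sketch of pulling language references such as [[fr:Catapulte]] out of article text. It is not MediaWiki's parser, which is written in PHP and is far more involved: the function name scrape_language_links, the regular expression and the three-entry set of language prefixes are all assumptions made for illustration.

```python
import re

# Assumption: a tiny set of language prefixes; MediaWiki keeps its own list.
LANGUAGE_PREFIXES = {"de", "en", "fr"}

# Matches links of the form [[prefix:Title]] with a lowercase prefix.
LINK = re.compile(r"\[\[([a-z-]+):([^\]|]+)\]\]")


def scrape_language_links(wikitext):
    """Return (language, title) pairs for the language references found."""
    links = []
    for prefix, title in LINK.findall(wikitext):
        if prefix in LANGUAGE_PREFIXES:
            links.append((prefix, title.strip()))
    return links


# Made-up article text with two language references and one category link.
text = (
    "A catapult is a siege engine.\n"
    "[[de:Katapult]]\n"
    "[[fr:Catapulte]]\n"
    "[[Category:Weapons]]"
)
print(scrape_language_links(text))
```

Every translation of the article would need its own copy of such a list of references, which is the maintenance burden described above.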
Catapult changes the flow of processing with regard to the list of articles in other languages. In Catapult, the list that the parser produces for articles in other languages is overridden by the list that Catapult produces. Instead of scraping the article text for reference links, the data for the list is pulled from two new fields that we add to the page table in the MediaWiki database. These two new fields indicate the language of the article and the group of articles on the same topic in other languages. Every article that is on the same topic but in a different language shares the same index node. To produce the list of articles in other languages, a simple database query is all that is needed to find every other article with the same index node.
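The lookup can be sketched in Python with an in-memory SQLite table. This is a minimal illustration under assumptions, not the patched PHP code: the function name other_languages is hypothetical and SQLite stands in for MySQL. The field names page_language and page_index_node are the two new fields in the page table, and page_id and page_title are existing MediaWiki fields.

```python
import sqlite3


def other_languages(conn, page_id):
    """Return (language, title) pairs for the other articles in the group."""
    # First query: find the index node of the current article.
    row = conn.execute(
        "SELECT page_index_node FROM page WHERE page_id = ?", (page_id,)
    ).fetchone()
    if row is None or row[0] == 0:
        return []  # unknown article, or not associated with anything
    # Second query: find every other article that shares that index node.
    return conn.execute(
        "SELECT page_language, page_title FROM page"
        " WHERE page_index_node = ? AND page_id <> ?"
        " ORDER BY page_language",
        (row[0], page_id),
    ).fetchall()


# Made-up sample data: three associated articles and one unassociated one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT,"
    " page_language TEXT DEFAULT '', page_index_node INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO page VALUES (?, ?, ?, ?)",
    [
        (1, "Catapult", "en", 1),
        (2, "Katapult", "de", 1),
        (3, "Catapulte", "fr", 1),
        (4, "Sandbox", "", 0),
    ],
)

print(other_languages(conn, 1))
```

Note that no article text is touched: adding a fourth translation only means giving one more row the same index node.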
The Catapult installation contains six files:
Queries.sql contains two queries that add two new fields to the page table: page_language and page_index_node.
page_language contains an ISO 639 code that indicates the language that the content of the article is in. If there is no language set, this field will be blank.
page_index_node contains an integer value. All articles that are associated to each other will share the same index node. If the article has not been associated, the value of this field should be zero. Otherwise, the value is arbitrarily selected from one of the page_id values of one of the articles in the index-node group.
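The contents of Queries.sql are not reproduced in this document, but the two queries could plausibly look like the ones generated in the following Python sketch. Everything beyond the two field names is an assumption: the column types and defaults are guesses, the function name schema_queries is hypothetical, the mw_ prefix is only an example of a database prefix, and SQLite is used here purely so that the sketch can run without a MySQL server.

```python
import sqlite3


def schema_queries(prefix=""):
    """Build the two ALTER TABLE queries, honouring a database prefix."""
    table = prefix + "page"
    return [
        # Assumed type: a short string holding an ISO 639 code, blank if unset.
        "ALTER TABLE " + table
        + " ADD COLUMN page_language VARCHAR(16) NOT NULL DEFAULT ''",
        # Assumed type: an integer that is zero when the article is unassociated.
        "ALTER TABLE " + table
        + " ADD COLUMN page_index_node INTEGER NOT NULL DEFAULT 0",
    ]


# Made-up sample table using the example prefix "mw_".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mw_page (page_id INTEGER PRIMARY KEY, page_title TEXT)")
conn.execute("INSERT INTO mw_page VALUES (1, 'Catapult')")

for query in schema_queries("mw_"):
    conn.execute(query)

print(conn.execute("SELECT page_language, page_index_node FROM mw_page").fetchone())
```

Existing articles pick up the defaults, which matches the description above: a blank language and an index node of zero until someone sets them.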
OutputPage.php.patch, when applied to the OutputPage.php file, overrides the default language link mechanism and adds functionality to find the list of languages based on the database values. A query is first made to determine the index node of the current article. Then another query is made to find all the other articles that share that index node. The results are placed in the list that MediaWiki would normally build from the article text.
SkinTemplate.php.patch, when applied to the SkinTemplate.php file, will (besides making the code look better) force Catapult to get the name of a language from the Names.php file instead of from the database interwiki table.
This file, the associator, provides a mechanism for setting the language of an article, associating two articles with each other and disassociating an article. The changes are made through a MySQL connection abstracted by a class found in MySQLDatabaseConnection.php and with credentials found in DatabaseCredentials.php. The list of languages is obtained from the same list that MediaWiki uses via a symbolic link (a hard link would work as well). The code in this file also performs validation checks to ensure that no index-node group contains two articles of the same language, that the index node is always the page_id of one of the articles in the index-node group, that a language is set for any article that is associated, etc.
DatabaseCredentials.php merely contains the variables needed for the Catapult associator to access the database.
MySQLDatabaseConnection.php merely contains a mechanism for abstracting the connection to the MySQL database.
Catapult falls short of its intended goals. In particular, articles that have the same name but exist in separate languages must be disambiguated at the article title level. The original specification called for the articles to be placed in separate namespaces (not the actual MediaWiki namespace feature, but the mathematical concept of a namespace) with disambiguation being performed somewhere else in the URL.
The other limitation is the fact that all association and disassociation of articles is performed from an external interface. The ideal solution would be to integrate the interface throughout the MediaWiki software to allow the end users greater access to these features.
Catapult was made as an academic project to prove that it is possible to use a single MediaWiki configuration to contain content in multiple languages. As such, further developing Catapult into an enterprise-grade content-management system was not deemed a worthwhile use of resources for a single developer. Catapult is merely a proof of concept.
This project was made by Daniel Wolfe in partial fulfilment of the requirements for a Bachelor of Science in Computer Science at John Brown University, Siloam Springs, Arkansas. Daniel Wolfe is a student of Dr. Robert Norwood.
Catapult is an extension of MediaWiki. John Brown University holds the copyright to this work, which is licensed under the terms of the GNU General Public License, version 2 or later (see http://www.fsf.org/licenses/gpl.html). Derivative works and later versions of the code must be free software licensed under the same terms. This includes “extensions” that use MediaWiki and/or Catapult functions or variables; see http://www.gnu.org/licenses/gpl-faq.html#GPLAndPlugins for details.