Running a Pre-assembled Application
Now that you have XAR installed and compiled, let us make sure that you can successfully run a pre-assembled example on your machine. The example application is in the domain of corporate acquisitions: from input (news) stories such as "Microsoft today bought ABC Networks for $45 million in cash. The ......" we want to extract details such as the buyer company, the company that was bought, and the purchase amount.
Make sure that the directories 'examples', 'specs', and 'databank' are placed in the directory corresponding to the workspace for this project.
When you start up XAR (by invoking 'Extractor') you will be prompted for (i) a domain name: enter "acquire", and (ii) a databank name: enter "acq1". The domain is the general domain we will work in (acquire in this case), and the databank is the one that will be created or used.
There are two primary modes for running XAR.
1) Extracting features to create a "databank". If we have a new corpus of text documents we must first extract features for them. Check the directory "examples/acquire". This contains several input files that we wish to extract acquisition details from. Now select the option
0) Create databank
from the menu. Provide the directory path "examples/acquire" at the prompt. You will have to wait a while; when the menu prompt returns, check the databanks directory. There should be a new file there called 'acq1db.txt'. In that file you should see, for each input file, a set of predicates capturing tokens and entities and their features. The features are properties such as the position of the token in the document, which sentence it is part of, its type, etc.
We create such databanks because they can then simply be loaded for (extraction) use. Feature extraction is a time-consuming process (it can take more than an hour for about 100 documents), so we do it once for each corpus and then re-use the databank.
Enter 'q' to exit.
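As a quick sanity check after this step, the short Python sketch below reports how many lines the new databank file has and prints its first few non-empty lines. It is not part of XAR; the helper name peek_databank and the relative path "databanks/acq1db.txt" are assumptions, so adjust the path to your workspace. It makes no assumption about the predicate syntax inside the file.

```python
from pathlib import Path


def peek_databank(path, n=5):
    """Return (total line count, first n non-empty lines) of a text databank."""
    lines = Path(path).read_text(encoding="utf-8", errors="replace").splitlines()
    non_empty = [ln for ln in lines if ln.strip()]
    return len(lines), non_empty[:n]


if __name__ == "__main__":
    # Assumed location; change this to match your workspace layout.
    db = Path("databanks") / "acq1db.txt"
    if db.exists():
        total, head = peek_databank(db)
        print(f"{db}: {total} lines")
        for ln in head:
            print("  ", ln)
    else:
        print(f"{db} not found; did 'Create databank' run to completion?")
```

An empty or missing file usually means the feature extraction did not finish, in which case it is worth re-running option 0 before moving on.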
2) Actual extraction of data.
Reinvoke 'Extractor'. Enter 'acquire' for the domain and, when prompted for the databank, enter "acq1". The databank created earlier gets loaded. Let us extract some data now. Choose option 1) Attribute worlds; when the menu returns, choose 2) Best tuples. Finally, choose option 9) Export as XML.
Some (trace) data will appear as you run the above. After the final step (option 9) you should see two new files in the XAR installation directory, "AW.xml" and "TW.xml", which contain the attribute worlds and best tuples extracted.
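To confirm that the two exported files are well-formed, a small Python sketch such as the following can be run from the directory that contains them. It only reports the root element and a count of its direct children; it assumes nothing about the element names XAR actually uses, and the helper name summarize_xml is our own.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(path):
    """Parse an XML file and return (root tag, Counter of direct child tags)."""
    root = ET.parse(path).getroot()
    return root.tag, Counter(child.tag for child in root)


if __name__ == "__main__":
    # File names are taken from the text; they are looked up in the current directory.
    for name in ("AW.xml", "TW.xml"):
        try:
            tag, children = summarize_xml(name)
            print(f"{name}: root <{tag}>, children {dict(children)}")
        except FileNotFoundError:
            print(f"{name}: not found in the current directory")
        except ET.ParseError as err:
            print(f"{name}: not well-formed XML ({err})")
```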
...
Congratulations! You can run XAR. However, this was an application with pre-assembled specifications. If you now check the files "acquireschema.txt" and "acquirerules.txt" you will find 1) a relational schema corresponding to the acquires relation we wanted to extract, and 2) some (datalog or logical) rules that state which tokens should go to which slots. In deductive database terminology, the databank we created in step 1 forms the extensional database (EDB), whereas the rules you see in "acquirerules.txt" form the intensional database (IDB).
Let us now go on to building your very own extraction application.
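The EDB/IDB split can be illustrated with a toy Python sketch. Everything in it is hypothetical: the predicate names (token, type, position), the token identifiers, and the single rule are invented for illustration and are not XAR's databank format or rule syntax. The facts play the role of the EDB, and the function plays the role of one IDB rule that derives buyer candidates from them.

```python
# EDB: ground facts of the kind a feature-extraction step might store.
# Predicate names and layout are hypothetical, NOT XAR's actual format.
edb = {
    ("token", "t1", "Microsoft"),
    ("token", "t2", "bought"),
    ("token", "t3", "ABC Networks"),
    ("type", "t1", "organization"),
    ("type", "t3", "organization"),
    ("position", "t1", 1),
    ("position", "t2", 3),
    ("position", "t3", 4),
}


def buyer_candidates(facts):
    """IDB rule (illustrative):
    buyer(T) :- type(T, organization), token(V, 'bought'),
                position(T, P), position(V, Q), P < Q.
    """
    pos = {f[1]: f[2] for f in facts if f[0] == "position"}
    orgs = {f[1] for f in facts if f[0] == "type" and f[2] == "organization"}
    verbs = {f[1] for f in facts if f[0] == "token" and f[2] == "bought"}
    return {t for t in orgs for v in verbs if pos[t] < pos[v]}


if __name__ == "__main__":
    text = {f[1]: f[2] for f in edb if f[0] == "token"}
    print(sorted(text[t] for t in buyer_candidates(edb)))  # ['Microsoft']
```

The point of the split is that the facts are computed once per corpus, while rules like this one can be edited and re-run against the stored facts without repeating the feature extraction.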