Informatics 141: Computer Science 121: Information Retrieval

Task 22

Running your jar on openlab

Once you have tested your crawler on your local machine in Eclipse, export your project as a "Runnable Jar". A runnable jar packages all the required libraries into one big .jar file so you don't have to move them separately. That's convenient, but redistributing other people's libraries this way probably violates their licensing agreements, so don't do it with code you plan on distributing. During the export, the launch configuration dropdown asks which "main" you want the runnable jar to start from. If you tested your program in Eclipse, the configuration you used should appear as an option in the dropdown.
Make sure you update any file paths in your program for the openlab file hierarchy. It is different from the one on your local machine.

Now you need to open a terminal window on the openlab cluster of machines.
I use "ssh" to log in to the remote machine. If you are on Windows you will need to use PuTTY.
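As a rough sketch of the copy-and-login step, the script below only prints the two commands so you can check them before running them yourself. The account name "yourlogin" is a placeholder, and the host name openlab.ics.uci.edu is my assumption about the login address, not something stated on this page.

```shell
#!/bin/sh
# Dry run: prints the commands, does not contact any remote machine.
REMOTE_USER="yourlogin"            # placeholder account name (assumption)
REMOTE_HOST="openlab.ics.uci.edu"  # assumed login host (assumption)
JAR="crawler4openlab.jar"          # jar name used later on this page

# Command that copies the jar into your remote home directory.
copy_cmd() {
    printf 'scp %s %s@%s:~/\n' "$JAR" "$REMOTE_USER" "$REMOTE_HOST"
}

# Command that opens a terminal session on the remote machine.
login_cmd() {
    printf 'ssh %s@%s\n' "$REMOTE_USER" "$REMOTE_HOST"
}

copy_cmd
login_cmd
```

Run the printed "scp" line first so the jar is already there when you log in.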

Once there, use "ls" to check that your jar was copied over correctly.
Then make sure you are using the right version of Java with "java -version". If you aren't, use the "module" command to switch to Java 1.7.
Note the name of the actual machine you connected to by running "hostname". You will need it to get back to your crawler later.
Then run the "screen" command so that you can start a long-running process.
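The checks you do before starting "screen" could be scripted roughly as follows. This is only a sketch: the helper name is_java_17 is made up for illustration, and I don't know the exact name of the Java 1.7 module on openlab, so the hint just points you at "module avail" to find it.

```shell
#!/bin/sh
# Succeeds if a `java -version` banner line reports a 1.7 runtime.
# (is_java_17 is a hypothetical helper name.)
is_java_17() {
    case "$1" in
        *'version "1.7.'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Confirm the jar is present in the current directory.
ls -l crawler4openlab.jar 2>/dev/null || echo "jar not found here"

# java -version writes to stderr, so redirect it into the pipe.
banner=$(java -version 2>&1 | head -n 1)

if is_java_17 "$banner"; then
    echo "Java 1.7 is active"
else
    echo "Not Java 1.7: run 'module avail' to find the Java 1.7 module, then 'module load' it"
fi

# Note which machine you are on; you need this name to reattach later.
hostname 2>/dev/null || uname -n
```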

"screen" opens a virtual terminal that keeps running after you log out.
Then use the command "java -Xmx1024M -jar crawler4openlab.jar" to start your crawler. The -Xmx option sets the maximum heap size Java may use while crawling, here 1024 MB. You shouldn't need much because the frontier is kept on disk.
Update: ICS support tells me that we should be using "nice -n 19 java -Xmx1024M -jar crawler4openlab.jar". The "nice -n 19" at the beginning gives our long-running processes the lowest scheduling priority, so they defer to other short-running processes when there aren't enough CPUs to go around.
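If you would rather keep the launch command in a small script than retype it, something like the sketch below might work. The log file name crawler.log and the variable names are my own inventions. The script only prints the command; the line that would actually start the crawler is left as a comment.

```shell
#!/bin/sh
# Sketch of a launcher; run it from inside your screen session.
JAR="crawler4openlab.jar"   # jar name used on this page
HEAP="1024M"                # maximum Java heap size, passed to -Xmx
LOG="crawler.log"           # hypothetical log file name

# Build the launch command as text so it can be inspected first.
launch_cmd() {
    printf 'nice -n 19 java -Xmx%s -jar %s' "$HEAP" "$JAR"
}

echo "Would run: $(launch_cmd)"

# To really start the crawler and keep a copy of its output in $LOG:
#   nice -n 19 java -Xmx"$HEAP" -jar "$JAR" 2>&1 | tee "$LOG"
```

Keeping a log file means you can check on progress with "tail" without reattaching to the screen.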

Once you are happy everything is going okay, detach from the screen by pressing Ctrl-A and then "d".
This will send you back to the real terminal.
If you log out with the "exit" command, your crawler keeps running. It is still your job to monitor it.
To go back to it, log directly into the same host you were using, the one whose name you noted with "hostname".

Once on the same host, type "screen -r" and you will reattach to the virtual screen that your crawler is running in. From there you can look at any output you are generating. Be careful: hitting Ctrl-C will kill your process. Press Ctrl-A and then "d" to detach again, as often as you like.
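The whole return trip can be summarized as three commands. The sketch below prints them for a given login and host so you can paste them one at a time. Both values in the example call are placeholders, and the helper name reattach_steps is made up for illustration.

```shell
#!/bin/sh
# Print the reattach steps for a given account and host.
# Nothing here contacts a remote machine.
reattach_steps() {
    user=$1
    host=$2
    printf 'ssh %s@%s\n' "$user" "$host"
    printf 'screen -ls\n'   # lists your screen sessions on that host
    printf 'screen -r\n'    # reattaches when exactly one session is detached
}

# Placeholder values: use your own login and the host name you noted.
reattach_steps "yourlogin" "noted-host.example.edu"
```

If "screen -ls" shows no sessions, you are most likely on a different machine than the one where you started the crawler.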