Running your jar on openlab

  • Once you have tested your crawler on your local machine in Eclipse, export your project as a "Runnable Jar". A runnable jar packages all the required libraries into one big .jar file so you don't have to move them separately. That's convenient, but repackaging third-party libraries this way may conflict with their license terms, so don't do it with code you plan on distributing.
  • During the export, Eclipse asks for a launch configuration. It wants to know which "main" the runnable jar should start from. If you tested your program in Eclipse, it should appear as an option in the dropdown.
  • Make sure you update any file paths in your program to match the openlab file hierarchy. It differs from the one on your local machine.
  • Screenshot of openlab setup
  • Now you have to move the jar you just made to openlab
  • On a Mac you can do that with "sftp" from a Terminal. On Windows use WinSCP.
  • Screenshot of openlab setup
  • Now you need to open a terminal window on the openlab cluster of machines.
  • I use "ssh" to log in to the remote machine. If you are on Windows you will need to use PuTTY.
  • Screenshot of openlab setup
  • Once there, use "ls" to check that your jar arrived okay.
  • Then make sure you are using the right version of Java. If you aren't, use the "module" command to switch to Java 1.7.
  • Run "hostname" and note the name of the actual machine you connected to. You will need it later to get back to your crawler.
  • Then run the "screen" command so that you can start a long-running process.
  • Screenshot of openlab setup
  • "screen" will open up a virtual terminal that you can leave running after you log out
  • Then use the command "java -Xmx1024M -jar crawler4openlab.jar" to start your crawler. The -Xmx option sets the maximum amount of heap memory Java may use while crawling. You shouldn't need too much because the frontier is kept on disk.
  • Update: ICS support tells me that we should be using "nice -n 19 java -Xmx1024M -jar crawler4openlab.jar". The "nice -n 19" at the beginning runs the crawler at the lowest priority, so our long-running processes defer to other short-running processes when there aren't enough CPUs to go around.
  • Screenshot of openlab setup
  • Once you are happy everything is going okay, detach from the screen with Ctrl-A then "d"
  • This will send you back to the real terminal
  • If you log out with the "exit" command, your crawler keeps running. You need to make sure you are monitoring it.
  • To go back to it, log directly into the host that you were using:
  • Screenshot of openlab setup
  • Once on the same host, type "screen -r" and you will reattach to the virtual screen that your crawler is running in. There you can look at any output that you are generating. Be careful: hitting Ctrl-C will kill your process. Hit Ctrl-A then "d" to detach again, as often as you like.
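
The steps above can be collected into a small dry-run script that only prints each command in order, so you can check them before typing them yourself. This is a sketch, not the official procedure: REMOTE_USER and REMOTE_HOST are hypothetical placeholders, and it uses "scp" in place of the interactive "sftp" session described above because scp fits on one line. The jar name and the -Xmx value are the ones used in the steps above.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for each step instead of running them.
# REMOTE_USER and REMOTE_HOST are hypothetical placeholders, not real values.
REMOTE_USER="yourname"
REMOTE_HOST="openlab.example.edu"
JAR="crawler4openlab.jar"   # jar name used in the steps above
HEAP="1024M"                # heap cap passed to -Xmx in the steps above

# Local machine: copy the jar to your remote home directory.
# (scp is an assumption here; the steps above use an sftp session.)
upload_cmd() {
    printf 'scp %s %s@%s:\n' "$JAR" "$REMOTE_USER" "$REMOTE_HOST"
}

# Local machine: open a shell on the remote machine.
login_cmd() {
    printf 'ssh %s@%s\n' "$REMOTE_USER" "$REMOTE_HOST"
}

# Inside screen on the remote machine: start the crawler at the lowest
# scheduling priority with a capped heap.
crawler_cmd() {
    printf 'nice -n 19 java -Xmx%s -jar %s\n' "$HEAP" "$JAR"
}

upload_cmd
login_cmd
echo 'ls'             # confirm the jar arrived
echo 'java -version'  # check which Java version is active
echo 'hostname'       # note which machine you landed on
echo 'screen'         # start a virtual terminal
crawler_cmd
echo 'screen -r'      # later, from the same host: reattach
```

Printing the commands rather than executing them keeps the script safe to run anywhere. Detaching with Ctrl-A then "d" is a keystroke inside screen, so it cannot be scripted this way and is left out.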