To work in my group, you need to demonstrate yourself to be significantly above average among your peers: you need to undertake one of the following challenges and submit the results to me. If they're not correct... no problem, I give second chances. You're also allowed to ask questions, but try to keep them to a minimum. The primary criterion by which you will be judged is how well you can perform, and then write up, these tasks independently, without much help. Note to potential grad students: Pay attention to the extra work required in each of the challenges below if you're applying to be my grad student.
In all cases you need to submit a PDF write-up, with histograms or figures plotted. I don't need your code. I want to see your write-up, including a description of what you did, and why, and your results with commentary. THIS IS JUST AS MUCH A TEST OF YOUR COMMUNICATION SKILLS AS OF YOUR CODING. Doing research requires critical thinking and the ability to explain your rationale for what you did AND WHY. Without that, your code or blind results are worthless.
I have several projects running. Three of them are described in the PowerPoint presentation here. You only need to do ONE of the below challenges, depending upon which project you're interested in working on. You need to do the task, and then write it up nicely, with graphs or plots to illustrate your answer. You should be able to do the task within about a week at most, but the answer has to be GOOD. If you hand in a GOOD solution later, that's better than a crappy solution earlier. In other words, a good solution is required, but faster is better than slower.
Let me know if there are any ambiguities, but your job is to do this task with as little supervision from me as is possible.
Please direct all questions to me at firstname.lastname@example.org.
The left side shows frames from a real video of a growing bacterial colony; the right side shows our algorithm tracking the growth and motion of each individual bacterium during its whole life cycle, from being born, moving, and growing, to splitting into two daughter cells. Biologists need to track cells in video frames for many purposes, including tracking the growth of cancer cells, learning about the growth of embryos, learning how bacteria move, learning how genetic changes to a cell result in functional changes during its lifetime... it's a huge research area. Although several cell tracking algorithms already exist, we are working on a novel approach that seems to have several advantages. In order to join this project, your task is to take the above animated GIF, automatically estimate the number of bacteria in each of the frames, and produce a text file containing nothing but one integer per line: the count for that frame. The number of lines should equal the number of frames. You only need to use one of the two sides; I'd recommend you use the right side (red lines on an otherwise black and grey image are easy to isolate). You can use any language you want, and any method you want, as long as it's automatic. Describe your algorithm and the output, and send your PDF write-up to me by email. Extra work for grad students: You must create two algorithms, one for each side of the above image. Compare the results and explain any differences.
1) If you want to do the biological network alignment project, you need to know what a graph is and how to work with graphs, especially how to code with them. Your task is the following: you're given a text file representing a network. The first line of the file is N, the number of nodes. You will name the nodes from 0 through N-1. The remaining lines will have two integers per line, representing an edge. You don't know in advance how many edges there are; you just keep reading until you reach end of file. When you are done, you are to compute the number of CONNECTED COMPONENTS in the graph, and output a single integer. Below I provide some sample inputs. I don't care what language you use. In addition, in your write-up, include a histogram of the distribution of DEGREES of nodes. That is, how many nodes have degree zero, degree 1, etc., up to the max degree. If you don't know any of these terms, look them up. The data for this project is here. The graphs are undirected. That means each edge can only exist in your graph once, even if it is listed multiple times (or with the node endpoints reversed) in the input file. Extra work for grad students: Treat the graphs as directed, and count the strongly connected and the weakly connected components. In addition, read the GRAAL paper and count the graphlets of size 2 and 3 in all the networks.
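To make the input format and the two requested outputs concrete, here is a minimal sketch in Python. It is one possible approach (union-find), not a reference solution. The function name `analyze_graph` is made up for illustration, and two things are my assumptions rather than part of the task statement: the integers are separated by whitespace, and self-loops (an edge from a node to itself) are ignored.

```python
from collections import Counter


def analyze_graph(text):
    """Return (number of connected components, degree histogram).

    The histogram is a list where entry d is the number of nodes of degree d.
    Assumptions (not from the task statement): integers are
    whitespace-separated, and self-loops are skipped.
    """
    tokens = text.split()
    n = int(tokens[0])                      # first line: number of nodes
    values = [int(t) for t in tokens[1:]]   # remaining integers, two per edge

    # Store each undirected edge once, no matter how often or in which
    # orientation it appears in the input.
    edges = set()
    for u, v in zip(values[0::2], values[1::2]):
        if u != v:
            edges.add((min(u, v), max(u, v)))

    parent = list(range(n))                 # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    degree = [0] * n
    components = n                          # every node starts on its own
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v         # merge two components
            components -= 1

    tally = Counter(degree)
    histogram = [tally.get(d, 0) for d in range(max(degree, default=0) + 1)]
    return components, histogram
```

For example, `analyze_graph("5\n0 1\n1 0\n1 2\n")` treats `0 1` and `1 0` as the same edge. A breadth-first or depth-first search over an adjacency list would work just as well; union-find is used here only because it processes edges as they are read.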
2) If you want to work in the Galaxy Image Analysis project, then you should start by playing around with any galaxy images you find on the web and putting them into the SpArcFiRe webpage. Once you get the hang of it, you have two choices:
(A) Find an image of NGC5054, or take the one from my paper with Darren Davis (cited on the above web page), and try to find a set of SpArcFiRe parameters that can find the "dim" arm on the right-hand side of the image of that galaxy in the above paper.
(B) Go get the following file: here. Each row is some data about a galaxy, and the columns have names in the top row. You don't need to know what all of the columns mean, but pay attention to these ones: P_CS: the probability that this galaxy is a spiral. numDcoArcsGEXXX for various values of XXX: the number of arms in that spiral galaxy that are longer than XXX. Your task is to plot a histogram of the number of galaxies with N or more arms of length XXX, for each of the XXX values in the file. It would be best to plot all the histograms on one figure so that they are easy to compare to each other. Extra work for grad students: Tell me about your astronomy and/or physics background.
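As a rough illustration of what "number of galaxies with N or more arms" means for choice (B), here is a small Python sketch. Several things in it are assumptions, not facts about the real file: that the file is comma-separated, that blank cells should be skipped, and the column suffixes `000` and `050` in the usage note, which are invented. The helper names and the optional `min_p_cs` cut-off are also hypothetical; the task does not say whether or how to filter on P_CS.

```python
import csv
import io


def arm_counts(rows, column, min_p_cs=0.0):
    """Collect the arm count in `column` for every galaxy with P_CS >= min_p_cs.

    `min_p_cs` is a hypothetical filter; the default keeps every galaxy.
    Rows with a blank value in either column are skipped (an assumption).
    """
    counts = []
    for row in rows:
        if row[column] == "" or row["P_CS"] == "":
            continue
        if float(row["P_CS"]) >= min_p_cs:
            counts.append(int(float(row[column])))
    return counts


def at_least_histogram(counts):
    """Entry N of the result is the number of galaxies with N or more arms."""
    top = max(counts, default=0)
    return [sum(1 for c in counts if c >= n) for n in range(top + 1)]


def arm_columns(fieldnames):
    """Pick out every numDcoArcsGEXXX column, whatever the XXX values are."""
    return [name for name in fieldnames if name.startswith("numDcoArcsGE")]
```

A usage sketch on made-up data: `reader = csv.DictReader(io.StringIO("P_CS,numDcoArcsGE000,numDcoArcsGE050\n0.9,3,1\n0.2,2,0\n"))`, then `rows = list(reader)`, then call `at_least_histogram(arm_counts(rows, name))` for each `name` in `arm_columns(reader.fieldnames)`. Each resulting list can be drawn as one series on a shared figure with whatever plotting tool you prefer.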
3) If you want to work on the global warming project, then you should start by going to the website http://issm.jpl.nasa.gov and seeing what the project is all about. The ISSM architecture is briefly described in this PPT file. Then pick a challenge below: