This is a work in progress. All comments are appreciated!

Random Uniform CSP Generators

Many CSP researchers around the world use random uniform instances to evaluate their Constraint Satisfaction Algorithms. Although it is generally agreed that the ultimate test of a CSP algorithm is its performance on "real world" (read, economically important) problems, there is also widespread consensus on the value of "laboratory" experimentation on random problems.

Random problems offer the following advantages for empirically evaluating the performance of CSP algorithms:

  1. Large quantities can be generated, so that statistically significant means and variances can be reported.
  2. It is easy to vary systematically the parameters of the generator and thus to observe how an algorithm's performance relates to, for example, the number of constraints.
  3. It is easy to find parameters which generate problems of which 50% are soluble; on average such problems are particularly difficult and thus tend to highlight differences in algorithm performance.
  4. A fourth benefit of random problems has not been much realized. For several reasons, using random problems should permit the easy interchange of problems among experimenters:
    1. These problems embody no trade secrets or sensitive corporate information.
    2. No specialized domain knowledge is required to understand them.
    3. They can be succinctly specified by an algorithm and a random number seed.
The goal of this Web page is to promote benefit #4 by providing CSP researchers with a simple, compact program which can be used to generate uniform, random, binary CSPs. If the parameters to the function (written in C) are specified in a paper, then other workers will be able to generate the same instances on which to test their own algorithms.

The random problem model

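Although an infinite number of random CSP instance generating models might be imagined, in practice most workers in the last few years have used a simple one, which takes four parameters:

  1. N, the number of variables.
  2. D, the number of values in each variable's domain.
  3. C, the number of constraints.
  4. T, the number of incompatible value pairs in each constraint.

The C code below generates random instances based on this model.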
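As a rough illustration of the model, the four parameters (called N, D, C, and T in the code further down) can be bundled and range-checked before any instance is generated. The names UrbParams and urb_params_valid are hypothetical and are not part of urbcsp.c; the bounds simply mirror the checks made at the top of MakeURBCSP().

```c
#include <assert.h>

/* Hypothetical container for the four model parameters. */
typedef struct {
    int n_vars;        /* N: number of variables */
    int domain_size;   /* D: values per variable */
    int n_constraints; /* C: number of binary constraints */
    int n_nogoods;     /* T: incompatible value pairs per constraint */
} UrbParams;

/* Returns 1 if the parameters describe a generatable instance, else 0.
   The bounds mirror the checks made in MakeURBCSP(). */
int urb_params_valid(const UrbParams *p)
{
    assert(p != 0);
    if (p->n_vars < 2 || p->domain_size < 2)
        return 0;
    /* At most one constraint per unordered pair of variables. */
    if (p->n_constraints < 0 ||
        p->n_constraints > p->n_vars * (p->n_vars - 1) / 2)
        return 0;
    /* At least one pair is forbidden and at least one stays allowed. */
    if (p->n_nogoods < 1 ||
        p->n_nogoods > p->domain_size * p->domain_size - 1)
        return 0;
    return 1;
}
```

For example, N = 4 admits at most 6 constraints, so a request for 7 would be rejected.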

The implementation

The C program was designed to be easy to use and to make problems easy to replicate. How each of our design goals was achieved is discussed below.

Truly Uniform

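Every possible constraint (that is, every unordered pair of distinct variables) has an equal chance of being chosen; likewise, within each constraint, every pair of values has an equal chance of being made incompatible. Both selections are made without replacement, so no constraint or value pair is picked twice.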
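The selection logic described in the comments of MakeURBCSP() is a partial shuffle: to pick m of n items uniformly, swap each of the first m positions with a randomly chosen position at or after it. The sketch below shows that idea in isolation. The function name choose_m_of_n and the small stand-in generator sketch_uniform are assumptions made for this example only; the real program draws its numbers from ran2().

```c
#include <assert.h>

/* Stand-in uniform source returning a double in [0, 1). The real program
   uses ran2(); this small LCG is an assumption made only for the sketch. */
static unsigned long sketch_state = 12345UL;
static double sketch_uniform(void)
{
    sketch_state = (sketch_state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return (double) sketch_state / 2147483648.0;
}

/* Move m uniformly chosen elements of t[0..n-1] into t[0..m-1]. */
void choose_m_of_n(int *t, int n, int m)
{
    int i, r, tmp;
    assert(m >= 0 && m <= n);
    for (i = 0; i < m; ++i) {
        /* r is an index in [i, n-1]; each is equally likely. */
        r = i + (int) (sketch_uniform() * (n - i));
        tmp = t[r];
        t[r] = t[i];
        t[i] = tmp;
    }
}
```

Because elements are only swapped, the array remains a permutation of its original contents, and the first m slots never contain a repeated element.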

Random Numbers

The C code includes an explicitly specified pseudo-random number generator. Relying on the rand() or random() functions defined by an operating system or local library greatly reduces the likelihood that another researcher will be able to duplicate the instances. We use a routine, ran2, from the well-known Numerical Recipes in C, by William H. Press et al.; the discussion there is well worth reading. A brief quotation: "We think that, within the limits of its floating-point precision, ran2 provides perfect random numbers" (p. 281). This excellent book is available through the WWW at http://nr.harvard.edu/nr/bookc.html. The specific section concerning ran2, Section 7.1, is in http://cfatab.harvard.edu/nr/bookc/c7-1.ps (postscript).
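One reason a generator like ran2 is portable is that its multiply-and-reduce steps are arranged so that no intermediate value exceeds 32 signed bits (a technique usually called Schrage's method). The sketch below isolates one such step using the constants IA1, IQ1, IR1, and IM1 from urbcsp.c. The function name schrage_step and the use of int32_t are assumptions for illustration; this is not a replacement for ran2 itself.

```c
#include <assert.h>
#include <stdint.h>

/* One step of x -> (40014 * x) mod 2147483563 using only 32-bit signed
   arithmetic, as in the first generator inside ran2(). This works because
   2147483563 = 40014 * 53668 + 12211 and 12211 < 53668. */
int32_t schrage_step(int32_t x)
{
    const int32_t a = 40014, q = 53668, r = 12211, m = 2147483563;
    int32_t k;
    assert(x > 0 && x < m);
    k = x / q;
    x = a * (x - k * q) - k * r;   /* stays within 32 signed bits */
    if (x < 0)
        x += m;
    return x;
}
```

The result can be checked against a direct 64-bit computation of (40014 * x) % 2147483563 on any machine that has a 64-bit integer type.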

Easy Duplication

Usually researchers want to select a seed and then generate a large number, say 100 or 1000, of CSPs. The same sequence of problems can be regenerated later by using the same seed. It may also be necessary to refer to a particular instance in the sequence, and saying "The 768th instance in the series generated with seed 7840364" is unwieldy and requires recreating (although not solving) 767 unneeded instances to create the one of interest.

The solution we adopt is to give each instance in a sequence (after the first) its own unique seed. After each instance is created, the random number generator is called one additional time to generate a number that will be used as the seed for the next instance. Since each instance is created with random numbers that start from the beginning of the sequence defined by a particular seed, each instance can be created by using its own particular seed. Thus it is possible to write "The only instance we could not solve within our 24 CPU-hour time bound was the 768th in the series, which has seed 1996453." Subsequently, it will be easy to recreate the one instance of interest.
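The seed-chaining scheme can be sketched with a toy generator whose entire state is its seed. Everything here (toy_next, make_instance, and the checksum standing in for a CSP) is hypothetical and much simpler than ran2 and MakeURBCSP(); the point is only that an instance rebuilt from its own seed matches the one produced in sequence.

```c
#include <assert.h>

/* Toy generator standing in for ran2(): its whole state is the seed. */
static unsigned long toy_next(unsigned long *state)
{
    *state = (*state * 1103515245UL + 12345UL) & 0x7fffffffUL;
    return *state;
}

/* Build one toy "instance" (here just a checksum of `draws` random
   numbers) from `seed`, and report the seed of the following instance. */
unsigned long make_instance(unsigned long seed, int draws,
                            unsigned long *next_seed)
{
    unsigned long state = seed, sum = 0;
    int i;
    assert(draws >= 0 && next_seed != 0);
    for (i = 0; i < draws; ++i)
        sum += toy_next(&state);
    *next_seed = toy_next(&state);   /* one extra call: next instance's seed */
    return sum;
}
```

Generating five instances in a row and then calling make_instance again with the recorded seed of the fourth reproduces the fourth instance without touching the first three.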

Implementation Notes

The code supplied is not meant to be sacred; it is just a useful tool. Feel free to make any modifications necessary, as long as the changes do not alter the CSPs which are produced. In general, in writing the program, we tried first to adhere to the four principles described above. The second concern was clarity; hopefully, the program is easy to read and understand. Efficiency was a lesser concern, and you may well see several ways to make the program run faster (we certainly do!).

primitive sizes

Several parts of the code rely on the length of a long being 4 bytes. If that's not what your compiler produces, you'll have to substitute something else for long. In contrast, the length of an int is not important.
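Where the compiler supports <stdint.h>, one possible substitute is a fixed-width alias. The sketch below is an assumption rather than part of urbcsp.c: the names urb_long, urb_ulong, plain_long_is_32_bits, and urb_check_widths are hypothetical, and whether a modified program still reproduces the reference instances should be confirmed with the verification run described later.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical aliases to use wherever urbcsp.c says long or
   unsigned long; both are exactly 32 bits wide. */
typedef int32_t  urb_long;
typedef uint32_t urb_ulong;

/* Returns 1 when the platform's plain long already has the assumed width. */
int plain_long_is_32_bits(void)
{
    return sizeof(long) * CHAR_BIT == 32;
}

/* Aborts early (in debug builds) if the aliases are not 32 bits wide. */
void urb_check_widths(void)
{
    assert(sizeof(urb_long) * CHAR_BIT == 32);
    assert(sizeof(urb_ulong) * CHAR_BIT == 32);
}
```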

zero-based numbering

As is typical in C programs, numbering starts from 0. Thus in a CSP with 100 variables, they are numbered from 0 to 99. If there are 8 values, they are numbered from 0 to 7. You can easily change the ranges by adding a constant in the calls to AddConstraint and AddNogood.
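A minimal sketch of such a shift follows, assuming a hypothetical constant FIRST_INDEX and a hypothetical helper format_constraint that writes to a buffer instead of printing. Here the constant is added where the numbers are formatted; adding it at the call site has the same effect.

```c
#include <assert.h>
#include <stdio.h>

#define FIRST_INDEX 1   /* hypothetical constant: 1 gives 1-based output */

/* Format a constraint header with shifted variable numbers. Writes into
   buf and returns the number of characters written. */
int format_constraint(char *buf, size_t size, int var1, int var2)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%3d %3d: ",
                    var1 + FIRST_INDEX, var2 + FIRST_INDEX);
}
```

With FIRST_INDEX set to 1, variables 0 and 99 are reported as 1 and 100.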

malloc and free

Depending on how smart your compiler is, you may not want to malloc and free the CTarray and NGarray fields on every call.
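One way to avoid the repeated allocation is to keep a buffer between calls and grow it only when a larger one is needed. The helper below is a sketch under that assumption; get_scratch is a hypothetical name, and the caller is assumed to hold the cached pointer and capacity (for example in static variables).

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: return a buffer holding at least `count` unsigned
   longs, growing a cached allocation only when it is too small.
   Returns NULL if growing fails; the old buffer then remains valid. */
unsigned long *get_scratch(unsigned long **cache, size_t *capacity,
                           size_t count)
{
    assert(cache != NULL && capacity != NULL);
    if (count > *capacity) {
        unsigned long *grown =
            (unsigned long *) realloc(*cache, count * sizeof **cache);
        if (grown == NULL)
            return NULL;
        *cache = grown;
        *capacity = count;
    }
    return *cache;
}
```

A later request for a smaller or equal size returns the same buffer untouched, so only the first call (or a call with larger parameters) pays for an allocation.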

randomly selecting disallowed value pairs

The program uses the random number generator to select the disallowed or illegal value pairs in each constraint. This is an arbitrary decision; it would be equally possible to select the allowed or valid pairs. If you use a data structure that stores the valid pairs, you'll have to do some intermediate processing.
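The intermediate processing could look like the following sketch: start each constraint with every value pair allowed, then clear one entry for each nogood the generator reports. The type AllowedPairs, its functions, and the bound MAX_D are hypothetical and chosen only to keep the example small.

```c
#include <assert.h>
#include <string.h>

#define MAX_D 16   /* hypothetical upper bound on domain size */

/* allowed[a][b] is 1 when the pair (a, b) satisfies the constraint. */
typedef struct {
    int d;
    unsigned char allowed[MAX_D][MAX_D];
} AllowedPairs;

/* Start with every pair allowed; call when a new constraint begins. */
void allowed_init(AllowedPairs *c, int d)
{
    assert(c != NULL && d >= 1 && d <= MAX_D);
    c->d = d;
    memset(c->allowed, 1, sizeof c->allowed);
}

/* Record one nogood reported by the generator. */
void allowed_forbid(AllowedPairs *c, int val1, int val2)
{
    assert(val1 >= 0 && val1 < c->d && val2 >= 0 && val2 < c->d);
    c->allowed[val1][val2] = 0;
}

/* Count the pairs that remain allowed within the d x d domain. */
int allowed_count(const AllowedPairs *c)
{
    int a, b, n = 0;
    for (a = 0; a < c->d; ++a)
        for (b = 0; b < c->d; ++b)
            n += c->allowed[a][b];
    return n;
}
```

In this arrangement allowed_init would be called from a replacement AddConstraint and allowed_forbid from a replacement AddNogood; with D = 5 and three distinct nogoods, 22 pairs remain allowed.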

Verification

If you implement urbcsp on your computer, undoubtedly making minor modifications, how do you know that your program is generating the same series as everyone else's? A validation suite is required; for now, note that the command urbcsp 100 10 10 10 100 100 will generate 100 (very easy) CSPs, and the last will have these constraints:
Instance 99 uses seed -2016279424
 40  68: (3 4) (2 9) (8 9) (5 2) (3 9) (6 9) (9 5) (7 4) (8 5) (3 1) 
 15  52: (8 4) (3 7) (5 3) (6 1) (7 8) (5 2) (5 1) (1 4) (8 7) (2 3) 
 68  71: (1 5) (0 5) (2 1) (8 9) (4 4) (8 3) (9 9) (2 5) (0 0) (0 6) 
 49  66: (4 5) (4 8) (9 7) (5 3) (9 5) (8 2) (1 2) (0 5) (0 3) (3 5) 
 80  89: (7 1) (8 5) (5 8) (7 2) (2 1) (7 8) (1 9) (8 8) (0 6) (0 7) 
 64  86: (7 3) (7 9) (4 9) (6 1) (2 8) (8 9) (8 4) (4 6) (0 2) (9 5) 
 21  26: (0 5) (4 0) (5 4) (4 1) (2 0) (7 1) (3 3) (8 8) (3 9) (2 1) 
 39  86: (7 1) (2 7) (1 2) (2 8) (2 9) (6 0) (4 1) (3 4) (3 6) (1 8) 
 12  53: (6 2) (3 9) (5 2) (4 5) (2 1) (9 8) (9 5) (5 6) (2 7) (8 2) 
 75  96: (1 4) (7 1) (8 0) (1 6) (2 8) (7 0) (8 3) (4 1) (0 1) (1 7) 
At a minimum, your program should duplicate this result.

Other generators

We welcome other random instance generators. It would probably make sense for other problem generators to use the same pseudo-random number generator and a similar scheme of assigning a new seed to each problem.

The code

Here is the code for urbcsp.c: (it's also available as a stand-alone file).
/* urbcsp.c -- generates uniform random binary constraint satisfaction problems
*/
#include <stdio.h>
#include <stdlib.h>   /* atoi(), malloc(), free() */
#include <math.h>

/* function declarations */
float ran2(long *idum);
int MakeURBCSP(int N, int D, int C, int T, long S);
void StartCSP(int N, int D, int instance, long seed);
void EndCSP();
void AddConstraint(int var1, int var2);
void AddNogood(int val1, int val2);

/*********************************************************************
  This file has 5 parts:
  0. This introduction.
  1. A main() function, which can be used to demonstrate MakeURBCSP().
  2. MakeURBCSP().
  3. ran2(), a random number generator.
  4. The four functions StartCSP(), AddConstraint(), AddNogood(), and 
     EndCSP(), which are called by MakeURBCSP().  The versions
     of these functions given here print out each instance, listing
     the incompatible value pairs of each constraint.  You will need
     to replace these functions with versions that mesh with your
     system and data structures.
*********************************************************************/


/*********************************************************************
  1. A simple main() function which reads in command line parameters
     and generates CSPs.
*********************************************************************/

int main(int argc, char* argv[])
{
  int N, D, C, T, I, i;
  long S;

  if (argc != 7)
    {
      printf("usage: urbcsp #vars #vals #constraints #nogoods seed "
	     "instances\n");
      return 0;
    }
	
  N = atoi(argv[1]);
  D = atoi(argv[2]);
  C = atoi(argv[3]);
  T = atoi(argv[4]);
  S = atol(argv[5]);   /* the seed is a long */
  I = atoi(argv[6]);

  /* Note that to generate I instances, MakeURBCSP is called once with
     the supplied seed, and then I-1 times with 0 instead of the seed. */

  if (!MakeURBCSP(N, D, C, T, S))
    return 0;
  for (i=1; i<I; ++i)
    if (!MakeURBCSP(N, D, C, T, 0))
      return 0;

  return 1;
}


/*********************************************************************
  2. MakeURBCSP() creates a uniform binary constraint satisfaction
     problem with a specified number of variables, domain size, 
     tightness, and number of constraints.  MakeURBCSP() calls
     four functions, StartCSP(), AddConstraint(), AddNogood(), and 
     EndCSP(), which actually create the CSP (that is, build a data 
     structure).  Feel free to change the signatures of these functions.
     Note that numbering starts from 0: the variables are numbered 0..N-1,
     and the values are numbered 0..D-1.

  INPUT PARAMETERS:
   N: number of variables
   D: size of each variable's domain
   C: number of constraints
   T: number of incompatible value pairs in each constraint
   S: seed; 0 means use seed generated from previous call to 
      MakeURBCSP().  The actual seed passed to ran2() is 
      negative; if S is positive its sign is reversed.
  RETURN VALUE:
      Returns 0 if there is a problem; 1 for normal completion.
*********************************************************************/

int MakeURBCSP(int N, int D, int C, int T, long S)
{
  static int  instance = 0;
  static long default_seed = -12345;
  static long next_seed;
  long seed;

  int PossibleCTs, PossibleNGs;
  unsigned long *CTarray, *NGarray;
  long selectedCT, selectedNG;
  int i, c, r, t;
  int var1, var2, val1, val2;

  /* Check for valid values of N, D, C, and T. */
  if (N < 2)
    {
      printf("MakeURBCSP: ***Illegal value for N: %d\n", N);
      return 0;
    }
  if (D < 2)
    {
      printf("MakeURBCSP: ***Illegal value for D: %d\n", D);
      return 0;
    }
  if (C < 0 || C > N * (N - 1) / 2)
    {
      printf("MakeURBCSP: ***Illegal value for C: %d\n", C);
      return 0;
    }
  if (T < 1 || T > ((D * D) - 1))
    {
      printf("MakeURBCSP: ***Illegal value for T: %d\n", T);
      return 0;
    }

  if (S == 0)    /* no seed specified */
    {
      if (instance == 0)       /* first instance, really should supply */
	seed = default_seed;   /* a seed, but just in case . . .       */
      else
	seed = next_seed;      /* this is the typical case             */
    }
  else          /* seed specified */
    seed = (S < 0 ? S : -S);   /* so use it, but it must be negative   */

  StartCSP(N, D, instance, seed);
  ++instance;
  
  /* The program has to choose randomly and uniformly m values from
     n possibilities.  It uses the following logic for both constraints
     and nogoods:
           1. Let t[] be an array of the n possibilities
	   2. for i = 0 to m-1
	   3.    r = random(i, n-1)    ; random() returns an int in [i,n-1]
	   4.    swap t[i] and t[r]
	   5. end-for
     At the end of the for loop, the elements from t[0] to t[m-1] are
     the m randomly selected elements.
   */

  /* Create an array for each possible binary constraint. */
  PossibleCTs = N * (N - 1) / 2;
  CTarray = (unsigned long*) malloc(PossibleCTs * sizeof(unsigned long));

  /* Create an array for each possible value pair. */
  PossibleNGs = D * D;
  NGarray = (unsigned long*) malloc(PossibleNGs * sizeof(unsigned long));

  /* Initialize the CTarray.  Each entry has one var in the high two
     bytes, and the other in the low two bytes. */
  i=0;
  for (var1=0; var1<(N-1); ++var1)
    for (var2=var1+1; var2<N; ++var2)
      CTarray[i++] = (var1 << 16) | var2;

  /* Select C constraints. */
  for (c=0; c<C; ++c)
    {
      /* Choose a random number between c and PossibleCTs - 1, inclusive. */
      r =  c + (int) (ran2(&seed) * (PossibleCTs - c)); 

      /* Swap elements [c] and [r]. */
      selectedCT = CTarray[r];
      CTarray[r] = CTarray[c];
      CTarray[c] = selectedCT;

      /* Broadcast the constraint. */
      AddConstraint((int)(CTarray[c] >> 16), (int)(CTarray[c] & 0x0000FFFF));

      /* For each constraint, select T illegal value pairs. */

      /* Initialize the NGarray. */
      for (i=0; i<(D*D); ++i)
	NGarray[i] = i;

      /* Select T nogoods. */
      for (t=0; t<T; ++t)
	{
	  /* Choose a random number between t and PossibleNGs - 1, inclusive.*/
	  r =  t + (int) (ran2(&seed) * (PossibleNGs - t));
	  selectedNG = NGarray[r];
	  NGarray[r] = NGarray[t];
	  NGarray[t] = selectedNG;

	  /* Broadcast the nogood. */
	  AddNogood((int)(NGarray[t] / D), (int)(NGarray[t] % D));
	}
    }
      
  EndCSP();
  free(CTarray);
  free(NGarray);
  next_seed = (long) (ran2(&seed) * -2147483646); 
  return 1;
}



/*********************************************************************
  3. This random number generator is from William H. Press, et al.,
     _Numerical Recipes in C_, Second Ed. with corrections (1994), 
     p. 282.  This is an excellent book and is available through the
     WWW at http://nr.harvard.edu/nr/bookc.html.
     The specific section concerning ran2, Section 7.1, is in
     http://cfatab.harvard.edu/nr/bookc/c7-1.ps
*********************************************************************/

#define IM1   2147483563
#define IM2   2147483399
#define AM    (1.0/IM1)
#define IMM1  (IM1-1)
#define IA1   40014
#define IA2   40692
#define IQ1   53668
#define IQ2   52774
#define IR1   12211
#define IR2   3791
#define NTAB  32
#define NDIV  (1+IMM1/NTAB)
#define EPS   1.2e-7
#define RNMX  (1.0 - EPS)

/* ran2() - Return a random floating point value between 0.0 and
   1.0 exclusive.  If idum is negative, a new series starts (and
   idum is made positive so that subsequent calls using an unchanged
   idum will continue in the same sequence). */

float ran2(long *idum)
{
  int j;
  long k;
  static long idum2;                            /* initialized below */
  static long iy = 0;
  static long iv[NTAB];
  float temp;

  if (*idum <= 0) {                             /* initialize */
    if (-(*idum) < 1)                           /* prevent idum == 0 */
      *idum = 1;
    else
      *idum = -(*idum);                         /* make idum positive */
    for (j = NTAB + 7; j >= 0; j--) {           /* load the shuffle table */
      k = (*idum) / IQ1;
      *idum = IA1 * (*idum - k*IQ1) - k*IR1;
      if (*idum < 0)
        *idum += IM1;
      if (j < NTAB)
	iv[j] = *idum;
    }
    iy = iv[0];
    idum2 = iv[NTAB/2];      /* Added for urbcsp so that a negative    */
  }                          /* idum always starts the same sequence.  */
      
  k = (*idum) / IQ1;
  *idum = IA1 * (*idum - k*IQ1) - k*IR1;
  if (*idum < 0)
    *idum += IM1;
  k = idum2/IQ2;
  idum2 = IA2 * (idum2 - k*IQ2) - k*IR2;
  if (idum2 < 0)
    idum2 += IM2;
  j = iy / NDIV;
  iy = iv[j] - idum2;
  iv[j] = *idum;
  if (iy < 1)
    iy += IMM1;
  if ((temp = AM * iy) > RNMX)
    return RNMX;                                /* avoid endpoint */
  else
    return temp;
}


/*********************************************************************
  4. An implementation of StartCSP, AddConstraint, AddNogood, and EndCSP 
     which prints out the CSP, just listing incompatible value pairs.
     Each constraint starts on a new line, and the id-numbers of the
     variables appear before the colon.  For instance, the output of
        urbcsp 10 5 5 3 1514849 10
     begins
        Instance 0 uses seed -1514849
        4   8: (4 4) (2 2) (0 3) (3 2) (4 1) 
        3   4: (3 4) (1 3) (3 0) (1 2) (0 2) 
        7   9: (2 4) (4 1) (0 2) (4 3) (3 1) 
*********************************************************************/

void StartCSP(int N, int D, int instance, long seed)
{
  printf("\nInstance %d uses seed %ld", instance, seed);
}

void AddConstraint(int var1, int var2)
{
  printf("\n%3d %3d: ", var1, var2);
}

void AddNogood(int val1, int val2)
{
  printf("(%d %d) ", val1, val2);
}

void EndCSP()
{
  printf("\n");
}