
 prefixspan --- An Implementation of PrefixSpan 

 Author: Taku Kudo <taku-ku@is.aist-nara.ac.jp>
         Nara Institute of Science and Technology, 
         Graduate School of Information Science, 
         Computational Linguistics Laboratory 

 License: GPL2 (GNU General Public License Version 2)

 Reference:
  J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M.-C. Hsu, 
  PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth
  Proc. 2001 Int. Conf. on Data Engineering (ICDE'01), Heidelberg, Germany, April 2001.
  http://www.cs.sfu.ca/~peijian/personal/publications/span.pdf

 Requirements:
   C++ compiler with STL (Standard Template Library).

 Install:
  % make 

 Usage:
     ./prefixspan [options] < data

     options:
         -m NUM:   set minimum support        (default: 1)
         -M NUM:   set minimum pattern length (default: 1)
         -s:       use STRING items (default: no; it is slow)
         -a:       print ALL patterns (default: no; print longest patterns only)
         -p:       print the list of transaction IDs
                   where the pattern occurs (default: no)
         -d STR:   use STR as delimiter between item and freq. (default: "/")
         -v:       set verbose mode
  
 * Format of input data:

  1 3 2 4
  3 2 4 6 
  1 3 4
 
  Each line corresponds to one transaction.
  Each transaction is a list of items separated by single spaces.
  For example, the first transaction has 4 items (1, 3, 2, 4).
  Note that each item must be represented by a positive integer.
  If you do not care about the sequential order of items,
  just sort the items of each transaction in numerical order, like:

  1 2 3 4
  2 3 4 6
  1 3 4
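
  The sorting step above can be sketched in Python as a small
  preprocessing helper. This is only an illustration and not part of
  prefixspan; the function name sort_items is hypothetical. Note that
  the items are compared as integers, not as strings, so that 9 sorts
  before 10.

```python
def sort_items(line):
    """Return one transaction with its integer items in ascending order."""
    # Convert to int first: a plain string sort would put "10" before "9".
    items = [int(token) for token in line.split()]
    return " ".join(str(item) for item in sorted(items))


# Example transactions in the input format described above.
transactions = ["1 3 2 4", "3 2 4 6", "1 3 4"]
for transaction in transactions:
    print(sort_items(transaction))
```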

  With the -s option, items can be arbitrary strings, like:

  foo bar do
  foo foo bar
  i you he
  she me 

  This representation is easy to use; however, it is not efficient.
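
  One possible way to avoid the slower string mode is to map each
  distinct string to a positive integer before running prefixspan,
  and to keep the mapping for decoding the results. The following
  Python sketch assumes exactly that workaround; it is not part of
  prefixspan, and the name encode_transactions is hypothetical.

```python
def encode_transactions(lines):
    """Replace string items by positive integer IDs.

    Returns the encoded lines and the item-to-ID mapping.
    """
    ids = {}
    encoded = []
    for line in lines:
        row = []
        for token in line.split():
            if token not in ids:
                # IDs start at 1 because items must be positive integers.
                ids[token] = len(ids) + 1
            row.append(str(ids[token]))
        encoded.append(" ".join(row))
    return encoded, ids


encoded, ids = encode_transactions(["foo bar do", "foo foo bar"])
for line in encoded:
    print(line)
```

  Inverting the mapping (ID back to string) recovers the original
  items from the integer results.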

 * Format of results:
 
  item/freq. item/freq. ...
  item/freq. item/freq. ...
  ..

  Here is an example:
  
  5/187 37613/113
  7/170 37613/134
  30/100
  74731/501 13/232 37613/108

  This result means:

   SEQUENTIAL PATTERN   : FREQUENCY
  5                     : 187 times
  5 -> 37613            : 113 times
  7                     : 170 times 
  7 -> 37613            : 134 times
  30                    : 100 times   
  74731 -> 13 -> 37613  : 108 times
  74731 -> 13           : 232 times
  74731                 : 501 times

  Each line represents a longest sequential pattern
  whose frequency is larger than minsup (-m option).
  Note that any prefix of such a pattern is also a sequential pattern.
  With the -M NUM option,
  patterns that have fewer than NUM items are not output.
  With the -d option, you can change the delimiter between item and freq. (default is "/").
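
  The reading of a result line described above (each item/freq. pair
  gives the frequency of the prefix ending at that item) can be
  sketched in Python as follows. This is an illustrative helper, not
  part of prefixspan; the name expand_prefixes and its delimiter
  parameter are hypothetical, the latter mirroring the -d option.

```python
def expand_prefixes(line, delimiter="/"):
    """Split one result line into (pattern, frequency) pairs, one per prefix."""
    items = []
    patterns = []
    for token in line.split():
        # Split on the last delimiter so an item containing it stays intact.
        item, freq = token.rsplit(delimiter, 1)
        items.append(item)
        patterns.append((tuple(items), int(freq)))
    return patterns


for pattern, freq in expand_prefixes("74731/501 13/232 37613/108"):
    print(" -> ".join(pattern), ":", freq, "times")
```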

  With the -p option, you can obtain the list of transaction IDs where
  each pattern occurs. Here is an example:

 <pattern>
 <what>5/187 18/10 37613/10</what>
 <where>54 141 218 264 295 472 768 839 900 931</where>
 </pattern>
 <pattern>
 <what>5/187 21/38 170/16 37613/16 37630/10 37664/10 37673/10</what>
 <where>857 867 885 903 910 944 949 973 981 986</where>
 </pattern>

  Each result is enclosed in a "<pattern>" tag.
  The pattern is in the "<what>" tag, and the transaction IDs are listed in the "<where>" tag.
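
  A minimal Python sketch for reading this tagged output is shown
  below. It assumes the output looks exactly like the example above
  (one "<what>" and one "<where>" per "<pattern>") and uses a regular
  expression rather than an XML parser, since the output has no single
  root element. The name parse_patterns is hypothetical and not part
  of prefixspan.

```python
import re

# One <pattern> block: a <what> element followed by a <where> element.
PATTERN_RE = re.compile(
    r"<pattern>\s*<what>(.*?)</what>\s*<where>(.*?)</where>\s*</pattern>",
    re.DOTALL,
)


def parse_patterns(text):
    """Return a list of (pattern_string, transaction_ids) tuples."""
    results = []
    for what, where in PATTERN_RE.findall(text):
        transaction_ids = [int(token) for token in where.split()]
        results.append((what.strip(), transaction_ids))
    return results


sample = """
 <pattern>
 <what>5/187 18/10 37613/10</what>
 <where>54 141 218 264 295 472 768 839 900 931</where>
 </pattern>
"""
for what, transaction_ids in parse_patterns(sample):
    print(what, "occurs in", len(transaction_ids), "transactions")
```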

  

  


  
