# gamglm, Bayesian Gamma generalized linear model.

Daichi Mochihashi

The Institute of Statistical Mathematics

gamglm is C++ software for fitting a Gamma generalized linear model with a
*huge number of binary features*, on the order of thousands.

Because the Gamma generalized linear model is not convex in its parameters,
ordinary optimization (such as L-BFGS) tends to get stuck when the number of
features is huge. This software instead employs a simple MCMC algorithm
and an efficient data structure for inference.

### Model

For each datum n = 1 .. N,
we are given a binary explanatory vector f[n] (f[n][k]=1 when feature k
appears in datum n, else 0).

Observation y[n] ~ Gamma(a[n], b[n]), where
- a[n] = exp(w_{a}^{T} f[n]),
- b[n] = exp(w_{b}^{T} f[n]).

We always implicitly assume f[n][1]=1, so that the first weight acts as the
bias parameter of the linear regressions above.
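
As an illustration of the two link functions above, here is a minimal Python sketch. Because f[n] is binary, each inner product reduces to a sum of the weights of the features present in the datum. The names `gamma_params`, `w_a`, `w_b` and `BIAS` are hypothetical and are not part of gamglm.

```python
import math

BIAS = "__bias__"  # hypothetical name for the always-on bias feature


def gamma_params(features, w_a, w_b):
    """Return (a, b) for one datum, given the names of its active binary features.

    w_a and w_b map feature names to weights; features without a weight count as 0.
    """
    active = set(features) | {BIAS}  # the bias feature is implicitly present
    a = math.exp(sum(w_a.get(k, 0.0) for k in active))
    b = math.exp(sum(w_b.get(k, 0.0) for k in active))
    return a, b
```

For example, with made-up weights `w_a = {BIAS: 0.0, "x": math.log(2.0)}` and `w_b = {BIAS: math.log(3.0)}`, a datum containing only feature `x` gets a = 2 and b = 3.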

### Download

Compilation requires Boost; the software was developed with g++ 4.8.2.
To install the software, take a glance at `Makefile` and type `make`.

### Usage

    % gamglm -h
    gamglm: Bayesian Gamma generalized linear model.
    usage: gamglm [-I iter] [-e eps] [-s sigma] TRAIN MODEL

Options are:
- `-I iter`: number of MCMC iterations. (default 1)
- `-e eps`: standard deviation of the Gaussian random walk. (default 0.2)
- `-s sigma`: standard deviation of the L2 regularization of weights. (default 0.1)
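
To give a sense of how `eps` and `sigma` could enter such a sampler, here is a rough Python sketch of a random-walk Metropolis sweep. It is not gamglm's actual implementation: it assumes that b is a scale parameter, that the L2 regularization corresponds to an independent N(0, sigma^2) prior on every weight, and that weights are updated one at a time. All function names are hypothetical.

```python
import math
import random


def log_gamma_pdf(y, a, b):
    """Log density of Gamma(shape=a, scale=b) at y > 0."""
    return (a - 1.0) * math.log(y) - y / b - a * math.log(b) - math.lgamma(a)


def log_posterior(w_a, w_b, data, sigma):
    """Unnormalized log posterior: Gamma likelihood plus an assumed N(0, sigma^2) prior."""
    lp = 0.0
    for y, feats in data:  # feats: indices of the features active in this datum
        a = math.exp(sum(w_a[k] for k in feats))
        b = math.exp(sum(w_b[k] for k in feats))
        lp += log_gamma_pdf(y, a, b)
    lp -= sum(w * w for w in w_a + w_b) / (2.0 * sigma ** 2)
    return lp


def metropolis_sweep(w_a, w_b, data, eps=0.2, sigma=0.1, rng=random):
    """One sweep: perturb each weight by N(0, eps^2) in turn, then accept or reject."""
    current = log_posterior(w_a, w_b, data, sigma)
    for w in (w_a, w_b):
        for k in range(len(w)):
            old = w[k]
            w[k] = old + rng.gauss(0.0, eps)  # Gaussian random-walk proposal
            proposed = log_posterior(w_a, w_b, data, sigma)
            if math.log(1.0 - rng.random()) < proposed - current:
                current = proposed  # accept the move
            else:
                w[k] = old  # reject and restore the old weight
    return current
```

This naive version recomputes the full posterior for every proposal. With sparse binary features, only the data that contain feature k are affected by a change to its weight, which is presumably where an efficient data structure pays off.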

When the iterations are finished, the following model files are written:
- model.dic: dictionary of features. Internally, each feature is assigned an integer
corresponding to its line number.
- model.a: regression weights w_{a} of the Gamma regression for the shape
parameter.
- model.b: regression weights w_{b} of the Gamma regression for the scale
parameter.

### Data format

TRAIN consists of lines like

    y feature_1 feature_2 feature_3 .. feature_n

- y is a response variable of double precision.
- feature_n is a character string naming a (binary) explanatory variable,
i.e. a feature. Each line may contain a different number of features.
- Tokens are separated by white space.

For a concrete example, see `test.dat` included in the package.
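
To make the format concrete, the following Python sketch splits one such line into the response and its feature strings. The function name `parse_line` is hypothetical, and skipping blank lines is an assumption about how such input might reasonably be handled, not documented gamglm behavior.

```python
def parse_line(line):
    """Split one TRAIN/TEST line into (y, [feature, ...]); return None for a blank line."""
    tokens = line.split()  # any run of white space separates tokens
    if not tokens:
        return None
    return float(tokens[0]), tokens[1:]
```

For instance, the made-up line `3.2 rain monday north` would yield the response 3.2 and the three features `rain`, `monday` and `north`.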

### Prediction

To predict with learned parameters, invoke gamglm-predict, which is
also included in the package:

    % gamglm-predict
    usage: gamglm-predict TEST MODEL

TEST is a data file in the same format as the training data,
but the target variable y is not used and can be any number (such as -1).
The predictions of a and b are written to stdout:

    % gamglm-predict test.dat model
    -1 0.998879 1.026384
    -1 1.108263 0.733772
    -1 1.229099 0.723988
    -1 1.187355 0.708005
    -1 1.290324 0.675810
    -1 1.310694 0.556131

The second column is the predicted parameter a and the third column is the
predicted parameter b.

You can then plug the predicted a and b into the Gamma(a, b) distribution.
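
As one way to use these predictions, the Python sketch below parses an output line of gamglm-predict and computes the mean and variance of the corresponding Gamma distribution. It assumes b is a scale parameter, as the description of model.b suggests; if b were a rate instead, the mean would be a / b. The function names are hypothetical.

```python
def parse_prediction(line):
    """Parse one three-column output line of gamglm-predict into (a, b)."""
    _, a, b = line.split()  # the first column is ignored here
    return float(a), float(b)


def gamma_summary(a, b):
    """Mean and variance of Gamma(shape=a, scale=b)."""
    return {"mean": a * b, "variance": a * b * b}
```

Applied to the first output line above, `gamma_summary(*parse_prediction("-1 0.998879 1.026384"))` gives the predicted mean as the product of the two parameters.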

