gamglm, Bayesian Gamma generalized linear model.

Daichi Mochihashi
The Institute of Statistical Mathematics
$Id: index.html,v 1.1 2014/10/27 11:02:42 daichi Exp $

gamglm is C++ software for fitting a Gamma generalized linear model with a large number of binary features, on the order of several thousand.
Because the Gamma generalized linear model is not convex in its parameters, ordinary optimization (such as L-BFGS) can get stuck when the number of features is large. This software instead employs a simple MCMC algorithm and an efficient data structure for inference.

Model

For each data point n = 1 .. N, we are given a binary explanatory vector f[n] (f[n][k]=1 when feature k appears in data point n, and 0 otherwise).
The observation is distributed as y[n] ~ Gamma(a[n], b[n]), where

a[n] = exp(w_a^Tf[n]),
b[n] = exp(w_b^Tf[n]).

We always implicitly assume f[n][1]=1, so that the first weight acts as the bias parameter of the linear regressions above.
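The two equations above can be sketched in a few lines of C++. This is only an illustration, not code from the package: the function name gamma_params, the representation of f[n] as a list of active feature indices, and the choice of index 0 for the always-on bias feature are all assumptions of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Returns (a[n], b[n]) for one data point.
// `active` lists the indices k with f[n][k] = 1, excluding the bias.
// Assumption of this sketch: index 0 holds the bias weight.
std::pair<double, double> gamma_params(const std::vector<double>& w_a,
                                       const std::vector<double>& w_b,
                                       const std::vector<std::size_t>& active) {
    assert(!w_a.empty() && w_a.size() == w_b.size());
    double sa = w_a[0];  // bias contribution to w_a^T f[n]
    double sb = w_b[0];  // bias contribution to w_b^T f[n]
    for (std::size_t k : active) {
        assert(k > 0 && k < w_a.size());
        sa += w_a[k];  // binary features: the dot product is a sum of weights
        sb += w_b[k];
    }
    return {std::exp(sa), std::exp(sb)};
}
```

Because the features are binary, each dot product reduces to a sum over the weights of the active features, so the cost per data point depends on the number of active features rather than on the total number of features.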

Download

gamglm-0.1.tar.gz [8.3KB] (2014/10/27)

Compilation requires Boost; the software was developed with g++ 4.8.2. To install it, take a glance at the Makefile and type make.

Usage

% gamglm -h
gamglm: Bayesian Gamma generalized linear model.
$Id: gamglm.cpp,v 1.4 2014/10/27 12:01:07 daichi Exp $
usage: gamglm [-I iter] [-e eps] [-s sigma] TRAIN MODEL

Options are:

-I iter
number of MCMC iterations. (default 1)
-e eps
standard deviation of Gaussian random walk. (default 0.2)
-s sigma
standard deviation of the Gaussian prior (L₂ regularization) on the weights. (default 0.1)
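To show how eps and sigma could enter the sampler, here is a minimal sketch of one random-walk Metropolis update of a single weight. It is not the implementation in gamglm: the Example struct and all function names are hypothetical, b is assumed to be the scale of the Gamma distribution, the prior is assumed to be a zero-mean Gaussian with standard deviation sigma, and the log posterior is recomputed from scratch, so the efficient data structure of the package is not reproduced.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical container: response y and the indices of its active features.
struct Example {
    double y;
    std::vector<std::size_t> active;
};

// Log density of Gamma(shape a, scale b) at y > 0 (parameterization assumed).
double gamma_logpdf(double y, double a, double b) {
    return (a - 1.0) * std::log(y) - y / b - std::lgamma(a) - a * std::log(b);
}

// Unnormalized log posterior: Gamma log likelihood plus a zero-mean
// Gaussian log prior with standard deviation sigma on every weight.
double log_posterior(const std::vector<Example>& data,
                     const std::vector<double>& w_a,
                     const std::vector<double>& w_b, double sigma) {
    double lp = 0.0;
    for (const Example& ex : data) {
        double sa = 0.0, sb = 0.0;
        for (std::size_t k : ex.active) { sa += w_a[k]; sb += w_b[k]; }
        lp += gamma_logpdf(ex.y, std::exp(sa), std::exp(sb));
    }
    for (double w : w_a) lp -= 0.5 * w * w / (sigma * sigma);
    for (double w : w_b) lp -= 0.5 * w * w / (sigma * sigma);
    return lp;
}

// One random-walk Metropolis update of weight k in w_a (update_a == true)
// or in w_b. Requires eps > 0. Returns true if the proposal was accepted.
bool metropolis_step(const std::vector<Example>& data,
                     std::vector<double>& w_a, std::vector<double>& w_b,
                     bool update_a, std::size_t k,
                     double eps, double sigma, std::mt19937& rng) {
    std::vector<double>& w = update_a ? w_a : w_b;
    const double old_value = w[k];
    const double old_lp = log_posterior(data, w_a, w_b, sigma);

    std::normal_distribution<double> step(0.0, eps);  // Gaussian random walk
    w[k] = old_value + step(rng);
    const double new_lp = log_posterior(data, w_a, w_b, sigma);

    // The proposal is symmetric, so the acceptance ratio is the posterior ratio.
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    if (std::log(unif(rng)) < new_lp - old_lp) return true;
    w[k] = old_value;  // rejected: restore the previous value
    return false;
}
```

In this sketch a larger eps proposes bigger jumps, which are rejected more often, and a smaller sigma pulls the weights more strongly toward zero.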

When the iterations are finished, the following model files are written:

model.dic
Dictionary of features. Internally each feature is assigned an integer corresponding to its line number.
model.a
Regression weights w_a of Gamma regression for the shape parameter.
model.b
Regression weights w_b of Gamma regression for the scale parameter.

Data format

TRAIN consists of lines like

y feature_1 feature_2 feature_3 .. feature_n

y is the response variable, read as a double-precision number.
feature_n is a character string naming a (binary) explanatory variable, i.e. a feature. Each line may contain a different number of features.
Tokens are separated by white space.

For a concrete example, see test.dat included in the package.
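As an illustration of the format, one line of TRAIN could be read as in the sketch below. The Record struct and the function parse_line are hypothetical names chosen for this example and are not part of the package.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one line: response and feature names.
struct Record {
    double y;
    std::vector<std::string> features;
};

// Parses "y feature_1 feature_2 ..." separated by white space.
// Returns false if the line does not start with a number.
bool parse_line(const std::string& line, Record& out) {
    std::istringstream in(line);
    if (!(in >> out.y)) return false;
    out.features.clear();
    std::string token;
    while (in >> token) out.features.push_back(token);  // any number of features
    return true;
}
```

Stream extraction skips spaces and tabs alike, so lines with differing numbers of features and mixed separators are handled in the same way.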

Prediction

For prediction with the learned parameters, invoke gamglm-predict, which is also included in the package:

% gamglm-predict
usage: gamglm-predict TEST MODEL
$Id: gamglm-predict.cpp,v 1.1 2014/10/28 08:39:59 daichi Exp $

TEST is a data file in the same format as the training data, except that the target variable y is not used and can be any number (such as -1). The predicted a and b for each line are written to stdout:

% gamglm-predict test.dat model
-1       0.998879        1.026384
-1       1.108263        0.733772
-1       1.229099        0.723988
-1       1.187355        0.708005
-1       1.290324        0.675810
-1       1.310694        0.556131
         ^-- parameter a ^-- parameter b

You can then use the predicted a and b above as the parameters of the Gamma(a, b) distribution.
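As a minimal sketch of what can be done with a predicted pair, the functions below compute the mean and variance of the predictive distribution and draw a sample from it. They assume that a is the shape and b is the scale, following the description of model.a and model.b above; the names GammaSummary, gamma_summary and gamma_sample are hypothetical.

```cpp
#include <cassert>
#include <random>

struct GammaSummary {
    double mean;
    double variance;
};

// Mean and variance of Gamma(shape a, scale b).
GammaSummary gamma_summary(double a, double b) {
    assert(a > 0.0 && b > 0.0);
    return {a * b, a * b * b};
}

// Draws one sample from Gamma(shape a, scale b).
double gamma_sample(double a, double b, std::mt19937& rng) {
    // std::gamma_distribution takes (alpha = shape, beta = scale).
    std::gamma_distribution<double> dist(a, b);
    return dist(rng);
}
```

If b were a rate rather than a scale, the mean would be a / b instead, so it is worth checking the parameterization against the training data before relying on these summaries.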

daichi<at>ism.ac.jp

Last modified: Mon Jul 3 20:55:38 2017