gamglm, Bayesian Gamma generalized linear model.

Daichi Mochihashi
The Institute of Statistical Mathematics
$Id: index.html,v 1.1 2014/10/27 11:02:42 daichi Exp $

gamglm is C++ software for fitting a Gamma generalized linear model with a large number of binary features, on the order of thousands.
Because the Gamma generalized linear model is not convex in its parameters, ordinary optimization (such as L-BFGS) can get stuck when the number of features is large. This software employs a simple MCMC algorithm and an efficient data structure for inference.
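
As a rough illustration of what a simple MCMC algorithm with a Gaussian random walk can look like, here is a minimal random-walk Metropolis sweep in C++. It is a generic sketch, not the sampler actually implemented in gamglm: the function name metropolis_sweep and the log_post callback (an unnormalized log posterior supplied by the caller) are hypothetical, and eps plays the role of the proposal standard deviation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

// One sweep of random-walk Metropolis over all weights (generic sketch).
// log_post: hypothetical callback returning the unnormalized log posterior of w.
// eps:      standard deviation of the Gaussian proposal.
// Returns the number of accepted proposals in this sweep.
int metropolis_sweep(std::vector<double>& w,
                     const std::function<double(const std::vector<double>&)>& log_post,
                     double eps, std::mt19937& rng)
{
    assert(eps > 0.0);
    std::normal_distribution<double> step(0.0, eps);
    std::uniform_real_distribution<double> unif(0.0, 1.0);
    int accepted = 0;
    double cur = log_post(w);
    for (std::size_t k = 0; k < w.size(); ++k) {
        const double old = w[k];
        w[k] = old + step(rng);              // propose a move for weight k only
        const double prop = log_post(w);
        if (std::log(unif(rng)) < prop - cur) {
            cur = prop;                      // accept: keep the new value
            ++accepted;
        } else {
            w[k] = old;                      // reject: restore the old value
        }
    }
    return accepted;
}
```

Calling this function repeatedly and averaging the visited weights after a burn-in period is one common way to obtain point estimates. Re-evaluating the full log posterior for every proposal, as done here for clarity, is expensive; an efficient implementation would update only the terms that involve weight k.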

Model

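For each data point n = 1 .. N, we are given an explanatory binary vector f[n] (f[n][k]=1 when feature k appears in data point n, else 0).
The observation is y[n] ~ Gamma(a[n], b[n]), where the shape a[n] and the scale b[n] are each given by a regression on f[n], with weights wa and wb respectively. We always implicitly assume f[n][1]=1 for the bias parameter of the regression.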
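
Because f[n] is binary, a regression score on f[n] is simply the sum of the weights of the features that appear in data n, so only the active features need to be stored. The C++ sketch below illustrates this. The names linear_score and positive_param are hypothetical, indices are 0-based with index 0 as the bias, and the exponential (log link) mapping to a positive parameter is an assumption made for illustration, since the link function is not stated here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Score for a sparse binary feature vector: with f[n][k] in {0, 1}, the dot
// product of w and f[n] reduces to summing the weights of the active features.
// `active` lists the 0-based indices (> 0) of features present in the example;
// index 0 is the always-on bias feature.
double linear_score(const std::vector<double>& w,
                    const std::vector<std::size_t>& active)
{
    assert(!w.empty());
    double s = w[0];                       // bias term, always included
    for (std::size_t k : active) {
        assert(k > 0 && k < w.size());
        s += w[k];
    }
    return s;
}

// ASSUMPTION: a log link, i.e. parameter = exp(score), which keeps the Gamma
// parameter positive. The actual link used by gamglm may differ.
double positive_param(const std::vector<double>& w,
                      const std::vector<std::size_t>& active)
{
    return std::exp(linear_score(w, active));
}
```

With thousands of features but only a handful active per example, each score costs time proportional to the number of active features rather than to the total number of features.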

Download

Compilation requires Boost; the software was developed with g++ 4.8.2. To install the software, take a glance at the Makefile and type make.

Usage

% gamglm -h
gamglm: Bayesian Gamma generalized linear model.
$Id: gamglm.cpp,v 1.4 2014/10/27 12:01:07 daichi Exp $
usage: gamglm [-I iter] [-e eps] [-s sigma] TRAIN MODEL
Options are:
-I iter
number of MCMC iterations. (default 1)
-e eps
standard deviation of Gaussian random walk. (default 0.2)
-s sigma
standard deviation of L2 regularization of weights. (default 0.1)
When the iterations are finished, the following model files will have been written:
model.dic
Dictionary of features. Internally each feature is assigned an integer corresponding to its line number.
model.a
Regression weights wa of Gamma regression for the shape parameter.
model.b
Regression weights wb of Gamma regression for the scale parameter.

Data format

TRAIN consists of lines like
y feature_1 feature_2 feature_3 .. feature_n
For a concrete example, see test.dat included in the package.
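
For readers who want to process TRAIN files in their own code, the sketch below parses one line of this format in C++. It assumes that features are whitespace-separated tokens and that each new feature name gets the next integer id starting from 1, mirroring the line-number convention of model.dic. The names Example and parse_line are hypothetical and are not part of gamglm.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// One training example: the target y and the ids of its active features.
struct Example {
    double y;
    std::vector<int> features;
};

// Parse a line of the form "y feature_1 feature_2 .. feature_n".
// Unseen feature names are added to `dict` with the next free id (1, 2, ...).
// Returns false if the line does not start with a number.
bool parse_line(const std::string& line,
                std::map<std::string, int>& dict,
                Example& out)
{
    std::istringstream in(line);
    if (!(in >> out.y)) return false;
    out.features.clear();
    std::string name;
    while (in >> name) {
        auto it = dict.find(name);
        if (it == dict.end()) {
            const int id = static_cast<int>(dict.size()) + 1;
            it = dict.emplace(name, id).first;
        }
        out.features.push_back(it->second);
    }
    return true;
}
```

For example, parsing the line "1.5 foo bar foo" with an empty dictionary yields y = 1.5 and feature ids 1, 2, 1.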

Prediction

To predict with the learned parameters, invoke gamglm-predict, which is also included in the package:
% gamglm-predict
usage: gamglm-predict TEST MODEL
$Id: gamglm-predict.cpp,v 1.1 2014/10/28 08:39:59 daichi Exp $
TEST is a data file in the same format as the training data, except that the target variable y is not used and can be any number (such as -1). The predicted a and b for each line are written to stdout:
% gamglm-predict test.dat model
-1       0.998879        1.026384
-1       1.108263        0.733772
-1       1.229099        0.723988
-1       1.187355        0.708005
-1       1.290324        0.675810
-1       1.310694        0.556131
         ^-- parameter a ^-- parameter b
You can then plug the predicted a and b above into the Gamma(a, b) distribution.
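
As one way to use a predicted pair, the C++ sketch below computes the mean and variance of Gamma(a, b) and draws a sample from it. It assumes that b is a scale parameter, as in the description of model.b; if b were a rate, the mean would be a/b instead. The names GammaSummary, summarize and sample_gamma are hypothetical.

```cpp
#include <cassert>
#include <random>

// Mean and variance of a Gamma distribution with shape a and scale b.
struct GammaSummary {
    double mean;
    double variance;
};

GammaSummary summarize(double a, double b)
{
    assert(a > 0.0 && b > 0.0);
    return GammaSummary{a * b, a * b * b};   // mean = a*b, variance = a*b^2
}

// Draw one value from Gamma(a, b).
// std::gamma_distribution takes (shape, scale), in that order.
double sample_gamma(double a, double b, std::mt19937& rng)
{
    std::gamma_distribution<double> dist(a, b);
    return dist(rng);
}
```

For the first prediction above (a = 0.998879, b = 1.026384), the mean a*b is about 1.025 under the scale assumption.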


daichi<at>ism.ac.jp
Last modified: Mon Jul 3 20:55:38 2017