Linear Method
Given data pairs \((x,y)\), the linear method learns the model vector \(w\) by minimizing the following objective function:
\[\sum_{(x,y)} \ell(y, \langle x, w \rangle) + \lambda_1 \|w\|_1 + \lambda_2 \|w\|_2^2\]
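where \(\ell(y, p)\) is the loss function (see Config.Loss) evaluated at the prediction \(p = \langle x, w \rangle\), and \(\lambda_1\) and \(\lambda_2\) are the regularization weights set by lambda_l1 and lambda_l2.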
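As a rough illustration of the objective above, the following NumPy sketch evaluates it for a dense data matrix. The function names `logit_loss` and `objective` and the dense-matrix representation are assumptions made for this example; they are not part of the actual implementation.

```python
import numpy as np

def logit_loss(y, p):
    # Logistic loss log(1 + exp(-y * p)), computed stably; labels y are in {-1, +1}.
    return np.logaddexp(0.0, -y * p)

def objective(X, y, w, lambda_l1=0.0, lambda_l2=0.0, loss=logit_loss):
    # Hypothetical helper: sum of per-example losses plus l1 and l2 penalties.
    p = X @ w                                  # predictions <x, w> for every row x
    data_term = loss(y, p).sum()
    return data_term + lambda_l1 * np.abs(w).sum() + lambda_l2 * np.dot(w, w)

X = np.array([[1.0, 0.0], [0.0, 2.0]])
y = np.array([1.0, -1.0])
w = np.zeros(2)
# At w = 0 every prediction is 0, so the value is 2 * log(2) and both penalties vanish.
print(objective(X, y, w, lambda_l1=0.1, lambda_l2=0.01))
```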
Configuration
The configuration is defined in the protobuf file config.proto.
Input & Output
Type | Field | Description |
---|---|---|
string | train_data | The training data, can be either a directory or a wildcard filename |
string | val_data | The validation or test data, can be either a directory or a wildcard filename |
string | data_format | data format. supports libsvm, crb, criteo, adfea, ... |
string | model_out | model output filename |
string | model_in | model input filename |
string | predict_out | the filename for prediction output. if specified, run prediction; otherwise run training |
Model and Optimization
Type | Field | Description |
---|---|---|
Config.Loss | loss | the loss function. default is LOGIT |
float | lambda_l1 | l1 regularizer: \(\lambda_1 \|w\|_1\) |
float | lambda_l2 | l2 regularizer: \(\lambda_2 \|w\|_2^2\) |
Config.Algo | algo | the learning method, default is FTRL |
int32 | minibatch | the minibatch size. smaller minibatches converge faster but lower the system performance |
int32 | max_data_pass | the maximal number of data passes |
float | lr_eta | the learning rate \(\eta\) (or \(\alpha\)). often set to the largest value that does not cause divergence |
Config.Loss
Name | Description |
---|---|
SQUARE | square loss: \(\frac12 (p-y)^2\) |
LOGIT | logistic loss: \(\log(1+\exp(-yp))\) |
SQUARE_HINGE | squared hinge loss: \(\max\left(0, 1-yp\right)^2\) |
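The three losses can be written down directly from their formulas. The sketch below is only illustrative: the function names and the `LOSSES` lookup table are hypothetical, labels are assumed to be in \(\{-1, +1\}\) for the logistic and hinge losses, and the squared hinge loss is taken in its standard form \(\max(0, 1-yp)^2\).

```python
import numpy as np

def square_loss(y, p):
    # SQUARE: 1/2 * (p - y)^2
    return 0.5 * (p - y) ** 2

def logit_loss(y, p):
    # LOGIT: log(1 + exp(-y * p)), computed without overflow for large |y * p|
    return np.logaddexp(0.0, -y * p)

def square_hinge_loss(y, p):
    # SQUARE_HINGE: max(0, 1 - y * p)^2, zero once the margin y * p reaches 1
    return np.maximum(0.0, 1.0 - y * p) ** 2

# Hypothetical lookup keyed by the enum names in the table above.
LOSSES = {
    "SQUARE": square_loss,
    "LOGIT": logit_loss,
    "SQUARE_HINGE": square_hinge_loss,
}

y = np.array([1.0, -1.0, 1.0])
p = np.array([2.0, 1.0, 0.5])
for name, fn in LOSSES.items():
    print(name, fn(y, p))
```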
Config.Algo
Name | Description |
---|---|
SGD | asynchronous minibatch SGD |
ADAGRAD | similar to SGD, but uses AdaGrad per-coordinate learning rates |
FTRL | similar to ADAGRAD, but uses FTRL for better sparsity |
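To make the difference between the three methods concrete, here is a single-machine, synchronous sketch of one update step for each. It follows the textbook SGD, AdaGrad, and FTRL-Proximal rules and is not taken from the implementation: the function names, the `eps` term, and the mapping of `alpha`, `beta`, `l1`, `l2` onto the configuration fields are assumptions, and the exact scaling of the l2 term may differ.

```python
import numpy as np

def sgd_step(w, g, eta):
    # Plain gradient step with a single global learning rate.
    return w - eta * g

def adagrad_step(w, n, g, eta, eps=1e-8):
    # n accumulates squared gradients, giving each coordinate its own step size.
    n = n + g * g
    return w - eta * g / (np.sqrt(n) + eps), n

def ftrl_step(w, z, n, g, alpha, beta=1.0, l1=0.0, l2=0.0):
    # Textbook FTRL-Proximal per-coordinate update.
    sigma = (np.sqrt(n + g * g) - np.sqrt(n)) / alpha
    z = z + g - sigma * w
    n = n + g * g
    denom = (beta + np.sqrt(n)) / alpha + l2
    # Coordinates with |z| <= l1 are set exactly to zero, which yields sparsity.
    w = np.where(np.abs(z) <= l1, 0.0, -(z - np.sign(z) * l1) / denom)
    return w, z, n

w = np.zeros(2)
z = np.zeros(2)
n = np.zeros(2)
g = np.array([0.5, 2.0])  # a made-up gradient for illustration
w, z, n = ftrl_step(w, z, n, g, alpha=1.0, beta=1.0, l1=1.0)
print(w)  # the first coordinate stays exactly 0 because |z| <= l1 there
```

The thresholding in `ftrl_step` is what the table means by better sparsity: small accumulated gradients leave a weight at exactly zero instead of a small nonzero value.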
Advanced Configurations
Type | Field | Description |
---|---|---|
int32 | save_iter | save the model every k data passes. default is -1, which saves only after the last iteration |
int32 | load_iter | load the model from the k-th iteration. default is -1, which loads the model from the last iteration |
bool | local_data | assign data to a worker only if the worker can access it. often used when the data has been dispatched to the workers’ local filesystems |
int32 | num_parts_per_file | virtually partition a file into n parts for better load balance. default is 10 |
int32 | rand_shuffle | randomly shuffle data for minibatch SGD. a minibatch is randomly picked from rand_shuffle * minibatch examples. default is 10 |
float | neg_sampling | down-sample negative examples in the training data. disabled by default |
bool | prob_predict | if true, output a probability prediction. otherwise output the raw score \(\langle x, w \rangle\) |
float | dropout | the probability of setting a gradient to 0. disabled by default |
float | print_sec | print the progress every n sec during training. default is 1 sec |
float | lr_beta | learning rate \(\beta\). default is 1 |
int32 | num_threads | number of threads used by a worker / a server. default is 2 |
int32 | max_concurrency | the maximal number of minibatches processed concurrently for SGD, and the maximal number of concurrent blocks for block CD. default is 2 |
bool | key_cache | cache the key list on both sender and receiver to reduce communication cost. it may increase the memory usage |
bool | msg_compression | compress messages to reduce communication cost. it may increase the computation cost |
int32 | fixed_bytes | convert floating-point values into fixed-point integers with n bytes. n can be 1, 2, or 3. 0 means no compression |
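The rand_shuffle description can be read as a bounded shuffle buffer: collect rand_shuffle * minibatch examples, shuffle them, and cut the buffer into minibatches. The sketch below illustrates that reading only. The generator name `shuffled_minibatches`, the `seed` argument, and the handling of the final partial buffer are assumptions, not the actual data reader.

```python
import random

def shuffled_minibatches(examples, minibatch, rand_shuffle=10, seed=0):
    # Hypothetical reader: shuffle within a window of rand_shuffle * minibatch examples.
    rng = random.Random(seed)
    cap = max(1, rand_shuffle) * minibatch
    buf = []
    for ex in examples:
        buf.append(ex)
        if len(buf) == cap:
            rng.shuffle(buf)
            for i in range(0, cap, minibatch):
                yield buf[i:i + minibatch]
            buf = []
    if buf:
        # Flush whatever is left; the last minibatch may be smaller.
        rng.shuffle(buf)
        for i in range(0, len(buf), minibatch):
            yield buf[i:i + minibatch]

batches = list(shuffled_minibatches(range(25), minibatch=2, rand_shuffle=5))
print(len(batches), batches[0])
```

A larger rand_shuffle mixes examples from further apart in the input at the cost of a larger in-memory buffer.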