# Evaluating Stained Glass Transform for LLMs
We can evaluate a Stained Glass Transform for LLMs against a wide variety of benchmarks using `sglm_eval`, Stained Glass Core's extension of EleutherAI's `lm_eval` harness.
## `sglm_eval`

`sglm_eval` is a CLI utility for evaluating Stained Glass Transforms for LLMs.
### Installation
`sglm_eval` comes bundled with the `stainedglass_core` package. Please refer to the installation steps.
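As a rough sketch, installation and a quick sanity check might look like the following. The package name and index shown here are assumptions; the exact source is deployment-specific, and the linked installation steps are authoritative:

```bash
# Hypothetical install; the package name/index is an assumption.
# Follow the official installation steps for your environment.
pip install stainedglass_core

# Verify that the bundled CLI is available on your PATH.
sglm_eval --help
```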
### Usage
```bash
sglm_eval --model=sghflm \
    --model_args=parallelize=True,apply_stainedglass=True,transform_model_path=<enter_sgt_for_text_file_path>,pretrained=<base_model_directory>,max_length=8192,dtype=bfloat16 \
    --tasks=arc_challenge,arc_easy,openbookqa,piqa,truthfulqa_mc2 \
    --system_instruction "Cutting Knowledge Date: December 2023\nToday Date: 26 Jul 2024\n\n" \
    --device=cuda \
    --batch_size=20 \
    --trust_remote_code \
    --wandb_args=project=<wandb_project_name>,entity=core,resume=allow,name=<custom-run-name>,job_type=eval \
    --num_fewshot=0 \
    --output_path /path/to/output/file
```
### Command Breakdown
More details about the `lm_eval` CLI are available here.

- `--model=sghflm`: Specifies the evaluation model class. Currently only `sghflm` is supported, which extends the `lm_eval.models.huggingface.HFLM` class.
- `--model_args`:
    - `parallelize=True`: Enables pipeline parallelization.
    - `apply_stainedglass=True`: Applies the Stained Glass Transform during evaluation. Set this to `False` to evaluate baseline models.
    - `transform_model_path`: Path to your Stained Glass Transform file.
    - `pretrained=<base_model_directory>`: Path to the base pretrained model weights.
    - `max_length`: Sets the maximum token length for processing.
    - `dtype`: Configures the PyTorch tensor `dtype`.
    - `seed`: Sets the seed specifically for `StainedGlassTransformForText`.
- `--tasks`: List of evaluation tasks to run. More details about `--tasks` here:
    - `arc_challenge`: AI2 Reasoning Challenge (hard).
    - `arc_easy`: AI2 Reasoning Challenge (easy).
    - `openbookqa`: Open Book Question Answering.
    - `piqa`: Physical Interaction Question Answering.
    - `truthfulqa_mc2`: TruthfulQA (multiple-choice format, version 2).
    - `hellaswag`: Benchmark for commonsense reasoning and next-step prediction.
    - `mmlu`: Massive Multitask Language Understanding (MMLU), which evaluates a model's ability to handle tasks across a wide variety of subjects.
- `--system_instruction`: Specifies a system instruction string to prepend to the prompt. Currently only supported for the Llama 3 family of models.
- `--device`: Specifies the device to use for evaluation.
- `--batch_size`: Specifies the batch size for model evaluation.
- `--trust_remote_code`: Allows execution of remote code, required for certain models or extensions.
- `--wandb_args`: Configures Weights & Biases (W&B) for tracking evaluation runs. More details about `--wandb_args` here:
    - `project`: Name of the W&B project.
    - `entity`: W&B entity (team or user).
    - `resume`: Manages resumption of an interrupted run.
    - `name`: Custom W&B run name.
    - `job_type`: Specifies the W&B job type.
- `--num_fewshot=0`: Sets the number of few-shot examples to place in context. Must be an integer.
- `--output_path`: Path to save the evaluation results.
- `--apply_chat_template`: If `True`, applies the chat template to the prompt. Currently only `True` is supported, which is also the default.
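As a sketch of how these flags combine, the following hypothetical run evaluates the unprotected base model for comparison by setting `apply_stainedglass=False`. It assumes `transform_model_path` can be omitted when the transform is disabled; paths and task choices are placeholders:

```bash
# Baseline evaluation: same setup, but with the Stained Glass Transform disabled.
sglm_eval --model=sghflm \
    --model_args=parallelize=True,apply_stainedglass=False,pretrained=<base_model_directory>,max_length=8192,dtype=bfloat16 \
    --tasks=arc_challenge,arc_easy \
    --device=cuda \
    --batch_size=20 \
    --num_fewshot=0 \
    --output_path /path/to/baseline/output
```

Comparing these results against an otherwise identical run with `apply_stainedglass=True` quantifies any benchmark impact of the transform.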
### Parallelization
The `lm_eval` harness supports two types of parallelization: data parallelism and model parallelism (see details here). To make evaluation of very large models possible and more efficient, we added support for tensor parallelism and Fully Sharded Data Parallel (FSDP), using `torch.distributed.tensor.parallel` and `torch.distributed._composable.fsdp`, respectively. These are enabled by setting `tensor_parallel=True` and `fsdp=True` in the `model_args` parameter and launching the evaluation with `torchrun`. Setting either `tensor_parallel` or `fsdp` to `True` also overrides the `parallelize` parameter to `False`.
- For small models that can fit on a single GPU, use data parallelism as described here.
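    The launch below is a sketch, assuming `sglm_eval` follows `lm_eval`'s `accelerate launch` data-parallel workflow through the same module entry point used by the `torchrun` examples that follow:

    ```bash
    # One model replica per GPU (data parallelism); hypothetical invocation.
    accelerate launch --num_processes=<num_gpus> \
        -m stainedglass_core.integrations.lm_eval \
        --model=sghflm \
        --model_args=apply_stainedglass=True ...
    ```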
- For models that are too large to fit on a single GPU but can fit on a single node, tensor parallelism is the most efficient approach:
    ```bash
    torchrun \
        --nproc_per_node=<num_gpus> \
        -m stainedglass_core.integrations.lm_eval \
        --model=sghflm \
        --model_args=tensor_parallel=True,apply_stainedglass=True ...
    ```
- For models that are too large to fit on a single node, use both tensor parallelism and FSDP:
    ```bash
    torchrun \
        --nproc_per_node=<num_gpus> \
        --nnodes=<num_nodes> \
        -m stainedglass_core.integrations.lm_eval \
        --model=sghflm \
        --model_args=fsdp=True,tensor_parallel=True,apply_stainedglass=True ...
    ```
Currently, tensor parallelism and FSDP are only supported for `llama` models.