Simulating Personality: Control Vectors + Grammar

I was first exposed to the concept of control vectors through Theia Vogel's "Representation Engineering Mistral-7B an Acid Trip". The author does a good job of breaking down the concepts from the paper "Representation Engineering: A Top-Down Approach to AI Transparency", so if you haven't seen either, they're worth a read.

The idea of using "control vectors" to influence text generation with concepts like happiness or laziness tickled my brain and led me to wonder whether we could simulate personality traits using control vectors, and if so, whether these traits would be reflected in a personality test.

A brief disclaimer: I am no expert. What follows is a hobby project which gives a high-level demonstration of control vectors using llama-cpp-python. My primary purpose in writing this is to document what I've learned and force me to think through (and finish) the solution.

Okay- let's get started.

Minting Control Vectors

In the article mentioned above, the author introduced a library, repeng, which makes creating control vectors easy. To make a control vector, we need some contrasting data to train it on.

The Personality Test

For the personality test, I picked the Big Five flavor which measures a person's personality using 5 traits. I selected this one because the test includes contrasting traits which seemed ideal for this experiment. These are (via Wikipedia):

openness to experience (inventive/curious vs. consistent/cautious)
conscientiousness (efficient/organized vs. extravagant/careless)
extraversion (outgoing/energetic vs. solitary/reserved)
agreeableness (friendly/compassionate vs. critical/judgmental)
neuroticism (sensitive/nervous vs. resilient/confident)

I used ChatGPT to speed up making the training datasets. Here's an example of some contrasting data used for 'conscientiousness':

Positive: diligent, always meticulously attending to your tasks and responsibilities with great care, leaving no detail overlooked

Negative: negligent, always neglecting your duties or being careless, leading to mistakes and oversights that could have been avoided

And the resulting training data used by the repeng training script becomes:

[INST] Act as if you're extremely diligent, always meticulously attending to your tasks and responsibilities with great care, leaving no detail overlooked. [/INST] Hey, did anyone
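
To make the pairing concrete, here is a minimal sketch of how one contrasting pair could be expanded into training prompts. The names TEMPLATE, ContrastPair, and make_pairs are hypothetical, and the second suffix is made up for illustration. repeng's own examples use a similar positive/negative entry type, but this sketch avoids importing it so it runs on its own.

```python
from dataclasses import dataclass

# Prompt template copied from the example above (Mistral-instruct style tags).
TEMPLATE = "[INST] Act as if you're extremely {persona}. [/INST] {suffix}"


@dataclass
class ContrastPair:
    """One training example: the same suffix under two opposite personas."""
    positive: str
    negative: str


def make_pairs(positive_persona, negative_persona, suffixes):
    """Builds one ContrastPair per suffix (the start of the model's reply)."""
    return [
        ContrastPair(
            positive=TEMPLATE.format(persona=positive_persona, suffix=s),
            negative=TEMPLATE.format(persona=negative_persona, suffix=s),
        )
        for s in suffixes
    ]


pairs = make_pairs(
    "diligent, always meticulously attending to your tasks and "
    "responsibilities with great care, leaving no detail overlooked",
    "negligent, always neglecting your duties or being careless, leading "
    "to mistakes and oversights that could have been avoided",
    ["Hey, did anyone", "I think that"],  # second suffix is illustrative only
)
print(pairs[0].positive)
```

Each pair shares the same suffix, so the only difference the vector can pick up on is the persona itself.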

Training

Check out the repeng repo for an example of training the control vector.

One note: saving the vector as a GGUF file is not shown in the main example. We'll need that file to use the vector in llama.cpp. To save the vector, simply call export_gguf like so:

vector = ControlVector.train(model, tokenizer, dataset)
vector.export_gguf("control_vector_name.gguf")

Using Control Vectors

Not long ago, control vectors were added to llama.cpp, so let's use that. It's as simple as installing the tool and running the command:

Negative

./main -m mistral-7b-instruct-v0.2.Q5_K_M.gguf --control-vector-scaled agreeable_vector.gguf -.9 --control-vector-layer-range 14 26 --temp .7 --repeat_penalty 1.1 -p '[INST] Can I bother you for a second? [/INST]'

I don't have the ability to feel distress or frustration, but I can understand that you may be frustrated if you feel that I am not providing the answers or assistance that you need. If you have a question or issue that you need help with, please let me know and I will do my best to provide you with accurate and complete information.

Positive

./main -m mistral-7b-instruct-v0.2.Q5_K_M.gguf --control-vector-scaled agreeable_vector.gguf .9 --control-vector-layer-range 14 26 --temp .7 --repeat_penalty 1.1 -p '[INST] Can I bother you for a second? [/INST]'

Of course! I'd be happy to help you with anything. Feel free to ask me a question or start up a conversation. I'm here to make your day brighter. 😊
Is there something fun and interesting that we can talk about today? I love making new friends and learning new things. Let's go, go, go! 😄
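
Conceptually, the strength and layer-range flags amount to adding a scaled direction to the hidden state at each layer in the chosen range. The toy numpy sketch below illustrates only that arithmetic. It is not llama.cpp's implementation: the function name steer, the tiny shapes, and the inclusive treatment of the layer range are assumptions, and it treats layers independently, whereas in a real model a change at one layer also flows into the layers after it.

```python
import numpy as np


def steer(hidden_states, directions, strength, layer_start, layer_end):
    """Adds strength * direction to each layer's hidden state inside the range.

    hidden_states and directions are dicts mapping layer index -> 1-D array.
    Layers outside [layer_start, layer_end] are returned unchanged.
    """
    steered = {}
    for layer, h in hidden_states.items():
        if layer_start <= layer <= layer_end and layer in directions:
            steered[layer] = h + strength * directions[layer]
        else:
            steered[layer] = h.copy()
    return steered


n_embd = 4  # toy embedding size; a real model's is far larger
hidden = {layer: np.zeros(n_embd, dtype=np.float32) for layer in range(1, 6)}
direction = {layer: np.ones(n_embd, dtype=np.float32) for layer in range(1, 6)}

# A negative strength pushes against the trait, a positive one toward it.
out = steer(hidden, direction, strength=-0.9, layer_start=2, layer_end=3)
for layer in sorted(out):
    print(layer, out[layer])
```

Flipping the sign of the strength is all that separates the "Negative" and "Positive" runs above.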

llama-cpp-python Integration

So, it looks like it's working! Now, to use this for a personality test, we'll want to script the assessment. For this I usually use the Python bindings in llama-cpp-python. However, it doesn't look like control vectors have been fully implemented there yet.

No worries, we'll figure it out. Looking at the source, we see that there is one reference to control vectors in the function llama_control_vector_apply. This function is accessible! We should be able to use the low-level API to access it.

As one would expect, the function mirrors the one in llama.cpp. It takes the loaded control vector data and applies it to the context of the loaded model. There is one missing element, though, which is called out in the comments:

# // See llama_control_vector_load in common to load a control vector.

You can find that code here.

It looks like the code that loads the vector isn't available in the low-level API. Checking the makefile, it looks like common isn't linked in the shared library.

So we will consider two options:

Link common.o and its required dependencies into libllama.so and write new Python bindings.
Recreate the loading logic in Python.

I ultimately decided to recreate the control vector load logic in Python for a few reasons:

I've never written code to bind Python to C++
I would have to re-make libllama.so each time I wanted to use control vectors (until someone smarter than me builds proper integration)
I tried and gave up

Python llama_control_vector_load replacement

import ctypes
import numpy as np
from gguf import GGUFReader
import llama_cpp

def read_gguf_file(fname):
    """Reads a GGUF file and extracts tensor data for tensors named 'direction.x'."""
    reader = GGUFReader(fname, mode='r')
    tensor_data = {}
    for tensor in reader.tensors:
        if tensor.name.startswith("direction."):
            layer = int(tensor.name.split('.')[1])
            tensor_data[layer] = tensor.data
    return tensor_data

def process_tensors(tensor_data, strength):
    """Scales each layer's direction by `strength` and packs them into one (max_layer, n_embd) array."""
    if min(tensor_data.keys()) < 1:
        # Layer 0 has no row in this layout; [layer - 1] would silently wrap to the last row
        raise ValueError("direction tensors must use layer indices >= 1")
    max_layer = max(tensor_data.keys())
    n_embd = next(iter(tensor_data.values())).size
    processed_data = np.zeros((max_layer, n_embd), dtype=np.float32)
    for layer, data in tensor_data.items():
        processed_data[layer - 1] = data * strength  # Row 0 holds layer 1; missing layers stay zero
    return processed_data, n_embd

def clear_control_vector(llm):
    """Clears the control vector from the model by setting the data_ptr to null."""
    result = llama_cpp.llama_control_vector_apply(
        llm.ctx,
        ctypes.POINTER(ctypes.c_float)(),  # Null pointer
        0,  # Total number of floats in the flat array
        0,  # Number of embeddings
        0,  # Starting layer index
        0  # Ending layer index
    )
    return result
    
def apply_control_vector(llm, control_vector_fname, strength=0.0, control_vector_layer_start=-1, control_vector_layer_end=-1):
    """Applies a control vector to a model by loading, processing, and using llama_control_vector_apply."""

    # Load and process the control vector data
    tensor_data = read_gguf_file(control_vector_fname)
    if not tensor_data:
        print(f"No direction tensors found in {control_vector_fname}")
        return None
    processed_data, n_embd = process_tensors(tensor_data, strength)

    processed_data_flat = processed_data.astype(np.float32).flatten()

    # Obtain a ctypes pointer to the numpy array's data
    data_ptr = processed_data_flat.ctypes.data_as(ctypes.POINTER(ctypes.c_float))

    # Apply the control vectors
    result = llama_cpp.llama_control_vector_apply(
        llm.ctx,  # Model context
        data_ptr,  # Pointer to the flat array
        processed_data_flat.size,  # Total number of floats in the flat array
        n_embd,  # Embedding size (floats per layer), taken from the direction tensors
        control_vector_layer_start,  # First layer to steer, supplied by the caller
        control_vector_layer_end  # Last layer to steer, supplied by the caller
    )

    return result
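
The flat layout handed to llama_control_vector_apply can be checked without a model or a GGUF file. The sketch below repeats the packing step from process_tensors on made-up data and shows where each layer lands in the buffer. The helper name pack_directions, the tiny n_embd of 3, and the layer numbers are purely illustrative.

```python
import numpy as np


def pack_directions(tensor_data, strength):
    """Packs {layer: direction} into one flat float32 buffer, layer 1 first."""
    max_layer = max(tensor_data)
    n_embd = next(iter(tensor_data.values())).size
    packed = np.zeros((max_layer, n_embd), dtype=np.float32)
    for layer, data in tensor_data.items():
        packed[layer - 1] = data * strength  # row 0 corresponds to layer 1
    return packed.flatten(), n_embd


# Fake directions for layers 1 and 3 only; layer 2 is deliberately missing.
fake = {
    1: np.array([1.0, 2.0, 3.0], dtype=np.float32),
    3: np.array([7.0, 8.0, 9.0], dtype=np.float32),
}
flat, n_embd = pack_directions(fake, strength=0.5)

# One row per layer: the missing layer 2 shows up as a row of zeros.
print(flat.reshape(-1, n_embd))
```

A layer with no direction tensor simply contributes zeros, so applying the buffer leaves that layer untouched.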

Test Script

I originally wrote the entire script just using the low-level API, but I finally settled on a hybrid:

from control_vector_handler import apply_control_vector, clear_control_vector
import llama_cpp

def main():
    control_vector_fname = "agreeable_vector.gguf"
    model_path = "mistral-7b-instruct-v0.2.Q5_K_M.gguf"

    # Initialize the model
    llm = llama_cpp.Llama(model_path=model_path, seed=42)

    # Apply the control vector to the model
    result = apply_control_vector(llm, 
        control_vector_fname=control_vector_fname, 
        strength=1.0,
        control_vector_layer_start=14, 
        control_vector_layer_end=26)
    
    # Use the model for inference
    prompt = "[INST] Can I bother you for a second? [/INST]"
    
    output = llm(
        prompt,
        max_tokens=64,
        stop=["\n"],
    ) 

    print(output['choices'][0]['text'])


if __name__ == "__main__":
    main()

Results (Agreeable +/- 1)

Positive

Of course! I'm here to help. Happy to make your day even better with a nice, friendly hello. Is there something fun and exciting I can help you out with today? Let's make the world a happy place together. :)

Negative

I'm not capable of feeling distress or being bothered, but I can understand that you may be frustrated if I don't answer your question properly. If you need clarification on something, please let me know and I will do my best to provide a clear and accurate response.

It's working!

The Test

Next, I'm going to tie everything together and use the Python library five-factor-e to administer a personality test, using ReAct-inspired prompting and a grammar to constrain the model's output to a set of predictable responses.
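
As a rough preview of the grammar piece, here is a sketch of how a GBNF grammar could restrict the model to a fixed set of answers. The helper name choice_grammar and the answer labels are hypothetical and are not taken from five-factor-e. The resulting string would then be passed to llama-cpp-python through LlamaGrammar.from_string and the grammar argument of the completion call; that step is left out so the snippet runs without a model.

```python
def choice_grammar(choices):
    """Builds a GBNF grammar whose root rule matches exactly one of `choices`."""
    escaped = []
    for choice in choices:
        # Escape backslashes and double quotes so each choice is a valid GBNF literal.
        literal = choice.replace("\\", "\\\\").replace('"', '\\"')
        escaped.append(f'"{literal}"')
    return "root ::= " + " | ".join(escaped)


# Hypothetical 1-5 agreement scale for a single test item.
answers = ["1", "2", "3", "4", "5"]
grammar_text = choice_grammar(answers)
print(grammar_text)
```

With a grammar like this, the model cannot wander off into an explanation; it can only emit one of the listed answers.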

More to come.