Build a Procedural Memory Agent: Learning and Reuse
Explore how to create an agent that learns, stores, and reuses skills over time using neural modules.
Introduction to Procedural Memory Agents
In this tutorial, we explore how an intelligent agent can gradually form procedural memory by learning reusable skills directly from its interactions with an environment. We design a minimal yet powerful framework in which skills behave like neural modules: they store action sequences, carry contextual embeddings, and are retrieved by similarity when a new situation resembles a past experience. As we run our agent through multiple episodes, we observe how its behaviour becomes more efficient, moving from primitive exploration to leveraging a library of skills that it has learned on its own.
Skill Representation and Storage
We define how skills are represented and stored in a memory structure. We implement similarity-based retrieval so that the agent can match a new state with past skills using cosine similarity. As we work through this layer, we see how skill reuse becomes possible once skills acquire metadata, embeddings, and usage statistics.
class Skill:
    """A reusable behaviour: an action sequence plus retrieval metadata."""

    def __init__(self, name, preconditions, action_sequence, embedding, success_count=0):
        self.name = name
        self.preconditions = preconditions      # states in which the skill applies
        self.action_sequence = action_sequence  # primitive actions to replay
        self.embedding = embedding              # context vector for similarity retrieval
        self.success_count = success_count      # how often the skill has succeeded
        self.times_used = 0

Learning in a Structured Environment
We construct a simple environment in which the agent learns tasks such as picking up a key, opening a door, and reaching a goal. We use this environment as a playground for our procedural memory system, allowing us to observe how primitive actions evolve into more complex, reusable skills. The environment’s structure helps us observe clear, interpretable improvements in behaviour across episodes.
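A minimal sketch of the key-door-goal dynamics such an environment might implement is shown below; the class name, cell positions, action codes, and reward values are all assumptions for illustration:

```python
class KeyDoorEnv:
    """Sketch of a grid task: pick up a key, open a door, reach a goal."""

    def __init__(self, size=5):
        self.size = size
        self.reset()

    def reset(self):
        self.agent = (0, 0)
        self.key = (2, 2)
        self.door = (4, 2)
        self.goal = (4, 4)
        self.has_key = False
        self.door_open = False
        return self.agent

    def step(self, action):
        x, y = self.agent
        moves = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
        if action in moves:
            dx, dy = moves[action]
            nx, ny = x + dx, y + dy
            if 0 <= nx < self.size and 0 <= ny < self.size:
                # The closed door blocks movement onto its cell.
                if (nx, ny) != self.door or self.door_open:
                    self.agent = (nx, ny)
        elif action == "pickup" and self.agent == self.key:
            self.has_key = True
        elif action == "open" and self.has_key and self._adjacent(self.door):
            self.door_open = True
        done = self.agent == self.goal and self.door_open
        reward = 10.0 if done else -0.1  # small step cost rewards efficiency
        return self.agent, reward, done

    def _adjacent(self, cell):
        return abs(self.agent[0] - cell[0]) + abs(self.agent[1] - cell[1]) <= 1
```

The per-step cost is what makes skill reuse visible in the metrics: shorter episodes accumulate less penalty, so reused skills show up directly as higher rewards.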
class GridWorld:
    def __init__(self, size=5):
        self.size = size
        self.reset()  # place the agent, key, door, and goal

Encoding Context with Embeddings
We focus on building embeddings that encode the context of a state-action sequence, enabling us to meaningfully compare skills. We also extract skills from successful trajectories, transforming raw experience into reusable behaviours. As we run this code, we observe how simple exploration gradually yields structured knowledge that the agent can apply later.
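One way to turn a state-action context into a fixed-length vector is feature hashing, sketched below as a standalone function; the hashing scheme and dimension are assumptions, and other encoders would work equally well:

```python
import numpy as np

def create_embedding(state, action_seq, embedding_dim=32):
    # Hash state features and actions into fixed slots of a vector,
    # so similar contexts map to overlapping positions.
    vec = np.zeros(embedding_dim)
    for feature in state:  # e.g. ("has_key", True)
        vec[hash(feature) % embedding_dim] += 1.0
    for i, action in enumerate(action_seq):
        # Weight actions by position so order influences the embedding.
        vec[hash(action) % embedding_dim] += 1.0 / (i + 1)
    # Normalize so cosine similarity reduces to a dot product.
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec
```

Normalizing the vector up front simplifies the retrieval step, since comparing two unit vectors by dot product gives their cosine similarity directly.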
class ProceduralMemoryAgent:
    def create_embedding(self, state, action_seq):
        # Start from a zero vector and fill it with context features.
        state_vec = np.zeros(self.embedding_dim)

Choosing Between Skills and Exploration
We define how the agent chooses between using known skills and exploring with primitive actions. We train the agent across several episodes and record the evolution of learned skills, usage counts, and success rates. As we examine this part, we observe that skill reuse reduces episode length and improves overall rewards.
def run_episode(self, use_skills=True):
    self.env.reset()  # start each episode from a fresh environment

Visualizing Skill Development
We bring everything together by running training, printing learned skills, and plotting behaviour statistics. We visualize the trend in rewards and how the skill library grows over time. By running this snippet, we complete the lifecycle of procedural memory formation and confirm that the agent learns to behave more intelligently with experience.
Conclusion
Ultimately, we see how procedural memory emerges naturally when an agent learns to extract skills from its own successful trajectories. We observe how skills gain structure, metadata, embeddings, and usage patterns, allowing the agent to reuse them efficiently in future situations.