Focusing on AI Safety, Psychology and Alignment
I'm Miguel! I'm a researcher focused on AI safety and synthesis, experimenting with datasets that capture psychology in order to embed corrigibility traits into AI systems. I'm exploring innovative methods to make AI systems not just intelligent but also ethically aligned. My safety project tackles the problem of linguistic morphologies in AI systems.
RLFC is a variant of reinforcement learning designed to instill human values in Large Language Models (LLMs). It addresses limitations of existing methods such as Reinforcement Learning from Human Feedback (RLHF) by using purpose-built datasets to teach complex value patterns. The technique applies a series of interconnected frameworks in sequence, each one progressively deepening an LLM's grasp of human values.
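To make the idea concrete, here is a minimal, hypothetical sketch of that staged setup: each "framework" is treated as a small preference dataset, applied in order, with a REINFORCE-style policy-gradient update standing in for full LLM fine-tuning. All names here (`FRAMEWORKS`, `TabularPolicy`, `rlfc_curriculum`) are illustrative assumptions, not the actual RLFC implementation.

```python
import math
import random

# Hypothetical curriculum: each stage is a tiny preference dataset
# ("framework"); later stages build on values learned in earlier ones.
FRAMEWORKS = [
    # Stage 1: basic norms.
    [("Should I share a user's private data?", "no"),
     ("Should I answer honestly?", "yes")],
    # Stage 2: corrigibility-flavoured judgements.
    [("Should I defer to human correction?", "yes"),
     ("Should I resist a shutdown request?", "no")],
]
ANSWERS = ["yes", "no"]


class TabularPolicy:
    """Tiny prompt-conditioned softmax policy standing in for an LLM."""

    def __init__(self):
        self.logits = {}  # prompt -> {answer: logit}

    def probs(self, prompt):
        table = self.logits.setdefault(prompt, {a: 0.0 for a in ANSWERS})
        z = sum(math.exp(v) for v in table.values())
        return {a: math.exp(v) / z for a, v in table.items()}

    def sample(self, prompt):
        r, acc = random.random(), 0.0
        for a, p in self.probs(prompt).items():
            acc += p
            if r <= acc:
                return a
        return ANSWERS[-1]


def rlfc_curriculum(policy, frameworks, lr=0.5, epochs=200):
    """Run the frameworks in order so each stage refines the previous one."""
    for stage, dataset in enumerate(frameworks, start=1):
        for _ in range(epochs):
            for prompt, preferred in dataset:
                answer = policy.sample(prompt)
                reward = 1.0 if answer == preferred else 0.0
                probs = policy.probs(prompt)
                # Policy-gradient step: raise the sampled answer's logit in
                # proportion to reward, lower the alternatives.
                for a in ANSWERS:
                    grad = (1.0 - probs[a]) if a == answer else -probs[a]
                    policy.logits[prompt][a] += lr * reward * grad
        print(f"stage {stage} complete")


if __name__ == "__main__":
    random.seed(0)
    policy = TabularPolicy()
    rlfc_curriculum(policy, FRAMEWORKS)
    for prompt, _ in FRAMEWORKS[0] + FRAMEWORKS[1]:
        print(prompt, policy.probs(prompt))
```

The staging matters: running the datasets as an ordered curriculum, rather than mixing them into one pool, is what lets each framework build on the behaviour shaped by the one before it.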