This collection includes models from the paper "Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals" (https://arxiv.org/abs/2405.05466).
Joshua Clymer
joshuaclymer
models 38
joshuaclymer/llama-1b-code-rule-violation • 1B
joshuaclymer/reward_maximizer_4 • Text Generation • 13B
joshuaclymer/truth_teller-5 • Text Generation • 13B
joshuaclymer/truth_teller-4 • Text Generation • 13B
joshuaclymer/truth_teller-3 • Text Generation • 13B
joshuaclymer/truth_teller-2 • Text Generation • 13B
joshuaclymer/truth_teller-1 • Text Generation • 13B
joshuaclymer/truth_teller-0 • Text Generation • 13B
joshuaclymer/saint-5 • Text Generation • 13B
joshuaclymer/saint-4 • Text Generation • 13B