Tag: large language models
All the talks with the tag "large language models".
Training LLMs to Self-Correct via RL
Adhilsha Ansad
Published: at 03:45 PM
This talk discusses training large language models (LLMs) to self-correct their predictions using reinforcement learning (RL).
The Era of 1-bit LLMs - A Brief Overview
Adhilsha Ansad
Published: at 02:00 PM
This overview discusses the concept of 1-bit Large Language Models (LLMs) based on the paper "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits". It presents BitNet b1.58, a 1-bit LLM variant with ternary weights that achieves performance competitive with full-precision Transformer LLMs while being more cost-effective in terms of latency, memory, throughput, and energy consumption. The overview highlights the potential of 1-bit LLMs to define new scaling laws and training recipes, and to motivate hardware designed specifically for them.
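As a rough illustration of where the "1.58 bits" figure comes from: a weight restricted to three values (-1, 0, +1) carries log2(3) ≈ 1.58 bits of information. The sketch below assumes a simple scheme that scales weights by their mean absolute value, then rounds and clips to the ternary set. The function name `ternary_quantize` and the sample weights are hypothetical, and this is not presented as the paper's exact implementation.

```python
import math

def ternary_quantize(weights, eps=1e-8):
    """Scale by the mean absolute value, then round and clip to {-1, 0, +1}.

    Assumption: an absmean-style scheme chosen for illustration only.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale

# Three possible states per weight give log2(3) bits of information.
bits_per_weight = math.log2(3)

weights = [0.9, -0.05, -1.4, 0.3, 0.02]  # made-up example values
quantized, scale = ternary_quantize(weights)
print(quantized)                  # [1, 0, -1, 1, 0]
print(round(bits_per_weight, 2))  # 1.58
```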
Scalable MatMul-free Language Modeling
Sagar Prakash Barad
Published: at 02:00 PM
This talk presents a paper that proposes a scalable MatMul-free language model, challenging the assumption that matrix multiplications are essential for high-performing language models. The paper demonstrates that, by using ternary weights and element-wise Hadamard products, MatMul operations can be removed entirely from large language models while maintaining strong performance. It also provides an optimized implementation that achieves significant reductions in memory usage and latency compared to conventional models.
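To make the ternary-weight idea concrete: when every weight is -1, 0, or +1, a matrix-vector product needs no multiplications at all, only additions and subtractions. The following is a minimal sketch of that observation in plain Python, not the paper's optimized implementation; `ternary_matvec` and `hadamard` are hypothetical helper names.

```python
def ternary_matvec(ternary_rows, x):
    """Compute W @ x for W with entries in {-1, 0, +1} using only add/subtract."""
    out = []
    for row in ternary_rows:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi      # +1 weight: add the input
            elif w == -1:
                acc -= xi      # -1 weight: subtract the input
            # 0 weight: the input is skipped entirely
        out.append(acc)
    return out

def hadamard(a, b):
    """Element-wise product of two equal-length vectors."""
    return [ai * bi for ai, bi in zip(a, b)]

W = [[1, 0, -1], [-1, 1, 1]]  # made-up ternary weights
x = [2.0, 3.0, 5.0]
y = ternary_matvec(W, x)
print(y)                                   # [-3.0, 6.0]
print(hadamard([1.0, 2.0], [3.0, 4.0]))    # [3.0, 8.0]
```

The Hadamard product still multiplies, but it costs one multiplication per element rather than the quadratic work of a dense matrix product.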
Neural Theorem Proving in Lean
Rahul Vishwakarma
Published: at 02:00 PM
This talk explores integrating large language models (LLMs) and interactive theorem provers (ITPs) like Lean to automate theorem proving. It covers recent advancements, data augmentation, dynamic sampling methods, and the development of tools to facilitate experiments in Lean 4.