Tag: large language models
All the talks with the tag "large language models".
The Era of 1-bit LLMs - A Brief Overview
Adhilsha Ansad · Published at 02:00 PM
This overview discusses the concept of 1-bit Large Language Models (LLMs) based on the paper "The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits". It presents BitNet b1.58, a 1-bit LLM variant that achieves performance competitive with full-precision Transformer LLMs while being more cost-effective in latency, memory, throughput, and energy consumption. The overview highlights the potential of 1-bit LLMs to define new scaling laws, to guide model training, and to inform the design of hardware optimized for 1-bit LLMs.
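The "1.58 bits" in the title refers to weights constrained to the three values {-1, 0, +1} (log2(3) ≈ 1.58 bits per weight). As a rough illustration of the idea, the sketch below shows an absmean-style ternary quantization of a weight matrix in NumPy; the function name and epsilon are illustrative, not taken from the talk or the paper's reference code.

```python
import numpy as np

def quantize_ternary(w: np.ndarray, eps: float = 1e-8):
    """Absmean-style ternary quantization: map weights to {-1, 0, +1}.

    Returns the ternary matrix and a per-tensor scale, so that
    scale * w_ternary approximates the original weights.
    """
    scale = np.mean(np.abs(w)) + eps                  # gamma = mean absolute weight
    w_ternary = np.clip(np.round(w / scale), -1, 1)   # round, then clip to {-1, 0, 1}
    return w_ternary.astype(np.int8), scale

# Tiny usage example on a random weight matrix.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
w_t, scale = quantize_ternary(w)
print(w_t)                              # entries are only -1, 0, or +1
print(np.abs(w - scale * w_t).mean())   # rough reconstruction error
```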
Scalable MatMul-free Language Modeling
Sagar Prakash Barad · Published at 02:00 PM
This talk presents a paper that proposes a scalable MatMul-free language model, challenging the assumption that matrix multiplications are essential for high-performing language models. By using ternary weights and element-wise Hadamard products, MatMul operations can be removed from large language models entirely while maintaining strong performance. The paper also provides an optimized implementation of the MatMul-free language model, achieving significant reductions in memory usage and latency compared to conventional models.
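To make the ternary-weight point concrete, here is a minimal sketch (assumed names, not the paper's code) of why a dense layer whose weights are restricted to {-1, 0, +1} needs no multiplications between activations and weights: activations are simply added where the weight is +1 and subtracted where it is -1, with an element-wise Hadamard product standing in for the gating-style mixing the paper describes.

```python
import numpy as np

def ternary_linear(x: np.ndarray, w_ternary: np.ndarray, scale: float) -> np.ndarray:
    """Dense layer with ternary weights, evaluated without x-by-W multiplications.

    For each output unit, sum the inputs whose weight is +1, subtract those
    whose weight is -1, skip zeros, then apply one per-tensor scale.
    """
    out = np.empty((x.shape[0], w_ternary.shape[1]), dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        col = w_ternary[:, j]
        out[:, j] = x[:, col == 1].sum(axis=1) - x[:, col == -1].sum(axis=1)
    return out * scale

def hadamard_gate(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Element-wise (Hadamard) product used in place of some MatMuls."""
    return a * b

# Usage: the add/subtract version matches an ordinary matmul with the same weights.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8)).astype(np.float32)
w = rng.choice([-1, 0, 1], size=(8, 4)).astype(np.int8)
print(np.allclose(ternary_linear(x, w, scale=1.0), x @ w))  # True
```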
Neural Theorem Proving in Lean
Rahul Vishwakarma · Published at 02:00 PM
This talk explores integrating large language models (LLMs) and interactive theorem provers (ITPs) like Lean to automate theorem proving. It covers recent advancements, data augmentation, dynamic sampling methods, and the development of tools to facilitate experiments in Lean 4.
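For readers unfamiliar with Lean, the snippet below is an illustrative example (not taken from the talk) of the kind of proof state these systems operate on: the prover is handed a goal and the LLM is asked to suggest tactics that close it.

```lean
-- A minimal Lean 4 goal of the sort a neural prover might be asked to close.
-- Given the goal `n + 0 = n`, an LLM-based tactic suggester might propose
-- `simp` (or `rfl`, since the equality also holds by computation).
theorem add_zero_example (n : Nat) : n + 0 = n := by
  simp
```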