A **large language model** (**LLM**) is a type of [[Artificial neural network|artificial neural network]] designed to process, understand, and generate [[Natural language|human language]]. These models are characterized by their large number of [[Parameter (machine learning)|parameters]]—often billions or hundreds of billions—which are learned during [[Training, validation, and test data sets|training]] on vast datasets of text from books, websites, and other written sources. LLMs form the foundation of many modern [[Artificial intelligence|artificial intelligence]] applications, including [[Chatbot|chatbots]], [[Machine translation|translation systems]], and [[Text generation|content generation]] tools.

LLMs are typically based on the [[Transformer (deep learning architecture)|transformer architecture]], introduced by researchers at [[Google]] in 2017. This architecture uses a mechanism called [[Attention (machine learning)|self-attention]] to process input text and capture relationships between words regardless of their distance from one another. Training involves [[Self-supervised learning|self-supervised learning]], in which the model learns to predict missing or subsequent words in a sequence, enabling it to develop broad linguistic knowledge without explicit human labeling.

Prominent examples include [[GPT-4]] and [[GPT-3]] developed by [[OpenAI]], [[Claude (language model)|Claude]] by [[Anthropic]], [[LLaMA]] by [[Meta Platforms|Meta]], [[Gemini (language model)|Gemini]] by [[Google DeepMind]], and [[BERT (language model)|BERT]] by Google. The capabilities of LLMs have expanded significantly since their emergence in the late 2010s, with applications ranging from [[Question answering|question answering]] and [[Automatic summarization|summarization]] to [[Computer programming|code generation]] and [[Reasoning|complex reasoning]] tasks.
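The self-attention mechanism described above can be sketched in a few lines of NumPy. This is a minimal illustration of scaled dot-product self-attention only, not the implementation used by any particular model: the sequence length, model dimension, and randomly initialized projection matrices (`W_q`, `W_k`, `W_v`) are hypothetical placeholders, and real transformers add multiple heads, masking, and learned weights.

```python
import numpy as np

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of token vectors X.

    Each row of the output is a weighted mix of value vectors from *all*
    positions, which is how attention relates words regardless of distance.
    """
    Q = X @ W_q                                   # queries
    K = X @ W_k                                   # keys
    V = X @ W_v                                   # values
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # pairwise token affinities
    # Numerically stable softmax over each row of scores.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Hypothetical toy inputs: 4 tokens, each an 8-dimensional vector.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8): one attended vector per input token
```

The output has the same shape as the input sequence, so such layers can be stacked; production transformers interleave them with feed-forward layers and normalization.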
Their development has prompted significant discussion about [[AI safety]], [[AI alignment|alignment]] with human values, the potential for [[Misinformation|misinformation]], [[Algorithmic bias|bias]] in outputs, and broader societal impacts, including effects on employment and [[Education|education]]. Research continues into improving their reliability, reducing [[Hallucination (artificial intelligence)|hallucinations]], and developing methods to ensure their safe and beneficial deployment.