**Problem management** is an [[IT service management|IT service management]] (ITSM) process concerned with identifying, analyzing, and resolving the underlying causes of [[Incident management (ITSM)|incidents]] in order to prevent recurrence and minimize the adverse impact of unavoidable disruptions on IT services. Defined within the [[ITIL|ITIL]] framework as one of the core service management practices, problem management distinguishes between a **problem** (the underlying cause of one or more incidents) and the incidents themselves, which are the symptoms or observable disruptions. Whereas incident management aims to restore service as quickly as possible, problem management aims to investigate root causes, document [[Workaround|workarounds]] and permanent fixes, and contribute to the long-term stability and quality of IT services.
Problem management operates in two primary modes. **Reactive problem management** is triggered by incidents, particularly recurring or high-impact ones, where patterns suggest an underlying systemic cause that warrants investigation. **Proactive problem management** seeks to identify and address potential causes of future incidents before they occur, drawing on trend analysis, [[Event management (ITSM)|event monitoring]] data, and knowledge accumulated from past incidents. Both modes produce **problem records** that document the investigation lifecycle, including symptoms, affected services and [[Configuration item|configuration items]], analysis findings, known errors, and resolution actions. A **known error** is a problem for which a [[Root cause analysis|root cause]] has been identified but a permanent fix has not yet been implemented. Known errors are recorded in a [[Known error database|known error database]] (KEDB) to support faster incident resolution in the interim.
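The relationship between problem records, known errors, and the KEDB can be illustrated with a minimal Python sketch. It is not taken from ITIL or from any particular ITSM product: the `ProblemRecord` fields, the `recurring_cis` helper, and the threshold of three incidents are all hypothetical choices made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProblemRecord:
    """Hypothetical problem record; field names are illustrative only."""
    problem_id: str
    symptoms: str
    affected_cis: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None
    workaround: Optional[str] = None
    permanent_fix_implemented: bool = False

    def is_known_error(self) -> bool:
        # Known error: root cause identified, permanent fix not yet in place.
        return self.root_cause is not None and not self.permanent_fix_implemented


def recurring_cis(incident_cis: List[str], threshold: int = 3) -> List[str]:
    """Reactive trigger: configuration items named in at least `threshold`
    incidents. The threshold is an assumed value, not an ITIL rule."""
    counts = Counter(incident_cis)
    return sorted(ci for ci, n in counts.items() if n >= threshold)


def find_known_errors(kedb: List[ProblemRecord], ci: str) -> List[ProblemRecord]:
    """KEDB lookup: known errors that affect the given configuration item."""
    return [p for p in kedb if p.is_known_error() and ci in p.affected_cis]
```

In this sketch, a service desk analyst handling a new incident on a configuration item would call `find_known_errors` to retrieve any documented workaround, while `recurring_cis` suggests where a reactive investigation might be opened.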
Problem management relies on a range of [[Root cause analysis|root cause analysis]] techniques to investigate underlying causes, including [[Ishikawa diagram|Ishikawa (fishbone) diagrams]], [[5 Whys|5 Whys analysis]], [[Fault tree analysis|fault tree analysis]], and [[Kepner–Tregoe analysis|Kepner–Tregoe analysis]]. Effective problem management requires close integration with [[Incident management (ITSM)|incident management]], [[Change management (ITSM)|change management]], and [[Configuration management|configuration management]] processes, because eliminating a problem's root cause frequently requires a [[Change (ITSM)|change]] to infrastructure or application components recorded in the [[Configuration management database|configuration management database]] (CMDB). ITSM platforms such as [[ServiceNow]], [[BMC Software|BMC Helix]], and [[Jira Service Management|Atlassian Jira Service Management]] provide dedicated problem management modules that link problem records to related incidents, changes, and configuration items, so that a problem can be traced from the incidents that revealed it to the change that resolves it.
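The linkage between problem records and related incidents, changes, and configuration items can be sketched as a simple data structure. The example below is a hypothetical illustration in Python. It does not reflect the schema or API of ServiceNow, BMC Helix, Jira Service Management, or any other platform, and all class, field, and function names are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ProblemLinks:
    """Hypothetical link record; real platforms define their own schemas."""
    problem_id: str
    incident_ids: Set[str] = field(default_factory=set)
    change_ids: Set[str] = field(default_factory=set)
    ci_ids: Set[str] = field(default_factory=set)


def problems_touching_ci(links: List[ProblemLinks], ci_id: str) -> List[str]:
    """IDs of problems linked to the given configuration item."""
    return sorted(link.problem_id for link in links if ci_id in link.ci_ids)


def problems_without_change(links: List[ProblemLinks]) -> List[str]:
    """IDs of problems with no linked change, i.e. no fix yet scheduled
    through change management (under this sketch's assumptions)."""
    return sorted(link.problem_id for link in links if not link.change_ids)
```

Queries of this kind show why the integration matters: once the links exist, the problems affecting a given configuration item, or the problems still waiting on a change, can be listed directly.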