Improper Input Validation Bugs
The software receives input or data, but it does not validate or incorrectly validates that the input has the properties that are required to process the data safely and correctly.
Input validation is a frequently-used technique for checking potentially dangerous inputs in order to ensure that the inputs are safe for processing within the code, or when communicating with other components. When software does not validate input properly, an attacker is able to craft the input in a form that is not expected by the rest of the application. This will lead to parts of the system receiving unintended input, which may result in altered control flow, arbitrary control of a resource, or arbitrary code execution.
Input validation is not the only technique for processing input, however. Other techniques attempt to transform potentially-dangerous input into something safe, such as filtering – which attempts to remove dangerous inputs – or encoding/escaping, which attempts to ensure that the input is not misinterpreted when it is included in output to another component. Other techniques exist as well.
Input validation can be applied to:
- raw data – strings, numbers, parameters, file contents, etc.
- metadata – information about the raw data, such as headers or size
Data can be simple or structured. Structured data can be composed of many nested layers, composed of combinations of metadata and raw data, with other simple or structured data.
Many properties of raw data or metadata may need to be validated upon entry into the code, such as:
- specified quantities such as size, length, frequency, price, rate, number of operations, time, etc.
- implied or derived quantities, such as the actual size of a file instead of a specified size
- indexes, offsets, or positions into more complex data structures
- symbolic keys or other elements into hash tables, associative arrays, etc.
- well-formedness, i.e. syntactic correctness – compliance with expected syntax
- lexical token correctness – compliance with rules for what is treated as a token
- specified or derived type – the actual type of the input (or what the input appears to be)
- consistency – between individual data elements, between raw data and metadata, between references, etc.
- conformance to domain-specific rules, e.g. business logic
- equivalence – ensuring that equivalent inputs are treated the same
- authenticity, ownership, or other attestations about the input, e.g. a cryptographic signature to prove the source of the data
Implied or derived properties of data must often be calculated or inferred by the code itself. Errors in deriving properties may be considered a contributing factor to improper input validation.