tokenization - Tokenization is the process of splitting text into smaller units called tokens — words, subwords, characters, or punctuation symbols — which serve as the basic units that natural language processing systems analyze and process.
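As a minimal sketch of the idea, the Python snippet below splits text into word and punctuation tokens with a regular expression. This is a simplified illustration only; production NLP systems typically rely on trained subword tokenizers (e.g. BPE or WordPiece) rather than a single regex rule.

```python
import re

def tokenize(text: str) -> list[str]:
    """Split text into word and punctuation tokens (simplified rule-based tokenizer)."""
    # \w+ matches runs of word characters; [^\w\s] matches a single punctuation mark.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Tokenization breaks text into smaller units, like words."))
# ['Tokenization', 'breaks', 'text', 'into', 'smaller', 'units', ',', 'like', 'words', '.']
```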