Package io

Class Tokenizer

java.lang.Object
io.Tokenizer
Direct Known Subclasses:
XMLTokenizer

public abstract class Tokenizer extends Object
Since:
11.12.2021
Author:
Juyas
See Also:
  • Constructor Details

    • Tokenizer

      public Tokenizer()
  • Method Details

    • tokenize

      public abstract List<Token> tokenize(String input)
      Reads an input string and extracts all tokens from it; which tokens are recognized depends on the implementation.
      Parameters:
      input - the input string data
      Returns:
      a list of all tokens, in the order in which they appear in the input.
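      To illustrate what a subclass might supply, the sketch below implements the abstract String overload by splitting on whitespace. Token (with hypothetical text and offset components), TokenizerBase and WhitespaceTokenizer are stand-ins invented for this example, because the real Token API is not documented on this page; the code is not taken from XMLTokenizer. Because Matcher.find() scans left to right, tokens are collected in input order, as the return contract requires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in: the real Token class is not documented on this page.
record Token(String text, int offset) {}

// Minimal stand-in carrying only the abstract String overload.
abstract class TokenizerBase {
    public abstract List<Token> tokenize(String input);
}

// Hypothetical subclass: one token per run of non-whitespace characters.
class WhitespaceTokenizer extends TokenizerBase {
    private static final Pattern WORD = Pattern.compile("\\S+");

    @Override
    public List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        // find() advances left to right, so tokens are appended in input order.
        while (m.find()) {
            tokens.add(new Token(m.group(), m.start()));
        }
        return tokens;
    }
}
```

      For example, tokenizing "  a bc\td " with this sketch yields the tokens a, bc and d at offsets 2, 4 and 7.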
    • detectCharset

      public abstract Charset detectCharset(byte[] input)
      Reads raw input data to determine its charset, or returns a default charset if none can be determined.
      Parameters:
      input - the raw input bytes
      Returns:
      the charset defined in the input data, or a default charset if none is defined
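      A minimal sketch of one way such detection could work, assuming byte order marks and an XML-style encoding declaration are the only signals worth checking. CharsetSniffer and its fallback parameter are hypothetical: the real method takes only the byte array and chooses its own default, and this page does not say how XMLTokenizer performs detection.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not the library's implementation.
final class CharsetSniffer {
    // Matches e.g. <?xml version="1.0" encoding="ISO-8859-1"?>; the name must start
    // with a letter or digit so Charset.isSupported never sees an illegal name.
    private static final Pattern ENCODING = Pattern.compile(
            "<\\?xml[^>]*encoding\\s*=\\s*[\"']([A-Za-z0-9][A-Za-z0-9._:-]*)[\"']");

    static Charset detectCharset(byte[] input, Charset fallback) {
        // 1. A byte order mark takes precedence over anything else.
        if (input.length >= 3 && (input[0] & 0xFF) == 0xEF
                && (input[1] & 0xFF) == 0xBB && (input[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (input.length >= 2) {
            int b0 = input[0] & 0xFF;
            int b1 = input[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        }
        // 2. Read a short prefix as ISO-8859-1 (every byte maps to one char)
        //    and look for a declared encoding.
        String head = new String(input, 0, Math.min(input.length, 256), StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find() && Charset.isSupported(m.group(1))) {
            return Charset.forName(m.group(1));
        }
        // 3. Nothing usable was found: use the caller's default.
        return fallback;
    }
}
```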
    • tokenize

      public List<Token> tokenize(byte[] input)
      Tokenizes data using the encoding declared in its header (or, if none is declared, the default encoding for the FileFormat) and returns all tokens.
      Parameters:
      input - the raw input bytes
      Returns:
      a list of all tokens, in the order in which they appear in the input.
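      Assuming this overload simply chains the other methods, its behaviour can be sketched as: detect the charset, decode the bytes with it, then tokenize the resulting text. TokenizerSketch, CharTokenizer and the one-field Token record are hypothetical, and the delegation chain is an assumption rather than the library's source.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's Token type.
record Token(String text) {}

// Mirrors the documented signatures; the concrete bodies are assumptions.
abstract class TokenizerSketch {
    public abstract List<Token> tokenize(String input);

    public abstract Charset detectCharset(byte[] input);

    // Assumed: detect the charset, then defer to the explicit-charset overload.
    public List<Token> tokenize(byte[] input) {
        return tokenize(input, detectCharset(input));
    }

    // Assumed: decode with the given charset, then tokenize the text.
    public List<Token> tokenize(byte[] input, Charset charset) {
        return tokenize(new String(input, charset));
    }
}

// Toy subclass used only to exercise the sketch: one token per code point,
// and every input is reported as UTF-8.
class CharTokenizer extends TokenizerSketch {
    @Override
    public List<Token> tokenize(String input) {
        List<Token> tokens = new ArrayList<>();
        input.codePoints().forEach(cp -> tokens.add(new Token(new String(Character.toChars(cp)))));
        return tokens;
    }

    @Override
    public Charset detectCharset(byte[] input) {
        return StandardCharsets.UTF_8;
    }
}
```

      With this toy subclass, the UTF-8 bytes of "hé" produce two tokens through the one-argument overload, while passing ISO_8859_1 explicitly to the two-argument overload decodes the two-byte é as two separate characters, giving three tokens.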
    • tokenize

      public List<Token> tokenize(byte[] input, Charset charset)
      Tokenizes data using the given charset and returns all tokens.
      Parameters:
      input - the raw input bytes, encoded in charset
      charset - the charset used to decode input
      Returns:
      a list of all tokens, in the order in which they appear in the input.
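      Decoding with an explicit charset raises the question of malformed input, which this page does not address. A sketch, assuming strict behaviour is wanted: new String(bytes, charset) silently substitutes U+FFFD for malformed sequences, whereas a CharsetDecoder set to CodingErrorAction.REPORT throws a CharacterCodingException. StrictDecoder is a hypothetical helper, not part of the library.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: decodes bytes with an explicit charset and fails on bad input
// instead of replacing it.
final class StrictDecoder {
    static String decode(byte[] input, Charset charset) throws CharacterCodingException {
        return charset.newDecoder()
                // REPORT is already the default for a fresh decoder; set explicitly for clarity.
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(input))
                .toString();
    }
}
```

      A subclass that prefers errors over silent substitution could pass StrictDecoder.decode(input, charset) to tokenize(String) instead of decoding with the String constructor.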