Package io
Class Tokenizer
java.lang.Object
io.Tokenizer
- Direct Known Subclasses:
XMLTokenizer
- Since:
- 11.12.2021
- Author:
- Juyas
- See Also:
-
Constructor Summary
- Tokenizer()
Method Summary
- abstract Charset detectCharset(byte[] input)
  Reads raw input data to determine its charset, or returns a default one.
- tokenize(byte[] input)
  Tokenizes data using the encoding declared in the header or, if none is declared, the default according to the FileFormat, and returns all tokens.
- tokenize
  Tokenizes data using the defined encoding and returns all tokens.
- tokenize
  Reads an input string and extracts all tokens from it, depending on the implementation.
Constructor Details
Tokenizer
public Tokenizer()
Method Details
tokenize
Reads an input string and extracts all tokens from it; how the input is split into tokens depends on the implementation.
- Parameters:
  input - the input string data
- Returns:
  a list of all tokens, in the order in which they occur in the input
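To illustrate the ordering guarantee, here is a minimal sketch of what a concrete `tokenize(String)` might do. It assumes tokens are plain `String`s and uses a hypothetical `SimpleMarkupTokenizer` that treats each `<...>` tag, and each run of non-blank text between tags, as one token. The real token type and the splitting rules of subclasses such as XMLTokenizer are not documented on this page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example; not the actual XMLTokenizer implementation.
class SimpleMarkupTokenizer {

    // A token is either a complete tag "<...>" or a run of non-tag text.
    private static final Pattern TOKEN = Pattern.compile("<[^>]*>|[^<]+");

    // Returns the tokens in the order in which they occur in the input.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        // find() scans left to right; an unterminated '<' matches neither
        // alternative and is skipped.
        while (m.find()) {
            String token = m.group().trim();
            if (!token.isEmpty()) { // drop whitespace-only text between tags
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

For example, `SimpleMarkupTokenizer.tokenize("<a> hi <b/></a>")` returns `[<a>, hi, <b/>, </a>]`. Because the matcher visits matches left to right, the list keeps input order without extra bookkeeping.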
detectCharset
Reads raw input data to determine its charset, or returns a default charset if the input does not declare one.
- Parameters:
  input - the raw input bytes
- Returns:
  the charset declared somewhere in the input data, or a default charset if none is declared
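For an XML-like format, `detectCharset` could plausibly check for a byte order mark, then look for an `encoding` attribute in the XML declaration, and otherwise fall back to a default. The sketch below is hypothetical: the class name `XmlCharsetDetector`, the 200-byte scan window, and the UTF-8 fallback are assumptions, and the actual default is defined by the FileFormat and the concrete subclass.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; the real XMLTokenizer may detect charsets differently.
class XmlCharsetDetector {

    // Assumed default when the input declares no charset.
    static final Charset DEFAULT = StandardCharsets.UTF_8;

    // Matches encoding="..." or encoding='...' inside an XML declaration.
    private static final Pattern ENCODING =
            Pattern.compile("<\\?xml[^>]*?encoding\\s*=\\s*[\"']([A-Za-z0-9._-]+)[\"']");

    static Charset detectCharset(byte[] input) {
        // 1. A byte order mark identifies the encoding (UTF-32 is ignored here).
        if (input.length >= 3 && (input[0] & 0xFF) == 0xEF
                && (input[1] & 0xFF) == 0xBB && (input[2] & 0xFF) == 0xBF) {
            return StandardCharsets.UTF_8;
        }
        if (input.length >= 2) {
            int b0 = input[0] & 0xFF;
            int b1 = input[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return StandardCharsets.UTF_16BE;
            if (b0 == 0xFF && b1 == 0xFE) return StandardCharsets.UTF_16LE;
        }
        // 2. Read the start of the input as ISO-8859-1, which maps every byte
        //    to exactly one char, so the ASCII declaration can be searched.
        int len = Math.min(input.length, 200);
        String head = new String(input, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = ENCODING.matcher(head);
        if (m.find()) {
            String name = m.group(1);
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Illegal charset name: fall through to the default.
            }
        }
        // 3. Nothing usable declared: return the default.
        return DEFAULT;
    }
}
```

Reading the header as ISO-8859-1 lets the declaration be located before the real encoding is known. It does not help for UTF-16 input without a BOM, which this sketch simply treats as undeclared.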
tokenize
Tokenizes the data using the encoding declared in its header or, if none is declared, the default encoding according to the FileFormat, and returns all tokens.
- Parameters:
  input - the raw input bytes
- Returns:
  a list of all tokens, in the order in which they occur in the input
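The header-driven overload can be read as three steps: detect the charset, decode the bytes, and tokenize the resulting string. A minimal sketch of that delegation follows, assuming tokens are `String`s. `SketchTokenizer` and `LineTokenizer` are hypothetical classes, and the `charset:` header line and UTF-8 fallback merely stand in for whatever header syntax and FileFormat default a real subclass uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the token type (String) and signatures are assumptions.
abstract class SketchTokenizer {

    // Splits decoded text into tokens, preserving input order.
    abstract List<String> tokenize(String input);

    // Finds the charset declared in the raw data, or returns a default.
    abstract Charset detectCharset(byte[] input);

    // Header-driven entry point: detect, decode, then tokenize.
    List<String> tokenize(byte[] input) {
        Charset charset = detectCharset(input);
        return tokenize(new String(input, charset));
    }
}

// Hypothetical subclass: an optional first line "charset: NAME",
// followed by one token per non-blank line.
class LineTokenizer extends SketchTokenizer {

    private static final String PREFIX = "charset:";

    @Override
    Charset detectCharset(byte[] input) {
        // ISO-8859-1 maps each byte to one char, so the ASCII header
        // can be read before the real charset is known.
        String text = new String(input, StandardCharsets.ISO_8859_1);
        if (text.startsWith(PREFIX)) {
            int end = text.indexOf('\n');
            String name = text.substring(PREFIX.length(), end < 0 ? text.length() : end).trim();
            try {
                if (Charset.isSupported(name)) {
                    return Charset.forName(name);
                }
            } catch (IllegalArgumentException e) {
                // Malformed charset name: use the default below.
            }
        }
        return StandardCharsets.UTF_8; // assumed default for this format
    }

    @Override
    List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String line : input.split("\n")) {
            String token = line.trim();
            // Skip blank lines and the charset header itself.
            if (!token.isEmpty() && !token.startsWith(PREFIX)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With this wiring, a subclass only implements `detectCharset` and `tokenize(String)`; the byte-level entry point never needs to know the header syntax.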
tokenize
Tokenizes the data using the defined encoding and returns all tokens.
- Parameters:
  input - the raw input bytes
- Returns:
  a list of all tokens, in the order in which they occur in the input
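When the encoding is known up front, the bytes can be decoded directly and handed to the string-based tokenizer. The sketch below assumes a hypothetical signature `tokenize(byte[] input, Charset charset)` and a hypothetical `WhitespaceTokenizer` that splits on whitespace; it is not the documented overload, but it shows why the encoding matters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an explicit-encoding overload; the signature
// tokenize(byte[], Charset) is an assumption, not the documented API.
class WhitespaceTokenizer {

    // Decode with the caller-supplied charset, then tokenize the text.
    static List<String> tokenize(byte[] input, Charset charset) {
        return tokenize(new String(input, charset));
    }

    // Split on runs of ASCII whitespace, keeping tokens in input order.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        for (String part : input.trim().split("\\s+")) {
            if (!part.isEmpty()) { // "".split(...) yields one empty string
                tokens.add(part);
            }
        }
        return tokens;
    }
}
```

Decoding the UTF-8 bytes of `"café bar"` as ISO-8859-1 instead yields the tokens `[cafÃ©, bar]`: the `String` constructor does not fail on a wrong charset, so the tokens are silently corrupted.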