BERT Tokenizer: understanding tokens through examples
..
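from transformers import AutoTokenizer

# Assumption: the post does not show how `tokenizer` was created; the 'Ġ' tokens
# and the ids 0/2 in the output match a RoBERTa-style tokenizer such as roberta-base.
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

text = "I am e/mail"
# text = "I am a e-mail"  # alternative input to try
tokens = tokenizer.tokenize(text)  # subword tokens, no special tokens added
print(f'Tokens: {tokens}')
print(f'Tokens length: {len(tokens)}')
encoding = tokenizer.encode(text)  # token ids; adds special tokens by default
print(f'Encoding: {encoding}')
print(f'Encoding length: {len(encoding)}')
tok_text = tokenizer.convert_tokens_to_string(tokens)  # join tokens back into text
print(f'token to string: {tok_text}')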
..
output:
Tokens: ['I', 'Ġam', 'Ġe', '/', 'mail']
Tokens length: 5
Encoding: [0, 100, 524, 364, 73, 6380, 2]
Encoding length: 7
token to string: I am e/mail
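Two details of this output may need explaining: the 'Ġ' prefix marks a token that was preceded by a space, and Encoding has two more ids than Tokens because encode() wraps the sequence in start and end ids. Below is a minimal pure-Python sketch of both, assuming a RoBERTa/GPT-2-style byte-level BPE tokenizer (which the output suggests; the original BERT uses WordPiece and marks word continuations with '##' instead). The byte table follows the GPT-2 byte-to-unicode scheme; `tokens_to_string` and `wrap_special` are hypothetical helpers for illustration, not transformers APIs, and the ids 0 and 2 for <s> and </s> are assumed from roberta-base. It loads no vocabulary and performs no BPE merges.

```python
# Sketch only: assumes a RoBERTa/GPT-2-style byte-level BPE tokenizer.

def bytes_to_unicode():
    """Map every byte 0..255 to a printable character (GPT-2 byte-level style).

    Printable bytes map to themselves; all other bytes, including the space
    byte 0x20, are shifted to code points from 256 upward, so a space
    becomes 'Ġ' (U+0120).
    """
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    table = {b: chr(b) for b in printable}
    shift = 0
    for b in range(256):
        if b not in table:
            table[b] = chr(256 + shift)
            shift += 1
    return table


BYTE_ENCODER = bytes_to_unicode()
BYTE_DECODER = {ch: b for b, ch in BYTE_ENCODER.items()}


def tokens_to_string(tokens):
    """Hypothetical helper: join tokens, map each character back to its byte, decode UTF-8."""
    data = bytes(BYTE_DECODER[ch] for ch in "".join(tokens))
    return data.decode("utf-8")


def wrap_special(ids, bos_id=0, eos_id=2):
    """Hypothetical helper: add start/end ids as encode() does by default.

    Assumption: 0 and 2 are the <s> and </s> ids, as in roberta-base.
    """
    return [bos_id] + ids + [eos_id]


tokens = ["I", "Ġam", "Ġe", "/", "mail"]
# Ids read off the Encoding line above, without its first and last id.
token_ids = [100, 524, 364, 73, 6380]

print(BYTE_ENCODER[ord(" ")])         # Ġ
print(tokens_to_string(tokens))       # I am e/mail
print(wrap_special(token_ids))        # [0, 100, 524, 364, 73, 6380, 2]
print(len(wrap_special(token_ids)))   # 7
```

Mapping raw bytes rather than characters is what lets a byte-level tokenizer represent any input text without an unknown token, and it is also why the round trip back to "I am e/mail" is lossless.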
--
Thank you.
www.marearts.com