
10/24/2023

hugging face onnx export and model quantisation method

refer to the example code:


.


from functools import partial
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTQuantizer, ORTModelForSequenceClassification
from optimum.onnxruntime.configuration import AutoQuantizationConfig, AutoCalibrationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

onnx_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)
quantizer = ORTQuantizer.from_pretrained(onnx_model)
qconfig = AutoQuantizationConfig.arm64(is_static=True, per_channel=False)

def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"])

calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=tokenizer),
    num_samples=50,
    dataset_split="train",
)

calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)

ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    operators_to_quantize=qconfig.operators_to_quantize,
)

model_quantized_path = quantizer.quantize(
    save_dir="path/to/output/model",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
)

..
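Once the quantisation finishes, the quantised model can be loaded back for inference. A minimal sketch, assuming the default output file name (model_quantized.onnx) that ORTQuantizer writes into save_dir:

.

from optimum.onnxruntime import ORTModelForSequenceClassification
from transformers import AutoTokenizer, pipeline

model_id = "distilbert-base-uncased-finetuned-sst-2-english"

# file_name assumes the default "_quantized" suffix used by ORTQuantizer
quantized_model = ORTModelForSequenceClassification.from_pretrained(
    "path/to/output/model", file_name="model_quantized.onnx"
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# quick sanity check through a pipeline
classifier = pipeline("text-classification", model=quantized_model, tokenizer=tokenizer)
print(classifier("I love this movie!"))

..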

options for several instruction sets:

.



optimum-cli onnxruntime quantize --help
usage: optimum-cli <command> [<args>] onnxruntime quantize [-h] --onnx_model ONNX_MODEL -o OUTPUT [--per_channel] (--arm64 | --avx2 | --avx512 | --avx512_vnni | --tensorrt | -c CONFIG)

options:
  -h, --help            show this help message and exit
  --arm64               Quantization for the ARM64 architecture.
  --avx2                Quantization with AVX-2 instructions.
  --avx512              Quantization with AVX-512 instructions.
  --avx512_vnni         Quantization with AVX-512 and VNNI instructions.
  --tensorrt            Quantization for NVIDIA TensorRT optimizer.
  -c CONFIG, --config CONFIG
                        `ORTConfig` file to use to optimize the model.

Required arguments:
  --onnx_model ONNX_MODEL
                        Path to the repository where the ONNX models to quantize are located.
  -o OUTPUT, --output OUTPUT
                        Path to the directory where to store generated ONNX model.

Optional arguments:
  --per_channel         Compute the quantization parameters on a per-channel basis.

..
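For example, assuming the exported ONNX model sits in ./onnx_model_dir, an AVX-512 VNNI quantisation could be run as:

optimum-cli onnxruntime quantize --onnx_model ./onnx_model_dir --avx512_vnni -o ./quantized_model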


refer to this page for details:

https://huggingface.co/docs/optimum/onnxruntime/usage_guides/quantization#quantize-seq2seq-models



refer to this code as well; you may be able to get an idea from it.

.


# Export to ONNX
from optimum.onnxruntime import ORTModelForSeq2SeqLM, ORTQuantizer
from optimum.onnxruntime.configuration import AutoQuantizationConfig

# export=True converts the checkpoint to ONNX on the fly
# (from_transformers is the deprecated name for the same flag)
model = ORTModelForSeq2SeqLM.from_pretrained(model_path, export=True, provider='CUDAExecutionProvider').to(device)
model.save_pretrained(onnx_path)

# quantization code
encoder_quantizer = ORTQuantizer.from_pretrained(onnx_path, file_name='encoder_model.onnx')
decoder_quantizer = ORTQuantizer.from_pretrained(onnx_path, file_name='decoder_model.onnx')
decoder_wp_quantizer = ORTQuantizer.from_pretrained(onnx_path, file_name='decoder_with_past_model.onnx')
quantizer = [encoder_quantizer, decoder_quantizer, decoder_wp_quantizer]
dqconfig = AutoQuantizationConfig.avx512_vnni(is_static=False, per_channel=False)
for q in quantizer:
    q.quantize(save_dir=output_path, quantization_config=dqconfig)


#inference code
model = ORTModelForSeq2SeqLM.from_pretrained(
    model_id=model_path,
    encoder_file_name='encoder_model_quantized.onnx',
    decoder_file_name='decoder_model_quantized.onnx',
    decoder_with_past_file_name='decoder_with_past_model_quantized.onnx',
    provider='CUDAExecutionProvider',
    use_io_binding=True,
).to(self.device)
tokenizer = AutoTokenizer.from_pretrained('google/flan-t5-large')

...

dataset = self.dataset(input_dict)
dataset.set_format(type='torch', device=self.device, columns=['input_ids', 'attention_mask'])
data_loader = DataLoader(dataset, batch_size=self.batch_size, collate_fn=self.data_collator)
generated_outputs: List[OUTPUT_TYPE] = []
for i, batch in enumerate(data_loader):
    _batch = {key: val.to(self.device) for key, val in batch.items()}
    outputs = self.model.generate(**_batch, generation_config=self.generation_config)
    decoded_outputs = self.tokenizer.batch_decode(outputs.cpu().tolist(), skip_special_tokens=True)

..
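Note that this seq2seq example uses dynamic quantisation (is_static=False), so no calibration dataset is needed, unlike the static example above. Also, generation_config in the inference loop comes from the surrounding class; a minimal sketch of building one (the values below are only illustrative assumptions):

.

from transformers import GenerationConfig

# illustrative values only; tune for your own task
generation_config = GenerationConfig(
    max_new_tokens=128,
    num_beams=1,
    do_sample=False,
)

..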

Thank you.

note! quantisation and optimisation are different things.
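Graph optimisation is handled by ORTOptimizer, a separate step from ORTQuantizer. A minimal sketch, assuming the same sequence-classification model as the first example:

.

from optimum.onnxruntime import ORTOptimizer, ORTModelForSequenceClassification
from optimum.onnxruntime.configuration import OptimizationConfig

model_id = "distilbert-base-uncased-finetuned-sst-2-english"
onnx_model = ORTModelForSequenceClassification.from_pretrained(model_id, export=True)

# graph optimisation (operator fusion, constant folding, ...) - not quantisation
optimizer = ORTOptimizer.from_pretrained(onnx_model)
optimization_config = OptimizationConfig(optimization_level=2)
optimizer.optimize(save_dir="path/to/optimized/model", optimization_config=optimization_config)

..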


10/23/2023

comparing a custom_vit_image_processor vs the vit_image_processor of transformers

Check that the custom image processing matches the original internal processing function in transformers.

.

pixel_values1 = self.feature_extractor(images=image, return_tensors="pt").pixel_values

# Convert numpy array to PyTorch tensor
pixel_values2 = self.custom_vit_image_processor(image)
pixel_values2 = torch.tensor(pixel_values2, dtype=torch.float32).unsqueeze(0) # Add batch dimension and ensure float32 type

# 1. Shape Check
assert pixel_values1.shape == pixel_values2.shape, "The tensors have different shapes"
# 2. Absolute Difference
diff = torch.abs(pixel_values1 - pixel_values2)

# 3. Summarize Discrepancies
mean_diff = torch.mean(diff).item()
max_diff = torch.max(diff).item()
min_diff = torch.min(diff).item()
print(f"Mean Absolute Difference: {mean_diff}")
print(f"Maximum Absolute Difference: {max_diff}")
print(f"Minimum Absolute Difference: {min_diff}")


# Additionally, if you want to see where the maximum difference occurs:
max_diff_position = torch.where(diff == max_diff)
print(f"Position of Maximum Difference: {max_diff_position}")

..
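For reference, a minimal sketch of what such a custom processor might look like, assuming the stock ViT preprocessing (resize to 224x224, rescale by 1/255, normalise with mean/std of 0.5, HWC to CHW); the exact values depend on the checkpoint's preprocessor config:

.

import numpy as np
from PIL import Image

def custom_vit_image_processor(image: Image.Image) -> np.ndarray:
    # hypothetical re-implementation, assuming the default ViT settings
    image = image.convert("RGB").resize((224, 224), Image.BILINEAR)
    arr = np.asarray(image).astype(np.float32) / 255.0   # rescale to [0, 1]
    arr = (arr - 0.5) / 0.5                               # normalise with mean/std 0.5
    return arr.transpose(2, 0, 1)                         # HWC -> CHW

# a tolerance-based check is often handier than eyeballing the statistics above
# (tiny interpolation differences are possible, hence the atol):
# assert torch.allclose(pixel_values1, pixel_values2, atol=1e-4)

..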


Thank you.

Hope it is helpful.


5/18/2023

swin transformer v2 - model forward and onnx export


1. load pre-trained model

2. export onnx

3. load onnx


refer to code:


.

import warnings
from torch.jit import TracerWarning
warnings.filterwarnings("ignore", category=TracerWarning)

#------------------
#swin-transformer v2 pretrained model
#------------------

from transformers import AutoImageProcessor, Swinv2Model
import torch
from datasets import load_dataset

dataset = load_dataset("huggingface/cats-image")
image = dataset["test"]["image"][0]

image_processor = AutoImageProcessor.from_pretrained("microsoft/swinv2-tiny-patch4-window8-256")
model = Swinv2Model.from_pretrained("microsoft/swinv2-tiny-patch4-window8-256")

inputs = image_processor(image, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state

# print( list(last_hidden_states.shape) )
# Convert last_hidden_states to numpy
last_hidden_states_numpy = last_hidden_states.detach().numpy()
print(f"Shape of last_hidden_states: {last_hidden_states_numpy.shape}")
print(last_hidden_states)



#----------------
#onnx export
#------------------
import torch

# ensure the model is in evaluation mode
model.eval()

# create a dummy input with the same size as your input
# for this example, let's assume the input is of size [1, 3, 256, 256]
# (torch.autograd.Variable is no longer needed; a plain tensor works)
dummy_input = torch.randn(1, 3, 256, 256)

# specify the file path
file_path = "./swinv2_tiny.onnx"

# export the model
torch.onnx.export(model, dummy_input, file_path)

#------------------
#onnx inference
#------------------
import onnxruntime as ort

# load the ONNX model
ort_session = ort.InferenceSession(file_path)

# convert the PyTorch tensor to numpy array for onnxruntime
print(inputs.keys())
inputs_numpy = inputs["pixel_values"].numpy()
# inputs_numpy = inputs["input_ids"].numpy()

# create a dictionary from model input name to the actual input data
ort_inputs = {ort_session.get_inputs()[0].name: inputs_numpy}

# forward
ort_outs = ort_session.run(None, ort_inputs)
print(f"Shape of ort_outs: {ort_outs[0].shape}")
print(ort_outs)
# print(type(ort_outs))
# print( list(ort_outs.shape) )

..
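The export above bakes in the dummy input's batch size. Continuing with the variables from the script, here is a sketch of the same export with named inputs and a dynamic batch axis, plus a numeric check of the ONNX output against the PyTorch output (the input/output names and opset are my own choices, not required by the model):

.

import numpy as np

# export again with explicit names and a dynamic batch dimension
torch.onnx.export(
    model,
    dummy_input,
    "./swinv2_tiny_dynamic.onnx",
    input_names=["pixel_values"],
    output_names=["last_hidden_state"],
    dynamic_axes={"pixel_values": {0: "batch"}, "last_hidden_state": {0: "batch"}},
    opset_version=14,
)

# compare the ONNX Runtime output against the PyTorch output computed earlier
np.testing.assert_allclose(last_hidden_states_numpy, ort_outs[0], rtol=1e-3, atol=1e-4)
print("PyTorch and ONNX Runtime outputs match within tolerance")

..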


Thank you.

www.marearts.com

🙇🏻‍♂️

5/13/2022

tokens to word, transformer

 

Refer to the code to figure out how tokens are composed for a word.

The code shows you the token list for each word.


..

from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

example = "This is a tokenization example"

print('input sentence: ', example)
print('---')
print('tokens :')
print( tokenizer.encode(example, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) )
print('---')
print('word and tokens :')
print({x : tokenizer.encode(x, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) for x in example.split()})
print('---')
idx = 1
enc =[tokenizer.encode(x, add_special_tokens=False, return_attention_mask=False, return_token_type_ids=False) for x in example.split()]
desired_output = []
for token in enc:
    tokenoutput = []
    for ids in token:
        tokenoutput.append(idx)
        idx += 1
    desired_output.append(tokenoutput)

print('tokens in grouped list')
print(desired_output)
print('---')

..


input sentence:  This is a tokenization example
---
tokens :
[713, 16, 10, 19233, 1938, 1246]
---
word and tokens :
{'This': [713], 'is': [354], 'a': [102], 'tokenization': [46657, 1938], 'example': [46781]}
---
tokens in grouped list
[[1], [2], [3], [4, 5], [6]]
---
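Note that encoding each word on its own can give different ids than encoding the whole sentence, because RoBERTa's BPE treats a leading space as part of the token. With a fast tokenizer you can also get the token-to-word mapping directly; a minimal sketch:

.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
example = "This is a tokenization example"

enc = tokenizer(example, add_special_tokens=False)
print(enc.tokens())    # sub-word tokens
print(enc.word_ids())  # word index per token, e.g. [0, 1, 2, 3, 3, 4]

..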


Thank you.
www.marearts.com

8/11/2021

ERROR: Could not build wheels for tokenizers which use PEP 517 and cannot be installed directly

I met this error when installing simpletransformers or transformers.

I tried many things with no luck.

But this was the solution for me:

Install Rust before installing transformers.

so..

curl https://sh.rustup.rs -sSf | bash -s -- -y
export PATH="/root/.cargo/bin:${PATH}"

and install transformers
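That last step is the usual pip install transformers (or pip install simpletransformers), run in the same shell where Rust was just installed so that cargo is on the PATH.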

Thank you.

www.marearts.com