Topic: Post-Training Quantization vs. Quantization-Aware Training
Hey all,
dl_architect makes good points. From my side, as an edge deployment guy, whether PTQ is ‘good enough’ depends entirely on the hardware’s native INT8 inference engine. If the backend (e.g., the TFLite interpreter or TensorRT) handles PTQ well, we’re golden. But sometimes even a well-calibrated PTQ model loses precision unexpectedly on specific operations that the hardware’s optimized kernels handle poorly.
For your BERT model, quant_guru_93, check whether your ARM chip’s ML accelerator has good support for dynamic quantization with per-channel weights and per-tensor activations. That’s a common PTQ path for Transformers.
I recently worked on a project where PTQ just couldn’t preserve the sensitivity of a small audio processing model. Model size was paramount (under 1MB), and PTQ simply couldn’t compress the model that far while keeping a decent signal-to-noise ratio, so we had to go with QAT. Development took longer, but the client’s memory footprint requirement forced our hand.
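To make the per-channel vs. per-tensor point a bit more concrete, here's a rough pure-Python sketch. It is not tied to TFLite, TensorRT, or any accelerator: the function names, the toy weight matrix, and the max-abs symmetric INT8 scaling are all my own assumptions for illustration, and it only models scale granularity, not the dynamic (runtime) side of activation quantization. It measures signal-to-noise ratio per output channel when one scale is shared across the whole tensor versus one scale per row.

```python
import math
import random

def int8_scale(values):
    # Symmetric INT8 scale from the max absolute value (assumed scheme).
    peak = max(abs(v) for v in values)
    return peak / 127.0 if peak > 0 else 1.0

def fake_quantize(values, scale):
    # Quantize to the INT8 range [-127, 127], then dequantize back to float.
    return [max(-127, min(127, round(v / scale))) * scale for v in values]

def snr_db(reference, approx):
    # Signal-to-noise ratio in dB between original and dequantized values.
    signal = sum(r * r for r in reference)
    noise = sum((r - a) ** 2 for r, a in zip(reference, approx))
    return math.inf if noise == 0 else 10.0 * math.log10(signal / noise)

def quantize_per_tensor(weights):
    # One scale shared by every row (channel) of the matrix.
    scale = int8_scale([v for row in weights for v in row])
    return [fake_quantize(row, scale) for row in weights]

def quantize_per_channel(weights):
    # One scale per row, so small-magnitude channels keep their resolution.
    return [fake_quantize(row, int8_scale(row)) for row in weights]

# Hypothetical weight matrix: four channels with very different ranges.
rng = random.Random(0)
magnitudes = [0.01, 0.1, 1.0, 10.0]
weights = [[rng.uniform(-m, m) for _ in range(64)] for m in magnitudes]

per_tensor = quantize_per_tensor(weights)
per_channel = quantize_per_channel(weights)

for m, row, pt, pc in zip(magnitudes, weights, per_tensor, per_channel):
    print(f"range +/-{m}: per-tensor {snr_db(row, pt):.1f} dB, "
          f"per-channel {snr_db(row, pc):.1f} dB")
```

In this toy setup the smallest channel collapses to zeros under the shared scale, while the per-row scale keeps it usable. Real models will behave differently, so treat this as a way to reason about the effect, not as a benchmark.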
