If you would like to set up a demo like our Closed Domain Chatbot Using BERT, we can provide the code, the fine-tuned (PyTorch-based) model, and all required setup instructions for a nominal charge.
The table below shows how prediction time differs between the BERT English model and the BERT Multilingual model on different system configurations.
| Configuration | English | Multilingual |
|---|---|---|
| 4-core CPU & 15 GB RAM (Google Cloud) | 3-4 seconds | 1-2 seconds |
| 8th-generation i3 processor & 8 GB RAM (local system) | 1-2 seconds | Within a second |
| 1 x RTX 2080 Ti (GPU) | A few milliseconds | A few milliseconds |
NOTE: Prediction time also depends on hyperparameters such as batch size and max_seq_length. Be careful when changing them, because they can also affect accuracy.
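
To see how these two settings influence prediction time on your own system, a small timing harness can help. The sketch below is only an illustration and is not the code behind the figures above: `dummy_predict` is a hypothetical stand-in for a real model call, the token ids are made up, and the helper names (`pad_or_truncate`, `make_batches`, `time_prediction`) are our own. Swap in your fine-tuned model's prediction function to get meaningful numbers, and re-check accuracy after every change.

```python
import time
from statistics import median
from typing import Callable, List, Sequence


def pad_or_truncate(token_ids: Sequence[int], max_seq_length: int, pad_id: int = 0) -> List[int]:
    """Cut a sequence to max_seq_length, or right-pad it with pad_id."""
    clipped = list(token_ids[:max_seq_length])
    return clipped + [pad_id] * (max_seq_length - len(clipped))


def make_batches(examples: Sequence[Sequence[int]], batch_size: int, max_seq_length: int) -> List[List[List[int]]]:
    """Pad every example to the same length, then group them into batches."""
    padded = [pad_or_truncate(example, max_seq_length) for example in examples]
    return [padded[i:i + batch_size] for i in range(0, len(padded), batch_size)]


def time_prediction(predict_fn: Callable, examples: Sequence[Sequence[int]],
                    batch_size: int, max_seq_length: int, repeats: int = 5) -> float:
    """Return the median wall-clock seconds for one pass over all examples."""
    batches = make_batches(examples, batch_size, max_seq_length)
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for batch in batches:
            predict_fn(batch)
        timings.append(time.perf_counter() - start)
    return median(timings)


def dummy_predict(batch: List[List[int]]) -> List[int]:
    """Hypothetical stand-in for a real model call; replace with your own."""
    return [sum(row) for row in batch]


if __name__ == "__main__":
    # Made-up token ids, used only to exercise the harness.
    examples = [[101, 2054, 2003, 102], [101, 7592, 102], [101, 2129, 2024, 2017, 102]]
    for batch_size in (1, 2):
        for max_seq_length in (8, 16):
            seconds = time_prediction(dummy_predict, examples, batch_size, max_seq_length)
            print(f"batch_size={batch_size} max_seq_length={max_seq_length}: {seconds:.6f} s")
```

Using the median over several repeats makes the measurement less sensitive to one-off slowdowns such as a cold first run. Keep in mind that a shorter max_seq_length truncates longer inputs, which is one way such changes can hurt accuracy.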