I am trying to deploy a PyTorch image classification model wrapped in Flask on g4dn.xlarge (4 vCPU, 16GB RAM, T4 GPU with 16GB memory) instances on AWS.

For selecting the optimal number of workers I performed some experiments.

Note:
* Only the `model.forward` part runs on the GPU - the rest of the steps run on the CPU in the application.
* I have added a timing logger for every step of the application for checking.

Experiment 1: gunicorn main.app:app -b 0.0.0.0:8000 --workers 1
Concurrent Requests: 1 (1 client sending requests)
Total Time To Process 15 Requests By A Client: 15.87s
(`model.forward` takes 14.98s)

Experiment 2: gunicorn main.app:app -b 0.0.0.0:8000 --workers 2
Concurrent Requests: 2 (2 clients sending requests in parallel)
Total Time To Process 15 Requests By A Client: 29.35s
(`model.forward` takes 28.34s, 2x of a single request, every other step taking a similar time)

Experiment 3: gunicorn main.app:app -b 0.0.0.0:8000 --workers 3
Concurrent Requests: 3 (3 clients sending requests in parallel)
Total Time To Process 15 Requests By A Client: 43.82s
(`model.forward` takes 41.81s, 3x of a single request, every other step taking a similar time)

Using 3x workers lets me process 3 requests in parallel, but the overall processing time of those requests also becomes 3x - hence no improvement in real terms.

I initially thought CPU or IO processes were the bottleneck in the app, but after intensively logging the time taken at each step I found the bottleneck is in the GPU processing (`model.forward` starts taking 2x-3x as long).

Upon checking the process IDs of the workers for each request, I can also confirm that all the workers are receiving requests in parallel - but they are not able to perform the GPU processing in parallel at the same time.

Any guidance on what the bottleneck here might be will be very helpful.
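For reference, here is a minimal sketch of the kind of per-step timing instrumentation described above. The actual application code is not shown in the post, so the model (a ResNet-50 stand-in), the `/predict` route, and the preprocessing steps are all hypothetical. One detail worth keeping in mind when timing `model.forward` on a GPU is that CUDA kernels launch asynchronously, so calling `torch.cuda.synchronize()` before reading the clock is needed to capture the real GPU time.

```python
# main.py - a minimal sketch, not the author's actual app.
# All names (resnet50 stand-in, /predict route, preprocessing) are hypothetical.
import os
import time

import torch
from flask import Flask, jsonify, request
from PIL import Image
from torchvision import models, transforms

app = Flask(__name__)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.resnet50().to(device).eval()  # stand-in classifier

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

@app.route("/predict", methods=["POST"])
def predict():
    t0 = time.perf_counter()
    img = Image.open(request.files["image"].stream).convert("RGB")
    batch = preprocess(img).unsqueeze(0).to(device)   # CPU-side preprocessing
    t1 = time.perf_counter()

    with torch.no_grad():
        logits = model(batch)                         # model.forward on the GPU
    if device.type == "cuda":
        torch.cuda.synchronize()                      # wait for the GPU so the timing is real
    t2 = time.perf_counter()

    # Log the worker PID along with per-step timings, to see which worker served the request.
    app.logger.info("pid=%s preprocess=%.3fs forward=%.3fs",
                    os.getpid(), t1 - t0, t2 - t1)
    return jsonify({"class_id": int(logits.argmax(dim=1))})
```

Started with, e.g., `gunicorn main:app -b 0.0.0.0:8000 --workers 3` (module name assumed here), each gunicorn worker is a separate process that loads its own copy of the model onto the same GPU, which is consistent with the per-PID observation above.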