Hello everyone :) As the title says, I am looking to provide a 48 GB workstation to students as an API endpoint. I am currently using LiteLLM and want to keep using it, but under the hood I would love to run a llama-swap instance so that I can offer different models and students can just query the one they want. If no memory is left, I would like the request to be queued instead. Is there functionality like that?
Also, I am running on AMD; does that introduce any further problems?
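For context, this is roughly the setup I have in mind: LiteLLM's proxy config pointing at llama-swap's OpenAI-compatible endpoint, with llama-swap handling the model loading/unloading behind it. A minimal sketch (the port, model names, and `api_key` value are assumptions for illustration, not a tested config):

```yaml
# LiteLLM proxy config.yaml (sketch, not verified against my setup)
# Both entries point at the same llama-swap endpoint; llama-swap
# swaps the actual model in and out of VRAM based on the model name
# in the request.
model_list:
  - model_name: llama-3-8b            # name students would request
    litellm_params:
      model: openai/llama-3-8b        # assumed model name configured in llama-swap
      api_base: http://localhost:8080/v1   # assumed llama-swap address
      api_key: none
  - model_name: qwen-32b
    litellm_params:
      model: openai/qwen-32b
      api_base: http://localhost:8080/v1
      api_key: none
```

The open question is what happens when a request for qwen-32b arrives while llama-3-8b is still serving and there is not enough VRAM for both: I would want that request queued rather than failed.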