The current state of AI/ML is frustrating and confusing
A newbie's experience diving into the current ML ecosystem
My goal in this post is not to teach you something. I will share a few great entry points I have found along the way. But my primary goal here is to share some frustration. If you're in this with me, you are not alone. If you have some killer advice, comments are open.
The gist of my frustration is how hard it has been to find a "Start Here" tutorial or lab that gets my wheels off the runway. After a couple of weeks of bumbling around, I think I found what I need to know. Each issue is minor per se, but they add up to a full-sized headache.
Why learn AI?
Artificial Intelligence and, more specifically, Machine Learning are taking over almost every sub-field within tech. AI is doing your customer service, recommending recipes, drafting your will, and improving your short game. It will continue to do more. This revolution is happening very quickly. Developers need to know what they need to know as soon as possible. It's going to be a requirement for most jobs soon. There will be room for latecomers, but now is a good time to ride the wave.
So, I need to learn things now. And I need to learn specifically what I need to learn.
What is AI?
Again, I said I'm not trying to teach you anything here. But it's important that we define the terms because these terms are commonly used in confusing ways.
Here's the definition I see in multiple places:
artificial intelligence (AI), the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings
We've been using the term AI to describe things for decades. I once had a presenter say that your thermostat is AI: it has a goal, it receives outside input, and it makes a decision based on that input. By this definition, the Pac-Man ghosts are AI. Generally, we refer to enemy behavior in most video games as AI.
But in modern parlance, we are usually talking about machine learning.
Machine learning is a branch of artificial intelligence (AI) and computer science which focuses on the use of data and algorithms to imitate the way that humans learn, gradually improving its accuracy
We are generally referring to AI that learns or is retrained from continuous data to more accurately achieve its goals.
Ok, so how do I learn to do that?
Do I need to learn to write code that stores data and continues to learn based on the data it receives?
As a side note, technically, I have done that before. I had a UI that showed in-progress tasks, and it would predict how much time remained based on an average of the last 50 or so times that same task had been performed. In my case, the "ML" was essentially an SQL query. So is that what I need to learn?
I hear they use Python a lot. And something to do with GPUs.
So as a developer trying to keep up with this new field, it's hard for me to understand where to dive in or what I need to learn. Do I learn how to design a binary data structure that comprises an AI model? Do I learn how to program for a GPU architecture that can consume that binary model very quickly? If I were a CS student looking for a specialty, those would be fantastic areas to dive into. But I'm not a CS student with years of study time so much as a busy pro with enough spare time to catch up with what's going to affect my job.
These questions kept me from learning much for a few years. I don't have the time, or particular interest, to learn those low-level things.
Let me use gaming as an analogy. I don't have the skill or the time to learn how to create Unreal Engine or Unity. I do have the time to go through an Unreal Engine or Unity tutorial and learn how to consume those engines. I might learn to mod a game or even create my own rudimentary game.
And that's the first thing to learn about the current state of AI: these engines already exist. The key things to learn are:
How do I grab some existing model off the internet and run it?
How do I train an existing model to fit my needs?
How do I train a new model from scratch?
Asking the right question has been the first major frustration. It took me years. In my defense, some of this terminology has only become universally common in the last couple of years as ChatGPT, Stable Diffusion, Midjourney, and others have exploded.
There is a whole ecosystem
I know I'm going to need Python. I hear PyTorch and TensorFlow thrown around a lot, so I will probably need those. As of now, I still don't know if or how those two things play together. Are they competitors? I also hear that I need CUDA, and specifically an Nvidia GPU.
My next frustration -- and I really wish I had the evidence in hand to prove this -- is how difficult it has been to find a tutorial. As a platform dev and DevOps engineer, most of these tools are foreign to me. I can find a lot of things that explain abstract concepts, but -- and I can't stress this enough -- I am a hands-on, visual learner. Aside from the abstract, everything seems to assume that I already know some things that I don't. Should it be this hard to find a "1-day primer on running a common AI model for newbies"?
Let me back up a bit
I keep seeing "You must have a GPU." I do, but...
My main desktop is a Windows computer. For development, I'm a fan of WSL2. I like doing my side projects in Ubuntu on WSL2. I do have an NVIDIA GPU that is a few years old.
My work computer is a MacBook, also a few years old, with an Intel chip and a Radeon GPU. From what I hear, I need the Nvidia GPU. So should I do some of this on the work computer or the Windows machine?
I hear that I need CUDA installed. I think this is the driver used to perform general computing on the GPU. Here are the instructions for CUDA on WSL2. They're not that straightforward.
It doesn't help that I apparently picked a day where Nvidia's servers were painfully slow and it was going to take 8 hours to download CUDA. Another day, it took less than 30 minutes.
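If you want a quick way to confirm the CUDA setup actually worked, something like this should do it. This is just a sketch assuming PyTorch is already installed; I'm using it purely as a probe here.

```python
# Sanity-check sketch, assuming PyTorch is installed (pip install torch):
# reports whether Python can actually see the GPU through CUDA. If this
# prints False, the CUDA/WSL2 setup isn't wired up yet.
import torch

print(torch.cuda.is_available())           # True if a CUDA-capable GPU is visible
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # e.g. the name of your Nvidia card
```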
Personally, I like using Docker containers. As I described in a prior post, a dev container is a good way to isolate the work I'm doing. I gave up on this. It already seems like a lot to stitch together the resources I need, so adding a dev container to the stack just seems like one more thing to blame when something doesn't work.
Python has many different ways of managing versions and dependencies. Coming from Node, node_modules is fairly straightforward. In Python I knew I needed virtualenv, or something similar, but I've never been super familiar with it. Luckily, virtualenv isn't that hard once I understood that it acts like a per-project "node_modules" and also pins which Python interpreter the project uses.
It sounds like you really can use Docker, but I haven't tried it yet. This seems like an obvious thing. Has no one published a Dockerfile or docker-compose.yml file to run Stable Diffusion locally? As someone who just wants to get my hands on it, that seems like a very handy solution.
Why locally?
The best way for me to understand what I'm learning is to touch it. I want to see the gears turn.
I could go to Amazon SageMaker and use their one-click options to create an endpoint for one of the provided Foundation Models.
I could learn the ChatGPT API. I know a couple of people who have done exactly this, and it's impressive what they've created with it!
But am I learning what I need to learn? This is a perfect illustration of the knowledge gamut.
Do I want to be an OpenAI engineer? (Again, kids, do it if you've got the time.)
Do I want to just consume the API? You can build a heck of an app with just that.
Or do I need something in the middle?
Figuring out where to get my feet wet is one of the biggest frustrations. Maybe this part has been easier for you than for me.
My answer is the 3rd option. I want to run a foundation model locally using ... whatever framework I'm supposed to ... and maybe even train it to do something specific.
Back to the ecosystem
So that solves the first 3 major frustrations. I'm finally asking the right questions. I know what resources I have (sort of), and I know where in the pool I'm diving.
I know I need PyTorch or TensorFlow (both?) and those are Python libs, I guess. Look at PyTorch's website. I don't see AI or ML even mentioned, except for one little sentence on the side. Aside from some buzzwords, I wouldn't know that it has anything to do with AI. TensorFlow is a little more obvious.
ML is an abstract concept. "Model" is an abstract concept. But what is a model in concrete terms? Is it a file? A series of files? A SQLite database? A proprietary format? Do PyTorch and TensorFlow understand the same format? If I download a model, how do I know which framework to use it in? The answer seems to be that the models are all in a common format understood by multiple frameworks. I don't understand the details beyond that.
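For what it's worth, one concrete way to see what a model physically is: download one and list the files. This sketch uses the huggingface_hub library with an arbitrary small model as the example; as I understand it, you typically end up with a folder of config/tokenizer JSON plus one or more weight files.

```python
# Sketch: download a model from the Hugging Face hub and look at what's
# actually on disk. The model name is just a small, arbitrary example.
import os
from huggingface_hub import snapshot_download

path = snapshot_download("distilbert-base-uncased")  # downloads into the local HF cache
print(path)
print(os.listdir(path))  # typically config.json, tokenizer files, and weight file(s)
```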
In fact, this common format has practically become synonymous with "AI" itself. We've gone from AI as an abstract concept that can describe thermostats and pink ghosts to AI as a very specific format that seems to be fairly universal in the machine learning / deep learning community. On one hand, it's good that we have these open-source tools available. On the other hand, it's a little confusing whether "ML" now refers to this specific implementation or the general concept.
I mentioned some of this to my teenage son. He started talking about a particular video game that is using ML (as opposed to static AI behaviors). It dawned on me that I have no idea whether said video game uses the same "model" format as PyTorch and TensorFlow, or if it would be something proprietary to the game. I still have no idea.
I heard that Hugging Face has models to use. It does! And I've even heard of some of them. This is very promising. Their website still isn't very newbie-friendly, though. If I download one of these, how do I run it? This page doesn't make it obvious. So, back to Google: how do I run a Hugging Face model locally?
I learned some key words. Consuming a static model is inference. Changing the model's behavior is training or fine-tuning. These words are important because they have vastly different requirements. Chances are I can run one of the fancy models locally on my 4-year-old Nvidia GPU. In fact, I learned that you don't need a GPU for inference at all. It may be slower, but inference can run on just your CPU. But I won't be able to train ChatGPT on my computer.
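To make "inference on just your CPU" concrete, here's roughly what it looks like with Hugging Face's transformers library (more on their tutorial below). The model name is just an example, and it gets downloaded automatically on first run.

```python
# Minimal inference sketch with the transformers library. device=-1 forces
# CPU, so no GPU or CUDA is required. The model name is only an example.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,  # -1 = CPU; use device=0 for the first CUDA GPU
)

print(classifier("I finally got a model running locally!"))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```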
This Hugging Face Tutorial is good
I eventually landed on this Hugging Face tutorial. Knowing the word "inference" helped me find this page in the docs. This is what I needed to find. This is the current, best tutorial that I recommend. Assuming that you (a) know how to set up a Python environment and (b) have the appropriate resources, it's the best Day 1 tutorial I have found. It shows me how to automatically download the model and ask the model to give me information. It also goes over fine-tuning and training. This tutorial shouldn't take more than a day to do locally, and it should give you enough core concepts to move on to bigger things afterward.
It also teaches the different types of models. The same concepts apply to a Large Language Model used for chat and to a text-to-image generator. I also started to understand that I might need more than one kind of model: I might use an LLM to process language, but I might also use a text classification model to extract important quantities from the user's text.
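Here's a rough sketch of that "more than one kind of model" idea, again with placeholder model choices: one pipeline generates text, another pulls entities out of the user's text.

```python
# Two different pipelines for two different jobs. Model choices are small,
# common examples; what the NER model extracts depends on the model itself.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
extractor = pipeline("ner", aggregation_strategy="simple")  # default token-classification model

user_text = "Schedule the delivery to the Austin office for Friday."

print(generator("Here is a polite confirmation:", max_new_tokens=20)[0]["generated_text"])
print(extractor(user_text))  # entities like names and locations, depending on the model
```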
It finally starts to feel like I'm down the runway.
Deploying
Running locally helped me wrap my head around the concepts. But in the real world, I'll need to deploy somewhere. I went through this SageMaker workshop from AWS. It was also very helpful. I learned a few things about SageMaker from it:
A "model" in SageMaker is really just metadata that refers to a Docker image and maybe some other files from an S3 bucket.
An "endpoint" is the running machine that you talk to.
Money adds up fast. To run a decent-sized model, an endpoint might cost $0.25-$1.00 per hour. Big ones can cost up to $3.00/hr.
The initial AWS quotas are strict, so you'll be requesting limit increases just to follow other tutorials on the internet.
It's probably easier to run a Notebook instance so that you can run Jupyter notebooks directly in AWS instead of on your own machine. It works either way, but those Notebook instances become really convenient.
You can do all the important parts using the AWS SDK (boto3). The invoke_endpoint method sends the HTTP request to the endpoint and returns the response.
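For reference, calling an already running endpoint with boto3 looks roughly like this. The endpoint name is a placeholder, and the payload shape depends entirely on the model behind the endpoint (more on that below).

```python
# Rough sketch of invoking a SageMaker endpoint via boto3. Endpoint name and
# payload are placeholders; the expected payload format is model-specific.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint-name",        # placeholder
    ContentType="application/json",
    Body=json.dumps({"inputs": "Hello, endpoint"}),
)

print(json.loads(response["Body"].read()))
```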
That's all great, but that leaves another question.
How do I deploy an arbitrary model from Hugging Face to SageMaker?
I went through this document from Hugging Face. There is a sagemaker library, separate from boto3, that includes huggingface functions to deploy.
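As far as I can tell, the basic shape of deploying with that library looks something like this. Treat the S3 path, IAM role, instance type, and version numbers as placeholders; the supported version combinations come from the Hugging Face/SageMaker docs, and I haven't verified this exact set.

```python
# Sketch of deploying a trained model artifact with the sagemaker SDK's
# Hugging Face support. All values below are placeholders.
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/my-model/model.tar.gz",        # placeholder artifact
    role="arn:aws:iam::111122223333:role/my-sagemaker-role",  # placeholder IAM role
    transformers_version="4.26",   # pick a combination the docs list as supported
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # placeholder instance type
)

print(predictor.predict({"inputs": "Testing the deployed model."}))
```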
That document helps you run training and deploy your own models, but it didn't help me understand how to take an off-the-shelf model like llama-2 and deploy it.
I had another issue using the endpoint it creates. The HuggingFaceModel class's deploy function returns an object that has a predict function for calling the inference endpoint. But what if the endpoint already exists? I had a very difficult time browsing the SDK documentation to figure out how to do anything outside of the happy path of the tutorial. If I closed my notebook while the endpoint was still running, the only way I could find to get back to it was to use boto3 and call invoke_endpoint directly.
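One thing that looks like it might work, though I haven't verified it, is attaching a generic Predictor to the existing endpoint by name. Treat this as an educated guess rather than the blessed path.

```python
# Untested sketch: re-attach to an endpoint that already exists, by name,
# using the generic Predictor class. The endpoint name is a placeholder.
from sagemaker.predictor import Predictor
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import JSONDeserializer

predictor = Predictor(
    endpoint_name="my-existing-endpoint",   # placeholder
    serializer=JSONSerializer(),
    deserializer=JSONDeserializer(),
)

print(predictor.predict({"inputs": "Are you still there?"}))
```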
Maybe I'm missing something.
Notice how in all of this I never mentioned using PyTorch? I think Hugging Face's local tutorial is using PyTorch under the hood. I think when deploying to SageMaker, it's using a PyTorch Docker image. I still haven't used PyTorch directly.
I learned that the "model" Hugging Face deploys is only a reference to the Docker image plus some env vars. It downloads the actual model files on startup. This is convenient, though: using another model from Hugging Face's hub simply requires changing an env var.
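My understanding of the env var trick, sketched with example values: HF_MODEL_ID and HF_TASK are the two that seem to matter, and everything else here is a placeholder.

```python
# Sketch: instead of pointing at a model.tar.gz in S3, tell the container
# which hub model to download at startup via environment variables.
from sagemaker.huggingface import HuggingFaceModel

hub_model = HuggingFaceModel(
    env={
        "HF_MODEL_ID": "distilbert-base-uncased-finetuned-sst-2-english",  # hub model to pull
        "HF_TASK": "text-classification",                                  # pipeline task to serve
    },
    role="arn:aws:iam::111122223333:role/my-sagemaker-role",  # placeholder
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)
```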
I got llama-2-13b-chat-hf working in SageMaker
I was able to run an Endpoint in SageMaker and talk to llama-2. I followed this from Mr. Philipp Schmid's blog. Again, I can't stress enough how hard this was to find. I finally learned enough of the key words to search for, and this blog was the only real example I found.
Payloads are still confusing
In the SageMaker workshop, you use an image classification model by sending the image as a bytearray. The random-cut-forest example uses a plain CSV payload. However, when talking to llama-2, the payload is a JSON string with an inputs property and a parameters object. How would I have known that? It's not described on the Hugging Face readme.
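For the record, the general shape of that payload looks like this. The parameter names shown are common text-generation settings, not an exhaustive or guaranteed list for any particular model.

```python
# Example payload for a text-generation endpoint like llama-2: "inputs" holds
# the prompt, "parameters" holds generation settings. Values are examples only.
import json

payload = {
    "inputs": "Explain what a SageMaker endpoint is in one sentence.",
    "parameters": {
        "max_new_tokens": 128,
        "temperature": 0.7,
        "top_p": 0.9,
    },
}

body = json.dumps(payload)  # this JSON string becomes the request body
print(body)
```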
So is training, really
I've barely touched training. I have run training jobs as part of these tutorials, but how do I know what format a foundation model will expect for fine-tuning? This is something I'll get into more as time goes on.
Conclusion
I hope this explains why I find the whole ecosystem confusing in its current state. I didn't fully understand how we got from AI as an abstract concept to a very specific toolset and format. As a newbie, I wasn't sure if I was supposed to be learning how to build an engine or how to push the pedals. I wasn't sure if my hardware could even handle it, compared to other development tasks I do. Now that my wheels are off the ground and I'm somewhat gliding, I'm still unclear on some specifics, like particular file formats or request payloads. Most resources I find focus on abstract concepts rather than specific implementation details, or they assume I already know the ecosystem. I hope things will keep sinking in. I want to be able to run training jobs. I want to pick open-source models and confidently run training and inference.
If you've got some quick tips, I'm open to them. Maybe I'm missing something obvious.
Maybe you're in the same boat. You're not alone. Hopefully, the tutorials I found help you glide faster.