Looking to build a service that leverages generative AI? Consider at least these things
In this article, we’ll share concrete and timely tips on how to approach building services that leverage generative AI.
Fun fact: This article was not written with the help of generative AI (although it may have helped with the illustration).
Generative AI technologies are developing at a fast pace, and new ones emerge constantly. It is hard to keep up with which technology would be most beneficial for one’s business needs. There are also privacy and legal questions to tackle when leveraging generative AI technologies.
Here, we’ll share concrete tips on how to approach building services that leverage generative AI. Keep in mind that the field is developing so fast that what is relevant today might not be that important tomorrow.
Privacy with OpenAI vs. Azure OpenAI
If you want to use OpenAI’s models like GPT-4, you might already be aware that questions of data privacy and GDPR compliance are not that simple. Although OpenAI will not train its models with your data unless you explicitly allow it, it reserves the right to monitor your data in all cases. This means that OpenAI always retains your data for 30 days for abuse monitoring and debugging purposes. There is no way to switch off the saving and monitoring of your data, which means that selected OpenAI employees will have access to your potentially confidential data if deemed necessary. If your business, like most businesses, handles sensitive data such as IP or strategic plans, you might not want, or even be allowed by law, to hand your data to other parties.
“The most important difference in Azure OpenAI is that you can apply for permission to completely switch off data monitoring and data saving.”
Sebastian Hemmilä, Presales Architect
Luckily, because Microsoft is a major investor in OpenAI, there is a solution: setting up your own private Azure OpenAI environment. Regarding data privacy, the most important difference compared to “plain” OpenAI is that in Azure OpenAI you can apply for permission to completely switch off data monitoring and data saving. In that case, your prompts and completions, for example, are not stored by Azure, and Azure employees cannot monitor your data content at any point. With Azure OpenAI you can also explicitly choose the region where your data is processed. This gives you an additional layer of confidence that your data is handled in a GDPR-compliant way.
If, for example, you need a large language model (LLM) and want to keep things running on-premises for additional security and privacy, you can use one of the open-source models and fine-tune it to better match your needs.
Beware, though, that well-known open-source models like Meta’s Llama and Llama-based models such as Alpaca are, although open source, only licensed for non-commercial use at the time of writing. For commercial needs you can use, for example, Cerebras, Bloom or Dolly. With open-source models, you will not get results of as high quality as you would with an LLM from a third-party provider. There is still controversy about how far behind closed models the results are, but a lot of improvement has certainly happened in that field.
Deploying any open-source model at scale
The generative AI ecosystem is full of different sorts of open-source models. In addition to LLMs there are, for example, text-to-image models like Stable Diffusion, text-to-video models, text-to-audio models and plenty of other types of models, as well as endless variations of those. You can find most, if not all, of these open-source models on Hugging Face, a central hub in the open-source AI ecosystem. Among many other things, it hosts a sort of GitHub of AI models. At the time of writing, there are 220,430 models available on Hugging Face, and it is easy to test them on Hugging Face’s website.
So, what if you find a model on Hugging Face that would serve your business needs? Deploying an AI model at scale from scratch is tricky. Google’s Model Garden, for example, currently offers some tens of open-source models, but that selection is only a tiny fraction of what the community has uploaded to Hugging Face.
This is where AWS comes in handy. In co-operation with Hugging Face, AWS has developed a sort of pipeline for deploying models from Hugging Face. With Amazon SageMaker you can easily train, fine-tune and run any open-source model from the almost endless selection on Hugging Face. Neat, right?
Computational costs of different languages
Did you know you can roughly halve the cost of using a third-party large language model provider just by changing the language you use with the API? Prompts and completions are made up of tokens. For example, in GPT-3 a token in English is usually about 4 characters long. Below are token counts for two text prompts, calculated with OpenAI’s tokenizer:
- “Hello world!” in English: 3 tokens, 12 characters
- “Hei maailma!” in Finnish: 6 tokens, 12 characters
OpenAI’s LLM API pricing is based on token counts. If the ratio in the “Hello world!” vs. “Hei maailma!” example always held between English and Finnish, API costs for Finnish prompts would be double those for English prompts.
Token length varies based on many factors, but you can take this as a rough approximation of the difference between Finnish and English. Compare larger inputs yourself to get a better-grounded estimate.
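As a rough sketch of the arithmetic above, the snippet below compares the cost of the two example prompts, assuming that price is strictly proportional to token count. The token counts are the ones from the example above. The price per 1,000 tokens and the function name are hypothetical placeholders, so check your provider’s current price list before relying on any figure.

```python
def estimate_cost(token_count, price_per_1k_tokens):
    """Estimate the cost of a prompt, assuming price scales linearly with tokens."""
    return token_count / 1000 * price_per_1k_tokens


# Token counts taken from the "Hello world!" / "Hei maailma!" example.
english_tokens = 3
finnish_tokens = 6

# Hypothetical price in USD per 1,000 tokens (an assumption, not a real quote).
PRICE_PER_1K_TOKENS = 0.002

english_cost = estimate_cost(english_tokens, PRICE_PER_1K_TOKENS)
finnish_cost = estimate_cost(finnish_tokens, PRICE_PER_1K_TOKENS)

# How many times more expensive the Finnish prompt is than the English one.
cost_ratio = finnish_cost / english_cost

print(f"English: {english_cost:.6f} USD, Finnish: {finnish_cost:.6f} USD")
print(f"The Finnish prompt costs {cost_ratio:.1f}x the English prompt")
```

With real inputs, you would replace the hard-coded counts with counts from a tokenizer run over longer, representative texts.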
The choice of language makes a bigger difference when you use LLMs programmatically. If you just use ChatGPT via its graphical user interface, the language does not affect your costs. Even there, though, the context limit is counted in tokens, so you can fit a much larger prompt if you write it in English rather than in Finnish. If you run an LLM on-premises and performance matters, it is good to be aware that the model can be more performant in English than in Finnish, since the same content takes fewer tokens.
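As a back-of-the-envelope illustration of the context-size point above, the sketch below estimates how many characters of text fit into a fixed token budget. It assumes the characters-per-token ratios from the “Hello world!” example (4 for English, 2 for Finnish) hold on average. The 4,096-token limit and the function name are illustrative assumptions, not statements about any particular model.

```python
def chars_that_fit(context_tokens, chars_per_token):
    """Estimate how many characters fit into a context window of a given token size."""
    return int(context_tokens * chars_per_token)


# Hypothetical context limit in tokens (an assumption for illustration only).
CONTEXT_TOKENS = 4096

# Average characters per token, derived from the 12-character example prompts:
# 12 characters / 3 tokens in English, 12 characters / 6 tokens in Finnish.
ENGLISH_CHARS_PER_TOKEN = 12 / 3
FINNISH_CHARS_PER_TOKEN = 12 / 6

english_capacity = chars_that_fit(CONTEXT_TOKENS, ENGLISH_CHARS_PER_TOKEN)
finnish_capacity = chars_that_fit(CONTEXT_TOKENS, FINNISH_CHARS_PER_TOKEN)

print(f"English: about {english_capacity} characters")
print(f"Finnish: about {finnish_capacity} characters")
```

Under these assumptions the same window holds about twice as much English text as Finnish text. Real ratios vary with vocabulary and tokenizer, so measure them on your own material.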
Fine-tuning an LLM vs. semantic search
You might have heard of fine-tuning AI models. With fine-tuning you can permanently teach AI models new concepts. Here we are referring specifically to fine-tuning large language models. Fine-tuning is a good choice when you need to teach the model new patterns, for example how to write an email or how to write fiction.
Fine-tuning is not a very good solution if you need the model to answer questions accurately about some new context. Remember that large language models are prone to hallucination when it comes to factual knowledge. Fine-tuning is also very slow, difficult and expensive.
But what if you need your application to quickly and accurately answer questions about a relatively large context, e.g. a 250-page PDF file? The context is too large to be inserted into an LLM prompt. The solution is to build an application that uses a vector store with text embeddings. You can already find plenty of open-source projects of this type for guidance on how to get started.
The application first splits the text you insert into chunks, converts each chunk into a floating-point vector (an embedding) and saves the vectors into a vector store, all in a matter of seconds. Representing text as vectors enables the application to determine whether two text chunks are semantically related or similar, because the vectors capture the intent of the text and the contextual meaning of its terms in an efficient manner. When you ask a question about your input text, e.g. about a 250-page PDF, the system also converts the question into a floating-point vector. It then compares the question’s vector to the vectors in the store and finds the most closely related text chunks. Finally, the application combines those chunks with your question and gives them to the LLM as text, along with a guiding prompt.
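The flow described above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions: `embed` is a stand-in that builds a simple word-count vector, so it only captures word overlap, whereas a real application would call a learned embedding model. `VectorStore` and `build_prompt` are likewise hypothetical names, and the call to the LLM itself is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorStore:
    """Minimal in-memory vector store holding (chunk, vector) pairs."""

    def __init__(self):
        self._items = []

    def add(self, chunk):
        self._items.append((chunk, embed(chunk)))

    def search(self, query, top_k=2):
        """Return the top_k chunks most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(
            self._items,
            key=lambda item: cosine_similarity(query_vector, item[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question, chunks):
    """Combine the retrieved chunks and the question with a guiding prompt."""
    context = "\n".join(f"- {chunk}" for chunk in chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Made-up example chunks standing in for the pages of a long document.
store = VectorStore()
for chunk in [
    "The warranty period for the device is two years.",
    "The device should be charged with the supplied cable.",
    "Our office is closed on public holidays.",
]:
    store.add(chunk)

question = "How long is the warranty period?"
top_chunks = store.search(question, top_k=1)
prompt = build_prompt(question, top_chunks)
print(prompt)
```

The resulting `prompt` string is what would be sent to the LLM. Swapping `embed` for a real embedding model and `VectorStore` for a vector database changes the quality and scale of retrieval, but not the overall shape of the pipeline.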
This results in an easily readable answer to your question from the LLM. The solution can be built entirely on-premises with an open-source model and a local vector store, or it can use a third-party LLM API and a vector database in the cloud. This is how the “chat with your PDF” solutions you might have seen work.
Ready to discover generative AI’s opportunities together?
Generative AI is a powerful tool that can help businesses save time and resources in unforeseen ways. We have covered only a fraction of the possibilities of the different technologies in this article. If you find it difficult to grasp what possibilities generative AI technologies could offer your business, or if you would like to discuss in more depth how your business could benefit from leveraging generative AI, get in touch.