AI Models: Are Larger Models Truly Better?

There was never a clear timeline for when artificial intelligence (AI) would become a significant part of our lives. But engineers have been working behind the scenes for decades to apply it purposefully.

Why the secrecy? Well, AI isn’t something to be taken lightly, as movies like “I, Robot” and “I Am Mother” portray. Without high confidence in their models, engineers were reluctant to release the “Kraken” on a public unprepared for such a change.

However, things changed about a decade ago when technology giants began rolling out AI assistants. It seems they realized the need for more participation – data – to train their AI models for broader and more intelligent applications. This explains why their products tend to leave many wondering, “Is this it?”

The real shift came with OpenAI’s launch of ChatGPT in 2022, which spurred companies to promote their AI variants more aggressively. It also highlighted the demand for Large Language Models (LLMs) to create better, smarter, and faster AI. But this comes at immense cost, affordable only to big companies. What about those with limited resources?

For more on this, we speak with Tan Ser Yean, CTO of IBM Singapore. He shares valuable insights into the trend and how Small Language Models (SLMs) can achieve similar results without the hefty price tag.

Given the high costs and infrastructure needed for large language models (LLMs), how does IBM see the role of Small Language Models (SLMs) in making AI more accessible to small businesses and startups? What is IBM doing to help these enterprises overcome barriers to entry?

Tan Ser Yean: “Large Language Models (LLMs) have captured public attention with their impressive capabilities and potential. However, there’s been relatively little focus on the substantial resources required to get an LLM up and running. The reality is that only the very largest companies have the funds and server space to train and maintain energy-hungry models with hundreds of billions of parameters. For instance, training a single GPT-3-sized model requires the yearly electricity consumption of over 1,000 households. In contrast, Small Language Models (SLMs) require significantly less computational power and resources, allowing them to run efficiently on common hardware like laptops and even web browsers. This offers smaller businesses and startups a cost-effective solution that doesn’t compromise on effectiveness or versatility.


For instance, consider IBM’s Granite 13B model, which, despite being 5 times smaller than models like LLaMA-2 70B, has demonstrated competitive performance across multiple tasks, especially in specialized domains such as financial services. Additionally, the Granite 13B model offers environmental benefits, with a significantly lower carbon footprint for both training and inference compared to massive 100B+ or 1T+ parameter models, while still delivering strong performance for enterprise use cases.”
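The size comparison above translates almost directly into a memory footprint, which is what puts smaller models within reach of commodity hardware. As a rough back-of-the-envelope sketch (the precision choices below are illustrative assumptions, not IBM's published figures):

```python
def model_memory_gb(params_billions: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold a model's weights.

    bytes_per_param: 2 for fp16/bf16, 4 for fp32, 1 for int8.
    (Actual serving also needs memory for activations and the KV cache.)
    """
    return params_billions * 1e9 * bytes_per_param / 1e9

# A 13B-parameter model in fp16 needs roughly 26 GB of weights,
# while a 70B-parameter model needs roughly 140 GB.
print(model_memory_gb(13))                     # 26.0
print(model_memory_gb(70))                     # 140.0
print(model_memory_gb(13, bytes_per_param=1))  # 13.0 (int8-quantized)
```

By this estimate, a quantized 13B model fits on a single workstation GPU, whereas a 70B model in fp16 requires a multi-GPU server before it can even load.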

Image generated by DALL·E

How does IBM support businesses in adopting ‘Bring Your Own Model’ (BYOM) strategies with Small Language Models (SLMs)? Can you provide examples of successful implementations?

Tan Ser Yean: “The trend towards BYOM reflects the evolving landscape where customization is not just a luxury but a strategic imperative for enterprises, both big and small. SLMs, despite their smaller size, play a pivotal role in this paradigm shift, offering a versatile and adaptable solution that can be finely tuned to meet specific business needs. Recently, IBM inked an agreement with AI Singapore (AISG) to collaborate on the first LLM developed with Southeast Asian context, to be made available for developers and data scientists to build customized AI applications. Under the partnership, IBM will test the Southeast Asian Languages in One Network (SEA-LION) model using our AI technology and data platform, watsonx, and work with AISG to fine-tune the LLM. The goal is to help organizations choose suitable AI models for their business requirements.”

What advancements is IBM making in SLMs to enhance performance and transform operations in cost-sensitive sectors?

Tan Ser Yean: “IBM is actively working to lower entry barriers for AI by releasing highly efficient code-based LLMs. One significant initiative in this regard is our collaboration with Red Hat to launch InstructLab, which adopts an open-source philosophy, inviting community contributions to support regular builds of an enhanced version of an LLM. This innovative approach is designed to reduce costs, eliminate barriers to testing and experimentation, and enhance alignment, ensuring the model’s outputs are accurate, unbiased, and consistent with the values and goals of its users and creators.”

Running SLMs locally on devices such as smartphones can help address privacy and security concerns. How is IBM innovating to ensure that these locally deployed models maintain high performance while protecting sensitive business data? What are the key challenges and solutions in this area?

Tan Ser Yean: “Indeed, running SLMs locally on devices like smartphones is a promising way to address privacy and security concerns across edge computing and Internet of Things (IoT) applications. Processing data on-device, rather than sending it to centralized servers, minimizes the risk of data exposure and unauthorized access. Additionally, these models contribute to AI explainability, which is crucial to building trust in sectors such as legal, finance, and healthcare.

Smaller language models are often simpler and more interpretable than their larger counterparts. This transparency fosters the adoption of AI technologies in areas where accountability and interpretability matter most. Another critical aspect of IBM’s innovation in this area is the implementation of robust security measures to protect sensitive business data. Our watsonx.governance platform plays a crucial role in helping businesses shine a light on AI models, providing organizations with the toolkit they need to manage risk, embrace transparency, and anticipate compliance with future AI-focused regulation.”


How does IBM balance the development and deployment of both foundational and smaller, specialized models to meet the diverse needs of businesses? What strategic advantages do foundational models offer in the competitive AI landscape?

Tan Ser Yean: “Foundation models are crucial for optimizing AI initiatives and they form the backbone of such systems. These models are designed to handle a wide range of tasks and domains, from natural language processing to image recognition and beyond. Scalability is another critical advantage of foundational models. Trained on large-scale datasets and infrastructure, these models are essential for businesses to thrive in today’s dynamic environments with evolving requirements, ensuring that AI solutions can grow and adapt alongside business needs.

However, the evolution of AI also suggests that no single model can cater to all needs. This means moving beyond a generic, one-size-fits-all approach to AI toward a future where every enterprise can deploy custom models that align precisely with its goals and regulatory requirements. Ultimately, it is crucial for businesses to recognize that foundational models and SLMs are built for different purposes. Where foundational models offer broad applicability and versatility, smaller and more specialized models excel at providing tailored solutions for specific use cases. We recognize the pivotal role this balance plays in driving innovation and addressing the unique challenges businesses face, and businesses should be strategic in leveraging the strengths of each to meet diverse needs effectively. By customizing AI models while leveraging foundation models, organizations can fine-tune AI solutions to match the scale of the problem they address, allocating resources and costs more efficiently.”

Recent techniques like Low Rank Adaptation (LoRA) and quantization have proven effective in optimizing smaller models. How is IBM incorporating these advancements to enhance the efficiency and capabilities of SLMs? Can you provide insights on specific projects where these techniques have significantly improved business applications?

Tan Ser Yean: “As the supply of high-performance GPUs remains scarce, running SLMs on less powerful GPUs is definitely useful. For example, the IBM Fusion HCI hyperconverged appliance gives you the flexibility to add Nvidia L40S and H100 GPUs, allowing organizations to bring their AI workloads where they are needed. Together with SLMs and LoRA, IBM Fusion HCI can streamline and boost AI capabilities in data centers and at the edge.”
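To see why LoRA and quantization make smaller GPUs viable, here is a minimal NumPy sketch of both ideas (a toy illustration under assumed dimensions, not IBM's or any library's implementation). LoRA freezes the pretrained weight `W` and trains only a low-rank update `B @ A`; quantization stores `W` at one byte per value:

```python
import numpy as np

rng = np.random.default_rng(0)

d, r = 4096, 8                    # hidden size and LoRA rank (illustrative values)
W = rng.standard_normal((d, d))   # frozen pretrained weight: never updated
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))              # B starts at zero, so training begins as a no-op

def lora_forward(x):
    # Effective weight is W + B @ A, but the d x d update is never materialized:
    return x @ W.T + (x @ A.T) @ B.T

x = rng.standard_normal((1, d))
# With B = 0, the adapted layer exactly matches the frozen layer.
assert np.allclose(lora_forward(x), x @ W.T)

full_params = d * d               # what full fine-tuning would update
lora_params = 2 * d * r           # what LoRA updates: A and B only
print(f"trainable: {lora_params:,} vs {full_params:,} "
      f"({100 * lora_params / full_params:.2f}%)")

# Symmetric int8 quantization of the frozen weight: one byte per value
# plus a single scale factor, dequantized on the fly at inference time.
scale = np.abs(W).max() / 127.0
W_int8 = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_deq = W_int8.astype(np.float32) * scale
print("max abs quantization error:", np.abs(W - W_deq).max())
```

At rank 8, LoRA trains well under 1% of the layer's parameters, and int8 storage quarters the memory of fp32 weights, which is why the combination fits comfortably on mid-range GPUs like the L40S.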

Different industries such as legal, healthcare, and finance have unique requirements for language models. How is IBM customizing SLMs for these sectors to address specific challenges and opportunities? What are the key benefits and use cases of these tailored models in real-world applications?

Tan Ser Yean: “IBM’s approach to customizing SLMs involves training these models on domain-specific datasets to meet the unique needs of different industries. This enables us to address sector-specific challenges and opportunities, delivering tailored solutions that drive innovation and value creation in legal, healthcare, finance, and beyond. For instance, one key aspect of customizing SLMs for the finance industry is training them on extensive financial datasets, including financial reports, market data, regulatory filings, economic indicators, and historical trading data. This allows the models to grasp complex financial terminology, market dynamics, investment strategies, and risk factors. A customized SLM trained on historical trading data can provide insights into market trends, identify potential investment opportunities, and predict future market movements with greater accuracy.

Moreover, tailored SLMs empower organizations to develop specialized applications and solutions that meet their specific business requirements. Whether it’s improving customer service through personalized interactions, detecting fraudulent activities more effectively, or streamlining legal document review processes, customized SLMs enable organizations to drive innovation, improve efficiency, and deliver enhanced value to their stakeholders.”

Author

  • Hello! I’m Mark, the founder of techcoffeehouse.com. I love a good plate of Chicken Rice. So, if you have a story as good as the dish, HMU!


Discover more from techcoffeehouse.com

Subscribe to get the latest posts sent to your email.

