04Nov

When Machines Think Ahead: The Rise of Strategic AI | by Hans Christian Ekne | Nov, 2024


Image generated by the author using Canva Magic Studio

Games have provided an amazing proving ground for developing strategic AI. The closed nature of games makes it easier to train models and develop solution techniques than in open-ended systems. Games are clearly defined; the players are known and so are the payoffs. One of the biggest and earliest milestones was Deep Blue, the machine that beat the world champion in chess.

Early Milestones: Deep Blue

Deep Blue was a chess-playing supercomputer developed by IBM in the 1990s. As stated in the prologue, it made history in May 1997 by defeating the reigning world chess champion, Garry Kasparov, in a six-game match. Deep Blue utilized specialized hardware and algorithms capable of evaluating 200 million chess positions per second. It combined brute-force search techniques with heuristic evaluation functions, enabling it to search deeper into potential move sequences than any previous system. What made Deep Blue special was its ability to process vast numbers of positions quickly, effectively handling the combinatorial complexity of chess and marking a significant milestone in artificial intelligence.
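Deep Blue's search ran on custom hardware and was far more engineered than anything that fits in a blog post, but the underlying idea of depth-limited search with a heuristic evaluation can be sketched in a few lines. The sketch below is illustrative only: `children` and `evaluate` are hypothetical stand-ins for a real engine's move generator and evaluation function, not IBM's implementation.

```python
# Illustrative alpha-beta (minimax) search over an abstract game tree.
# `children(state)` returns the positions reachable in one move;
# `evaluate(state)` returns a heuristic score from the maximizing side's view.

def alphabeta(state, depth, alpha, beta, maximizing, children, evaluate):
    moves = children(state)
    if depth == 0 or not moves:
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for child in moves:
            value = max(value, alphabeta(child, depth - 1, alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if alpha >= beta:   # prune branches the opponent would never allow
                break
        return value
    value = float("inf")
    for child in moves:
        value = min(value, alphabeta(child, depth - 1, alpha, beta, True, children, evaluate))
        beta = min(beta, value)
        if beta <= alpha:       # prune branches we would never allow
            break
    return value
```

Deep Blue's edge came from running this kind of search extraordinarily deep and fast, with a heavily hand-tuned evaluation function, rather than from any learned representation of chess.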

However, as Garry Kasparov notes in his interview with Lex Fridman¹, Deep Blue was more of a brute-force machine than anything else, so it’s perhaps hard to qualify it as any type of intelligence. The core of the search is basically just trial and error. And speaking of errors, it makes significantly fewer errors than humans, and according to Kasparov this is one of the features that made it hard to beat.

Advancements in Complex Games: AlphaGo

19 years after Deep Blue’s victory in chess, a team from Google’s DeepMind produced another model that would contribute to a special moment in the history of AI. In 2016, AlphaGo became the first AI model to defeat a world champion Go player, Lee Sedol.

Go is a very old board game with origins in Asia, known for its deep complexity and vast number of possible positions, far exceeding those in chess. AlphaGo combined deep neural networks with Monte Carlo tree search, allowing it to evaluate positions and plan moves effectively. The more time AlphaGo was given at inference, the better it performed.
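The tree search component is Monte Carlo tree search, which balances exploiting moves that have looked good in past simulations against exploring moves it has tried less often. A simplified version of the selection rule is sketched below; it uses plain UCT and ignores the policy-network priors that AlphaGo's actual PUCT rule adds.

```python
import math

# Simplified UCT selection rule from Monte Carlo tree search.
# AlphaGo's real variant (PUCT) additionally weights each child by a prior
# probability from its policy network; this sketch omits that for brevity.

def uct_score(value_sum, visits, parent_visits, c=1.4):
    if visits == 0:
        return float("inf")            # always try unvisited moves first
    exploit = value_sum / visits       # average outcome of past simulations
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore

def select_child(children):
    # `children` is a list of dicts with simulation statistics per candidate move.
    parent_visits = sum(ch["visits"] for ch in children) + 1
    return max(children, key=lambda ch: uct_score(ch["value"], ch["visits"], parent_visits))

# Example: a well-performing but less-visited move can outrank a heavily visited one.
print(select_child([{"value": 6.0, "visits": 10}, {"value": 2.0, "visits": 2}]))
```

More simulations sharpen the visit statistics, which is why giving AlphaGo more time at inference improved its play.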

The AI trained on a dataset of human expert games and improved further through self-play. What made AlphaGo special was its ability to handle the complexity of Go, utilizing advanced machine learning techniques to achieve superhuman performance in a domain previously thought to be resistant to AI mastery.

One could argue AlphaGo exhibits more intelligence than Deep Blue, given its exceptional ability to deeply evaluate board states and select moves. Move 37 from its 2016 game against Lee Sedol is a classic example. For those acquainted with Go, it was a shoulder hit on the fifth line that initially baffled commentators, including Lee Sedol himself. But as would later become clear, the move was a brilliant play and showcased how AlphaGo would explore strategies that human players might overlook or disregard.

Combining Chess and Go: AlphaZero

One year later, Google DeepMind made headlines again. This time, they took many of the learnings from AlphaGo and created AlphaZero, a more general-purpose AI system that mastered chess as well as Go and shogi. The researchers were able to build the AI solely through self-play and reinforcement learning, without prior human knowledge or data. Unlike traditional chess engines that rely on handcrafted evaluation functions and extensive opening libraries, AlphaZero used deep neural networks and a novel algorithm combining Monte Carlo tree search with self-learning.

The system started with only the basic rules and learned optimal strategies by playing millions of games against itself. What made AlphaZero special was its ability to discover creative and efficient strategies, showcasing a new paradigm in AI that leverages self-learning over human-engineered knowledge.

Integrating Speed and Strategy: StarCraft II

Continuing its domination in the AI space, the Google DeepMind team changed its focus to a highly popular computer game, StarCraft II. In 2019 they developed an AI called AlphaStar² which was able to achieve Grandmaster level play and rank higher than 99.8% of human players on the competitive leaderboard.

StarCraft II is a real-time strategy game that provided several novel challenges for the team at DeepMind. The goal of the game is to conquer the opposing player or players by gathering resources, constructing buildings and amassing armies that can defeat the opponent. The main challenges in this game arise from the enormous action space that needs to be considered, the real-time decision making, partial observability due to fog of war, and the need for long-term strategic planning, as some games can last for hours.

By building on some of the techniques developed for previous AIs, like reinforcement learning through self-play and deep neural networks, the team was able to build a unique game-playing agent. First, they trained a neural net using supervised learning on human play. Then, they used that network to seed another algorithm that could play against itself in a multi-agent game framework. The DeepMind team created a virtual league where the agents could explore strategies against each other and where the dominant strategies would be rewarded. Ultimately, they combined the strategies from the league into a super strategy that could be effective against many different opponents and strategies. In their own words³:

The final AlphaStar agent consists of the components of the Nash distribution of the league — in other words, the most effective mixture of strategies that have been discovered — that run on a single desktop GPU.

Deep Dive into Pluribus and Poker

I love playing poker, and when I was living and studying in Trondheim, we used to have a weekly cash game which could get quite intense! One of the more recent milestones reached by strategic AI was in the game of poker, specifically in one of its most popular forms, 6-player no-limit Texas hold’em. The game uses a standard 52-card deck, and play proceeds through the following stages:

  1. The Preflop: All players are given 2 cards (hole cards) which only they themselves know the value of.
  2. The Flop: 3 cards are drawn and laid face up so that all players can see them.
  3. The Turn: Another card is drawn and laid face up.
  4. The River: A final 5th card is drawn and laid face up.

The players can use the cards on the table and the two cards in their hand to assemble a 5-card poker hand. In each stage of the game, the players take turns placing bets, and the hand can end at any stage if one player places a bet that no one else is willing to call.

Though reasonably simple to learn (one only needs to know the hierarchy of the various poker hands), this game proved to be very difficult to solve with AI, despite ongoing efforts for several decades.

There are multiple factors contributing to the difficulty of solving poker. Firstly, we have the issue of hidden information, because you don’t know which cards the other players have. Secondly, we have a multiplayer setup, with each extra player increasing the number of possible interactions and strategies exponentially. Thirdly, we have the no-limit betting rules, which allow for a complex betting structure where one player can suddenly decide to bet their entire stack. Fourth, we have an enormous game tree complexity due to the combinations of hole cards, community cards, and betting sequences. In addition, we also have complexity due to the stochastic nature of the cards, the potential for bluffing, and the need for opponent modelling!

It was only in 2019 that a couple of researchers, Noam Brown and Tuomas Sandholm, finally cracked the code. In a paper published in Science, they describe a novel poker AI — Pluribus — that managed to beat the best players in the world in 6-player no-limit Texas hold’em.⁴ They conducted two different experiments, each consisting of 10,000 poker hands, and both experiments clearly showed the dominance of Pluribus.

In the first experiment, Pluribus played against 5 human opponents, achieving an average win rate of 48 mbb/game, with a standard deviation of 25 mbb/game. (mbb/game stands for milli-big-blinds per game: thousandths of a big blind won per hand, or equivalently how many big blinds are won per 1,000 hands played.) 48 mbb/game is considered a very high win rate, especially among elite poker players, and implies that Pluribus is stronger than its human opponents.
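To make the unit concrete, here is a back-of-the-envelope conversion with made-up chip counts:

```python
# Converting a chip result into mbb/game (milli-big-blinds per hand).
# The numbers below are invented purely for illustration.

big_blind = 100           # chips
hands_played = 10_000
total_chips_won = 48_000  # net winnings over all hands

bb_per_hand = total_chips_won / big_blind / hands_played   # 0.048 big blinds per hand
mbb_per_game = bb_per_hand * 1000                          # 48 mbb/game
print(mbb_per_game)
```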

In the second experiment, the researchers had 5 copies of Pluribus play against 1 human. They set up the experiment so that 2 different humans would each play 5,000 hands against the 5 machines. Pluribus ended up beating the humans by an average of 32 mbb/game with a standard error of 15 mbb/game, again showing its strategic superiority.

The dominance of Pluribus is quite amazing, especially given all the complexities the researchers had to overcome. Brown and Sandholm came up with several smart strategies that helped Pluribus to become superhuman and computationally much more efficient than previous top poker AIs. Some of their techniques include:

  1. The use of two different algorithms for evaluating moves. They would first use a so-called “blueprint strategy”, which was created by having the program play against itself using a method called Monte Carlo counterfactual regret minimization (a sketch of its core update rule follows this list). This blueprint strategy would be used in the first round of betting, but in subsequent betting rounds, Pluribus conducts a real-time search to find a better, more granular strategy.
  2. To make the real-time search algorithm more computationally efficient, they used a depth-limited search that only looks a couple of moves ahead. In addition, rather than considering every way the opponents might continue, they evaluated only four possible continuation strategies for them: the original blueprint strategy, a blueprint strategy biased towards folding, a blueprint strategy biased towards calling, and a final blueprint strategy biased towards raising.
  3. They also used various abstraction techniques to reduce the number of possible game states. For example, because a 9-high straight is fundamentally similar to an 8-high straight, the two can be treated in a similar way.
  4. Pluribus would discretize the continuous betting space into a limited set of buckets, making it more tractable to consider and evaluate the various possible bet sizes.
  5. In addition, Pluribus also balances its strategy in such a way that, for any given hand it is playing, it also considers the other possible hands it could have in that situation and evaluates how it would play those hands, so that the final play is balanced and thus harder to counter.
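The blueprint strategy mentioned in the first technique is computed with counterfactual regret minimization (CFR). At the heart of CFR is regret matching: at each decision point the program plays actions in proportion to how much it regrets not having played them in the past. The toy sketch below shows only that update rule for a single decision point with three actions; the actual Monte Carlo CFR behind Pluribus applies it across an abstracted game tree using sampled traversals.

```python
import numpy as np

# Regret matching: the update rule at the core of counterfactual regret
# minimization (CFR). Toy version for one decision point with three actions
# (fold, call, raise); utilities below are made up for illustration.

cumulative_regret = np.zeros(3)

def current_strategy() -> np.ndarray:
    positive = np.maximum(cumulative_regret, 0)
    if positive.sum() == 0:
        return np.ones(3) / 3            # no regret yet: play uniformly
    return positive / positive.sum()     # play in proportion to positive regret

def update(action_utilities: np.ndarray) -> None:
    global cumulative_regret
    strategy = current_strategy()
    expected = float(strategy @ action_utilities)       # value of the current mix
    cumulative_regret = cumulative_regret + (action_utilities - expected)

update(np.array([0.0, 1.0, -0.5]))   # one iteration with made-up action values
print(current_strategy())            # the strategy shifts towards the best action
```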

There are quite a few interesting observations to draw from Pluribus, but perhaps the most interesting is that it doesn’t vary its play against different opponents; instead it has developed a robust strategy that is effective against a wide variety of players. A lot of poker players think they have to adjust their play to different situations and people, but Pluribus shows us that this is not needed, and probably not even optimal, given how it beat all the humans it played against.

In our short foray into game theory, we noted that if you play the Nash equilibrium (NE) strategy in two-player zero-sum games, you are guaranteed not to lose in expectation. However, for a multiplayer game like 6-player poker there is no such guarantee. Noam Brown speculates⁵ that it is perhaps the adversarial nature of a game like poker which still makes it suitable to approach with a NE strategy. Conversely, in a game like Risk, where players can cooperate more, pursuing a NE strategy is not guaranteed to work: if you are playing a game of Risk with 6 people, there is nothing you can do if your 5 opponents decide to gang up on you and take you out.

Evaluating the Trend in Strategic AI

Summarizing the history of strategic AI in games, we see a clear trend emerging. The games are slowly but surely becoming closer to the real-world strategic situations that humans find themselves in on an everyday basis.

Firstly, we are moving from two-player to multiplayer settings. This can be seen in the progression from initial successes in two-player games to multiplayer games like 6-player poker. Secondly, we are seeing an increase in the mastery of games with hidden information. Thirdly, we are also seeing an increase in the mastery of games with more stochastic elements.

Hidden information, multiplayer settings and stochastic events are the norm rather than the exception in strategic interactions among humans, so mastering these complexities is key in achieving a more general superhuman strategic AI that can navigate in the real world.



Source link

03Nov

Beyond Skills: Unlocking the Full Potential of Data Scientists. | by Eric Colson | Oct, 2024


Image created through DALL-E / OpenAI by author.

Unlock the hidden value of data scientists by empowering them beyond technical tasks to drive innovation and strategic insights.

[This piece is cross-posted from O’Reilly Radar here]

Modern organizations regard data as a strategic asset that drives efficiency, enhances decision making, and creates new value for customers. Across the organization — product management, marketing, operations, finance, and more — teams are overflowing with ideas on how data can elevate the business. To bring these ideas to life, companies are eagerly hiring data scientists for their technical skills (Python, statistics, machine learning, SQL, etc.).

Despite this enthusiasm, many companies are significantly underutilizing their data scientists. Organizations remain narrowly focused on employing data scientists to execute preexisting ideas, overlooking the broader value they bring. Beyond their skills, data scientists possess a unique perspective that allows them to come up with innovative business ideas of their own — ideas that are novel, strategic, or differentiating and are unlikely to come from anyone but a data scientist.

Sadly, many companies behave in ways that suggest they are uninterested in the ideas of data scientists. Instead, they treat data scientists as a resource to be used for their skills alone. Functional teams provide requirements documents with fully specified plans: “Here’s how you are to build this new system for us. Thank you for your partnership.” No context is provided, and no input is sought — other than an estimate for delivery. Data scientists are further inundated with ad hoc requests for tactical analyses or operational dashboards¹. The backlog of requests grows so large that the work queue is managed through Jira-style ticketing systems, which strip the requests of any business context (e.g., “get me the top products purchased by VIP customers”). One request begets another², creating a Sisyphean endeavor that leaves no time for data scientists to think for themselves. And then there’s the myriad of opaque requests for data pulls: “Please get me this data so I can analyze it.” This is marginalizing — like asking Steph Curry to pass the ball so you can take the shot. It’s not a partnership; it’s a subordination that reduces data science to a mere support function, executing ideas from other teams. While executing tasks may produce some value, it won’t tap into the full potential of what data scientists truly have to offer.

The untapped potential of data scientists lies not in their ability to execute requirements or requests but in their ideas for transforming a business. By “ideas” I mean new capabilities or strategies that can move the business in better or new directions — leading to increased³ revenue, profit, or customer retention while simultaneously providing a sustainable competitive advantage (i.e., capabilities or strategies that are difficult for competitors to replicate). These ideas often take the form of machine learning algorithms that can automate decisions within a production system⁴. For example, a data scientist might develop an algorithm to better manage inventory by optimally balancing overage and underage costs. Or they might create a model that detects hidden customer preferences, enabling more effective personalization. If these sound like business ideas, that’s because they are — but they’re not likely to come from business teams. Ideas like these typically emerge from data scientists, whose unique cognitive repertoires and observations in the data make them well-suited to uncovering such opportunities.
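The inventory example is the classic newsvendor problem, which has a closed-form answer once you write down the two costs. A minimal sketch with invented numbers and a normal-demand assumption:

```python
from scipy.stats import norm

# Newsvendor model: the framework behind "buy enough, but not too much".
# Costs and demand parameters below are illustrative only.

cost_under = 12.0   # profit lost per unit of unmet demand (understocking cost)
cost_over = 3.0     # loss per unsold perishable unit (overstocking cost)

critical_ratio = cost_under / (cost_under + cost_over)   # 0.8
mean_demand, sd_demand = 500, 80                          # assumed demand distribution

optimal_order = norm.ppf(critical_ratio, loc=mean_demand, scale=sd_demand)
print(round(optimal_order))   # order quantity that balances the two costs (~567 units)
```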

A cognitive repertoire is the range of tools, strategies, and approaches an individual can draw upon for thinking, problem-solving, or processing information (Page 2017). These repertoires are shaped by our backgrounds — education, experience, training, and so on. Members of a given functional team often have similar repertoires due to their shared backgrounds. For example, marketers are taught frameworks like SWOT analysis and ROAS, while finance professionals learn models such as ROIC and Black-Scholes.

Data scientists have a distinctive cognitive repertoire. While their academic backgrounds may vary — ranging from statistics to computer science to computational neuroscience — they typically share a quantitative tool kit. This includes frameworks for widely applicable problems, often with accessible names like the “newsvendor model,” the “traveling salesman problem,” the “birthday problem,” and many others. Their tool kit also includes knowledge of machine learning algorithms⁵ like neural networks, clustering, and principal components, which are used to find empirical solutions to complex problems. Additionally, they include heuristics such as big O notation, the central limit theorem, and significance thresholds. All of these constructs can be expressed in a common mathematical language, making them easily transferable across different domains, including business — perhaps especially business.

The repertoires of data scientists are particularly relevant to business innovation since, in many industries⁶, the conditions for learning from data are nearly ideal in that they have high-frequency events, a clear objective function⁷, and timely and unambiguous feedback. Retailers have millions of transactions that produce revenue. A streaming service sees millions of viewing events that signal customer interest. And so on — millions or billions of events with clear signals that are revealed quickly. These are the units of induction that form the basis for learning, especially when aided by machines. The data science repertoire, with its unique frameworks, machine learning algorithms, and heuristics, is remarkably geared for extracting knowledge from large volumes of event data.

Ideas are born when cognitive repertoires connect with business context. A data scientist, while attending a business meeting, will regularly experience pangs of inspiration. Her eyebrows raise from behind her laptop as an operations manager describes an inventory perishability problem, lobbing the phrase “We need to buy enough, but not too much.” “Newsvendor model,” the data scientist whispers to herself. A product manager asks, “How is this process going to scale as the number of products increases?” The data scientist involuntarily scribbles “O(N²)” on her notepad, which is big O notation to indicate that the process will scale superlinearly. And when a marketer brings up the topic of customer segmentation, bemoaning, “There are so many customer attributes. How do we know which ones are most important?,” the data scientist sends a text to cancel her evening plans. Instead, tonight she will eagerly try running principal components analysis on the customer data⁸.

No one was asking for ideas. This was merely a tactical meeting with the goal of reviewing the state of the business. Yet the data scientist is practically goaded into ideating. “Oh, oh. I got this one,” she says to herself. Ideation can even be hard to suppress. Yet many companies unintentionally seem to suppress that creativity. In reality our data scientist probably wouldn’t have been invited to that meeting. Data scientists are not typically invited to operating meetings. Nor are they typically invited to ideation meetings, which are often limited to the business teams. Instead, the meeting group will assign the data scientist Jira tickets of tasks to execute. Without the context, the tasks will fail to inspire ideas. The cognitive repertoire of the data scientist goes unleveraged — a missed opportunity to be sure.

Beyond their cognitive repertoires, data scientists bring another key advantage that makes their ideas uniquely valuable. Because they are so deeply immersed in the data, data scientists discover unforeseen patterns and insights that inspire novel business ideas. They are novel in the sense that no one would have thought of them — not product managers, executives, marketers — not even a data scientist for that matter. There are many ideas that cannot be conceived of but rather are revealed by observation in the data.

Company data repositories (data warehouses, data lakes, and the like) contain a primordial soup of insights lying fallow in the information. As they do their work, data scientists often stumble upon intriguing patterns — an odd-shaped distribution, an unintuitive relationship, and so forth. The surprise finding piques their curiosity, and they explore further.

Imagine a data scientist doing her work, executing on an ad hoc request. She is asked to compile a list of the top products purchased by a particular customer segment. To her surprise, the products bought by the various segments are hardly different at all. Most products are bought at about the same rate by all segments. Weird. The segments are based on profile descriptions that customers opted into, and for years the company had assumed them to be meaningful groupings useful for managing products. “There must be a better way to segment customers,” she thinks. She explores further, launching an informal, impromptu analysis. No one is asking her to do this, but she can’t help herself. Rather than relying on the labels customers use to describe themselves, she focuses on their actual behavior: what products they click on, view, like, or dislike. Through a combination of quantitative techniques — matrix factorization and principal component analysis — she comes up with a way to place customers into a multidimensional space. Clusters of customers adjacent to one another in this space form meaningful groupings that better reflect customer preferences. The approach also provides a way to place products into the same space, allowing for distance calculations between products and customers. This can be used to recommend products, plan inventory, target marketing campaigns, and many other business applications. All of this is inspired by the surprising observation that the tried-and-true customer segments did little to explain customer behavior. Solutions like this have to be driven by observation since, absent the data saying otherwise, no one would have thought to inquire about a better way to group customers.
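A stripped-down sketch of that kind of behavior-based segmentation might look like the following. The interaction matrix is simulated and the whole pipeline is illustrative; a real analysis would involve far more data preparation and validation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Behavior-based customer segmentation sketch (illustrative only).
# Rows = customers, columns = products; entries = implicit feedback
# such as clicks, views, or likes. Here the data is simulated.

rng = np.random.default_rng(42)
interactions = rng.poisson(lam=1.0, size=(1000, 50)).astype(float)

# Project customers into a low-dimensional preference space.
embedding = PCA(n_components=5).fit_transform(interactions)

# Group customers by proximity in that space instead of self-reported labels.
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)

print(np.bincount(segments))   # size of each behavior-based segment
```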

As a side note, the principal component algorithm that the data scientist used belongs to a class of algorithms called “unsupervised learning,” which further exemplifies the concept of observation-driven insights. Unlike “supervised learning,” in which the user instructs the algorithm what to look for, an unsupervised learning algorithm lets the data describe how it is structured. It is evidence based; it quantifies and ranks each dimension, providing an objective measure of relative importance. The data does the talking. Too often we try to direct the data to yield to our human-conceived categorization schemes, which are familiar and convenient to us, evoking visceral and stereotypical archetypes. It’s satisfying and intuitive but often flimsy and fails to hold up in practice.

Examples like this are not rare. When immersed in the data, it’s hard for the data scientists not to come upon unexpected findings. And when they do, it’s even harder for them to resist further exploration — curiosity is a powerful motivator. Of course, she exercised her cognitive repertoire to do the work, but the entire analysis was inspired by observation of the data. For the company, such distractions are a blessing, not a curse. I’ve seen this sort of undirected research lead to better inventory management practices, better pricing structures, new merchandising strategies, improved user experience designs, and many other capabilities — none of which were asked for but instead were discovered by observation in the data.

Isn’t discovering new insights the data scientist’s job? Yes — that’s exactly the point of this article. The problem arises when data scientists are valued only for their technical skills. Viewing them solely as a support team limits them to answering specific questions, preventing deeper exploration of insights in the data. The pressure to respond to immediate requests often causes them to overlook anomalies, unintuitive results, and other potential discoveries. If a data scientist were to suggest some exploratory research based on observations, the response is almost always, “No, just focus on the Jira queue.” Even if they spend their own time — nights and weekends — researching a data pattern that leads to a promising business idea, it may still face resistance simply because it wasn’t planned or on the roadmap. Roadmaps tend to be rigid, dismissing new opportunities, even valuable ones. In some organizations, data scientists may pay a price for exploring new ideas. Data scientists are often judged by how well they serve functional teams, responding to their requests and fulfilling short-term needs. There is little incentive to explore new ideas when doing so detracts from a performance review. In reality, data scientists frequently find new insights in spite of their jobs, not because of them.

These two things — their cognitive repertoires and observations from the data — make the ideas that come from data scientists uniquely valuable. This is not to suggest that their ideas are necessarily better than those from the business teams. Rather, their ideas are different from those of the business teams. And being different has its own set of benefits.

Having a seemingly good business idea doesn’t guarantee that the idea will have a positive impact. Evidence suggests that most ideas will fail. When properly measured for causality⁹, the vast majority of business ideas either fail to show any impact at all or actually hurt metrics. (See some statistics here.) Given the poor success rates, innovative companies construct portfolios of ideas in the hopes that at least a few successes will allow them to reach their goals. Still savvier companies use experimentation¹⁰ (A/B testing) to try their ideas on small samples of customers, allowing them to assess the impact before deciding to roll them out more broadly.

This portfolio approach, combined with experimentation, benefits from both the quantity and diversity of ideas¹¹. It’s similar to diversifying a portfolio of stocks. Increasing the number of ideas in the portfolio increases exposure to a positive outcome — an idea that makes a material positive impact on the company. Of course, as you add ideas, you also increase the risk of bad outcomes — ideas that do nothing or even have a negative impact. However, many ideas are reversible — the “two-way door” that Amazon’s Jeff Bezos speaks of (Haden 2018). Ideas that don’t produce the expected results can be pruned after being tested on a small sample of customers, greatly mitigating the impact, while successful ideas can be rolled out to all relevant customers, greatly amplifying the impact.

So, adding ideas to the portfolio increases exposure to upside without a lot of downside — the more, the better¹². However, there is an assumption that the ideas are independent (uncorrelated). If all the ideas are similar, then they may all succeed or fail together. This is where diversity comes in. Ideas from different groups will leverage divergent cognitive repertoires and different sets of information. This makes them different and less likely to be correlated with each other, producing more varied outcomes. For stocks, the return on a diverse portfolio will be the average of the returns for the individual stocks. However, for ideas, since experimentation lets you mitigate the bad ones and amplify the good ones, the return of the portfolio can be closer to the return of the best idea (Page 2017).
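A toy simulation makes the asymmetry concrete. The effect sizes and noise level below are invented purely for illustration; the point is only that pruning tested losers and rolling out tested winners pulls the portfolio's return well above the average idea.

```python
import numpy as np

# Toy simulation of an experimentation-gated idea portfolio (made-up numbers).
# Each idea's true impact is drawn from a distribution where most ideas do
# nothing or hurt; an A/B test lets us roll out only the apparent winners.

rng = np.random.default_rng(0)
n_ideas = 50
true_impact = rng.normal(loc=-0.2, scale=1.0, size=n_ideas)   # most ideas are duds

# Without testing: ship everything, so returns average out, bad ideas included.
ship_all = true_impact.mean()

# With testing: keep only ideas whose (noisy) measured impact is positive.
measured = true_impact + rng.normal(scale=0.3, size=n_ideas)   # noisy A/B estimate
ship_winners = true_impact[measured > 0].sum() / n_ideas

print(f"ship everything: {ship_all:.2f}   ship only tested winners: {ship_winners:.2f}")
```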

In addition to building a portfolio of diverse ideas, a single idea can be significantly strengthened through collaboration between data scientists and business teams¹³. When they work together, their combined repertoires fill in each other’s blind spots (Page 2017)¹⁴. By merging the unique expertise and insights from multiple teams, ideas become more robust, much like how diverse groups tend to excel in trivia competitions. However, organizations must ensure that true collaboration happens at the ideation stage rather than dividing responsibilities such that business teams focus solely on generating ideas and data scientists are relegated to execution.

Data scientists are much more than a skilled resource for executing existing ideas; they are a wellspring of novel, innovative thinking. Their ideas are uniquely valuable because (1) their cognitive repertoires are highly relevant to businesses with the right conditions for learning, (2) their observations in the data can lead to novel insights, and (3) their ideas differ from those of business teams, adding diversity to the company’s portfolio of ideas.

However, organizational pressures often prevent data scientists from fully contributing their ideas. Overwhelmed with skill-based tasks and deprived of business context, they are incentivized to merely fulfill the requests of their partners. This pattern exhausts the team’s capacity for execution while leaving their cognitive repertoires and insights largely untapped.

Here are some suggestions that organizations can follow to better leverage data scientists and shift their roles from mere executors to active contributors of ideas:

  • Give them context, not tasks. Providing data scientists with tasks or fully specified requirements documents will get them to do work, but it won’t elicit their ideas. Instead, give them context. If an opportunity is already identified, describe it broadly through open dialogue, allowing them to frame the problem and propose solutions. Invite data scientists to operational meetings where they can absorb context, which may inspire new ideas for opportunities that haven’t yet been considered.
  • Create slack for exploration. Companies often completely overwhelm data scientists with tasks. It may seem paradoxical, but keeping resources 100% utilized is very inefficient¹⁵. Without time for exploration and unexpected learning, data science teams can’t reach their full potential. Protect some of their time for independent research and exploration, using tactics like Google’s 20% time or similar approaches.
  • Eliminate the task management queue. Task queues create a transactional, execution-focused relationship with the data science team. Priorities, if assigned top-down, should be given in the form of general, unframed opportunities that need real conversations to provide context, goals, scope, and organizational implications. Priorities might also emerge from within the data science team, requiring support from functional partners, with the data science team providing the necessary context. We don’t assign Jira tickets to product or marketing teams, and data science should be no different.
  • Hold data scientists accountable for real business impact. Measure data scientists by their impact on business outcomes, not just by how well they support other teams. This gives them the agency to prioritize high-impact ideas, regardless of the source. Additionally, tying performance to measurable business impact¹⁶ clarifies the opportunity cost of low-value ad hoc requests¹⁷.
  • Hire for adaptability and broad skill sets. Look for data scientists who thrive in ambiguous, evolving environments where clear roles and responsibilities may not always be defined. Prioritize candidates with a strong desire for business impact¹⁸, who see their skills as tools to drive outcomes, and who excel at identifying new opportunities aligned with broad company goals. Hiring for diverse skill sets enables data scientists to build end-to-end systems, minimizing the need for handoffs and reducing coordination costs — especially critical during the early stages of innovation when iteration and learning are most important¹⁹.
  • Hire functional leaders with growth mindsets. In new environments, avoid leaders who rely too heavily on what worked in more mature settings. Instead, seek leaders who are passionate about learning and who value collaboration, leveraging diverse perspectives and information sources to fuel innovation.

These suggestions require an organization with the right culture and values. The culture needs to embrace experimentation to measure the impact of ideas and to recognize that many will fail. It needs to value learning as an explicit goal and understand that, for some industries, the vast majority of knowledge has yet to be discovered. It must be comfortable relinquishing the clarity of command-and-control in exchange for innovation. While this is easier to achieve in a startup, these suggestions can guide mature organizations toward evolving with experience and confidence. Shifting an organization’s focus from execution to learning is a challenging task, but the rewards can be immense or even crucial for survival. For most modern firms, success will depend on their ability to harness human potential for learning and ideation — not just execution (Edmondson 2012). The untapped potential of data scientists lies not in their ability to execute existing ideas but in the new and innovative ideas no one has yet imagined.



Source link

02Nov

Should you learn how to code in the next decade? | by Ivo Bernardo | Nov, 2024


Or will AI eat up all the software in the world?

Photo by steinart @unsplash.com

Many people today are facing a dilemma: if you’re young, should you pursue a software engineering degree? And if you’re already established in another career, should you make a switch to something involving coding? These questions stem from a larger one: with all the excitement around large language models (LLMs), is it really worth learning to code?

Recently Google’s CEO stated that about 25% of the company’s new code is generated by AI. Are we seeing the death of coding as we know it?

And these questions are not just asked by people entering the field. Several professionals whose jobs depend on coding are also asking them. Should they continue to invest a large portion of their lives improving their coding abilities?

To me the short answer is: coding will still be relevant — but maybe not for the reason you are thinking about. I think it’s undeniable that coding-related jobs will change a lot in the next decade.

In this post, we’ll see some predictions of the future of coding and some arguments in favor of learning a programming language. With this post, I hope to provide you with a fresh perspective on why



Source link

31Oct

What’s Your Definition Of An AI Agent? | by Cobus Greyling | Oct, 2024


About 18 months ago I wrote my first article on AI Agents. It was based on AI Agent frameworks created by LangChain. Fast forward to the last few weeks, and AI Agents are in the news like RAG was a few months ago.

And this prompts a question for me: what defines an AI Agent, and what is required to deliver enterprise-ready agentic implementations?

The Basics

Considering the graphic above, an AI Agent can be defined as a piece of software, with one or more Language Models as its backbone.

For the agent to have visual capabilities, a Language Model / Foundation Model with vision capabilities is required.

Task Decomposition

An agent at this stage primarily takes conversational input, hence user input arrives as unstructured data.

And the response from the AI Agent is also most often in natural language, leveraging the Natural Language Generation (NLG) capabilities of Language Models.

I often use this example: you should be able to ask an AI Agent the following question:

What is the square root of the year of birth of the man commonly regarded as the father of the iPhone?

This is a very hard question for any traditional Conversational UI to answer, but for an AI Agent it is easy.

Way Of Work

The AI Agent starts off by decomposing this compound and slightly ambiguous question into sub-steps, and then sets off solving each of these sub-steps in turn.

Each of these steps can be seen or considered as an action.

Agents leverage LLMs to make a decision on which Action to take next.

After an Action is completed, the Agent enters the Observation step.

From the Observation step, the AI Agent shares a Thought; if a final answer is not reached, the AI Agent cycles back to another Action in order to move closer to a Final Answer.

The level of autonomy of an AI Agent is determined by the number of iterations the AI Agent can go through. This is important from a cost, overhead and latency perspective.

Secondly, if the AI Agent is unable to reach a conclusion or solve a task, one of the tools (we’ll look at tools in a bit) at the AI Agent’s disposal can be a human, who can be pinged for guidance.

The number of tools at the disposal of the AI Agent is another factor determining the AI Agent’s autonomy.

Tools can be considered integration points or touch points to external systems or APIs. The number and nature of the tools at the disposal of the AI Agent largely determine what the AI Agent is capable of.

Tools are described in natural language, and can range from a web search API, OS GUI navigation, a maths library, a weather API, CRM integration, and so on.

As the AI Agent decomposes a problem into sub-steps or actions, solving for each of these actions or steps will most probably involve the use of a tool.
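A minimal sketch of this Thought → Action → Observation loop is shown below. It assumes a hypothetical `llm` callable that returns the next thought and action as a small dictionary; the two tools are toy stand-ins for real integrations, and production frameworks such as LangChain add far more machinery around this core.

```python
# Minimal Thought -> Action -> Observation loop (illustrative sketch).
# `llm` is a placeholder for any chat-completion call; the tools are toy
# stand-ins for real integrations (search API, maths library, and so on).

def web_search(query: str) -> str:
    return "Steve Jobs is commonly regarded as the father of the iPhone, born in 1955."

def calculator(expression: str) -> str:
    return str(eval(expression))   # toy only; never eval untrusted input in production

TOOLS = {"web_search": web_search, "calculator": calculator}
MAX_ITERATIONS = 5   # the agent's autonomy ceiling, bounding cost and latency

def run_agent(question: str, llm) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(MAX_ITERATIONS):
        step = llm(scratchpad)                 # returns thought plus the next action
        if step["action"] == "final_answer":
            return step["input"]
        observation = TOOLS[step["action"]](step["input"])
        scratchpad += f"Thought: {step['thought']}\n"
        scratchpad += f"Action: {step['action']}({step['input']})\n"
        scratchpad += f"Observation: {observation}\n"
    return "Escalating to a human: iteration budget exhausted."
```

For the iPhone question above, such a loop would typically search for the person and birth year first, then call the calculator with something like `1955 ** 0.5` before producing a final answer.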

Considering the image below, it is evident what level of observability can be achieved with regard to the internal workings of AI Agents. Notice how the AI Agent steps through the Thought, Action, Observation cycle, and so on.



Source link

30Oct

Make Every Application An AI Agent | by Cobus Greyling | Oct, 2024


Multimodal large language models (MLLMs) have revolutionized LLM-based agents by enabling them to interact directly with application user interfaces (UIs).

This capability extends the model’s scope from text-based responses to visually understanding and responding within a UI, significantly enhancing performance in complex tasks.

Now, LLMs can interpret and respond to images, buttons, and text inputs in applications, making them more adept at navigation and user assistance in real-time workflows.

This interaction optimises the agent’s ability to handle dynamic and multi-step processes that require both visual and contextual awareness, offering more robust solutions across industries like customer support, data management and task automation.

AI Agents often suffer from high latency and low reliability due to extensive sequential UI interaction.

AXIS: Agent eXploring API for Skill integration

Conventional AI Agents often interact with a graphical user interface (GUI) in a human-like manner, interpreting screen layouts, elements, and sequences as a person would.

These LLM-based agents, which are typically fine-tuned with visual language models, aim to enable efficient navigation in mobile and desktop tasks.

However, AXIS presents a new perspective: while human-like UI-based interactions help make these agents versatile, they can be time-intensive, especially for tasks that involve numerous, repeated steps across a UI.

This complexity arises because traditional UIs are inherently designed for human-computer interaction (HCI), not agent-based automation.

AXIS suggests that leveraging application APIs, rather than interacting with the GUI itself, offers a far more efficient solution.

For instance, where a traditional UI agent might change multiple document titles by navigating through UI steps for each title individually, an API could handle all titles simultaneously with a single call, streamlining the process.
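The contrast is easy to see in pseudocode. The method names below are hypothetical and do not come from AXIS; the point is only the difference in shape between per-element UI interaction and one batched API call.

```python
# Hypothetical contrast between UI-driven and API-driven automation.
# Neither function reflects AXIS's actual interfaces; all names are made up.

def rename_titles_via_ui(agent, documents, new_title):
    # One full perceive-click-type cycle per document: slow and error-prone.
    for doc in documents:
        agent.locate_and_click(doc.title_field)
        agent.clear_field()
        agent.type_text(new_title)
        agent.click("Save")

def rename_titles_via_api(client, document_ids, new_title):
    # A single batched call does the same work with no screen interaction.
    client.batch_update(ids=document_ids, fields={"title": new_title})
```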

AXIS aims to not only reduce redundant interactions and simplify complex tasks but also establish new design principles for UIs in the LLM era. This approach advocates for rethinking application design to prioritize seamless integration between AI agents and application functionalities, enabling a more direct, API-driven approach that complements both user and agent workflows.

In exploration mode, the AI Agent autonomously interacts with the application’s interface to explore the different functions and possible actions it can perform.

The agent records these interactions, gathering data on how various parts of the UI respond to different actions.

This exploration helps the agent map out the application’s capabilities, essentially “learning” what’s possible within the app.



Source link

29Oct

Contrasting RPA, Chatbots & AI Agents | by Cobus Greyling | Oct, 2024


A whole host of application types are now integrating agentic capabilities, allowing software to act with a degree of autonomy.

These agentic systems don’t just follow preset rules but can make real-time adjustments, interpreting complex input and taking actions that best align with the given task.

Large tech companies, including Microsoft, Salesforce, IBM and others are racing to introduce agent functionalities, aiming to offer solutions that respond dynamically and provide greater operational flexibility.

Beyond standalone AI Agent solutions, we also see existing automation platforms infusing agentic capabilities to enhance their adaptability and broaden their utility.

At its core, an AI Agent is a piece of software supported by language models, typically large language models (LLMs), which allow it to handle complex queries and tasks.

Unlike traditional automation tools, an AI Agent can decompose a problem into a sequence of steps and handle each step individually.

Through an iterative process of Thought, Action, Observation and so on, the agent moves towards a solution while adjusting its actions based on immediate feedback.

AI Agents also leverage tools that allow them to interact with various systems, from APIs to web searches, depending on the task requirements.

The scope and diversity of these tools determine the “power” or effectiveness of the agent, allowing it to respond intelligently to diverse queries and execute complex workflows.

Robotic Process Automation (RPA)

Advantages: RPA is great for handling repetitive, rule-based tasks like data entry and processing in HR or finance. By removing manual effort, it speeds up workflows, reduces errors and increases efficiency.

Challenges: RPA is less flexible when workflows need dynamic decision-making or frequent updates. Once set, RPAs don’t adjust well, so updates need manual reconfiguration, which can limit their applicability in rapidly changing environments.

Chatbot Flows

Advantages: Chatbots offer a structured approach to common customer queries, guiding users through predefined paths that are easy to set up and effective for FAQs or appointment scheduling.

Challenges: The rigidity of chatbot flows can be frustrating for users with more complex or unique needs. As they’re confined to pre-scripted responses, they’re often limited in how they handle unexpected inputs or intricate problems.

AI Agents

Advantages: AI agents introduce a new level of adaptability and autonomy, making them ideal for tasks requiring a deeper understanding or handling unexpected inputs.

With the ability to create and adjust flows in real-time, they offer personalised responses and greater flexibility, making them suited to multi-step processes and complex troubleshooting.

Challenges: The complexity of AI agents can also be their downside. They typically require more resources to manage, and their access to multiple tools and integrations can make oversight challenging.

Agentic activity in AI-driven applications is advancing rapidly, with three primary streams emerging:

Native AI Agent Frameworks

Native AI Agents represent the purest form of agent technology, where systems are designed from the ground up to operate independently, leveraging large language models and specialised architectures to take action without constant human guidance.

These frameworks are inherently agentic, built with the capability to interact across multiple platforms, autonomously execute tasks, and make decisions based on real-time data. OpenAI’s GPT-4 with tools, Anthropic’s AI agent offerings, Kore.ai’s GALE and frameworks like LangChain exemplify this category by focusing on robust, complex chains of actions that adapt dynamically to user needs and environmental cues.

Enhanced Chatbot and RPA Systems with Agentic Capabilities

Traditional automation technologies, such as chatbots and robotic process automation (RPA), are increasingly incorporating agentic features.

Initially designed for rule-based, repetitive tasks (RPA) or structured conversational flows (chatbots), these systems are now adding layers of dynamic interaction that enable more flexible responses.

This evolution expands the scope of both RPA and chatbot frameworks to handle more complex, less predictable workflows.

General Applications Integrating Agentic Discovery and Interaction

In addition to purpose-built AI Agents and enhanced automation tools, general-purpose applications are beginning to integrate agentic discovery and interaction functionalities.

Consider here the work Microsoft is doing to introduce agentic capabilities to Windows, and Apple’s Ferret-UI research for iOS.

In these streams, agentic functionality provides a spectrum of autonomy and complexity, allowing businesses to choose the right level of intelligent assistance for their needs.

Chief Evangelist @ Kore.ai | I’m passionate about exploring the intersection of AI and language. From Language Models, AI Agents to Agentic Applications, Development Frameworks & Data-Centric Productivity Tools, I share insights and ideas on how these technologies are shaping the future.



Source link

28Oct

Book Talk: Technology and the Rise of Great Powers with Jeffrey Ding


When scholars and policymakers consider how technological advances affect the rise and fall of great powers, they draw on theories that center the moment of innovation – the eureka moment that sparks astonishing technological feats. In his new book, Jeffrey Ding offers a different explanation of how technological revolutions affect competition among great powers. Rather than focusing on which state first introduced major innovations, he investigates the ability of states to successfully adapt and spread these technologies across their economies.

Drawing on historical case studies of past industrial revolutions as well as statistical analysis, Ding demonstrates how institutional adaptations oriented around diffusing technology play a crucial role in shaping global competition. His findings bear directly on current concerns about how emerging technologies such as AI could influence the US-China power balance.

Following his presentation, Ding will be joined by Robert Trager, Co-Director of the Oxford Martin AI Governance Initiative, Ben Garfinkel, Director of the Centre for the Governance of AI, and Kayla Blomquist, Director of the Oxford China Policy Lab, for a panel discussion before Q&A.

Dr. Jeffrey Ding is an Assistant Professor of Political Science at George Washington University. Previously, he was a postdoctoral fellow at Stanford’s Center for International Security and Cooperation, sponsored by Stanford’s Institute for Human-Centered Artificial Intelligence. His research focuses on great power competition and cooperation in emerging technologies, the political economy of innovation, and China’s scientific and technological capabilities.

His book, Technology and the Rise of Great Powers (Princeton University Press, 2024), investigates how past technological revolutions influenced the rise and fall of great powers, with implications for U.S.-China competition in emerging technologies like AI. Other work has been published or is forthcoming in European Journal of International Relations, European Journal of International Security, Foreign Affairs, International Studies Quarterly, Review of International Political Economy, and Security Studies, and his research has been cited in The Washington Post, The Financial Times, and other outlets. Ding received his PhD in 2021 from the University of Oxford, where he studied as a Rhodes Scholar. Previously, he worked as a researcher for Georgetown’s Center for Security and Emerging Technology and Oxford’s Centre for the Governance of AI.



Source link

28Oct

A Guide To Linearity and Nonlinearity in Machine Learning | by Manuel Brenner | Oct, 2024


…and their roles in decision boundaries, embeddings, dynamical systems, and next-gen LLMs

“An eye for an eye, a tooth for a tooth.”
Lex Talionis, Codex Hammurabi

The famed Lex Talionis is a law of proportionality. You take my eye, I take yours. You take my tooth, I take yours (being a Babylonian dentist must have been tough).

The law was not put in place to foster violence; rather, it aimed to restrict it. The Lex Talionis envisioned a legal world where everything could be described by linear equations: every crime would create an output proportional to its input. And since the punishment for an offense was proportional to the crime, it avoided excessive retribution and explosions of violence that destroyed everything in their wake.

Beyond the world of retribution, linearity plays an important role in our thinking about the world: in linear systems, everything is understood. There is no chaos, no complicated maths. All scientists would have to do all day was solve these kinds of equations:
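Think, for instance, of the canonical linear forms

$$A\mathbf{x} = \mathbf{b} \qquad \text{or} \qquad \dot{\mathbf{x}} = A\mathbf{x},$$

where scaling the input scales the output by the same factor, and the sum of two solutions is again a solution (superposition).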

For every action, there is an equal and opposite reaction.
Newton’s Third Law of Motion

Unfortunately, the reality we inhabit is far from this linear utopia. History is rife with examples of the world responding to small things in highly disproportional ways: the Defenestration of Prague sparking the Thirty…



Source link

27Oct

How to Negotiate Your Salary as a Data Scientist | by Haden Pelletier | Oct, 2024


And how much I made my first year

Photo by Amy Hirschi on Unsplash

Congratulations, you have landed a data science position!

You open your offer letter and …

Well, you’re a bit disappointed.

This is completely normal, at least for most companies, and especially if you are a junior or just starting out in the field. The data science dream you’re sold (at least in the US) includes a six-figure salary straight out of college with no experience, and oftentimes the reality can be quite different.

What determines a salary

Salaries for any position are mainly determined by a few factors, some of which are out of your control:

  • The company itself
  • Geographical region (country, state, city)
  • Your level of experience
  • Your education level (Bachelors, Masters, PhD)
  • Current market conditions

Other things that might factor into your salary (although probably on a smaller level) are certain skills or certifications that you possess. Depending on how relevant these skills are to the position, they could give you the upper hand when it comes to negotiations.



Source link

26Oct

Gen-AI Safety Landscape: A Guide to the Mitigation Stack for Text-to-Image Models | by Trupti Bavalatti | Oct, 2024


There is also a large area of risk, as documented in [4], where marginalized groups are associated with harmful connotations, reinforcing societal hateful stereotypes. For example, representation of demographic groups that conflates humans with animals or mythological creatures (such as Black people as monkeys or other primates), conflates humans with food or objects (like associating people with disabilities with vegetables), or associates demographic groups with negative semantic concepts (such as Muslim people with terrorism).

Problematic associations like these between groups of people and concepts reflect long-standing negative narratives about the group. If a generative AI model learns problematic associations from existing data, it may reproduce them in content that it generates [4].

Problematic Associations of marginalized groups and concepts. Image source

There are several ways to fine-tune LLMs. According to [6], one common approach is called Supervised Fine-Tuning (SFT). This involves taking a pre-trained model and further training it with a dataset that includes pairs of inputs and desired outputs. The model adjusts its parameters by learning to better match these expected responses.

Typically, fine-tuning involves two phases: SFT to establish a base model, followed by RLHF for enhanced performance. SFT involves imitating high-quality demonstration data, while RLHF refines LLMs through preference feedback.

RLHF can be done in two ways: reward-based or reward-free methods. In the reward-based method, we first train a reward model using preference data. This model then guides online reinforcement learning algorithms like PPO. Reward-free methods are simpler, directly training the models on preference or ranking data to understand what humans prefer. Among these reward-free methods, DPO has demonstrated strong performance and become popular in the community. Diffusion DPO can be used to steer the model away from problematic depictions towards more desirable alternatives. The tricky part of this process is not the training itself, but the data curation. For each risk, we need a collection of hundreds or thousands of prompts, and for each prompt, a desirable and undesirable image pair. The desirable example should ideally be a perfect depiction for that prompt, and the undesirable example should be identical to the desirable image, except that it should include the risk that we want to unlearn.
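The shape of one such preference record might look like this; the field names are illustrative and do not follow any particular library's schema.

```python
# Illustrative structure of a single Diffusion-DPO preference record.
# Field names are invented for clarity and do not match any specific library.

preference_example = {
    "prompt": "a portrait of a nurse at work",
    "desired_image": "nurse_faithful_depiction.png",       # ideal depiction of the prompt
    "undesired_image": "nurse_stereotyped_depiction.png",  # identical except for the risk to unlearn
}

# A curated dataset repeats this for hundreds or thousands of prompts per risk.
dataset = [preference_example]  # ...extended with many more pairs
```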

The next set of mitigations is applied after the model is finalized and deployed in the production stack. These cover all the mitigations applied to the user’s input prompt and to the final image output.

Prompt filtering

When users input a text prompt to generate an image, or upload an image to modify it using an inpainting technique, filters can be applied to block requests asking for harmful content explicitly. At this stage, we address issues where users explicitly provide harmful prompts like “show an image of a person killing another person” or upload an image and ask “remove this person’s clothing” and so on.

For detecting and blocking harmful requests, we can use a simple blocklist-based approach with keyword matching, and block all prompts that contain a matching harmful keyword (say “suicide”). However, this approach is brittle, and can produce a large number of false positives and false negatives. Any obfuscating mechanism (say, users querying for “suicid3” instead of “suicide”) will slip through with this approach. Instead, an embedding-based CNN filter can be used for harmful pattern recognition by converting the user prompts into embeddings that capture the semantic meaning of the text, and then using a classifier to detect harmful patterns within these embeddings. However, LLMs have proved to be better for harmful pattern recognition in prompts because they excel at understanding context, nuance, and intent in a way that simpler models like CNNs may struggle with. They provide a more context-aware filtering solution and can adapt to evolving language patterns, slang, obfuscation techniques and emerging harmful content more effectively than models trained on fixed embeddings. The LLMs can be trained to enforce any policy guideline defined by your organization. Aside from harmful content like sexual imagery, violence, self-injury etc., they can also be trained to identify and block requests to generate images of public figures or election misinformation. To use an LLM-based solution at production scale, you’d have to optimize for latency and incur the inference cost.
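To make the brittleness point concrete, here is a sketch of the two simpler filters. The blocklist terms, the `embed` function and the `classifier` are all placeholders, not a production system.

```python
# Two prompt filters, from brittle to somewhat more robust (illustrative sketch).

BLOCKLIST = {"suicide", "beheading"}   # example terms only

def blocklist_filter(prompt: str) -> bool:
    """Keyword matching: blocks exact terms, but simple obfuscation evades it."""
    text = prompt.lower()
    return any(term in text for term in BLOCKLIST)

def embedding_filter(prompt: str, embed, classifier, threshold: float = 0.8) -> bool:
    """`embed` maps text to a vector and `classifier` scores how harmful it is;
    both are placeholders for whatever models your stack provides."""
    return classifier(embed(prompt)) > threshold

print(blocklist_filter("show a suicide scene"))   # True
print(blocklist_filter("show a suicid3 scene"))   # False: obfuscation slips through
```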

Prompt manipulations

Before passing in the raw user prompt to model for image generation, there are several prompt manipulations that can be done for enhancing the safety of the prompt. Several case studies are presented below:

Prompt augmentation to reduce stereotypes: LDMs amplify dangerous and complex stereotypes [5]. A broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects. For example, prompting for basic traits or social roles results in images reinforcing whiteness as ideal, and prompting for occupations results in amplification of racial and gender disparities. Prompt engineering to add gender and racial diversity to the user prompt is an effective solution (a toy sketch follows below). For example, “image of a ceo” -> “image of a ceo, asian woman” or “image of a ceo, black man” to produce more diverse results. This can also help reduce harmful stereotypes by transforming prompts like “image of a criminal” -> “image of a criminal, olive-skin-tone”, since the original prompt would most likely have produced a black man.
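A toy version of such an augmentation step is sketched below; the attribute and subject lists are invented for illustration, and a production system would use policy-reviewed lists and far more careful matching.

```python
import random

# Sketch of prompt augmentation for demographic diversity (illustrative only).

DIVERSITY_ATTRIBUTES = ["asian woman", "black man", "hispanic woman", "older white man"]
AUGMENTABLE_SUBJECTS = {"ceo", "doctor", "engineer", "criminal"}

def augment_prompt(prompt: str) -> str:
    # Append a randomly chosen demographic attribute when the prompt names a
    # subject known to produce stereotyped results.
    if any(subject in prompt.lower() for subject in AUGMENTABLE_SUBJECTS):
        return f"{prompt}, {random.choice(DIVERSITY_ATTRIBUTES)}"
    return prompt

print(augment_prompt("image of a ceo"))   # e.g. "image of a ceo, asian woman"
```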

Prompt anonymization for privacy: Additional mitigation can be applied at this stage to anonymize or filter out content in prompts that asks for information about specific private individuals. For example, “Image of John Doe in the shower” -> “Image of a person in the shower”.

Prompt rewriting and grounding to convert harmful prompt to benign: Prompts can be rewritten or grounded (usually with a fine-tuned LLM) to reframe problematic scenarios in a positive or neutral way. For example, “Show a lazy [ethnic group] person taking a nap” -> “Show a person relaxing in the afternoon”. Defining a well-specified prompt, or commonly referred to as grounding the generation, enables models to adhere more closely to instructions when generating scenes, thereby mitigating certain latent and ungrounded biases. “Show two people having fun” (This could lead to inappropriate or risky interpretations) -> “Show two people dining at a restaurant”.

Output image classifiers

Image classifiers can be deployed to detect images produced by the model as harmful or not, and may block them before they are sent back to the users. Standalone image classifiers like this are effective for blocking images that are visibly harmful (showing graphic violence, sexual content, nudity, etc.). However, for inpainting-based applications where users will upload an input image (e.g., an image of a white person) and give a harmful prompt (“give them blackface”) to transform it in an unsafe manner, classifiers that only look at the output image in isolation will not be effective, as they lose context of the “transformation” itself. For such applications, multimodal classifiers that consider the input image, the prompt, and the output image together to decide whether a transformation of the input to the output is safe are very effective. Such classifiers can also be trained to identify “unintended transformations”, e.g., uploading an image of a woman and prompting to “make them beautiful” leading to an image of a thin, blonde white woman.

Regeneration instead of refusals

Instead of refusing to return an output image, models like DALL·E 3 use classifier guidance to improve unsolicited content. A bespoke algorithm based on classifier guidance is deployed, and its workings are described in [3]:

When an image output classifier detects a harmful image, the prompt is re-submitted to DALL·E 3 with a special flag set. This flag triggers the diffusion sampling process to use the harmful content classifier to sample away from images that might have triggered it.

Essentially, this algorithm can “nudge” the diffusion model towards more appropriate generations. This can be done at both the prompt level and the image classifier level.



Source link
