02Nov

Should you learn how to code in the next decade? | by Ivo Bernardo | Nov, 2024


Or will AI eat up all the software in the world?

Photo by steinart @unsplash.com

Many people today are facing a dilemma: if you’re young, should you pursue a software engineering degree? And if you’re already established in another career, should you make a switch to something involving coding? These questions stem from a larger one: with all the excitement around large language models (LLMs), is it really worth learning to code?

Recently, Google’s CEO stated that about 25% of the company’s new code is written by AI. Are we seeing the death of coding as we know it?

And these questions are not just asked by people entering the field. Many professionals whose jobs depend on coding are also asking them. Should they continue to invest a large portion of their lives in improving their coding abilities?

To me, the short answer is: coding will still be relevant — but maybe not for the reason you’re thinking of. It seems undeniable that coding-related jobs will change a lot in the next decade.

In this post, we’ll look at some predictions about the future of coding, along with some arguments in favor of learning a programming language. With this post, I hope to provide you with a fresh perspective on why…



Source link

31Oct

What’s Your Definition Of An AI Agent? | by Cobus Greyling | Oct, 2024


About 18 months ago I wrote my first article on AI Agents, based on the AI Agent frameworks created by LangChain. Fast forward to the last few weeks, and AI Agents are in the news the way RAG was a few months ago.

And this prompts a question: what defines an AI Agent, and what is required to deliver enterprise-ready agentic implementations?

The Basics

An AI Agent can be defined as a piece of software with one or more Language Models as its backbone.

For the agent to have visual capabilities, a Language Model or Foundation Model with vision support is required.

Task Decomposition

At this stage, an agent primarily takes a conversational input approach, hence unstructured data is used for user input.

And the response from the AI Agent is also most often in natural language, leveraging the Natural Language Generation (NLG) capabilities of Language Models.

I often use this example: you should be able to ask an AI Agent the following question:

What is the square root of the year of birth of the man commonly regarded as the father of the iPhone?

This is a very hard question for any traditional Conversational UI to answer, but for an AI Agent it is easy.

Way Of Work

The AI Agent starts off by decomposing this compound and slightly ambiguous question into sub-steps, and then sets off solving each of these sub-steps.

Each of these steps can be seen or considered as an action.

Agents leverage LLMs to make a decision on which Action to take next.

After an Action is completed, the Agent enters the Observation step.

From the Observation step, the AI Agent shares a Thought; if a final answer is not reached, the AI Agent cycles back to another Action in order to move closer to a Final Answer.

The level of autonomy of an AI Agent is determined by the number of iterations the AI Agent can go through. This is important from a cost, overhead, and latency perspective.
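As a rough sketch of this cycle (the llm callable, the two tools, and the transcript format below are hypothetical illustrations, not any particular framework’s API), the loop might look like this:

```python
# Minimal sketch of the Thought / Action / Observation cycle described above.
import math

def web_search(query: str) -> str:
    # Stub: a real agent would call a search API here.
    return "Steve Jobs, commonly regarded as the father of the iPhone, was born in 1955."

TOOLS = {
    "web_search": web_search,
    "calculator": lambda expr: str(eval(expr, {"math": math}, {})),  # e.g. "math.sqrt(1955)"
}

def run_agent(question: str, llm, max_iterations: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_iterations):              # autonomy is bounded by the iteration cap
        step = llm(transcript)                   # LLM produces the next Thought + Action
        if step["type"] == "final_answer":
            return step["content"]               # Final Answer reached, stop cycling
        observation = TOOLS[step["tool"]](step["input"])  # execute the Action
        transcript += f"Thought: {step['thought']}\nObservation: {observation}\n"
    return "No final answer reached within the iteration budget."
```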

Secondly, if the AI Agent is unable to reach a conclusion or solve a task, one of the tools (we’ll look at tools in a bit) at its disposal can be a human who can be pinged for guidance.

The number of tools at the disposal of the AI Agent is another determining factor in terms of its autonomy.

Tools can be considered integration points or touchpoints to external systems or APIs. The number and nature of the tools at the AI Agent’s disposal really determine what it is capable of.

Tools are described in natural language and can range from a web search API, OS GUI navigation, a maths library, or a weather API to a CRM integration.

As the AI Agent decomposes a problem into sub-steps or actions, solving for each of these actions or steps will most probably involve the use of a tool.
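For illustration, a natural-language tool description might look like the sketch below; the schema loosely follows common function-calling formats, and the weather tool itself is a hypothetical example:

```python
# A tool described in natural language, roughly in the shape used by common
# function-calling APIs. Names and fields are illustrative.
weather_tool = {
    "name": "get_weather",
    "description": "Returns the current weather for a city. Use this tool "
                   "whenever the user asks about weather conditions.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Berlin'"},
        },
        "required": ["city"],
    },
}
```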

A high level of observability can be achieved with regard to the internal workings of AI Agents: one can watch the AI Agent step through Thought, Action, Observation, and so on.



Source link

30Oct

Make Every Application An AI Agent | by Cobus Greyling | Oct, 2024


Multimodal large language models (MLLMs) have revolutionized LLM-based agents by enabling them to interact directly with application user interfaces (UIs).

This capability extends the model’s scope from text-based responses to visually understanding and responding within a UI, significantly enhancing performance in complex tasks.

Now, LLMs can interpret and respond to images, buttons, and text inputs in applications, making them more adept at navigation and user assistance in real-time workflows.

This interaction optimises the agent’s ability to handle dynamic and multi-step processes that require both visual and contextual awareness, offering more robust solutions across industries like customer support, data management and task automation.

AI Agents often suffer from high latency and low reliability due to the extensive sequential UI interactions involved.

AXIS: Agent eXploring API for Skill integration

Conventional AI Agents often interact with a graphical user interface (GUI) in a human-like manner, interpreting screen layouts, elements, and sequences as a person would.

These LLM-based agents, which are typically fine-tuned with visual language models, aim to enable efficient navigation in mobile and desktop tasks.

However, AXIS presents a new perspective: while human-like UI-based interactions help make these agents versatile, they can be time-intensive, especially for tasks that involve numerous, repeated steps across a UI.

This complexity arises because traditional UIs are inherently designed for human-computer interaction (HCI), not agent-based automation.

AXIS suggests that leveraging application APIs, rather than interacting with the GUI itself, offers a far more efficient solution.

For instance, where a traditional UI agent might change multiple document titles by navigating through UI steps for each title individually, an API could handle all titles simultaneously with a single call, streamlining the process.
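A sketch of that contrast follows; the gui and docs_api objects are hypothetical stand-ins, not AXIS’s actual interfaces:

```python
# Renaming N documents through the GUI costs N interaction sequences;
# an API can batch the same change into a single call.

def rename_via_gui(gui, titles: dict) -> None:
    # UI-style agent: one navigate/click/type/save sequence per document.
    for doc_id, new_title in titles.items():
        gui.open_document(doc_id)
        gui.click("title_field")
        gui.type_text(new_title)
        gui.click("save_button")

def rename_via_api(docs_api, titles: dict) -> None:
    # API-style agent: one batched call handles all documents at once.
    docs_api.batch_update(
        [{"id": d, "title": t} for d, t in titles.items()]
    )
```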

AXIS aims to not only reduce redundant interactions and simplify complex tasks but also establish new design principles for UIs in the LLM era. This approach advocates for rethinking application design to prioritize seamless integration between AI agents and application functionalities, enabling a more direct, API-driven approach that complements both user and agent workflows.

In exploration mode, the AI Agent autonomously interacts with the application’s interface to explore different functions and possible actions it can perform.

The agent records these interactions, gathering data on how various parts of the UI respond to different actions.

This exploration helps the agent map out the application’s capabilities, essentially “learning” what’s possible within the app.
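A rough sketch of what that exploration loop could record (the app wrapper and all of its methods are purely illustrative, not the paper’s implementation):

```python
def explore(app) -> dict:
    # Try each discoverable action, observe the outcome, and build a map of
    # what the application can do.
    capability_map = {}
    for element in app.enumerate_ui_elements():   # discover interactive elements
        outcome = app.try_action(element)         # perform and observe the action
        capability_map[element.id] = {
            "action": element.action_type,
            "effect": outcome.summary,            # how the app state changed
        }
    return capability_map
```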



Source link

29Oct

Contrasting RPA, Chatbots & AI Agents | by Cobus Greyling | Oct, 2024


A whole host of application types are now integrating agentic capabilities, allowing software to act with a degree of autonomy.

These agentic systems don’t just follow preset rules but can make real-time adjustments, interpreting complex input and taking actions that best align with the given task.

Large tech companies, including Microsoft, Salesforce, IBM and others are racing to introduce agent functionalities, aiming to offer solutions that respond dynamically and provide greater operational flexibility.

Beyond standalone AI Agent solutions, we also see existing automation platforms infusing agentic capabilities to enhance their adaptability and broaden their utility.

At its core, an AI Agent is a piece of software supported by language models, typically large language models (LLMs), which allow it to handle complex queries and tasks.

Unlike traditional automation tools, an AI Agent can decompose a problem into a sequence of steps and handle each step individually.

Through an iterative process of Thought, Action, Observation, and so on, the agent moves towards a solution while adjusting its actions based on immediate feedback.

AI Agents also leverage tools that allow them to interact with various systems, from APIs to web searches, depending on the task requirements.

The scope and diversity of these tools determine the “power” or effectiveness of the agent, allowing it to respond intelligently to diverse queries and execute complex workflows.

Robotic Process Automation (RPA)

Advantages: RPA is great for handling repetitive, rule-based tasks like data entry and processing in HR or finance. By removing manual effort, it speeds up workflows, reduces errors and increases efficiency.

Challenges: RPA is less flexible when workflows need dynamic decision-making or frequent updates. Once set up, RPA flows don’t adjust well: updates need manual reconfiguration, which can limit their applicability in rapidly changing environments.

Chatbot Flows

Advantages: Chatbots offer a structured approach to common customer queries, guiding users through predefined paths that are easy to set up and effective for FAQs or appointment scheduling.

Challenges: The rigidity of chatbot flows can be frustrating for users with more complex or unique needs. As they’re confined to pre-scripted responses, they’re often limited in how they handle unexpected inputs or intricate problems.

AI Agents

Advantages: AI agents introduce a new level of adaptability and autonomy, making them ideal for tasks requiring a deeper understanding or handling unexpected inputs.

With the ability to create and adjust flows in real-time, they offer personalised responses and greater flexibility, making them suited to multi-step processes and complex troubleshooting.

Challenges: The complexity of AI agents can also be their downside. They typically require more resources to manage, and their access to multiple tools and integrations can make oversight challenging.
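The mechanical difference is easy to see in a sketch: a chatbot flow is an authored lookup table, while an agent decides its next step at run time. All names below are illustrative:

```python
# Rule-based chatbot flow: every path is authored up front.
FLOW = {
    "start": {"prompt": "Do you want to book or cancel?",
              "book": "ask_date", "cancel": "confirm_cancel"},
    "ask_date": {"prompt": "Which date would you like?"},
    "confirm_cancel": {"prompt": "Your booking has been cancelled."},
}

def chatbot_step(state: str, user_input: str) -> str:
    # Unexpected input has nowhere to go except back to the start.
    return FLOW[state].get(user_input, "start")

# Agentic handling: an LLM chooses the next action instead of a lookup table.
def agent_step(llm, tools: dict, history: str):
    decision = llm(history)                       # Thought + chosen Action
    return tools[decision["tool"]](decision["input"])
```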

Agentic activity in AI-driven applications is advancing rapidly, with three primary streams emerging:

Native AI Agent Frameworks

Native AI Agents represent the purest form of agent technology, where systems are designed from the ground up to operate independently, leveraging large language models and specialised architectures to take action without constant human guidance.

These frameworks are inherently agentic, built with the capability to interact across multiple platforms, autonomously execute tasks, and make decisions based on real-time data. OpenAI’s GPT-4 with tools, Anthropic’s AI agent offerings, Kore.ai’s GALE and frameworks like LangChain exemplify this category by focusing on robust, complex chains of actions that adapt dynamically to user needs and environmental cues.

Enhanced Chatbot and RPA Systems with Agentic Capabilities

Traditional automation technologies, such as chatbots and robotic process automation (RPA), are increasingly incorporating agentic features.

Initially designed for rule-based, repetitive tasks (RPA) or structured conversational flows (chatbots), these systems are now adding layers of dynamic interaction that enable more flexible responses.

This evolution expands the scope of both RPA and chatbot frameworks to handle more complex, less predictable workflows.

General Applications Integrating Agentic Discovery and Interaction

In addition to purpose-built AI Agents and enhanced automation tools, general-purpose applications are beginning to integrate agentic discovery and interaction functionalities.

Consider the work Microsoft is doing to introduce agentic capabilities to Windows, and Apple’s Ferret-UI research for iOS.

In these streams, agentic functionality provides a spectrum of autonomy and complexity, allowing businesses to choose the right level of intelligent assistance for their needs.




Source link

28Oct

A Guide To Linearity and Nonlinearity in Machine Learning | by Manuel Brenner | Oct, 2024


…and their roles in decision boundaries, embeddings, dynamical systems, and next-gen LLMs

“An eye for an eye, a tooth for a tooth.”
Lex Talionis, Codex Hammurabi

The famed Lex Talionis is a law of proportionality. You take my eye, I take yours. You take my tooth, I take yours (being a Babylonian dentist must have been tough).

The law was not put in place to foster violence; rather, it aimed to restrict it. The Lex Talionis envisioned a legal world where everything could be described by linear equations: every crime would create an output proportional to its input. And since the punishment for an offense was proportional to the crime, it avoided excessive retribution and explosions of violence that destroyed everything in their wake.

Beyond the world of retribution, linearity plays an important role in our thinking about the world: in linear systems, everything is understood. There is no chaos, no complicated maths. All scientists would have to do all day is solve these kinds of equations:
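A canonical example is a linear dynamical system, whose entire future is known in closed form through the matrix exponential:

$$\dot{\mathbf{x}}(t) = A\,\mathbf{x}(t) \quad\Longrightarrow\quad \mathbf{x}(t) = e^{At}\,\mathbf{x}(0)$$

Double the initial condition and the whole trajectory doubles; nothing disproportionate can ever happen.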

For every action, there is an equal and opposite reaction.
Newton’s Third Law of Motion

Unfortunately, the reality we inhabit is far from this linear utopia. History is rife with examples of the world responding to small things in highly disproportionate ways: the Defenestration of Prague sparking the Thirty…



Source link

27Oct

How to Negotiate Your Salary as a Data Scientist | by Haden Pelletier | Oct, 2024


And how much I made my first year

Photo by Amy Hirschi on Unsplash

Congratulations, you have landed a data science position!

You open your offer letter and …

Well, you’re a bit disappointed.

This is completely normal, at least at most companies, and especially if you are a junior or just starting out in the field. The data science dream you’re sold (at least in the US) includes a six-figure salary straight out of college with no experience, and oftentimes the reality can be quite different.

What determines a salary

Salaries for any position are mainly determined by a few factors, some of which are out of your control:

  • The company itself
  • Geographical region (country, state, city)
  • Your level of experience
  • Your education level (Bachelor’s, Master’s, PhD)
  • Current market conditions

Other things that might factor into your salary (although probably on a smaller level) are certain skills or certifications that you possess. Depending on how relevant these skills are to the position, they could give you the upper hand when it comes to negotiations.



Source link

26Oct

Gen-AI Safety Landscape: A Guide to the Mitigation Stack for Text-to-Image Models | by Trupti Bavalatti | Oct, 2024


There is also a large area of risk, documented in [4], where marginalized groups are associated with harmful connotations, reinforcing hateful societal stereotypes. For example, representations of demographic groups that conflate humans with animals or mythological creatures (such as Black people as monkeys or other primates), conflate humans with food or objects (like associating people with disabilities with vegetables), or associate demographic groups with negative semantic concepts (such as terrorism with Muslim people).

Problematic associations like these between groups of people and concepts reflect long-standing negative narratives about the group. If a generative AI model learns problematic associations from existing data, it may reproduce them in the content that it generates [4].

Problematic Associations of marginalized groups and concepts. Image source

There are several ways to fine-tune LLMs. According to [6], one common approach is called Supervised Fine-Tuning (SFT). This involves taking a pre-trained model and further training it with a dataset that includes pairs of inputs and desired outputs. The model adjusts its parameters by learning to better match these expected responses.

Typically, fine-tuning involves two phases: SFT to establish a base model, followed by RLHF for enhanced performance. SFT involves imitating high-quality demonstration data, while RLHF refines LLMs through preference feedback.

RLHF can be done with reward-based or reward-free methods. In the reward-based method, we first train a reward model using preference data; this model then guides online reinforcement learning algorithms like PPO. Reward-free methods are simpler, directly training the models on preference or ranking data to understand what humans prefer. Among these reward-free methods, DPO has demonstrated strong performance and become popular in the community. Diffusion-DPO can be used to steer the model away from problematic depictions towards more desirable alternatives. The tricky part of this process is not the training itself, but the data curation. For each risk, we need a collection of hundreds or thousands of prompts, and for each prompt, a desirable and undesirable image pair. The desirable example should ideally be a perfect depiction for that prompt, and the undesirable example should be identical to the desirable image, except that it includes the risk we want to unlearn.
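As a sketch of what one curated record might look like (field names and paths below are illustrative, not a specific library’s format):

```python
# One preference record for Diffusion-DPO-style training: a prompt plus a
# desirable image and an otherwise-identical undesirable image containing
# the behaviour to unlearn.
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    desirable_image: str     # ideally a perfect depiction of the prompt
    undesirable_image: str   # identical except for the risk being unlearned

dataset = [
    PreferencePair(
        prompt="portrait of a ceo",
        desirable_image="pairs/ceo_0_preferred.png",
        undesirable_image="pairs/ceo_0_rejected.png",
    ),
    # ...hundreds to thousands of such pairs per risk category
]
```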

These mitigations are applied after the model is finalized and deployed in the production stack. They cover everything applied to the user input prompt and the final image output.

Prompt filtering

When users input a text prompt to generate an image, or upload an image to modify it using the inpainting technique, filters can be applied to block requests that explicitly ask for harmful content. At this stage, we address issues where users explicitly provide harmful prompts like “show an image of a person killing another person” or upload an image and ask to “remove this person’s clothing”, and so on.

For detecting and blocking harmful requests, we can use a simple blocklist-based approach with keyword matching, blocking all prompts that contain a matching harmful keyword (say, “suicide”). However, this approach is brittle and can produce a large number of false positives and false negatives. Any obfuscation mechanism (say, users querying for “suicid3” instead of “suicide”) will slip through. Instead, an embedding-based CNN filter can be used for harmful pattern recognition by converting user prompts into embeddings that capture the semantic meaning of the text, and then using a classifier to detect harmful patterns within these embeddings. However, LLMs have proven to be better at harmful pattern recognition in prompts because they excel at understanding context, nuance, and intent in a way that simpler models like CNNs may struggle with. They provide a more context-aware filtering solution and can adapt to evolving language patterns, slang, obfuscation techniques, and emerging harmful content more effectively than models trained on fixed embeddings. An LLM can be trained to enforce whatever policy guidelines your organization defines. Aside from harmful content like sexual imagery, violence, self-injury, etc., it can also be trained to identify and block requests to generate images of public figures or election misinformation. To use an LLM-based solution at production scale, you’d have to optimize for latency and accept the inference cost.
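A minimal sketch of the first two approaches follows; the embed and classifier callables stand in for a text-embedding model and a trained harmful-pattern classifier:

```python
BLOCKLIST = {"suicide", "self-harm"}

def blocklist_filter(prompt: str) -> bool:
    # Naive keyword matching: "suicid3" slips straight through this check.
    words = set(prompt.lower().split())
    return bool(words & BLOCKLIST)

def embedding_filter(prompt: str, embed, classifier) -> bool:
    # Embedding-based check: capture the semantic meaning of the prompt,
    # then score it with a classifier trained on harmful patterns.
    vector = embed(prompt)
    return classifier(vector) > 0.5   # probability the prompt is harmful
```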

Prompt manipulations

Before passing the raw user prompt to the model for image generation, several prompt manipulations can be applied to enhance the prompt’s safety. Several case studies are presented below:

Prompt augmentation to reduce stereotypes: LDMs amplify dangerous and complex stereotypes [5]. A broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects. For example, prompting for basic traits or social roles results in images reinforcing whiteness as ideal, and prompting for occupations results in the amplification of racial and gender disparities. Prompt engineering to add gender and racial diversity to the user prompt is an effective solution: for example, “image of a ceo” -> “image of a ceo, asian woman” or “image of a ceo, black man” to produce more diverse results. This can also help reduce harmful stereotypes by transforming prompts like “image of a criminal” -> “image of a criminal, olive-skin-tone”, since the original prompt would most likely have produced a black man.

Prompt anonymization for privacy: Additional mitigation can be applied at this stage to anonymize or filter out content in prompts that asks for a specific private individual’s information. For example, “Image of John Doe in shower” -> “Image of a person in shower”.

Prompt rewriting and grounding to convert harmful prompts to benign ones: Prompts can be rewritten or grounded (usually with a fine-tuned LLM) to reframe problematic scenarios in a positive or neutral way. For example, “Show a lazy [ethnic group] person taking a nap” -> “Show a person relaxing in the afternoon”. Defining a well-specified prompt, commonly referred to as grounding the generation, enables models to adhere more closely to instructions when generating scenes, thereby mitigating certain latent and ungrounded biases: “Show two people having fun” (which could lead to inappropriate or risky interpretations) -> “Show two people dining at a restaurant”.
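Chained together, these manipulations might form a pipeline like the sketch below; the rewrite_llm callable and all helper names are hypothetical:

```python
import random
import re

DIVERSITY_HINTS = ["asian woman", "black man", "olive-skin-tone person"]

def anonymize(prompt: str, known_names: set) -> str:
    # Privacy: replace named private individuals with a generic descriptor.
    for name in known_names:
        prompt = re.sub(re.escape(name), "a person", prompt, flags=re.IGNORECASE)
    return prompt

def augment_for_diversity(prompt: str) -> str:
    # Stereotype reduction: append a sampled demographic descriptor.
    # (A real system would apply this selectively, not to every prompt.)
    return f"{prompt}, {random.choice(DIVERSITY_HINTS)}"

def safe_prompt(user_prompt: str, rewrite_llm, known_names: set) -> str:
    prompt = anonymize(user_prompt, known_names)
    # Rewriting / grounding: reframe problematic or ambiguous scenes neutrally.
    prompt = rewrite_llm(f"Rewrite this image prompt neutrally and make the "
                         f"scene well-specified: {prompt}")
    return augment_for_diversity(prompt)
```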

Output image classifiers

Image classifiers can be deployed to detect whether images produced by the model are harmful, and may block them before they are sent back to users. Standalone image classifiers like this are effective for blocking images that are visibly harmful (showing graphic violence, sexual content, nudity, etc.). However, for inpainting-based applications where users upload an input image (e.g., an image of a white person) and give a harmful prompt (“give them blackface”) to transform it in an unsafe manner, classifiers that only look at the output image in isolation will not be effective, as they lose the context of the “transformation” itself. For such applications, multimodal classifiers that consider the input image, the prompt, and the output image together to decide whether a transformation of the input to the output is safe are very effective. Such classifiers can also be trained to identify “unintended transformations”, e.g., uploading an image of a woman and prompting to “make them beautiful” leading to an image of a thin, blonde, white woman.

Regeneration instead of refusals

Instead of refusing to return an image, models like DALL·E 3 use classifier guidance to improve unsolicited content. A bespoke algorithm based on classifier guidance is deployed; its workings are described in [3]:

When an image output classifier detects a harmful image, the prompt is re-submitted to DALL·E 3 with a special flag set. This flag triggers the diffusion sampling process to use the harmful content classifier to sample away from images that might have triggered it.

Essentially, this algorithm can “nudge” the diffusion model towards more appropriate generations. This can be done at both the prompt level and the image-classifier level.
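At a very high level, the flow reads like the following sketch; the sampler interface here is invented for illustration, and the actual mechanism is internal to DALL·E 3:

```python
def generate_safely(prompt: str, sampler, harm_classifier):
    image = sampler(prompt)
    if harm_classifier(image) > 0.5:
        # Re-submit with the special flag set: sampling is now guided by the
        # harmful-content classifier to move away from images that trigger it.
        image = sampler(prompt, guide_away_from=harm_classifier)
    return image
```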



Source link

26Oct

Meet Git Stash: Your Secret Chest of Unfinished Code | by Zolzaya Luvsandorj | Oct, 2024


Mastering Git

A powerful Git feature for temporarily saving code in progress

Imagine discovering a critical bug that needs an immediate fix while you are halfway through a code change. Your attempt to switch branches fails because of uncommitted changes in your current feature branch. These changes aren’t ready for a commit but are too valuable to discard. In such a situation, where you need to switch contexts quickly, git stash offers an elegant solution: it temporarily stores your unfinished code safely, without committing. In this post, we will explore how to use git stash effectively.
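For the scenario above, a minimal stash workflow looks like this (branch names are illustrative):

```bash
git stash push -m "wip: half-finished feature"   # shelve uncommitted changes
git switch main                                  # the branch switch now succeeds
# ...fix the critical bug, commit, push...
git switch my-feature
git stash pop                                    # reapply the work and drop the stash entry
```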

Imagine we are writing a letter with pen and paper, but suddenly we have to write another, more urgent letter and send it off. Our desk can hold only one letter. It would be too wasteful to throw away our unfinished letter, since it took us some time to write what’s written so far. Instead of throwing it away, we can put it in a secure chest so that we can pick it up and continue once we finish the more time-sensitive letter. This allows us to get straight on with writing the more urgent letter and sending it quickly, while saving our work on the other letter. In this analogy, the halfway-written letter is the uncommitted…



Source link

25Oct

AI Agent Computer Interface (ACI) | by Cobus Greyling | Oct, 2024


After reading Anthropic’s blog on Claude’s ability to use software like a human, I found the implications of this advancement really exciting.

Claude’s capacity to navigate graphical user interfaces (GUIs) and perform tasks traditionally done by humans marks a big leap in AI’s practical utility.

What stood out to me was the emphasis on safety, particularly how Anthropic addresses risks like prompt injection attacks, ensuring more reliable and secure AI.

I also appreciate the focus on improving speed and accuracy, which will be critical for making AI more effective in dynamic environments.

This development opens the door to more seamless human-AI collaboration, especially in complex tasks that require precision.

The blog also touched on how Claude’s evolving interaction capabilities will be instrumental in transforming the way AI agents work with software.

I think this step forward could significantly impact fields like automation, making AI not just a tool but an active, reliable agent in everyday tasks.

The AI Agent implementation described in the GitHub repository demonstrates how to enable an AI model to interact with software applications effectively.

It showcases a computer use demo that allows the AI to perform tasks like browsing the web and executing commands, highlighting a shift from merely responding to inquiries to actively completing tasks.

This approach aims to improve human-computer interaction by making AI agents more capable and responsive in various environments.
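For reference, invoking the computer-use tool looked roughly like the sketch below at the time of the beta; the model name, tool type, and beta flag are taken from the October 2024 documentation and may since have changed:

```python
# Rough sketch of a computer-use request against the October 2024 beta.
# Treat the model name, tool type, and beta flag as point-in-time assumptions.
import anthropic

client = anthropic.Anthropic()
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    tools=[{
        "type": "computer_20241022",    # lets Claude view the screen and act on it
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open the browser and search for 'ACI'."}],
    betas=["computer-use-2024-10-22"],
)
print(response.content)
```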




Source link

24Oct

The Advent Of Open Agentic Frameworks & Agent Computer Interfaces (ACI) | by Cobus Greyling | Oct, 2024


Agent S In A Nutshell

Agent S addresses the following challenges in creating an agentic framework:

Domain Knowledge & Open-World Learning

  • Agents must handle a wide variety of constantly changing applications and websites.
  • They need specialised, up-to-date domain knowledge.
  • The ability to continuously learn from open-world experiences is essential.

Complex Multi-Step Planning

  • Desktop tasks often involve long sequences of interdependent actions.
  • Agents need to generate plans with clear subgoals and track task progress over long horizons.
  • This requires an understanding of task dependencies and proper execution sequencing.

Navigating Dynamic, Non-Uniform Interfaces

  • Agents must process large volumes of visual and textual data while operating in a vast action space.
  • They need to distinguish between relevant and irrelevant elements and respond accurately to visual feedback.
  • GUI agents must interpret graphical cues correctly and adapt to dynamic interface changes.

To address the challenge of solving long-horizon, complex desktop tasks, Agent S introduces Experience-Augmented Hierarchical Planning. This method enhances the agent’s ability to leverage domain knowledge and plan more effectively, augmenting its performance on tasks that span multiple steps and involve intermediate goals.

MLLM Agents

Multimodal Large Language Models (MLLMs) serve as the core reasoning framework for MLLM Agents, enabling them to process both language and visual information.

These agents combine various components such as memory, structured planning, tool usage, and the ability to act in external environments.

MLLM Agents are applied in domains like simulation environments, video games, and scientific research. They are also increasingly used in fields like Software Engineering, where Agent-Computer Interfaces (ACI) enhance their ability to understand and act efficiently within complex systems.

This area of Agent-Computer Interfaces fascinates me the most.

GUI Agents

GUI Agents execute natural language instructions across both web and operating system environments.

Initially focused on web navigation tasks, their scope has expanded to operating systems, enabling them to handle OS-level tasks in benchmarks like OSWorld and WindowsAgentArena.

These agents are designed to navigate and control dynamic graphical interfaces, using methodologies such as behavioural cloning, in-context learning, and reinforcement learning.

Advanced features such as experience-augmented hierarchical planning enhance their performance in managing complex desktop tasks.

Retrieval-Augmented Generation (RAG) for AI Agents

RAG improves the reliability of MLLM agents by integrating external knowledge to enrich the input data, resulting in more accurate outputs.

MLLM agents benefit from retrieving task exemplars, state-aware guidelines, and historical experiences.

In the Agent S framework, experience augmentation takes three forms:

  • Hierarchical planning uses both full-task and subtask experience.
  • Full-task summaries serve as textual rewards for subtasks.
  • Subtask experience is evaluated and stored for future reference.

This ensures that the agent can effectively learn and adapt over time.
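A sketch of how this retrieval could plug into planning follows; the memory store, web_search, and planner interfaces are illustrative, not Agent S’s actual code:

```python
def plan_with_experience(task: str, memory, web_search, planner_llm):
    past = memory.retrieve(task, k=3)    # similar full-task and subtask summaries
    knowledge = web_search(task)         # up-to-date, open-world domain knowledge
    return planner_llm(task=task, context=past + [knowledge])  # sub-goal plan

def record_outcome(memory, subtask: str, trajectory, success: bool):
    # Evaluated subtask experience is stored for future reference.
    memory.store(subtask, trajectory, reward=1.0 if success else 0.0)
```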

In the Agent S framework, given a task, there is an initial environment observation. The Agent S Manager then performs experience-augmented planning, leveraging web knowledge and narrative memory to create sub-tasks.



Source link
