06May

PLAN-SEQ-LEARN: A Machine Learning Method that Integrates the Long-Horizon Reasoning Capabilities of Language Models with the Dexterity of Learned Reinforcement Learning (RL) Policies


The robotics research field has been significantly transformed by the integration of large language models (LLMs). These advancements have presented an opportunity to guide robotic systems in solving complex tasks that involve intricate planning and long-horizon manipulation. While robots have traditionally relied on predefined skills and specialized engineering, recent developments show potential in using LLMs to help guide reinforcement learning (RL) policies, bridging the gap between abstract high-level planning and detailed robotic control. The challenge remains in translating these models’ sophisticated language processing capabilities into actionable control strategies, especially in dynamic environments involving complex interactions.

Robotic manipulation tasks often require executing a series of finely tuned behaviors, and current robotic systems struggle with the long-horizon planning needed for these tasks due to limitations in low-level control and interaction, particularly in dynamic or contact-rich environments. Existing tools, such as end-to-end RL or hierarchical methods, attempt to address the gap between LLMs and robotic control but often suffer from limited adaptability or significant challenges in handling contact-rich tasks. The primary problem is efficiently translating the abstract plans produced by language models into practical robotic control, which has traditionally been limited by LLMs’ inability to generate low-level control commands.

The Plan-Seq-Learn (PSL) framework by researchers from Carnegie Mellon University and Mistral AI is introduced as a modular solution to address this gap, integrating LLM-based planning for guiding RL policies in solving long-horizon robotic tasks. PSL decomposes tasks into three stages: high-level language planning (Plan), motion planning (Seq), and RL-based learning (Learn). This allows PSL to handle both contact-free motion and complex interaction strategies. The PSL system leverages off-the-shelf vision models to identify the target regions of interest based on high-level language input, providing a structured plan for sequencing the robot’s actions through motion planning.

PSL uses an LLM to generate a high-level plan that sequences robot actions through motion planning. Vision models help predict regions of interest, allowing the sequencing module to identify target states for the robot to achieve. The motion planning component drives the robot to these states, and the RL policy takes over to perform the required interactions. This modular approach allows RL policies to refine and adapt control strategies based on real-time feedback, enabling a robotic system to navigate complex tasks. The research team demonstrated PSL across more than 25 complex robotics tasks, including contact-rich manipulation and long-horizon control tasks involving up to 10 sequential stages.
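To make the division of labor concrete, below is a minimal, self-contained sketch of how a Plan-Seq-Learn-style control loop could be organized. Every component is a stand-in stub: the planner, vision module, motion planner, and shared RL policy are hypothetical placeholders, not the authors' code or interfaces.

    # A minimal, self-contained sketch of a Plan-Seq-Learn-style control loop.
    # Every component below is a stand-in stub; the planner, vision module,
    # motion planner, and RL policy are hypothetical, not the authors' code.
    from dataclasses import dataclass

    @dataclass
    class Stage:
        description: str      # e.g. "grasp the drawer handle"
        target_object: str    # object/region the vision model should localize

    def llm_plan(task_description: str) -> list[Stage]:
        """Plan: an LLM would decompose the task into ordered stages here."""
        return [Stage("reach the handle", "handle"), Stage("pull the drawer open", "drawer")]

    def estimate_target_pose(observation, target_object: str):
        """Vision stub: predict a region of interest for the current stage."""
        return (0.5, 0.0, 0.3)                       # placeholder (x, y, z) target

    def motion_plan_to(observation, target_pose):
        """Seq: a motion planner would move the arm contact-free to the target."""
        return {"ee_pose": target_pose}              # observation after the motion

    class SharedRLPolicy:
        """Learn: one policy shared across all stages handles contact-rich interaction."""
        def act(self, observation, stage: Stage):
            return [0.0] * 7                         # placeholder 7-DoF action

    def run_episode(task_description: str, policy: SharedRLPolicy, steps_per_stage: int = 5):
        observation = {"ee_pose": (0.0, 0.0, 0.0)}
        for stage in llm_plan(task_description):                          # Plan
            pose = estimate_target_pose(observation, stage.target_object)
            observation = motion_plan_to(observation, pose)               # Seq
            for _ in range(steps_per_stage):                              # Learn
                action = policy.act(observation, stage)
                # a real environment step (and reward for RL training) would go here
        return observation

    run_episode("open the drawer", SharedRLPolicy())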

PSL achieved a success rate above 85%, significantly outperforming existing methods like SayCan and MoPA-RL. This was particularly evident in contact-rich tasks, where PSL’s modular approach enabled robots to adapt to unexpected conditions in real time, efficiently solving the complex interactions required. The flexibility of the PSL framework allows for a modular combination of planning, motion, and learning, enabling it to handle different types of tasks from a wide range of robotics benchmarks. By sharing RL policies across all stages of a task, PSL achieved remarkable efficiency in training speed and task performance, outstripping methods like end-to-end RL (E2E) and RAPS.

In conclusion, the research team demonstrated the effectiveness of PSL in leveraging LLMs for high-level planning, sequencing motions using vision models, and refining control strategies through RL. PSL achieves a delicate balance of efficiency and precision in translating abstract language goals into practical robotic control. Modular planning and real-time learning make PSL a promising framework for future robotics applications, enabling robots to navigate complex tasks involving multi-step plans.


Check out the Paper and Project. All credit for this research goes to the researchers of this project.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.







05May

Top Courses for Machine Learning with Python


In recent years, the demand for AI and Machine Learning has surged, making ML expertise increasingly vital for job seekers. Additionally, Python has emerged as the primary language for various ML tasks. This article outlines the top ML courses in Python, offering readers the opportunity to enhance their skill set, transition careers, and meet the expectations of recruiters.

Machine Learning with Python

This course covers the fundamentals of machine learning algorithms and when to use each of them. It teaches how to write Python code to implement techniques like K-Nearest Neighbors (KNN), decision trees, regression trees, etc., and how to evaluate them.

Machine Learning Specialization

“Machine Learning Specialization” teaches the core concepts of machine learning and how to build real-world AI applications with them. The course covers numerous supervised and unsupervised learning algorithms and also teaches how to build neural networks using TensorFlow.

Applied Machine Learning in Python

This course offers practical training in applied machine learning, emphasizing techniques over statistical theory. It covers topics such as clustering, predictive modeling, and advanced methods like ensemble learning using the scikit-learn toolkit. 

IBM Machine Learning Professional Certificate

This program by IBM offers comprehensive training in Machine Learning and Deep Learning, covering key algorithms and practices like ensemble learning, survival analysis, K-means clustering, DBSCAN, dimensionality reduction, etc. Participants also gain hands-on experience with open-source frameworks and libraries like TensorFlow and Scikit-learn.

Machine Learning Scientist with Python

“Machine Learning Scientist with Python” helps build the Python skills required for supervised, unsupervised, and deep learning. It covers topics like image processing, cluster analysis, and gradient boosting, along with popular libraries like scikit-learn, Spark, and Keras.

Introduction to Machine Learning

“Introduction to Machine Learning” covers concepts like logistic regression, multilayer perceptrons, convolutional neural networks, natural language processing, etc., and demonstrates their application in various real-world applications. The course also teaches how to implement these models using Python libraries like PyTorch.

Machine Learning with Python: From Linear Models to Deep Learning

This course teaches the fundamentals of machine learning, covering classification, regression, clustering, and reinforcement learning. Students learn to implement and analyze models like linear models, kernel machines, neural networks, and graphical models. They also gain skills in selecting appropriate models for different tasks and effectively managing machine learning projects.

Machine Learning and AI with Python

This course delves into advanced data science concepts using sample datasets, decision trees, random forests, and various machine learning models. It teaches students to train models for predictive analysis, interpret results, identify data biases, and prevent underfitting or overfitting.

Deep Learning Specialization

This course equips learners with the knowledge and skills to understand, develop, and apply deep neural networks in various fields. Through practical projects and industry insights, participants master architectures like CNNs, RNNs, LSTMs, and Transformers using Python and TensorFlow and learn to tackle real-world AI tasks such as speech recognition, natural language processing, and image recognition.

Introduction to Machine Learning with TensorFlow

This course introduces machine learning concepts and demonstrates how to use different algorithms to solve real-world problems. It then moves on to explain the workings of neural networks and how to use the TensorFlow library to build an image classifier.

Introduction to Machine Learning with PyTorch

This course is similar to the previous one, “Introduction to Machine Learning with TensorFlow.” Instead of the TensorFlow library, it covers another Python library widely used in Deep Learning: PyTorch.

Foundations of Data Science: K-Means Clustering in Python

This course provides a foundational understanding of Data Science, emphasizing essential mathematics, statistics, and programming skills crucial for data analysis. Through practical exercises and a data clustering project, participants gain proficiency in core concepts, preparing them for more advanced Data Science courses and real-world applications across various sectors like finance, retail, and medicine.


We make a small profit from purchases made via referral/affiliate links attached to each course mentioned in the above list.

If you want to suggest any course that we missed from this list, then please email us at as**@ma**********.com












Shobha is a data analyst with a proven track record of developing innovative machine-learning solutions that drive business value.





04May

Meet Multilogin: The Anti-Detect Browser for Web Scraping and Multi-Accounting


Facing constant frustration with slow and error-prone manual processes, many users struggle to bypass platform detections, especially when security concerns loom large over profile storage and access. Add to this the frustration of downtime, sluggish support, and the challenge of navigating security during multi-project team collaborations, and the need for a reliable solution becomes glaringly clear.

Meet Multilogin, an anti-detect browser built to tackle the problems above. It is a specialized tool designed to help users manage multiple online identities across various platforms. It offers advanced features that enable businesses to operate with greater security and flexibility, particularly when managing browser profiles and online accounts. Through sophisticated browser fingerprint masking and secure proxy integration, Multilogin allows users to mimic human behavior and avoid detection on platforms with strict anti-fraud measures.

Now, the latest version, Multilogin X, takes the capabilities further. With enhanced features for team collaboration, improved cloud-based profile storage, and advanced automation capabilities, Multilogin X offers a seamless experience for businesses dealing with multiple projects or requiring better collaboration between teams. Its innovative approach helps businesses bypass platform detections effortlessly, safeguard their online identities, and streamline digital operations.

Multilogin X offers two solutions for two different problems:

  1. Web Scraping and Automation:
    • Single API Calls: Multilogin X supports single API calls, simplifying the process of integrating the tool into your automated workflows. This allows you to automate tasks efficiently and scale operations seamlessly.
    • Prompt Launching with Local Profiles: Users can swiftly launch browser profiles with local configurations, reducing setup time and enabling rapid task execution. This flexibility improves efficiency, particularly for projects that involve repetitive scraping tasks.
    • ‘Headful’ Browsers for Efficient Scraping: Multilogin X allows users to launch ‘headful’ browser instances that replicate typical user behavior, making it easier to scrape websites while remaining undetected. These instances offer greater efficiency by mimicking human interactions, bypassing anti-bot measures.
  2. Multi-Accounting:
    • Maximize Your Chances for Staying Undetected: Multilogin X leverages advanced fingerprint masking and anti-detection technology to minimize detection risks across multiple accounts. Users can confidently operate on platforms implementing strict anti-fraud measures, ensuring each browser profile appears unique.
    • Ready-to-Use, Customizable Fingerprints: The platform provides an extensive range of fingerprints, enabling users to fine-tune profiles for different applications. This helps in creating distinct browser profiles that look genuine and individualized.
    • Rapid Account Creation: Multilogin X simplifies setting up new accounts across various platforms, allowing users to quickly create and manage multiple accounts. This feature’s efficiency is valuable for businesses that rely on multi-account setups for marketing, sales, or other digital operations.

Multilogin X Features and Benefits:

  1. Account Bans and Detection, Gone: 
    • Advanced Anti-Detection Technology: Multilogin employs pioneering anti-detect technology to ensure website browser fingerprints appear unique and consistent. This reduces the likelihood of detection and account bans by simulating realistic browsing patterns and hiding identifiable data points. It’s particularly useful for digital marketers, e-commerce sellers, and anyone needing multiple accounts that bypass platform restrictions.
  2. Create Browser Profiles in a Flash: 
    • Quick Profile Generation: The Quick Profile feature allows users to generate single-use browser profiles within seconds. This feature is designed for temporary tasks where quick setup and teardown are crucial. Once the task is complete, profiles delete themselves automatically, eliminating manual cleanup and speeding up task completion.
  3. Teamwork Makes the Dream Work, Securely: 
    • Multi-Level Role Management: Multilogin provides comprehensive collaboration features through multi-level role management. Users can securely organize their teams by assigning different roles, from Owner to Launcher, each with distinct access levels. This promotes effective task delegation, protects sensitive data, and prevents redundancy and overlap in operations, making it ideal for large teams managing multiple projects.
  4. Light as a Feather, Fits Like a Glove: 
    • Lightweight Platform: Designed to be resource-efficient, Multilogin X provides smooth navigation and performance on high-end and older devices. Users can benefit from its powerful features without compromising system performance, making it versatile for various environments.
  5. We Speak Your Language, Literally: 
    • Comprehensive Multilingual Support: Multilogin’s team of experts offers 24/7 support in multiple languages, ensuring that users, regardless of their expertise level, receive assistance tailored to their needs. Whether troubleshooting or seeking optimization advice, the support team is ready to assist with any issue, big or small.

How does Multilogin X work?

Multilogin X works by creating and managing unique browser profiles that mimic real user behavior, providing each profile with distinct fingerprints to evade detection by platform algorithms. It uses advanced anti-detect technology to make each profile appear as a separate, genuine user, enabling activities like multi-account management, web scraping, and automation. Users can quickly create profiles with customizable settings for different tasks, while the cloud-based storage ensures profiles are securely saved and accessible across devices. Collaboration tools allow teams to share profiles efficiently, and automated workflows streamline repetitive tasks, helping businesses scale their digital operations smoothly and securely. It is based on the three below-mentioned pillars:

  • Browser Fingerprint Masking: Generates unique browser fingerprints for each profile, simulating human-like behavior and avoiding detection by anti-bot algorithms.
  • Proxy Integration: Assigns different proxies to each profile, routing internet traffic through different IP addresses to mask the user’s location.
  • Profile Encryption: Encrypts all browser data for each profile, ensuring secure storage and transit of cookies, login details, and browsing history and protecting sensitive information from unauthorized access.

How to use Multilogin X?

Step 1: Register an account

  • Fill out the registration form and click “Create account“
  • Enter the verification code from your email (be sure to check your Social, Promo, and Spam folders)

Step 2: Get a subscription

  • Click “View plans”
  • Choose a subscription that fits your needs
  • Choose a payment method: The Multilogin platform accepts card*, PayPal, and crypto payments
  • Pay the invoice: The platform has a handy guide that will help if you get stuck
  • Click “Open Multilogin X” in the top-right corner

Step 3: Connect the agent

  • Download the agent for your OS
    • On a Mac, make sure your OS is 12 (Monterey) or newer
  • Click the downloaded file to open the installer
    • On Windows, right-click the file and choose “Run as administrator”
  • Follow the on-screen instructions to complete the installation
  • Click “Connect agent” and wait while Multilogin downloads all components and establishes the connection (wait time can be longer with slower connection speeds)

Step 4: Create a profile

  • Click “New profile” and enter its name in the top field
    • Operating system: choose your device’s OS for best results
    • Browser: Mimic is built on Chrome, and Stealthfox on Firefox
    • Storage type: choose cloud storage for seamless multi-device access and synchronization, or local storage for exceptional speed and for keeping data on your device

Bonus step: Elevate your strategy

Use Cases of Multilogin X

Let’s explore some of the great use cases Multilogin X offers across various industries:

Use Cases of Multilogin X in Different Sectors

Improvements in Multilogin X compared to Multilogin

Let’s compare the current version, Multilogin X, with Multilogin’s earlier version:

Conclusion

In conclusion, Multilogin X is a powerful tool for addressing the critical needs of businesses and individuals who require secure and efficient online identity management. Its advanced browser fingerprint masking, seamless proxy integration, and robust profile encryption offer unparalleled security and anonymity. It is ideal for various applications, from e-commerce and advertising to affiliate marketing and web scraping. Multilogin X empowers users to expand their digital operations confidently and efficiently by enabling streamlined automation, team collaboration, and scalable multi-account management. Whether safeguarding brands, enhancing ad campaigns, or exploring global markets, Multilogin X provides a reliable solution to these challenges.


Thanks to Multilogin for the thought leadership/educational article. Multilogin has supported and sponsored this content.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence Media Platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among audiences.





03May

A Novel AI Approach to Enhance Language Models: Multi-Token Prediction


Language models are incredibly powerful tools that can understand and generate human-like text by learning patterns from massive datasets. However, the traditional method of training these models, called “next-token prediction,” has its limitations. It essentially teaches the model to predict the next word in a sequence, but this approach can lead to suboptimal performance, especially for more complex tasks.

The researchers behind this study propose a new technique called multi-token prediction. Instead of predicting one token (word) at a time, this method trains the model to predict multiple future tokens simultaneously. Imagine it like this: While learning a language, instead of guessing one word at a time, you’re challenged to predict entire phrases or even sentences. Sounds intriguing, right?

So, how does this multi-token prediction work? The researchers designed a model architecture with a shared trunk that produces a latent representation of the input context. This shared trunk is then connected to multiple independent output heads, each responsible for predicting one of the future tokens. For example, if the model is set to predict four future tokens, it will have four output heads working in parallel.
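As a rough illustration of the shared-trunk, multi-head design described above, here is a minimal PyTorch sketch. The layer sizes, the two-layer Transformer trunk, and the causal mask handling are assumptions chosen for brevity, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class MultiTokenPredictor(nn.Module):
        """Shared trunk plus n independent heads; head i predicts the token i+1 steps ahead."""
        def __init__(self, vocab_size=1000, d_model=128, n_future=4):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            self.trunk = nn.TransformerEncoder(layer, num_layers=2)        # shared trunk
            self.heads = nn.ModuleList(nn.Linear(d_model, vocab_size) for _ in range(n_future))

        def forward(self, tokens):
            mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
            z = self.trunk(self.embed(tokens), mask=mask)                  # latent context representation
            return [head(z) for head in self.heads]                        # one logit tensor per future offset

    model = MultiTokenPredictor()
    tokens = torch.randint(0, 1000, (2, 16))                               # (batch, sequence length)
    logits_per_offset = model(tokens)                                      # 4 tensors, each (2, 16, 1000)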

During training, the model is fed a text corpus, and at each position, it is tasked with predicting the next n tokens simultaneously. This approach encourages the model to learn longer-term patterns and dependencies in the data, potentially leading to better performance, especially for tasks that require understanding the broader context.

Moreover, the researchers also tackled a critical challenge: reducing the GPU memory usage of these multi-token predictors. They implemented a clever technique that sequentially computes the forward and backward passes for each output head, accumulating gradients at the shared trunk. This approach reduces the peak GPU memory utilization, making it feasible to train larger models efficiently.
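A sketch of that memory-saving schedule is below, reusing the hypothetical MultiTokenPredictor from the previous snippet. It illustrates the idea (one forward/backward per head, with gradients accumulated at the detached trunk output before a single backward pass through the trunk) rather than reproducing the authors' implementation.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def memory_efficient_step(model, tokens, future_targets, optimizer):
        """future_targets[i] holds labels shifted by i+1 positions, shape (batch, seq_len)."""
        optimizer.zero_grad()
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        z = model.trunk(model.embed(tokens), mask=mask)          # single trunk forward pass
        z_detached = z.detach().requires_grad_(True)

        total = 0.0
        for head, target in zip(model.heads, future_targets):
            loss = F.cross_entropy(head(z_detached).flatten(0, 1), target.flatten())
            loss.backward()            # frees this head's graph right away; gradients
            total += loss.item()       # accumulate in z_detached.grad across heads

        z.backward(z_detached.grad)    # one backward pass through the shared trunk
        optimizer.step()
        return total

    # usage: opt = torch.optim.AdamW(model.parameters()); memory_efficient_step(model, tokens, future_targets, opt)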

The researchers conducted extensive experiments, and the results are quite promising. They found that multi-token prediction becomes increasingly useful as the model size grows. For instance, on coding evaluation benchmarks like MBPP and HumanEval, models trained with multi-token prediction outperformed their next-token prediction counterparts, sometimes by a significant margin. The 13B parameter models solve 12% more problems on HumanEval and 17% more on MBPP than comparable next-token models.

Moreover, the additional output heads can be leveraged to speed up inference using techniques like speculative decoding. The researchers observed up to a 3x speedup in decoding times for their best 4-token prediction model on code and natural language tasks.

But it’s not just about coding; multi-token prediction also showed promising results in natural language tasks. When evaluated on summarization benchmarks, models trained with multi-token prediction achieved higher ROUGE scores compared to the next-token baseline, indicating better text generation capabilities.

The next interesting question is: why does it work?

The researchers offer some insightful explanations for why multi-token prediction works so well. One key idea is that it mitigates the distributional discrepancy between training-time teacher forcing (where the model receives the ground truth for each future token) and inference-time autoregressive generation (where the model generates tokens without guidance).

Additionally, multi-token prediction implicitly assigns higher weights to tokens that represent “choice points” – decisions that significantly impact the remainder of the text. By reinforcing these critical decision points during training, the model learns to make better choices, leading to more coherent and useful text generations. Furthermore, an information-theoretic analysis suggests that multi-token prediction encourages the model to focus on predicting highly relevant tokens for the subsequent text, potentially capturing longer-term dependencies more effectively.
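One way to make the information-theoretic point concrete (a framing chosen here for illustration, not a quote from the paper) is the standard identity for two consecutive tokens X and Y:

    H(X) + H(Y) = H(X|Y) + 2 I(X;Y) + H(Y|X)

Teacher-forced next-token training over the same two positions instead optimizes H(X) + H(Y|X) = H(X|Y) + I(X;Y) + H(Y|X), so the two-token objective doubles the relative weight on the mutual-information term I(X;Y), the part of the current token that is informative about what comes next.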

While the results are promising, the researchers acknowledge that there is still room for improvement. One area for future exploration is automatically determining the optimal value of n (the number of future tokens to predict) based on the task and data distribution. Additionally, they suggest that adjusting the vocabulary size and exploring alternative auxiliary prediction losses could lead to even better trade-offs between compressed sequence length and computational efficiency. Overall, this research opens up exciting avenues for enhancing language models’ capabilities, paving the way for more powerful and efficient natural language processing systems.


Check out the Paper. All credit for this research goes to the researchers of this project.


Vineet Kumar is a consulting intern at MarktechPost. He is currently pursuing his BS from the Indian Institute of Technology (IIT), Kanpur. He is a Machine Learning enthusiast. He is passionate about research and the latest advancements in Deep Learning, Computer Vision, and related fields.







03May

Top Artificial Intelligence (AI) Governance Laws and Frameworks


Artificial Intelligence (AI) is changing the world quickly, and several nations and international organizations have adopted frameworks to direct the development, application, and governance of AI. Numerous initiatives are influencing the ethical use of AI to prioritize human rights and innovation. Here are some of the top AI governance laws and frameworks.

1. EU AI Act

The Artificial Intelligence Act, a historic piece of legislation designed to promote innovation and guarantee AI safety and adherence to fundamental rights, was approved by the European Parliament. It outlaws AI applications that pose a risk to people’s rights, such as some biometric systems and the ability to identify emotions in particular settings, such as schools and workplaces. The use of biometric identification by law enforcement is tightly controlled, and real-time deployment necessitates strict security measures.

It specifies clear duties that high-risk AI systems must follow in order to reduce possible harm, maintain transparency, and provide human oversight. Transparency standards apply to general-purpose AI systems and models, and deepfakes need to be clearly labeled.

2. EU AI Liability Directive

The European Parliament and Council have proposed an AI Liability Directive to address the issues that AI presents to current liability regulations. The complexity and opacity of AI make it difficult for victims to establish liability, and as a result, current national liability frameworks are insufficient for managing claims for damage related to AI. This directive aims to provide victims of AI-related harm with the same level of protection as those affected by traditional products. It intends to eliminate disjointed national adaptations of liability standards and lessen legal ambiguity for firms. The directive supports the Union’s digital and environmental objectives and is a component of a larger EU plan to advance reliable AI and digital technologies.

3. Brazil AI Bill

This law lays forth national guidelines for creating, deploying, and appropriately utilizing AI systems in Brazil. The goal of the law is to protect fundamental rights and guarantee safe, dependable AI systems that advance science, democracy, and the interests of citizens. Human-centricity, respect for democracy and human rights, environmental preservation, sustainable development, equality, non-discrimination, and innovation are the guiding principles of AI development in Brazil. The law also supports consumer protection, fair competition, and free entrepreneurship. These clauses highlight how crucial it is to have responsible AI governance that respects morality and basic rights while advancing technology and adhering to democratic ideals.

4. Canada AI and Data Act

Part of Canada’s Digital Charter Implementation Act, 2022, the planned Artificial Intelligence and Data Act (AIDA) seeks to govern AI systems to guarantee their safety, impartiality, and accountability. AI is being used increasingly in vital sectors like healthcare and agriculture, but it can also be dangerous, especially for underprivileged people. AIDA would create standards for ethical AI development, design, and application with a focus on justice and safety. Canada’s dedication to utilizing AI’s promise while defending individuals’ rights and minimizing any risks is reflected in this legislation. 

5. U.S. Executive Order on Trustworthy AI

The possible advantages and hazards of AI are emphasized in the Executive Order on the Safe, Secure, and Trustworthy Development and Use of Artificial Intelligence. It recognizes the pressing need for responsible AI governance to address social issues and avoid negative outcomes, including fraud, discrimination, and threats to national security. In order to develop and apply AI safely and responsibly, the order emphasizes a concerted effort across the government, corporate sector, academia, and civil society. 

The Administration seeks to align executive departments and agencies with eight guiding principles and priorities for AI development and governance. Collaboration with a wide range of stakeholders, including business, academia, civil society, labor unions, foreign partners, and others, will be a component of these efforts. This policy framework demonstrates a dedication to taking the lead in AI governance to guarantee its responsible growth, thereby improving American society, economy, and security.

6. NYC Bias Audit Law

Employers and employment agencies in New York City are prohibited from using Automated Employment Decision Tools (AEDTs) by Local Law 144 of 2021, which the DCWP enforces. This law forbids the use of AEDTs unless the required notices are given and a bias audit has been completed. AEDTs are computer-based tools that significantly support or replace discretionary decision-making in employment decisions. They do this by using machine learning, statistical modeling, data analytics, or AI. By mandating adherence to bias audit and notice standards, the law seeks to ensure accountability and openness in the use of these tools.

7. China Algorithmic Recommendation Law

These regulations provide guidelines for the use of algorithmic recommendation technology in mainland Chinese internet services. They seek to safeguard national interests, control behavior, preserve social ideals, and defend the rights of individuals and groups. The state Internet department is in charge of governance and works with other pertinent organizations such as market regulators, public security, and telecommunications. Laws must be followed, ethical norms must be upheld, and providers must give equity, fairness, openness, and good faith top priority. Industry associations are requested to provide guidelines, enforce rules, and support providers in fulfilling regulatory obligations and public expectations for algorithmic recommendation services.

8. China Generative AI Services Law

With the Cyberspace Administration of China (CAC) and other government agencies issuing Interim Measures for the Administration of Generative Artificial Intelligence Services, China has taken the initiative to regulate generative artificial intelligence (AI) services. These regulations, which come into effect on August 15, 2023, control companies that offer generative AI services to the general Chinese population. Models that generate text, graphics, audio, and video are included in generative AI technology. The Interim Measures recognize potential foreign investment while promoting innovation and research. Future artificial intelligence laws are also anticipated to expand regulation beyond generative AI. Given the potential penalties or shutdowns for non-compliant services operating in China, compliance is essential.

9. China Deep Synthesis Law

This law sets forth guidelines for China’s deep synthesis technology, which is utilized in online information services. The provisions, which aim to manage deep synthesis services, preserve socialist principles, safeguard national interests, and promote public welfare, are predicated on cybersecurity, data security, and personal information protection regulations. The state Internet department leads oversight, with the public security and telecommunications departments also responsible for supervision. Deep synthesis service providers are required to abide by the law, honor social standards, and support political agendas and ideals. Industry associations are encouraged to set norms and self-discipline mechanisms for deep synthesis service providers in order to ensure legitimate operations and social accountability.

10. Peru Law 31814

In order to encourage the use of AI for social and economic development, Peru’s Executive Power passed Law 31814, establishing the country as a pioneer in AI legislation in Latin America. This regulation places a strong emphasis on ethical standards and human rights while highlighting the responsible, transparent, and sustainable use of AI. It declares technologies such as AI to be of national interest in order to improve national security, the economy, public services, health, and education.

11. South Korea AI Act

With the proposed “Act on Promotion of AI Industry and Framework for Establishing Trustworthy AI” (AI Act), South Korea is moving forward with its legislative framework for AI. By combining seven different AI-related regulations into one comprehensive strategy, this legislation seeks to fully oversee and regulate the AI industry. The AI Act places a strong emphasis on bolstering the AI sector while guaranteeing the reliability of AI systems to safeguard users. Important clauses include defining high-risk AI categories, assisting AI companies, establishing ethical standards, permitting innovation without prior government clearance, and forming an AI Committee and policy roadmap.

12. Indonesia Presidential Regulation on AI

AI regulations are changing in Indonesia as a result of growing AI integration across businesses. The nation has taken action to address ethical issues and rules for AI use, even though specific legislation is still lacking in this area. The National Strategy on Artificial Intelligence 2020–2045 offers a foundation for the creation of AI policies. The Electronic Information and Transactions Law (EIT Law), which defines electronic agents and lays forth general guidelines for AI operators, currently governs artificial intelligence. The OJK Code of Ethics Guidelines for AI in the Financial Technology Industry and MOCI Circular Letter No. 9 of 2023 (MOCI CL 9/2023) are recent developments that highlight the ethical usage of AI.

13. Mexico Federal AI Regulation

The proposed legislation in Mexico outlines a thorough framework for regulating AI technologies. Extraterritorial applicability provisions are included, necessitating compliance by AIS providers abroad that supply services or generate data utilized in Mexico. Authorization would be supervised by the Federal Telecommunications Institute (IFT), with backing from the National Artificial Intelligence Commission. Like the EU, AI systems would be categorized according to danger levels. Even for services that are provided for free, AIS implementation would require prior authorization from the IFT. Penalties for noncompliance could reach 10% of one’s yearly salary. This law, which seeks to influence AIS development and commercialization in Mexico, parallels global trends in AI policy.

14. Chile Draft AI Bill

The legislative body of Chile has commenced deliberations on a bill designed to govern the moral and legal dimensions of AI in relation to its development, dissemination, commercialization, and application. The goal of the Bill, which has the backing of Chile’s Ministry of Science, Technology, Knowledge, and Innovation and is modeled after Europe’s 2021 Artificial Intelligence Act, is to strike a balance between technological advancement and citizen rights. It suggests defining AI, designating high-risk AI systems, creating a National Commission for AI, demanding permission for AI development and usage, and delineating the consequences of noncompliance. Chile is demonstrating its commitment to responsible technological innovation management with this legislative endeavor, which prioritizes human well-being and societal advantages in the application of AI. 

15. NIST AI RMF

NIST’s AI Risk Management Framework (AI RMF) offers organized guidelines for addressing risks associated with AI. The framework, which was created via joint efforts between the public and private sectors, focuses on generative AI and addresses 12 identified hazards. In order to help organizations establish trustworthy AI practices, it provides them with resources and actionable instructions such as the AI RMF Playbook, Roadmap, Crosswalk, and Perspectives. Founded in March 2023, the Trustworthy and Responsible AI Resource Centre promotes the adoption and compliance of the AI RMF on a global scale. The consensus-driven methodology of NIST guarantees thorough risk management for AI technology, boosting deployment confidence and dependability.

16. Blueprint for an AI Bill of Rights

The issues raised by technology and automated systems that have the potential to violate people’s rights are covered in the Blueprint for an AI Bill of Rights. With the help of technology, this effort seeks to advance society while defending democratic principles and civil rights. This endeavor is in line with President Biden’s dedication to eliminating injustices and improving civil rights. In order to protect American citizens in the age of artificial intelligence, the White House Office of Science and Technology Policy has established five guiding principles for the appropriate design, usage, and deployment of automated systems. This blueprint acts as a framework to safeguard people’s rights and direct technology advancement and policy in a way that upholds civil liberties and democratic principles.

17. OECD AI Principles

The OECD AI Principles, which were created in May 2019, support the innovative and reliable application of AI while upholding democratic principles and human rights. These guidelines highlight the following.

  1. Inclusive Development and Well-Being: Artificial Intelligence ought to promote sustainable development, human well-being, and inclusive economic prosperity. 
  2. Human-centered values and justice: AI systems ought to respect diversity, justice, and human rights without prejudice. 
  3. Explainability and Transparency: Users should be able to understand how AI systems work. 
  4. Robustness, Security, and Safety: Throughout their entire life cycle, AI systems need to be reliable, safe, and secure. 
  5. Accountability: Systems and developers using AI should take responsibility for their decisions and results.

18. OECD AI Risk Classification Framework

An organized method for assessing and categorizing AI systems according to their unique attributes and environments is offered by the OECD Framework for the Classification of AI Systems. This easy-to-use tool helps lawmakers, regulators, policymakers, and other stakeholders comprehend and weigh the advantages and disadvantages of various AI systems. The framework takes into account several aspects, each with a subset of characteristics and traits, including People & Planet, Economic Context, Data & Input, AI model, and Task & Output. Policymakers can ensure a creative and reliable approach that is in line with the OECD AI Principles by customizing their policy approaches to various types of AI systems.

19. Council of Europe Framework Convention on AI

This sets forth a convention designed to guarantee that actions pertaining to AI systems respect democratic principles, human rights, and the rule of law. Each party to the convention shall take necessary action to carry out these obligations, taking into account the gravity of the situation and the possibility of unfavorable effects on democracy, human rights, and the rule of law. The convention deals with computer-based systems that produce judgments or predictions that affect their surroundings. It pertains to actions taken by governmental bodies or private parties acting on their behalf throughout the lifespan of the artificial intelligence system. Activities of private actors that are not covered by state authorities must also be addressed in a manner consistent with the convention’s aims.

20. Singapore AI Verify Framework

AI Verify is a software toolkit and testing framework for AI governance that is intended to evaluate AI systems in accordance with accepted worldwide AI governance framework standards, such as those of Singapore, the OECD, and the European Union. It conducts technical tests on supervised learning models for tabular and picture datasets within corporate contexts. AI Verify does not provide AI ethical standards, ensure that tested AI systems are free from bias or danger, or test generative AI or large language models (LLMs). Even though AI Verify is still a Minimum Viable Product (MVP), it recognizes that there are important gaps in the testing of AI governance and plans to open-source its toolkit to involve research groups, industry players, and developers in the advancement and enhancement of AI governance testing and evaluation. 

21. UNESCO AI Ethics Recommendation

Within the context of UNESCO’s mandate, this recommendation tackles the ethical issues surrounding AI, emphasizing a normative reflection framework built on interdependent values, principles, and acts. It places a strong emphasis on harm prevention, human dignity, and well-being as fundamental ethics that are based on science and technology. It addresses fundamental ethical aspects of AI systems, such as information processing, learning, reasoning, and decision-making capacities, rather than trying to define AI. The entire AI lifecycle, from creation and research to implementation and use, is affected by ethical issues. The recommendation underscores the significance of responsible practices, critical thinking, and ethical education in digital communities while highlighting the impact of AI on education, research, culture, and communication.

22. G7 Hiroshima Process AI Guiding Principles

Establishing standards for companies creating and utilizing advanced AI technologies is the goal of the Hiroshima Process International Guiding Principles for Organisations Developing Advanced AI Systems. These guidelines are intended to guarantee the dependability, security, and safety of sophisticated AI systems, such as generative and foundational models. The document emphasizes collaboration between academia, civil society, the commercial sector, and public sector organizations. The principles adjust to the changing state of AI technologies and expand upon the existing OECD AI Principles. Respecting democratic values, human rights, and diversity, and making sure AI systems don’t seriously jeopardize safety and security, are all important components.

23. ISO/IEC 42001

The international standard ISO/IEC 42001 outlines the conditions that must be met in order for organizations to implement and oversee an artificial intelligence management system (AIMS). It provides essential direction for navigating the morally complex, open, and quickly developing field of AI. ISO/IEC 42001, the first AI management system standard in the world, helps organizations manage the opportunities and risks related to the development and application of AI. It encourages ethical AI procedures, guaranteeing that innovation and regulation are in harmony. This standard promotes confidence and accountability in AI systems globally and is crucial for companies that use or offer AI-based goods or services.

24. ISO/IEC 23894

An information technology standard called ISO/IEC 23894:2023 provides recommendations on risk management for artificial intelligence (AI) to organizations engaged in the creation, implementation, or use of AI products. This document offers procedures for efficient implementation, assisting in the integration of risk management into AI-related activities and operations. The guidelines promote customized approaches to AI risk management and can be adjusted to fit any organization and scenario. Organizations can improve their capacity to recognize, evaluate, and reduce risks associated with AI by adhering to ISO/IEC 23894. 

25. IEEE P2863

Safety, transparency, accountability, responsibility, and bias minimization are among the governance criteria for AI development and use within organizations, as outlined in the Recommended Practice for Organisational Governance of Artificial Intelligence (AI). This standard provides process stages for training, compliance, performance auditing, and efficient implementation of AI governance. The working group is sponsored by the IEEE Computer Society’s Artificial Intelligence Standards Committee and is focused on AI governance. The significance of organized governance frameworks for ensuring moral, responsible, and efficient AI deployment in diverse organizational contexts is emphasized by this active PAR (Project Authorization Request).

26. IEEE P7003

Methodologies for addressing bias in algorithm construction are provided by the Algorithmic Bias Considerations standard. It contains recommendations for managing user expectations to reduce bias from misinterpreting system outputs, guidelines for setting and communicating algorithm application boundaries to prevent unintended consequences, and criteria for validation dataset selection to control bias quality. This standard, which is supported by the Software & Systems Engineering Standards Committee (C/S2ESC), attempts to advance equity and openness in algorithm development. Since its acceptance, it has been in operation and is a crucial tool for addressing algorithmic biases and guaranteeing the ethical application of AI.


This article is inspired by this LinkedIn post.


Tanya Malhotra is a final year undergrad from the University of Petroleum & Energy Studies, Dehradun, pursuing BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is a Data Science enthusiast with good analytical and critical thinking, along with an ardent interest in acquiring new skills, leading groups, and managing work in an organized manner.





02May

This AI Paper from MIT and Harvard Demonstrates an AI Approach to Automated in Silico Hypothesis Generation and Testing Made Possible Through the Use of SCMs


Recent advancements in econometric modeling and hypothesis testing have witnessed a paradigm shift towards integrating machine learning techniques. While strides have been made in estimating econometric models of human behavior, more research still needs to be conducted on effectively generating and rigorously testing these models. 

Researchers from MIT and Harvard introduce a novel approach to address this gap: merging automated hypothesis generation with in silico hypothesis testing. This innovative method harnesses the capabilities of large language models (LLMs) to simulate human behaviour with remarkable fidelity, offering a promising avenue for hypothesis testing that may unearth insights inaccessible through traditional methods.

This approach’s core lies in adopting structural causal models as a guiding framework for hypothesis generation and experimental design. These models delineate causal relationships between variables and have long served as a foundation for expressing hypotheses in social science research. What sets this study apart is using structural causal models not only for hypothesis formulation but also as a blueprint for designing experiments and generating data. By mapping theoretical constructs onto experimental parameters, this framework facilitates the systematic generation of agents or scenarios that vary along relevant dimensions, enabling rigorous hypothesis testing in simulated environments.
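As a toy illustration of what "using a structural causal model as a blueprint for experiments" can look like in code, the sketch below encodes an SCM as variables with listed causes and enumerates a factorial design over its treatment variables. The bargaining scenario, variable names, and class layout are illustrative assumptions, not taken from the paper or its open-source system.

    from dataclasses import dataclass, field
    from itertools import product

    @dataclass
    class SCM:
        """A structural causal model: the outcome, the treatments to vary, and the causal edges."""
        outcome: str
        treatments: dict                              # variable name -> list of levels to manipulate
        edges: dict = field(default_factory=dict)     # variable name -> list of parent variables

        def experimental_conditions(self):
            """Yield a full-factorial design over the treatment variables."""
            names = list(self.treatments)
            for values in product(*(self.treatments[n] for n in names)):
                yield dict(zip(names, values))

    # Hypothetical bargaining hypothesis: buyer budget and seller reservation price
    # causally affect the final negotiated price.
    scm = SCM(
        outcome="final_price",
        treatments={"buyer_budget": [20, 40, 60], "seller_reservation": [10, 30]},
        edges={"final_price": ["buyer_budget", "seller_reservation"]},
    )

    for condition in scm.experimental_conditions():
        # each condition would parameterize an LLM-simulated negotiation; the
        # simulation and subsequent estimation are omitted from this sketch
        print(condition)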

A pivotal milestone in operationalizing this structural causal model-based approach is the development of an open-source computational system. This system seamlessly integrates automated hypothesis generation, experimental design, simulation using LLM-powered agents, and subsequent analysis of results. Through a series of experiments spanning various social scenarios—from bargaining situations to legal proceedings and auctions—the system demonstrates its capacity to autonomously generate and test multiple falsifiable hypotheses, yielding actionable findings.

While the findings derived from these experiments may not be groundbreaking, they underscore the empirical validity of the approach. Importantly, they are not merely products of theoretical conjecture but are grounded in systematic experimentation and simulation. However, the study raises critical questions regarding the necessity of simulations in hypothesis testing. Can LLMs effectively engage in “thought experiments” to derive similar insights without resorting to simulation? The study conducts predictive tasks to address this question, revealing notable disparities between LLM-generated predictions and both empirical results and theoretical expectations.

Furthermore, the study explores the potential of leveraging fitted structural causal models to improve prediction accuracy in LLM-based simulations. When provided with contextual information about scenarios and estimates of experimental paths, the LLM predicts outcomes more accurately. Yet, significant gaps persist between predicted outcomes and empirical and theoretical benchmarks, underscoring the complexity of accurately capturing human behavior in simulated environments.


Check out the Paper. All credit for this research goes to the researchers of this project.


Arshad is an intern at MarktechPost. He is currently pursuing his Int. MSc Physics from the Indian Institute of Technology Kharagpur. He believes that understanding things at the fundamental level leads to new discoveries, which in turn lead to advancements in technology. He is passionate about understanding nature fundamentally with the help of tools like mathematical models, ML models, and AI.







30Apr

InternVL 1.5 Advances Multimodal AI with High-Resolution and Bilingual Capabilities in Open-Source Models


Multimodal large language models (MLLMs) integrate text and visual data processing to enhance how artificial intelligence understands and interacts with the world. This area of research focuses on creating systems that can comprehend and respond to a combination of visual cues and linguistic information, mimicking human-like interactions more closely.

The challenge often lies in the limited capabilities of open-source models compared to their commercial counterparts. Open-source models frequently exhibit deficiencies in processing complex visual inputs and supporting various languages, which can restrict their practical applications and effectiveness in diverse scenarios.

Historically, most open-source MLLMs have been trained at fixed resolutions, primarily using datasets limited to the English language. This approach significantly hinders their functionality when encountering high-resolution images or content in other languages, making it difficult for these models to perform well in tasks that require detailed visual understanding or multilingual capabilities.

The research from Shanghai AI Laboratory, SenseTime Research, Tsinghua University, Nanjing University, Fudan University, and The Chinese University of Hong Kong introduces InternVL 1.5, an open-source MLLM designed to significantly enhance the capabilities of open-source systems in multimodal understanding. This model incorporates three major improvements to close the performance gap between open-source and proprietary commercial models. The three main components are:

  1. Firstly, a strong vision encoder, InternViT-6B, has been optimized through a continuous learning strategy, enhancing its visual understanding capabilities.
  2. Secondly, a dynamic high-resolution approach allows the model to handle images up to 4K resolution by dynamically adjusting image tiles based on the input’s aspect ratio and resolution. 
  3. Lastly, a high-quality bilingual dataset has been meticulously assembled, covering common scenes and document images annotated with English and Chinese question-answer pairs. 

These three improvements significantly boost the model’s performance in OCR and Chinese language-related tasks. The enhancements enable InternVL 1.5 to compete robustly in various benchmarks and comparative studies, showcasing its improved effectiveness in multimodal tasks. InternVL 1.5 employs a segmented approach to image handling, allowing it to process images at resolutions up to 4K by dividing them into 448×448-pixel tiles, with the number and arrangement of tiles adapting dynamically based on the image’s aspect ratio and resolution. This method improves image comprehension and facilitates understanding of detailed scenes and documents. The model’s enhanced linguistic capabilities stem from its training on a diverse dataset comprising both English and Chinese, covering a variety of scenes and document types, which boosts its performance in OCR and text-based tasks across languages.
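The dynamic-resolution step can be pictured with a small amount of arithmetic: choose a grid of 448×448 tiles whose aspect ratio best matches the input image, subject to a tile budget. The brute-force search and the default budget below are illustrative assumptions and are not guaranteed to match InternVL 1.5's exact rule.

    TILE = 448  # tile side length in pixels

    def choose_tile_grid(width: int, height: int, max_tiles: int = 12):
        """Pick the (cols, rows) grid of TILE x TILE cells whose aspect ratio is
        closest to the input image's, without exceeding the tile budget."""
        target_ratio = width / height
        best, best_diff = (1, 1), float("inf")
        for cols in range(1, max_tiles + 1):
            for rows in range(1, max_tiles + 1):
                if cols * rows > max_tiles:
                    continue
                diff = abs(cols / rows - target_ratio)
                if diff < best_diff:
                    best, best_diff = (cols, rows), diff
        return best

    cols, rows = choose_tile_grid(1920, 1080)
    print(f"resize to {cols * TILE}x{rows * TILE}, then cut into {cols * rows} tiles")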

The model’s performance is evidenced by its results across multiple benchmarks, where it excels particularly in OCR-related datasets and bilingual scene understanding. InternVL 1.5 demonstrates state-of-the-art results, showing marked improvements over previous versions and surpassing some proprietary models in specific tests. For example, text-based visual question answering achieves an accuracy of 80.6%, and document-based question answering reaches an impressive 90.9%. In multimodal benchmarks that assess models on both visual and textual understanding, InternVL 1.5 consistently delivers competitive results, often outperforming other open-source models and rivaling commercial models.

In conclusion, InternVL 1.5 addresses the significant challenges that open-source multimodal large language models face, particularly in processing high-resolution images and supporting multilingual capabilities. This model significantly narrows the performance gap with commercial counterparts by implementing a robust vision encoder, dynamic resolution adaptation, and a comprehensive bilingual dataset. The enhanced capabilities of InternVL 1.5 are demonstrated through its superior performance in OCR-related tasks and bilingual scene understanding, establishing it as a formidable competitor in advanced artificial intelligence systems. 


Check out the Paper and GitHub Page. All credit for this research goes to the researchers of this project.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.







30Apr

OpenVoice V2: Evolving Multilingual Voice Cloning with Enhanced Style Control and Cross-Lingual Capabilities


Instant Voice Cloning (IVC) in Text-to-Speech (TTS) synthesis, also known as Zero-shot TTS, allows TTS models to replicate the voice of any given speaker with just a short audio sample, without requiring additional training on that speaker. While existing methods like VALL-E and XTTS can replicate tone color, they offer limited flexibility in controlling style parameters like emotion, accent, and rhythm. Auto-regressive models, though effective, are computationally expensive and slow. Non-autoregressive approaches like YourTTS and Voicebox offer faster inference but lack comprehensive style control. Additionally, achieving cross-lingual voice cloning demands extensive datasets, hindering the inclusion of new languages. Closed-source projects further impede collaborative advancement in the field.

MIT CSAIL, MyShell.ai, and Tsinghua University researchers have developed OpenVoice V2, a groundbreaking text-to-speech model enabling voice cloning across languages. OpenVoice V2 transcends language barriers, offering applications like personalized digital interfaces, multilingual virtual assistants, and automatic dubbing. With enhanced audio quality and native support for English, Spanish, French, Chinese, Japanese, and Korean, OpenVoice V2 surpasses its predecessor. It allows granular control over voice styles, including emotion and accent, without relying on the reference speaker’s style. Moreover, it achieves zero-shot cross-lingual voice cloning, even for languages absent from its training data, while maintaining computational efficiency and real-time inference capabilities.

Prior research in IVC encompasses auto-regressive methods like VALLE and XTTS, extracting speaker characteristics to generate speech sequentially. While effectively replicating tone color, they lack flexibility in adjusting style parameters like emotion and accent. These models are computationally intensive and slow. Non-auto-regressive approaches like YourTTS and Voicebox offer faster inference but struggle with style parameter control. Additionally, they often rely on extensive datasets for cross-lingual cloning, limiting language inclusivity. Closed-source research from tech giants hampers collaborative progress in the field, hindering innovation and accessibility for the research community.

OpenVoice V2 integrates features from its predecessor and introduces Accurate Tone Color Cloning, Flexible Voice Style Control, and Zero-shot Cross-lingual Voice Cloning. The model’s simplicity lies in decoupling tone color cloning from style and language control, achieved through a base speaker TTS model and a tone color converter. The TTS model handles style and language, while the converter embodies the reference speaker’s tone color. Training involves collecting datasets for TTS and tone color conversion separately. The model structure employs flow layers for tone color conversion, ensuring natural sound while removing tone color information. The approach facilitates fluent multilingual speech generation.
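To make the decoupling concrete, here is a minimal, hypothetical Python sketch of that two-stage pipeline. The class and method names (BaseSpeakerTTS, ToneColorConverter, clone_voice) are illustrative placeholders rather than the actual OpenVoice API: the base TTS stage handles language and style, and the converter stage applies the reference speaker's tone color.

```python
# Hypothetical sketch of the decoupled OpenVoice-style pipeline described above.
# Class and method names are illustrative, not the real OpenVoice interface.

import numpy as np

class BaseSpeakerTTS:
    """Generates speech in a requested language and style (emotion, accent, rhythm)."""
    def synthesize(self, text: str, language: str, style: dict) -> np.ndarray:
        # Placeholder: a real model would return a waveform conditioned on
        # the text, the language, and the style parameters.
        return np.zeros(16000, dtype=np.float32)

class ToneColorConverter:
    """Flow-based converter that swaps the base speaker's tone color for the reference speaker's."""
    def extract_tone_color(self, reference_audio: np.ndarray) -> np.ndarray:
        # Placeholder: embed the reference speaker's tone color from a short clip.
        return np.zeros(256, dtype=np.float32)

    def convert(self, audio: np.ndarray, tone_color: np.ndarray) -> np.ndarray:
        # Placeholder: re-synthesize the audio with the target tone color
        # while preserving content, style, and language.
        return audio

def clone_voice(text, language, style, reference_audio):
    tts = BaseSpeakerTTS()
    converter = ToneColorConverter()
    base_audio = tts.synthesize(text, language, style)          # style and language handled here
    tone_color = converter.extract_tone_color(reference_audio)  # speaker identity captured here
    return converter.convert(base_audio, tone_color)

# Example: clone a speaker into Spanish with a cheerful style from a short reference clip.
# output = clone_voice("Hola, ¿cómo estás?", "es", {"emotion": "cheerful"}, reference_audio)
```

Because speaker identity lives entirely in the converter, the base TTS model can add new languages or styles without retraining the cloning component, which is what makes the zero-shot cross-lingual behavior possible.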

The evaluation of voice cloning faces challenges in objectivity due to variations in training/test sets and objectives across studies. OpenVoice focuses on tone color cloning, style parameter control, and cross-lingual cloning. Rather than numerical comparisons, it emphasizes qualitative analysis, offering publicly available audio samples for assessment. It accurately clones tone color across diverse voice distributions, preserves various speech styles, and enables cross-lingual cloning with minimal speaker data. OpenVoice’s feed-forward structure ensures rapid inference, achieving 12× real-time performance on a single A10G GPU, with potential for further optimization.

In conclusion, OpenVoice V2 enhances audio quality through a revised training strategy and introduces native English, Spanish, French, Chinese, Japanese, and Korean support. V1 and V2 are now available for free commercial use under the MIT License. Building upon V1’s features, V2 excels in tone color cloning across languages and accents, offers precise control over voice styles, and enables zero-shot cross-lingual cloning. By decoupling tone color cloning from other voice styles and languages, OpenVoice achieves greater flexibility and provides its source code and model weights for future research.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.




Source link

29Apr

Top Data Science Courses in 2024


As businesses increasingly rely on data-driven decision-making, the ability to extract insights and derive value from data has become quite essential. Acquiring skills in data science enables professionals to unlock new opportunities for innovation and gain a competitive edge in today’s digital age. This article lists the top data science courses one should take to master the necessary skills and meet the growing demand for data expertise in various industries.

IBM Data Science Professional Certificate

This course helps master the practical skills and knowledge necessary for a proficient data scientist. It is a beginner-friendly course that teaches the tools, languages, and libraries data scientists use, such as Python and SQL. The course allows the students to demonstrate their proficiency in data science using real-world projects.

Data Science Specialization

“Data Science Specialization” covers the concepts and tools required throughout the entire data science pipeline. The course also has a separate section on statistics, which is essential for data science. It uses the R language for all programming tasks, such as data analysis, statistical inference, and building machine learning models.

Applied Data Science with Python Specialization

This course is ideal for learners with a basic programming background. It teaches data science through Python and covers its libraries, such as matplotlib, pandas, nltk, scikit-learn, and networkx, covering topics like information visualization, text analysis, and social network analysis.

Programming for Data Science with Python

This course covers the programming skills required to discover patterns and insights in extensive datasets, execute queries using relational databases, and utilize the Unix shell and Git. It includes instruction on libraries such as NumPy and Pandas, along with core Python concepts like control flow.

Python for Data Science

This course introduces a comprehensive set of tools crucial for data analysis and conducting data science. It covers Jupyter Notebooks, Pandas, NumPy, Matplotlib, Git, and numerous other tools. Through engaging with compelling data science problems, students will acquire proficiency in utilizing these tools, gaining practical experience within a real-world context.

Data Science: R Basics

This course introduces the basics of R programming and moves on to cover advanced topics such as probability, inference, regression, and machine learning. It also covers data manipulation using dplyr, visualization with ggplot2, file management in UNIX/Linux, version control through Git and GitHub, and creating reproducible documents with RStudio.

Applied Data Science Specialization

This course covers the tools needed to analyze data and make data-driven business decisions, leveraging computer science and statistical analysis. Through lectures, hands-on labs, and projects hosted in the IBM Cloud, students gain practical experience addressing intriguing data challenges from beginning to end.

Data Science with Python Certification Course

This course is designed to help you become proficient in key Python programming principles, including data and file operations, object-oriented programming, and essential Python libraries like Pandas, NumPy, and Matplotlib for data science. It is tailored for both professionals and beginners and covers various machine learning (ML) techniques, recommendation systems, and other important ML concepts.

Foundations of Data Science

This course is intended for those already in the industry and helps develop the skills needed to apply for more advanced data professional roles. It covers the project workflow PACE (Plan, Analyze, Construct, Execute) and explains how it can help organize data projects. 

Associate Data Scientist in Python

This course is designed by DataCamp, and it enables learners to apply theoretical concepts by executing code directly in the browser. It thoroughly explores libraries such as pandas, Seaborn, Matplotlib, scikit-learn, and others. Additionally, it provides opportunities for learners to engage with real-world datasets, mastering statistical and machine learning techniques necessary for hypothesis testing and constructing predictive models.


We make a small profit from purchases made via referral/affiliate links attached to each course mentioned in the above list.

If you want to suggest any course that we missed from this list, then please email us at 

as**@ma**********.com












Shobha is a data analyst with a proven track record of developing innovative machine-learning solutions that drive business value.




Source link

28Apr

This AI Paper from Google DeepMind Introduces Enhanced Learning Capabilities with Many-Shot In-Context Learning


In-context learning (ICL) in large language models (LLMs) utilizes input-output examples to adapt to new tasks without altering the underlying model architecture. This method has transformed how models handle various tasks by learning from direct examples provided during inference. The problem at hand is the limitation of few-shot ICL in handling intricate tasks. These tasks often demand a deep comprehension that few-shot learning cannot provide, as it operates under the restriction of minimal input data. This limitation is especially problematic for applications that require detailed analysis and decision-making based on extensive data, such as advanced reasoning or language translation.

Existing research in the field of ICL has primarily focused on the few-shot learning capabilities of models like GPT-3, which adapt to new tasks with a limited set of examples. Studies have investigated the performance limits of these models within small context windows, revealing constraints in task complexity and scalability. The development of models with larger context windows, such as Gemini 1.5 Pro, which supports up to 1 million tokens, represents a significant evolution. This expansion allows for exploring many-shot ICL, greatly enhancing the models’ ability to process and learn from a larger dataset.

Researchers from Google Deepmind have introduced a shift toward many-shot ICL, leveraging larger context windows of models like Gemini 1.5 Pro. This move from few-shot to many-shot learning utilizes increased input examples, significantly enhancing model performance and adaptability across complex tasks. The unique aspect of this methodology is the integration of Reinforced ICL and Unsupervised ICL, which reduce reliance on human-generated content by employing model-generated data and domain-specific inputs alone.

In terms of methodology, the Gemini 1.5 Pro model was employed to handle an expanded array of input-output examples, supporting up to 1 million tokens in its context window. This allowed the exploration of Reinforced ICL, where the model generates and evaluates its rationales for correctness, and Unsupervised ICL, which challenges the model to operate without explicit rationales. The experiments were conducted across diverse domains, including machine translation, summarization, and complex reasoning tasks, using datasets like MATH for mathematical problem-solving and FLORES for machine translation tasks to test and validate the effectiveness of the many-shot ICL framework.
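As an illustration of how a Reinforced ICL prompt can be assembled, the short Python sketch below shows one plausible construction: the model first generates rationales for training problems, only rationales whose final answer matches the ground truth are kept, and the verified examples are concatenated into a many-shot prompt. The helper names (model_generate, extract_answer, build_reinforced_icl_prompt) and the "Answer:" format are assumptions for illustration, not the paper's actual code.

```python
# A minimal sketch of Reinforced ICL prompt construction. It assumes a caller-supplied
# `model_generate(prompt)` that returns a step-by-step rationale ending in "Answer: <value>".

def extract_answer(rationale: str) -> str:
    """Pull the final answer, assuming the rationale ends with a line like 'Answer: 42'."""
    for line in reversed(rationale.strip().splitlines()):
        if line.startswith("Answer:"):
            return line.removeprefix("Answer:").strip()
    return ""

def build_reinforced_icl_prompt(train_problems, model_generate, new_problem, shots=50):
    """Keep only model-generated rationales whose final answer matches the ground truth,
    then pack the verified examples into a single many-shot prompt."""
    examples = []
    for problem, gold_answer in train_problems:
        rationale = model_generate(f"Solve step by step.\n\nProblem: {problem}")
        if extract_answer(rationale) == gold_answer:  # correctness filter replaces human rationales
            examples.append(f"Problem: {problem}\n{rationale}")
        if len(examples) >= shots:
            break
    shots_text = "\n\n".join(examples)
    return f"{shots_text}\n\nProblem: {new_problem}\nSolve step by step."

# Unsupervised ICL, by contrast, would fill the prompt with problems only (no rationales),
# relying on the model's own knowledge of how to solve them.
```

The large context window is what makes this practical: with up to 1 million tokens, hundreds of such verified examples can fit in a single prompt rather than the handful that few-shot ICL allows.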

The results from implementing many-shot ICL demonstrate significant performance enhancements. In machine translation tasks, the Gemini 1.5 Pro model outperformed previous benchmarks, achieving a 4.5% increase in accuracy for Kurdish and a 1.5% increase for Tamil translations compared to earlier models. In mathematical problem-solving, the MATH dataset showed a 35% improvement in solution accuracy when using many-shot settings. These quantitative outcomes validate the effectiveness of many-shot ICL in enhancing the model’s adaptability and accuracy across diverse and complex cognitive tasks.

In conclusion, the research marks a significant step forward in ICL by transitioning from few-shot to many-shot ICL using the Gemini 1.5 Pro model. By expanding the context window and integrating innovative methodologies like Reinforced and Unsupervised ICL, the study has successfully enhanced model performance across various tasks, including machine translation and mathematical problem-solving. These advancements not only improve the adaptability and efficiency of large language models but also pave the way for more sophisticated applications in AI.


Check out the Paper. All credit for this research goes to the researchers of this project.



Nikhil is an intern consultant at Marktechpost. He is pursuing an integrated dual degree in Materials at the Indian Institute of Technology, Kharagpur. Nikhil is an AI/ML enthusiast who is always researching applications in fields like biomaterials and biomedical science. With a strong background in Material Science, he is exploring new advancements and creating opportunities to contribute.






Source link
