06Jun

AI Engineer at Bristol Myers Squibb – Princeton, Nassau Park – NJ


Working with Us
Challenging. Meaningful. Life-changing. Those aren’t words that are usually associated with a job. But working at Bristol Myers Squibb is anything but usual. Here, uniquely interesting work happens every day, in every department. From optimizing a production line to the latest breakthroughs in cell therapy, this is work that transforms the lives of patients, and the careers of those who do it. You’ll get the chance to grow and thrive through opportunities uncommon in scale and scope, alongside high-achieving teams rich in diversity. Take your career farther than you thought possible.

Bristol Myers Squibb recognizes the importance of balance and flexibility in our work environment. We offer a wide variety of competitive benefits, services and programs that provide our employees with the resources to pursue their goals, both at work and in their personal lives. Read more: careers.bms.com/working-with-us.

Key Responsibilities   

  • Design, develop, and implement AI solutions to address various customer needs and enhance business value. 
  • Build and maintain infrastructure for cloud-based AI platforms, including data storage, processing, and analytics. 
  • Collaborate with cross-functional teams to integrate cloud-based AI solutions into existing systems and applications. 
  • Analyze complex data sets and design effective strategies to optimize cloud-based AI system performance. 
  • Ensure the security and compliance of AI and cloud solutions in accordance with industry standards and best practices. 
  • Continuously research new trends and technologies in AI and cloud computing to recommend improvements to existing systems and processes. 
  • Provide technical support and guidance to teams, addressing their AI and cloud-based needs in a timely and efficient manner. 
  • Participate in continuous professional development to stay current with industry trends and advancements in AI and cloud technology. 
  • Help scale solutions into production.

Qualifications & Experience  

  • Minimum of 3-5 years of software development experience, with a focus on cloud computing and data engineering. 
  • Proficiency in cloud computing platforms, such as AWS or Azure. 
  • Experience with programming languages such as Python or Node.js. 
  • Familiarity with database systems, including NoSQL databases. 
  • Excellent problem-solving and analytical skills. 
  • Strong communication and team collaboration abilities. 
  • Experience designing and implementing production-ready solutions using cloud infrastructure.  
  • Good understanding of software development and deployment lifecycle. 

If you come across a role that intrigues you but doesn’t perfectly line up with your resume, we encourage you to apply anyway. You could be one step away from work that will transform your life and career.

Uniquely Interesting Work, Life-changing Careers
With a single vision as inspiring as “Transforming patients’ lives through science™ ”, every BMS employee plays an integral role in work that goes far beyond ordinary. Each of us is empowered to apply our individual talents and unique perspectives in an inclusive culture, promoting diversity in clinical trials, while our shared values of passion, innovation, urgency, accountability, inclusion and integrity bring out the highest potential of each of our colleagues.

On-site Protocol

BMS has a diverse occupancy structure that determines where an employee is required to conduct their work. This structure includes site-essential, site-by-design, field-based and remote-by-design jobs. The occupancy type that you are assigned is determined by the nature and responsibilities of your role:

Site-essential roles require 100% of shifts onsite at your assigned facility. Site-by-design roles may be eligible for a hybrid work model with at least 50% onsite at your assigned facility. For these roles, onsite presence is considered an essential job function and is critical to collaboration, innovation, productivity, and a positive Company culture. For field-based and remote-by-design roles the ability to physically travel to visit customers, patients or business partners and to attend meetings on behalf of BMS as directed is an essential job function.

BMS is dedicated to ensuring that people with disabilities can excel through a transparent recruitment process, reasonable workplace accommodations/adjustments and ongoing support in their roles. Applicants can request a reasonable workplace accommodation/adjustment prior to accepting a job offer. If you require reasonable accommodations/adjustments in completing this application, or in any part of the recruitment process, direct your inquiries to ad****************@bm*.com. Visit careers.bms.com/eeo-accessibility to access our complete Equal Employment Opportunity statement.

BMS cares about your well-being and the well-being of our staff, customers, patients, and communities. As a result, the Company strongly recommends that all employees be fully vaccinated for Covid-19 and keep up to date with Covid-19 boosters.

BMS will consider for employment qualified applicants with arrest and conviction records, pursuant to applicable laws in your area.

Any data processed in connection with role applications will be treated in accordance with applicable data privacy policies and regulations.



Source link

06Jun

Advanced Plagiarism Detector Using Python and AI [4 Methods] | by Hasan Aboul Hasan


Other than that, the code should be simple to read and understand, given all the comments I added throughout the code😅 However, in case you found something unclear and you need some help, don’t hesitate to drop your questions on the forum!

In this method, we’ll be directly comparing both articles as a whole without chunking them by converting both of them into vector embeddings. Then, using cosine similarity, we’ll see if they’re similar to each other.

from scipy.spatial.distance import cosine
import time
import resources
import openai

def convert_to_vector(text):
    """
    Converts a given piece of text into a vector using OpenAI's embeddings API.
    """
    text = text.replace("\n", " ")  # Remove newlines for consistent embedding processing
    response = openai.embeddings.create(
        input=[text],
        model="text-embedding-3-small"
    )
    return response.data[0].embedding  # Return the embedding vector

def calculate_cosine_similarity(vec1, vec2):
    """
    Calculates the cosine similarity between two vectors, representing the similarity of their originating texts.
    """
    return 1 - cosine(vec1, vec2)  # cosine() returns the cosine distance, so 1 minus this value gives the similarity

def is_similarity_significant(similarity_score):
    """
    Determines if a cosine similarity score indicates significant semantic similarity, implying potential plagiarism.
    """
    threshold = 0.7  # Define a threshold for significant similarity; adjust based on empirical data
    return similarity_score >= threshold  # Return True if the similarity is above the threshold, False otherwise

def search_semantically_similar(text_to_check):
    """
    Compares the semantic similarity between the input text and a predefined article text.
    It returns a list containing the similarity score and a boolean indicating whether
    the similarity is considered significant.
    """
    result = []  # Initialize an empty list to store the similarity score and significance flag
    input_vector = convert_to_vector(text_to_check)  # Convert the input text to a vector using an embedding model

    article_text = resources.article_two  # resources.article_two contains the text of the article to compare with

    article_vector = convert_to_vector(article_text)  # Convert the article text to a vector

    similarity = calculate_cosine_similarity(input_vector, article_vector)  # Calculate the cosine similarity between the two vectors

    result.append(similarity)  # Append the similarity score to the list
    result.append(is_similarity_significant(similarity))  # Append the result of the significance check to the list

    return result  # Return the list containing the similarity score and significance flag

def calculate_plagiarism_score(text):
    """
    Calculates the plagiarism score of a given text by comparing its semantic similarity
    with a predefined article text. The score is expressed as a percentage.
    """
    data = search_semantically_similar(text)  # Obtain the similarity data for the input text
    data[0] = data[0] * 100  # Convert the similarity score to a percentage

    return data  # Return the plagiarism score and significance flag

#MAIN SECTION
start_time = time.time()  # Record the start time of the operation
text_to_check = resources.article_one  # Assign the text to check for plagiarism
plagiarism_score, significance = calculate_plagiarism_score(text_to_check)  # Unpack once to avoid calling the embeddings API twice
end_time = time.time()  # Record the end time of the operation
runtime = end_time - start_time  # Calculate the total runtime
# Output the results
print(f"Plagiarism Score: {plagiarism_score}%")  # Print the calculated plagiarism score
print(f"Is result Significant: {significance}")  # Print the significance of the score
print(f"Runtime: {runtime} seconds")  # Print the total runtime of the script

As you can see, the code is very similar in structure to method 1. However, the search_semantically_similar function was edited to directly turn both articles into vectors, compare them, and return the result without chunking.

Plus, I added the calculate_plagiarism_score function, which takes the similarity score and generates a percentage of it. Then, it will return the percentage score and True/False statement if the plagiarism score is significant, which will be analyzed by comparing the cosine similarity score with the threshold I initiated to be 0.7
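
To get a feel for what that 0.7 threshold means, here is a minimal sketch using small made-up vectors (real OpenAI embeddings have far more dimensions) that shows how the distance-to-similarity conversion behaves:

from scipy.spatial.distance import cosine

# Toy vectors standing in for embeddings, purely for illustration
identical = [0.2, 0.8, 0.5]
related = [0.25, 0.7, 0.6]
unrelated = [0.9, -0.3, 0.1]

print(1 - cosine(identical, identical))  # 1.0 -> identical texts, well above the 0.7 threshold
print(1 - cosine(identical, related))    # ~0.99 -> very similar, flagged as significant
print(1 - cosine(identical, unrelated))  # ~0.0 -> unrelated, not significant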

Now it’s time for AI to enter the battlefield😂

This method is the same as method 1 in concept; however, instead of comparing the chunks by embedding them into vectors and generating the cosine similarity, we’ll compare them using a power prompt and OpenAI’s GPT model.

from SimplerLLM.tools.text_chunker import chunk_by_paragraphs
from SimplerLLM.language.llm import LLM, LLMProvider
import time
import resources
import json

def compare_chunks(text_chunk):
    """
    Compares a text chunk with an article text and generates a response using OpenAI's model.
    """
    article_text = resources.article_two  # The text to compare against
    prompt = resources.prompt3  # A template string for creating the comparison prompt
    final_prompt = prompt.format(piece=text_chunk, article=article_text)  # Formatting the prompt with the chunk and article texts
    llm_instance = LLM.create(provider=LLMProvider.OPENAI)  # Creating an instance of the language model
    response = llm_instance.generate_text(final_prompt)  # Generating text/response from the LLM
    response_data = json.loads(response)  # Parsing the response into a JSON object
    return response_data  # Returning the parsed response data

def calculate_plagiarism_score(text):
    """
    Calculates the plagiarism score of a text by comparing its chunks against an article text
    and evaluating the responses from OpenAI's model.
    """
    text_chunks = chunk_by_paragraphs(text)  # Split the input text into chunks using SimplerLLM's built-in method
    total_chunks = text_chunks.num_chunks  # The total number of chunks in the input text
    similarities_json = {}  # Dictionary to store similarities found
    chunk_index = 1  # Index counter for naming the chunks in the JSON
    plagiarised_chunks_count = 0  # Counter for the number of chunks considered plagiarised
    total_scores = 0  # Sum of scores from the LLM responses
    for chunk in text_chunks.chunks:
        response_data = compare_chunks(chunk.text)  # Compare each chunk using the LLM
        total_scores += response_data["score"]  # Add the score from this chunk to the total scores
        if response_data["score"] > 6:  # A score above 6 indicates plagiarism
            plagiarised_chunks_count += 1
            similarities_json[f"chunk {chunk_index}"] = response_data["article"]  # Record the article text identified as similar
        chunk_index += 1  # Increment the chunk index
    plagiarism_result_json = {}  # Dictionary to store the final plagiarism results
    plagiarism_score = (plagiarised_chunks_count / total_chunks) * 100 if total_chunks > 0 else 0  # Calculate the plagiarism score as a percentage
    plagiarism_result_json["Score"] = plagiarism_score
    plagiarism_result_json["Similarities"] = similarities_json  # Adding where we found similarities
    plagiarism_result_json["IsPlagiarised"] = (total_scores > total_chunks * 6)  # Recording whether the text as a whole is considered plagiarised
    return plagiarism_result_json  # Return the plagiarism results as a dictionary

#MAIN SECTION
start_time = time.time()  # Record the start time of the operation
text_to_check = resources.article_one  # Assign the text to check for plagiarism
plagiarism_score = calculate_plagiarism_score(text_to_check)
formatted_plagiarism_score = json.dumps(plagiarism_score, indent=2)  # Format the output for better readability
end_time = time.time()  # Record the end time of the operation
runtime = end_time - start_time  # Calculate the total runtime
# Output the results
print(f"Plagiarism Score: {formatted_plagiarism_score}")  # Print the calculated plagiarism score
print(f"Runtime: {runtime} seconds")  # Print the total runtime of the script

In the code, the main function is calculate_plagiarism_score, which chunks the article, sends each chunk to the compare_chunks function to get a similarity score, generates a total plagiarism score, and formats the results as JSON so the output includes some details beyond the plagiarism score while staying clear and readable.

The compare_chunks function creates a GPT instance using SimplerLLM, then uses a power prompt to analyze both chunks and generate a score out of 10 for how similar they are. Here’s the prompt I’m using:

### TASK
You are an expert in plagiarism checking. Your task is to analyze two pieces of text, an input chunk,
and an article. Then you're gonna check if there are pieces of the article that are similar in meaning to
the input chunk. After that you're gonna pick the piece of article which is most similar and generate for it
a score out of 10 for how similar it is to the input chunk. Then you're gonna need to generate the output
as a JSON format that contains the input chunk, the article chunk which is the most similar, and the score
out of 10.
### SCORING CRITERIA
When checking for pieces in the article that are close in meaning to the chunk of text make sure you
go over the article at least 2 times to make sure you picked the right chunk in the article which is the most
similar to the input chunk. Then when picking a score it should be based on how similar the meanings
and structure of both these sentences are.
### INPUTS
input chunk: [{piece}]
article: [{article}]
### OUTPUT
The output should be only a valid JSON format nothing else, here's an example structure:
{{
"chunk": "[input chunk]",
"article": "[chunk from article which is similar]",
"score": [score]
}}

As you can see, it is a detailed prompt that is very well crafted to generate a specific result. You can learn how to craft similar prompts yourself by becoming a Prompt Engineer.

This method is a combination of methods 2 and 3, where we’re gonna be comparing both articles as a whole but using AI instead of vector embeddings.

from SimplerLLM.language.llm import LLM, LLMProvider
import time
import resources
import json

def compare_chunks(text_chunk):
    """
    Compares a given text chunk with an article to determine plagiarism using a language model.

    Returns dict: The response from the language model, parsed as a JSON dictionary.
    """
    article_text = resources.article_two  # The text to compare against
    # Formatting the prompt to include both the input text chunk and the article text
    comparison_prompt = resources.prompt4.format(piece=text_chunk, article=article_text)
    llm_instance = LLM.create(provider=LLMProvider.OPENAI)  # Creating an instance of the language model
    response = llm_instance.generate_text(comparison_prompt)  # Generating the response
    response_data = json.loads(response)  # Parsing the response string into a JSON dictionary
    return response_data  # Returning the parsed JSON data

def calculate_plagiarism_score(text_to_analyze):
    """
    Calculates the plagiarism score based on the analysis of a given text against a predefined article text.

    Returns dict: A JSON dictionary containing the plagiarism score and the raw data from the analysis.
    """
    plagiarism_results = {}  # Dictionary to store the final plagiarism score and analysis data
    plagiarised_chunk_count = 0  # Counter for chunks considered plagiarised
    analysis_data = compare_chunks(text_to_analyze)  # Analyze the input text for plagiarism
    total_chunks = len(analysis_data)  # Total number of chunk pairs analyzed

    for key, value in analysis_data.items():
        # Check if the value is a list with at least one item and contains a 'score' key
        if isinstance(value, list) and len(value) > 0 and 'score' in value[0] and value[0]['score'] > 6:
            plagiarised_chunk_count += 1
        # Check if the value is a dictionary and contains a 'score' key
        elif isinstance(value, dict) and 'score' in value and value['score'] > 6:
            plagiarised_chunk_count += 1
    plagiarism_score = (plagiarised_chunk_count / total_chunks) * 100 if total_chunks > 0 else 0  # Calculate plagiarism score as a percentage
    plagiarism_results["Total Score"] = plagiarism_score  # Add the score to the results dictionary
    plagiarism_results["Data"] = analysis_data  # Add the raw analysis data to the results dictionary
    return plagiarism_results  # Return the final results dictionary

#MAIN SECTION
start_time = time.time()  # Record the start time of the operation
text_to_check = resources.article_one  # Assign the text to check for plagiarism
plagiarism_score = calculate_plagiarism_score(text_to_check)
formatted_plagiarism_score = json.dumps(plagiarism_score, indent=2)  # Format the output for better readability
end_time = time.time()  # Record the end time of the operation
runtime = end_time - start_time  # Calculate the total runtime
# Output the results
print(f"Plagiarism Score: {formatted_plagiarism_score}")  # Print the scores
print(f"Runtime: {runtime} seconds")  # Print the total runtime of the script

This code is about 80% the same as the code in method 3. However, instead of comparing each chunk, we send both articles as a whole and let OpenAI’s GPT generate a detailed plagiarism analysis, comparing all parts of the articles as it sees fit. In the end, it returns a detailed output containing a plagiarism score and the sections found to be most similar, along with their similarity scores.

All this is done using this power prompt:

### TASK
You are an expert in plagiarism checking. Your task is to analyze two pieces of text, an input text,
and an article. Then you're gonna check if there are pieces of the article that are similar in meaning to
the pieces of the input text. After that you're gonna pick chunk pairs that are most similar to each other
in meaning and structure, a chunk from the input text and a chunk from the article. You will then generate
a score out of 10 for each pair for how similar they are.
Then you're gonna need to generate the output as a JSON format for each pair that contains
the input text chunk, the article chunk which are the most similar, and the score out of 10.
### SCORING CRITERIA
When checking for pieces in the article that are close in meaning to the chunk of text make sure you
go over the article at least 2 times to make sure you picked the right pairs of chunks which are most similar.
Then when picking a score it should be based on how similar the meanings and structure of both these sentences are.
### INPUTS
input text: [{piece}]
article: [{article}]
### OUTPUT
The output should be only a valid JSON format nothing else, here's an example structure:
{{
"pair 1":
{{
"chunk 1": "[chunk from input text]",
"article 1": "[chunk from article which is similar]",
"score": [score]
}},
"pair 2":
{{
"chunk 2": "[chunk from input text]",
"article 2": "[chunk from article which is similar]",
"score": [score]
}},
"pair 3":
{{
"chunk 3": "[chunk from input text]",
"article 3": "[chunk from article which is similar]",
"score": [score]
}},
"pair 4":
{{
"chunk 4": "[chunk from input text]",
"article 4": "[chunk from article which is similar]",
"score": [score]
}}
}}

The prompts in methods 3 and 4 need to be well crafted, since all the results depend on them. Feel free to tweak and optimize them to your liking, and if you get better results, make sure to share them with us in the comments below!

After we tried 2 types of machines to do the work for us, let’s now use human intelligence and see if their results are significant!

Here are the 2 texts I was comparing:

Article 1: What is generative AI? Generative AI refers to deep-learning models that can generate high-quality text, images, and other content based on the data they were trained on. Artificial intelligence has gone through many cycles of hype, but even to skeptics, the release of ChatGPT seems to mark a turning point. OpenAI's chatbot, powered by its latest large language model, can write poems, tell jokes, and churn out essays that look like a human created them. 
Prompt ChatGPT with a few words, and out comes love poems in the form of Yelp reviews, or song lyrics in the style of Nick Cave.



Source link

05Jun

AI Architect – Evergreen at Dell Technologies – Bengaluru, India


Senior Software Principal Engineer and Technical Staff, Software Engineering

The Software Engineering team delivers next-generation software application enhancements and new products for a changing world. Working at the cutting edge, we design and develop software for platforms, peripherals, applications and diagnostics — all with the most advanced technologies, tools, software engineering methodologies and the collaboration of internal and external partners.

Join us to do the best work of your career and make a profound social impact as a Technical Staff and Senior Software Principal Engineer on our Software Engineering Team in Bangalore

What you’ll achieve

  • As a Technical Staff member, you will play the role of GenAI Architect, designing and developing advanced artificial intelligence solutions for various Client Software Applications, in collaboration with cross-functional teams to understand business requirements and implement/deploy AI models and systems.
  • You will work with some of the best minds inside Dell, as well as industry SMEs, to lead the next generation of AI-powered Client PCs and client SW products that deliver a World Class Customer Experience. 

You will:

  • Closely work with product managers, data scientists, engineering teams, and other stakeholders to understand business goals and determine AI requirements.
  • Evaluate and select appropriate AI technologies, tools, and frameworks to achieve the set requirements.
  • Design and develop AI architectures and algorithms to support complex AI solutions.
  • Lead the development and implementation of AI models, using industry best practices in AI and machine learning.
  • Stay up to date with the latest advancements in AI technologies and identify opportunities for Client applications.

Take the first step towards your dream career
 

Every Dell Technologies team member brings something unique to the table. Here’s what we are looking for with this role:

Essential Requirements

  • 13-22 years of experience working as an AI architect, data scientist, or in a related role, with strong knowledge of Machine Learning, Deep Learning, and Natural Language Processing (NLP) techniques.
  • Proficiency in programming languages such as Python, Java, or C++, and familiarity with popular AI libraries and frameworks (e.g., TensorFlow, PyTorch). Good understanding of cloud computing platforms (e.g., Azure, AWS) and experience deploying AI models on these platforms.
  • Excellent problem-solving and analytical skills, with the ability to break down complex problems into actionable components.
  • Strong communication and collaboration skills, with the ability to work effectively within cross-functional teams.
  • Ability to stay updated with the latest advancements in AI technologies, frameworks, and platforms.

Desirable Requirements

  • Bachelor’s or Master’s degree in Computer Science with specialization in AI, Data Sciences
  • Knowledge of working in Scaled Agile environments.

Who we are

We believe that each of us has the power to make an impact. That’s why we put our team members at the center of everything we do. If you’re looking for an opportunity to grow your career with some of the best minds and most advanced tech in the industry, we’re looking for you.

Dell Technologies is a unique family of businesses that helps individuals and organizations transform how they work, live and play. Join us to build a future that works for everyone because Progress Takes All of Us.

Application closing date: July 2nd, 2024

Dell Technologies is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. Read the full Equal Employment Opportunity Policy here.



Source link

05Jun

Real World Evidence Research Analyst at Novartis – Dublin (Novartis Global Service Center (NGSC))


Job Description Summary

- Develops, supports and provides input for deliverables aligned with HEOR and access requirements.
- Executes the country's overall pricing strategy, including discounts, rebates and other pricing mechanisms for all new medicines.

 

Job Description

Major accountabilities:

  • Performs quality control (QC) checking / proofreading of assigned documents, including projects being handled by junior team members, to meet customer expectations.
  • Maintains audit, SOP and training compliance

Key performance indicators:

  • Preparation and coordination of the assigned documents, meeting the set quality standards and delivered on time for submission.
  • Completion of an adequate number of HEOR projects.
  • Pricing speed and efficiency: time between approval and end of P&R process; effective maintenance of prices over time; satisfaction of internal customers.

Minimum Requirements:
Work Experience:

  • Market and customer intelligence.

Skills:

  • Business Dashboards.
  • Business Management.
  • Category Management.
  • Data Analysis.
  • Data Analytics.
  • Databases.
  • Finance.
  • Health Economics.
  • Health Technology Assessment (Hta).
  • Public Health.
  • Quality Center (Qc).
  • Quality Control.
  • Sql (Structured Query Language).




Source link

05Jun

Senior Machine Learning Engineer at BlackStone eIT – Egypt – Remote


BlackStone eIT, a leading computer software company, is seeking a highly skilled and experienced Senior Machine Learning Engineer to join our dynamic team. As a Senior Machine Learning Engineer at BlackStone eIT, you will be responsible for designing, developing, and deploying state-of-the-art machine learning models and algorithms. You will work closely with cross-functional teams to analyze complex data, identify opportunities for applying machine learning techniques, and lead the development and implementation of solutions to solve challenging business problems.

In this role, you will have the opportunity to work on cutting-edge projects, collaborate with industry experts, and make significant contributions to the company’s success. We are seeking individuals who are passionate about machine learning, possess strong analytical and problem-solving skills, and have a proven track record in delivering successful machine learning solutions.

Requirements

  • Experience: 3-5 years
  • Proven experience as a Machine Learning Engineer or similar role
  • Understanding of data structures, data modeling and software architecture
  • Deep knowledge of math, probability, statistics and algorithms
  • Ability to write robust code in Python
  • Outstanding analytical and problem-solving skills
  • Familiarity with machine learning frameworks (like Tensorflow or PyTorch) and libraries (like scikit-learn)
  • Excellent communication skills
  • Ability to work in a team
  • BSc in Computer Science, Mathematics or a similar field; a Master’s degree is a plus

Responsibilities

  • Study and transform data science prototypes
  • Research and implement appropriate ML algorithms and tools
  • Develop machine learning applications according to requirements
  • Select appropriate datasets and data representation methods
  • Run machine learning tests and experiments
  • Deploy the trained models and build REST APIs
  • Perform statistical analysis and fine-tuning using test results
  • Train and retrain systems when necessary
  • Extend existing ML libraries and frameworks
  • Study project requirements
  • Write technical details in BRDs
  • Keep abreast of developments in the field

Benefits

  • Paid Time Off
  • Work From Home
  • Performance Bonus
  • Training & Development



Source link

05Jun

Find Similar Research Papers In 1 Minute with AI and Python! | by Hasan Aboul Hasan


An obstacle most people face when writing academic research papers is finding similar papers easily. I myself faced this problem because it takes too much time to do so.

So, I built a Python script powered by AI that extracts the main keywords from an input abstract and then fetches related abstracts from Arxiv.

ArXiv is an open-access archive for nearly 2.4 million academic articles in physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

Why choose Arxiv?

Simply because it already contains a lot of articles and has a free built-in API, which makes it easy to access any article’s abstract directly. This bypasses the need to search the web using paid APIs for articles, check whether each one contains an abstract and, if it does, use an HTML parser to extract it — TOO MUCH WORK 🫠

http://export.arxiv.org/api/query?search_query=all:{abstract_topic}&start=0&max_results={max_results}

Enter your abstract’s topic instead of {abstract_topic}, and how many search results you want instead of {max_results}. Then, paste it into the web browser, and it’ll generate an XML file containing the summary (which is the abstract) and some other details about the article, like its ID, the authors, etc…
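
For example, here is a minimal sketch of calling that same endpoint from Python with the requests library (the topic "superconductivity" is just a placeholder):

import requests

abstract_topic = "superconductivity"  # placeholder keyword; use a topic from your abstract
max_results = 2  # how many search results to fetch

url = f"http://export.arxiv.org/api/query?search_query=all:{abstract_topic}&start=0&max_results={max_results}"
response = requests.get(url)  # arXiv answers with an Atom XML feed
print(response.status_code)   # 200 means the request succeeded
print(response.text[:500])    # peek at the first part of the raw XML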

The idea is simple; here’s the workflow:

1- Extract from the input abstract the top 5 keywords (topics) that are most representative of its content.

2- Call the API on each of the 5 keywords we extracted

3- Analyze the results and check the similarity of these abstracts to the input abstract.

To use Arxiv’s API, we need a keyword to search for, so we need to extract the top keywords present in the input abstract. You could use built-in libraries like nltk or spacy, but when I tried them, the results were not as accurate as I expected.

So, to get better results, I used OpenAI’s GPT-4 (you can use Gemini if you prefer it), gave it a power prompt, and generated optimal results. Here’s the code:

import json
from SimplerLLM.language.llm import LLM, LLMProvider

def extract_keywords(abstract):
    # Constructing a prompt for the language model to generate keywords from the abstract
    prompt = f"""
### TASK
You are an expert in text analysis and keyword extraction. Your task is to analyse an abstract I'm going to give you
and extract from it the top 5 keywords that are most representative of its content. Then you're going to generate
them in a JSON format in descending order from the most relevant to the least relevant.
### INPUTS
Abstract: {abstract}
### OUTPUT
The output should be in JSON format. Here's how it should look like:
[
{{"theme": "[theme 1]"}},
{{"theme": "[theme 2]"}},
{{"theme": "[theme 3]"}},
{{"theme": "[theme 4]"}},
{{"theme": "[theme 5]"}}
]
"""
    # Creating an instance of the language model using SimplerLLM
    llm_instance = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4")
    # Generating a response from the language model
    response = llm_instance.generate_text(user_prompt=prompt)
    # Attempting to parse the response as JSON
    try:
        response_data = json.loads(response)
        return json.dumps(response_data, indent=2)
    except json.JSONDecodeError:
        # Returning an error message if the response is not valid JSON
        return json.dumps({"error": "Invalid response from LLM"}, indent=2)

This function uses SimplerLLM, which facilitates the process of calling OpenAI’s API without writing tedious code. In addition, it makes it very easy to use Gemini’s API instead of OpenAI’s by simply changing the provider and model name when creating the LLM instance, like this:

llm_instance = LLM.create(provider=LLMProvider.GEMINI, model_name="gemini-pro")

Very nice, right?😉

Back to our code.

The power prompt I crafted is the main engine of the above function, so if it weren’t efficiently crafted, the code wouldn’t work at all.

### TASK
You are an expert in text analysis and keyword extraction. Your task is to analyse an abstract I'm going to give you and extract from it the top 5 keywords that are most representative of its content. Then you're going to generate them in a JSON format in descending order from the most relevant to the least relevant.
### INPUTS
Abstract: {abstract}
### OUTPUT
The output should be in JSON format. Here's how it should look like:
[
{{"theme": "[theme 1]"}},
{{"theme": "[theme 2]"}},
{{"theme": "[theme 3]"}},
{{"theme": "[theme 4]"}},
{{"theme": "[theme 5]"}}
]

As you can see, it is a detailed prompt that is very well crafted to generate a specific result. By becoming a Prompt Engineer, you can learn how to craft similar prompts yourself.

After running the above function, we’ll have a JSON-formatted output containing 5 keywords. So, we need to search for abstracts for each of the 5 keywords, and we’ll do that using Arxiv’s API.

However, when you run Arxiv’s API call, you get an XML file like this:
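
Abridged, the returned Atom feed has this structure (placeholder values):

<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>http://arxiv.org/abs/0000.00000v1</id>
    <title>Title of the paper</title>
    <summary>The abstract of the paper appears here...</summary>
    <author><name>Author Name</name></author>
  </entry>
</feed>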

So, to easily extract the ID and summary (abstract), we’ll import xml.etree.ElementTree, which helps us navigate and extract information from XML-formatted text.

import json
import requests
import xml.etree.ElementTree as ET

def get_abstracts(json_input):
    input_data = json.loads(json_input)
    all_summaries_data = []
    for theme_info in input_data:
        keyword = theme_info['theme']
        max_results = 1  # Number of results to fetch for each keyword
        # Constructing the query URL for the arXiv API
        url = f"http://export.arxiv.org/api/query?search_query=all:{keyword}&start=0&max_results={max_results}&sortBy=submittedDate&sortOrder=descending"

        response = requests.get(url)
        if response.status_code == 200:
            root = ET.fromstring(response.text)
            ns = {'atom': 'http://www.w3.org/2005/Atom'}
            summaries_data = []
            for entry in root.findall('atom:entry', ns):
                arxiv_id = entry.find('atom:id', ns).text.split('/')[-1]
                summary = entry.find('atom:summary', ns).text.strip()

                summaries_data.append({"ID": arxiv_id, "abstract": summary, "theme": keyword})
            all_summaries_data.extend(summaries_data[:max_results])
        else:
            print(f"Failed to retrieve data for theme '{keyword}'. Status code: {response.status_code}")
    json_output = json.dumps(all_summaries_data, indent=2)
    return json_output

In the above function, we’re looping over the 5 keywords we generated, and for each one, we’re calling the API, extracting the ID and abstract from the XML, saving them in a list, and formatting this list into JSON (easier to read).

How can we check for similarity between 2 abstracts? Again, AI 🤖

We’ll be using SimplerLLM again to create an OpenAI instance and a power prompt to perform the analysis and similarity checking.

import json
from SimplerLLM.language.llm import LLM, LLMProvider

def score_abstracts(abstracts, reference_abstract):
    new_abstracts = json.loads(abstracts)
    scored_abstracts = []
    for item in new_abstracts:
        prompt = f"""
### TASK
You are an expert in abstract evaluation and English Literature. Your task is to analyze two abstracts
and then check how similar abstract 2 is to abstract 1 in meaning. Then you're gonna generate
a score out of 10 for how similar they are. 0 being that they have nothing in common and are on different topics, and 10
being exactly the same. Make sure to go over them multiple times to check if your score is correct.
### INPUTS
Abstract 1: {reference_abstract}
Abstract 2: {item['abstract']}
### OUTPUT
The output should be only the number out of 10, nothing else.
"""
        llm_instance = LLM.create(provider=LLMProvider.OPENAI, model_name="gpt-4")
        # Generating the similarity score from the language model
        response = llm_instance.generate_text(user_prompt=prompt)

        # Extracting the score from the response and handling potential errors
        try:
            score = int(response)
            perfect_match = score == 10
        except ValueError:
            score = 0
            perfect_match = False

        scored_abstracts.append({
            "ID": item["ID"],
            "theme": item["theme"],
            "score": score,
            "perfect_match": perfect_match
        })

    return scored_abstracts

We’re gonna use the JSON output we got from the function above containing all abstracts and IDs, and we’ll loop over each abstract, run the power prompt on it with the input abstract, and get the similarity score.

As mentioned above, the power prompt is a crucial part of the function; if it is bad, the code won’t work. So, read this article to improve your prompt crafting skills.

After getting the score, if it is 10/10, then the abstract we found is a perfect match for the input abstract.
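
For example, once you have the scored list, picking out the perfect matches is a one-liner; here is a small usage sketch based on the dictionary keys returned by score_abstracts:

results = score_abstracts(abstracts, reference_abstract)  # list of dicts with ID, theme, score, perfect_match
perfect_matches = [r for r in results if r["perfect_match"]]  # abstracts that scored 10/10
print(perfect_matches)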

To run the code, you’re gonna have to create a .env file that contains your OpenAI API key or Gemini API key, like this:
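
A minimal sketch of that .env file (these are the conventional variable names; double-check the exact names SimplerLLM expects in its documentation):

OPENAI_API_KEY="sk-..."
GEMINI_API_KEY="..."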

And, of course, you’ll need to enter your input abstract to run the code on it:

# MAIN SCRIPT
reference_abstract = """
YOUR_ABSTRACT
"""
json_data = extract_keywords(reference_abstract)
abstracts = get_abstracts(json_data)
data = json.dumps(score_abstracts(abstracts, reference_abstract),indent=2)
print(data)

Plus, don’t forget to install all the necessary libraries, which you can install by running this in the terminal:

pip install requests simplerllm

Get Code

Now, although the script we created is working properly, why don’t we improve it a little?

The search for abstracts is limited to only Arxiv, and maybe there is a very similar copy to your abstract that is not available on Arxiv but on a different website. So, why don’t we tweak the code a little and make it search on Google directly for similar abstracts, and then turn it into a tool with a nice UI?

To do that, we’ll only need to update the get_abstracts function:

# Search for related abstracts according to keywords and get link and content
# (assumes search_with_value_serp and load_content have been imported from the SimplerLLM library)
import json

def get_google_results(json_input):
    keywords = json.loads(json_input)
    search_results = []
    for theme_info in keywords:
        keyword = theme_info['theme']
        query = f"{keyword} AND abstract AND site:edu AND -inurl:pdf"
        result = search_with_value_serp(query, num_results=1)
        for item in result:
            try:
                url = str(item.URL)
                load = load_content(url)  # Assumes load_content is a function that fetches content from the URL
                content = load.content
                search_results.append({"Link": url, "Content": content, "theme": keyword})
            except Exception as e:
                print(f"An error occurred with {url}: {e}")
                continue
    json_output = json.dumps(search_results, indent=2)
    return json_output

As you can see, the function now searches on Google using the search_with_value_serp function, which is integrated into the SimplerLLM library. Then, I used the load_content function, which is also part of SimplerLLM and makes it very easy to access the link’s title and content.

In addition, you have to add your VALUE_SERP_API_KEY to the .env file. This is what it will look like:
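
For example, a sketch of the updated .env file (again, confirm the exact variable names against the SimplerLLM documentation):

OPENAI_API_KEY="sk-..."
VALUE_SERP_API_KEY="..."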

Keep in mind that some keywords may not have a similar abstract available on Google, so the search may return nothing for them. Therefore, you might get fewer than 5 links to similar abstracts.

The code above is only a prototype that gives you a head start on this feature. You can improve it to get better results, design a nice user interface for it, and make a fully functional tool out of it. Then, you can build a SaaS business based on this tool.

In this way, you’ll have a monthly recurring income from these tools you built! Pretty nice, huh 😉

Remember, if you have any questions, make sure to drop them below in the comments section or on the forum.



Source link

05Jun

How Many Steps Forward? – European Law Blog


Blogpost 30/2024

The history of EU institutions is marked by a long list of statements and political initiatives that endorse the legal claims of the LGBTIQA+ community (see, for instance, Kollman and Bell). Over the past decades, these have gradually been mainstreamed within different areas of EU law. Particularly, the current EU legislative term (2019-2024) has witnessed an increased commitment of EU institutions towards the LGBTIQA+ community. This is not only shown by the numerous and recurrent Resolutions of the European Parliament on this topic (see EPRS). It is also evident from several political and legislative initiatives that have been introduced over recent years, which (attempt to) intervene in diverse fields of EU law that are considered as relevant to individuals that identify as LGBTIQA+.

Meanwhile, most EU law scholars focus their research on narrow areas, such as non-discrimination (mainly, in the field of employment) and free movement (of same-sex couples and their children). In other words, LGBTIQA+ issues never appear as the starting point of the analysis but rather as an incidental reference in the context of other research topics (on this point, see Belavusau). This piece aims to provide a deeper overview of the EU’s direct commitment towards the LGBTIQA+ community during the EU legislative term that is now coming to an end. It will thus retrace the different political, legislative, and judicial developments that have occurred and that have been marked as relevant for, or targeted to, LGBTIQA+ persons. Some contextual challenges of EU law vis-à-vis LGBTIQA+ matters will also be highlighted.

An EU Strategy for LGBTIQA+ Equality

Looking back at the very beginning of this EU legislative term, on 12 December 2020, the European Commission adopted, by way of a Communication, the EU LGBTIQ Equality Strategy (hereinafter, ‘the Strategy’). Unsurprisingly, the adoption of the Strategy comes during the EU legislative term in which the first-ever Commissioner for Equality was appointed. Likewise, a specific unit working on ‘non-discrimination and LGBTIQ’ matters has been established in the European Commission. Prior to the publication of the Strategy, some had argued that the EU is equipped with adequate legal bases to intervene in the fields of non-discrimination and equality for LGBTIQA+ persons. These are, for instance, the non-discrimination clause in Article 19 TFEU, or Article 81(3) TFEU as regards aspects of family law with cross-border implications. Yet, the potential of these provisions had been restrained by the absence of an overarching and coherent approach. The Strategy seems to have, at least in principle, addressed this gap.

Despite its non-binding nature, the Strategy has been considered a significant development for LGBTIQA+ persons in the EU for the following three main reasons. First, the Strategy has a strong symbolic value. It represents the first instrument in the history of EU integration that targets specifically the LGBTIQA+ community. Second, the Strategy provides a comprehensive approach, as it addresses the topic from different angles. Indeed, it is built on four major axes: i) tackling discrimination against LGBTIQ people; ii) ensuring LGBTIQ people’s safety; iii) building LGBTIQ inclusive societies; iv) leading the call for LGBTIQ equality around the world. Last, the Strategy is very detailed. It precisely identifies legislative and non-legislative initiatives to be achieved within a fixed timeline, thus serving as a planning instrument for the Commission’s action.

More recently, a survey conducted by the EU Fundamental Rights Agency shows that while there are signs of slow and gradual progress, discrimination against LGBTIQA+ persons remains dramatically high. This is also evident in ILGA-Europe’s annual rainbow map. As the end date of the Commission’s Strategy is approaching and EU elections are coming up, the question remains whether the next European Commission will develop a new instrument for LGBTIQA+ equality; or, as it will be argued below, try at least to fulfil the missed objectives of the current Strategy.

Recognition of same-sex parents and their children

On 7 December 2022, the European Commission proposed the Equality Package (hereinafter, ‘the Package’), a proposal for a Regulation to harmonise rules concerning parenthood in cross-border situations. One of the key aspects of the proposal is that once parental bonds are established in one Member State, these must be automatically recognised everywhere in the EU (for a deeper analysis of the Package, see Tryfonidou; see also Marcia).

The mutual recognition of same-sex parents and their children had also been addressed, just a year earlier, by the Court of Justice (CJEU) in the Pancharevo case (C-490/20). The dispute concerned a same-sex couple, a Bulgarian and a UK national. They gave birth to S.D.K.A. in Spain, where the couple had been married and was legally residing. Spain thus issued a birth certificate, as Spanish law recognises same-sex parenthood. Yet, Bulgarian authorities refused to issue a passport/ID for S.D.K.A since Bulgarian law does not recognise same-sex parenthood. This led to a preliminary question referred to the CJEU, namely whether such a refusal constituted a breach of EU free movement rights (notably, Articles 20 and 21 TFEU and Directive 2004/38). The Court ruled that the refusal to issue a passport or ID to S.D.K.A. would indeed alter the effectiveness of her right to move and reside freely within the Union. National authorities are thus required to recognise the parental bonds legally established in another Member State. This obligation, however, applies only for the purposes of the exercise of the right to free movement, while Member States remain free (not) to recognise same-sex parenthood within their internal legal orders (for a full overview of the judgment, see Tryfonidou; see also De Groot).

Despite the obligation stemming from this judgment, in practice, same-sex parents often experience long and expensive proceedings before national authorities. Indeed, the Commission stated that the key objective of the Equality Package is to reduce times, costs, and burdens of recognition proceedings for both families and national judicial systems. The proposed regulation would, in other words, ‘automatise’ the requirements introduced by the Court in Pancharevo (for the purposes of the exercise of the right to free movement). However, one of the biggest challenges to the adoption of the Package is its legal basis: Article 81(3) TFEU. This requires the Council to act unanimously under a special legislative procedure, after obtaining the consent of the European Parliament. If reaching unanimity among the 27 Member States is generally challenging, this becomes even more complex when the file concerns a topic on which Member States’ sensibilities and approaches differ dramatically. Indeed, some national governments, such as the Italian one, have already declared their unwillingness to support the Commission’s initiative (see, for instance, Marcia).

Combatting hate crime and hate speech

Current EU law criminalises hate crime and hate speech only if related to the grounds of race and ethnic origin. Yet, national laws differ significantly when it comes to such conduct in relation to sex, sexual orientation, age, and disability (see EPRS). To implement the Strategy’s objective of ‘ensuring LGBTIQ people’s safety’, on 9 December 2021, the Commission proposed to include hate crime and hate speech against LGBTIQA+ persons within EU crimes. This initiative requires a two-step procedure. First, Article 83(1) TFEU contains a list of areas of ‘particularly serious crime’ with a ‘cross-border dimension’ that justify a common action at EU level. This list can only be updated by a Council decision, taken by unanimity, after receiving the consent of the European Parliament. Second, once hate crime and hate speech have been included in this list, the Commission can follow up with a proposal for a directive to be adopted through the ordinary legislative procedure. This would establish minimum rules concerning the definition of criminal offences and sanctions (for a full analysis of the proposal, see Peršak).

The European Parliament addressed the problem of hate crime and hate speech against LGBTIQA+ persons on different occasions. Accordingly, in a Resolution of 18 January 2024, the Parliament positively welcomed the Commission’s initiative and urged the Member States to make progress on it. The Justice and Home Affairs Council of 3-4 March 2022 had previously discussed the proposal, concluding that ‘a very broad majority was in favour of this initiative’. Yet, the file has never been scheduled for further discussion or vote since then. Significantly, not even the Belgian Presidency of the Council managed to make any progress, despite the declared intention to make of LGBTIQA+ equality a priority during the country’s six-month lead of the institution. The Commission’s proposal is therefore far from being accomplished, with unanimity being – once again – the greatest challenge to overcome.

The return to EU values

In December 2022, the European Commission referred Hungary to the Court of Justice in the context of an infringement procedure (C-769/22). The contested legislation, approved by the Hungarian Parliament in June 2021, was depicted as a tool to combat paedophilia. As highlighted by the Commission and several NGOs, however, the law directly targets the LGBTIQA+ community. Indeed, it limits minors’ access to content that ‘promote(s) divergence from self-identity corresponding to sex at birth, sex change or homosexuality’ and bans or limits media content that concerns homosexuality or gender identity. It also introduces a set of penalties for organisations that breach these rules (see Bonelli and Claes).

During the past decade, Viktor Orbán made Hungary very (un)popular for the multiple violations of the rule of law and fundamental rights, including attacks on the LGBTIQA+ community. Thus, the introduction of – another – infringement procedure against Hungary seems business as usual. However, EU law scholars have immediately pointed out how this could be a landmark case. For the first time, the Commission has directly relied on Article 2 TEU, proposing a direct link between LGBTIQA+ equality and the ‘founding values’ of the EU. While there is no doubt that this is of high symbolic and political importance, questions have been raised as regards the ‘added legal value’ of Article 2 TEU. In other words, the judicial mobilisation of Article 2 TEU does not seem to bring more legal benefits than an infringement procedure based only on the Charter of Fundamental Rights and other provisions of EU law.

It must be noted that the Commission’s reliance on EU values has encouraged a significant political and judicial mobilisation. In an unprecedented move, the European Parliament and fifteen Member States have asked to intervene before the CJEU. This is the first time in the history of EU integration that so many Member States have asked to intervene in support of the Commission’s action against another Member State. For some of them, including France and Germany, this is the first-ever intervention in a case related to fundamental rights’ protection (see Chopin and Leclerc). However, it should also be underlined that the group of countries that participates in the lawsuit has a markedly Western component. This clearly shows the existence (and the persistence) of an East-West divide when it comes to the controversial topic of LGBTIQA+ rights’ protection. Therefore, considering the unanimity requirements mentioned above, even the high participation of the Member States in the infringement procedure seems insufficient to advance coherent action at EU level.

Conclusions

EU institutions, in particular the Commission and the Parliament, seem increasingly committed to offering more robust protection to LGBTIQA+ persons. This is shown by the first-ever EU comprehensive Strategy and the related legislative proposals, as well as the numerous calls of the European Parliament. Whereas this is clearly positive for the visibility and legal claims of the LGBTIQA+ community, the legal outcome appears, however, limited. All legislative proposals are blocked by the failure to reach unanimity in the Council. Indeed, the only changes that have occurred in terms of legal obligations seem to stem from the CJEU ruling in case Pancharevo (and other minor developments related to anti-discrimination case-law). If it is true that, in principle, the EU is equipped with good legal bases to legislate in the fields of non-discrimination and equality for LGBTIQA+ persons, the feasibility of EU intervention seems challenged by the type of legislative procedure provided and the unanimity requirement. Therefore, further research is needed to identify the actual potential of EU competences to deal with the legal claims advanced by the LGBTIQA+ community.

The pending ‘EU values case’ (C-769/22 Commission v Hungary) shows the existence of highly divergent cultural and political views between the Member States, especially when it comes to issues such as LGBTIQA+ equality which seemingly continues to be controversial. At the end of this week (6-9 June 2024), EU citizens will be called to elect the new Members of the European Parliament (MEPs). As current polls show, far-right parties are likely to gain an increased number of seats. Accordingly, this could lead to a more conservative composition of the next European Commission. These dynamics may constitute a significant shift in the commitment of these institutions to enhance LGBTIQA+ rights’ protection. Indeed, the European Parliament and the European Commission are considered two early [LGBTIQA+] movement allies, as they have been supporting the claims of this community on numerous occasions before and during this term. Therefore, the question is whether these potential political changes will result in a softening of their commitment. If so, the CJEU may remain the only and last resort for LGBTIQA+ individuals at EU level.



Source link

05Jun

Director, Venture Capital – Artificial Intelligence at Condé Nast – San Jose, CA


Our vision is to transform how the world uses information to enrich life for all.

Micron Technology is a world leader in innovating memory and storage solutions that accelerate the transformation of information into intelligence, inspiring the world to learn, communicate and advance faster than ever.

We are seeking a strong, well-rounded leader who works well cross-functionally to craft and implement our strategy on Artificial Intelligence!

Our company manages a $300M Venture Capital Fund focusing on the AI competence clusters in Silicon Valley, Tel Aviv, New York and Munich.

Responsibilities included but not limited to:

  • Act as focal point for various internal AI related projects, resources and how they fit in the overall above global strategy.

  • Sources Venture Capital deals in the USA and owns the end-to-end process, from deal sourcing and due diligence to investing in Seed, Series A and Series B rounds.

  • Championing and maintaining a close link to companies Micron has invested in and completing our strategic objectives related to our investment strategy.

  • Drive alignment/sync/execution for our AI / ML efforts across the company  

  • Drives and supports a range of critical initiatives around artificial intelligence and machine learning technology, partnering with both business and technology customers

  • Understands ongoing developments in the broader artificial intelligence ecosystem and ideally has domain expertise in 1-2 of the following verticals: autonomous vehicles, manufacturing, or data center systems/software.

  • Supports creation of senior executive briefings for the Executive Team of Micron, the Board and our Investment Committee and other forums as required

  • Communicates grounded recommendations to senior leaders and their broader organizations, influencing upwards and laterally to drive organizational change and alignment

  • Develops and implements dashboards to track the progress of our ecosystem efforts and Venture Capital investments

  • Mentors less experienced members of the team.

Preferred Skills:

  • Knowledge of the artificial intelligence and machine learning landscape, combined with strong business consulting competence, enabling the identification, design, and deployment of effective AI/machine learning business opportunities

  • Strong focus on organic and inorganic growth opportunities that take Micron beyond our current memory-centric semiconductor business

  • Strong attention to detail and focus: your work will be used at the highest levels of decision-making and customers will expect to use your information without qualification or second-guessing its accuracy

  • Strong collaboration ethic and motivational skills to translate analysis into action

  • Productive, with a solid track record of leading and developing others

  • Ability to work with senior executives and provide high-quality responses on short timelines

  • Comfort level with changing needs and priorities, ambiguity and information overload.

  • Strong communication, interpersonal and advocacy skills

Education:

  • Master’s degree in computer science, computer/electrical engineering, statistics, physics, mathematics, or a related field required.

  • MBA from a Tier 1 University

  • Ideally dual education (foreign/USA) and/or 2-3 years of foreign work experience

The ideal candidate should have startup experience in a senior position or as a former co-founder, or alternatively business experience in marketing or strategy, or a consulting background from major firms such as BCG, McKinsey, etc.

We will consider candidates with a strong financial background who can demonstrate a good, in-depth technical understanding of at least one of the verticals mentioned above.

The US base salary range that Micron Technology estimates it could pay for this full-time position is:

$165,000.00 – $332,000.00

Our salary ranges are determined by role, level, and location.  The range displayed on each job posting reflects the minimum and maximum target for new hire salaries of the position across all US locations.  Within the range, individual pay is determined by work location and additional job-related factors, including knowledge, skills, experience, tenure and relevant education or training.  The pay scale is subject to change depending on business needs.  Your recruiter can share more about the specific salary range for your preferred location during the hiring process. Additional compensation may include benefits, discretionary bonuses and equity.

As a world leader in the semiconductor industry, Micron is dedicated to your personal wellbeing and professional growth. Micron benefits are designed to help you stay well, provide peace of mind and help you prepare for the future.  We offer a choice of medical, dental and vision plans in all locations enabling team members to select the plans that best meet their family healthcare needs and budget.  Micron also provides benefit programs that help protect your income if you are unable to work due to illness or injury, and paid family leave.  Additionally, Micron benefits include a robust paid time-off program and paid holidays.  For additional information regarding the Benefit programs available, please see the Benefits Guide posted on micron.com/careers/benefits.

Micron is proud to be an equal opportunity workplace and is an affirmative action employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, age, national origin, citizenship status, disability, protected veteran status, gender identity or any other factor protected by applicable federal, state, or local laws.

To learn about your right to work, click here.

To learn more about Micron, please visit micron.com/careers

US Sites Only: To request assistance with the application process and/or for reasonable accommodations, please contact Micron’s People Organization at hr**********@mi****.com or 1-800-336-8918 (select option #3)

Micron prohibits the use of child labor and complies with all applicable laws, rules, regulations, and other international and industry labor standards.

Micron does not charge candidates any recruitment fees or unlawfully collect any other payment from candidates as consideration for their employment with Micron.



Source link

04Jun

Senior Data Engineer GCP at Gen – Prague, Czech Republic


Gen is a global company powering Digital Freedom through consumer brands including Norton, Avast, LifeLock, Avira, AVG, Reputation Defender, and CCleaner. Our combined heritage is rooted in providing safety for the first digital generations. We bring leading technology solutions in cybersecurity, privacy, and identity protection to more than 500 million users in 150 countries so they can live their digital lives safely, privately, and confidently today and for generations to come.

About Us:

We are a dynamic, small team dedicated to developing a platform for Online Controlled Experiments. Our mission is to empower everyone in the company—developers, copywriters, designers, marketers, and more—to test their changes and measure the impact on our customers.

About the role:

We are seeking a talented data engineer to join our head office in Prague. In this role, you will contribute to the development and maintenance of the platform, which consists of real-time and batch data processing, statistical data evaluation, and a web-based user interface. The system is deployed on Google Cloud Platform (GCP) and utilizes GCP’s native technologies. We prioritize automation, rigorous testing, code reviews, and good system monitoring.
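
Purely as an illustration of the kind of batch processing such a platform involves, and not as a description of Gen’s actual stack, the Python sketch below computes a simple per-variant experiment metric from a BigQuery table; the project, dataset, table, and column names are hypothetical.

# Minimal sketch of a batch metric job on GCP, assuming BigQuery as the warehouse.
# The project, dataset, table, and column names below are hypothetical.
from google.cloud import bigquery

def conversion_rate_by_variant(project: str = "experiments-demo") -> dict:
    client = bigquery.Client(project=project)
    query = """
        SELECT variant,
               COUNTIF(converted) / COUNT(*) AS conversion_rate,
               COUNT(*) AS users
        FROM `experiments-demo.ab_tests.assignments`
        GROUP BY variant
    """
    rows = client.query(query).result()  # blocks until the query finishes
    return {row.variant: {"conversion_rate": row.conversion_rate, "users": row.users}
            for row in rows}

if __name__ == "__main__":
    print(conversion_rate_by_variant())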

The person will be required to go to the office a couple of times a week.

What You Will Do in This Role:

  • Develop and extend existing data integrations to enhance platform capabilities

  • Analyze data and implement metrics based on specified requirements

  • Collaborate with the Central Data Office team to acquire and integrate data sources

  • Monitor and troubleshoot data pipelines to ensure seamless operation

  • Utilize a modern technology stack within Google Cloud Platform

What You Will Need to Be Successful in This Role

  • Proficiency in SQL and Python

  • A responsible approach to system development and maintenance

  • Expertise in sustainable development practices, including automated testing, version control, code reviews, and deployment pipelines

  • Effective communication skills, both spoken and written

What We Welcome

  • Prior experience with cloud platforms (Google Cloud, AWS, Azure)

  • Experience with column-oriented databases

  • Knowledge of TypeScript or JavaScript

  • Familiarity with Scala

  • Basic understanding of Docker and Kubernetes

Why Join Us:

Join a passionate team committed to innovation and excellence. You will have the opportunity to work with state-of-the-art technology, contribute to meaningful projects, and help shape the future of our platform. We value collaboration, continuous learning, and a positive work environment.

What We Offer:

  • Flexible working hours

  • Hybrid working – 2-3 days in the office

  • Space for personal and professional growth

  • Pleasant work environment, gym, music room, library

  • The chance to join a major global tech company listed on the S&P 500

  • Tuition reimbursement for job-related courses

  • Sustainability allowance

  • Mac/Windows laptop and mobile phone


Gen is proud to be an equal-opportunity employer. We celebrate diversity and are committed to creating an inclusive and accessible environment for all employees. All employment decisions are based on merit, experience, and business needs, without regard to race, color, national origin, age, religion, sex, pregnancy (including childbirth or related medical conditions), genetic information, disability (physical or mental), medical condition, marital status, sexual orientation, gender identity or gender expression, military or veteran status, or any other consideration made unlawful by federal, state, or local law. Gen strictly prohibits unlawful discrimination based on such protected characteristics and seeks to recruit the most talented candidates from diverse cultures and backgrounds. 

 

We also consider employment-qualified individuals with arrest and conviction records. In addition, we will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. Learn more about pay transparency

 

Gen complies with all anti-discrimination laws. 

 

To conform to U.S. export control regulations, applicants should be eligible for any required authorizations from the U.S. Government. 



Source link

04Jun

Senior Data Quality Specialist at M&T Bank – Buffalo, NY


Location: Full-time remote option, east coast only.

OVERVIEW:

Provides Data Quality measurement, evaluation and assessment services relating to the new Customer Data Management Platform. Creates and documents data quality policies and procedures and supports them through programmatic implementation. Provides process evaluation and performance monitoring solutions through the management and execution of data quality processing.
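
Purely as a hedged illustration of what such a programmatic implementation could look like (not M&T Bank’s actual tooling), the Python sketch below measures two of the data quality pillars listed under the responsibilities, completeness and uniqueness, against an in-memory SQLite table holding hypothetical customer records.

# Minimal, self-contained sketch of programmatic data quality checks for two pillars
# (completeness and uniqueness). Uses an in-memory SQLite table with hypothetical
# customer data purely for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id TEXT, email TEXT);
    INSERT INTO customers VALUES
        ('C1', 'a@example.com'),
        ('C2', NULL),
        ('C2', 'b@example.com');   -- duplicate id and a missing email
""")

# Completeness: share of rows with a non-null email.
completeness = conn.execute(
    "SELECT 1.0 * COUNT(email) / COUNT(*) FROM customers"
).fetchone()[0]

# Uniqueness: share of rows with a distinct customer_id.
uniqueness = conn.execute(
    "SELECT 1.0 * COUNT(DISTINCT customer_id) / COUNT(*) FROM customers"
).fetchone()[0]

print(f"email completeness:     {completeness:.2%}")  # 66.67%
print(f"customer_id uniqueness: {uniqueness:.2%}")    # 66.67%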

POSITION RESPONSIBILITIES:

  • Monitor, identify, assess, document, and communicate potential quality issues relative to the pillars of Data Quality (Completeness, Uniqueness, Timeliness, Accuracy, Conformity)
  • Evaluate large datasets, performing advanced data analysis to assess initial data quality
  • Perform advanced data profiling and discovery to identify trends and anomalies related to data quality
  • Identify risk-related issues needing escalation to management and to stakeholders
  • Research and determine the scope of data quality issues to identify steps to remediate them
  • Determine the level of business impact due to data quality issues
  • Produce quality dashboards to measure all aspects of the data quality pillars
  • Design and implement DQ controls based on ETL and the importance of data elements
  • Develop and document enterprise data quality policies and procedures in partnership with leadership
  • Work with stakeholders to establish and monitor service level agreements (SLAs), communication protocols, and data quality assurance policies
  • Review code created by developers to ensure alignment with data quality rules and related data definitions
  • Coordinate critical data quality management functions across cross-functional teams
  • Develop and maintain enterprise data quality policies and procedures in partnership with leadership
  • Comply with the enterprise data quality policy and standards
  • Comply with Financial Services, Banking transactions and regulatory mandates for customer data
  • Mentor and assist less experienced team members
  • Complete other related duties as assigned.

MINIMUM QUALIFICATIONS REQUIRED:

  • Bachelor’s degree and a minimum of 5 years related experience, or in lieu of a degree, a combined minimum of 9 years higher education and/ or work experience, including a minimum of 5 years related experience
  • Experience working in Data Governance and Data Quality programs at large, complex organizations
  • Expert in Structured Query Language (SQL) – Advanced SQL
  • Experience in one or more database technologies
  • Experience in the collection and management of metadata 

IDEAL QUALIFICATIONS PREFERRED:

  • 5 or more years’ experience in Financial Services
  • Experience using data quality and data governance tools
  • Experience developing, documenting and applying data quality improvement metrics
  • Demonstrated Project Management/Business Analysis skills
  • Experience with data CDPs, Customer 360 and/or Amperity
  • Experience designing and developing dashboards
  • Strong Communication Skills and proven ability to Influence Change

M&T Bank is committed to fair, competitive, and market-informed pay for our employees. The pay range for this position is $97,869.52 – $163,115.87 Annual (USD). The successful candidate’s particular combination of knowledge, skills, and experience will inform their specific compensation.

Location: Buffalo, New York, United States of America



Source link
