Logistic Regression, Explained: A Visual Guide with Code Examples for Beginners
by Samy Baladram | Sep 2024
CLASSIFICATION ALGORITHM
While some probabilistic-based machine learning models (like Naive Bayes) make bold assumptions about feature independence, logistic regression takes a more measured approach. Think of it as drawing a line (or plane) that separates two outcomes, allowing us to predict probabilities with a bit more flexibility.
Logistic regression is a statistical method used for predicting binary outcomes. Despite its name, it’s used for classification rather than regression. It estimates the probability that an instance belongs to a particular class. If the estimated probability is greater than 50%, the model predicts that the instance belongs to the positive class; otherwise, it predicts the other class.
Throughout this article, we’ll use this artificial golf dataset (inspired by [1]) as an example. This dataset predicts whether a person will play golf based on weather conditions.
Just like in KNN, logistic regression requires the data to be scaled first. Convert categorical columns into 0s and 1s, and also scale the numerical features so that no single feature dominates the weight updates during training.
# Import required libraries
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.preprocessing import StandardScaler
import pandas as pd
import numpy as np

# Create dataset from dictionary
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy', 'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast', 'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)
# Prepare data: encode categorical variables
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)
# Rearrange columns
column_order = ['sunny', 'overcast', 'rainy', 'Temperature', 'Humidity', 'Wind', 'Play']
df = df[column_order]
# Split data into features and target
X, y = df.drop(columns='Play'), df['Play']
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)
# Scale numerical features
scaler = StandardScaler()
X_train[['Temperature', 'Humidity']] = scaler.fit_transform(X_train[['Temperature', 'Humidity']])
X_test[['Temperature', 'Humidity']] = scaler.transform(X_test[['Temperature', 'Humidity']])
# Print results
print("Training set:")
print(pd.concat([X_train, y_train], axis=1), '\n')
print("Test set:")
print(pd.concat([X_test, y_test], axis=1))
Logistic regression works by applying the logistic function to a linear combination of the input features. Here’s how it operates:
- Calculate a weighted sum of the input features (similar to linear regression).
- Apply the logistic function (also called sigmoid function) to this sum, which maps any real number to a value between 0 and 1.
- Interpret this value as the probability of belonging to the positive class.
- Use a threshold (typically 0.5) to make the final classification decision.
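To make these four steps concrete, here is a minimal sketch of a single forward pass. The weights and feature values below are made-up numbers for illustration, not values from the golf dataset.

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

weights = np.array([0.5, -0.2, 0.1])  # hypothetical weights
features = np.array([1.0, 2.0, 3.0])  # one hypothetical instance

z = np.dot(weights, features)  # Step 1: weighted sum
p = sigmoid(z)                 # Steps 2-3: probability of the positive class
prediction = int(p >= 0.5)     # Step 4: threshold at 0.5
print(f"z = {z:.2f}, p = {p:.3f}, predicted class = {prediction}")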
The training process for logistic regression involves finding the best weights for the input features. Here’s the general outline:
- Initialize the weights (often to small random values).
# Initialize weights (including bias) to 0.1
initial_weights = np.full(X_train_np.shape[1], 0.1)  # X_train_np (with a bias column) is defined below
# Display the initial weights
print(f"Initial Weights: {initial_weights}")
2. For each training example:
a. Calculate the predicted probability using the current weights.
def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def calculate_probabilities(X, weights):
z = np.dot(X, weights)
return sigmoid(z)
def calculate_log_loss(probabilities, y):
return -y * np.log(probabilities) - (1 - y) * np.log(1 - probabilities)
def create_output_dataframe(X, y, weights):
probabilities = calculate_probabilities(X, weights)
log_losses = calculate_log_loss(probabilities, y)
df = pd.DataFrame({
'Probability': probabilities,
'Label': y,
'Log Loss': log_losses
})
return df
def calculate_average_log_loss(X, y, weights):
probabilities = calculate_probabilities(X, weights)
log_losses = calculate_log_loss(probabilities, y)
return np.mean(log_losses)
# Convert X_train and y_train to numpy arrays for easier computation
X_train_np = X_train.to_numpy()
y_train_np = y_train.to_numpy()
# Add a column of 1s to X_train_np for the bias term
X_train_np = np.column_stack((np.ones(X_train_np.shape[0]), X_train_np))
# Create and display DataFrame for initial weights
initial_df = create_output_dataframe(X_train_np, y_train_np, initial_weights)
print(initial_df.to_string(index=False, float_format=lambda x: f"{x:.6f}"))
print(f"\nAverage Log Loss: {calculate_average_log_loss(X_train_np, y_train_np, initial_weights):.6f}")
b. Compare this probability to the actual class label by calculating its log loss.
3. Update the weights to minimize the loss, usually with an optimization algorithm such as gradient descent. This involves repeating Step 2 until the log loss stops decreasing.
def gradient_descent_step(X, y, weights, learning_rate):
m = len(y)
probabilities = calculate_probabilities(X, weights)
gradient = np.dot(X.T, (probabilities - y)) / m
    new_weights = weights - learning_rate * gradient  # create a new array for the updated weights
    return new_weights

# Perform one step of gradient descent (one of the simplest optimization algorithms)
learning_rate = 0.1
updated_weights = gradient_descent_step(X_train_np, y_train_np, initial_weights, learning_rate)
# Print initial and updated weights
print("\nInitial weights:")
for feature, weight in zip(['Bias'] + list(X_train.columns), initial_weights):
print(f"{feature:11}: {weight:.2f}")
print("\nUpdated weights after one iteration:")
for feature, weight in zip(['Bias'] + list(X_train.columns), updated_weights):
print(f"{feature:11}: {weight:.2f}")
# With sklearn, you can get the final weights (coefficients)
# and final bias (intercepts) easily.
# The result is almost the same as doing it manually above.
from sklearn.linear_model import LogisticRegression
lr_clf = LogisticRegression(penalty=None, solver='saga')
lr_clf.fit(X_train, y_train)
coefficients = lr_clf.coef_
intercept = lr_clf.intercept_
y_train_prob = lr_clf.predict_proba(X_train)[:, 1]
loss = -np.mean(y_train * np.log(y_train_prob) + (1 - y_train) * np.log(1 - y_train_prob))
print(f"Weights & Bias Final: {coefficients[0].round(2)}, {round(intercept[0],2)}")
print("Loss Final:", loss.round(3))
Once the model is trained:
1. For a new instance, calculate the probability with the final weights (also called coefficients), just like during the training step.
2. Interpret the output by looking at the probability: if p ≥ 0.5, predict class 1; otherwise, predict class 0.
# Calculate prediction probability
predicted_probs = lr_clf.predict_proba(X_test)[:, 1]

# Convert probabilities back to z-values (log-odds)
z_values = np.log(predicted_probs / (1 - predicted_probs))
result_df = pd.DataFrame({
'ID': X_test.index,
'Z-Values': z_values.round(3),
'Probabilities': predicted_probs.round(3)
}).set_index('ID')
print(result_df)
# Make predictions
y_pred = lr_clf.predict(X_test)
print(y_pred)
Evaluation Step
result_df = pd.DataFrame({
'ID': X_test.index,
'Label': y_test,
'Probabilities': predicted_probs.round(2),
'Prediction': y_pred,
}).set_index('ID')
print(result_df)
Logistic regression has several important parameters that control its behavior:
1. Penalty: The type of regularization to use (‘l1’, ‘l2’, ‘elasticnet’, or None). Regularization in logistic regression prevents overfitting by adding a penalty term to the model’s loss function that encourages simpler models.
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

regs = [None, 'l1', 'l2']
coeff_dict = {}
for reg in regs:
lr_clf = LogisticRegression(penalty=reg, solver='saga')
lr_clf.fit(X_train, y_train)
coefficients = lr_clf.coef_
intercept = lr_clf.intercept_
predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
predictions = lr_clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
coeff_dict[reg] = {
'Coefficients': coefficients,
'Intercept': intercept,
'Loss': loss,
'Accuracy': accuracy
}
for reg, vals in coeff_dict.items():
print(f"{reg}: Coeff: {vals['Coefficients'][0].round(2)}, Intercept: {vals['Intercept'].round(2)}, Loss: {vals['Loss'].round(3)}, Accuracy: {vals['Accuracy'].round(3)}")
2. Regularization Strength (C): Controls the trade-off between fitting the training data and keeping the model simple. A smaller C means stronger regularization.
# List of regularization strengths to try for L1
strengths = [0.001, 0.01, 0.1, 1, 10, 100]
coeff_dict = {}
for strength in strengths:
lr_clf = LogisticRegression(penalty='l1', C=strength, solver='saga')
lr_clf.fit(X_train, y_train)
coefficients = lr_clf.coef_
intercept = lr_clf.intercept_
predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
predictions = lr_clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
coeff_dict[f'L1_{strength}'] = {
'Coefficients': coefficients[0].round(2),
'Intercept': round(intercept[0],2),
'Loss': round(loss,3),
'Accuracy': round(accuracy*100,2)
}
print(pd.DataFrame(coeff_dict).T)
# List of regularization strengths to try for L2
strengths = [0.001, 0.01, 0.1, 1, 10, 100]
coeff_dict = {}
for strength in strengths:
lr_clf = LogisticRegression(penalty='l2', C=strength, solver='saga')
lr_clf.fit(X_train, y_train)
coefficients = lr_clf.coef_
intercept = lr_clf.intercept_
predicted_probs = lr_clf.predict_proba(X_train)[:, 1]
loss = -np.mean(y_train * np.log(predicted_probs) + (1 - y_train) * np.log(1 - predicted_probs))
predictions = lr_clf.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
coeff_dict[f'L2_{strength}'] = {
'Coefficients': coefficients[0].round(2),
'Intercept': round(intercept[0],2),
'Loss': round(loss,3),
'Accuracy': round(accuracy*100,2)
}
print(pd.DataFrame(coeff_dict).T)
3. Solver: The algorithm to use for optimization (‘liblinear’, ‘newton-cg’, ‘lbfgs’, ‘sag’, ‘saga’). Some penalties require a particular solver; for example, ‘elasticnet’ is only supported by ‘saga’ (see the compatibility sketch below).
4. Max Iterations: The maximum number of iterations for the solver to converge.
For our golf dataset, we might start with ‘l2’ penalty, ‘liblinear’ solver, and C=1.0 as a baseline.
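If you are unsure which combinations your scikit-learn version accepts, a quick check like this sketch can help. The set of supported penalty/solver pairs varies by version, so treat the output as informative rather than definitive; X_train and y_train are the arrays prepared earlier.

from sklearn.linear_model import LogisticRegression

for solver in ['liblinear', 'newton-cg', 'lbfgs', 'sag', 'saga']:
    for penalty in ['l1', 'l2', None]:
        try:
            LogisticRegression(penalty=penalty, solver=solver, max_iter=1000).fit(X_train, y_train)
            status = 'OK'
        except ValueError:  # unsupported penalty/solver pair
            status = 'not supported'
        print(f"{solver:>9} + {str(penalty):>4}: {status}")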
Like any algorithm in machine learning, logistic regression has its strengths and limitations.
Pros:
- Simplicity: Easy to implement and understand.
- Interpretability: The weights directly show the importance of each feature.
- Efficiency: Doesn’t require too much computational power.
- Probabilistic Output: Provides probabilities rather than just classifications.
Cons:
- Linearity Assumption: Assumes a linear relationship between features and log-odds of the outcome.
- Feature Independence: Assumes features are not highly correlated.
- Limited Complexity: May underfit in cases where the decision boundary is highly non-linear (see the sketch after this list).
- Requires More Data: Needs a relatively large sample size for stable results.
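To see the limited-complexity point in action, here is a small sketch on synthetic XOR-style data, where the true boundary is not linear. The dataset below is made up for illustration and is unrelated to the golf example.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
X_xor = rng.uniform(-1, 1, size=(200, 2))
y_xor = ((X_xor[:, 0] > 0) ^ (X_xor[:, 1] > 0)).astype(int)  # XOR-style labels

clf = LogisticRegression().fit(X_xor, y_xor)
# A linear model cannot separate XOR, so accuracy stays near chance level
print(f"Training accuracy on XOR-style data: {clf.score(X_xor, y_xor):.2f}")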
In our golf example, logistic regression might provide a clear, interpretable model of how each weather factor influences the decision to play golf. However, it might struggle if the decision involves complex interactions between weather conditions that can’t be captured by a linear model.
Logistic regression shines as a powerful yet straightforward classification tool. It stands out for its ability to handle complex data while remaining easy to interpret. Unlike some other basic models, it provides smooth probability estimates and works well with many features. In the real world, from predicting customer behavior to medical diagnoses, logistic regression often performs surprisingly well. It’s not just a stepping stone — it’s a reliable model that can match more complex models in many situations.
# Import required libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score

# Load the dataset
dataset_dict = {
'Outlook': ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast', 'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy', 'sunny', 'overcast', 'rainy', 'sunny', 'sunny', 'rainy', 'overcast', 'rainy', 'sunny', 'overcast', 'sunny', 'overcast', 'rainy', 'overcast'],
'Temperature': [85.0, 80.0, 83.0, 70.0, 68.0, 65.0, 64.0, 72.0, 69.0, 75.0, 75.0, 72.0, 81.0, 71.0, 81.0, 74.0, 76.0, 78.0, 82.0, 67.0, 85.0, 73.0, 88.0, 77.0, 79.0, 80.0, 66.0, 84.0],
'Humidity': [85.0, 90.0, 78.0, 96.0, 80.0, 70.0, 65.0, 95.0, 70.0, 80.0, 70.0, 90.0, 75.0, 80.0, 88.0, 92.0, 85.0, 75.0, 92.0, 90.0, 85.0, 88.0, 65.0, 70.0, 60.0, 95.0, 70.0, 78.0],
'Wind': [False, True, False, False, False, True, True, False, False, False, True, True, False, True, True, False, False, True, False, True, True, False, True, False, False, True, False, False],
'Play': ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'No', 'Yes', 'Yes', 'No', 'No', 'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No', 'Yes']
}
df = pd.DataFrame(dataset_dict)
# Prepare data: encode categorical variables
df = pd.get_dummies(df, columns=['Outlook'], prefix='', prefix_sep='', dtype=int)
df['Wind'] = df['Wind'].astype(int)
df['Play'] = (df['Play'] == 'Yes').astype(int)
# Split data into training and testing sets
X, y = df.drop(columns='Play'), df['Play']
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5, shuffle=False)
# Scale numerical features
scaler = StandardScaler()
float_cols = X_train.select_dtypes(include=['float64']).columns
X_train[float_cols] = scaler.fit_transform(X_train[float_cols])
X_test[float_cols] = scaler.transform(X_test[float_cols])
# Train the model
lr_clf = LogisticRegression(penalty='l2', C=1, solver='saga')
lr_clf.fit(X_train, y_train)
# Make predictions
y_pred = lr_clf.predict(X_test)
# Evaluate the model
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")