AI Product Craft Newsletter

AI/ML Project Development Phases 2/4: Data Preparation and Model Selection

In step 2 of 4 of a successful AI/ML project, the quality and relevance of your data, combined with an appropriate model choice, are critical determinants of your AI solution's success.

Patrick Ncho and AI Product Craft Newsletter
Jul 18, 2024

This article is the second in a four-part series that guides you through the four key stages of AI/ML development, emphasizing the importance of a data-centric approach and providing best practices for each phase.

Phase 2: Data Preparation and Model Selection

In this article, we discuss the data preparation and model selection phase of the AI product development process. This phase is crucial in determining the success of your AI solution. It involves transforming raw data into a format suitable for machine learning, selecting appropriate features, and choosing the right model architecture. The quality of your data and the suitability of your chosen model are paramount in achieving the desired outcomes.

Today, I’ll cover the following practices:

1. Data cleansing that ensures you're building on a solid foundation. Poor quality data can undermine even the most sophisticated AI models.

2. Data augmentation that helps you make the most of limited data, a common challenge in AI projects.

3. Feature engineering that allows you to inject domain expertise into your model, potentially improving performance beyond what the model could achieve on raw data alone.

4. Data normalization that prevents certain features from dominating the model simply due to their scale, ensuring fair treatment of all inputs.

5. Model selection that finds the right tool for the job. The most complex model isn't always the best choice; it depends on your specific needs and constraints.

6. Cross-validation that provides a reality check on your model's performance, helping you avoid the pitfall of overfitting to your training data.

Understanding the rationale behind these practices allows you to apply them more effectively and adapt them to your specific context.

Let’s dive in!


1. Data cleansing

Clean data is essential for accurate AI models. Inconsistencies, errors, and outliers can lead to biased or unreliable results. Cleansing ensures that your model is learning from high-quality, relevant data, which directly impacts its performance and reliability.

- Identify and handle missing values (imputation or deletion)

- Detect outliers and decide whether to remove them, considering their potential significance (an extreme value may be an entry error or a genuine rare event)

- Standardize data formats and units

- Resolve inconsistencies and duplicates
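
The four steps above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a production pipeline: the records, the field layout, and the `clean` helper are all hypothetical, and the 1.5 x IQR rule is only one common convention for flagging outliers.

```python
from statistics import median, quantiles

# Hypothetical records: (customer_id, height, unit). None marks a missing value.
raw = [
    (1, 170.0, "cm"),
    (2, 1.82, "m"),
    (3, None, "cm"),
    (2, 1.82, "m"),    # exact duplicate of an earlier row
    (4, 165.0, "cm"),
    (5, 950.0, "cm"),  # implausible value, likely an entry error
    (6, 178.0, "cm"),
]

def clean(records):
    # 1. Resolve duplicates: drop exact repeats while preserving order.
    seen, unique = set(), []
    for row in records:
        if row not in seen:
            seen.add(row)
            unique.append(row)

    # 2. Standardize units: convert every height to centimetres.
    in_cm = [
        (cid, None if h is None else (h * 100 if unit == "m" else h))
        for cid, h, unit in unique
    ]

    # 3. Flag outliers with the 1.5 * IQR rule, computed on the known values.
    known = [h for _, h in in_cm if h is not None]
    q1, _, q3 = quantiles(known, n=4, method="inclusive")
    low, high = q1 - 1.5 * (q3 - q1), q3 + 1.5 * (q3 - q1)
    outliers = [cid for cid, h in in_cm if h is not None and not low <= h <= high]
    kept = [(cid, h) for cid, h in in_cm if cid not in outliers]

    # 4. Impute the remaining missing values with the median of the kept values.
    fill = median(h for _, h in kept if h is not None)
    cleaned = [(cid, fill if h is None else h) for cid, h in kept]
    return cleaned, outliers

cleaned, outliers = clean(raw)
print("flagged as outliers:", outliers)
print("cleaned rows:", cleaned)
```

Returning the flagged IDs alongside the cleaned rows is a deliberate choice here: it lets a domain expert review what was dropped before the decision becomes permanent.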

2. Data augmentation

Augmentation helps address issues of limited data and can improve model generalization. By artificially expanding your dataset, you can help your model learn more robust features and reduce overfitting, especially when working with small or imbalanced datasets.

- Implement techniques like oversampling for imbalanced datasets

- Use data generation techniques (e.g., SMOTE for tabular data, GANs for images)

- Apply domain-specific augmentation methods (e.g., rotations for image data)
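
As a rough illustration of the first bullet, the sketch below performs simple random oversampling: it duplicates minority-class rows until every class is the same size. It is not SMOTE, which synthesizes new points rather than copying existing ones. The function name `random_oversample` and the transaction data are assumptions made for this example.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until every class matches the largest one."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(group) for group in by_class.values())

    out_rows, out_labels = [], []
    for label, group in by_class.items():
        # Draw extra copies (with replacement) only for classes below the target size.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for row in group + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Hypothetical training set: 6 legitimate transactions and 2 fraudulent ones.
X_train = [[12.0], [15.5], [9.9], [20.0], [11.2], [14.8], [950.0], [780.0]]
y_train = ["ok"] * 6 + ["fraud"] * 2

X_bal, y_bal = random_oversample(X_train, y_train)
print(Counter(y_bal))
```

Oversampling should be applied to the training split only. If duplicated rows leak into the test set, the evaluation will look better than the model really is.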

3. Feature engineering

Effective feature engineering can significantly improve model performance by creating more informative inputs. It allows you to incorporate domain knowledge into your model and can help uncover hidden patterns in the data that the model might not discover on its own.

- Create interaction terms between existing features

- Develop domain-specific features based on expert knowledge

- Use dimensionality reduction techniques like PCA (t-SNE is better suited to visualizing data than to producing model inputs)

- Implement feature selection methods to identify the most relevant attributes
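
To make these ideas concrete, here is a small sketch that builds an interaction term, adds a domain-specific recency feature, and then ranks candidate features by their correlation with the target. Every field name and value is hypothetical, and ranking by absolute Pearson correlation is only one simple filter-style selection method among many.

```python
from datetime import date
from math import sqrt

# Hypothetical customer records; every field name here is illustrative.
customers = [
    {"visits": 2, "avg_basket": 40.0, "last_purchase": date(2024, 6, 30), "churned": 0},
    {"visits": 1, "avg_basket": 15.0, "last_purchase": date(2024, 2, 10), "churned": 1},
    {"visits": 8, "avg_basket": 22.0, "last_purchase": date(2024, 7, 10), "churned": 0},
    {"visits": 1, "avg_basket": 60.0, "last_purchase": date(2024, 1, 5), "churned": 1},
    {"visits": 5, "avg_basket": 35.0, "last_purchase": date(2024, 7, 1), "churned": 0},
]
today = date(2024, 7, 18)  # assumed reference date for the recency feature

def add_features(row):
    out = dict(row)
    # Interaction term: total spend is visits multiplied by average basket size.
    out["total_spend"] = row["visits"] * row["avg_basket"]
    # Domain-specific feature: days since the customer's last purchase.
    out["days_since_purchase"] = (today - row["last_purchase"]).days
    return out

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

rows = [add_features(c) for c in customers]
target = [r["churned"] for r in rows]
candidates = ["visits", "avg_basket", "total_spend", "days_since_purchase"]

# Filter-style feature selection: rank by absolute correlation with the target.
ranking = sorted(
    candidates,
    key=lambda name: abs(pearson([r[name] for r in rows], target)),
    reverse=True,
)
print(ranking)
```

With only five rows the ranking is purely illustrative. On real data, any feature ranking should be validated on held-out data before features are dropped.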

4. Data normalization

Normalization puts features on a comparable scale so that each can contribute to the model's learning process. Without it, features with larger numeric ranges can dominate scale-sensitive models simply because of their units, leading to biased results and slower convergence during training.

- Apply scaling techniques like Min-Max scaling or Standard scaling

- Use normalization methods appropriate for your data type and model

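- Ensure consistent normalization across training and test sets: learn the scaling parameters from the training set only, then apply the same transformation to the test set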
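
The sketch below shows both scaling techniques and the train/test discipline in plain Python. The helper names (`fit_standard`, `apply_minmax`, and so on) and the income figures are assumptions for illustration; in practice a library such as scikit-learn would usually handle this.

```python
from statistics import mean, pstdev

def fit_standard(values):
    """Learn the mean and standard deviation from the training data only."""
    return mean(values), pstdev(values)

def apply_standard(values, params):
    mu, sigma = params
    return [(v - mu) / sigma for v in values]

def fit_minmax(values):
    """Learn the minimum and maximum from the training data only."""
    return min(values), max(values)

def apply_minmax(values, params):
    lo, hi = params
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature: annual income in dollars.
income_train = [30_000, 45_000, 60_000, 75_000, 90_000]
income_test = [50_000, 120_000]

# Parameters come from the training set and are reused unchanged on the test set.
std_params = fit_standard(income_train)
mm_params = fit_minmax(income_train)

train_std = apply_standard(income_train, std_params)
test_std = apply_standard(income_test, std_params)
test_mm = apply_minmax(income_test, mm_params)

print("standard-scaled test values:", test_std)
print("min-max-scaled test values:", test_mm)
```

Note that the second test value lies outside the training range, so Min-Max scaling maps it to 1.5 rather than to a value between 0 and 1. That is expected when the parameters are learned from the training set alone, and it is the honest behavior: refitting on the test set would leak information about unseen data.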

5. Model selection

Choosing the right model is crucial for achieving optimal performance. Different models have different strengths and weaknesses, and the best choice depends on your specific problem, data characteristics, and requirements (e.g., accuracy vs. interpretability).

- Consider the nature of your problem (classification, regression, clustering, etc.)

- Evaluate model interpretability requirements

- Assess computational resources and training time constraints

- Start with simpler models and progressively increase complexity
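
One way to act on "start simple" is to score candidates from simplest to most complex on held-out data and keep the simplest one that is close to the best. The sketch below does this with three toy models written in plain Python. The data, the model helpers, and the 2-point tolerance are all assumptions chosen for illustration.

```python
# Hypothetical toy data: one feature (hours of product usage), label 1 = renewed.
train = [(1, 0), (2, 0), (3, 0), (4, 0), (5, 0), (7, 1), (8, 1), (9, 1)]
valid = [(2.5, 0), (4.5, 0), (6.5, 1), (8.5, 1)]

def majority_baseline(data):
    """Always predict the most common training label."""
    labels = [y for _, y in data]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess

def threshold_rule(data):
    """Predict 1 above the midpoint between the two class means."""
    pos = [x for x, y in data if y == 1]
    neg = [x for x, y in data if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x > cut)

def nearest_neighbour(data):
    """Predict the label of the closest training example."""
    return lambda x: min(data, key=lambda p: abs(p[0] - x))[1]

def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)

# Candidates ordered from simplest to most complex.
candidates = [
    ("majority baseline", majority_baseline(train)),
    ("threshold rule", threshold_rule(train)),
    ("1-nearest neighbour", nearest_neighbour(train)),
]
scores = {name: accuracy(model, valid) for name, model in candidates}

# Assumption: a gain of less than 2 points does not justify extra complexity.
tolerance = 0.02
best = max(scores.values())
chosen = next(name for name, _ in candidates if scores[name] >= best - tolerance)

print(scores)
print("chosen:", chosen)
```

The baseline matters as much as the winner: if a sophisticated model cannot clearly beat "always predict the most common outcome", the problem framing or the data needs attention before the model does.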

6. Cross-validation

Cross-validation helps assess how well your model generalizes to unseen data. It provides a more robust evaluation of model performance than a single train-test split and helps detect overfitting early in the development process.

- Implement k-fold cross-validation to assess model generalization

- Use stratified sampling for imbalanced datasets

- Consider time-based splits for time-series data

- Evaluate multiple performance metrics to get a comprehensive view
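
The following sketch combines three of these points: k-fold splitting, stratification for an imbalanced dataset, and reporting more than one metric. The `stratified_kfold` and `fit_threshold` helpers and the dataset are hypothetical, and the toy threshold model stands in for whatever model you are evaluating. Time-series data would need chronological splits instead, which this sketch does not cover.

```python
import random

def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class mix roughly constant."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal each class round-robin across the folds
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test

def fit_threshold(xs, ys):
    """Toy model: classify as positive above the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2

# Hypothetical imbalanced dataset: 9 negatives and 3 positives, with some overlap.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 11, 10, 13, 14]
ys = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

accuracies, recalls = [], []
for train_idx, test_idx in stratified_kfold(ys, k=3):
    # Fit on the training folds only, then score on the held-out fold.
    thr = fit_threshold([xs[i] for i in train_idx], [ys[i] for i in train_idx])
    preds = [int(xs[i] > thr) for i in test_idx]
    truth = [ys[i] for i in test_idx]
    accuracies.append(sum(p == t for p, t in zip(preds, truth)) / len(truth))
    hits = [p for p, t in zip(preds, truth) if t == 1]
    recalls.append(sum(hits) / len(hits))  # share of true positives that were caught

print("accuracy per fold:", accuracies)
print("recall per fold:", recalls)
print("mean accuracy:", sum(accuracies) / len(accuracies))
```

Reporting recall next to accuracy is the point of the last bullet: on imbalanced data a model can score high accuracy while missing most of the rare cases you actually care about.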

By focusing on these practices and understanding their importance, you set the stage for developing a robust, reliable AI model. Remember, the quality of your data preparation and the appropriateness of your model choice are often more important than the complexity of your algorithms in determining the success of your AI solution.

—

📚Continue reading the full series: The Four Key Phases of AI/ML Product Development

  1. Discovery and Feasibility: Phase 1 of 4 in AI/ML Project Development

  2. Data Preparation and Model Selection: Phase 2 of 4 in AI/ML Project Development

  3. Prototype and Experimentation: Phase 3 of 4 in AI/ML Project Development

  4. Production Deployment and Continuous Iteration: Phase 4 of 4 in AI/ML Project Development
