10 episodes

Data is often the basis for how we see the world, and how the world sees us. Understanding these data-based projections is the focus of this podcast, which discusses topics related to data analytics, machine learning, and data science. Produced and hosted by Jim Harris.

Data-Based Projections Jim Harris

    • Technology

    That is Not Machine Learning

    Machine learning (ML) can provide unique analytical insights, as well as help automate some operational and decision-making processes more efficiently and effectively than non-ML alternatives. However, ML is also among the buzziest of buzzwords, and many are overselling and oversimplifying its usage. 
    Do not let anyone frame a data analysis, business problem, or process improvement as an ML use case. Instead, say: That is Not Machine Learning — that is a data analysis, business problem, or process improvement where ML might be able to help. But not before we evaluate other options. And with the understanding that ML is rarely going to be either the first or only aspect of the solution. 
     
    This episode is sponsored by: Vertica.com 
     
    Extended Show Notes: ocdqblog.com/dbp 
     
    Follow Jim Harris on Twitter: @ocdqblog 
     
    Email Jim Harris: ocdqblog.com/contact 
     
    Other ways to listen: bit.ly/listen-dbp 
     

    • 26 min
    Machine Learning is Label Making

    Label Making. That is my simple two-word definition of Machine Learning. Machine Learning is Label Making. ML is LM. 
    This is especially true of supervised machine learning, which creates either numerical labels (using regression algorithms) to predict a continuous data value (such as a sale price or stock price), or categorical labels (using classification algorithms) to assign data to pre-defined groups, also called classes (such as Fraud or Not Fraud for financial transactions).
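To make the two kinds of labels concrete, here is a minimal Python sketch. It is not taken from the episode: the data is made up, and `fit_line` and `nearest_centroid` are hypothetical helper names chosen for illustration. The first function produces a numerical label with a one-feature least-squares regression; the second produces a categorical label with a simple nearest-class-mean classifier.

```python
from statistics import mean

def fit_line(xs, ys):
    """One-feature ordinary least squares; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    slope = numerator / denominator
    return slope, y_bar - slope * x_bar

def nearest_centroid(train, point):
    """Assign the class whose mean training value is closest to `point`.

    `train` maps each class label to a list of numeric training values.
    """
    centroids = {label: mean(values) for label, values in train.items()}
    return min(centroids, key=lambda label: abs(point - centroids[label]))

# Regression: numerical label (made-up square footage vs. sale price).
sqft = [1000, 1500, 2000, 2500]
price = [200_000, 300_000, 400_000, 500_000]
slope, intercept = fit_line(sqft, price)
predicted_price = slope * 1800 + intercept

# Classification: categorical label (made-up transaction amounts).
amounts = {"Not Fraud": [20, 35, 50], "Fraud": [900, 1200, 1500]}
label = nearest_centroid(amounts, 1000)

print(f"Predicted sale price: {predicted_price:.0f}")
print(f"Predicted class: {label}")
```

In both cases the model's output is a label attached to a new data point: a number for regression, a class name for classification. Real fraud models use many features and far more careful algorithms; the single-feature classifier here only illustrates the idea.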
     
    This episode is sponsored by: Vertica.com 
     

    • 15 min
    Cloudy with a Chance of Data Analytics

    Based on one of my presentations, this episode provides a five-part vendor-neutral framework for evaluating the critical capabilities of a cloud data analytics solution: Deploy, Store, Optimize, Analyze, Govern. 
     
    This episode is sponsored by: Vertica.com
     

    • 27 min
    Big Data Quality, Then and Now

    A decade ago, just before the data science hype cycle began, there was the big data hype cycle. At that time I had the privilege of sitting down with Dr. Thomas C. Redman (aka the “Data Doc”), a Ph.D. statistician.
    We discussed whether data quality matters less in larger data sets, if statistical outliers represent business insights or data quality issues, statistical sampling errors versus measurement calibration errors, mistaking signal for noise (i.e., good data for bad data), and whether or not the principles and practices of true “data scientists” will truly be embraced by an organization’s business leaders.
    This episode is an edited and slightly shortened version of that discussion. Even though it is from ten years ago, I think it still provides good insight into big data quality, then and now.
     

    • 29 min
    Three Questions for Data Analytics

    Before you get started on any data analytics effort, you need to have at least preliminary answers to three questions: (1) What problem are we trying to solve? (2) What data can we apply to that problem? (3) What analytical techniques can we apply to that data?
     
    This episode is sponsored by: Vertica.com
     

    • 12 min
    Machine Learning on Opening Day

    In time for opening day of the 2022 Major League Baseball (MLB) season, I discuss the initial results of my Baseball Data Analysis Challenge.   
    See the extended show notes for links to my input data, my results as a Microsoft Excel file, and my SQL scripts on GitHub.   
    I used logistic regression machine learning classification models to calculate win probabilities for the Boston Red Sox across nine (9) game metrics, and a Naïve Bayes machine learning classification model to predict individual game wins and losses with an associated probability.  
    Think you can best my model? Game on! The baseball data analysis challenge continues. Play ball!
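For readers who want a feel for the two model types mentioned above, here is a minimal Python sketch. It is not the SQL from the episode's GitHub scripts: the game data is made up, the metrics (runs scored, scored first, venue) are assumptions chosen for illustration, and all function names are hypothetical. The first part fits a one-metric logistic regression by gradient descent to get a win probability; the second part is a categorical Naïve Bayes classifier with add-one smoothing that returns a win/loss prediction with an associated probability.

```python
import math
from collections import Counter, defaultdict

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=10000):
    """Fit P(win) = sigmoid(w * x + b) by batch gradient descent."""
    w, b, n = 0.0, 0.0, len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

def fit_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict_naive_bayes(model, row):
    """Return normalized class probabilities, using add-one smoothing."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total
        for i, value in enumerate(row):
            score *= (feature_counts[(label, i)][value] + 1) / (count + len(values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: score / norm for label, score in scores.items()}

# Made-up games: runs scored and whether the game was won (1) or lost (0).
runs = [1, 2, 2, 3, 4, 5, 6, 7]
won = [0, 0, 1, 0, 1, 1, 1, 1]
w, b = fit_logistic(runs, won)
p_low, p_high = sigmoid(w * 1 + b), sigmoid(w * 7 + b)

# Made-up games: (scored first?, venue) with win/loss outcomes.
games = [("yes", "home"), ("yes", "home"), ("yes", "away"), ("no", "home"),
         ("no", "away"), ("no", "away"), ("yes", "away"), ("no", "home")]
results = ["W", "W", "W", "L", "L", "L", "L", "W"]
model = fit_naive_bayes(games, results)
probs = predict_naive_bayes(model, ("yes", "home"))

print(f"P(win | 1 run) = {p_low:.2f}, P(win | 7 runs) = {p_high:.2f}")
print(f"Naive Bayes for (scored first, home): {probs}")
```

The logistic model gives a smooth win probability as one metric changes, while Naïve Bayes combines several categorical metrics under the simplifying assumption that they are independent given the outcome.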
     
    Extended Show Notes: ocdqblog.com/dbp
     

    • 9 min
