The Data Canteen: Episode 07
Dr. Keith Allen: VDSML Member Profile
Show Notes
In this episode, I'm joined by Dr. Keith Allen to trace his journey from U.S. Army Infantry officer to data science consultant! Dr. Allen and I also chat in-depth about his passion for sports analytics, and how that drove him to focus his PhD dissertation on the question "Should we go for it on fourth down?"
FEATURED GUEST:
Name: Dr. Keith Allen
Email: getallenkl@gmail.com
LinkedIn: https://www.linkedin.com/in/rainbowmountaindatascience/
About: Dr. Keith Allen is a Specialist Master Senior Data Scientist for Deloitte in Huntsville, AL. He has over 20 years of experience in simulation modeling, quantitative research, multivariate statistics, and test and evaluation. Dr. Allen began his career as an Infantry officer in the U.S. Army and then, as an Army civilian, became a senior test officer and team lead for developmental flight test projects. In recent years, Dr. Allen has worked as a technical consultant serving DoD, NASA, and other public and private entities as a senior data scientist focused on stochastic modeling, predictive analytics, and mathematical optimization. Dr. Allen holds bachelor's and master's degrees in engineering fields, and he completed his PhD in data science in 2021. He also holds several professional certifications in project management, modeling and simulation, and test and evaluation.
SUPPORT THE DATA CANTEEN (LIKE PBS, WE'RE LISTENER SUPPORTED!):
Donate: https://vetsindatascience.com/support-join
LEVEL-UP YOUR VDSML MEMBERSHIP:
Subscribe to a Premium-level Membership: https://vetsindatascience.com/support-join
EPISODE LINKS:
Dr. Keith Allen's Dissertation Website: https://www.bridgetheparadox.org/
Northcentral University's PhD in Data Science: https://www.ncu.edu/programs-degrees/doctoral/doctor-philosophy-data-science
Karl Popper (author): https://plato.stanford.edu/entries/popper/
Thomas Kuhn (author): https://plato.stanford.edu/entries/thomas-kuhn/
Nvidia Deep Learning Institute: https://www.nvidia.com/en-us/training/
INFORMS Certified Analytics Professional (CAP) Certification: https://bit.ly/2MIOxqP
PODCAST INFO:
Host: Ted Hallum
Website: https://vetsindatascience.com/thedatacanteen
Apple Podcasts: https://podcasts.apple.com/us/podcast/the-data-canteen/id1551751086
YouTube: https://www.youtube.com/channel/UCaNx9aLFRy1h9P22hd8ZPyw
Stitcher: https://www.stitcher.com/show/the-data-canteen
CONTACT THE DATA CANTEEN:
Voicemail: https://www.speakpipe.com/datacanteen
VETERANS IN DATA SCIENCE AND MACHINE LEARNING:
Website: https://vetsindatascience.com/
Join the Community: https://vetsindatascience.com/support-join
About: Veterans in Data Science and Machine Learning (VDSML) is a valuable and supportive community for current and aspiring data professionals with U.S. Military experience. Our aim is to further the careers of fellow Veterans through intentional sharing of collective knowledge and experience.
OUTLINE:
00:00:07 - Introduction
00:01:28 - Where Dr. Allen earned his PhD in data science
00:06:34 - Journey from the U.S. Army to data science
00:39:01 - Dr. Allen's data science PhD dissertation
01:05:14 - Dr. Allen answers the question "Who should pursue a PhD in data science?"
01:08:54 - Insights from the world of data science consulting
01:26:10 - Dr. Allen's next 50 and 100 meter data science learning targets
01:27:56 - Two excellent authors for your data science reading list
01:30:09 - The best ways to contact Dr. Allen
Transcript
DISCLAIMER: This is a direct, machine-generated transcript of the podcast audio and may not be grammatically correct.
Ted Hallum: [00:00:07] Welcome to episode seven of The Data Canteen, a podcast focused on the care and feeding of data scientists and machine learning engineers who share in the common bond of U.S. military service. I'm your host, Ted Hallum. Today I'm joined by Dr. Keith Allen. Keith is a fellow member of the Veterans in Data Science and Machine Learning community, a recent recipient of a PhD in Data Science from Northcentral University, and a Specialist Master Data Scientist at Deloitte.
Today, Keith and I talk about his experience earning his doctorate in data science, his path from the U.S. Army to systems engineering and where he is now in data science consulting.
Finally, Keith recommends technologies, courses, and authors to enrich your data science journey. I hope you enjoy the conversation as much as I did.
Dr. Keith Allen, welcome to The Data Canteen. Thanks so much for coming on the show.
Keith Allen: [00:00:56] Thank you very much for the opportunity. I appreciate it.
Ted Hallum: [00:00:59] So, Keith, from what you and I have talked about, you have just finished up your PhD in data science. Is that right?
Keith Allen: [00:01:06] That's correct, yes. I successfully defended my PhD dissertation about a month ago.
Ted Hallum: [00:01:13] Fantastic. Well, first of all, congratulations, because I know that is a tremendous amount of work. There are very few things that parallel the amount of effort that it takes to accomplish that.
So, hats off. Now, where did you go to get your PhD?
Keith Allen: [00:01:28] I went to Northcentral University, NCU. It's actually a fully online university, one of only a few in the country, I think, although that type of academic system is of course getting more popular these days.
NCU's administrative headquarters is in San Diego, and they also have offices in Scottsdale, Arizona. I think they have something like 10,000 students worldwide, mostly in the United States, but they do have international students as well, which is obviously a function of being an online university.
I think they have three distinct colleges, including the School of Technology, which is the school I was in. As I understand it, there are bachelor's programs, but most of the programs there are graduate programs, so the students are mostly master's and PhD students.
One of the reasons I chose a school like NCU, and we'll get into my background and what led me to NCU, is that the advantage of a school like that is for working professionals like myself who have a full-time job and a family. You don't have to be physically at a location, like at a typical brick-and-mortar school.
The entire program is designed around somebody who needs that flexibility in time, like somebody who works full-time and can say, "Okay, I have two hours tonight, so let's do school." So that's the advantage of a system like that.
Ted Hallum: [00:03:11] So I guess the program was set up to be done asynchronously from anywhere.
Keith Allen: [00:03:15] Yes. I basically earned my PhD on my laptop. Obviously, as a PhD student you're doing research: you've got to do a dissertation, you've got to collect data, and you've got to have the resources available, like webcams. But everything you do can be done from your computer now.
And I really think, looking forward, this is the future of academia. There will still be a call for brick-and-mortar institutions. A lot of good things happen at major universities, because the major universities have the facilities to do a lot of research.
For what I do, as somebody who collects and analyzes data and builds models, my office could really be anywhere. Maybe part of the process of capturing and recording the data would happen at a location, but most of your research is going to be done sitting at your computer, so you can do that anywhere.
So that's the advantage of doing that.
Ted Hallum: [00:04:25] Now, I'm curious: when you did this program, were you able to leverage an education benefit that you received in the military?
Keith Allen: [00:04:32] Absolutely. The VA paid for 60% of my tuition.
Knowing that I still had those benefits, I said, well, I might as well use the benefits I earned a long time ago that were still on the record. I always tell people this is my third degree. Let's just say the Army, or the DoD in general, including the VA, has really paid for almost all of my formal education.
There was some funding that I had to apply myself, but the Army paid for my master's and for a good portion of my bachelor's. So when you talk about veterans benefits, that's something I'm passionate about when I talk to kids about the military.
I would never push the military on anybody. It's definitely not for everybody. But I have spent a lot of time mentoring young people, especially when I was living in Arizona and now here in Huntsville, Alabama, where I live now.
Sometimes the idea of going into the military comes up, and that's one of the advantages I tell them about: look, even if you just do a four-year commitment, look at the advantages you get with the GI Bill or VA benefits or whatever is afforded to you. So you've got to think about that.
Ted Hallum: [00:06:00] Like you said, it might only be four years, but then you have benefits that last decades.
Yeah, absolutely. Now I'm curious. I think you're the first person I've met who actually has their doctorate in data science proper. So I'd love for you to take us on that whole journey, and to start at the beginning: how did you decide this was something you needed to do?
What was the spark that said, I need to do a PhD in data science?
Keith Allen: [00:06:34] Yeah. It really started, I don't know, about 10 or 12 years ago. As I told you before we started talking, it's a very intertwined, very lengthy process that took me more than a decade to get to where I am now.
I originally left the military in the mid-2000s, about 2005. My first job out of the military was testing parachutes and aircraft systems at the U.S. Army Yuma Proving Ground. If I bend a certain way, you can probably see all of the airdrop stuff behind me.
That's an ode to a major part of my career as an Army civilian doing test and evaluation: developmental testing of aerospace systems, parachutes, aircraft technology, sensor systems, things like that. And that was an engineering job.
My bachelor's was in civil engineering. Having the bachelor's combined with my military experience as an Army officer set me up for success coming out of the military. And so I got this fantastic job in Arizona doing developmental testing of parachutes and aircraft systems.
Our job was to measure the flight performance and the reliability of these systems. We were part of the DoD, and most of the work we did was DoD. We did some aerospace and space work with NASA and some other private industry customers, but I'd say about 95% of it was DoD.
My job as a test officer and team lead was to design experiments, go execute flight experiments, and collect data to measure something, or some things, about these different systems. Ultimately, my job was to model that data and then tell the client, who was typically a senior government executive or a general officer, somebody like that, whether their system did or did not meet the specifications.
We were also involved in the requirements development process, because as testers, we know there are ways to write requirements that make it easier or clearer for somebody like myself, as a data scientist or engineer, to actually design an experiment around them.
And then to have a metric for what the system is supposed to do. That's very important, and we definitely need to talk about it later in this conversation, because it's very important for young data scientists to understand. So my work at Yuma Proving Ground, testing parachutes and aircraft, was very statistics-heavy.
I started to take short courses, sponsored by the Army through my work, to improve my statistics skills. One week I ended up at a course taught by a gentleman who operates out of Phoenix, and he was instrumental.
He was one of the core industrial quality control and reliability engineers for Motorola, and he developed this whole course on statistical quality control. It's all industrial engineering, and it's very statistics-heavy. I took this course around 2010 or so, and it just clicked.
It was one of those moments in your life where you just go, oh, I need to focus on this, on the data science and the statistics. I don't know why; it just made a lot of sense to me. I thought it was highly useful not only for what I did back then, but I knew it would be useful if I stayed in the technical community, in engineering and data science.
So I took that class, and in the years after that, I really started gearing my career toward it. It takes time. You can't just jump off the cliff and go, ah, I'm going to go do this other thing. You have to get education and experience in it, and you really have to have a 10-year or 10-plus-year plan.
My 10-plus-year plan was: I'm going to go back and get a master's, focused on modeling and simulation and statistical quality control. Then I'm going to get some certifications in modeling and simulation and project management, things like that. And eventually we'll see where that can take me as far as a data science career.
I'm getting to the PhD here in a minute. I'm sorry, this is a long story.
Ted Hallum: [00:11:45] No, context is good. You have to set the stage, and everybody will appreciate the story so much better.
Keith Allen: [00:11:51] Yeah. It took me time to navigate through the engineering field and figure out which side of it I really wanted to be on, which is the data analytics and data science side. So now it's 2015, 2016. In 2015 I finished my master's at the University of Arizona, then I took about a year off. I did really well there.
I had a couple of professors there I worked with during my master's program and had a really good relationship with. Their research agendas were very similar to what I was interested in, and actually to what I was doing at YPG. So I thought, well, if this is what I really want to do for the rest of my career, then I ought to get a doctorate in it, and maybe it'll afford me a lot of other opportunities going forward.
So I started my PhD journey at the University of Arizona in the same department, the Department of Systems and Industrial Engineering. I got through the first year, and the PhD program in that particular department was very different from the master's.
When you're looking at a graduate program, and you could say this for a master's as well, it's really important on the PhD side to find a program that fits how you learn, what you want to learn, and what you want to focus on.
That covers everything from the coursework to the dissertation to the opportunity to work with professors on the dissertation research you want to do. Because what happens a lot at brick-and-mortar schools, and you hear horror stories about this all the time, is that a young grad student comes in and starts working with an advisor, a chair, a faculty member who's on a tenure track or is tenured.
And they end up doing that advisor's research. The advice I was given a long time ago, when I started this process in the 2015, 2016 timeframe, was: do what you love. Find the program that works for you, for your life, the way you learn, and what you want to get out of it.
And as far as dissertation research goes, do what you love and find faculty who will support you in doing what you love, because you're going to spend a lot of time, and a lot of your money or somebody else's money, doing this. So it ought to be something that is valuable for your career and something you'll be motivated to come back to year after year.
So I thought about that. As I got through my first year at Arizona, I realized it was the wrong department. I was talking to my advisor back then, Dr. David Gross, who's now at Florida State and is actually going to be a colleague of mine. He and a couple of other faculty gave me the rundown and basically said, look, you may have a hard time here, because what you're interested in is not necessarily what the department is interested in. It's not that I couldn't do it; it's just that it might not be a good fit. So after a very hard lesson, I pulled out of that program.
I took about six to eight months off. I still wanted to get the doctorate, and the reason for getting it was two things. It's not only to become an expert and a leader in my field, to teach, and to help other people grow.
It's also because it was a challenge. My philosophy, the way I live my life, is that as you get older, you have to continuously test yourself, whatever that means to you. And I always advise young people: don't give up. Okay, great, you just graduated.
What's next? You don't want to get complacent. If you stop challenging yourself, and I'm not saying every day, but periodically through your life, if you don't have these mountains to climb, you just start regressing. You just start dying.
You start getting lazier, and bad stuff happens when people aren't engaging their minds and their bodies. So it's about maintaining that healthy challenge for yourself. That's really the reason for it. So I pulled out of the program at Arizona and reassessed everything.
Some things happened personally and professionally around the 2017 timeframe, and I decided to leave the government and move to Huntsville, Alabama, but I still had this desire to do the doctorate. And I said, well, I can't do it at Arizona, because most of these brick-and-mortar schools have residency requirements.
I wanted to find something flexible for me as a working individual with a family. So I looked around, and I found NCU, and it was a blessing in disguise, but it came out of a big failure. Or maybe not a failure; I look at it as a good test: okay, do you really want this?
And, oh, by the way, are you willing to take a step back and say, maybe I'm in the wrong program? I can do this, I will do this, but maybe in a different way. That was a big moment in my career. It happened three or four years ago, and it was a very important struggle in my life that got me to where I am today.
Ted Hallum: [00:18:28] I definitely see it as a success in the sense that you were undaunted, because anybody who's serious about jumping into this career of data science and machine learning is going to face challenges. Your challenge may be a doctorate. It may not be a doctorate. Your challenge might be an eight-month bootcamp.
Your challenge might be an 18-month or 24-month master's degree, but there's going to come a point where, if you're going to succeed in data science and machine learning, it's going to require some serious grit, and you're going to come up against adversity. There are going to be points where it's three in the morning and you want to quit, or maybe you don't get to sleep at all that night, whatever the case may be.
And you have to be the sort of person who says it doesn't matter, I'm going to do this, I'm not going to let anybody stop me. Even if you come up against an organizational challenge: okay, I'm going to move that organization out of the way. I want to find an organization that's a better fit for me, and I'm going to move on.
So I think that's a tremendous template for our audience to key into and try to follow, because if you think you won't face comparable challenges in your own data science walk, you just haven't been walking the path long enough. It's going to happen.
Keith Allen: [00:19:48] Yeah, absolutely. To be fair, when I look back at the lessons for me, and I talk about this a lot because it's important, the lesson is that I was naive. I had done so well in my master's, in that department, with that faculty, that I naively thought, oh, that trend will just continue into the PhD program.
That made me blind to it. I didn't take the time to research what they were asking for, and I failed at that. That was a big lesson learned for me: investigate what you're getting into. Talk to as many people as you can who are in it, or who came from it, before you invest a significant amount of time in something very important. It was a big failure, but it's a big lesson that I'm able to take forward now.
Ted Hallum: [00:20:51] Absolutely. And I think there are parallels to that for people who are looking for a job. You look through a job posting, and yes, it's true, there are a lot of companies that don't know exactly what they want out of a data scientist or machine learning engineer. There are companies that are notorious for asking for an entire team's worth of skills in one person.
But if you look through a job posting, there will usually be things in there that key you in to some fundamental nature of the position. Things like "a successful candidate in this position thrives working independently." Well, you can't just gloss over that.
They've basically just told you there's no data science team and you're going to be expected to be a lone-wolf data scientist. So you can't be surprised if you get the job and find out you're all alone; it was in there. When we're reading these things, we have to make sure we're not reading what we want to see, but what's actually on the page or on the screen. I think a lot of times we want the job so badly, or the PhD program or the master's degree or whatever the case may be, that we almost become blind to what's in black and white right in front of us.
Keith Allen: [00:22:15] Absolutely, a hundred percent. And it's important too, because there are a lot of misconceptions about what a PhD is. A lot of people think of PhDs as brainiacs who can write a bunch of equations and figure all this stuff out. I've been around a lot of PhDs in different fields.
You could probably do a study about different fields and their different dispositions. But at the end of the day, a PhD is a Doctor of Philosophy. So you're not just expected to understand the technical part of it; you're also expected to be able to research it.
You're expected to be able to read other people's work and critically analyze it, and you're expected to participate in furthering and developing the theory. I always tell people it's probably 60% philosophy and 40% all the other stuff, like modeling and hardcore math and statistics, at least for me in my role and what I do.
Like you said, you look at a program description, just like a job description, and go, okay, these are the expectations, this is the type of coursework I'm going to be involved in. Knowing the professors in your prospective department, if you can, is a big deal as well.
It's very important, especially when you get to the dissertation phase, to have committee members who support you and who share your field of interest. It cannot be overstated how important the committee is. You could say the same thing for the thesis committee on the master's side, but the dissertation committee in grad school can make you or break you.
I've heard great stories, and I had a great experience, but I also have friends who had horror-story experiences just because a particular member of the committee gave them a hard time, and it made the process very painful for them. So it's very important.
Ted Hallum: [00:24:49] Well, of course, in a second we're going to dive into your dissertation topic. But before we do that, I know that before you actually do your dissertation in a PhD program, you have to take at least some courses. So I'd love to hear: what were your favorite courses in this PhD program?
Keith Allen: [00:25:05] Anything that involved multivariate statistics. I enjoy it. My program was 15 courses, all 500- and 600-level graduate courses. About a third of those, for the program I was in, were statistics: multivariate statistics, design of experiments, those types of things.
So anything that involved multivariate statistics or research. I also enjoyed quantitative research, the philosophy side of it, because that's very important as well. Like I said, if you want to be considered a data science expert, it's not good enough just to be good at coding or running neural networks, sitting at the computer crunching data all day. You're also expected to be able to talk about the theory, and you have to be able to explain and defend why you did what you did. Whenever a younger engineer or data scientist shows me a cool model, unless they explicitly address it, one of the first things I ask, before we actually look at what's coming out of the model, is: what analysis or what evidence drove you to that approach, to that type of model?
Why did you develop it? What was the research gap or the problem, whatever the research question is? Take me through your process. I would say the most important part of any study is what's in chapter one, because if you can't develop and articulate a well-written problem statement, research questions, and hypotheses, then everything you do on the modeling side will be at best a guess, and at a minimum it won't be defensible. So I'd rather spend time at the beginning of a project making sure we have good metrics, a good problem statement, et cetera, because everything else flows from that.
Ted Hallum: [00:27:30] When you go from an approach, to a good approach, to the best approach, there are orders of magnitude of difference between each of those. And I think a lot of times people, especially in their first data science position, will select an approach because they had an old homework assignment that used it, and it's a crutch they fall back on.
But you make a great point: just because you have a template you can follow and make work in no way justifies it as a good approach, and certainly not as the best approach for the problem your company or client has at hand. You do your client or your company a huge disservice if you don't dive into the problem and do the legwork you're talking about, to actually think through whether the approach you're taking is even logical for the data and the problem you're trying to solve.
Keith Allen: [00:28:31] Yeah, absolutely. There are all kinds of statistical, AI, and machine learning models to choose from, and you don't need a graduate degree to understand that.
What's often more important, when you have a set of models you're considering for a problem, is to understand the limitations of those models and what they were originally developed for. I almost think that's the more important thing, because if you understand all the limitations and you understand what your problem is, then you're naturally going to be driven to the most effective model. And maybe you try a few. There's nothing wrong with saying, hey, we have three approaches.
Let's run all three of them in parallel and quantify the outputs.
Ted Hallum: [00:29:33] Yeah, absolutely. And especially with the classic statistical learning techniques, if you're looking at a particular technique that requires variables with a Gaussian distribution, and your data is not normally distributed, and none of the power transforms you try get you there, then you're at a dead end.
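Ted's point about power transforms can be made concrete with a small example. The sketch below is a minimal, standard-library-only illustration, not a method either speaker described using: it picks a Box-Cox exponent (lambda) by a grid search over the profile log-likelihood and compares sample skewness before and after the transform. The function names, the lambda grid of -2.0 to 2.0, and the simulated log-normal data are all assumptions chosen for illustration; in practice, a library routine such as SciPy's `scipy.stats.boxcox` fits lambda with a proper optimizer.

```python
import math
import random


def box_cox(xs, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if lam == 0:
        return [math.log(x) for x in xs]  # the lambda = 0 case is the log transform
    return [(x ** lam - 1.0) / lam for x in xs]


def skewness(xs):
    """Sample skewness from population moments; near 0 for a symmetric sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def box_cox_loglik(xs, lam):
    """Profile log-likelihood of lambda, assuming the transformed data are normal."""
    ys = box_cox(xs, lam)
    n = len(ys)
    mean = sum(ys) / n
    var = sum((y - mean) ** 2 for y in ys) / n
    # -n/2 * log(sigma^2) plus the Jacobian term (lambda - 1) * sum(log x)
    return -0.5 * n * math.log(var) + (lam - 1.0) * sum(math.log(x) for x in xs)


def best_lambda(xs, grid=None):
    """Hypothetical helper: grid-search lambda (default -2.0..2.0 in steps of 0.1)."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: box_cox_loglik(xs, lam))


if __name__ == "__main__":
    random.seed(7)
    # Simulated right-skewed, strictly positive data (an assumption for the demo).
    data = [random.lognormvariate(0.0, 1.0) for _ in range(2000)]
    lam = best_lambda(data)
    print("chosen lambda:", lam)
    print("skewness before:", round(skewness(data), 2))
    print("skewness after:", round(skewness(box_cox(data, lam)), 2))
```

Box-Cox only accepts strictly positive data; the Yeo-Johnson variant is a common choice when zeros or negatives are present. A power transform can reduce skew, but it will not, in general, turn something like strongly bimodal data into a bell curve, which is the kind of dead end Ted describes.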
Keith Allen: [00:29:56] And that's why I focus on the process. I'm lucky, because I came from the systems engineering world, where almost everything is process-driven. Coming from systems engineering really opened my mind to the process of going from a need to a problem statement, metrics, research questions, methodology, et cetera, so that when you get to the end, you have a model you can defend, and you can see how it traces all the way back to your need and your problem statement.
So that's very important. I think one of my biggest recommendations for data scientists is to take some systems engineering classes, in particular a systems engineering process class. In fact, I'm working with Dr. Gross down at Florida State, and we're going to develop roughly two graduate courses that I'll teach, probably in another year. They'll be about data analytics, but they'll heavily leverage the systems engineering process and life cycle.
We're going to take students through the systems engineering process life cycle, and then we're going to interject: okay, this is how you apply data science to address this part of the problem.
Ted Hallum: [00:31:30] where will those courses be offered?
Keith Allen: [00:31:32] So down in Panama city of all places there is a, Florida state university and Florida, a and M university have a cooperative program down there. And my former advisor, Dr. David Gross, and his colleague are they, are, they're basically growing this systems engineering program down there.
And they invited me to come down and do some guest lectures, and at least virtually now, I do some guest lectures. The long-term plan is actually to have me be an adjunct and to develop two courses, actually a two-semester-long graduate-level course, that addresses data analytics in the systems engineering life cycle.
So we're developing that course now. I'll let you know, in academia we have to go through a lot; it's an arduous process to get your syllabus and your content approved and all that. So we're starting that now. It probably won't be ready by this coming year. What we're looking at doing is a pilot next summer with a summer course.
Then we'll make some improvements and start the full-on course in the fall. So
Ted Hallum: [00:32:58] I think that concept of melding systems engineering with data science is fantastic. So once you're in the thick of it with those courses, we might have to have you back on so you can tell us how it's done.
Keith Allen: [00:33:07] I'd love to. Yeah. Yeah.
Ted Hallum: [00:33:09] Yeah. I know a lot of times when we go into a new experience, we have all these expectations about what we think that experience is going to be and what it's going to entail. And then oftentimes, once we're in it, or once we're looking at it in the rearview mirror, we think, wow, that was actually quite a bit different than what I had expected going into it.
As far as the doctorate as a whole and the dissertation experience, did the reality of them play out in line with your expectations? Or when you think back on it, are there certain things that stand out as, wow, that's actually pretty different than what I thought going in?
Keith Allen: [00:33:53] Of course, it's not going to be exactly what you expected. I actually had a better experience than I expected. I thought getting through the coursework would be okay; it's just coursework, and I have a strong background in most of these topics.
So I didn't really think the coursework was going to be that challenging. I thought the candidacy process was going to be a little bit different than it was. Different schools do it differently, but at NCU, once you finish your coursework, you have another, quote unquote, course, which is basically your candidacy process.
A big part of that is you work with your dissertation chair and you write what's called a prospectus. It's about a 15-page paper that explains your entire study, or what you're going to do for your study. It's sort of like an outline; much of it is what's going to get captured in chapter one anyway: a problem statement, a little bit of background research, et cetera.
And that actually went better than I thought. I thought I was going to have some challenges with it, because it's a lot of writing. It's a lot about how you present arguments and how you write them out. But NCU gives a lot of really good resources for that type of thing, and I just paid attention to the information that was available at the university.
I looked at other dissertations that were done in the same field as mine. They're not on the same topic, but obviously they give you a good template for what the expectation is. And it really ended up being a better experience than I thought, because I had so many resources and, probably just as importantly, I had a dissertation chair who was very supportive of what I wanted to do.
And I can't emphasize it enough. I know we mentioned it before, but it's very important that your committee, especially your chair, be somebody who is supportive both of you and of what you want to do for your research. So yeah, I would say, all in all, there weren't really any negative surprises. I just followed the process and used the tools that were around me to make it better.
Ted Hallum: [00:36:32] One of the mantras on this podcast comes up in almost every episode, and I do it on purpose, because it's one of those little common-sense things that I think often gets overlooked: go where you're celebrated, not where you're tolerated. Because where you're celebrated is where you're going to have incredible achievement, you're going to excel, and people are going to see value in what you do.
And it sounds like that's exactly what happened with you at Northcentral University: the committee and the chair that you had celebrated the direction that you wanted to take your PhD.
Keith Allen: [00:37:10] Yes. A hundred percent. I wouldn't have gotten through that program, and I wouldn't be sitting here talking to you, if that weren't true.
Yeah. So that's a credit to the university, but it's also a credit to my committee and their support.
Ted Hallum: [00:37:28] Well, at least in my life, I've found that always rings true. So whether it's an educational opportunity or a potential employer, I always try to feel out the environment and the culture, be honest about where I am and where I want my career, my education, and those things to go, and make sure that those things are in line.
Because as long as they're complementary, usually that relationship is going to be good. But you have to be honest with yourself. It's kind of like when you're looking at a job posting and you want so badly for that job to be a good fit, but just because you want it doesn't mean that it will be.
Keith Allen: [00:38:11] Yes, sir. Yeah. The way I've heard it said, and I use this: I had a professor at Arizona, he's retired now, and the way he said it, from a statistical perspective, was that you've got to stay within your confidence interval. As you grow and progress in life, both personally and professionally, that confidence interval grows and shrinks, but generally speaking, you need to be somewhere in there.
And so I've always thought about it like that. You ask yourself sometimes whether you feel comfortable. Like I said about the other PhD experience at Arizona: I was out of my confidence interval, and I had to get back in.
Ted Hallum: [00:38:55] Yeah. I love that. That's a very appropriate analogy for this podcast and our audience.
So I appreciate that. Well, without further ado, I want to talk about your dissertation topic because it's very cool and you have a nice website and everything, which we'll try to get pulled up here. And I think our audience is going to love to see the work that you did.
Keith Allen: [00:39:16] Okay, great.
Ted Hallum: [00:39:19] Okay. So I'll go ahead and let you introduce it. Once you've covered what your research accomplished, and since it's so awesome that you've linked the actual model here on the website, we'll even run a scenario so people can see how your research plays out in real life.
Keith Allen: [00:39:41] Okay. Yeah. Fantastic. So, first of all, the website that's being shown here is bridgetheparadox.org. It's open to the public, and there are several pages on the website. The page you're looking at here is the intro page, the main page.
And so there's an infographic about some of the results of the study. There's a description down below, a paragraph or two about the study, and then there are some links here for contacting me by email and on LinkedIn. We'll talk about the model in a minute, but just in general:
So, I'm a huge football fan, both college and professional football. Once I got serious about doing a PhD, I wanted to do something in sports analytics for my dissertation. This is a quantitative study where we look at risk decision-making under uncertainty on the part of NFL coaches. What we did is we took the last 10 years' worth of NFL data.
Luckily, in the NFL, just like many other sports leagues, there's a lot of data out there and it's all publicly available, so there's usually not a big problem in sourcing that data. And so, as an example of risk decision-making under uncertainty, I looked at the decision to go for it on fourth down.
And if you're not that familiar with American football: if you have the ball, if you're on offense, you have a total of four attempts to make 10 yards. If you don't make those 10 yards, then the other team gets possession of the ball at the spot of that failed drive or failed play.
In NFL football, when coaches face fourth down, historically they typically opt to punt, or to kick a field goal if they're in field goal range, opting for a more risk-averse strategy as opposed to going for it. So I asked several questions. First, we looked from a quantitative perspective at what the significant predictor variables of successful or unsuccessful fourth-down conversion are.
What are the historical rates in the NFL of successfully converting? I developed a model based on multiple logistic regression and Monte Carlo simulation that computes the probabilities given certain predictor variables. We showed that those variables have a significant effect on fourth-down success, and we can take them and run a logistic regression and a Monte Carlo simulation that predicts the probability of successfully converting that fourth down for any play on the field. Then, on top of that, I applied discrete-time Markov chain theory.
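As a rough illustration of the logistic-regression-plus-Monte-Carlo idea, here is a sketch using the four predictors named later in the episode (time of the game, yard line, play type, yards to gain). The coefficients, the variable encodings, and the function names are all hypothetical; they are not the dissertation's fitted model, so the outputs won't match the calculator's 66% and 12%. The episode also doesn't say exactly what is randomized in the 10,000 runs; here we simply simulate repeated attempts.

```python
import math
import random

# HYPOTHETICAL coefficients for illustration only. They are NOT the fitted
# values from the dissertation; only their signs follow the findings described
# in the episode (longer yardage, later game time, and more advanced field
# position lower the odds; a pass raises them relative to a run).
COEF = {
    "intercept": 1.5,
    "minutes_elapsed": -0.005,
    "yards_from_own_goal": -0.004,
    "is_pass": 0.6,
    "yards_to_gain": -0.3,
}


def conversion_probability(minutes_elapsed, yards_from_own_goal, is_pass, yards_to_gain):
    """Logistic model: P(convert) = 1 / (1 + exp(-z)), z linear in the predictors."""
    z = (
        COEF["intercept"]
        + COEF["minutes_elapsed"] * minutes_elapsed
        + COEF["yards_from_own_goal"] * yards_from_own_goal
        + COEF["is_pass"] * (1 if is_pass else 0)
        + COEF["yards_to_gain"] * yards_to_gain
    )
    return 1.0 / (1.0 + math.exp(-z))


def simulate_attempts(p, trials=10_000, seed=0):
    """Monte Carlo: simulate `trials` independent attempts, return the success rate."""
    rng = random.Random(seed)
    return sum(rng.random() < p for _ in range(trials)) / trials


# Fourth and 3 from our own 25, second quarter with 8:15 left (21.75 minutes elapsed).
p_pass = conversion_probability(21.75, 25, True, 3)
p_run = conversion_probability(21.75, 25, False, 3)
print(f"pass: {p_pass:.3f} (simulated {simulate_attempts(p_pass):.3f})")
print(f"run:  {p_run:.3f} (simulated {simulate_attempts(p_run):.3f})")
```

With real data, the coefficients would come from fitting the regression to the play-by-play records rather than being typed in by hand.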
So if you're familiar with Markov chains, there are two schools of thought: there's the discrete case and there's the continuous case, and they get used for different applications. I looked at discrete-time Markov chains to add onto the model, where you compute the probabilities using the multiple logistic regression and the Monte Carlo.
And then you apply a discrete-time Markov chain. It's actually a problem called the gambler's ruin problem. The gambler's ruin problem is a special case of a simple random walk. Essentially, think about a gambling game: let's say you're going to go to Vegas and play cards or slots or whatever gambling game.
There's a probability that you're going to win and a probability that you're going to lose. In a simple gambling game, the gambler's ruin problem allows you to predict the point at which the gambler goes broke, or when we think we're going to go broke. Right? So every time I throw money down to stay in a game, the purse gets bigger, but my probability of continuously winning changes.
And so at some point I'm going to go broke. So the Markov chain part of this is calculating the point where, on a long enough timeline, the coach, in a sequence of plays, a sequence of gambles on the football field, would actually go broke. The other way you could think about it is: what's the furthest we could get in terms of field position, to try to get into scoring position, without turning the ball over? That'd be another way to think about it.
So we do all that. The infographic is probably the easiest way to look at the results. If you're really interested in reading my dissertation, there's a copy on this website, or you could look at the slide deck as well, which is what I presented during the defense.
But if you look at the infographic, starting from the upper left there, the pie chart: there were 15 different variables that we looked at at the outset. I wanted to build a model that was as simple as possible, because there are all kinds of confounding variables.
There are all kinds of variables out there you could look at as having some kind of influence on the success of a particular play. So I used principal components analysis to look at all the possible predictor variables and reduce the number of variables in my model to four main predictor variables that have 86% of the effect on the outcome variable, which is successful conversion of fourth down.
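For readers new to PCA, here is a sketch of how the "share of the effect" captured by principal components is usually measured: each component's eigenvalue of the covariance matrix divided by the total variance (the trace). It's a standard-library illustration with power iteration and deflation on hypothetical data; the episode doesn't detail how the dissertation went from 15 variables to four predictors. In practice you'd likely use NumPy or scikit-learn (whose `PCA` exposes `explained_variance_ratio_`), and standardize first when predictors are on different scales.

```python
import math
import random


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length observation rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]


def top_eigenpair(mat, iters=2000, seed=0):
    """Power iteration for the largest eigenvalue/eigenvector of a symmetric
    positive semidefinite matrix."""
    d = len(mat)
    rng = random.Random(seed)
    vec = [rng.random() + 0.1 for _ in range(d)]  # arbitrary positive start
    for _ in range(iters):
        nxt = [sum(mat[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the unit-length vector gives the eigenvalue.
    val = sum(vec[i] * sum(mat[i][j] * vec[j] for j in range(d)) for i in range(d))
    return val, vec


def explained_variance_ratios(rows):
    """Share of total variance captured by each principal component, largest first."""
    cov = covariance_matrix(rows)
    d = len(cov)
    total = sum(cov[i][i] for i in range(d))
    mat = [row[:] for row in cov]
    ratios = []
    for _ in range(d):
        val, vec = top_eigenpair(mat)
        ratios.append(val / total)
        # Deflate: remove the component just found so the next pass finds the next one.
        mat = [[mat[i][j] - val * vec[i] * vec[j] for j in range(d)] for i in range(d)]
    return ratios


# Hypothetical data: three predictors, two of them strongly correlated.
rows = [
    [x, 0.5 * x + 0.3 * math.sin(x), 0.1 * math.cos(3 * x)]
    for x in range(-10, 11)
]
ratios = explained_variance_ratios(rows)
print([round(r, 4) for r in ratios])
```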
So, 86%. And those four variables are time of the game; yard line, so position on the field; and play type, run or pass. The NFL classifies all plays, unless they're a punt or some kind of kick, as either a run or a pass, and they have a whole set of metrics for what constitutes a forward pass and what constitutes a run.
And then yards to gain, right? So fourth and three, fourth and six, et cetera. When we run the multiple logistic regression and apply some Monte Carlo theory to that, that allowed me to develop the mathematical relationships, the system of equations that govern the model and are its foundation.
And then we run that model repeatedly, in our case 10,000 times, as a Monte Carlo. What we find is that as the game wears on... I'm sorry, I kind of have to look at these individually. So the first thing we found, and I'll have a way to point here in a second, is this: if you look at the graphic on the top right, you see these functions here.
I didn't put the math in here because it's an infographic, but basically, as the yards to gain increases, it obviously gets more difficult: your probability of successfully converting that fourth down is a lot lower. The pass is a much more effective way to get a first down than a run play, particularly on short-yardage plays.
And what was most interesting to me, and I actually ended up rejecting one of my hypotheses based on this, is what happens as the game wears on and as field position advances. So let's say you're on offense, you're late in the game, and you're driving the ball.
Your probability of successfully converting a fourth down is actually lower. I was very surprised by that. I thought that surely, as the game wears on, the pressure would be more on the defense, and therefore it would be demonstrated, at least quantitatively, that offenses would be more successful in the second half of the game or towards the end of the game.
So that's what the model says. Now, in a PhD dissertation, you're not just delivering the results of something. You also have to apply a theory, what's called a theoretical lens, to your study. And the theoretical lens for this study is a concept called the St. Petersburg paradox.
The St. Petersburg paradox essentially says that humans, when dealing with risk under uncertainty, tend to make the more conservative choice despite the evidence, despite empirical or statistical evidence of significant gains that could be made, right? Typically, the human's going to look at that and go, too risky; I'm going to choose the less risky decision.
So in this case, that would be punting the ball on fourth down. Along with the St. Petersburg paradox, the important part of this theoretical lens is also to understand expected utility theory, which provides the mathematical framework for resolving parts of the paradox.
So if you want to address the St. Petersburg paradox mathematically and statistically, you have to invoke expected utility theory. Expected utility theory is used widely in economics, finance, and many other fields to look at the relationship between risk, utility, and the actual expected value.
So utility is best understood as usefulness, right? Your subjective view of a risk is going to be a function of probably several variables. Going back to the Vegas example: if you're gambling on a card game in Vegas, your financial situation and your life are going to be much different from some guy who's a billionaire, right?
Because he's got money to play with, and losing a couple thousand dollars at a table is nothing to him, but it's a lot to you, right? So utility is a subjective view of the usefulness of taking a risk. The expected value is just that: a computed statistical expected value from the data set.
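The gap between expected value and expected utility is easiest to see in the classic St. Petersburg game itself. In the version assumed here, a fair coin is flipped until the first head; if that happens on flip k, the payoff is 2^k. Every possible k contributes 1 to the expected value, so it grows without bound as the game's cap rises, while a player with logarithmic utility (Bernoulli's classic choice) values the game at only about 4. The log utility and ignoring the player's starting wealth are simplifying assumptions, not the utility function estimated in the dissertation.

```python
import math


def truncated_expected_value(max_flips):
    """Expected payoff of a St. Petersburg game capped at `max_flips`.

    Version used here: flip a fair coin until the first head; if it lands on
    flip k, the payoff is 2**k. Each possible k contributes (1/2**k) * 2**k = 1.
    """
    return sum((0.5 ** k) * (2 ** k) for k in range(1, max_flips + 1))


def truncated_expected_log_utility(max_flips):
    """Expected utility of the same payoffs under u(x) = ln(x)."""
    return sum((0.5 ** k) * math.log(2 ** k) for k in range(1, max_flips + 1))


for n in (10, 20, 60):
    eu = truncated_expected_log_utility(n)
    # Certainty equivalent: the sure payoff with the same utility, exp(E[ln X]).
    print(n, truncated_expected_value(n), round(eu, 6), round(math.exp(eu), 4))
```

The expected utility converges to 2 ln 2, so the certainty equivalent approaches 4 even though the expected value keeps climbing. That is the same shape of argument as coaches valuing a high-expected-value fourth-down attempt far below what the statistics say.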
And so when we look at the data, I focused on the last 10 years of NFL data minus the 2020 season. Because of the COVID situation, there were a lot of confounding variables, so I decided not to use the 2020 season and used 2010 to 2019 instead. We compare what the coaches actually did: we compute their utility function based on what they actually called during the game.
Then we look at the expected value from the logistic regression, and you get this function here on the bottom left of the infographic. It's a logarithmic function, and you can see that as the expected value, which is on the x-axis, increases, the expected utility increases very rapidly early, but then it plateaus.
Well, it doesn't really plateau; it approaches a value of one but never actually hits one, so it has a limit of one. And just from looking at that, you can basically see that there are many situations in NFL games, at least historically in the data that I looked at, where you have a high probability of actually making a first down.
And that's why I built the calculator. I wanted people to be able to put in a situation and see what the probability is. But despite many situations in games where the statistics say, hey, go for it, you've got an 80% probability, go get that first down, why are NFL coaches still risk averse? That's ultimately the question I wanted to answer.
Using the St. Petersburg paradox and expected utility theory, what we find is that it's because they don't value the risk of going for it, despite what the statistics say, and despite the fact that if they had gone for it over those 10 years, the NFL as a whole would have realized about 7,800 yards or so gained on offense, which is pretty significant.
But in reality, the coaches subjectively view that as a huge risk. In fact, they viewed that same set of plays as a loss of over 127,000 yards of field position. So at the end of the day, one of the other conclusions from the study is that there are other things these coaches are concerned with.
It's not just about the quantitative statistics. And that's what led me to my recommendations for future research, which are really chapter five of this study. My recommendations are things that I want to do in the future to better dive into why coaches are risk averse, et cetera.
And ultimately, I think some qualitative research needs to occur. Ideally, I would have done a qualitative study and then a quantitative study, to develop the theory and then address it with some quantitative variables. But I didn't want to be in school for another five years.
So I chose the quantitative route. But yeah, that's basically what it is: a model. If you want, we can go ahead and play with the model here if you click on the link button there.
Ted Hallum: [00:55:14] What values would you recommend that I put in here?
Keith Allen: [00:55:16] Well, you can, but I've got some stock ones in there. Give me a second; I'm going to enlarge my screen just so I can see the numbers here. Yeah, I just stuck some in. Well, here, we'll just go through and give everybody an orientation.
So when you go to this, and again, this is a link off of my study website, the top white portion is your user input panel. And there are little tips there. So if you click on or hover over the little "i", yep, there you go, that just gives you a very quick definition of that variable, right?
So if you follow football, a lot of these concepts, like game time and play type, are going to be very familiar to you. So there are some tooltips, and there are some default values in here. Let's look at the current situation that's here.
All right. So the current situation here is that we have the ball. Let's just pretend we're on offense. We have the ball on our own 25-yard line. Right. And what we'd ultimately like to do, and this is where the Markov chain comes into it, and I'm going to explain this in a minute, is see if we can get to midfield, right?
And that's in the situation here. We'd like to get somewhere around the 45 of our opponent's territory before we have to make another decision. One thing I wanted to do with this model is not just give the discrete distributions of, oh, it's fourth and three from this situation, what's your probability?
What I wanted to do is really give coaches a tool. Coaches are not just worried about one play; they're worried about sequences of plays or sequences of drives, right? So they may have a strategy that says, all right, we're going to try this sequence of plays. I'd really like to get three points, or I'd like to get seven points out of this drive.
Or maybe not. Maybe I want to play field position; maybe I'm okay with punting. So I wanted to give the coaches a model that goes beyond just predicting the probability of making a particular fourth down. And that's why we involved the Markov chain here.
Let me just run through the example, and that will become clear in a minute. All right. So we're trying to eventually, not necessarily on this play, get to the 45, and at that point we can make another decision about what we're going to do. It's the second quarter, 8:15 left.
And right now we're faced with fourth and three. So the yardage to gain is three yards, and we're considering a pass play. Okay. Just looking at the logistic regression, the probability of successfully converting with a pass play on fourth and three from our own 25 is 66%, 0.66.
Right. So that's pretty good. Yeah, it's over 50%. Okay. So we're feeling pretty good about our options. Now, what happens next? Again, I wanted to design a model that is more useful for coaches who are thinking long-term, right?
Not just about one play: there's a certain goal that I have on this drive. So what I wanted to do is invoke the gambler's ruin problem to basically tell them what the long-run probability is. You have a fourth and three now; what's your long-run probability of getting to a particular point on the field beyond just this fourth and three?
And so that's where the gambler's ruin problem comes in, in that bottom line, the Markov chain model prediction. What it does is take the probability above it, from the logistic regression, and put it into a decision-making algorithm. Depending on the choices that are made, it gives you the long-run probability of making it. Right now it's telling you a hundred percent. If we change one variable, go ahead and change play type to run, and we'll see what happens. Now you can see that the probability from the logistic regression goes down significantly, to 0.12, or 12%, and the resulting long-run outlook for getting into the opponent's territory at the 45 is zero.
And it turns out that this is actually part of the gambler's ruin problem, the theorem, the actual proof in the mathematical theory. It comes down to your probability of success on the first play, right? It's fourth down; I've got to get this fourth down first. Then I can think about the long term: give me a fresh set of downs, and then we can re-look at the situation. If your probability is less than 50%, it turns out mathematically that your long-run probability of getting to your ultimate objective is actually zero. That's just the way the proof of the theorem works out.
So basically, the way I would envision a coaching staff using this model is: what's my current situation? What's the probability of successfully converting? And then what's my long-run probability of getting to where I really want to be?
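The two questions Keith lists, converting now and then reaching the objective, along with the below-50% behavior, can be sketched with the textbook gambler's ruin formula. In that formula, with a finite target, a per-step probability below 0.5 gives a small but nonzero chance that shrinks rapidly toward zero as the target moves farther away, while above 0.5 the chance levels off at 1 - (q/p)^start. The `drive_outlook` helper is hypothetical: it treats later sets of downs as independent +1/-1 steps and simply multiplies the two stages, which is an illustrative assumption, not how the calculator is documented to work.

```python
def reach_target_probability(p, start, target):
    """Gambler's ruin: P(hit `target` before 0) for a +1/-1 random walk from `start`."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if not 0 < start < target:
        raise ValueError("need 0 < start < target")
    q = 1.0 - p
    if p == q:
        return start / target
    r = q / p
    return (1 - r ** start) / (1 - r ** target)


def limit_as_target_recedes(p, start):
    """Limit of reach_target_probability as target -> infinity."""
    q = 1.0 - p
    return 0.0 if p <= q else 1 - (q / p) ** start


def drive_outlook(p_convert_now, p_step, start, target):
    """HYPOTHETICAL two-stage estimate: convert this fourth down, then reach the
    target, assuming the two stages are independent."""
    return p_convert_now * reach_target_probability(p_step, start, target)


for p in (0.45, 0.5, 0.66):
    probs = [round(reach_target_probability(p, 1, t), 4) for t in (5, 20, 80)]
    print(p, probs, round(limit_as_target_recedes(p, 1), 4))
print(round(drive_outlook(0.66, 0.66, 1, 5), 4))
```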
So let's use another example. Let's change the current yard line to the 45 of our own territory; you can just type it in there. And let's try to get down to the 35. Well, let's say the 30. Let's just try the opponent's 30. And you can change the time if you want.
Let's maybe look later in the game, like the third or fourth quarter. Yeah, fourth quarter's good. Okay, that's good. So this is a situation later in the game. We've got the ball again, and at a minimum we'd like to get into scoring position for a field goal at the 30-yard line.
Okay. So we're at the 45; we're outside of field goal range. And of course the coach knows what his kicker's capable of doing. He probably knows, from both training and warmups, what the kicker's max range that day is going to be. So he decides, well, I need to advance the ball to the 30 at least to attempt this field goal.
But it's fourth down and we're at the 45, so we've got to get this first down first. So he considers a run play, and you can see there that the probability of successful conversion is not that great. But if you change the play type to pass, it goes up significantly.
And in fact, the Markov chain model says that once you get that first down converted, you actually have a very good chance of getting down to the 30-yard line. So that's the way to use the model. And ultimately, there's a lot more research that has to be done on this. Again, it's limited in the sense that it's just a quantitative study.
What we really need to do going forward is some qualitative research. I'd love to do a series of interviews with NFL coaches and design the questions around things like understanding which variables they consider the most significant, right? It may be these four, but it may be other things they consider, like player health, or who knows.
And those might be qualitative metrics that will better inform our theory. From that, we can go back to the quantitative data or collect more data. And I think the future of this study is really a mixed methods study, so both qualitative and quantitative research.
Ted Hallum: [01:03:31] Well, that was a fantastic rundown. The model is amazing. I so enjoyed reading through the information you sent over about how the model works. Now I'm curious: how many football teams are you on retainer for already?
Keith Allen: [01:03:49] None right now, but I would love the opportunity to work with professional teams or even collegiate teams. I mean, it's modeled around football, but you could do a similar study in other sports as well, because at the end of the day, yes, it's about football and understanding risk on fourth down, et cetera, but really what we're talking about is risk-taking under uncertainty, and that's a big part of my research agenda.
And really, that type of research is applicable to any field, because we all make risk decisions every day of our lives, and businesses are making those decisions too. So it's just something that I'm very interested in. And yeah, it's a good start. I'm satisfied with where it is now.
It definitely got me a degree, so that's good.
Ted Hallum: [01:04:42] Absolutely. Well, we had the website up there, and for anybody who is as interested in Keith's model as I am, we've got the URL on the screen here.
So you can go to his website and get into the weeds. If you're listening and don't have the visual, the URL is https://www.bridgetheparadox.org. That's where you can go to find out everything that Keith did with his model, and you can see the actual web version of the model that you can play with.
And I highly recommend that you go out there and take a look. Keith, here's the last PhD-related thing I'll ask you before we move on. You have a very informed perspective now, and it's fresh in your memory because you just finished defending your dissertation. For other people who are thinking about potentially doing a PhD in data science, who do you recommend this experience to?
Because it's good for people in certain circumstances, but it's not for everybody. And I think that would be really helpful to some of our listeners who are weighing the scales of, should I do this or not?
Keith Allen: [01:05:49] Yeah. So you first have to have a good vision of what you want to do with it.
What's the end goal? Whenever I advise people, especially the younger generation, and they're talking to me about their dreams or their goals or what have you, I ask: okay, where are you today? Where do you want to be in five or 10 years? And is an advanced degree really going to help you with what you want to do in five or 10 years?
If the answer is yes, then okay. Now, if you've decided, okay, I am going to pursue a master's, a PhD, et cetera, any kind of pursuit like that, you've got to make sure it supports your plan going forward, your five- or 10-year plan or beyond.
And then once you've decided that, you've got to do the research, like we talked about before. If you're interested in helping develop theory, reading other people's literature, critically analyzing research, and being more on the theoretical and philosophical side, then a PhD is for you.
If you just care about doing the hardcore modeling, and you just want to be at the computer doing data stuff, which is great, I mean, a lot of people are just happy being an analyst, then you don't need to go get a PhD.
There's a lot of smart, there's a lot of really, smart data scientists and statisticians that don't have doctorates and and they're experts in one field of modeling and they're, fantastic. So yeah it's not, the end all be all, but again, you gotta ask yourself what, you want to do.
What's your five or 10 year plan and it is that type of degree or, pursuit is that, does that fit your plan?
Ted Hallum: [01:08:01] That's wonderful insight, and this theme has come up multiple times within this one episode.
But again, I think a lot of times people want something so badly, and that desire is based on expectations that may or may not be grounded in reality. And I think you've given them an excellent window into what a doctorate in data science actually is and what it is useful for.
People then just have to be honest with the knowledge you've provided: does that dovetail with where they want their career to go and the things they're interested in? If it does, then a PhD in data science is probably exactly what they need to do, and the world would be better off for it.
But if not, and they try to force it, a forced fit is never a good fit.
Keith Allen: [01:08:53] No, for sure.
Ted Hallum: [01:08:54] Keith, thank you so much for covering all that about your experience with the doctorate in data science. Looking more recently at your career, when you sent over your resume I saw that you've had a couple of really cool experiences in data science consulting.
I wanted to focus in on that because consulting is a hot area; a lot of data scientists work in consulting. What was your experience like getting your first data science consulting position?
Keith Allen: [01:09:23] So I've lived here in Northern Alabama, in Huntsville, for over two years now.
When I first came here, I was working over at NASA doing some subcontract work on the systems engineering side: some data analytics and some model-based systems engineering, MBSE. But my long-term goal was to get a pure data science role instead of the sort of hybrid role I'd been in for a long time, where I was really an engineer doing data science as an engineer, which is great.
I mean, obviously it's gotten me to where I am today. And I had the opportunity to get into consulting in the fall of 2019. There are some data science opportunities in consulting here locally. I started my consulting career with a company called Guidehouse.
They're about 8,000 people worldwide, mostly in the United States, with a local office here. We had some work with the Army, with Army Materiel Command. We were doing some supply chain modeling for them, and I developed some time series predictive models for their supply chain efforts,
as well as some other efforts I was involved in. Then I got the opportunity to go to another consulting company, Deloitte, which is much bigger and much more global in reach. That's really where the aperture opened up for me, because it's such a bigger company and they've got a foot in almost every industry.
And so, as a data scientist, what I always tell people is this: if you're a leader or a manager, or what have you, and you have data scientists or data analysts, the best thing you can do for them is to get them involved as much as possible, so they can apply their skills to as many different problems as possible.
Variety is the key for us. The worst thing you could do for a data scientist or engineer is to pigeonhole them into one thing. Most of my experience is in aerospace and defense, so I was looking for the opportunity to get involved in other areas. I knew that in order to grow, I needed to at least have the opportunity to do other things.
So that's why I chose the consulting route, especially with a company like Deloitte; they have a lot of resources that let us apply our skills. I love consulting. It's not just your typical data science work. Obviously there are statistics, analytics, and models that need to be developed and run,
so there's a fair bit of the technical side to it, but I also really enjoy being able to meet with clients, managers, and executives and help develop their problem. One thing you'll notice if you go into consulting, especially as you continue in it, is that a lot of clients, in fact a lot of non-technical managers,
typically don't really understand what they want. They know where they want to go. They know what they want the outcome to be. They don't know how to get there. But typically they do know what they don't want. It took me many years to figure out how to have these kinds of conversations.
So what I typically do is go in and just sit with them. I just let them talk and explain to me, in their words, what the problem is. After that, maybe not in that same meeting but as a follow-up, it's my job to take down everything they said and develop a problem statement and some research questions.
Then I give that back to them and say, did I understand this correctly? Is this what you're really trying to get after? Is this really the question you're trying to answer? They might make some tweaks. Once you have that, you've got a good start on the project.
So a lot of my role as a consultant is not just answering technical questions. It's actually helping the client or management take their ideas and use some data science and some science philosophy to articulate their problem better, or to understand it better.
Ted Hallum: [01:14:29] So you hit on one of the things I definitely want to talk about, and that is that consulting comes with a unique set of challenges. You mentioned that oftentimes the client may not know exactly what they want. They know what they don't want, or maybe they have an implementation in mind, but that may not be the best way, or even a good way, to go about getting what they actually want.
They have a set of expectations. They've got data availability constraints. Depending on the situation, you may have to work with their infrastructure. So I've rattled off a couple of challenges I can envision, but what have been the biggest challenges you've experienced so far in your career as a data science consultant?
Keith Allen: [01:15:18] Convincing the client.
A lot of clients are set on a particular technology going forward, or a certain model, right? "But we've got to use machine learning. We've got to use machine learning." Well, okay. And I'm trying to back them up, because I want to develop the problem statement, the research questions, and the methodology first, so that we have the evidence
to support a particular use of a machine learning concept. That can be very frustrating. It can also be very frustrating when they're dead set on, hey, we're going to use neural networks to do this (I'm just using that as an example), or we're going to use this technique. And again, I go back to what I said before, which is that you've got to be able to defend that. Now, they're the client; they get the final say, right?
They're paying money to get these answers. So if I realize very quickly that they're going to go this route, then ultimately my job is to help them do it the best way possible. Okay, that's fine. But it's my duty to tell them what the risks of that are.
So a lot of times I have to articulate what the limitations and challenges are: okay, you've decided on that route, but you should be aware that these are the risks of doing that. Ultimately my goal is to say, okay, I can support you the best way I can with this approach, and here are the limitations, just so you understand them. At the end of the day, a lot of my philosophy actually comes from being in the military,
because a lot of the ways I deal with executives are very similar to how I dealt with colonels and generals in the military. They're very similar in that regard. So it's my job as a technical SME to just say, okay, here are the advantages and disadvantages of using that.
If you're happy with that, if you accept the risks of that, then that's what we'll do.
Ted Hallum: [01:17:27] There's a degree of diplomacy involved.
Keith Allen: [01:17:29] Oh yeah. There has to be. Again, I'm not the decision maker, and rightly so. If the data scientist is also the decision maker, I don't think that's a healthy thing, right?
It's my job to support the decision maker. It's my job to give the decision maker the tools and allow them to understand the feasible space of the challenges and the risks they're thinking about. But at the end of the day, it's not my job to make the decision for them.
Ted Hallum: [01:18:02] I love that you highlighted that particular challenge, if for no other reason than that it drips with irony. Fundamentally, a company has brought you in as a consultant because they presumably don't have the expertise in-house to do it themselves.
And yet, in some cases, they've already decided on the implementation they think would work best. I can see where that would be very difficult to navigate. But you've given any of our listeners who find themselves in that situation a great approach: you highlight the challenges with their approach.
You highlight the risks with their approach. And I would imagine that in some cases, once you highlight that, they actually do realize, maybe this isn't the way we want to go about this. At that point you can lay out an alternate approach.
Keith Allen: [01:18:56] Right. Yeah, exactly.
Ted Hallum: [01:18:59] So for every type of data scientist, whether in consulting or in some other area, I think the number one character trait is curiosity.
But beyond that, because consulting is a unique environment, you do have to be diplomatic. You do have to work within other people's expectations. So what specific character traits do you think people considering data science consulting need to possess?
Keith Allen: [01:19:29] Number one, communication skills. That's probably the most important thing if you're going to go into that line of work.
In fact, I wouldn't say that just for consulting. If you're going to be in any kind of STEM field and you aspire to be a technical SME in something, whether it's civil engineering or whatever it is, you need to be able to effectively communicate your ideas and your study.
Us data scientists and engineers have a bad reputation, probably deservedly so, when it comes to expressing our ideas or talking to non-technical people. It's something I've really tried to work on in my career, especially public speaking: taking the opportunities to develop my public speaking style and to be an effective communicator, not just in speaking but also in writing.
It's very important because you can have the greatest idea, the best model to solve the problem, but at the end of the day, if you can't effectively communicate its utility, its outputs, how you got that information, et cetera, boy, it's worthless.
So if there's one killer skill, I think it's effective communication, whether you divide that into public speaking or written communication.
Ted Hallum: [01:21:03] Now, Keith, I think you've hit on something that's absolutely key, because a lot of times data scientists and machine learning engineers do find themselves engaging with upper-level leadership within companies.
That's because they're either informing a business decision or answering a business problem. So I couldn't agree with you more: you have to be an excellent communicator, because you could have the best, most optimal solution, but if you can't communicate it in a way that the decision maker understands and trusts, it's never going to get off the ground, or it's just going to sit on a shelf collecting dust.
Keith Allen: [01:21:43] Yeah. Or the study's not going to get funded again, or it's not going to continue; there's a whole host of things.
There are a lot of technical folks out there who'll say, oh, you've got to dumb it down. I think that's very insulting to the business leaders and executives I've been around in my career. It's not dumbing it down. It's distilling it.
It's distilling potentially very intricate technical data into something that is useful to them and addresses their problems.
Ted Hallum: [01:22:25] It's almost like PCA: you take all those complicated variables and project them into a simpler space, but you're still getting the core of what matters into the communication.
Keith Allen: [01:22:37] Yeah. And you should always be able to trace it back. If you've set the study up correctly, you should always be able to take the results and conclusions and track backwards through the methodology, data set, data collection, research questions, hypothesis, and problem statement. It should all follow.
It should all link. There should never be a situation where you get to the end and go, oh, well, what were we trying to solve? Then you trace it backwards and go, well, we didn't develop a problem statement for that. Well, there's one of your problems.
Ted Hallum: [01:23:19] Well, and I would just toss out there, right in line with what you're saying, that wherever you'd like to practice data science, you should try to obtain some domain expertise in that area, because I think that's critical to identifying the right problem and capturing all the facets of it in a problem statement.
I feel like that's very difficult to do if you entirely lack domain expertise in that area.
Keith Allen: [01:23:44] Domain expertise helps. It definitely does. But you don't necessarily have all the time in the world to become a domain expert. I'll give you an example. For about the last year and a half, I was involved in supply chain work, doing some predictive modeling in a supply chain area.
I'd never done any supply chain work before. I mean, I knew the basic concepts, but the team I was helping included three very accomplished supply chain experts, two of whom had been doing supply chain for over 50 or 60 years combined, and they kind of imparted the domain knowledge.
So just because you're entering a new domain doesn't necessarily mean you're at a disadvantage. It means you need to ally yourself, team yourself up, and find people who are experts. Along the way they're going to teach you a lot. I learned so much about supply chain with that project.
Some of the stuff, I was like, wow, I would never have learned that if I hadn't been around those people. And likewise, you're also there to educate them. They know what their client wants, to use the supply chain example; they understand it and can articulate it to you, potentially better than the client can.
Then it's your job to take that and develop a model for it, a methodology for it, a data collection plan for it, or what have you. So I prefer to work in combined teams, combined arms teams, to use a military term. The best project teams I've ever worked on were exactly like that, whether it was a DoD military team or a civilian team: it was always people from different backgrounds, different educations, and different viewpoints who came together to solve a common problem.
And if you can work well in that environment and leverage each other's strengths, then that's really the most effective team.
Ted Hallum: [01:25:58] So as we wrap up, there are a few questions I love to ask every single guest who comes on this show, because, as we mentioned earlier, a data scientist's learning is never done.
This is an ever-changing field. So anybody who's serious about being a data scientist or machine learning engineer has to keep a pulse on what's changing and how they can continue to do their job optimally next week, next month, next year. So what's the next learning opportunity on your radar?
Keith Allen: [01:26:30] So immediately, the 50-meter target is finishing this NVIDIA GPU training. Yeah, supercomputing training. I've finished all the pre-work, and I have a bootcamp here and another couple-day bootcamp where I'm going to finish that. I'm very excited about that. Like I said before, the NVIDIA folks have done a great job with their GPU processors, and I'm really interested to find out more.
So that's the short term. Long term, probably later this year, I'm going to go after another certification: the Certified Analytics Professional, CAP. I have all the qualifications to take that exam; I just have to take it. So I'll put some time into studying for that exam and earning that certification later this year.
Ted Hallum: [01:27:22] When we publish this episode, I'll make sure links to the learning opportunities you mentioned are in the show notes. Anybody who wants to learn more about the Certified Analytics Professional certification can go back and listen to episode four of this podcast with Dr. Scott Nestler, who was actually one of the folks who helped develop that certification. He goes into great detail about what you can expect and how you can prepare. So anybody who's interested in that particular certification, I would direct your attention back to that previous episode.
All right, well, Keith, the last question is about learning resources: podcasts and books. We'd love to get your recommendations. What are your favorites?
Keith Allen: [01:28:04] I have two... well, actually, let me give you authors. As I mentioned, I'm very much into science philosophy, and if you want to be a well-rounded data scientist, you really need to understand science philosophy.
Probably the two best authors of the last hundred years or so, in my opinion, are Karl Popper and Thomas Kuhn. Karl Popper really is the father of the modern scientific hypothesis. Thomas Kuhn is more recent, the sixties, seventies, and eighties. He's a fascinating science philosopher. He pioneered the idea of the paradigm shift, which is basically that as you look at the advancement of civilization from a technology and science perspective, you're going to have long periods where you make incremental improvements, where the new technology is really just based on the old technology with some extra improvements.
And then all of a sudden, a paradigm shift occurs and some new technology is created that revolutionizes everything from then on. You could look back at the iPhone, the computer; there are plenty of examples. So Kuhn and Popper: if you want to understand science philosophy, which again I think is foundational
if you're going to be a data scientist, read any of their works, or start with some of their famous works. They're well published.
Ted Hallum: [01:30:00] Awesome. So you've given us some excellent courses we can go look into and a couple of awesome authors whose works we can dive into.
Keith, I appreciate you so much for coming on the show. It's been awesome to hear about your military experience, your journey to get your PhD in data science, and all of your career experiences as a data science consultant. For anybody who has been inspired by Keith's story, or if it's raised additional questions you'd like to reach out to him about, you can see his LinkedIn name right there on the screen.
Underneath this video, I'm going to put his email address as well, so you can reach out to him via email. And for anyone who is a premium subscriber to our community, you'll have access to our Slack workspace. Keith, as a guest on the show and a Veterans in Data Science and Machine Learning community member himself, will get complimentary access to our premium Slack workspace.
And there's a channel there just for you to follow up with community members like Keith who come on the show. So any questions or thoughts you want to give to Keith, you can give to him there as well. Keith, thanks so much. I can't wait to have you on the show next time. Hopefully we can hear about those courses you're developing and how they go.
Keith Allen: [01:31:16] Yeah, that'd be great. Thank you very much, Ted. I appreciate it. Thanks for the opportunity.
Ted Hallum: [01:31:21] You're welcome.
Thank you for joining me for this episode with Dr. Keith Allen. As always, until the next episode, I bid you clean data, low p-values, and Godspeed on your data journey.