Portrait of the Data Scientist

He opens his eyes like a man waking up the day after a 20-mile hike. Only he didn’t go on a hike yesterday; too much work. Many questions will come to his head once it has seen more sunlight. He knows he stares at the artificial sun too long before bed. It is a habit that he pretends he wants to shake. Tonight, he’ll do better. For now, he needs an IV drip of caffeine. The Monster Energy drinks of his college days are no longer an option; his digestive tract has “matured”. He feels the rough carpet under his feet as he transitions from prone to standing. The walk downstairs isn’t bad. He’s awake, just not completely aware yet. The sound of the coffee brewing triggers a warm sensation in him. Every step of the ritual is comforting. If only he didn’t have to cycle the coffee to maintain positive energy levels. He makes a note on his phone: follow up on the latest coffee research, deep dive into timing and “chronographs”.

He sits down for the morning standup and doesn’t get the irony. Coffee takes 15 minutes to kick in. The meeting goes as usual: some chitchat about the weather at the beginning, some updates that could have been an email, no interaction except between the PM and whoever he is asking for an update. This PM at least has the good sense to keep the meeting short. It’s not really the length of meetings that is the problem, though. It’s more the frequency, which disrupts focus, which leads to bad outcomes. Makes you wonder how work from home has actually led to increased productivity. Are they measuring that correctly? Perhaps their metric is misspecified. He makes a note to self to follow up on their research methodology.

The code editor initiates. He makes sure that he is using spaces, not tabs. Coders who use spaces get paid more, and if he knows anything, it’s that correlation and causation are synonymous. He tackles that pipeline bug from yesterday that is causing the whole analysis to stall. After an hour researching the esoteric error message, he solves it. The function was passed a string, not an integer. Wow, he tells himself, that was the straw that broke the camel’s back? And we sent people to the moon without Python? He makes a note to self: learn static typing by next week, or he has to donate money to Nancy Pelosi’s campaign. He works better from fear of loss than promise of reward. It’s science. The pipeline progresses; victory in this sprint is assured. But then pandas starts acting a bit drunk and throws a ridiculous error. The line in the code that causes the error isn’t even in the traceback. Well, he could always switch to R. Then it would just fail silently, without this useless error message fifty levels deep into the pandas module.

Sigh. Time for a break. Perhaps all of this work on the optimization module isn’t worth it, he thinks, as he sips on Yirgacheffe. After all, they’ve only been able to generate a cost savings of 0.1% in the POC after 3 months of work. And the client is a mid-sized company; have they actually met the breakeven point? Oh well, there are only so many hours in the day, and he doesn’t have access to salary records at his company. Back at it, let’s run the pipeline again. It works! Why? Who cares, we’ll just call it a self-repairing system. Time to save the progress to show management; they will be so impressed. He tries to merge. Git conflicts! Crap. Well, maybe this super sophisticated branching model that no one follows isn’t all it’s cracked up to be. Billy and Miranda always bypass dev, but never get reprimanded for it. He gets down to the dregs of his lukewarm coffee, which are surprisingly delicious. He feels shame at enjoying dregs and makes a note to self to follow up and see if this is normal behavior. He also notes that he and a Python class are doing the same thing, making notes to self constantly. He wonders if he is an android, but quickly realizes that the evidence for it is too scary to consider. He gives in to his confirmation bias: he is human.

A new email comes in: they’ve mothballed the project for now. That just nudged up the “80% never make it to production” stat. How does it feel? Terrible. But the total data of the world doubles every two years. As long as there is electricity, there will be computational devices. And as long as they are around, the need for data scientists will be there. So he doesn’t worry. A new Slack message comes in, an invite to an after-work happy hour at Meehan’s. Just the thing to cheer him up. And he doesn’t forget, demand for his skill is there. His forecast model tells him it will only ever increase. So he doesn’t worry. But he does wonder if they should have just used the heuristic model to begin with. He makes a note: look into it tomorrow.
