10 mistakes that I made as a Data Scientist

Working in data science can be difficult, but it is worthwhile and it pays well. Some actions make us more effective (for example, giving up Slack). Sometimes inaction helps too. Below are the mistakes of mine that held back either my career or my company.

Not understanding that data science covers different types of work

I once joined a company as a “data scientist”, expecting to do predictive modeling. Instead, I ended up writing the application's internal code. The mistake was mine.

My previous data science work had consisted entirely of building models, and I wrongly assumed that the new responsibilities would be similar.

Many kinds of tasks and jobs hide under the data science umbrella. Add vague job descriptions and you have a recipe for confusion.

[Image: 3 really different challenges]

Browse data science job postings and you will find tasks that fall into the following categories:

  • competitive intelligence
  • data analysis
  • machine learning
  • data engineering
  • software engineering

Try to understand the specifics of a job before you take it. Ideally, talk to someone on the team you are about to join and ask them for the details. Otherwise, you will soon be looking for a new job again.

Not having a mentor

For a month I banged my head against the wall over a machine learning problem without consulting anyone. When a new mentor arrived and suggested a particular technique, the problem was solved within a week.

At the time, I had no idea what I did not know: I was not familiar with the full range of existing natural language understanding (NLU) techniques. My mentor, who had a doctorate and work experience in the field, was.

You do not know what you do not know.

Startups often pull you into areas where you are not an expert. And when you do not know the terminology for the relevant techniques and methods, it is very hard to search for a solution.

I am not saying that no one should work in a field as technical as AI without a senior engineer or mentor to bounce ideas off. But having one speeds up the work enormously.

Not recognizing project failure

I spent 2 weeks building a new model to replace an existing one. The old model still performed better.

Instead of cutting my losses and dropping it, I spent another 2 weeks on speculative fixes, hoping to improve my model. It still did not work. We threw it away, with two more weeks wasted.

The timebox made it clear that it was time to stop. But pride, wounded by failure, insisted on continuing.

Timebox, not scopebox.
– according to Lean Software Development

Those of us who work in data science research groups do not have a complete picture of the company’s priorities. We can contribute to an infinite number of projects. But some will fail, and we will need to move on.

If you have exceeded the time allotted for a minor project, and additional hours do not guarantee success, the wisest decision is to abandon it.

Not taking notes at the end of a project

The startup I worked at changed priorities, so I returned to an NLP project that I had left 1.5 years earlier. There were no records of the approaches tried or the research done, and I had to start from scratch.

Because startups often change their priorities, you will sometimes drop a project and later return to it.

Sometimes a project fails, nobody remembers exactly why, and management wants to try the same thing again.

A repository for failed and ongoing projects is greatly underrated. Links to the code you tried, to research, and notes on what has been done will help you get back up to speed quickly if you pick the project up again.

Putting off asking for help

I hate wasting other people’s time. I would first carefully try every recommendation from my manager or mentor, and only as a last resort go back for more advice. As a result, I kept postponing meetings. A textbook perfectionist.

There is a fine line between asking for help too soon and asking too late.

If someone has agreed to be your mentor, they want to help you. And they have the experience to notice in time that you have gone off course, even if you have not realized it yourself.

Take the advice, try it, and keep going.

If you could overcome the problem yourself, you would not need anyone.

I understand that, on the other hand, there are people who rush to ask for help without looking into the problem at all. This point does not apply to them.

Not admitting your ignorance

Before I had even started working in machine learning, I met my first AI mentor for coffee.

Mentor: Have you previously applied machine learning?
Me: Yes, neural networks.
Mentor: What frameworks did you use?
Me: Um … Python.

In fact, everything I knew about machine learning came from a couple of lessons of Andrew Ng's Deep Learning course. But I wanted to sound smart.

We ended up working together anyway. Still, do not pretend to know things you do not. Experienced people can smell nonsense a mile away. And it is easier to help someone who acknowledges their weaknesses.

Learning without a project

I spent months reading about techniques and theories that I no longer remember. Jupyter notebooks and untitled notes inevitably get lost.

By contrast, I remember every time I learned something new and then built it into a working application, published the code, or wrote a blog post.

Another advantage of the latter is that you can easily go back and link to working code.

Have a result to show for everything you spend time studying.

Wrong ratio of exploitation to exploration

Recall the multi-armed bandit problem: exploitation (development) relies on methods you already know, while exploration (research) looks for new ones. You need both.
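
To make the trade-off concrete, here is a minimal sketch of an epsilon-greedy agent on a simulated bandit. It is only an illustration, not something from my own projects: the function name `epsilon_greedy`, the three success rates, and the parameter values are all assumptions chosen for the example. Each "arm" stands for a method or library you could try.

```python
import random


def epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Simulate an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    Returns (pull counts per arm, estimated mean reward per arm).
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    n_arms = len(true_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random method, even an unfamiliar one.
            arm = rng.randrange(n_arms)
        else:
            # Exploit: reuse the method that has looked best so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running average reward for this arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return counts, estimates


# Hypothetical success rates of three methods; the agent does not see these.
methods = [0.1, 0.5, 0.9]
never_explores, _ = epsilon_greedy(methods, epsilon=0.0)
sometimes_explores, _ = epsilon_greedy(methods, epsilon=0.1)
```

With `epsilon=0.0` the agent never leaves the first arm it tried, because ties go to the lowest index and its reward estimates never drop below zero. That is the pure "only what I already know" mode. A small amount of exploration is enough for the better arms to be tried at all.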

A well-practiced routine with a few familiar machine learning libraries has its drawbacks: you start to see every problem through the lens of the methods you already know.

In my case, a library I had never heard of far outperformed my well-worn Sklearn.

Avoid under-exploration by taking a root-cause approach when you start working on a new problem.

Not writing reusable functions

In an interview task, I had to build a standard NLP pipeline with all the usual steps (cleaning, lemmatization, vectorization, model selection, etc.).

Given the limited time for the task, I regretted not having ready-made templates at hand.

Instead of finishing the project in 30 minutes and looking like a genius, I spent a few hours. I passed that interview stage, but with difficulty, and it could have been much easier.

I will always choose a lazy person to do a hard job, because he will find an easy way to do it.
– Bill Gates

If you find that you write the same code several times in Jupyter, do yourself a favor and write a reusable module.
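
As a rough idea of what such a module might contain, here is a small sketch that covers only the cleaning and vectorization steps, using the standard library alone. The function names are hypothetical, and lemmatization and model selection are left out because they would normally come from third-party libraries.

```python
import re
from collections import Counter

_TOKEN_RE = re.compile(r"[a-z0-9]+")


def clean_text(text):
    """Lowercase the text and replace every non-alphanumeric run with one space."""
    return " ".join(_TOKEN_RE.findall(text.lower()))


def tokenize(text, stopwords=frozenset()):
    """Split cleaned text into tokens, dropping any that are in `stopwords`."""
    return [t for t in clean_text(text).split() if t not in stopwords]


def build_vocabulary(documents, stopwords=frozenset()):
    """Map every distinct token in `documents` to a column index (sorted order)."""
    tokens = sorted({t for doc in documents for t in tokenize(doc, stopwords)})
    return {token: i for i, token in enumerate(tokens)}


def vectorize(text, vocabulary, stopwords=frozenset()):
    """Return a bag-of-words count vector; tokens not in the vocabulary are ignored."""
    counts = Counter(tokenize(text, stopwords))
    vector = [0] * len(vocabulary)
    for token, n in counts.items():
        index = vocabulary.get(token)
        if index is not None:
            vector[index] = n
    return vector


# Example usage with a tiny, made-up corpus.
stop = frozenset({"the"})
vocab = build_vocabulary(["The cat sat", "the dog sat"], stop)
vec = vectorize("Dog, dog sat. Bird!", vocab, stop)
```

Saved once as a module (for example a file imported from each notebook), these functions replace copy-pasted cells, and a fix made in one place applies everywhere.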

Underestimating the importance of domain knowledge

I built machine learning pipelines that used the wrong labels. I worked for companies that built products the industry did not need.

Keeping up with industry news and having domain knowledge is the most underrated part of software development.

Domain knowledge is harder to acquire than coding skills. It accumulates over years of work.

  1. Beware of engineers who are not interested in domain knowledge. Sooner or later they will hit a dead end.
  2. Agree to work at a startup only if at least one of the founders has extensive experience in the industry the product is being built for.

Conclusion

A career is not long enough to make all the mistakes yourself. It is easier to learn from each other. Job stability is a thing of the past, but we can still climb to the top by doing more of the right things and spending less time on the wrong ones.

I hope you enjoyed reading about my mistakes. What mistakes have you made as a data scientist?
