Words by Esther Wershof
Edited by Vacha Patel

Data science is one of the fastest-growing fields and has been revolutionizing a variety of industries. We scientists are no exception. But the mere thought of coding can make many biologists tense. Without prior training, it’s difficult to know where to start. Maybe you want to build your skill set to be eligible for a broader selection of jobs. Or maybe you are fed up with being a slave to cell culture and never want to go to the lab on a weekend again. You might want to live at the intersection of coding and experiments. Whatever your motivation, here are some tips to get started.

Which language to use?

There are many programming languages out there, some incredibly powerful, others downright bizarre (Chicken, Jelly, and LOLCODE, to name just a few). But a sensible starting point is to choose either Python or R. Don’t agonize too much over which one you choose. The good news is that once you’ve learned a few fundamental concepts (for loops, while loops, if statements), it becomes a lot easier to learn new languages. Python is incredibly versatile and widely used, so it looks great on your resume. R is great for data science and tends to run into fewer weird bugs and compatibility issues. If you have any friends or lab-mates who use one language or the other, that is a good enough reason to make it your choice.

Statistics

Before diving in, it’s a good idea to have a quick refresher on statistics. Matthew Clapham makes excellent, digestible videos on basic statistics, and I still go back to them regularly. Complicated graphs and exotic statistical tests are meaningless unless your data meet the assumptions behind them. Essential concepts to understand are parametric vs non-parametric tests, interpreting p-values, and correlation vs causation.

Now that you’ve picked your language and are feeling like a statistical wizard, it’s time to get started.
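To make the ideas of non-parametric tests and p-values a little more concrete, here is a minimal sketch of a permutation test written with only Python’s standard library. The permutation test is our choice of illustration, not something the resources above prescribe, and the function name `permutation_test` and the `control` and `treated` measurements are made up for the example. The idea: if the treatment made no difference, shuffling the group labels should often produce a gap between the group means as large as the one observed. The p-value is roughly how often that happens.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns the fraction of random relabellings whose absolute
    difference in means is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


# Made-up measurements, for illustration only (not real data).
control = [4.1, 3.8, 4.4, 4.0, 3.9, 4.2]
treated = [4.9, 5.3, 4.7, 5.1, 5.0, 4.8]

p_value = permutation_test(control, treated)
print(f"p = {p_value:.4f}")
```

A small p-value here says only that a gap this large would be rare if the labels were meaningless. It does not say how big or how important the effect is, and it says nothing about causation.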
Learning basic coding

The best place to get started with learning Python is Google Colab. Click ‘new notebook’ and you’re ready to code. Two excellent resources for learning the basics are Kaggle’s introduction to Python and, if you prefer to learn through videos, Bucky Roberts’ YouTube channel. This Jobtensor tutorial is a great free online resource to check out too.

If R is your language of choice, download RStudio. Once again, Bucky Roberts has an excellent R tutorial series, or you can have a go with one of the many introductory tutorials online, e.g. DataCamp. We are also big fans of STHDA (if you can follow the tutorial on survival analysis, you’ll be in a great position).

There are so many great free resources online. In addition to Kaggle and STHDA, Coursera and Udacity offer many courses to further your knowledge. But there are two key ingredients to remember in order to become a full member of the coding club.
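If you would like something to paste into a fresh notebook cell straight away, here is a minimal sketch of the three fundamentals mentioned earlier (for loops, if statements, while loops). The cell counts and variable names are made up for illustration, so treat it as a starting point to tinker with rather than a recipe.

```python
# Hypothetical cell counts from five culture wells (made-up numbers).
cell_counts = [120, 95, 143, 80, 110]

# for loop: visit every value once and add it to a running total.
total = 0
for count in cell_counts:
    total += count
average = total / len(cell_counts)

# if statement: label each well relative to the average.
labels = []
for count in cell_counts:
    if count >= average:
        labels.append("high")
    else:
        labels.append("low")

# while loop: keep repeating until a condition stops being true.
# Here: how many doublings until 100 cells exceed one million?
cells = 100
doublings = 0
while cells <= 1_000_000:
    cells *= 2
    doublings += 1

print("average:", average)
print("labels:", labels)
print("doublings needed:", doublings)
```

Try changing the numbers in `cell_counts` or the one-million threshold and rerunning the cell. Breaking things and seeing what happens is a quick way to build intuition.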
Ultimately, whichever route you choose, putting in the effort to improve your statistics and coding skills will be massively worth it. You’ll do better science in the lab and be more attractive on the job market. Most importantly, it’s really enjoyable and satisfying to understand such a fundamental part of modern science. Happy coding!