New Member Guide (Technical)

Rocio Ng edited this page Jul 27, 2017 · 5 revisions

Interested in and/or new to the Data Science Group?

Here is a guide (for those who have a technical background or wish to develop one!) to help you get started with the group and find a place for yourself here. All skills and levels are welcome, and if you put in the effort you will get to contribute to some amazing projects!

Join our Slack channel, data-science, in the sfbrigade Slack. Not on Slack? Go here to request access.

Figure out what level you are at for Data Science Technical Skills:

  • Beginner: Little to no coding experience. Knows a bit of Excel and is familiar with some statistical concepts. Has heard of linear regression.

  • Novice: Has done some Codecademy classes or something similar. Can load a dataset into a Jupyter Notebook and/or RStudio and can do some basic operations like subsets, means, histograms, scatter plots, etc. Has some experience with exploratory data analysis.

  • Intermediate: Decent programming skills. Comfortable loading a dataset, cleaning, merging, reshaping, and applying filtering operations. Comfortable with descriptive statistics and exploratory analysis. Can apply some basic hypothesis tests like t-tests, ANOVAs, and chi-square tests, and can create useful data visualizations. May know machine learning fundamentals and have some experience with feature engineering, predictive modeling (linear regressions, random forests, cross-validation, etc.), and unsupervised methods (PCA, clustering, etc.).

  • Advanced/Expert: All of the above and more. Has coded or helped code multiple working data science projects already. May even have a related job or have won some Kaggle competitions.
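As a rough self-check for the Novice level, the sketch below loads a small dataset, takes a subset, and computes means using only the Python standard library. The data, column names, and helper functions are all hypothetical and invented for illustration; in practice you would more likely use pandas or R for this.

```python
import csv
import io
import statistics

# Hypothetical toy data, invented for illustration only.
RAW_CSV = """neighborhood,year,requests
Mission,2016,120
Mission,2017,150
Sunset,2016,80
Sunset,2017,95
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting numeric columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["year"] = int(row["year"])
        row["requests"] = int(row["requests"])
        rows.append(row)
    return rows


def subset(rows, **conditions):
    """Keep only the rows whose columns equal the given values."""
    return [r for r in rows if all(r[k] == v for k, v in conditions.items())]


def mean_of(rows, column):
    """Average one numeric column across the given rows."""
    return statistics.mean([r[column] for r in rows])


rows = load_rows(RAW_CSV)
mission = subset(rows, neighborhood="Mission")
mission_mean = mean_of(mission, "requests")
overall_mean = mean_of(rows, "requests")
print(mission_mean, overall_mean)
```

If you can read this and predict what it prints, you are probably at or past the Novice level; if writing something like it from scratch feels routine, look at the Intermediate suggestions.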

Find your level and explore the suggestions listed under it. These are just ideas!

Beginner

  • Check out the current project repos. Explore!
  • Offer to write a blog post about any of the current projects. This will allow you to learn more about the inner workings of a data science project and, more importantly, what makes a good data science project.
  • Check out the Learning Resources and start learning some Python or R.
  • Download a dataset from one of these sources and follow a tutorial or two on said dataset. Explore!
  • Fork or clone a repo and try to run the code yourself. Experiment with the code and data and feel free to ping team members with ideas and questions.
  • In this process you may develop an idea for a potential project. Pitch it and recruit members with more experience (they can help you scope your project and/or contribute code and analysis work). Become the project owner/manager and learn firsthand how a data science project is implemented from start to finish.

Novice

  • All suggestions above, plus...
  • See if any data exploration or cleaning is needed for a dataset in a current project. The project may be in its early stages, or the team may have lingering questions they would like to explore but have not yet had time for.
  • Offer to do some QA on a project. Fork a project and test how reproducible the data pipelines and analysis are. Find bugs or, if it is an analysis, find parts that are unclear or plots that could be improved.
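One simple way to approach the reproducibility QA suggested above is to run a pipeline step several times and compare fingerprints of its output. The sketch below is a minimal illustration under stated assumptions: `toy_pipeline` is a hypothetical stand-in for a real project's pipeline, and the helper names are invented here, not taken from any project repo.

```python
import hashlib
import json
import random


def toy_pipeline(values, seed):
    """Hypothetical stand-in for a project's pipeline: sample, then summarize."""
    rng = random.Random(seed)  # local RNG so the seed fully controls the result
    sample = rng.sample(values, k=5)
    return {"sample": sorted(sample), "total": sum(sample)}


def fingerprint(result):
    """Hash a JSON-serializable result so two runs can be compared cheaply."""
    payload = json.dumps(result, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_reproducible(run, runs=3):
    """Call `run` several times and check that every output hashes the same."""
    prints = {fingerprint(run()) for _ in range(runs)}
    return len(prints) == 1


data = list(range(100))
seeded_ok = is_reproducible(lambda: toy_pipeline(data, seed=42))
print("seeded pipeline reproducible:", seeded_ok)
```

Passing `seed=None` instead would let Python seed the generator from the operating system, so repeated runs would very likely disagree. That is the kind of finding worth reporting back to the project team.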

Intermediate

  • All suggestions above, plus...
  • Become an active contributor to a project. Prototype some machine learning models, test some hypotheses, create useful data visualizations, or help with frontend/backend/DBA work if the project needs it.

Advanced/Expert

  • Fork and revive an inactive project that was never finished.
  • Bring new life into a finished project.
  • Offer to help lead a project, do some modelling, and/or provide technical feedback.
  • Start your own project!

To reach out to a project:

  • See who the project leads and contacts are (these should be listed in the repo's README). Say hello to any of them at a Hack Night or ping them on Slack. If they are not around, do not get discouraged: we are all volunteers and sometimes get busy! If you need help or advice, find or contact either of the team leads, @rocio and/or @sanat.