By Cheng Han Lee
There are lots of resources out there for learning data science or building on what you already know. But where do you start? What are some of the best or most authoritative sources? Here are some websites, communities, newsletters, conferences, and experts that we think are outstanding.
If you want to see the latest trends and read analyses of what’s happening in the data science field…
- Flowing Data: On Flowing Data, Nathan Yau, PhD, explores how data professionals (statisticians, scientists, designers, and others) analyze and visualize data to better understand the world around us. He also offers book recommendations, tutorials, a job board, and a membership feature to help budding data scientists grow and hone their craft. Beyond the tutorials and resources, Yau takes a humorous look at the challenges of working as a data professional, covering topics such as the ethics of gathering data, common mistakes in data analysis, and how data is used to track change and growth in society over time.
- FiveThirtyEight: Launched by data wiz Nate Silver, FiveThirtyEight offers data analysis and visualizations of political, cultural, and economic issues. Its work ranges from light-hearted and interactive to in-depth and pointed, and it offers a great example of how data can be made accessible and applicable to everyday life.
- R-Bloggers: Looking for a hub of content about open-source statistical software? R-Bloggers is the place! More than 500 blogs are currently featured on R-Bloggers, focusing on news and tutorials related to R, “a free software environment for statistical computing and graphics.” As an aggregator, R-Bloggers pulls in helpful content on the niche topic of the R language, making it easy to follow major trends and contributors in this field, all on one site.
- Simply Statistics: Three biostatistics professors from Johns Hopkins University, Harvard University, and the Dana-Farber Cancer Institute manage this site, which is chock-full of articles about how data is being used (and misused) to solve complex problems. These professors also offer data analysis classes on Coursera and interview up-and-coming data scientists about their careers. The clear career focus of the interviews can help you think through your own career trajectory.
- Edwin Chen: A data scientist at Dropbox, Edwin Chen offers hands-on tips on how to create and improve algorithms and data analysis tools. What makes Chen’s blog especially helpful is that he draws on large-scale algorithms and analysis from companies such as Facebook and Amazon. If you want more guidance on techniques and analysis rooted in current methods, Chen is a helpful resource.
- Hunch: John Langford, Doctor of Learning at Microsoft Research, created this blog to explore machine learning: what it is and how we’re using it. If you’re new to machine learning or curious about what it means for your newly chosen career, Hunch offers an in-depth look at the topic by reviewing and analyzing new ideas, such as Allreduce (MPI-style) versus parameter-server approaches to distributed learning, and events like the Conference on Digital Experimentation.
If you want to learn more about data science…
- Open Source Data Science Masters: This site offers a free list of online classes and resources. The resources are organized as a self-paced curriculum that assumes a basic understanding of programming. The curriculum includes theoretical and foundational classes as well as practical, hands-on classes in computer science, programming, and design, so that you finish it with a strong understanding of data science.
- Learn Data Science: Like Open Source Data Science Masters, Learn Data Science offers a self-paced curriculum. It introduces four key topics in machine learning: linear regression, logistic regression, random forests, and k-means clustering.
If you want to join a community where you can ask questions and learn from fellow data scientists and analysts…
- Reddit Machine Learning Subreddit: With more than 30,000 members, this community offers a wide range of people to connect with and share challenges and solutions. Redditors share news, research papers, videos, and more on machine learning, data mining, information retrieval, learning theory, and related topics.
- Cross Validated: A Q&A community for people interested in statistics, data visualization, and more, this site offers a straightforward way to ask questions about data science and find the most helpful answers. You can also get a weekly digest of popular and unanswered questions so you never miss a conversation.
- Datatau: VentureBeat calls this site a “Hacker News for data scientists,” and it lives up to the description. Users share and comment on interesting articles and offer career advice to people new to the data science field.
- MetaOptimize: In this Q&A community for people interested in machine learning, data mining, natural language processing, and more, questions and answers are voted on and badges are awarded, making it easy for visitors to find the most popular and helpful content.
- Kaggle Competitions: If you’re interested in data science, you’ve likely come across Kaggle, a platform for data prediction competitions. Besides listing upcoming competitions, the website features a forum where visitors can look for competition partners, share resources, and ask for support in developing a career in data science.
If you want the latest data science news…
- Data Science Weekly: Each Thursday, Data Science Weekly sends an email with the latest news and trends in data science. You can also search their site for interviews, job opportunities, and resources on how to build a career in data science.
- KDnuggets: This website is full of great tutorials, articles, webinars, and more on data mining and building models. You can also have this information sent to you twice a month by signing up for its popular newsletter.
If you want to stay on top of the latest research…
Attending conferences is a great way to stay on top of trends. If you can’t attend the following conferences in person, you can often follow along by watching videos of the presentations and reading the associated papers.
- Neural Information Processing Systems (NIPS) Conference: NIPS is a premier academic conference on machine learning whose goal is to “foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects.” Its recent conference in Montreal, Canada, explored pressing issues around privacy, the role of machine learning in climate change, optimization, and networks. The conference papers make for great reading.
- International Conference on Machine Learning (ICML): ICML is a premier academic machine learning conference supported by the International Machine Learning Society, a nonprofit that supports research on machine learning. Its most recent conference was held in Beijing and explored cognition, learning models, and crowdsourcing in machine learning research and practice. The papers accepted for Cycle 1 offer a snapshot of the range of topics.
- Knowledge Discovery and Data Mining (KDD): Managed by the Association for Computing Machinery, KDD promotes the “advancement and adoption of the ‘science’ of knowledge discovery and data mining” by encouraging standards in terminology and methodology and by fostering community. Its recent conference in New York focused on how data science can be used for social good. While the submitted papers aren’t available, you can find photos and videos from the conference on its website.
If you’re looking for brilliant minds to follow…
These expert data scientists tweet about the latest news and trends in the field.
- Hilary Mason (@hmason): Data Scientist in Residence at Accel and Scientist Emeritus at bitly.
- DJ Patil (@dpatil): VP of Product at RelateIQ.
- Jeff Hammerbacher (@hackingdata): Founder and Chief Scientist at Cloudera and Assistant Professor at the Icahn School of Medicine at Mount Sinai.
- Peter Skomoroch (@peteskomoroch): Equity Partner at Data Collective, former Principal Data Scientist at LinkedIn.
- Drew Conway (@drewconway): Head of Data at Project Florida.
- Nathan Yau (@flowingdata): Statistician and author of Flowing Data.
The bottom line
There are lots of websites, books, communities, and other resources you can use on your own to improve your skill set and your knowledge of the data science field. With this list as a starting point, you should be able to find plenty of experts to help you grow and develop your own expertise. Who knows? One day you just might be a resource on someone else’s list.