Sang Lee


Big Data with R

Presentation at the Symposium on Data Science and Statistics (SDSS) 2018. Abstract: A review of techniques and R packages to aid in the success of Big …

Big Data

Introduction to Machine Learning for non-developers

(This article was first published on R - Data Science Heroes Blog, and kindly contributed to R-bloggers) About Machine Learning: We all know that …

Machine Learning

R and Python are joining forces, in the most ambitious crossover event of the year—for programmers

Hadley Wickham is the most important developer for the programming language R. Wes McKinney is among the most important developers for programming language Python. The two languages, which are free to use, are often seen as competitors in the world of data science. Wickham and McKinney don’t think …

Data Science

Big changes behind the scenes in R 3.5.0

A major update to R is now available. The R Core group has announced the release of R 3.5.0, and binary versions for Windows and Linux are now …

Data Science

An Introduction to Greta

(This article was first published on R Views, and kindly contributed to R-bloggers) I was surprised by greta. I had assumed that the tensorflow and …

Data Science

Yet Another Caret Workshop

(This article was first published on Computational Social Science, and kindly contributed to R-bloggers) Intro: Yesterday I gave a workshop on applied …

How do I interpret the AIC

(This article was first published on Bluecology blog, and kindly contributed to R-bloggers) How do I interpret the AIC? My student asked today how to …

Data Science

Regular Expressions Every R programmer Should Know

(This article was first published on R on The Jumping Rivers Blog, and kindly contributed to R-bloggers) • Regex: The backslash, \ • Regex: The hat, ^, and …

Python Programming

Introducing TensorFlow Probability

Posted by: Josh Dillon, Software Engineer; Mike Shwe, Product Manager; and Dustin Tran, Research Scientist, on behalf of the TensorFlow Probability …

Deep Learning

Weighted survey data with Power BI compared to dplyr, SQL or survey by @ellis2013nz

(This article was first published on free range statistics - R, and kindly contributed to R-bloggers) A conundrum for Microsoft Power BI: I’ve been …

Data Science

autoEDA - Automated exploratory data analysis

Introduction: autoEDA aims to automate exploratory data analysis in a univariate or bivariate manner. It has the ability to output plots created with …

Data Science

Jupyter Notebook for Beginners: A Tutorial

The Jupyter Notebook is an incredibly powerful tool for interactively developing and presenting data science projects. A notebook integrates code and …

Python Programming

An overview of keyword extraction techniques

(This article was first published on bnosac :: open analytical helpers, and kindly contributed to R-bloggers) In this blogpost, we will show 6 keyword …

Natural Language Processing

Python and Jupyter Notebooks

Recently I have begun to use Jupyter notebooks with Python but have struggled with the constant need to download dependencies or have something not …

reticulate: R interface to Python

We are pleased to announce the reticulate package, a comprehensive set of tools for interoperability between Python and R. The package includes …

Python Programming

Regression Analysis

Chapter 6: Regression Analysis. Linear regression is an approach for modeling the linear relationship between two variables. Ordinary Least Squares: The …

Comparing additive and multiplicative regressions using AIC in R

(This article was first published on R – Modern Forecasting, and kindly contributed to R-bloggers) One of the basic things the students are taught in …

Data Science

Machine Learning Modelling in R : : Cheat Sheet

(This article was first published on R – The R Trader, and kindly contributed to R-bloggers) I came across this excellent article lately “Machine …

Machine Learning

Introducing Pandas DataFrame for Python data analysis

Pandas is an open source Python library for data analysis. It gives Python the ability to work with spreadsheet-like data for fast data loading, …

Python Programming

Automate R processes

(This article was first published on bnosac :: open analytical helpers, and kindly contributed to R-bloggers) Last week we updated the cronR R package …

Data Science

Python packages (numpy/pandas/etc) in Visual Studio 2017 on Windows

I've just installed Visual Studio Community with the workloads for Python and Data Science. I create a new Regression project from the Python\Machine …

Why is R's data.table so much faster than pandas?

I have a 12 million rows dataset, with 3 columns as unique identifiers and another 2 columns with values. I'm trying to do a rather simple task: …

Python Programming

Ten Machine Learning Algorithms You Should Know to Become a Data Scientist

Machine Learning Practitioners have different personalities. While some of them are “I am an expert in X and X can train on any type of data”, where …

5 Ways to Find Interesting Data Sets

Editor’s note: This post was written as part of a collaboration with Enigma, a public data company. Author India Kerle is a data curator at …

Open Data

R Tip: Use the vtreat Package For Data Preparation

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers) If you are working with predictive modeling or machine …

Data Science

How to Create an Interactive Dashboard with Crossfilter and Dc.Js: A tutorial

How to build a dynamic dashboard in 5 minutes. Crossfilter.js and dc.js are two …

Data science with Python: Turn your conditional loops to Numpy vectors

The vectorization trick is fairly well known to data scientists and is used routinely in coding to speed up the overall data transformation, where …

Data Science

Setting up a Python Development Environment in Atom

Of course there are a lot of great text editors out there. Sublime Text, Brackets, Atom. I’ve always personally been a fan of Atom because it is …

Lesser known dplyr tricks

(This article was first published on Econometrics and Free Software, and kindly contributed to R-bloggers) In this blog post I share some lesser-known …

These are the best books for learning modern statistics—and they’re all free

Statistics came well before computers. It would be very different if it were the other way around. The stats most people learn in high school or college come from the time when computations were done with pen and paper. “Statistics were constrained by the computational technology available at the …

Machine Learning