Join Du

@JoinDu on Flipboard

Real-world HTTP/2: 400 GB of images per day

The now-finalized HTTP/2 specification has rightfully garnered a lot of interest from the web performance community. The new protocol is aimed at …

Hadoop: Can the Tortoise be a Hare?

Major companies have invested heavily in Hadoop, but this Big Data analytics platform has obvious limitations.

Hadoop

The Platform Stack

Understanding platform business models. I first wrote about the Platform Stack last year on my blog, and subsequently used it as a core framework in my …

Introducing FBLearner Flow

Many of the experiences and interactions people have on Facebook today are made possible with AI. When you log in to Facebook, we use the power of …

Machine Learning

Big Data, Digital and Cloud Solutions - Java Experts

Consulting & Design. Our consultants provide critical recommendations and guidance at any point in your project's lifecycle: design, PoC, …

Sorry ARIMA, but I’m Going Bayesian

When people think of “data science,” they probably think of algorithms that scan large datasets to predict a customer’s next move or interpret …

Data Science

How to get into the top 15 of a Kaggle competition using Python

Kaggle competitions are a fantastic way to learn data science and build your portfolio. I personally used Kaggle to learn many data science concepts. …

Data Science

700 SQL Queries per Second in Apache Spark with FiloDB

Apache Spark is increasingly thought of as the new jack-of-all-trades distributed platform for big data crunching – what with everything from …

Deep Learning for Internet of Things Using H2O

By Sibanjan Das, Analytics Consultant, and Ajit Jaokar, FutureText. H2O is an open-source machine learning platform for smarter applications. At the …

Will Spark Power the Data behind Precision Medicine? | Amazon Web Services

Christopher Crosbie is a Healthcare and Life Science Solutions Architect with Amazon Web Services. This post was co-authored by Ujjwal Ratan, a …

Cloud Computing

What Spark's Structured Streaming really means

Last year was a banner year for Spark. Big names like Cloudera and IBM jumped on the bandwagon, companies like Uber and Netflix rolled out major …

Big Data

Are You Using the Right Tools for Your Big Data Projects?

Data scientists rely on tools, products, and solutions to get insights from data. Gregory Piatetsky of KDnuggets conducts an annual survey of …

manishdesai1 on Flipboard

Clustering geolocated data using Spark and DBSCAN

Machine learning, and in particular clustering algorithms, can be used to determine which geographical areas are commonly visited and “checked into” …

Machine Learning

The Role of Statistical Significance in Growth Experiments

The concepts of experimental design and hypothesis testing originate from the work of Ronald Fisher in the early 20th century. These concepts …

Apache Hadoop at 10

A week or two ago, Doug Cutting wrote up a ten-year retrospective on Apache Hadoop for the project’s birthday. I enjoyed it. As co-creator of the …

Big Data

Dive into Apache Hadoop open source technology

On this week’s NFV/SDN Reality Check, we look at some top news items from across the space as well as speak with Cloudera on CSPs adopting Apache …

Virtualization

How BuzzFeed Thinks About Data, And Some Charts, Too

Some thoughts for an increasingly complex media landscape. Two years ago I wrote about how BuzzFeed thinks about data science. All the same basic tenets still hold today, and yet the metrics we cared about then are vastly different from the metrics we care about today. Why is that, and how did that …

Google Analytics

Fast Search and Analytics on Hadoop with Elasticsearch

Zookeeper and Oozie: Hadoop Workflow and Cluster Managers

We generate petabytes of data every day, processed by farms of servers distributed around the globe. With big …

Interview with Jay Kreps about Apache Kafka

This time we interviewed Jay Kreps, one of the creators of Apache Kafka. Kafka is an open source messaging system with a few design choices that make …

Step by Step: How to Predict the Future with Machine Learning

Ever wondered how machine learning works? How exactly do you use historical data to predict the future? Well, here’s a tutorial that will help you …

Machine Learning

Why you should use Spark for machine learning

As organizations create more diverse and more user-focused data products and services, there is a growing need for machine learning, which can be …

Machine Learning

Deep Learning Comp Sheet: Deeplearning4j vs. Torch vs. Caffe vs. TensorFlow vs. MxNet vs. CNTK - Deeplearning4j: Open-source, Distributed Deep Learning for the JVM

Comparing Top Deep Learning Frameworks: Deeplearning4j, PyTorch, TensorFlow, Caffe, Keras, MxNet, Gluon & CNTK. Skymind bundles Deeplearning4j and …

Deep Learning

10 tips for integrating NoSQL databases in your business

Data is driving innovation and growth for business, but only for businesses prepared to handle data effectively. While relational databases have their …

Databases

6 Differences Between Pandas And Spark DataFrames

With the improvements in version 1.4, Spark DataFrames could become the new Pandas, making ancestral RDDs look like bytecode. I use Pandas heavily (and …

How a matchmaking algorithm saved lives

Long before dating sites, a pair of economists delved into the question of matchmaking and hit upon a formula with applications far beyond …