Cross Validation done wrong

Cross validation is an essential tool in statistical learning to estimate the accuracy of your algorithm. Despite its great power, it also exposes …

Machine Learning

globalmit - Learning Globally Optimal Dynamic Bayesian Network with the Mutual Information Test (MIT) Criterion

Dynamic Bayesian networks (DBN) are widely applied in modeling various biological networks, including the …

Life Sciences

Ten Ways Big Data Is Revolutionizing Supply Chain Management

Bottom line: Big data is providing supplier networks with greater data accuracy, clarity, and insights, leading to more contextual intelligence shared across supply chains.

Forward-thinking manufacturers are orchestrating 80% or more of their supplier network activity outside their four walls, using …

Big Data

Open source big data analysis and visualization platform

Analyze relationships, automatically discover paths between entities, and establish new links in 2D or 3D.

Overlay data as layers on a map for a …

Edward Tufte

Home | Socrata Open Source | Socrata Labs

Socrata Open Data Server

Community Edition

In support of its commitment to the open data community and to the proliferation of open data standards, …

Open Source

The open source data portal software

CKAN, the world’s leading open-source data portal platform

CKAN is a powerful data management system that makes data accessible – by providing tools to …



Focus on your work and let us worry about the technical details!

OpenDataSoft has been developed for business users, not technical ones. With …

Information Design

Picking the right portal to make open data accessible

Some of the most commonly used platforms offer support for large datasets, visualization and search, easing agencies' transition to open data.

Open Data

UN Global Pulse: Tapping Big Data for Sustainable Development

In 2009, at the peak of the financial crisis, one of the revelations was the absence of up-to-the-minute, accurate data about who was being impacted …

Big Data

Microsoft Announces Public Preview Of Azure Data Catalog

In a blog post published this morning, Joseph Sirosh, the Microsoft corporate vice president in charge of Azure ML, announced the public preview of the Azure Data Catalog, an in-house tool to facilitate discovery of a company’s data sources.

Azure ML is Microsoft’s machine …


Using big data could alert us to risks in the food supply chain

As shoppers, we’ve become used to the reliable presence of brands in supermarkets. The idea of food scarcity and disruption to supplies doesn’t come into plans for our weekly food shop.

But the reality for many global food manufacturers is uncertainty. Chocolate production is one example. Some 40% …

Food Industry

How big data has transformed research

Shy Genel, Hubble fellow at the astronomy department of Columbia University, US

Describe your research project:
Illustris is a computer simulation of the evolution of the universe, through which we study how galaxies and their constituent stars and black holes form and evolve over cosmic time.

How did

Scientific Research

Best of the visualisation web... May 2015

At the end of each month I pull together a collection of links to some of the most relevant, interesting or thought-provoking web content I’ve come …


Designing Data-Driven Interfaces

Telling the story of your data

“Dashboard”, “Big Data”, “Data visualization”, “Analytics” — there’s been an explosion of people and companies looking …

Data Visualization

Big Data Gets Small – Ask These 3 Questions First

The term “big data” is posted online approximately 13 times a minute, according to the social analytics tool Atlas from Infegy, meaning that by the time you …

Big Data

An executive’s guide to machine learning

Machine learning is based on algorithms that can learn from data without relying on rules-based programming. It came into its own as a scientific …

Machine Learning

Mini-glossary: Big data terms you should know

When it comes to assembling a list of key big data terms, it makes sense to identify terms that everyone needs to know — whether they are highly …

Data Mining

7 Tools for Data Visualization in R, Python, and Julia

Last week, some examples of creating visualizations with htmlwidgets and R were presented. Fortunately, there are many more options available for …

Self-organization and missing values in SOM and GTM


In this paper, we study fundamental properties of the Self-Organizing Map (SOM) and the Generative Topographic Mapping (GTM), ramifications …

Data Mining

NCRG: Netlab

The Netlab toolbox is designed to provide the central tools necessary for the simulation of theoretically well founded neural network algorithms and …


SOM Toolbox

This is the new home page of SOM Toolbox, a function package for MATLAB implementing the Self-Organizing Map (SOM) algorithm and more. We have moved …


Inceptionism: Going Deeper into Neural Networks

Artificial Neural Networks have spurred remarkable recent progress in image classification and speech recognition. But even though these are very …

Javaplex by appliedtopology


Persistent Homology and Topological Data Analysis Library

The JavaPlex library implements persistent homology and related techniques from …


Topographical Data Analysis Next Great Hope

Round and round she goes…

Everybody who remembers how neural nets were going to save the world, raise your hands. Little higher. Make sure everybody …

Zappix Launches Visual IVR Big Data Analytics

Zappix, a provider of visual IVR and mobile app authoring technology, is now offering a big data analytics suite as part of its Visual IVR platform. …

Data Management

Apache Spark 1.4 adds R language and hardened machine-learning

By providing access to the popular R statistical …


Deep Learning Machine Beats Humans in IQ Test

Computers have never been good at answering the type of verbal reasoning questions found in IQ tests. Now a deep learning machine unveiled in China …

Machine Learning


PySpark + Scikit-learn = Sparkit-learn



Sparkit-learn aims to provide scikit-learn functionality …


Which Big Data, Data Mining, and Data Science Tools go together?

We analyze the associations between the top Big Data, Data Mining, and Data Science tools based on the results of 2015 KDnuggets Software Poll. …

Data Mining