Big Data

By fasoulas

Hazelcast Launches an Open Source In-Memory Stream Processing Engine

Hazelcast, known chiefly for its open source in-memory data grid (IMDG), has launched an open source lightweight, distributed data-processing engine …

Hadoop

Serving Real-Time Machine Learning Predictions on Amazon EMR | AWS Big Data Blog

The typical progression for creating and using a trained model for recommendations falls into two general areas: training the model and hosting the …

Amazon Web Services

Amazon Redshift Engineering’s Advanced Table Design Playbook: Preamble, Prerequisites, and Prioritization | AWS Big Data Blog

Zach Christopherson is a Senior Database Engineer on the Amazon Redshift team. Part 1: Preamble, Prerequisites, and Prioritization. Part 2: …

Databases

Running Jupyter Notebook and JupyterHub on Amazon EMR | AWS Big Data Blog

Tom Zeng is a Solutions Architect for Amazon EMR. Jupyter Notebook (formerly IPython) is one of the most popular user interfaces for running Python, R, …

Derive Insights from IoT in Minutes using AWS IoT, Amazon Kinesis Firehose, Amazon Athena, and Amazon QuickSight | AWS Big Data Blog

Ben Snively is a Solutions Architect with AWS. Speed and agility are essential with today’s analytics tools. The quicker you can get from idea to first …

R Server 9 Adds Machine Learning to Work with Your Data Where It Lives

Built by data scientists, the R programming language has always been a tool for data scientists. But Microsoft’s R Server 9, the first full new …

Software Engineering

Analyzing Data in S3 using Amazon Athena | AWS Big Data Blog

Neil Mukerje is a Solution Architect for Amazon Web Services. Abhishek Sinha is a Senior Product Manager for Amazon EMR. Amazon Athena is an interactive …

Implementing Authorization and Auditing using Apache Ranger on Amazon EMR | AWS Big Data Blog

Varun Rao is a Big Data Architect for AWS Professional Services. Role-based access control (RBAC) is an important security requirement for multi-tenant …

Low-Latency Access on Trillions of Records: FINRA’s Architecture Using Apache HBase on Amazon EMR with Amazon S3

John Hitchingham is Director of Performance Engineering at FINRA. The Financial Industry Regulatory Authority (FINRA) is a private sector regulator …

Hadoop

Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics

Chris Marshall is a Solutions Architect for Amazon Web Services. Analyzing web log traffic to gain insights that drive business decisions has …

Amazon Web Services

A Data Sharing Platform Based on AWS Lambda | AWS Compute Blog

Julien Lepine, Solutions Architect. As developers, one of our top priorities is to build reliable systems; this is a core pillar of the AWS Well …

Installing and Running JobServer for Apache Spark on Amazon EMR

Derek Graeber is a senior consultant in big data analytics for AWS Professional Services. Working with customers who are running Apache Spark on Amazon …

Amazon Web Services

Serverless Big Data pipeline on AWS

Lambda is a powerful tool when integrating different services on AWS. During the last months, I've successfully used serverless architectures to …

Big Data

How SmartNews Built a Lambda Architecture on AWS to Analyze Customer Behavior and Recommend Content

This is a guest post by Takumi Sakamoto, a software engineer at SmartNews. SmartNews in their own words: "SmartNews is a machine learning-based news …

Machine Learning

All the Apache Streaming Projects: An Exploratory Guide

The speed at which data is generated, consumed, processed, and analyzed is increasing at an unbelievably rapid pace. Social media, the Internet of …

Hadoop

GitHub on BigQuery: Analyze all the open source code

Posted by Felipe Hoffa, Google Developer Advocate. Google, in collaboration with GitHub, is releasing an incredible new open dataset on Google …

Programming

Scale out your existing MySQL landscape with Scalebase

In a nutshell, Scalebase is to MySQL what Greenplum DB is to PostgreSQL: it makes it possible to create an MPP database based on MySQL. You can use …

Databases

Real-time in-memory OLTP and Analytics with Apache Ignite on AWS

Babu Elumalai is a Solutions Architect with AWS. Organizations are generating tremendous amounts of data, and they increasingly need tools and systems …

Hadoop

Apache Gets Yet Another Stream Processing Engine with Apex

The recent promotion of DataTorrent’s Apex to an Apache Software Foundation top-level project gives the foundation yet another open source engine for …

Hadoop

Monitoring Performance Across a Data Pipeline

In my last post, I outlined a framework to monitor the performance of data processing frameworks like Apache Storm, Spark, Kafka, etc. These data …

Big Data

Analyze a Time Series in Real Time with AWS Lambda, Amazon Kinesis and Amazon DynamoDB Streams

This is a guest post by Richard Freeman, Ph.D., a solutions architect and data scientist at JustGiving. JustGiving in their own words: "We are one of …

Amazon Web Services

AWS Partner Post Spotlight: Attunity

Partners are a vital part of the AWS ecosystem, and AWS Partners have made important contributions to the AWS Big Data Blog. This month’s Partner Post …

Big Data

Big Data Website Gets a Big Makeover at AWS

Jorge A. Lopez is responsible for Big Data Solutions Marketing at AWS. The big data ecosystem is evolving at a tremendous pace, giving rise to a …

Big Data

How to build your own recommendation engine using machine learning on Google Compute Engine

https://cloudplatform.googleblog.com/2016/03/how-to-build-your-own-recommendation-engine-using-Machine-Learning-on-Google-Compute-Engine.h …

TensorFlow machine learning with financial data on Google Cloud Platform

If you knew what happened in the London markets, how accurately could you predict what will happen in New York? It turns out, this is a grea...

Optimize Spark-Streaming to Efficiently Process Amazon Kinesis Streams

Rahul Bhartia is a Solutions Architect with AWS. Martin Schade, a Solutions Architect with AWS, also contributed to this post. Do you use real-time …

Distributed Systems

Rackspace Cloud Big Data Now Available on AWS Public Cloud

Talking with current and prospective customers about our Managed Cloud Big Data service, we’ve gotten really good feedback on the current state of …

Big Data

How To: Twitter Sentiment Analysis Demo on Rackspace Cloud Big Data

At Rackspace, we make it easy to get started with Big Data tools like Spark, Kafka and Hadoop, but after you get the infrastructure, where do you go …

DevOps

Process Amazon Kinesis Aggregated Data with AWS Lambda

Ian Meyers is a Principal Solutions Architect with AWS. Last year, we introduced the Amazon Kinesis Producer Library (KPL) to simplify the development …

Big Data