Apache Spark

How to Install and Use Homebrew

The creators of Homebrew say that it is the “missing package manager for macOS”. Packages are collections of files that are bundled together that can …

Ubiquitous Computing

vtreat 1.2.0 is Available on CRAN, and it is now Big Data Capable

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers) We here at Win-Vector LLC have some really big news we …

Big Data

Save and load model in Apache Spark

I'm trying to implement a simple model to detect anomalies based on a Gaussian distribution, but I don't know if there is a way to save the model like …

Test Data

Force Apache Flink to execute at a given point

It is my understanding that Apache Flink does not actually run the operations that you ask it to until the result of those operations is needed for …

Ubiquitous Computing

SparkStreaming: avoid checkpointLocation check

I'm writing a library to integrate Apache Spark with a custom environment. I'm implementing both custom streaming sources and streaming writers. Some …

Apache Spark

Online time series anomaly detection with Apache Spark

We have a data pipeline system: Apache Kafka -> Spark Streaming -> Spark MLlib. The data consumed is time series data (e.g. each record is in the form …

Ubiquitous Computing

Is there a way to create a Python GUI or a Web Interface for a Spark Scala application?

I have a full project that uses Apache Spark to analyze specific forms of data and retrieve specific queries. The project is written in the Scala …

Ubiquitous Computing

Scala: how to find the max of Integers in a List[Row]

Let the list be … How do I get the maximum value in Scala? The list is List[Row], not List[Int].


Compute euclidean distance between two DenseMatrix in Scala

I have two DenseMatrix objects and I want to compute the Euclidean distance between their points with Scala. Thanks


Updating grouper column based on specific column value in Scala/Apache Spark

Here is what I am trying to accomplish using Spark or Scala+Spark: Each instance of "A", in column_1, signifies the start of a new group up until the …


Apache Spark - Tweets Processing

Given a huge dataset of tweets, I need to:
• extract and count the hashtags.
• extract and count the emoticons/emojis.
• extract and count the words …


Where is Scala on a node with spark-shell installed?

I have Apache Spark installed on a cluster. I can run spark-shell on the cluster master node. So it means Scala is installed on this …

Ubiquitous Computing

Optimizing Apache Spark JDBC with SQL Server using boundary limits

We have a framework that uses Apache Spark to get data from SQL Server using Spark SQL. You can see a sample of the query below. Here I have …

SQL Servers

Spark-submit -class command not found?

I am running a project with Kafka and Apache Spark. To run my Kafka stream I am running this command from within the project: However, I simply get the …


How to convert a hexadecimal column in Scala to int

I tried to use the conv function as I saw in some examples, but it is not working for me. I don't understand why this function returns the same value in the …


Scala - Joining Spark RDD by function of key

I am running Apache Spark 2.11 and using Scala. Is there any way to join two RDDs by a function of the key? Specifically, if I have an RDD …

Ubiquitous Computing

Apache Spark: send DataFrame values to a specific IP port

How to send a DataFrame to a particular IP port (Spark Scala)? Thanks


Getting Started with Apache Spark Standalone Mode of Deployment

Hi, I am a beginner at Apache Spark and I'm still researching this topic, but I wanted to type in this command line: $sudo apt-get install Scala. …


RDD processing in scala file

I have loaded the 2 CSV files, converted RDD to DF, and written some JOIN conditions to perform on them. I have used the Spark shell for these. Now …

Ubiquitous Computing

Apache Spark SQL

Course Description: In this course you'll learn the physical components of a Spark cluster, and the Spark computing framework. You’ll build your own …

Big Data

Apache Spark standalone cluster - spark-submit - ConnectException: Call From ubuntu/ to ubuntu:9000 failed

When I write code in IntelliJ with Spark 2.3.0 and master("local") and execute it in IntelliJ, it gives me output. But 1) if I start spark …

Big Data

I want to get latitude and longitude from a given ip-address in Scala [on hold]

The latitude and longitude should be generated from https://ipstack.com. The data is given in JSON format. How should I get started to tackle this problem?


I want to get a geolocation like latitude and longitude from a given ip-address in Scala.

Let's say there is a website that generates the coordinates based upon a given address. How can I get the data and feed it to my code in Scala? I am new …

Ubiquitous Computing

Submitting python script with Apache Spark Hidden REST API

I need to submit a .py file with the Apache Spark hidden REST API. As I followed arturmkrtchyan's tutorial, I couldn't find any example or document regarding …


How to decompress and list all file names from an archive tar.gz with Apache Spark 2?

I searched the Internet for how to solve my problem, but I only found this link, and that solution didn't work. Do you have an idea how I could solve this?


Parse Dataframe column Apache Spark

I am crawling data from websites. The data is in JSON format and has nested fields. The required field has a lot of irrelevant data. Now I want to …

Python Programming

Apache Spark GraphX - Pregel system. Current superstep number

Pregel computations consist of a sequence of iterations, called supersteps. How can I get the current superstep number in the sendMsg function?

Ubiquitous Computing

Data Replication between two Neo4J databases

I have one Neo4j production database and a Disaster Recovery database. Every weekend, the data in Production should be made available in Disaster …


How to embed Apache Spark (Scala) in HTML?

As a working PHP developer, I came to work with Big Data and chose to work with Apache Spark (Scala) for reading and writing Excel files. I want to …

Ubiquitous Computing

As of 2018, which language is better for Apache Spark: Scala, Java 8, or Python?

Now that Java 8 has lambda functions, is Scala still preferable? And what do companies use most, Scala or Java?

Ubiquitous Computing