Ubiquitous Computing

Not able to register RDD as TempTable

I am using IntelliJ and trying to get data from a MySQL database and then write it into a Hive table. However, I am not able to register my RDD as a temp table. …

Ubiquitous Computing

Error double literal found in Scala while executing jar tf command

I have recently started learning Scala. I am trying to execute the jar tf command at the spark-shell prompt, and it throws an error: error: ';' expected but …

Apache Spark

Incompatibility Spring Boot 2 and GraphQL

I want to check out Spring Boot 2 and GraphQL. My pom.xml looks like: graphql Demo project for Spring Boot with GraphQL. When I start the project it …

Ubiquitous Computing

Spark 2.3.1 structured streaming kafka ClassNotFound

I am trying to use Spark 2.3.1 structured streaming with Kafka. I am getting the following error: "Exception in thread "main" …

Ubiquitous Computing

Run Spark with scala Master/Slave

I'm new to this industry and am trying to understand how Spark works. I run Spark on Linux with spark-submit, and I set 1 master and 2 …


F3 doesn't work in Eclipse for Scala-Java mixed JARs

I am unable to use F3 and other Eclipse controls when I am working with Scala code in Eclipse. To reproduce this issue: • Create a new Maven project with …


IntelliJ scala project import

I have a Scala project in IntelliJ with a simple folder structure: src/core/CommonCSVReader.scala. I have a program.scala file under src/. I run …


How to map variable names to features after pipeline

I have modified the OneHotEncoder example to actually train a LogisticRegression. My question is how to map the generated weights back to the …

Logistic Regression

How to cache only part of the RDD in Spark?

I have a PairRDD<Metadata, BigData>. I want to do two actions: one on all the data in the RDD, and then another action on only the Metadata. The …


Scala read csv file and sort the file

I have read a CSV file into a DataFrame and I want to sort the df in ascending order: 37: error: value sort is not a member of Unit …

Apache Spark

How to use JDBC source to write data (data frame) from HDFS to Teradata in Scala (Spark-shell)

• Steps required to write data using JDBC connections in Scala
• Possible issues with JDBC sources and known solutions?

Ubiquitous Computing

Aggregate Equivalent of Spark in Flink

Is there any equivalent of Spark's RDD Aggregate for Flink's DataSet? After a few hours of searching it seems there is nothing. Flink's DataSet API has …

Ubiquitous Computing

Apache Flink for Stream Processing

There are different stream processing frameworks like Apache Storm, Apache Spark, Apache Samza, Apache Flink, etc., each of which has unique features. In this …

Machine Learning

How to mask data using Spark SQL in Scala?

I need to mask data using Spark SQL in Scala. For example, if I have a student table, I need to mask the student name and Grade (CGPA) columns.

Apache Spark

Clone objects using Serialization by byte array java

How to serialize and deserialize objects using ByteArrayOutputStream and ByteArrayInputStream? I need a simple and explicit explanation of this …
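
One way to approach the question above is sketched below: write the object into a `ByteArrayOutputStream` through an `ObjectOutputStream`, then read it back from a `ByteArrayInputStream` through an `ObjectInputStream`. The class name `DeepCloneSketch` and the method name `deepClone` are hypothetical names chosen for this illustration, and the sketch assumes every object in the graph implements `Serializable`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper class, named for illustration only.
public class DeepCloneSketch {

    // Serializes the object graph to an in-memory byte array and reads it back.
    // Assumes everything reachable from `original` implements Serializable.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T deepClone(T original)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream flushes all bytes into the buffer.
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(buffer.toByteArray()))) {
            // The cast is unchecked, but the stream holds exactly what was written.
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> source = new ArrayList<>(Arrays.asList("a", "b"));
        ArrayList<String> copy = deepClone(source);
        copy.add("c"); // changes the copy only; the source list is untouched
        System.out.println(source + " " + copy);
    }
}
```

The copy shares no references with the original, so it behaves as a deep copy. Fields marked `transient` are not carried over, and a non-serializable object anywhere in the graph causes `writeObject` to throw `NotSerializableException`.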


XGBoost failing after using windowing functions on label column

I have successfully trained an XGBoost model where trainDF is a dataframe having two columns: features and label where we have 11k 1s and 57M 0s …

Ubiquitous Computing

Getting an array of arrays of structs from a struct

I have the following Spark SQL schema: From an instance of route, I am trying to get the legs array and declare its type when I define the val. I tried …

Ubiquitous Computing

cassandra/datastax: programmatically setting datastax package

The following spark-submit script works: Programmatically, I can do the same by configuring SparkContext: Question: Can I add --packages …


Scala RSA decryption BadPaddingException: Decryption error

I am unable to decrypt the data using the private key. Decrypting with openssl works: openssl rsautl -decrypt -inkey $priv_key -in $key …


Computing Quartiles over Windowed Dataframe

I have some data, for the sake of discussion take it to be given by: I would like to compute quartiles for each ID over a moving window of days. …

Python Programming

create graph in spark-graphX

I have Spark 2.3 and I use Scala with sbt. I want to create a graph in GraphX. Here is my code: But I get this error: Why do I get this error and what …

Ubiquitous Computing

Create Apache Flink Table From Kafka DataStream

I have a DataStream of type which I created using a (Flink) Kafka consumer. I wish to create a Flink table from this DataStream. I have looked through …

Apache Software Foundation

DB connection with foreachRDD Spark Streaming

I am creating and passing a connection to the database while streaming the data. Reading the data from the file every time and creating Neo4j sessions …

Ubiquitous Computing

Scala Spark - illegal start of definition

This is probably a stupid newbie mistake, but I'm getting an error running what I thought was basic Scala code (in a Spark notebook, via Jupyter …


Spark Streaming: join with window operation doesn't run in parallel

I'm trying to join two Spark Streaming jobs with a window operation as follows: On the Spark web UI I see one window operation is running sequentially and …


Hadoop's star dims in the era of cloud object data storage and stream computing

One of the most noteworthy findings from Wikibon’s annual update to our big data market forecast was how seldom Hadoop was mentioned in vendors’ …

Big Data

How to solve exception: ERROR XSDB6: Another instance of Derby may have already booted the database while loading textfile into spark

I have an HDFS cluster on CentOS Linux which contains Spark 1.6.0 by default. Since it is an old version, I updated Spark to 2.2.0 and Scala …

Ubiquitous Computing

How to run jar with multiple main methods with spark-submit?

I want to put multiple Scala main methods in a single jar and run my Spark application. Can I specify which class should run? Please provide an example.

Ubiquitous Computing

Spark dataframe saveAsTable is using a single task

We have a pipeline for which the initial stages are properly scalable - using several dozen workers apiece. One of the last stages is For this stage we …

Ubiquitous Computing

Apache spark - loading data from elasticsearch is too slow

I'm new to Apache Spark and I'm trying to load some Elasticsearch data from a Scala script I'm running on it. Here is my script: And it works, but it's …