Author: davide

Big Data

From Pandas to Spark

Work in progress: translating Pandas operations to PySpark.

Operation: Read file
  Pandas:  df = pd.read_csv(file, sep=';', encoding='latin-1')
  PySpark: df = spark.read.format('csv').options(header='true', inferSchema='true', delimiter=';').load(file)

Operation: Group by
  Pandas:  group_by = df.groupby('field1')['field2']
  PySpark: group_by = df.groupBy('field1')  # then aggregate field2 with .agg(...)

Operation: Unique values of a column
  Pandas:  unique = df['field1'].nunique()
  PySpark: unique = df.select('field1').distinct().count()

Operation: Head
  Pandas:  df.head(number_rows)
  PySpark: df.head(number_rows)  # ugly output: returns a list of Row objects
           df.show(number_rows)

Book

My Books

Read and learn… Some of my books: Big data. Una rivoluzione che trasformerà il nostro modo di vivere e già minaccia la nostra libertà, by Viktor Mayer-Schönberger and Kenneth N. Cukier. From Google Books: How can the spread of an epidemic be observed in real time? In what…

Main

Who Am I?

Hi everyone, I am Davide Gazzè. I received my Bachelor's and Master's degrees in Computer Engineering from the University of Pisa in 2005 and 2010, respectively. From 2010 to 2015, I was a Research Fellow at the Institute of Informatics and Telematics (IIT) of the…