Converting a Spark DataFrame to a pandas DataFrame

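%python
# Run a SQL query against the name_csv table; display() is a Databricks notebook helper
df = spark.sql("select * from name_csv")
display(df.select("*"))

# toPandas() collects every row to the driver, so the result must fit in driver memory
pandas_df = df.toPandas()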
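Because toPandas() brings every row back to the driver, a large table can exhaust driver memory. The sketch below is one hedged way to guard against that, assuming Spark 3.0 or later (where the Arrow setting is named spark.sql.execution.arrow.pyspark.enabled): it enables Arrow-based transfer and caps the row count with limit() before collecting. The helper name to_pandas_capped and the 100,000-row default are illustrative assumptions, not part of the original post.

```python
def to_pandas_capped(spark, df, max_rows=100_000):
    """Hypothetical helper: collect at most max_rows rows of a Spark
    DataFrame into pandas, using Arrow for the transfer (Spark 3.0+ key)."""
    # Arrow speeds up toPandas(). If a column type is unsupported by Arrow,
    # Spark falls back to the non-Arrow path while
    # spark.sql.execution.arrow.pyspark.fallback.enabled is true (its default).
    spark.conf.set("spark.sql.execution.arrow.pyspark.enabled", "true")
    # limit() runs on the cluster, so only max_rows rows reach the driver.
    return df.limit(max_rows).toPandas()
```

Called as pandas_df = to_pandas_capped(spark, df), it returns at most 100,000 rows. Without an orderBy() before the limit, which rows are kept is not guaranteed, and the helper changes the session-wide Arrow setting as a side effect.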

Creating a SQL Table Using Spark

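%python
# CREATE TABLE ... AS SELECT (CTAS): copy rows whose ac_opn_dt is before 2012-07-01 into a new table.
# "columns, column, columnc" and "table" are placeholder names; replace them with real ones.
acc_1 = spark.sql("create table test_spark as select columns, column, columnc from table where to_date(ac_opn_dt) < '2012-07-01'")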
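The CTAS statement above hard-codes its column list and cutoff date. When those values come from elsewhere in a notebook, a small builder that validates them before splicing them into SQL avoids malformed or injected statements. This is a sketch under assumptions: build_ctas is a hypothetical helper, identifiers are restricted to plain names (optionally prefixed by a database name), and the cutoff must be an ISO date.

```python
import re
from datetime import date

# Conservative identifier check: letters, digits, underscores, optional "db." prefix.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")


def build_ctas(new_table, source_table, columns, date_column, cutoff):
    """Hypothetical helper: build a CREATE TABLE ... AS SELECT statement that
    keeps rows whose date_column falls before the cutoff date."""
    for name in [new_table, source_table, date_column, *columns]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    # fromisoformat() raises ValueError for non-dates and isoformat() normalises
    # to YYYY-MM-DD, so the inlined literal cannot carry arbitrary SQL.
    cutoff_literal = date.fromisoformat(cutoff).isoformat()
    return (
        f"create table {new_table} as "
        f"select {', '.join(columns)} from {source_table} "
        f"where to_date({date_column}) < '{cutoff_literal}'"
    )
```

For example, acc_1 = spark.sql(build_ctas("test_spark", "accounts", ["col_a", "col_b"], "ac_opn_dt", "2012-07-01")) would run a statement of the same shape; accounts, col_a and col_b are placeholder names.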

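Converting a pandas DataFrame to a Spark DataFrame

# Given a pandas DataFrame, return a Spark DataFrame with an explicit schema.
# define_structure(column, dtype) is not shown here; it must return a StructField for each column.
from pyspark.sql.types import StructType

def pandas_to_spark(pandas_df):
    columns = list(pandas_df.columns)
    types = list(pandas_df.dtypes)
    struct_list = []
    for column, dtype in zip(columns, types):
        struct_list.append(define_structure(column, dtype))
    p_schema = StructType(struct_list)
    return spark.createDataFrame(pandas_df, p_schema)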
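The post calls define_structure but does not show it. As an alternative sketch that avoids that helper, spark.createDataFrame also accepts the schema as a DDL-formatted string such as "`id` BIGINT, `name` STRING". The functions below build such a string from pandas dtype names. The function names and the dtype-to-type mapping are assumptions covering common NumPy-backed dtypes; anything unmapped (for example pandas' nullable Int64 or category) falls back to STRING, which may require converting that column with astype(str) on the pandas side first.

```python
# Hypothetical helpers: build a Spark DDL schema string from pandas dtype names.
# The mapping is an assumption covering common dtypes; anything else becomes STRING.
PANDAS_TO_SPARK_DDL = {
    "int64": "BIGINT",
    "int32": "INT",
    "float64": "DOUBLE",
    "float32": "FLOAT",
    "bool": "BOOLEAN",
    "datetime64[ns]": "TIMESTAMP",
    "object": "STRING",
}


def spark_ddl_type(pandas_dtype) -> str:
    """Return the Spark SQL type name for a pandas dtype (or its string name)."""
    return PANDAS_TO_SPARK_DDL.get(str(pandas_dtype), "STRING")


def pandas_schema_ddl(columns, dtypes) -> str:
    """Build a DDL string such as '`id` BIGINT, `name` STRING'."""
    fields = []
    for column, dtype in zip(columns, dtypes):
        # Backticks allow spaces in column names; embedded backticks are doubled.
        quoted = "`" + str(column).replace("`", "``") + "`"
        fields.append(f"{quoted} {spark_ddl_type(dtype)}")
    return ", ".join(fields)
```

With a pandas DataFrame in hand, the call would look like spark.createDataFrame(pandas_df, schema=pandas_schema_ddl(pandas_df.columns, pandas_df.dtypes)). Passing str(dtype) through a dictionary keeps the mapping in one place, so adding a dtype is a one-line change.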

Author: Ajay Ohri

http://about.me/ajayohri
