Use df.printSchema()
Category: Analytics
Machine Learning Week with 15% Discount Code
A project is more than just a Kaggle dataset
A few criteria that define a good data science project:
- Learnability: What did you learn in the project?
- Capability: What capabilities were showcased in the project?
- Difficulty: How difficult or easy was the project?
- Potential hireability: How likely are you to be hired based on that project?
- Ability: What creative approaches did you bring to the solution?
A few datasets I liked purely for teaching purposes: iris, Boston, mtcars, Titanic, German Credit, and MNIST handwriting.
Saving Dataframe as a table
- ModelData2 = ModelData.toPandas()  # converts the Spark DataFrame to a pandas DataFrame
- table_model = spark.createDataFrame(ModelData2)  # creates a Spark DataFrame from the pandas DataFrame
- table_model.write.saveAsTable('LIBRARYPATH.model_data')  # saves it as a table
Or, to save only selected columns (here transformed_chrn2 is presumably a pandas DataFrame):
new_df = transformed_chrn2[['Var1', 'Var2', 'Var3', 'Var4', 'Var5']]
table_df = spark.createDataFrame(new_df)
table_df.write.saveAsTable('directory_name.table_name')
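If the data is already a Spark DataFrame, the round trip through pandas may not be needed, because a Spark DataFrame exposes write.saveAsTable itself. A minimal sketch under that assumption; the helper name save_spark_df_as_table and the choice of the 'overwrite' mode are illustrative and not part of the original steps.

```python
def save_spark_df_as_table(spark_df, table_name, mode="overwrite"):
    """Save a Spark DataFrame as a table without converting it to pandas.

    spark_df   : an existing pyspark.sql.DataFrame (assumed)
    table_name : e.g. 'LIBRARYPATH.model_data' (placeholder name from the notes)
    mode       : 'overwrite' replaces an existing table, 'append' adds rows;
                 Spark's own default is to raise an error if the table exists
    """
    spark_df.write.mode(mode).saveAsTable(table_name)
    return table_name


# Hypothetical usage inside a Spark session or notebook:
# save_spark_df_as_table(ModelData, 'LIBRARYPATH.model_data')
```

Passing the mode explicitly makes it clear what happens when the table already exists, which is worth deciding before running this against a shared database.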
Sources:
https://stackoverflow.com/questions/30664008/how-to-save-dataframe-directly-to-hive
https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-connect-to-sql-database
https://docs.microsoft.com/en-us/azure/databricks/getting-started/spark/dataframes
Why a mentor
Bar chart in Python
In Seaborn, a bar chart of category counts can be created with the sns.countplot
method by passing it the data.
https://towardsdatascience.com/introduction-to-data-visualization-in-python-89a54c97fbed
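A minimal sketch of the countplot call, assuming seaborn and matplotlib are installed. The sample list colors and both helper names are made up for illustration. countplot draws one bar per category, with the bar height equal to the number of rows in that category, which the Counter tally mirrors.

```python
from collections import Counter


def category_counts(values):
    """Return the per-category counts that a count plot shows as bar heights."""
    return dict(Counter(values))


def plot_category_counts(values):
    """Draw the bar chart with seaborn (imported lazily, so the tally works without it)."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.countplot(x=values)  # one bar per distinct value
    ax.set_xlabel("category")
    ax.set_ylabel("count")
    plt.show()
    return ax


colors = ["red", "blue", "red", "green", "red", "blue"]  # hypothetical sample data
print(category_counts(colors))
# plot_category_counts(colors)  # uncomment to draw the chart
```

If the bar heights should be an aggregate of another column (a mean, for example) rather than a count, sns.barplot is the usual alternative.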
Split and Substring in Hive QL
Suppose you have a variable like AccountID:
split(trim(AccountID), '-')[0]
trim removes leading and trailing spaces.
split with '-' splits the string into multiple parts on the delimiter '-'.
[0] gives the first part of the split string ([1] gives the second part, and so on).
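To sanity-check the logic outside Hive, the same expression can be mirrored in plain Python. This is only a sketch: the function name split_part and the sample value are hypothetical, and Python's str.split takes a literal delimiter while Hive's split takes a regular expression (a plain '-' behaves the same in both).

```python
def split_part(account_id, index=0, delimiter="-"):
    """Mirror Hive's split(trim(AccountID), '-')[index] for a single value."""
    trimmed = account_id.strip(" ")    # like trim: drop leading/trailing spaces
    parts = trimmed.split(delimiter)   # like split: break the string on the delimiter
    return parts[index]                # like [index]: pick one part, counting from 0


sample = "  AB12-345-X "  # hypothetical AccountID value
first = split_part(sample)       # first part of the split string
second = split_part(sample, 1)   # second part of the split string
print(first, second)
```

One difference to keep in mind: an out-of-range index raises IndexError in Python, so the Hive behaviour for a missing part should be checked separately.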