Home » Posts tagged 'social media'
Tag Archives: social media
httR by Hadley #rstats
The awesome Hadley Wickham has just released the next version of httr package. Prof Hadley is currently on leave from Rice Univ and working with the tremendous geeks at R Studio . New things in the httr package-
http://blog.rstudio.org/2012/10/14/httr-0-2/
httr, a package designed to make it easy to work with web APIs. Httr is a wrapper around RCurl, and provides:
- functions for the most important http verbs:
GET,HEAD,PATCH,PUT,DELETEandPOST. - support for OAuth 1.0 and 2.0. Use
oauth1.0_tokenandoauth2.0_tokento get user tokens, andsign_oauth1.0andsign_oauth2.0to sign requests. The demos directory has six demos of using OAuth: three for 1.0 (linkedin, twitter and vimeo) and three for 2.0 (facebook, github, google).
I especially like the OAuth functionality as I occasionaly got flummoxed with existing R OAuth packages , and this should hopefully lead to awesome new social media analytics posts by the larger R blogger community. Also given the fact that unauthenticated API requests to Twitter are greatly expanded by OAuth authenticated requests- (see https://dev.twitter.com/docs/rate-limiting )
- Unauthenticated calls are permitted 150 requests per hour. Unauthenticated calls are measured against the public facing IP of the server or device making the request.
- OAuth calls are permitted 350 requests per hour and are measured against the oauth_token used in the request.
some creative use cases should see an incredible amount of cross social media analysis (not just one social media channel ) at a time.
R for Social Media Analytics ? Watch this space..
Related articles
- New version of httr: 0.2 (rstudio.org)
Interview John Myles White , Machine Learning for Hackers
Here is an interview with one of the younger researchers and rock stars of the R Project, John Myles White, co-author of Machine Learning for Hackers.
Ajay- What inspired you guys to write Machine Learning for Hackers. What has been the public response to the book. Are you planning to write a second edition or a next book?
John-We decided to write Machine Learning for Hackers because there were so many people interested in learning more about Machine Learning who found the standard textbooks a little difficult to understand, either because they lacked the mathematical background expected of readers or because it wasn’t clear how to translate the mathematical definitions in those books into usable programs. Most Machine Learning books are written for audiences who will not only be using Machine Learning techniques in their applied work, but also actively inventing new Machine Learning algorithms. The amount of information needed to do both can be daunting, because, as one friend pointed out, it’s similar to insisting that everyone learn how to build a compiler before they can start to program. For most people, it’s better to let them try out programming and get a taste for it before you teach them about the nuts and bolts of compiler design. If they like programming, they can delve into the details later.
Ajay- What are the key things that a potential reader can learn from this book?
John- We cover most of the nuts and bolts of introductory statistics in our book: summary statistics, regression and classification using linear and logistic regression, PCA and k-Nearest Neighbors. We also cover topics that are less well known, but are as important: density plots vs. histograms, regularization, cross-validation, MDS, social network analysis and SVM’s. I hope a reader walks away from the book having a feel for what different basic algorithms do and why they work for some problems and not others. I also hope we do just a little to shift a future generation of modeling culture towards regularization and cross-validation.
Ajay- Describe your journey as a science student up till your Phd. What are you current research interests and what initiatives have you done with them?
John-As an undergraduate I studied math and neuroscience. I then took some time off and came back to do a Ph.D. in psychology, focusing on mathematical modeling of both the brain and behavior. There’s a rich tradition of machine learning and statistics in psychology, so I got increasingly interested in ML methods during my years as a grad student. I’m about to finish my Ph.D. this year. My research interests all fall under one heading: decision theory. I want to understand both how people make decisions (which is what psychology teaches us) and how they should make decisions (which is what statistics and ML teach us). My thesis is focused on how people make decisions when there are both short-term and long-term consequences to be considered. For non-psychologists, the classic example is probably the explore-exploit dilemma. I’ve been working to import more of the main ideas from stats and ML into psychology for modeling how real people handle that trade-off. For psychologists, the classic example is the Marshmallow experiment. Most of my research work has focused on the latter: what makes us patient and how can we measure patience?
Ajay- How can academia and private sector solve the shortage of trained data scientists (assuming there is one)?
John- There’s definitely a shortage of trained data scientists: most companies are finding it difficult to hire someone with the real chops needed to do useful work with Big Data. The skill set required to be useful at a company like Facebook or Twitter is much more advanced than many people realize, so I think it will be some time until there are undergraduates coming out with the right stuff. But there’s huge demand, so I’m sure the market will clear sooner or later.
(TIL he has played in several rock bands!)
R for Business Analytics- Book by Ajay Ohri
So the cover art is ready, and if you are a reviewer, you can reserve online copies of the book I have been writing for past 2 years. Special thanks to my mentors, detractors, readers and students- I owe you a beer!
You can also go here-
http://www.springer.com/statistics/book/978-1-4614-4342-1
R for Business Analytics
Ohri, Ajay
2012, 2012, XVI, 300 p. 208 illus., 162 in color.
ISBN 978-1-4614-4342-1
Due: September 30, 2012
(net)
- Covers full spectrum of R packages related to business analytics
- Step-by-step instruction on the use of R packages, in addition to exercises, references, interviews and useful links
- Background information and exercises are all applied to practical business analysis topics, such as code examples on web and social media analytics, data mining, clustering and regression models
R for Business Analytics looks at some of the most common tasks performed by business analysts and helps the user navigate the wealth of information in R and its 4000 packages. With this information the reader can select the packages that can help process the analytical tasks with minimum effort and maximum usefulness. The use of Graphical User Interfaces (GUI) is emphasized in this book to further cut down and bend the famous learning curve in learning R. This book is aimed to help you kick-start with analytics including chapters on data visualization, code examples on web analytics and social media analytics, clustering, regression models, text mining, data mining models and forecasting. The book tries to expose the reader to a breadth of business analytics topics without burying the user in needless depth. The included references and links allow the reader to pursue business analytics topics.
This book is aimed at business analysts with basic programming skills for using R for Business Analytics. Note the scope of the book is neither statistical theory nor graduate level research for statistics, but rather it is for business analytics practitioners. Business analytics (BA) refers to the field of exploration and investigation of data generated by businesses. Business Intelligence (BI) is the seamless dissemination of information through the organization, which primarily involves business metrics both past and current for the use of decision support in businesses. Data Mining (DM) is the process of discovering new patterns from large data using algorithms and statistical methods. To differentiate between the three, BI is mostly current reports, BA is models to predict and strategize and DM matches patterns in big data. The R statistical software is the fastest growing analytics platform in the world, and is established in both academia and corporations for robustness, reliability and accuracy.
Content Level » Professional/practitioner
Keywords » Business Analytics - Data Mining - Data Visualization - Forecasting - GUI - Graphical User Interface - R software - Text Mining
Related subjects » Business, Economics & Finance - Computational Statistics - Statistics
TABLE OF CONTENTS
Possible Digital Disruptions by Cyber Actors in USA Electoral Cycle
Some possible electronic disruptions that threaten to disrupt the electoral cycle in United States of America currently underway is-
1) Limited Denial of Service Attacks (like for 5-8 minutes) on fund raising websites, trying to fly under the radar of network administrators to deny the targeted fundraising website for a small percentage of funds . Money remains critical to the world’s most expensive political market. Even a 5% dropdown in online fund-raising capacity can cripple a candidate.
2) Limited Man of the Middle Attacks on ground volunteers to disrupt ,intercept and manipulate communication flows. Basically cyber attacks at vulnerable ground volunteers in critical counties /battleground /swing states (like Florida)
3) Electro-Magnetic Disruptions of Electronic Voting Machines in critical counties /swing states (like Florida) to either disrupt, manipulate or create an impression that some manipulation has been done.
4) Use search engine flooding (for search engine de-optimization of rival candidates keywords), and social media flooding for disrupting the listening capabilities of sentiment analysis.
5) Selected leaks (including using digital means to create authetntic, fake or edited collateral) timed to embarrass rivals or influence voters , this can be geo-coded and mass deployed.
6) using Internet communications to selectively spam or influence independent or opinionated voters through emails, short messaging service , chat channels, social media.
7) Disrupt the Hillary for President 2016 campaign by Anonymous-Wikileak sympathetic hacktivists.
Analytics for Cyber Conflict
The emerging use of Analytics and Knowledge Discovery in Databases for Cyber Conflict and Trade Negotiations
The blog post is the first in series or articles on cyber conflict and the use of analytics for targeting in both offense and defense in conflict situations.
It covers knowledge discovery in four kinds of databases (so chosen because of perceived importance , sensitivity, criticality and functioning of the geopolitical economic system)-
- Databases on Unique Identity Identifiers- including next generation biometric databases connected to Government Initiatives and Banking, and current generation databases of identifiers like government issued documents made online
- Databases on financial details -This includes not only traditional financial service providers but also online databases with payment details collected by retail product selling corporates like Sony’s Playstation Network, Microsoft ‘s XBox and
- Databases on contact details – including those by offline businesses collecting marketing databases and contact details
- Databases on social behavior- primarily collected by online businesses like Facebook , and other social media platforms.
It examines the role of
-
voluntary privacy safeguards and government regulations ,
-
weak cryptographic security of databases,
-
weakness in balancing marketing ( maximized data ) with privacy (minimized data)
-
and lastly the role of ownership patterns in database owning corporates
A small distinction between cyber crime and cyber conflict is that while cyber crime focusses on stealing data, intellectual property and information to primarily maximize economic gains
cyber conflict focuses on stealing information and also disrupt effective working of database backed systems in order to gain notional competitive advantages in economics as well as geo-politics. Cyber terrorism is basically cyber conflict by non-state agents or by designated terrorist states as defined by the regulations of the “target” entity. A cyber attack is an offensive action related to cyber-infrastructure (like the Stuxnet worm that disabled uranium enrichment centrifuges of Iran). Cyber attacks and cyber terrorism are out of scope of this paper, we will concentrate on cyber conflicts involving databases.
Some examples are given here-
Types of Knowledge Discovery in -
1) Databases on Unique Identifiers- including biometric databases.
Unique Identifiers or primary keys for identifying people are critical for any intensive knowledge discovery program. The unique identifier generated must be extremely secure , and not liable to reverse engineering of the cryptographic hash function.
For biometric databases, an interesting possibility could be determining the ethnic identity from biometric information, and also mapping relatives. Current biometric information that is collected is- fingerprint data, eyes iris data, facial data. A further feature could be adding in voice data as a part of biometric databases.
This is subject to obvious privacy safeguards.
For example, Google recently unveiled facial recognition to unlock Android 4.0 mobiles, only to find out that the security feature could easily be bypassed by using a photo of the owner.
Example of Biometric Databases
In Afghanistan more than 2 million Afghans have contributed iris, fingerprint, facial data to a biometric database. In India, 121 million people have already been enrolled in the largest biometric database in the world. More than half a million customers of the Tokyo Mitsubishi Bank are are already using biometric verification at ATMs.
Examples of Breached Online Databases
In 2011, Playstation Network by Sony (PSN) lost data of 77 million customers including personal information and credit card information. Additionally data of 24 million customers were lost by Sony’s Sony Online Entertainment. The websites of open source platforms like SourceForge, WineHQ and Kernel.org were also broken into 2011. Even retailers like McDonald and Walgreen reported database breaches.
The role of cyber conflict arises in the following cases-
-
Databases are online for accessing and authentication by proper users. Databases can be breached remotely by non-owners ( or “perpetrators”) non with much lesser chance of intruder identification, detection and penalization by regulators, or law enforcers (or “protectors”) than offline modes of intellectual property theft.
-
Databases are valuable to external agents (or “sponsors”) subsidizing ( with finance, technology, information, motivation) the perpetrators for intellectual property theft. Databases contain information that can be used to disrupt the functioning of a particular economy, corporation (or “ primary targets”) or for further chain or domino effects in accessing other data (or “secondary targets”)
-
Loss of data is more expensive than enhanced cost of security to database owners
-
Loss of data is more disruptive to people whose data is contained within the database (or “customers”)
So the role play for different people for these kind of databases consists of-
1) Customers- who are in the database
2) Owners -who own the database. They together form the primary and secondary targets.
3) Protectors- who help customers and owners secure the databases.
and
1) Sponsors- who benefit from the theft or disruption of the database
2) Perpetrators- who execute the actual theft and disruption in the database
The use of topic models and LDA is known for making data reduction on text, and the use of data visualization including tied to GPS based location data is well known for investigative purposes, but the increasing complexity of both data generation and the sophistication of machine learning driven data processing makes this an interesting area to watch.
The next article in this series will cover-
the kind of algorithms that are currently or being proposed for cyber conflict, the role of non state agents , and what precautions can knowledge discovery in databases practitioners employ to avoid breaches of security, ethics, and regulation.
Citations-
- Michael A. Vatis , CYBER ATTACKS DURING THE WAR ON TERRORISM: A PREDICTIVE ANALYSIS Dartmouth College (Institute for Security Technology Studies).
- From Data Mining to Knowledge Discovery in Databases Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyt
C4ISTAR for Hacking and Cyber Conflict
As per http://en.wikipedia.org/wiki/C4ISTAR
C2I stands for command, control, and intelligence.
C3I stands for command, control, communications, and intelligence.
C4I stands for command, control, communications, computers, and (military) intelligence.
C4ISTAR is the British acronym used to represent the group of the military functions designated by C4 (command, control, communications, computers), I (military intelligence), and STAR (surveillance, target acquisition, and reconnaissance) in order to enable the coordination of operations
I increasingly believe that cyber conflict will develop its own terminology and theory and paradigms in due time. In the meantime, it will adopt paradigms from existing military literature and adapt it to the unique sub culture of cyber conflict for both offensive, defensive as well as pre-emptive actions. Here I am theorizing for a case of targeted hacking attacks rather than massive attacks that bring down a website for a few hours and achieve nothing but a few press headlines . I would also theorize on countering such attacks.
So what would be the C4ISTAR for -
1) Media company supporting SOPA/PIPA/Take down Mega Upload-
Command and Control refers to the ability of commanders to direct forces-
This will be the senior executives including the members of board, legal officers, and public relationship/marketing people. Their name is available from corporate websites, and social media scraping can ensure both a list of contact addresses (online) as well as biases for phishing /malware attacks. This could also include phone (flooding or voicemail hacking ) attacks , and attacks against the email server of the company rather than the corporate website.
Communications- This will include all online and social media channels including websites of the media company , but also those of the press relations firms handling communications , phones,websites- anything which the target is likely to communicate externally (and if possible internal communication)
Timing is everything- coordinating attacks immediately is juevenile, but it might be more mature to attack on vulnerable days like product launches or just before a board of directors meeting
Intelligence-
Most corporates have an in-house research team, they can be easily targeted using social media channels, but also offline research and digging deep. Targeting intelligence corps of the target corporate is likely to produce a much better disruption. Eventually they can be persuaded to stop working for that corporate.
Computers- Anything that runs on electricity and can be disabled – should be disabled. This might require much more creativity than just flooding.
surveillance- This can be both online as well as offline, and would be of electronic assets, likely responses for the attack, and the key people who are to be disrupted.
target acquisition- at least ten people within each corporate can and should be ideally disrupted, rather than just the website. this would call for social media scraping, and prior planning. even email in-boxes can be disrupted (if all else fails)
and reconnaissance-
study your target companies, target employees, and their strategies.
Then segment and prioritize in a list of matrix of 10 to 10, who is more vulnerable and who is more valuable to attack.
the C4ISTAR for -a hacker activist organization is much more complicated but forensics reveal that most hackers tend to leave a signature style (in terms of computers,operating systems,machine ids,communication, tools, or even port numbers used)
the best defense for a media rich company to prevent hacking attacks is to first identify its own C4ISTAR structure for its digital content strategy and then fortify as well as scrub vulnerabilities (including from online information regarding its own employees)
(to be continued)
http://www.catb.org/~esr/faqs/hacker-howto.html





