Analytics for Income Tax: YES, and this is how you do it

Wise Practitioner – Predictive Analytics Interview Series: Jeff Butler at IRS Research, Analysis, and Statistics organization

Q: How would you characterize your agency’s current and/or planned use of predictive analytics?  What is one specific way in which predictive analytics actively drives decisions in your agency?

A: The IRS uses a wide range of analytic methods, tools, and technologies to address such problems as ID theft, refund fraud, inventory optimization, and other activities related to its statutory mandates. In an era of persistently reduced budgets, the use of data analytics has become more important than ever to drive innovation, risk management, and decision making across the agency.

Q: Can you describe the challenges you face or have already overcome in establishing a data-driven environment in your agency?

A: Large organizations don’t change their leopard spots overnight. Building a data-driven culture involves fundamental changes to workforce skills and business-IT relationships, which requires change leadership and long-term commitments.

Q: Can you discuss any near term goals you have for improving your agency’s use of predictive analytics?

A: The U.S. taxpayer population has some complexities that present unique challenges to the IRS. For example, high-wealth individuals often behave more like a business, and businesses with connected entities often look more like a group of interrelated economic structures than a single business. There is growing interest in network analysis and related methods as an exploratory approach to better understand these types of patterns.
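To make the network-analysis idea a little more concrete, here is a minimal sketch in R that groups linked entities into connected components using union-find. It assumes the links between entities (for example shared ownership) are already known. The entity names, the edge list, and the functions `find_root` and `connected_groups` are hypothetical illustrations, not anything the IRS has described using.

```r
# Hypothetical sketch: cluster entities joined by known links (e.g. shared
# ownership) into connected components with union-find.
# All names and links below are invented for illustration.

# Follow parent pointers until reaching the root of x's group
find_root <- function(parent, x) {
  while (parent[[x]] != x) x <- parent[[x]]
  x
}

# nodes: character vector of entity IDs
# edges: data frame with character columns `from` and `to`
connected_groups <- function(nodes, edges) {
  parent <- setNames(nodes, nodes)  # every entity starts as its own root
  for (i in seq_len(nrow(edges))) {
    a <- find_root(parent, edges$from[i])
    b <- find_root(parent, edges$to[i])
    if (a != b) parent[[b]] <- a    # merge the two groups
  }
  roots <- vapply(nodes, function(n) find_root(parent, n), character(1))
  split(nodes, roots)               # list of groups, named by root entity
}

nodes <- c("Person_A", "LLC_1", "LLC_2", "Trust_X", "Person_B")
edges <- data.frame(
  from = c("Person_A", "LLC_1", "Person_B"),
  to   = c("LLC_1", "LLC_2", "Trust_X"),
  stringsAsFactors = FALSE          # keep IDs as strings, not factor codes
)
groups <- connected_groups(nodes, edges)
print(groups)
```

Each resulting group is one candidate "interrelated economic structure"; a real analysis would need far richer link data than this toy edge list.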

Q: Can you describe a successful result from the employment of predictive analytics in your agency, i.e., cost avoidance, funds recovered, improved efficiency, etc.

A: ID theft remains a significant challenge for the IRS—and therefore for U.S. taxpayers as well. The financial and psychological cost to families whose tax returns are fabricated by ID thieves can be devastating and long lasting. The use of data analytics has allowed the IRS to accelerate the process of verifying ID theft cases for faster case resolution, lowering direct costs through improved automation. Analytic models are also key to detecting and preventing billions of dollars in fraudulent refund claims each year.

Q: Sneak preview: Please tell us a take-away that you will provide during your talk at Predictive Analytics World for Government.

A: Greater awareness is needed by agencies that the traditional paradigm for analyzing data in massively large environments is changing and skills need to adapt. Organizational boundaries between IT and business have to be removed. Greater emphasis needs to be placed on multi-disciplinary teams that combine skills from computer science, IT, statistics, economics, and applied math.

 

-kindly contributed by

http://www.predictiveanalyticsworld.com/patimes/wise-practitioner-predictive-analytics-interview-series-jeff-butler-at-irs-research-analysis-and-statistics-organization09022015/

 

Hackers’ Paradise: Google “webcam snap on ubuntu” and the software reviews are so helpful

  • So I ask Google (from Alphabet) this question:

https://www.google.co.in/search?q=webcam+snap+on+ubuntu&oq=webcam+snap+on+ubuntu&aqs=chrome..69i57.8044j0j7&sourceid=chrome&es_sm=122&ie=UTF-8

Screenshot from 2015-09-29 21:18:38

  • and I get this page: https://apps.ubuntu.com/cat/applications/natty/cheese/

Screenshot from 2015-09-29 21:18:25

  • and the reviews are so helpful and perfect
  • perhaps Google search results that feature reviews should have some T (and T tests)

Screenshot from 2015-09-29 21:18:19

Indian Culture and Indian Startups

Some things unique to Indian startups:

1) Meet my Co-Founder, he is my relative – Indian startups tend to have more relatives as co-founders than the American startups I have seen. American founders do hire relatives as early employees, but much less often than in India.

Basically this is because Indian culture values family over business. It is also because, in our society, family members can be trusted more than friends (see below).

2) Payment is delayed, Boss – One common excuse startups in India face, and sometimes contribute to, is delayed payments, invoices and promised monetary benefits.

Basically this is because Indian culture values honesty over money

3) Follow up – Send an email. Then a WhatsApp message. Then call. Then meet. Mostly over things that were committed orally by or to a tech startup in India.

Basically this is because Indian culture values communication and interaction over honesty

4) Stock options are in the mail – Indian startup employees are promised stock options on day 1, but they will always be ready next year. American startups either don’t promise anything or they give you a contract for it.

Basically this is because Indian culture values entrepreneurship over employment

5) Sarkaar – There is no Govt in technology startups. This is laissez-faire, or pure capitalism: no Govt to protect you and no Govt to help you. Republican Americans should take note.

Basically this is because the Indian Government values technology corporations over technology entrepreneurship, since technology corporations lobby it better than technology entrepreneurs do. This is exactly the same as in America.

Basically, in both American and Indian startup culture- money talks, and cash is king.

For the Love of Data: Interview with DataJoy Founder James Allen

Describe how you came up with the idea of setting up DataJoy. What are some of the things that you have learnt while creating ShareLaTeX?

The idea for DataJoy came organically as we talked to users of ShareLaTeX about the difficulties in their research workflow, beyond just paper writing. Like with LaTeX, Python and R have a high learning curve for new users. Having to first worry about installing them and getting a working environment set up is a difficult hurdle for people when they just want to start getting a feel for the language itself. Basically we want to let you write and run your first few lines of Python or R as quickly as possible.

There are also the difficulties that people face with collaboration. Getting someone else in a position to be able to run your code can be hard, especially if you use a lot of specialist packages or specific versions. If you’re actively working together on some code, making sure you don’t get in each other’s way is difficult. Version control systems have a very steep learning curve and need your entire team to use them. We think the real-time nature of DataJoy is a nice middle ground that lets everyone work together without fear of overwriting or disrupting your collaborator’s work, but has no learning curve.

With ShareLaTeX, we realised that there is a huge silent majority of students and researchers who may not be very tech-literate but are actively engaged in the academic process. These people just want to achieve their end goal, whether it’s submitting an assignment, writing a paper, or analysing some data. They aren’t posting on Stack Overflow or reading blogs about best practices because they don’t care about the technology, they only care about getting their work done. These are the people who we’ve found that we can help the most.

I can set up an IPython Notebook server on Amazon, and also RStudio Server (or just use an AMI which has both). What advantages does DataJoy give me as a data scientist? How is it different from R-fiddle?

Absolutely, and I don’t think DataJoy will ever replace this use-case. If you’re advanced enough in your understanding of your tools and the infrastructure behind them, then setting up a server on Amazon for yourself has a lot of benefits. However, there are a lot more people out there who want the benefits of a cloud environment, but wouldn’t know where to start with setting up their own server and are more focused on the results of their research than in learning how to do so.

Even as someone who does know how to set up such a server though, it’s still an extra piece of infrastructure that you need to manage and support. If you use DataJoy then you can let us do that for you and just focus on your actual data science workflow.

What are some of the ways that you have thought of monetizing this model of creating infrastructure for data scientists?

It’s still very early days and we’re still learning about the needs of different users, but I think there are likely to be 2 or 3 main sources of revenue for DataJoy:

  • Individual accounts for users looking for more compute resources or more advanced features,
  • Group and site licenses for teams in enterprises, universities, or teaching that want to move their whole team’s workflow to DataJoy,
  • Onsite installations

Are you thinking of expanding to include things like Spark et al for users?

We’re focusing on Python and R at the moment to make sure that we can provide the best user experience for these languages. However, our long term goal is to make DataJoy language agnostic so that you can bring your favourite language and toolchain and we’ll be able to support it. We have a very flexible infrastructure on the backend and the limitation to Python and R at the moment is to keep things simpler for us and users.

What are some case studies that you want to share?

We’re really excited about how DataJoy is being used in classrooms all over the world. I haven’t asked permission to share these stories publicly, so without naming names, I’m aware of a lecturer who is using DataJoy to run classes in an interactive way that just wasn’t possible before. He can present the lesson as code in DataJoy on the projector, and have all his students be logged into the same project on their laptops. Students can fill in chunks of code as the lesson progresses and it appears immediately on the projector and on other students’ screens.

Likewise, another lecturer is using DataJoy as a way to distribute assignments to students and if they get stuck, she can quickly log in directly to the student’s project and help them debug it. This has saved her lots of unnecessary hassle of getting the student to email her the code, and then fighting with possible version mismatches or missing dependencies. Being able to see the problem in exactly the same context as the student has been invaluable.

These cases are really exciting to us because they open up completely new ways of teaching that just weren’t possible before.

Do you intend to make the code for DataJoy open source or for users who want to run their own DataJoy server on premise?

Yes, absolutely! ShareLaTeX is already open source and available for users to run individual instances. The DataJoy code base is branched from ShareLaTeX and is still in our open GitHub repositories. The only problem with DataJoy at the moment is that the infrastructure for running Python and R code on our backend is quite tied in to our specific architecture. As soon as we work out how to abstract that so that it can run easily anywhere, we will release DataJoy as an open source project.

What else is on your product roadmap for DataJoy?

At the moment we have two main focuses: Improving DataJoy for teaching, and improving the ease of use for new Python/R users. We want to make it easier for teachers to manage large classes of students and work interactively with them. We also want to make sure that we remove the roadblocks that new Python or R users face, including making error messages more clear, making it ridiculously easy to install any package (even ones that need compiling from source) and providing help, tutorials and examples at the right times.

Describe your own journey as a developer hacker and entrepreneur. What advice would you give to young people entering data science and devops today?

I came to ShareLaTeX and DataJoy after doing a PhD in theoretical physics at Durham University which I finished in early 2013. I’d always had an interest in programming, and worked as a part-time web developer for a web hosting company while I was an undergraduate at Edinburgh studying maths. As a PhD student, I’d written a prototype LaTeX editor that had a bit of traction, and teamed up with my co-founder Henry to work on ShareLaTeX in 2012. Henry comes from a strong software development background and has helped me mature a lot as a software developer to be able to write and maintain large scale services.

I don’t have much experience doing data science directly, but my advice for all aspects of life would be to reach out and talk to as many people as possible, especially if they are doing interesting or different work from you. Only by getting lots of opinions (sometimes conflicting!) can you start to build up a realistic view of the world. Surrounding yourself with people you can learn from is very important too, and part of this. If you can’t find people in real life, then find good people to listen to online. Of course, always evaluate what they say with a critical eye :).

How would DataJoy enable coding, or even learning to code, on mobile phones?

We’d love to support DataJoy on mobile devices, but they present a number of unique technical challenges. We’ve found that what makes a nice user interface on a PC does not transfer to a tablet/phone very well, and so we’d need to redesign the whole experience. We also have to work with poorer network connections, and offline usage. These are problems that we’re excited to tackle because I think it would let people work in ways with Python and R that haven’t been possible before, but for now we’re focused on improving the desktop/laptop experience.

Screenshot from 2015-09-28 22:29:07

Screenshot from 2015-09-28 22:28:45

(PS – I love DataJoy, and I have no commercial interest in them at all. I just get a kick from kicking the tires of R and Python in a browser WITHOUT any installation hassles.)

https://www.getdatajoy.com/

Google is watching you and how

Here is some R code we have written to map the location history Google keeps on you, exported through Google Takeout.

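library(jsonlite)
library(ggmap)   # also loads ggplot2 for aes() and geom_path()

# Read the Location History file exported from Google Takeout
a <- fromJSON("/home/rstudio/R/Takeout/Location History/LocationHistory.json")
b <- a$locations

# Coordinates are stored as degrees * 10^7, timestamps as milliseconds since 1970
mygoog <- data.frame(
  latitude  = b$latitudeE7 / 10000000,
  longitude = b$longitudeE7 / 10000000,
  time      = as.POSIXct(as.numeric(b$timestampMs) / 1000, origin = "1970-01-01")
)

# Base map. The original map centre was lost from this post, so centring on the
# mean position is an assumption; current ggmap versions also need a Google API
# key registered with register_google() before this call will work.
Map <- get_googlemap(center = c(lon = mean(mygoog$longitude), lat = mean(mygoog$latitude)),
                     zoom = 12,
                     size = c(640, 640),
                     scale = 2, maptype = "terrain",
                     color = "color")

# Plot the full track
plot1 <- ggmap(Map) +
  geom_path(data = mygoog, aes(x = longitude, y = latitude),
            alpha = 0.5,
            size = 0.8)
suppressWarnings(print(plot1))

# Plot only the points recorded after a given time
mygoog2 <- mygoog[mygoog$time > as.POSIXct("2015-09-21 12:09:31"), ]
plot1 <- ggmap(Map) +
  geom_path(data = mygoog2, aes(x = longitude, y = latitude),
            alpha = 0.5,
            size = 0.8)
suppressWarnings(print(plot1))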
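The two conversions in the code above (E7 integers to degrees, millisecond timestamps to date-times) and the time filter can be wrapped into small base R functions and checked on a couple of made-up rows before pointing them at a real Takeout file. This is a sketch under assumptions: the functions `convert_locations` and `after_cutoff` and the sample values are hypothetical, and the input is expected to have the columns `latitudeE7`, `longitudeE7` and `timestampMs`, i.e. the `locations` table of the export.

```r
# Hypothetical helpers (not from the original post) for Takeout location data.
# Expects a data frame with latitudeE7, longitudeE7 and timestampMs columns.
convert_locations <- function(raw) {
  data.frame(
    latitude  = raw$latitudeE7 / 1e7,    # E7 = degrees * 10^7
    longitude = raw$longitudeE7 / 1e7,
    time      = as.POSIXct(as.numeric(raw$timestampMs) / 1000,  # ms -> s
                           origin = "1970-01-01", tz = "UTC")
  )
}

# Keep only the points recorded strictly after `cutoff` (interpreted as UTC)
after_cutoff <- function(df, cutoff) {
  df[df$time > as.POSIXct(cutoff, tz = "UTC"), , drop = FALSE]
}

# Two invented points, one day apart
raw <- data.frame(
  latitudeE7  = c(285000000, 285100000),
  longitudeE7 = c(772000000, 772100000),
  timestampMs = c("1442836800000", "1442923200000"),
  stringsAsFactors = FALSE
)
pts <- convert_locations(raw)
recent <- after_cutoff(pts, "2015-09-22 00:00:00")
print(recent)
```

Fixing the time zone to UTC keeps the cutoff comparison reproducible; the original code uses the machine's local time zone instead.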

How the Facebook User Experience hacked Friendship

Two friends met face to face after a long time.

Dude, said one, why did you block me on FB?

Look bro, replied the second friend, you unfriended me first.

Hey, I am sorry maan, said the first friend. I was just pissed at your status message.

Which one? asked the second.

I don’t know, replied the first.

Why didn’t you just unfollow me, bro? asked the first. Stop seeing my posts but still remain friends.

Oh crap, said the second friend. That Facebook user experience is so complicated.

Friendship should never be complicated, said the first.

There are ways of making friends without the internet, said the second.

And the two friends remained friends ever after.

Screenshot from 2015-09-24 06:11:52