Beginner’s Guide: Python for Analytics | Seaborn

Beginner’s Guide to Using Python with HR Data | Exploration Series

Part Three – Seaborn

In this first tutorial series, I’m exploring the IBM HR Attrition and Performance data set. This is a great data set for demonstrating the possibilities of machine learning and other data science techniques.

Now we’ll move on to using Seaborn for our visualizations. The benefit of Seaborn is that it abstracts away more of the complex, underlying calls needed to visualize your data, so you can focus on your analysis task rather than on how to implement it. It also provides built-in functionality that would be incredibly complex to implement without it.
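To give a feel for that abstraction, here is a minimal sketch rather than the tutorial's own code. The toy values are made up, and the column names `Attrition` and `MonthlyIncome` are assumptions chosen for illustration. A single `sns.boxplot` call handles the grouping, the box statistics, and the axis labels.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd
import seaborn as sns

# Toy data for illustration only; values and column names are assumptions.
df = pd.DataFrame(
    {
        "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
        "MonthlyIncome": [2900, 5200, 6100, 3100, 4800, 7000],
    }
)

# One call groups by Attrition and draws a box of MonthlyIncome per group.
ax = sns.boxplot(data=df, x="Attrition", y="MonthlyIncome")
ax.set_title("Monthly income by attrition status")

# Seaborn returns a regular Matplotlib Axes, so saving works as usual.
ax.figure.savefig("income_by_attrition.png")
```

Building the same chart by hand would mean splitting the data by group and passing each piece to Matplotlib yourself.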

Series Outline

0: basic operations & summary statistics

1: matplotlib

2: pandas visualization

3: seaborn

4: plotly

5: series summary

3: Seaborn

 


Credits
Photo: Photo by Randall Ruiz on Unsplash

Beginner’s Guide: Python for Analytics | Pandas

Beginner’s Guide to Using Python with HR Data | Exploration Series

Part Two – Pandas

In this first tutorial series, I’m exploring the IBM HR Attrition and Performance data set. This is a great data set for demonstrating the possibilities of machine learning and other data science techniques.

Next, we’ll take a look at the power of Pandas to plot our data. For me, a budding data [analyst/scientist/enthusiast], Pandas has become my most common import and tool. Plotting directly from pandas objects makes it very easy to stay in the flow of analyzing data. Let’s get going.
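As a minimal sketch of what plotting directly from a pandas object can look like (toy data, with column names that are assumptions for illustration and not taken from the tutorial notebook):

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd

# Toy data; the values and column names are made up for illustration.
df = pd.DataFrame(
    {
        "Department": ["Sales", "HR", "R&D", "Sales", "R&D", "R&D"],
        "Age": [34, 29, 41, 38, 25, 52],
    }
)

# Count rows per department, then plot straight from the resulting Series.
counts = df["Department"].value_counts()
ax = counts.plot(kind="bar", title="Employees per department")
ax.set_ylabel("Employees")

# The returned object is a regular Matplotlib Axes, so it can be saved as usual.
ax.figure.savefig("employees_per_department.png")
```

The summary step (`value_counts`) and the chart sit in the same chain of calls, which is what keeps you in the flow.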

Series Outline

0: basic operations & summary statistics

1: matplotlib

2: pandas visualization

3: seaborn

4: plotly

5: series summary

2: Pandas

 


Beginner’s Guide: Python for Analytics | Matplotlib

Beginner’s Guide to Using Python with HR Data | Exploration Series

Part One – Matplotlib

In this first tutorial series, I’m exploring the IBM HR Attrition and Performance data set. This is a great data set for demonstrating the possibilities of machine learning and other data science techniques.

In this next walkthrough, we’ll begin to ‘see’ our data through the use of visualization packages. In R there are three common plotting tools, and other packages extend these main items. In Python, there is Matplotlib, and most other packages build on this foundation. So, the decision of where to start with Python plotting is an easy one – let’s get going.
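For a first taste of that foundation, here is a minimal Matplotlib sketch. The department names and headcounts are made up for illustration and do not come from the IBM data set.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import matplotlib.pyplot as plt

# Made-up headcounts per department, for illustration only.
departments = ["Sales", "HR", "R&D"]
headcount = [45, 12, 63]

# Create a figure with one set of axes, then draw a bar chart on it.
fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(departments, headcount)
ax.set_title("Headcount by department")
ax.set_xlabel("Department")
ax.set_ylabel("Employees")

# Write the chart to disk; in a notebook you could call plt.show() instead.
fig.savefig("headcount_by_department.png")
```

The figure and axes objects created here are the same building blocks that the higher-level packages return, so anything learned at this level carries over.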

Series Outline

0: basic operations & summary statistics

1: matplotlib

2: pandas visualization

3: seaborn

4: plotly

5: series summary

1: matplotlib

 


5 Reasons Not to Use Excel for People Analytics

Chief Financial Officers are now demanding their teams stop using Excel. While your C-level executives may not be demanding this of you, there are very good reasons to consider alternatives. If Finance is ready to abandon Excel, HR should certainly make the jump. Seriously, have you ever seen what a Financial Analyst builds in Excel? It’s like the car accident that people just can’t stop staring at.

Here are my top 5 reasons to use something other than Excel for your data analysis work.

 

1 Excel doesn’t do Big Data

Excel tops out at 1,048,576 rows. I believe that the majority of HR departments do not have Big Data… yet. To HR generally, ~1 million rows may feel like huge data, but it does not meet today’s definition of Big Data. In fact, that’s nowhere close.

Excel supports 16,384 columns in a worksheet. Personally, I’ve never seen any data nearly as wide as 16,000+ columns – and I never, ever want to.

I’m willing to wager a large sum that your HR data is not going to come in a wide format, but rather a long one. When your data is in a long format, even the HR data of a mid-sized organization can surpass the ~1 million row limit.

Headcount is a simple example. Let’s consider a few reasonable scenarios and see when we max out Excel.

  • Assume you have 40,000 active employees and keep one row per employee per year. After about 26 years of history, you’ll have hit your limit.
  • Assume you have 10,000 employees, but you want to look at this on a monthly basis. You’ll only get about 8 years’ worth of data in Excel.
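The arithmetic behind those two scenarios can be sketched in a few lines of Python. This assumes one row per employee per snapshot and ignores the header row; the function name `years_until_full` is my own, purely for illustration.

```python
# Excel's worksheet row limit (2**20 rows).
EXCEL_MAX_ROWS = 1_048_576


def years_until_full(employees: int, snapshots_per_year: int) -> float:
    """Years of history that fit in one worksheet, assuming one row
    per employee per snapshot (header row ignored)."""
    return EXCEL_MAX_ROWS / (employees * snapshots_per_year)


# Scenario 1: 40,000 employees, one snapshot per year.
annual = years_until_full(40_000, 1)
# Scenario 2: 10,000 employees, one snapshot per month.
monthly = years_until_full(10_000, 12)

print(f"Annual snapshots, 40,000 employees: {annual:.1f} years")
print(f"Monthly snapshots, 10,000 employees: {monthly:.1f} years")
```

This works out to roughly 26 years and 8.7 years, close to the estimates in the list above. Growth and turnover would only bring the limit closer.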

Yes, you could of course pre-process some of the information. You could have your HCM aggregate and deliver the data. This is certainly reasonable, and even advisable in certain situations. But when you want to slice your information multiple ways – by gender, department, job level – each of those is a separate request for data. Most data analysis and visualization tools work best with granular data, where you control the various aggregations yourself. I’ve never found a case where I didn’t benefit from having more granular information. Oh, except for when using Excel…
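To illustrate the point about controlling your own aggregations, here is a minimal pandas sketch. The rows are made up, and the column names are assumptions, not taken from any real HCM export.

```python
import pandas as pd

# Made-up granular headcount rows: one row per employee.
# Column names are illustrative, not taken from any real HCM export.
headcount = pd.DataFrame(
    {
        "employee_id": [1, 2, 3, 4, 5, 6],
        "gender": ["F", "M", "F", "F", "M", "M"],
        "department": ["Sales", "Sales", "HR", "R&D", "R&D", "R&D"],
        "job_level": [1, 2, 2, 3, 1, 2],
    }
)

# Each slice is a one-line aggregation of the same granular table,
# rather than a separate data request.
by_gender = headcount.groupby("gender").size()
by_department = headcount.groupby("department").size()
by_dept_and_level = headcount.groupby(["department", "job_level"]).size()

print(by_gender)
print(by_department)
print(by_dept_and_level)
```

With the granular rows in hand, a new slice is one more `groupby` call instead of another request to whoever owns the system.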

2 I don’t like Excel graphing.

Honestly, I hate Excel graphs. This is my least favorite part of using the software. I feel like a data visualization failure when I try to make a decent graph. I can perform advanced table calculations in Tableau, build interactive Python and R visualizations, and write complex database queries; yet I can’t manage a decent bar graph in Excel. That’s only a slight exaggeration.

Granted, I’ve never put in the time to really master Excel graphing. But I’ve no motivation to. It’s complex, limited, and I’ve already found many better options. Why torture myself further? I’ve seen the light, and it’s glorious outside of Excel.

3 endless calculating


'Calculating 4 processors...'

Oh. my. gosh.

The amount of time I’ve suffered through Excel crunching data. Literally crunching data; leaving my work laptop sounding like it’s grinding something internally. And all I did was add a formula and apply it to the colu… *computer promptly stops responding*.

That’s all it takes to lose your Tuesday afternoon to a seemingly endless cycle of calculations. There are websites and blogs dedicated to speeding it up. I say it’s faster to not use Excel at all.

4 repeatability

It’s nearly time for the big presentation… just one final tweak … and, No!, No, no, no; nooooooooo! Yes, Excel has crashed again. You’re left scrambling to recover your workbook.

Sheets get deleted. Formulas are altered. New data is added.

Furthermore, for those among us who love to build reports and dashboards in Excel – just watch when their manager asks for the most minor of cosmetic layout alterations. Their face says it all: “You just added 8 hours of unmerging, moving, and resizing 4,000 cells because of your request.”

5 accuracy

  • $6 billion. That’s the amount of money JP Morgan Chase lost in 2012, in large part due to Excel errors.
  • 88%. That’s the share of spreadsheets found to contain human errors. Nearly 9 out of 10.

Those numbers likely speak for themselves. Excel has a ‘paste as values’ feature. I use it when I want to avoid the dreaded ‘Calculating…’ The downside – there’s absolutely zero evidence of the work. You could record macros, but good luck making quick changes to a macro. If you can do that, I imagine that you’re already writing code elsewhere as well.

 

Alternatives

There are countless alternatives. Your choice depends on what you aim to accomplish, what you may already know, and what you can afford.

Open-Source Languages:

Other options:

Each of these has its pros and cons. Open-source languages have endless possibilities, but you’ve got to learn to code. Tools such as Tableau and QlikView can cost thousands per license.

Results matter most

I’ll be honest: you can’t, and probably shouldn’t, avoid Excel entirely. There’s a right tool for every job. There are jobs that Excel is great, maybe even perfect, for.

I hope you’ll check out some of these, keeping an open and curious mind. Check out some of my Tutorials; I hope to convince you through examples more than my words.

There’s also this: the best tool is the one you use.

photo credits
stop: Photo by Bethany Legg on Unsplash

midnight clock: Photo by Loic Djim on Unsplash

dog: Photo by Matthew Henry on Unsplash

Beginner’s Guide: Python for Analytics | The Basics

 

Beginner’s Guide to Using Python with HR Data | Exploration Series

Part Zero – The Basics

In this first tutorial series, I’m exploring the IBM HR Attrition and Performance data set. This is a great data set for demonstrating the possibilities of machine learning and other data science techniques.

I’ll be back with tutorial posts that walk through how to apply more advanced techniques to generate predictive and prescriptive insights from the data. But that’d be jumping ahead. First, the basics. Exploratory Data Analysis, or EDA.

It’s often tempting to jump right in and try to find the most advanced insight possible. When I’m in the process of learning something new, it’s my first instinct to begin applying it straight away, skipping the basics. Eventually, I’ll stumble; and it’s always something I could have avoided by simply spending a little bit of time really understanding the data I have.

To properly analyze data, you must understand it. Is it complete (any missing values)? Are there errors (values outside normal bounds)? And, generally, what information is contained within the data? Depending on where the request is coming from in a work context, you may not control the data, so what you get is what you have. It’s often much easier when you’ve pulled your own data, but that’s not always possible, or even smart to do.
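Those three questions map onto a handful of pandas calls. The sketch below uses a tiny made-up table; the column names and the plausible age range of 16 to 100 are assumptions for illustration only.

```python
import pandas as pd

# Toy stand-in for an HR extract; values and column names are made up.
df = pd.DataFrame(
    {
        "Age": [34, 29, None, 41, 230],  # one missing, one implausible
        "MonthlyIncome": [5200, 4100, 6100, None, 4800],
        "Department": ["Sales", "HR", "Sales", "R&D", "HR"],
    }
)

# What is in the data? Shape, column types, and summary statistics.
print(df.shape)
print(df.dtypes)
print(df.describe())

# Is it complete? Count missing values per column.
missing = df.isna().sum()
print(missing)

# Are there errors? Flag ages outside an assumed plausible range of 16 to 100.
out_of_bounds = df[(df["Age"] < 16) | (df["Age"] > 100)]
print(out_of_bounds)
```

Missing ages are not flagged by the range check, because comparisons against a missing value are false; that is why the completeness check is a separate step.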

Always begin with an exploration of your data. In this tutorial, I’m digging out my current favorite tool – Python. If you’ve never programmed, if Excel still frightens you a bit, or you’re firmly in the R camp – read on; this series will show the possibilities while exploring 5 different packages and interpreting and understanding data.

Series Outline

0: basic operations & summary statistics

1: matplotlib

2: pandas visualization

3: seaborn

4: plotly

5: series summary

0: basic operations & generating summary statistics

 
