I first heard this term at the Tableau Conference a few years ago. In one of its first HR tracks, I attended every presentation even remotely close to the intersection of HR and Tableau. Every presenter opened with a disclaimer that we weren’t looking at real data; I would have cringed had I not heard it. One presenter, though, introduced the term ‘fata’: fake data. I’ve used it ever since.
In the pursuit of sharing ideas in People Analytics, one hurdle is the extremely sensitive nature of the actual data we work with. Names, emails, gender, age, social security numbers, and much more are often part of an employee data file and useful in analysis. However, sharing this information is improper and could cost you your job. The information must be protected from anyone who shouldn’t have access, especially outside your organization, but also within it.
This tutorial will show a few ways to create this data, useful for development, sharing internally, and presenting your work to the world, all without losing your job.
Using one of these options you’ll have the ability to create a complete data set, quickly, and without exposing any real user or employee data.
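As a minimal sketch of the idea, here’s one way to generate a small ‘fata’ employee file using only Python’s standard library. This isn’t meant as the definitive approach: every name, department, job level, and value range below is a hypothetical placeholder, not drawn from any real HRIS, and the fixed seed simply makes the fake file reproducible so you can regenerate exactly the same data later. Note that it deliberately produces no social security numbers, and the emails use the reserved example.com domain.

```python
import csv
import io
import random
from datetime import date, timedelta

# Hypothetical value pools: every name, department, and job level here is invented.
FIRST_NAMES = ["Alex", "Jordan", "Sam", "Taylor", "Morgan", "Casey", "Riley", "Jamie"]
LAST_NAMES = ["Smith", "Lee", "Garcia", "Patel", "Nguyen", "Brown", "Khan", "Silva"]
DEPARTMENTS = ["Finance", "HR", "Marketing", "Operations", "Sales", "IT"]
JOB_LEVELS = ["Individual Contributor", "Manager", "Director", "Executive"]
GENDERS = ["Female", "Male", "Non-binary"]


def make_fake_employees(n, seed=42):
    """Return n fake employee records as dicts; the same seed always gives the same data."""
    rng = random.Random(seed)
    as_of = date(2024, 1, 1)  # fixed reference date so output doesn't depend on today's date
    employees = []
    for i in range(1, n + 1):
        first, last = rng.choice(FIRST_NAMES), rng.choice(LAST_NAMES)
        age = rng.randint(20, 65)
        # Tenure is capped so nobody appears to have been hired before age 18.
        tenure_days = rng.randint(0, (age - 18) * 365)
        employees.append({
            "employee_id": f"E{i:05d}",
            "name": f"{first} {last}",
            "email": f"{first.lower()}.{last.lower()}{i}@example.com",
            "gender": rng.choice(GENDERS),
            "age": age,
            "department": rng.choice(DEPARTMENTS),
            "job_level": rng.choice(JOB_LEVELS),
            "hire_date": (as_of - timedelta(days=tenure_days)).isoformat(),
        })
    return employees


def to_csv(rows):
    """Serialize a non-empty list of records to CSV text, ready to save or share."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_csv(make_fake_employees(5)))
```

Raise the count passed to `make_fake_employees` and write the CSV text to a file, and you have a full-size development data set that exposes no real employee.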
Now that the dust on GDPR has settled, the tension between finding insights and maintaining data privacy keeps growing. Martin Fowler writes about Datensparsamkeit, which he loosely defines as “data frugality”. Anyone who has already dealt with privacy laws in Germany can relate. With GDPR in force and growing concern over the ethical obligations and limits of data usage, this article introduces a few sensible compromises for striking the right balance.
There are many approaches to maintaining the privacy of associate information and still achieving analytical goals:
Consider aggregating data as soon as you pull it from your HRIS. Doing so greatly reduces the risk of exposing individuals. A well-defined objective sets the right level of detail at the outset.
Allow anonymous survey responses to remain anonymous: strip away any identifiable information immediately. Doing so also relieves the pressure when a Director, even one in HR, asks to see the responses from their direct reports.
Hat tip to Martin Fowler for his idea:
Datensparsamkeit suggests that you shouldn’t store the IP address directly, perhaps instead you should hash it and only store the hash.
Consider applying hashes to personally identifiable information so that it can still be used in analysis, but in a safely pseudonymized form.
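Here’s a minimal sketch of the last two ideas in Python, under a few assumptions of my own. Fowler’s example hashes IP addresses; HR identifiers such as employee IDs or SSNs come from a small, guessable space, so a plain unsalted hash could be reversed by hashing every candidate value. The sketch therefore swaps in a keyed hash (HMAC with SHA-256) whose secret key is held outside the data set. The key, the field names, and the `protect` helper are all hypothetical. Keep in mind that hashed IDs are pseudonymized rather than fully anonymous; for data that should stay truly anonymous, like survey responses, drop the ID as well.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration only: in practice, load it from a secrets
# store and never ship it alongside the hashed data.
SECRET_KEY = b"replace-with-a-long-random-secret"

# Fields assumed to identify a person directly; these are dropped outright.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 hash (HMAC) of an identifier as a 64-char hex string."""
    # Assumes identifiers are case-insensitive, so " E00017 " and "e00017" match.
    normalized = value.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def protect(row: dict) -> dict:
    """Drop direct identifiers and replace employee_id with its keyed hash."""
    kept = {k: v for k, v in row.items() if k not in DIRECT_IDENTIFIERS}
    kept["employee_id"] = pseudonymize(row["employee_id"])
    return kept


records = [
    {"employee_id": "E00017", "name": "Alex Smith", "email": "alex@example.com",
     "department": "Sales", "engagement_score": 4},
    {"employee_id": "E00042", "name": "Sam Lee", "email": "sam@example.com",
     "department": "HR", "engagement_score": 5},
]

# The same person always maps to the same token, so joins and counts still work.
safe_records = [protect(row) for row in records]
```

Because the mapping is deterministic, you can still join the protected file to other extracts hashed with the same key, which is what keeps it useful for analysis.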
As I’ve spent more and more time with HR data, I’ve grown more comfortable with less. I’ve worked with many business teams, notably marketing, that thrive on ‘more is more’, but with people data, less is often appropriate. The first goal is to respect the data of actual people, a respect that is becoming increasingly rare; the second is to remain legally compliant. You can do both.
Some Chief Financial Officers are now demanding that their teams stop using Excel. While your C-level executives may not be demanding this of you, there are very good reasons to consider alternatives. If Finance is ready to abandon Excel, HR should certainly make the jump. Seriously, have you ever seen what a Financial Analyst builds in Excel? If not, well, just be glad.
1 Excel doesn’t do Big Data
Excel tops out at 1,048,576 rows per worksheet. I believe the majority of HR departments do not have Big Data… yet. To HR, ~1 million rows may feel like huge data, but it does not meet today’s definition of Big Data. In fact, it’s nowhere close.
Excel supports 16,384 columns in a worksheet. Personally, I’ve never seen data anywhere near 16,000+ columns wide, and I never, ever want to.
I’m willing to wager a large sum that your HR data will come not in a wide format, but in a long one. In a long format, even the HR data of a mid-sized organization can quickly surpass the ~1 million row limit.
Headcount is a simple example. Let’s consider a few reasonable scenarios and see when we max out Excel.
Assume you have 40,000 active employees and keep one row per employee per year. With 25 years of history, you’re at 1,000,000 rows, essentially at your limit.
Assume you have 10,000 employees, but you want to look at headcount monthly. That’s 120,000 rows a year, so you’ll only fit 8 years’ worth of data in Excel.
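The arithmetic behind both scenarios is simple enough to sketch in a few lines of Python, assuming one row per employee per snapshot and a single header row for column names. The function name is just for illustration.

```python
EXCEL_MAX_ROWS = 1_048_576  # rows per worksheet in current versions of Excel
HEADER_ROWS = 1             # one row reserved for column names


def years_that_fit(employees: int, snapshots_per_year: int) -> int:
    """Full years of long-format headcount history that fit in a single worksheet."""
    rows_per_year = employees * snapshots_per_year
    return (EXCEL_MAX_ROWS - HEADER_ROWS) // rows_per_year


print(years_that_fit(40_000, 1))   # annual snapshots -> 26
print(years_that_fit(10_000, 12))  # monthly snapshots -> 8
```

The annual case allows 26 full years, so 25 years of history leaves almost no headroom; the monthly case matches the 8 years above.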
Yes, you could of course pre-process some of the information. You could have your HCM aggregate and deliver the data. This is certainly reasonable, and even advisable in certain situations. But when you want to slice your information multiple ways (by gender, department, job level), each of those is a separate request for data. Most data analysis and visualization tools work best with granular data, where you control the aggregations yourself. I’ve never found a case where I didn’t benefit from more granular information. Oh, except when using Excel…
2 I don’t like Excel graphing
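To illustrate why granular data is so flexible, here’s a small Python sketch using only the standard library. The records and the `headcount_by` helper are hypothetical; the point is that a single granular extract answers each slicing question locally, where a pre-aggregated feed would need a separate request per slice.

```python
from collections import Counter

# Hypothetical granular headcount records: one row per employee per monthly snapshot.
rows = [
    {"month": "2024-01", "gender": "Female", "department": "Sales", "job_level": "Manager"},
    {"month": "2024-01", "gender": "Male", "department": "Sales", "job_level": "Individual Contributor"},
    {"month": "2024-01", "gender": "Female", "department": "HR", "job_level": "Director"},
    {"month": "2024-02", "gender": "Female", "department": "Sales", "job_level": "Manager"},
    {"month": "2024-02", "gender": "Male", "department": "HR", "job_level": "Individual Contributor"},
]


def headcount_by(records, *fields):
    """Count rows for every combination of the given fields, much like a pivot table."""
    return Counter(tuple(r[f] for f in fields) for r in records)


# Each slice comes from the same extract; no new data request is needed.
print(headcount_by(rows, "month", "gender"))
print(headcount_by(rows, "month", "department"))
print(headcount_by(rows, "month", "job_level"))
```

Adding a new cut, say department by job level, is one more call rather than one more ticket to the HCM team.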
Honestly, I hate Excel graphs. This is my least favorite part of using Excel. I feel like a data visualization failure when I use Excel. I can perform advanced table calculations in Tableau, build interactive Python and R visualizations, and write complex database queries; yet I can’t manage a decent bar graph in Excel. That’s only a slight exaggeration.
Granted, I’ve never put in the time to really master Excel graphing. But I’ve no motivation to. It’s complex, limited, and I’ve already found many better options. Why torture myself further? I’ve seen the light, and it’s glorious outside of Excel.
3 Endless calculating
'Calculating 4 processors...'
Oh. my. gosh.
The amount of time I’ve suffered through Excel crunching data. Literally crunching data; leaving my work laptop sounding like it’s grinding something internally. And all I did was add a formula and apply it to the colu… *computer promptly stops responding*.
That’s all it takes to lose your Tuesday afternoon to a seemingly endless cycle of calculations. There are websites and blogs dedicated to speeding up Excel. I say it’s faster to not use Excel at all.
Every Excel user has been here. It always happens at the perfect moment, too: just before a big presentation, one final tweak … and, NO! No, no, no; nooooooooo! Yes, Excel has crashed again, and you’re left scrambling to recover your workbook.
Sheets get deleted. Formulas get altered. And all of this happens before your data even changes. It’s worst for those among us who love to build reports and dashboards in Excel: just watch their faces when a manager asks for the most minor of cosmetic layout changes. Their expression says it all: “You just added 8 hours of unmerging, moving, and resizing 4,000 cells with that request.”
$6 billion. That’s roughly how much JPMorgan Chase lost in 2012 in the “London Whale” trades, a loss to which errors in an Excel-based risk model contributed.
Those numbers likely speak for themselves. Excel has a great ‘feature’: paste as values. I use it when I want to avoid the dreaded ‘Calculating…’. The downside is that there’s absolutely zero record of the work. You could record macros, but good luck making quick changes to one. If you can, I’d guess you’re already writing code elsewhere as well.
There’s a lengthy list of alternatives, and I plan to cover them in depth for use in People Analytics.
Whatever data you have is the data you can work with. If you’re at the beginning or in the early stages of your Analytics Journey, you have enough data, even for some of the more advanced analytics. Getting your hands on it, cleaning it, and forming actionable insights from it may be a different story.
Big Data is a term used ubiquitously whenever analytics comes up. But HR data isn’t Big Data.
Big data is data sets that are so voluminous and complex that traditional data processing application software are inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating and information privacy. There are five dimensions to big data known as Volume, Variety, Velocity and the recently added Veracity and Value. – Wikipedia
This does not describe HR data. Most organizations considering analytics have hundreds or thousands of employees. Combine your historical HCM data with a few other sources and you may approach a million records, a volume even Excel alone can nearly handle.
Take advantage of the current state
So HR doesn’t have big data, but that doesn’t mean analytics isn’t worth pursuing. On the contrary, it makes HR ripe for analytics using the data sets available today. You can begin your analytics journey without worrying about capital projects and IT teams building out Hadoop clusters for big data processing, clusters that would also have to account for the unique security requirements governing HR data. None of that is required: you can begin with simple queries of HCM data using tools your team already uses, and you’re on the path to insights.
To HR, analytics does seem like big data. HR is typically used to individual-level transactions: career planning, performance assessments, candidate interviews, compensation reviews. Working with organization-wide data, even subsets of it, can feel like big data to HR professionals. It’s certainly a step up from traditional HR practice, and it provides a big return for the effort expended.
It won’t stay this way for long
HR data not being classified as Big Data is the state of things today. With the proliferation of personal activity trackers, organizational network analysis (ONA), and other emerging data collection, HR data will qualify as big data in the not-too-distant future. All the more reason to begin your analytics journey now, so you’re ready when the shift comes.
An HR-geared site should have some objectives, right? This year, we have lofty goals to produce and share with the world of people analytics.
First up will be a push to build out the initial round of machine learning posts. A step-by-step approach to building your first few machine learning projects with HR data. This will be done first in Python.
Which brings us to our next goal: R. Many of you want to see these projects in R, so we’ll begin to port some of the code and examples over to R. This will be a fun learning experience for us as well.
Third, more visualization examples. People analytics can mean giving users access to the information and letting them find answers and meaning. Today there are so many ways to give users direct access to explore data.
Finally, data and data science education. Building on the earlier point about getting information into the hands of interested parties, there should be a push, especially in HR, to understand data and acquire the data literacy and skills to better serve organizations and modern business. Why ‘especially in HR’? HRBPs are great at people-related skills, but HR is the last business unit to embrace data, and data skills are not strong in HR… yet.