3 Comments

One of my favorite validity checks is for numeric data disguised as character data, particularly currencies such as $1,234, or even just a numeric matrix that has been mated with a character object and had to undergo type conversion.
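
A rough sketch of that kind of check, in plain Python. The helper names (`looks_numeric`, `to_number`, `disguised_share`) and the regular expression are illustrative assumptions, not anything from the post, and the pattern only covers a few currency symbols and US-style thousands separators:

```python
import re

# Assumed pattern: optional sign, optional currency symbol, then either
# comma-grouped digits (1,234) or plain digits, with an optional decimal part.
_NUMERIC_LIKE = re.compile(
    r"^\s*[-+]?[$€£]?\s*(\d{1,3}(,\d{3})+|\d+)(\.\d+)?\s*$"
)


def looks_numeric(value):
    """Return True if value is a string that appears to hold a number."""
    return isinstance(value, str) and bool(_NUMERIC_LIKE.match(value))


def to_number(value):
    """Strip currency symbols, commas, and whitespace, then convert to float."""
    return float(re.sub(r"[$€£,\s]", "", value))


def disguised_share(column):
    """Fraction of the non-empty strings in a column that look numeric."""
    strings = [v for v in column if isinstance(v, str) and v.strip()]
    if not strings:
        return 0.0
    return sum(looks_numeric(v) for v in strings) / len(strings)
```

A high `disguised_share` on a column that is supposed to be text is a hint that numbers were coerced to strings somewhere upstream, and `to_number` can then be applied to the values that pass `looks_numeric`.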


Hmmm. This is useful. And thanks for taking the plunge into Substack (and writing a guest post for Randy, which is how I found you, even though I haven't read it yet - I just had to click on a link called "Data and Tacos"!).

I'm having a problem with a dataset now and I'm not sure where it fits into your framework. I am counting climate tech companies by SIC codes and one company can be classified under multiple codes, so there's a lot of double counting (and triple, quadruple, etc.). Promiscuous categorization!

I don't have company names, just counts, so I really don't know what I'm looking at except in the general sense of "There's a lot of activity in Sector X."

Far be it from me to criticize our data overlords at the Bureau of Labor Statistics, but what kind of bad data is this, do you think?
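
To make the double counting concrete, here is a toy sketch in Python. The company names and the sector codes "X", "Y", and "Z" are made up for illustration and are not real SIC data. It assumes each company is counted at most once per code.

```python
# Hypothetical company-to-code assignments (invented for illustration).
assignments = {
    "Company A": {"X", "Y"},
    "Company B": {"X"},
    "Company C": {"X", "Y", "Z"},
}


def counts_by_code(assignments):
    """Count companies under each code, the way a counts-only table would."""
    counts = {}
    for codes in assignments.values():
        for code in codes:
            counts[code] = counts.get(code, 0) + 1
    return counts


per_code = counts_by_code(assignments)

# Summing the per-code counts counts each company once per code it carries.
total_of_counts = sum(per_code.values())

# The true number of companies needs company-level data to compute.
distinct_companies = len(assignments)

# With counts alone, the true number can only be bounded: it is at least
# the largest single-code count and at most the sum of all counts.
lower_bound = max(per_code.values())
upper_bound = total_of_counts
```

In this toy case the per-code counts sum to 6 although there are only 3 companies. Without company identifiers, the counts alone narrow the answer no further than a range, here 3 to 6.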
