For the Love of Data
https://www.fortheloveofdata.com/feed/podcast/

We love data and how it intersects with news, products, technologies, and companies. Listen to our podcast and join the discussion to stay informed on the latest and greatest in the world of BI and analytics.

For the Love of Data is a monthly podcast devoted to all things data, from industry news and new products to cool data visualizations. Host Robert Furr and others hold discussions, interviews, reviews, and arguments to determine where the information technology industry is heading, with an emphasis on Business Intelligence (BI), Information Management (IM), and data analytics. Data science, analytics, strategy, and governance are just a few of the topics on the table. SQL, NoSQL, Tableau, R, Oracle, MySQL, SQL Server... these are just a few of the many tools we will noodle on during each episode.

E035 – Your Data and an Announcement (Mon, 03 Dec 2018 – 19:28)
https://www.fortheloveofdata.com/e35/

Announcement:
  • Celebrating FTLOD’s 3 year anniversary this month
  • Covered a diverse range of topics, from BBQ and chocolate to algorithms and graph databases
  • Future episodes will be more ad hoc, published whenever I come across an interesting topic
  • Please stay subscribed
  • Please reach out on Twitter or LinkedIn to let me know what your favorite episode has been

 

The Importance of Your Data

  • Quotes:
    • “With great power comes great responsibility.” – Amazing Fantasy #15
  • Data Commercialization
    • Ford’s CEO recently suggested that the data collected by the company’s financial services arm also represents a valuable, low-overhead asset. [1]
    • Not just driving data, but also data gathered during the purchase process, such as marital status, income, etc.
    • However, in desperation to maintain profits, what would some companies do?
    • Know how your data is being used.
    • Tim Cook recently criticized Google, Facebook, and others (not by name) for creating a “data industrial complex” in which our personal information “is being weaponized against us with military efficiency.” [2]
    • Talked about the echo chamber that social networks and algorithms can create
    • However, this is not all data doomsday
      • Data is helping us achieve better, deeper, faster insights than ever before
      • We are bettering our health, optimizing economies, and identifying connections that we never could have before
      • All this reward comes with some risks that we need to manage and be aware of
  • Data Breaches
    • Marriott disclosed a 500MM-record breach. Not the biggest ever, but hackers had access since 2014.
    • Names, phone numbers, email addresses, passport numbers, dates of birth, and arrival and departure information were exposed. For millions of others, credit card numbers and card expiration dates were potentially compromised. [3]
  • What to do to protect yourself if your data is part of a breach: [4]
    • Sign up for services like SpyCloud (it is free)
    • Change your password – and ideally switch to unique passphrases
    • Monitor your accounts for suspicious activity
    • Open a separate credit card for online transactions
    • Limit the information you share
    • Avoid saving credit card information on websites
    • Be vigilant

Music:

Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

Sources:

  1. https://threatpost.com/ford-eyes-use-of-customers-personal-data-to-boost-profits/139209/
  2. https://www.nytimes.com/2018/10/26/technology/apple-time-cook-europe.html
  3. https://www.cnn.com/2018/11/30/tech/marriott-hotels-hacked/index.html
  4. https://www.cnn.com/2018/11/30/tech/marriott-breach-what-to-do/index.html
  5. https://answers.kroll.com/

 

E034 – Using Data to Make Perfect Chocolate – Part 2 (Wed, 31 Oct 2018 – 31:11)
https://www.fortheloveofdata.com/e34/

In the second part of this two-part episode, we do a data deep dive into a decadent vat of chocolate. We talk about various stats and data with Brian Mikiten, former process engineer and founder of Casa Chocolates in San Antonio, TX. We also cover the types of chocolate and how much of chocolate making is an art vs. a science. See part one for the history of chocolate and an overview of how to make it.

  • Types
    • White
    • Dark
    • Milk
    • Ruby – created in 2017 from Ruby cocoa beans by Barry Callebaut in Switzerland

      Photo via bakemag.com
  • Chocolate data
    • World Chocolate Day is July 7th. US National Chocolate Day is October 28.
    • Infographic: The World's Biggest Chocolate Consumers | Statista
    • The United States accounts for 20% of the world’s chocolate consumption.
    • On the average Valentine’s Day, nearly $400 million of chocolate is purchased around the world, accounting for 5% of the industry’s total sales.
    • 22% of all chocolate is consumed between 8 p.m. and midnight.
    • Chocolate significantly reduces theta activity in the brain, which is associated with relaxation, which may be why we want to eat chocolate when we’re feeling stressed out.
    • Myth: Chocolate is high in caffeine (contains ~6mg/bar, same as decaf coffee)
    • More than 70% of Americans prefer milk chocolate
    • In 2011, Thorntons created the world’s largest chocolate bar, which weighed in at 12,770 lbs. It measured 13 ft. by 13 ft. by 1 ft.
    • Top companies by sales (via https://www.icco.org)
    • $ / ton by date
    • Top 10 World Cocoa Producers
  • Science / data driven production of chocolate
    • Equipment used
    • Variables evaluated / controlled
    • What is your test process?
  • Science vs. Art of chocolate making
  • Bean profiles
  • Brian’s background and history of Casa Chocolate
  • What Casa Chocolate’s approach is to making chocolate
  • Tips for getting started at home
  • Where people can find out more about Brian and Casa Chocolate

Music:

Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

Sources

E033 – Using Data to Make Perfect Chocolate – Part 1 (Wed, 24 Oct 2018 – 1:03:01)
https://www.fortheloveofdata.com/e33/

In the first part of this two-part episode, we do a data deep dive into a decadent vat of chocolate. We talk about history and how to make chocolate. In part two, we will talk about various stats and data with Brian Mikiten, former process engineer and founder of Casa Chocolates in San Antonio, TX.

  • History of chocolate
    • Evidence dates back as early as 1500 BC
    • Fermented beverages date back to 350BC
    • Believed to have originated with Mesoamericans
    • Made its way to Europe, where sugar was added in the 16th century
    • In 1828, Dutch chemist Coenraad Johannes van Houten used alkaline salts to process into “Dutch cocoa”
    • 1847 – J.S. Fry and Sons created the first chocolate bar
    • 1876 – Swiss chocolatier Daniel Peter added milk powder to create milk chocolate
    • 2/3 of cocoa today is produced in Western Africa
    • Fair trade certification indicates that the chocolate was produced without child or slave labor
  • Overview of chocolate making
    • Harvesting – pods contain ~40 cacao beans

      Photo via TripAdvisor.com
    • Roasting


    • Cracking
    • Winnowing
    • Grinding
    • Conching
    • Tempering
    • Molding

Music:

Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

Sources

E032 – 2018 State of DevOps Report (Sun, 30 Sep 2018 – 45:52)
https://www.fortheloveofdata.com/e32/
  • Intro
  • Greg’s Background
  • Intro to DevOps
  • Tools you’ve used
  • Intro to the report & this year vs. previous years
    • Feels more general and high-level than some of the previous reports (no MTTR mentioned for instance)
  • Who took the survey?
    • Surveyed over 30,000 people in 7 years (~4,300 / yr)
    • Technology is overrepresented – 38% of total respondents
    • Energy & Resources was only 2%
    • Tech + FS = 50%
    • Infosec = only 3% of people
    • 29% were dedicated DevOps (14% IT, 15% Dev/Eng)
  • Keywords

    • Word frequencies in the report: “data” = 43, “security” = 65, “agile” = 7, “DevOps” = 328
    • Top 10 words in the word cloud (after removing the “Puppet | State of DevOps” footer on each page):
      • DevOps (245)
      • teams (210)
      • Stage (149)
      • practices (123)
      • can (111)
      • team (99)
      • organizations (79)
      • services (75)
      • business (73)
      • success (69)
    • No DataOps, no SecOps or DevSecOps
  • C-suite seems out of touch with conditions on the ground
    • Differences in perception – p. 30
    • C-suite responses sometimes overstate the team’s view by a factor of 2x
  • Stages in the report:

    • First
      • Stage 0: Build the foundation
    • Second
      • Stage 1: Normalize the technology stack
      • Stage 2: Standardize and reduce variability
      • Stage 3: Expand DevOps practices
    • Third
      • Stage 4: Automate infrastructure delivery
      • Stage 5: Provide self-service capabilities
  • CAMS = Culture, Automation, Measurement, Sharing
  • Principal Industries:
    • Top: Tech, Financial Services, Manufacturing/Industry
    • Bottom: Non-Profit, Energy/Resources, Media
    • Trend: Most to least competition?
  • Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

E031 – Data Collaboration with Cursor (Thu, 30 Aug 2018 – 40:28)
https://www.fortheloveofdata.com/e31/

Learn about Cursor, a new platform for collaboration around data, hosted platforms and BI artifacts. I sat down with Adam Weinstein, CEO and Co-Founder of Cursor, to learn about the platform.

    About Cursor

    Cursor offers a data search and analytics hub that makes disparate data accessible and actionable, enabling technical and business users alike to effortlessly get answers, collaborate and gain insights. Founded by a trio of data leaders from Salesforce, LinkedIn, and Pandora, Cursor’s easy-to-deploy software has been adopted by teams at Apple, Atlassian, Deloitte, Incedo, LinkedIn, NovumRx, and Slack. Cursor is based in San Francisco, CA.

    Cursor Press

    Topics:

    1. What is Adam’s background?
    2. How BI has evolved over the past 10-20 years.
    3. What are some of the most pressing challenges for organizations today?
    4. What should people be doing today, outside of a specific tool, to get better at collaborating?
    5. How can Cursor help with those challenges?
    6. How is content secured on the platform? (separating data from metadata)
    7. Where can people find out more about Cursor?
    8. What’s next for Cursor as far as features or a roadmap?
    9. What are some tools that Adam can’t live without in his daily work?

    Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

E030 – July 2018 News Roundup (Tue, 31 Jul 2018 – 15:37)
https://www.fortheloveofdata.com/e30/

    This month’s episode is a roundup of news from a variety of sources covering three main topics:

    1. BI / Dataviz Tools
    2. Databases and Platforms
    3. Tools and Frameworks

    Note: Most of the text extracts below are direct quotations from news sources cited in the source list at the bottom of these show notes. This episode is a compilation from those sources.

    BI / Dataviz Tools

    PowerBI enhancements (7/12/18)

    • Microsoft has updated its Power BI analytics service in an effort to expand data prep capabilities and unify data analytics across platforms.
    • “Using the Power Query experience familiar to millions of Power BI Desktop and Excel users, business analysts can ingest, transform, integrate and enrich big data directly in the Power BI web service – including data from a large and growing set of supported on-premises and cloud-based data sources, such as Dynamics 365, Salesforce, Azure SQL Data Warehouse, Excel and SharePoint,” the post reads.
    • Power BI now supports data in Azure Data Lake Storage, and integrates with SQL Server Analysis Services and SQL Server Reporting Services.
    • Microsoft today announced the general availability of Visio Visual for Power BI. Based on the feedback collected from the customers during the preview period, Microsoft has made the following changes to the Visio Visual:
      • Support for Power BI Mobile app
      • The ability to change the diagram link embedded earlier and to copy an embedded link to the clipboard
      • Configurable auto-zoom settings that can be turned on and off
      • Support for complex diagrams using layers
      • Overall performance improvements

    Tableau acquires Empirical Systems

    • Tableau last month announced the acquisition of Empirical Systems, an artificial intelligence (AI) startup with an automated discovery and analysis engine designed to spot influencers, key drivers, and exceptions in data.

    Looker Enhances Data Science Capability with Integration for Google Cloud BigQuery ML

    • With Looker and BQML, data teams can now save time and eliminate unnecessary processes by creating machine learning (ML) models directly in Google BigQuery via Looker – without the need to transfer data into additional ML tools. BQML predictive functionality will also be integrated into new or existing Looker Blocks allowing users to surface predictive measures in dashboards and applications.

    DBs and Platforms

    MemSQL Unveils Significant Update to Database for Real-time Modern Applications and Analytical Systems (Version 6.5 released)

    • Queries are now up to four times faster than the previous MemSQL version (which was already 10x faster than legacy database providers), enabling insights in milliseconds across billions of rows.
    • New automated workload optimization capabilities provide a consistent database response under ultra-high concurrency without the need for manual tuning or specialized DBA resources.
    • Additions to the MemSQL industry-leading “transform-as-you-ingest” capabilities allow customers to use stored procedures for in-database transformations to easily build real-time data pipelines.
    • Resource optimization improvements for multi-tenant deployments deliver greater control and scalability for varied database sizes whether on-premises or in the cloud.

    Hortonworks Data Platform 3.0

    • Even a Hadoop stalwart such as Hortonworks Inc. sees the writing on the wall, which is why, in its recent 3.0 release, it emphasized heterogeneous object storage. The new Hortonworks Data Platform 3.0 supports data storage in all of the major public-cloud object stores, including Amazon S3, Azure Storage Blob, Azure Data Lake, Google Cloud Storage and AWS Elastic MapReduce File System.
    • HDP’s latest storage enhancements include a consistency layer, NameNode enhancements to support scale-out persistence of billions of files with lower storage overhead, and storage-efficiency enhancements such as support for erasure coding across heterogeneous volumes. HDP workloads access non-HDFS cloud storage environments via the Hadoop Compatible File System API.
    • My thoughts: Are Hadoop and HDFS Dying???
    • As we are heading into the fourth industrial revolution, HDP 3.0 is a giant leap for the Big Data ecosystem, with major changes across the stack and expanded eco-system (Deep Learning and 3rd Party Dockerized Apps). HDP 3.0 can be deployed both on-premise and in the major cloud platforms – AWS, Microsoft Azure, and Google Cloud. Many of the HDP 3.0 new features are based on Apache Hadoop 3.1 and include containerization, GPU support, Erasure Coding and Namenode Federation. In order to provide a Trusted Data Lake, we are installing Apache Ranger and Apache Atlas by default with HDP 3.0. In order to streamline the stack, we have removed components such as Apache Falcon, Apache Mahout, Apache Flume, and Apache Hue, and absorbed Apache Slider functionalities into Apache YARN.

    Tools and Frameworks

    Python 3.7.0 is now available

    • Data classes that reduce boilerplate when working with data in classes.
    • A potentially backward-incompatible change involving the handling of exceptions in generators.
    • A “development mode” for the interpreter.
    • Nanosecond-resolution time objects.
    • UTF-8 mode that uses UTF-8 encoding by default in the environment.
    • A new built-in for triggering the debugger.
    • Easier access to debuggers through a new breakpoint() built-in
    • Simple class creation using data classes
    • Customized access to module attributes
    • Improved support for type hinting
    • Higher precision timing functions
    • More importantly, Python 3.7 is fast.
      • Each new release of Python comes with a set of optimizations. In Python 3.7, there are some significant speed-ups, including:
        • There is less overhead in calling many methods in the standard library.
        • Method calls are up to 20% faster in general.
        • The startup time of Python itself is reduced by 10-30%.
        • Importing typing is 7 times faster.
    • You can easily get an idea of how much time the imports in your script take using -X importtime; a rough sketch of what that looks like follows:
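    A minimal sketch of the idea (mine, not from the show notes; the json module below is just a placeholder):

      # The -X importtime flag (new in Python 3.7) prints a per-module
      # import-time breakdown to stderr, e.g.:
      #
      #     python3 -X importtime -c "import json"
      #
      # A rough in-script equivalent for timing a single import with the stdlib:
      import time

      start = time.perf_counter()
      import json  # placeholder: swap in the module you care about

      print(f"importing json took {time.perf_counter() - start:.4f} s")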

    Apache OpenNLP 1.9.0 released

    • The Apache OpenNLP team is pleased to announce the release of Apache OpenNLP 1.9.0.
    • The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
    • It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
    • Apache OpenNLP 1.9.0 binary and source distributions are available for download from our download page
    • The OpenNLP library is distributed via Maven Central as well. See the Maven Dependency page for more details
    • What’s new in Apache OpenNLP 1.9.0
      • This release introduces new features, improvements and bug fixes. Java 1.8 and Maven 3.3.9 are required.
      • Additionally the release contains the following changes:
        • Brat Document Parser should support name type filters
        • Brat format support fails on multi fragment annotations
        • Remove MD5 hashes from Release process
        • Use String[] instead of StringList in LanguageModel API
        • BRAT Annotator service Fails to start
        • Token model creation fails without at least one <SPLIT> tag
        • Update Penn Treebank URL
        • Explain the new format of feature generator XML config
        • Unify code to sum up input context features
        • FeatureGeneratorUtil can recognize Japanese Hiragana and Katakana letters

    TensorFlow 1.9.0

     

    PYPL Language Rankings: Python ranks #1, R at #7 in popularity

    Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

E29 – Is Data The New Oil? (Fri, 29 Jun 2018 – 59:55)
https://www.fortheloveofdata.com/e29/

Is Data the New Oil?
    • Concept originated by Clive Humby, the British mathematician who established Tesco’s Clubcard loyalty program. Humby highlighted the fact that, although inherently valuable, data needs processing, just as oil needs refining before its true value can be unlocked.
    • Why it is the new oil
      • Valuable commodity
      • Different uses among many applications
      • Currently the big buzz of most large companies (Google, Facebook, Apple, etc.)
      • Quantity is generally better in both
      • AI is the darling of so many industries right now, and it is entirely dependent on data
      • There are ethical concerns with how we source and use data, just like there were and are geopolitical and ethical concerns with how we source and use oil
      • Certain things cannot function (currently) without oil (passenger airplanes, boats)
        • Same with data: Oil & Gas, Netflix, Agriculture, Manufacturing, Healthcare, a general enabler
    • Why it isn’t the new oil
      • Oil is finite, but data is not
        • Rob’s Counterpoint: There is a shelf life on data that makes it less usable over time
      • Data does not have a standard price benchmark like oil
      • Not a physical asset; can be duplicated or shared relatively easily
      • Oil requires huge amounts of resources to recover and transport
        • Rob’s Counterpoint: building a successful “app” with the scale to generate meaningful data does have some costs, albeit not the scale of oil
      • Data is more useful the more that it is used, whereas oil loses energy the more it is used/processed
        • Rob’s Counterpoint: Oil is not useful by itself to most people; it’s really the product oil becomes or enables that is useful

    The Data of Oil

    • Difference between operating on the surface vs. subsea when a small tubing error occurs:
      • Surface: 2-3 hours of downtime; a few thousand dollars to fix
      • Subsea: 3 months of downtime and $40-50mm to fix, not including lost revenue due to deferred production (e.g., a 15,000 bpd well * $67/barrel * 90 days = $90.45mm)
    • A good-sized offshore platform generates revenue greater than the GDP of the entire country of Belize ($2.3bn vs. $1.8bn)
    • Of all the oil we can find, we generally only recover 10-20% in a field with current technology
    • 45-50% of oil generated in the US is used for transportation
    • US consumption is about 2½ gallons of crude oil per person per day
    • The U.S. has 4% of the world’s population but uses 25% of the world’s oil
    • Total oil consumption around the world is about 84,249,000 barrels per day
    • Top 3 countries by proven oil reserves are: Venezuela, Saudi Arabia, Canada; US is #10
    • Gas is 12,200 Wh/kg vs. Li-Ion at 265 Wh/kg (~46x more energy dense)
    • MTTF (Mean time to failure) – 500 years on some parts – needed to operate in subsea environments for 30 years
    • Force of 20,000 PSI of subsea pressure on a 10.5-inch dinner plate (a quick arithmetic check appears below):
      • Force = area * pressure = Pi * R^2 * 20,000 PSI = ¼ * Pi * D^2 * 20,000 PSI
      • 0.25 * Pi * 10.5 * 10.5 * 20,000 ≈ 1,731,803
      • ≈ 1,731,803 pounds on a single dinner plate (roughly the weight of nine 737 jets)
    • Length Records
      • Analogy: Standing on top of the Empire State Building in NYC and trying to put a straw in a coke can sitting on the sidewalk below
      • Deepest well (scientific study) = Kola Superdeep Borehole = 40,230 ft.
      • Chayvo Well (Sakhalin-I project) – the current world record holder for the longest well: a depth of 44,291 ft. with a horizontal reach of 39,478 ft.
      • Deepwater Horizon – drilled the deepest oil well in history, to 35,050 ft. vertical depth

      Chart: Well depth by year
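    A quick check of a few of the back-of-the-envelope numbers above, using only the figures quoted in the episode (my arithmetic, not from the show):

      import math

      # Deferred production for a subsea failure: 15,000 bpd * $67/barrel * 90 days.
      deferred_usd = 15_000 * 67 * 90
      print(f"deferred revenue ≈ ${deferred_usd / 1e6:.2f}mm")  # ≈ $90.45mm

      # 20,000 PSI of subsea pressure acting on a 10.5-inch dinner plate.
      diameter_in = 10.5
      area_sq_in = 0.25 * math.pi * diameter_in ** 2
      force_lbs = area_sq_in * 20_000
      print(f"force on the plate ≈ {force_lbs:,.0f} lbs")  # ≈ 1,731,803 lbs

      # Energy density ratio: gasoline at 12,200 Wh/kg vs. Li-ion at 265 Wh/kg.
      print(f"energy density ratio ≈ {12_200 / 265:.0f}x")  # ≈ 46x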

    Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

    Is Data the New Oil?

    The Data of Oil Sources:

    Other Data is the New Oil Sources:

    Other Oil Sources:

E028 – Bimodal BI and Data Virtualization (Sun, 27 May 2018 – 48:27)
https://www.fortheloveofdata.com/e28/

Today we’re back with another guest from the Netherlands. I’m not sure what it is about the Dutch, but they’ve been on a roll with some helpful thought leadership when it comes to data. My guest is Rick van der Lans, a highly-respected analyst, consultant, author, and international lecturer specializing in data warehousing, business intelligence, big data, and database technology.

    I came across one of Rick’s whitepapers a few months ago on data virtualization. We got in touch and sat down to talk more in depth about the topic. Rick has a lot of data street cred. For many years, he has served as the chairman of the annual European Enterprise Data and Business Intelligence Conference in London and the annual Data Warehousing and Business Intelligence Summit in The Netherlands. He has written tons of articles, blogs, and several books, including the first book on SQL. There will be links to some of the places and things Rick has written and other info in the show notes below.

    Topics:

    • Rick’s background: author, blogger, consultant – worked on data virtualization (DV) for last 6-7 years
    • How did Rick get interested in DV?
    • Classical data warehouses vs. logical data warehouses
    • What is bi-modal BI? (term introduced by Gartner in 2014)
      • Agile/Self-Service vs. longer, more cautious approach
    • Bi-modal BI vs. the Data Quadrant
    • Comparison of major Data Virtualization Vendors
      • Denodo
      • Tibco DV Manager (bought from Cisco recently)
      • Red Hat
      • Data Virtuality Ultrarep
      • Others (AtScale, Cero, StoneBond, IBM – new entry acquired from Rocket Software)
      • Some are more mature, some are newer (Denodo vs. Tibco = green apples vs. red apples)
    • Companies rolling their own DV (in-memory / views vs. a dedicated tool)
    • DV products are not DB views on steroids
    • Lineage / impact analysis and other features
    • Caching vs. materialization – cached data can be stored in a virtual table in an intermediary data store. This can help performance or prevent interference with a transactional source (e.g., keeping results consistent for an entire week).
    • How DV can help organizations that are struggling
    • How DV may not be a silver bullet
    • How are different industries embracing these principles?
    • What patterns do you see in companies embracing these principles?
    • Which companies should not use this? DV is not great at this time for unstructured audio/video or auto-tagging of images
    • Why a classical DWH experienced person may fail at DV
    • What are the warning signs that a DV implementation is going off the rails?
      • Fuzzy logic needed to combine disparate sources
      • Not an integration cure-all
      • How you deploy these with projects
    • How to get started? (pick a single, sexy report as a starting point)
    • Where do you go next?  (how to unify other data delivery systems, data marketplaces, API gateways)
    • How to avoid misconceptions about DV (it is slow, only about integration, etc.)
    • How to contact Rick
    • The first book on SQL

    Places to find Rick’s work:

    He has published blogs for the following websites:

    He has written the following books:

    Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

E027 – The Data Quadrant (Sat, 28 Apr 2018 – 55:26)
https://www.fortheloveofdata.com/e27/

My guest in today’s episode is Ronald Damhof (@ronalddamhof), the creator of the Data Quadrant. This quadrant is a sense-making framework in the complex world of data that enables a common frame of reference between managers, domain experts and engineers. The model is used by many organisations to formulate data strategy and justify investments in the data domain. It is used as the strategic underpinning for a data architecture, it guides the ‘rules of the game’ and it separates the fundamental concerns in data. Furthermore, it explains how an organisation can toggle between the need to innovate with data and the need to deploy and use data at scale: repeatedly, safely, lawfully, with constant quality and robustness.

     

     

    Data Quadrant

    Topics:

    • Ronald’s background as a “data fundamentalist”
    • His concept of a full scale data architect
    • The push / pull point, from 1950s Toyota, applied to data
    • Development styles from systemic to opportunistic
    • Data Vault’s influence on the quadrant
    • Where data modelling (Q1) and data lakes (Q3) fit into the quadrants
    • Where should you start? Q1/Q2 or Q3/Q4
    • 90% of organizations in the Netherlands are using Data Vault

    General recommendations on tools by quadrant:

    • Q1
      • Automation – Wherescape or custom
      • Federalization – mainly still RDBMS
    • Q2
      • API’ing the data
      • Losing faith in datasets and data marts
    • Q3
      • Fast infra
      • Doesn’t believe Hadoop is a good fit for most orgs.
      • Likes fast analytical DBs like Vertica or MonetDB?
    • Q4
      • Open source
      • R, Python, Git, Dataiku
      • Abstraction layer away from code is helpful
      • Azure Platform

    Ronald Damhof’s background:

    • Primary degree in Economics
    • Certified Data Vault Grand Master
    • Data Architect at the Dutch Central Bank in the Netherlands

    Music

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

E026 – The Four Types of Automation (Fri, 30 Mar 2018 – 27:30)
https://www.fortheloveofdata.com/e26/


     

    Introductory Product Models:

    • Only partner implementations (BluePrism)
    • Limited Features (WorkFusion)
    • Customer Revenue Limited (UI Path)
    • Single License (Softomotive)

    Music

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

    1. https://irpaai.com/definition-and-benefits/
    2. https://www.edgeverve.com/wp-content/uploads/2017/02/forrester-wave-robotic-process-automation.pdf
    3. https://www.gartner.com/doc/reprints?id=1-3U26FK2&ct=170222&st=sb
    4. http://www.uipath.com/hubfs/News_photos/Forrester_Wave_RPA_Report.png?t=1522186102828
    5. http://images.abbyy.com/India/market_guide_for_robotic_pro_319864%20(002).pdf
    6. https://www.uipath.com/community
    7. https://www.workfusion.com/rpaexpress
    8. https://idm.net.au/article/0011800-which-rpa-software-should-i-use
E025 – The Hype of AI (Wed, 28 Feb 2018 – 1:04:47)
https://www.fortheloveofdata.com/e25/

Thank you my friend and fellow Capco cohort, Daragh Fitzpatrick, for joining me on this episode of FTLOD where we cut through the hype of AI to understand some of the key challenges and opportunities facing consumers and businesses alike when working with or alongside AI.

    Given that we’re talking about AI, I also have a twist for today’s interview–transcription! Today’s episode is transcribed here using machine learning from webASR, a free service provided through the University of Sheffield’s Machine Intelligence for Natural Interfaces (MINI).

    Note: The transcription is wonderful as a starting point and for a free service, but it does diverge from the actual conversation fairly significantly at times. Please listen to the episode as you read along.

    Topics:

    • Definition of AI and the singularity–should we be concerned?
    • What’s going on in the AI space?
    • Typical use cases in industry
    • RPA vs. AI and different use cases
    • Recommendation systems
    • Challenges in profiling users or customers
    • Ethical challenges and consequences of bad AI or black box AI
    • AI is like fire: it can be highly useful, but it can also be a weapon and burn you.
    • Perceptions of AI that are overhyped
    • Not every product or service needs AI to be good
    • At what point does intelligence begin?
    • The fourth industrial revolution and its impact on society
    • How to responsibly introduce life-altering AI
    • Will AI supplement our lives and give us a better quality of life, or will it make us do more, faster, stronger?
    • Advancements in how AI plays the game Dota

    Some of the items we discuss are available in the following places:

    1. https://blog.1871.com/the-1871-fintech-forum-a-discussion-around-the-reality-of-todays-automation-practices
    2. https://blog.1871.com/1871-fintech-forum-future-of-data-and-analytics
    3. https://samharris.org/podcasts/116-ai-racing-toward-brink/
    4. http://fortune.com/2018/02/20/nasdaq-delist-long-blockchain-bitcoin-iced-tea/
    5. https://en.wikipedia.org/wiki/Fourth_Industrial_Revolution

    Music

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

E024 – Will machine learning kill traditional database indexes? (Wed, 31 Jan 2018 – 23:45)
https://www.fortheloveofdata.com/e24/

In this episode my friend Vikas Popuri and I chat about Google’s paper comparing ML models to traditional DB indexes.

    Background:

    • Google used learned indexes (machine learning models) to access data and compared these to B-Tree, Hash, and Bloom filter indices (a rough sketch of the idea follows this list)
    • Trained a model using multiple stages where the earlier stages could approximate a location and later stages would work with a subset to improve accuracy. Each stage could choose a different model to advance the search further.
    • FYI, the diagram below looks like a decision tree, but it is not. Each stage/model could have different distributions and could repeat the model used above or below.

    • They achieved access time and space savings across the board, even without using GPUs or TPUs (Tensor Processing Units)
    • “Retraining the model” – the tests were performed on a static data set, so no retraining or index maintenance was required.
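    A rough single-stage sketch of the idea (my illustration, not the paper’s recursive multi-stage implementation): fit a model that maps a key to its approximate position in a sorted array, then search only within the model’s worst-case error bound.

      import bisect

      import numpy as np

      rng = np.random.default_rng(0)
      keys = np.sort(rng.integers(0, 10_000_000, size=100_000))
      positions = np.arange(len(keys))

      # "Train" the index: least-squares fit of position ≈ slope * key + intercept.
      slope, intercept = np.polyfit(keys, positions, deg=1)
      max_err = int(np.ceil(np.max(np.abs(slope * keys + intercept - positions))))

      def lookup(key: int):
          """Return an index i with keys[i] == key, or None if the key is absent."""
          guess = int(slope * key + intercept)
          lo = max(0, guess - max_err)
          hi = min(len(keys), guess + max_err + 1)
          # Binary search only inside the model's error window.
          i = lo + bisect.bisect_left(keys[lo:hi].tolist(), key)
          return i if i < len(keys) and keys[i] == key else None

      print(lookup(int(keys[1234])))  # position of an existing key

    The search window is the model’s maximum training error, which is what keeps lookups exact even though the model itself is only approximate.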

    Observations / Questions:

    • Used Tensorflow with Python as the front end — apparently a lot of initial overhead with this as a test stack.
    • B-Tree indexes to some extent are a model, especially if they don’t store every key and instead store the first key in a page.
    • The paper made some rudimentary assumptions, such as using a random hash function.
      • What if the data is not static? How long would it take to retrain the model vs. maintain an index?
      • What if data profiling caused you to index certain attributes and not others?
      • What are the best practices with this newer approach?
    • The power of being able to use different models at different stages is intriguing. You could also potentially maintain traditional indexes as a backup/failsafe, which would bound worst-case performance at that of a B-Tree.
    • Load times – The folks from Google commented that they could retrain a simple model on a 200M data set in “just [a] few seconds if implemented in C++”
    • Recursive question: do you need an optimizer to optimize the optimization path?
    • Room for improvement:
      • GPUs/TPUs
      • Incorporating common queries into the model to know what questions people are asking

    Music

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

E023 – 2017 Data Digest (Sat, 30 Dec 2017 – 23:57)
https://www.fortheloveofdata.com/e23/

This episode reflects on some of the hottest topics from 2017 and the impact their data has on our lives this year and into 2018.

    Cryptocurrency

    Many of these data points come from the sources listed at the bottom of these show notes.

    • Since the year began, the aggregate market cap of all cryptocurrencies combined has increased by more than 3,200% as of Dec. 18 (a quick check of what that implies follows the table below)
    • Bitcoin went through the roof, hitting an all-time high of 1 BTC = $19,891 on 12/17/2017.

    • BTC makes up 54% of the aggregate $589 billion market cap of all cryptocurrencies
    • The graphics-card hardware needs of miners have been a big reason why NVIDIA and Advanced Micro Devices have seen double-digit percentage surges in sales recently
    • Back on Dec. 10, CBOE Global Markets (NASDAQ:CBOE) became the first to introduce bitcoin futures trading, with CME Group (NASDAQ:CME) following a week later
    • 612 new cryptocurrencies began trading in 2017
    • Top 10 cryptocurrencies in 2017 as of 12/29 according to BitInfoCharts.com (pretty similar list on AtoZForex.com):
    Cryptocurrency     | Price (USD) | 12h / 7d change (USD)             | Price (BTC)  | 12h / 7d change (BTC) | First trade | Exchange volume 24h
    BTC (Bitcoin)      | $15,030.33  | +9.79% ($1,340) / +9.56% ($1,312) | 1 BTC        | +0% / +0%             | 2010-07-17  | 100,316.59 BTC; 1,250,728,823.58 USD
    XRP (Ripple)       | $1.40       | +11.79% ($0.15) / +28.92% ($0.31) | 0.000093 BTC | +1.82% / +17.67%      | 2014-08-14  | 462,239,606 XRP; 36,699.78 BTC; 551,610,001.98 USD
    ETH (Ethereum)     | $750.82     | +9.3% ($63.9) / +12.07% ($80.9)   | 0.05 BTC     | -0.45% / +2.29%       | 2014-09-30  | 784,632 ETH; 34,510.75 BTC; 518,708,138.27 USD
    BCH (Bitcoin Cash) | $2,571.87   | +8.04% ($191) / +6.17% ($149)     | 0.171 BTC    | -1.59% / -3.1%        | 2017-08-01  | 209,597 BCH; 33,824.21 BTC; 508,389,215 USD
    LTC (Litecoin)     | $255.88     | +11.54% ($26.5) / +0.89% ($2.26)  | 0.017 BTC    | +1.59% / -7.91%       | 2012-07-13  | 1,156,615 LTC; 18,070.26 BTC; 271,602,057.82 USD
    IOT (IOTA)         | $3.87       | +11.04% ($0.38) / +2.12% ($0.08)  | 0.00026 BTC  | +1.14% / -6.8%        | 2017-08-30  | 37,838,946 IOT; 9,288.14 BTC; 139,603,822.48 USD
    XMR (Monero)       | $366.67     | +7.1% ($24.3) / +9.42% ($31.6)    | 0.024 BTC    | -2.46% / -0.13%       | 2014-06-04  | 275,568 XMR; 6,354.54 BTC; 95,510,916.21 USD
    DASH (Dash)        | $1,126.09   | +10.81% ($110) / +2.5% ($27.5)    | 0.075 BTC    | +0.93% / -6.45%       | 2014-02-20  | 85,797 DASH; 6,073.26 BTC; 91,283,097.97 USD
    XVG (VERGE)        | $0.168      | +41.89% ($0.05) / +60.02% ($0.06) | 0.000011 BTC | +29.23% / +46.05%     | 2016-02-18  | 606,321,139 XVG; 5,940.19 BTC; 89,283,020.8 USD
    ICX (ICON)         | $5.73       | +4.78% ($0.26) / +183.99% ($3.72) | 0.00038 BTC  | -4.56% / +159.21%     | 2017-11-11  | 13,061,177 ICX; 5,072.57 BTC; 76,242,347.41 USD
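    Two quick sanity checks on the aggregate figures quoted above (my arithmetic; the market-cap snapshot is from Dec. 18 while the table prices are from Dec. 29, so they will not reconcile exactly):

      # Aggregate crypto market cap and BTC dominance, per the bullets above.
      total_market_cap_usd = 589e9
      btc_share = 0.54
      print(f"implied BTC market cap ≈ ${btc_share * total_market_cap_usd / 1e9:.0f}bn")  # ≈ $318bn

      # "Increased by more than 3,200%" means the total grew to roughly 33x its
      # starting value, which implies a starting market cap of about:
      growth_multiple = 1 + 32.0
      print(f"implied Jan 1 market cap ≈ ${total_market_cap_usd / growth_multiple / 1e9:.1f}bn")  # ≈ $17.8bn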

    Data Breaches

    1. Equifax – 9/7/2017 – 143mm US consumers affected
      1. Equifax’s market value plunged nearly $4bn in the aftermath
      2. https://www.equifaxsecurity2017.com/
    2. RNC Voter List – nearly every registered voter, ~200mm Americans
    3. Yahoo’s 2013 breach revelation – affected accounts went from 1bn to 3bn
    4. Uber – 57mm user accounts and drivers, paid to keep it under wraps
    5. 560mm Passwords – a massive list of 560mm credentials compiled into one database of breaches from at least 10 services

    You can check if your account is part of a compromise at have i been pwned or SpyCloud.

     

    World Affairs

    The World Bank has a fascinating article with 12 charts covering food assistance, climate change, education, nutrition, elections, energy and a tribute to Hans Rosling, who made us see the world in new ways with breathtaking visualizations.

    Other Data Tidbits

    • Most popular Instagram Post: Beyonce – https://www.instagram.com/p/BP-rXUGBPJa/
    • Most retweeted Twitter post: Carter’s quest for Wendy’s Chicken Nuggets – https://twitter.com/carterjwm/status/849813577770778624/photo/1
    • Oracle bought API management firm Apiary. Be on the lookout for how that evolves for the tool and for Oracle
    • RPA saw continued growth and implementations. Expect more in 2018.
    • Kubernetes is becoming the de facto standard for container management and was moved to “Adopt” on the ThoughtWorks Technology Radar. Expect it to continue to gain steam and start influencing data solutions more in 2018.

    Music:

    Auld Lang Syne by Fresh Nelly, from Free Music Archive.

    Sources:

    1. https://www.coindesk.com/price/
    2. https://www.investing.com/currencies/btc-usd-historical-data
    3. https://bitinfocharts.com/new-cryptocurrencies-2017.html
    4. https://atozforex.com/news/top-10-cryptocurrency-2017/
    5. https://www.fool.com/investing/2017/12/19/16-cryptocurrency-facts-you-should-know.aspx
    6. http://cryptocurrencyfacts.com/
    7. https://gizmodo.com/the-great-data-breach-disasters-of-2017-1821582178
    8. https://www.equifaxsecurity2017.com/
    9. http://clark.com/personal-finance-credit/equifax-data-breach-a-look-back-at-our-biggest-story-of-2017/
    10. http://beta.latimes.com/business/hiltzik/la-fi-hiltzik-equifax-breach-20170908-story.html
    11. https://haveibeenpwned.com/
    12. https://www.instagram.com/p/BP-rXUGBPJa/
    13. https://www.usnews.com/news/national-news/articles/2017-12-12/twitters-top-10-most-retweeted-tweets-of-2017
    14. http://www.worldbank.org/en/news/feature/2017/12/15/year-in-review-2017-in-12-charts
    15. https://www.youtube.com/watch?v=YpKbO6O3O3M
    16. https://www.informationweek.com/strategic-cio/digital-business/2017-year-in-review—exponential-automation/a/d-id/1330648?
    17. https://www.thoughtworks.com/radar/platforms/kubernetes

     

E022 – Tech Spec – Tableau Project Maestro Data Prep (Thu, 30 Nov 2017 – 22:59)
https://www.fortheloveofdata.com/e22/

Zip file of all the sample data, Maestro flows, and Tableau workbook I used to get a first impression: E022_maestro_demo_files.

    Screenshots

    Sample Flow from Tableau

    Field Selection

    Data Profiling

    Filters

    Join Clause

    Refresh / Run Flow

    File Output Options

    Pros:

    1. Has the clean, intuitive feel of Tableau. I did my hands-on test with no training or previous exposure
2. Lots of features for a first release – joins, unions, type conversion, calculated fields, data connectors, etc. (a rough pandas sketch of this kind of flow appears after the cons list below)
    3. Easy to click into any part of your flow and see data
    4. Ability to edit inline – much like tweaking an Excel pivot table
    5. Data profiling is a nice visual cue to begin working with data
    6. Ability to sort, filter, rename, add calculated fields anywhere along the way
    7. Great for quick and dirty data prep that you know is heading into Tableau for ad-hoc analysis

    Cons:

    1. Ability to sort, filter, rename, add calculated fields anywhere along the way – this can get messy for others to come behind you to maintain or see what is happening
    2. Reconciliation issues between reports will now be complicated by similar flows doing slightly different things
3. You have to remove header fields from Excel if you want Maestro to latch onto and display field names from the table. By default, it looks at the first row and gives generic names if column headings aren’t there (i.e., F1, F2, …)
    4. Can only have one flow open at any time
    5. Performance seems a tiny bit slow on my example with ~13,000 rows. Curious to see how it will perform against larger data sets, RDBMS, and big data connectors
    6. Only outputs to TDE or Hyper formats currently. No ability to save as CSV, XLSX, PDF, or write back to a data store
    7. Unable to source data from a TDE or Tableau Workbook
    8. No reuse of common transformations or logic across different flows
9. No community-generated content yet – since it is very new, you can’t Google for answers or YouTube videos. Established, mature ETL and data prep tools will continue to have a leg up on this front for a while.
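For anyone unfamiliar with this style of self-service data prep, here is a rough sketch of the same kind of flow (join, filter, calculated field) in pandas. The file and column names are hypothetical stand-ins loosely modeled on the EIA and Census sample data linked in the sources; it is meant to show the shape of the work, not Maestro’s internals.

# A rough pandas equivalent of a simple Maestro-style prep flow
# (join, filter, calculated field). File and column names here are
# hypothetical -- this only illustrates the kind of work the tool does.
import pandas as pd

plants = pd.read_excel("eia923_generation.xlsx")    # plant-level generation (needs an Excel engine like openpyxl)
states = pd.read_excel("state_population.xlsx")     # census population estimates

# Join the two sources on a shared key (Maestro's join step)
merged = plants.merge(states, on="state", how="inner")

# Filter rows (Maestro's filter step)
merged = merged[merged["net_generation_mwh"] > 0]

# Add a calculated field (Maestro's calculated field step)
merged["mwh_per_capita"] = merged["net_generation_mwh"] / merged["population"]

# Write the result out for use in Tableau (Maestro itself outputs TDE/Hyper)
merged.to_csv("prepped_output.csv", index=False)

The selling point of Maestro is that each of these steps becomes a visual node you can click into and profile, rather than a line of code.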

    Music

    Deep Sky Blue by Graphiqs Groove

    Sources:

    1. https://www.tableau.com/project-maestro
    2. https://prerelease.tableau.com/
    3. https://www.eia.gov/electricity/data/eia923/
    4. https://www2.census.gov/programs-surveys/popest/tables/2010-2016/state/totals/nst-est2016-01.xlsx
E021 – Data Deep Dive – Halloween Spending and Candy
https://www.fortheloveofdata.com/e21/ (Tue, 31 Oct 2017)

Just in time for Halloween this year, we take a look at the way people will spend their money on Candy and other goods during this spooky time.

    Spending

    People in the US are expected to spend $9.1 billion on Halloween this year, according to a study by the National Retail Federation.

    Several predictions about this year’s Halloween season include:

    • U.S. consumers are projected to drop $82.93 on average, up almost 12 percent from $74.34 last year.
• More than 171 million consumers are expected to take part in Halloween festivities.
    • Adults ages 18-34 are projected to spend on average $42.39, compared with $31.03 for all adults.

    According to the survey, consumers plan to spend:

    • $3.4 billion on costumes (purchased by 69 percent of Halloween shoppers),
    • $2.7 billion on candy (95 percent),
    • another $2.7 billion on decorations (72 percent)
    • and $410 million on greeting cards (37 percent).

    Among Halloween celebrants:

    • 71 percent plan to hand out candy,
    • 49 percent will decorate their home or yard,
    • 48 percent will wear costumes,
    • 46 percent will carve a pumpkin,
    • 35 percent will throw or attend a party,
    • 31 percent will take their children trick-or-treating,
    • 23 percent will visit a haunted house and 16 percent will dress pets in costumes.

    Top Costumes

    More than 3.7 million children plan to dress as their favorite action character or superhero, 2.9 million as Batman characters and another 2.9 million as their favorite princess while 2.2 million will dress as a cat, dog, monkey or other animal.

    Proving that Halloween isn’t just for kids, a record number of adults (48 percent) plan to dress in costume this year. More than 5.8 million adults plan to dress like a witch, 3.2 million as their favorite Batman character, 3 million as an animal (cat, dog, cow, etc.), and 2.8 million as a pirate.

    Pets won’t be left behind when it comes to dressing up for Halloween. Ten percent of pet lovers will dress their animal in a pumpkin costume, while 7 percent will dress their cat or dog as a hot dog and 4 percent as a dog, lion or pirate.

    Candy

    CandyStore.com released data from 10 years of bulk candy online sales that show favorite candies by state.

    STATE TOP CANDY POUNDS 2ND PLACE POUNDS 3RD PLACE POUNDS
    AL Candy Corn 55274 Hershey’s Mini Bars 54369 Tootsie Pops 42533
    AK Twix 4678 Blow Pops 4578 Kit Kat 3892
    AZ Snickers 904633 Hershey Kisses 817463 Hot Tamales 527843
    AR Jolly Ranchers 225990 Butterfinger 215897 Hot Tamales 89027
    CA M&M’s 1548990 Salt Water Taffy 1345782 Skittles 1034527
    CO Milky Way 5620 Twix 5478 Hershey Kisses 4087
    CT Almond Joy 2457 Milky Way 1985 M&M’s 1023
    DE Life Savers 20748 Skittles 18072 Candy Corn 10217
    FL Skittles 630938 Snickers 587385 Reese’s Cups 224637
    GA Swedish Fish 130647 Hershey Kisses 109672 Jolly Ranchers 55049
    HI Skittles 267872 Hershey Kisses 264728 Milky Way 139874
    ID Candy Corn 85903 Starburst 60826 Reese’s Cups 39847
    IL Sour Patch Kids 155782 Kit Kat 151786 Reese’s Cups 95627
    IN Hot Tamales 95092 Starburst 78920 Snickers 34589
    IA Reese’s Cups 58974 M&M’s 53982 Butterfinger 25782
    KS Reese’s Cups 231476 M&M’s 230082 Dubble Bubble Gum 159092
    KY Tootsie Pops 67829 3 Musketeers 60273 Reese’s Cups 30865
    LA Lemonheads 102833 Reese’s Cups 89738 Jolly Ranchers 45092
    ME Sour Patch Kids 58290 M&M’s 45938 Starburst 16782
    MD Milky Way 38782 Reese’s Cups 30748 Blow Pops 12093
    MA Sour Patch Kids 75638 Butterfinger 73892 Salt Water Taffy 45982
    MI Candy Corn 146782 Skittles 135982 Starburst 87740
    MN Tootsie Pops 195783 Skittles 194672 Almond Joy 98726
    MS 3 Musketeers 109783 Snickers 103993 Butterfinger 57829
    MO Milky Way 42739 Dubble Bubble Gum 34751 Butterfinger 24780
    MT Dubble Bubble Gum 24675 M&M’s 14673 Twix 13784
    NE Sour Patch Kids 106728 Salt Water Taffy 78624 M&M’s 23674
    NV Hershey Kisses 322884 Candy Corn 203746 Skittles 167837
    NH Snickers 63876 Starburst 62468 Salt Water Taffy 25987
    NJ Skittles 159324 Tootsie Pops 157893 M&M’s 110673
    NM Candy Corn 83562 Milky Way 65682 Jolly Ranchers 45721
    NY Sour Patch Kids 200008 Candy Corn 101292 Reese’s Cups 56776
    NC M&Ms 96110 Reese’s Cups 95763 Candy Corn 62308
    ND Hot Tamales 65782 Jolly Ranchers 61829 Candy Corn 51827
    OH Blow Pops 150324 M&M’s 146782 Starburst 105752
    OK Snickers 20938 Dubble Bubble Gum 10283 Butterfinger 8892
    OR Reese’s Cups 90826 M&M’s 67626 Tootsie Pops 42774
    PA M&M’s 290762 Skittles 281847 Hershey’s Mini Bars 150372
    RI Candy Corn 17862 M&M’s 13894 Twix 9003
    SC Candy Corn 114783 Skittles 98782 Hot Tamales 41892
    SD Starburst 24783 Jolly Ranchers 22983 Candy Corn 7827
    TN Tootsie Pops 59837 Salt Water Taffy 34859 Skittles 20938
    TX Starburst 1952361 Reese’s Cups 1927663 Almond Joy 837525
    UT Jolly Ranchers 475221 Reese’s Cups 29823 Tootsie Pops 198564
    VT Milky Way 29837 M&M’s 27811 Skittles 17662
    VA Snickers 26783 Hot Tamales 26178 Candy Corn 18726
    WA Tootsie Pops 223850 Salt Water Taffy 210981 Hershey Kisses 78662
    DC M&M’s 26092 Tootsie Pops 21364 Blow Pops 14763
    WV Blow Pops 43776 Hershey’s Mini Bars 23554 Milky Way 18911
    WI Starburst 116788 Butterfinger 115982 Jolly Ranchers 42998
    WY Reese’s Cups 32889 Salt Water Taffy 26555 Skittles 20812
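For the curious, a ranking like the table above is essentially a “sum pounds by state and candy, then keep the top three per state” aggregation. Here is a minimal pandas sketch of that derivation, assuming a hypothetical orders file with state, candy, and pounds columns (this is not CandyStore.com’s actual data or schema):

# Minimal sketch: derive a top-3-candies-per-state table from raw bulk-order
# data. Column names (state, candy, pounds) are assumptions for illustration.
import pandas as pd

orders = pd.read_csv("bulk_candy_orders.csv")   # hypothetical: one row per order

top3 = (orders.groupby(["state", "candy"], as_index=False)["pounds"].sum()
              .sort_values(["state", "pounds"], ascending=[True, False])
              .groupby("state")
              .head(3))                          # keep the 3 biggest candies per state
print(top3)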

     

    FiveThirtyEight took a different approach by analyzing data from 269,000 head-to-head matchups between candies. Their findings:

    Reese’s took 4 of the top 10 spots!

They boiled the results down to a handful of candy attributes; see the FiveThirtyEight article in the sources for the full breakdown.
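FiveThirtyEight’s model was more sophisticated than this, but the core idea of turning head-to-head matchups into a ranking can be illustrated with a simple win-rate calculation. The matchup data below is made up:

# Illustrative only: rank candies by win rate across head-to-head matchups.
# FiveThirtyEight's analysis was more involved; the data here is invented.
from collections import defaultdict

matchups = [("Reese's", "Candy Corn", "Reese's"),
            ("Snickers", "Reese's", "Reese's"),
            ("Candy Corn", "Snickers", "Snickers")]   # (candy_a, candy_b, winner)

wins, games = defaultdict(int), defaultdict(int)
for a, b, winner in matchups:
    games[a] += 1
    games[b] += 1
    wins[winner] += 1

ranking = sorted(games, key=lambda c: wins[c] / games[c], reverse=True)
for candy in ranking:
    print(candy, round(wins[candy] / games[candy], 2))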

    Music

    In This Creepy, Sleepy Backward Town by Squire Tuck via Free Music Archive

    Sources

    1. https://nrf.com/media/press-releases/halloween-spending-reach-record-91-billion
    2. https://www.candystore.com/blog/facts-trivia/halloween-candy-map-popular/
    3. https://www.candyindustry.com/blogs/14-candy-industry-blog/post/87484-halloween-scary-good-for-candy-sales
    4. http://fivethirtyeight.com/features/the-ultimate-halloween-candy-power-ranking/
    5. http://freemusicarchive.org/music/Squire_Tuck/Happy_Halloween_1583/In_This_Creepy_Sleepy_Backward_Town_1_-_29102016_1146
E020 – How Crisis Text Line uses data to save lives
https://www.fortheloveofdata.com/e20/ (Wed, 27 Sep 2017)

If you’re in crisis, text 741741 if you’re in the US to talk with a counselor now. In this episode we speak with the people behind Crisis Text Line and Crisis Trends, two services that use data to make a difference for those going through a crisis or looking for someone with whom to talk.

    Overview

    Key Stats

    • Over 1 million messages transmitted per month
    • 75% of texters are under 25
    • 10% under age 13
    • 65% say they have shared something with Crisis Text Line that they haven’t shared with anyone else
    • Usually at least one active rescue per day
    • Take people based on severity and have the ability to initiate an active rescue (via 911)
• Words like ibuprofen, aspirin, tylenol are more indicative of active rescue need than the words die, overdose, suicide (a toy sketch of this kind of word statistic follows after this list)
      • 🙁 emoji is 4x more of an indicator
    • Roots of CTL go back to 1906 when Save-A-Life League started via newspaper ads
      • The Samaritans was the first phone suicide hotline and started in November 1953
    • Founded by Nancy Lublin, who is also the CEO of DoSomething.org, in 2011
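A toy illustration of the idea behind stats like the ibuprofen/aspirin one above: compare how often a word shows up in conversations that ended in an active rescue versus conversations overall (a simple “lift”). This is not Crisis Text Line’s actual model, and the sample conversations are invented:

# Toy sketch: how much more likely is a word to appear in conversations that
# led to an active rescue vs. conversations overall ("lift")? Data is made up;
# this is not Crisis Text Line's actual model.
convos = [("i took ibuprofen and aspirin", True),
          ("i want to die", False),
          ("took a lot of tylenol tonight", True),
          ("thinking about overdose", False)]          # (text, active_rescue)

def lift(word):
    has_word = [rescue for text, rescue in convos if word in text]
    if not has_word:
        return 0.0
    p_rescue_given_word = sum(has_word) / len(has_word)
    p_rescue_overall = sum(r for _, r in convos) / len(convos)
    return p_rescue_given_word / p_rescue_overall

for w in ["ibuprofen", "tylenol", "die", "overdose"]:
    print(w, round(lift(w), 2))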

    • Introductions – background, how they got their start, how they got involved in CrisisTextLine
      • Staci – volunteer
      • Scotty – Data Scientist
    • History of Crisis Text Line and high-level structure (where they operate, # of locations, # of employees / volunteers)
    • Staci’s experience
      • What was training like?
• Where does she take sessions and how often?
• How does she feel after a session?
      • Her experience as a counselor and thoughts on the impact, data, etc.
    • What ways they collect data
      • #s of texters
      • UI platform for counselors
      • Types of data they collect
      • Types of technologies used to collect/manage it – both publicly, behind the scenes, for presentations, etc.
    • What ways they use data
      • CrisisTrends.org site
      • Anonymity, opt-in/opt-out options and how frequent each occur
    • Key stats they feel are most important/surprising/alarming, etc.
      • How has data made an impact to those in need?
      • How has data made an impact to counselors?
      • How has data made an impact to the organization?
      • How has data made an impact to the crisis advocacy sector as a whole?
• What ways other people can use their data
      • Do they encourage that visitors explore to find their own insights?
      • Will data be available by zip code at some point?
    • Data Science
      • What tools and techniques do they see being most important in the near term?
      • What do they see as becoming less important in the near term?
      • What is something they could have told their earlier selves that would have made their path to this point easier?
    • Organization Info
      • How someone can get involved
      • What they need most
      • What is in store for the future? New technologies, platforms for contact, etc.
      • How someone can contact them

    Music

    Deep Sky Blue by Graphiqs Groove

    Sources

    1. https://youtu.be/KOtFDsC8JC0 – TED talk about origin
    2. https://www.crisistextline.org/
    3. https://crisistrends.org/
    4. http://www.newyorker.com/magazine/2015/02/09/r-u
E019 – Tech Spec – Cognos Analytics (11.0.6)
https://www.fortheloveofdata.com/e019/ (Sun, 20 Aug 2017)

Join me as I chat with my colleague and Cognos guru John Frazier about the latest release of Cognos, leading up to the anticipated release of the next version, 11.0.7, near the end of Q3.

    The latest version of Cognos (11.0.6) debuted on March 21, 2017. You can sign up for a perpetually free trial (like Tableau Online) here.

    Version 11 was originally released in December 2015 and was mainly a UI redesign on top of Cognos 10 features. Analysis and Query Studios will eventually be deprecated.

    New Features in 11 vs. 10

    • New UI – responsive web design on UI, but not on reports
    • Better self-service capabilities and collaboration for teams
• Upload data files – upload delimited text or Excel files to be stored in a columnar format (Parquet) on the file system (not in memory or in the DB). These are immediately usable in dashboards and don’t require entry into FM. (A short illustration of the Parquet format follows after this list.)
    • Data modules (intent based modeling based on Watson) similar to FM packages
      • Note: Dashboards only use uploaded files and data modules
    • Available on cloud
    • Mobile and desktop from a single report
    • Active reports as prompts
    • Free cloud trial
    • Admin console is unchanged
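“Stored in a columnar format (Parquet)” just means the uploaded file is persisted column-by-column, which makes it quick to scan for dashboards. Cognos handles this internally, but as a rough illustration of what a Parquet round trip looks like (assuming pandas with the pyarrow engine installed; this is not Cognos code):

# Rough illustration of Parquet (columnar) storage, the format Cognos uses for
# uploaded data files. Uses pandas + pyarrow; not Cognos's internal code.
import pandas as pd

df = pd.DataFrame({"region": ["East", "West"], "sales": [120, 95]})
df.to_parquet("uploaded_file.parquet")             # written column-by-column on disk
roundtrip = pd.read_parquet("uploaded_file.parquet")
print(roundtrip)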

    New Features in 11.0.6

    • Mapping enhancements
      • Multiple admin boundaries, add’l postal code support
    • Dashboarding enhancements
      • Direct access to OLAP packages (Framework packages accessible since 11.0.5)
      • Widgets using data from the same source are connected by default
      • New grid widget
      • Color gradient by measure
      • Date filters can include blanks
    • Portal enhancements
      • Share/embed through overflow menu
      • Folder customizations can be done directly through the UI more easily (without uploading JSON configs)
      • Create shortcuts and report views
    • Storytelling enhancements
      • New guided journey templates
      • New animations (side fade, slide, scale, zoom, pivot)
      • Better pins (smart named, better search and filter)
      • Timelines – smart names
      • Change scene template while working on your story/dashboard
    • Reporting enhancements
      • Better lineage support for FM packages
      • Business glossary (w/IBM InfoSphere Information Governance Catalog integration)
      • Better freeze list column heading control
      • Better query support when editing data modules
      • Report templates – can save for your team or save as style reference reports
    • Support for Planning Analytics
      • Dashboard support for TM1 / Planning cubes
      • REST connectivity to planning analytics
      • Support for attribute hierarchies
      • Support for localized Planning Analytics cubes
    • Data server enhancements
      • Support for Google BigQuery and Google Cloud SQL via the BigQuery JDBC and MySQL JDBC drivers, respectively.
      • JDBC URL for Data Server Connections
      • Test connection feedback (this is not just in admin console now)

    John’s Likes/Dislikes with v11:

    • For those who are “used” to ReportStudio there is a pretty “steep” learning curve to locate where particular tools or components have been moved.
    • To be fair, ReportStudio had some counter-intuitive placements for some of these same tools (e.g. Hierarchy of design elements, etc.) that caused major headaches for new report designers.
    • Overall the new interface is more “intuitive” and the novice report developers I’ve worked with have picked it up remarkably quickly.
    • There are some changes that are really “nice” – like being able to see which Lists/Graphs use a particular query right from the query tree without having to “search” for where it is used on the “right click” menu.

    Music

    Deep Sky Blue by Graphiqs Groove

    Sources

    1. https://www.ibm.com/analytics/us/en/technology/products/cognos-analytics/
    2. https://www.ibm.com/communities/analytics/cognos-analytics-blog/the-latest-release-of-cognos-analytics-is-here/
    3. http://newintelligence.ca/top-12-reasons-to-upgrade-to-cognos-analytics-a-k-a-cognos-11/
    4. https://www.ibm.com/support/knowledgecenter/SSEP7J_11.0.0/com.ibm.swg.ba.cognos.ca_new.doc/c_ca_nf_deprecated.html
    5. https://www.ibm.com/support/knowledgecenter/en/SSEP7J_11.0.0/com.ibm.swg.ba.cognos.ca_new.doc/c_ca_nf_11_0_x.html
    6. https://www.slideshare.net/senturus/cognos-analytics-version-11-questions-answered
E018 – Tech Spec – Sia, ultimate blockchain file storage
https://www.fortheloveofdata.com/e018/ (Sun, 30 Jul 2017)

What if you could store your data in the cloud, encrypted, for a fraction of the cost of Amazon S3, Google, or Azure? With Sia, a decentralized file storage solution that leverages blockchain, you can. Learn more about how it works in this episode.

    Blockchain Overview

A blockchain is a permissionless distributed database that maintains a continuously growing list of transactional data records. The system’s design means it is hardened against tampering and revision, even by operators of the nodes that store data. The initial and most widely known application of blockchain technology is the public ledger of transactions for Bitcoin, but its structure has been found to be highly effective for other financial vehicles.

• CONSENSUS BUILDING – The ability for a significant number of nodes to converge on a single consensus of the most up-to-date version of a large data set such as a ledger
• TRANSACTION VALIDITY – The ability for any node that creates a transaction to determine whether the transaction is valid, able to take place, and become final (i.e., that there were no conflicting transactions)
• AUTOMATED RESOLUTION – An automated form of resolution that ensures that conflicting transactions (such as two or more attempts to spend the same balance in different places) never become part of the confirmed data set

    Blockchain block detail

    [Illustration by Matthäus Wander (Wikimedia)]

    • Timestamp: The time when the block was found.
    • Reference to Parent (Prev_Hash): This is a hash of the previous block header which ties each block to its parent, and therefore by induction to all previous blocks. This chain of references is the eponymic concept for the blockchain.
    • Merkle Root (Tx_Root): The Merkle Root is a reduced representation of the set of transactions that is confirmed with this block. The transactions themselves are provided independently forming the body of the block. There must be at least one transaction: The Coinbase. The Coinbase is a special transaction that may create new bitcoins and collects the transactions fees. Other transactions are optional.
    • Target: The target corresponds to the difficulty of finding a new block. It is updated every 2016 blocks when the difficulty reset occurs.
    • The block’s own hash: All of the above header items (i.e. all except the transaction data) get hashed into the block hash, which for one is proof that the other parts of the header have not been changed, and then is used as a reference by the succeeding block.

Why You Can’t Cheat at Bitcoin:

1. Say everybody is working on block 91.
2. But one miner wants to alter a transaction in block 74.
3. He’d have to make his changes and redo all the computations for blocks 74-90 and then do block 91. That’s 18 blocks of expensive computing.
4. What’s worse, he’d have to do it all before everybody else in the Bitcoin network finished just the one block (number 91) that they’re working on.
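To make the chaining concrete, here is a minimal Python sketch of hash-linked blocks (simplified headers, no mining difficulty). Because each block’s hash covers its parent’s hash, editing an old transaction changes that block’s hash and breaks every block after it, which is exactly the work the would-be cheater has to redo:

# Minimal sketch of hash-linked blocks (no mining difficulty, simplified
# headers). Shows why editing an old block invalidates its descendants.
import hashlib, json, time

def block_hash(block):
    header = {k: block[k] for k in ("prev_hash", "tx_root", "timestamp")}
    return hashlib.sha256(json.dumps(header, sort_keys=True).encode()).hexdigest()

def merkle_root(txs):
    # Simplified stand-in for a real Merkle tree: hash the concatenated tx hashes.
    return hashlib.sha256("".join(hashlib.sha256(t.encode()).hexdigest() for t in txs).encode()).hexdigest()

chain = []
prev = "0" * 64
for txs in (["coinbase-1"], ["coinbase-2", "alice->bob 5"], ["coinbase-3"]):
    block = {"prev_hash": prev, "tx_root": merkle_root(txs), "timestamp": time.time(), "txs": txs}
    prev = block_hash(block)
    chain.append(block)

def valid(chain):
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))

print(valid(chain))                        # True
chain[1]["txs"][1] = "alice->mallory 5"    # tamper with an old transaction...
chain[1]["tx_root"] = merkle_root(chain[1]["txs"])   # ...which forces a new tx_root
print(valid(chain))                        # False: the next block's prev_hash no longer matches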

    Sia Overview

• Decentralized network that places encrypted pieces of your data on dozens of nodes (a toy sketch of the idea follows after this overview list)
    • Aims to be fastest, cheapest, most secure storage solution and compete with AWS, GCP, Azure
    • Users pay in Siacoins, a cryptocurrency like Bitcoin
      • Must go USD -> Bitcoin -> Siacoin -> Wallet -> File Upload
    • Open source
    • Started by David Vorick and Luke Champine through a VC backed Boston-based company called Nebulous Inc
    • Origins in the HackMIT 2013 conference
    • Uses ASICs (application specific integrated circuits) for mining
      • These are purpose built integrated circuits, not general multi-use devices
• Evolution from CPU -> GPU -> ASIC
      • Faster and less vulnerable to attacks than GPUs
      • Why? See here.
• Created a company to make ASICs called Obelisk.
      • ~$2,500 per machine
• Current price is about 124 Siacoin to $1 USD
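As a very loose illustration of “encrypted pieces of your data on dozens of nodes,” the sketch below splits data into chunks and assigns redundant copies to multiple hosts. Sia itself uses real encryption and erasure coding rather than simple replication, so treat this only as the shape of the idea:

# Toy sketch of splitting data into redundant pieces across hosts. Sia itself
# uses real encryption and erasure coding; this is only illustrative.
import hashlib

def split_and_place(data: bytes, hosts, chunk_size=64, copies=3):
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    placement = {}
    for idx, chunk in enumerate(chunks):
        digest = hashlib.sha256(chunk).hexdigest()
        # Pick `copies` hosts per chunk, rotating so no single host holds everything.
        chosen = [hosts[(idx + j) % len(hosts)] for j in range(copies)]
        placement[digest] = chosen
    return placement

hosts = [f"host-{n}" for n in range(1, 7)]
print(split_and_place(b"some file contents " * 20, hosts))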

    Pros

    • Decentralized, peer-to-peer
    • Encrypted and immutable
    • Hosts can earn money by renting free disk space to renters
      • Must maintain 95% uptime to preserve collateral

    Possible Issues

    • Renters uploading illegal content to hosts
      • However, renters would have to pay for the bandwidth leechers use to download files
    • Slow at this point
    • Low number of users

    Music

    Deep Sky Blue by Graphiqs Groove

    Sources:

E017 – Tech Spec – Tableau 10.3 New Features
https://www.fortheloveofdata.com/e017/ (Thu, 29 Jun 2017)

In this episode we cover the new features in Tableau 10.3. This version debuted on May 31st, and a 10.3.1 update was released on 6/21/17.

    1. Data Driven Alerts
      1. Only on Tableau Server
      2. Receive an alert when a mark crosses a visual threshold
      3. Can use on any viz with a continuous numeric axis
      4. Can sign up yourself and others; then each person can self-administer
      5. Default check rate is 60 minutes or when an extract is refreshed. Can customize with this command:

tabadmin set dataAlerts.checkIntervalInMinutes <value in minutes>

    tabadmin restart

2. Tableau Bridge – Limited Release
      1. Connect to live, on-premise data from Tableau Online
2. Replaces the sync client – is basically the sync client + live query functionality. The client is installed and run behind your firewall and pushes data to Tableau Online.
      3. Live connections must be enabled by administrators. Limited to RDBMSs (MySQL, SQL Server, etc.)
4. Oracle cloud hosted DBs must use Tableau Bridge
5. Must run as a service to enable live connections
6. Must embed credentials in Tableau Bridge if you want it to automatically update on a schedule
7. Will restart every hour minimum. You can set this window with this command:

tabonlinesyncclientcmd.exe SetDataSyncRestartInterval --restartInterval=<value in seconds>

8. Best Practices (https://www.tableau.com/about/blog/2017/5/introducing-tableau-bridge-live-queries-premises-data-tableau-online-70767)
      1. Split bridges into two machines: one for extract refreshes and another for live queries, unless usage is extremely low
      2. Run the bridge continuously (ideally on a VM in a data center)
      3. Tune dashboards and queries to leverage extracts for summarized data
3. Smart Table and Join Recommendations – Machine Learning will recommend tables and joins (even on non-similar names) based on previous usage metrics
4. PDF Connector
      1. Connect to PDFs, identify tables, and pull data out
      2. Less copying/pasting/massaging of data to get it ready for Tableau
      3. Somewhat limited at this time, but continuing to be developed
5. More Union support in more connectors
      1. DB2
      2. Hadoop
      3. Teradata
      4. Netezza
6. New connectors
      1. Amazon Athena
      2. MongoDB BI
      3. OneDrive
      4. ServiceNow
      5. Dropbox
      6. JSON – scan entire file, not just a sample
7. Automatic Query Caching – Tableau server can pre-cache queries in recent workbooks after an extract refresh to speed up performance on initial load.
8. Miscellaneous
      1. More options in Web Authoring (drills, formats, changing displays)
      2. Story points navigator – more streamlined
      3. Mobile – Android improvements, banner to Tableau Mobile, universal linking that allows you to click and open in Tableau Mobile
      4. Tooltip selections – highlight data from tooltip links
      5. Latest date filter
      6. Distribute evenly
      7. Maps – French, Netherlands, Australian, and New Zealand updates
      8. Apply table calc filters to totals
      9. Custom subscriptions – days/hours, etc.
      10. APIs – various REST updates (tags on sources and views, switch sites, get sites list, etc.)

    Music is Deep Sky Blue by Graphiqs Groove

    Sources

    1. https://www.tableau.com/new-features/10.3
    2. https://www.tableau.com/about/blog/2017/4/save-time-data-driven-alerts-tableau-103-67888
E016 – For the Love of Sunscreen
https://www.fortheloveofdata.com/e016/ (Wed, 31 May 2017)

In this episode, data sheds some (sun)light on what Rob did wrong on a recent trip to the Caribbean and explains the terrible sunburn he has right now. Just in time for Memorial Day and Summer, we take a look at many recent findings and how they will lead us to a healthier outdoor lifestyle.

    A LOT of this content came from the Environmental Working Group (EWG). Please visit their site for more great info and the source of much of this episode.

EWG recently released its 2017 EWG Sunscreen Guide with research and guidance on sunscreen efficacy, ingredients, and health risks. It is chock full of great information to keep you safe and dispels many misconceptions that most people hold.

    Why are sun rays harmful?

    • UV radiation penetrates the skin and produces genetic mutations that can cause cancer
    • UVA
      • Less intense than UVB, but 30-50x more prevalent
      • Dominant tanning ray
      • UVA rays penetrate deeper, suppress the immune system, cause harmful free radicals to form, and are associated with higher risk of melanoma
    • UVB
      • UVB rays are the primary cause of sunburns and non-melanoma skin cancer.
      • Most intense from 10AM-4PM April through October
      • Most reflected by snow or ice
      • The chemicals in sunscreen help combat UVB rays more than UVA

    Why the Sun (UV Exposure) is Harmful3

• The rate of new melanoma cases among American adults has tripled since the 1970s, from 7.9 per 100,000 people in 1975 to 25.2 per 100,000 in 2014 (NCI 2017)
    • Melanoma death rate for white American men, the highest risk group, has escalated sharply, from 2.6 deaths per 100,000 in 1975 to 4.4 in 2014
    • Since 2003, the rates of new melanoma cases among both men and women have been climbing by 1.7 and 1.4 percent per year, respectively, according to the federal Centers for Disease Control and Prevention (CDC 2016)
    • More than 3 million Americans develop skin cancer each year (ACS 2017)
    • Most cases involve one of two disfiguring but rarely fatal forms of skin cancer – basal and squamous cell carcinomas. Studies suggest that basal and squamous cell cancers are strongly related to UV exposure over years.
      • Several researchers have found that regular sunscreen use lowers the risk of squamous cell carcinoma (Gordon 2009, van der Pols 2006) and diminishes the incidence of actinic keratosis – sun-induced skin changes that may advance to squamous cell carcinoma (Naylor 1995, Thompson 1993)
      • Researchers have not found strong evidence that sunscreen use prevents basal cell carcinoma (Green 1999, Pandeya 2005, van der Pols 2006, Hunter 1990, Rosenstein 1999, Rubin 2005).
      • Both UVA and UVB rays can cause melanoma, as evidenced by laboratory studies on people with extreme sun exposures. In the general population, there is a strong correlation between melanoma risk and a person’s number of sunburns, particularly those during childhood (Dennis 2010).
      • The use of artificial tanning beds dramatically increases melanoma risk (Coleho 2010).
    • People who rely on sunscreens tend to burn, and sunburns are linked to cancer.
      • When people use sunscreen properly to prevent sunburn, they often extend their time in the sun. They may prevent burns, but they end up with more cumulative exposure to UVA rays, which inflict subtler damage (Autier 2009, Lautenschlager 2007).

However, the research on the link between sunscreen use and melanoma risk isn’t conclusive.

    • Scientists don’t know conclusively whether sunscreen can help prevent melanoma. There are studies on both sides that say it helps or it does not.
    • Several factors suggest that regular sun exposure may not be as harmful as intermittent and high-intensity sunlight. Paradoxically, outdoor workers report lower rates of melanoma than indoor workers (Radespiel-Troger 2009).
      • Melanoma rates are higher among people who live in northern American cities with less year-round UV intensity than among residents of sunnier cities (Planta 2011).
      • Researchers speculate that higher vitamin D levels for people with regular sun exposure may play a role in reduced melanoma risk (Godar 2011, Newton-Bishop 2011, Field 2011).
        • So DRINK MILK!
      • The consensus among researchers is that the most important step people can take to reduce their melanoma risk is to avoid sunburn but not all sun exposure (Planta 2011).

    What is SPF?

    • SPF = Sun Protection Factor
• How much longer it will take for the sun to redden skin than without it (i.e., SPF 15 = 15x longer for the sun to redden you); a quick check of the math follows below
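The percentages quoted later in the high-SPF section (SPF 50 blocks about 98% of UVB, SPF 100 about 99%) follow from the usual approximation that an SPF-N product lets roughly 1/N of the UVB through. A quick check in Python:

# Quick check of the usual SPF approximation: an SPF-N product transmits
# roughly 1/N of the UVB, so it blocks about 1 - 1/N.
for spf in (15, 30, 50, 100):
    print(f"SPF {spf}: blocks ~{(1 - 1 / spf):.1%} of UVB")
# SPF 50 -> ~98.0%, SPF 100 -> ~99.0%, matching the figures cited below.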

    • IBISWorld, a market research company, reports that sunscreen product sales grew 2.6 percent a year between 2011 and 2016, and generated $394 million annually (IBISWorld 2016)3

    Effects by Age

    • Baby skin is thinner and absorbs more water
    • Infant and toddler skin has less melanin, which protects from UV light
    • The older you are the thicker and more pigmented you get, which is more protective
    • Very few studies are done on the effects on small children
    • Adults older than 60 are also more sensitive to sunlight

    Tanning beds are BAD!

    • Emit up to 12x the UVA of the sun
    • People who use tanning beds are 1.5-2.5x more likely to get cancer.
    • The risk of melanoma goes up when you use a tanning bed at any age, but the  International Agency for Research on Cancer calculates that if you start using tanning beds before age 30, your risk of developing melanoma jumps by 75 percent3.

    Vitamin A is a bad ingredient

    Vitamin A in the form of retinyl palmitate can harm skin when combined with sunlight. Luckily its usage has been falling.

    Sprays are convenient, but not the best option

Inhaling the chemicals in the spray can be bad, most people apply too light a coat, and people miss spots. Despite this, their use is on the rise, increasing 27%.

    High SPFs are deceiving2

    • Correctly applied SPF 50 blocks 98% of UVB rays; SPF 100 blocks 99%
    • The higher the SPF, the more UVB it blocks, but the less UVA it blocks
    • The way sunscreens are measured may not reflect real world conditions
• In lab measurements, small changes in light can change an SPF 100 sunscreen’s rating to SPF 37
    • People spend more time in the sun when they wear a higher SPF
    • Higher doses of ingredients may be harmful when absorbed into the skin
    • If you don’t apply enough, or misapply, an SPF 100 sunscreen’s actual rating could be as low as SPF 3.2. T-Shirts are SPF 5.
    • Most countries cap advertisements at 50+ (Europe, Japan, Canada, etc.); Australia caps at 30

    European Sunscreens > American Sunscreens?

    Several European companies have developed chemicals that are better at blocking UVA, but these have not yet been approved by the FDA. Europe also requires that the advertised SPF (which is its UVB rating) be no more than 3x the UVA rating.

    Tips to Stay Safe in the Sun

    Know how intense the sun is

    Check a site like http://sunburnmap.com/

    Know your ingredients and pick the right SPF

Know what protects you best. Check if a sunscreen’s claims are accurate, and check how harmful the ingredients may be at http://www.ewg.org/sunscreen/

FDA-Approved Sunscreens (columns: Active Ingredient / UV Filter Name, Range Covered, Side Effects)
UVA1: 340-400 nm
UVA2: 320-340 nm
UVB: 290-320 nm
    Chemical Absorbers:
    Aminobenzoic acid (PABA) UVB
    Avobenzone UVA1 Relatively high skin allergen
    Cinoxate UVB
    Dioxybenzone UVB, UVA2
    Ecamsule (Mexoryl SX) UVA2
    Ensulizole (Phenylbenzimiazole Sulfonic Acid) UVB
    Homosalate UVB Slight skin penetration; disrupts some hormones
    Meradimate (Menthyl Anthranilate) UVA2
    Octocrylene UVB Relatively high allergen
    Octinoxate (Octyl Methoxycinnamate) UVB Slight skin penetration; acts like hormone in body; moderate allergen
    Octisalate ( Octyl Salicylate) UVB
    Oxybenzone UVB, UVA2 Penetrates skin significantly; acts like estrogen in the body; relatively high allergen
    Padimate O UVB
    Sulisobenzone UVB, UVA2
    Trolamine Salicylate UVB
    Physical Filters:
    Titanium Dioxide UVB, UVA2 Inhalation concerns
    Zinc Oxide UVB,UVA2, UVA1 Inhalation concerns

    Table From http://www.skincancer.org/prevention/uva-and-uvb

    Follow these tips

    • Seek the shade, especially between 10 AM and 4 PM.
    • Do not burn.
    • Avoid tanning and UV tanning booths.
    • Cover up with clothing, including a broad-brimmed hat and UV-blocking sunglasses.
    • Use a broad spectrum (UVA/UVB) sunscreen with an SPF of 15 or higher every day. For extended outdoor activity, use a water-resistant, broad spectrum (UVA/UVB) sunscreen with an SPF of 30 or higher.
    • Apply 1 ounce (2 tablespoons) of sunscreen to your entire body 30 minutes before going outside. Reapply every two hours, or immediately after swimming or excessive sweating.
    • Keep newborns out of the sun. Sunscreens should be used on babies over the age of six months.
    • Examine your skin head-to-toe every month.
    • See your physician every year for a professional skin exam.
    • Don’t forget to sunscreen your lips

    Most tips From http://www.skincancer.org/prevention/uva-and-uvb


    Other places to protect yourself

    • Car windows block a lot of UVB, but not UVA
      • Two studies found significantly more melanoma on the left side of the body/face, suggesting long exposure in cars puts you at more risk
      • Car windshields block a lot of UVB and UVA because of the plastic in the middle (around SPF 50); side windows do not do so well (around SPF 16)
      • Transparent window films block out almost 100% of both UVA and UVB
    • Skip the sunroof and convertible
    • Check office windows and skylights to see if they are glass or plastic and if they are treated with a UV film

    Tips if you get a Sunburn17

    • Take frequent cool baths or showers to help relieve the pain. As soon as you get out of the bathtub or shower, gently pat yourself dry, but leave a little water on your skin. Then, apply a moisturizer to help trap the water in your skin. This can help ease the dryness.
    • Use a moisturizer that contains aloe vera or soy to help soothe sunburned skin. If a particular area feels especially uncomfortable, you may want to apply a hydrocortisone cream that you can buy without a prescription. Do not treat sunburn with “-caine” products (such as benzocaine), as these may irritate the skin or cause an allergic reaction.
    • Consider taking aspirin or ibuprofen to help reduce any swelling, redness and discomfort.
    • Drink extra water. A sunburn draws fluid to the skin’s surface and away from the rest of the body. Drinking extra water when you are sunburned helps prevent dehydration.
    • If your skin blisters, allow the blisters to heal. Blistering skin means you have a second-degree sunburn. You should not pop the blisters, as blisters form to help your skin heal and protect you from infection.
    • Take extra care to protect sunburned skin while it heals. Wear clothing that covers your skin when outdoors. Tightly-woven fabrics work best. When you hold the fabric up to a bright light, you shouldn’t see any light coming through.

    Tips from https://www.aad.org/public/skin-hair-nails/injured-skin/treating-sunburn

    Music

    “Wear Sunscreen Commencement Speech” by Mike Harper, KNVE

    Sources

    1. https://www.ewg.org/sunscreen/report/executive-summary/
    2. http://www.ewg.org/sunscreen/report/whats-wrong-with-high-spf/
    3. http://www.ewg.org/sunscreen/report/skin-cancer-on-the-rise/
    4. http://www.ewg.org/sunscreen/best-kids-sunscreens/
    5. http://www.ewg.org/sunscreen/worst-kids-sunscreens/
    6. https://www.ewg.org/sunscreen/best-sunscreens/best-beach-sport-sunscreens/
    7. http://www.ewg.org/sunscreen/about-the-sunscreens/730906/
    8. http://sunburnmap.com/
    9. https://www.cdc.gov/mmwr/pdf/wk/mm6118.pdf
    10. http://lifehacker.com/sunscreen-showdown-creams-vs-sprays-1784495399
    11. http://www.skincancer.org/prevention/uva-and-uvb
    12. http://www.bananaboat.com/sun-safety/spf-chart
    13. https://sydology.com/2014/07/03/sun-smarts/spf-chart/
    14. http://www.npr.org/sections/health-shots/2011/06/06/137010355/a-babys-skin-is-no-match-for-the-sun
    15. http://www.webmd.com/skin-problems-and-treatments/tc/sunburn-topic-overview#1
    16. http://www.everydayhealth.com/skin-and-beauty/sunscreen-mistakes-that-hurt-your-skin.aspx
    17. https://www.aad.org/public/skin-hair-nails/injured-skin/treating-sunburn
    18. http://www.nytimes.com/2011/04/05/health/05really.html
    19. http://www.autoblog.com/2013/09/06/not-all-car-windows-protect-against-uv-rays/
E015 – BBQ Showdown (Pellet Grill vs Big Green Egg)
https://www.fortheloveofdata.com/e015/ (Sun, 30 Apr 2017)

Join me and my special guest, Colby “meat whore” Pritchett (@colbypritchett) on this BBQ showdown where we pit the Big Green Egg against the Green Mountain Grill Pellet Smoker. We also cover the history, styles, stats, and health facets of different types of BBQ.

     History

• BBQ derives from the Spanish word ‘barbacoa’, but where the word actually originated is still debated.
• BBQ dates back to the colonial era. George Washington even attended BBQs.
    • Woods commonly selected for their flavor include mesquite, hickory, maple, guava, kiawe, cherry, pecan, apple and oak. Woods to avoid include conifers. These contain resins and tars, which impart undesirable resinous and chemical flavors.
    • The most popular foods for cooking on the grill are, in order: burgers (85 percent), steak (80 percent), hot dogs (79 percent) and chicken (73 percent).
• May is National BBQ Month
• Only 10% of grill owners have a backyard kitchen equipped with premium furniture and lighting
• The longest barbecue measured 8,000 m (26,247 ft) and was created by the people of Bayambang, Pangasinan, Philippines on 4 April 2014. The record attempt took place during the Malangsi Fish-tival in order to celebrate the 400th anniversary of the city Bayambang. The barbecue was made up of 8,000 grills connected to each other, each measuring 1m in length, 58 cm in height and 21 cm in width. 50,000 kg of fish, 2,000 kg of salt, 480 blocks of ice and 6,000 bags of charcoal were used. 8,000 people were involved.

     Styles

    There are different regional barbecue styles all across the country. Although they all cook their meat low and slow, that’s where the similarities stop. Some cook pig, some smoke different cuts of beef, some lamb, and some chicken. Sauces are also varied: some are vinegar and pepper-based; others utilize brown sugar and molasses; in some, mustard is the predominant flavor; and tomato is the primary flavor in others. While there are plenty of nuances and micro-regional styles, there are four styles that anyone who claims to be a barbecue lover should know about.

    In North Carolina, barbecue revolves around the pig: the “whole hog” in the east and the shoulder in the west. The pork is chopped up and usually mixed with a vinegar-based sauce that’s heavy on the spices and contains only a small amount of tomato sauce, if any.

    In Memphis, it’s all about the ribs. Wet ribs are slathered with barbecue sauce before and after cooking, and dry ribs are seasoned with a dry rub. You’ll also find lots of barbecue sandwiches in Memphis: chopped pork on a bun topped with barbecue sauce, pickles, and coleslaw.

    Kansas City barbecue uses a wide variety of meat (but especially beef) and here it’s all about the sauce, which is thick and sweet. Kansas City is a barbecue melting pot, so expect to find plenty of ribs, brisket, chicken, and pulled pork there, all served with plenty of sauce. Brisket burnt ends are also a specialty here.

    And there are a few different styles native to Texas, but the most famous variety is the Central Texas Hill Country “meat market” style: heavy on the beef brisket, which has been given a black pepper-heavy rub. Sauce and side dishes usually play second fiddle, because in Texas it’s all about the meat, be it ginormous beef ribs, pork ribs, chicken, brisket, or sausage.

    – http://lehighvalleymarketplace.com/get-sauced-the-nations-top-bbq-regions/

     

    Brisket Cuts

    • USDA Utility, Cutter, Canner Beef. These are the lowest grades of beef and used primarily by processors for soups, canned chili, sloppy Joe’s, etc. You will not likely see them in a grocery.
    • USDA Standard or Commercial Beef. Practically devoid of marbling. If it does not have a grade on the label it is probably standard or commercial. These grades are fine for stewed or ground meat, but they are a bad choice for the grill. About 2% fat.
    • USDA Select Beef. Slight marbling. If you know what you are doing you can make this stuff tender. Otherwise, get a higher grade. About 2 to 4% fat.
• USDA Choice Beef. Noticeable marbling, but not a lot. This is a good option for backyard cooks. About half of all beef is marked USDA Choice. There are actually three numbered sublevels of USDA Choice. Certified Angus Beef (CAB) is limited to only the top two levels. Reliable sources tell me that Walmart “Choice Premium” is USDA Choice. The word “premium” is all about marketing and not to be confused with USDA Prime. 4-10% fat. A 12 ounce ribeye typically sells for about $8 to 10 retail at the time of this writing in 2010, and prices fluctuate depending on supply and demand as well as weather which impacts the cost of feed.
• USDA Prime Beef. Significant “starry night” marbling. Often from younger cattle. Prime is definitely better tasting and more tender than Choice. Only about 3% of the beef is prime and it is usually reserved for the restaurant trade. About 10 to 13% fat, about $20-30 for a 12 ounce ribeye at retail. A dry aged steak can be 15-18% fat and $30-35 or more for a 12 ounce ribeye.
    • Black Angus. Black Angus cattle are considered by many to be an especially flavorful breed. Alas, it is almost impossible to know if what you are buying really is Angus.
    • Certified Angus Beef. The Certified Angus Beef (CAB) brand is a trademarked brand designed to market quality beef. To wear the CAB logo, the carcass is supposed to pass 10 quality control standards and CAB must be either USDA Prime or one of the two upper sublevels of USDA Choice. Most of it is USDA Choice. CAB costs a bit more because the American Angus Association charges a fee to “certify” the cattle and higher markups take place on down the line.
    • Interestingly, CAB does not actually certify that the beef labeled Certified Angus Beef is from the highly regarded Angus breed. Their major control is that the cattle must have a black hide, which is a genetic indicator that there are Angus genes in the cattle, but not a guarantee.
• Wagyu Beef. Wagyu cattle have Japanese blood lines and are now raised in the US and other countries. Their genetic heritage can be any of a number of Japanese cattle breeds. American Wagyu does not have to adhere to the same standards as Kobe beef (below), and many of the Wagyu are cross bred with local breeds to make them better adapted to the local climates and diseases. Wagyu and Angus crosses are frequent, and they make mighty fine meat. Wagyu is usually extremely marbled, usually 4 to 10 BMS, more than USDA Prime, but not as much as Kobe, and the flavor and texture is distinctive. It is also about twice the price of USDA Prime. One can only wonder how long before the cross breeding and lack of enforceable standards dilute the quality.

    Nutrition Facts

    Brisket Sales

    • Beef Brisket unit sales (in millions of pounds)

    • 2014 Brisket Sales by Holiday in US (millions of pounds)

    • 538 – Where’s the Beef
      • US Cattle Herds are shrinking (97mm in ’07 –> 88.5mm in ’14)
        • Fertilizer, fuel, and feed rose
        • Droughts hit
      • Prices are rising

2016 Sales by Restaurant (columns: Pecan Lodge, Ten 50 BBQ, Franklin’s Austin)
    Brisket 6,700 2,100 10,662
    Sausage 1,525 2,000 1,200
    Ribs 1,823
    Mac & Cheese 4,000
    Potato Salad 75
    Beans 1,600 600
    Peach Cobbler 340
    Sides 600
    Torpedos 6,500
    Rolls/Bread 4,200 4,000
Notes: Brisket is their single largest expense – more than rent, electricity, etc.
    • Dickey’s – uses BigData and near real-time analytics of store data (synced every 20 min.) to analyze sales trends, inventory, etc.
• If ribs aren’t selling well, they can send a text message coupon out to affect sales (a toy version of this kind of check is sketched after this list)
      • Tools: iOLAP vendor, implemented Yellowfin BI and Syncsort DMX ETL on Amazon Redshift
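The Dickey’s workflow above boils down to a near-real-time threshold check on item-level sales. Here is a toy version of that check (the numbers, thresholds, and promo step are all hypothetical):

# Toy version of a near-real-time sales check like the Dickey's example above.
# Thresholds, data, and the notification step are hypothetical.
todays_sales = {"ribs": 42, "brisket": 180, "pulled pork": 95}     # units sold so far today
expected_by_now = {"ribs": 90, "brisket": 150, "pulled pork": 100} # typical pace for this time of day

for item, sold in todays_sales.items():
    if sold < 0.75 * expected_by_now[item]:
        # In a real pipeline this would trigger the coupon / text-message campaign.
        print(f"{item} is lagging ({sold} vs ~{expected_by_now[item]} expected) - send promo")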

    Other Stats

    •  75% of U.S. adults own a grill or smoker.
    • The majority of grill owners (63%) use their grill or smoker year-round and 43% cook at least once a month through winter.
    • Nearly a third of current owners plan to grill with greater frequency this year.
    • Barbecuing isn’t just an evening activity: 11% of grill owners prepared breakfast in the past year.
    • The five most popular days to barbecue, in order are: July Fourth; Labor Day & Memorial Day (tied); Father’s Day; Mother’s Day.
    • The top three reasons for cooking outdoors, in order are: to improve flavor; for personal enjoyment; for entertaining family and friends.
    • Gas grills are easily the most popular style, the choice of 62% of households that own a grill.

    Pellet Grills

    • Traeger patent granted in 1986 and expired in 2006
    • Continuous fuel source like gas; indirect heating like a traditional smoker, so no flame ups, precise temperature control
    • For people who approach cooking as a science rather than an art (but there’s still art to it)
    • Induction fan makes grill like a convection oven
• Hopper -> Auger -> firebox -> induction fan (a toy sketch of the temperature control loop follows after this list)
    • Pro Tips:
      • MAKE SURE YOU DON’T RUN OUT OF FUEL
      • Have your vent open almost all the way
      • Turn off in proper way to prevent clogs and lock-ups
        • Taking it apart to clean it is fraught with peril
      • It may still have hot spots like any other grill or oven
      • Use food grade pellets, not cheap ones for heaters (these can be scrap wood, shredded pallets, etc.)
• Wifi sounds cool and it is, but sometimes it is temperamental and easier just to go without it, particularly if you’re in a hurry
      • Use your own remote thermometers to watch different parts of grill and multiple pieces of meat at once
      • I still use a gas grill to do direct heat or searing
      • Can use a thermal blanket to insulate during winter or in cold locations – will use less pellets when you do this
      • Get the smallest grill you can stand. The bigger the grill the more pellets required to cook, so you may just be paying to heat air
    • What to look for:
      • Variable temperature setting (not three positions)
      • Hopper capacity
      • Meat probes
      • Shelves and hooks
      • Wifi / smart phone connectivity – verify whether it is only on local Wi-fi or internet capable
      • Larger temperature range offers more options for cold smoking, steaks, etc.
      • Some have options for pizza stones, sear plates, etc.
    • Cool infographic about pellet grills

    – Infographic from Grilling with Rich
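A toy sketch of how the hopper/auger/fan setup holds a set temperature: feed pellets when the pit runs cool, back off when it runs hot. Real pellet-grill controllers use PID logic and variable fan speed; the numbers here are invented purely for illustration.

# Toy sketch of pellet-grill temperature control: feed pellets when the pit is
# below the setpoint, idle when above. Real controllers use PID logic and fan
# control; thresholds and temperatures here are invented.
def control_step(setpoint_f, pit_temp_f):
    error = setpoint_f - pit_temp_f
    if error > 5:
        return "run auger (feed pellets)"
    elif error < -5:
        return "pause auger (let the fire die down)"
    return "hold (small corrections via the fan)"

for temp in (180, 222, 226, 240):
    print(temp, "->", control_step(225, temp))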

     

    Big Green Egg

    • The design is based on ancient clay cooking vessels up to 3,000 years old.
    • Kamado style clay pot grills with removable lids originated in Japan. Kamado means “cooking range” or “stove” in Japanese.
• Very fuel efficient as they hold heat extremely well regardless of the weather. The fact that it holds heat and traps in moisture causes the meat to stay juicy and not dry out.
    • US Air Force servicemen started bringing Kamado style grills back to the US after World War II.
    • In the 1960s people started manufacturing them in the US.
    • Ed Fisher discovered these grills overseas and returned to the US to start the Big Green Egg company in 1974.

    Health Tips

    • Grilling Danger #1: Char
      • While char marks in grilled meat look appealing and give a tasty flavor, the char is laden with cancer-causing compounds called heterocyclic amines (HCAs) that form when meat and high heat are combined to create a blackened crust. The more char that’s created, the more carcinogens result that coat your food. High levels of HCAs can cause cancer in laboratory animals exposed to them, and epidemiological studies show that eating charred meats may be associated with an increased risk of colorectal, pancreatic and prostate cancer.
    • Grilling Danger #2: Smoke
      • Barbecue smoke contains polycyclic aromatic hydrocarbons (PAHs), toxic chemicals that can damage your lungs. As meat cooks, drippings of fat hit the coals and create PAHs, which waft into the air. If you are a grill chef who loves to stand over the barbeque, you are inhaling these toxins. The smoky smell on your clothes and in your hair is also coating the inside of your lungs. The more your grill smokes, the more PAH is generated. The toxins are absorbed along with that delicious smoky flavor right into your food.
    • Grilling Danger #3: Harmful byproducts
      • When food is cooked at very high temperatures, a chemical chain reaction can occur that creates inflammatory products called advanced glycation end products (AGEs) that are harmful to your cells and associated with cellular stress and aging. As suggested by the name ‘end product,’ your body cannot digest them or get rid of them easily. Over time, AGEs accumulate in your organs and cause damage. Where do you find AGEs in the barbeque? In the char.
    • How to avoid the dangers
      • Use marinades and rubs – Coating the meat in herbs with a rub containing rosemary, thyme, or pepper, or smothering it with thick marinades, not only adds delicious flavor but can also reduce the creation of carcinogens from grilling by up to 96%. A tasty marinade also reduces dripping fat and smoke and helps prevent char, thereby lowering the amount of all 3 threats – HCAs, PAHs, and AGEs – in your food. Take-home message: Boosting flavor can reduce risk.
      • Pre-cook your meat – An easy way to decrease the toxins created by barbecuing is to pre-cook your meat halfway over low heat in a skillet or the oven before putting it on the grill. Precooking removes some of the fat that can drip and smoke, and it greatly reduces the amount of time your meat sits on the grill being exposed to toxins. Less time at high heat also means fewer AGEs are created in your meat. Extra bonus: with precooking, you can barbeque the food much faster to feed the hungry troops.
        • Marinate the food in alcohol before barbecuing it. According to research published in the Journal of Agricultural and Food Chemistry, soaking meat in a marinade of beer – especially stout or black beer – reduces the creation of PAHs (cancer-causing compounds) by around 50% when it’s grilled.
      • Reduce drippings – Using a simple piece of aluminum foil as a protective barrier under the meat helps prevent drippings from smoking, thereby reducing the amount of PAH blowing into your food and your lungs. Keeping drippings in the foil can also help to keep your food moist. Another great way to reduce drippings is to choose leaner cuts of meat and trim off any excess fat before you put them on the grill.
      • Grill veggies – Grilled vegetables do not contain the HCA carcinogens even when charred. Vegetable kabobs made with peppers, cherry tomatoes and red onions are great on the grill, and offer many healthy nutrients and cancer fighting substances you can’t get from a steak or chicken breast.

     

    Music: Good BBQ by the Riptones via FreeMusicArchive.org

     

    Sources:

    1. https://www.forbes.com/sites/bernardmarr/2015/06/02/big-data-at-dickeys-barbecue-pit-how-analytics-drives-restaurant-performance/#51eee8106d95
    2. https://www.statista.com/statistics/542950/beef-brisket-unit-sales-us/
    3. https://redcedarbison.com/wp-content/uploads/2014/05/NutChart_txt_2013.jpg
    4. http://amazingribs.com/recipes/beef/zen_of_beef_grades.html
    5. http://www.dallasobserver.com/restaurants/pecan-lodge-s-justin-and-diane-fourton-on-the-challenge-of-great-barbecue-7464853
    6. https://www.statista.com/statistics/542985/beef-brisket-unit-sales-us-summer-holiday/
    7. http://austin.eater.com/2016/6/15/11944024/austin-barbecue-statistics
    8. https://fivethirtyeight.com/features/wheres-the-beef/
    9. http://dallas.eater.com/2016/6/16/11952242/dallas-barbecue-joints-by-the-numbers
    10. http://barbecuebible.com/2016/01/05/bbq-trends-2016/
    11. https://www.forbes.com/sites/larryolmsted/2016/04/28/the-united-states-of-barbecue-americas-love-affair-with-backyard-cooking/#55d7001f5a1d
    12. http://www.motherjones.com/environment/2016/06/july-4-independence-day-grill-bbq-statistics-fires-injuries-carbon
    13. https://en.wikipedia.org/wiki/Pellet_grill
    14. http://grillingwithrich.com/infographic-the-history-of-pellet-grills/
    15. http://barbecuebible.com/2015/02/20/new-pellet-grills/
    16. http://www.traegergrills.com/blog/history-of-the-bbq
    17. https://www.firecraft.com/article/history-of-pellet-grills
    18. http://www.thedailymeal.com/eat/10-things-you-didn-t-know-about-barbecue
    19. https://mobile-cuisine.com/did-you-know/barbecue-fun-facts/
    20. http://www.brickmarketdeli.com/2016/05/fun-facts-about-grilling/
    21. http://www.guinnessworldrecords.com/world-records/longest-barbecue
    22. http://eggheadforum.com/discussion/76256/pros-and-cons
    23. http://www.mirror.co.uk/lifestyle/health/your-bbq-could-give-you-3937181
    24. http://blog.doctoroz.com/oz-experts/the-hidden-dangers-of-grilling
    25. http://lehighvalleymarketplace.com/get-sauced-the-nations-top-bbq-regions/
    26. https://www.bbqguys.com/bbq-learning-center/buying-guides/kamado-grills-history
    27. https://en.wikipedia.org/wiki/Big_Green_Egg
    28. http://biggreenegg.com/about/
    29. http://barbecuebible.com/2015/08/25/where-did-kamado-grills-come-from/
    Join me and my special guest, Colby “meat whore” Pritchett (@colbypritchett) on this BBQ showdown where we pit the Big Green Egg against the Green Mountain Grill Pellet Smoker. We also cover the history, styles, stats, and health facets of different types of BBQ. For the Love of Data full false 1:01:22
    E014 – For the Love of Allergies https://www.fortheloveofdata.com/e014/?utm_source=rss&utm_medium=rss&utm_campaign=e014 Tue, 28 Mar 2017 03:00:46 +0000 http://www.fortheloveofdata.com/?p=221 https://www.fortheloveofdata.com/e014/#respond https://www.fortheloveofdata.com/e014/feed/ 0 Achooo! Did you know that seasonal allergies affect about 50 million people in the US, penicillin kills about 400 people/year, and some people are allergic to cockroaches?! Learn all about allergies in this episode. A note about this episode’s content: Most of the allergy information in this episode is very short statistics that were commonly […] Achooo! Did you know that seasonal allergies affect about 50 million people in the US, penicillin kills about 400 people/year, and some people are allergic to cockroaches?! Learn all about allergies in this episode.

    A note about this episode’s content:

    Most of the allergy information in this episode is very short statistics that were commonly repeated in several sources. In many cases, I simply collected these statements and presented them below. Unless specifically noted below, please consider all the information as referenced from another source. See list of sources at the bottom of the show notes.

    Allergies Defined

    An allergy is when your immune system reacts to a foreign substance, called an allergen. It could be something you eat, inhale into your lungs, inject into your body or touch. This reaction could cause coughing, sneezing, itchy eyes, a runny nose and a scratchy throat. In severe cases, it can cause rashes, hives, low blood pressure, breathing trouble, asthma attacks and even death.1

    There is no cure for allergies. You can manage allergies with prevention and treatment. More Americans than ever say they suffer from allergies. It is among the country’s most common, but overlooked, diseases.1

    Who is affected?

    • In about 50% of all homes in the U.S., there are at least 6 detectable allergens present in the environment.
    • Nasal allergies affect about 50 million people in the United States. (30% of adults, 40% of children)
    • Odds that a child with one allergic parent will develop allergies: 33%.
    • Odds that a child with two allergic parents will develop allergies: 70%.
    • Allergies have been increasing steadily for the past 50 years
    • Most common health issue for kids
    • Percentage of the U.S. population that tests positive to one or more allergens: 55%
    • Females are slightly more likely to have food allergies than males, with reported reaction rates of 4.1% and 3.8%, respectively.
    • Non-Hispanic white children have the highest rate of reported food allergies at 4.1%, followed by non-Hispanic black children at 4.0% and Hispanic children at 3.1%.

    Lethal enforcers

    • The most common triggers for anaphylaxis, a life-threatening reaction, are medicines, food and insect stings.
    • Medicines cause the most allergy related deaths.
    • African-Americans and the elderly have the most deadly reactions to medicines, food or unknown allergens.
    • Deadly reactions from venom are higher in older white men.
    • Deadly drug reactions have increased substantially over the years.

    It ain’t cheap

    • In 2010, Americans with nasal swelling spent about $17.5 billion on health costs.
    • They have also lost more than 6 million work and school days and made 16 million visits to their doctor.
    • Food allergies cost about $25 billion each year.

    Heyyyyy… Fever (Allergic Rhinitis)

    • Worldwide, allergic rhinitis affects between 10 percent and 30 percent of the population.
    • 7.8% of adults get hay fever
    • In 2010, white children were more likely to have hay fever than African-American children.
    • Global warming may have added four weeks to pollen season in the last 10-15 years

    Ragweed pollen count by year is on the rise.

    Allergies around the US

    Pollen map from pollen.com

    Pollen.com details on Dallas, TX

    Pollen.com Dallas, TX history.

    The Eczema-Allergy Connection9

    Eczema can flare up when you are around allergens. Children with eczema are also more likely to have food allergies, such as to eggs, nuts, or milk. Food allergies often make eczema symptoms worse for kids but not for adults.

    • Genes – a gene flaw that causes a lack of a type of protein, called filaggrin, weakens the skin barrier and makes it easier for allergens to get into the body.
    • How the body reacts to allergens – people with eczema may have small gaps in the skin that make it dry out quickly and let germs and allergens into the body. Allergens cause inflammation and lead to eczema.
    • Too many antibodies – people with eczema have above average levels of Immunoglobulin E (IgE), a type of antibody that plays a role in the body’s allergic response.

    Tips to avoid Hay Fever8

    1. Reduce your stress – less stress = milder symptoms
    2. Exercise more – a survey found that people who exercise have the mildest symptoms and this reduces stress, too. However, avoid exercising outdoors when the pollen count is high (early morning and early evening). Better yet, exercise indoors if symptoms are severe.
    3. Eat well
      1. Healthy diets = milder symptoms.
      2. However, foods that can worsen hay fever symptoms for some people include apples, tomatoes, stone fruits, melons, bananas and celery.
      3. Eat foods rich in omega 3 and 6 essential fats which can be found in oily fish, nuts, seeds, and their oils. These contain anti-inflammatory properties, and may help reduce symptoms of hay fever.
    4. Cut down on alcohol – beer, wine and spirits contain histamine, the chemical that sets off allergy symptoms in your body. Alcohol also dehydrates you, making your symptoms seem worse.
    5. Sleep well = mildest symptoms. People who get seven hours of sleep or more report fewer symptoms than those getting five hours of sleep or less a night.
    6. Get pricked – Immunotherapy (allergy shots) helps reduce hay fever symptoms in about 85% of people with allergic rhinitis.3

    Other allergies

    Skin in the game

    Skin allergies include skin inflammation, eczema, hives, chronic hives and contact allergies. Plants like poison ivy, poison oak and poison sumac are the most common skin allergy triggers. But skin contact with cockroaches and dust mites, certain foods or latex may also cause skin allergy symptoms.

    • In 2012, 8.8 million children had skin allergies.
    • Children age 0-4 are most likely to have skin allergies.
    • In 2010, African-American children in the U.S. were more likely to have skin allergies than white children.

    That PB&J that is to die for…literally. (Food Allergies)

    Children have food allergies more often than adults. Eight foods cause most food allergy reactions. They are milk, soy, eggs, wheat, peanuts, tree nuts, fish and shellfish.

    • Percentage of the people in the U.S. who believe they have a food allergy: up to 15%.
    • Percentage of the people in the U.S. who actually have a food allergy: 3% to 4%.
    • Peanut is the most common allergen. Milk is second. Shellfish is third.
    • Peanut and tree nut allergies affect about 1% of the US.
    • In 2014, 4 million children in the US had food allergies.
    • 8% of children have a food allergy
      • Also, 38.7% of food-allergic children have a history of severe reactions.
      • 30.4% are allergic to multiple foods.

    Bad medicine (Drug Allergies)

    • Penicillin is the most common allergy trigger for those with drug allergies. Up to 10 percent of people report being allergic to this common antibiotic.
    • Penicillin kills about 400 people / year.
    • Bad drug reactions may affect 10 percent of the world’s population. These reactions affect up to 20 percent of all hospital patients.

    No glove love

    • Only about 1 percent of people in the U.S. have a latex allergy.
    • However, health care workers are becoming more concerned about latex allergies. About 8-12 percent of health care workers will get a latex allergy.
    • Approximately 220 cases of anaphylaxis and 3 deaths per year are due to latex allergy.

    Bug me not

    People who have insect allergies are often allergic to bee and wasp stings and poisonous ant bites. Cockroaches and dust mites may also cause nasal or skin allergy symptoms.

    • Insect sting allergies affect 5 percent of the population.
    • At least 40 deaths occur each year in the United States due to insect sting reactions.
    • Adults are about 4x more likely to die from an insect sting than a kid. Basically, if you still have a reaction when you’re an adult, it affects you hard.
    • Venom immunotherapy is 97% effective in preventing insect sting reactions in sensitive patients

    Music:

    datagroove by Goto80

    Sources:

    1. http://www.aafa.org/page/allergy-facts.aspx
    2. http://www.aaaai.org/about-aaaai/newsroom/allergy-statistics
    3. http://acaai.org/news/facts-statistics/allergies
    4. http://www.webmd.com/allergies/allergy-statistics
    5. http://www.healthline.com/health/allergies/statistics#1
    6. http://www.allergyassociatesinc.com/allergy-statistics/
    7. https://www.pollen.com
    8. http://www.nhs.uk/Livewell/hayfever/Pages/5lifestyletipsforhayfever.aspx
    9. http://www.webmd.com/skin-problems-and-treatments/eczema/treatment-16/eczema-allergies-link
    10. http://www.businessinsider.com/pollen-season-gets-worse-each-year-2015-6
    Achooo! Did you know that seasonal allergies affect about 50 million people in the US, penicillin kills about 400 people/year, and some people are allergic to cockroaches?! Learn all about allergies in this episode. For the Love of Data full false 20:57
    013 – For the Love of Graph Databases https://www.fortheloveofdata.com/e013/?utm_source=rss&utm_medium=rss&utm_campaign=e013 Tue, 21 Feb 2017 17:30:31 +0000 http://www.fortheloveofdata.com/?p=206 https://www.fortheloveofdata.com/e013/#respond https://www.fortheloveofdata.com/e013/feed/ 0 Where did graphs come from? (Graph Theory History) In its simplest form, Graph Theory defines a graph as a construct made up of vertices, nodes, or points which are connected by edges, arcs, or lines.1 The connections may be directed, indicating a direction from one node to another, or undirected. Properties are attributes associated with […] Where did graphs come from? (Graph Theory History)

    In its simplest form, Graph Theory defines a graph as a construct made up of vertices, nodes, or points which are connected by edges, arcs, or lines.1 The connections may be directed, indicating a direction from one node to another, or undirected. Properties are attributes associated with nodes that describe the node in some detail.

    Graph theory is applied in many disciplines, from linguistics to computer science, physics, and chemistry. Popular uses are discussed below. Leonhard Euler published “Seven Bridges of Königsberg” in 1736; this is commonly regarded as the first paper on graph theory. James Joseph Sylvester first introduced the term “graph” in an 1878 paper, and the first graph theory textbook was published in 1936.1

    There are various algorithms that define how to best traverse through a graph from one node to another based on the edges between them.
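
    To make these definitions concrete, here is a minimal sketch in Python of a tiny directed property graph, with nodes, their properties, and labeled edges stored in plain dictionaries and tuples. It is an illustration of the concepts only, not how any particular graph database stores data.

```python
# Nodes carry properties; directed, labeled edges connect them.
nodes = {
    "alice": {"label": "Person", "age": 34},
    "bob":   {"label": "Person", "age": 29},
    "acme":  {"label": "Company", "founded": 1999},
}

edges = [                      # (source, relationship, target)
    ("alice", "KNOWS", "bob"),
    ("alice", "WORKS_AT", "acme"),
    ("bob", "WORKS_AT", "acme"),
]

def neighbors(node, relationship=None):
    """Follow outgoing edges from `node`, optionally filtering by relationship type."""
    return [dst for src, rel, dst in edges
            if src == node and (relationship is None or rel == relationship)]

print(neighbors("alice"))            # ['bob', 'acme']
print(neighbors("alice", "KNOWS"))   # ['bob']
```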

    So what…I’ve never used a Graph Database.

    • Have you ever used Google? If so, then you’ve used the most well-known recent implementation of a graph database.
    • Google, Facebook, and LinkedIn all use proprietary forms of graph databases to underpin parts of their websites.

    How Google uses Graphs

    In the original 1998 academic paper that Sergey Brin and Lawrence Page wrote, they described PageRank, the graph portion of their first implementation of Google.

    Basically, all webpages are treated as nodes. The hyperlinks between the pages are edges, and an algorithm assigns a weight to the credibility of each page. The more links a page has from credible sources, the higher that page’s credibility becomes. A search is a) broken down into a series of words, b) matched against the pages that most closely correlate to those words, and c) the resulting pages are ranked according to their credibility, or PageRank.
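
    The ranking step can be illustrated with a stripped-down PageRank iteration over a toy link graph. This is a sketch of the published algorithm in Python, not Google’s production implementation.

```python
# Minimal PageRank by power iteration; links[page] = pages that `page` links to.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}

def pagerank(links, damping=0.85, iterations=50):
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages          # dangling pages spread rank evenly
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

for page, score in sorted(pagerank(links).items(), key=lambda kv: -kv[1]):
    print(f"{page}: {score:.3f}")
```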

    As of mid-2016, the size of Google’s index was about 130 trillion pages. Google has a nice infographic site on how search works here.

    What’s so good about a graph database?

    For use cases involving complex relationships and traversals of them, graph databases are a great choice. They can provide10:

    • Flexible and agile – a graph database should closely match the structure of the data it uses. This allows developers to start work sooner without the added complexity of mapping data across tables. Neo4j calls this ‘whiteboard friendliness’ – meaning what you draw as the design on your whiteboard is how the data is stored in your database.
    • Greater performance – compared to NoSQL stores or relational databases, graph databases offer much faster access to complex connected data, mainly because they lack expensive ‘join’ operations. In one example, a graph database was 1000x faster than a relational database when working with a query depth of four (a rough traversal sketch of this idea appears after this list).

      [Caveat: I did not perform this comparison, but I imagine a properly indexed instance of an Oracle database could complete this query in a decent amount of time, perhaps not as fast as Neo4j, but I bet it would at least finish the query.]
    • Lower latency – users of graph databases experience lower levels of latency. As the nodes and links ‘point’ to one another, millions of related records can be traversed per second and query response time remains constant irrespective of the overall database size.

      – Sample graph query
    • Good for semi-structured data – graph databases are schema free, meaning patchy data, or data with exceptional attributes, don’t pose a structural problem.

    (All of these bullets above are from https://cambridge-intelligence.com/keylines/graph-databases-data-visualization/)
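
    The “query depth of four” comparison above comes down to following relationships directly instead of repeatedly joining tables. Here is a rough Python sketch of that idea using an in-memory adjacency list and a breadth-first traversal; real graph databases achieve this with index-free adjacency on disk, so treat this purely as an illustration.

```python
from collections import deque

# friends[person] = people they are directly connected to (a toy social graph)
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "eve"],
    "dan": ["bob"],
    "eve": ["cat", "fay"],
    "fay": ["eve"],
}

def connections_within(start, max_depth):
    """Breadth-first traversal: everyone reachable within `max_depth` hops of `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        person, depth = queue.popleft()
        if depth == max_depth:
            continue
        for other in friends.get(person, []):
            if other not in seen:
                seen.add(other)
                found.append(other)
                queue.append((other, depth + 1))
    return found

print(connections_within("ann", 4))   # the graph equivalent of a depth-four "friends" query
```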

    When should you use a graph database?

    The most popular and hottest use cases of graph DBs at the moment are:

    • Social network connections
    • Credit card fraud analysis
    • Recommendation engines
    • Master Data Management (MDM) – i.e., 360-degree view of customer
    • Logistics planning for transportation, traffic, shipping, etc.
    • Computer/telecom network planning and analysis

    These boil down to the following uses10:

    • Path finding: Their traversal efficiency makes graph databases an effective path-finding mechanism. Links can be weighted, or assigned relative distances or times, to ascertain the shortest and most efficient routes between two nodes in a network (see the shortest-path sketch after this list).
    • Mapping dependencies: networks of computers and hardware can be modeled as graphs to find components with many dependents that may be potential weak points or vulnerabilities. Other dependency networks, for example corporate or investment structures, can be mapped in a similar manner.
    • Communications: Communications between people can be stored as graphs. Applying network analysis measures can help find influential individuals.
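
    As a concrete illustration of the path-finding use case, here is a small weighted shortest-path sketch in Python (Dijkstra’s algorithm over an in-memory graph with made-up distances). A production deployment would typically use the graph database’s built-in path functions instead.

```python
import heapq

# Weighted, directed edges: graph[node] = [(neighbor, distance), ...]  (toy data)
graph = {
    "warehouse": [("hub_a", 4), ("hub_b", 2)],
    "hub_a": [("store", 5)],
    "hub_b": [("hub_a", 1), ("store", 8)],
    "store": [],
}

def shortest_path(graph, start, goal):
    """Dijkstra's algorithm: returns (total_distance, path), or (None, []) if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (dist + weight, neighbor, path + [neighbor]))
    return None, []

print(shortest_path(graph, "warehouse", "store"))
# -> (8, ['warehouse', 'hub_b', 'hub_a', 'store'])
```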

    The Panama Papers13,14

    In 2016, 11.5 million documents comprising 2.6 TB of information were leaked from a Panamanian law firm (Mossack Fonseca). These documents were scanned and processed into the Neo4j graph database, where investigative journalists used graph visualizations to uncover hidden insights and relationships that would have otherwise been missed.

    See the articles at Neo4J for more information on how this information was analyzed.

    What graph databases should I use9?

    Neo4j is far and away the most popular graph database; Neo4j and several of the other top graph DBs are all open source. Below is the popularity ranking for these databases from DB-engines.com: Neo4j is first with a score of 36.27, followed by OrientDB (5.87) and Titan (5.08).

    Rank (Feb 2017 / Jan 2017 / Feb 2016) | DBMS | Database Model | Score (Feb 2017) | vs. Jan 2017 | vs. Feb 2016
    1. / 1. / 1. | Neo4j | Graph DBMS | 36.27 | +0.00 | +3.98
    2. / 2. / 2. | OrientDB | Multi-model | 5.87 | +0.06 | -0.55
    3. / 3. / 3. | Titan | Graph DBMS | 5.08 | -0.42 | -0.27

    Tips for converting from an RDBMS to a graph (from Neo4j)12 – a minimal mapping sketch follows the list:

    • Each entity table is represented by a label on nodes
    • Each row in an entity table is a node
    • Columns on those tables become node properties.
    • Remove technical primary keys, keep business primary keys
    • Add unique constraints for business primary keys, add indexes for frequent lookup attributes
    • Replace foreign keys with relationships to the other table, remove them afterwards
    • Remove data with default values, no need to store those
    • Data in tables that is denormalized and duplicated might have to be pulled out into separate nodes to get a cleaner model.
    • Indexed column names might indicate an array property (like email1, email2, email3)
    • Join tables are transformed into relationships, columns on those tables become relationship properties
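
    Here is a minimal sketch of the row-to-node and join-table-to-relationship mapping described in the list above, using plain Python dictionaries and made-up table data. It illustrates the mapping rules only and is not Neo4j’s import tooling.

```python
# Hypothetical relational data: two entity tables and one join table.
customers = [
    {"customer_id": 1, "name": "Ann", "status": None},   # None/default values get dropped
    {"customer_id": 2, "name": "Bob", "status": "gold"},
]
orders = [{"order_id": 10, "total": 42.50}]
customer_orders = [{"customer_id": 2, "order_id": 10, "placed_on": "2017-02-01"}]

def row_to_node(label, row):
    """Entity row -> node: the table becomes the label, columns become properties."""
    return {"label": label, "properties": {k: v for k, v in row.items() if v is not None}}

nodes = ([row_to_node("Customer", r) for r in customers] +
         [row_to_node("Order", r) for r in orders])

# Join-table rows -> relationships; leftover columns become relationship properties.
relationships = [
    {"from": ("Customer", r["customer_id"]),
     "type": "PLACED",
     "to": ("Order", r["order_id"]),
     "properties": {"placed_on": r["placed_on"]}}
    for r in customer_orders
]

print(nodes[0])          # {'label': 'Customer', 'properties': {'customer_id': 1, 'name': 'Ann'}}
print(relationships[0])  # {'from': ('Customer', 2), 'type': 'PLACED', 'to': ('Order', 10), ...}
```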

    Music:

    Music for today’s podcast is Cyanos by Graphiqs Groove via FreeMusicArchive.org.

    Sources:

    1. https://en.wikipedia.org/wiki/Graph_theory
    2. https://en.wikipedia.org/wiki/Graph_database
    3. https://blogs.cornell.edu/info2040/2011/09/20/pagerank-backbone-of-google/
    4. http://ilpubs.stanford.edu:8090/361/1/1998-8.pdf
    5. https://neo4j.com/why-graph-databases/
    6. https://en.wikipedia.org/wiki/Neo4j
    7. https://academy.datastax.com/resources/getting-started-graph-databases
    8. http://www.predictiveanalyticstoday.com/top-graph-databases/
    9. http://db-engines.com/en/ranking/graph+dbms
    10. https://cambridge-intelligence.com/keylines/graph-databases-data-visualization/
    11. http://bitnine.net/rdbms-vs-graph-db/?ckattempt=2
    12. https://neo4j.com/developer/graph-db-vs-rdbms/
    13. https://neo4j.com/blog/icij-neo4j-unravel-panama-papers/
    14. https://neo4j.com/blog/analyzing-panama-papers-neo4j/
    Where did graphs come from? (Graph Theory History) In its simplest form, Graph Theory defines a graph as a construct made up of vertices, nodes, or points which are connected by edges, arcs, or lines.1 The connections may be directed, indicating a direction from one node to another, or undirected. For the Love of Data full false 20:03
    012 -For the Love of Guns https://www.fortheloveofdata.com/012-for-the-love-of-guns-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=012-for-the-love-of-guns-for-the-love-of-data Fri, 27 Jan 2017 10:15:09 +0000 http://www.fortheloveofdata.com/?p=189 https://www.fortheloveofdata.com/012-for-the-love-of-guns-for-the-love-of-data/#respond https://www.fortheloveofdata.com/012-for-the-love-of-guns-for-the-love-of-data/feed/ 0 Sheer Quantity 3% of gun owners own almost 50% of all civilian guns. These 7.7mm “super owners”  own between 8-140 guns (on average 17)15 In 2013, U.S. gun manufacturers built 10,844,792 guns, and we imported an additional 5,539,539; the number dropped slightly in 2014.16 There are over 300 million guns owned by civilians (legal and […] Sheer Quantity
    • 3% of gun owners own almost 50% of all civilian guns. These 7.7 million “super owners” own between 8 and 140 guns (17 on average)15
    • In 2013, U.S. gun manufacturers built 10,844,792 guns, and we imported an additional 5,539,539; the number dropped slightly in 2014.16
    • There are over 300 million guns owned by civilians (legal and illegal)11
    • The government holds approximately 2.7 million guns
    NPR.org stacked bar chart showing firearms by type and year

    History

    National Firearms Act of 1934 (NFA)4

    In 1934 Congress passed a law taxing the makers and distributors of firearms as a way to curtail the usage of weapons commonly used in gang activity at the time. It also required firearms to be registered with the Secretary of the Treasury and compelled holders of unregistered firearms to register them, exposing them to prosecution for having possessed an unregistered firearm. In 1968 this registration provision was ruled to violate the 5th Amendment’s protection against self-incrimination, and at that point the NFA was unenforceable.

    Gun Control Act of 1968 (GCA)1

    The assassination of JFK prompted this law because Oswald’s weapon was purchased from a mail-order catalog. The NRA supported this measure, and its passage in October 1968 came after recent assassinations of MLK and Robert Kennedy. The bill banned mail-order sales and prevented felons, drug users and mentally ill citizens from owning guns. The bill required firearms sellers to be licensed and prevented various interstate transactions unless they took place under a federally licensed dealer.

    The bill established that persons over 18 could purchase rifles and shotguns, while purchasers of handguns had to be over 21. People would have to fill out Form 4473, the Firearms Transaction Record, when purchasing a gun from a dealer to certify that they do not fall into any of the prohibited categories6. The bill also required that all guns made or imported into the US bear a serial number, and removal of that identifier became a felony offense. Furthermore, this bill closed the loophole in the NFA by preventing the registration of a firearm from being used as evidence in any crime occurring before the time of registration.

    President Johnson, who asked for provisions of the bill, wanted it to also license individuals and said it fell short of protecting Americans at a time when 160 million guns existed in the US2. Johnson stated that the gun lobby defeated this measure. In 1993, the Brady Bill enhanced this by requiring more stringent background checks before selling a gun to a purchaser.

    Firearm Owners’ Protection Act of 1986 (FOPA)3

    In 1982 a Senate subcommittee report found that 75% of ATF prosecutions regarding firearms targeted ordinary, law-abiding citizens on technicalities or entrapment. This report and lobbying prompted the passage of FOPA in 1986. The law loosened restrictions on interstate gun sales and mailed ammunition, banned machine guns made after the bill passed from being sold to the general public, and limited ATF inspections to once a year, generally.

    Registry Prohibition

    • A key part of FOPA was the restriction that the government cannot require firearms, their owners, or transactions involving firearms to be reported to any government entity.
    • The ATF is barred from consolidating or centralizing dealer records
      • The bureau consolidated 252 million records of active shop owners from 2000-2016, but had to delete them after the GAO found they did not comply with FOPA
    • According to Pew Research Center, most Americans favor a federal database to track gun sales (70% overall, 55% Republicans)13

    Partisan Views of Gun Proposals

    Traces:

    • 1,500 / day or 373,000 / year in 2015 by 50 Bureau of Alcohol, Tobacco, and Firearms (ATF) agents
    • Urgent traces done in 24 hours; average trace takes 5 business days
    • These records are stored in 15,000 boxes
    • Required to be “unsearchable” – no keyword searches, sorting by date or anything else.
    • Some records are on toilet paper or napkins (a snub by shop owners who dislike the reporting requirement)
    • As of 2013, 70% of traces ID the buyer of a gun14
    • 285 million records from closed up shops saved in 25 “data systems”7 known as the Firearms Tracing System (FTS)
      • All bullets below are from https://en.wikipedia.org/wiki/Firearm_Owners_Protection_Act
      • Multiple Sale Reports. Over 460,000 (2003) Multiple Sales reports (ATF F 3310.4 – a registration record with specific firearms and owner name and address – increasing by about 140,000 per year). Reported as 4.2 million records in 2010.
      • Suspect Guns. All guns suspected of being used for criminal purposes but not recovered by law enforcement. This database includes (ATF’s own examples) individuals purchasing large quantities of firearms, and dealers with improper record keeping. May include guns observed by law enforcement in an estate, or at a gun show, or elsewhere. Reported as 34,807 in 2010.
      • Traced Guns. Over 4 million detail records from all traces since inception. This is a registration record which includes the personal information of the first retail purchaser, along with the identity of the selling dealer.
      • Out of Business Records. Data is manually collected from paper Out-of-Business records (or input from computer records) and entered into the trace system by ATF. These are registration records which include name and address, make, model, serial and caliber of the firearm(s), as well as data from the 4473 form – in digital or image format. In March, 2010, ATF reported receiving several hundred million records since 1968.
      • Theft Guns. Firearms reported as stolen to ATF. Contained 330,000 records in 2010. Contains only thefts from licensed dealers and interstate carriers (optional). Does not have an interface to the FBI’s National Crime Information Center (NCIC) theft data base, where the majority of stolen, lost and missing firearms are reported. See eTrace below.

    Hawaii & the “Rap Back” FBI database12

    • In 2016, Hawaii became the first state to require gun owners’ names to be submitted to the FBI “Rap Back” database. This allows the state to be notified if a gun owner from Hawaii is arrested for a crime anywhere in the US.
    • Visitors to Hawaii packing heat must register and be placed on the list, but they may request to be removed from the database after departure.

     

    eTrace

    • When a gun is recovered at a crime scene, it can be, and usually is, run through a firearms trace with the ATF. This is done through a system called eTrace.
    • eTrace is a digital system that tracks submissions and trace results
    • It is more dynamic and usable because once a gun is in this system, it may be searched by owner name, serial number, etc.
    • However, non-crime scene guns are not in this system

    ATF 2014 Firearms Trace Data10

    The top 10 states with the most recoveries and traces are:

    1. California
    2. Florida
    3. Texas
    4. Illinois
    5. Georgia
    6. North Carolina
    7. Ohio
    8. Pennsylvania
    9. Maryland
    10. New York

    In 2014, the number of firearms recovered and traced = 246,087

    Top Categories of Recovered Firearms

    • Pistol – 131,562
    • Revolver – 43,799
    • Rifle – 38,854
    • Shotgun – 29,970
    • Derringer – 2,197
    • Receiver/Frame – 1,301
    • Machinegun – 717
    • The national possessor age is 36 years old.
    • In 2014, pistols and revolvers accounted for the majority of traced firearms.
    • National time-to-crime average is 10.88 years.

     

    Sources:

    1. https://en.wikipedia.org/wiki/Gun_Control_Act_of_1968
    2. http://www.presidency.ucsb.edu/ws/?pid=29197
    3. https://en.wikipedia.org/wiki/Firearm_Owners_Protection_Act
    4. https://www.atf.gov/rules-and-regulations/national-firearms-act
    5. https://en.wikipedia.org/wiki/Firearm_Owners_Protection_Act
    6. http://www.gq.com/story/inside-federal-bureau-of-way-too-many-guns
    7. https://www.thetrace.org/2016/08/atf-ridiculous-non-searchable-databases-explained/
    8. https://fivethirtyeight.com/features/gun-deaths/
    9. https://www.atf.gov/resource-center/fact-sheet/fact-sheet-national-tracing-center
    10. https://www.atf.gov/resource-center/atf-2014-firearms-trace-data
    11. http://www.npr.org/2016/01/05/462017461/guns-in-america-by-the-numbers
    12. https://news.vice.com/article/hawaii-track-gun-owners-fbi-rap-back-crime-database
    13. http://www.pewresearch.org/fact-tank/2016/01/05/5-facts-about-guns-in-the-united-states/
    14. https://www.reference.com/government-politics/track-owner-gun-its-serial-number-1027e7316a578d2c#
    15. http://www.motherjones.com/politics/2016/09/gun-ownership-america-super-owners
    16. https://www.atf.gov/resource-center/docs/2016-firearms-commerce-united-states/download

    Music

    Gunslinger by The Long Ryders via FreeMusicArchive.org

    Sheer Quantity 3% of gun owners own almost 50% of all civilian guns. These 7.7mm “super owners” own between 8-140 guns (on average 17)15 In 2013, U.S. gun manufacturers built 10,844,792 guns, and we imported an additional 5,539,539; the number dropped slightly in 2014.16 For the Love of Data full false 28:00
    011 – Top 10 Data Predictions for 2017 https://www.fortheloveofdata.com/011-top-10-data-predictions-for-2017-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=011-top-10-data-predictions-for-2017-for-the-love-of-data Fri, 30 Dec 2016 22:12:37 +0000 http://www.fortheloveofdata.com/?p=184 https://www.fortheloveofdata.com/011-top-10-data-predictions-for-2017-for-the-love-of-data/#respond https://www.fortheloveofdata.com/011-top-10-data-predictions-for-2017-for-the-love-of-data/feed/ 0 Happy New Year! Thank you to all listeners and subscribers for your support this past year. 10 – Data borders will break down – logical data lakes and logical data warehouses will grow as companies embrace data virtualization products like Denodo. Data preparation tools, like the new Project Maestro from Tableau, will allow people to […] Happy New Year!
    Thank you to all listeners and subscribers for your support this past year.

    10 – Data borders will break down – logical data lakes and logical data warehouses will grow as companies embrace data virtualization products like Denodo. Data preparation tools, like the new Project Maestro from Tableau, will allow people to seamlessly pull from a) on-premises databases and Excel files; b) cloud repositories like Redshift and BigTable; and c) hosted products like Workday and Salesforce.

    9 – Data Quality and “Refined” data sets will become more important – with the uptick in big data, sensor data, and data lakes, users will have a glut of information at their disposal (some have this already). Automated solutions that assess data quality, as well as specially created intermediate data sets, will become more and more important6. In many data lake architectures and Hadoop-based ecosystems, curated or moderately processed datasets are becoming the norm for widespread usage by the enterprise. Data scientists and power users will continue to harness raw data sets for their explorations, but these refined data sets will be used to reduce heavy lifting and “reinventing the wheel” for many analysts.

    8 – Collaborative BI and analytics will become more mainstream – Sites like data.world and collaborative features in products such as Tableau will be embraced by more users than ever before in 2017. Taking cues from social media, these tools and techniques will produce more living datasets and visualizations with near real-time data as static reporting continues to decline as a percentage of overall reporting. Users will interact with each other and gain economies of scale by not reinventing the wheel when someone else has already done the heavy lifting.

    7 – Internet of Things (IoT) will continue to expand – Currently, most firms use an age- or time-based approach to maintain and replace equipment. Up to 50% of spend using this approach may be wasted, according to ARC Advisory Group3. This study also found that 82% of failures occur randomly. New sensors will be deployed and real-time data will continue to swing upward across many industries. Businesses will be able to use this data to respond to events like power outages as they occur and use predictive analytics and historical information for preventative maintenance. Using this data will allow companies to move from time-based or cyclical check schedules to event-based ones that can detect even small changes in performance that may spell trouble.
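
    The shift from scheduled checks to event-based ones comes down to watching the sensor stream for deviations from recent behavior. Below is a toy sketch of that idea in Python with made-up readings and thresholds; it stands in for the far more sophisticated predictive models vendors actually ship.

```python
from statistics import mean, stdev

def detect_events(readings, window=10, z_threshold=3.0):
    """Flag readings that deviate sharply from the trailing window's average."""
    alerts = []
    for i in range(window, len(readings)):
        recent = readings[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma and abs(readings[i] - mu) / sigma > z_threshold:
            alerts.append((i, readings[i]))
    return alerts

# Simulated vibration readings from a pump; the spike near the end is the "event".
readings = [0.50, 0.52, 0.49, 0.51, 0.50, 0.53, 0.48, 0.52, 0.51, 0.50,
            0.52, 0.49, 0.51, 0.95, 0.50]
print(detect_events(readings))   # -> [(13, 0.95)]
```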

    6 – Converged Intelligence will improve our lives – the trend for companies to share datasets and provide APIs to their services will enable more collaborative experiences to help customers and differentiate companies from their competitors. Services like IFTTT (If This Then That) will offer more and more connections, largely driven by community contributions. Partnerships like SolarCity, Nest, and the Tesla Powerwall will share data to produce synergies that can save money and reduce energy dependence. People will leverage Internet of Things (IoT) devices [see #7 above] and home automation like SmartThings to make us more comfortable. Whether it is automatically adjusting your lights, TV, and devices when you want to watch a movie or automatically adjusting your thermostat when you leave and arm your alarm, connected living will grow.

    A word of caution: data sharing may be open and driven by users opting-in, but in some instances it will be hidden and used to exploit customers without their knowledge.

    5 – Data breaches will continue – Stakes are getting higher as hackers attempt to sway political campaigns, ransomware is on the rise, and data breaches are increasing. As data becomes more open and shareable, attack vectors are much greater and opportunities are higher. Enterprises need to make sure they are vetting cloud and hosted solutions properly to make sure they are secure, but they also need to realize that cloud providers may be able to provide economies of scale and make data safer than individual organizations can on their own.

    4 – So…Security will have to get more proactive – As hackers start to use IoT devices and continue DDoS attacks, companies need to work together to defend against threats. Tools like Watson for Cyber Security will usher in this new era. We will move from predictive analytics into cognitive analytics to discover threats, identify all exposed assets, and then perform a second-order threat analysis to see what other services may suffer or what may be targeted next. These tasks can be performed by machine clusters faster and more completely than by an army of analysts.

    3 – You’ll continue to hear about blockchain initiatives, but it will be mostly hype in 2017 – According to Gartner, Blockchain is nearing the peak of the hype cycle4. However, I think other items close to the peak, like home automation and IoT will see more adoption than blockchain. IMHO, these others can be adopted on a smaller scale and are more readily available to the general public than blockchain related deployments. Many people are forecasting that blockchain related tech won’t hit mainstream for another 5-10 years5. Nevertheless, the concept and some early uses of it are pretty interesting, such as Smart Contracts. Also, friendly FYI, something that uses a blockchain is not automatically anonymous, as in the case of bitcoin.

    2 – The line between Data Scientist and analyst/programmer will blur even more – analysts and programmers will take special courses here and there to beef up their statistics and data science chops. I think the demand for data scientists will bifurcate in 2017: a subset of organizations will spring for data scientists and the high salaries they command; however, the majority of firms will push for their analysts or tools to do low level data science work. Tools like Tableau and R Studio are making it easier for analysts to dabble in statistical and predictive analytics. Firms, such as New Knowledge, are offering “Data Scientist as a Service”, and tons of online courses, e-books, and knowledge bases have sprung up to spread data science fundamentals to the masses.

    1 – BYOT, Bring Your Own Tool, will continue to gain momentum – Enterprises can no longer place all their eggs in one basket when it comes to a BI or reporting tool. Tools such as Tableau have proven their ability to uproot entrenched stalwarts like IBM Cognos, and traditional BI tools appear stale and financially infeasible compared to a plethora of specialized, cheaper alternatives. Traditional BI tools will still have their place in firms that have enterprise-level agreements and are slow to change, but as more and more users demand features that these tools can’t support, or go out and acquire alternatives through “shadow procurement”, the traditional tools and the expertise around them will erode. It is now more important than ever for IT organizations to focus on architectures that make a wide array of data available to the entire organization regardless of device or access tool of choice. Good governance policies and data czars need to focus on data quality, establishment and maintenance of metadata, and publishing best practices around the types of tools and reports/visualizations that are best for specific scenarios. Firms need to evaluate the benefits of having multiple tools and the flexibility and productivity it gives their employees vs. the supportability and procurement benefits of working with a smaller number of providers.

    Music: Auld Lang Syne by Fresh Nelly, from Free Music Archive.

    Sources:

    1. http://www.tableau.com/resource/top-10-bi-trends-2017
    2. https://electrek.co/2016/02/25/solarcity-tesla-powerwall-nest-hawaii/
    3. https://www.ibm.com/blogs/internet-of-things/as-much-as-half-of-every-dollar-you-spend-on-preventive-maintenance-is-wasted/
    4. http://www.gartner.com/newsroom/id/3412017
    5. https://www.ft.com/content/3bea303c-7a7e-11e6-b837-eb4b4333ee43
    6. http://www.eweek.com/database/slideshows/10-predictions-for-the-data-analytics-market-for-2017.html
    Happy New Year! Thank you to all listeners and subscribers for your support this past year. 10 – Data borders will break down – logical data lakes and logical data warehouses will grow as companies embrace data virtualization products like Denodo. For the Love of Data full false 28:00
    010 For the Love of Thanksgiving – For the Love of Data https://www.fortheloveofdata.com/010-for-the-love-of-thanksgiving-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=010-for-the-love-of-thanksgiving-for-the-love-of-data Fri, 25 Nov 2016 07:40:32 +0000 http://www.fortheloveofdata.com/?p=179 https://www.fortheloveofdata.com/010-for-the-love-of-thanksgiving-for-the-love-of-data/#respond https://www.fortheloveofdata.com/010-for-the-love-of-thanksgiving-for-the-love-of-data/feed/ 0 Holiday Weight Gain Studies Studies are very mixed on whether holiday eating causes weight gain. The Good1 In several studies over the last thirty-one years, subjects gained approximately 3/4 to 2 lbs. during the holiday season However, in one study participants felt they had gained 4x as much weight as they actually gained Two other […] Holiday Weight Gain Studies

    Studies are very mixed on whether holiday eating causes weight gain.

    The Good1

    • In several studies over the last thirty-one years, subjects gained approximately 3/4 to 2 lbs. during the holiday season
    • However, in one study participants felt they had gained 4x as much weight as they actually gained
    • Two other key findings:
      • Although the amount of weight gained over the holidays was small, it represented the majority of the weight gained for the year
      • Weight gained over the holidays typically is not lost the next year (it represents the annual amount of increase for many people).

    The Bad3

    • During “eating holidays”, like Thanksgiving and Christmas, participants consumed 14% more than on normal days
    • Some participants (outliers perhaps?) consumed over 900 calories more on special occasions than normal days
    • Obese individuals indulged at an even higher level during holidays

    Other

    • Children tend to gain more weight over the summer when school is out than during the holidays2

    Theme music for this month’s episode is “Turkey Time” by Monk Turner4.

    Sources:

    1. http://letstalknutrition.com/holiday-weight-gain-separating-fact-from-fiction/
    2. http://www.hit107.com/news/feed/2016/11/study-reveals-the-time-of-year-child-obesity-rates-rise-the-most/
    3. http://acsh.org/news/2015/11/24/does-holiday-feasting-affect-obesity-rates
    4. http://freemusicarchive.org/music/Monk_Turner/Calendar/Monk_Turner_-_Calendar_-_11_Turkey_Time
    Holiday Weight Gain Studies Studies are very mixed on whether holiday eating causes weight gain. For the Love of Data full false 5:52
    009 For the Love of Algorithms – For the Love of Data https://www.fortheloveofdata.com/009-for-the-love-of-algorithms-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=009-for-the-love-of-algorithms-for-the-love-of-data Mon, 31 Oct 2016 03:30:11 +0000 http://www.fortheloveofdata.com/?p=169 https://www.fortheloveofdata.com/009-for-the-love-of-algorithms-for-the-love-of-data/#respond https://www.fortheloveofdata.com/009-for-the-love-of-algorithms-for-the-love-of-data/feed/ 0 Worst Pun Ever: Today, we are talking about Al… Al Gore… Al Gore Rhythms… Algorithms! Definition: a step-by-step procedure for solving a problem or accomplishing some end especially by a computer1 Inputs: Many algorithms use census data or FICO score as one of their prime inputs Plus any custom information you give a website Plus […] Worst Pun Ever: Today, we are talking about Al… Al Gore… Al Gore Rhythms… Algorithms!

    Definition: a step-by-step procedure for solving a problem or accomplishing some end especially by a computer1

    Inputs:

    • Many algorithms use census data or FICO score as one of their prime inputs
    • Plus any custom information you give a website
    • Plus any information they glean about you from other sites (when you visit a site with a Facebook Share button, Facebook can track that you’re there17)
    • Websites are constantly looking at ways to break our anonymity (fingerprinting) so they can track us and serve us more relevant or lucrative ads5.

    Fun Stuff

    • Chess – algorithms are so good that humans haven’t been able to beat a 4-CPU PC since about 20056
    • Rubik’s Cube – the machine record is 0.887 seconds vs. just over 5 seconds for a human7

    • Poker – scientists solved all moves for Heads Up Limit Hold ‘Em – 3.16 x 10^17 moves. You may be able to win individual games, but it is HIGHLY unlikely that you can win over time8

    Machine Learning

    • Can include intentional or unintentional bias.
    • @JonathonMorgan did a post on Medium and a podcast on Partially Derivative about using a machine learning model to find alt-right white supremacists on Twitter and track their degree of radicalization over time. He did this by training a model with their tweets and analyzing their usage of words like “Jewish” vs. more mainstream usage3,4 (a heavily simplified sketch of this kind of text classifier appears below).
    – From Medium / Jonathon Morgan’s Post
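
    For context, the general shape of that kind of model is: train a text classifier on labeled examples, then score new text with it. Below is a heavily simplified sketch using scikit-learn with invented placeholder snippets and labels; it is not Morgan’s actual dataset, features, or model.

```python
# Toy text-classification sketch: score how closely new posts resemble a labeled group.
# The training snippets and labels are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = [
    "post full of the group's slang and talking points",
    "another post echoing the same rhetoric and phrases",
    "ordinary post about sports and weekend plans",
    "mainstream post about cooking dinner tonight",
]
train_labels = [1, 1, 0, 0]   # 1 = group of interest, 0 = mainstream baseline

vectorizer = CountVectorizer()
model = LogisticRegression()
model.fit(vectorizer.fit_transform(train_texts), train_labels)

new_posts = ["post echoing the same talking points", "post about dinner and sports"]
for post, score in zip(new_posts, model.predict_proba(vectorizer.transform(new_posts))[:, 1]):
    print(f"{score:.2f}  {post}")
```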

    Pricing2:

    • Amazon lists its own results above competitors’, even when its prices are higher once shipping is included for non-Prime customers; however, it claims its algorithm is customer-centric
    • Princeton Review charges between $6,600 and $8,400 for its online course depending on zip code. It charged more in zip codes with higher incomes and in some with higher Asian populations.

    News/Search:

    • Link Analysis, how two entities relate to each other is used by Google’s PageRank, Facebook’s News Feed, and LinkedIn’s job/connection recommendations. It was developed in 1976 and first used by two other search indexes before Google began using it in 1998.13
    • However, algorithms cause sites to cater to information similar to your preexisting views, or for what they think you will find interesting, rather than presenting balanced, holistic content.14
      • Medium recommends articles based on how long it thinks you will read.
      • Some sites tailor related content, content types (video, etc.), and sharing buttons based on where you enter their site from.
      • These choices and filters can lead you into a content bubble that leads you down a path of more and more specific, and sometimes extreme, viewpoints.
    • Facebook uses hundreds of features, or input variables, when assigning a relevancy score to posts you see in your news feed.15 (A toy weighted-scoring sketch appears after this list.)
    • When you have a Facebook account and you visit a page that has a like or share button, Facebook can log your visit and use that to tailor content or ads when you visit their site.16 See here17 for a relatively up-to-date list of features used in Facebook’s newsfeed algorithm (time spent viewing, friend’s posts receive priority, likes/reactions, etc. are all key inputs).
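
    As a toy illustration of how a feed algorithm can fold many input features into a single relevancy score, here is a small weighted-scoring sketch in Python. The features and weights are invented for illustration only and are not Facebook’s actual inputs or model.

```python
# Toy relevancy scoring: combine a handful of made-up features with made-up weights.
WEIGHTS = {
    "is_friend_post": 3.0,
    "likes": 0.5,
    "comments": 0.8,
    "expected_view_seconds": 0.1,
    "hours_old": -0.4,        # older posts decay
}

def relevancy(post):
    return sum(weight * post.get(feature, 0) for feature, weight in WEIGHTS.items())

posts = [
    {"id": "p1", "is_friend_post": 1, "likes": 12, "comments": 3,
     "expected_view_seconds": 20, "hours_old": 2},
    {"id": "p2", "is_friend_post": 0, "likes": 40, "comments": 1,
     "expected_view_seconds": 5, "hours_old": 10},
]

for post in sorted(posts, key=relevancy, reverse=True):
    print(post["id"], round(relevancy(post), 2))   # p2 17.3, then p1 12.6
```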

    Serious Consequences

    • Some algorithms for car insurance weight FICO credit scores more heavily than drunk driving convictions.9,10,11
    • Cathy O’Neil calls them “Weapons of Math Destruction” (WMDs) if they are: widespread, secretive, and have the potential to do great harm9,10
    • Kronos, a small big data HR company hired by large firms to screen applicants, employs a personality test as part of its candidate screening. Some argue that this unfairly excludes applicants from jobs, with no explanation of the reason, in a manner that violates the Americans with Disabilities Act (ADA).9,10,12

    Theme Music: Algorithm of Desire by Measles Mumps Rubella, courtesy of FreeMusicArchive.

    Sources:

    1. http://www.merriam-webster.com/dictionary/algorithm
    2. https://www.propublica.org/article/breaking-the-black-box-when-algorithms-decide-what-you-pay
    3. http://partiallyderivative.com/podcast/2016/09/27/s2e14-the-model-is-racist
    4. https://medium.com/@jonathonmorgan/the-radical-right-and-the-threat-of-violence-f66288ac8c4#.kssqef9jz
    5. http://fivethirtyeight.com/features/internet-tracking-has-moved-beyond-cookies/
    6. http://www.extremetech.com/extreme/196554-a-new-computer-chess-champion-is-crowned-and-the-continued-demise-of-human-grandmasters
    7. http://gizmodo.com/in-just-0-887-seconds-another-machine-has-already-shatt-1758009774
    8. http://bigthink.com/ideafeed/computer-scientists-create-unbeatable-poker-playing-computer
    9. http://fivethirtyeight.com/features/whos-accountable-when-an-algorithm-makes-a-bad-decision/
    10. https://weaponsofmathdestructionbook.com/
    11. https://www.wired.com/2016/10/big-data-algorithms-manipulating-us/
    12. https://www.theguardian.com/science/2016/sep/01/how-algorithms-rule-our-working-lives
    13. https://medium.com/@_marcos_otero/the-real-10-algorithms-that-dominate-our-world-e95fa9f16c04#.8mczwtxzt
    14. http://www.cjr.org/news_literacy/algorithms_filter_bubble.php
    15. http://www.slate.com/articles/technology/cover_story/2016/01/how_facebook_s_news_feed_algorithm_works.html
    16. https://www.technologyreview.com/s/541351/facebooks-like-buttons-will-soon-track-your-web-browsing-to-target-ads/
    17. https://blog.bufferapp.com/facebook-news-feed-algorithm
    Worst Pun Ever: Today, we are talking about Al… Al Gore… Al Gore Rhythms… Algorithms! Definition: a step-by-step procedure for solving a problem or accomplishing some end especially by a computer1 For the Love of Data full false 35:59
    008 For the Love of Politics – For the Love of Data https://www.fortheloveofdata.com/008-for-the-love-of-politics-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=008-for-the-love-of-politics-for-the-love-of-data Wed, 28 Sep 2016 15:27:27 +0000 http://www.fortheloveofdata.com/?p=159 https://www.fortheloveofdata.com/008-for-the-love-of-politics-for-the-love-of-data/#respond https://www.fortheloveofdata.com/008-for-the-love-of-politics-for-the-love-of-data/feed/ 0 History of Data in Politics First off, 538’s podcast, What’s the Point did a great four part series on the data of politics that covered the history of politics from the late 1800s through the primaries. So please check out the above links for more context behind this. A brief history of data in politics […] History of Data in Politics

    First off, 538’s podcast What’s the Point did a great four-part series on the data of politics, covering the period from the late 1800s through the 2016 primaries. Please check out the links in the sources below for more context. A brief history of data in politics shows how the major ways candidates appealed to constituents progressed along this path:

    • Party Elite chose candidates.
    • Direct outreach – Candidates engage voters directly, including things like post-voting parties offering booze to those who voted for a particular candidate.
    • TV – When TV came along, suddenly candidates could reach the majority of voters just by running ads on three networks.
    • Direct Mail – Politicians could use subscriber lists from certain magazines to target specific groups that might be interested in their policies.
    • Micro Targeting – This started around 2004, when data analysis identified target demographics to go after, advertise to, and appeal to.
    • Individual Targeting – Howard Dean (2004) was a pioneer in this effort, coalescing state voter lists together and appending commercial data. This continued in 2008 with Barack Obama where they found that there was still a significant diversity in micro groups. The trend was refined in 2012 where campaigns used individual data to feed into, test, and refine their models.

    However, there are roots back to 1891 when James Clarkson, the RNC chairman, assembled a file that featured the “age, occupation, nativity, residence and all the other facts in each votersʼ life, and had them arranged alphabetically, so that literature could be sent constantly to every voter directly.”10

    The Obama campaigns in 2008 and 2012 hired enormous numbers of staffers: 342 in the 2012 race alone in technology, digital, data, and analytics.

    History of Voting Trends by State7

    All Elections since 1876 (the year Texas A&M was founded, whoop!)

    e008-maps-by-year

    Most Democratic (1932)

    e008-map-democrat

    Most Republican (1972)

    e008-map-republican

    Voter Turnout Rates

    In voter turnout data by country since 2000, the US ranks #159 out of #196 with just over 55% average voter turnout. We can and should do better.

    Rank | Average Voter Turnout (%) | # of Data Points | Min Year | Max Year | Country
    199.8320022011Lao People's Dem. Republic
    299.3420022016Viet Nam
    397.9320032013Rwanda
    496.5120042004Equatorial Guinea
    595.1320032013Cuba
    694.1520012013Australia
    794320032013Malta
    893.8420012015Singapore
    991.3320042013Luxembourg
    1091.2320022008Faroe Islands
    1191320022012Bahamas
    1290.3420032014Belgium
    1389.9420002015Tajikistan
    1489.8420002015Ethiopia
    1589.8520002016Nauru
    1689.7320042014Uruguay
    1787.4320042013Turkmenistan
    1887.2320042014Antigua and Barbuda
    1987.1320042014Uzbekistan
    2086.4520012015Denmark
    2185.1320052013Aruba
    2284.6420022014Bolivia
    2384.5420032013Iceland
    2484.4420012013Liechtenstein
    2584.1420022015Turkey
    2683.5520002016Peru
    2783.3220012007Timor-Leste
    2883.1420022014Sweden
    2982.4420042014Tunisia
    3081.9320062014Cook Islands
    3181.6320042014Guinea-Bissau
    3281.6320022011Seychelles
    3381.5420012016Cyprus
    3480.2420012013Italy
    3580.1220052013Cayman Islands
    3680120022002Tuvalu
    3779.4320022012Sierra Leone
    3879.2320042014Botswana
    3979.1420022013Austria
    4078.6420022014Brazil
    4178.6420002014Mauritius
    4278.4220042014Namibia
    4378.2320042013Malaysia
    4477.9420012013Chile
    4577.9520022012Netherlands
    4677.8220012016Samoa
    4777.7120062006Palestinian Territory, Occupied
    4877.6520022014New Zealand
    4977320032013Monaco
    5076.9120122012Papua New Guinea
    5176.9420012013Norway
    5276.7320042014Indonesia
    5376.6220112015Gibraltar
    5476.6320012014Fiji
    5576.4420012015Guyana
    5676320052014Maldives
    5775.8320042014South Africa
    5875.6420032015Belize
    5975.6320032013Cambodia
    6075.6720012015Argentina
    6175.5420002012Belarus
    6275.5520002016Mongolia
    6375.4520012015Andorra
    6475.2320032013Grenada
    6575.1220082012Angola
    6675120032003Yemen
    6774.8520012013Philippines
    6874.8420022013Germany
    6974.1220052011Liberia
    7074420002012Ghana
    7173.9320022010Sao Tome and Principe
    7273.8320042014Panama
    7373.8320032012Bermuda
    7473.6320012011Nicaragua
    7573.5220102015Myanmar
    7673.5420002015Anguilla
    7773.3520002015Sri Lanka
    7872.8320022013Togo
    7972.7320052015Burundi
    8072.5420012014Montserrat
    8171.9620002016Spain
    8271.4120152015Comoros
    8370.9420022013Ecuador
    8470.7320022013Kenya
    8570.5320012014Bangladesh
    8670.5620002015Greece
    8769.7420002015Saint Kitts and Nevis
    8869.6320062012Montenegro
    8969.6420032015Virgin Islands, British
    9069.5420012012San Marino
    9169.4520022016Kazakhstan
    9268.4620012014Thailand
    9368.2320022013Cameroon
    9467.9420022014Costa Rica
    9567.8220022013Guinea
    9667.5120072007Kiribati
    9767.5320052014Iraq
    9867.2420012015Saint Vincent and The Grenadines
    9966.9220052011Central African Republic
    10066.9620002015Trinidad and Tobago
    10166.8320072013Bhutan
    10266.5420032015Finland
    10366.4620012015Israel
    10466.3420012016Uganda
    10566.2420052014Tonga
    10666.1420022016Ireland
    10766.1420022014Hungary
    10865.9320012013Mauritania
    10965.9320032013Paraguay
    11065.6420002015Suriname
    11165.3420012014Solomon Islands
    11265.1320072015Oman
    11365520012016Taiwan
    11464.7220062011Congo, Democratic Republic of
    11564.4520022016Vanuatu
    11664.4520062013Kuwait
    11764.3320012011Zambia
    11864.2420022015Burkina Faso
    11963.9420032015Guatemala
    12063.5320022010Netherlands Antilles
    12163.4320032013Djibouti
    12263.3120082008Nepal
    12363.2420012015United Kingdom
    12463520022014Latvia
    12563520022014Macedonia, former Yugoslav Republic (1993-)
    12662.6620002015Canada
    12762.6420012016Cape Verde
    12862.6520002015Croatia
    12962.5520012014Moldova, Republic of
    13062.4520002015Kyrgyzstan
    13162.3520002014Slovenia
    13262420032015Estonia
    13361.8320032013Barbados
    13461.7520022014Ukraine
    13561.6420022014Bahrain
    13661.5620002014Japan
    13761.5220002003Yugoslavia, FR/Union of Serbia and Montenegro
    13861.4320042014Malawi
    13961.1420002015Tanzania, United Republic of
    14061.1420022013Czech Republic
    14160.9320042014India
    14260.5520022016Slovakia
    14360.1520022015Portugal
    14459.8320032011Russian Federation
    14559.3320032012Armenia
    14659.3220022013Madagascar
    14759.3420032012Georgia
    14859.2220102015Sudan
    14959.2420032015Benin
    15058.6320022012France
    15157.9420022016Dominican Republic
    15257.9320082016Iran, Islamic Republic of
    15357.8520072016Serbia
    15457.6420002014Dominica
    15557.3520012014Bulgaria
    15657.1420032016Syrian Arab Republic
    15757520002014Bosnia and Herzegovina
    15855.9420012013Honduras
    15955.7820002014United States
    16055.5420002015Venezuela
    16155.3420032013Jordan
    16255.1520002016Korea, Republic of
    16355.1420022016Jamaica
    16454.8320002012Palau
    16554.5220022011Chad
    16653.4320042016Niger
    16753.1420022015Lesotho
    16852.1620002015Mexico
    16951.9420012013Albania
    17051.7220122014Libya
    17151.4420002012Lithuania
    17251.2420002012Romania
    17350.8520002015Azerbaijan
    17448.9320012011Saint Lucia
    17548.6220072013Micronesia, Federated States of
    17648.5320002009Lebanon
    17748.1520012015Poland
    17848220072015Marshall Islands
    17947.8420032015Switzerland
    18047.6220052010Afghanistan
    18146.7320022013Pakistan
    18246.2320012012Senegal
    18345.6320002008Zimbabwe
    18445.3420042014Kosovo
    18544.7320022011Morocco
    18643.7520002015El Salvador
    18743.2320042014Mozambique
    18842.6420022014Colombia
    18941.6320022012Algeria
    19040.5320032015Nigeria
    19139.2320022012Gambia
    19236.5420052015Egypt
    19334.3120112011Gabon
    19434.1220002011Côte d'Ivoire
    19532.2420002015Haiti
    19631.8320022013Mali

    e008-voter-turnout-by-year

    e008-cps-age

    e008-cps-educ

    e008-cps-race

    e008-electorate-demo-race

    • Voters 30 and older vote at much higher rates than younger voters.
    • The more educated you are, the more likely you are to vote.
    • Voters are most commonly black or white; Hispanic turnout has consistently been the lowest since 1984.
    • The white share of the electorate has been declining, but is still an overwhelming 77% of the vote8.

    Sources:

    1. http://fivethirtyeight.com/features/a-history-of-data-in-american-politics-part-1-william-jennings-bryan-to-barack-obama/
    2. http://fivethirtyeight.com/features/a-history-of-data-in-american-politics-part-3-the-2016-primaries/
    3. http://www.fec.gov/pubrec/fe2012/federalelections2012.shtml
    4. http://www.electproject.org/home/voter-turnout/voter-turnout-data
    5. http://www.census.gov/library/visualizations/2016/comm/electorate-profiles/cb16-tps25_voting_texas.html
    6. http://www.presidency.ucsb.edu/elections.php
    7. http://www.fairvote.org/voter_turnout#voter_turnout_101
    8. http://www.idea.int/vt/viewdata.cfm
    9. http://www.electproject.org/home/voter-turnout/demographics
    10. http://41lscp16wiqd3klpnn1z0t2a.wpengine.netdna-cdn.com/wp-content/uploads/2015/05/Kreiss_NiemanFinal.pdf
    11. http://nationbuilder.com/voterfile
    007 For the Love of Olympics – For the Love of Data https://www.fortheloveofdata.com/007-for-the-love-of-olympics-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=007-for-the-love-of-olympics-for-the-love-of-data Thu, 25 Aug 2016 07:30:09 +0000

    Fun Fact: The main riff in NBC’s Olympic theme is from Bugler’s Dream (1958) by Leo Arnaud.


    History

    • Most believe games started in 776 BC as part of a religious festival in Greece to honor Zeus; however, some evidence suggests it could have started as early as the 10th century BC
    • The stadion race was the first event, a 600 foot race. This may have been the only event for the first 13 Olympics
    • They occurred every four years for twelve centuries, until 396 AD; then there was a break in games until 1896

    How to Qualify for the Olympics

    • Individual:
      • For each gender, up to three people per country can attend if they meet the entry standard
      • For each gender, one person per country can attend if no one meets the standard
    • Team: Each country may send one team that meets the entry standard
    • Slightly more complicated criteria for relays and marathon – generally involving your finish in various qualifying events

    Fun Fact: The marathon was not added until 1896 in Athens and was standardized at 26.2 miles in the 1908 London games because that was the distance between Windsor Castle and White City Stadium.


    Cost of the Games

    Many people feel the Olympics are a terrible investment for the host country. Rio’s estimated cost was $3bn, but it is projected to be at least 50% over budget at approximately $4.6bn.

    Metric | Value
    BRL / month | 1,972
    BRL to USD | 0.31
    USD / month | 611.32
    USD / year | 7,335.84
    Estimated cost of games | 4,600,000,000
    Cost in # of yearly salaries | 627,058.39
    Population | 209,567,920
    Cost in yearly salaries as a share of population | 0.002992 (about 0.3%)
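
    If you want to play with these numbers yourself, here is a minimal Python sketch of the same arithmetic, using only the inputs shown in the table above (wage, exchange rate, cost estimate, and population):

```python
# Rough cost-of-the-games arithmetic using the inputs from the table above.
brl_per_month = 1_972                      # average Brazilian monthly wage (BRL)
brl_to_usd = 0.31                          # exchange rate used in the episode
cost_of_games_usd = 4_600_000_000          # projected Rio 2016 cost
population = 209_567_920                   # Brazil's population

usd_per_year = brl_per_month * brl_to_usd * 12        # ~7,335.84 USD per year
yearly_salaries = cost_of_games_usd / usd_per_year    # ~627,058 yearly salaries
share_of_population = yearly_salaries / population    # ~0.003, i.e. about 0.3%

print(f"{yearly_salaries:,.0f} yearly salaries, or "
      f"{share_of_population:.4%} of the population")
```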

    Sochi is the most expensive so far, but Summer games are typically more expensive than Winter.

    Cost line graph Olympic games

    Cost table for historic games


    Fun Fact: The first Winter games were in 1924 (Chamonix).


    Who are the Athletes?

    Country Rankings

    Country | Population | Athletes | Rank by # of Athletes
    United States | 324,118,787 | 563 | 1
    Brazil | 209,567,920 | 483 | 2
    Germany | 80,682,351 | 440 | 3
    Australia | 24,309,330 | 428 | 4
    France | 64,668,129 | 408 | 5
    United Kingdom (Great Britain) | 65,111,143 | 372 | 6
    China | 1,382,323,332 | 352 | 7
    Canada | 36,286,378 | 320 | 8
    Japan | 126,323,715 | 312 | 9
    Spain | 46,064,604 | 312 | 10

    Country | Population | Athletes | Rank by Population
    China | 1,382,323,332 | 352 | 1
    India | 1,326,801,576 | 123 | 2
    United States | 324,118,787 | 563 | 3
    Indonesia | 260,581,100 | 28 | 4
    Brazil | 209,567,920 | 483 | 5
    Pakistan | 192,826,502 | 7 | 6
    Nigeria | 186,987,563 | 77 | 7
    Bangladesh | 162,910,864 | 7 | 8
    Russian Federation | 143,439,832 | 283 | 9
    Mexico | 128,632,004 | 125 | 10

    Country | Population | Athletes | Rank Per Capita
    Republic of the Cook Islands* | 20,948 | 9 | 1
    Palau | 21,501 | 5 | 2
    Nauru | 10,263 | 2 | 3
    San Marino | 31,950 | 5 | 4
    British Virgin Islands* | 30,659 | 4 | 5
    Bermuda* | 61,662 | 8 | 6
    Saint Kitts and Nevis | 56,183 | 7 | 7
    Seychelles | 97,026 | 10 | 8
    Tuvalu | 9,943 | 1 | 9
    Antigua and Barbuda | 92,738 | 9 | 10

    Fun Fact: The flame started at the 1928 Amsterdam games.


    Gender & Age Breakdown

    Men vs. Women Pie Chart

    Gender by Age bar chart

    Oldest / Youngest by Avg. Age

    Who are the Oldest and Youngest of All Time?

    Category | Male | Female
    Oldest Competitor | Oscar Swahn (Sweden), Age 72, 1920, Shooting | Lorna Johnstone (UK), Age 70, 1972, Equestrian
    Oldest Gold Medalist | Oscar Swahn (Sweden), Age 64, 1912, Shooting | Lida "Eliza" Pollock (USA), Age 63, 1904, Team Archery (Bronze)
    Oldest Medalist | Oscar Swahn (Sweden), Age 72, 1920, Shooting (Silver) | Lida "Eliza" Pollock (USA), Age 63, 1904, Archery (Bronze)
    Youngest Gold Medalist | Klaus Zerta (Germany), Age 13, 1960, Rowing | Donna Elizabeth de Varona (USA), Age 13, 1960, Swimming - Team
    Youngest Medalist | Dimitrios Loundras (Greece), Age 10, 1896, Gymnastics - Team (Bronze) | Luigina Giavotti (Italy), Age 11, 1928, Gymnastics - Team (Silver)

    Fun Fact: Boxing and wrestling were added in 708 BC and 688 BC respectively.


    A Look at the Medals

    Summer Medal Values & Rewards

    • Gold: $600 (the gold medal is only about 1% actual gold; the rest is roughly 92.5% silver and 6.16% copper)
    • Silver: $325 (in the silver medal, the gold is replaced by more copper; the rest of the composition is the same as the gold medal)
    • Bronze: $3 (the bronze medal is about 97% copper, 2.5% zinc, and 0.5% tin)

    CNN bar graph - who pays for the gold

    Who are the Big Winners at Rio 2016?

    Total Medals

    • Italy and Canada had a strong showing in total medals, but fell off in gold medals
    • Top 10 controlled almost 60% of total medals
    Country | Total Medals | % of Total Medals | Running Total | Rank
    United States | 121 | 12.4% | 12.4% | 1
    China | 70 | 7.2% | 19.6% | 2
    United Kingdom (Great Britain) | 67 | 6.9% | 26.5% | 3
    Russian Federation | 56 | 5.7% | 32.2% | 4
    Germany | 42 | 4.3% | 36.6% | 5
    France | 42 | 4.3% | 40.9% | 5
    Japan | 41 | 4.2% | 45.1% | 6
    Australia | 29 | 3.0% | 48.0% | 7
    Italy | 28 | 2.9% | 50.9% | 8
    Canada | 22 | 2.3% | 53.2% | 9
    Korea, South | 21 | 2.2% | 55.3% | 10
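
    For the curious, the share and running-total columns can be reproduced with a few lines of Python. This is a sketch, not the original spreadsheet: the 974 grand total is implied by the table's own shares (121 / 0.1242 ≈ 974), and rank ties are ignored.

```python
# Rebuild the share and running-total columns from the raw medal counts.
medal_counts = [("United States", 121), ("China", 70),
                ("United Kingdom (Great Britain)", 67), ("Russian Federation", 56),
                ("Germany", 42), ("France", 42), ("Japan", 41), ("Australia", 29),
                ("Italy", 28), ("Canada", 22), ("Korea, South", 21)]
total_medals = 974   # implied by the table's own shares (121 / 0.1242 = ~974)

running = 0.0
for rank, (country, medals) in enumerate(medal_counts, start=1):
    share = medals / total_medals
    running += share
    print(f"{rank:2d}. {country:30s} {medals:3d}  {share:5.1%}  cumulative {running:5.1%}")
```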

    Fun Fact: If Texas were a country, it would rank 8th for # of medals in the 2016 Summer Olympics.



    Total Gold

    • Brazil and Argentina won many golds, but few others
    • Top 10 controlled 70% of total golds
    Country | Gold | % of Gold Medals | Running Total | Rank
    United States | 46 | 15.0% | 15.0% | 1
    United Kingdom (Great Britain) | 27 | 8.8% | 23.8% | 2
    China | 26 | 8.5% | 32.2% | 3
    Russian Federation | 19 | 6.2% | 38.4% | 4
    Germany | 17 | 5.5% | 44.0% | 5
    Japan | 12 | 3.9% | 47.9% | 6
    France | 10 | 3.3% | 51.1% | 7
    Korea, South | 9 | 2.9% | 54.1% | 8
    Netherlands | 8 | 2.6% | 56.7% | 9
    Australia | 8 | 2.6% | 59.3% | 9
    Hungary | 8 | 2.6% | 61.9% | 9
    Italy | 8 | 2.6% | 64.5% | 9
    Brazil | 7 | 2.3% | 66.8% | 10
    Spain | 7 | 2.3% | 69.1% | 10

    Percentage of Medal Type by Country

    • Six countries won nothing but Gold
    • Fiji and Argentina dominated in Golds as a % of total medals
    Country | Gold | Silver | Bronze | Total Medals
    Puerto Rico* | 100% | 0% | 0% | 1
    Singapore | 100% | 0% | 0% | 1
    Tajikistan | 100% | 0% | 0% | 1
    Kosovo | 100% | 0% | 0% | 1
    Jordan | 100% | 0% | 0% | 1
    Fiji | 100% | 0% | 0% | 1
    Argentina | 75% | 25% | 0% | 4
    Jamaica | 55% | 27% | 18% | 11
    Hungary | 53% | 20% | 27% | 15
    Croatia | 50% | 30% | 20% | 10
    Greece | 50% | 17% | 33% | 6
    Slovakia | 50% | 50% | 0% | 4
    Bahrain | 50% | 50% | 0% | 2
    Vietnam | 50% | 50% | 0% | 2
    Independent Olympic Athletes | 50% | 0% | 50% | 2
    Cote d'Ivoire | 50% | 0% | 50% | 2
    The Bahamas | 50% | 0% | 50% | 2

    Fun Fact: Swimming was added as an event in 1896 (freestyle); backstroke was added in 1904.



    Michael Phelps

    • Ranks 32nd among the 205 currently competing countries in total medals won
    • 28 total medals – 23 gold, 3 silver, 2 bronze
    • 13 individual medals puts him ahead of Leonidas of Rhodes – a sprinter from 152 BC
    • 50 miles swam per week in prep for 2008 Olympics; 12,000 calories consumed each day
    • If Katie Ledecky maintained her current medal pace, she’d be 39 before she tied Phelps
    • He hasn’t won bronze since 2004

     

    Popularity of Events

    Swimming, Track and Field, Gymnastics, and Soccer are the most popular sports for people to watch. 538 did an interesting comparison in the 2012 Olympics to come up with a medal multiplier based on number of events vs. number of viewers. The US, China, and Russia dominate on an adjusted medal count.

     

    See the chart below (again based on London 2012). Sailing, for instance, has a lot of events but not much viewership, so it gets a reduction. Soccer, however, has only a few events but a large number of viewers, so its multiplier is very high. A toy version of this multiplier appears after the chart.

    538 Medal Multiplier
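
    538's exact weighting isn't reproduced here, but the basic idea can be sketched as scaling each sport's medals by the ratio of its viewership share to its event share. The sports and numbers below are made-up placeholders, not 538's data:

```python
# Toy "adjusted medal count": weight each sport's medals by how much viewership
# it draws relative to how many events it offers. Placeholder figures only.
sports = {
    # sport: (number_of_events, share_of_viewership, medals_won)
    "soccer":   (2, 0.20, 1),
    "sailing":  (10, 0.02, 4),
    "swimming": (34, 0.30, 33),
}

total_events = sum(events for events, _, _ in sports.values())

adjusted_total = 0.0
for sport, (events, view_share, medals) in sports.items():
    event_share = events / total_events
    multiplier = view_share / event_share      # >1 for popular, event-light sports
    adjusted_total += medals * multiplier
    print(f"{sport:9s} multiplier = {multiplier:.2f}")

print(f"Adjusted medal count: {adjusted_total:.1f}")
```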

    Growth – Interest in the Olympics, number of events, number of competitors, and costs are all going up. On a per capita basis, it hasn’t been this hard to win a medal since 1896.

    E007-538-gold-per-capita

    Sources

    1. http://www.penn.museum/sites/olympics/olympicorigins.shtml
    2. http://www.npr.org/sections/thetorch/2016/08/11/487838010/what-team-usa-looks-like-a-by-the-numbers-look-at-america-s-olympic-athletes?utm_medium=RSS&utm_campaign=news
    3. https://github.com/flother/rio2016
    4. http://www.npr.org/sections/thetorch/2016/08/14/489832779/if-michael-phelps-were-a-country-where-would-his-gold-medal-tally-rank
    5. http://www.foxsports.com/olympics/gallery/28-incredible-facts-about-michael-phelps-28-olympic-medals-23-golds-count-how-many-081316
    6. http://www.worldometers.info/world-population/population-by-country/
    7. http://olympstats.com/
    8. http://www.topendsports.com/events/summer/oldest-youngest.htm
    9. http://fivethirtyeight.com/features/winning-an-olympic-gold-medal-hasnt-been-this-difficult-since-1896/
    10. http://fivethirtyeight.com/features/which-countries-medal-in-the-sports-that-people-care-about/
    11. http://fivethirtyeight.com/features/hosting-the-olympics-is-a-terrible-investment/
    12. https://arxiv.org/ftp/arxiv/papers/1607/1607.04484.pdf
    13. http://www.chron.com/olympics/article/Where-Texas-would-rank-in-Olympic-medal-count-if-9176024.php
    14. http://www.tradingeconomics.com/brazil/wages
    15. https://www.google.com/?ion=1&espv=2#q=brl%20to%20usd
    16. http://www.totalsportek.com/news/olympic-gold-medal-prize-money/
    17. http://edition.cnn.com/2016/08/19/sport/olympic-rewards-by-country/
    18. https://www.olympic.org/swimming-equipment-and-history
    19. https://en.wikipedia.org/wiki/Athletics_at_the_2016_Summer_Olympics_%E2%80%93_Qualification#Qualifying_standards
    006 For the Love of Cheesecake – For the Love of Data https://www.fortheloveofdata.com/006-for-the-love-of-cheesecake/?utm_source=rss&utm_medium=rss&utm_campaign=006-for-the-love-of-cheesecake Sat, 30 Jul 2016 03:32:07 +0000

    National Cheesecake Day!

    July 30, 2016 is National Cheesecake Day, a likely commercially driven holiday to which I, for one, am happy to fall victim. Adam’s PB Cup Fudge Ripple is one of my favorites and also one of the worst (go figure).

     

    History (#4)

    • Believed to have originated around 2,000 BC in Greece
    • Was served to athletes in first Olympic games as a source of energy
    • The original recipe, documented in 230 AD, was mashed cheese, honey, and flour heated into a mass
    • Around the 18th century, a more modern recipe emerged

     

    Facts

    • Sonya Thomas holds the record for eating 11 pounds of cheesecake in 9 min. (9/26/2004). (#5)
    • The largest cheesecake weighed 6,900 pounds and was made in Lowville, NY on 9/21/2013. The cake measured 2.292 m (7 ft 6.25 in) in diameter and stood 0.787 m (2 ft 7 in) tall. (#6)

    Cheesecake Factory

    The Cheesecake Factory began using IBM big data analytics back in 2013 to analyze consumption and ingredients across all of its locations (#1). The company also had $2.1 billion in revenue in 2015 (#3).

    Cheesecake Factory Nutrition Info (#2)

     

    Sources:

    1. https://www-03.ibm.com/press/us/en/pressrelease/40436.wss
    2. http://www.cheesecakefactorynutrition.com/restaurant-nutrition-chart.php
    3. http://www.statista.com/statistics/321517/revenue-of-the-cheesecake-factory/
    4. http://www.cheesecake.com/History-Of-Cheesecake.asp
    5. http://www.majorleagueeating.com/records.php
    6. http://www.guinnessworldrecords.com/world-records/largest-cheesecake
    005 For the Love of Fireworks – For the Love of Data https://www.fortheloveofdata.com/005-for-the-love-of-fireworks/?utm_source=rss&utm_medium=rss&utm_campaign=005-for-the-love-of-fireworks Tue, 28 Jun 2016 13:11:31 +0000

    News:
    • Big Data falls off the hype cycle: http://www.datasciencecentral.com/profiles/blogs/big-data-falls-off-the-hype-cycle
    • Tableau 10 in beta: http://www.tableau.com/about/blog/2016/4/10-reasons-join-tableau-10-beta-53165 and http://www.tableau.com/coming-soon
    Fireworks!
    NOTE: Overall, statistics are hard to reconcile across sites and even across different reports from the same groups.
     
    2015 Consumption Statistics:
    • Consumption: 260.7 Million lbs. (Consumer), 24.6 million lbs. (Display) (#5 APA)
      • *** The consumer weight of fireworks used is roughly equivalent to the weight of the entire population of Hawaii! ***
    • Revenue: $755 million (Consumer), $340 million (Display) (#5 APA). Focusing on the consumer spending:
      • This is over 100x more than the revenue Katy Perry would have generated with the 7 million U.S. sales of her song “Firework” on iTunes.
      • This is more than all the money we spent at In ‘N Out Burger in 2015.
      • If one person spent the same amount on Roman Candles, it would take them over 1,000 years to use the amount of fireworks we purchase in a year.
     E005_fireworks_table
     
    Are fireworks Dangerous?
    • 67% of fireworks injuries occur around July 4th
    E005_cpsc_injuriesbyage
    • Injuries by Age: This graph makes it seem like young adults are most commonly injured, but when you regroup the data into 20-year bands, the picture changes (a quick regrouping sketch in code follows this section):
      • 0-19 = 47%
      • 5-24 = 49%
      • 10-24 (smaller than 20yr band) = 32%
      • 25-44 = 34%
    • 12,000 fireworks injuries (CPSC) out of 31 million injuries = .04% of injuries are fireworks (#7 APA)
    • 11 Deaths (#9 CPSC)
    E005_apa_consumpinjuries – #8 APA
    • Usage is growing but injuries are falling, according to the American Pyrotechnics Association
      • Injuries per pound consumed are falling as consumption goes up, and injuries per capita are falling as the population grows; in absolute terms, injuries have stayed relatively constant
    E005_consumpinjuries_total
    • APA also contends that fireworks injuries are a small minority of total injuries to kids
    E005_apa_injury_piechart – #10 APA
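
    Here is the quick regrouping sketch promised above. The per-age-group shares are placeholders chosen so the bands reproduce the percentages quoted in the episode; they are not the actual CPSC figures.

```python
# Regroup per-age-group injury shares into wider bands to show how the choice of
# band changes the picture. The per-group values below are placeholders picked to
# reproduce the episode's band totals; they are not the actual CPSC figures.
injury_share_by_age_group = {
    (0, 4): 0.10, (5, 9): 0.12, (10, 14): 0.13, (15, 19): 0.12,
    (20, 24): 0.12, (25, 44): 0.34, (45, 64): 0.05, (65, 99): 0.02,
}

def band_share(lo, hi):
    """Sum the shares of every age group that falls entirely inside [lo, hi]."""
    return sum(share for (a, b), share in injury_share_by_age_group.items()
               if a >= lo and b <= hi)

for lo, hi in [(0, 19), (5, 24), (25, 44)]:
    print(f"Ages {lo}-{hi}: {band_share(lo, hi):.0%} of injuries")
```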
     
    Only three states (DE, MA, NJ) ban fireworks (#8 APA)
     
    Links:
    1. Consumer Products Safety Commission (CPSC) Fireworks Infographic –  http://www.cpsc.gov/PageFiles/150398/Fireworks-Infographic-2015-web.pdf?epslanguage=en
    2. National Fire Protection Agency –  http://www.nfpa.org/public-education/by-topic/outdoors-and-seasonal/fireworks/reports-and-statistics-about-fireworks
    3. Washington State Patrol –  http://www.wsp.wa.gov/fire/statistics.htm
    4. Statistics Brain (Various Sources) –  http://www.statisticbrain.com/firework-statistics/
    5. American Pyrotechnics Association – http://www.americanpyro.com/industry-facts-figures
    6. 2015 US Population –  http://www.usnews.com/opinion/blogs/robert-schlesinger/2014/12/31/us-population-2015-320-million-and-world-population-72-billion
    7. Fireworks injuries in perspective –  http://www.americanpyro.com/assets/docs/FactsandFigures/fireworks%20injuries%20perspecitive.2016.pdf
    8. Fireworks liberalization-  http://www.americanpyro.com/assets/docs/FactsandFigures/consumpvinjuriesliberalizationgraph%201980-2010.pdf
    9. CPSC 2014 Fireworks Report –  http://www.cpsc.gov/en/Media/Documents/Research–Statistics/Injury-Statistics/Fuel-Lighters-and-Fireworks/2014-Fireworks-Annual-Report/?utm_source=rss&utm_medium=rss&utm_campaign=Fuel%2c+Lighters+and+Fireworks+Injury+Statistics
    10. APA Injuries to Children –  http://www.americanpyro.com/assets/docs/FactsandFigures/injuries%20to%20children%20ages%205-18%202016.pdf
    11. US Income –  http://www.deptofnumbers.com/income/us/
    12. NFPA Fireworks Info Sheet – http://www.nfpa.org/~/media/files/research/fact-sheets/fireworksfactsheet.pdf?la=en
    13. APA Fireworks Injures vs. Consumption – http://www.americanpyro.com/assets/docs/FactsandFigures/fireworks%20related%20injuries%20rtable%201976%20-2015.pdf
    14. Katy Perry Firework Wikipedia – https://en.wikipedia.org/wiki/Firework_(song)
    15. In ‘N Out Burger Sales – http://nrn.com/top-100/2015-top-100-restaurant-chain-countdown#slide-43-field_images-136081
    004 The History of Hadoop – For the Love of Data https://www.fortheloveofdata.com/004-history-of-hadoop/?utm_source=rss&utm_medium=rss&utm_campaign=004-history-of-hadoop Wed, 25 May 2016 03:17:54 +0000

    Let me set the stage for you…

    It’s 2003: Chicago just won the Oscar for Best Picture and Grand Theft Auto: Vice City is the top selling video game. Apple iPods still have scroll wheels and iTunes just started selling music for the first time. From a tech standpoint, Windows XP is all the rage as the latest Windows OS, and folks with a lot of money to spend are buying PCs with a Pentium 4 3.0 GHz processor, 512MB of RAM (or maybe up to 2GB max), and an 80GB hard drive. Oracle just released version 10g and Microsoft proponents are still using SQL Server 2000. Internet Explorer 6 dominates the browser wars with about 85% market share, and two-thirds of the US still connect to the internet with a modem.

    (Stats from various Google searches, CNET desktop reviews, and http://www.internetworldstats.com/articles/art030.htm)

    In the years leading up to Hadoop’s inception, Doug Cutting, the first node in the Hadoop cluster, had been working on Lucene, a full text search library, and then began work on indexing web pages with University of Washington graduate student Mike Cafarella. The project was called Apache Nutch, and it was a sub-project of Lucene. They made good progress getting Nutch to work on a single machine, but they reached the processing limits of that one machine and began manually clustering four machines together. The duo started to spend the majority of their time figuring out a way to scale the infrastructure layer for better indexing. In October 2003, Google released their Google File System paper. This paper did not describe exactly what Google did to implement their solution, but it was an excellent blueprint for what Cutting and Cafarella wanted to do. They spent most of the next year (2004) working on their implementation and labeled it the Nutch Distributed File System (NDFS). In this implementation, they made a key decision to replicate each chunk of data on multiple nodes, typically three, for redundancy.
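
    As a rough illustration of that replication decision (not the actual NDFS/HDFS placement code), each chunk of a file can be assigned to several distinct nodes:

```python
# Toy illustration of the replication decision: every chunk of a file is kept on
# several distinct nodes (typically three), so losing one machine loses no data.
# This is a sketch, not the actual NDFS/HDFS placement logic.
import random

def place_chunks(file_size_mb, chunk_size_mb, nodes, replicas=3):
    """Return a mapping of chunk index -> the nodes holding a copy of that chunk."""
    n_chunks = -(-file_size_mb // chunk_size_mb)   # ceiling division
    return {i: random.sample(nodes, replicas) for i in range(n_chunks)}

cluster = [f"node-{i}" for i in range(1, 5)]       # the early four-machine cluster
placement = place_chunks(file_size_mb=256, chunk_size_mb=64, nodes=cluster)
for chunk, holders in placement.items():
    print(f"chunk {chunk}: {holders}")
```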

     After solving for infrastructure redundancy, the team set their sights on improving the computational side and taking advantage of the stable fabric of nodes. Google again provided a spark of inspiration with their MapReduce research paper. The approach provided parallelization, distribution, and fault tolerance; all of these work in conjunction to work through tasks quickly, regardless of hardware failures that might occur along the way.
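
    The canonical way to picture MapReduce is a word count: map each input split to (word, 1) pairs, group by word, and reduce each group by summing. A tiny, single-machine sketch:

```python
# Minimal word-count sketch of the MapReduce pattern: map each input split to
# (word, 1) pairs, group by word, then reduce each group by summing the counts.
from collections import defaultdict
from itertools import chain

def map_split(text):
    return [(word.lower(), 1) for word in text.split()]

def reduce_counts(pairs):
    grouped = defaultdict(int)
    for word, count in pairs:       # the "shuffle": group pairs by key
        grouped[word] += count      # the "reduce": sum each group
    return dict(grouped)

splits = ["the quick brown fox", "the lazy dog", "the fox jumps"]
mapped = chain.from_iterable(map_split(s) for s in splits)   # map phase
print(reduce_counts(mapped))
```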

    In 2006, Cutting went to work for Yahoo, and the storage and compute components of Lucene separated into a sub-project called Hadoop. The name originated from a toy yellow elephant that belonged to Cutting’s son. In April, Hadoop 0.1.0 was released, and it sorted almost 2TB of data in 48 hours. By April of 2007, Yahoo was running two Hadoop clusters of 1,000 machines, and other companies like Facebook and LinkedIn started to use the tool.

    By 2008, Hadoop hit critical mass along several fronts. Yahoo transitioned the search index that drove their website over to Hadoop and contributed Pig to the Apache Software Foundation. Facebook also contributed Hive, bringing SQL atop Hadoop. The product also spawned commercial legs when Cloudera was founded; Cutting joined their ranks the following year.

    In 2011, Hortonworks spun off from Yahoo, and the following year Yahoo’s Hadoop cluster reached 42,000 nodes. Also in 2012, Hadoop contributors began to replace MapReduce with YARN, an offshoot of MapReduce’s resource management and scheduling components. Late in the year, Apache Hadoop 1.0 became generally available. In 2013, Yahoo began running YARN in production and Hadoop 2.2 debuted.

    Fast forward to today and several vast ecosystems exist around Hadoop within different prepackaged distributions. The most popular of these are Cloudera, Hortonworks, and MapR. Below is a snapshot of Hortonworks’ and Cloudera’s packaged components:

    Hortonworks:
    hortonworks
    Cloudera:
     cloudera
    Sources:
    003 The Data of Taxes – For the Love of Data https://www.fortheloveofdata.com/003-the-data-of-taxes/?utm_source=rss&utm_medium=rss&utm_campaign=003-the-data-of-taxes Thu, 31 Mar 2016 22:06:14 +0000

    Huge thanks to @Deepak90Mittal for hanging out with me on this episode!

    News Prologue:

    1. Gartner’s Magic Quadrant for BI is out – overhauled methodology, Oracle is out; Tableau and Microsoft (PowerBI) reign supreme!
    2. SQL Server on Linux! = millions of geeks rejoice and it may spell the end of Windows in the data center.
    3. Excel is the most popular DataViz tool by a longshot, followed by Python, D3, and Tableau

    A) 2016 State Comparison:

    1. What you should drink Where – comparison of per gallon taxes on beer, wine and spirits converted to a per drink equivalency
      1. Beer – Missouri
      2. Wine – Louisiana
      3. Liquor – Missouri
      4. Best overall: Missouri, Wisconsin, California, Texas

    2. Tax Freedom Day (a.k.a., Working for the Man) –
      Interesting way to look at taxes – how long you have to work to cover federal, state, and local taxes for the year.
      E003-TaxFreedomDay
    3. Gas Prices – Texas is pretty low (#42) on gas taxes! PA is #1; NY is #3; CA is #5

    Gas tax rates 2016: http://taxfoundation.org/blog/state-gasoline-tax-rates-2016
    E003-GasTaxMap

    B) Federal Income Tax Stats

    A rough calculation of the rate at which individual tax returns are filed within the US (a quick sketch of the arithmetic follows the table):

    Start of Year: 1/1/2016
    Filing Date: 4/18/2016
    Days Elapsed: 108
    Total Est. Returns (using 2013 count): 138,313,155
    Total filed per day*: 1,280,677
    Total filed per hour*: 53,362
    Total filed per minute*: 889
    Total filed per second*: 15
    * All calculations rounded to the nearest whole number
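
    Here is the promised sketch of that arithmetic; the return count and filing window come from the figures above, and the rest is plain division and rounding:

```python
# Back-of-the-envelope filing-rate math using the figures above.
from datetime import date

total_returns = 138_313_155                                   # 2013 return count
days_elapsed = (date(2016, 4, 18) - date(2016, 1, 1)).days    # 108 days

per_day = total_returns / days_elapsed
per_hour = per_day / 24
per_minute = per_hour / 60
per_second = per_minute / 60

for label, value in [("day", per_day), ("hour", per_hour),
                     ("minute", per_minute), ("second", per_second)]:
    print(f"Returns filed per {label}: {round(value):,}")
```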

    Key Findings from the report (mostly using data from 2013):

    • In 2012, the top 50 percent of all taxpayers (69.2 million filers) paid 97.2 percent of all income taxes while the bottom 50 percent paid the remaining 2.8 percent.
    • The top 1 percent (1.3 million filers) paid a greater share of income taxes (37.8 percent) than the bottom 90 percent (124.5 million filers) combined (30.2 percent).
    • The top 1 percent of taxpayers paid a higher effective income tax rate than any other group, at 27.1 percent, which is over 8 times higher than taxpayers in the bottom 50 percent (3.3 percent).
    002 What Hot Models Look Like – For the Love of Data https://www.fortheloveofdata.com/002-what-hot-models-look-like/?utm_source=rss&utm_medium=rss&utm_campaign=002-what-hot-models-look-like Mon, 29 Feb 2016 07:04:07 +0000

    Summary:
    Hot models…data models that is. A survey of many of the most popular data modeling approaches in the news today. Third Normal Form, Anchor Modeling, Data Vault, Data Lakes, Data Swamps. What do they do well, what do they do badly, and which is the one true data model to rule them all? (Hint: it depends, as usual.)
    Third Normal Form (3NF) (a.k.a. Naomi Sims)
    History: E.F. Codd defined 3NF in 1971 while working at IBM.
    Basic Concept:
    “The Key, the Whole Key, and Nothing but the Key” -Bill Kent
    The gold standard for purist relational database design. A table is in 3NF if it has the following characteristics (a toy decomposition example follows the list):
    1. 1NF – a) Values in a particular field must be atomic and b) a single row cannot have repeating groups of attributes
    2. 2NF – in addition to being in 1NF, all non-key attributes of the table depend on the primary key
    3. 3NF – in addition to being in 2NF, there are no transitive functional dependencies (non-key attributes do not depend on other non-key attributes)
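
    As promised above, a toy illustration of the decomposition (all table and column names here are made up): the customer's city depends on the customer, not on the order, so 3NF splits the flat rows into an orders table and a customers table.

```python
# Hypothetical example: customer_city depends on customer_id, not on the order
# key, so 3NF splits the flat rows into an orders table and a customers table.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_city": "Dallas", "total": 40.0},
    {"order_id": 2, "customer_id": 10, "customer_city": "Dallas", "total": 15.5},
    {"order_id": 3, "customer_id": 11, "customer_city": "Plano",  "total": 22.0},
]

# After decomposition, facts about the customer live once, keyed by customer_id.
customers = {row["customer_id"]: {"customer_city": row["customer_city"]}
             for row in flat_orders}
orders = [{"order_id": r["order_id"], "customer_id": r["customer_id"],
           "total": r["total"]} for r in flat_orders]

print(customers)
print(orders)
```
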
    Pros:
    • A battle-tested, well-understood modeling approach that is extremely useful for transactional (OLTP) applications
    • Easy to insert, update, delete data because of referential integrity
    • Avoids redundancy, requiring less space and fewer points of contact for data changes
    • Many software tools exist to automatically create, reverse engineer, and analyze databases according to 3NF
    • Writing to a 3NF DB is very efficient
    Cons:
    • Reading from a DB in 3NF is not as efficient
    • Not as easily accessed by end-users because of the increased number of joins
    • More difficult to produce analytics (trends, period-to-date aggregations, etc.)
    • Many times even transactional systems are slightly de-normalized from 3NF for performance or audit-ability
    • Some people feel that 3NF is no longer as appropriate in an era of cheap storage, incredibly fast computing, and APIs
     
    3nf
    Source: ewebarchitecture.com
     
    Anchor Modeling (incorporates Sixth Normal Form [6NF]) (a.k.a. Gisele Bundchen)
    History: Created in 2004 in Sweden
    Basic Concepts: Mimics a temporal database (a toy sketch of the four constructs follows the list below)
    • anchors – entities or events
      • Example: A person
    • attributes – properties of anchors
      • Example: A person’s name; attributes can be historized, such as a favorite color that changes over time
    • ties – relationships between anchors
      • Example: Siblings
    • knots – shared properties, such as states or reference tables – combination of an anchor and a single attribute (no history)
      • Example: Gender – only male/female
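
    The toy sketch mentioned above; these class and field names are illustrative only, not part of any anchor-modeling tooling:

```python
# Toy, hypothetical representation of the four anchor-modeling constructs.
from dataclasses import dataclass
from datetime import date

@dataclass
class Anchor:          # an entity or event, e.g. a person
    anchor_id: int

@dataclass
class Attribute:       # a (possibly historized) property of an anchor
    anchor_id: int
    value: str
    valid_from: date

@dataclass
class Tie:             # a relationship between anchors, e.g. siblings
    anchor_ids: tuple

@dataclass
class Knot:            # a small shared reference value with no history
    knot_id: int
    value: str

person = Anchor(anchor_id=1)
name = Attribute(anchor_id=1, value="Robert", valid_from=date(2016, 1, 1))
gender = Knot(knot_id=1, value="male")
siblings = Tie(anchor_ids=(1, 2))
print(person, name, gender, siblings, sep="\n")
```
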
    Pros:
    • Incremental change approach – previous versions of a schema are always encompassed in new changes, so backwards compatibility is always preserved
    • Reduced storage requirements by using knots
    Cons:
    • Many entities are created in the database
    • Joins become very complex; hard for end user to understand model
    • Daunting for new technical resources to come up to speed initially
     
    anchor
    Source: bifuture.blogspot.com
     
     
    Data Vault (DV) (a.k.a. Heidi Klum)
    History: Dan Linstedt started implementing data vaults in 1990 and published the first version of the methodology (DV1.0) in 2000. He published an updated version (DV2.0) in 2013. The methodology is proprietary: Dan maintains a copyright on it and requires anyone who trains others to be Data Vault certified. You can still implement data vaults; you just cannot train others on it without being certified.
    Basic Concept:
    “A single version of the facts (not a single version of the truth)”
    “All the data, all the time” – Dan Linstedt
    The data vault consists of three primary structures and supporting structures such as reference tables and point-in-time bridge tables. The three main structures are:
    1. Hubs – a list of unique business keys that change infrequently with no other descriptive attributes (except for meta data about load times and data sources). A good example of this is a car or a driver.
    2. Links – relationships or transactions between hubs. These only define the link between entities and can easily support many-to-many relationships; again no descriptive attributes on these tables other than a few meta-attributes. An example of this would be a link between cars and their drivers.
    3. Satellites – Satellites may attach to hubs or links and are descriptive attributes about the entity to which they connect. A satellite for a car hub could describe the year, make, model, current value, etc. These often have some sort of effective dating.
    General best practices:
    • Separate attributes from different source systems into their own satellites, at least in a raw data vault. Using this approach it may be common to have a raw data vault that contains source system specific information with all history and attributes maintained and a second downstream business data vault. The business data vault will contain only the relevant attributes, history, or merged data sets that have meaning to the users of that vault.
      • Having a raw vault allows you to preserve all historical data and rebuild the business vault if needs change, without having to go back to source systems and without losing data that is no longer available in the source system.
    • Track all changes to all elements so that your data vault contains a complete history of all changes.
    • Start small with a few sources and grow over time. You don’t have to adopt a big bang approach and you can derive value quickly.
    • It is acceptable to add new satellites when changes occur in the source system. This allows you to iteratively develop your ETL without breaking previous ETL routines already created and tested.
    DV2.0 – DV1.0 was merely the model. DV2.0 is:
    • An updated modeling approach. Key changes include:
      • Numeric IDs are replaced with hash values, created in the staging area, that support better integration with NoSQL repositories
      • Because hashes are used, you can parallelize data loads even further: there is no surrogate ID lookup as long as you have the business key to hash when data arrives. This means you can load hubs, links, and satellites at the same time in some cases (see the sketch after this list)
      • Referential integrity is disabled during loading
    • Recommended architectures around staging areas, marts, virtualization, and NoSQL
    • Additional methodology recommendations around Agile, Six Sigma, CMMI, TQM, etc.
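
    The sketch referenced above: hashing business keys in staging so hub, link, and satellite rows can be built independently. The column layout and the MD5 choice are illustrative; consult the DV2.0 material for the actual standards.

```python
# Sketch of hashing business keys in staging so hub, link, and satellite rows can
# be built independently. The column layout here is illustrative, not the spec.
import hashlib
from datetime import datetime, timezone

def hash_key(*business_keys):
    """Deterministic hash of one or more business keys (MD5 is a common choice)."""
    normalized = "||".join(str(k).strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()

load_ts = datetime.now(timezone.utc).isoformat()
car_vin, driver_license = "1FTFW1ET1EKE12345", "TX-9988776"     # made-up keys

hub_car = {"car_hk": hash_key(car_vin), "vin": car_vin,
           "load_ts": load_ts, "record_source": "dmv_feed"}
hub_driver = {"driver_hk": hash_key(driver_license), "license": driver_license,
              "load_ts": load_ts, "record_source": "dmv_feed"}
link_car_driver = {"car_driver_hk": hash_key(car_vin, driver_license),
                   "car_hk": hub_car["car_hk"], "driver_hk": hub_driver["driver_hk"],
                   "load_ts": load_ts, "record_source": "dmv_feed"}
sat_car = {"car_hk": hub_car["car_hk"], "make": "Ford", "model": "F-150",
           "year": 2014, "load_ts": load_ts, "record_source": "dmv_feed"}

print(link_car_driver)
```
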
    Pros:
    • Preserves all data, all the time – this provides the capability for tremendous analysis and responding to changing business needs. The approach allows you to obtain data from multiple sources iteratively and rapidly, preserving backwards compatibility
    • Works extremely well with massively parallel processing (MPP) databases and hardware
    • Can be loaded extremely rapidly, particularly using the DV2.0 modeling approach
    • Lends itself very well to ETL and DW automation/virtualization
    • DV2.0 covers a wide spectrum of modeling needs from staging and marts to methodology
    Cons:
    • The data model can spawn a lot of tables and make queries very complicated very quickly.
    • The raw data mart is really not meant for end users to query/explore directly
    • Iterative additions make the data model more complicated
    • Although storage may be cheap, keeping all changes for all data in all sources can lead to data sprawl. This also makes a pared down information mart almost a necessity.
    • Raw DV data is not cleansed and data from multiple sources are not blended when being stored
     dv
    Data Lake (DL) (a.k.a. Brooklyn Decker)
    History: Term was coined by Pentaho CTO James Dixon in a blog post in 2010 referring to Pentaho’s data architecture approach to storing data in Hadoop.
    Basic Concept: A massive big data repository, typically built on Hadoop or at least HDFS. Key points (a small schema-on-read sketch follows the list):
    1. Schema-less – data is written to the lake in its raw form without cleansing
    2. Ingests different types of data (relational, event-based, documents, etc.) in batch and/or real-time streaming
    3. Automated meta data management – a best practice is to use tools to automatically catalog meta data to track available attributes, last access times, data lineage, and data quality
    4. Typically multiple products are used to load data into and read data from the lake
    5. Rapid ability to ingest new data sources
    6. Typically only a destination; it is usually not a source from which operational systems will source data
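
    The schema-on-read sketch mentioned above; the records and field names are hypothetical:

```python
# Schema-on-read sketch: raw, heterogeneous JSON events land in the "lake"
# untouched; a schema is only applied when a consumer reads them for a question.
# The records and field names are hypothetical.
import json

raw_events = [                       # as-landed records; fields vary by source
    '{"user": "a", "amount": 12.5, "ts": "2016-07-01"}',
    '{"user": "b", "ts": "2016-07-02"}',
    '{"customer": "c", "amount": "7.25"}',
]

def read_with_schema(lines):
    """Apply one consumer's schema at read time, tolerating missing fields."""
    for line in lines:
        rec = json.loads(line)
        yield {"user": rec.get("user") or rec.get("customer"),
               "amount": float(rec.get("amount", 0.0))}

print(sum(r["amount"] for r in read_with_schema(raw_events)))   # 19.75
```
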
    Pros:
    • Useful when you do not know what attributes will be needed or used.
    • Schema on Read – can ingest any type of data and allow different users to assess value during analysis
    • Extremely large scale at low to moderate cost
    • Can and will use a variety of tools/technologies to analyze/visualize/massage data into a useful form
    Cons:
    • Can be seen as a vast wasteland of disorganized data, particularly without good meta data
    • Consumers must understand raw data in various systems to know how to integrate and cleanse it in order to derive meaningful information
    • High likelihood that different consumers will perform very similar operations to retrieve data (i.e., overlap and duplication of effort). Slight differences between groups can lead to time spent reconciling their results
    • Uncleansed data and multiple versions of the same data may possibly lead to duplication if not handled/filtered carefully
    • It isn’t SQL – Some users will have to use more than just SQL to derive useful information from data
      • Offloading ETL can require significant rework of existing processes to move to something like Hive
    • Using multiple tool sets can lead to training and supportability challenges if not governed properly
    • Data curation can be very challenging
     datalake
     
    Data Swamp (DS) (a.k.a. Tyra Banks)
    History: I’m not including a lot of history here, because this is really an extension of a Data Lake (gone bad).
    Basic Concept: A data swamp is a data lake that has been poorly maintained or documented, lacks meta data, or has so much raw data that you don’t know where to start for insights. Or, it could be a combination of several of those points. When you start tracking tons of data from all different sources, but you don’t know who is using what, how to merge data sets, or how to use most of the data in your “data lake”, you’ve really got a data swamp.
    Pros:
    • Hey, you must’ve done something right to get all that data into the repository…?
    • At least you haven’t lost data that you can’t go back and get.
    • If it were easy, everyone would be doing it 🙂
    Cons:
    • You’ve likely spent a lot of time and effort putting in a data lake/HDFS/Hadoop/Hive/etc. and you’re struggling to operate it at scale or to answer the questions you set out to answer.
    • You need meta data to clue users into what is most useful, relevant, or recent
    • You probably need to look into key use cases (low hanging fruit) and start from that point as a place to begin using/resuscitating your repository.
    *** The assignment of model names to each data model was an incredibly (un)scientific process of googling various terms like “most famous supermodel <year>”, “<year> top supermodel”, etc. and teasing out the most likely #1. Feel free to disagree and let me know your vote and how you obtained it.
    001 The Data of Church – For the Love of Data https://www.fortheloveofdata.com/001-the-data-of-church-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=001-the-data-of-church-for-the-love-of-data Fri, 11 Dec 2015 22:04:48 +0000

    Churches have a wealth of data that other organizations could only dream about: a weekly stream of attendees and donors who also participate in a wide variety of activities around the organization. In this episode, I sit down with Glen Brechner, the Executive Director of Chase Oaks Church in Plano. Chase Oaks is one of the top twenty churches in the DFW metroplex and is in the top 20% of megachurches in the US.

    We discuss how they track member participation and donation information, how they consolidate and align data across multiple campuses, and challenges and opportunities they see with data.

    Some of the tools they use include:

    • Excel (doesn’t everybody!)
    • Mortarstone
    • Shelby
    • Arena

    Please leave a comment about the episode and let me know if you have any questions.

    000 Introducing “For the Love of Data with Robert Furr” (and what it means for you) https://www.fortheloveofdata.com/e0/?utm_source=rss&utm_medium=rss&utm_campaign=e0 Mon, 19 Oct 2015 04:53:23 +0000

    Data, Analytics, Business Intelligence… how do I keep track of what is going on in this ever-expanding technology realm? “For the Love of Data” is a monthly podcast covering data, big data, huge data, tiny data, analytics, and business intelligence trends across the industry. Join the discussion, write a review, or give us your feedback on our site.

    This introductory episode covers the podcast’s format, why I want to do it (because I love data!), and who may benefit from listening.
