Uncategorized – For the Love of Data https://www.fortheloveofdata.com We love data and how it intersects with news, products, technologies, and companies. Listen to our podcast and join the discussion to stay informed on the latest and greatest in the world of BI and analytics. Sun, 26 Mar 2023 02:50:03 +0000 en-US hourly 1 https://i0.wp.com/www.fortheloveofdata.com/wp-content/uploads/2015/10/drawing400x400.png?fit=32%2C32&ssl=1 Uncategorized – For the Love of Data https://www.fortheloveofdata.com 32 32 For the Love of Data is a monthly podcast devoted to all things data from industry news, new products, and cool data visualizations. Host Robert Furr and others hold discussions, interviews, reviews, and arguements to determine where the information technology industry is heading, with an emphasis on Business Intelligence (BI), Information Management (IM), and data analytics. Topics like data science, analytics, strategy, and governance are just a few of the topics on the table. SQL, NoSQL, Tableau, R, Oracle, MySQL, SQL Server... these are just a few of many tools we will noodle on during each episode. Uncategorized – For the Love of Data false episodic Uncategorized – For the Love of Data admin@fortheloveofdata.com podcast Insight on the latest in the world of data, analytics and BI Uncategorized – For the Love of Data https://fortheloveofdata.com/wp-content/uploads/powerpress/drawing3000x3000.jpg https://www.fortheloveofdata.com/feed/podcast/ c9c7bad3-4712-514e-9ebd-d1e208fa1b76 E032 – 2018 State of DevOps Report https://www.fortheloveofdata.com/e32/?utm_source=rss&utm_medium=rss&utm_campaign=e32 Sun, 30 Sep 2018 04:50:10 +0000 http://www.fortheloveofdata.com/?p=386
  • Intro
  • Greg’s Background
  • Intro to DevOps
  • Tools you’ve used
  • Intro to the report & this year vs. previous years
    • Feels more general and high-level than some of the previous reports (no MTTR mentioned for instance)
  • Who took the survey?
    • Surveyed over 30,000 people in 7 years (~4,300 / yr)
    • Technology is overrepresented – 38% of total respondents
    • Energy & Resources was only 2%
    • Tech + FS = 50%
    • Infosec = only 3% of people
    • 29% were dedicated DevOps (14% IT, 15% Dev/Eng)
  • Keywords

    • Data is mentioned 43 times in the report
    • Security = 65x
    • Agile = 7
    • DevOps = 328
    • Top 10 words in Word Cloud: (removed Puppet | State of DevOps footer on each page)
      • 245DevOps
      • 210teams
      • 149Stage
      • 123practices
      • 111can
      • 99team
      • 79organizations
      • 75services
      • 73business
      • 69success
    • No DataOps, no SecOps or DevSecOps
  • C-suite seems out of touch with conditions on the ground
    • Differences in perception – p. 30
    • Sometimes overstate team’s opinion by a factor of 2x
  • Stages in the report:

    • First
      • Stage 0: Build the foundation
    • Second
      • Stage 1: Normalize the technology stack
      • Stage 2: Standardize and reduce variability
      • Stage 3: Expand DevOps practices
    • Third
      • Stage 4: Automate infrastructure delivery
      • Stage 5: Provide self-service capabilities
  • CAMS = Culture, Automation, Measurability, Sharing
  • Principal Industries:
    • Top: Tech, Financial Services, Manufacturing/Industry
    • Bottom: Non-Profit, Energy/Resources, Media
    • Trend: Most to least competition?
  • Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

    ]]>
    Intro Greg’s Background Intro to DevOps Tools you’ve used Intro to the report & this year vs. previous years Feels more general and high-level than some of the previous reports (no MTTR mentioned for instance) Who took the survey? Surveyed over 30, Intro Greg’s Background Intro to DevOps Tools you’ve used Intro to the report & this year vs. previous years Feels more general and high-level than some of the previous reports (no MTTR mentioned for instance) Who took the survey? Surveyed over 30,000 people in 7 years (~4,300 / yr) Technology is overrepresented – 38% of […] Uncategorized – For the Love of Data full false 45:52
    E031 – Data Collaboration with Cursor https://www.fortheloveofdata.com/e31/?utm_source=rss&utm_medium=rss&utm_campaign=e31 Thu, 30 Aug 2018 23:50:23 +0000 http://www.fortheloveofdata.com/?p=378 Learn about Cursor, a new platform for collaboration around data, hosted platforms and BI artifacts. I sat down with Adam Weinstein, CEO and Co-Founder of Cursor, to learn about the platform.

    About Cursor

    Cursor offers a data search and analytics hub that makes disparate data accessible and actionable, enabling technical and business users alike to effortlessly get answers, collaborate and gain insights. Founded by a trio of data leaders from Salesforce, LinkedIn, and Pandora, Cursor’s easy-to-deploy software has been adopted by teams at Apple, Atlassian, Deloitte, Incedo, LinkedIn, NovumRx, and Slack. Cursor is based in San Francisco, CA.

    Cursor Press

    Topics:

    1. What is Adam’s background?
    2. How BI has evolved over the past 10-20 years.
    3. What some of the most pressing challenges are for organizations today?
    4. What should people being doing today, outside of a specific tool, to get better at collaborating?
    5. How can Cursor help with those challenges?
    6. How is content secured on the platform? (separating data from metadata)
    7. Where can people find out more about Cursor?
    8. What’s next for Cursor as far as features or a roadmap?
    9. What are some tools that Adam can’t live without in your daily work?

    Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    ]]>
    Learn about Cursor, a new platform for collaboration around data, hosted platforms and BI artifacts. I sat down with Adam Weinstein, CEO and Co-Founder of Cursor, to learn about the platform. About Cursor Cursor offers a data search and analytics hub t... Learn about Cursor, a new platform for collaboration around data, hosted platforms and BI artifacts. I sat down with Adam Weinstein, CEO and Co-Founder of Cursor, to learn about the platform. About Cursor Cursor offers a data search and analytics hub that makes disparate data accessible and actionable, enabling technical and business users alike to […] Uncategorized – For the Love of Data full false 40:28
    E030 – July 2018 News Roundup https://www.fortheloveofdata.com/e30/?utm_source=rss&utm_medium=rss&utm_campaign=e30 Tue, 31 Jul 2018 01:49:19 +0000 http://www.fortheloveofdata.com/?p=374 July 2018 News Roundup

    This month’s episode is a roundup of news from a variety of sources covering three main topics:

    1. BI / Dataviz Tools
    2. Databases and Platforms
    3. Tools and Frameworks

    Note: Most of the text extracts below are direct quotations from new sources cited in the source list at the bottom of these show notes. This episode is a compilation from those sources.

    BI / Dataviz Tools

    PowerBI enhancements (7/12/18)

    • Microsoft has updated its Power BI analytics service in an effort to expand data prep capabilities and unify data analytics across platforms.
    • “Using the Power Query experience familiar to millions of Power BI Desktop and Excel users, business analysts can ingest, transform, integrate and enrich big data directly in the Power BI web service – including data from a large and growing set of supported on-premises and cloud-based data sources, such as Dynamics 365, Salesforce, Azure SQL Data Warehouse, Excel and SharePoint,” the post reads.
    • Power BI now supports data in Azure Data Lake Storage, and integrates with SQL Server Analysis Services and SQL Server Reporting Services.
    • Microsoft today announced the general availability of Visio Visual for Power BI. Based on the feedback collected from the customers during the preview period, Microsoft has made the following changes to the Visio Visual:
      • Support for Power BI Mobile app
      • The ability to change the diagram link embedded earlier and to copy an embedded link to the clipboard
      • Configurable auto-zoom settings that can be turned on and off
      • Support for complex diagrams using layers
      • Overall performance improvements

    Tableau acquires Empirical Systems

    • Tableau last month announced the acquisition of Empirical Systems, an artificial intelligence (AI) startup with an automated discovery and analysis engine designed to spot influencers, key drivers, and exceptions in data.

    Looker Enhances Data Science Capability with Integration for Google Cloud BigQuery ML

    • With Looker and BQML, data teams can now save time and eliminate unnecessary processes by creating machine learning (ML) models directly in Google BigQuery via Looker – without the need to transfer data into additional ML tools. BQML predictive functionality will also be integrated into new or existing Looker Blocks allowing users to surface predictive measures in dashboards and applications.

    DBs and Platforms

    MemSQL Unveil Significant Update to Database for Real-time Modern Applications and Analytical Systems (Version 6.5 released)

    • Queries are now up to four times faster than the previous MemSQL version (which was already 10x faster than legacy database providers), enabling insights in milliseconds across billions of rows.
    • New automated workload optimization capabilities provide a consistent database response under ultra-high concurrency without the need for manual tuning or specialized DBA resources.
    • Additions to the MemSQL industry-leading “transform-as-you-ingest” capabilities allow customers to use stored procedures for in-database transformations to easily build real-time data pipelines.
    • Resource optimization improvements for multi-tenant deployments deliver greater control and scalability for varied database sizes whether on-premises or in the cloud.

    Hortonworks Data Platform 3.0

    • Even a Hadoop stalwart such as Hortonworks Inc. sees the writing on the wall, which is why, in its recent 3.0 release, it emphasized heterogeneous object storage. The new Hortonworks Data Platform 3.0 supports data storage in all of the major public-cloud object stores, including Amazon S3, Azure Storage Blob, Azure Data Lake, Google Cloud Storage and AWS Elastic MapReduce File System.
    • HDP’s latest storage enhancements include a consistency layer, NameNode enhancements to support scale-out persistence of billions of files with lower storage overhead, and storage-efficiency enhancements such as support for erasure coding across heterogeneous volumes. HDP workloads access non-HDFS cloud storage environments via the Hadoop Compatible File System API.
    • My thoughts: Are Hadoop and HDFS Dying???
    • As we are heading into the fourth industrial revolution, HDP 3.0 is a giant leap for the Big Data ecosystem, with major changes across the stack and expanded eco-system (Deep Learning and 3rd Party Dockerized Apps). HDP 3.0 can be deployed both on-premise and in the major cloud platforms – AWS, Microsoft Azure, and Google Cloud. Many of the HDP 3.0 new features are based on Apache Hadoop 3.1 and include containerization, GPU support, Erasure Coding and Namenode Federation. In order to provide a Trusted Data Lake, we are installing Apache Ranger and Apache Atlas by default with HDP 3.0. In order to streamline the stack, we have removed components such as Apache Falcon, Apache Mahout, Apache Flume, and Apache Hue, and absorbed Apache Slider functionalities into Apache YARN.

    Tools and Frameworks

    Python 3.7.0 is now available

    • Data classes that reduce boilerplate when working with data in classes.
    • A potentially backward-incompatible change involving the handling of exceptions in generators.
    • A “development mode” for the interpreter.
    • Nanosecond-resolution time objects.
    • UTF-8 mode that uses UTF-8 encoding by default in the environment.
    • A new built-in for triggering the debugger.
    • Easier access to debuggers through a new breakpoint() built-in
    • Simple class creation using data classes
    • Customized access to module attributes
    • Improved support for type hinting
    • Higher precision timing functions
    • More importantly, Python 3.7 is fast.
      • Each new release of Python comes with a set of optimizations. In Python 3.7, there are some significant speed-ups, including:
        • There is less overhead in calling many methods in the standard library.
        • Method calls are up to 20% faster in general.
        • The startup time of Python itself is reduced by 10-30%.
        • Importing typing is 7 times faster.
    • You can easily get an idea of how much time the imports in your script takes, using -X importtime:

    Apache OpenNLP 1.9.0 released

    • The Apache OpenNLP team is pleased to announce the release of Apache OpenNLP 1.9.0.
    • The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text.
    • It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
    • Apache OpenNLP 1.9.0 binary and source distributions are available for download from our download page: download page
    • The OpenNLP library is distributed by Maven Central as well. See the Maven Dependency page for more details: Maven Dependency
    • What’s new in Apache OpenNLP 1.9.0
      • This release introduces new features, improvements and bug fixes. Java 1.8 and Maven 3.3.9 are required.
      • Additionally the release contains the following changes:
        • Brat Document Parser should support name type filters
        • Brat format support fails on multi fragment annotations
        • Remove MD5 hashes from Release process
        • Use String[] instead of StringList in LanguageModel API
        • BRAT Annotator service Fails to start
        • Token model creation fails without at least one <SPLIT> tag
        • Update Penn Treebank URL
        • Explain the new format of feature generator XML config
        • Unify code to sum up input context features
        • FeatureGeneratorUtil can recognize Japanese Hiragana and Katakana letters

    TensorFlow 1.9.0

     

    PYPL Language Rankings: Python ranks #1, R at #7 in popularity

    Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

    ]]>
    July 2018 News Roundup This month’s episode is a roundup of news from a variety of sources covering three main topics: BI / Dataviz Tools Databases and Platforms Tools and Frameworks Note: Most of the text extracts below are direct quotations from new ... July 2018 News Roundup This month’s episode is a roundup of news from a variety of sources covering three main topics: BI / Dataviz Tools Databases and Platforms Tools and Frameworks Note: Most of the text extracts below are direct quotations from new sources cited in the source list at the bottom of these show […] Uncategorized – For the Love of Data full false 15:37
    E29 – Is Data The New Oil? https://www.fortheloveofdata.com/e29/?utm_source=rss&utm_medium=rss&utm_campaign=e29 Fri, 29 Jun 2018 11:56:28 +0000 http://www.fortheloveofdata.com/?p=366 Is Data the New Oil?
    • Concept originated by Clive Humby, the British mathematician who established Tesco’s Clubcard loyalty program. Humby highlighted the fact that, although inherently valuable, data needs processing, just as oil needs refining before its true value can be unlocked.
    • Why it is the new oil
      • Valuable commodity
      • Different uses among many applications
      • Currently the big buzz of most large companies (Google, Facebook, Apple, etc.)
      • Quantity is generally better in both
      • AI is the darling of so many industries right now, and it is entirely dependent on data
      • There are ethical concerns with how we source and use this, just like there were and are geopolitical and ethical concerns with how we source and use Oil
      • Certain things cannot function (currently) without oil (passenger airplanes, boats)
        • Same with data: Oil & Gas, Netflix, Agriculture, Manufacturing, Healthcare, a general enabler
    • Why it isn’t the new oil
      • Oil is finite, but data is not
        • Rob’s Counterpoint: There is a shelf life on data that makes it less usable over time
      • Data does not have a standard price benchmark like oil
      • Not a physical asset; can be duplicated or shared relatively easily
      • Oil requires huge amounts of resources to recover and transport
        • Rob’s Counterpoint: building a successful “app” with the scale to generate meaningful data does have some costs, albeit not the scale of oil
      • Data is more useful the more that it is used, whereas oil loses energy the more it is used/processed
        • Rob’s Counterpoint: Oil is not useful by itself to most people; it’s really the product oil becomes or enables that is useful

    The Data of Oil

    • Difference between operating on surface vs. subsea: small tubing error occurs…
      • Surface: 2-3 hours downtime; a few thousand $$ to fix
      • Subsea: 3 months downtime, $40-50mm to fix, not including lost revenue due to  deferred production (ex. 15,000 bpd well * $67/barrel * 90 days = $90.45mm)
    • A good sized offshore platform generates revenue greater than the entire country of Belize ($2.3bn vs. $1.8bn)
    • Of all the oil we can find, we generally only recover 10-20% in a field with current technology
    • 45-50% of oil generated in the US is used for transportation
    • US consumption per day is about 2 ½ gallons of crude oil / day / person
    • The U.S. has 4% of the world’s population but uses 25% of the world’s oil
    • Total daily oil consumption around the world is 84,249,000 barrels/day
    • Top 3 countries by proven oil reserves are: Venezuela, Saudi Arabia, Canada; US is #10
    • Gas is 12,200 Wh/kg vs. Li-Ion at 265 Wh/kg (~46x more energy dense)
    • MTTF (Mean time to failure) – 500 years on some parts – needed to operate in subsea environments for 30 years
    • Area of dinner plate = 10.5”
      • Area = Pi * R^2 * 20,000 PSI
      • Area = ¼ * Pi * D^2 * 20,000 PSI
      • 0.25 * pi * 10.5 * 10.5 * 20,000 = 1.73180295029137E6
      • = 1,731,802 pounds on a single dinner plate (equivalent to ~9 737 Jets)
    • Length Records
      • Analogy: Standing on top of the Empire State Building in NYC and trying to put a straw in a coke can sitting on the sidewalk below
      • Deepest Well (scientific study) = Kola Superdeep Borehole= 40,230 ft.
      • CHAYVO WELL – SAKHALIN-I PROJECT-The current world record holder for longest well; depth of 44,291 feet with a horizontal reach of 39,478 feet
      • DEEPWATER HORIZON – drilled the deepest oil well in history. The well was drilled to 35,050 vertical depth

      Well depth by year
      Well depth by year

    Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

    Is Data the New Oil?

    The Data of Oil Sources:

    Other Data is the New Oil Sources:

    Other Oil Sources:

    ]]>
    Is Data the New Oil? Concept originated by Clive Humby, the British mathematician who established Tesco’s Clubcard loyalty program. Humby highlighted the fact that, although inherently valuable, data needs processing, Is Data the New Oil? Concept originated by Clive Humby, the British mathematician who established Tesco’s Clubcard loyalty program. Humby highlighted the fact that, although inherently valuable, data needs processing, just as oil needs refining before its true value can be unlocked. Why it is the new oil Valuable commodity Different uses among many applications […] Uncategorized – For the Love of Data full false 59:55
    E028 – Bimodal BI and Data Virtualization https://www.fortheloveofdata.com/e28/?utm_source=rss&utm_medium=rss&utm_campaign=e28 Sun, 27 May 2018 04:31:15 +0000 http://www.fortheloveofdata.com/?p=363 Today we’re back with another guest from the Netherlands. I’m not sure what it is about the Dutch, but they’ve been on a roll with some helpful thought leadership when it comes to data. My guest is Rick van der Lans, a highly-respected analyst, consultant, author, and international lecturer specializing in data warehousing, business intelligence, big data, and database technology.

    I came across one of Rick’s whitepapers a few months ago on data virtualization. We got in touch and sat down to talk more in depth about the topic. Rick has a lot of data street cred. For many years, he has served as the chairman of the annual European Enterprise Data and Business Intelligence Conference in London and the annual Data Warehousing and Business Intelligence Summit in The Netherlands. He has written tons of articles, blogs, and several books, including the first book on SQL. There will be links to some of the places and things Rick has written and other info in the show notes below.

    Topics:

    • Rick’s background: author, blogger, consultant – worked on data virtualization (DV) for last 6-7 years
    • How did Rick get interested in DV?
    • Classical data warehouses vs. logical data warehouses
    • What is bi-modal BI? (term introduced by Gartner in 2014)
      • Agile/Self-Service vs. longer, more cautious approach
    • Bi-modal BI vs. the Data Quadrant
    • Comparison of major Data Virtualization Vendors
      • Denodo
      • Tibco DV Manager (bought from Cisco recently)
      • Red Hat
      • Data Virtuality Ultrarep
      • Others (AtScale, Cero, StoneBond, IBM – new entry acquired from Rocket Software)
      • Some are more mature, some are newer (Denodo vs. Tibco = green apples vs. red apples)
    • Companies rolling their own DV (in-memory / views vs. a dedicated tool)
    • DV products are not DB views on steroids
    • Lineage / impact analysis and other features
    • Caching vs. materialization – can store cached data in a virtual table in an intermediary data store. Can be help performance or prevent interference from a transactional source (keeping results consistent for an entire week).
    • How DV can help organizations that are struggling
    • How DV may not be a silver bullet
    • How are different industries embracing these principles?
    • What patterns do you see in companies embracing these principles?
    • What companies should not use this? DV not great at this time on unstructured audio / video, auto-tagging of images
    • Why a classical DWH experienced person may fail at DV
    • What are the warning signs that a DV is going off the rails?
      • Fuzzy logic needed to combine disparate sources
      • Not an integration cureall
      • How you deploy these with projects
    • How to get started? (pick a single, sexy report as a starting point)
    • Where do you go next?  (how to unify other data delivery systems, data marketplaces, API gateways)
    • How to avoid misconceptions about DV (it is slow, only about integration, etc.)
    • How to contact Rick
    • The first book on SQL

    Places to find Rick’s work:

    He has published blogs for the following websites:

    He has written the following books:

    Music:

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

    ]]>
    Today we’re back with another guest from the Netherlands. I’m not sure what it is about the Dutch, but they’ve been on a roll with some helpful thought leadership when it comes to data. My guest is Rick van der Lans, a highly-respected analyst, Today we’re back with another guest from the Netherlands. I’m not sure what it is about the Dutch, but they’ve been on a roll with some helpful thought leadership when it comes to data. My guest is Rick van der Lans, a highly-respected analyst, consultant, author, and international lecturer specializing in data warehousing, business intelligence, big […] Uncategorized – For the Love of Data full false 48:27
    E025 – The Hype of AI – Transcription https://www.fortheloveofdata.com/e025-the-hype-of-ai-transcription/?utm_source=rss&utm_medium=rss&utm_campaign=e025-the-hype-of-ai-transcription Wed, 28 Feb 2018 05:28:25 +0000 http://www.fortheloveofdata.com/?p=337 Today’s episode will be transcribed using machine learning from webASR, a free service provided through the University of Sheffield’s Machine Intelligence for Natural Interfaces (MINI).

    Buyer Beware: This is not a perfect transcription! We highly recommend you listen to the episode.

    Segment [112.73 sec.-137.58 sec.], Speaker 004
    All right everybody i’m here with dorothy’s fitzpatrick e. me energy principle from go he is gonna join me today to talk about a topic that i’m really excited about because it’s something that we hear about in the media lot these days is wearing it are gonna ee i. going to talk about the hyper ve so dorothy’s tell us a little bit about yourself in what your topic is.

    Segment [137.59 sec.-155.82 sec.], Speaker 007
    O. k. robert dah it’s patrick i’m a managing crisp like co based in chicago and leader with the date and alex team here a cool and for those you don’t know where the largest management consulting company that folks is just on the financial services industry.

    Segment [158.11 sec.-274.89 sec.], Speaker 009
    You want to know what is so interesting about artificial intelligence and why i think it’s hyped up a little well um as with any new technologies is a period of understanding what exactly it is and how it can be applied original intelligence has been around for a very long time they call time was coined back in the sixties and what’s interesting about to me is that it’s going to be this this cycle right now there’s hope cycle um i love going from something as can solve all the world’s problems to the realities of hot classically done with it today um there was back in the seventies and eighties a lot of problems round here i am indeed we’re back to the thirties people have been talking about machines taking over their jobs through our nation um and what really struck me recently was so came up to me and said i’ve got to steal set that we need some help with can you just get some out one of your your one of your e. i. machine only i grew them to work out and fix it for us and what really struck me was even within our own industry was in our own company there are people that don’t quite get what’s possible what’s not possible official intelligence so um so that’s that’s why i think it’s interesting topic i will go ahead to find artificial intelligence for you and the way that i see it is that it is and a ticking the function of being able to make decisions um and by decisions i mean anything around pattern recognition or helping to decide whether a credit card should be issued an art or um you know determining whether an egg is bad not and minute that in some way using using machines and and.

    Segment [274.93 sec.-308.48 sec.], Speaker 009
    More so there are two different ways of looking at official intelligence one is that you simply take a decision and codified in some it was rules um or you can go step further and ashley have machine and try and learn the rules itself through neural networks for example machinery and then and come up with the answer as the second case is very interesting because it may come up
    with unexpected patterns and for what for what i think the way that he thinks the world works.

    Segment [309.02 sec.-508.14 sec.], Speaker 011
    Absolutely i’m glad you find it in that context because i know the term i means many things to many people in for some people the scenes of really starts to pop up when we start seeing things like g. i. and so for the purposes of this discussion we’re not talking about eighteen eighty four he generally i like some philosophers talk about i think were working through any more to a business context is that a fair assessment i think we talk about the hype around us um that is that is true the recurrence why i was estimated that around twenty twenty five will be something called a singularity which is when human and machine intelligence will come together and this is an important topic of people don’t all fun talk about one went on with the realities of a. i. it’s not in human versus machine context it’s in an augmentation or current context so around twenty three five that yes by the the combined intelligence of machines and humans will explode and and indeed she’s me take over humans and in terms of pottery able to do but today were you yeah we’re talking about more realistic things within certain colours are confined conditions such as a game of chess one anne and deep blue was able to gary casper off chest and more recently being able to at google’s arthur go and beat up the game that is the hardest game for humans to play apparently more recently and then even more recently than that i believe that the reason and he either learns the game of chess from scratch and was still able to beat the best chess pastures out there either human are machine in record time so um this is where the grey area is when i’m talking about tara but we are talking about some very smart machines right and i think that one of the differences here is specialised e. iris is generally so if you think i’ve got a calculator or like you said the machines it had been trained to play specific games like chess or go in i know it will has recently built a machine that they were able to teach it boat butchers and go so that’s a little bit of and all that mean but it’s sir it’s a far cry from the intuition that human possesses today it’s a tell me a little bit about why you picked this topic the hype of i at this particular point in time against it’s a case of um we’re hearing a lot about it in the news and there’s a lot of excitement particularly from the scientist side of things all possibilities but also a lot of fear as well what people think about their jobs and have seen numbers about how many jobs may be automated over the next twenty years there’s a lot of concern about that so one of the things that we need to do we talking to our clients is reset that and say yes you will probably see automated trucks on the road and five years but no it’s not likely that medicine is going to be on was the next twenty years so those sort of things make it for nick this a very interesting time and so that’s why i picked up.

    Segment [508.49 sec.-522.32 sec.], Speaker 004
    And you mentioned you really really you feel like the concerns about the piece of learning are hybrid now will become a reality if that will be thirty five isn’t it.

    Segment [524.28 sec.-529.86 sec.], Speaker 003
    Um yeah i guess so is is that proven.

    Segment [531.64 sec.-610.79 sec.], Speaker 011
    Case where people tend to underestimate when things can be done over time and that experts underestimate the dreamers over estimate but the experts to undress what can be done so we were living in arthur c. clarke and william gibson time’s right now and so i do believe that it’s out there i do believe that it is it’s possible probable will happen and it may not be perfect but there’s a lot there’s a lot going on and what you may have just heard the background is my google assistant kicking in i’ve got me then aye that’s cellar oh qualities i don’t understand i had it just sing because time i here absolutely there you go o. k. signs there’s no issue off something you mentally i’m going to try to put some insurance together for this in a to a pipe see and hear us he is a philosopher for an answer it’s been a good amount of time talking about he ain’t some of the things that you’re talking about with with chess and go in the see he touched on our recent one of his pockets all into to that if anybody’s interested in listening doric tell me a little bit about what’s going on in this space ring out for you.

    Segment [610.97 sec.-634.21 sec.], Speaker 006
    So um ones the one of the interesting realities that we have to face is that um were all within a lab oh aye ayes certainly possible it’s being used in very specific circumstances very constrained conditions but in the general case for most oh.

    Segment [634.51 sec.-763.35 sec.], Speaker 006
    For most users in most businesses are simply not quite ready yes to utilise it i so google can do things with voice recognition it’s very smart there microsoft is getting really good image recognition as well um the car manufacturers are getting pretty good at putting thomas vehicles are introducing them to a different level so you may have assisted cruise control for example but in very specific circumstances and it’s going to be a leap to to actually bring that into the more general sense of business were he notes most of the work and decisions have been done by by people these days and one of the realities of his coming to bear is our days are simply is not the area and by data i just mean regular dear i mean the information that’s that’s produced by processes um or maybe produced by now since and there are coming out the systems and it is if if you are aware of that the work is scientists do right now they often complain that over half the half the job is just cleansing dated cleaning date it and it’s not just that the days it is dirty and it may be from generated with a certain perspective in mind so for example it may come from a sealed system the say system is good at knowing what was sold it may not be good at knowing what was purchased it may not be good at knowing what the credit worthiness of the customer was or who they were buying it four and so it’s really important to know where the diggers coming from um some did me just peel off for environmental reasons for mum to not be correct and this is very important the internet of things um and in other cases and very importantly for me i think the long run is that somebody do
    it comes with inherent bias and a good example is um there was some say done.

    Segment [763.74 sec.-919.3 sec.], Speaker 006
    Avon the care of of of patients in hospital with certain conditions and they found that those that had asthma i had better outcomes and the simple reason was not because they were less secure and certain type patient but it was that that when they came into the hospital they were being treated differently right from the outset because they had asthma they were they were being treated much more carefully for for when they were diagnosed with influence so they had a lower mortality rate than the regular population who was who was it was diagnosed with its influence and so if you just look straight up you you would think that it’s o. k. to give that those with with as less care because they have a lower likelihood of of of of passing when in reality they take extra care was already built it so that’s us an example of bison so it could be dirty it could be off because of the way that was captured or it could have inherent spices in it so that is one sort of reality that we were dealing with um the piece of it as well as that when you try and bring different pieces of data together they don’t always match up just right and so um that’s another classic problem that you know you and i were talking earlier today with a client of ours there is now string of days having different perspectives on the data is is critical to this as well so that she can make the proper decisions and oftentimes this is um this is something that the people are work the dealer building or they may have just you know they may have got idea of how things should go based on their own experiences and pass it was a very difficult again to to codify into business rules so all that taken into context texan of course that was the joke that you know there’s no chance of was will take over the world we can’t even work out on coffin sick so yeah so could it could you know all these things were to go into that yes there’s great promise and yes there are specific cases of how it’s working very well for us today but in the more general sense and there’s still a lot of work to be done and it’s not until two businesses are able to generate good quality data that they’ll be able to make good quality decisions.

    Segment [919.73 sec.-942.57 sec.], Speaker 004
    Thanks at there you think your honour i’m gonna read say and sharon seen so what you say is there is there’s almost a world of basic blocking tackling that we have to do with our seed of data to get it ready not only four and ever said to be questions that people may want to ask you ringing on and on table two but one of the other challenges is how do we.

    Segment [942.79 sec.-953.48 sec.], Speaker 004
    Track dealing cash now for questions that we don’t know we’re going to ask in the future and so i think that’s a really interesting piece to think about it if you any thoughts on that.

    Segment [954.01 sec.-992.61 sec.], Speaker 003
    Yeah um yes i’ve got a traditionally dealer whereas it back round on data hoarder by nature as much as much data and admire as much history to that date as
    i can and oh it is it is true that it’s particularly bringing that historical aspect things change over time you may introduce extra products maybe more countries created or less currencies available of the theft throws everything out of whack a little bit and all that basic block and tackle and is can be challenging him yeah you’re right that um that is his his be.

    Segment [993.54 sec.-1015.33 sec.], Speaker 009
    It is difficult to know now how the information is going to be used the future to and therefore how to avoid the challenges come forward so i think the basic rule is going all eyes wide open and one of the key qualities of a deed of scientists is to understand their domain and therefore understand dealers were.

    Segment [1015.68 sec.-1022.41 sec.], Speaker 004
    So tell me a little bit of doubt what if we use cases for rear fang chaucer squires.

    Segment [1023.73 sec.-1026.68 sec.], Speaker 009
    For free i am.

    Segment [1026.76 sec.-1102.69 sec.], Speaker 009
    It’s still reasonably straightforward stuff so um is is get the work of of um quantities of them analysts as they may have been call at one time or dear the scientists now our statisticians it’s it’s looking for specific patterns in the data again that because of by a stupid book up something that’s happening within maybe markets or something that is happening on the website so for example um very typical use case is the credit decision process in deciding whether somebody should get a credit card and after a alone done what i ideal rages for that arm another example is potentially looking for that intrusions on the website looking for parrots and to see if he’s coming in from a strange part of the world that your customer has never visited before is unlikely to visit aunt and fraud of course there’s another big area to to try and identify the patterns of spending have have changed significantly enough to comment toot toot toot throw up a red flag and contact the customer and ask them what is this really you.

    Segment [1102.83 sec.-1165.08 sec.], Speaker 009
    So those are some pretty pretty straightforward examples based on you’ll notice that all three of them are based on the soil idea of pattern recognition and trendy looking for out liars no calling out who’s out liars um unfortunately this is where it gets interesting again the inn in these cases um there are parties out there that will try and gain the system they will um they will accept our fake identities so they can apply for credit cards and they’re not just doing it to smile nominations is thrown to the next they’re not just doing it once or twice he doing it many hundreds of thousands of times per second so they can try and gain the system and find loopholes and and try and get through quickly and so for example if a bank which traditionally tries to lower the
    barriers to its customers for spending money um they will issue a credit card that having gone through a fall.

    Segment [1165.16 sec.-1246.52 sec.], Speaker 007
    Credit check process they’ll say you know all we know you want to spend the money right now we know you’re going to spend it very quickly um so we’re gonna issue it to you right now and then we’re going to follow up and it just make sure everything’s good um so these bad bad parties can take advantage of that and apply for a credit card to get approval spend the money on before the crash the cars shut down once he find out that’s you know that’s not a real person um and this is is yeah what is there in the pinks have a huge motivation to do that because if they don’t true cos she’s going to go to the next institution right yeah correct i mean so for example you mister be standing in a best buy um register with a three thousand r. t. v. that you’re just itching to bring home and they can to tell the difference between that and another and another was some sort of hacking consortium and attempting to to problem so yes very difficult um get an extremely smart with the way that they do these things and when organising when using on operation a i can they can turn it back against itself and down so that i was his pupil humans are suffering because er because of the people using our tools against us.

    Segment [1247.33 sec.-1255.13 sec.], Speaker 004
    So when you ask you a question i want you to touch on robot process on a mission or pied and how that differs from me i.

    Segment [1255.76 sec.-1266.45 sec.], Speaker 004
    But when you know when does the line start to blur there because you know one could argue that basic work here is you’re giving it a task or apparently you wanted to perform you.

    Segment [1266.65 sec.-1277.81 sec.], Speaker 008
    But you would you call that basically i were you think there has to be deficiency sierra yeah i think i think there’s a difference between um.

    Segment [1278.67 sec.-1371.87 sec.], Speaker 009
    There’s a difference between automation of decisions and artificial intelligence um and it’s a very good question saw a. by process on a nation basically is the oh more advanced form of what we used to call flow tools and and it’s more advanced insofar as it can do a lot more it can not only incorporate the manual processes that we know of all but if you actually some reaching into systems and do things with screen scrape that weren’t possible before it is it is just it at its next generation it’s not revolutionary my mind oh but where does get interesting is is the level at which you can incorporate this and some artificial intelligence in there but um what we’ve seen is that it’s still pretty it’s kept separate saw p. a. will have certain rules in their business rules a little run like a rolls engine to get those things done it’s very strict and he doesn’t make its own decisions everything has to follow that pattern that
    have been programmed into it but when you get into um a iron and deep worrying machine learning where you’re using think is structure such as neural networks and to help you make decisions and it’s not quite as black and white that component is normally bolted onto the r. p. a. at that stage so it will go through the process of arm.

    Segment [1371.95 sec.-1471.95 sec.], Speaker 007
    Doing everything for her mortgage application is a great example so will go through the process of collecting all of the information it will send out emails and record and look for responses back will ask you to send documentation the ducking he shall come back it me use optical character recognition of some sort of other and for to look at that documentation then process and certain way but it’s not until he gets that stage of actually making a decision based on a laurels um very balls are so i very type of decisions and looking at other much more complicated information as people tended to ken and to then they could make a decision and it jumps back into the market cross again with its decision so i’ll give you i’ll give you a good example here of um of what to what level we’re talking so your pew sir o’clock forward and do it step step step step step cos the a. i. side because of its power and breath and the things they can breathe bringing it can be considering more things than just i don’t know your income it could be considering the amount of time that it took for you to produce documents it could be considering um any number of variables that are shown too too have an impact on that decision um in marketing it even gets more interesting because if you were if you go to our website for example arm you can do something called motto very testing the website that world hope you understand.

    Segment [1471.95 sec.-1597.92 sec.], Speaker 010
    The best wait to present things to certain people that you may have never met before but through their clicks through your website you can start making to terminations has to offer of cohort they fall into one say fall into and you can have an impact on whether they will buy or not called going through the funnel in a website and the collar of a button or a particular image that shone to them or particularly out it can have significant impact on whether or not to going to buy a classic example starter lawrence played with the colour of a button and to offer a baggage it was baggage adults like seventy five bucks they would deliver your bank directly to your door so you wouldn’t have to wait at the airport um and they found there was er a one percent uptake in people taking that offer because they choose the coral the button that one per cent for delta airlines is tens of millions of dollars and that’s just from change the colour of the button so any idea what you’re reduced size it worked on there what’s going to impact the that that the uptake of the people if you want you know what’s the to people flashy clicking on that button and it also helped with the other side of it with his which is how come i had to identify people are going to be receptive to that signal to make that mash do it sort sort of work into areas and that’s um that’s worse of you go beyond just or p. a. and go into very smartly segmenting somebody and then show them something that’s very smartly gone to help them do what you are looking for just increase revenue be interesting to think of the
    canna example of that we’re someone might use this knowledge to actually discourage someone for making a purchase like if they’re about to reach capacity on a flight or something they start discouraging that in the hopes of encouraging someone to take a different why for and on to buy a similar but slightly different product it is has more star or more cycles to you.

    Segment [1598.55 sec.-1603.13 sec.], Speaker 009
    Oh yeah that does get more complicated because.

    Segment [1603.5 sec.-1669.48 sec.], Speaker 007
    There are there are various things and play just like going into a store and days the assistant says i’m sorry we don’t have blue socks but if you go next story i’m pretty sure they have blue socks in the short term and they’ve lost that sock sale but in the long term they may have increased loyalty so that was a very that’s a really good example of um i was sort of a soft number that you’re making a bet on we’ll play off in the long run and sold as this idea that a lifetime value for customer is more it is is uses this concert of loyalty it’s not just about the revenue for an immediate self respect his lifetime value so um that goes into zoo at this i was i was he some psychology behind that um there’s also a partnership o. i. d.’s as well so that we can sell it to you but somebody else can and so yeah this is when it comes becomes very complicated and not only from the day prospective but you know if you’re a company like amber’s on erm you were you’re you’re.

    Segment [1669.86 sec.-1726.21 sec.], Speaker 007
    There are potentially erm big impacts to doing stuff like that so and answers we cats are to you but we can and they know they can fly that merchant with requests the marshes going to feel are the she has bad reviews recently then how does that impact of the clients experiences for the royalty to you so that that could be it becomes challenging but up i was on and glass brother because it’s a very interesting example you know that i was on the only real way that you can see sun exposing its its use of such things as and i know of two areas one recommendations of products and whether the weather is deeper and i’m not exactly sure if it’s more than other people like you things like this out but the arty put a lot of work into it needs to simplify it to that idea of recommended products and the second thing is that he’s around um.

    Segment [1726.47 sec.-1831.44 sec.], Speaker 008
    The shows that they recommend and that flicks famously runs oh competition to try and improve its recommendation engine which to me seems a lot of work for something that isn’t still isn’t very good erm you know it’s it’s it and throws in all sorts of things as well my kids wash you know some kids show and suddenly i’m getting recommendations from work consumers i don’t you know since he clicked on the current profile on the mary or you throw one kid’s toy in your right in your court in new recommendations are all there they were the explorer for the next six months right but but we’re dealing with very smart people here were dealing with very smart companies and so i think it’s indicative um yeah sorry one as reality checks that you have to call sometimes inclined to say it was so simple don’t you think that somebody
    bigger than you would be doing it right now oh so that’s that sort of like you know that one of the things you sometimes you have to say is yes it’s a great idea but um he’s just too difficult right now aren’t we advise you to to hold off maybe think of difference plan the planet that technology so you’re touching on some challenges here inside do want you to go into it on this let’s take the example of work is on so for instance i met my wife and i we use one account for her entire family and i would be interested to know your take on do leader purcell activity in divided into different segments of profiles like you know maybe i do most of my purchases between nine p. m. and eleven p. m. at night where is my wife does most of hers between twelve and one o’clock.

    Segment [1831.7 sec.-1833.55 sec.], Speaker 012
    You know how much.

    Segment [1833.96 sec.-1971.18 sec.], Speaker 003
    Power do we have rain out and builder’s segment asians even within one warden in where do you see people doing that there were some challenges with that they’re not generally do that today and there are simple challenges i’ll give you an example oh there it is again in banks want to keep things easy for the customer so they here and there are banks that will make it easy for you to sign up for a joint account and in doing so they say o. k. just get personal data to take in their name oh click the box yes i agree and then get person beats type in their name and click the box yes i agree that building you’ve got a joint account with a problem with is there’s no guarantee that it was actually two people that did that it could be the same person doing it twice and that is quite literally there and compliance because it is required by law for you to have a contract the signature suckle soon a your car as a contract with your customers and if you can prove that there were two costume separately saying and then you’re you’re a compliance so that’s a simple example of a way isn’t it sorry sorry challenging um the there are certain signals that can be used for she not very reliable so i should talk to me sees about such things and how how to do this and it’s just it’s simply and too difficult to to make things oh hello genius certainly in other words make it a nice little split between one group and another group and it just it it’s too hard and me mean not be you and your wife for me that you may be purchasing for your friend she may be purchasing for a friend’s daughter you may be percy if your dad and so it’s too easy to make assumptions about it then so yeah we’re still faced with the you look for that watch and hammers on and then everywhere you go from that point forward the next three months you got to see ads for watches all the time and only watchers almost the point that you feel coerced into buying a watch but that’s exactly maybe what they want so and it it’s it’s still very it’s still very difficult because the world is more complex place them down and computers can quite understand.

    Segment [1971.32 sec.-2051.9 sec.], Speaker 008
    Well since i’ve been thinking about through another when you’re away so one of the challenges i have is that we like to use one in as i can’t like you say but my wife kept finding like she’d emails else yesterday give her for her birthday and things like that so yeah i started my own account in a row i
    buy everything in that account she’s under strict instructions not to open evening that comes addressed to me that when i walk into that account there’s nothing on it that i’m ever going to want to buy for myself for a recommendation so it’s like you said my facebook is littered with you know things like she’s remained a racket ball in so it’ll be everything for wrecking ball worth a racket washes and i couldn’t care less about something like that so that the the the seventies is how we twist it for me yeah what did you do bring up least think so which is which is that they’re using these methods for trying to have somebody self select themselves and so um yeah different counts and her family accounts and family are accounts from now as well because there is protection for for underage um individuals online and so they have to be treated slightly differently and so you you know it’s it you he it’s a necessary for you to say um.

    Segment [2051.94 sec.-2138.06 sec.], Speaker 003
    This person is is i want dinner’s myself and and adults that can do things for herself and her financial responsibility um whereas my child you know me log in and there they can’t be marketed to um they shouldn’t be able to hear explicit music but stuff yes anything engines having those kinds of filter her lenses this you yeah yeah so yeah but but when you set it set up this family counts now ich ich as with everything else to get the very clever so if you use the microsoft ecosystem for quite a while you’ve had this family concept where i can gift um money to my son so he can purchase schemes but at the same time i can also call family safety i can also lock down his account so he cannot be you will not receive advertise means he will not receive invitations to become a friend he will not receive chats he won’t be able to go online and play any games at a rate of about fourteen and so i like as a parent i really want that but on the flip side microsoft lot are getting a really good picture of who my family is how old my kids are and what they can what they can do with that information is is limited sometimes but um does give a lot more signal to two what’s going on in the real world.

    Segment [2138.19 sec.-2144.56 sec.], Speaker 004
    So you’re touching on something today that wrote down on those you to talk about from.

    Segment [2144.62 sec.-2216.25 sec.], Speaker 012
    You saying some things that were going on the space and you talking about a different pattern recognition to do things like longer conditions and things like that and you’re starting out a touch on the ethics behind me i and i do want to spend a minute talking about that is that something that i think is very concerning because you know some of the things we talking about what on as doesn’t recommend me that’s more of a convenience challenge for me bye if something is going to recommend or exclude me from a world or you know there are some albums out there now that are being used to decide who makes for things like that it’s arson theory significant ethical clock consequences there right then depending on what the model is it’s a neural network forces just a simple pattern recognition you have varying levels of ability to understand why contribute to that decision in so what are
    some things that you think you know for clients and just even in general a philosophical standpoint what worries you in the space so there’s creasing concern from people so keen on mask is probably the standard bearer for arm.

    Segment [2216.46 sec.-2235.86 sec.], Speaker 003
    Join concern very high in a general sense and he’s talking about everybody being to go to morrison’s that’s an interesting he’s office think that if we’re ready you dozy hole k. well yeah most my ten year old son said something about the matrix last night in such a context but um.

    Segment [2236.32 sec.-2260.1 sec.], Speaker 003
    He don’t they do have to he does have very good points and you’re right it’s about blondes oh there is another in with it thank you as concept of red lining um where they basically do things erm it’s like districts map of these districts and they say the dish to have to be one bag per district and the reason they have to do that is because of banks.

    Segment [2260.71 sec.-2340.41 sec.], Speaker 003
    Could decide where they put their own branches they’d only ever put them in high income areas high high income a low bankruptcy areas because those the customers have they what um but they are banking is at some levels a social right and so the banks are required by law to to build their branches in under certain areas to provide banking and services such as the ability to simply have a safe place to put your money and so they require a lot to do that again if you put an a. i. on that just like you said it would probably not do that they would just put everything in all banks and i think marie’s so why is also what how do we deal with that today is we to do without through guidance in compliance and regulations and this is what you are muslim and his fans are saying and i agree with that is that we have to think now about the regulations that need to be in place to make sure that the these these albums that are used are not these are not simply cold logical albums they do take um the the ethics and and considerations for different for diverse populations into place into play and so i absolutely agree with that that.

    Segment [2340.42 sec.-2351.83 sec.], Speaker 003
    They’ll need to to to have that there and this is it’s for looking yes but today it’s possible to two to do these things down again you know.

    Segment [2352.26 sec.-2435.53 sec.], Speaker 003
    My son goes o. k. goo and o. k. cool comes up and he’s able to trick isn’t it to responding to him um we’re not there the eyes not there yet to be able to be trusted to make sessions but it didn’t it will be at some point in time and when it makes the decisions it has to be for good and should not be able to be abused or should not be able to two and disenfranchised certain certain groups of population the the piece about this is you know i talked to see larry in and people come together in my view of of the way things the two progress he’s through he’s through a teeming her augmentation for now
    i does need to be supervised and i think in the future it will still continue needing to be supervised at different levels until we’ve worked out of the kinks and i don’t believe there will be a case worry i will be truly truly a timeless and be able to make all all decisions i think there’s if you do look at necklace is a series called altered carbon i’m only doing the first episode of there yeah so you’ll see that there sonny eyes in there that are better the pride themselves and be able to manipulate humans and and does not worry wannabe.

    Segment [2436.31 sec.-2450.71 sec.], Speaker 004
    So would you say that brain now from an ethical standpoint do you think we’re keeping pace with the ethics concerns or do you think we’ve gone over skis all a bit weird he capabilities are outpacing.

    Segment [2450.78 sec.-2463.79 sec.], Speaker 011
    My from the store the ethics well you brought a proper parole example and it’s very um it is a very good case and we need to keep looking out for these um i think it is.

    Segment [2464.33 sec.-2504.25 sec.], Speaker 007
    Pass it’s so if you take it out of the picture altogether you can also remember the banks are fighting regulation and being imposed on them anyway and they just they want to do that what they wanna do and um the challenges that the ice is commit enable that to happen quicker so you see flash crash is happening because we have a rhythmic training systems that make decisions much quicker than humans can stop them and so i think that erm it was a cases of our schooling or skis absolutely and i think these.

    Segment [2504.77 sec.-2533.15 sec.], Speaker 007
    Events i’m going to be important to remember that we can try and avoid them in the future but we’re humans we make mistakes and we’re going to build machines that make mistakes and it’s always happened and always will happen so we do have to train off you know what we can do a sometimes it i sort of think that too there’s a couple of er analogies you can use with fire and one is that um.

    Segment [2533.44 sec.-2569.79 sec.], Speaker 007
    We fire is good for for cooking food but is not controlled properly can burn your hamster so so but we still use it we still use fire we still weak because it has certain benefits to us um and then the others are analogy that around fires that um again we can use it for cook for cooking things and keeping us warm but it can also be used as a weapon and that’s where we talked before about the i ivor say i if it helps good people do good things we can hope that people do bad things to you i really like that you feeling thankful i’m going to seal that one from you.

    Segment [2571.42 sec.-2616.39 sec.], Speaker 008
    So when you sum up time doesn’t child is your ethics of bias in regulation are definitely ones that i i feel like ferns and even individuals are coming into contact with those inherited always buy back to the high part of this tell me what some of the places are a world where he eyes just incredibly over hyped like people are saying that it’s going to be fired but it’s really just a spark from the lighter that’s nice erm it’s so i think it’s i think it’s in there in that again i mentioned some people being overly concerned with the chart at this point in time i don’t think i don’t think we’re quite there yes um zoo didn’t do that.

    Segment [2616.75 sec.-2632.61 sec.], Speaker 003
    Machines taking over the world terminator it’s not quite there yes um but it is cooked it haven’t you have to be mindful we have to get ahead of this um so there’s to of the negative side and then that positive side of it is.

    Segment [2633.82 sec.-2702.04 sec.], Speaker 003
    He’s in terms of what people think can be done today isn’t he’s just isn’t possible it’s o. i. don’t we have erm hey why don’t we have a robots that can down come in and help take care of elderly people which i think is a black mirror but the way we just have that today i don’t just when were just craving hate me i am robot thing that does this it’s fantastic to give me two and and are like is a simple case of i’ve got some cindy the can keys get get on your machine learning things to fix it for us saint we up the area so i think he’s the sort of the coca cola things considers the name of side the positive side of the fear in that promise are not quite aligned with three hours here and there but there is the other agent we haven’t talked about here are the vendors and the few tourists who it’s with in there it’s within their their own interests which hopes this thing up a little bit too much and so um i think yeah i heard this the start about.

    Segment [2702.82 sec.-2744.91 sec.], Speaker 003
    If you add the word block chain it’s year tear or cryptic currencies is the name company and it is ten times easier to get funding um and so it’s that sort of thing where people don’t want to miss out this fear of missing out right and we don’t miss out those hyped up by certain people who were who were you wanna sell newspapers on i was sell product on sale software they want to get funding for the company and um actually they had little bit too much er so i think it’s again week we are now living in a world of arthur c. clarke and william gibson um and.

    Segment [2745.11 sec.-2761.02 sec.], Speaker 008
    We were probably underestimating the power of what can be done a long time but this structure just we go keep great reality check your urine reminding me other company you with the name but they were like a mini fashion company listed on.

    Segment [2761.06 sec.-2844.15 sec.], Speaker 004
    The stock exchange and they were at the very bottom about to get a d. with sting if they didn’t keep their market valuation over now so they re branded as a cricketer currency trading company in there you instantly shot up and gavin back into the round when they needed to be anywhere else in there for about a week before people caught on and said how are these are these folks are we doing manufacturing the gonna make it in’t crib difference trading and if i start phoning back to white year to work the afford to they must but i think the mayor should be under investigation by the s. e. c. so i’ll add one thing here i think some of the promise that people are incorporating and their products of court and corey i and i see two issues here i can cyber security and in some of the day and then my personal fitness products for instance it seems like every company now has some kind of quorum call d. i. feature but i think there’s things that you can argue about whether it is or is not ee i. and b. what use is it like it like what we said before people aren’t even good enough to do the blocking tackling nerve see edie edie nor are helping me to support their fitness goals to do we really care about you if there’s any idea recommends who they should be friends with there on a fitness tracking website.

    Segment [2846.32 sec.-2868.13 sec.], Speaker 008
    So i’m kinda curious like this but it seems like there are some people that are just brilliant you his features in calling to me i media not here they may not have any nine um yeah yeah o. k. labels i mean in some sense yeah maybe artificial intelligence um but again.

    Segment [2868.34 sec.-2959.9 sec.], Speaker 003
    You know going back to anything that could be on it in some way ah ah could be could seem to be oh smart so we’re talking at all this and i think it was marvellous came out and wired about a machine that was beaten maturing test for eighteen minutes and if you don’t know the true test is erm is that measure of a eyes out so we can have a conversation with the machine i think it’s human um and so according to that were already there but look at all things can do so it’s it thing gets mark ah ah ah ah just computer chips get faster we get more memory he gets more and more access to data so they can learn more um but um you know if you scale it back in there in the physical realm you’re looking at insects and can you say hot appointed insect has become intelligent are hot plants and animals become intelligent and as different measures that and so erm i think you know in learning busy world a world that’s sounds as labels and i think people will focus on um what it actually does what it does it is it it’s a spectrum thing right it’s not it’s not black and white is a tiara isn’t it is it molly high for me the measures is our fisherman and artificial ones are making decisions right incisions artificially incisions to me it’s a it’s a sort of a hurry.

    Segment [2961.34 sec.-2966.09 sec.], Speaker 012
    What are you most excited about in the space ring ow ha.

    Segment [2966.53 sec.-2970.62 sec.], Speaker 003
    I’m really excited about our.

    Segment [2972.23 sec.-2998.05 sec.], Speaker 003
    This revolution i do see it as a revolutionist i think the world economic forum as market as the fourth the fourth industrial revolution um i’m excited that this is going to arm provide a lift to what we do are beyond anything that we’ve seen before so um.

    Segment [2998.75 sec.-3116.19 sec.], Speaker 001
    You’re just like oh meeting lou took people out of factories to a certain extent because it was so who happened um but the way that um a flour miller a windmill which is the first case of an automated system um was able to take people out of the mess bills and out of having to grind the flour themselves um in some parts think that in some parts are tribes of to grind flour but way that it was able to live people three people up his most exciting to me so how that plays out with him grip grandeur contest what we’ve been talking about is and this is when you when you’re considering what charge against him what jobs are going to do the jobs are highly manual highly mean you’re i’m going to go and so those people are going to be enabled to do much more interesting things and it did the people that really like to do those things are going to potentially become artisans are craftsmen the people who didn’t like doing them can consider taking on other jobs and it’s not just about intelligence we’ve got remember sometimes it’s about empathy and those of the jobs that i’m going to go anywhere so if you’re a nurse you’re in a great place right now and you will continue to be a great place because um because that’s not something that can be automated that human touch that human relationship could never be replaced by by machine and so so they will be more i think there’ll be more empathy in the world more artists in the world and yell the more machines but those machines are gonna be doing things which we don’t want to be doing anyway i’m us again was your archers and you want to you know make that thing you’re suffering by hand and your people are very very successful and they still have one shops because they like making furniture inspector um but in a gentle kiss in john walker’s case it will lift everybody up and enable us to um.

    Segment [3116.57 sec.-3118.88 sec.], Speaker 007
    Two.

    Segment [3119.04 sec.-3134.84 sec.], Speaker 003
    Have certain things commented and by a. i. and so we can think about the more more interesting cases and um the more strategic cases as well.

    Segment [3134.87 sec.-3303.64 sec.], Speaker 005
    I think you may need to be some cost eerie implementation of these things were even if something tune completely replace the job it was done me before we have some method of transition you know like you said maybe we have an assistance phase in we transition the people that want to get out of that into something else read in just cove abruptly and in saying sorry you’re
    got to retrain into something completely different yeah do it’s not gonna happen that way because because the lucky seven she easy to be trained at some level it is off on missions need to be trained at some lower need to be monitored so this is a whole cycle of take your process manual now reverse engineering understanding how it works sunny what the rules are in there and the people are doing it right now probably want to understand best but then there’s also these people who have never done it before but rains her fresh just blank slates perspective to ask the right questions to say oh that’s cool but how do you do that and you’re you’re something nice for oh i never thought about you know dot about going into detail on how i do x. y. z. anne you going to have that period of reverse engineering period recreating and so all the people that you know that create these things the colder as the engineers will have to create is um solicitors opportunity and cast to that piece and then once at this stood up and running then the past people that were the subject matter experts then get to sit back and admire this thing as as mentors are supervisors and make sure it’s running good for a period of time and and they again i don’t think the irish step away i think there there and there are jobs just become a little bit more interesting we’ll be dealing with the edge pieces a bit more so another analogy is if you walk into a doctor’s office these days these days you’re more likely to see a nurse practitioner who takes care of the eighty percent of the cases because eighty percent of the cases the scene out i know the doctor’s only deal that twenty percent of the more unique or difficult or um um challenging cases and are always there to back up the nurse practitioners in case case question but the doctors get to focus onto a hive that higher value more interesting work so i think the same thing’s gonna happen oh in in in automation artificial intelligence as well there will be significant investment in the transition and then even after the transition does people at war in that position before we just have more interesting work so you talked a little bit about this being something that could open us up to you know maybe slow down a little bit and focus more on the mark of a craft.

    Segment [3303.75 sec.-3322.82 sec.], Speaker 004
    Because we we have this a politician and then we talk a little bit about the d. i. been something nicking assist us and would potentially allow us to do more faster stronger how do we balance between those two in the side like halloween.

    Segment [3323.14 sec.-3379.58 sec.], Speaker 010
    Worth getting to a situation where allows us to slow down and be more mine for reading just forces us to do things even quicker and faster then accelerate the pace of art is more than they are today yeah so that’s gonna be a balance between the organisations that are driving this change in investing in paying for the change and um the workers who are big affected by it um it’s a doubt that’s going to be a bounce there that and i don’t think it is i don’t think he’s an easy way to strike a certain to determine this to be done um the and screw very very specific to circumstances so against consider some real world examples here um d. e. a. company here may use um knitting machines to knit sweaters.

    Segment [3380.04 sec.-3438.21 sec.], Speaker 007
    Because they want to lift them close to where they’re gonna sell them or they want patterns to be done in certain ways and easily available under the machines where there other companies that aside you know what we’re just gonna ship this work offshore and get it done cheap cheaper um in the short run and but a cheaper resources and then get it shipped back here where we’ll sell us and so there’s all sorts of different things and play there that you have to consider i don’t think i think you’ll be every case may be different and will be measured in some regard and um you know we may seek out more unique things and and it will be done by by by people who like to who like to be artists and sometimes we just need to screw we don’t care working from an arm if if if it’s made by a machine that’s cool as long as it’s ready when i need it where i need it.

    Segment [3438.29 sec.-3449.62 sec.], Speaker 004
    I think he’d be interesting to think about the implications of it coming from different providers like come for microsoft forces who all verses as in white the.

    Segment [3449.8 sec.-3483.01 sec.], Speaker 012
    Be as far as quality of life verses utility um you talk you talk about it i have a personality oh little beauty but more so you know i think it was initially the garden but he i think he’s developed is going to be marketed toward use cases that the people that created them are most suited to so i would assume that he eye for facebook would be more socially rented in the eye from it l. u. s. may be more generally applicable to.

    Segment [3483.13 sec.-3484.79 sec.], Speaker 004
    A multitude of.

    Segment [3484.85 sec.-3487.96 sec.], Speaker 004
    He u. s. slash industry.

    Segment [3488.03 sec.-3502.2 sec.], Speaker 004
    Which yeah but but more so how the culture of the company that makes it transfers over into the eye itself either in vice personality ooh in here we encourage you to use the c. i. a. to you.

    Segment [3502.4 sec.-3511.22 sec.], Speaker 004
    To make what why dance better for us as we encourage you to use the c. i. to perform better faster stronger than you could ever before.

    Segment [3511.61 sec.-3521.62 sec.], Speaker 003
    So interesting is interesting sort of concerts i think you’ve been a few different things there so it’s a little bit layering yeah i see that um.

    Segment [3521.9 sec.-3523.31 sec.], Speaker 003
    There.

    Segment [3523.87 sec.-3600.51 sec.], Speaker 003
    The eyes will be the use cases in particular will be based on market demand much man needs and i do think the tools the name may be different um i think certainly based on who implements them um whether it’s a company or a certain set of consultants oh we’ll have an impact on the quality of those tools the quality of the a. i. i don’t eat a lot of personality no i think it’s going to be either a fears quality type of type of detention job ah um the so he panics and probably go this is to is to have something that’s free of bias as much as possible or at least he’s compliant with the regulations that will be imposed upon us um and the nice thing about the iron is that it should become easier to order and to test and um are down we’re working on a way to make um we talk talked deep learning are dark dark marion and andy either where it’s very difficult to see how decisions are made and there are people like you making that more transparent um so i think that’s.

    Segment [3601.99 sec.-3607.14 sec.], Speaker 003
    I think that is is the way that will go like.

    Segment [3607.69 sec.-3657.75 sec.], Speaker 005
    Some t. v. shows you’ll see their architect and people are that are better at doing that than others but again i think it’s on quality and fit to mention i don’t think it’s on her to personality dimension but think that it was wasn’t west world where they started talking about the creators and artistic sides coming through and the characters um it’s er it that’s interesting absolutely but i don’t think we’ve got it there area true and that is another show that even though it’s on my recommendation i have not gotten to that yet not a good shot well i was on a t. v. if it were particularly if it’s anything to do with that is that allergens i have a list of shows that are on my to do list to catch up on all three corrigan is in west brom is another.

    Segment [3658.15 sec.-3666.34 sec.], Speaker 004
    But i i’m overdue for a lot of those today i am from the t. v. seasons.

    Segment [3666.87 sec.-3688.73 sec.], Speaker 012
    So i think you so much for joining me today i do have one of the question for you but for someone ignored you fruit for seamus time with me and sharing your thoughts it’s been a pleasure chatting with ian i look forward to hopefully having some more discussions like this in the future with you course you’re welcome so i’m going to give you the last word here and tell me something about the space that we don’t know.

    Segment [3689.08 sec.-3693.56 sec.], Speaker 009
    Oh my gosh share so much already um.

    Segment [3694.19 sec.-3698.55 sec.], Speaker 012
    Wow let me think of something i didn’t ask you.

    Segment [3698.66 sec.-3700.89 sec.], Speaker 012
    You think i’m a solution no.

    Segment [3703.51 sec.-3714.31 sec.], Speaker 003
    Wow i am absolutely dry blank maybe for heavenly i to back me up who could help answer this question um yeah.

    Segment [3715.03 sec.-3745.65 sec.], Speaker 007
    Here’s an interesting fact is a game called dover to which does competition for million dollar purse and um it was considered to be a milestone when and here i was able to beat that game and beat big she would play and game that surprised me and i get it may seem this interest trivial but to a lot of people it was a big deal so we’ve got something you didn’t know there is definitely something i didn’t know think fish in there.

    Segment [3746.2 sec.-3754.04 sec.], Speaker 004
    All right that is gonna wrap things up for this episode thank you so much for the time and i look forward to chatting with you again soon.

    Segment [3754.21 sec.-3756.98 sec.], Speaker 012
    All right cheers sorry but.

    ]]>
    E024 – Will machine learning kill traditional database indexes? https://www.fortheloveofdata.com/e24/?utm_source=rss&utm_medium=rss&utm_campaign=e24 Wed, 31 Jan 2018 07:47:49 +0000 http://www.fortheloveofdata.com/?p=328 In this episode my friend Vikas Popuri and I chat about Google’s paper comparing ML models to traditional DB indexes.

    Background:

    • Google used learned indexes , machine learning models, to access data and compared these to B-Tree, Hash, and Bloom Filter indices
    • Trained a model using multiple stages where the earlier stages could approximate a location and later stages would work with a subset to improve accuracy. Each stage could choose a different model to advance the search further.
    • FYI, the diagram below looks like a decision tree, but it is not. Each stage/model could have different distributions and could repeat the model used above or below.

    • They achieved access time and space savings across the board, even without using GPUs or TPUs (Tensor Processing Units)
    • “Retraining the model” – the tests were performed on a static data set, so no retraining or index maintenance was required.

    Observations / Questions:

    • Used Tensorflow with Python as the front end — apparently a lot of initial overhead with this as a test stack.
    • B-Tree indexes to some extent are a model, especially if they don’t store every key and instead store the first key in a page.
    • The paper made some rudimentary assumptions, such as using a random hash function.
      • What if the data is not static? How long would it take to retrain the model vs. maintain an index?
      • What if data profiling caused you to index certain attributes and not others?
      • What are the best practices with this newer approach
    • The power of being able to use different models at different stages is intriguing. You could also potentially maintain traditional indexes as a backup / failsafe that would upper bound to the performance of a B-Tree.
    • Load times – The folks from Google commented that they could retrain a simple model on a 200M data set in “just [a] few seconds if implemented in C++”
    • Recursive question: do you need an optimizer to optimize the optimization path?
    • Room for improvement:
      • GPUs/TPUs
      • Incorporating common queries into the model to know what questions people are asking

    Music

    Deep Sky Blue by Graphiqs Groove via FreeMusicArchive.org

    Sources:

    ]]>
    In this episode my friend Vikas Popuri and I chat about Google’s paper comparing ML models to traditional DB indexes. Background: Google used learned indexes , machine learning models, to access data and compared these to B-Tree, Hash, In this episode my friend Vikas Popuri and I chat about Google’s paper comparing ML models to traditional DB indexes. Background: Google used learned indexes , machine learning models, to access data and compared these to B-Tree, Hash, and Bloom Filter indices Trained a model using multiple stages where the earlier stages could approximate a location […] Uncategorized – For the Love of Data full false 23:45
    E023 – 2017 Data Digest https://www.fortheloveofdata.com/e23/?utm_source=rss&utm_medium=rss&utm_campaign=e23 Sat, 30 Dec 2017 07:12:22 +0000 http://www.fortheloveofdata.com/?p=322 This episode reflects on some of the hottest topics from 2017 and their impact their data has on our lives this year and into 2018.

    Cryptocurrency

    Many of these data points come from here.

    • Since the year began, the aggregate market cap of all cryptocurrencies combined has increased by more than 3,200% as of Dec. 18
    • Bitcoin went through the roof, hitting an all-time high of 1 BTC = $19,891 on 12/17/2017.

    • BTC makes up 54% of the aggregate $589 billion market cap of all cryptocurrencies
    • The graphics-card hardware needs of miners has been a big reason why NVIDIA and Advanced Micro Devices have seen a double-digit percentage surge in sales recently
    • Back on Dec. 10, CBOE Global Markets (NASDAQ:CBOE) became the first to introduce bitcoin futures trading, with CME Group (NASDAQ:CME) following a week later
    • 612 new cryptocurrencies began trading in 2017
    • Top 10 cryptocurrencies in 2017 as of 12/29 according to BitInfoCharts.com (pretty similar list on AtoZForex.com):
    Cryptocurrency Price in USD Price in BTC First Trade Exchange volume 24h
    BTC

    Bitcoin

    $ 15,030.33

    +9.79% ($1,340) in 12h

    +9.56% ($1,312) in 7d

    1 BTC

    +0% in 12 hours

    +0% in 7 days

    2010-07-17 100,317 BTC

    100,316.59 BTC

    1,250,728,823.58 USD

    XRP

    Ripple

    $ 1.4

    +11.79% ($0.15) in 12h

    +28.92% ($0.31) in 7d

    0.000093 BTC

    +1.82% in 12 hours

    +17.67% in 7 days

    2014-08-14 462,239,606 XRP

    36,699.78 BTC

    551,610,001.98 USD

    ETH

    Ethereum

    $ 750.82

    +9.3% ($63.9) in 12h

    +12.07% ($80.9) in 7d

    0.05 BTC

    -0.45% in 12 hours

    +2.29% in 7 days

    2014-09-30 784,632 ETH

    34,510.75 BTC

    518,708,138.27 USD

    BCH

    Bitcoin Cash

    $ 2,571.87

    +8.04% ($191) in 12h

    +6.17% ($149) in 7d

    0.171 BTC

    -1.59% in 12 hours

    -3.1% in 7 days

    2017-08-01 209,597 BCH

    33,824.21 BTC

    508,389,215 USD

    LTC

    Litecoin

    $ 255.88

    +11.54% ($26.5) in 12h

    +0.89% ($2.26) in 7d

    0.017 BTC

    +1.59% in 12 hours

    -7.91% in 7 days

    2012-07-13 1,156,615 LTC

    18,070.26 BTC

    271,602,057.82 USD

    IOT

    IOTA

    $ 3.87

    +11.04% ($0.38) in 12h

    +2.12% ($0.08) in 7d

    0.00026 BTC

    +1.14% in 12 hours

    -6.8% in 7 days

    2017-08-30 37,838,946 IOT

    9,288.14 BTC

    139,603,822.48 USD

    XMR

    Monero

    $ 366.67

    +7.1% ($24.3) in 12h

    +9.42% ($31.6) in 7d

    0.024 BTC

    -2.46% in 12 hours

    -0.13% in 7 days

    2014-06-04 275,568 XMR

    6,354.54 BTC

    95,510,916.21 USD

    DASH

    Dash

    $ 1,126.09

    +10.81% ($110) in 12h

    +2.5% ($27.5) in 7d

    0.075 BTC

    +0.93% in 12 hours

    -6.45% in 7 days

    2014-02-20 85,797 DASH

    6,073.26 BTC

    91,283,097.97 USD

    XVG

    VERGE

    $ 0.168

    +41.89% ($0.05) in 12h

    +60.02% ($0.06) in 7d

    0.000011 BTC

    +29.23% in 12 hours

    +46.05% in 7 days

    2016-02-18 606,321,139 XVG

    5,940.19 BTC

    89,283,020.8 USD

    ICX

    ICON

    $ 5.73

    +4.78% ($0.26) in 12h

    +183.99% ($3.72) in 7d

    0.00038 BTC

    -4.56% in 12 hours

    +159.21% in 7 days

    2017-11-11 13,061,177 ICX

    5,072.57 BTC

    76,242,347.41 USD

    Data Breaches

    1. Equifax – 9/7/2017 – 143mm US consumers affected
      1. Stock plunged nearly $4bn in the aftermath
      2. https://www.equifaxsecurity2017.com/
    2. RNC Voter List – nearly every registered voter, ~200mm Americans
    3. Yahoo’s 2013 breach revelation – affected accounts went from 1bn to 3bn
    4. Uber – 57mm user accounts and drivers, paid to keep it under wraps
    5. 560mm Passwords – a massive list of 560mm credentials compiled into one database of breaches from at least 10 services

    You can check if your account is part of a compromise at have i been pwned or SpyCloud.

     

    World Affairs

    The World Bank has a fascinating article with 12 charts covering food assistance, climate change, education, nutrition, elections, energy and a tribute to Hans Rosling, who made us see the world in new ways with breathtaking visualizations.

    Other Data Tidbits

    • Most popular Instagram Post: Beyonce – https://www.instagram.com/p/BP-rXUGBPJa/
    • Most retweeted Twitter Post: Carter’s quest for Wendy’s Chicken Nuggest – https://twitter.com/carterjwm/status/849813577770778624/photo/1
    • Oracle bought API management firm Apiary. Be on the lookout for how that evolves for the tool and for Oracle
    • RPA saw continued growth and implementations. Expect more in 2018.
    • Kubernetes is becoming the de facto standard for container management and was upgraded to Adopt by TechRadar. Expect it to continue to gain steam and start influencing data solutions more in 2018.

    Music:

    Auld Lang Syne by Fresh Nelly, from Free Music Archive.

    Sources:

    1. https://www.coindesk.com/price/
    2. https://www.investing.com/currencies/btc-usd-historical-data
    3. https://bitinfocharts.com/new-cryptocurrencies-2017.html
    4. https://atozforex.com/news/top-10-cryptocurrency-2017/
    5. https://www.fool.com/investing/2017/12/19/16-cryptocurrency-facts-you-should-know.aspx
    6. http://cryptocurrencyfacts.com/
    7. https://gizmodo.com/the-great-data-breach-disasters-of-2017-1821582178
    8. https://www.equifaxsecurity2017.com/
    9. http://clark.com/personal-finance-credit/equifax-data-breach-a-look-back-at-our-biggest-story-of-2017/
    10. http://beta.latimes.com/business/hiltzik/la-fi-hiltzik-equifax-breach-20170908-story.html
    11. https://haveibeenpwned.com/
    12. https://www.instagram.com/p/BP-rXUGBPJa/
    13. https://www.usnews.com/news/national-news/articles/2017-12-12/twitters-top-10-most-retweeted-tweets-of-2017
    14. http://www.worldbank.org/en/news/feature/2017/12/15/year-in-review-2017-in-12-charts
    15. https://www.youtube.com/watch?v=YpKbO6O3O3M
    16. https://www.informationweek.com/strategic-cio/digital-business/2017-year-in-review—exponential-automation/a/d-id/1330648?
    17. https://www.thoughtworks.com/radar/platforms/kubernetes

     

    ]]>
    This episode reflects on some of the hottest topics from 2017 and their impact their data has on our lives this year and into 2018. Cryptocurrency Many of these data points come from here. Since the year began, This episode reflects on some of the hottest topics from 2017 and their impact their data has on our lives this year and into 2018. Cryptocurrency Many of these data points come from here. Since the year began, the aggregate market cap of all cryptocurrencies combined has increased by more than 3,200% as of Dec. […] Uncategorized – For the Love of Data full false 23:57
    E022 – Tech Spec – Tableau Project Maestro Data Prep https://www.fortheloveofdata.com/e22/?utm_source=rss&utm_medium=rss&utm_campaign=e22 Thu, 30 Nov 2017 10:34:43 +0000 http://www.fortheloveofdata.com/?p=308 Zip file of all the sample data, Maestro flows, and Tableau workbook I used to get a first impression: E022_maestro_demo_files.

    Screenshots

    Sample Flow from Tableau

    Field Selection

    Data Profiling

    Filters

    Join Clause

    Refresh / Run Flow

    File Output Options

    Pros:

    1. Has the clean, intuitive feel of Tableau. I did my hands-on test with no training or previous exposure
    2. Lots of features for a first release – joins, unions, type conversion, calculated fields, data connectors, etc.
    3. Easy to click into any part of your flow and see data
    4. Ability to edit inline – much like tweaking an Excel pivot table
    5. Data profiling is a nice visual cue to begin working with data
    6. Ability to sort, filter, rename, add calculated fields anywhere along the way
    7. Great for quick and dirty data prep that you know is heading into Tableau for ad-hoc analysis

    Cons:

    1. Ability to sort, filter, rename, add calculated fields anywhere along the way – this can get messy for others to come behind you to maintain or see what is happening
    2. Reconciliation issues between reports will now be complicated by similar flows doing slightly different things
    3. You have to remove header fields from Excel if you want Maestro to latch onto and display field names from table. By default, it looks at first row and gives generic names if column headings aren’t there (i.e., F1, F2, …)
    4. Can only have one flow open at any time
    5. Performance seems a tiny bit slow on my example with ~13,000 rows. Curious to see how it will perform against larger data sets, RDBMS, and big data connectors
    6. Only outputs to TDE or Hyper formats currently. No ability to save as CSV, XLSX, PDF, or write back to a data store
    7. Unable to source data from a TDE or Tableau Workbook
    8. No reuse of common transformations or logic across different flows
    9. NO community generated content yet – since it is very new, you can’t Google for answers or YouTube videos. Established, mature ETL and data prep tools will continue to have a leg up on this front for a while.

    Music

    Deep Sky Blue by Graphiqs Groove

    Sources:

    1. https://www.tableau.com/project-maestro
    2. https://prerelease.tableau.com/
    3. https://www.eia.gov/electricity/data/eia923/
    4. https://www2.census.gov/programs-surveys/popest/tables/2010-2016/state/totals/nst-est2016-01.xlsx
    ]]>
    Zip file of all the sample data, Maestro flows, and Tableau workbook I used to get a first impression: E022_maestro_demo_files. Screenshots Sample Flow from Tableau Field Selection Data Profiling Filters Join Clause Refresh / Run Flow File Output Optio... Zip file of all the sample data, Maestro flows, and Tableau workbook I used to get a first impression: E022_maestro_demo_files. Screenshots Sample Flow from Tableau Field Selection Data Profiling Filters Join Clause Refresh / Run Flow File Output Options Pros: Has the clean, intuitive feel of Tableau. I did my hands-on test with no training or […] Uncategorized – For the Love of Data full false 22:59
    E019 – Tech Spec – Cognos Analytics (11.0.6) https://www.fortheloveofdata.com/e019/?utm_source=rss&utm_medium=rss&utm_campaign=e019 Sun, 20 Aug 2017 22:09:44 +0000 http://www.fortheloveofdata.com/?p=283 Join me as I chat with my colleague and Cognos guru John Frazier about the latest release of Cognos, leading up to the anticipated release of the next version, 11.0.7, near the end of Q3.

    The latest version of Cognos (11.0.6) debuted on March 21, 2017. You can sign up for a perpetually free trial (like Tableau Online) here.

    Version 11 was originally released in December 2015 and was mainly a UI redesign on top of Cognos 10 features. Analysis and Query Studios will eventually be deprecated.

    New Features in 11 vs. 10

    • New UI – responsive web design on UI, but not on reports
    • Better self-service capabilities and collaboration for teams
    • Upload data files – upload delimited text or Excel files to be stored in a columnar format (Parquet) on the file system (not in memory or in the DB). These are immediately usable in dashboards and don’t require entry into FM.
    • Data modules (intent based modeling based on Watson) similar to FM packages
      • Note: Dashboards only use uploaded files and data modules
    • Available on cloud
    • Mobile and desktop from a single report
    • Active reports as prompts
    • Free cloud trial
    • Admin console is unchanged

    New Features in 11.0.6

    • Mapping enhancements
      • Multiple admin boundaries, add’l postal code support
    • Dashboarding enhancements
      • Direct access to OLAP packages (Framework packages accessible since 11.0.5)
      • Widgets using data from the same source are connected by default
      • New grid widget
      • Color gradient by measure
      • Date filters can include blanks
    • Portal enhancements
      • Share/embed through overflow menu
      • Folder customizations can be done directly through the UI more easily (without uploading JSON configs)
      • Create shortcuts and report views
    • Storytelling enhancements
      • New guided journey templates
      • New animations (side fade, slide, scale, zoom, pivot)
      • Better pins (smart named, better search and filter)
      • Timelines – smart names
      • Change scene template while working on your story/dashboard
    • Reporting enhancements
      • Better lineage support for FM packages
      • Business glossary (w/IBM InfoSphere Information Governance Catalog integration)
      • Better freeze list column heading control
      • Better query support when editing data modules
      • Report templates – can save for your team or save as style reference reports
    • Support for Planning Analytics
      • Dashboard support for TM1 / Planning cubes
      • REST connectivity to planning analytics
      • Support for attribute hierarchies
      • Support for localized Planning Analytics cubes
    • Data server enhancements
      • Support for Google BigQuery and Google Cloud SQL via the BigQuery JDBC and MySQL JDBC drivers, respectively.
      • JDBC URL for Data Server Connections
      • Test connection feedback (this is not just in admin console now)

    John’s Likes/Dislikes with v11:

    • For those who are “used” to ReportStudio there is a pretty “steep” learning curve to locate where particular tools or components have been moved.
    • To be fair, ReportStudio had some counter-intuitive placements for some of these same tools (e.g. Hierarchy of design elements, etc.) that caused major headaches for new report designers.
    • Overall the new interface is more “intuitive” and the novice report developers I’ve worked with have picked it up remarkably quickly.
    • There are some changes that are really “nice” – like being able to see which Lists/Graphs use a particular query right from the query tree without having to “search” for where it is used on the “right click” menu.

    Music

    Deep Sky Blue by Graphiqs Groove

    Sources

    1. https://www.ibm.com/analytics/us/en/technology/products/cognos-analytics/
    2. https://www.ibm.com/communities/analytics/cognos-analytics-blog/the-latest-release-of-cognos-analytics-is-here/
    3. http://newintelligence.ca/top-12-reasons-to-upgrade-to-cognos-analytics-a-k-a-cognos-11/
    4. https://www.ibm.com/support/knowledgecenter/SSEP7J_11.0.0/com.ibm.swg.ba.cognos.ca_new.doc/c_ca_nf_deprecated.html
    5. https://www.ibm.com/support/knowledgecenter/en/SSEP7J_11.0.0/com.ibm.swg.ba.cognos.ca_new.doc/c_ca_nf_11_0_x.html
    6. https://www.slideshare.net/senturus/cognos-analytics-version-11-questions-answered
    ]]>
    Join me as I chat with my colleague and Cognos guru John Frazier about the latest release of Cognos, leading up to the anticipated release of the next version, 11.0.7, near the end of Q3. The latest version of Cognos (11.0.6) debuted on March 21, Join me as I chat with my colleague and Cognos guru John Frazier about the latest release of Cognos, leading up to the anticipated release of the next version, 11.0.7, near the end of Q3. The latest version of Cognos (11.0.6) debuted on March 21, 2017. You can sign up for a perpetually free trial […] Uncategorized – For the Love of Data full false 31:00
    E018 – Tech Spec – Sia, ultimate blockchain file storage https://www.fortheloveofdata.com/e018/?utm_source=rss&utm_medium=rss&utm_campaign=e018 Sun, 30 Jul 2017 03:25:20 +0000 http://www.fortheloveofdata.com/?p=279 What if you could store your data in the cloud, encrypted, for a fraction of the cost of Amazon S3, Google, or Azure? With Sia, a decentralized file storage solution that leverages blockchain, you can. Learn more about how it works in this episode.

    Blockchain Overview

    A blockchain is a permissionless distributed database that maintains a continuously growing list of transactional data records. The system’s design means it is hardened against tampering and revision, even by operators of the nodes that store data. The initial and most widely known application of the block chain technology is the public ledger of transactions for bitcoin, but its structure has been found to be highly effective for other financial vehicles.

    CONSENSUS BUILDING The ability for a significant number of nodes to converge on a single consensus of the most up-to-date version of a large data set such as a ledger TRANSACTION VALIDITY The ability for any node that creates a transaction to determine whether the transaction is valid, able to take place, and become final (i.e. that there were no conflicting transactions) AUTOMATED RESOLUTION An automated form of resolution that ensures that conflicting transactions (such as two or more attempts to spend the same balance in different places) never become part of the confirmed data set.

    Blockchain block detail

    [Illustration by Matthäus Wander (Wikimedia)]

    • Timestamp: The time when the block was found.
    • Reference to Parent (Prev_Hash): This is a hash of the previous block header which ties each block to its parent, and therefore by induction to all previous blocks. This chain of references is the eponymic concept for the blockchain.
    • Merkle Root (Tx_Root): The Merkle Root is a reduced representation of the set of transactions that is confirmed with this block. The transactions themselves are provided independently forming the body of the block. There must be at least one transaction: The Coinbase. The Coinbase is a special transaction that may create new bitcoins and collects the transactions fees. Other transactions are optional.
    • Target: The target corresponds to the difficulty of finding a new block. It is updated every 2016 blocks when the difficulty reset occurs.
    • The block’s own hash: All of the above header items (i.e. all except the transaction data) get hashed into the block hash, which for one is proof that the other parts of the header have not been changed, and then is used as a reference by the succeeding block.

    Why You Can't Cheat at Bitcoin 2. But one miner wants to alter a transaction in block 74. 3. He'd haveto make his changes and redo all the computations for blocks 74—90 and do block 91. That's 18 blocks Of expensive computing. I . Say everybody is working on block 91. 4. What's worse, he'd have to do it all before everybody else in the Bitcoin network finished just the one block (number 91) that they're working on.

    Sia Overview

    • Decentralized network that places encrypted pieces of your data on dozens of notes
    • Aims to be fastest, cheapest, most secure storage solution and compete with AWS, GCP, Azure
    • Users pay in Siacoins, a cryptocurrency like Bitcoin
      • Must go USD -> Bitcoin -> Siacoin -> Wallet -> File Upload
    • Open source
    • Started by David Vorick and Luke Champine through a VC backed Boston-based company called Nebulous Inc
    • Origins in the HackMIT 2013 conference
    • Uses ASICs (application specific integrated circuits) for mining
      • These are purpose built integrated circuits, not general multi-use devices
      • Evolution from CPU -> GPU – ASIC
      • Faster and less vulnerable to attacks than GPUs
      • Why? See here.
      • Created a company to make ASICs called obelisk.
      • ~$2,500 per machine
    • Current price is about 124 Siacoin to $1USD

    Pros

    • Decentralized, peer-to-peer
    • Encrypted and immutable
    • Hosts can earn money by renting free disk space to renters
      • Must maintain 95% uptime to preserve collateral

    Possible Issues

    • Renters uploading illegal content to hosts
      • However, renters would have to pay for the bandwidth leechers use to download files
    • Slow at this point
    • Low number of users

    Music

    Deep Sky Blue by Graphiqs Groove

    Sources:

    ]]>
    What if you could store your data in the cloud, encrypted, for a fraction of the cost of Amazon S3, Google, or Azure? With Sia, a decentralized file storage solution that leverages blockchain, you can. Learn more about how it works in this episode. What if you could store your data in the cloud, encrypted, for a fraction of the cost of Amazon S3, Google, or Azure? With Sia, a decentralized file storage solution that leverages blockchain, you can. Learn more about how it works in this episode. Blockchain Overview A blockchain is a permissionless distributed database that maintains […] Uncategorized – For the Love of Data full false 16:00
    E014 – For the Love of Allergies https://www.fortheloveofdata.com/e014/?utm_source=rss&utm_medium=rss&utm_campaign=e014 Tue, 28 Mar 2017 03:00:46 +0000 http://www.fortheloveofdata.com/?p=221 Achooo! Did you know that seasonal allergies affect about 50 million people in the US, penicillin kills about 400 people/year, and some people are allergic to cockroaches?! Learn all about allergies in this episode.

    A note about this episode’s content:

    Most of the allergy information in this episode is very short statistics that were commonly repeated in several sources. In many cases, I simply collected these statements and presented them below. Unless specifically noted below, please consider all the information as referenced from another source. See list of sources at the bottom of the show notes.

    Allergies Defined

    An allergy is when your immune system reacts to a foreign substance, called an allergen. It could be something you eat, inhale into your lungs, inject into your body or touch. This reaction could cause coughing, sneezing, itchy eyes, a runny nose and a scratchy throat. In severe cases, it can cause rashes, hives, low blood pressure, breathing trouble, asthma attacks and even death.1

    There is no cure for allergies. You can manage allergies with prevention and treatment. More Americans than ever say they suffer from allergies. It is among the country’s most common, but overlooked, diseases.1

    Who is affected?

    • In about 50% of all homes in the U.S., there are at least 6 detectable allergens present in the environment.
    • Nasal allergies affect about 50 million people in the United States. (30% of adults, 40% of children)
    • Odds that a child with one allergic parent will develop allergies: 33%.
    • Odds that a child with two allergic parents will develop allergies: 70%.
    • Allergies are increasing and have been steadily for the past 50 years
    • Most common health issue for kids
    • Percentage of the U.S. population that tests positive to one or more allergens: 55%
    • Females are slightly more likely to have food allergies than males with percentages of reported reactions at 4.1 and 3.8 respectively.
    • Non-Hispanic white children have the highest percentage of reported food allergies at 4.1, non-Hispanic blacks at 4.0, and Hispanic children at 3.1.

    Lethal enforcers

    • The most common triggers for anaphylaxis, a life-threatening reaction, are medicines, food and insect stings.
    • Medicines cause the most allergy related deaths.
    • African-Americans and the elderly have the most deadly reactions to medicines, food or unknown allergens.
    • Deadly reactions from venom are higher in older white men.
    • Over the years, deadly drug reactions have increased a lot.

    It ain’t cheap

    • In 2010, Americans with nasal swelling spent about $17.5 billion on health costs.
    • They have also lost more than 6 million work and school days and made 16 million visits to their doctor.
    • Food allergies cost about $25 billion each year.

    Heyyyyy… Fever (Allergic Rhinitis)

    • Worldwide, allergic rhinitis affects between 10 percent and 30 percent of the population.
    • 7.8% of adults get hay fever
    • In 2010, white children were more likely to have hay fever than African-American children.
    • Global warming may have added four weeks to pollen season in the last 10-15 years

    Ragweed pollen count by year is on the rise.

    Allergies around the US

    Pollen map from pollen.com

    Pollen.com details on Dallas, TX

    Pollen.com Dallas, TX history.

    The Eczema-Allergy Connection9

    Eczema can flare up when you are around allergies. Children with eczema are also more likely to have food allergies, such as to eggs, nuts, or milk. They often make eczema symptoms worse for kids but not for adults.

    • Genes – a gene flaw that causes a lack of a type of protein, called filaggrin, weakens that skin barrier and makes it easier for allergens to get into the body.
    • How the body reacts to allergens – people with eczema may have small gaps in the skin that make it dry out quickly and let germs and allergens into the body. Allergens cause inflammation and lead to eczema.
    • Too many antibodies – people with eczema have above average levels of Immunoglobulin E (IgE), a type of antibody that plays a role in the body’s allergic response.

    Tips to avoid Hay Fever8

    1. Reduce your stress – less stress = milder symptoms
    2. Exercise more – a survey found that people who exercise have the mildest symptoms and this reduces stress, too. However, avoid exercising outdoors when the pollen count is high (early morning and early evening). Better yet, exercise indoors if symptoms are severe.
    3. Eat well
      1. Healthy diets = milder symptoms.
      2. However, foods that can worsen hay fever symptoms for some people include apples, tomatoes, stoned fruits, melons, bananas and celery.
      3. Eat foods rich in omega 3 and 6 essential fats which can be found in oily fish, nuts, seeds, and their oils. These contain anti-inflammatory properties, and may help reduce symptoms of hay fever.
    4. Cut down on alcohol – beer, wine and spirits contain histamine, the chemical that sets off allergy symptoms in your body. Alcohol also dehydrates you, making your symptoms seem worse.
    5. Sleep well = mildest symptoms. People who get seven hours of sleep or more report less symptoms than those getting five hours sleep or less a night.
    6. Get pricked – Immunotherapy (allergy shots) helps reduce hay fever symptoms in about 85% of people with allergic rhinitis.3

    Other allergies

    Skin in the game

    Skin allergies include skin inflammation, eczema, hives, chronic hives and contact allergies. Plants like poison ivy, poison oak and poison sumac are the most common skin allergy triggers. But skin contact with cockroaches and dust mites, certain foods or latex may also cause skin allergy symptoms.

    • In 2012, 8.8 million children had skin allergies.
    • Children age 0-4 are most likely to have skin allergies.
    • In 2010, African-American children in the U.S. were more likely to have skin allergies than white children.

    That PB&J that is to die for…literally. (Food Allergies)

    Children have food allergies more often than adults. Eight foods cause most food allergy reactions. They are milk, soy, eggs, wheat, peanuts, tree nuts, fish and shellfish.

    • Percentage of the people in the U.S. who believe they have a food allergy: up to 15%.
    • Percentage of the people in the U.S. who actually have a food allergy: 3% to 4%.
    • Peanut is the most common allergen. Milk is second. Shellfish is third.
    • Peanut and tree nut allergies affect about 1% of the US.
    • In 2014, 4 million children in the US have food allergies.
    • 8% of children have a food allergy
      • Also, 38.7 % of food-allergic children have a history of severe reactions.
      • 30.4% are allergic to multiple foods.

    Bad medicine (Drug Allergies)

    • Penicillin is the most common allergy trigger for those with drug allergies. Up to 10 percent of people report being allergic to this common antibiotic.
    • Penicillin kills about 400 people / year.
    • Bad drug reactions may affect 10 percent of the world’s population. These reactions affect up to 20 percent of all hospital patients.

    No glove love

    • Only about 1 percent of people in the U.S. have a latex allergy.
    • However, health care workers are becoming more concerned about latex allergies. About 8-12 percent of health care workers will get a latex allergy.
    • Approximately 220 cases of anaphylaxis and 3 deaths per year are due to latex allergy.

    Bug me not

    People who have insect allergies are often allergic to bee and wasp stings and poisonous ant bites. Cockroaches and dust mites may also cause nasal or skin allergy symptoms.

    • Insect sting allergies affect 5 percent of the population.
    • At least 40 deaths occur each year in the United States due to insect sting reactions.
    • Adults are about 4x more likely to die from an insect sting than a kid. Basically, if you still have a reaction when you’re an adult, it affects you hard.
    • Venom immunotherapy is 97% effective in preventing insect sting reactions in sensitive patients

    Music:

    datagroove by Goto80

    Sources:

    1. http://www.aafa.org/page/allergy-facts.aspx
    2. http://www.aaaai.org/about-aaaai/newsroom/allergy-statistics
    3. http://acaai.org/news/facts-statistics/allergies
    4. http://www.webmd.com/allergies/allergy-statistics
    5. http://www.healthline.com/health/allergies/statistics#1
    6. http://www.allergyassociatesinc.com/allergy-statistics/
    7. https://www.pollen.com
    8. http://www.nhs.uk/Livewell/hayfever/Pages/5lifestyletipsforhayfever.aspx
    9. http://www.webmd.com/skin-problems-and-treatments/eczema/treatment-16/eczema-allergies-link
    10. http://www.businessinsider.com/pollen-season-gets-worse-each-year-2015-6
    ]]>
    Achooo! Did you know that seasonal allergies affect about 50 million people in the US, penicillin kills about 400 people/year, and some people are allergic to cockroaches?! Learn all about allergies in this episode. Achooo! Did you know that seasonal allergies affect about 50 million people in the US, penicillin kills about 400 people/year, and some people are allergic to cockroaches?! Learn all about allergies in this episode. A note about this episode’s content: Most of the allergy information in this episode is very short statistics that were commonly […] Uncategorized – For the Love of Data full false 20:57
    011 – Top 10 Data Predictions for 2017 https://www.fortheloveofdata.com/011-top-10-data-predictions-for-2017-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=011-top-10-data-predictions-for-2017-for-the-love-of-data Fri, 30 Dec 2016 22:12:37 +0000 http://www.fortheloveofdata.com/?p=184 Happy New Year!
    Thank you to all listeners and subscribers for your support this past year.

    10 – Data borders will break down – logical data lakes and logical data warehouses will grow as companies embrace data virtualization products like Denodo. Data preparation tools, like the new Project Maestro from Tableau, will allow people to seamlessly pull from a) on-premise databases and excel files; b) cloud repositories like Redshift and BigTable; and c) hosted products like Workday and Salesforce.

    9 – Data Quality and “Refined” data sets will become more important – with the uptick in BigData, sensor data, and data lakes, users will have a glut of information at their disposal (some have this already). Automated solutions that  assess data quality or specially created intermediate data sets will become more and more important6. In many Data Lake architectures and Hadoop based ecosystems, curated or moderately processed datasets are becoming the norm for widespread usage by the enterprise. Data scientists and power users will continue to harness raw data sets for their explorations, but these refined data sets will be used to reduce heavy lifting and “recreating the wheel” for many analysts.

    8 – Collaborative BI and analytics will become more mainstream – Sites like data.world and collaborative features in products such as Tableau will be embraced by more users than ever before in 2017. Taking cues from social media, these tools and techniques will produce more living datasets and visualizations with near real-time data as static reporting continues to decline as a percentage of overall reporting. Users will interact with each other and gain economies of scale by not reinventing the wheel when someone else has already done the heavy lifting.

    7 – Internet of Things (IoT) will continue to expand – Currently, most firms use an age or time-based approach to maintain and replace equipment. Up to 50% of spend using this approach may be wasted, according to ARC Advisory Group3. This study also found that 82% of failures occur randomly. New sensors will be deployed and real-time data will continue to swing upward across many industries. Businesses will be able to use this data to respond to events like power outages as they occur and use predictive analytics and historical information for preventative maintenance. Using this data will allow companies to move from a time-based or cyclical check schedules to an event-based ones that can detect even small changes in performance that may spell trouble.

    6 – Converged Intelligence will improve our lives – the trend for companies to share datasets and provide APIs to their services will enable more collaborative experiences to help customers and differentiate companies from their competitors. Services like IFTT (If this, then that) will offer more and more connections, largely driven by community contributions.  Partnerships like SolarCity, Nest, and the Tesla Powerwall will share data to produce synergies that can save money and reduce energy dependence. People will leverage internet of things (IoT) [see #1 above] devices and home automation like SmartThings to make us more comfortable. Whether it is automatically adjusting your lights, TV, and devices when you want to watch a movie or automatically adjusting your Thermostat when you leave and arm your alarm, connected living will grow.

    A word of caution: data sharing may be open and driven by users opting-in, but in some instances it will be hidden and used to exploit customers without their knowledge.

    5 – Data breaches will continue – Stakes are getting higher as hackers attempt to sway political campaigns, ransomware is on the rise, and data breaches are increasing. As data becomes more open and shareable, attack vectors are much greater and opportunities are higher. Enterprises need to make sure they are vetting cloud and hosted solutions properly to make sure they are secure, but they also need to realize that cloud providers may be able to provide economies of scale and make data safer than individual organizations can on their own.

    4 – So…Security will have to get more proactive – As hackers start to use IoT and continue DDOS, companies need to work together to defend against threats. Tools like Watson for Cyber Security will user in this new era. We will move from predictive analytics into cognitive to discover threats, identify all assets exposed, and then perform a second-order threat analysis to see what other services may suffer or what may be targeted next. These tasks can be performed by machine clusters faster and more completely than an army of analysts.

    3 – You’ll continue to hear about blockchain initiatives, but it will be mostly hype in 2017 – According to Gartner, Blockchain is nearing the peak of the hype cycle4. However, I think other items close to the peak, like home automation and IoT will see more adoption than blockchain. IMHO, these others can be adopted on a smaller scale and are more readily available to the general public than blockchain related deployments. Many people are forecasting that blockchain related tech won’t hit mainstream for another 5-10 years5. Nevertheless, the concept and some early uses of it are pretty interesting, such as Smart Contracts. Also, friendly FYI, something that uses a blockchain is not automatically anonymous, as in the case of bitcoin.

    2 – The line between Data Scientist and analyst/programmer will blur even more – analysts and programmers will take special courses here and there to beef up their statistics and data science chops. I think the demand for data scientists will bifurcate in 2017: a subset of organizations will spring for data scientists and the high salaries they command; however, the majority of firms will push for their analysts or tools to do low level data science work. Tools like Tableau and R Studio are making it easier for analysts to dabble in statistical and predictive analytics. Firms, such as New Knowledge, are offering “Data Scientist as a Service”, and tons of online courses, e-books, and knowledge bases have sprung up to spread data science fundamentals to the masses.

    1 – BYOT, Bring Your Own Tool, will continue to gain momentum – Enterprises can no longer place all their eggs in one basket when it comes to a BI or reporting tool. Tools such as Tableau have proven their ability to uproot entrenched stalwarts like IBM Cognos, and traditional BI tools appear stale and financially infeasible compared to a plethora of specialized, cheaper alternatives. Traditional BI tools will still have their place in firms that have enterprise-level agreements and are slow to change, but as more and more users demand features that these tools can’t support, or go out and acquire alternatives through “shadow procurement”, the traditional tools and expertise in firms will erode. It is now more important than ever for IT organizations to focus on architectures that make a wide array of data available to the entire organization regardless of device or access tool of choice. Good governance policies and data czars needs to focus on data quality, establishment and maintenance of metadata, and publishing best practices around the types of tools and reports/visualizations that are best for specific scenarios. Firms need to evaluate the benefits of having multiple tools and the flexibility and productivity it gives their employees vs. the supportability and procurement benefits of working with a smaller number of providers.

    Music: Auld Lang Syne by Fresh Nelly, from Free Music Archive.

    Sources:

    1. http://www.tableau.com/resource/top-10-bi-trends-2017
    2. https://electrek.co/2016/02/25/solarcity-tesla-powerwall-nest-hawaii/
    3. https://www.ibm.com/blogs/internet-of-things/as-much-as-half-of-every-dollar-you-spend-on-preventive-maintenance-is-wasted/
    4. http://www.gartner.com/newsroom/id/3412017
    5. https://www.ft.com/content/3bea303c-7a7e-11e6-b837-eb4b4333ee43
    6. http://www.eweek.com/database/slideshows/10-predictions-for-the-data-analytics-market-for-2017.html
    ]]>
    Happy New Year! Thank you to all listeners and subscribers for your support this past year. 10 – Data borders will break down – logical data lakes and logical data warehouses will grow as companies embrace data virtualization products like Denodo. Happy New Year! Thank you to all listeners and subscribers for your support this past year. 10 – Data borders will break down – logical data lakes and logical data warehouses will grow as companies embrace data virtualization products like Denodo. Data preparation tools, like the new Project Maestro from Tableau, will allow people to […] Uncategorized – For the Love of Data full false 28:00
    010 For the Love of Thanksgiving – For the Love of Data https://www.fortheloveofdata.com/010-for-the-love-of-thanksgiving-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=010-for-the-love-of-thanksgiving-for-the-love-of-data Fri, 25 Nov 2016 07:40:32 +0000 http://www.fortheloveofdata.com/?p=179 Holiday Weight Gain Studies

    Studies are very mixed on whether holiday eating causes weight gain.

    The Good1

    • In several studies over the last thirty-one years, subjects gained approximately 3/4 to 2 lbs. during the holiday season
    • However, in one study participants felt they had gained 4x as much weight as they actually gained
    • Two other key finding:
      • Although the amount of weight gained between the holidays was small, it represented the majority of the weight gained for the year
      • Weight gained between the holidays typically is not lost the next year (it represents the annual amount of increase for many people).

    The Bad3

    • During “eating holidays”, like Thanksgiving and Christmas, participants consumed 14% more than on normal days
    • Some participants (outliers perhaps?) consumed over 900 calories more on special occasions than normal days
    • Obese individuals indulged at an even higher level during holidays

    Other

    • Children tend to gain more weight over the summer when school is out than during the holidays2

    Theme music for this month’s episode is “Turkey Time” by Monk Turner4.

    Sources:

    1. http://letstalknutrition.com/holiday-weight-gain-separating-fact-from-fiction/
    2. http://www.hit107.com/news/feed/2016/11/study-reveals-the-time-of-year-child-obesity-rates-rise-the-most/
    3. http://acsh.org/news/2015/11/24/does-holiday-feasting-affect-obesity-rates
    4. http://freemusicarchive.org/music/Monk_Turner/Calendar/Monk_Turner_-_Calendar_-_11_Turkey_Time
    ]]>
    Holiday Weight Gain Studies Studies are very mixed on whether holiday eating causes weight gain. The Good1 In several studies over the last thirty-one years, subjects gained approximately 3/4 to 2 lbs. during the holiday season However, Holiday Weight Gain Studies Studies are very mixed on whether holiday eating causes weight gain. The Good1 In several studies over the last thirty-one years, subjects gained approximately 3/4 to 2 lbs. during the holiday season However, in one study participants felt they had gained 4x as much weight as they actually gained Two other […] Uncategorized – For the Love of Data full false 5:52
    009 For the Love of Algorithms – For the Love of Data https://www.fortheloveofdata.com/009-for-the-love-of-algorithms-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=009-for-the-love-of-algorithms-for-the-love-of-data Mon, 31 Oct 2016 03:30:11 +0000 http://www.fortheloveofdata.com/?p=169 Worst Pun Ever: Today, we are talking about Al… Al Gore… Al Gore Rhythms… Algorithms!

    Definition: a step-by-step procedure for solving a problem or accomplishing some end especially by a computer1

    Inputs:

    • Many algorithms use census data or FICO score as one of their prime inputs
    • Plus any custom information you give a website
    • Plus any information they glean about you from other sites (when you visit a site with a Facebook Share button, Facebook can track that you’re there17)
    • Websites are constantly looking at ways to break our anonymity (fingerprinting) so they can track us and serve us more relevant or lucrative ads5.

    Fun Stuff

    • Chess – algorithms are so good that humans haven’t been able to beat a 4- CPU  PC since about 20056
    • Rubik’s Cube – the machine record is 0.887 seconds vs. just over 5 seconds for a human7

    • Poker – scientists solved all moves for Heads Up Limit Hold ‘Em – 3.16 x 10^17 moves. You may be able to win individual games, but it is HIGHLY unlikely that you can win over time8

    Machine Learning

    • Can include intentional or unintentional bias.
    • @JonathonMorgan did a post on Medium and a podcast on Partially Derivative about using a machine learning model to find alt-right white supremacists on Twitter and track their degree of radicalization over time. He did this by training a model with their tweets and analyzing their usage of words like “Jewish” vs. more mainstream usage3,4.
    – From Medium / Jonathon Morgan’s Post

    Pricing2:

    • Amazon lists its results over competitors, even when higher including shipping for non-prime customers; however, it claims it’s algorithm is customer-centric
    • Princeton Review charges between $6,600 and $8,400 for its online course in some zip codes. It charged higher in zip codes with higher incomes and some with higher Asian populations.

    News/Search:

    • Link Analysis, how two entities relate to each other is used by Google’s PageRank, Facebook’s News Feed, and LinkedIn’s job/connection recommendations. It was developed in 1976 and first used by two other search indexes before Google began using it in 1998.13
    • However, algorithms cause sites to cater to information similar to your preexisting views, or for what they think you will find interesting, rather than presenting balanced, holistic content.14
      • Medium recommends articles based on how long it thinks you will read.
      • Some sites tailor related content, content types (video, etc.), and sharing buttons based on where you enter their site from.
      • These choices and filters can lead you into a content bubble that leads you down a path of more and more specific, and sometimes extreme, viewpoints.
    • Facebook uses hundreds of features, or input variables, when assigning a relevancy score to posts you see in your news feed.15
    • When you have a Facebook account and you visit a page that has a like or share button, Facebook can log your visit and use that to tailor content or ads when you visit their site.16 See here17 for a relatively up-to-date list of features used in Facebook’s newsfeed algorithm (time spent viewing, friend’s posts receive priority, likes/reactions, etc. are all key inputs).

    Serious Consequences

    • Some algorithms for car insurance weight FICO credit scores higher than drunk driving convictions.9,10,11
    • Cathy O’Neil calls them “Weapons of Math Destruction” (WMDs) if they are: widespread, secretive, and have the potential to do great harm9,10
    • Kronos, a small big data HR company hired by large firms to screen applicants employs a personality test as part of their screening of candidates. Some argue that this unfairly excludes them from jobs, with no explanation of the reason, in a manner that violates the American’s with Disabilities Act (ADA).9,10,12

    Theme Music: Algorithm of Desire by Measles Mumbs Rubella, courtesy of FreeMusicArchive.

    Sources:

    1. http://www.merriam-webster.com/dictionary/algorithm
    2. https://www.propublica.org/article/breaking-the-black-box-when-algorithms-decide-what-you-pay
    3. http://partiallyderivative.com/podcast/2016/09/27/s2e14-the-model-is-racist
    4. https://medium.com/@jonathonmorgan/the-radical-right-and-the-threat-of-violence-f66288ac8c4#.kssqef9jz
    5. http://fivethirtyeight.com/features/internet-tracking-has-moved-beyond-cookies/
    6. http://www.extremetech.com/extreme/196554-a-new-computer-chess-champion-is-crowned-and-the-continued-demise-of-human-grandmasters
    7. http://gizmodo.com/in-just-0-887-seconds-another-machine-has-already-shatt-1758009774
    8. http://bigthink.com/ideafeed/computer-scientists-create-unbeatable-poker-playing-computer
    9. http://fivethirtyeight.com/features/whos-accountable-when-an-algorithm-makes-a-bad-decision/
    10. https://weaponsofmathdestructionbook.com/
    11. https://www.wired.com/2016/10/big-data-algorithms-manipulating-us/
    12. https://www.theguardian.com/science/2016/sep/01/how-algorithms-rule-our-working-lives
    13. https://medium.com/@_marcos_otero/the-real-10-algorithms-that-dominate-our-world-e95fa9f16c04#.8mczwtxzt
    14. http://www.cjr.org/news_literacy/algorithms_filter_bubble.php
    15. http://www.slate.com/articles/technology/cover_story/2016/01/how_facebook_s_news_feed_algorithm_works.html
    16. https://www.technologyreview.com/s/541351/facebooks-like-buttons-will-soon-track-your-web-browsing-to-target-ads/
    17. https://blog.bufferapp.com/facebook-news-feed-algorithm
    ]]>
    Worst Pun Ever: Today, we are talking about Al… Al Gore… Al Gore Rhythms… Algorithms! Definition: a step-by-step procedure for solving a problem or accomplishing some end especially by a computer1 Inputs: Many algorithms use census data or FICO score a... Worst Pun Ever: Today, we are talking about Al… Al Gore… Al Gore Rhythms… Algorithms! Definition: a step-by-step procedure for solving a problem or accomplishing some end especially by a computer1 Inputs: Many algorithms use census data or FICO score as one of their prime inputs Plus any custom information you give a website Plus […] Uncategorized – For the Love of Data full false 35:59
    008 For the Love of Politics – For the Love of Data https://www.fortheloveofdata.com/008-for-the-love-of-politics-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=008-for-the-love-of-politics-for-the-love-of-data Wed, 28 Sep 2016 15:27:27 +0000 http://www.fortheloveofdata.com/?p=159 History of Data in Politics

    First off, 538’s podcast, What’s the Point did a great four part series on the data of politics that covered the history of politics from the late 1800s through the primaries. So please check out the above links for more context behind this. A brief history of data in politics shows the major ways candidates appealed to constituents progressed along this path:

    • Party Elite chose candidates.
    • Direct outreach – Candidates engage voters directly, including things like voting after parties offering booze to those who voted for a particular candidate.
    • TV – When TV came along, suddenly candidates could reach the majority of voters just by running ads on three networks.
    • Direct Mail – Politicians could use subscriber lists from certain magazines to target specific groups that might be interested in their policies.
    • Micro Targeting – This started around 2004 where data analysis identified target demographics to go after, advertise, and appeal to.
    • Individual Targeting – Howard Dean (2004) was a pioneer in this effort, coalescing state voter lists together and appending commercial data. This continued in 2008 with Barack Obama where they found that there was still a significant diversity in micro groups. The trend was refined in 2012 where campaigns used individual data to feed into, test, and refine their models.

    However, there are roots back to 1891 when James Clarkson, the RNC chairman, assembled a file that featured the “age, occupation, nativity, residence and all the other facts in each votersʼ life, and had them arranged alphabetically, so that literature could be sent constantly to every voter directly.”10

    The Obama campaign in 2008 and 2012 hired enormous amounts of staffers — 342 in the 2012 race alone in technology, digital data and analytics.

    History of Voting Trends by State7

    All Elections since 1876 (the year Texas A&M was founded, whoop!)

    e008-maps-by-year

    Most Democratic (1932)

    e008-map-democrat

    Most Republican (1972)

    e008-map-republican

    Voter Turnout Rates

    In voter turnout data by country since 2000, the US ranks #159 out of #196 with just over 55% average voter turnout. We can and should do better.

    Rank Average of Voter Turn­out (%)# of Data PointsMin YearMax YearCountry
    199.8320022011Lao People's Dem. Republic
    299.3420022016Viet Nam
    397.9320032013Rwanda
    496.5120042004Equatorial Guinea
    595.1320032013Cuba
    694.1520012013Australia
    794320032013Malta
    893.8420012015Singapore
    991.3320042013Luxembourg
    1091.2320022008Faroe Islands
    1191320022012Bahamas
    1290.3420032014Belgium
    1389.9420002015Tajikistan
    1489.8420002015Ethiopia
    1589.8520002016Nauru
    1689.7320042014Uruguay
    1787.4320042013Turkmenistan
    1887.2320042014Antigua and Barbuda
    1987.1320042014Uzbekistan
    2086.4520012015Denmark
    2185.1320052013Aruba
    2284.6420022014Bolivia
    2384.5420032013Iceland
    2484.4420012013Liechtenstein
    2584.1420022015Turkey
    2683.5520002016Peru
    2783.3220012007Timor-Leste
    2883.1420022014Sweden
    2982.4420042014Tunisia
    3081.9320062014Cook Islands
    3181.6320042014Guinea-Bissau
    3281.6320022011Seychelles
    3381.5420012016Cyprus
    3480.2420012013Italy
    3580.1220052013Cayman Islands
    3680120022002Tuvalu
    3779.4320022012Sierra Leone
    3879.2320042014Botswana
    3979.1420022013Austria
    4078.6420022014Brazil
    4178.6420002014Mauritius
    4278.4220042014Namibia
    4378.2320042013Malaysia
    4477.9420012013Chile
    4577.9520022012Netherlands
    4677.8220012016Samoa
    4777.7120062006Palestinian Territory, Occupied
    4877.6520022014New Zealand
    4977320032013Monaco
    5076.9120122012Papua New Guinea
    5176.9420012013Norway
    5276.7320042014Indonesia
    5376.6220112015Gibraltar
    5476.6320012014Fiji
    5576.4420012015Guyana
    5676320052014Maldives
    5775.8320042014South Africa
    5875.6420032015Belize
    5975.6320032013Cambodia
    6075.6720012015Argentina
    6175.5420002012Belarus
    6275.5520002016Mongolia
    6375.4520012015Andorra
    6475.2320032013Grenada
    6575.1220082012Angola
    6675120032003Yemen
    6774.8520012013Philippines
    6874.8420022013Germany
    6974.1220052011Liberia
    7074420002012Ghana
    7173.9320022010Sao Tome and Principe
    7273.8320042014Panama
    7373.8320032012Bermuda
    7473.6320012011Nicaragua
    7573.5220102015Myanmar
    7673.5420002015Anguilla
    7773.3520002015Sri Lanka
    7872.8320022013Togo
    7972.7320052015Burundi
    8072.5420012014Montserrat
    8171.9620002016Spain
    8271.4120152015Comoros
    8370.9420022013Ecuador
    8470.7320022013Kenya
    8570.5320012014Bangladesh
    8670.5620002015Greece
    8769.7420002015Saint Kitts and Nevis
    8869.6320062012Montenegro
    8969.6420032015Virgin Islands, British
    9069.5420012012San Marino
    9169.4520022016Kazakhstan
    9268.4620012014Thailand
    9368.2320022013Cameroon
    9467.9420022014Costa Rica
    9567.8220022013Guinea
    9667.5120072007Kiribati
    9767.5320052014Iraq
    9867.2420012015Saint Vincent and The Grenadines
    9966.9220052011Central African Republic
    10066.9620002015Trinidad and Tobago
    10166.8320072013Bhutan
    10266.5420032015Finland
    10366.4620012015Israel
    10466.3420012016Uganda
    10566.2420052014Tonga
    10666.1420022016Ireland
    10766.1420022014Hungary
    10865.9320012013Mauritania
    10965.9320032013Paraguay
    11065.6420002015Suriname
    11165.3420012014Solomon Islands
    11265.1320072015Oman
    11365520012016Taiwan
    11464.7220062011Congo, Democratic Republic of
    11564.4520022016Vanuatu
    11664.4520062013Kuwait
    11764.3320012011Zambia
    11864.2420022015Burkina Faso
    11963.9420032015Guatemala
    12063.5320022010Netherlands Antilles
    12163.4320032013Djibouti
    12263.3120082008Nepal
    12363.2420012015United Kingdom
    12463520022014Latvia
    12563520022014Macedonia, former Yugoslav Republic (1993-)
    12662.6620002015Canada
    12762.6420012016Cape Verde
    12862.6520002015Croatia
    12962.5520012014Moldova, Republic of
    13062.4520002015Kyrgyzstan
    13162.3520002014Slovenia
    13262420032015Estonia
    13361.8320032013Barbados
    13461.7520022014Ukraine
    13561.6420022014Bahrain
    13661.5620002014Japan
    13761.5220002003Yugoslavia, FR/Union of Serbia and Montenegro
    13861.4320042014Malawi
    13961.1420002015Tanzania, United Republic of
    14061.1420022013Czech Republic
    14160.9320042014India
    14260.5520022016Slovakia
    14360.1520022015Portugal
    14459.8320032011Russian Federation
    14559.3320032012Armenia
    14659.3220022013Madagascar
    14759.3420032012Georgia
    14859.2220102015Sudan
    14959.2420032015Benin
    15058.6320022012France
    15157.9420022016Dominican Republic
    15257.9320082016Iran, Islamic Republic of
    15357.8520072016Serbia
    15457.6420002014Dominica
    15557.3520012014Bulgaria
    15657.1420032016Syrian Arab Republic
    15757520002014Bosnia and Herzegovina
    15855.9420012013Honduras
    15955.7820002014United States
    16055.5420002015Venezuela
    16155.3420032013Jordan
    16255.1520002016Korea, Republic of
    16355.1420022016Jamaica
    16454.8320002012Palau
    16554.5220022011Chad
    16653.4320042016Niger
    16753.1420022015Lesotho
    16852.1620002015Mexico
    16951.9420012013Albania
    17051.7220122014Libya
    17151.4420002012Lithuania
    17251.2420002012Romania
    17350.8520002015Azerbaijan
    17448.9320012011Saint Lucia
    17548.6220072013Micronesia, Federated States of
    17648.5320002009Lebanon
    17748.1520012015Poland
    17848220072015Marshall Islands
    17947.8420032015Switzerland
    18047.6220052010Afghanistan
    18146.7320022013Pakistan
    18246.2320012012Senegal
    18345.6320002008Zimbabwe
    18445.3420042014Kosovo
    18544.7320022011Morocco
    18643.7520002015El Salvador
    18743.2320042014Mozambique
    18842.6420022014Colombia
    18941.6320022012Algeria
    19040.5320032015Nigeria
    19139.2320022012Gambia
    19236.5420052015Egypt
    19334.3120112011Gabon
    19434.1220002011Côte d'Ivoire
    19532.2420002015Haiti
    19631.8320022013Mali

    e008-voter-turnout-by-year

    e008-cps-age

    e008-cps-educ

    e008-cps-race

    e008-electorate-demo-race

    • 30+ vote much at much higher rates than younger voters.
    • The more educated you are, the more likely you are to vote.
    • Most commonly black or white; Hispanics are the lowest consistently since 1984.
    • White share has been declining, but is still an overwhelming 77% of the vote8.

    Sources:

    1. http://fivethirtyeight.com/features/a-history-of-data-in-american-politics-part-1-william-jennings-bryan-to-barack-obama/
    2. http://fivethirtyeight.com/features/a-history-of-data-in-american-politics-part-3-the-2016-primaries/
    3. http://www.fec.gov/pubrec/fe2012/federalelections2012.shtml
    4. http://www.electproject.org/home/voter-turnout/voter-turnout-data
    5. http://www.census.gov/library/visualizations/2016/comm/electorate-profiles/cb16-tps25_voting_texas.html
    6. http://www.presidency.ucsb.edu/elections.php
    7. http://www.fairvote.org/voter_turnout#voter_turnout_101
    8. http://www.idea.int/vt/viewdata.cfm
    9. http://www.electproject.org/home/voter-turnout/demographics
    10. http://41lscp16wiqd3klpnn1z0t2a.wpengine.netdna-cdn.com/wp-content/uploads/2015/05/Kreiss_NiemanFinal.pdf
    11. http://nationbuilder.com/voterfile
    ]]>
    History of Data in Politics First off, 538’s podcast, What’s the Point did a great four part series on the data of politics that covered the history of politics from the late 1800s through the primaries. So please check out the above links for more con... History of Data in Politics First off, 538’s podcast, What’s the Point did a great four part series on the data of politics that covered the history of politics from the late 1800s through the primaries. So please check out the above links for more context behind this. A brief history of data in politics […] Uncategorized – For the Love of Data full false
    007 For the Love of Olympics – For the Love of Data https://www.fortheloveofdata.com/007-for-the-love-of-olympics-for-the-love-of-data/?utm_source=rss&utm_medium=rss&utm_campaign=007-for-the-love-of-olympics-for-the-love-of-data Thu, 25 Aug 2016 07:30:09 +0000 http://www.fortheloveofdata.com/?p=110

    Fun Fact: The main riff in NBC’s Olympic them is from Bugler’s Dream (1958) by  Leo Arnaud.


    History

    • Most believe games started in 776 BC as part of a religious festival in Greece to honor Zeus; however, some evidence suggests it could have started as early as the 10th century BC
    • The stadion race was the first event, a 600 foot race. This may have been the only event for the first 13 Olympics
    • They occurred every four years for twelve centuries, until 396 AD; then there was a break in games until 1896

    How to Qualify for the Olympics

    • Individual:
      • For each gender, up to three people per country can attend if they meet the entry standard
      • For each gender, one person per country can attend if no one meets the standard
    • Team: Each country may send one team that meets the entry standard
    • Slightly more complicated criteria for relays and marathon – generally involving your finish in various qualifying events

    Fun Fact: The marathon was not added until 1896 in Athens and was standardized at 26.2 miles in the 1908 London games because that was the distance between Windsor Castle and White City Stadium.


    Cost of the Games

    Many people feel the Olympics are a terrible investment for the host country. Rio’s estimated cost was $3bn, but it is projected to be at least 50% over budget at approximately $4.6bn.

    MetricValue
    BRL / month1972
    BRD to USD0.31
    USD / month611.32
    USD / year7335.84
    Estimated cost of games4600000000
    Cost in # of yearly salaries627058.39
    Population209567920
    Cost in % of population0.002992

    Sochi is the most expensive so far, but Summer games are typically more expensive than Winter.

    Cost line graph Olympic games

    Cost table for historic games


    Fun Fact: The first Winter games were in 1924 (Chamonix).


    Who are the Athletes?

    Country Rankings

    CountryPopulation AthletesRank by # of Athletes
    United States3241187875631
    Brazil2095679204832
    Germany806823514403
    Australia243093304284
    France646681294085
    United Kingdom (Great Britain)651111433726
    China13823233323527
    Canada362863783208
    Japan1263237153129
    Spain4606460431210
    CountryPopulation AthletesRank by Population
    China13823233323521
    India13268015761232
    United States3241187875633
    Indonesia260581100284
    Brazil2095679204835
    Pakistan19282650276
    Nigeria186987563777
    Bangladesh16291086478
    Russian Federation1434398322839
    Mexico12863200412510
    CountryPopulation AthletesRank Per Capita
    Republic of the Cook Islands*2094891
    Palau2150152
    Nauru1026323
    San Marino3195054
    British Virgin Islands*3065945
    Bermuda*6166286
    Saint Kitts and Nevis5618377
    Seychelles97026108
    Tuvalu994319
    Antigua and Barbuda92738910

    Fun Fact: The flame started at the 1928 Amsterdam games.


    Gender & Age Breakdown

    Men vs. Women Pie Chart

    Gender by Age bar chart

    Oldest / Youngest by Avg. Age

    Who are the Oldest and Youngest of All Time?

    CategoryMaleFemale
    Oldest CompetitorOscar Swahn (Sweden)
    Age 72
    1920, Shooting
    Lorna Johnstone (UK)
    Age 70
    1972, Equestrian
    Oldest Gold MedalistOscar Swahn (Sweden)
    Age 64
    1912, Shooting
    Lida "Eliza" Pollock (USA)
    Age 63
    1904, Team Archery (Bronze)
    Oldest MedalistOscar Swahn (Sweden)
    Age 72
    1920, Shooting (Silver)
    Lida "Eliza" Pollock (USA)
    Age 63
    1904, Archery (Bronze)
    Youngest Gold MedalistKlaus Zerta (Germany)
    Age 13
    1960, Rowing
    Donna Elizabeth de Varona (USA)
    Age 13
    1960, Swimming - Team
    Youngest MedalistDimitrios Loundras (Greece)
    Age 10
    1896, Gymnastics - Team (Bronze)
    Luigina Giavotti (Italy)
    Age 11
    1928, Gymnastics - Team (Silver)

    Fun Fact: Boxing and wrestling were added in 708 BC and 688 BC respectively.


    A Look at the Medals

    Summer Medal Values & Rewards

    • Gold: $600 (The gold medal consist of just 1% of actual gold, 92.5% silver and 6.16% copper).
    • Silver: $325 (While in silver medal, the gold is replaced by more copper, the rest of the material is the same like gold medal)
    • Bronze: $3 (Bronze medal however is 97% copper and 2.5% zinc and 0.5% tin)

    CNN bar graph - who pays for the gold

    Who are the Big Winners at Rio 2016?

    Total Medals

    • Italy and Canada had a strong showing in total medals, but fell off in gold medals
    • Top 10 controlled almost 60% of total medals
    CountryTotal Medals % of Total MedalsRunning TotalRank
    United States1210.124229979466120.124229979466121
    China700.0718685831622180.196098562628342
    United Kingdom (Great Britain)670.0687885010266940.264887063655033
    Russian Federation560.0574948665297740.32238193018484
    Germany420.0431211498973310.365503080082145
    France420.0431211498973310.408624229979475
    Japan410.0420944558521560.450718685831626
    Australia290.0297741273100620.480492813141687
    Italy280.0287474332648870.509240246406578
    Canada220.022587268993840.531827515400419
    Korea, South210.0215605749486650.5533880903490810

    Fun Fact: If Texas were a country, it would rank 8th for # of medals in the 2016 Summer Olympics.



    Total Gold

    • Brazil and Argentina won many golds, but few others
    • Top 10 controlled 70% of total golds
    CountryGold % of Gold MedalsRunning TotalRank
    United States460.149837133550490.149837133550491
    United Kingdom (Great Britain)270.0879478827361560.237785016286642
    China260.0846905537459280.322475570032573
    Russian Federation190.0618892508143320.384364820846914
    Germany170.0553745928338760.439739413680785
    Japan120.0390879478827360.478827361563526
    France100.032573289902280.51140065146587
    Korea, South90.0293159609120520.540716612377858
    Netherlands80.0260586319218240.566775244299679
    Australia80.0260586319218240.59283387622159
    Hungary80.0260586319218240.618892508143329
    Italy80.0260586319218240.644951140065159
    Brazil70.0228013029315960.6677524429967410
    Spain70.0228013029315960.6905537459283410

    Percentage of Medal Type by Country

    • Six countries won nothing but Gold
    • Fiji and Argentina dominated in Golds as a % of total medals
    CountryGold Silver Bronze Total Medals
    Puerto Rico*1001
    Singapore1001
    Tajikistan1001
    Kosovo1001
    Jordan1001
    Fiji1001
    Argentina0.750.2504
    Jamaica0.545454545454550.272727272727270.1818181818181811
    Hungary0.533333333333330.20.2666666666666715
    Croatia0.50.30.210
    Greece0.50.166666666666670.333333333333336
    Slovakia0.50.504
    Bahrain0.50.502
    Vietnam0.50.502
    Independent Olympic Athletes0.500.52
    Cote d'Ivoire0.500.52
    The Bahamas0.500.52

    Fun Fact: Swimming was added as an event in 1896 (freestyle); backstroke was added in 1904.



    Michael Phelps

    • 32nd among 205 currently competing countries as far as most medals won
    • 28 total medals – 23 gold, 3 silver, 2 bronze
    • 13 individual medals puts him ahead of Leonidas of Rhodes – sprinter form 152BC
    • 50 miles swam per week in prep for 2008 Olympics; 12,000 calories consumed each day
    • If Katie Ledecky maintained her current medal pace, she’d be 39 before she tied Phelps
    • He hasn’t won bronze since 2004

     

    Popularity of Events

    Swimming, Track and Field, Gymnastics, and Soccer are the most popular sports for people to watch. 538 did an interesting comparison in the 2012 Olympics to come up with a medal multiplier based on number of events vs. number of viewers. The US, China, and Russia dominate on an adjusted medal count.

     

    See the chart below (again based on London 2012). Sailing, for instance, has a lot of events but not much viewership, so it gets a reduction. Soccer, however, has only a few events but a large amount of viewers, so it’s multiplier is very high.

    538 Medal Multiplier

    Growth – Interest in the Olympics, number of events, number of competitors, and costs are all going up. On a per capita basis, it hasn’t been this hard to win a medal since 1896.

    E007-538-gold-per-capita

    Sources

    1. http://www.penn.museum/sites/olympics/olympicorigins.shtml
    2. http://www.npr.org/sections/thetorch/2016/08/11/487838010/what-team-usa-looks-like-a-by-the-numbers-look-at-america-s-olympic-athletes?utm_medium=RSS&utm_campaign=news
    3. https://github.com/flother/rio2016
    4. http://www.npr.org/sections/thetorch/2016/08/14/489832779/if-michael-phelps-were-a-country-where-would-his-gold-medal-tally-rank
    5. http://www.foxsports.com/olympics/gallery/28-incredible-facts-about-michael-phelps-28-olympic-medals-23-golds-count-how-many-081316
    6. http://www.worldometers.info/world-population/population-by-country/http://olympstats.com/
    7. http://olympstats.com/
    8. http://www.topendsports.com/events/summer/oldest-youngest.htm
    9. http://fivethirtyeight.com/features/winning-an-olympic-gold-medal-hasnt-been-this-difficult-since-1896/
    10. http://fivethirtyeight.com/features/which-countries-medal-in-the-sports-that-people-care-about/
    11. http://fivethirtyeight.com/features/hosting-the-olympics-is-a-terrible-investment/
    12. https://arxiv.org/ftp/arxiv/papers/1607/1607.04484.pdf
    13. http://www.chron.com/olympics/article/Where-Texas-would-rank-in-Olympic-medal-count-if-9176024.php
    14. http://www.tradingeconomics.com/brazil/wages
    15. https://www.google.com/?ion=1&espv=2#q=brl%20to%20usd
    16. http://www.totalsportek.com/news/olympic-gold-medal-prize-money/
    17. http://edition.cnn.com/2016/08/19/sport/olympic-rewards-by-country/
    18. https://www.olympic.org/swimming-equipment-and-history
    19. https://en.wikipedia.org/wiki/Athletics_at_the_2016_Summer_Olympics_%E2%80%93_Qualification#Qualifying_standards
    ]]>
    Fun Fact: The main riff in NBC’s Olympic them is from Bugler’s Dream (1958) by  Leo Arnaud. History Most believe games started in 776 BC as part of a religious festival in Greece to honor Zeus; however, some evidence suggests it could have started as e... Fun Fact: The main riff in NBC’s Olympic them is from Bugler’s Dream (1958) by  Leo Arnaud. History Most believe games started in 776 BC as part of a religious festival in Greece to honor Zeus; however, some evidence suggests it could have started as early as the 10th century BC The stadion race was the […] Uncategorized – For the Love of Data full false 26:15
    004 The History of Hadoop – For the Love of Data https://www.fortheloveofdata.com/004-history-of-hadoop/?utm_source=rss&utm_medium=rss&utm_campaign=004-history-of-hadoop Wed, 25 May 2016 03:17:54 +0000 http://www.fortheloveofdata.com/?p=59 Let me set the stage for you…

    It’s 2003: Chicago just won the Oscar for Best Picture and Grand Theft Auto: Vice City is the top selling video game. Apple iPods still have scroll wheels and iTunes just started selling music for the first time. From a tech standpoint, Windows XP is all the rage as the latest Windows OS and folks with a lot of money to spend are buying PCs with a Pentium 4 3.0 GHz processor, 512MB of RAM (or maybe up to 2GB max), and an 80GB hard drive.  Oracle just released version 10g and Microsoft proponents are still using SQL Server 2000. Internet Explorer 6 dominants the browser wars with about 85% market share and two-thirds of the US still connect to the internet with a modem.

    (Stats from various Google searches, CNET desktop reviews, and http://www.internetworldstats.com/articles/art030.htm)

    In the years leading up to Hadoop’s inception, Doug Cutting, the first node in the Hadoop cluster, had been working on Lucene, a full text search libary, and then began work on indexing web pages with University of Washington graduate student Mike Cafarella. The project was called Apache Nutch, and it was a sub-project of Lucene. They made good progress getting Nutch to work on a single machine, but they reached the processing limits of that one machine and began manually clustering four machines together. The duo started to spend the majority of their time figuring out a way to scale the infrastructure layer for better indexing. In October 2003, Google released their Google File System paper. This paper did not describe exactly what Google did to implement their solution, but it was an excellent blueprint for what Cutting and Cafarella wanted to do. They spent most of the next year (2004) working on their implementation and labeled it the Nutch Distributed File System (NDFS). In this implementation, they made a key decision to replicate each chunk of data on multiple nodes, typically three, for redundancy.

     After solving for infrastructure redundancy, the team set their sights on improving the computational side and taking advantage of the stable fabric of nodes. Google again provided a spark of inspiration with their MapReduce research paper. The approach provided parallelization, distribution, and fault tolerance; all of these work in conjunction to work through tasks quickly, regardless of hardware failures that might occur along the way.

    In 2006, Cutting went to work for Yahoo, and the storage and compute components of Lucene separated into a sub-project called Hadoop. The name originated from a toy yellow elephant that belonged to Cutting’s son. In April Hadoop 0.1.0 was released and it sorted almost 2TB of data in 48 hours. By April of 2007 Yahoo was running two Hadoop clusters of 1,000 machines and other companies like Facebook and LinkedIn start to use the tool.

    By 2008, Hadoop hit critical mass along several fronts. Yahoo transitioned the search index that drove their website over to Hadoop and contributed Pig to the Apache Software Foundation. Facebook also contributed Hive, bringing SQL atop Hadoop. The product also spawned commercial legs when Cloudera was founded; Cutting joined their ranks the following year.

    In 2011 Hortonworks spun off from Yahoo, and the following year Yahoo’s Hadoop cluster reached 42,000 nodes. Also in 2012, Hadoop contributors began to replace MapReduce with YARN, an offshoot of MapReduce’s resource management and scheduling components. Late in the year Apache Hadoop 1.0 becomes generally available. In 2013, Yahoo begins YARN in production and Hadoop 2.2 debuts.

    Fast forward to today and several vast ecosystems exist around Hadoop in among different prepackaged distributions. The most popular of these are Cloudera, Hortonworks, and MapR. Below is a snapshot of Hortonworks and Cloudera’s packaged components:

    Hortonworks:
    hortonworks
    Cloudera:
     cloudera
    Sources:
    ]]>
    Let me set the stage for you… It’s 2003: Chicago just won the Oscar for Best Picture and Grand Theft Auto: Vice City is the top selling video game. Apple iPods still have scroll wheels and iTunes just started selling music for the first time. Let me set the stage for you… It’s 2003: Chicago just won the Oscar for Best Picture and Grand Theft Auto: Vice City is the top selling video game. Apple iPods still have scroll wheels and iTunes just started selling music for the first time. From a tech standpoint, Windows XP is all the rage […] Uncategorized – For the Love of Data full false 7:38
    003 The Data of Taxes – For the Love of Data https://www.fortheloveofdata.com/003-the-data-of-taxes/?utm_source=rss&utm_medium=rss&utm_campaign=003-the-data-of-taxes Thu, 31 Mar 2016 22:06:14 +0000 http://www.fortheloveofdata.com/?p=49 Huge thanks to @Deepak90Mittal for hanging out with me on this episode!

    News Prologue:

    1. Gartner’s Magic Quadrant for BI is out – overhauled methodology, Oracle is out; Tableau and Microsoft (PowerBI) reign supreme!
    2. SQL Server on Linux! = millions of geeks rejoice and it may spell the end of Windows in the data center.
    3. Excel is the most popular DataViz tool by a longshot, followed by Python, D3, and Tableau

    A) 2016 State Comparison:

    1. What you should drink Where – comparison of per gallon taxes on beer, wine and spirits converted to a per drink equivalency
      1. Beer – Missouri
      2. Wine – Lousiana
      3. Liquor – Missouri
      4. Best overall: Missouri, Wisconsin, California, Texas

    2. Tax Freedom Day (a.k.a., Working for the Man) –
      Interesting way to look at taxes – how long you have to work to cover federal, state, and local taxes for the year.
      E003-TaxFreedomDay
    3. Gas Prices – Texas is pretty low (#42) on gas taxes! PA is #1; NY is #3; CA is #5

    Gas tax rates 2016: http://taxfoundation.org/blog/state-gasoline-tax-rates-2016
    E003-GasTaxMap

    B) Federal Income Tax Stats

    A rough calculation of the rate at which individual tax returns are filed within the US:

    Start of Year:   1/1/2016
    Filing Date: 4/18/2016
    Days Elapsed: 108
    Total Est. Returns (using 2013 #): 138,313,155
    Total filed per day*:                   1,280,677
    Total filed per hour*:                         53,362
    Total filed per minute*: 889
    Total filed per second*: 15
    * All calculations rounded to nearest whole number

    Key Findings from the report (mostly using data from 2013):

    • In 2012, the top 50 percent of all taxpayers (69.2 million filers) paid 97.2 percent of all income taxes while the bottom 50 percent paid the remaining 2.8 percent.
    • The top 1 percent (1.3 million filers) paid a greater share of income taxes (37.8 percent) than the bottom 90 percent (124.5 million filers) combined (30.2 percent).
    • The top 1 percent of taxpayers paid a higher effective income tax rate than any other group, at 27.1 percent, which is over 8 times higher than taxpayers in the bottom 50 percent (3.3 percent).
    ]]>
    Huge thanks to @Deepak90Mittal for hanging out with me on this episode! News Prologue: Gartner’s Magic Quadrant for BI is out – overhauled methodology, Oracle is out; Tableau and Microsoft (PowerBI) reign supreme! https://www.gartner.com/doc/reprints? Huge thanks to @Deepak90Mittal for hanging out with me on this episode! News Prologue: Gartner’s Magic Quadrant for BI is out – overhauled methodology, Oracle is out; Tableau and Microsoft (PowerBI) reign supreme! https://www.gartner.com/doc/reprints?id=1-2XXET8P&ct=160204&st=sb) SQL Server on Linux! = millions of geeks rejoice and it may spell the end of Windows in the data center. […] Uncategorized – For the Love of Data full false 29:30