  1. Apache Any23 (Retired)
  2. ANY23-389

RDFa extraction breaks when base element uses relative href

Details

    • Type: Bug
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 2.3
    • Fix Version/s: 2.3
    • Component/s: extractors
    • Labels: None

    Description

      I noticed that when extracting from HTML such as this:

      <html prefix="og: http://ogp.me/ns#">
      <head>
          <base href="">
          <link rel="icon" type="image/x-icon" href="https://static1.squarespace.com/static/55085720e4b0813599644fae/t/56291c91e4b0377cf53e5981/favicon.ico"/>
          <meta property="og:site_name" content="36°N"/>
          <meta property="og:title" content="36°N Friends &amp; Family Night"/>
          <meta property="og:latitude" content="36.1604966"/>
          <meta property="og:longitude" content="-95.9889172"/>
          <meta property="og:street-address" content="201 North Elgin Avenue"/>
          <meta property="og:locality" content="Tulsa"/>
          <meta property="og:region" content="OK"/>
          <meta property="og:postal-code" content="74120"/>
          <meta property="og:country-name" content="United States"/>
          <meta property="og:url" content="https://www.36degreesnorth.co/events/2018/8/2/36n-friends-family-night"/>
          <meta property="og:type" content="website"/>
          <meta property="og:description" content="Hey 36°N Members! Grab your family or a close friend, and join us for a fun night at the ballpark. We reserved the Coors Light Refinery Deck at ONEOK Field, so we can all hang out, enjoy a buffet and watch the game in the shade.  Dinner starts at 6:30. Game starts at 7:00.  $5/person. $20/family (co"/>
          <meta property="og:image" content="http://static1.squarespace.com/static/55085720e4b0813599644fae/5768549715d5db9b150af935/5a62695653450a1e55940197/1528903903136/DRILLERS+FAMILY+NIGHT-+square.png?format=1000w"/>
          <meta property="og:image:width" content="800"/>
          <meta property="og:image:height" content="800"/>
      </head><body></body>
      </html>
      

      none of the expected rdfa11 triples are extracted (neither the og properties nor the icon property), apparently because the underlying rdfa11 parser requires an absolute base href and does not accept a relative one.
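
      As an illustration of the behaviour one would expect, here is a minimal sketch of resolving a possibly relative or empty base href against the document URL before handing it to a parser. This is not the actual Any23 fix; the class BaseHrefResolver, the method effectiveBase, and the example.org URL are hypothetical names chosen for the example. An empty href is handled explicitly and treated as "use the document URL", since java.net.URI.resolve does not return the base unchanged for an empty reference.

```java
import java.net.URI;

// Hypothetical helper, not part of Any23: computes an absolute base IRI
// from the document URL and the (possibly relative or empty) <base href>.
final class BaseHrefResolver {

    static URI effectiveBase(String documentUrl, String baseHref) {
        URI doc = URI.create(documentUrl);
        // Assumption: a missing or empty href means "use the document URL".
        if (baseHref == null || baseHref.trim().isEmpty()) {
            return doc;
        }
        URI href = URI.create(baseHref.trim());
        // An absolute href (one with a scheme) wins; otherwise resolve it
        // against the document URL to obtain an absolute base.
        return href.isAbsolute() ? href : doc.resolve(href);
    }

    public static void main(String[] args) {
        String doc = "https://example.org/events/2018/night"; // hypothetical URL
        System.out.println(effectiveBase(doc, ""));         // https://example.org/events/2018/night
        System.out.println(effectiveBase(doc, "/static/")); // https://example.org/static/
        System.out.println(effectiveBase(doc, "img/"));     // https://example.org/events/2018/img/
        System.out.println(effectiveBase(doc, "http://ogp.me/ns")); // http://ogp.me/ns
    }
}
```

      With a base computed this way, the parser always receives an absolute IRI, whatever form the page author used in the base element.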

          People

            Assignee: Hans Brende (hansbrende)
            Reporter: Hans Brende (hansbrende)
            Votes: 0
            Watchers: 2