NAME
    Web::PageMeta - get page open-graph / meta data

SYNOPSIS
        use Web::PageMeta;
        my $page = Web::PageMeta->new(url => "https://www.apa.at/");
        say $page->title;
        say $page->image;

    async fetch previews and images:

        use Web::PageMeta;
        my @urls = qw(
            https://www.apa.at/
            http://www.diepresse.at/
            https://metacpan.org/
            https://github.com/
        );
        my @page_views = map { Web::PageMeta->new( url => $_ ) } @urls;
        Future->wait_all( map { $_->fetch_image_data_ft, } @page_views )->get;
        foreach my $pv (@page_views) {
            say 'title> '.$pv->title;
            say 'img_size> '.length($pv->image_data);
        }

        # alternatively instead of Future->wait_all()
        use Future::Utils qw( fmap_void );
        fmap_void(
            sub { return $_[0]->fetch_image_data_ft },
            foreach    => [@page_views],
            concurrent => 3
        )->get;

DESCRIPTION
    Get (not only) open-graph web page meta data. Can be used in both
    normal and async code.

    If a download returns any HTTP status code other than 200, an
    HTTP::Exception is thrown.

ACCESSORS
  new
    Constructor, only "url" is required.

  url
    HTTP url to fetch data from.

  timeout
    In addition to the AnyEvent::HTTP timeout, the elapsed time is also
    checked while data is being downloaded, and the request dies once the
    limit is exceeded. Default is 5 minutes.

  max_size
    Dies when the document or image size is greater than this limit.
    Default is 100MB.

  user_agent
    User-Agent header to use for http requests. Default is the one from
    Chrome 89.0.4389.90.

  extra_headers
    HashRef with extra http request headers.

  cookie_jar
    Accepts an optional HTTP::Cookies compatible object that must provide
    a get_cookies() method. If set, http cookie headers are sent with each
    request.

  title
    Returns title of the page.

  description
    Returns description of the page.

  image
    Returns image location of the page.

  image_data
    Returns the binary image data of the "image" link. Throws a 404
    exception if there is no "image" link.

  page_meta
    Returns hash ref with all open-graph data.

  extra_scraper
    Web::Scraper::LibXML object to fetch image, title or description from
    a location other than the default.

        use Web::Scraper::LibXML;
        use Web::PageMeta;
        my $escraper = scraper {
            process_first '.slider .camera_wrap div', 'image' => '@data-src';
        };
        my $wmeta = Web::PageMeta->new(
            url           => 'https://www.meon.eu/',
            extra_scraper => $escraper,
        );

  page_body_hdr
    Returns array ref with page [$body,$headers]. Can be useful for
    post-processing or special/additional data extractions. Only the
    text/html content-type is accepted for fetching.

  fetch_page_meta_ft
    Returns a Future object for fetching page meta data. See "ASYNC USE".
    When done, the "page_meta" hash ref is returned.

  fetch_image_data_ft
    Returns a Future object for fetching image data. See "ASYNC USE".
    When done, the "image_data" scalar is returned.

  fetch_page_body_hdr_ft
    Returns a Future object for fetching page content and headers. See
    "ASYNC USE". When done, the "page_body_hdr" array ref is returned.

ASYNC USE
    To run multiple page meta data or image http requests in parallel, or
    to use the module in async programs, use "fetch_page_meta_ft" and
    "fetch_image_data_ft", which return Future objects. See "SYNOPSIS" or
    t/02_async.t for sample use.

SEE ALSO
    https://ogp.me/

AUTHOR
    Jozef Kutej, <jkutej at cpan.org>

LICENSE AND COPYRIGHT
    Copyright 2021 jkutej@cpan.org

    This program is free software; you can redistribute it and/or modify it
    under the terms of either: the GNU General Public License as published
    by the Free Software Foundation; or the Artistic License.

    See http://dev.perl.org/licenses/ for more information.