Age | Commit message (Collapse) | Author |
|
Give this condition a more descriptive name.
|
|
For now I'm going to allow URLs to be printed out of their apparent
visual order. Change the test so that it passes.
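One way a test could tolerate out-of-order results is to sort both lists before comparing them. The sketch below is only an illustration of that idea; the helper names `sorted` and `same_urls_ignoring_order` are hypothetical and are not taken from the repository, and the actual change made to the test is not shown in this log.

```rust
/// Return an owned, sorted copy of `urls` (hypothetical helper).
fn sorted(urls: &[&str]) -> Vec<String> {
    let mut copy: Vec<String> = urls.iter().map(|u| u.to_string()).collect();
    copy.sort();
    copy
}

/// Compare two URL lists while ignoring the order of their entries.
/// Duplicates still count: a list with a repeated URL is not equal to
/// a list that contains that URL only once.
fn same_urls_ignoring_order(left: &[&str], right: &[&str]) -> bool {
    sorted(left) == sorted(right)
}
```

Because duplicates are still significant here, a link that is split into two link areas would still need to be listed twice in the expected value.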
|
|
Add a test with a simple text-only PDF with three URLs.
Currently I'm getting the following failure, so evidently the extraction
order is not necessarily the same as the visual order, and a multi-line
hyperlink can be encoded as two link areas:
---- tests::get_urls_from_pdf_extracts_urls_from_pdf stdout ----
thread 'tests::get_urls_from_pdf_extracts_urls_from_pdf' panicked at 'assertion failed: `(left == right)`
left: `["http://www.gutenberg.org/ebooks/11", "https://ia800908.us.archive.org/6/items/alicesadventures19033gut/19033-h/images/i002.jpg", "https://science.nasa.gov/news-article/black-hole-image-makes-history"]`,
right: `["http://www.gutenberg.org/ebooks/11", "https://science.nasa.gov/news-article/black-hole-image-makes-history", "https://ia800908.us.archive.org/6/items/alicesadventures19033gut/19033-h/images/i002.jpg", "https://ia800908.us.archive.org/6/items/alicesadventures19033gut/19033-h/images/i002.jpg"]`', src/lib.rs:65:9
|
|
Facilitate testing by returning a vec of URLs instead of printing them
directly to STDOUT.
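A minimal sketch of that separation, assuming a simplified input of plain strings rather than real PDF objects. The names `collect_urls` and `print_urls` are hypothetical; the point is only that the extraction step returns a `Vec<String>` that a test can inspect, while printing becomes the caller's job.

```rust
use std::io::{self, Write};

/// Hypothetical extraction step: keep entries that look like web URLs
/// and return them instead of printing them.
fn collect_urls(candidates: &[&str]) -> Vec<String> {
    candidates
        .iter()
        .filter(|c| c.starts_with("http://") || c.starts_with("https://"))
        .map(|c| c.to_string())
        .collect()
}

/// Printing is left to the caller, which can pass any writer
/// (standard output in the binary, an in-memory buffer in a test).
fn print_urls(urls: &[String], mut out: impl Write) -> io::Result<()> {
    for url in urls {
        writeln!(out, "{}", url)?;
    }
    Ok(())
}
```

In a binary, `print_urls(&urls, io::stdout())` would write one URL per line, while a test can compare the returned vector directly.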
|
|
Turns out when I removed the `unwrap`s in
92f8f57b76b32c3d3e52d4b61dcdf25969f47ab7, the `return`s I added to the
`match` expressions caused the loops to exit early without iterating
over all the objects in the PDF.
Remove the `return`s and fix up the expression return types to get URLs
printing again.
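The pitfall can be reproduced in a few lines. This is a sketch under simplifying assumptions: `Object` is a hypothetical stand-in for a PDF object (`Some(url)` for a link, `None` for anything else), and neither function is the repository's actual code.

```rust
/// Hypothetical stand-in for a PDF object: `Some(url)` for a link
/// annotation, `None` for anything else.
type Object = Option<&'static str>;

/// Buggy variant: `return` inside the `match` ends the whole function.
fn urls_with_early_return(objects: &[Object]) -> Vec<String> {
    let mut urls = Vec::new();
    for &object in objects {
        let url = match object {
            Some(url) => url,
            // Leaves the function at the first non-link object,
            // so later objects are never visited.
            None => return urls,
        };
        urls.push(url.to_string());
    }
    urls
}

/// Fixed variant: `continue` skips only the current object.
fn urls_with_continue(objects: &[Object]) -> Vec<String> {
    let mut urls = Vec::new();
    for &object in objects {
        let url = match object {
            Some(url) => url,
            None => continue,
        };
        urls.push(url.to_string());
    }
    urls
}
```

Given a link, a non-link, and another link, the first function returns only the first URL while the second returns both.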
|
|
Create a custom error type to use instead of the `unwrap`s.
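A custom error type of this kind might look like the sketch below. The type name `UrlExtractError`, its variants, and the helper functions are assumptions made for illustration; the repository's real error type is not shown in this log.

```rust
use std::error::Error;
use std::fmt;
use std::fs;
use std::io;

/// Hypothetical error type covering the failures that `unwrap` would
/// otherwise turn into panics.
#[derive(Debug)]
enum UrlExtractError {
    /// The PDF file could not be read.
    Io(io::Error),
    /// A link action had no URI field.
    MissingUri,
}

impl fmt::Display for UrlExtractError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            UrlExtractError::Io(err) => write!(f, "could not read PDF: {}", err),
            UrlExtractError::MissingUri => write!(f, "link action has no URI field"),
        }
    }
}

impl Error for UrlExtractError {}

/// Lets `?` convert an `io::Error` into the custom type automatically.
impl From<io::Error> for UrlExtractError {
    fn from(err: io::Error) -> Self {
        UrlExtractError::Io(err)
    }
}

/// Reads a file, propagating I/O failures with `?` instead of `unwrap`.
fn read_pdf(path: &str) -> Result<Vec<u8>, UrlExtractError> {
    Ok(fs::read(path)?)
}

/// Turns a missing field into an error value instead of a panic.
fn uri_of(field: Option<&str>) -> Result<String, UrlExtractError> {
    field.map(str::to_string).ok_or(UrlExtractError::MissingUri)
}
```

The `From` implementation is what allows the `?` operator in `read_pdf` to replace an `unwrap` without any explicit mapping.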
|
|
Get rid of `::str`-prefixed calls.
|
|
Thanks to plinth (https://stackoverflow.com/users/20481/plinth) on Stack
Overflow, I learned that URLs are stored in /A entries in a PDF:
> To get the link to go somewhere you'll need either a /Dest or an /A
> entry in the link annot (but not both). /Dest is an older artifact for
> page-level navigation - you won't use this. Instead, use the /A entry
> which is an action dictionary. So if you wanted to navigate to the url
> http://www.google.com, you would make your annotation look like this:
>
> << /Type /Annot /Subtype /Link /Rect [ x1 y1 x2 y2 ]
> /A << /Type /Action /S /URI /URI (http://www.google.com) >>
> >>
https://stackoverflow.com/questions/19492229/add-a-hyperlink-into-a-pdf-document/19496996#19496996
To extract URLs, find the /A objects and get the text value of their
`URI` fields.
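To make the idea concrete, here is a deliberately naive sketch that scans raw PDF text for `/URI (...)` strings using only the standard library. It is not the approach used in this repository and it is not a real PDF parser: it assumes the action dictionaries are stored uncompressed and that the URI strings contain no escaped or nested parentheses. The function name `extract_uris` is hypothetical.

```rust
/// Collect the contents of every `/URI (...)` string found in `pdf_text`.
/// Assumption: link actions are uncompressed and use literal
/// (parenthesised) strings without escaped or nested parentheses.
fn extract_uris(pdf_text: &str) -> Vec<String> {
    let marker = "/URI";
    let mut uris = Vec::new();
    let mut rest = pdf_text;

    while let Some(pos) = rest.find(marker) {
        // Move past the marker and any whitespace that follows it.
        let after = rest[pos + marker.len()..].trim_start();

        if after.starts_with('(') {
            let body = &after[1..];
            if let Some(end) = body.find(')') {
                uris.push(body[..end].to_string());
                rest = &body[end..];
                continue;
            }
        }
        // Not a string value (for example the name in `/S /URI`),
        // so keep scanning from here.
        rest = after;
    }
    uris
}
```

Applied to the annotation quoted above, the first `/URI` (the value of `/S`) is skipped because no string follows it, and the second yields `http://www.google.com`.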
|
|
Walk the different objects in the PDF to discover how hyperlinks are
stored and how I can access them.
|
|
$ rustc --version
rustc 1.38.0 (625451e37 2019-09-23)
$ cargo init --bin
|