Class HtmlToPlainText


  • public class HtmlToPlainText
    extends Object
    Converts HTML to plain text. This example program demonstrates using jsoup to convert HTML input to lightly formatted plain text. That differs from the general goal of jsoup's .text() methods, which is to get clean data from a scrape.

    Note that this is a fairly simplistic formatter -- for real-world use you'll want to embrace and extend it.

    To invoke from the command line, assuming you've downloaded the jsoup jar to your current directory:
    java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]
    where url is the URL to fetch, and selector is an optional CSS selector.
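
The argument convention above (a required url, then an optional selector) can be sketched as follows. This is a minimal illustration only: `PlainTextArgs` and its `parse` method are hypothetical names, not part of jsoup, and the class's actual `main` method may validate its arguments differently.

```java
import java.util.Optional;

/** Hypothetical holder for the two command-line arguments; not part of jsoup. */
final class PlainTextArgs {
    public final String url;                 // required: the URL to fetch
    public final Optional<String> selector;  // optional: a CSS selector

    private PlainTextArgs(String url, Optional<String> selector) {
        this.url = url;
        this.selector = selector;
    }

    /** Accepts exactly one or two arguments: url [selector]. */
    static PlainTextArgs parse(String[] args) {
        if (args.length < 1 || args.length > 2) {
            throw new IllegalArgumentException(
                "usage: java -cp jsoup.jar org.jsoup.examples.HtmlToPlainText url [selector]");
        }
        Optional<String> selector =
            args.length == 2 ? Optional.of(args[1]) : Optional.empty();
        return new PlainTextArgs(args[0], selector);
    }

    public static void main(String[] args) {
        PlainTextArgs parsed = parse(args);
        System.out.println("url = " + parsed.url);
        System.out.println("selector = " + parsed.selector.orElse("(none)"));
    }
}
```

Modelling the selector as an `Optional` keeps the "no selector given" case explicit instead of passing `null` around.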

    Author:
    Jonathan Hedley, jonathan@hedley.net, Andreas Rudolph
    • Constructor Detail

      • HtmlToPlainText

        public HtmlToPlainText()
    • Method Detail

      • getPlainText

        public String getPlainText(org.jsoup.nodes.Element element)
        Formats an Element as plain text.
        Parameters:
        element - the root element to format
        Returns:
        formatted text
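
A minimal sketch of calling getPlainText from your own code, assuming the jsoup jar is on the classpath and that this class lives in the `org.jsoup.examples` package (inferred from the command line shown in the class description). `PlainTextDemo` and its `format` helper are hypothetical names used only for illustration. The exact layout of the returned text depends on the formatter, so no output is shown here.

```java
import org.jsoup.Jsoup;
import org.jsoup.examples.HtmlToPlainText; // assumed package, taken from the command line above
import org.jsoup.nodes.Document;

/** Hypothetical demo class; not part of jsoup. */
final class PlainTextDemo {
    /** Parses an HTML string and formats its body as lightly formatted plain text. */
    static String format(String html) {
        Document doc = Jsoup.parse(html);
        return new HtmlToPlainText().getPlainText(doc.body());
    }

    public static void main(String[] args) {
        String html = "<h1>Title</h1><p>First paragraph.</p><ul><li>One</li><li>Two</li></ul>";

        // Lightly formatted plain text produced by this class.
        System.out.println(format(html));

        // For comparison: jsoup's text() returns the combined text of the document.
        System.out.println(Jsoup.parse(html).text());
    }
}
```

Printing both results side by side is a quick way to see how the formatter's output differs from what `.text()` returns for the same input.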
      • toPlainText

        public static String toPlainText(String html)
      • toPlainText

        public static String toPlainText(org.jsoup.nodes.Document document)
      • toPlainText

        public static String toPlainText(org.jsoup.nodes.Element element)