Monday, 2 September 2013

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.


I got this error while reading another XML file over SMB with jcifs, following the example "read .xml file using smb jcifs". The XML file contains non-ASCII characters such as the copyright and registered signs, and the parser was not set up to decode them correctly, so it hit a byte that is not valid UTF-8 and threw this exception:


com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
 at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:684)
 at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:554)
 at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1742)
 at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1416)
 at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2793)
 at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:647)
 at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511)
 at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808)
 at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737)
 at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119)
 at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:232)
 at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:284)
 at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:124)
 at com.gwtech.source.XmlReader.readAllXML(XmlReader.java:61)
 at com.gwtech.source.XmlReader.execute(XmlReader.java:343)
 at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
 at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
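
The failure in the trace above can usually be reproduced without an SMB share. The sketch below is only an illustration: the class name XmlEncodingDemo, the helper parses and the sample document are made up for this example, and it assumes the problem file stores the copyright sign as a single ISO-8859-1 byte (0xA9), which is not a valid UTF-8 sequence on its own.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of the original SmbConnect.java
class XmlEncodingDemo {

    // Returns true if the raw bytes parse as XML, false if the parser rejects them.
    static boolean parses(byte[] xml) {
        try {
            DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml)); // no encoding declared, so the parser assumes UTF-8
            return true;
        } catch (SAXException | IOException e) {
            // Depending on the JDK version, the malformed byte surfaces as an
            // IOException subclass or as a SAXParseException.
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<note>\u00A9 2013</note>";
        // Copyright sign encoded as the two bytes C2 A9: valid UTF-8
        System.out.println(parses(xml.getBytes(StandardCharsets.UTF_8)));
        // Copyright sign encoded as the single byte A9: not valid UTF-8
        System.out.println(parses(xml.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

If the second call fails while the first succeeds, the file is most likely saved in a single-byte encoding rather than UTF-8.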


To read the content as UTF-8, change the input source:

The SmbConnect.java class can be found here. Make the following modification:

Change this code:
InputStream inputStream = sFile.getInputStream();
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputStream);
to this:
InputStream stream = new SmbFileInputStream(fXmlFile);
// Name the charset explicitly; without it InputStreamReader uses the platform default.
// Use the encoding the file was actually saved in.
Reader reader = new InputStreamReader(stream, "UTF-8");
InputSource inputSource = new InputSource(reader);
inputSource.setEncoding("UTF-8"); // has no effect on parsing once a character stream is supplied

DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(inputSource);
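
The same idea can be tried without jcifs. In this rough sketch a ByteArrayInputStream stands in for SmbFileInputStream, and the class ExplicitCharsetParse, its parse method and the sample document are hypothetical names chosen for the example. It assumes you know which charset the file was written in (ISO-8859-1 here) and hands the parser a Reader that has already decoded the bytes, so the parser never has to guess the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original SmbConnect.java
class ExplicitCharsetParse {

    // Decodes the stream with the given charset and parses the resulting characters.
    static Document parse(InputStream in, Charset charset)
            throws ParserConfigurationException, SAXException, IOException {
        try (Reader reader = new InputStreamReader(in, charset)) {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // With a character stream, the parser reads the decoded characters directly
            return builder.parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // Sample document with the copyright sign stored as the single byte 0xA9
        byte[] latin1 = "<note>\u00A9 2013</note>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parse(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1);
        String text = doc.getDocumentElement().getTextContent();
        // Print the code point instead of the character to avoid console encoding issues
        System.out.println(Integer.toHexString(text.charAt(0))); // a9
    }
}
```

In the real code the ByteArrayInputStream would be replaced by the SmbFileInputStream from the snippet above.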

Now everything works fine. I hope this helps you :)
