How can I display technical unicode characters?

RandomCoder

Well-Known Member
Licensed User
Longtime User
I am attempting to write a print capture utility for some very old machines that we have.
The machines produce a results page containing lots of sections that are separated with line art, i.e. | _ ¬ etc. Normally the machines print via a parallel port to a Centronics printer; I have removed the printer and replaced it with a parallel-to-serial converter, which then connects to my PC.

Using the OnComm event I can capture the data. I've tried using Serial.InputString, Serial.InputArray and also Agraham's SerialEx library in which I am able to set the encoding type. But no matter what I do I am unable to display the lines as they should be.

Now I am wondering if the problem is actually to do with the TextBox and the Font that is being used?
Is it possible to select which Font the TextBox uses, and does anyone know which Font I should choose?

Attached is a file that contains the raw data that I grabbed using Serial.InputArray. This almost displays correctly in Notepad so I know that I'm nearly there.

I think (but I'm not sure) that I need some of the characters from this codepage http://unicode.org/charts/PDF/U2300.pdf.
And I know that the decimal values of the characters that are not being displayed correctly are -
179 - Vertical line
191 - Top right-hand corner
192 - Bottom left-hand corner
196 - Horizontal line
217 - Bottom right-hand corner
218 - Top left-hand corner
I also get a value of 255 where I would expect a space. I can't explain this, but it's something I can live with, as it only occurs twice and is part of a section title.
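For reference, the byte values listed above can be checked against code page 437. This is a minimal Python sketch, assuming the machine really does emit CP437 (nothing at this point in the thread confirms that); `sample` is a made-up byte sequence for illustration, not data taken from RawData.txt.

```python
# Byte values reported in the post, decoded with Python's built-in cp437 codec.
values = [179, 191, 192, 196, 217, 218, 255]

for value in values:
    char = bytes([value]).decode("cp437")
    print(f"{value:3d} -> U+{ord(char):04X} {char!r}")

# Hypothetical sample: a small box drawn with the same bytes
# (10 is a line feed, 32 is an ordinary space).
sample = bytes([218, 196, 196, 191, 10,
                179, 32, 32, 179, 10,
                192, 196, 196, 217])
box = sample.decode("cp437")
print(box)
```

Under this assumption, 255 decodes to U+00A0, a no-break space, which would be consistent with seeing 255 where a space was expected.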

Also attached is my work in progress :BangHead:

Any help is as always very gratefully received.

Thanks,
RandomCoder
 

Attachments

  • RawData.txt
    2 KB · Views: 226
  • PrintCapture (Byte Mode).sbp
    3.6 KB · Views: 218

agraham

Expert
Licensed User
Longtime User
What encoding did you try with SerialEx? I would expect code page 437 to be the correct one. The Unicode code points for 437 are listed in the Wikipedia article "Code page 437", and they don't seem to be the ones you think they are.

Edit: It may be that encoding 437 is not present by default in .NET, as it doesn't have an asterisk beside it in the Encodings table. I haven't actually checked this. In its absence, for efficiency, you could build an array of length 256, preload it with the Unicode code points of 437, then just use the received bytes as an index into the array and assemble the Unicode string in a StringBuilder from my StringsEx library, which is much faster than a normal String for this.
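The lookup-array idea can be sketched as follows. This is Python rather than B4PPC, and it is only an illustration: the table is deliberately partial (ASCII plus the six box-drawing bytes mentioned in this thread), the names `table`, `PLACEHOLDER` and `decode` are hypothetical, and a list joined at the end stands in for the StringsEx StringBuilder.

```python
# Partial lookup table: index = received byte, value = Unicode character.
# Bytes 0-127 are the same as ASCII in code page 437; only six of the
# upper 128 entries are filled in here. A real table would hold all 256.
PLACEHOLDER = "?"
table = [chr(i) if i < 128 else PLACEHOLDER for i in range(256)]
table[179] = "\u2502"  # vertical line
table[191] = "\u2510"  # top right-hand corner
table[192] = "\u2514"  # bottom left-hand corner
table[196] = "\u2500"  # horizontal line
table[217] = "\u2518"  # bottom right-hand corner
table[218] = "\u250c"  # top left-hand corner


def decode(received: bytes) -> str:
    """Translate raw bytes to text by indexing into the table."""
    parts = []  # collect pieces, then join once (StringBuilder stand-in)
    for b in received:
        parts.append(table[b])
    return "".join(parts)


print(decode(bytes([218, 196, 196, 191])))
```

Collecting the pieces and joining once avoids building a new string for every received byte, which is the same reason a StringBuilder is suggested above.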

Here is a text file with code page 437 Unicode values http://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/PC/CP437.TXT. Perhaps you could write a little program to parse this into the array I mentioned above.
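A little parser of that kind might look like the sketch below. It is written in Python for illustration and assumes the file layout is two hexadecimal columns (code page byte, Unicode code point) followed by a `#` comment, with whole-line comments also starting with `#`. `SAMPLE` holds a few lines in that assumed layout rather than the real file, and `parse_mapping` is a hypothetical name.

```python
# A few lines in the assumed layout of the mapping file (not the full file).
SAMPLE = """\
# byte\tunicode\t#name
0x41\t0x0041\t#LATIN CAPITAL LETTER A
0xb3\t0x2502\t#BOX DRAWINGS LIGHT VERTICAL
0xc4\t0x2500\t#BOX DRAWINGS LIGHT HORIZONTAL
0xda\t0x250c\t#BOX DRAWINGS LIGHT DOWN AND RIGHT
"""


def parse_mapping(text):
    """Return a 256-entry list: index = byte, value = code point or None."""
    table = [None] * 256
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue  # byte listed without a Unicode mapping
        table[int(fields[0], 16)] = int(fields[1], 16)
    return table


cp = parse_mapping(SAMPLE)
print(cp[0xB3], chr(cp[0xB3]))
```

Entries left as `None` mark bytes the parsed text did not cover, so a gap in the table is visible instead of silently mapping to a wrong character.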
 

RandomCoder

agraham said: "What encoding did you try with SerialEx? I would expect code page 437 to be the correct one. The Unicode code points for 437 are listed in the Wikipedia article "Code page 437", and they don't seem to be the ones you think they are."

I think I tried this one but can't be sure. As the machine is DOS based I know that I tried all of the (DOS) codes and I'm fairly sure that I used all the IBM ones too. I'll give it another go tomorrow (I'm at home now).

agraham said: "... for efficiency, you could build an array of length 256, preload it with the Unicode code points of 437 then just use the received bytes as an index into the array and assemble the Unicode string in a StringBuilder from my StringsEx library which is much faster than a normal String for this."

Thanks, I can definitely do this. What's the trick to finding the correct code page? Also, I've not used your StringBuilder before, so this is the perfect opportunity to learn something new. Will a normal TextBox recognise these codes and display them correctly?

Here is a text file with code page 437 Unicode values http://www.unicode.org/Public/MAPPIN...T/PC/CP437.TXT. Perhaps you could write a little program to parse this into the array I mentioned above.

You're the man, I really don't know how you do it.

Kind regards,
RandomCoder
 

klaus

Expert
Licensed User
Longtime User
I tried it with the BinaryFile library and the 437 character set, using the Courier New font to display the text in a TextBox.
The font is changed to Courier New with the FormLib library.

Attached is a desktop sample program, with a big TextBox to display your data without a scrollbar.

Best regards.
 

Attachments

  • TestText.sbp
    875 bytes · Views: 204
  • RawData.jpg
    50.4 KB · Views: 254

RandomCoder

@Klaus.

I didn't intend for you to do it for me, but thanks very much for taking the time and providing me with a sample program.

My work is almost done now; all I need to do is extract the necessary bits to create a filename and come up with a method of searching for the desired file.
I plan on having the app running in the background and producing a preview of the captured file before saving it to disc.

Thank you to everyone who helped me.

Regards,
RandomCoder.
 

RandomCoder

I need slightly more help.....

I've not been able to check whether Serial.Encoding = 437 works using the SerialEx library, as I didn't manage to get on the machine today.

However, I did attempt to create my own codepage lookup array as was suggested by Agraham.
This wasn't working at work (Win XP, B4PPC version 6.5), where it failed to display most of the codes correctly, but now that I'm home and have written a demo program it does work (Win 7, B4PPC beta version 6.76) :BangHead:

I think I know the reason why, but would like to check...
I'm presuming it's to do with the system font that is used (or could it be due to the beta handling it differently?).
Tomorrow I will be able to confirm this by setting the Form's font to "Courier New", which Klaus has already kindly shown is capable of displaying box-drawing characters.

The last thing that bothers me is that the B4PPC help says that Chr()...
"Returns the ASCII character represented by the given number.
Syntax: Chr (Integer)
Integer ranges from 0 to 255."
At first I suspected that this was the problem, as I'm passing it values of 9474 and above for the box-drawing characters.
Is the help wrong? And what is the maximum value that Chr can accept?
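To make the numbers concrete: 9474 is the decimal form of U+2502, the light vertical box-drawing character, so it lies well outside the 0 to 255 range the help describes. The sketch below uses Python's built-in `chr`, not B4PPC's `Chr`, so it only illustrates the code point arithmetic and says nothing about what B4PPC itself accepts.

```python
code_point = 9474

# 9474 decimal is 0x2502 hexadecimal, i.e. U+2502.
print(hex(code_point))

# Python's chr accepts any Unicode code point, not just 0 to 255.
vertical = chr(code_point)
print(vertical, len(vertical))

# The value cannot be represented in a single byte.
fits_in_byte = code_point <= 255
print(fits_in_byte)
```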

Attached is the demo that now works (at least for me at home :confused: )

Regards,
RandomCoder
 

Attachments

  • CP437 Demo.sbp
    1.6 KB · Views: 210
  • cp437_DOSLatinUS to Unicode.txt
    9.3 KB · Views: 221

RandomCoder

That's great news.

@Agraham
Is my implementation of the Codepage array as you intended?
It appeared to me that I needed to store the values in decimal: I assumed an array of Strings holding the hex values would not work, because each entry would be interpreted as a string and not as a hex number.
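The decimal-versus-hex point can be illustrated with a short Python sketch (not B4PPC; `hex_text` and `as_number` are hypothetical names). A hex string is just text until it is converted, so converting once while building the table and storing the resulting numbers gives the same values as typing them in decimal.

```python
hex_text = "0x2502"           # as it appears in the mapping file
as_number = int(hex_text, 16)  # convert once, when building the table

# The stored number is identical whether written in hex or decimal.
print(as_number, as_number == 9474)

# The string itself is not a number and cannot be used as a code point.
print(hex_text == 9474)
```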

Thanks,
RandomCoder
 