Friday 17 November 2017

Normalized Mantissa Binary Options


Method of and apparatus for normalization of a floating point binary number (US 5513362 A)
Post-processing is executed on a mantissa M and an exponent E of a floating point binary number obtained as a result of, for example, subtraction, in order to obtain a mantissa m and an exponent e of the result of the post-processing. To this end, an output E-1 of a decrementer and an amount of cancellation of the mantissa LSA supplied from an advancing 1 detecting circuit are entered into a minimum value selecting circuit. The minimum value selecting circuit is adapted to set a shift amount SH to E-1 and a magnitude-relation judging signal CR to 1 when E-1 is smaller than LSA, that is, when a denormalize processing is required. When E-1 is not smaller than LSA, that is, when a normalize processing is required, SH is set to LSA and CR is set to 0. A left shifter is adapted to supply, as the mantissa m of the result, a value obtained by executing a left shift processing with the shift amount SH on the mantissa M. A selecting circuit is adapted to supply, as the exponent e of the result, 0 when CR is equal to 1 and an output E-LSA of a subtracting circuit when CR is equal to 0. This enables the denormalize processing of a floating point binary number to be executed at a high speed equivalent to that at which a normalize processing is executed.
Claims (3)
1. An operational processing apparatus for executing a shift processing on a mantissa, having a plurality of bit positions and a radix point, of a floating point binary number and for adjusting an exponent of the floating point binary number, the apparatus comprising:
advancing 1 detecting means for detecting the bit position of an advancing 1 (the leading 1) in said mantissa and for supplying, as an amount of cancellation of said mantissa, a difference between said bit position and the bit position of the bit one position more significant than the radix point;
decrementing means for supplying a value obtained by subtracting 1 from the exponent;
comparing means for comparing in magnitude with each other two input data, i.e., an output of said decrementing means and an amount of cancellation supplied from said advancing 1 detecting means, thereby to supply, as a result of magnitude-relation judgment, the input data which is the smaller, and also for supplying a magnitude-relation judging signal representing which input data is the smaller out of the two input data;
subtracting means for supplying a value obtained by subtracting from the exponent an amount of cancellation supplied from said advancing 1 detecting means;
selecting means for supplying, as an exponent of a result of an operational processing, 0 when a magnitude-relation judging signal from said comparing means represents that the output of said decrementing means is the smaller out of the two input data, and an output of said subtracting means when the magnitude-relation judging signal represents otherwise; and
shifting means for supplying, as a mantissa of said result of the operational processing, a value obtained by executing, on said mantissa of the floating point binary number, a left shift processing in which the shift amount is equal to a result of magnitude-relation judgment, having a plurality of bits, supplied from said comparing means,
wherein said comparing means has a minimum value selecting circuit for propagating the magnitude relation of the two input data for each digit thereof from a most significant digit to a least significant digit, thereby to supply the result of magnitude-relation judgment successively starting with the most significant digit, and said shifting means comprises left 2^k (k = 0, 1, 2, ..., n-1) bit shifters which respectively correspond to the lower n bits of a result of magnitude-relation judgment supplied from said minimum value selecting circuit and which are connected in cascade to one another.
2. An operational processing apparatus for executing a shift processing on a mantissa, having a plurality of bit positions and a radix point, of a floating point binary number and for adjusting an exponent of the floating point binary number, the apparatus comprising:
advancing 1 detecting means for detecting the bit position of an advancing 1 in said mantissa and for supplying, as an amount of cancellation of said mantissa, a difference between said bit position and the bit position of the bit one position more significant than the radix point;
decrementing means for supplying a value obtained by subtracting 1 from the exponent;
comparing and selecting means for comparing in magnitude with each other two input data, i.e., an amount of cancellation supplied from said advancing 1 detecting means and the exponent, thereby to supply, as a result of magnitude-relation judgment, said amount of cancellation when said amount of cancellation is smaller than said exponent and an output of said decrementing means when said amount of cancellation is not smaller than said exponent, and also for supplying a magnitude-relation judging signal representing which input data is the smaller out of the two input data;
subtracting means for supplying a value obtained by subtracting from the exponent an amount of cancellation supplied from said advancing 1 detecting means;
selecting means for supplying, as an exponent of a result of an operational processing, an output of said subtracting means when a magnitude-relation judging signal from said comparing and selecting means represents that, out of the two input data, said amount of cancellation supplied from said advancing 1 detecting means is the smaller, and 0 when said magnitude-relation judging signal represents otherwise; and
shifting means for supplying, as a mantissa of said result of the operational processing, a value obtained by executing, on said mantissa of the floating point binary number, a left shift processing in which the shift amount is equal to a result of magnitude-relation judgment, having a plurality of bits, supplied from said comparing and selecting means,
wherein said comparing and selecting means has a comparing and selecting circuit for propagating the magnitude relation of the two input data for each digit thereof from a most significant digit to a least significant digit, thereby to supply the result of magnitude-relation judgment successively starting with the most significant digit, and said shifting means comprises left 2^k (k = 0, 1, 2, ..., n-1) bit shifters which respectively correspond to the lower n bits of a result of magnitude-relation judgment supplied from said comparing and selecting circuit and which are connected in cascade to one another.
3. An operational processing apparatus for executing a shift processing on a mantissa, having a plurality of bit positions and a radix point, of a floating point binary number and for adjusting an exponent of the floating point binary number, the apparatus comprising:
advancing 1 detecting means for detecting the bit position of an advancing 1 in said mantissa and for supplying, as an amount of cancellation of said mantissa, a difference between said bit position and the bit position of the bit one position more significant than the radix point;
subtracting means for supplying, as a result of subtraction, a value obtained by subtracting from the exponent an amount of cancellation supplied from said advancing 1 detecting means, and also for supplying a magnitude-relation judging signal representing whether or not said exponent is equal to or smaller than said amount of cancellation;
first selecting means for supplying, as an exponent of a result of an operational processing, 0 when a magnitude-relation judging signal from said subtracting means represents that the exponent is not greater than said amount of cancellation, and a result of subtraction supplied from said subtracting means when the magnitude-relation judging signal represents otherwise;
second selecting means for supplying the exponent when a magnitude-relation judging signal from said subtracting means represents that the exponent is not greater than said amount of cancellation, and the amount of cancellation supplied from said advancing 1 detecting means when the magnitude-relation judging signal represents otherwise; and
shift processing means for supplying, as a mantissa of the result of the operational processing, a value obtained by executing, on said mantissa of the floating point binary number, a left shift processing in which the shift amount is equal to a value obtained by subtracting 1 from an output of said second selecting means when a magnitude-relation judging signal from said subtracting means represents that the exponent is not greater than said amount of cancellation, and in which the shift amount is equal to said output itself of the second selecting means when the magnitude-relation judging signal represents otherwise,
wherein said shift processing means has a left shifter for supplying a value obtained by executing, on the mantissa, a left shift processing in which the shift amount is equal to an output of said second selecting means, and a right 1-bit shifter for supplying, as a mantissa of the result of an operational processing, a value obtained by executing a right 1-bit shift processing on an output of said left shifter when a magnitude-relation judging signal from said subtracting means represents that the exponent is not greater than the amount of cancellation, and the output itself of the left s...
The present invention relates to a method of and an apparatus for an operational processing using a binary number of a floating point representation according to IEEE (Institute of Electrical and Electronics Engineers) Standard 754 or one conforming thereto.
With the recent complication of scientific and technical calculation and of graphic processing, there is an increasing demand for high-speed and accurate floating point operation. A computer is adapted to execute a processing using only a limited number of digits of a floating point number. Accordingly, errors often occur in a result obtained by a floating point operation. Operational precision depends on the hardware arrangement of a computer, but by following IEEE Std 754, errors attributable to the hardware arrangement can be prevented.
In IEEE Std 754, a format of which the total bit number is 32, including a 1-bit sign S, an 8-bit exponent E and a 23-bit fraction F, is specified for a single-precision floating point binary number.
Also, a format of which the total bit number is 64, including a 1-bit sign S, an 11-bit exponent E and a 52-bit fraction F, is specified for a double-precision floating point binary number. Generally, there is used a floating point number for which normalization has been executed such that a virtual non-zero bit and the radix point are located above the most significant bit (MSB) of the fraction F. Further, a bias is given to the actual exponent such that the exponent E is a positive value. For single precision, for example, a value obtained by adding 127 as a bias to the actual exponent is used as the exponent E. That is, a real number R1 expressed as a normalized single-precision number is expressed as follows:
$R_1 = (-1)^{S} \times 2^{E-127} \times (1.F)$ (1)
where 1.F is the mantissa M.
In IEEE Std 754, it is defined that when an operational result is a value in the neighborhood of 0, it is represented as a denormalized number. For single precision, for example, the exponent E is set to 0 and a denormalize processing is executed to shift the fraction F such that the weight of the bit one bit above the radix point is 2^-126. In this case, a real number R2 expressed as a denormalized number is expressed as follows:
$R_2 = (-1)^{S} \times 2^{-126} \times (0.F)$ (2)
where the mantissa M is 0.F.
There is a phenomenon that the number of significant digits is sharply reduced when two numbers are added whose absolute values are substantially the same and whose signs are different from each other. Such a phenomenon is called cancellation. In subtraction of floating point numbers which are slightly different in value from each other, when the exponent of the minuend is equal to the exponent of the subtrahend, subtraction of their mantissas is executed without digit alignment preceding the operation. For example, when the mantissa of the minuend is 1.100101 and the mantissa of the subtrahend is 1.100010, the result of subtraction of the mantissas is equal to 0.000011. When the value of the bit one bit above the radix point is 0 in the result of an operation, it is said that cancellation of the mantissa has been generated.
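The normalized and denormalized single-precision encodings described above can be checked with a short Python sketch. It is illustrative only: the function names `decode_single` and `hardware_decode` are hypothetical, and the sketch assumes the usual IEEE 754 reading in which a normalized pattern has value (-1)^S x 2^(E-127) x 1.F and a pattern with E = 0 has value (-1)^S x 2^-126 x 0.F. Infinity and NaN encodings (E = 255) are not discussed in the text and are rejected.

```python
import struct

BIAS = 127           # single-precision exponent bias
FRACTION_BITS = 23   # width of the fraction field F


def decode_single(bits: int) -> float:
    """Value of a 32-bit pattern holding a normalized or denormalized number."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> FRACTION_BITS) & 0xFF
    fraction = bits & ((1 << FRACTION_BITS) - 1)
    if exponent == 0xFF:
        raise ValueError("infinity/NaN encodings are outside this sketch")
    if exponent == 0:
        # Denormalized: mantissa is 0.F and the weight is fixed at 2^-126.
        mantissa = fraction / (1 << FRACTION_BITS)
        scale = 2.0 ** (1 - BIAS)
    else:
        # Normalized: mantissa is 1.F and the weight is 2^(E-127).
        mantissa = 1.0 + fraction / (1 << FRACTION_BITS)
        scale = 2.0 ** (exponent - BIAS)
    return (-1.0) ** sign * mantissa * scale


def hardware_decode(bits: int) -> float:
    """Reference value: reinterpret the same 32 bits as a binary32 float."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]
```

Comparing `decode_single` with `hardware_decode` on a few patterns, for example `0x3F800000` (normalized) and `0x00000001` (the smallest denormalized number), is a quick way to confirm that the two formulas agree with the platform's own interpretation.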
The number of zeros continuously present from the position of the bit one bit above the radix point is called an amount of cancellation of the mantissa. In this example, the amount of cancellation of the mantissa is 5. A floating point number presenting such cancellation of the mantissa is normalized by executing, on the mantissa M, a left shift processing having a shift amount equal to the amount of cancellation, and by correcting the exponent E such that the amount of cancellation is subtracted from the exponent E. In the following description, the left shift amount required when cancellation of the mantissa has been generated will be expressed as an amount of cancellation LSA.
When the exponent E is not greater than the amount of cancellation of the mantissa LSA and the amount of cancellation LSA is subtracted from the exponent E for normalization, the exponent after correction becomes not greater than 0. When an operational result thus cannot be expressed as a normalized number, the denormalize processing mentioned earlier is required.
The hardware of a conventional computer is adapted to execute a processing only of a normalized number. More specifically, when it is judged that a value obtained by executing a normalize processing on an operational result in hardware cannot be expressed as a normalized number, the normalize processing is interrupted on the assumption that an exception has occurred, and a denormalize processing is then entrusted to software. Accordingly, the denormalize processing is executed only after the normalize processing has been executed. This presents the problem that a desired operational result cannot be obtained at a high speed.
SUMMARY OF THE INVENTION
It is an object of the present invention to enable a denormalize processing of a floating point binary number to be executed at a high speed equivalent to the speed at which a normalize processing is executed.
To achieve the above object, the present invention is arranged such that, before execution of a normalize processing, an exponent E and an amount of cancellation of the mantissa LSA are compared in magnitude with each other and, based on the result of comparison, either a normalize processing or a denormalize processing is executed.
According to the present invention, an exponent E and an amount of cancellation of the mantissa LSA are compared in magnitude with each other, and it is judged whether the result of an operational processing is a normalized number or a denormalized number. When the result of an operational processing is a normalized number (E is greater than LSA), the amount of cancellation LSA is selected as a shift amount SH for the mantissa M, and a value obtained by subtracting the amount of cancellation LSA from the exponent E is selected as an exponent e of the result (normalize processing). On the other hand, when the result of an operational processing is a denormalized number (E is not greater than LSA), a value obtained by subtracting 1 from the exponent E is selected as the shift amount SH for the mantissa M, and 0 is selected as the exponent e of the result (denormalize processing). Accordingly, even though the result of an operational processing is a denormalized number, the processing can be executed at a high speed in the same manner as for a normalized number.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flow chart showing the flow of processing in an operational processing method according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the arrangement of a first operational processing apparatus according to an embodiment of the present invention.
FIG. 3 is a circuit diagram showing the inside arrangement of a minimum value selecting circuit shown in FIG. 2.
FIG. 4 is a block diagram showing the arrangement of a second operational processing apparatus according to an embodiment of the present invention.
FIG. 5 is a circuit diagram showing the inside arrangement of a comparing and selecting circuit shown in FIG. 4.
FIG. 6 is a block diagram showing the arrangement of a third operational processing apparatus according to an embodiment of the present invention.
FIG. 7 is a circuit diagram showing the inside arrangement of a subtracting circuit shown in FIG. 6.
FIG. 8 is a block diagram showing the arrangement of a fourth operational processing apparatus according to an embodiment of the present invention.
With reference to the attached drawings, the following description will discuss an operational processing method according to an embodiment of the present invention and operational processing apparatus to be used in practice of the method above-mentioned.
FIG. 1 shows a sequence of executing post-processing on a mantissa M and an exponent E of an input floating point binary number obtained as a result of an operation, for example, subtraction of normalized numbers, thereby converting the mantissa M and the exponent E into a mantissa m and an exponent e of an output floating point binary number. The following description will discuss the sequence step by step for single precision, but the operational processing method shown in FIG. 1 can also be applied to double precision.
To obtain an amount of cancellation of the mantissa LSA, the bit position of the advancing 1 (the leading 1) in the mantissa M is detected. The amount of cancellation LSA is obtained as a difference between the bit position of the advancing 1 thus detected and the position of the bit one bit above the radix point (step 101). Then, the exponent E and the amount of cancellation LSA are compared in magnitude with each other (step 102).
When E is not greater than LSA, a denormalize processing is executed such that the result of the operational processing is expressed as a denormalized number.
It is therefore required to reduce the exponent E such that the exponent E is equal to 0 and to execute, on the mantissa M, a left shift processing having a shift amount corresponding to the amount of such reduction. The bit one bit above the radix point in a normalized number has a weight of 2^(E-127), which would be 2^-127 for E equal to 0, but the weight of such a bit in a denormalized number is 2^-126 as shown in equation (2). Accordingly, it is required that 1 bit be subtracted from the shift amount when the left shift processing is executed on the mantissa M. In this connection, the shift amount SH for the mantissa is set to E-1 (step 103), and an exponent e of the result of the operational processing is set to 0 (step 104).
On the other hand, when E is greater than LSA, the shift amount SH for the mantissa is set to LSA in order to execute a normalize processing (step 105), and an exponent e of the result of the operational processing is set to E-LSA (step 106). At this time, the exponent e (= E-LSA) is positive.
At a step 107, a left shift processing is executed on the mantissa M according to the shift amount SH obtained at the step 103 or 105, thereby obtaining a mantissa m of the result of the operational processing.
According to the operational processing method above-mentioned, the flow of the processing is controlled based on the result of comparison in magnitude between the exponent E and the amount of cancellation of the mantissa LSA. Accordingly, even though the result of an operational processing is a denormalized number, the processing can be executed at a high speed in the same manner as for a normalized number.
Alternatively, the step 103 may be modified such that the shift amount SH is set to E instead of E-1, and a right 1-bit shift processing is executed on the mantissa M, only when E is not greater than LSA, before or after the step 107 where the left shift processing is executed on the mantissa M.
The following description will successively discuss first to fourth operational processing apparatus to be used in practice of the operational processing method above-mentioned.
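The flow of FIG. 1 (steps 101 to 107) can be sketched in Python as follows. This is a minimal model under stated assumptions, not the patented circuit: the mantissa is held as a 24-bit integer whose bit 23 is the bit one bit above the radix point, the input is non-zero, and the input exponent E is at least 1 (the result of operating on normalized numbers). The names `cancellation_amount`, `postprocess` and `postprocess_alt` are hypothetical.

```python
from typing import Tuple

MANTISSA_BITS = 24            # assumed: 1 bit above the radix point + 23 fraction bits
TOP_BIT = MANTISSA_BITS - 1   # position of the bit one bit above the radix point


def cancellation_amount(M: int) -> int:
    """Step 101: number of zeros from the top bit down to the leading 1."""
    if M <= 0 or M.bit_length() > MANTISSA_BITS:
        raise ValueError("sketch assumes a non-zero mantissa that fits in 24 bits")
    return TOP_BIT - (M.bit_length() - 1)


def postprocess(M: int, E: int) -> Tuple[int, int]:
    """Steps 102 to 107: return (m, e) for mantissa M and biased exponent E."""
    if E < 1:
        raise ValueError("sketch assumes a biased exponent E >= 1")
    LSA = cancellation_amount(M)
    if E <= LSA:                 # step 102: result is a denormalized number
        SH, e = E - 1, 0         # steps 103 and 104
    else:                        # result is a normalized number
        SH, e = LSA, E - LSA     # steps 105 and 106
    return M << SH, e            # step 107: left shift by SH


def postprocess_alt(M: int, E: int) -> Tuple[int, int]:
    """Alternative flow: shift left by E, then right by 1, when E <= LSA."""
    if E < 1:
        raise ValueError("sketch assumes a biased exponent E >= 1")
    LSA = cancellation_amount(M)
    if E <= LSA:
        return (M << E) >> 1, 0
    return M << LSA, E - LSA
```

With the mantissa 0.000011 from the cancellation example (`M = 3 << 17`, so LSA is 5), `postprocess(M, 10)` takes the normalize path and `postprocess(M, 3)` takes the denormalize path; both variants return the same pair in either case.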
The first operational processing apparatus shown in FIG. 2 comprises a decrementer 201, an advancing 1 detecting circuit 202, a minimum value selecting circuit 203, a left shifter unit 204, a mantissa result register 205, a subtracting circuit 206, a selecting circuit 207 and an exponent result register 208.
The decrementer 201 is adapted to supply a value obtained by subtracting 1 from an exponent E. The advancing 1 detecting circuit 202 is adapted to search a mantissa M in the direction from the bit one bit above the radix point to the least significant bit (LSB), thereby to detect the position of the first bit which is equal to 1, and is also adapted to supply, as an amount of cancellation LSA, a difference between the position of the bit thus detected and the position of the bit one bit above the radix point. The minimum value selecting circuit 203 is adapted to compare in magnitude with each other two input data, i.e., an output E-1 of the decrementer 201 and an output LSA of the advancing 1 detecting circuit 202, thereby to supply, as a shift amount SH, whichever input data is the smaller, and to supply a magnitude-relation judging signal CR representing which input data is the smaller out of the two input data. When E-1 is smaller than LSA, i.e., when E is not greater than LSA, SH is equal to E-1 and CR is equal to 1. When E-1 is not smaller than LSA, i.e., when E is greater than LSA, SH is equal to LSA and CR is equal to 0. The left shifter unit 204 is adapted to supply, as a mantissa m of the result of an operational processing, a value obtained by executing, on the mantissa M, a left shift processing having a shift amount specified by an output SH of the minimum value selecting circuit 203. The mantissa result register 205 is adapted to store an output m of the left shifter unit 204.
The subtracting circuit 206 is adapted to supply a value obtained by subtracting an output LSA of the advancing 1 detecting circuit 202 from an exponent E. The selecting circuit 207 is adapted to supply, as an exponent e of the result of an operational processing, 0 when CR is equal to 1 and an output E-LSA of the subtracting circuit 206 when CR is equal to 0. The exponent result register 208 is adapted to store an output e of the selecting circuit 207.
According to the arrangement in FIG. 2, the minimum value selecting circuit 203 judges whether the result of an operational processing is a normalized number or a denormalized number, based on whether or not a value obtained by subtracting an output LSA of the advancing 1 detecting circuit 202 from an output E-1 of the decrementer 201 is negative. The shift amount SH for the mantissa M and an exponent e of the result of an operational processing are determined such that, based on the judgment thus made, either a normalize processing or a denormalize processing is executed. At this time, the left shifter unit 204 is commonly used for both the normalize processing and the denormalize processing.
The minimum value selecting circuit 203 in FIG. 2 has the function that two 8-bit input data X, Y are compared in magnitude with each other and the input data which is the smaller is set as an output data Z, and that the logical value of a magnitude-relation judging signal output terminal B is set to 1 when X is smaller than Y. As shown in FIG. 3, the minimum value selecting circuit 203 has an input circuit 311, an intermediate circuit 312 and an output circuit 313, and is arranged such that the magnitude relation of the two input data X, Y for each of the digits is propagated from the highest digit to the lowest digit, thus determining, at a high speed, an output data Z successively starting with the highest digit (see Japanese Patent Laid-Open Publication 3-12735).
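At the block level, the FIG. 2 datapath can be modelled with one small Python function per component. This is a behavioural sketch only: the function names simply echo the reference numerals, the mantissa is assumed to be a non-zero 24-bit integer whose bit 23 sits one bit above the radix point, E is assumed to be at least 1, and the two result registers 205 and 208 are omitted.

```python
from typing import Tuple

TOP_BIT = 23  # assumed position of the bit one bit above the radix point


def decrementer_201(E: int) -> int:
    """Supply E-1."""
    return E - 1


def advancing_1_detecting_circuit_202(M: int) -> int:
    """Supply LSA: distance from the top bit down to the first 1."""
    if M <= 0 or M.bit_length() > TOP_BIT + 1:
        raise ValueError("sketch assumes a non-zero 24-bit mantissa")
    return TOP_BIT - (M.bit_length() - 1)


def minimum_value_selecting_circuit_203(X: int, Y: int) -> Tuple[int, int]:
    """Supply (Z, B): the smaller input, and B = 1 when X is smaller than Y."""
    return (X, 1) if X < Y else (Y, 0)


def first_apparatus(M: int, E: int) -> Tuple[int, int, int, int]:
    """Return (m, e, SH, CR) for mantissa M and biased exponent E >= 1."""
    LSA = advancing_1_detecting_circuit_202(M)
    SH, CR = minimum_value_selecting_circuit_203(decrementer_201(E), LSA)
    m = M << SH                    # left shifter unit 204
    e = 0 if CR == 1 else E - LSA  # subtracting circuit 206 and selecting circuit 207
    return m, e, SH, CR
```

Because the comparison is made on E-1 and LSA, CR equal to 1 coincides with E not greater than LSA, which is the condition for the denormalize path.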
When the respective bits of the input and output data X, Y, Z are set as Xi, Yi, Zi (i = 0 to 7), a magnitude-relation determining function gi and a magnitude-relation holding function pi are formed for each digit in the input circuit 311. gi = 1 represents that Xi is smaller than Yi, and pi = 1 represents that Xi is equal to Yi. The intermediate circuit 312 forms, based on the outputs gi and pi of the input circuit 311, a magnitude-relation determining function gjk and a magnitude-relation holding function pjk for the digits from the jth digit to the kth digit (j is smaller than k). For example, g67 = 1 represents the magnitude relation of two bits that X7X6 is smaller than Y7Y6, and p67 = 1 represents the equivalence relation of two bits that X7X6 is equal to Y7Y6. Further, g47 = 1 represents the magnitude relation of four bits that X7X6X5X4 is smaller than Y7Y6Y5Y4, and p47 = 1 represents the equivalence relation of four bits that X7X6X5X4 is equal to Y7Y6Y5Y4. These magnitude-relation determining functions gi, gjk and magnitude-relation holding functions pi, pjk are propagated from the highest digit to the lowest digit.
When the magnitude-relation determining function gi7 for the digits from each digit (the ith digit) to the highest digit (the 7th digit) is obtained in the manner above-mentioned, Xi is selected in each digit when gi7 is equal to 1, and Yi is selected in each digit when gi7 is equal to 0. Then, Xi or Yi thus selected is set as Zi. Thus, an 8-bit output data Z (the minimum value) can be obtained successively from the highest bit. In the output circuit 313 in FIG. 3, however, Z7 and Z6 are respectively determined according to g7 and g67, Z5 and Z4 are determined according to g47, and Z3 to Z0 are determined according to g07. The magnitude-relation determining function g07 for the digits from the lowest (0th) digit to the 7th digit, which is equal to 1 when X is smaller than Y and which is equal to 0 when X is not smaller than Y, is supplied from the magnitude-relation judging signal output terminal B. As shown in FIG.
2, the left shifter unit 204 is formed by connecting five left shifters, i.e., 16-bit, 8-bit, 4-bit, 2-bit and 1-bit left shifters, to one another as arranged in this order from the input side of a mantissa M. The lower five bits of an output Z7 to Z0 of the minimum value selecting circuit 203 serve as control signals of the five left shifters, respectively. More specifically, as an output (shift amount SH) of the minimum value selecting circuit 203 is determined successively from the highest bit, the shifters in the left shifter unit 204 are successively operated, starting with the 16-bit shifter in which the shift amount is the greatest. That is, each time a digit of an output of the minimum value selecting circuit 203 is determined successively from the highest digit, a left shift processing with a shift amount of 2^k bits corresponding to the digit thus determined is executed on the mantissa M.
As described above, the arrangement in FIGS. 2 and 3 has the minimum value selecting circuit 203 for determining an output data Z successively from the highest digit, and the multi-stage left shifter unit 204 having a plurality of shifters to be successively operated, starting with the shifter in which the shift amount is the greatest. This enables the left shift processing on a mantissa M to be executed at a high speed.
The minimum value selecting circuit 203 is of the 8-bit arrangement and the left shifter unit 204 is of the 5-stage arrangement of left 2^k (k = 0 to 4) bit shifters, with the number of bits of each of the mantissa M and the exponent E for single precision taken into consideration. However, such arrangements may be suitably changed according to the number of bits of each of the mantissa M and the exponent E.
In a second operational processing apparatus shown in FIG. 4, the minimum value selecting circuit 203 shown in FIG. 2 is replaced with a comparing and selecting circuit 401. A selecting circuit 402 in FIG. 4 differs from the selecting circuit 207 in FIG. 2 in that the selecting circuit 402 is adapted to supply an output E-LSA of the subtracting circuit 206 when CR is equal to 1, and 0 when CR is equal to 0.
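Returning to the arrangement of FIGS. 2 and 3, the digit-by-digit minimum selection and the cascaded shifter can be sketched bit-serially in Python. The loop below forms the per-digit functions gi and pi, accumulates the prefix functions gi7 and pi7 from the highest digit downward, and picks Xi or Yi for each Zi. It is a simplification: it uses gi7 for every digit, whereas the output circuit 313 described in the text groups digits under g7, g67, g47 and g07. The function names and the 24-bit mask in the shifter are assumptions made for illustration.

```python
from typing import Tuple

WIDTH = 8  # the selecting circuit handles 8-bit data


def select_minimum_msb_first(X: int, Y: int) -> Tuple[int, int]:
    """Return (Z, B): Z = min(X, Y) built from bit 7 down, B = 1 when X < Y."""
    g_prefix, p_prefix = 0, 1  # over the digits already visited: "X < Y", "X == Y"
    Z = 0
    for i in range(WIDTH - 1, -1, -1):
        xi, yi = (X >> i) & 1, (Y >> i) & 1
        g_i = (1 - xi) & yi                      # Xi is smaller than Yi
        p_i = 1 - (xi ^ yi)                      # Xi is equal to Yi
        g_prefix = g_prefix | (p_prefix & g_i)   # g for digits i..7
        p_prefix = p_prefix & p_i                # p for digits i..7
        Z |= (xi if g_prefix else yi) << i       # select Xi or Yi as Zi
    return Z, g_prefix


def cascaded_left_shift(M: int, SH: int, mask: int = (1 << 24) - 1) -> int:
    """Apply the 16-, 8-, 4-, 2- and 1-bit left shifters in that order."""
    for k in range(4, -1, -1):
        if (SH >> k) & 1:            # bit k of SH controls the 2^k-bit shifter
            M = (M << (1 << k)) & mask
    return M
```

Because both loops run from the highest bit downward, each shifter stage could in principle start as soon as its control bit of Z is known, which is the speed argument the text makes for pairing the two circuits.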
The comparing and selecting circuit 401 is adapted to compare in magnitude with each other two input data, i.e., an output LSA of the advancing 1 detecting circuit 202 and an exponent E, and to supply, as a shift amount SH, the output LSA when LSA is smaller than the exponent E and an output E-1 of the decrementer 201 when the output LSA is not smaller than the exponent E. Also, the comparing and selecting circuit 401 is adapted to supply a magnitude-relation judging signal CR representing which of LSA and E is the smaller. When LSA is smaller than E, SH is equal to LSA and CR is equal to 1, and when LSA is not smaller than E, SH is equal to E-1 and CR is equal to 0.
According to the arrangement in FIG. 4, the comparing and selecting circuit 401 is adapted to judge whether the result of an operational processing is a normalized number or a denormalized number, based on whether or not a value obtained by subtracting the exponent E from an output LSA of the advancing 1 detecting circuit 202 is negative. Unlike the minimum value selecting circuit 203 in FIG. 2, the comparing and selecting circuit 401 can start comparing the two input data in magnitude with each other before an output of the decrementer 201 is determined, so that the judgment can be made at a higher speed. The shift amount SH for the mantissa M and an exponent e of the result of an operational processing can be determined such that, based on the judgment thus made, either a normalize processing or a denormalize processing is executed. At this time, the left shifter unit 204 is commonly used for the normalize processing and the denormalize processing.
The comparing and selecting circuit 401 in FIG.
4 has the function that first and second 8-bit input data X, Y are compared in magnitude with each other, thereby to supply, as an output data Z, X when X is smaller than Y and a third 8-bit input data S when X is not smaller than Y, and that the logical value of the magnitude-relation judging signal output terminal B is set to 1 when X is smaller than Y. As shown in FIG. 5, the comparing and selecting circuit 401 has an input circuit 411, an intermediate circuit 412 and an output circuit 413, and is arranged, like the minimum value selecting circuit 203, such that the magnitude relation of the two input data X, Y for each of the digits is propagated from the highest digit to the lowest digit, thus determining, at a high speed, the output data Z successively starting with the highest digit.
The arrangement in FIGS. 4 and 5 has the comparing and selecting circuit 401 for determining the output data Z successively from the highest digit, and the multi-stage left shifter unit 204 having a plurality of shifters to be successively operated, starting with the shifter in which the shift amount is the greatest. This enables the left shift processing on a mantissa M to be executed at a higher speed.
The comparing and selecting circuit 401 is of the 8-bit arrangement and the left shifter unit 204 is of the 5-stage arrangement of left 2^k (k = 0 to 4) bit shifters, with the number of bits of each of the mantissa M and the exponent E for single precision taken into consideration. However, such arrangements may be suitably changed according to the number of bits of each of the mantissa M and the exponent E.
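The behaviour of the comparing and selecting circuit 401 and the selecting circuit 402 can be sketched in Python as below. This is a functional model only, with hypothetical function names, and it takes LSA as an input rather than deriving it from a mantissa. Note that the polarity of CR is the opposite of FIG. 2: here CR equal to 1 means the normalize path.

```python
from typing import Tuple


def comparing_and_selecting_circuit_401(X: int, Y: int, S: int) -> Tuple[int, int]:
    """Return (Z, B): Z = X and B = 1 when X < Y, otherwise Z = S and B = 0."""
    return (X, 1) if X < Y else (S, 0)


def second_apparatus(LSA: int, E: int) -> Tuple[int, int, int]:
    """Return (SH, e, CR) for an amount of cancellation LSA and exponent E >= 1."""
    # X = LSA, Y = E, and the third input S is the decrementer output E-1.
    SH, CR = comparing_and_selecting_circuit_401(LSA, E, E - 1)
    e = E - LSA if CR == 1 else 0  # selecting circuit 402
    return SH, e, CR
```

For any E of at least 1, the shift amount produced this way equals the smaller of E-1 and LSA, so the second apparatus yields the same SH and e as the first one while letting the comparison start without waiting for the decrementer.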
In a third operational processing apparatus shown in FIG. 6, a decrementer 201, an advancing 1 detecting circuit 202, a mantissa result register 205, a first selecting circuit 207 and an exponent result register 208 respectively have the same functions as those of the component elements designated by the same reference numerals in FIG. 2. In FIG. 6, there are also disposed a subtracting circuit 601, a second selecting circuit 602 and a left shifter 603.
The subtracting circuit 601 is adapted to supply, as a result of subtraction, a value obtained by subtracting an output LSA of the advancing 1 detecting circuit 202 from an exponent E, and also to supply a magnitude-relation judging signal Ib representing whether or not E is equal to or smaller than LSA. When E is not greater than LSA, Ib is equal to 1, and when E is greater than LSA, Ib is equal to 0. The first selecting circuit 207 is adapted to supply, as an exponent e of the result of an operational processing, 0 when Ib is equal to 1, and an output E-LSA of the subtracting circuit 601 when Ib is equal to 0. The second selecting circuit 602 is adapted to supply, as a shift amount SH, an output E-1 of the decrementer 201 when Ib is equal to 1, and an output LSA of the advancing 1 detecting circuit 202 when Ib is equal to 0. The left shifter 603 is adapted to supply, as a mantissa m of the result of an operational processing, a value obtained by executing, on a mantissa M, a left shift processing having a shift amount specified by an output SH of the second selecting circuit 602. The inside arrangement of the left shifter 603 is not limited to the multi-stage arrangement of the left shifter unit 204 in FIG. 2.
The subtracting circuit 601 in FIG. 6 has both functions of the subtracting circuit 206 and the minimum value selecting circuit 203 shown in FIG. 2. More specifically, the subtracting circuit 601 is adapted to supply a subtraction result E-LSA to be subjected to the correction of an exponent E, and to judge whether
the result of an operational processing is a normalized number or a denormalized number, based on the fact whether or not a value obtained by subtracting LSA from E is equal to or smaller than 0 Then, the shift amount SH of the mantissa M and an exponent e of the result of an operational processing can be determined such that, based on the judgment thus made, either a normalize processing or a denormalize processing is to be executed At this time, the left shifter 601 is commonly used for the normalize processing and the denormalize processing. The subtracting circuit 601 in FIG 6 has the function that a subtraction result X-Y of two 8-bit input data X, Y is set as an output data Z, and that the logical value of the magnitude-relation judging signal Ib is set to 1 when X is not greater than Y As shown in FIG 7, the subtracting circuit 601 has an input circuit 611, an intermediate circuit 612 and an output circuit 613, and is arranged such that the magnitude relation of the two input data X, Y for each of the digits is propagated from the lowest digit to the highest digit, thus determining the output data Z. 
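The selection logic of the third apparatus (FIG. 6) can be followed more easily in software. The following is only a minimal sketch and not the patented circuit: the function name `postprocess`, the parameter `width` (number of mantissa bits, with the normalized leading 1 assumed to sit in the top bit) and the restriction to E >= 1 are assumptions made for illustration.

```python
def postprocess(M: int, E: int, width: int = 24):
    """Sketch of the FIG. 6 data flow; returns (m, e, SH, Ib).

    Assumptions (not from the patent text): the mantissa is a `width`-bit
    integer whose normalized form has its leading 1 in the top bit, and E >= 1.
    """
    if not (0 < M < (1 << width)) or E < 1:
        raise ValueError("sketch assumes 0 < M < 2**width and E >= 1")
    # Advancing 1 detection: LSA = number of zero bits above the leading 1.
    LSA = width - M.bit_length()
    # Subtracting circuit 601: difference plus magnitude-relation signal Ib.
    diff = E - LSA
    Ib = 1 if E <= LSA else 0
    # First selecting circuit 207: exponent of the result.
    e = 0 if Ib else diff
    # Second selecting circuit 602: E-1 (denormalize) or LSA (normalize).
    SH = E - 1 if Ib else LSA
    # Left shifter 603, shared by both cases.
    m = M << SH
    return m, e, SH, Ib


print(postprocess(0b00010000, 10, width=8))  # normalize: shift by LSA = 3
print(postprocess(0b00010000, 2, width=8))   # denormalize: shift by E-1 = 1
```

In the second call the exponent is too small to absorb the full normalizing shift, so the mantissa is shifted by only E-1 places and the result exponent is forced to 0.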
When the respective bits of the input and output data X, Y, Z are denoted Xi, Yi, Zi (i = 0 to 7), the input circuit 611 forms a digit borrow generating signal Igi and a digit borrow propagating signal Ipi for each digit. As widely known, the digit borrow generating signal Igi is a signal for executing subtraction, which is formed such that Igi = 1 represents that, in an operation of Xi-Yi on the ith digit, digit borrowing has taken place from the (i+1)th digit. However, Igi = 1 also represents that Xi is smaller than Yi. As widely known, the digit borrow propagating signal Ipi is another signal for executing subtraction, which is formed for judging that, in an operation of Xi-Yi, when digit borrowing has taken place from the ith digit to the (i-1)th digit and Ipi is equal to 1, digit borrowing has taken place from the (i+1)th digit. However, since digit borrowing from the (i+1)th digit then takes place only because of the digit borrowing which has taken place on the (i-1)th digit, Ipi = 1 also represents that Xi is equal to Yi.
Based on the outputs Igi and Ipi of the input circuit 611, the intermediate circuit 612 forms a digit borrow generating signal Igjk and a digit borrow propagating signal Ipjk for the digits from the kth digit to the jth digit (k is smaller than j). For example, the digit borrow generating signal Ig32 from the second digit to the third digit is a signal for executing subtraction, which is formed such that Ig32 = 1 represents that, in an operation on the two bits X3X2-Y3Y2, digit borrowing from the fourth digit has taken place. However, Ig32 = 1 also represents the magnitude relation of the two bits, namely that X3X2 is smaller than Y3Y2. On the other hand, the digit borrow propagating signal Ip32 from the second digit to the third digit is another signal for executing subtraction, which is formed for judging that, in an operation of X3X2-Y3Y2, when digit borrowing has taken place from the second digit to the two bits of the first and zeroth digits and Ip32 is equal to 1, digit borrowing has taken place from the fourth digit. Since digit borrowing from the fourth digit then takes place only because of the digit borrowing which has taken place on the first or zeroth digit, Ip32 = 1 also represents the equivalence relationship of the two bits, namely that X3X2 is equal to Y3Y2.

The digit borrow generating signals Igi, Igjk and the digit borrow propagating signals Ipi, Ipjk are propagated from the lowest digit to the highest digit. When the digit borrow generating signal Igi0 for the digits from the lowest digit (the zeroth digit) to each digit (the ith digit) is obtained, the output circuit 613 generates Zi, for each digit, based on Ipi and Ig(i-1)0. However, Z1 is generated based on Ip1 and Ig0. Since no digit is borrowed from below the lowest digit, Z0 is determined based on Ip0 only.
When at least one of the digit borrow generating signal Ig70 and the digit borrow propagating signal Ip70 for the digits from the zeroth digit to the 7th digit is 1, this represents that X is not greater than Y. More specifically, the magnitude-relation judging signal Ib is the logical OR of Ig70 and Ip70. The patent expands Ig70 and Ip70 into their lower-order group signals and arrives at the relation it labels equation (6). In the output circuit 613 in FIG. 7, the magnitude-relation judging signal Ib is generated with the use of the relation of equation (6).

Generally, it is easy to judge whether or not a subtraction result is negative in a subtracting circuit for executing subtraction of X-Y. That is, it is enough to judge whether or not a digit is borrowed from the highest digit. However, it is difficult to judge whether or not a subtraction result is not greater than 0, because it is difficult to judge whether or not a subtraction result is equal to 0. In this connection, one may consider adding a circuit for making sure that all the bits of a subtraction result are 0, or for making sure that X-Y is not negative and X-Y-1 is negative. This may increase the amount of hardware of the subtracting circuit. In the subtracting circuit 601 in FIG. 7, however, most of the hardware is commonly used for the calculation of the output data Z and for the generation of the magnitude-relation judging signal Ib representing that X is not greater than Y (X-Y is not greater than 0). It is therefore possible to reduce the amount of hardware.
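The way one set of borrow signals yields both the difference Z and the "X is not greater than Y" flag Ib can be checked with a small bit-level model. This is a sketch under stated assumptions: the function name `subtract_with_le` is hypothetical, and the group signals are accumulated serially from digit 0 upwards, whereas the hardware in FIG. 7 presumably combines them in groups such as Ig32; only the logical relations are taken from the description above.

```python
def subtract_with_le(X: int, Y: int, bits: int = 8):
    """Return (Z, Ib): Z = (X - Y) mod 2**bits, Ib = 1 when X <= Y."""
    x = [(X >> i) & 1 for i in range(bits)]
    y = [(Y >> i) & 1 for i in range(bits)]
    # Input circuit: per-digit borrow generate (Xi < Yi) and propagate (Xi == Yi).
    g = [(1 - x[i]) & y[i] for i in range(bits)]
    p = [1 - (x[i] ^ y[i]) for i in range(bits)]
    # Intermediate circuit: group signals Ig(i)0 and Ip(i)0 for digits 0..i.
    G, P = [g[0]], [p[0]]
    for i in range(1, bits):
        G.append(g[i] | (p[i] & G[i - 1]))
        P.append(p[i] & P[i - 1])
    # Output circuit: Zi from Ipi and the borrow arriving from the digits below.
    Z = 0
    for i in range(bits):
        borrow_in = G[i - 1] if i > 0 else 0
        Z |= ((1 - p[i]) ^ borrow_in) << i
    # Ib = Ig70 OR Ip70: X < Y or X == Y.
    Ib = G[bits - 1] | P[bits - 1]
    return Z, Ib


print(subtract_with_le(5, 9))   # X < Y
print(subtract_with_le(9, 9))   # X == Y
print(subtract_with_le(9, 5))   # X > Y
```

No separate all-zero detector on Z is needed: equality is already visible as the propagate signal over the whole word.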
In a fourth operational processing apparatus in FIG. 8, the decrementer 201 in FIG. 6 is removed, but a right 1-bit shifter 604 is interposed between a left shifter 603 and a mantissa result register 205. The left shifter 603 and the right 1-bit shifter 604 form a bidirectional shifter 605.

A second selecting circuit 602 is adapted to supply, as a shift amount SH, an exponent E when Ib is equal to 1, and an output LSA of the advancing 1 detecting circuit 202 when Ib is equal to 0, the exponent E and the output LSA being supplied to the left shifter 603. The right 1-bit shifter 604 is adapted to supply, as a mantissa m of the result of an operational processing, a value obtained by executing a right 1-bit shift processing on an output of the left shifter 603 when Ib is equal to 1, and the output itself of the left shifter 603 when Ib is equal to 0.

According to the arrangement in FIG. 8, when the subtracting circuit 601 having the inside arrangement shown in FIG. 7 makes a judgment that the result of an operational processing is a denormalized number (Ib = 1), the shift amount SH to be given to the left shifter 603 is set to E and a shift operation of the right 1-bit shifter 604 is started. As a result, there is executed, on a mantissa M, a left shift processing having the desired shift amount E-1. On the other hand, when it is judged that the result of the operational processing is a normalized number (Ib = 0), the shift amount SH to be given to the left shifter 603 is set to LSA and the shift operation of the right 1-bit shifter 604 is stopped. As a result, there is executed, on a mantissa M, a left shift processing having the desired shift amount LSA. More specifically, according to the arrangement in FIG. 8, the provision of the right 1-bit shifter 604 eliminates the decrementer 201 in FIG. 6, thus simplifying the arrangement of the operational processing apparatus. The method of determining an exponent e of the result of an operational processing is similar to that shown in FIG. 6.

In the embodiment in FIG. 8, the right 1-bit shifter 604 is disposed at the output side of the left shifter 603, but the right 1-bit shifter 604 may be disposed at the input side of the left shifter 603.

This page is translated from the original by using the Google translator.

IEEE 754 - standard for binary floating-point arithmetic. Author: Yashkardin Vladimir.

$1.55625 \cdot 10^{+2} = 1.55625 \cdot \exp_{10}^{+2}$. The number $1.55625 \cdot \exp_{10}^{+2}$ consists of two parts: a mantissa M = 1.55625 and an exponent $\exp_{10}$ = +2. If the mantissa is in the range 1 <= M < 10, the number is considered normalized.

3.2 Representation in denormalized exponential form.

Take, for example, the decimal number 155.625. Write the number in denormalized exponential form: $0.155625 \cdot 10^{+3} = 0.155625 \cdot \exp_{10}^{+3}$. The number $0.155625 \cdot \exp_{10}^{+3}$ consists of two parts: a mantissa M = 0.155625 and an exponent $\exp_{10}$ = +3. If the mantissa is in the range 0.1 <= M < 1, the number is considered denormalized.

3.3 Converting a decimal number to a binary floating-point number.

Our problem reduces to representing a decimal floating-point number as a binary floating-point number in normalized exponential form. To do this, we expand the given number in powers of two:

$155.625 = 1 \cdot 2^{7} + 0 \cdot 2^{6} + 0 \cdot 2^{5} + 1 \cdot 2^{4} + 1 \cdot 2^{3} + 0 \cdot 2^{2} + 1 \cdot 2^{1} + 1 \cdot 2^{0} + 1 \cdot 2^{-1} + 0 \cdot 2^{-2} + 1 \cdot 2^{-3}$

$155.625 = 128 + 0 + 0 + 16 + 8 + 0 + 2 + 1 + 0.5 + 0 + 0.125$

$155.625_{10} = 10011011.101_{2}$ - the number in decimal and in binary form.

Bring the resulting number to normalized form in the decimal and in the binary system:

$1.55625 \cdot \exp_{10}^{+2} = 1.0011011101 \cdot \exp_{2}^{+111}$

As a result, we have the main components of the normalized exponential form of the binary number: mantissa M = 1.0011011101 and exponent $\exp_{2}$ = +111 (that is, +7 written in binary).

4 Description of converting numbers to IEEE 754.

4.1 Transformation of a normalized binary number into the 32-bit IEEE 754 format.

The formats mainly used in engineering and programming are the 32-bit and 64-bit ones. For example, VB uses the data types single (32 bits) and double (64 bits). Consider the transformation of the binary number 10011011.101 into the single-precision (32-bit) format of IEEE Standard 754. The other number formats in IEEE 754 are enlarged copies of the single-precision format.
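The expansion of 155.625 into binary digits and its normalization can be reproduced mechanically. The sketch below is only an illustration of the steps in section 3.3; the function names `binary_expansion` and `normalize` are hypothetical, and `normalize` assumes a number that is at least 1.

```python
from fractions import Fraction


def binary_expansion(value, max_frac_bits=32):
    """Return the binary digits of a non-negative number as 'integer.fraction'."""
    q = Fraction(value)                      # exact value of the float
    int_part = q.numerator // q.denominator
    frac = q - int_part
    frac_digits = []
    while frac and len(frac_digits) < max_frac_bits:
        frac *= 2                            # shift the next fractional bit out
        bit = int(frac >= 1)
        frac_digits.append(str(bit))
        frac -= bit
    return f"{int_part:b}." + "".join(frac_digits)


def normalize(binary):
    """Move the binary point directly behind the leading 1 (number >= 1 assumed)."""
    int_digits, frac_digits = binary.split(".")
    exponent = len(int_digits) - 1           # places the point was moved left
    mantissa = int_digits[0] + "." + int_digits[1:] + frac_digits
    return mantissa, exponent


bits = binary_expansion(155.625)
mantissa, exponent = normalize(bits)
print(bits)                                  # 10011011.101
print(mantissa, exponent, format(exponent, "b"))  # 1.0011011101 7 111
```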
To represent a number in the single-precision IEEE 754 format, it must first be brought to binary normalized form. In section 3 we did this conversion for the number 155.625. Now consider how a normalized binary number is converted to the 32-bit IEEE 754 format.

Description of the transformation into the 32-bit IEEE 754 format.

A number can be positive or negative. Therefore one bit is set aside to designate the sign: 0 is positive, 1 is negative. This is the most significant bit of the 32-bit sequence.

Then come the exponent bits, for which 1 byte (8 bits) is allocated. The exponent, like the number itself, may carry a sign + or -. To determine the sign of the exponent without introducing yet another sign bit, an offset of half a byte, +127 (0111 1111), is added to the exponent. That is, if our exponent is +7 (+111 in binary), then the biased exponent is 7 + 127 = 134. And if our exponent were -7, then the biased exponent would be 127 - 7 = 120. The biased exponent is written into the allotted 8 bits. When we need to recover the exponent of the binary number, we simply subtract 127 from this byte.

The remaining 23 bits are set aside for the mantissa. However, the first bit of a normalized binary mantissa is always 1, since the mantissa lies in the range 1 <= M < 2, so it does not need to be stored; only the remainder of the mantissa is written into the 23 bits.

The table shows the decimal number 155.625 in the 32-bit IEEE754 format:

sign: 0
biased exponent: 1000 0110
remainder of the mantissa: 001 1011 1010 0000 0000 0000
hexadecimal: 431BA000

$2^{971} \approx 1.99584\mathrm{e}{+292}$.

From the above it follows that the bulk of the numbers in IEEE754 format have a stable, small relative error. The maximum possible relative error for a Single number is $2^{-23} \cdot 100\% = 11.920928955078125\mathrm{e}{-6}\,\%$. The maximum possible relative error for a Double number is $2^{-52} \cdot 100\% = 2.2204460492503130808472633361816\mathrm{e}{-14}\,\%$.

7.5 General information for single- and double-precision numbers of IEEE standard 754.

Table 3. Information about the 32/64-bit formats in the standard ANSI/IEEE Std 754-1985.

length of the number, bits: 32 (single), 64 (double)
biased exponent (E), bits: 8 (single), 11 (double)
remainder of the mantissa (M), bits: 23 (single), 52 (double)
decimal value of a denormalized number: $F = (-1)^{S} \cdot 2^{E-126} \cdot M / 2^{23}$ (single), $F = (-1)^{S} \cdot 2^{E-1022} \cdot M / 2^{52}$ (double)
decimal value of a normalized number: $F = (-1)^{S} \cdot 2^{E-127} \cdot (1 + M / 2^{23})$ (single), $F = (-1)^{S} \cdot 2^{E-1023} \cdot (1 + M / 2^{52})$ (double)
smallest positive denormalized number: $2^{-149} \approx 1.40129846\mathrm{e}{-45}$ (single), $2^{-1074} \approx 4.94065646\mathrm{e}{-324}$ (double)
largest normalized number: $2^{127} \cdot (2 - 2^{-23}) \approx 3.40282347\mathrm{e}{+38}$ (single), $2^{1023} \cdot (2 - 2^{-52}) \approx 1.79769313\mathrm{e}{+308}$ (double)

The table also has rows for the denormalized and normalized binary number, the maximum absolute error of a number, and the maximum relative error of denormalized and of normalized numbers.

8 Rounding numbers in the IEEE 754 standard.

When numbers are represented in the floating-point format of IEEE Standard 754, they often have to be rounded. The standard provides four ways of rounding numbers.

Ways of rounding numbers in IEEE 754:

Rounding to the nearest value.
Rounding toward zero.
Rounding toward +infinity.
Rounding toward -infinity.

Table 3. Examples of rounding to one decimal place.

How rounding works is shown in the examples in Table 3. When a number is converted, one of the ways of rounding has to be chosen. By default this is the first way, rounding to the nearest value. Different devices often use the second method, rounding toward zero. When rounding toward zero, the insignificant digits are simply discarded, so this is the easiest method to implement in hardware.

9 Computing problems caused by using the IEEE754 standard.

The IEEE 754 standard is widely used in engineering and programming. Most modern microprocessors are manufactured with a hardware implementation of real variables in the IEEE754 format. Neither the programming language nor the programmer can change this situation; no other representation of a real number exists in the microprocessor. When the IEEE754-1985 standard was created, representing a real variable in 4 or 8 bytes seemed very large, since the amount of RAM under MS-DOS was 1 MB, of which a program could use only 0.64 MB. For modern operating systems a size of 8 bytes is negligible; nevertheless, the variables in most microprocessors continue to be in the IEEE754-1985 format.

Consider the computing errors caused by the use of numbers in the IEEE754 format.

9.1 Errors associated with the accuracy of representation of real numbers in the IEEE754 format. "Dangerous reduction".
This error is always present in computer calculations. The reason for its occurrence is described in paragraph 7.4: the relative error is about $10^{-6}\,\%$ for single and $10^{-14}\,\%$ for double. The absolute errors can be significant, up to about $10^{31}$ for single and $10^{292}$ for double, which may cause problems with calculations.

If the example is worked out on paper, the answer is 1. The absolute error is 7. Why do we get the wrong answer? The number 123456789 in single is 4CEB79A3 hex (ieee) = 123456792 (dec); the absolute error of representation is +3. The number 123456788 in single is 4CEB79A2 hex (ieee) = 123456784 (dec); the absolute error of representation is -4. The relative error of the initial numbers is approximately 3.24e-6 %. As a result of one operation, the relative error of the result became 800 %, that is, it increased by a factor of 2.5e+8. This is what I call a "dangerous reduction", i.e. a catastrophic decrease of accuracy in an operation where the absolute value of the result is much smaller than any of the input variables.

In fact, the error of representation precision is the most innocuous one in computer calculations, and usually many programmers pay no attention to it. Nevertheless, it can be very frustrating.

9.2 Errors associated with improper coercion of data types. "Wild error".
These errors are caused by the fact that the original number represented in the single format and in the double format is usually not equal to itself. For example, take the original number 123456789.123456789:

Single: 4CEB79A3 = 123456792.0 (dec)
Double: 419D6F34547E6B75 = 123456789.12345679104328155517578125

The difference between Single and Double amounts to 2.87654320895671844482421875.

The article illustrates this with an example for VB and the relative error of its result.

In PHP, entering the number 2.2250738585072011e-308 caused the process to hang with nearly 100 % CPU load. Other numbers from this range caused no problems (2.2250738585072009e-308, 2.2250738585072010e-308, 2.2250738585072012e-308). The bug report was received on 30.12.2010 and the bug was fixed by the developers on 10.01.2011. Since PHP is a preprocessor used by most servers, any network user was able, within those 10 days, to bring down any host. The developers write that the bug only occurs on 32-bit systems, but I think that if the precision of the boundary value is increased, 64-bit systems will hang too (not verified). The reason for the panic is clear: any user with a certain level of diligence and knowledge had the opportunity to take down most of the information resources of the planet within ten days. I would rather not give more examples of such numbers and such errors.

10 The final part.

From the above it is clear that the view that the error of a floating-point result does not exceed the relative error of representation of the largest number is false. The errors listed in item 9 add up. Errors such as "dirty zero" and "dangerous reduction" can make calculation errors unacceptable. When programming computer calculations, the programmer should pay particular attention to results close to zero.
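The two effects from item 9 (the "dangerous reduction" and the single/double mismatch) can be reproduced without VB. The sketch below is an illustration only: it uses Python's `struct` module to round a value to IEEE 754 single precision, and the helper names `to_single` and `single_hex` are hypothetical.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest single and convert back."""
    return struct.unpack(">f", struct.pack(">f", x))[0]


def single_hex(x: float) -> str:
    """Hexadecimal image of x stored as an IEEE 754 single."""
    return struct.pack(">f", x).hex().upper()


a, b = 123456789.0, 123456788.0
a32, b32 = to_single(a), to_single(b)
print(single_hex(a), a32)      # 4CEB79A3 123456792.0
print(single_hex(b), b32)      # 4CEB79A2 123456784.0
print(a32 - b32)               # 8.0, while the answer on paper is 1

# The same original number stored as single and as double differs by about 2.88.
x = 123456789.123456789
print(to_single(x) - x)
```

The subtraction itself is exact; the whole error of 7 comes from rounding the two inputs to single precision before the operation.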
Some experts believe that this number format represents a threat to humanity. You can read about it in the article "IEEE754-tick threatens mankind". Although many of the facts in that article are over-dramatized, and possibly misinterpreted, the problem of computing is reflected correctly from a philosophical point of view.

I do not dramatize calculations on the IEEE754 standard. The standard has been in operation since 1985 and was fully incorporated into the standard IEEE754-2008, which extended the precision of calculations. However, the problem of reliable computing is very urgent today, and the standard IEEE754-2008 and the ISO recommendations have not solved it. I think this area needs an innovative idea, which the developers of the IEEE754-2008 standard unfortunately do not possess.

The main innovative ideas in our world usually came from amateurs and like-minded people who did not work for money. A striking example of this was the invention of the telephone. When a school teacher, Alexander Graham Bell, came with a patent for the invention of the telephone to the president of the telecommunications company Western Union, which owned the transatlantic cable connection, and offered to sell him the patent, he was not thrown out, no. The president of the company offered to have the question considered by a council of experts in the field of telegraphy, consisting of specialists and scholars in the field of telecommunications. The experts gave their opinion that this invention was useless and had no future in the field of telecommunications. Some experts even wrote in their report that it was circus trickery and charlatanism. Alexander Graham Bell, along with his father-in-law, decided to promote his invention independently. After about 10 years, the telecommunications giant Western Union was virtually pushed out of the sphere of telecommunication technologies by the telephone business. Today you can see in many Russian cities windows with a sign that says Western Union; this company is engaged in transferring money around the world, and once it was an international telecommunications giant. We can conclude that the opinions of experts on innovative technologies are useless. If you think that since the invention of the telephone (1877) something has changed in people's minds, you are wrong.

If scientists who invent new things and professionals who know how to use what is well known cannot solve the problem, you need innovation.

Links to new ideas in the field of representation of real numbers in hardware:
1. Approksimetika

If you know of other innovative ideas in the field of representation of real numbers, we will be happy to receive links to these sources.

I would suggest representing real numbers as fixed-point numbers. To cover the full range of Double numbers it is enough to have a variable consisting of 1075 bits for the integer part and 1075 bits for the fractional part, i.e. about 270 bytes per variable. In this case, all numbers are represented with the same absolute accuracy. You can work with numbers over the entire range of the real axis, that is, it becomes possible to add large numbers to small numbers. The step between numbers on the real axis is uniform, that is, the real axis is linear. There will be only one data type, i.e. integer, real and other types are not needed. The problem here is the implementation of microprocessor registers 270 bytes wide, but that is not a problem for modern technology.
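The figure of 1075 + 1075 bits can be sanity-checked by measuring how many integer bits the largest double needs and how many fractional bits the smallest positive double needs. This is a small sketch of that check, not the author's program; it relies only on Python's standard `sys` and `fractions` modules, and the variable names are chosen here for illustration.

```python
import sys
from fractions import Fraction

largest = sys.float_info.max     # the double with the image 7FEFFFFFFFFFFFFF
smallest = 5e-324                # parses to the smallest positive double, 2**-1074

# Bits needed in front of the binary point for the largest double.
int_bits = int(largest).bit_length()
# Bits needed behind the binary point for the smallest positive double.
frac_bits = Fraction(smallest).denominator.bit_length() - 1

print(int_bits, frac_bits)       # 1024 1074
print((1075 + 1075) / 8)         # 268.75 bytes, i.e. "about 270 bytes"
```

Both counts stay below 1075, so a fixed-point variable with 1075 bits on each side of the point is wide enough to hold every finite double exactly.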
To write section 9, I had to create a program that represents a number as a fixed-point variable 1075 + 1075 bytes long, where the number is represented as a string of ASCII characters, i.e. one character equals one digit. I also had to write all the arithmetic operations on ASCII strings. This program works like a calculation on paper. Since the mathematical capabilities of the microprocessor are not used in it, it runs slowly. Why did I do it? I could not find a program that could exactly represent a number in IEEE754 format in decimal form. I also did not find a program (although such programs certainly exist, no doubt) where you can enter 1075 significant decimal digits in an input box.

Here, for example, is the exact decimal value of the double number 7FEFFFFFFFFFFFFF:

179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.0

You can use the program IEEE754 v.1.0 to study and evaluate the errors that arise when working with real numbers in the IEEE754 format.

References
1. IEEE Standard for Binary Floating-Point Arithmetic. Copyright 1985 by The Institute of Electrical and Electronics Engineers, Inc., 345 East 47th Street, New York, NY 10017, USA.

Acknowledgments: Sitkarevu, for assistance in creating the article.

Floating Point Representation Basics.

There are posts on the representation of the floating point format. The objective of this article is to provide a brief introduction to the floating point format.

The following description explains the terminology and primary details of the IEEE 754 binary floating point representation. The discussion is confined to the single and double precision formats.

Usually, a real number in binary will be represented in the following format.
$I_m I_{m-1} \ldots I_2 I_1 I_0 \, . \, F_1 F_2 \ldots F_{n-1} F_n$

where $I_m$ and $F_n$ are each either 0 or 1 and belong to the integer and fraction parts respectively.

A finite number can also be represented by four integer components: a sign (s), a base (b), a significand (m) and an exponent (e). The numerical value of the number is then evaluated as

$(-1)^s \times m \times b^e$, where $m < |b|$.

Depending on the base and the number of bits used to encode the various components, the IEEE 754 standard defines five basic formats. Among the five formats, binary32 and binary64 are the single precision and double precision formats respectively, in which the base is 2.

Table 1. Precision Representation.

Single Precision Format.

As mentioned in Table 1, the single precision format has 23 bits for the significand (plus 1 implied bit, details below), 8 bits for the exponent and 1 bit for the sign.

For example, the rational number 9/2 can be converted to the single precision float format as follows: 9/2 = 4.5 in decimal = 100.1 in binary.

The result is said to be normalized if it is represented with a leading 1 bit, i.e. $1.001_2 \times 2^{2}$. Similarly, when the number $0.000000001101_2 \times 2^{3}$ is normalized, it appears as $1.101_2 \times 2^{-6}$. Omitting this implied 1 on the far left gives us the mantissa of the float number. A normalized number provides more accuracy than the corresponding de-normalized number. Because the most significant bit is implied, the significand effectively carries 23 + 1 = 24 bits.

Floating point numbers are to be represented in normalized form. The subnormal numbers fall into the category of de-normalized numbers. The subnormal representation slightly reduces the exponent range, and a subnormal number cannot be normalized since that would result in an exponent which does not fit into the field. Subnormal numbers are less accurate than normalized numbers, i.e. they have less room for non-zero bits in the fraction field. Indeed, the accuracy drops as the magnitude of the subnormal number decreases. However, the subnormal representation is useful in filling the gaps of the floating point scale near zero.
In other words, the above result can be written as $(-1)^{0} \times 1.001_2 \times 2^{2}$, which yields the integer components s = 0, b = 2, significand (m) = 1.001, mantissa = 001 and e = 2. The corresponding single precision floating point number can be represented in binary as shown below:

0 10000001 00100000000000000000000

Here the exponent field is supposed to be 2, yet it is encoded as 129 (127 + 2), called the biased exponent. The exponent field is stored as a plain binary number, yet negative exponents must also be representable. Instead of an encoding such as sign magnitude, 1's complement or 2's complement, the biased exponent is used for the representation of negative exponents. The biased exponent has advantages over the other negative representations when two floating point numbers are compared bitwise.

A bias of $2^{n-1} - 1$, where n is the number of bits used for the exponent, is added to the exponent (e) to get the biased exponent (E). So the biased exponent (E) of a single precision number can be obtained as

$E = e + 127$

The range of the exponent in single precision format is -126 to +127. Other values are used for special symbols.

Note: When we unpack a floating point number, the exponent obtained is the biased exponent. Subtracting 127 from the biased exponent gives the unbiased exponent.

Double Precision Format.

As mentioned in Table 1, the double precision format has 52 bits for the significand (plus 1 implied bit), 11 bits for the exponent and 1 bit for the sign. All other definitions are the same for the double precision format, except for the size of the various components.

The smallest change that can be represented in a floating point representation is called precision. The fractional part of a single precision normalized number has exactly 23 bits of resolution (24 bits with the implied bit). This corresponds to $\log_{10}(2^{23}) = 6.924$, i.e. about 7 decimal digits of accuracy. Similarly, in the case of double precision numbers the precision is $\log_{10}(2^{52}) = 15.654$, i.e. about 16 decimal digits.
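The field layout described above for 9/2 can be inspected directly. The following sketch is an illustration, not part of the original article: it uses Python's `struct` module to obtain the 32-bit image of a single precision number, and the function name `decode_single` is hypothetical.

```python
import math
import struct


def decode_single(x: float):
    """Split the single precision image of x into (sign, biased exponent, fraction)."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    biased = (word >> 23) & 0xFF      # 8 exponent bits
    fraction = word & 0x7FFFFF        # 23 stored mantissa bits (implied 1 omitted)
    return sign, biased, fraction


sign, E, frac = decode_single(9 / 2)
bias = 2 ** (8 - 1) - 1               # 2^(n-1) - 1 with n = 8 exponent bits
print(sign, E, E - bias, format(frac, "023b"))
# 0 129 2 00100000000000000000000

# Decimal digits of precision for 23 and 52 fraction bits.
print(round(math.log10(2 ** 23), 3), round(math.log10(2 ** 52), 3))
# 6.924 15.654
```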
Accuracy in a floating point representation is governed by the number of significand bits, whereas the range is limited by the exponent.

Not all real numbers can be represented exactly in floating point format. For any number x which is not a floating point number, there are two options for a floating point approximation: the closest floating point number less than x, written x-, and the closest floating point number greater than x, written x+. A rounding operation is performed on the number of significant bits in the mantissa field based on the selected mode. The round down mode sets x to x-, the round up mode sets x to x+, the round towards zero mode sets x to either x- or x+, whichever lies between zero and x, and the round to nearest mode sets x to x- or x+, whichever is nearest to x. Usually round to nearest is the most used mode. The closeness of a floating point representation to the actual value is called accuracy.

Special Bit Patterns.

The standard defines a few special floating point bit patterns. Zero cannot have a most significant 1 bit, hence it cannot be normalized. The hidden bit representation requires a special technique for storing zero. There are two different bit patterns, +0 and -0, for the same numerical value zero. For the single precision floating point representation, these patterns are given below.

0 00000000 00000000000000000000000 = +0
1 00000000 00000000000000000000000 = -0

Similarly, the standard has two different bit patterns for +INF and -INF. These are given below.

0 11111111 00000000000000000000000 = +INF
1 11111111 00000000000000000000000 = -INF

None of these special numbers, nor the other special numbers below, are normalized numbers; they are represented through the use of a special bit pattern in the exponent field. This slightly reduces the exponent range, but this is quite acceptable since the range is so large.
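The four special patterns listed above can be printed from real values. This is a minimal sketch using Python's `struct` and `math` modules; the helper name `pattern` is hypothetical.

```python
import math
import struct


def pattern(x: float) -> str:
    """Return the single precision image of x as 'sign exponent fraction'."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    b = format(word, "032b")
    return f"{b[0]} {b[1:9]} {b[9:]}"


for value in (0.0, -0.0, math.inf, -math.inf):
    print(pattern(value), value)

# +0 and -0 have different bit patterns but compare as equal.
print(0.0 == -0.0)   # True
```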
An attempt to compute expressions like 0 x INF or INF / INF makes no mathematical sense. The standard calls the result of such expressions Not a Number (NaN). Any subsequent expression with NaN yields NaN. The representation of NaN has a non-zero significand and all 1s in the exponent field. The pattern is shown below for the single precision format (x is a don't care bit).

x 11111111 m0000000000000000000000 (m followed by 22 further fraction bits, not all fraction bits zero)

Here m can be 0 or 1, which gives two different kinds of NaN. On most current hardware, m = 1 marks a quiet NaN and m = 0 (with at least one other fraction bit set) marks a signaling NaN, for example:

0 11111111 01000000000000000000000 Signaling NaN (SNaN)
0 11111111 10000000000000000000000 Quiet NaN (QNaN)

Usually QNaN and SNaN are used for error handling. QNaNs do not raise any exceptions as they propagate through most operations, whereas SNaNs raise an invalid exception when consumed by most operations.

Overflow and Underflow.

Overflow is said to occur when the true result of an arithmetic operation is finite but larger in magnitude than the largest floating point number which can be stored using the given precision. Underflow is said to occur when the true result of an arithmetic operation is smaller in magnitude (infinitesimal) than the smallest normalized floating point number which can be stored. Overflow cannot be ignored in calculations, whereas underflow can effectively be replaced by zero.

The IEEE 754 standard defines a binary floating point format. The architecture details are left to the hardware manufacturers. The storage order of the individual bytes in a binary floating point number varies from architecture to architecture.

Thanks to Venki for writing the above article.

Method for eletronically representing a number, adder circuit and computer system US 5923575 A.
The invention relates to a method for electronically representing a number V in a binary data word Both the exponent and the mantissa are represented as 2 complement The mantissa is normalized to 0 1 F if the number V is positive where F is the fraction of the mantissa In case that the number V is negative the fraction F is normalized to 10 F Usage of this format allows to design an improved adder which requires less hardware. 11.1 A method for electronically representing a number V in a binary data word, the data word having a set of exponent bits E and having a set of mantissa bits M, the method comprising the steps of. representing the exponent bits E in 2 complement form and. representing the mantissa bits M in 2 complement form whereby. in case that the number V is positive, a fraction F of the mantissa bits M of the number V is normalized to a 01 F form and the exponent bits E are adapted by shifting the number V a number of times and adding the number shifts to the exponent bits E of the number V and. in case that the number V is negative, the fraction F of the mantissa bits M is normalized to a 10 F form and the exponent bits E are adapted by converting the number V into a 2 complement form, shifting the number V a number of times, and adding the number of shifts to the exponent bits E of the number V and. dropping the leading mantissa bit to form a binary word including the resulting exponent bits E and mantissa bits.2 The method according to claim 1.whereby one of the mantissa bits M is a sign bit and the remaining sub-set of bits is the fraction F so that the number V equals. in case that the sign bit indicates that the number V is positive. in case that the sign bit indicates that the number V is negative. a number of computing units and. an inverse log converter. wherein the input log converter is adapted to convert input data words into a log domain and to shift log converted input data words into the data pipeline. 
wherein the data pipeline is coupled to the computing units, so that when a data word is shifted through the data pipelines consecutive computing units receive the data word as an input. wherein each computing unit has an output coupled to the inverse log converter to perform a conversion back from the log domain to obtain a result and. wherein an input data word V is electronically represented in the log domain in a binary data word, the data word having a set of exponent bi ts E and having a set of mantissa bits M, the exponent bits E being represented in 2 complement form and the mantissa bits M being represented in 2 complement form whereby. in case that the number V is positive, a fraction F of the mantissa bits M Of the number V is normalized to 01 F form and the exponent bits E are adapted by shifting the number V a number of times and adding the number shifts to the exponent bits E of the number V and. in case that the number V is negative, the fraction F of the mantissa bits M is normalized to a 10 F form and the exponent bits E are adapted by converting the number V into a 2 complement form, shifting the number V a number of times, and adding the number of shifts to the exponent bits E of the number V and. dropping the leading mantissa bit to form a binary word including the resulting exponent bits E and mantissa bits.11 A computer system comprising. an input log converter. a data pipeline. a number of computing units, each computing unit having an adde r for adding a first number M A and a second number M B , the first and second numbers being normalized to have either a leading 01 or a leading 10 in a binary representation, wherein the adder circuit comprises. a an adder block for adding the first number M A and the second number M B to obtain a result. b a leading msb detector coupled to an output of the adder block to detect a sequence of leading 0 or 1 bits in the result, the sequence having a length L and. 
c) a barrel shifter to shift the result for a number of L-1 shifts to the left in order to normalize the result; and
an inverse log converter,
wherein the input log converter is adapted to convert input data words into a log domain and to shift log converted input data words into the data pipeline,
wherein the data pipeline is coupled to the computing units, so that when a data word is shifted through the data pipeline, consecutive computing units receive the data word as an input,
wherein each computing unit has an output coupled to the inverse log converter to perform a conversion back from the log domain to obtain a result.

The present invention is related to the following inventions which are assigned to the same assignee as the present invention:

1. "Computer Processor Utilizing Logarithmic Conversion and Method of Use Thereof", having Ser. No. 08/430,158, filed on Mar. 13, 1995, now U.S. Pat. No. 3,597,670.
2. "Exponentiator Circuit Utilizing Shift Register and Method of Using Same", having Ser. No. 08/401,515, filed on Mar. 10, 1995, now U.S. Pat. No. 5,553,012.
3. "Accumulator Circuit and Method of Use Thereof", having Ser. No. 08/455,927, filed on May 31, 1995, now U.S. Pat. No. 5,644,520.
4. "Logarithm/Inverse-Logarithm Converter and Method of Using Same", having Ser. No. 08/381,368, filed on Jan. 31, 1995, now U.S. Pat. No. 5,642,305.
5. "Logarithm/Inverse-Logarithm Converter Utilizing Second Order Term and Method of Using Same", having Ser. No. 08/382,467, filed on Jan. 31, 1995, now U.S. Pat. No. 5,703,801.
6. "Logarithm/Inverse-Logarithm Converter Utilizing Linear Interpolation and Method of Using Same", having Ser. No. 08/391,880, filed on Feb. 22, 1995, now U.S. Pat. No. 5,600,581.
7. "Logarithm/Inverse-Logarithm Converter Utilizing a Truncated Taylor Series and Method of Use Thereof", having Ser. No. 08/381,167, filed on Jan. 31, 1995, now U.S. Pat. No. 5,604,691.
8. "Logarithm Converter Utilizing Offset and Method of Use Thereof", having Ser. No. 08/508,365, filed on Jul. 28, 1995, now U.S. Pat. No. 5,629,884.
9. "Method and System for Performing a Convolution Operation", having Ser. No. 08/535,800, filed on Sep. 28, 1995.

TECHNICAL FIELD OF THE INVENTION

The present invention relates generally to computing and digital signal processing and, in particular, to techniques for electronically representing a number.

BACKGROUND OF THE INVENTION

For the purposes of computing and digital signal processing, in particular for telecommunication, it is known in the art to represent numbers as binary data words. Such a binary data word typically is representative of some real-world value. In the case of digital signal processing such a binary data word typically represents a sampled value of some real process, like sampled speech or video data.

To represent a number in a binary data word for the purposes of computing or digital signal processing, a number of approaches are commonly used in the prior art. Integer numbers are usually represented in 2's complement. In the 2's complement form the most significant bit holds the sign, unless the data word is declared to be an unsigned integer value. The 2's complement of a binary number is found by inverting all the digits of the number and then adding one. For example, the 2's complement of 0001 is 1110 + 1 = 1111. In mathematical terms the 2's complement $\bar{x}$ of a number $x$ is

$$\bar{x} = 2^k - x \qquad (1)$$

where both $x$ and $\bar{x}$ are represented as binary numbers with $k$ digits.

The most popular representation for floating-point numbers is the format according to ANSI/IEEE standard 754-1985, which has been implemented by nearly all floating-point chip sets, including Intel's 8087/287/387, Motorola's 68881, as well as chip sets from AMD. The IEEE standard is therefore universal in microcomputers that accept those chips, including the IBM PC.
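Returning to the 2's complement rule stated above (invert all digits, then add one), the following is a minimal Python sketch. The function name `twos_complement` and the parameter `k` (the word width in bits) are illustrative choices, not taken from the source. It also checks that the invert-and-add-one rule agrees with 2^k - x, reduced to k digits.

```python
def twos_complement(x: int, k: int) -> int:
    """Return the k-bit 2's complement of x, for 0 <= x < 2**k."""
    mask = (1 << k) - 1
    inverted = ~x & mask          # invert all k digits
    return (inverted + 1) & mask  # add one and keep only k digits


# The example from the text: the 2's complement of 0001 is 1110 + 1 = 1111.
print(format(twos_complement(0b0001, 4), "04b"))

# Invert-and-add-one agrees with 2**k - x, taken modulo 2**k.
assert all(twos_complement(x, 8) == (2**8 - x) % 2**8 for x in range(2**8))
```

The final mask only matters for x = 0, where 2^k - x does not fit into k digits and wraps around to 0.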
The way a number is electronically represented for computing purposes strongly influences the performance of the computing or digital signal processing system which processes such a number, and therefore the hardware expense needed to obtain a given computing throughput.

By definition, digital signal processing (DSP) is concerned with the representation of signals by sequences of numbers or symbols and the processing of these signals. DSP has a wide variety of applications and its importance is evident in such fields as pattern recognition, radio communications, telecommunications, radar, biomedical engineering, and many others.

At the heart of every DSP system is a computer processor that performs mathematical operations on signals. Generally, signals received by a DSP system are first converted to a digital format used by the computer processor. Then the computer processor executes a series of mathematical operations on the digitized signal. The purpose of these operations can be to estimate characteristic parameters of the signal or to transform the signal into a form that is in some sense more desirable. Such operations typically implement complicated mathematics and entail intensive numerical processing. Examples of mathematical operations that may be performed in DSP systems include matrix multiplication, matrix inversion, Fast Fourier Transforms (FFT), auto and cross correlation, Discrete Cosine Transforms (DCT), polynomial equations, and difference equations in general, such as those used to approximate Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) filters.

Computer processors vary considerably in design and function. One aspect of a processor design is its architecture. Generally, the term computer architecture refers to the instruction set and organization of a processor. An instruction set is a group of programmer-visible instructions used to program the processor. The organization of a processor, on the other hand, refers to its overall structure
and composition of computational resources, for example, the bus structure, memory arrangement, and number of processing elements.

In a computer, a number of different organizational techniques can be used for increasing execution speed. One technique is execution overlap. Execution overlap is based on the notion of operating a computer like an assembly line with an unending series of operations in various stages of completion. Execution overlap allows these operations to be overlapped and executed simultaneously.

One commonly used form of execution overlap is pipelining. In a computer, pipelining is an implementation technique that allows a sequence of the same operations to be performed on different arguments. Computation to be done for a specific instruction is broken into smaller pieces, i.e. operations, each of which takes a fraction of the time needed to complete the entire instruction. Each of these pieces is called a pipe stage. The stages are connected in a sequence to form a pipeline: arguments of the instruction enter at one end, are processed through the stages, and exit at the other end.

There are many different architectures, ranging from complex-instruction-set-computer (CISC) to reduced-instruction-set-computer (RISC) based architectures. In addition, some architectures have only one processing element, while others include two or more processing elements. Despite differences in architectures, all computer processors have a common goal, which is to provide the highest performance at the lowest cost. However, the performance of a computer processor is highly dependent on the problem to which the processor is applied, and few, if any, low-cost computer processors are capable of performing the mathematical operations listed above at speeds required for some of today's more demanding applications. For example, MPEG data compression of an NTSC television signal can only be performed using expensive supercomputers or special purpose hardware.
Many other applications, such as matrix transformations in real-time graphics, require data throughput rates that exceed the capabilities of inexpensive single processors, such as microprocessors and commercially available DSP chips. Instead, these applications require the use of costly multiprocessor or multiple-processor computers. Although multiprocessor computers typically have higher throughput rates, they also include complex instruction sets and are generally difficult to program.

Therefore there is a need for an improved method for electronically representing a number in a binary data word, an improved adder circuit and microprocessor incorporating such an adder circuit, and an improved computer system.

SUMMARY OF THE INVENTION

The invention is pointed out with particularity in the appended claims. Preferred embodiments of the invention are given in the dependent claims.

The invention is advantageous in that it allows both the exponent and the mantissa of a number to be represented in 2's complement form. This is made possible by normalizing the mantissa differently depending on whether the number to be represented is positive or negative. Such normalizations can be carried out with minimal hardware expense by performing shift operations.

In case that the number to be represented is 0, the invention allows the value of 0 to be encoded in the exponent. For this purpose a predefined value of the exponent bits indicates that the number equals 0. This predefined value can be, for example, a leading 1 followed by a sequence of zeros. If the exponent has a width of 4 bits, the value of zero would be represented by 1000, whereby the mantissa is "don't care" in the example considered here.
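As a small illustration of the zero encoding just described, the sketch below builds the reserved exponent pattern (a leading 1 followed by zeros) for a given exponent width and tests a data word against it. The names `zero_exponent_code` and `is_zero` are hypothetical helpers introduced here, and the choice of this particular pattern follows the example in the text rather than a fixed rule.

```python
def zero_exponent_code(width: int) -> int:
    """Reserved exponent pattern for zero: a leading 1 followed by zeros."""
    return 1 << (width - 1)


def is_zero(exponent_bits: int, width: int) -> bool:
    """A word represents zero when its exponent equals the reserved pattern.

    The mantissa is ignored ("don't care") in that case.
    """
    return exponent_bits == zero_exponent_code(width)


print(format(zero_exponent_code(4), "04b"))  # pattern for a 4-bit exponent
print(is_zero(0b1000, 4), is_zero(0b0111, 4))
```

Read as a 2's complement number, this pattern is the most negative exponent, so reserving it costs only the smallest representable magnitude range.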
Further, the method for electronically representing a number is advantageous in that it allows two numbers represented in such a way to be added more efficiently and with less hardware expense. Because the mantissa is represented in 2's complement, it is not necessary to compare the mantissas of the two numbers to be added before the calculation is carried out, in contrast to the above referenced IEEE standard.

Moreover, the mantissas are always added and never subtracted, even if they represent negative numbers. This is also due to the 2's complement representation. An additional advantage is that no sign logic is needed. As a consequence, a microprocessor which uses the teaching of the invention can perform summations more efficiently and therefore has a higher computing throughput. If a computer program is to be carried out by the microprocessor, it can be carried out at a higher processing speed. If the computer program is a digital signal processing application, the microprocessor can deal with a higher sampling rate.
In digital signal processing, such as finite or infinite impulse response filtering, typically a large number of multiplications has to be carried out. If the two operands to be multiplied are converted into the log domain, the multiplication becomes a summation. The result is obtained by converting the sum back into the normal domain. A computer system of such a type is disclosed in above-identified related inventions number 1 (Ser. No. 08/430,158) and number 9 (Ser. No. 08/535,800). Implementation options for such a computer system are also described in several of the copending applications or patents 2 to 8.

Such a computer system operating in the log domain consists of a number of computing units which comprise an adder in order to perform the multiplications in the log domain. If a number is represented according to the invention in such a computer system, this saves hardware for the adders, improves the operational speed and at the same time saves precious silicon floor space. Power can also be saved since the design of the adders is more compact.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention will become more apparent and will be best understood by referring to the following detailed description of a preferred embodiment in conjunction with the accompanying drawings in which:

FIG. 1 is a flow chart illustrating a preferred embodiment of the method for electronically representing a number of the present invention.
FIG. 2 is a flow chart of a preferred embodiment of the method for adding two numbers according to the present invention.
FIG. 3 shows a block diagram of a preferred embodiment of an adder according to the invention.
FIG. 4 shows a microprocessor system which incorporates the principles of the invention.
FIG. 5 shows an embodiment of a computer system which uses the principles of the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
Referring to the flow chart of FIG. 1, it is explained in more detail how a number V is represented in the format of the invention. After the number V is inputted in step 100, it is decided in step 102 whether the number V is positive. The way this decision is made depends on the way the number V is represented initially. If the IEEE representation is used, the sign bit can be checked to make the determination.

If it is decided in step 102 that the number V is positive, the control goes to step 104, in which the number V is put into the form 01.F. The exponent of the number V is represented in 2's complement and adapted to the normalization into 01.F correspondingly.

First, in step 106 the number V is shifted a number of times so that a leading 01 before the decimal point results. This corresponds to the format 01.F, where F stands for the fractional bits behind the decimal point.

Second, in step 108 the exponent of the number V is adapted according to the number of shifts performed in step 106. If the number V is shifted to the left in step 106 in order to obtain the 01.F format, the shift count has a negative value. This value is added to the initial exponent of the number V, if any, so that a left shift decrements the exponent. If the number V did not initially have an exponent, the number of shifts of step 106 becomes the exponent of the number V. The exponent is represented in 2's complement.
In step 110 the leading mantissa bit 0 of the mantissa 01.F is dropped. The result is outputted in step 112. The result consists of a binary data word 114 which has exponent bits E(V) 116 and mantissa bits M(V) 118.

The exponent E(V) is represented without a separate sign bit, in 2's complement form. The mantissa M(V) has a length of N+1 bits M0(V), M1(V), M2(V), ..., MN(V). The most significant bit M0(V) equals 1, which indicates that the mantissa is positive. The remaining mantissa bits M1(V), M2(V), ..., MN(V) are the fraction F of the format 01.F to which the number V was shifted in step 106.

If it is decided in step 102 that the number V is negative, the control goes to step 120 to convert the mantissa as well as the exponent into 2's complement representation, to normalize the mantissa and to adapt the exponent correspondingly.

First, in step 122 the number V is converted into a 2's complement representation. For this conversion all digits of the number V are inverted and 1 is added to the least significant bit of the inverted number V. In step 124 the converted number V is shifted a number of times so that the format 10.F results, similar to the shifting of step 106. The exponent of the number V is adapted correspondingly and is also represented in 2's complement.

In step 128, similar to step 110, the most significant mantissa bit, which is 1, is dropped. The result is obtained in step 130 and again consists of the exponent bits E(V) 116 and the mantissa bits M(V) 118. As opposed to the result obtained in step 112, the mantissa bit M0(V) equals 0 to indicate that the value of the number V is negative.

In the following, examples are given of how a number V is represented in the format of the invention. In the first example the number V equals -1.011 and is initially represented in the IEEE format.
Since the number V is negative, which is indicated by the sign bit in the IEEE format, the 2's complement has to be determined first. The sign "-" is represented by 0, so that the initial IEEE representation of V as 01.011 results. In 2's complement this is 10.101, obtained by inverting all bits of 01.011 to give 10.100 and adding 00.001. The original exponent of V, if any, is represented in 2's complement and otherwise remains unchanged. In this case no shifting was necessary to create the format 10.F. The resulting mantissa M(V) is therefore M0(V) = 0, M1(V) = 1, M2(V) = 0 and M3(V) = 1, which corresponds to the fraction F = 101 of the 10.101 representation of V.

In the second example the number V equals 1.010 and is also initially represented in the IEEE format. As V is positive it stays 01.010 and the exponent is the same. The resulting fraction F is 010.

In the next example V equals -1.000, again in IEEE format. The 2's complement of 01.000 is 11.000. This does not correspond to the required format 10.F and must therefore be normalized. Shifting 11.000 one position to the left results in 10.000. This requires that the original exponent of V is decremented by one.

If the actual value of the number V in the format of the invention is to be determined, this is done by evaluating

$$V = 1 + \sum_{i=1}^{N} M_i(V)\,2^{-i} \qquad (2)$$

for the case that the sign bit M0(V) = 1 and thus V is positive, or

$$V = -2 + \sum_{i=1}^{N} M_i(V)\,2^{-i} \qquad (3)$$

in case that the sign bit M0(V) = 0 and thus V is negative.

Examples are shown in table 1 below. In the example considered in table 1 there are 4 bit positions in the mantissa M(V). No exponents are shown in table 1; the exponents are assumed to be equal to zero. The leftmost column of table 1 shows the mantissas M(V) of numbers which are represented according to the invention.
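The three worked examples above can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the patent's implementation: the names `encode` and `decode` are hypothetical, exact `Fraction` values stand in for the input number, the exponent is kept as a plain Python integer rather than a 2's complement bit field, and the decoding rule follows from reading 01.F and 10.F as 2's complement numbers with two bits before the point.

```python
from fractions import Fraction

N = 3  # number of fraction bits F; the stored mantissa has N + 1 bits


def encode(v: Fraction, n: int = N):
    """Return (exponent, stored_mantissa) for a nonzero v, hidden bit dropped."""
    if v == 0:
        raise ValueError("zero is encoded in the exponent, not in the mantissa")
    e = 0
    if v > 0:
        # normalize to 01.F, i.e. 1 <= v < 2
        while v >= 2:
            v /= 2
            e += 1
        while v < 1:
            v *= 2
            e -= 1
    else:
        # normalize to 10.F, i.e. -2 <= v < -1
        while v < -2:
            v /= 2
            e += 1
        while v >= -1:
            v *= 2
            e -= 1
    scaled = v * 2**n
    if scaled.denominator != 1:
        raise ValueError("value needs more than n fraction bits")
    expanded = int(scaled) & ((1 << (n + 2)) - 1)  # (n+2)-bit 2's complement
    stored = expanded & ((1 << (n + 1)) - 1)       # drop the leading bit
    return e, stored


def decode(e: int, stored: int, n: int = N) -> Fraction:
    """Recover the value: 01.F reads as 1 + F, 10.F reads as -2 + F."""
    m0 = stored >> n
    f = Fraction(stored & ((1 << n) - 1), 2**n)
    base = 1 + f if m0 == 1 else -2 + f
    return base * Fraction(2) ** e


for value in (Fraction(-11, 8), Fraction(5, 4), Fraction(-1)):  # -1.011, 1.010, -1.000
    e, m = encode(value)
    print(value, "->", e, format(m, "04b"), "->", decode(e, m))
```

For -1.000 the sketch returns exponent -1 with an all-zero mantissa, matching the text's observation that 11.000 must be shifted left once and the exponent decremented by one.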
Starting from the top of the table, the numbers having a leading 0 (in other words, M0(V) = 0) are negative, whereas the numbers in the lower portion of table 1 have a most significant bit which is 1 (in other words, M0(V) = 1) and are therefore positive. The digits after the most significant bit, in this case three bits, are representative of the fraction F of the numbers V.

The middle column of table 1 shows the expanded mantissas of the numbers V of the leftmost column. For the negative numbers this means that 1 is added as the most significant bit. This is the inversion of step 128, in which the leading 1 was dropped. In the table the leading 1 appears in brackets. The decimal point is also shown in the middle column of table 1, corresponding to the normalization performed in step 124.

The same applies analogously to the positive numbers V, for which a 0 in brackets is added as an inversion of step 110. The decimal point is again shown, corresponding to the normalization of step 106. Using the fraction F as an input to equations 2 and 3, respectively, the resulting value is shown in the rightmost column as a binary value, whereby it is assumed that the exponent equals 0 for all the numbers V. If the exponent of a number V is not equal to 0, the real value is obtained by shifting the result shown in the rightmost column a number of times corresponding to the exponent.

In the following it is shown, with reference to FIG. 2, how the format of the invention for representing a number V can be used advantageously if two such numbers are to be added. In step 200 a number X and a number Y which are to be added are inputted. Both X and Y are in the format of the invention.
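The three columns described for table 1 (stored mantissa, expanded mantissa with the hidden bit in brackets, and binary value for an exponent of zero) can be regenerated from these rules. The following Python sketch is derived from the description only and is not a copy of the original table; the function name `table_rows` and the string layout are illustrative assumptions.

```python
def table_rows(n: int = 3):
    """Return (stored, expanded, value) strings for every (n+1)-bit mantissa."""
    rows = []
    for stored in range(1 << (n + 1)):
        m0 = stored >> n                 # stored sign bit M0
        frac = stored & ((1 << n) - 1)   # fraction bits F
        hidden = 1 - m0                  # dropped leading bit, shown in brackets
        # value in units of 2**-n: 01.F reads as 1 + F, 10.F reads as -2 + F
        units = (1 << n) + frac if m0 else frac - (2 << n)
        sign = "-" if units < 0 else ""
        whole, rest = divmod(abs(units), 1 << n)
        rows.append((
            format(stored, f"0{n + 1}b"),
            f"({hidden}){m0}." + format(frac, f"0{n}b"),
            f"{sign}{whole:b}." + format(rest, f"0{n}b"),
        ))
    return rows


for row in table_rows():
    print("  ".join(row))
```

With four mantissa bits the first eight rows are the negative numbers (M0 = 0) and the last eight are the positive ones (M0 = 1), in line with the ordering described in the text.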
In step 202 the absolute difference D of the exponents E(X) and E(Y) is determined. In step 204 it is determined which of the exponents E(X) and E(Y) is bigger. In step 206 the preliminary assumption is made that the exponent of the result of the summation of X and Y equals the bigger one of the exponents E(X) and E(Y).

In step 208 the mantissas M(X) and M(Y) are expanded as shown in the middle column of table 1. This means that the leading most significant bit, which is 0 for a positive number and 1 for a negative number, is reintroduced into the representation of the mantissas to invert steps 110 and 128, respectively.

In step 210 the mantissa of the operand X or Y with the smaller exponent is shifted D positions to the right. The information as to which of the mantissas has the smaller exponent is obtained from the result of step 204.

In step 212 the mantissa which was shifted in step 210 and the other expanded mantissa, which was not shifted, are added. For adding the two mantissas no sign logic is needed, since both the shifted and the unshifted mantissas are represented as 2's complement numbers.

In step 214 it is evaluated whether an overflow occurred when the shifted and the unshifted mantissas were added in step 212. Overflow occurred if the shifted and the unshifted mantissas have the same most significant bit and the result of the summation has a different most significant bit. If this is the case, the control goes to step 216, in which one is added to the preliminary exponent of the result as obtained in step 206. Further, in step 216 the sum of the mantissas obtained in step 212 is shifted one position to the right in order to adjust the decimal point. The result obtained in step 216 is a final result and is represented in the format of the invention.
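The addition flow of FIG. 2 can be sketched in Python as follows. This is a simplified model under stated assumptions, not the patented circuit: Python signed integers stand in for the 2's complement bit patterns, exponents are plain integers, `ZERO_EXP = -128` is this sketch's reading of the 8-bit zero code, a zero operand is short-circuited instead of being passed through the datapath, and the names `expand`, `pack` and `add` are hypothetical. The sketch also covers the no-overflow branch (steps 218 to 226) that the text goes on to describe.

```python
ZERO_EXP = -128  # assumed: the 8-bit pattern 10000000 read as 2's complement


def expand(stored: int, n: int) -> int:
    """Step 208: re-insert the hidden bit and return a signed integer."""
    m0 = stored >> n
    frac = stored & ((1 << n) - 1)
    return (1 << n) + frac if m0 else frac - (2 << n)


def pack(m: int, n: int) -> int:
    """Drop the hidden bit of a normalized signed mantissa (steps 110/128)."""
    return m & ((1 << (n + 1)) - 1)


def add(x, y, n: int = 23):
    """Add two (exponent, stored_mantissa) words with n fraction bits."""
    (ex, mx), (ey, my) = x, y
    if ex == ZERO_EXP:                       # shortcut for a zero operand
        return y
    if ey == ZERO_EXP:
        return x
    d = min(abs(ex - ey), n + 1)             # step 202, capped at mantissa width
    e = max(ex, ey)                          # steps 204 and 206
    a, b = expand(mx, n), expand(my, n)      # step 208
    if ex < ey:                              # step 210: arithmetic right shift
        a >>= d
    else:
        b >>= d
    s = a + b                                # step 212: plain 2's complement add
    if not -(2 << n) <= s < (2 << n):        # step 214: overflow of the n+2 bits
        return e + 1, pack(s >> 1, n)        # step 216
    if s == 0:                               # steps 220 and 222: result is zero
        return ZERO_EXP, 0
    width = n + 2
    run = width - (s if s >= 0 else ~s).bit_length()  # step 218: length L
    return e - (run - 1), pack(s << (run - 1), n)     # steps 224 and 226


# 1.010 + (-1.011) with 3 fraction bits gives -0.001, i.e. 10.000 with exponent -4
print(add((0, 0b1010), (0, 0b0101), n=3))
```

Because Python integers are unbounded, overflow is detected by a range check on the sum rather than by comparing most significant bits; on a fixed-width adder the two tests are equivalent.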
If it is determined in step 214 that no overflow occurred, a sequence of leading 0 or 1 bits is to be detected in the result obtained by adding the shifted and unshifted mantissas in step 212. The detection of the sequence of leading 0 or 1 bits is done in step 218.

The length of the sequence of leading 0 or 1 bits is denoted L in the following. If it is detected in step 220 that the result obtained in step 212 consists only of zeros, this indicates that the result of the addition is in fact equal to zero. As a value of zero cannot be represented in the mantissa when it is in a format according to the invention, the value of zero is encoded in the exponent. This is done by assigning a predetermined value to the exponent of the result; the predetermined value is indicative of the value zero of the result. For this purpose any possible exponent value can be selected. In the example considered here the exponent is assigned the value 10000000 in an 8 bit representation.

If it is determined in step 220 that the sequence detected in step 218 does not consist only of zeros, the control goes to step 224. In step 224 the result obtained in step 212 is renormalized to the format of the invention. This is done by shifting the result obtained by adding the shifted and unshifted mantissas L-1 times to the left and correspondingly subtracting L-1 from the preliminary exponent of the result obtained in step 206. The resulting number has the form 01.F or 10.F, depending on whether the number is positive or negative. Since the leading most significant bit in the formats 01.F and 10.F is redundant, it is dropped in step 226, corresponding to the respective steps 110 and 128 of FIG. 1.

With reference to FIG. 3, an adder circuit is now described which can add the two numbers X and Y. In the example considered here the exponents are 8 bits wide and the mantissas are 24 bits wide. In the representation of steps 112 and 130 of FIG. 1 this means that there are 24 mantissa bits M0 to M23. The exponents E(X) and E(Y) to be
inputted into the adder shown in FIG. 3 are again in 2's complement form and the mantissas are normalized in the way described with reference to FIG. 1.

The adder shown in FIG. 3 has a subtractor 300 which has two inputs to receive the exponents E(X) and E(Y). Further, the adder of FIG. 3 has a zero detector and multiplexer 302 which also receives the exponents E(X) and E(Y) as input values. The subtractor 300 has a control output 304 which indicates which one of the exponents E(X) or E(Y) is the bigger one.

The control output 304 is coupled to the zero detector and multiplexer 302 as well as to swap circuit 306. The swap circuit 306 receives the mantissas M(X) and M(Y) as 24 bit inputs. The swap circuit 306 has a control input 308 which is coupled to the control output 304. Further, the swap circuit 306 has data outputs 310 and 312.

The data outputs 310 and 312 are one bit wider than the inputs of the swap circuit 306, in this case 25 bits instead of 24 bits. The data output 310 of the swap circuit 306 is coupled to barrel shifter 314 as a data input. The barrel shifter 314 has a control input 316 which is coupled to control output 318 of the subtractor 300.

The barrel shifter 314 has a control output 318 which is coupled to a data input of adder block 320. The other data input of adder block 320 is coupled to the data output 312 of the swap circuit 306.

The zero detector and multiplexer 302 has its output coupled to subtractor/adder-by-1 block 322 as a data input. The other input of the subtractor/adder-by-1 block 322 is coupled to output 324 of leading most significant bit detector 326.

The adder block 320 has an overflow output which is coupled via line 328 to the subtractor/adder-by-1 block 322 and to barrel shifter 330. The barrel shifter 330 has its data input coupled to the data output of the adder block 320 via line 332. The line 332 is 25 bits wide. The barrel shifter 330 is also coupled to the output 324 of the leading msb detector 326.

The leading msb detector 326 is also coupled
via output line 334 to the subtractor/adder-by-1 block 322. The exponent E(Z) of the result Z of the summation of X and Y is present at the output 336 of the subtractor/adder-by-1 block 322, and the normalized mantissa M(Z) of the result Z is present at the output 338 of the barrel shifter 330.

In operation the exponent bits E(X) and E(Y) as well as the mantissa bits M(X) and M(Y) of the two numbers X and Y to be added are inputted simultaneously into the adder circuit. By means of the subtractor 300 the absolute difference D of the exponents E(X) and E(Y) is determined.

If the difference D is bigger than the width of the mantissa input into swap circuit 306 (in this case 24 bits), the width of the mantissa input is taken as the difference D, since this is the maximum number of shifts which can be performed. This corresponds to step 202 of FIG. 2.

The subtractor 300 also determines which one of the exponents E(X) and E(Y) is the bigger one. This corresponds to step 204 of FIG. 2. The information as to which one of the exponents is bigger is available at the control output 304. According to the logical value of the control output 304, the zero detector and multiplexer 302 is controlled to output the bigger one of the exponents E(X) and E(Y) to the subtractor/adder-by-1 block 322. This corresponds to step 206 of FIG. 2.

The information as to which one of the exponents E(X) or E(Y) is bigger is also inputted into the swap circuit 306 at its control input 308. The swap circuit 306 swaps the inputs M(X) and M(Y) so that the mantissa M of the number X or Y having the smaller exponent is outputted at the data output 310 to the barrel shifter 314.

The result of the determination of the difference D is available at the control output 318 of the subtractor 300 and is inputted into the control input 316 of the barrel shifter 314.

In the swap circuit 306 the hidden most significant bit is included in the mantissas M(X) and M(Y), corresponding to step 208 of FIG. 2. As a consequence the data outputs 310 and 312 of the
swap circuit 306 are one bit wider than the mantissa inputs, in this case 25 bits wide. The barrel shifter 314 shifts the expanded mantissa of the operand having the smaller exponent D positions to the right, corresponding to step 210 of FIG. 2.

The result of this shift operation is available at the control output 318 of the barrel shifter 314 and is still 25 bits wide. Subsequently both the shifted and the unshifted mantissas are inputted into the adder block 320.

If an overflow occurs when the shifted and unshifted mantissas are added in the adder block 320, this is indicated by line 328 both to the subtractor/adder-by-1 block 322 and to the barrel shifter 330. This has the effect that the value of the output line 334 is ignored by the subtractor/adder-by-1 block 322 and that 1 is added to the exponent inputted by the zero detector and multiplexer 302 into the subtractor/adder-by-1 block 322. The result of this addition is the final exponent E(Z), which is outputted at output 336. Correspondingly, the barrel shifter 330 shifts the result outputted by adder block 320 via line 332 one position to the right and drops the leading most significant bit, so that the resulting mantissa M(Z) is obtained at output 338. This corresponds to step 216 of FIG. 2.

If no overflow occurs in the adder block 320 (cf. step 214 of FIG. 2), the leading most significant bit detector 326, which has its data input coupled to the data output of the adder block 320, detects a sequence of leading 0 or 1 bits in order to determine the length L of the sequence, as explained with respect to step 218 of FIG. 2. The value of L is available at the output 324 of the leading msb detector 326. If the value of L reveals that the result of the summation in adder block 320 is zero, this is notified by the leading msb detector 326 to the subtractor/adder-by-1 block 322 via load output line 334, and a predetermined value which is indicative of the result being zero is loaded into the subtractor/adder-by-1 block 322. This
loaded value is the resulting exponent E(Z). This corresponds to step 222 of FIG. 2.

If the result obtained by adder block 320 is not zero, L-1 is subtracted from the exponent inputted by the zero detector and multiplexer 302 into the subtractor/adder-by-1 block 322 in order to obtain the resulting exponent E(Z). Correspondingly, the mantissa is normalized by shifting it L-1 times to the left in barrel shifter 330. Again the leading most significant bit is dropped in the barrel shifter 330, so that a 24 bit wide resulting mantissa M(Z) is obtained. This corresponds to step 226 of FIG. 2.

In case that the result obtained at the output of adder block 320 is zero, the value of the resulting mantissa M(Z) is "don't care", because the value of the exponent indicates that the number Z is in fact zero. If, however, one of the input values X or Y is zero, this is detected in the zero detector and multiplexer 302, which compares both exponents E(X) and E(Y) with the predefined exponent value which is indicative of zero, in this case 80h. If zero is detected by the zero detector and multiplexer 302, this is notified to the swap circuit 306 via line 340, and the mantissa of the corresponding number X or Y which is 0 is filled with 0 to overwrite any "don't care" values.

With reference to FIG. 4 it is explained in greater detail, with respect to a preferred embodiment, how the invention can be used for computing purposes. FIG. 4 shows an electronic system 400 which can be any electronic device requiring some kind of computing and/or digital signal processing. Typical examples are telecommunication devices such as cellular phones.
The electronic system 400 has a program storage 402 and memory 404. Computing unit 406 is coupled via a bi-directional bus 408 to the memory 404. A program stored in the program storage 402 can be loaded into the computing unit 406 via line 410.

The memory 404 contains a number of data words which are represented in a format according to the invention. One of the data words is shown by way of example as data word 412. When the computing unit 406 has to carry out some kind of digital signal processing calculation, it loads the corresponding computer program from the program storage 402. In order to carry out the digital signal processing program, data words have to be fetched via the bi-directional bus 408 from the memory 404. The data required for carrying out the computer program is in the format according to the invention.

This allows the computing unit 406 to take advantage of the improved adding of numbers which are represented in a format according to the invention. For example, if the computing unit is a microprocessor, the microprocessor can comprise one or more adders of the type shown in FIG. 3 to carry out large numbers of summations more economically.

FIG. 5 shows a block diagram of a computer system in which the representation of a number according to the invention is particularly beneficial. The input log converter 500 receives input data words to be inputted into the computer system. An input data word is logarithmized by the input log converter 500 and inputted into the first register R0 of data pipeline 502.

The data pipeline 502 consists of a number of registers R0 to Rn which are coupled together to form a shift register chain. Each of the registers Ri is coupled to its corresponding computing unit CUi. Each of the computing units CU0 to CUn can access its corresponding register Ri to read the data word which is stored in that register.
Each of the computing units CU0 to CUn has an output which is coupled to the inverse log converter 504. The inverse log converter 504 performs an inverse logarithm operation on the output of the computing unit CUi to transform the result of the computation back into the normal domain. The results which are obtained by inverting the outputs of the computing units CUi are transferred to an accumulator 506, which adds all the results so that the final result is available at the output 508 of the accumulator 506.
In operation, a sequence of data input words is received by the input log converter 500, and the resulting sequence of input data, which is in the log domain, is shifted into the data pipeline 502. Each computing unit CUi accesses its corresponding register Ri to obtain the corresponding data input value. A computation is performed in the computing unit CUi and the result is outputted to the inverse log converter 504 to transform the result of the computation back from the log domain into the normal domain.
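The data flow of FIG 5 can be illustrated with a short Python sketch. It is not the patented circuit: it uses ordinary floating point instead of the number format of the invention, and the function names, the separate sign field and the use of `None` to flag zero are assumptions made only for this illustration. It assumes each computing unit multiplies its register value by one stored coefficient, which in the log domain is an addition.

```python
import math

def to_log(x):
    """Input log converter (illustrative): return (sign, log2|x|).
    Zero has no logarithm, so it is flagged with None (an assumption)."""
    if x == 0:
        return (0, None)
    return (1 if x > 0 else -1, math.log2(abs(x)))

def from_log(v):
    """Inverse log converter (illustrative): back to the normal domain."""
    sign, lg = v
    if lg is None:
        return 0.0
    return sign * 2.0 ** lg

def log_multiply(a, b):
    """Computing unit (illustrative): a multiplication in the normal domain
    is an addition of the logarithms; the signs are multiplied separately."""
    if a[1] is None or b[1] is None:
        return (0, None)
    return (a[0] * b[0], a[1] + b[1])

def pipeline_output(samples, coeffs):
    """Registers R0..Rn hold log-domain samples; each unit applies one
    coefficient; the accumulator adds the results in the normal domain."""
    log_samples = [to_log(s) for s in samples]
    log_coeffs = [to_log(c) for c in coeffs]
    products = [log_multiply(s, c) for s, c in zip(log_samples, log_coeffs)]
    return sum(from_log(p) for p in products)

if __name__ == "__main__":
    print(pipeline_output([1.0, 2.0, 4.0], [0.5, 0.25, -2.0]))
```

The only additions in the normal domain happen in the final accumulation; every per-unit multiplication has been replaced by an addition of logarithms.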
All the results of the computing units are accumulated in the accumulator 506 after the inverse log operation, which is performed by inverse log converter 504. The computation which is carried out in the computing units CUi can be of a finite impulse response filter or infinite impulse response filter type. In this case each of the computing units CUi has one coefficient of such a filter operation stored in an internal register, which is not shown in the drawing for simplicity. To perform such a filter operation, in each computing unit the corresponding coefficient has to be multiplied with the input data word stored in the corresponding register. Since this multiplication is carried out in the log domain, the multiplication becomes a summation. In the latter case, the computing units CUi are in fact adders, which can be implemented by means of an adder of the type shown in FIG 3, provided that both the input data words in the log domain, which are stored in the registers Ri, and the coefficients of the filter operations, which are stored in the computing units, are represented in a format according to the principles of the invention.
Since a large number of computing units exists in an architecture of the type shown in FIG 5, the use of an adder of the type shown in FIG 3 has a very substantial positive effect. The same applies analogously to the implementation of the accumulator 506, which can also be realized by adders of the type shown in FIG 3, again provided that the output of the inverse log converter 504 is represented in a format in accordance with the principles of the invention.
Normalization of a floating point number.
This all depends upon the way floating point numbers are stored. Forget binary for now; think in decimal.
If I have the value 87.6, then I can write it as 87.6 x 10^0, 8.76 x 10^1, 0.876 x 10^2 or 0.0876 x 10^3.
Normalisation is simply the process of choosing which of these is best, according to some rules. In decimal, we normally choose 0.876 x 10^2, because it follows these simple rules:
- The mantissa has no non-zero digits before the decimal point.
- The mantissa has a non-zero digit immediately after the decimal point.
Another way of writing this is that the mantissa is in the range 0.1 to 0.99999...
Applying this to binary floating point numbers: when we normalise a binary number we have to apply the same rules to the mantissa. It must have no non-zero digits before the decimal (I mean binary) point, and a non-zero digit immediately after the binary point. Or, to put it another way, it must be in the range 0.5 to 0.999999... in decimal.
We do this for several reasons:
1. It gets the best use out of our available bits.
2. It simplifies the hardware required to do arithmetic.
Of course, when we normalise in either decimal or binary, we have to adjust the exponent accordingly to keep the same value.
Bob, 3 years ago.
A number is normalized in order to get the greatest precision. This is done by multiplying the number by some power of the number base (radix) to get it into a particular range, where it is then truncated or rounded to a fixed number of digits.
Since floating point formats have a fixed number of digits, moving the leading digit as far left as possible leaves the most room for low-order digits to be retained. That's what normalization does, primarily. It avoids wasting digit positions by storing leading zeroes.
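Bob's rule for binary (mantissa in the range 0.5 up to, but not including, 1, with the exponent adjusted to keep the same value) can be tried out in Python. The helper name `normalise` is made up for this sketch; `math.frexp` is the standard library function that performs the same decomposition and is used here only as a cross-check.

```python
import math

def normalise(mantissa, exponent=0):
    """Rescale so that 0.5 <= |mantissa| < 1 while keeping
    mantissa * 2**exponent unchanged. Zero is returned as (0.0, 0)."""
    if mantissa == 0:
        return 0.0, 0
    while abs(mantissa) >= 1.0:   # non-zero digits before the binary point
        mantissa /= 2.0           # shift right by one bit ...
        exponent += 1             # ... and compensate in the exponent
    while abs(mantissa) < 0.5:    # zero digit right after the binary point
        mantissa *= 2.0           # shift left by one bit ...
        exponent -= 1             # ... and compensate in the exponent
    return mantissa, exponent

if __name__ == "__main__":
    m, e = normalise(87.6)
    print(m, e)                   # mantissa and power-of-two exponent
    print(math.frexp(87.6))       # the standard library's decomposition
```

Each halving or doubling moves the binary point by one position, so the exponent changes by one in the opposite direction and the value stays the same.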
Binary floating point formats can also gain one extra bit of precision by not storing the leading 1 bit. The IEEE 754 binary floating point formats do this, for example, and they are used by almost everyone these days. (Some IBM mainframes still support a base-16 floating point standard inherited from the S/360.) This is only possible in binary, where the leading digit can only be 1. A zero value is indicated by every bit, except perhaps the sign bit, being 0.
If your 8-bit number were to be normalized into an 8-bit field, there's no advantage to normalization. However, if you were to normalize the 16-bit value 00101101 01101001 into an 8-bit field, you'd get 10110101[1], rounded up to 10110110, if the leading 1 bit is stored, or [1]01101011[0], rounded down to 01101011, if the leading 1 is not stored.
The bits in brackets are leading and trailing bits that are not stored. The bits on the right may be used for rounding, though. There are usually different rounding mode options telling how to handle a normalized result that has to lose some bits on the right.
Just storing the first 8 bits would get you 00101101, only 5 bits after the leading 1. Normalizing raises that to 7 bits after the leading 1. Normalizing and not storing the leading 1 raises that to 8.
husoski, 3 years ago.
To expand just a tiny bit on what Bob said, using his example: 0.876 x 10^2 is really .876 x 10^2, because the zero before the decimal point, while good in print for our eyes, is not needed in the computer representation.
EddieJ, 3 years ago.
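husoski's 16-bit example can be checked with a small Python sketch. The function name `normalise_to_field` and its parameters are invented for this illustration, and round-to-nearest with ties rounded up is an assumption; as the answer notes, real formats offer several rounding modes.

```python
def normalise_to_field(value, width=8, hidden_bit=False):
    """Normalise a positive integer bit pattern into `width` stored bits.

    Leading zeroes are discarded. If hidden_bit is True the leading 1 is
    not stored, so one more significant bit fits into the field.
    Returns (stored bits as a string, number of low-order bits dropped).
    """
    if value <= 0:
        raise ValueError("this sketch only handles positive values")
    keep = width + 1 if hidden_bit else width     # significant bits kept
    drop = value.bit_length() - keep              # low-order bits to lose
    if drop > 0:
        kept = (value + (1 << (drop - 1))) >> drop    # round to nearest
        if kept.bit_length() > keep:              # rounding carried out
            kept >>= 1
            drop += 1
    else:
        kept = value << -drop                     # pad with zeroes
    if hidden_bit:
        kept &= (1 << width) - 1                  # strip the leading 1
    return format(kept, "0{}b".format(width)), drop

if __name__ == "__main__":
    v = 0b0010110101101001
    print(format(v >> 8, "08b"))                      # first 8 bits only
    print(normalise_to_field(v, 8, hidden_bit=False)) # leading 1 stored
    print(normalise_to_field(v, 8, hidden_bit=True))  # leading 1 implied
```

The number of dropped bits is what the exponent would have to absorb so that the stored field still represents the same magnitude.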
